Documentation

The project relies on two web services: Harvard’s Dataverse and GitHub.com. If you want to collaborate, you can sign up for both services (or for just one of them) for free.

If you want to understand what Dataverse and GitHub.com are, here are two short introductory videos:

Video introduction to Dataverse

Video introduction to GitHub.com

Workflow

A typical workflow would include these steps:

Identify a web source that serves interesting data, but in a format that doesn’t allow data analysis.
Write code (a small computer program) to scrape the resource, and publish that code in a dedicated Repository on github.com.
Collect the raw data and store it in a Dataset within the project’s Dataverse on dataverse.harvard.edu.
Write code to make the raw data more readable, then publish the code on github.com and the resulting data on dataverse.harvard.edu.
Document the data on github.com.
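The following is a minimal sketch of the scraping and cleaning steps above. It assumes the web source publishes its data as a plain HTML table; real sources vary widely. Python is used only for illustration, since the project does not prescribe a language. The names TableScraper, fetch_page, scrape_table, rows_to_csv and SAMPLE_HTML are hypothetical, and uploading the result to Dataverse is not shown.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a table cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def fetch_page(url):
    """Download a page as text (assumes the source is served as UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def scrape_table(html):
    """Scraping step: turn an HTML page into raw rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Cleaning step: write the rows as CSV text, ready to store as a data file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample page standing in for a real web source.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Value</th></tr>
  <tr><td>2019</td><td> 42 </td></tr>
  <tr><td>2020</td><td>57</td></tr>
</table>
"""

if __name__ == "__main__":
    raw_rows = scrape_table(SAMPLE_HTML)
    print(rows_to_csv(raw_rows), end="")
```

Against a live source you would pass the output of fetch_page(url) to scrape_table instead of SAMPLE_HTML. Keeping the scraping and the conversion in separate functions mirrors the workflow: the raw rows can be stored as they are, and the cleaning code can be published and rerun on its own.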
