The project relies on two web services: Harvard’s Dataverse and GitHub.com. If you want to collaborate, you can sign up for both services (or for just one of them) for free.
If you want to understand what Dataverse and GitHub.com are, here are two short introductory videos:
Video introduction to Dataverse
Video introduction to GitHub.com
Workflow
A typical workflow would include these steps:
Identify a web source that serves interesting data but in a format that doesn’t allow data analysis.
Write code (a little computer program) to scrape the source and publish it in a dedicated repository on github.com.
Collect the raw data and store it in a Dataset within the project’s Dataverse on dataverse.harvard.edu.
Write code to make the raw data more readable, then publish that code on github.com and the cleaned data on dataverse.harvard.edu.
Document the data on github.com.
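To make the scraping and cleaning steps more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the project's actual code: the names `TableScraper`, `scrape_table`, `clean_rows` and `to_csv` are hypothetical, and `SAMPLE_HTML` is a made-up snippet standing in for a real web source. A real scraper would first download the page and would need to handle that source's specific layout.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Scraping step: return the raw table rows exactly as found in the HTML."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def clean_rows(rows):
    """Cleaning step: collapse stray whitespace and drop rows with no content."""
    cleaned = [[" ".join(cell.split()) for cell in row] for row in rows]
    return [row for row in cleaned if any(row)]


def to_csv(rows):
    """Serialise rows as CSV text, ready to be saved as a file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up example input; not data from any real source.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th> Year </th><th>Value</th></tr>
  <tr><td> Italy </td><td>2020</td><td> 1 234 </td></tr>
  <tr><td></td><td></td><td></td></tr>
</table>
"""

if __name__ == "__main__":
    raw = scrape_table(SAMPLE_HTML)   # raw data, as collected
    tidy = clean_rows(raw)            # more readable version
    print(to_csv(tidy))
```

Keeping the scraping and cleaning functions separate mirrors the workflow: the untouched output of `scrape_table` is what would be stored as raw data, while the output of `clean_rows` is the more readable version published alongside it.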
Essential glossary
Repository
On github.com, a repository (or repo) is a collection of files in their current version and in their previous versions.
Dataverse
On dataverse.harvard.edu, a dataverse is a collection of datasets.
Dataset
On dataverse.harvard.edu, a dataset is a collection of files and their metadata.
Coding for data
If you are interested in coding for the project, read this documentation.