ScrapeOpen

ScrapeOpen is a collaborative project. It curates the publication of code that scrapes websites serving public data sets, and presents those data sets in a format accessible to humans and machines.

What's a public data set

A public data set is a collection of observations that is published by a public, not-for-profit agency and has public value. To be published, the data set must not be proprietary.

What's an accessible format

Data sets published by the ScrapeOpen project have 3 stars in the 5-star Open Data plan:

★ make your stuff available on the Web (whatever format) under an open license

★★ make it available as structured data (e.g., Excel instead of image scan of a table)

★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)

★★★★ use URIs to denote things, so that people can point at your stuff

★★★★★ link your data to other data to provide context

That is, every data set is

structured, with metadata defining every attribute of each observation,
and published in a non-proprietary format.

Where possible, entities described in the data sets (e.g. people, places) are also linked to other data.
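As an illustration of the two requirements above, here is a minimal Python sketch of a data set published as CSV with a metadata description of every attribute. It is not ScrapeOpen's actual code or file layout: the observations, the field names (station_id, date, rainfall_mm), the metadata structure and the helper functions are all hypothetical, chosen only to show the idea.

```python
import csv
import io
import json

# Hypothetical observations; the values are invented for illustration only.
observations = [
    {"station_id": "ST001", "date": "2020-01-01", "rainfall_mm": "4.2"},
    {"station_id": "ST002", "date": "2020-01-01", "rainfall_mm": "0.0"},
]

# Hypothetical metadata: one entry per attribute of an observation.
metadata = {
    "fields": [
        {"name": "station_id", "type": "string",
         "description": "Identifier of the measuring station"},
        {"name": "date", "type": "date",
         "description": "Day of the measurement (ISO 8601)"},
        {"name": "rainfall_mm", "type": "number",
         "description": "Rainfall in millimetres"},
    ]
}


def to_csv(rows, fieldnames):
    """Serialize the rows as CSV text, a non-proprietary open format."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def undocumented_attributes(rows, metadata):
    """Return the attributes used in the rows that the metadata does not define."""
    documented = {field["name"] for field in metadata["fields"]}
    used = set()
    for row in rows:
        used.update(row)
    return sorted(used - documented)


field_names = [field["name"] for field in metadata["fields"]]
csv_text = to_csv(observations, field_names)
metadata_text = json.dumps(metadata, indent=2)
missing = undocumented_attributes(observations, metadata)
```

In this sketch, csv_text and metadata_text would be published side by side, and an empty missing list indicates that every attribute of every observation is defined in the metadata.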

How you can help

Suggest interesting websites to scrape
Write or maintain the code to scrape a resource
Check or improve the quality of the data sets

If you want to contribute, please have a look at the project documentation.
