An open dataset contains three parts:
- The dataset, in `.csv` or `.json` format, or a directory of such files.
- The scripts used to generate the dataset, e.g. scraper, data cleaning logic, data transformations. See Dataprep for more information.
- A `README.md` that shows the basic information about the dataset.
Create a new file called README.md:
Write the description file, which usually contains an introduction to the data source, the research background, the data fields, the data size, the limitations, and the license. If you do not know which licenses are available, we suggest using CC 4.0. The file is written in Markdown, which has a simple syntax that is legible both as plain text and as rendered HTML.
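As a starting point, the sections listed above can be turned into a README skeleton. The following is a minimal sketch, not a required layout: the heading names, the `TODO` placeholders, and the helper names `build_readme` and `write_readme` are assumptions made for illustration.

```python
from pathlib import Path

# Section names follow the description above; the exact headings are an assumption.
SECTIONS = [
    "Introduction",
    "Research background",
    "Data fields",
    "Data size",
    "Limitations",
    "License",
]


def build_readme(title: str, sections=SECTIONS) -> str:
    """Return a Markdown skeleton with one second-level heading per section."""
    lines = [f"# {title}", ""]
    for name in sections:
        # Each section starts with a placeholder to be replaced by real text.
        lines += [f"## {name}", "", "TODO", ""]
    return "\n".join(lines)


def write_readme(directory: str, title: str) -> Path:
    """Write README.md into `directory` and return its path."""
    path = Path(directory) / "README.md"
    path.write_text(build_readme(title), encoding="utf-8")
    return path
```

Calling `write_readme(".", "My dataset")` would create the file in the current directory; replace each `TODO` with your own description.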
The overall shape of an open dataset looks like this: (you are looking at the rendered version of the markdown file)
See homework2 for a complete example.
Note: "Limitations" is an important section of your README file. For example, you may only be able to crawl 95% of the original dataset due to technical problems. Highlighting that in your description file is crucial for other people who base their analysis on your dataset. No dataset is ideal, and an incomplete dataset is also valuable. The principle is full reporting.
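One way to practise full reporting is to compute the coverage figure rather than estimate it. The sketch below assumes you know how many records you expected and how many you actually collected; the function name `coverage_note` and the wording of the sentence are illustrative only.

```python
def coverage_note(collected: int, expected: int) -> str:
    """Return a one-line limitation statement suitable for a README."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    share = collected / expected      # fraction of records obtained
    missing = expected - collected    # records that could not be crawled
    return (
        f"This dataset covers {collected} of {expected} records "
        f"({share:.1%}); {missing} records could not be collected."
    )
```

For instance, `coverage_note(9500, 10000)` describes the 95% case mentioned in the note above.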
Example: We will use data from Openrice as an example and do a restaurant analysis, assuming that we have already collected a certain amount of data from Openrice and saved it to a csv file.
The csv file can be downloaded here.
Click "Raw" in the upper-right corner.
You can see the raw csv file as below.
Right-click (or Control+click on a Mac) and choose "Save as".
The file can then be saved in csv (comma-separated values) format.
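Once the file is saved, it can be read with Python's standard `csv` module. The sketch below uses a small in-memory sample in place of the real download; the column names (`name`, `district`, `price`) and the two rows are made-up placeholders, not the actual Openrice fields.

```python
import csv
import io


def read_rows(handle):
    """Parse an open csv file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


# Hypothetical sample standing in for the downloaded file.
sample = io.StringIO(
    "name,district,price\n"
    "Cafe A,Central,120\n"
    "Noodle B,Mong Kok,45\n"
)
rows = read_rows(sample)

# Every value is read as a string; convert numeric columns explicitly.
prices = [int(row["price"]) for row in rows]
```

For the saved file, open it with `open("your_file.csv", newline="", encoding="utf-8")` (the filename is a placeholder) and pass the handle to `read_rows`.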
You can preview a Python notebook directly on GitHub. However, GitHub blocks JavaScript execution for security reasons, so interactive charts, e.g. from ECharts or Plotly, will not render there. NBViewer supports JavaScript and it is the first free online tool for previewing Python notebooks, so we recommend it. For concrete examples of dynamic charts, @ChicoXYC can find one notebook from our project archive: https://github.com/data-projects-archive .
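An NBViewer link for a notebook hosted on GitHub can be derived from the notebook's GitHub URL by placing the GitHub path after `https://nbviewer.org/github`. The helper below is a minimal sketch of that rewrite; the function name `to_nbviewer` and the `user/repo` path in the usage note are hypothetical.

```python
from urllib.parse import urlparse

NBVIEWER_PREFIX = "https://nbviewer.org/github"


def to_nbviewer(github_url: str) -> str:
    """Rewrite a github.com notebook URL into the matching NBViewer URL."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    if "/blob/" not in parsed.path:
        # File pages on GitHub have the form /<user>/<repo>/blob/<branch>/<path>.
        raise ValueError("expected a URL to a file (containing /blob/)")
    return NBVIEWER_PREFIX + parsed.path
```

For example, `to_nbviewer("https://github.com/user/repo/blob/master/demo.ipynb")` yields a link under `https://nbviewer.org/github/user/repo/`.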
Please see here.
Basically, index.html is the default file served by the web server, so visiting example.com is equivalent to visiting example.com/index.html. Naming your file index.html gives you this more concise notation in the browser's address bar and in communication campaigns; on the web, shorter names are usually better. More explanations are here.
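This behaviour can be observed locally with Python's built-in `http.server`, whose `SimpleHTTPRequestHandler` also serves `index.html` when a directory is requested. The sketch below is only a local demonstration under that assumption; the helper names `QuietHandler`, `get`, and `fetch_both` and the page content are invented for illustration, and production web servers are configured separately.

```python
import functools
import http.client
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request logging


def get(port: int, path: str) -> bytes:
    """Fetch `path` from the local test server and return the body."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()


def fetch_both(directory: str):
    """Serve `directory` locally and fetch both "/" and "/index.html"."""
    handler = functools.partial(QuietHandler, directory=directory)
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_port
        return get(port, "/"), get(port, "/index.html")
    finally:
        server.shutdown()
        server.server_close()


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "index.html").write_text("<h1>Hello</h1>", encoding="utf-8")
    root, explicit = fetch_both(tmp)
```

Both requests return the same bytes, which is why the shorter address can be used in links.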
Use the issue tracker as a Q&A forum:
Use the issue tracker as a blog post backend:
- @fouber's blog, written in Chinese by a senior frontend engineer.
Use the issue tracker as a web comment store:







