GitHub

Host open data sets on GitHub

An open dataset contains three parts:

The dataset, in .csv or .json format, or a directory of such files.
The scripts used to generate the dataset, e.g. scrapers, data cleaning logic, and data transformations. See Dataprep for more information.
A README.md that gives basic information about the dataset.

Create a new file called README.md:

Write the description file, which usually contains an introduction to the data source, the background of the research, the data fields, the data size, the limitations, and the license. If you do not know which licenses are available, we suggest a Creative Commons 4.0 license such as CC BY 4.0. The file is written in Markdown, which has a simple syntax that is legible both as plain text and as rendered HTML.
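As a starting point, the sections listed above can be generated as an empty Markdown skeleton. This is only a sketch: the heading names and the `readme_skeleton` helper are assumptions made for illustration, not a required layout.

```python
# Section names follow the description above; the exact wording is an assumption.
SECTIONS = [
    "Introduction",
    "Data source",
    "Research background",
    "Data fields",
    "Data size",
    "Limitation",
    "License",
]


def readme_skeleton(title, sections=SECTIONS):
    """Return the text of a README.md with one empty section per heading."""
    lines = [f"# {title}", ""]
    for name in sections:
        # Each section starts as a placeholder to be filled in by hand.
        lines += [f"## {name}", "", "TODO", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the skeleton; redirect it to README.md once it looks right.
    print(readme_skeleton("My open dataset"))
```

Running the script and redirecting its output to README.md gives a file you can fill in section by section.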

The overall shape of an open dataset looks like this (you are looking at the rendered version of the Markdown file):

See homework2 for a complete example.

Note: The "limitation" section is an important part of your README file. For example, you may only be able to crawl 95% of the original dataset due to technical problems. Highlighting that in your description file is crucial for other people who base their analysis on your dataset. No dataset is ideal, and an incomplete dataset is still valuable. The principle is full reporting.
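One way to make the limitation concrete is to report how many records were collected against how many were expected. The sketch below assumes you know both counts; the `coverage_note` helper and its wording are hypothetical.

```python
def coverage_note(collected, expected):
    """Build a one-line coverage statement for the README's limitation section."""
    if expected <= 0:
        raise ValueError("expected must be a positive number of records")
    share = collected / expected
    missing = expected - collected
    return (
        f"Collected {collected} of {expected} expected records "
        f"({share:.1%}); {missing} records are missing."
    )


if __name__ == "__main__":
    # Example figures only; replace them with the counts from your own crawl.
    print(coverage_note(950, 1000))
```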

How to download a file from a GitHub web page

Example: We will use data from Openrice for a restaurant analysis. Assume that we have already collected a certain amount of data from Openrice and saved it in a csv file.

The csv file can be downloaded here.

Click "Raw" in the upper right corner.

You will see the raw csv file, as below.

Right-click (or Control-click on a Mac) and choose "Save as".

The file can then be saved in csv (comma-separated values) format.
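The same download can be scripted. The "Raw" button leads to a URL on raw.githubusercontent.com that mirrors the file page URL without the `blob` segment. The sketch below relies on that pattern; the `blob_to_raw` and `download` helpers and the example URL are hypothetical, and it assumes a public repository.

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def blob_to_raw(url):
    """Convert a github.com file page URL into its raw-content URL."""
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / branch / path to file
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub file page URL: {url}")
    owner, repo, _blob, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


def download(url, filename):
    """Download the raw file behind a GitHub file page URL to a local file."""
    urlretrieve(blob_to_raw(url), filename)


if __name__ == "__main__":
    # Hypothetical URL, used only to show the conversion.
    page = "https://github.com/example-user/example-repo/blob/master/data/restaurants.csv"
    print(blob_to_raw(page))
```

Calling `download(page, "restaurants.csv")` with a real file page URL saves the csv file next to your script.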

Why should we preview Jupyter notebooks on NBViewer? Is there any relationship with GitHub?

One can preview a Python notebook directly on GitHub. However, GitHub prohibits JavaScript execution for security reasons. If you have interactive charts, e.g. from ECharts or Plotly, they will not render on GitHub. NBViewer supports JavaScript and it is the first free online tool for previewing Python notebooks, so we recommend it. For concrete examples of dynamic charts, @ChicoXYC can find one notebook from our project archive: https://github.com/data-projects-archive .
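NBViewer can open a notebook hosted on GitHub when the GitHub path is placed after `https://nbviewer.org/github`. A minimal sketch of that mapping follows; the `nbviewer_url` helper and the example notebook URL are hypothetical, and it assumes the notebook lives in a public repository.

```python
from urllib.parse import urlparse


def nbviewer_url(github_url):
    """Return the NBViewer address for a notebook file page on github.com."""
    parts = urlparse(github_url)
    if parts.netloc != "github.com" or "/blob/" not in parts.path:
        raise ValueError(f"not a GitHub file page URL: {github_url}")
    # NBViewer reuses the GitHub path (owner/repo/blob/branch/file) unchanged.
    return "https://nbviewer.org/github" + parts.path


if __name__ == "__main__":
    # Hypothetical notebook location, for illustration only.
    print(nbviewer_url("https://github.com/example-user/example-repo/blob/master/charts.ipynb"))
```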

How to change the default branch for GitHub Pages?

Please see here.

gh-pages

What is index.html?

Basically, index.html is the default file served by a web server, so visiting example.com is equivalent to visiting example.com/index.html. Naming your file index.html gives a more concise address in the browser's address bar and in communication campaigns; on the web, shorter names are usually better. More explanations are here.
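You can observe this behaviour locally with Python's built-in `http.server`, which also serves index.html when a directory is requested. The sketch below is only an illustration under that assumption: it starts a throwaway server on a free local port, and the helper names (`QuietHandler`, `fetch`, `fetch_root_and_index`) are made up for this example.

```python
import tempfile
import threading
from functools import partial
from http.client import HTTPConnection
from http.server import HTTPServer, SimpleHTTPRequestHandler
from pathlib import Path


class QuietHandler(SimpleHTTPRequestHandler):
    """Static file handler that does not print a log line per request."""

    def log_message(self, format, *args):
        pass


def fetch(port, path):
    """GET a path from the local test server and return the response body."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    finally:
        conn.close()


def fetch_root_and_index():
    """Serve a temporary folder and request both "/" and "/index.html"."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "index.html").write_text("<h1>Hello</h1>", encoding="utf-8")
        handler = partial(QuietHandler, directory=root)
        server = HTTPServer(("127.0.0.1", 0), handler)  # port 0: pick a free port
        port = server.server_address[1]
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            return fetch(port, "/"), fetch(port, "/index.html")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    short, full = fetch_root_and_index()
    print(short == full)
```

Both requests return the same page, which is why the shorter address is enough.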

Any real-world examples of using the GitHub issue tracker?

Use the issue tracker as a Q&A forum:

Builder Book

Use the issue tracker as a blog post backend:

@fouber's blog, written in Chinese by a senior frontend engineer.

Use the issue tracker as a web comment store:

gitalk. See the comment thread here.
