
Recipes #10

@vincentarelbundock

In terms of repository structure, I think it would be beneficial to split each data source into its own file. The idea would be to create a standardized "recipe" format that holds all the information about a dataset (e.g. where to download it, its BibTeX citation, the name of its cleaning script, and the date it was last updated), plus a cleaning script that does all the processing we need.
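
As a rough illustration, a recipe could be represented as the small record below once it has been parsed. The field names (`name`, `url`, `bibtex`, `cleaning_script`, `updated`) are hypothetical, chosen only to mirror the items listed above; this is a sketch, not a format psData defines.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical recipe fields mirroring the items listed in the proposal.
REQUIRED_FIELDS = ("name", "url", "bibtex", "cleaning_script", "updated")


@dataclass(frozen=True)
class Recipe:
    name: str             # short identifier for the dataset
    url: str              # where to download the raw data
    bibtex: str           # citation for the data source
    cleaning_script: str  # file name of the script that cleans the raw data
    updated: date         # when the recipe was last updated

    @classmethod
    def from_dict(cls, data: dict) -> "Recipe":
        """Build a Recipe from parsed recipe metadata, rejecting incomplete entries."""
        missing = [key for key in REQUIRED_FIELDS if key not in data]
        if missing:
            raise ValueError(f"recipe is missing fields: {', '.join(missing)}")
        updated = data["updated"]
        # Accept either an ISO date string or an already-parsed date object.
        if isinstance(updated, str):
            updated = date.fromisoformat(updated)
        return cls(
            name=data["name"],
            url=data["url"],
            bibtex=data["bibtex"],
            cleaning_script=data["cleaning_script"],
            updated=updated,
        )
```

Validating required fields up front means a contributed recipe with a missing citation or script name fails immediately, before anything is downloaded. The date check accepts both strings and date objects because some YAML loaders already convert unquoted ISO dates into dates.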

I use something similar locally: a YAML file that specifies all of this information, plus an accompanying Python script that does the cleaning.
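
For illustration, the cleaning script paired with a recipe might expose a single entry point that reads the raw download and writes a tidy file. The `clean(raw_path, out_path)` signature and the CSV-in, CSV-out convention are assumptions made for this sketch; the proposal does not fix either.

```python
import csv
from pathlib import Path


def clean(raw_path: str, out_path: str) -> int:
    """Hypothetical cleaning entry point: normalise headers and drop incomplete rows.

    Returns the number of data rows written.
    """
    with open(raw_path, newline="") as src:
        reader = csv.reader(src)
        # Normalise column names: trim whitespace and lowercase them.
        header = [column.strip().lower() for column in next(reader)]
        # Keep only rows that have a non-empty value in every column.
        rows = [
            row for row in reader
            if len(row) == len(header) and all(cell.strip() for cell in row)
        ]

    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Keeping the script's interface this narrow (one input path, one output path) is what lets a generic tool run any contributed script without knowing what it does internally.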

This makes user contributions very easy. Contributors just copy an existing recipe, edit it for their dataset, and include an R script that does the cleaning. The only thing psData has to do is provide a proper API to parse the recipe, download the data, and run the cleaning script.
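
The three steps that API would need (parse the recipe, download the data, run the cleaning script) could look roughly like this. Everything here is hypothetical: the `install` function name, the recipe arriving as an already-parsed dict, and the default runner, which assumes contributed scripts are R files invoked as `Rscript <script> <raw> <out>`. The download uses the standard library's `urllib.request.urlretrieve`.

```python
import subprocess
import urllib.request
from pathlib import Path
from typing import Callable


def run_with_rscript(script: Path, raw: Path, out: Path) -> None:
    """Hypothetical default runner: call an R cleaning script with input and output paths."""
    subprocess.run(["Rscript", str(script), str(raw), str(out)], check=True)


def install(
    recipe: dict,
    recipe_dir: Path,
    work_dir: Path,
    runner: Callable[[Path, Path, Path], None] = run_with_rscript,
) -> Path:
    """Download the raw data named in `recipe` and run its cleaning script."""
    # Step 1: check that the parsed recipe has what the pipeline needs.
    for key in ("name", "url", "cleaning_script"):
        if key not in recipe:
            raise KeyError(f"recipe lacks required field {key!r}")

    work_dir.mkdir(parents=True, exist_ok=True)
    raw_path = work_dir / f"{recipe['name']}.raw"
    out_path = work_dir / f"{recipe['name']}.csv"

    # Step 2: download the raw file to a local path.
    urllib.request.urlretrieve(recipe["url"], raw_path)

    # Step 3: hand the raw file to the recipe's cleaning script.
    script = recipe_dir / recipe["cleaning_script"]
    runner(script, raw_path, out_path)
    return out_path
```

Passing the runner in as a parameter keeps the pipeline independent of the scripting language, so a recipe could in principle point at an R or Python cleaning script, and it makes the pipeline testable without R installed.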

Think of something like Homebrew, the package manager for macOS, and its library of "formulas":

https://github.com/Homebrew/homebrew/tree/master/Library/Formula
