
Data Workspaces

Data Workspaces is an open source framework for maintaining the state of a data science project, including data sets, intermediate data, results, and code. It supports reproducibility through snapshotting and lineage models, and collaboration through a push/pull model inspired by source control systems like Git.
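To make the idea of a snapshot concrete, here is a minimal, hypothetical sketch of one common way to capture a project's state: record a content hash for every file, then derive a single snapshot ID from those hashes. It is not the Data Workspaces implementation or API. The functions `snapshot_manifest`, `snapshot_id`, and `changed_files` are illustrative names only, and the real framework also tracks lineage and supports push/pull, which this sketch leaves out.

```python
import hashlib
import os
from typing import Dict, List


def snapshot_manifest(root: str) -> Dict[str, str]:
    """Hypothetical helper (not the Data Workspaces API): map each file under
    root, as a '/'-separated relative path, to the SHA-256 of its contents."""
    manifest: Dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in chunks so large data files need not fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            manifest[rel] = digest.hexdigest()
    return manifest


def snapshot_id(manifest: Dict[str, str]) -> str:
    """Hypothetical helper: derive one identifier for the whole snapshot.
    Any change to a file's name or contents yields a different ID."""
    digest = hashlib.sha256()
    for rel in sorted(manifest):
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update(manifest[rel].encode("ascii"))
        digest.update(b"\n")
    return digest.hexdigest()


def changed_files(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Hypothetical helper: list paths added, removed, or modified between two manifests."""
    return sorted(p for p in set(old) | set(new) if old.get(p) != new.get(p))
```

Because a snapshot ID depends only on file contents and paths, two collaborators with identical data compute the same ID, and comparing two manifests shows exactly which files changed between snapshots.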

Data Workspaces is installed as a Python 3 package and provides a Git-like command line interface and programming APIs. Specific data science tools and workflows are supported through extensions called kits; currently these include Scikit-learn, TensorFlow, and Jupyter Notebooks. The goal is to provide these reproducibility and collaboration benefits with minimal changes to your current projects and processes.

Data Workspaces runs on Unix-like systems, including Linux and macOS, and on Windows via the Windows Subsystem for Linux.

Quick Start

Please see the Quickstart section of the documentation.

Documentation

The documentation is available at https://data-workspaces-core.readthedocs.io/en/latest/. Its source is in the docs directory. To build it locally, install Sphinx and run the following:

cd docs
pip install -r requirements.txt # extras needed to build the docs
make html

To view the local documentation, open the file docs/_build/html/index.html in your browser.

License

This code is copyright 2018 - 2021 by the Max Planck Institute for Software Systems and Benedat LLC. It is licensed under the Apache 2.0 license. See the file LICENSE.txt for details.