Commit 40fa46d

Time zone localization (#61)
This commit introduces time zone localization functionality to handle conversion of timezone-naive (TIMESTAMP_NTZ) timestamps to timezone-aware (TIMESTAMP_TZ) timestamps. The changes include new localizer classes, refactoring of time configuration models, updates to test utilities, and bug fixes.

Changes:
- Added TimeZoneLocalizer and TimeZoneLocalizerByColumn classes for localizing tz-naive timestamps to standard time zones
- Refactored time configuration models to include a dtype field and consolidated the IndexTimeRange classes
- Updated test utilities and fixed API inconsistencies (np.concat → np.concatenate)
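As a rough illustration of what this localization does, here is a minimal pandas sketch (not chronify's API; the `EST` zone and variable names are placeholders):

```python
# Sketch only: localize tz-naive timestamps to a standard (fixed-offset) time zone.
# Chronify's new TimeZoneLocalizer classes perform this kind of NTZ -> TZ conversion.
import pandas as pd

naive = pd.Series(pd.date_range("2020-01-01", periods=3, freq="h"))  # TIMESTAMP_NTZ-like
aware = naive.dt.tz_localize("EST")  # TIMESTAMP_TZ-like; "EST" is a fixed UTC-5 offset, no DST
print(aware)
```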

63 files changed

Lines changed: 11625 additions & 0 deletions


.buildinfo

Lines changed: 4 additions & 0 deletions
# Sphinx build info version 1
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 516f3e47e59ddbc79ec112ccfed74fbf
tags: 645f666f9bcd5a90fca523b33c5a78b7

.nojekyll

Whitespace-only changes.

_sources/explanation/index.md.txt

Lines changed: 11 additions & 0 deletions
```{eval-rst}
.. _explanation-page:
```

# Explanation

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:
```
Lines changed: 9 additions & 0 deletions
# Getting Started

```{eval-rst}
.. toctree::
   :maxdepth: 2

   installation
   quick_start
```
Lines changed: 47 additions & 0 deletions
```{eval-rst}
.. _installation:
```

# Installation

1. Install Python 3.11 or later.

2. Create a Python 3.11+ virtual environment. This example uses the ``venv`` module in the standard
   library to create a virtual environment in the current directory. You may prefer a single
   `python-envs` directory in your home directory instead of the current directory. You may also
   prefer ``conda`` or ``mamba``.

```{eval-rst}
.. code-block:: console

    $ python -m venv env
```

3. Activate the virtual environment.

```{eval-rst}
.. code-block:: console

    $ source env/bin/activate
```

Whenever you are done using chronify, you can deactivate the environment by running ``deactivate``.

4. Install the Python package `chronify`.

To use DuckDB or SQLite as the backend:

```{eval-rst}
.. code-block:: console

    $ pip install chronify
```

To use Apache Spark via the Apache Thrift Server as the backend, you must install pyhive.
This command will install the necessary dependencies:

```{eval-rst}
.. code-block:: console

    $ pip install "chronify[spark]"
```
Lines changed: 43 additions & 0 deletions
# Quick Start

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from chronify import DatetimeRange, Store, TableSchema

store = Store.create_file_db(file_path="time_series.db")
resolution = timedelta(hours=1)
time_range = pd.date_range("2020-01-01", "2020-12-31 23:00:00", freq=resolution)
store.ingest_tables(
    (
        pd.DataFrame({"timestamp": time_range, "value": np.random.random(8784), "id": 1}),
        pd.DataFrame({"timestamp": time_range, "value": np.random.random(8784), "id": 2}),
    ),
    TableSchema(
        name="devices",
        value_column="value",
        time_config=DatetimeRange(
            time_column="timestamp",
            start=datetime(2020, 1, 1, 0),
            length=8784,
            resolution=timedelta(hours=1),
        ),
        time_array_id_columns=["id"],
    )
)
query = "SELECT timestamp, value FROM devices WHERE id = ?"
df = store.read_query("devices", query, params=(2,))
df.head()
```

```
            timestamp     value  id
0 2020-01-01 00:00:00  0.594748   2
1 2020-01-01 01:00:00  0.608295   2
2 2020-01-01 02:00:00  0.297535   2
3 2020-01-01 03:00:00  0.870238   2
4 2020-01-01 04:00:00  0.376144   2
```

_sources/how_tos/index.md.txt

Lines changed: 15 additions & 0 deletions
```{eval-rst}
.. _how-tos-page:
```

# How Tos

```{eval-rst}
.. toctree::
   :maxdepth: 2
   :caption: Contents:

   getting_started/index
   ingest_multiple_tables
   map_time_config
   spark_backend
```
Lines changed: 72 additions & 0 deletions
# How to Ingest Multiple Tables Efficiently

There are a few important considerations when ingesting many tables:

- Use one database connection.
- Avoid loading all tables into memory at once, if possible.
- Ensure additions are atomic. If anything fails, the final state should be the same as the initial
  state.

**Setup**

The input data are in CSV files. Each file contains a timestamp column and one value column per
device.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from chronify import DatetimeRange, Store, TableSchema, CsvTableSchema
# Note: ColumnDType, DateTime, and Double (used below) also need to be imported.

store = Store.create_in_memory_db()
resolution = timedelta(hours=1)
time_config = DatetimeRange(
    time_column="timestamp",
    start=datetime(2020, 1, 1, 0),
    length=8784,
    resolution=timedelta(hours=1),
)
src_schema = CsvTableSchema(
    time_config=time_config,
    column_dtypes=[
        ColumnDType(name="timestamp", dtype=DateTime(timezone=False)),
        ColumnDType(name="device1", dtype=Double()),
        ColumnDType(name="device2", dtype=Double()),
        ColumnDType(name="device3", dtype=Double()),
    ],
    value_columns=["device1", "device2", "device3"],
    pivoted_dimension_name="device",
)
dst_schema = TableSchema(
    name="devices",
    value_column="value",
    time_array_id_columns=["id"],
)
```

## Automated through chronify

Chronify will manage the database connection and errors.

```python
store.ingest_from_csvs(
    src_schema,
    dst_schema,
    (
        "/path/to/file1.csv",
        "/path/to/file2.csv",
        "/path/to/file3.csv",
    ),
)
```

## Self-Managed

Open one connection to the database for the duration of your additions. Handle errors.

```python
with store.engine.connect() as conn:
    try:
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file1.csv")
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file2.csv")
        store.ingest_from_csv(src_schema, dst_schema, "/path/to/file3.csv")
    except Exception:
        # Roll back any partial additions so the database returns to its initial state.
        conn.rollback()
```
Lines changed: 90 additions & 0 deletions
# How to Map Time

This recipe demonstrates how to map a table's time configuration from one type to another.

**Source table**: data is stored in representative time, with one week of data per month, by hour,
for one year.

**Destination table**: data is stored with `datetime` timestamps for each hour of the year.

**Workflow**:
- Add the source table to the database.
- Call `Store.map_table_time_config()`.
- Chronify adds the destination table to the database.

This example creates a representative time table used in chronify's tests.

1. Ingest the source data.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

from chronify import (
    DatetimeRange,
    RepresentativePeriodFormat,
    RepresentativePeriodTimeNTZ,
    Store,
    CsvTableSchema,
    TableSchema,
)

src_table_name = "ev_charging"
dst_table_name = "ev_charging_datetime"
hours_per_year = 12 * 7 * 24
num_time_arrays = 3
df = pd.DataFrame({
    "id": np.concatenate([np.repeat(i, hours_per_year) for i in range(1, 1 + num_time_arrays)]),
    "month": np.tile(np.repeat(range(1, 13), 7 * 24), num_time_arrays),
    "day_of_week": np.tile(np.tile(np.repeat(range(7), 24), 12), num_time_arrays),
    "hour": np.tile(np.tile(range(24), 12 * 7), num_time_arrays),
    "value": np.random.random(hours_per_year * num_time_arrays),
})
schema = TableSchema(
    name=src_table_name,
    value_column="value",
    time_config=RepresentativePeriodTimeNTZ(
        time_format=RepresentativePeriodFormat.ONE_WEEK_PER_MONTH_BY_HOUR,
    ),
    time_array_id_columns=["id"],
)
store = Store.create_in_memory_db()
store.ingest_table(df, schema)
store.read_query(src_table_name, f"SELECT * FROM {src_table_name} LIMIT 5").head()
```

```
   id  month  day_of_week  hour     value
0   1      1            0     0  0.578496
1   1      1            0     1  0.092271
2   1      1            0     2  0.111521
3   1      1            0     3  0.671668
4   1      1            0     4  0.782365
```

2. Map the table's time to datetime.

```python
dst_schema = TableSchema(
    name=dst_table_name,
    value_column="value",
    time_array_id_columns=["id"],
    time_config=DatetimeRange(
        time_column="timestamp",
        start=datetime(2020, 1, 1, 0),
        length=8784,
        resolution=timedelta(hours=1),
    )
)
store.map_table_time_config(src_table_name, dst_schema)
store.read_query(dst_table_name, f"SELECT * FROM {dst_table_name} LIMIT 5").head()
```

```
   id     value           timestamp
0   3  0.006213 2020-01-01 00:00:00
1   3  0.865765 2020-01-01 01:00:00
2   3  0.187256 2020-01-01 02:00:00
3   3  0.336157 2020-01-01 03:00:00
4   3  0.582281 2020-01-01 04:00:00
```
Lines changed: 96 additions & 0 deletions
# Apache Spark Backend

Download Spark from https://spark.apache.org/downloads.html and install it. Spark provides startup
scripts for UNIX operating systems (not Windows).

## Install chronify with Spark support
```
$ pip install "chronify[spark]"
```

## Installation on a development computer
Installation can be as simple as
```
$ tar -xzf spark-4.0.1-bin-hadoop3.tgz
$ export SPARK_HOME=$(pwd)/spark-4.0.1-bin-hadoop3
```

Start a Thrift server. This allows JDBC clients to send SQL queries to an in-process Spark cluster
running in local mode.
```
$ $SPARK_HOME/sbin/start-thriftserver.sh --master=spark://$(hostname):7077
```

The URL to connect to this server is `hive://localhost:10000/default`.

## Installation on an HPC
The chronify development team uses these
[scripts](https://github.com/NREL/HPC/tree/master/applications/spark) to run Spark on NREL's HPC.

## Chronify Usage
This example creates a chronify Store with Spark as the backend and then adds a view to a Parquet
file. Chronify will run its normal time checks.

First, create the Parquet file and chronify schema.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from chronify import DatetimeRange, Store, TableSchema, CsvTableSchema

initial_time = datetime(2020, 1, 1)
end_time = datetime(2020, 12, 31, 23)
resolution = timedelta(hours=1)
timestamps = pd.date_range(initial_time, end_time, freq=resolution, unit="us")
dfs = []
for i in range(1, 4):
    df = pd.DataFrame(
        {
            "timestamp": timestamps,
            "id": i,
            "value": np.random.random(len(timestamps)),
        }
    )
    dfs.append(df)
df = pd.concat(dfs)
df.to_parquet("data.parquet", index=False)
schema = TableSchema(
    name="devices",
    value_column="value",
    time_config=DatetimeRange(
        time_column="timestamp",
        start=initial_time,
        length=len(timestamps),
        resolution=resolution,
    ),
    time_array_id_columns=["id"],
)
```

```python
from chronify import Store

store = Store.create_new_hive_store("hive://localhost:10000/default")
store.create_view_from_parquet("data.parquet")
```

Verify the data:
```python
store.read_table(schema.name).head()
```
```
            timestamp  id     value
0 2020-01-01 00:00:00   1  0.785399
1 2020-01-01 01:00:00   1  0.102756
2 2020-01-01 02:00:00   1  0.178587
3 2020-01-01 03:00:00   1  0.326194
4 2020-01-01 04:00:00   1  0.994851
```

## Time configuration mapping
The primary use case for Spark is to map datasets that are larger than can be processed by DuckDB
on one computer. In such a workflow a user would call
```python
store.map_table_time_config(src_table_name, dst_schema, output_file="mapped_data.parquet")
```
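
The mapped output can then be inspected with any Parquet reader. A minimal sketch, assuming the call above wrote `mapped_data.parquet`:

```python
import pandas as pd

# Sketch: read back the Parquet file written by map_table_time_config above.
pd.read_parquet("mapped_data.parquet").head()
```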
