From 9df9752df528f909cb88081419fabefc1f7f1001 Mon Sep 17 00:00:00 2001
From: "Thomas Neep (Advanced Research Computing)" <t.j.neep@bham.ac.uk>
Date: Thu, 14 Aug 2025 16:21:23 +0100
Subject: [PATCH 1/6] Update analysis page.

Remove any specific data which may not be public.
Make a tabbed interface for the CLI and Python API.
---
 docs/analyse.md | 401 ++++++++++++++++++++++++++----------------------
 1 file changed, 214 insertions(+), 187 deletions(-)

diff --git a/docs/analyse.md b/docs/analyse.md
index b3238cb..c11e425 100644
--- a/docs/analyse.md
+++ b/docs/analyse.md
@@ -14,7 +14,7 @@ so that once installed, the Onyx client will automatically be configured.
 ## Onyx client basics
 
 First, let's install the Onyx client, which is available through the
-[conda-forge package](https://anaconda.org/conda-forge/climb-onyx-client) 
+[conda-forge package](https://anaconda.org/conda-forge/climb-onyx-client)
 `climb-onyx-client` and can thus be installed
 with `conda`.  As advised in the [CLIMB docs on installing
 software](https://docs.climb.ac.uk/notebook-servers/installing-software-with-conda/),
@@ -28,205 +28,232 @@ Let's activate this environment.
 jovyan:~$ conda activate onyx
 ```
 On Bryn's Notebook Servers, the client will automatically be configured.
-Try running the command-line client with
-```
-(onyx) jovyan:~$ onyx
-```
-This should show you some options and commands that are available.
-Have a look at your own profile with
-```
-(onyx) jovyan:~$ onyx profile
-```
-and which projects you have access to with
-```
-(onyx) jovyan:~$ onyx projects
-```
-You should see `mscape` listed.
+We will now have access to both the Python API and a command-line client.
+Let's walk through some of the commands available to us.
+In each case you can choose between the Python API or the command-line interface (CLI).
+
+### Initial setup
+
+=== "CLI"
+	No additional setup is required if you are running the CLI in a CLIMB
+	notebook. You can try running the command-line client with
+
+	```console
+	(onyx) jovyan:~$ onyx
+	```
+	to see some of the options and commands available to you.
+
+=== "Python"
+	If you are using onyx in Python, then you need to import the required modules and configure a client.
+	```python
+	import os
+	from onyx import OnyxConfig, OnyxEnv, OnyxClient
+
+	config = OnyxConfig(
+	    domain=os.environ[OnyxEnv.DOMAIN],
+	    token=os.environ[OnyxEnv.TOKEN],
+	)
+
+	client = OnyxClient(config=config)
+	```
+
+	!!! note
+
+	    In all the Python API examples, arguments will be
+		explicitly passed as keyword arguments e.g. `arg=value`,
+		however, in all cases shown on this page, the argument names 
+		can be omitted.
+
+### Profile
+
+You can view information about your profile (username, site, and email) with
+
+=== "CLI"
+
+	```console
+	(onyx) jovyan:~$ onyx profile
+	```
+
+=== "Python"
+
+	```python
+	client.profile()
+	```
+
+### Projects
+
+You can view the projects you have access to with
+
+=== "CLI"
+
+	```console
+	(onyx) jovyan:~$ onyx projects
+	```
+
+=== "Python"
+
+	```python
+	client.projects()
+	```
 
 ## Querying data
 
 As an example task, we'll see if we can find any sequencing data performed
-for ZymoBIOMICS sources.  These are designed with 
+for ZymoBIOMICS sources.  These are designed with
 [a particular specification](https://files.zymoresearch.com/protocols/_d6300_zymobiomics_microbial_community_standard.pdf)
-of DNA from eight bacteria and two yeasts.  We can use these to see if our protocol
-correctly recovers the DNA fractions. I.e. if our protocol is biased.
+of DNA from eight bacteria and two yeasts.
+We will search the `mscape` project, but bear in mind you may not
+have access to that particular project.
 
-From the command line, the main route to querying Onyx is via the `filter` command.
-On its own, this queries the database with *no* filters.  The command
-```
-(onyx) jovyan:~$ onyx filter mscape
-```
-will produce tens of thousands of lines of JSON, so let's not
-do that just yet.  To first see which fields are available in the database,
-we can use
-```
-(onyx) jovyan:~$ onyx fields mscape
-...
-├────────────────────────────────┼──────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
-│ extraction_enrichment_protocol │ optional │ text              │ Details of nucleic acid extraction and optional enrichment steps.            │                                                                             │
-├────────────────────────────────┼──────────┼───────────────────┼──────────────────────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
-...
-```
-Let's search the database for entries with `zymo` (case-insensitive) in this field.
-```
-(onyx) jovyan:~$ onyx filter mscape --field extraction_enrichment_protocol.icontains=zymo
-...
-```
-That should return JSON data for a few entries.  You may wish to format the
-data as CSV or TSV with `--format csv` or `--format tsv`, respectively.
+To see every entry in the entire database for a particular project we can do
 
-## Inspecting some pipeline output on the command line
+=== "CLI"
 
-When data is ingested into Onyx, a taxonomic classification is automatically run.
-The last part of the JSON data is usually some of this, in JSON format.
-The complete reports can be found in the S3 buckets given in the
-`'taxon_report'` field.  You can find this in the output you've already produced
-or modify the `filter` command to only request them using the `--include` flag. e.g.
-```
-(onyx) jovyan:~$ onyx filter mscape --field extraction_enrichment_protocol.icontains=zymo --include=taxon_reports
-[
-    {
-        "taxon_reports": "s3://mscape-published-taxon-reports/C-FDE50853AD/"
-    },
-    {
-        "taxon_reports": "s3://mscape-published-taxon-reports/C-04F4495068/"
-    }
-]
-```
-Multiple fields can be requested with the `--include` flag e.g.
-```
-(onyx) jovyan:~$ onyx filter mscape --field extraction_enrichment_protocol.icontains=zymo --include climb_id,taxon_reports
-[
-    {
-        "climb_id": "C-FDE50853AD",
-        "taxon_reports": "s3://mscape-published-taxon-reports/C-FDE50853AD/"
-    },
-    {
-        "climb_id": "C-04F4495068",
-        "taxon_reports": "s3://mscape-published-taxon-reports/C-04F4495068/"
-    }
-]
-```
-You can conversely exclude individual fields using the `--exclude`
-flag in the same way.
+	```console
+	(onyx) jovyan:~$ onyx filter mscape
+	```
 
-Either way, you now have the location of the taxonomy reports.  Let's have a look
-with `s3cmd`.
-```
-(onyx) jovyan:~$ s3cmd ls s3://mscape-published-taxon-reports/C-FDE50853AD/
-2023-11-10 12:56   146K  s3://mscape-published-taxon-reports/C-FDE50853AD/PlusPF.kraken.json
-2023-11-10 12:56     2G  s3://mscape-published-taxon-reports/C-FDE50853AD/PlusPF.kraken_assignments.tsv
-2023-11-10 12:56   193K  s3://mscape-published-taxon-reports/C-FDE50853AD/PlusPF.kraken_report.txt
-```
-The plain text report is what we're after, so let's download that with `s3cmd`:
-```
-(onyx) jovyan:~$ s3cmd get s3://mscape-published-taxon-reports/C-FDE50853AD/PlusPF.kraken_report.txt
-download: 's3://mscape-published-taxon-reports/C-FDE50853AD/PlusPF.kraken_report.txt' -> './PlusPF.kraken_report.txt'  [1 of 1]
- 197750 of 197750   100% in    0s     3.79 MB/s  done
-```
+=== "Python"
 
-If you've never seen one of these reports before, it's worth having a
-quick look with a tool like `less` or by opening it using the
-JupyterLab file browser.  For reference, it's worth showing the header
-```
-(onyx) jovyan:~$ head -n 1 PlusPF.kraken_report.txt
-% of Seqs       Clades  Taxonomies      Rank    Taxonomy ID     Scientific Name
-```
-The Zymo sample is prepared with 12% *Bacillus subtilis*.  Let's see how much
-was actually reported in the results:
-```
-(onyx) jovyan:~$ grep "Bacillus subtilis" PlusPF.kraken_report.txt
- 20.30  435278  1452    G1      653685                    Bacillus subtilis group
-  0.12  2624    1952    S       1423                        Bacillus subtilis
-  0.03  565     242     S1      135461                        Bacillus subtilis subsp. subtilis
-  0.01  108     108     S2      1404258                         Bacillus subtilis subsp. subtilis str. OH 131.1
-  ...
-```
-Looks like 20.3%, though classified under *Bacillus subtilis* "subgroup",
-rather than *Bacillus subtilis*, which reportedly only comprises 0.12% of the sample.
-Most of that 20.3% is under *Bacillus spizizenii*.
-
-An important detail here is that the fraction reported in this output
-is not calculated in the same way as what's used in the reference values (12% for bacteria; 2% for yeasts).
-Let's make a fairer comparison using the JSON taxonomic data.
-
-## Working with database output in Python
-
-To fairly compare the taxonomic data with the reference values in the
-Zymo community, we need to know the proportions of gDNA, so we need to
-compute the number of base pairs that were assigned to each taxon.
-Let's make this comparison in Python using the Onyx client's Python
-API.
-
-Let's first run the same query for the Zymo data.  We'll follow the
-examples in the Onyx documentation and run the query in a context
-manager.
-```py
-import os
-from onyx import OnyxConfig, OnyxEnv, OnyxClient
-
-config = OnyxConfig(
-    domain=os.environ[OnyxEnv.DOMAIN],
-    token=os.environ[OnyxEnv.TOKEN],
-)
-
-with OnyxClient(config) as client:
-    records = list(client.filter(
-        "mscape",
-        fields={
-            "extraction_enrichment_protocol__icontains": "zymo",
-        },
-    ))
-```
-We've wrapped the `filter` call in a `list` because otherwise
-we get a generator.
-
-If you want to inspect the data, it's a bit easier to read if formatted with
-indentation, which can be done using the standard `json.dumps` function:
-```py
-import json
-print(json.dumps(records[0], indent=2))  # show first record
-```
-In each record, the `'taxa_files'` key gives us a list of dictionaries
-that each has a number of reads and a mean length, the product of
-which is the total number of base pairs that were read for that
-taxon.  A simple first step is to convert the taxonomic data (for the first record)
-into a Pandas DataFrame with
-```py
-import pandas as pd
-
-df = pd.DataFrame(records[0]['taxa_files'])
-```
-We also need to drop a few lower-level taxa that are already
-accounted for in higher ones. e.g. the reads for *Bacillus spizizenii TU-B-10* are
-among the reads counted for *Bacillus spizizenii*.  A quick way of doing this
-is by selecting the rows that have only two words in their names.
-```py
-df = df.loc[df['human_readable'].apply(lambda name: len(name.split()) == 2)]
+	```python
+	# client.filter returns a generator that we can iterate over
+	entires = client.filter(project="mscape")
+	```
+
+On its own, this command queries the database with *no* filters, and
+could return thousands of entries.
+
+### Fields
+
+We can see what fields exist in a particular database with
+
+=== "CLI"
+
+	```console
+	(onyx) jovyan:~$ onyx fields mscape
+	```
+
+=== "Python"
+
+	```python
+	client.fields(project="mscape")
+	```
+
+### Filtering
+
+We can filter the returned records to just select the entries in the
+database that we are interested in. For this example we'll see if we
+can find any sequencing data performed for ZymoBIOMICS sources.  These
+are designed with [a particular
+specification](https://files.zymoresearch.com/protocols/_d6300_zymobiomics_microbial_community_standard.pdf)
+of DNA from eight bacteria and two yeasts.
+
+To select these samples, we can require that the `control_type_details`
+field equals `zymo-mc_D6300`.
+
+=== "CLI"
+
+	```console
+	(onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300
+	```
+
+=== "Python"
+
+	```python
+	# client.filter returns a generator that we can iterate over
+    entries = client.filter(project="mscape", fields={"control_type_details": "zymo-mc_D6300"})
+	```
+
+This returns a small number of entries that we can more easily work
+with. Note that this returns every field for each record that is
+found, which can be much more information than we need. We can select
+specific fields to include using e.g.
+
+=== "CLI"
+
+	```console
+	(onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300 --include climb_id,biosample_id,taxon_reports
+	```
+
+=== "Python"
+
+	```python
+	query = {"control_type_details": "zymo-mc_D6300"}
+	fields_to_include = ["climb_id", "biosample_id" , "taxon_reports"]
+	# client.filter returns a generator that we can iterate over
+    entries = client.filter("mscape", fields=query, include=fields_to_include)
+	```
+
+
+### Taxonomic information
+
+By default, the filter command will not return taxonomic
+information. To access that information for an individual record use the `get` command.
+
+=== "CLI"
+
+	```console
+	(onyx) jovyan:~$ onyx get mscape <CLIMB_ID>
+	```
+
+=== "Python"
+
+	```python
+	record = client.get(project="mscape", climb_id=<CLIMB_ID>)
+	```
+where `<CLIMB_ID>` is replaced with the CLIMB ID of the record you
+want to retrieve.
+This will you give you all the information about a particular record
+including binned reads and all classifier calls.
+
+## Tips
+
+### `jq`
+
+If you are using the CLI, you may find [`jq`](https://jqlang.org)
+useful. `jq` can be installed into your conda environment
+
+```console
+(onyx) jovyan:~$ conda install jq
 ```
-Now, let's add columns for the total number of base pairs associated with
-each taxon and what proportion that is of the total.
-```py
-df['gDNA'] = df['n_reads']*df['mean_len']
-df['proportion'] = df['gDNA']/df['gDNA'].sum()
+You can then pipe the output of your onyx queries
+e.g. `onyx filter ...` into `jq` using the pipe operator `|`.
+This will colourise the output and may make reading the data easier.
+```console
+(onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300 | jq
 ```
-Finally, let's make a rough plot with a black dashed line at 12%.
-```py
-import matplotlib.pyplot as plt
+`jq` has many powerful features, including filtering, selecting, and formatting data.
 
-plt.plot(df['human_readable'], df['proportion']*100, 'o')
-plt.axhline(12, c='k', ls='--');
-plt.xticks(rotation=22.5, ha='right');
-```
 
-![Measured gDNA proportions of a Zymo community](./zymo-comparison.png)
+### Python context manager
+
+If you are using the Python client and performing more than one query to
+the Onyx database in a single code block, e.g. in a `for` loop, then we
+recommend you use the `OnyxClient` as a context manager.
 
-There are some clear discrepancies—*Pseudomonas aeruginosa* is
-underreported and *Bacillus spizizenii* is overreported—but this
-matches results by e.g. [Nicholls et
-al. (2019)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6520541/).
+```python
+# ...
+# Setup omitted
+# ...
+client = OnyxClient(config=config)
+
+# Perform several onyx operations in this block
+with client:
+	# Get the first entry in the database for the mscape project
+    first_entry = next(client.filter(project="mscape"))
+	
+	# Get the CLIMB ID of the entry
+    climb_id = first_entry["climb_id"]
+	
+	# Get the full record for this CLIMB ID using the `get` method
+    full_record = client.get(project="mscape", climb_id=climb_id)
+	
+	# Count the number of taxa_files
+    n_taxa_files = len(full_record["taxa_files"])
+    print(f"CLIMB_ID: {climb_id} has {n_taxa_files} taxa files")
+```
 
-This short example is intended as a basic demonstration of what's
-possible in CLIMB-TRE.  We're always interested to hear more examples
-of research questions that CLIMB-TRE can answer, so let us know if you
-have an example that could be included as a guide for others.
+This is more efficient that not using the context manager as the
+client will re-use the same session for all requests, rather than
+creating a new session for each request. For more information, see:
+<https://requests.readthedocs.io/en/master/user/advanced/#session-objects`>

From 7b870267f8d9626d1a1fabf374bd06bcc10f6b50 Mon Sep 17 00:00:00 2001
From: "Thomas Neep (Advanced Research Computing)" <t.j.neep@bham.ac.uk>
Date: Wed, 20 Aug 2025 14:55:58 +0100
Subject: [PATCH 2/6] Load required extensions

---
 mkdocs.yml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mkdocs.yml b/mkdocs.yml
index 3a3f2f2..a3c9e4e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -30,6 +30,7 @@ theme:
     # - navigation.top
     - toc.integrate
     - content.code.copy
+    - content.tabs.link
 
 plugins:
   - search
@@ -53,6 +54,7 @@ plugins:
       verbose: true
 
 markdown_extensions:
+  - admonition
   - attr_list
   - pymdownx.highlight:
       anchor_linenums: true
@@ -61,6 +63,8 @@ markdown_extensions:
   - pymdownx.inlinehilite
   - pymdownx.snippets
   - pymdownx.superfences
+  - pymdownx.tabbed:
+      alternate_style: true
   - pymdownx.magiclink
   - toc:
       permalink: true

From c9be60460c97c535a74723c70d27ec4c05a9edd0 Mon Sep 17 00:00:00 2001
From: "Thomas Neep (Advanced Research Computing)" <t.j.neep@bham.ac.uk>
Date: Thu, 21 Aug 2025 11:44:22 +0100
Subject: [PATCH 3/6] Add next steps

---
 docs/analyse.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/analyse.md b/docs/analyse.md
index c11e425..3a58813 100644
--- a/docs/analyse.md
+++ b/docs/analyse.md
@@ -184,7 +184,6 @@ specific fields to include using e.g.
     entries = client.filter("mscape", fields=query, include=fields_to_include)
 	```
 
-
 ### Taxonomic information
 
 By default, the filter command will not return taxonomic
@@ -257,3 +256,8 @@ This is more efficient that not using the context manager as the
 client will re-use the same session for all requests, rather than
 creating a new session for each request. For more information, see:
 <https://requests.readthedocs.io/en/master/user/advanced/#session-objects`>
+
+## Next steps
+
+Complete documentation of Onyx for both the CLI and Python API can be
+found [here](https://CLIMB-TRE.github.io/onyx-client/).

From 252e884132542b63a69013d00ab16f533d17406f Mon Sep 17 00:00:00 2001
From: "Thomas Neep (Advanced Research Computing)" <t.j.neep@bham.ac.uk>
Date: Wed, 10 Sep 2025 16:37:09 +0100
Subject: [PATCH 4/6] WIP comments from Tom B

---
 docs/analyse.md | 239 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 164 insertions(+), 75 deletions(-)

diff --git a/docs/analyse.md b/docs/analyse.md
index 3a58813..20669a3 100644
--- a/docs/analyse.md
+++ b/docs/analyse.md
@@ -4,8 +4,8 @@
 
 Once data and metadata have been ingested into the Onyx database, you
 can query it using the Onyx client, which provides a command line interface (CLI)
-and Python API.  This short example
-demonstrates a few principal functions.  More are described in the
+and Python API.  This tutorial is intended as a basic demonstration of what is 
+possible. All capabilities of the Onyx client can be found in the
 [`onyx-client` documentation](https://climb-tre.github.io/onyx-client/).
 
 This guide also assumes that you're using a Notebook Server on CLIMB,
@@ -35,34 +35,34 @@ In each case you can choose between the Python API or the command-line interface
 ### Initial setup
 
 === "CLI"
-	No additional setup is required if you are running the CLI in a CLIMB
-	notebook. You can try running the command-line client with
+    No additional setup is required if you are running the CLI in a CLIMB
+    notebook. You can try running the command-line client with
 
-	```console
-	(onyx) jovyan:~$ onyx
-	```
-	to see some of the options and commands available to you.
+    ```console
+    (onyx) jovyan:~$ onyx
+    ```
+    to see some of the options and commands available to you.
 
 === "Python"
-	If you are using onyx in Python, then you need to import the required modules and configure a client.
-	```python
-	import os
-	from onyx import OnyxConfig, OnyxEnv, OnyxClient
+    If you are using onyx in Python, then you need to import the required modules and configure a client.
+    ```python
+    import os
+    from onyx import OnyxConfig, OnyxEnv, OnyxClient
 
-	config = OnyxConfig(
-	    domain=os.environ[OnyxEnv.DOMAIN],
-	    token=os.environ[OnyxEnv.TOKEN],
-	)
+    config = OnyxConfig(
+        domain=os.environ[OnyxEnv.DOMAIN],
+        token=os.environ[OnyxEnv.TOKEN],
+    )
 
-	client = OnyxClient(config=config)
-	```
+    client = OnyxClient(config=config)
+    ```
 
-	!!! note
+    !!! note
 
-	    In all the Python API examples, arguments will be
-		explicitly passed as keyword arguments e.g. `arg=value`,
-		however, in all cases shown on this page, the argument names 
-		can be omitted.
+        In all the Python API examples, arguments will be
+        explicitly passed as keyword arguments e.g. `arg=value`,
+        however, in all cases shown on this page, the argument names 
+        can be omitted.
 
 ### Profile
 
@@ -70,15 +70,15 @@ You can view information about your profile (username, site, and email) with
 
 === "CLI"
 
-	```console
-	(onyx) jovyan:~$ onyx profile
-	```
+    ```console
+    (onyx) jovyan:~$ onyx profile
+    ```
 
 === "Python"
 
-	```python
-	client.profile()
-	```
+    ```python
+    client.profile()
+    ```
 
 ### Projects
 
@@ -86,15 +86,15 @@ You can view the projects you have access to with
 
 === "CLI"
 
-	```console
-	(onyx) jovyan:~$ onyx projects
-	```
+    ```console
+    (onyx) jovyan:~$ onyx projects
+    ```
 
 === "Python"
 
-	```python
-	client.projects()
-	```
+    ```python
+    client.projects()
+    ```
 
 ## Querying data
 
@@ -109,16 +109,16 @@ To see every entry in the entire database for a particular project we can do
 
 === "CLI"
 
-	```console
-	(onyx) jovyan:~$ onyx filter mscape
-	```
+    ```console
+    (onyx) jovyan:~$ onyx filter mscape
+    ```
 
 === "Python"
 
-	```python
-	# client.filter returns a generator that we can iterate over
-	entires = client.filter(project="mscape")
-	```
+    ```python
+    # client.filter returns a generator that we can iterate over
+    entries = client.filter(project="mscape")
+    ```
 
 On its own, this command queries the database with *no* filters, and
 could return thousands of entries.
@@ -129,15 +129,15 @@ We can see what fields exist in a particular database with
 
 === "CLI"
 
-	```console
-	(onyx) jovyan:~$ onyx fields mscape
-	```
+    ```console
+    (onyx) jovyan:~$ onyx fields mscape
+    ```
 
 === "Python"
 
-	```python
-	client.fields(project="mscape")
-	```
+    ```python
+    client.fields(project="mscape")
+    ```
 
 ### Filtering
 
@@ -153,16 +153,16 @@ equals `zymo-mc_D6300`.
 
 === "CLI"
 
-	```console
-	(onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300
-	```
+    ```console
+    (onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300
+    ```
 
 === "Python"
 
-	```python
-	# client.filter returns a generator that we can iterate over
+    ```python
+    # client.filter returns a generator that we can iterate over
     entries = client.filter(project="mscape", fields={"control_type_details": "zymo-mc_D6300"})
-	```
+    ```
 
 This returns a small number of entries that we can more easily work
 with. Note that this returns every field for each record that is
@@ -171,18 +171,35 @@ specific fields to include using e.g.
 
 === "CLI"
 
-	```console
-	(onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300 --include climb_id,biosample_id,taxon_reports
-	```
+    ```console
+    (onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300 --include climb_id,biosample_id,taxon_reports
+    ```
 
 === "Python"
 
-	```python
-	query = {"control_type_details": "zymo-mc_D6300"}
-	fields_to_include = ["climb_id", "biosample_id" , "taxon_reports"]
-	# client.filter returns a generator that we can iterate over
+    ```python
+    query = {"control_type_details": "zymo-mc_D6300"}
+    fields_to_include = ["climb_id", "biosample_id" , "taxon_reports"]
+    # client.filter returns a generator that we can iterate over
     entries = client.filter("mscape", fields=query, include=fields_to_include)
-	```
+    ```
+
+Likewise, should we want to *exclude* certain fields, that is also possible
+
+=== "CLI"
+
+    ```console
+    (onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300 --exclude batch_id,study_id
+    ```
+
+=== "Python"
+
+    ```python
+    query = {"control_type_details": "zymo-mc_D6300"}
+    fields_to_exclude = ["batch_id", "study_id"]
+    # client.filter returns a generator that we can iterate over
+    entries = client.filter("mscape", fields=query, exclude=fields_to_exclude)
+    ```
 
 ### Taxonomic information
 
@@ -191,20 +208,92 @@ information. To access that information for an individual record use the `get` c
 
 === "CLI"
 
-	```console
-	(onyx) jovyan:~$ onyx get mscape <CLIMB_ID>
-	```
+    ```console
+    (onyx) jovyan:~$ onyx get mscape <CLIMB_ID>
+    ```
 
 === "Python"
 
-	```python
-	record = client.get(project="mscape", climb_id=<CLIMB_ID>)
-	```
+    ```python
+    record = client.get(project="mscape", climb_id=<CLIMB_ID>)
+    ```
 where `<CLIMB_ID>` is replaced with the CLIMB ID of the record you
 want to retrieve.
 This will you give you all the information about a particular record
 including binned reads and all classifier calls.
 
+### Accessing data from s3 buckets
+
+You can also use the Onyx client to find the `s3` path where the taxon
+reports are stored. These can then be directly downloaded for further analysis.
+
+=== "CLI"
+
+    ```console
+    (onyx) jovyan:~$ onyx filter mscape --field control_type_details=zymo-mc_D6300 --include "taxon_reports"
+    [
+    {
+        "taxon_reports": "s3://mscape-published-taxon-reports/CLIMB_ID_1/"
+    },
+    {
+        "taxon_reports": "s3://mscape-published-taxon-reports/CLIMB_ID_2/"
+    },
+    {
+        "taxon_reports": "s3://mscape-published-taxon-reports/CLIMB_ID_3/"
+    }
+    ]
+    ```
+    where `CLIMB_ID_i` will be CLIMB ID of the sample. 
+    These can be inspect and downloaded using either of the `s3cmd` or `aws s3` commands.
+    For example
+    ```console
+    (onyx) jovyan:~$ s3cmd ls s3://mscape-published-taxon-reports/CLIMB_ID_1/
+    2024-04-26 14:04   163K  s3://mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken.json
+    2024-04-26 14:04    28M  s3://mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_assignments.tsv
+    2024-04-26 14:04   457K  s3://mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.json
+    2024-04-26 14:04   133K  s3://mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.txt
+    (onyx) jovyan:~$ s3cmd get s3://mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.txt
+    download: 's3://mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.txt' -> './CLIMB_ID_1_PlusPF.kraken_report.txt'  [1 of 1]
+     136562 of 136562   100% in    0s   988.65 KB/s  done
+    ```
+
+=== "Python"
+
+    ```python
+    for i in client.filter("mscape", fields={"control_type_details": "zymo-mc_D6300"}, include=["taxon_reports"]):
+        print(i)
+    ```
+    will give something like
+    ```
+    {'taxon_reports': 's3://mscape-published-taxon-reports/CLIMB_ID_1/'}
+    {'taxon_reports': 's3://mscape-published-taxon-reports/CLIMB_ID_2/'}
+    {'taxon_reports': 's3://mscape-published-taxon-reports/CLIMB_ID_3/'}
+    ```
+    Which can either be downloaded using the `s3cmd` or `aws s3` commands shown 
+    in the CLI tab of this block, or using a python library capable of reading 
+    from s3, such as [`s3fs`](https://s3fs.readthedocs.io).
+    ```python
+    import s3fs  # Install into conda environment first!
+    s3 = s3fs.S3FileSystem()
+    s3.ls("s3://mscape-published-taxon-reports/CLIMB_ID_1/")
+    ```
+    which will show the files in that s3 path
+    ```
+    ['mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken.json',
+     'mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_assignments.tsv',
+     'mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.json',
+     'mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.txt']
+    ```
+    which you can then download using
+    ```python
+    s3.get_file("mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.txt", ".")
+    ```
+    or read directly as if it were any other file on your system
+    ```python
+    with s3.open("mscape-published-taxon-reports/CLIMB_ID_1/CLIMB_ID_1_PlusPF.kraken_report.txt", "r") as f:
+        contents = f.read()  # do something with the file contents
+    ```
+
 ## Tips
 
 ### `jq`
@@ -238,16 +327,16 @@ client = OnyxClient(config=config)
 
 # Perform several onyx operations in this block
 with client:
-	# Get the first entry in the database for the mscape project
+    # Get the first entry in the database for the mscape project
     first_entry = next(client.filter(project="mscape"))
-	
-	# Get the CLIMB ID of the entry
+    
+    # Get the CLIMB ID of the entry
     climb_id = first_entry["climb_id"]
-	
-	# Get the full record for this CLIMB ID using the `get` method
+    
+    # Get the full record for this CLIMB ID using the `get` method
     full_record = client.get(project="mscape", climb_id=climb_id)
-	
-	# Count the number of taxa_files
+    
+    # Count the number of taxa_files
     n_taxa_files = len(full_record["taxa_files"])
     print(f"CLIMB_ID: {climb_id} has {n_taxa_files} taxa files")
 ```

From bb2c2ee81890989cb94be07b0c4494ca4e554e2f Mon Sep 17 00:00:00 2001
From: "Thomas Neep (Advanced Research Computing)" <t.j.neep@bham.ac.uk>
Date: Thu, 11 Sep 2025 10:48:11 +0100
Subject: [PATCH 5/6] Add information about output formats

---
 docs/analyse.md | 49 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/docs/analyse.md b/docs/analyse.md
index 20669a3..82a5133 100644
--- a/docs/analyse.md
+++ b/docs/analyse.md
@@ -4,7 +4,7 @@
 
 Once data and metadata have been ingested into the Onyx database, you
 can query it using the Onyx client, which provides a command line interface (CLI)
-and Python API.  This tutorial is intended as a basic demonstration of what is 
+and Python API. This tutorial is intended as a basic demonstration of what is
 possible. All capabilities of the Onyx client can be found in the
 [`onyx-client` documentation](https://climb-tre.github.io/onyx-client/).
 
@@ -61,7 +61,7 @@ In each case you can choose between the Python API or the command-line interface
 
         In all the Python API examples, arguments will be
         explicitly passed as keyword arguments e.g. `arg=value`,
-        however, in all cases shown on this page, the argument names 
+        however, in all cases shown on this page, the argument names
         can be omitted.
 
 ### Profile
@@ -123,6 +123,35 @@ To see every entry in the entire database for a particular project we can do
 On its own, this command queries the database with *no* filters, and
 could return thousands of entries.
 
+### Output formats
+
+The default behaviour of Onyx is to return data as JSON. If you prefer
+your data to be in a different format then that is possible.
+
+=== "CLI"
+
+    To get data in `csv` or `tsv` format, simply add the `--format <csv/tsv>`
+    option to your filter command. For example, to get the data in csv format
+    rather than JSON, you can do
+
+    ```console
+    (onyx) jovyan:~$ onyx filter mscape --format csv
+    ```
+
+=== "Python"
+
+    The Python client has [a method to write your data to a csv file](https://climb-tre.github.io/onyx-client/api/documentation/client/#onyx.OnyxClient.to_csv).
+    It can often be convenient to use a library like
+    [`pandas`](https://pandas.pydata.org) to perform analysis.
+    You can easily create a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) like so
+    ```python
+    import pandas as pd  # Install into conda environment first!
+    df = pd.DataFrame(client.filter(project="mscape"))
+    ```
+    You can then write your data to
+    [any of the output formats](https://pandas.pydata.org/docs/user_guide/io.html)
+    supported by `pandas`.
+
 ### Fields
 
 We can see what fields exist in a particular database with
@@ -198,7 +227,7 @@ Likewise, should we want to *exclude* certain fields, that is also possible
     query = {"control_type_details": "zymo-mc_D6300"}
     fields_to_exclude = ["batch_id", "study_id"]
     # client.filter returns a generator that we can iterate over
-    entries = client.filter("mscape", fields=query, exclude=fields_to_exclude)
+    entries = client.filter(project="mscape", fields=query, exclude=fields_to_exclude)
     ```
 
 ### Taxonomic information
@@ -243,7 +272,7 @@ reports are stored. These can then be directly downloaded for further analysis.
     }
     ]
     ```
-    where `CLIMB_ID_i` will be CLIMB ID of the sample. 
+    where `CLIMB_ID_i` will be the CLIMB ID of the sample.
     These can be inspect and downloaded using either of the `s3cmd` or `aws s3` commands.
     For example
     ```console
@@ -260,7 +289,7 @@ reports are stored. These can then be directly downloaded for further analysis.
 === "Python"
 
     ```python
-    for i in client.filter("mscape", fields={"control_type_details": "zymo-mc_D6300"}, include=["taxon_reports"]):
+    for i in client.filter(project="mscape", fields={"control_type_details": "zymo-mc_D6300"}, include=["taxon_reports"]):
         print(i)
     ```
     will give something like
@@ -269,8 +298,8 @@ reports are stored. These can then be directly downloaded for further analysis.
     {'taxon_reports': 's3://mscape-published-taxon-reports/CLIMB_ID_2/'}
     {'taxon_reports': 's3://mscape-published-taxon-reports/CLIMB_ID_3/'}
     ```
-    Which can either be downloaded using the `s3cmd` or `aws s3` commands shown 
-    in the CLI tab of this block, or using a python library capable of reading 
+    These can be downloaded either using the `s3cmd` or `aws s3` commands shown
+    in the CLI tab of this block, or using a Python library capable of reading
     from s3, such as [`s3fs`](https://s3fs.readthedocs.io).
     ```python
     import s3fs  # Install into conda environment first!
@@ -329,13 +358,13 @@ client = OnyxClient(config=config)
 with client:
     # Get the first entry in the database for the mscape project
     first_entry = next(client.filter(project="mscape"))
-    
+
     # Get the CLIMB ID of the entry
     climb_id = first_entry["climb_id"]
-    
+
     # Get the full record for this CLIMB ID using the `get` method
     full_record = client.get(project="mscape", climb_id=climb_id)
-    
+
     # Count the number of taxa_files
     n_taxa_files = len(full_record["taxa_files"])
     print(f"CLIMB_ID: {climb_id} has {n_taxa_files} taxa files")

From 27de0d53e7eba1c58b8a62b726759ba9d99de1da Mon Sep 17 00:00:00 2001
From: "Thomas Neep (Advanced Research Computing)" <t.j.neep@bham.ac.uk>
Date: Thu, 11 Sep 2025 12:04:21 +0100
Subject: [PATCH 6/6] Use Tom B's suggested example

---
 docs/analyse.md | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/docs/analyse.md b/docs/analyse.md
index 82a5133..79b156d 100644
--- a/docs/analyse.md
+++ b/docs/analyse.md
@@ -349,6 +349,7 @@ the onyx database in a single code block e.g. in a `for` loop. Then we
 recommend you use the `OnyxClient` as a context manager.
 
 ```python
+from onyx.exceptions import OnyxHTTPError
 # ...
 # Setup omitted
 # ...
@@ -356,18 +357,26 @@ client = OnyxClient(config=config)
 
 # Perform several onyx operations in this block
 with client:
-    # Get the first entry in the database for the mscape project
-    first_entry = next(client.filter(project="mscape"))
-
-    # Get the CLIMB ID of the entry
-    climb_id = first_entry["climb_id"]
-
-    # Get the full record for this CLIMB ID using the `get` method
-    full_record = client.get(project="mscape", climb_id=climb_id)
-
-    # Count the number of taxa_files
-    n_taxa_files = len(full_record["taxa_files"])
-    print(f"CLIMB_ID: {climb_id} has {n_taxa_files} taxa files")
+    try:
+        records = client.filter(
+            project="mscape",
+            fields={
+                "control_type_details": "zymo-mc_D6300",
+                "published_date__range": ["2025-01-01", "2025-05-01"],
+            },
+            include=["climb_id", "published_date", "taxon_reports"],
+        )
+
+        for record in records:
+            climb_id = record["climb_id"]
+
+            full_record = client.get(project="mscape", climb_id=climb_id)
+
+            n_taxa_files = len(full_record["taxa_files"])
+            print(f"CLIMB_ID: {climb_id} has {n_taxa_files} taxa file entries")
+
+    except OnyxHTTPError as e:
+        print(e.response.json())
 ```
 
 This is more efficient that not using the context manager as the