Merged
36 commits
934a3a2
Added 'heasarc_catalog_contents' to the heasarc_catalogs_index file.
Feb 4, 2026
9d3fba3
Adding a new bite-size skeleton for the 'exploring the contents of he…
Feb 4, 2026
7f165d8
Started filling out the heasarc_catalog_contents.md sections a bit mo…
Feb 5, 2026
f2918d1
Fleshed out section 1 of the heasarc_catalog_contents.md notebook. Fo…
Feb 5, 2026
97329dd
Making progress on heasarc_catalog_contents.md notebook for issue #21…
Feb 5, 2026
966e8c4
Added more context about ADQL and how we're counting rows to the heas…
Feb 5, 2026
85aacc3
Added a link to a short course on ADQL to the heasarc_catalog_content…
Feb 5, 2026
11518de
Fleshed out section 3 of the heasarc_catalog_contents.md notebook. Fo…
Feb 5, 2026
c9fef6e
Maybe finished off the heasarc_catalog_contents.md notebook. Added th…
Feb 5, 2026
1638901
Put some single quotes around the name of the other catalog tutorial …
Feb 5, 2026
b4454e7
Added the Fornax run time to the heasarc_catalog_contents.md. Should …
Feb 5, 2026
16373ad
Fixing some broken links in heasarc_catalog_contents.md. For issue #2…
Feb 5, 2026
16d208d
Links are no longer broken but there is an extra ./ at the beginning.…
Feb 5, 2026
ee0bb80
Merge branch 'main' into notebook/biteSizeUsingHEASARCCatalogs
DavidT3 Feb 5, 2026
4ed6717
Changed the URL to the NAVO catalog queries tutorial in the heasarc_c…
Feb 6, 2026
7ca18f7
Updated last modified dates in heasarc_catalog_contents.md
Feb 6, 2026
02df9b9
Fixed the incorrect statement about SELECT * that I made in heasarc_c…
Feb 12, 2026
65d238b
Specified the number of rows to retrieve when fetching the whole ACCE…
Feb 12, 2026
613b298
Added TOP statements to each ADQL query in the heasarc_catalog_conten…
Feb 12, 2026
1618eee
Needed to make some strings f-strings in the heasarc_catalog_contents…
Feb 12, 2026
da6b0af
Merge branch 'main' into notebook/biteSizeUsingHEASARCCatalogs
DavidT3 Feb 12, 2026
abde493
Added a comment to a code cell in heasarc_catalog_contents.md
Feb 12, 2026
ad77741
Added a comment to a code cell in heasarc_catalog_contents.md noteboo…
Feb 12, 2026
0357753
Made a tiny change to trigger a rebuild in order to debug why the art…
Feb 23, 2026
1a470c1
Another small change to trigger a rebuild, then test the redirector a…
Feb 23, 2026
0069cda
Altered the heasarc_catalog_contents.md to use Astroquery pre-release…
Mar 5, 2026
6335c88
Continue modifying heasarc_catalog_contents.md to use pre-release Ast…
Mar 5, 2026
9e6a6bb
Rolled back the non-ADQL way of counting the number of rows in a cata…
Mar 5, 2026
193d040
Perhaps finalized the changes to the heasarc_catalog_contents.md note…
Mar 5, 2026
acaa002
Changed the heasarc_catalog_contents.md notebook so that it doesn't i…
Mar 5, 2026
fe0d4e6
Merge branch 'main' into notebook/biteSizeUsingHEASARCCatalogs
DavidT3 Mar 5, 2026
669e1df
Install astroquery pre-release in the CircleCI config.yml
Mar 5, 2026
a0fa24e
Install astroquery pre-release in the CircleCI config.yml
Mar 5, 2026
3068246
Throw away commit to check if the table rendering in heasarc_catalog_…
Mar 5, 2026
14462b1
Throw away commit to check if the table rendering in heasarc_catalog_…
Mar 5, 2026
706f8fd
Hopefully fixed the odd rendering by pinning sphinx and myst-nb versions
Mar 5, 2026
5 changes: 3 additions & 2 deletions .circleci/config.yml
@@ -129,15 +129,16 @@ jobs:
name: Installing extra dependencies
# TODO THIS METHOD OF DEFINING DEPS IS NOT GOOD ENOUGH, EVEN FOR A TEMPORARY SOLUTION
command: |
- micromamba install -y -c conda-forge -n heasoft astroquery pyvo tqdm aplpy s3fs boto3 scikit-learn umap-learn
+ micromamba install -y -c conda-forge -n heasoft pyvo tqdm aplpy s3fs boto3 scikit-learn umap-learn
micromamba run -n heasoft pip install xga
+ micromamba run -n heasoft pip install --pre astroquery --upgrade
micromamba install -y -c conda-forge -n sas astroquery pyvo tqdm aplpy s3fs boto3
micromamba run -n sas pip install xga

- run:
name: Create the Sphinx build environment
command: |
- micromamba create -n build_docs -y -c conda-forge sphinx sphinx-book-theme sphinx-copybutton myst-nb
+ micromamba create -n build_docs -y -c conda-forge "sphinx<9" sphinx-book-theme sphinx-copybutton "myst-nb<1.4.0"

# To ensure that the build environment can activate the HEASoft, CIAO, etc. kernels when required
- run:
@@ -0,0 +1,273 @@
---
authors:
- name: David Turner
affiliations: ['University of Maryland, Baltimore County', 'HEASARC, NASA Goddard']
email: djturner@umbc.edu
orcid: 0000-0001-9658-1396
website: https://davidt3.github.io/
date: '2026-03-05'
file_format: mystnb
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.17.3
kernelspec:
display_name: heasoft
language: python
name: heasoft
title: Exploring the contents of HEASARC catalogs using Python
---

# Exploring the contents of HEASARC catalogs using Python

## Learning Goals

This notebook will teach you:
- How to retrieve and explore a HEASARC catalog's column names, descriptions, and units.
- How to retrieve the entire contents of a HEASARC catalog.
- How to retrieve a subset of a HEASARC catalog with easy-to-use Astroquery features.

## Introduction

This bite-sized tutorial will show you how to retrieve and explore the contents of HEASARC catalogs in Python.

To learn how to use Python to search for a particular HEASARC catalog, please see the '{doc}`Find specific HEASARC catalogs using Python <finding_relevant_heasarc_catalog>`' tutorial.

### Runtime

As of 5th March 2026, this notebook takes ~30 s to run to completion on Fornax using the 'small' server with 8 GB RAM / 2 cores.

## Imports

This notebook uses features from an Astroquery pre-release. You will need to install
the latest version using the command below. We will remove this once Astroquery
v0.4.12 is officially released.

```
pip install --pre astroquery --upgrade
```
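
If you are unsure which Astroquery version is active in your environment, a quick check like the sketch below can help. It uses only the Python standard library; the variable name `astroquery_version` is just an illustrative choice, and the snippet does not attempt to decide whether the installed version is new enough.

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed astroquery version without importing the package
try:
    astroquery_version = version("astroquery")
except PackageNotFoundError:
    # The package is not installed in the active environment
    astroquery_version = None

if astroquery_version is None:
    print("astroquery is not installed in this environment")
else:
    print(f"astroquery version: {astroquery_version}")
```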

```{code-cell} python
from astroquery.heasarc import Heasarc
```

***

## 1. Listing a HEASARC catalog's columns

For this demonstration, we're assuming that you already have a HEASARC-hosted catalog
in mind; if not, you might find the
'{doc}`Find specific HEASARC catalogs using Python <finding_relevant_heasarc_catalog>`'
tutorial useful.

We will use the Archive of Chandra Cluster Entropy Profile Tables (ACCEPT) catalog
([Cavagnolo K. W. et al. 2009](https://ui.adsabs.harvard.edu/abs/2009ApJS..182...12C/abstract))
as an example.

The best way to get an idea of a catalog's contents is to list the column
names and descriptions. We can do this using the `Heasarc.list_columns(...)`
method, passing the name of the catalog as the first argument.

Each HEASARC catalog has a subset of 'standard' columns that will be returned by
default, which is why the table below contains only a few column names, descriptions, and
units even though this catalog has *79* columns:

```{code-cell} python
# Pass the name of the catalog as the first argument
Heasarc.list_columns("acceptcat")
```

If, as is likely, you want to examine the full set of columns, you can pass the
`full=True` argument:

```{code-cell} python
all_accept_cols = Heasarc.list_columns("acceptcat", full=True)
all_accept_cols
```

If you examine the output of the above cell, you'll notice that only part of the
table has been displayed; this is a common behavior when displaying long tables in
Jupyter notebooks, across multiple modules (e.g., Astropy, Pandas, etc.), as 'printing'
many lines can dramatically affect Jupyter's performance.

On the other hand, a table like this isn't going to destroy your computer, so we can
safely sidestep this issue by using the `pprint_all()` method of the `list_columns()`
output (an Astropy `Table` object, more on them in [Section 4](#4-interacting-with-heasarc-catalog-contents)):

```{code-cell} python
# The 'pprint' stands for 'pretty print'
all_accept_cols.pprint_all()
```
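
With this many columns, it can be quicker to search the names and descriptions for a keyword than to read the whole listing. The sketch below is one possible approach: `find_columns` is a hypothetical helper (not part of Astroquery), and it assumes each row exposes `name` and `description` entries. The `toy_cols` rows and their descriptions are invented placeholders so that the example runs without contacting HEASARC.

```python
def find_columns(col_rows, keyword):
    """Return the names of columns whose name or description mentions keyword."""
    keyword = keyword.lower()
    matches = []
    for row in col_rows:
        # Compare case-insensitively against both the name and the description
        searchable = f"{row['name']} {row['description']}".lower()
        if keyword in searchable:
            matches.append(str(row["name"]))
    return matches


# Placeholder rows standing in for a real column listing (descriptions are made up)
toy_cols = [
    {"name": "name", "description": "Placeholder: source designation"},
    {"name": "redshift", "description": "Placeholder: source redshift"},
    {"name": "bf_core_entropy_1", "description": "Placeholder: core entropy value"},
]

entropy_cols = find_columns(toy_cols, "entropy")
print(entropy_cols)
```

If the table returned by `list_columns()` does use those column labels, then something like `find_columns(all_accept_cols, "entropy")` should work in the same way, because iterating over an Astropy `Table` yields rows that can be indexed by column name.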

## 2. Retrieving the entire contents of a HEASARC catalog

The simplest use case for a HEASARC catalog is retrieving the
entire table.

We can easily fetch the entire catalog using Astroquery functions, but
before we do, we should check how many rows there are - we want to know what we're
getting into with respect to the size of the table.

Counting the rows in a HEASARC catalog involves writing a very simple 'Astronomical
Data Query Language' (ADQL) query.

ADQL is a cousin of the extremely popular 'Structured Query Language' (SQL) that has
been used for database management in industry for many years; the syntax is similar, but
with additions specific to astronomical searches.

We use the `COUNT(*)` function to return the number of rows in a table:

```{code-cell} python
# Send query designed to count the rows of a catalog
accept_nrow_res = Heasarc.query_tap("SELECT COUNT(*) FROM acceptcat")

# Store the integer number of rows in a variable
accept_nrows = accept_nrow_res["count"][0]

# Visualize the returned table
accept_nrow_res
```

From the output above, we can see that there are 'only' 240 rows in the catalog; combine that information with
the number of columns (which we explored in [Section 1](#1-listing-a-heasarc-catalogs-columns)), and you
get a sense of the table's scale.

As the ACCEPT catalog is quite small (relatively speaking), we can retrieve the whole table without worrying
about download time or memory issues.
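
To see what `COUNT(*)` does without contacting HEASARC, the sketch below runs the same kind of statement against a throwaway in-memory table. Note the assumptions: it uses SQLite's SQL dialect (from the Python standard library) rather than ADQL, and the `toy_cat` table and its three rows are invented purely for illustration.

```python
import sqlite3

# An in-memory SQLite database standing in for a remote catalog service
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_cat (name TEXT, redshift REAL)")
conn.executemany(
    "INSERT INTO toy_cat VALUES (?, ?)",
    [("toy_a", 0.1), ("toy_b", 0.5), ("toy_c", 0.9)],
)

# COUNT(*) returns a single row holding the number of rows in the table
toy_nrows = conn.execute("SELECT COUNT(*) FROM toy_cat").fetchone()[0]

# Adding a WHERE clause counts only the rows that satisfy the condition
toy_nrows_highz = conn.execute(
    "SELECT COUNT(*) FROM toy_cat WHERE redshift > 0.4"
).fetchone()[0]

print(toy_nrows, toy_nrows_highz)
conn.close()
```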

```{seealso}
A general tutorial on the many uses and features of ADQL is out of the scope of this
bite-sized demonstration. Various resources for learning ADQL are available online, such
as [this short course](https://docs.g-vo.org/adql/) ([Demleitner M. and Heinl H. 2024](https://dc.g-vo.org/voidoi/q/lp/custom/10.21938/uH0_xl5a6F7tKkXBSPnZxg)),
or the NASA Astronomical Virtual Observatories (NAVO)
[catalog queries tutorial](https://nasa-navo.github.io/navo-workshop/content/reference_notebooks/catalog_queries.html).
```

On the other hand, HEASARC hosts much larger catalogs than ACCEPT. The Chandra Source
Catalog 2 (CSC 2; [Evans I. N. et al. 2024](https://ui.adsabs.harvard.edu/abs/2024ApJS..274...22E/abstract)),
for instance:

```{code-cell} python
# Same again, but CSC
Heasarc.query_tap("SELECT COUNT(*) FROM csc")
```

```{warning}
For large catalogs like the CSC, we do not recommend retrieving the entire table at once.
```
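
One way to look at a large catalog without downloading all of it is to ask for only the first few rows using ADQL's `TOP` keyword. The sketch below only builds the query string; `preview_query` is a hypothetical helper introduced here for illustration, not an Astroquery function.

```python
def preview_query(catalog, n_rows=5):
    """Build an ADQL query that returns at most n_rows rows of a catalog."""
    n_rows = int(n_rows)
    if n_rows < 1:
        raise ValueError("n_rows must be a positive integer")
    # TOP limits the number of rows that the service sends back
    return f"SELECT TOP {n_rows} * FROM {catalog}"


csc_preview_query = preview_query("csc", n_rows=5)
print(csc_preview_query)
```

The resulting string could then be passed to `Heasarc.query_tap(...)`, in the same way as the row-counting queries above.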

Finally, now that we know that retrieving the entire ACCEPT catalog is reasonable, we can
use the `query_region(...)` method of `Heasarc` to do just that. Only a few arguments are
required:
- `catalog="acceptcat"` - specifies the name of the catalog table to retrieve.
- `spatial="all-sky"` - overrides `query_region`'s default behavior of searching a catalog around a given coordinate, and instead considers the entire catalog.
- `columns="*"` - specifies that all columns should be returned (otherwise you will get a small subset, as discussed in [Section 1](#1-listing-a-heasarc-catalogs-columns)).

```{code-cell} python
accept_cat = Heasarc.query_region(catalog="acceptcat", spatial="all-sky", columns="*")
accept_cat
```

## 3. Retrieving a subset of a HEASARC catalog

If you aren't interested in the _entire_ catalog, then we can also use Astroquery to
impose some restrictions on the rows we retrieve, based on the values of certain columns.

For example, perhaps we're only interested in galaxy clusters with a redshift of $z>0.4$. We saw in
[Section 1](#1-listing-a-heasarc-catalogs-columns) that the ACCEPT catalog includes
a column called `redshift`; we can use that to filter the rows we retrieve.

The `query_region(...)` call below will return all columns (`columns="*"`), and all rows where the
value of the `redshift` column is greater than 0.4 (`column_filters={"redshift": (">", "0.4")}`):

```{code-cell} python
accept_cat_higherz = Heasarc.query_region(
catalog="acceptcat",
spatial="all-sky",
column_filters={"redshift": (">", "0.4")},
columns="*",
)
accept_cat_higherz
```

If we want to further restrict the results, we can add extra entries to
`column_filters` to impose additional conditions. Here, for instance, we've decided we only want the
higher-redshift, low-central-entropy galaxy clusters to be returned:

```{code-cell} python
accept_cat_higherz_lowk = Heasarc.query_region(
catalog="acceptcat",
spatial="all-sky",
column_filters={"redshift": (">", "0.4"), "bf_core_entropy_1": ("<", "15")},
columns="*",
)
accept_cat_higherz_lowk
```
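
It can help to think of the `(operator, value)` filters above as conditions in an ADQL `WHERE` clause. The sketch below spells out that correspondence by building the query string by hand. It is an illustration only: `filters_to_where` is a hypothetical helper, and joining the conditions with `AND` is an assumption based on the description above, not a statement about how Astroquery builds its queries internally.

```python
def filters_to_where(column_filters):
    """Turn {column: (operator, value)} filters into an ADQL WHERE expression."""
    conditions = []
    for column, (operator, value) in column_filters.items():
        conditions.append(f"{column} {operator} {value}")
    # Assumption: every condition must hold, so they are joined with AND
    return " AND ".join(conditions)


toy_filters = {"redshift": (">", "0.4"), "bf_core_entropy_1": ("<", "15")}
where_clause = filters_to_where(toy_filters)
filtered_query = f"SELECT * FROM acceptcat WHERE {where_clause}"
print(filtered_query)
```

A string like this could be passed to `Heasarc.query_tap(...)` if you prefer to write the ADQL yourself.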

## 4. Interacting with HEASARC catalog contents

The returns from our calls to the `Heasarc.query_region` method are Astropy
`Table` objects:

```{code-cell} python
type(accept_cat)
```

You can extract information similarly to a Pandas `DataFrame`; e.g., indexing with a
column name string retrieves the entries in that column:

```{code-cell} python
# Extract source names from our subset of the ACCEPT catalog
accept_cat_higherz_lowk["name"]
```

To retrieve the entries for a **row** in the table, you can index with an
integer; e.g., `0` for the first row:

```{code-cell} python
accept_cat_higherz_lowk[0]
```

You can also convert the returned table to a Pandas `DataFrame` if you prefer working with
that data structure:

```{code-cell} python
accept_cat_higherz_lowk_pd = accept_cat_higherz_lowk.to_pandas()
accept_cat_higherz_lowk_pd
```

## About this notebook

Author: David Turner, HEASARC Staff Scientist

Updated On: 2026-03-05

+++

### Additional Resources

Support: [HEASARC Helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback?selected=heasarc)

[Latest Astroquery Documentation](https://astroquery.readthedocs.io/en/latest/)

[Short Course on ADQL Website](https://docs.g-vo.org/adql/)

[NAVO catalog queries tutorial](https://nasa-navo.github.io/navo-workshop/content/reference_notebooks/catalog_queries.html#using-the-tap-to-cross-correlate-and-combine)

[Latest PyVO Documentation](https://pyvo.readthedocs.io/en/latest/)

### Acknowledgements

### References

[Ginsburg, Sipőcz, Brasseur et al. (2019)](https://ui.adsabs.harvard.edu/abs/2019AJ....157...98G/abstract) - _astroquery: An Astronomical Web-querying Package in Python_

[Cavagnolo K. W., Donahue M., Voit G. M., Sun M. (2009)](https://ui.adsabs.harvard.edu/abs/2009ApJS..182...12C/abstract) - _Intracluster Medium Entropy Profiles for a Chandra Archival Sample of Galaxy Clusters_

[Evans I. N., Evans J. D., Martínez-Galarza J. R., Miller J. B. et al. (2024)](https://ui.adsabs.harvard.edu/abs/2024ApJS..274...22E/abstract) - _The Chandra Source Catalog Release 2 Series_

[Chandra Source Catalog 2 DOI - doi:10.25574/csc2](https://doi.org/10.25574/csc2)

[Demleitner M. and Heinl H. (2024)](https://dc.g-vo.org/voidoi/q/lp/custom/10.21938/uH0_xl5a6F7tKkXBSPnZxg) - _A Short Course on ADQL; Virtual Observatory Resource_
@@ -9,4 +9,6 @@ caption: HEASARC catalog tutorials
---

finding_relevant_heasarc_catalog

heasarc_catalog_contents
```