This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Docstrings should use the reStructuredText (reST) format. This matters both for generating documentation and for consistency across the codebase. Every docstring should start with a one-line summary, followed by a more detailed paragraph that may include usage examples. Where appropriate, a docstring should not only describe a method or function but also explain the design rationale behind it.
Documentation should also be appropriate in length. For simple functions, a brief docstring is sufficient. For more complex functions or classes, more detailed explanations and examples should be provided.
An example docstring may look like this:

    def multiply(a: int, b: int) -> int:
        """
        Multiply two integers `a` and `b`.

        This function takes two integers as input and returns their product.

        Example:

        .. code-block:: python

            result = multiply(3, 4)
            print(result)  # Output: 12

        :param a: The first integer to multiply.
        :param b: The second integer to multiply.
        :return: The product of the two integers.
        """
        return a * b

Run all tests:

    python -m pytest tests/

Run specific test file:

    python -m pytest tests/test_<module_name>.py

Run tests with verbose output:

    python -m pytest tests/ -v

Install in development mode:

    uv pip install -e .

This is a Python package for managing Visual Graph Datasets (VGDs) designed for graph neural networks and explainable AI research.
- visual_graph_datasets/data.py: Dataset loading, saving, and management utilities. Key classes include VisualGraphDatasetReader and VisualGraphDatasetWriter.
- visual_graph_datasets/processing/: Domain-specific graph processing modules:
  - base.py: Core processing interfaces and base classes
  - molecules.py: SMILES/molecular graph processing
  - colors.py: Color graph processing
  - generic.py: Generic graph processing
- visual_graph_datasets/visualization/: Graph visualization utilities:
  - base.py: Core visualization functions
  - importances.py: Importance/attribution visualization
  - molecules.py: Molecular visualization
  - colors.py: Color graph visualization
- visual_graph_datasets/generation/: Synthetic dataset generation utilities
- visual_graph_datasets/experiments/: Experiment scripts for dataset creation, especially generate_molecule_dataset_from_csv.py, which serves as a base for creating molecular datasets from CSV files
VGDs store each graph as two files:
- A JSON file containing the full graph representation (nodes, edges, attributes, positions)
- A PNG file with the canonical visualization
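The two-files-per-graph layout can be illustrated with a minimal sketch that pairs up the files in a dataset folder. It assumes that both files of an element share the same base name (for example 0.json and 0.png), which is not stated above. The helper pair_element_files is hypothetical and not part of the package.

```python
import tempfile
from pathlib import Path


def pair_element_files(folder: Path) -> dict[str, tuple[Path, Path]]:
    """Map each base name to its (JSON, PNG) pair; skip incomplete elements."""
    pairs = {}
    for json_path in sorted(folder.glob("*.json")):
        # Assumption: the PNG sits next to the JSON and shares its base name.
        png_path = json_path.with_suffix(".png")
        if png_path.exists():
            pairs[json_path.stem] = (json_path, png_path)
    return pairs


def demo() -> list[str]:
    """Build a throwaway folder with two complete elements and one orphan."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ("0.json", "0.png", "1.json", "1.png", "2.json"):
            (folder / name).touch()
        return list(pair_element_files(folder))
```

In this sketch the element with a JSON file but no PNG is skipped rather than reported; a real loader may prefer to raise an error instead.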
Key graph structure in JSON:
- node_indices, node_attributes: Node data
- edge_indices, edge_attributes: Edge data
- node_positions: Pixel coordinates in the visualization
- node_importances_*, edge_importances_*: Ground truth explanations (optional)
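To make the key names concrete, here is a small sketch of a graph dictionary that uses them and survives a JSON round trip. Only the key names come from the list above. The array layouts (one attribute list per node and per edge, edges as pairs of node indices, two pixel coordinates per node) and all values are assumptions for illustration, and check_graph is a hypothetical helper.

```python
import json

# Hypothetical triangle graph; attribute values and sizes are made up.
graph = {
    "node_indices": [0, 1, 2],
    "node_attributes": [[1.0], [0.5], [0.0]],
    "edge_indices": [[0, 1], [1, 2], [2, 0]],
    "edge_attributes": [[1.0], [1.0], [1.0]],
    "node_positions": [[10.0, 10.0], [50.0, 10.0], [30.0, 40.0]],
}


def check_graph(g: dict) -> None:
    """Raise ValueError if per-node or per-edge lists disagree in length."""
    num_nodes = len(g["node_indices"])
    num_edges = len(g["edge_indices"])
    for key in ("node_attributes", "node_positions"):
        if len(g[key]) != num_nodes:
            raise ValueError(f"{key} must have one entry per node")
    if len(g["edge_attributes"]) != num_edges:
        raise ValueError("edge_attributes must have one entry per edge")
    # Assumption: nodes are numbered 0..N-1 and edges refer to those numbers.
    for i, j in g["edge_indices"]:
        if not (0 <= i < num_nodes and 0 <= j < num_nodes):
            raise ValueError(f"edge ({i}, {j}) refers to a missing node")


check_graph(graph)

# Serialize and parse again, as would happen when writing the JSON file.
restored = json.loads(json.dumps(graph))
```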
The package provides a CLI via visual_graph_datasets.cli for:
- Downloading datasets from remote providers
- Listing available datasets
- Managing configuration
Configuration is handled by visual_graph_datasets/config.py, which uses a YAML configuration file stored at $HOME/.visual_graph_datasets/config.yaml.
Most dataset generation is done through experiment files in visual_graph_datasets/experiments/. To create a new molecular dataset from CSV:
- Create a new experiment file extending generate_molecule_dataset_from_csv.py
- Set required parameters: CSV_FILE_NAME, SMILES_COLUMN_NAME, TARGET_TYPE, TARGET_COLUMN_NAMES, DATASET_NAME
- Run the experiment to generate the VGD dataset
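As a rough sketch of the parameter step, the required parameters might be set as module-level constants in the new experiment file. Only the parameter names come from the list above. Every value shown is a hypothetical placeholder, the accepted values of TARGET_TYPE are an assumption, and the mechanism for extending the base experiment is deliberately left out. The helper missing_parameters is also hypothetical.

```python
# Hypothetical values; only the parameter names are taken from this document.
CSV_FILE_NAME = "solubility.csv"
SMILES_COLUMN_NAME = "smiles"
TARGET_TYPE = "regression"  # assumed value, check the base experiment
TARGET_COLUMN_NAMES = ["logS"]
DATASET_NAME = "solubility_example"

REQUIRED = (
    "CSV_FILE_NAME",
    "SMILES_COLUMN_NAME",
    "TARGET_TYPE",
    "TARGET_COLUMN_NAMES",
    "DATASET_NAME",
)


def missing_parameters(namespace: dict) -> list[str]:
    """Return the required parameter names that are unset or empty."""
    return [name for name in REQUIRED if not namespace.get(name)]


# Collect the constants defined above and confirm that none is missing.
parameters = {name: globals().get(name) for name in REQUIRED}
missing = missing_parameters(parameters)
```

A check like this runs before any expensive processing, so a forgotten parameter surfaces immediately rather than partway through dataset generation.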
Key dependencies include:
- rdkit: Molecular processing
- networkx: Graph operations
- matplotlib: Visualization
- numpy: Numerical operations
- pycomex: Experiment management framework