PyCC.id (or simply PyCC) is a flexible Python library for equation discovery. It is designed to discover physically grounded ordinary differential equations (ODEs) from time-dependent data and is built on a hypothesis-driven methodology that lets users (such as researchers and engineers) easily incorporate prior domain knowledge into the discovery process. The approach relies on structurally identifiable skeletons, motivated by the definition and physical interpretation of characteristic curves.
This empowers users to explicitly propose a specific model (or a structural family of models) based on their expertise and to rigorously test whether that proposal is consistent with the given data.
By centering the workflow around hypothesis testing, PyCC provides a structured framework for tackling several common challenges in equation discovery, such as identifiability, interpretability, and physical consistency. It also enables a modular approach that allows complex functional representations to be used, taking advantage of universal approximation theorems for neural networks while maintaining transparency. These topics are briefly discussed below:
First time you see this library? We recommend starting with our Google Colab Notebook!
| Colab & Tutorial | Forums | Paper |
|---|---|---|
When inferring dynamical equations from experimental data (which are often noisy and discretely sampled), multiple distinct mathematical models routinely fit the observations with comparable accuracy. This leads to ambiguities in model selection, a core issue known as the identifiability challenge, which is intrinsically connected to the ill-posed nature of the inverse problem. When generic, structure-agnostic methods are used to reconstruct the true physical laws from a specific data realization (constrained by sampling rate, observation window, and specific initial conditions), the underlying equations often turn out to be practically unidentifiable.
PyCC circumvents this identifiability challenge by empowering the user to guide the search with prior physical knowledge or specific hypotheses. Specifically, the library allows the user to impose a structural 'skeleton' and to easily add further properties or constraints to ensure the discovered models are physically sound.
The choice of the structural skeleton directly affects identifiability: while certain skeletons are structurally identifiable, others can result in non-identifiable or ambiguous representations. When the skeleton is shown to be theoretically identifiable in phase space (see, e.g., [Gonzalez2026]), PyCC offers a formal framework to analyze whether the proposed equation is consistent with the data or should be discarded and reformulated. Consequently, it enables the validation or elimination of hypothesized models, providing a clear pathway through the identifiability challenge.
Beyond identifiability, a major challenge in data-driven modeling is the interpretability of the obtained models. Even if a complex mathematical formulation fits the data perfectly, its structure can be too opaque to extract meaningful physical insights. This lack of interpretability obscures the underlying physics, leaving practitioners (at best) with accurate predictions but no understanding of the system structure.
To explicitly address this issue, PyCC relies on the concept of characteristic curves (CCs), which is grounded in the constitutive relation of an element. A constitutive relation links two variables and, in the scalar case, can be parametrized by a one-dimensional (1D) curve known as the CC of the corresponding element; the CC therefore completely defines the element. PyCC offers a flexible and simple notation to help users define equation skeletons that incorporate unknown 1D functions and/or parameters to be discovered. If the user defines skeletons in which the unknown functions correspond to the CCs of the system, the functions themselves have a physical meaning.
To illustrate this, consider the following three skeleton structures (all of them structurally identifiable, as shown in [Gonzalez2026]):
- First-order systems:
Here,
- Second-order systems with position-dependent friction:
In this structure,
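- Second-order systems with velocity-dependent friction: $\ddot{x} + f_1(\dot{x}) + f_2(x) = F_{ext}(t)$
Here, $f_1$ is the CC of the friction (damping) element and $f_2$ is the CC of the elastic element.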
Additionally, the PyCC approach allows users to visualize the CCs to verify their hypotheses or define new ones. This turns an abstract mathematical representation into a direct visual tool, significantly enhancing interpretability.
For instance, consider the second-order systems with velocity-dependent friction defined above. Suppose that, after training, the obtained CC of the elastic element is visually a straight line: this indicates that the elastic element is linear (this could then be added as an additional hypothesis and the model retrained). If, instead, the CC is visibly curved (e.g., a parabola), the elastic element is nonlinear, and the shape of the obtained CC can guide new hypotheses before retraining. This correspondence between the functions of the model and the individual elements of the system ensures that the final model remains physically consistent and allows the user to incorporate prior insights easily.
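The visual check described above can also be quantified. The sketch below is an illustrative assumption rather than part of the PyCC API: given a learned curve sampled on a grid (for instance `x_f2_cc, f2_cc` from the `evals` returned by `pycc.train()` in the code example below), it measures how much of the curve's variation a straight line fails to explain. The helper name `nonlinearity_index` is hypothetical.

```python
import numpy as np


def nonlinearity_index(x, f):
    """Return sqrt(1 - R^2) of the best straight-line fit to a sampled curve f(x).

    0 means the curve is exactly a straight line; values close to 1 mean a
    straight line explains almost none of the curve's variation.
    Hypothetical helper for illustration; it is not part of the PyCC API.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    slope, intercept = np.polyfit(x, f, 1)       # least-squares line f ~ slope*x + intercept
    residual = f - (slope * x + intercept)
    spread = np.sqrt(np.mean((f - f.mean()) ** 2))
    if spread == 0.0:
        return 0.0                               # a constant curve is trivially a line
    return float(np.sqrt(np.mean(residual ** 2)) / spread)


x = np.linspace(-1.0, 1.0, 201)
print(nonlinearity_index(x, 2.0 * x + 1.0))   # close to 0: linear element
print(nonlinearity_index(x, x ** 3))          # clearly above 0: nonlinear (cubic) element
print(nonlinearity_index(x, x ** 2))          # close to 1: a parabola is not captured by a line
```

What value counts as "clearly nonlinear" is a judgment call and should be weighed against the noise level of the learned curve.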
This transparent approach based on CCs has a simple but profound implication: the objective shifts from finding precise parameter values for the CCs expanded in some basis functions to finding the functional form of the CCs themselves.
As a consequence, fitting methods with thousands of parameters, such as neural networks (NNs), can be used while maintaining physical consistency and interpretability.
Summary:
- Traditional approach: "Find the coefficients $k$ and $c$ assuming linear dynamics."
- PyCC approach: "Find the shapes of the stiffness and damping curves."
Because PyCC prioritizes discovering the shape of these CCs rather than fitting predefined coefficients, the specific parametric form of the curves (e.g., whether they are polynomial, exponential, or trigonometric) does not need to be postulated a priori. This flexibility unlocks a highly modular framework that is ideal for comparing different paradigms in data-driven modeling.
For instance, the CCs can be parameterized using universal approximators, such as Neural Networks (NNs). This specific implementation (referred to as the NN method) is particularly powerful for discovering complex physical laws. Backed by universal approximation theorems, the model can approximate any continuous curve on a bounded interval to arbitrary accuracy and can also capture intricate features such as sharp transitions and non-smooth behavior, without requiring prior mathematical intuition about the functional form.
Crucially, this approach preserves transparency. While NNs are often considered opaque "black boxes" in high-dimensional settings, PyCC restricts them to learning strictly 1D functions. A "black box" with a single input and a single output is effectively a curve that can be plotted, visually inspected, and physically understood.
> [!NOTE]
> The Core Philosophy: Instead of asking "What is the global equation?", PyCC asks "Given this physical structure (skeleton), what are the specific shapes of the CCs?" These curves are the constitutive relations of the system; once they are identified, the identification problem is effectively solved.
PyCC frames discovery as a hypothesis-testing loop. The user proposes a structure (e.g., "a second-order system with velocity-dependent friction"), and the library determines the optimal shapes of the internal functions in order to decide whether the hypothesized structure is coherent with the data (see [Gonzalez2026] for further details).
Figure 1: The PyCC workflow. (a-c) A hypothesized model structure is proposed. (d-f) A representation for the CCs is selected (via NN, SymbReg, etc.), and optional constraints are defined. (g-j) The resulting curves are inspected for physical validity and forward simulations are performed. Adapted from [Gonzalez2026].
The workflow proceeds in three main stages:
- Hypothesis & Setup: Select state variables and propose a Structural Skeleton (e.g., $\ddot{x} + f_1(\dot{x}) + f_2(x) = F_{ext}$).
- Physics-Informed Optimization: The library automatically constructs a loss function to fit the data, enforcing prior physical knowledge such as symmetries (for instance, forcing $f_1$ to be an odd function).
- Discovery & Validation: The outputs are the Characteristic Curves themselves. These can be visually inspected for physical meaning, converted to analytic equations via Symbolic Regression, and validated via forward simulations.
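One common way to impose a symmetry such as "$f_1$ is odd" is by construction: for any function $g$, the combination $\tfrac{1}{2}\left[g(x) - g(-x)\right]$ is odd (and $\tfrac{1}{2}\left[g(x) + g(-x)\right]$ is even). The sketch below illustrates this with a hypothetical unconstrained curve `g` and hypothetical helpers `odd_part` / `even_part`; whether PyCC enforces its `'f1 odd'` constraint this way or through a penalty term in the loss is not specified here.

```python
import numpy as np


def odd_part(g):
    """Return x -> (g(x) - g(-x)) / 2, which is odd for any function g."""
    return lambda x: 0.5 * (g(x) - g(-x))


def even_part(g):
    """Return x -> (g(x) + g(-x)) / 2, which is even for any function g."""
    return lambda x: 0.5 * (g(x) + g(-x))


def g(x):
    """Hypothetical unconstrained 1D curve (e.g. the raw output of a small network)."""
    return 0.3 + 1.2 * x + 0.5 * x**2 + np.tanh(5.0 * x)


f1 = odd_part(g)                       # odd by construction, whatever g is
x = np.linspace(-2.0, 2.0, 101)
print(np.max(np.abs(f1(x) + f1(-x))))  # symmetry error of f1(-x) = -f1(x)
print(f1(0.0))                         # an odd curve always satisfies f1(0) = 0
```

Building the symmetry into the representation makes it hold exactly, whereas a penalty term only discourages violations; the cost is that the wrapped function is evaluated twice.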
For many physical systems, the dynamics can be described by a set of first-order ordinary differential equations (ODEs):
Here,
The core philosophy of pyCC.id is to decompose this complex function
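We express this decomposition as:

$$\dot{\mathbf{x}} = \mathbf{G}\left(\mathbf{x}, \mathbf{F}_{ext}(t);\ \{\mathbf{f}\}, \mathbf{a}\right)$$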
where:
- $\mathbf{x}$ and $\mathbf{F}_{ext}(t)$ are the model inputs: $\mathbf{x}$ represents the dynamical variables (the state of the system), while $\mathbf{F}_{ext}(t)$ denotes a set of known, time-dependent external forces. These are the quantities typically measured and/or controlled during an experiment.
- The semicolon `;` separates the system variables from the components to be identified. The terms to its left are the inputs and states, while those to its right are the unknowns to be discovered, including both functional forms and scalar parameters.
- $\{\mathbf{f}\}$ is a set of functions to be discovered, referred to as the Characteristic Curves (CCs). In this framework, each function in the set depends on only a single state variable $x_i$, ensuring high interpretability. For instance, in the context of a 1D mechanical oscillator, the $\mathbf{G}$ structure could be expressed as $\ddot{x}=\mathbf{G}(x,\dot{x}, F_{ext};\ \{f_1,f_2\}, \{m\})$, where $f_1(\dot{x})$ represents the nonlinear damping or friction, $f_2(x)$ represents the nonlinear stiffness (the spring), and $m$ is the mass.
- $\mathbf{a}$ is a vector of scalar parameters to be discovered, such as mass, damping coefficients, or other physical constants. Within the PyCC library, these parameters are reserved under the names $a_1, a_2, \ldots, a_n$ (written `a1`, `a2`, ... in equation strings).
- $\mathbf{G}$ represents the proposed model structure: a formal hypothesis by the practitioner that defines the template dictating how the building blocks (the functions $\{\mathbf{f}\}$ and parameters $\mathbf{a}$) are combined with the state $\mathbf{x}$ to compute the system's evolution.
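To make the notation concrete, the sketch below writes the oscillator example as a plain Python function whose arguments follow the order $(\mathbf{x}, \mathbf{F}_{ext};\ \{\mathbf{f}\}, \mathbf{a})$: known inputs first, unknown curves and parameters last. It assumes the mass enters as $m\ddot{x} = F_{ext} - f_1(\dot{x}) - f_2(x)$ and uses arbitrary example curves. It is a hypothetical illustration of the structure only; in PyCC, models are declared as equation strings, as in the code example below.

```python
import numpy as np


def G(state, F_ext, curves, params):
    """Right-hand side x_dot = G(x, F_ext; {f}, a) for a forced 1D oscillator.

    state  = (x, v) with v = dx/dt       -> known inputs (left of the semicolon)
    F_ext  = external force at time t    -> known input
    curves = (f1, f2): f1(v) damping CC, f2(x) stiffness CC -> unknown functions
    params = (m,): mass                  -> unknown scalar parameter
    """
    x, v = state
    f1, f2 = curves
    (m,) = params
    # First-order form: dx/dt = v, dv/dt = (F_ext - f1(v) - f2(x)) / m
    return np.array([v, (F_ext - f1(v) - f2(x)) / m])


def f1(v):
    """Example damping CC: linear viscous friction."""
    return 0.1 * v


def f2(x):
    """Example stiffness CC: hardening (cubic) spring."""
    return x + 0.2 * x**3


print(G((1.0, 0.5), F_ext=0.0, curves=(f1, f2), params=(2.0,)))
```

Each curve receives a single argument, which is exactly what keeps it plottable and interpretable.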
The goal of pyCC is to discover the optimal functions
Before installing PyCC, it is highly recommended to use an isolated Python environment to manage dependencies and avoid system-wide conflicts. Miniconda provides a lightweight and efficient way to handle this.
To set up a Miniconda environment, the user should follow these steps:
- Download and install Miniconda from the official website.
- Open a terminal (or Anaconda Prompt on Windows) and create a new environment named `pycc_env` (Python 3.10 or newer is recommended): `conda create -n pycc_env python=3.10`
- Activate the new environment: `conda activate pycc_env`
Once the virtual environment is active, proceed with the installation based on the target hardware.
Some features in PyCC rely on the Symbolic Regression package PySR. To install both packages, use:

pip install pycc.id

To run the PyCC library on Intel XPUs, the user must first install the intel-extension-for-pytorch package compatible with their operating system. Please refer to the official instructions at https://pytorch-extension.intel.com/installation.

Below are examples for installing version v2.8.10+xpu. On Linux/WSL2, first install the PyTorch and Intel extension packages:
python -m pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/xpu
python -m pip install intel-extension-for-pytorch==2.8.10+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
python -m pip install oneccl_bind_pt==2.8.0+xpu --index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

On Windows, use instead:

python -m pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/xpu
python -m pip install intel-extension-for-pytorch==2.8.10+xpu --index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

Final step: once the environment is set up, install the remaining packages from the PyCC library:
pip install pycc.id

To install from source instead, download or clone the repository and install with:

pip install -e .

Let's consider a second-order nonlinear differential equation:
where
For compatibility with higher-order systems, we recommend rewriting the system as a set of first-order equations. By defining the state variables
After simulating this system, the set $\{x_i, \dot{x}_i, F_{ext}\}$ will be used to define the database for system identification.
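The sketch below illustrates this data-preparation step without PyCC, using SciPy's `solve_ivp`. The system, a forced Duffing-type oscillator $\ddot{x} + \delta\dot{x} + \alpha x + \beta x^3 = \cos(\Omega t)$, is an arbitrary example chosen for illustration. The derivatives are evaluated from the known model, which is only possible for simulated data; with measurements they would have to be estimated numerically. The column names match those of the `df` built in the code example below.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Arbitrary example system: x'' + delta*x' + alpha*x + beta*x**3 = cos(Omega*t)
alpha, beta, delta, Omega = 1.0, 0.2, 0.1, 1.0


def F_ext(t):
    """Known external forcing."""
    return np.cos(Omega * t)


def rhs(t, y):
    """First-order form with state variables x1 = x and x2 = dx/dt."""
    x1, x2 = y
    return [x2, F_ext(t) - delta * x2 - alpha * x1 - beta * x1**3]


t_eval = np.linspace(0.0, 20.0, 1000)
sol = solve_ivp(rhs, (0.0, 20.0), [0.0, 0.0], method="LSODA",
                t_eval=t_eval, rtol=1e-8, atol=1e-10)

# Derivatives at the sampled states, evaluated from the model (simulated data only)
derivs = np.array([rhs(t, y) for t, y in zip(sol.t, sol.y.T)]).T

dataset = {
    "x1": sol.y[0],
    "x2": sol.y[1],
    "x1_dot": derivs[0],
    "x2_dot": derivs[1],
    "F_ext": F_ext(sol.t),
}
```

`pandas.DataFrame(dataset)` turns this dictionary into a table with the same columns.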
With PyCC, the identification problem can be approached in several ways:
In the functional approach, we assume the structure of the equation but leave key components as unknown functions to be discovered from data. The practitioner starts by hypothesizing the skeleton, which in this case could be a second-order system with a velocity-dependent friction force and external driving force:
This equation implies two CCs: a damping force
Figure 2: The architecture for a second-order system with a velocity-dependent friction force. Two independent neural networks (NN₁ and NN₂) approximate the CCs to be discovered.
Why this architecture matters:
Crucially, this architecture enforces uniqueness and physical consistency. Even if the training data contains complex transient behaviors, the model cannot learn spurious cross-terms (like
We can express the proposed system equation as a set of two first-order equations as follows:
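For reference, and assuming the skeleton $\ddot{x} + f_1(\dot{x}) + f_2(x) = F_{ext}(t)$ with the state variables $x_1 = x$ and $x_2 = \dot{x}$ (the naming used by the equation strings `'x1_dot = x2'` and `'x2_dot = F_ext - f1(x2) - f2(x1)'` in the code example below), the first-order form can be written as:

$$
\begin{aligned}
\dot{x}_1 &= x_2,\\
\dot{x}_2 &= F_{ext}(t) - f_1(x_2) - f_2(x_1).
\end{aligned}
$$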
The goal is to find the shapes of the characteristic curves
If the practitioner has a strong hypothesis regarding specific functional forms, pyCC can be used to identify the unknown parameters directly, effectively acting as a robust parameter estimation framework. For instance, the system equations can be defined as:
The goal is to find the optimal values for the parameters
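As a library-independent illustration of parameter estimation with fixed functional forms, suppose (hypothetically) that $f_1(\dot{x}) = c\,\dot{x}$ and $f_2(x) = k\,x$ in the skeleton $\ddot{x} + f_1(\dot{x}) + f_2(x) = F_{ext}$. The model is then linear in $(c, k)$, so ordinary least squares on samples of $(x, \dot{x}, \ddot{x}, F_{ext})$ recovers them. This sketch uses NumPy on synthetic, noise-free samples; it is not PyCC's implementation, which is configured through `pycc.train()` and may estimate parameters differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic samples of (x, x_dot, F_ext) with "true" parameters c = 0.3, k = 2.0
c_true, k_true = 0.3, 2.0
x = rng.uniform(-1.0, 1.0, 500)
x_dot = rng.uniform(-1.0, 1.0, 500)
F_ext = rng.uniform(-1.0, 1.0, 500)
x_ddot = F_ext - c_true * x_dot - k_true * x

# Rearranged model: F_ext - x_ddot = c * x_dot + k * x  (linear in c and k)
A = np.column_stack([x_dot, x])
b = F_ext - x_ddot
(c_est, k_est), *_ = np.linalg.lstsq(A, b, rcond=None)
print(f"c = {c_est:.4f}, k = {k_est:.4f}")  # c = 0.3000, k = 2.0000
```

With real data, noise in the estimated derivatives propagates into $(c, k)$, so such estimates should still be checked with forward simulations.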
The pyCC library also enables a hybrid identification approach, combining functional and parametric methods. Practitioners can prescribe known functional forms for specific terms (anchoring the model in established physical laws) while leaving other components as unknown functions to be discovered from the data. For instance, the practitioner may define the following system equations:
Here, the objective is to simultaneously identify the unknown function
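A minimal hybrid sketch under illustrative assumptions: the stiffness is prescribed as $f_2(x) = k\,x + b\,x^3$ with unknown $k$ and $b$, while the damping curve $f_1$ is left free and represented by piecewise-linear "hat" basis functions on a velocity grid. The hat basis is a simple stand-in chosen for this example, not one of PyCC's representations (NN, polynomial, or symbolic regression). Because the model is linear in the basis weights and in $(k, b)$, a single least-squares solve identifies the curve and the parameters together; the data are synthetic and noise-free.

```python
import numpy as np

rng = np.random.default_rng(1)


def f1_true(v):
    """'Unknown' damping curve, used only to generate synthetic data."""
    return 0.1 * v + 0.5 * np.tanh(5.0 * v)


# Synthetic samples; the stiffness has the prescribed form k*x + b*x**3
k_true, b_true = 1.0, 0.2
n = 2000
x = rng.uniform(-1.5, 1.5, n)
v = rng.uniform(-1.0, 1.0, n)
F_ext = rng.uniform(-1.0, 1.0, n)
x_ddot = F_ext - f1_true(v) - (k_true * x + b_true * x**3)

# Flexible representation of f1: piecewise-linear "hat" functions on a velocity grid
knots = np.linspace(-1.0, 1.0, 41)
h = knots[1] - knots[0]


def hat_basis(v):
    """One column per knot: phi_j(v) = max(0, 1 - |v - knot_j| / h)."""
    return np.clip(1.0 - np.abs(v[:, None] - knots[None, :]) / h, 0.0, None)


# Model: F_ext - x_ddot = sum_j w_j * phi_j(v) + k*x + b*x**3  (linear in w, k, b)
A = np.column_stack([hat_basis(v), x, x**3])
coef, *_ = np.linalg.lstsq(A, F_ext - x_ddot, rcond=None)
w, k_est, b_est = coef[:-2], coef[-2], coef[-1]

# Each hat equals 1 at its own knot and 0 at the others, so w_j estimates f1(knot_j)
print(f"k = {k_est:.3f}, b = {b_est:.3f}")
print("max |f1 error| at knots:", np.max(np.abs(w - f1_true(knots))))
```

Note that $f_1$ is only identified on the velocity range covered by the data; outside that range the representation says nothing about the curve.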
# Import the package into your Python environment
import pycc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# This example shows:
# 1) how to simulate a stick-slip second order system using pycc.simulate()
# 2) how to train the NN-CC method to identify the model [pycc.train()]
# 3) how to simulate the identified model [pycc.simulate()]
##############################################
# 1) simulating a stick-slip second order system using pycc.simulate()
# 1a) define parameters and functions
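alpha, beta, delta, Omega = 1.0, 0.2, 0.1, 1.0
x0, v0 = 0.0, 0.0
y0 = [x0, v0]  # initial conditions
t_span = (0, 20)
t_eval = np.linspace(*t_span, 1000)
def F1_th(x_dot):
    # friction CC: viscous term plus a smoothed Coulomb (stick-slip) term
    return delta * x_dot + 0.5 * np.tanh(500 * x_dot)
def F2_th(x):
    # elastic CC: cubic (Duffing-type) spring
    return alpha * x + beta * x**3
def F_ext(t):
    # external harmonic forcing
    return np.cos(Omega * t)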
# 1b) define equation
eqs_th = ['x1_dot = x2',
'x2_dot = F_ext - f1(x2) - f2(x1)']
# 1c) define simulation parameters
params_th = {
    't_span': t_span,
    'y0': y0,
    't_eval': t_eval,
    'method': 'LSODA',
    'local_funcs': {'f1': F1_th, 'f2': F2_th, 'F_ext': F_ext},
}
# 1d) integrate forward the theoretical equation
sol, derivatives = pycc.simulate(eqs_th, method="Theoretical", params=params_th)
# 1e) extract data from theoretical solution
time_data = sol.t
x1_data = sol.y[0]
x2_data = sol.y[1]
x1_dot_data = derivatives[0]
x2_dot_data = derivatives[1]
F_ext_val = F_ext(time_data)
# define database for training
df = pd.DataFrame({
    'x1': x1_data,
    'x2': x2_data,
    'x1_dot': x1_dot_data,
    'x2_dot': x2_dot_data,
    'F_ext': F_ext_val,
})
##############################################
# 2) training a model with the NN-CC method to identify the system [pycc.train()]
# 2a) define equations to be used for identification (fi functions and ai parameters).
eqs = [
    'x1_dot = x2',  # scalar parameters a1, a2, ... can also appear, e.g. 'x1_dot = x2*exp(a1-2.0)'
    'x2_dot = F_ext - f1(x2) - f2(x1)',
]
# 2b) define constraints (optional)
constraints = [  # adding prior known information about the CCs
    {'constraint': 'f2(0)=0'},
    {'constraint': 'f1 odd'},
    {'constraint': 'f2 odd'},
]
# 2c) define training parameters (optional)
params_NN = {
    'neurons': 100,
    'layers': 3,
    'lr': 1e-4,
    'epochs': 2000,
    'error_threshold': 1e-6,
    'extrapolation': None,
    'device': 'cpu',
    'weight_loss_param': 1e-3,
    'constraints': constraints,
}
# 2d) train/fit/identify the model
models, evals, obtained_coefs = pycc.train(df, eqs,method='NN', params=params_NN)
# plotting obtained functions f1 and f2
x_f1_cc, f1_cc, x_f2_cc, f2_cc = evals
fig, ax = plt.subplots(1, 2, figsize=(12, 6))
ax[0].plot(x_f1_cc, f1_cc, label='$f_1$ learned NN-CC')
ax[0].plot(x_f1_cc, F1_th(x_f1_cc), '--', label="$f_1$ theory")
ax[0].set_xlabel('$x_2$')
ax[0].set_ylabel('$f_1(x_2)$')
ax[0].legend()
ax[1].plot(x_f2_cc, f2_cc, label='$f_2$ learned NN-CC')
ax[1].plot(x_f2_cc, F2_th(x_f2_cc), '--', label="$f_2$ theory")
ax[1].set_xlabel('$x_1$')
ax[1].set_ylabel('$f_2(x_1)$')
ax[1].legend()
plt.tight_layout()
plt.show()
# Print learned parameters (if any)
if obtained_coefs:
    print("\nLearned scalar parameters:")
    for name, val in obtained_coefs.items():
        print(f"{name} = {val.item():.4f}")
##############################################
# 3) simulating forward the identified model [pycc.simulate()]
### Forward simulation using the NN models
print("Forward simulation of the identified NN-CC model")
# 3a) define simulation parameters
params_NN_simul = {
    'models': models,
    'obtained_coefs': obtained_coefs,
    'local_funcs': {'F_ext': F_ext},
    't_span': t_span,
    'y0': y0,
    't_eval': t_eval,
    'method': 'LSODA',  # a scipy solve_ivp method name
    'atol': 1e-8,
    'rtol': 1e-6,
    'check_nan': True,
}
# 3b) integrate identified equations
sol,_ = pycc.simulate(eqs, method='NN', params=params_NN_simul)
print("Integration success:", sol.success)
time_sim=sol.t
x1_sim=sol.y[0]
x2_sim=sol.y[1]
# Identified vs theoretical solution
plt.figure()
plt.plot(time_sim, x1_sim, label="x(t) simulated (NN-CC)")
plt.plot(time_data, x1_data, label="x(t) theory")
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend()
plt.show()

⏳ Initial import delay: the first time you run `import pycc`, it may take around 3 minutes to set up dependencies. This is a one-time process; after that, imports are nearly instantaneous.
First time you see this library? We recommend starting with our Google Colab Notebook!
Additionally, various tutorials and examples are available in the Tutorials folder. You can download or copy these files to your local machine or cluster, and execute them directly, for example:
python Tutorial1.py

Key features:

- Interpretable Models: Decomposes complex dynamics into simpler, physically meaningful functions.
- Flexible Function Parametrization: Supports various techniques to model the characteristic curves, including:
  - Neural Networks (NN-CC): compatible with multicore CPUs and GPUs from both NVIDIA (CUDA) and Intel (XPU) architectures. GPU acceleration on Intel devices is enabled through intel_extension_for_pytorch.
  - Polynomials (Poly-CC): uses polynomial basis-function expansions, mainly for comparison.
  - Symbolic Regression (SymbR-CC): parallelized for multicore CPU execution using the internal parallelization features of PySR.
- Physics-Informed Discovery: Incorporate known physical constraints, such as symmetries (e.g., even and odd functions) or conservation laws, to guide the discovery process and ensure robust, physically consistent models.
- Built-in Simulator: Includes a module for simulating higher-order and coupled ODEs, fully compatible with all identification methodologies.
- User-Focused Design: Offers an API that is both easy to use for standard problems and highly customizable for advanced research.
- Documentation and tutorials: Provides a quick-start Google Colab tutorial with an accompanying YouTube video, along with complete documentation, examples, and recommended workflows.
General reference to this package: Gonzalez2026code
In case of using NN-CC method, please cite:
- Gonzalez, F. J. "Integrating prior knowledge in equation discovery: Interpretable symmetry-informed neural networks and symbolic regression via characteristic curves." arXiv preprint arXiv:2601.21720 (2026).
- Gonzalez, F. J. and Lara, L. P. "Interpretable neural network system identification method for two families of second-order systems based on characteristic curves." Nonlinear Dyn. 113, 33063–33086 (2025).
In case of using Poly-CC method, please cite:
- Gonzalez, F. J. "System identification based on characteristic curves: a mathematical connection between power series and Fourier analysis for first-order nonlinear systems." Nonlinear Dyn. 112, 16167–16197 (2024).
- Gonzalez, F. J. "Determination of the characteristic curves of a nonlinear first order system from Fourier analysis." Sci. Rep. 13, 1955 (2023).
In case of using post-SR and/or SymbReg-CC methods, please cite:
- Gonzalez, F. J. "Integrating prior knowledge in equation discovery: Interpretable symmetry-informed neural networks and symbolic regression via characteristic curves." arXiv preprint arXiv:2601.21720 (2026).
- Cranmer, M. "Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl." arXiv preprint arXiv:2305.01582 (2023).
@article{Gonzalez2026,
title={Integrating prior knowledge in equation discovery: Interpretable symmetry-informed neural networks and symbolic regression via characteristic curves},
author={Federico J. Gonzalez},
year={2026},
eprint={2601.21720},
archivePrefix={arXiv},
primaryClass={nlin.CD},
url={https://arxiv.org/abs/2601.21720},
}
@article{Gonzalez2025nody,
title = {{Interpretable neural network system identification method for two families of second-order systems based on characteristic curves}},
volume = {113},
ISSN = {1573-269X},
DOI = {10.1007/s11071-025-11744-6},
number = {24},
journal = {Nonlinear Dyn.},
publisher = {Springer Science and Business Media LLC},
author = {Gonzalez, Federico J. and Lara, Luis P.},
year = {2025},
month = sep,
pages = {33063--33086}
}
@article{Gonzalez2024,
title = {System identification based on characteristic curves: a mathematical connection between power series and Fourier analysis for first-order nonlinear systems},
author = {Gonzalez, Federico J.},
volume = {112},
issn = {1573-269X},
doi = {10.1007/s11071-024-09890-4},
number = {18},
journal = {Nonlinear Dyn.},
publisher = {Springer Science and Business Media LLC},
year = {2024},
month = jul,
pages = {16167--16197}
}
@article{Gonzalez2023,
title = {Determination of the characteristic curves of a nonlinear first order system from Fourier analysis},
author = {Gonzalez, Federico J.},
journal = {Sci. Rep.},
publisher = {Springer Science and Business Media LLC},
volume = 13,
number = 1,
pages = {1955},
month = feb,
year = 2023,
doi = {10.1038/s41598-023-29151-5},
}
@article{Cranmer2023PySR,
title={Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl},
author={Miles Cranmer},
journal={arXiv preprint arXiv:2305.01582},
year={2023},
eprint={2305.01582},
url={https://arxiv.org/abs/2305.01582},
}

Please share your feedback, or reach out about a possible collaboration, by contacting:
- Federico J. Gonzalez: fgonzalez@ifir-conicet.gov.ar