libhmm — C++20 Hidden Markov Model Library

Fit Hidden Markov Models with Baum-Welch EM, decode sequences with Viterbi or posterior decoding, and run exact probabilistic inference at native C++ speed — no external runtime required.

Zero external dependencies — C++20 standard library only.

Use cases

HMMs model systems where discrete latent states drive observable outputs. libhmm fits naturally when you need to:

Movement ecology — classify behavioral states (foraging, travelling, resting) from GPS step lengths and turning angles using Gamma, Rayleigh, or Weibull emissions
Financial regime detection — identify bull/bear/volatile market states from return series with Gaussian or Student-t emissions
Sensor analysis — decode activity states from accelerometer or IMU data in embedded or real-time C++ systems
Bioinformatics — fit discrete-emission HMMs to biological sequences (CpG islands, gene prediction, splice sites)
Any C++ application — embed HMM inference without pulling in an R or Python runtime as a dependency

If you are doing exploratory analysis, R packages (moveHMM, depmixS4) or Python libraries (hmmlearn, pomegranate) may be more ergonomic. libhmm is the right choice when the model needs to run inside a C++ application or pipeline.

Features

Training Algorithms

Baum-Welch — canonical log-space EM; works with any EmissionDistribution via weighted fit()
MAP Baum-Welch — adds symmetric Dirichlet priors on A, π, and discrete emissions; prevents degenerate zero-probability transitions on sparse data. c = 0 recovers standard MLE exactly. Use computeLogPrior() for the correct convergence criterion (likelihood alone is not monotone when c > 0).
Viterbi Training — hard-assignment with TrainingConfig presets (fast, balanced, precise)
Segmental K-Means — for discrete HMMs; useful as initialiser before EM

Inference

ForwardBackward — canonical log-space calculator; returns probability(), getLogProbability(), and decodePosterior() (per-step argmax-γ decoding, minimising per-step state error rate)
Viterbi — canonical log-space decoder; returns the MAP joint state sequence

Use decodePosterior() when per-step annotation accuracy matters (e.g. gene prediction). Use Viterbi when whole-sequence structural coherence is required (e.g. speech alignment).

Both calculators call getBatchLogProbabilities() per state per time step, enabling SIMD acceleration directly at the distribution layer.

Model Selection

count_free_parameters(hmm) — free parameters = N*(N-1) transitions + (N-1) initial + sum of emission params
compute_aic(logL, k), compute_bic(logL, k, n), compute_aicc(logL, k, n) — lower is better
evaluate_model(hmm, logL, n) — returns HmmModelCriteria{aic, bic, aicc} in one call

ForwardBackwardCalculator fbc(hmm, obs);
HmmModelCriteria mc = evaluate_model(hmm, fbc.getLogProbability(), obs.size());
std::cout << "AIC: " << mc.aic << "  BIC: " << mc.bic << "\n";

Probability Distributions (16)

Discrete: DiscreteDistribution, BinomialDistribution, NegativeBinomialDistribution, PoissonDistribution

Continuous: GaussianDistribution, ExponentialDistribution, GammaDistribution, LogNormalDistribution, BetaDistribution, UniformDistribution, WeibullDistribution, ParetoDistribution, RayleighDistribution, StudentTDistribution, ChiSquaredDistribution, VonMisesDistribution

All distributions implement getBatchLogProbabilities() for SIMD-accelerated batch evaluation. GaussianDistribution and ExponentialDistribution have explicit AVX-512/AVX2/SSE2/NEON intrinsics (tier 2); the remaining 13 use concrete non-virtual loops that the compiler auto-vectorizes under -march=native.

I/O

JSON serialization (recommended): save_json(hmm, path) / load_json(path) — exact IEEE 754 round-trip, no external dependencies, locale-safe.
```
libhmm::save_json(hmm, "model.json");
auto hmm2 = libhmm::load_json("model.json");
```
Legacy XML (XMLFileReader / XMLFileWriter): retained for reading existing .xml files; deprecated in favour of JSON for new code.
See samples/ for ready-to-use reference HMM files in both formats.

Performance

Compile-time SIMD dispatch: each machine builds for its own CPU
- GCC/Clang: -march=native (AVX-512 on capable x86, NEON on AArch64)
- MSVC: /arch:AVX512, /arch:AVX2, or /arch:AVX (CPU-verified at configure time)
Log-space throughout: no numerical underflow on long sequences
Pre-computed log transition matrices: amortised once per compute() call

Quick Start

Build

git clone https://github.com/OldCrow/libhmm.git
cd libhmm
cmake -B build
cmake --build build --config Release
ctest --test-dir build

On Windows with Visual Studio:

cmake -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release --parallel 4
ctest --test-dir build -C Release --parallel 4

On macOS Catalina (10.15), use the guarded configure path:

./scripts/configure_catalina.sh build
cmake --build build --config Release
ctest --test-dir build

Basic Usage

#include "libhmm/libhmm.h"
using namespace libhmm;

// Create a 2-state HMM with Gaussian emissions
Hmm hmm(2);

Matrix trans(2, 2);
trans(0, 0) = 0.9; trans(0, 1) = 0.1;
trans(1, 0) = 0.2; trans(1, 1) = 0.8;
hmm.setTrans(trans);

Vector pi(2); pi(0) = 0.6; pi(1) = 0.4;
hmm.setPi(pi);

hmm.setDistribution(0, std::make_unique<GaussianDistribution>(0.0, 1.0));
hmm.setDistribution(1, std::make_unique<GaussianDistribution>(5.0, 1.5));

// Train with Baum-Welch (works with any EmissionDistribution)
ObservationLists obs = { /* your sequences */ };
BaumWelchTrainer trainer(&hmm, obs);
trainer.train();

// Evaluate
ForwardBackwardCalculator fbc(hmm, obs[0]);
double log_p = fbc.getLogProbability();

// Decode - two strategies:
ViterbiCalculator vc(hmm, obs[0]);
StateSequence map_path      = vc.decode();           // MAP joint path
StateSequence marginal_path = fbc.decodePosterior(); // per-step argmax-gamma

MAP Baum-Welch

// c=1 (Laplace smoothing) prevents zero transitions on sparse data.
MapBaumWelchTrainer map_trainer(&hmm, obs, /*pseudo_count=*/1.0);

double prev = -std::numeric_limits<double>::infinity();
for (int i = 0; i < 50; ++i) {
    map_trainer.train();
    // Correct MAP convergence criterion: logL + log P(lambda|c)
    double logL   = ForwardBackwardCalculator(hmm, obs[0]).getLogProbability();
    double mapObj = logL + map_trainer.computeLogPrior();
    if (mapObj - prev < 1e-6) break;
    prev = mapObj;
}

ViterbiTrainer Presets

// Fast convergence for interactive use
ViterbiTrainer fast_trainer(&hmm, obs, training_presets::fast());
fast_trainer.train();

// Precise convergence for final models
ViterbiTrainer precise_trainer(&hmm, obs, training_presets::precise());
precise_trainer.train();
std::cout << "Converged: " << precise_trainer.hasConverged() << "\n";

Project Structure

libhmm/
├── include/libhmm/    # Public headers (layered architecture)
│   ├── platform/      # Layer 0: SIMD detection
│   ├── math/          # Layer 1: constants, log-space, numerics
│   ├── linalg/        # Layer 2: Matrix, Vector types
│   ├── distributions/ # Layer 3: 16 distributions + base
│   ├── hmm.h          # Core HMM class
│   ├── calculators/   # Layer 4: ForwardBackward, Viterbi
│   ├── training/      # Layer 4: BaumWelch, MapBaumWelch, Viterbi, SegmentalKMeans
│   └── io/            # JSON (hmm_json.h) + legacy XML I/O
├── src/               # Implementation (mirrors include/)
├── tests/             # 41-test GTest suite
├── examples/          # 17 usage demonstrations
├── tools/             # simd_inspection, batch_performance, hmm_validator (.json/.xml)
├── samples/           # Reference HMM files (two_state_gaussian, casino) in JSON and XML
├── benchmarks/        # Comparative benchmarks (requires external libraries)
├── docs/              # Documentation and checklists
└── CMakeLists.txt

Examples

See examples/ for demonstrations:

Example	Distribution(s)	Trainer
`basic_hmm_example`	Discrete, Gaussian, Poisson	Viterbi + JSON I/O
`baum_welch_example`	Gaussian	BaumWelch (with EM convergence table)
`viterbi_trainer_example`	Gaussian	Viterbi (preset comparison)
`student_t_hmm_example`	StudentT	BaumWelch
`poisson_hmm_example`	Poisson	Viterbi
`financial_hmm_example`	Beta, LogNormal	Viterbi
`reliability_hmm_example`	Weibull, Exponential	Viterbi
`quality_control_hmm_example`	Binomial, Uniform	Viterbi
`economics_hmm_example`	NegBinomial, Pareto	Viterbi
`queuing_theory_hmm_example`	Poisson, Exponential, Gamma	Viterbi
`statistical_process_control_hmm_example`	ChiSquared	Viterbi
`swarm_coordination_example`	Discrete (243 symbols)	—
`posterior_decoding_example`	Discrete	ForwardBackward (`decodePosterior` vs Viterbi)
`map_baum_welch_example`	Discrete	MAP Baum-Welch (c=0 vs c=1, MAP convergence table)
`elk_movement_example`	Gamma, VonMises	BaumWelch (step length + turning angle)
`dax_regime_example`	StudentT	BaumWelch (DAX log-returns 2000–2022)

Requirements

C++20 compiler: GCC 11+, Clang 14+, MSVC 2019 16.11+
CMake 3.20+

No external dependencies. GTest is fetched automatically via CMake FetchContent for the test suite.

Documentation

docs/CROSS_PLATFORM.md — build options, library output, CI matrix
docs/GOLD_STANDARD_CHECKLIST.md — distribution implementation requirements
docs/STYLE_GUIDE.md — coding conventions

License

MIT License — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 241 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
cmake		cmake
docs		docs
examples		examples
include/libhmm		include/libhmm
performance		performance
samples		samples
scripts		scripts
src		src
tests		tests
tools		tools
.clang-format		.clang-format
.clang-tidy		.clang-tidy
.cmake-format.yaml		.cmake-format.yaml
.git-blame-ignore-revs		.git-blame-ignore-revs
.gitattributes		.gitattributes
.gitignore		.gitignore
.markdownlint.yaml		.markdownlint.yaml
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CMakeLists.txt		CMakeLists.txt
CMakePresets.json		CMakePresets.json
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

libhmm — C++20 Hidden Markov Model Library

Use cases

Features

Training Algorithms

Inference

Model Selection

Probability Distributions (16)

I/O

Performance

Quick Start

Build

Basic Usage

MAP Baum-Welch

ViterbiTrainer Presets

Project Structure

Examples

Requirements

Documentation

License

About

Uh oh!

Releases 30

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

libhmm — C++20 Hidden Markov Model Library

Use cases

Features

Training Algorithms

Inference

Model Selection

Probability Distributions (16)

I/O

Performance

Quick Start

Build

Basic Usage

MAP Baum-Welch

ViterbiTrainer Presets

Project Structure

Examples

Requirements

Documentation

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 30

Contributors

Uh oh!

Languages