feat: JOSS paper draft — paper/, figures, CONTRIBUTING.md, Python data prep scripts#20

Draft

OldCrow wants to merge 22 commits into main

OldCrow commented May 18, 2026

JOSS paper draft

Adds the initial JOSS submission materials for libhmm v3.7.0.

Contents

paper/

paper.md — JOSS submission draft
paper.bib — bibliography (16 entries)
figures/generate_figures.py — generates all three paper figures from embedded/CSV data
figures/figure1_speedup.{pdf,png} — wall-time comparison: libhmm vs R packages (log scale)
figures/figure2_convergence.{pdf,png} — ECME EM convergence on DAX 2000–2022; shows 152-nat improvement over kurtosis MOM and fHMM surpassed at iteration 5
figures/figure3_wind_boundary.{pdf,png} — VonMisesDistribution vs Normal boundary failure at 0°/360°; wind rose + disagreement rate per 30° bin

CONTRIBUTING.md — build/test instructions for macOS (Catalina+, Apple Clang) and Windows (VS 2022)

scripts/prepare_dax_data.py — Python equivalent of prepare_dax_data.R; downloads DAX and S&P 500 log-returns via yfinance (for systems without R)

scripts/prepare_wind_data.py — Python equivalent of prepare_wind_data.R; downloads NOAA ISD O'Hare 2015 wind data via urllib
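
The log-return transform that the DAX/S&P 500 data prep is described as producing can be sketched in a few lines. This is a minimal illustration, not the code in scripts/prepare_dax_data.py: the function name `log_returns` and the sample closing prices are hypothetical, and the yfinance download step is omitted.

```python
import math


def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for consecutive closing prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    # Pair each price with its successor; n prices yield n - 1 returns.
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing prices, for illustration only (not DAX data).
closes = [100.0, 101.0, 99.0, 99.0]
returns = log_returns(closes)
```

Log returns are additive over time, so the sum of `returns` equals the log of the last price over the first, which makes a convenient sanity check on prepared data.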

Figure notes

All figures generated on Windows (MSVC/AVX-512) from live benchmark runs:

Figure 1: hardcoded timing data from examples/README.md
Figure 2: full 201-point ECME LL trajectory from dax_regime_example (5,838 observations, 200 EM iterations); MOM reference from v3.6.0 run on same data
Figure 3: computed from ohare_wind_2015.csv (11,894 hourly observations) using fitted VonMises parameters from wind_direction_example
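
The boundary failure that Figure 3 illustrates can be reproduced with a toy example. The sketch below is not the analysis in generate_figures.py: the helper names and the two sample directions are hypothetical. It shows why a linear (Normal-style) mean breaks for directions that straddle 0°/360°, while a circular mean, the natural location estimate for von Mises models, does not.

```python
import math


def linear_mean_deg(angles):
    """Arithmetic mean, treating degrees as ordinary real numbers."""
    return sum(angles) / len(angles)


def circular_mean_deg(angles):
    """Direction of the mean resultant vector, mapped into [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


# Two hypothetical northerly wind readings on either side of the wrap point.
winds = [350.0, 20.0]
linear = linear_mean_deg(winds)      # points due south, the wrong direction
circular = circular_mean_deg(winds)  # stays near north
```

With these inputs the linear mean lands at 185°, almost opposite to both observations, while the circular mean is 5°.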

TODO before submission

Fill author name, ORCID, affiliation placeholders in paper.md
Set submission date
Review paper text for accuracy and completeness
Check JOSS whedon/editorialbot requirements (JOSS checklist)

Warp conversation

Co-Authored-By: Oz <oz-agent@warp.dev>

OldCrow and others added 22 commits

May 17, 2026 23:57


          feat: add JOSS paper draft, figures, and data prep scripts

1761b46

- paper/paper.md: JOSS submission draft (v3.7.0)
- paper/paper.bib: 16-entry bibliography
- paper/figures/generate_figures.py: generates all three paper figures
  (speedup comparison, ECME convergence, VonMises boundary analysis)
- paper/figures/figure{1,2,3}_{speedup,convergence,wind_boundary}.{pdf,png}
- CONTRIBUTING.md: build/test instructions for macOS and Windows
- scripts/prepare_dax_data.py: Python equivalent of prepare_dax_data.R
- scripts/prepare_wind_data.py: Python equivalent of prepare_wind_data.R

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(paper): add pylibhmm reference; insert author name

edccc17

- paper.md: add Python bindings paragraph in Software Design section
- paper.md: replace [Author Name] placeholder with Gary Wolfman
- paper.bib: add @software{pylibhmm} entry (v0.4.0)

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(paper): fill ORCID and affiliation

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(paper): add provenance, acknowledgements, and JAHMM citation

a4e4697

- Summary: note JAHMM port origin and MASc research context
- Acknowledgements: RMC Computer Security Lab, JAHMM, Kevin Forest (YASWIN)
- paper.bib: add @software{JAHMM} entry (attribution to confirm)

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(paper): confirm JAHMM author attribution (Jean-Marc Francois)

89de358

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(paper): add figure references at appropriate positions in text

6eb8d2a

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): add first draft of arXiv technical paper

3168f68

- paper/arxiv/libhmm_arxiv.tex: full draft (~772 lines)
- paper/arxiv/libhmm_arxiv.bib: 20-entry bibliography

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): fill synthetic benchmarks and S&P 500 sections with real data

f3f4b56

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs: correct hardware description Ryzen 9 -> Ryzen 7 throughout

21fe5ca

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs: regenerate all figures with corrected Ryzen 7 hardware description

ee49585

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): add GHMM macOS benchmark result to Section 6.1

07e7f11

GHMM 0.9-rc3 vs libhmm on macOS Catalina (Intel i7-3820QM, AppleClang, AVX):
  GHMM: ~20,545 obs/ms vs libhmm: ~2,235 obs/ms — 9.2x ratio
  Same hardware, same compiler, same benchmark harness.
  Larger ratio vs HMMLib (9.2x vs 3.2x) explained by weaker SIMD tier
  on macOS (AVX-only vs AVX-512 on Ryzen 7).

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): explain why GHMM was benchmarked on macOS (build requirements)

55b44e8

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): reorder benchmarks (DAX+S&P adjacent), fix hardware note accuracy

06669c6

- Section order: Elk → DAX → S&P 500 → Earthquake → Wind
- Timing table reordered to match
- Hardware note: correctly identifies DAX/S&P/Wind as Windows Ryzen 7,
  Elk/Earthquake as macOS Catalina (marked with dag, pending re-run)
- Earthquake: note data is embedded (no CSV needed for Windows re-run)
- AMD Ryzen 7 7745 specified by model in hardware note

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs: update author to 'Gary Wolfman, P.Eng.' / Independent Researcher

1d12cf4

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs: correct elk/earthquake benchmarks with Windows Ryzen 7 measurements

- Elk: obs count 14,394 -> 725 (moveHMM::elk_data bundled subset)
  Travelling state params corrected (pre-fix local-optimum replaced):
  libhmm step mean 1741 -> 3189 m, SD 1519 -> 4392 m, kappa 0.782 -> 0.204
  moveHMM reference 1751 -> 3247 m, SD 1527 -> 4394 m, kappa 0.780 -> 0.208
  Wall time 99 ms -> 55 ms (libhmm), ~2000 ms -> ~1270 ms (moveHMM, same machine)
  Speedup ~20x -> ~23x (same-machine comparison)
- Earthquake: wall time 4 ms -> 2 ms (warm), speedup ~5x -> ~10x
- Remove dagger notation; all libhmm timings now on Windows Ryzen 7 / AVX-512
- Regenerate figure1_speedup with corrected data
- Old elk travelling-state numbers were from a pre-fix run (VonMises kappa
  convergence bug, fixed in v3.7.0); new numbers match moveHMM reference
  within 2% on the same dataset
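
The "within 2%" agreement can be checked directly from the travelling-state numbers quoted in this commit message. The sketch below uses only those quoted values; the helper name `rel_diff` and the dictionary layout are illustrative assumptions, not part of libhmm or moveHMM.

```python
def rel_diff(value, reference):
    """Relative difference of value from a nonzero reference."""
    return abs(value - reference) / abs(reference)


# (libhmm, moveHMM) pairs as quoted in the commit message above.
pairs = {
    "step mean (m)": (3189.0, 3247.0),
    "step SD (m)": (4392.0, 4394.0),
    "kappa": (0.204, 0.208),
}

diffs = {name: rel_diff(lib, ref) for name, (lib, ref) in pairs.items()}
within_2_percent = all(d < 0.02 for d in diffs.values())
```

The step mean and kappa both differ by a little under 2%, while the step SD agrees far more closely.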

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): update GHMM benchmark — Ventura/Kaby Lake/AVX2, 9.2x -> 4.9x

d81d54c

Fresh benchmark on Intel Core i7-7820HQ (Kaby Lake), macOS 13.7.8 Ventura,
AppleClang, march=native (AVX2+FMA). Same hardware for both libraries.

GHMM 0.9-rc3 vs libhmm v3.7.0 (Dishonest Casino + Weather, T in 10^3..10^6):
  libhmm:  4,277 obs/ms average
  GHMM:   20,775 obs/ms average
  Ratio:   4.86x (GHMM faster)

All log-likelihoods match to machine precision.

Replaces macOS Catalina/Ivy Bridge/AVX result (9.2x) with same-hardware
AVX2 measurement. Removes speculation about Ryzen 7 ratio. Attributes
GHMM advantage to flat C array layout + scaled FB, consistent with the
HMMLib explanation already in the paper.

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs: update DAX benchmark to fHMM 1.4.3 same-machine comparison; note version improvement

2241ad9

- libhmm DAX timing: 2 s -> 1.1 s (Ryzen 7 / AVX-512 / Windows)
- fHMM: 1360 s (v1.2.0, Ivy Bridge) -> 5.5 s / 13 s (v1.4.3, same machine, 1 run / 10 restarts)
- Speedup: ~680x -> ~5x / ~12x (same-machine, current versions)
- Both papers: add version-comparison note acknowledging fHMM improvements v1.2.0 -> v1.4.3
- JOSS: fix 'on the same hardware' error; fill date (26 May 2026); update Figure 1 caption
- arXiv: update Table 2 and hardware note; add wall-time paragraph to Section 6.2
- Regenerate figure1_speedup with corrected data
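
The speedup figures above follow from the quoted wall times by simple division. The sketch below recomputes them; the function name `speedup` is hypothetical and the inputs are only the timings stated in this commit message.

```python
def speedup(baseline_s, candidate_s):
    """How many times faster the candidate is than the baseline."""
    if candidate_s <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_s / candidate_s


LIBHMM_S = 1.1  # libhmm DAX wall time quoted above, in seconds

single_run = speedup(5.5, LIBHMM_S)     # fHMM 1.4.3, 1 run
ten_restarts = speedup(13.0, LIBHMM_S)  # fHMM 1.4.3, 10 restarts
superseded = speedup(1360.0, 2.0)       # old cross-machine figure
```

The 10-restart ratio comes out just under 12, which the commit message rounds to ~12x. The superseded ~680x figure compared different fHMM versions on different hardware, which is why it was replaced.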

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): recompile PDF after DAX benchmark update (16 pp, no errors)

12a2a0e

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(paper): fix JOSS submission requirements; add paper-draft CI

cfcd547

- Author metadata: name -> given-names/surname/suffix (avoids parser issues)
- AI disclosure: add model/platform version info per updated JOSS AI policy
- Acknowledgements: add no-funding statement; restore JAHMM sentence; suggested
- Add .github/workflows/paper-draft.yml: builds JOSS draft PDF via
  openjournals/inara on every push to joss-paper touching paper/ files

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): prepare for arXiv submission

cd32828

- Flatten figure paths: copy figure2/figure3 into arxiv/ dir, remove ../figures/ prefix
- Add bookmarks=false to hyperref (guards against PDF conversion failure)
- Add typeout 4-pass hint after \end{document}

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs(arxiv): fix Table 1 overflow, float placement, caption width; add array/placeins/caption packages

bb49ddb

Co-Authored-By: Oz <oz-agent@warp.dev>


          docs: correct two factual errors found in QC scan

a752a7e

- std::cyl_bessel_i: unavailable on ALL Apple platforms (not just Catalina)
  Apple libc++ has not implemented C++17 special math functions on any
  macOS version (absent as of Xcode 15 / macOS 14). Fixed in both papers.
  Verified against include/libhmm/math/bessel.h and CMakeLists.txt.
- JOSS wind boundary count: 990 -> 730 hours in 330-360 bin, matching
  the arXiv table, generate_figures.py annotation, and analysis output.
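
Where std::cyl_bessel_i is missing, a modified Bessel function of the first kind has to come from another source. As an illustration only (written in Python to match the data prep scripts), the sketch below evaluates the standard power series I_n(x) = sum over k of (x/2)^(2k+n) / (k! (k+n)!). It is not the implementation in include/libhmm/math/bessel.h, which this page does not show; the function name `bessel_i`, the term cap, and the tolerance are assumptions.

```python
import math


def bessel_i(n, x, max_terms=200, tol=1e-16):
    """Modified Bessel function of the first kind, integer order n >= 0.

    Sums the power series term by term; adequate for small to moderate |x|.
    """
    if n < 0:
        raise ValueError("order must be a non-negative integer")
    half = x / 2.0
    term = half ** n / math.factorial(n)  # k = 0 term
    total = term
    for k in range(1, max_terms):
        # Ratio of consecutive series terms: (x/2)^2 / (k * (k + n)).
        term *= (half * half) / (k * (k + n))
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


# Mean resultant length of a von Mises distribution: A(kappa) = I1 / I0.
kappa = 1.0
mean_resultant = bessel_i(1, kappa) / bessel_i(0, kappa)
```

The ratio I1(kappa)/I0(kappa) is the quantity that links the concentration parameter of a von Mises distribution to the mean resultant length of the data. For large arguments a plain series loses accuracy, and production code would normally switch to an asymptotic or scaled form.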

Co-Authored-By: Oz <oz-agent@warp.dev>
