Voynich Manuscript — Candidate Decipherment (V5)

Author: Kameldip Singh Basra · kameldipbasra@gmail.com
Current paper: paper_v5.md · DOI 10.5281/zenodo.20138182
Concept DOI (always resolves to latest): 10.5281/zenodo.18598229

About this project

I am a software engineer and AI architect, not a historian, linguist, or medical historian. My background is in machine learning, NLP, and computational systems. I came to the Voynich Manuscript the same way I approach any pattern-recognition problem: build a decoder, measure it against external corpora, stress-test it against hostile alternatives, and document everything so someone else can reproduce or refute it.

This repository is a live research log, not a polished academic paper uploaded after the fact. I started in February 2026 and have been publishing each stage of the work as it happened: hypothesis, test, revision, new finding, repeat. The five versions in this repo represent that evolution in real time. V1 introduced the core decipherment; V5 adds roughly three months of refinement, adversarial testing, and evidence accumulation on top of that (February to May 2026, per the version history). Each version is preserved unchanged so the progression is auditable.

What I used: Python for all analysis and testing, SQLite for the corpus database, and Anthropic Claude (Sonnet 4.6 / Claude Code) as a collaborative AI assistant throughout, for coding, statistical testing, cross-referencing medical texts, and drafting. I am naming this explicitly because it is part of the method and it would be dishonest not to. The AI did not generate the hypothesis or the evidence; it helped me execute tests quickly, catch errors in my reasoning, and work through large corpora I could not have processed manually.

What I am not claiming: I am not a Sinhala linguist. I am not a Sri Lankan medical historian. I am not a trained botanist. The paper is explicit about each of these gaps and identifies the specific specialists whose review it needs. The computational and structural evidence is solid; the linguistic and botanical interpretation layers need expert review before the identification should be treated as confirmed.

The hypothesis in one sentence

The Voynich Manuscript (Beinecke MS 408, carbon-dated 1404–1438 CE) is a 15th-century Sri Lankan Elu-Sinhala pharmaceutical text — a working pharmacist's compressed reference recording Ayurvedic drug preparations in a bespoke phonetic script.

Confidence summary:

Claim	Confidence
South Asian pharmaceutical text	~97%
Sri Lankan provenance	~90%
Sinhala/Elu specifically (vs Pali/Sanskrit sister)	~90%
Pre-12c Elu chronolect	~83%
Working-pharmacist register	~98%
P(overall identification wrong)	~7–10%

Strongest evidence (non-circular, reproduced)

Rival-language tournament (V5): 27 control corpora tested across 11 tradition families. Sārārtha Saṃgrahaya (Sri Lankan Sinhala medical text) scores 66.67% repeated locked-anchor overlap. All Unani, Tamil/Siddha, and European controls score 0–15%. No tested rival tradition explains the same evidence bundle.
Non-circular structure tests: Section classifier 61% accuracy vs 32.3% chance baseline (p=0.0099). KALPANA preparation-marker enrichment OR=8.11 (p=2×10⁻²²²). q-/ch- phonological allomorph distribution OR=32.81. These tests do not depend on any English glosses.
Parallel-recipe template matches: f75r L38 decoded output q-keda q-keda q-keda q-keda q-keda lada is structurally identical to the Bodleian Sinhala recipe enumeration pattern (Bodleian MS Sinh.a.2(R)).
Single-token chapter anchor: pesaca (piśāca = evil spirit) is the sole occurrence of that word in 36,633 corpus tokens and opens f113r line 1 — the direct Elu reflex of the diagnostic term in AH Bhūta-pratishedha.
External state-marker grounding: leda (disease) attested 60× in Somadasa's Wellcome manuscript catalogue (33 distinct disease-stems); seda (fomentation) matches Caraka's 9 sveda compounds; 5 of 21 VPNS state-markers independently grounded.
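The enrichment figures above (for example OR=8.11 for the KALPANA marker) are reported as odds ratios. For readers unfamiliar with the statistic, here is a minimal sketch of how an odds ratio and an approximate 95% interval are derived from a 2×2 contingency table. The counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the repository's own scripts may use a different estimator or test.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] with a Woolf 95% interval.

    a: marker present, inside the target section
    b: marker absent,  inside the target section
    c: marker present, outside the target section
    d: marker absent,  outside the target section
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimate")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - 1.96 * se)
    high = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, low, high


# HYPOTHETICAL counts, for illustration only (not the paper's data).
estimate, low, high = odds_ratio(40, 60, 10, 90)
print(f"OR = {estimate:.2f}, approx. 95% CI [{low:.2f}, {high:.2f}]")
```

An odds ratio above 1 means the marker is over-represented in the target section; the interval gives a rough sense of how stable that estimate is for the given counts.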

Version history

Version	Date	DOI	What changed
V1	Feb 2026	—	Initial decipherment hypothesis, primary statistical tests
V2	2026-05-04	10.5281/zenodo.20023733	V17 decoder, Bowern engagement, hostile-reviewer test, falsification probes
V3	2026-05-07	10.5281/zenodo.20072618	Full corpus expansion, VPNS 21 states, 25 BM formula clusters, COSMO architecture
V4	2026-05-09	10.5281/zenodo.20098162	V21 meaning corrections, 81-folio plant table, 23-chapter BM/AH mapping, blind botanical review, Team B suite
V5	2026-05-12	10.5281/zenodo.20138182	27-corpus rival tournament, BALNEO recharacterised, pharmacopoeia architecture, pesaca anchor, senna backbone, iron-eye cross-reference, canonical plant IDs

All versions are preserved in this repository and on Zenodo. Read any version to see the state of the evidence at that point.

Repository layout

paper_v5.md                    ← current paper (1,135 lines)
paper_v4.md / v3.md / v2.md    ← preserved earlier versions
canonical_plant_test.py        ← DB-level plant ID verification (17 checks)
run_all.sh                     ← full validation gate (24 tests, all pass)

scripts/                       ← decoders, statistical tests, corpus analysis
translation/
  voynich_v20_corpus.db        ← canonical corpus DB (36,633 tokens, V17+V21)
supplementary/                 ← extended analysis writeups
  PHARMACOPOEIA_STRUCTURE_ANALYSIS.md
  CANONICAL_PLANT_IDENTIFICATIONS.md
  RECIPE_AH_MAPPING.md
  PAPER_ADDITION_NOTES_2026-05-12.md
  … (60+ files)
teamb/
  scripts/                     ← Team B validation scripts (16 tests)
  reports/                     ← audit reports, rival scorecard, recipe order audit
  outputs/                     ← v5_close_gap_package.json (evidence lockbox)
references/medical_corpus/     ← comparison corpora (Sarartha, BM, Caraka, AH, …)
results/                       ← validation outputs from run_all.sh
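Before running the tests, it can help to look inside the corpus database. The sketch below opens a SQLite file read-only and reports the row count of every table it finds. It deliberately assumes nothing about the schema of voynich_v20_corpus.db, which this README does not describe; the function name `table_row_counts` is illustrative and not part of the repository.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Open read-only via a file: URI so a missing or mistyped path is an error
    # rather than silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier; table names come from the file itself.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    db = Path("translation/voynich_v20_corpus.db")  # path from the layout above
    if db.exists():
        for table, n in table_row_counts(db).items():
            print(f"{table}: {n} rows")
    else:
        print("Corpus DB not found; run this from the repository root.")
```

If one of the tables holds the token corpus, its row count can be compared against the 36,633 tokens stated above.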

Reproducing the key tests

All analysis runs on Python 3.8+ with no unusual dependencies (sqlite3 from the standard library, plus scipy and numpy). Clone the repo and, from its root directory, run:

# Full validation gate — should produce: 24 passed, 0 failed
./run_all.sh

# DB-level canonical plant ID checks
python3 canonical_plant_test.py

# Rival-language tournament (27 corpora)
python3 teamb/scripts/genre_anchor_control_tests.py

# Non-circular structure tests
python3 teamb/scripts/statistical_stress_tests.py

The frozen corpus DB is translation/voynich_v20_corpus.db (SHA256 in teamb/outputs/v5_close_gap_package.json). All test outputs are in teamb/outputs/ and results/.
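To confirm that a local copy of the corpus DB matches the frozen one, its SHA256 can be compared with the digest recorded in the package file. The following is a minimal sketch. Because the JSON key that holds the digest is not documented here, the code scans the whole file for any 64-character hex string instead of assuming a key name; the function names are illustrative and not part of the repository.

```python
import hashlib
import json
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA256 so large DBs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_sha256_strings(node):
    """Recursively gather every 64-char hex string found in parsed JSON."""
    found = set()
    if isinstance(node, dict):
        for value in node.values():
            found |= collect_sha256_strings(value)
    elif isinstance(node, list):
        for value in node:
            found |= collect_sha256_strings(value)
    elif isinstance(node, str):
        candidate = node.strip().lower()
        if len(candidate) == 64 and set(candidate) <= HEX_DIGITS:
            found.add(candidate)
    return found


def verify_against_package(db_path, package_path):
    """True if the DB's digest appears anywhere in the package JSON."""
    package = json.loads(Path(package_path).read_text(encoding="utf-8"))
    return sha256_of_file(db_path) in collect_sha256_strings(package)


if __name__ == "__main__":
    db = Path("translation/voynich_v20_corpus.db")
    pkg = Path("teamb/outputs/v5_close_gap_package.json")
    if db.exists() and pkg.exists():
        print("match" if verify_against_package(db, pkg) else "MISMATCH")
    else:
        print("Files not found; run this from the repository root.")
```

A mismatch means the local DB differs from the frozen copy, so test outputs may not be comparable with those in teamb/outputs/ and results/.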

Honest limitations

No Sinhala/Elu specialist review yet. The linguistic interpretation needs a philologist. Outreach materials are in supplementary/SPECIALIST_OUTREACH_PACKAGE.md.
No trained botanist review yet. Plant identifications are candidate-level. Dossier at v15_work/BOTANIST_DOSSIER.md.
Sister-language question remains open. Pali and Sinhala/Elu are closely related; the corpus discriminates well statistically but specialist review is the definitive test.
Three rival corpora not yet acquired. Arabic/Persian aqrabadhin prose, Tibetan Sowa Rigpa formulary text, and historical Tamil/Siddha prose are missing controls. No definitive rival-family closure is claimed until those are tested.
Plant section species IDs are candidates. One-picture-one-label species matching failed as a general model. Direct text-token IDs (where a decoded word names the plant) are strong; visual-only species IDs remain hypotheses.

Citation

@misc{basra2026voynich,
  title  = {A Candidate Decipherment of the Voynich Manuscript:
             Evidence for a Spoken Elu-Sinhala Pharmaceutical Register (V5)},
  author = {Basra, Kameldip Singh},
  year   = {2026},
  month  = {May},
  doi    = {10.5281/zenodo.20138182},
  url    = {https://doi.org/10.5281/zenodo.20138182},
  note   = {Concept DOI 10.5281/zenodo.18598229 always resolves to latest version}
}

Acknowledgments

Beinecke Rare Book and Manuscript Library for digital access to MS 408. The EVA transcription community (Stolfi, Takahashi, and contributors) for the foundational transcription without which none of this analysis would be possible. Daniel Gaskell for open-sourcing the random-forest Voynich classifier used in §4.13. The Buddhist medical traditions of Sri Lanka, whose pharmacopoeial literature forms the comparative backbone of this work. Anthropic Claude (Sonnet 4.6 / Claude Code) was used throughout as a collaborative AI assistant for coding, statistical testing, corpus cross-referencing, and drafting — this is stated explicitly as a matter of transparency, not as a limitation.
