Journal Digital Corpus Reader

A web-based search interface for the Journal Digital Corpus, a collection of transcripts from historical Swedish newsreels (SF Veckorevy).

Features

  • Full-text search across ~6,800 transcript files
  • Fuzzy search for finding matches despite OCR/ASR errors
  • Filter by transcript type (speech/intertitle), collection, and year
  • Side-by-side viewer showing speech and intertitle transcripts with timestamps
  • Shareable URLs for bookmarking searches and specific videos
  • Client-side only: loads the corpus directly from Zenodo, no backend required
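
To illustrate how fuzzy search combined with the type and year filters can tolerate OCR/ASR errors, here is a minimal sketch in Python. It is not the reader's actual implementation, which runs in the browser. The `Transcript` fields, the `search` function, and the 0.8 threshold are all assumptions made for this example; the similarity measure is the standard library's `difflib.SequenceMatcher`, swapped in as a stand-in for whatever matcher the app uses.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Transcript:
    """Hypothetical record; field names are illustrative, not the corpus schema."""
    file_id: str
    kind: str   # "speech" or "intertitle"
    year: int
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters; \w is Unicode-aware,
    # so Swedish letters such as å, ä and ö are kept.
    return re.findall(r"\w+", text.lower())


def fuzzy_score(query: str, text: str) -> float:
    """Best similarity (0..1) between the query and any window of
    as many consecutive tokens in the text as the query has."""
    q_tokens = tokenize(query)
    t_tokens = tokenize(text)
    if not q_tokens or not t_tokens:
        return 0.0
    q = " ".join(q_tokens)
    n = len(q_tokens)
    best = 0.0
    for i in range(max(1, len(t_tokens) - n + 1)):
        window = " ".join(t_tokens[i:i + n])
        best = max(best, SequenceMatcher(None, q, window).ratio())
    return best


def search(
    corpus: list[Transcript],
    query: str,
    kind: Optional[str] = None,
    years: Optional[tuple[int, int]] = None,
    threshold: float = 0.8,  # assumed cut-off; tune against real data
) -> list[tuple[str, float]]:
    """Return (file_id, score) pairs at or above the threshold, best first."""
    hits = []
    for doc in corpus:
        if kind is not None and doc.kind != kind:
            continue
        if years is not None and not (years[0] <= doc.year <= years[1]):
            continue
        score = fuzzy_score(query, doc.text)
        if score >= threshold:
            hits.append((doc.file_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

With this sketch, a query such as `konungen` would still match an intertitle misread as `Konungcn`, because only one of eight characters differs. Lowering `threshold` admits noisier matches at the cost of more false positives.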

Usage

Visit the hosted version at: https://[username].github.io/jdc_browser/

Or run locally:

git clone https://github.com/[username]/jdc_browser.git
cd jdc_browser
python3 -m http.server 8000
# Open http://localhost:8000

Deployment to GitHub Pages

  1. Push the repository to GitHub
  2. Go to Settings > Pages
  3. Set source to "Deploy from a branch" and select main / root
  4. The site will be available at https://[username].github.io/jdc_browser/

Data Source

The corpus is loaded directly from Zenodo at runtime (~13 MB download). It contains:

  • Speech transcripts: Automatic speech recognition via SweScribe
  • Intertitle transcripts: OCR from silent film text cards via stum

DOI: 10.5281/zenodo.15596191

Source repository: Modern36/journal_digital_corpus
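
For readers who want to inspect the Zenodo record outside the browser, the following is a rough sketch of listing its files through Zenodo's public REST API. It assumes that the numeric suffix of the DOI is the record ID and that the record JSON exposes a `files` array with `key`, `size`, and `links.self` entries; neither the endpoint nor the file layout is confirmed by this README, and `list_files` and `fetch_record` are names invented for the example.

```python
import json
from urllib.request import urlopen

# Assumption: the record ID is the trailing number of DOI 10.5281/zenodo.15596191.
RECORD_ID = "15596191"
API_URL = f"https://zenodo.org/api/records/{RECORD_ID}"


def list_files(record: dict) -> list[dict]:
    """Extract name, size in bytes, and download link for each file
    in a Zenodo record payload (assumed shape, see lead-in)."""
    return [
        {"name": f["key"], "size": f["size"], "url": f["links"]["self"]}
        for f in record.get("files", [])
    ]


def fetch_record(url: str = API_URL) -> dict:
    """Download the record metadata as JSON. Requires network access."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def print_files() -> None:
    # Convenience wrapper: fetch the record and print one line per file.
    for entry in list_files(fetch_record()):
        print(f'{entry["name"]}: {entry["size"] / 1e6:.1f} MB')
```

Calling `print_files()` should list the files attached to the record with their sizes, which can be compared against the ~13 MB download mentioned above.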

Credits

Developed for the Modern Times 1936 research project at Lund University, Sweden. The project investigates what software "sees," "hears," and "perceives" when pattern recognition technologies such as 'AI' are applied to media historical sources. The project is funded by Riksbankens Jubileumsfond.

License

The Journal Digital Corpus is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

References

@article{aspenskog2025journal,
  title={Journal Digital Corpus: Swedish Newsreel Transcriptions},
  author={Aspenskog, Robert and Johansson, Mathias and Snickars, Pelle},
  journal={Journal of Open Humanities Data},
  volume={11},
  number={1},
  year={2025}
}