This repository contains CSV files with text reuse data generated by passim for OpenITI release 2025.1.9. Each file represents all the text reuse that passim detected between a pair of OpenITI texts. We call the first text in the pair "book1" and the second "book2". The "light" in the title of this repository refers to the fact that the files contain only part of the data generated by passim; most notably, they omit the actual alignment strings. The main use for these light files is in visualisations, where the actual text of the alignments is not needed. The full text reuse data set, including the alignment strings, is too large for GitHub.
Each folder contains all the pairwise text reuse data for a single text version. The name of the folder is the OpenITI text version ID, including the language code and the file extension of the text version in the OpenITI corpus. Each file inside the folder represents all the text reuse detected by passim between that folder's text version and another text version. The filenames consist of two parts, separated by an underscore:
- the OpenITI text version ID of the folder's main text version ("book1")
- the OpenITI text version ID of the second text version ("book2")
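The filename convention above can be unpacked with a few lines of Python. This is only a minimal sketch: the function name `split_pair_filename` and the example IDs are hypothetical, and it assumes that the files end in `.csv` and that version IDs never contain an underscore.

```python
from pathlib import Path


def split_pair_filename(filename):
    """Split a pairwise file name into (book1_id, book2_id)."""
    name = Path(filename).name
    # Assumption: data files end in ".csv". Only that suffix is removed,
    # because version IDs may themselves contain dots.
    if name.lower().endswith(".csv"):
        name = name[: -len(".csv")]
    # Assumption: version IDs contain no underscore, so the first
    # underscore is the separator between the two IDs.
    book1_id, sep, book2_id = name.partition("_")
    if not sep or not book1_id or not book2_id:
        raise ValueError(f"not a pairwise file name: {filename!r}")
    return book1_id, book2_id


# Hypothetical IDs, for illustration only.
book1_id, book2_id = split_pair_filename("AAA0001-ara1_BBB0002-ara1.csv")
```

If the assumption about underscores does not hold for some IDs, the folder name (which is the book1 ID) could be used instead to strip the first part of the filename.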

Each file contains the following columns:

- seq1: number of the milestone in which the alignment was found in book1
- seq2: number of the milestone in which the alignment was found in book2
- b1: character offset of the beginning of the alignment in the milestone in book1
- b2: character offset of the beginning of the alignment in the milestone in book2
- bw1: Arabic word token offset of the beginning of the alignment in the milestone in book1
- bw2: Arabic word token offset of the beginning of the alignment in the milestone in book2
- e1: character offset of the end of the alignment in the milestone in book1
- e2: character offset of the end of the alignment in the milestone in book2
- ew1: Arabic word token offset of the end of the alignment in the milestone in book1
- ew2: Arabic word token offset of the end of the alignment in the milestone in book2
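
As an illustration of how the fields listed above might be consumed, here is a minimal Python sketch that reads one file and counts the alignments per book1 milestone. The function names are hypothetical. It assumes that each file starts with a header row carrying the field names above and that the delimiter is a tab; pass `delimiter=","` if the files turn out to be comma-separated.

```python
import csv
from collections import Counter

# The ten fields described above; all of them hold integers.
INT_COLUMNS = ("seq1", "seq2", "b1", "b2", "bw1", "bw2", "e1", "e2", "ew1", "ew2")


def read_alignments(stream, delimiter="\t"):
    """Yield one dict per alignment row, with the fields above as ints.

    Assumption: the first row is a header containing the field names.
    """
    reader = csv.DictReader(stream, delimiter=delimiter)
    for row in reader:
        # Any additional columns are passed through unchanged as strings.
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        yield row


def alignments_per_milestone(rows):
    """Count how many alignments were found in each book1 milestone."""
    return Counter(row["seq1"] for row in rows)
```

A file could then be processed with `with open(path, encoding="utf-8", newline="") as f: counts = alignments_per_milestone(read_alignments(f))`, where `path` points to one of the pairwise files.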