https://github.com/paracrawl/cirrus-scripts/blob/61765e3bb1da3d580bc72f48b34634cf8c79ea45/09.clean#L43 This counts all words in all columns, including metadata. We report this number as _source words_, so it should really only be counting column 3.
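The sketch below shows the difference. It assumes the input is tab-separated, with the sentence text in column 3 and metadata in the other columns. The function names are hypothetical, and this is not the actual code on line 43:

```shell
# Hypothetical: counts every whitespace-separated token on every line,
# so metadata columns are included in the total.
count_all_words() {
  wc -w | tr -d ' '
}

# Hypothetical: counts words only in column 3 (the text column),
# assuming tab-separated input. cut splits on tabs by default.
count_source_words() {
  cut -f3 | wc -w | tr -d ' '
}
```

For a line with two single-token metadata columns followed by `hello world foo`, `count_all_words` reports 5 and `count_source_words` reports 3.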