=======================
Bridgewater coding challenge #1
Given an arbitrary text document written in English, write a program that will generate a concordance, i.e. an alphabetical list of all word occurrences, labeled with word frequencies. Bonus: label each word with the sentence numbers in which each occurrence appeared.
$ git clone https://github.com/gabe0912/Concordance.git
$ cd Concordance
$ java -jar out/artifacts/concordance_jar/concordance.jar inputDocument.txt
https://github.com/gabe0912/Concordance/tree/master/src/main/java
- Words are canonicalized to avoid comparator discrepancies. Examples: Given --> given; Bonus: --> bonus
- Compound words should be counted as one. Examples: e.g., high-touch, daughter-in-law
- Words do not include punctuation, except for specific abbreviations. Examples: i.e. --> i.e.; English, --> English; etc... --> etc
- Sentences are delimited by punctuation only. Examples: . ! ?
- Only supports one document at a time
- Some abbreviations count as sentence delimiters: "i.e. Bridgewater Assoc." becomes two sentences
- Unexpected punctuation will not be stripped
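
The rules listed above can be illustrated with a minimal sketch. This is not the code from the repository: the class name `ConcordanceSketch`, the `Entry` type, the `ABBREVIATIONS` whitelist, and the exact set of stripped punctuation characters are all assumptions made for illustration. It lowercases each token, strips a fixed set of edge punctuation, keeps hyphenated compounds intact, and advances the sentence number after any token ending in `.`, `!` or `?` unless the token is a whitelisted abbreviation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ConcordanceSketch {

    // Assumption: a small whitelist of abbreviations that keep their periods.
    private static final Set<String> ABBREVIATIONS = Set.of("i.e.", "e.g.");

    /** Frequency of a word plus the sentence number of each occurrence. */
    public static final class Entry {
        public int count = 0;
        public final List<Integer> sentences = new ArrayList<>();

        @Override
        public String toString() {
            return "{" + count + ":" + sentences + "}";
        }
    }

    /** Lowercases a token and strips expected punctuation from both ends. */
    public static String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        // Assumption: only these characters are stripped; anything else
        // (quotes, brackets, ...) is left in place, as the limitations say.
        String stripped = lower.replaceAll("^[.,;:!?]+|[.,;:!?]+$", "");
        String withDot = stripped + ".";
        if (ABBREVIATIONS.contains(withDot) && lower.contains(withDot)) {
            return withDot; // whitelisted abbreviation keeps its periods
        }
        return stripped; // interior hyphens survive, so compounds stay whole
    }

    /** Builds an alphabetically ordered concordance of the given text. */
    public static Map<String, Entry> build(String text) {
        Map<String, Entry> concordance = new TreeMap<>();
        int sentence = 1;
        for (String raw : text.trim().split("\\s+")) {
            String word = normalize(raw);
            if (!word.isEmpty()) {
                Entry entry = concordance.computeIfAbsent(word, k -> new Entry());
                entry.count++;
                entry.sentences.add(sentence);
            }
            boolean endsSentence =
                    raw.endsWith(".") || raw.endsWith("!") || raw.endsWith("?");
            // Non-whitelisted abbreviations such as "Assoc." still end a sentence.
            if (endsSentence && !ABBREVIATIONS.contains(word)) {
                sentence++;
            }
        }
        return concordance;
    }

    public static void main(String[] args) {
        String sample = "Given a dog, i.e. a pet. The dog barked! Given?";
        build(sample).forEach((word, entry) -> System.out.println(word + " " + entry));
    }
}
```

A `TreeMap` is used here so that iteration order is alphabetical without a separate sort step. Because sentence boundaries are detected from the raw token, an abbreviation missing from the whitelist splits a sentence, which matches the limitation noted above.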