[ENC] Align .txt file uploads across String Similarity, Phonotactic Probability, and Neighbourhood Density #782

Currently, String Similarity (SS), Phonotactic Probability (PP), and Neighbourhood Density (ND) each allow .txt file uploads, but they behave differently and accept different inputs. Ideally, all three would behave the same way.

The current behaviour is as follows:

1. Phonotactic Probability (PP)
   * Words in corpus, spelling: calculates
   * Words in corpus, transcription: can't do this; requires spelling
   * Words not in corpus, spelling: can't do this; words must be in the corpus
   * Words not in corpus, transcription: can't do this; requires spelling and words in the corpus
   * Words not in corpus, both spelling and transcription: can't do this, even though the old docs said you could!

* Ideally, PCT would calculate PP whether spelling or transcription is provided. Any words not in the corpus would be skipped (returning N/A and reported to the user), while the rest of the list is still calculated.
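The ideal behaviour described above (calculate what can be calculated, return N/A for the rest, and tell the user which words were skipped) can be sketched roughly as follows. This is a minimal sketch, not PCT's code: it assumes a corpus modelled as a plain dict from spelling to a tuple of segments, transcription input given as space-separated segments, and a measure passed in as a function. `calculate_for_list` and the toy length-based measure are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a PCT corpus: spelling -> transcription (tuple of segments).
Corpus = Dict[str, Tuple[str, ...]]


def calculate_for_list(
    queries: List[str],
    corpus: Corpus,
    measure: Callable[[Tuple[str, ...]], float],
    file_contains_spelling: bool = True,
) -> Tuple[Dict[str, Optional[float]], List[str]]:
    """Compute `measure` for every query; give N/A (None) for words not in the corpus.

    Returns the per-word results plus the list of skipped words, so they can be
    reported to the user while the rest of the list is still calculated.
    """
    # Reverse index so transcription input (e.g. "k æ t") can be matched to corpus entries.
    by_transcription = {" ".join(trans): trans for trans in corpus.values()}
    results: Dict[str, Optional[float]] = {}
    missing: List[str] = []
    for query in queries:
        if file_contains_spelling:
            trans = corpus.get(query)
        else:
            trans = by_transcription.get(query)
        if trans is None:
            results[query] = None  # shown to the user as N/A
            missing.append(query)
        else:
            results[query] = measure(trans)
    return results, missing


if __name__ == "__main__":
    corpus = {"cat": ("k", "æ", "t"), "dog": ("d", "ɔ", "g")}
    # Toy placeholder measure (segment count), NOT phonotactic probability.
    results, missing = calculate_for_list(["cat", "blick", "dog"], corpus, lambda t: float(len(t)))
    print(results)  # {'cat': 3.0, 'blick': None, 'dog': 3.0}
    print("Not in corpus:", missing)  # Not in corpus: ['blick']
```

Routing all three measures through one shared loop like this is one possible way to keep their upload behaviour aligned.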

2. Neighbourhood Density (ND)
   * Words in corpus, spelling: calculates (you must specify that the file contains spelling)
   * Words in corpus, transcription: calculates (you must specify that the file contains transcription)
   * Words not in corpus, spelling: calculates, giving NA for words not in the corpus and telling you which they are
   * Words not in corpus, transcription: calculates for all words, noting that some are not in the corpus

* This is currently the closest to the ideal behaviour for all three!

3. String Similarity (SS)
   * Word pairs in corpus, spelling: calculates
   * Word pairs in corpus, transcription: gives NA for all pairs, reporting that some words (in fact, all of them) are not in the corpus and listing which ones
   * Word pairs not in corpus, spelling: calculates, giving a result where possible and NA for words not in the corpus, and lists which words those are
   * Word pairs not in corpus, transcription: gives NA for all pairs, reporting that some words (in fact, all of them) are not in the corpus and listing which ones

* This behaviour is basically fine, but there's no principled reason the algorithms couldn't calculate SS for word pairs given in transcription, even when the words are not in the corpus. The one possible exception is phonological edit distance, where this might be problematic; if so, we could follow ND's approach and simply grey out that option for that algorithm.
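To illustrate why transcription-only pairs need not depend on the corpus, here is a minimal sketch, assuming pairs arrive as space-separated transcriptions: plain segment-level edit distance (standard Levenshtein) needs nothing but the two transcriptions. This is not PCT's implementation; `similarity_from_transcriptions`, `GREYED_OUT_FOR_UNLISTED_TRANSCRIPTIONS`, and the algorithm labels are hypothetical, and phonological edit distance is simply refused here to mirror the grey-out suggestion.

```python
from typing import Sequence, Tuple

# Hypothetical: algorithms whose "transcription, not in corpus" option would be greyed out,
# following the suggestion above. Labels are illustrative, not PCT identifiers.
GREYED_OUT_FOR_UNLISTED_TRANSCRIPTIONS = {"phonological_edit_distance"}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Standard Levenshtein distance over segments (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity_from_transcriptions(algorithm: str, pair: Tuple[str, str]) -> int:
    """Compute SS for a pair of space-separated transcriptions, with no corpus lookup."""
    if algorithm in GREYED_OUT_FOR_UNLISTED_TRANSCRIPTIONS:
        raise ValueError(f"{algorithm} is not available for transcriptions outside the corpus")
    if algorithm != "edit_distance":
        raise ValueError(f"unknown algorithm: {algorithm}")
    first, second = (word.split() for word in pair)
    return edit_distance(first, second)


if __name__ == "__main__":
    print(similarity_from_transcriptions("edit_distance", ("k æ t", "b æ t")))  # 1
```

Only `edit_distance` is wired up here; the point is that nothing in it consults the corpus.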

Sample files to test all of this can be found in:
~/Dropbox/Phonological_CorpusTools_Public/PCT_text_file_upload_tests
