Data processing and analysis of connected speech features in PPA
- Clone the repository
git clone -b main https://github.com/lingualab/ConnectedSpeech-UCSF
cd ConnectedSpeech-UCSF
- Create a virtual environment
python3 -m venv --prompt ConnectedSpeech-UCSF venv
source venv/bin/activate
- Install ConnectedSpeech-UCSF:
pip install -e .

Participants completed the Picnic Scene from the Western Aphasia Battery. Five transcripts are available:
- salt.slt: Original SALT transcript.
- salt.txt: SALT transcript with all SALT codes removed.
- manual.txt: SALT transcript with all SALT codes and line breaks removed.
- whisper.txt: Original Whisper transcript.
- whisperQC.txt: Whisper transcript, manually checked, with disfluencies added.
- Updated data was sent from UCSF. We now use the data in
/data/brambati/dataset/ConnectedSpeech-UCSF/sourcedata/NEW.
- csucsf_process_merge: merge selected speech features output by speechmetryflow with phenotype information for further analysis.
- csucsf_describe_participants: calculate demographic statistics for the included participants.
- csucsf_analysis_wer: perform text analysis and compute the Word Error Rate (WER) metric.
- csucsf_analysis_icc: compute intraclass correlations (ICC).
- csucsf_classification: run binary classification for specified pairs of diagnoses, using the linguistic features selected in the csucsf_process_merge script. Optionally include feature selection.
- csucsf_classification_stats: plot bar graphs comparing the classification performance of the different transcription methods.
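
For readers unfamiliar with the Word Error Rate metric mentioned above, here is a minimal sketch of how it is conventionally defined: the word-level edit distance (substitutions, deletions, insertions) between a reference and a hypothesis transcript, divided by the number of reference words. This is an illustration only, not the implementation used by csucsf_analysis_wer; the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are all assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization here is a simple
    lowercase + whitespace split, which is an assumption.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences standing in for a manual and an automatic transcript.
manual = "the boy is flying a kite"
automatic = "the boy flying the kite"
print(f"WER = {word_error_rate(manual, automatic):.3f}")
```

In this example one word is deleted and one is substituted against six reference words. Note that WER can exceed 1.0 when the hypothesis contains many insertions.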