sugarpy

A library, API, and web app that uses NLP to perform language sample analysis with the SUGAR framework. It is primarily meant as a tool for speech-language pathologists (SLPs), who otherwise compute SUGAR metrics by hand, which is often time-consuming.

Try it out

API Documentation

The sugarpy Python library is the core driver of the tool. It uses classical NLP (spaCy) to perform rule-based and token-based analysis on the input language samples.

Install

To install the Python library, use pip:

pip install sugar-python

You can also clone this repo and install from source using uv:

uv sync

Use

The main operation in sugarpy is get_metrics:

from sugarpy import get_metrics

language_samples = [
  "My last name is Y and my middle name is Z",
  "And you can take this bag off and wear it",
  "But it’s a little small",
  "Yea mine didn’t come with one that matches",
  "It didn’t come with this; it came with these markers"
]

metrics = get_metrics(language_samples)

The result is an object with the four SUGAR metrics as attributes: mlu (mean length of utterance), cps (clauses per sentence), wps (words per sentence), and tnw (total number of words).
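As a minimal sketch of reading these attributes, the helper below formats the four scores for display. summarize_metrics is a hypothetical name, not part of sugarpy, and a SimpleNamespace with made-up values stands in for a real get_metrics result so the snippet runs without spaCy installed. The only assumption about the real object is that it exposes mlu, cps, wps, and tnw as described above.

```python
from types import SimpleNamespace

# Attribute names as documented above, paired with readable labels.
SUGAR_LABELS = {
    "mlu": "Mean length of utterance",
    "cps": "Clauses per sentence",
    "wps": "Words per sentence",
    "tnw": "Total number of words",
}


def summarize_metrics(metrics):
    """Return one 'label (attr): value' line per SUGAR metric (hypothetical helper)."""
    lines = []
    for attr, label in SUGAR_LABELS.items():
        value = getattr(metrics, attr)
        lines.append(f"{label} ({attr}): {value}")
    return "\n".join(lines)


# Stand-in for the object returned by get_metrics; the values are illustrative only.
example = SimpleNamespace(mlu=7.2, cps=1.2, wps=7.0, tnw=35)
print(summarize_metrics(example))
```

With the library installed, the same helper should accept the object returned by get_metrics(language_samples) in place of the stand-in.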

You can also check how the resulting metrics compare with established norms. The mean and standard deviation for each score depend on the subject's age and are stored in sugarpy/norms.py. To retrieve them, use get_norms:

from sugarpy import get_norms

age_y = 9   # Years component of the age
age_m = 11  # Additional months (9 years 11 months = 119 months)
norms = get_norms(age_y, age_m, "mlu")  # Returns {'min_age': 108, 'max_age': 131, 'mean_score': 9.61, 'sd': 1.52}

The min_age and max_age are measured in months, and are the age range for which the mean_score and sd (standard deviation) apply. In the above example, children between the ages of 108 months and 131 months have a mean mlu score of 9.61, with a standard deviation of 1.52.

All of this data is taken directly from the SUGAR language website.
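To make the comparison against a norm concrete, here is a small sketch that converts an age to months and expresses a score as a number of standard deviations from the mean. It works on a plain dictionary shaped like the get_norms result shown above, so it runs without sugarpy. age_in_months and sd_from_mean are hypothetical helpers, not part of the library, and the score of 8.1 is made up.

```python
def age_in_months(years, months):
    """Convert an age given as years plus months into total months."""
    return years * 12 + months


def sd_from_mean(score, norms):
    """Signed distance of score from the norm mean, in standard deviations."""
    return (score - norms["mean_score"]) / norms["sd"]


# Dictionary copied from the get_norms example above.
norms = {"min_age": 108, "max_age": 131, "mean_score": 9.61, "sd": 1.52}

age = age_in_months(9, 11)  # 9 years 11 months
in_range = norms["min_age"] <= age <= norms["max_age"]

# A made-up mlu score; negative results fall below the mean.
z = sd_from_mean(8.1, norms)
print(f"age={age} months, norm applies: {in_range}, distance: {z:.2f} SD")
```

How many standard deviations count as a concern is a clinical judgment that this sketch deliberately leaves out.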

Configuration

The library uses the spaCy model en_core_web_lg by default, a pipeline optimized for running on CPU. You can substitute a different spaCy model, such as the transformer-based en_core_web_trf, by passing model=<your_model_name> to the get_metrics function:

from sugarpy import get_metrics

...

metrics = get_metrics(language_samples, model="en_core_web_trf")

You will need to ensure that any model you pass has already been installed on your machine.
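One way to check this up front is sketched below using only the standard library. spaCy distributes its trained pipelines as regular Python packages, so importlib can tell whether one is importable. is_model_installed is a hypothetical helper, not part of sugarpy or spaCy.

```python
from importlib.util import find_spec


def is_model_installed(model_name):
    """Return True if a package named model_name is importable (hypothetical helper)."""
    return find_spec(model_name) is not None


model = "en_core_web_trf"
if not is_model_installed(model):
    # spaCy's own downloader is the usual way to install a pipeline package.
    print(f"Model missing; try: python -m spacy download {model}")
```

This only confirms that the package can be found; it does not verify that the model version is compatible with the installed spaCy release.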

License

This project is licensed under the MIT license. Please see LICENSE for details.