Add likelihood ratio (LR) calculation endpoints for striation and impression marks #149
Merged
27 commits
71d764c Dummy LR System (laurensWe)
03d527b Integration tests (laurensWe)
6041d11 merge conflicts (laurensWe)
52dcd6c LR endpoint (laurensWe)
be5e8fa merge conflicts (laurensWe)
9f4b71c n_cells for lr_calculate and added pickle (laurensWe)
6c33900 move to scratch-core (laurensWe)
523fea8 fix pyright issues (laurensWe)
2cb4e41 update dependencies (laurensWe)
831b1ba Feedback simone halfway (laurensWe)
2fe41ea mege conflict (SimoneAriens)
8a342e3 update lr endpoints (SimoneAriens)
d954319 cleanup (SimoneAriens)
97a44e7 merge conflicts (SimoneAriens)
21f6a56 fix pydantic fastapi error (laurensWe)
eb05b45 decouple schema from the dataclass of metrics (laurensWe)
53238ed Merge conflicts (laurensWe)
573b95a Cleanup tests And make use of MarkMetadata's (laurensWe)
24e44ec fix order of dirs (laurensWe)
a3e4f13 property instead of cached (laurensWe)
aa4aba1 Transform CCF Scores (laurensWe)
87d0b67 Transform ccfs to logodds (laurensWe)
0eb6c8d Further review (laurensWe)
4948c65 Simone feedback (laurensWe)
f70a1b3 pipeline fixes (laurensWe)
29629b4 Feedback Peter (laurensWe)
4e89424 minor feedback (laurensWe)
127 changes: 127 additions & 0 deletions
packages/scratch-core/src/conversion/likelihood_ratio.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,127 @@ | ||
| import pickle | ||
| from pathlib import Path | ||
| from typing import Self | ||
| | ||
| import numpy as np | ||
| from lir.data.models import FeatureData, LLRData | ||
| from lir.lrsystems import LRSystem | ||
| from pydantic import model_validator | ||
| | ||
| from container_models.base import ConfigBaseModel | ||
| | ||
| | ||
| class ModelSpecs(ConfigBaseModel): | ||
|     """Training data and model types for KM and KNM populations used to calibrate an LR system. | ||
| | ||
|     Holds scores and LLR data for two populations: known matches (KM) and | ||
|     known non-matches (KNM), along with the model name used to produce each. | ||
| | ||
|     :param km_model: Identifier of the model used for KM scores. | ||
|     :param km_scores: Similarity scores for the KM population. | ||
|     :param km_llrs: Log-likelihood ratios for the KM population. | ||
|     :param km_llr_intervals: LLR confidence intervals for the KM population, shape (n, 2), or None. | ||
|     :param knm_model: Identifier of the model used for KNM scores. | ||
|     :param knm_scores: Similarity scores for the KNM population. | ||
|     :param knm_llrs: Log-likelihood ratios for the KNM population. | ||
|     :param knm_llr_intervals: LLR confidence intervals for the KNM population, shape (n, 2), or None. | ||
|     """ | ||
| | ||
|     km_model: str | ||
|     km_scores: np.ndarray | ||
|     km_llrs: np.ndarray | ||
|     km_llr_intervals: np.ndarray | None | ||
|     knm_model: str | ||
|     knm_scores: np.ndarray | ||
|     knm_llrs: np.ndarray | ||
|     knm_llr_intervals: np.ndarray | None | ||
| | ||
|     @model_validator(mode="after") | ||
|     def _validate_matching_lengths(self) -> Self: | ||
|         # Each score must have exactly one corresponding LLR per population. | ||
|         if len(self.km_scores) != len(self.km_llrs): | ||
|             raise ValueError("km_scores and km_llrs must have the same length") | ||
|         if len(self.knm_scores) != len(self.knm_llrs): | ||
|             raise ValueError("knm_scores and knm_llrs must have the same length") | ||
|         return self | ||
| | ||
|     @property | ||
|     def scores(self) -> np.ndarray: | ||
|         """Concatenated KM and KNM similarity scores.""" | ||
|         return np.concatenate([self.km_scores, self.knm_scores]) | ||
| | ||
|     @property | ||
|     def llrs(self) -> np.ndarray: | ||
|         """Concatenated KM and KNM log-likelihood ratios.""" | ||
|         return np.concatenate([self.km_llrs, self.knm_llrs]) | ||
| | ||
|     @property | ||
|     def llr_intervals(self) -> np.ndarray: | ||
|         """Concatenated KM and KNM LLR intervals, shape (n, 2).""" | ||
|         if self.km_llr_intervals is None or self.knm_llr_intervals is None: | ||
|             raise ValueError("Only models with llr_intervals can be used") | ||
|         return np.concatenate([self.km_llr_intervals, self.knm_llr_intervals], axis=0) | ||
| | ||
|     @property | ||
|     def labels(self) -> np.ndarray: | ||
|         """Boolean labels: True for KM samples, False for KNM samples.""" | ||
|         return np.concatenate( | ||
|             [ | ||
|                 np.ones(len(self.km_scores), dtype=bool), | ||
|                 np.zeros(len(self.knm_scores), dtype=bool), | ||
|             ] | ||
|         ) | ||
| | ||
| | ||
| def get_lr_system( | ||
|     lr_system_path: Path, | ||
| ) -> LRSystem:  # TODO replace with lr_module_scratch | ||
|     """Load an LR system from a pickle file.""" | ||
|     # Unpickling can execute arbitrary code, so only load files from trusted sources. | ||
|     with lr_system_path.open("rb") as f: | ||
|         return pickle.load(f)  # noqa: S301 | ||
| | ||
| | ||
| def get_reference_data( | ||
|     lr_system_path: Path, | ||
| ) -> ModelSpecs:  # TODO replace with lr_module_scratch | ||
|     """Return hardcoded dummy reference data (KM/KNM scores and LLRs). | ||
| | ||
|     .. note:: | ||
|         This is a placeholder. The LR system at ``lr_system_path`` is loaded, | ||
|         but its contents are ignored; real reference data will be derived | ||
|         from the LR system once ``lr_module_scratch`` is integrated. | ||
|     """ | ||
|     _ = get_lr_system(lr_system_path)  # result intentionally unused for now | ||
|     return ModelSpecs( | ||
|         km_model="random", | ||
|         km_scores=np.array([0.9, 0.85, 0.78]), | ||
|         km_llrs=np.array([2.1, 1.8, 1.5]), | ||
|         km_llr_intervals=np.array([[1.9, 2.3], [1.6, 2.0], [1.3, 1.7]]), | ||
|         knm_model="random", | ||
|         knm_scores=np.array([0.3, 0.25, 0.15, 0.1]), | ||
|         knm_llrs=np.array([-1.2, -0.9, -1.5, -2.0]), | ||
|         knm_llr_intervals=np.array( | ||
|             [[-1.4, -1.0], [-1.1, -0.7], [-1.7, -1.3], [-2.2, -1.8]] | ||
|         ), | ||
|     ) | ||
| | ||
| | ||
| def calculate_lr_striation(lr_system: LRSystem, score: float) -> LLRData: | ||
|     """ | ||
|     Calculate the log-likelihood ratio (LLR) for striation marks. | ||
| | ||
|     :param lr_system: Trained LR system to apply. | ||
|     :param score: Correlation coefficient between two striation profiles. | ||
|     :returns: LLR data for the single comparison. | ||
|     """ | ||
|     # The LR system expects a 2D feature array: one row (instance), one column (score). | ||
|     log10_lr_data = lr_system.apply(FeatureData(features=np.array([[score]]))) | ||
|     return log10_lr_data | ||
| | ||
| | ||
| def calculate_lr_impression(lr_system: LRSystem, score: int, n_cells: int) -> LLRData: | ||
|     """ | ||
|     Calculate the log-likelihood ratio (LLR) for impression marks. | ||
| | ||
|     :param lr_system: Trained LR system to apply. | ||
|     :param score: CMC count (number of matching cells). | ||
|     :param n_cells: Total number of cells analyzed. | ||
|     :returns: LLR data for the single comparison. | ||
|     """ | ||
|     # One row (instance) with two feature columns: CMC count and total cell count. | ||
|     result = lr_system.apply(FeatureData(features=np.array([[score, n_cells]]))) | ||
|     return result | ||
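
The way `ModelSpecs` lines up scores, LLRs and labels can be sketched without pydantic or NumPy. The snippet below is a minimal stand-in, not the class from this PR: `ReferenceScores` is a hypothetical name, it uses plain lists instead of arrays, and it leaves out the `*_model` and `*_llr_intervals` fields. It only illustrates that the KM entries come first in every concatenated view, so index `i` refers to the same sample in `scores`, `llrs` and `labels`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceScores:
    """Hypothetical list-based stand-in for ModelSpecs (illustration only)."""

    km_scores: list[float]
    km_llrs: list[float]
    knm_scores: list[float]
    knm_llrs: list[float]

    def __post_init__(self) -> None:
        # Same check as the model validator: one LLR per score, per population.
        if len(self.km_scores) != len(self.km_llrs):
            raise ValueError("km_scores and km_llrs must have the same length")
        if len(self.knm_scores) != len(self.knm_llrs):
            raise ValueError("knm_scores and knm_llrs must have the same length")

    @property
    def scores(self) -> list[float]:
        # KM samples first, then KNM samples.
        return [*self.km_scores, *self.knm_scores]

    @property
    def llrs(self) -> list[float]:
        return [*self.km_llrs, *self.knm_llrs]

    @property
    def labels(self) -> list[bool]:
        # True marks a known match, False a known non-match.
        return [True] * len(self.km_scores) + [False] * len(self.knm_scores)


# Dummy values copied from get_reference_data above.
ref = ReferenceScores(
    km_scores=[0.9, 0.85, 0.78],
    km_llrs=[2.1, 1.8, 1.5],
    knm_scores=[0.3, 0.25, 0.15, 0.1],
    knm_llrs=[-1.2, -0.9, -1.5, -2.0],
)

# Each tuple describes one reference sample: (score, llr, is_known_match).
samples = list(zip(ref.scores, ref.llrs, ref.labels))
for score, llr, is_km in samples:
    print(f"score={score:.2f}  llr={llr:+.1f}  km={is_km}")
```

Keeping the ordering in one place (KM first) is what makes the separate `scores`, `llrs` and `labels` views safe to index together.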
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| import numpy as np | ||
| from lir.data.models import FeatureData, LLRData, InstanceData | ||
| from lir.lrsystems.lrsystems import LRSystem | ||
| | ||
| | ||
| class RandomLRSystem(LRSystem): | ||
|     """LRSystem that returns seeded random LLR values, for use in tests.""" | ||
| | ||
|     def __init__(self) -> None: | ||
|         pass | ||
| | ||
|     def apply(self, instances: InstanceData) -> LLRData: | ||
|         """Return seeded random LLR values, one per input instance.""" | ||
|         assert isinstance(instances, FeatureData) | ||
|         n = len(instances.features) | ||
|         rng = np.random.default_rng(seed=42) | ||
|         return LLRData(features=rng.random(n)) | ||
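
Because `apply` creates a new generator with a fixed seed on every call, its output depends only on the number of instances, which is what makes the dummy system usable in deterministic tests. The sketch below illustrates that pattern with the standard library's `random.Random` as a stand-in for NumPy's generator; `seeded_values` is a hypothetical helper, and the sketch does not reproduce the actual numbers that `np.random.default_rng(seed=42)` would yield.

```python
import random


def seeded_values(n: int, seed: int = 42) -> list[float]:
    """Return n pseudo-random floats in [0, 1) from a freshly seeded generator.

    Mirrors the pattern in RandomLRSystem.apply: the generator is re-created
    with the same seed on each call, so the result is a function of n alone.
    Uses stdlib random.Random (an assumption, to keep the sketch
    dependency-free) rather than numpy.random.default_rng.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


first = seeded_values(3)
second = seeded_values(3)
longer = seeded_values(5)

print(first == second)      # repeated calls with the same n agree
print(longer[:3] == first)  # a longer draw starts with the same values
```

One consequence worth keeping in mind when writing assertions: two different comparisons passed through such a system get the same value, so tests can check shapes and plumbing but not score-dependent behaviour.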