feat: render Solr highlights as bold in instrument list #466

PouyaMohseni · 2025-12-17T19:34:04Z

Added apply_highlights method to InstrumentNameSet to apply Solr highlight snippets to each instrument name.
Replaced matching parts of instrument names with highlighted <b> text from Solr.
Updated SolrInstrument to store highlights and pass them to InstrumentNameSet.
Updated template to render instrument names with applied highlights.

resolves #343

yinanazhou · 2025-12-19T21:17:38Z

web-app/django/VIM/apps/instruments/views/instrument_list.py

-        ]
+
+        # Extract highlight info
+        highlights = getattr(solr_response, "highlighting", {})


I personally prefer to rename highlights to highlight_info or highlight_data.

yinanazhou · 2025-12-19T22:21:08Z

web-app/django/VIM/apps/instruments/views/instrument_list.py

+            "hl.fl": "text",
+            "hl.simple.pre": "<b>",
+            "hl.simple.post": "</b>",
+            "hl.snippets": 1000,


1000 is too many... 50 should be sufficient for most instruments?

Some instruments, like guitar, have labels in 200 languages, and adding aliases can increase the count even more (many of them share a common "ar"). Although we might not have more than 10 labels for one language, we are not sure whether our highlight query hits all of the labels in the selected language.

As an example, 50 would not be enough for a query of "ar" in guitar, as it hits more than 100 times.

In this case, switching to token-level highlights makes more sense.

yinanazhou · 2025-12-19T22:53:38Z

web-app/django/VIM/apps/instruments/views/instrument_list.py

+            highlight_map = {
+                snippet.replace("<b>", "").replace("</b>", ""): snippet
+                for snippet in hl_snippets
+            }


This maps whole snippets, which can be unreliable. For example, instead of building the map from full snippets such as

hl_snippets = [ "Stradivarius <b>Violin</b>", "<b>Violin</b> Concerto in D Major" ]

it's better to use

"Violin": "<b>Violin</b>"

I do not understand why this would be unreliable. Aren't both the highlights and instrumentname_set Solr outputs? Also, if we want to reduce these hits ("Stradivarius Violin", "Violin Concerto in D Major") to ("Violin"), isn't it better to define a new field in Solr that stores the tokenized text?

Yes, both values come from Solr. My concern is that full highlight snippets are context-dependent and may include surrounding text, which makes them brittle as mapping keys. Mapping at the matched-term level (e.g. "Violin" → "<b>Violin</b>") is more robust and less coupled to Solr’s snippet construction. And yes, tokenizers would be a better solution.
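
To make the matched-term idea concrete, here is a minimal sketch rather than the PR's actual code. It assumes the `<b>`/`</b>` markers configured through `hl.simple.pre` and `hl.simple.post`; the helper name `build_term_map` is hypothetical.

```python
import re

# Assumption: Solr wraps each match in the <b>...</b> markers configured above.
MATCH_RE = re.compile(r"<b>(.*?)</b>", flags=re.DOTALL)


def build_term_map(hl_snippets):
    """Map each distinct matched term (lowercased) to its bold form.

    Only the text inside the markers is kept, so the surrounding snippet
    context never becomes part of a mapping key.
    """
    term_map = {}
    for snippet in hl_snippets:
        for term in MATCH_RE.findall(snippet):
            if term:
                # Keep the first casing seen for a given term.
                term_map.setdefault(term.lower(), f"<b>{term}</b>")
    return term_map


hl_snippets = ["Stradivarius <b>Violin</b>", "<b>Violin</b> Concerto in D Major"]
term_map = build_term_map(hl_snippets)
```

With this approach both snippets collapse into a single `"violin"` entry, so the size of the map depends on the number of distinct matched terms, not on the number of snippets returned.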

yinanazhou · 2025-12-20T01:40:38Z

web-app/django/VIM/apps/instruments/views/instrument_list.py

+            highlight_map = {
+                snippet.replace("<b>", "").replace("</b>", ""): snippet
+                for snippet in hl_snippets
+            }


Yes, both values come from Solr. My concern is that full highlight snippets are context-dependent and may include surrounding text, which makes them brittle as mapping keys. Mapping at the matched-term level (e.g. "Violin" → "<b>Violin</b>") is more robust and less coupled to Solr’s snippet construction. And yes, tokenizers would be a better solution.

yinanazhou · 2025-12-20T01:43:10Z

web-app/django/VIM/apps/instruments/views/instrument_list.py

+            "hl.fl": "text",
+            "hl.simple.pre": "<b>",
+            "hl.simple.post": "</b>",
+            "hl.snippets": 1000,


In this case, switching to token-level highlights makes more sense.

yinanazhou · 2025-12-20T01:44:40Z

web-app/django/VIM/apps/instruments/views/instrument_list.py

+        for name in self._names:
+            for orig, highlighted in highlight_dict.items():
+                if orig and orig.lower() in name.lower():
+                    name = re.sub(
+                        re.escape(orig), highlighted, name, flags=re.IGNORECASE
+                    )
+            updated_names.append(name)


Also realized this is O(nm)... not great. And this doesn't highlight "harp" in "harpsichord".

I noticed it would be a bad idea to create a different tokenization for the highlighted field in Solr indexing, as this field would need to replicate our query structure to be able to highlight as intended (e.g. containing N-grams). Maybe it is better to keep the highlighted field as our query field but apply deduplication/tokenization as post-processing for more robust highlighting.
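
One possible shape for that post-processing step is sketched below. It is an illustration under assumptions, not the implementation in this PR: the standalone function `apply_highlights` and its `terms` argument (already-extracted matched terms, without markup) are hypothetical. The terms are deduplicated and compiled into a single case-insensitive pattern, so each name is scanned once instead of once per snippet, and substring matches such as "harp" in "harpsichord" are wrapped in place.

```python
import re
from html import escape


def apply_highlights(names, terms):
    """Wrap every occurrence of any term in <b>...</b>, one regex pass per name."""
    # Deduplicate case-insensitively; longest first so longer terms win
    # over their own prefixes in the alternation.
    unique_terms = sorted({t.lower() for t in terms if t}, key=len, reverse=True)
    if not unique_terms:
        return [escape(name) for name in names]

    pattern = re.compile(
        "|".join(re.escape(t) for t in unique_terms), flags=re.IGNORECASE
    )

    highlighted = []
    for name in names:
        parts = []
        last = 0
        for match in pattern.finditer(name):
            # Escape the untouched text and keep the name's original casing.
            parts.append(escape(name[last:match.start()]))
            parts.append(f"<b>{escape(match.group(0))}</b>")
            last = match.end()
        parts.append(escape(name[last:]))
        highlighted.append("".join(parts))
    return highlighted


result = apply_highlights(["Harpsichord", "harp", "Guitar"], ["harp", "Harp"])
```

Because the output is assembled from the original name rather than by repeated substitution, a later term cannot match inside a `<b>` tag inserted earlier, and everything outside the tags is HTML-escaped before the template renders it as markup.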

PouyaMohseni requested a review from yinanazhou December 17, 2025 19:34

yinanazhou requested changes Dec 19, 2025

yinanazhou reviewed Dec 20, 2025

PouyaMohseni added 2 commits December 21, 2025 11:20

feat: update template to render instrument names with applied highlights

f6db6fe

PouyaMohseni force-pushed the highlight-matched-test branch from bbe332f to f6db6fe Compare December 21, 2025 16:22
