Contributor

@liseli liseli commented Dec 12, 2025

Issue: Catalog search fails on certain input queries because Solr special characters are not escaped.

The main goal of this task is to fix the search algorithm so that it prevents query parser errors and query injection.

Ticket: ETT-1200

As part of this PR:

  • A thorough review of the Catalog search has been completed.
  • A unit test has been created.
  • Different solutions have been tested with the goal of finding the smallest change with the least impact on the application.
  • A Confluence page with a detailed explanation of the issue has been written.

The search algorithm in production consists of the following steps:

  • Remove some special characters
  • Fix the input query; some input queries are rejected, in which case the application falls back to the default query *:*.
  • Validate
  • Tokenize
  • Create the Solr query
  • Some inputs pass validation in production but then cause the production code to fail: ~, \, table~~2, ~~~///

What changes have been implemented in the current PR?

  • Remove some special characters
  • Validate and reject invalid queries; for a rejected query, the application falls back to the default query *:*.
    • Refactor the function validateInput, adding rules to identify invalid queries before they are sent to the Solr server.
    • All of these inputs are now rejected as invalid: ~, \, table~~2, ~~~///
  • Tokenize
  • Escape special characters
    • Create a set of functions to escape the different syntax elements included when the q field is built.
    • Create a function to escape special characters when the fq field is built.
  • Create the Solr query
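
The validate, tokenize, escape, and build steps above could be sketched roughly as follows. This is an illustrative Python sketch, not the Catalog code: the names validate_input, escape_term, and build_q are hypothetical, and the two validation rules (reject input with no letters or digits, reject repeated ~) are assumptions chosen only so that the four example inputs are rejected. The *:* fallback and the order of the steps come from the description; the initial "remove some special characters" step is omitted.

```python
import re

# Characters with special meaning to the Lucene/Solr standard query parser.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')

# Fallback query used when the input is rejected.
DEFAULT_QUERY = "*:*"


def validate_input(query: str) -> bool:
    """Hypothetical rules: reject input that cannot form a meaningful query."""
    if not any(ch.isalnum() for ch in query):
        return False  # e.g. "~", "\", "~~~///", or an empty string
    if re.search(r"~{2,}", query):
        return False  # e.g. "table~~2"
    return True


def escape_term(term: str) -> str:
    """Backslash-escape every special character in a single token."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


def build_q(query: str) -> str:
    """Validate, tokenize on whitespace, escape, and join into a q value."""
    query = query.strip()
    if not validate_input(query):
        return DEFAULT_QUERY
    return " ".join(escape_term(token) for token in query.split())


if __name__ == "__main__":
    for raw in ["~", "\\", "table~~2", "~~~///", "C++ (intro)"]:
        print(repr(raw), "->", build_q(raw))
```

With these assumed rules, the four problematic inputs fall back to *:*, while an input such as C++ (intro) is sent to Solr with each special character escaped instead of being interpreted as query syntax.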

How to test:

docker compose build
docker compose up -d

Next steps:

  • Try testing the application to identify any related issues.
  • Take a moment to compare the production version with the current output to see what's different.
  • Also, consider testing the Catalog application locally using various input queries to ensure everything works smoothly.
  • The function lucene_escape should be replaced by lucene_escape_fq.
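
For reviewers unfamiliar with the difference between escaping for q and for fq, here is a minimal sketch of what an fq-specific escape could look like. The body of lucene_escape_fq is not shown in this description, so the implementation below is an assumption: it wraps the facet value in double quotes and escapes only backslashes and double quotes, which is what a quoted phrase needs in the standard query parser. The helper build_fq and the field name authorStr are hypothetical.

```python
def lucene_escape_fq(value: str) -> str:
    """Return value as a quoted Solr phrase, escaping backslash and quote.

    Assumed behavior for illustration; not the function from this PR.
    """
    # Escape backslashes first so the backslashes added for quotes
    # are not escaped a second time.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_fq(field: str, value: str) -> str:
    """Build a filter query of the form field:"value"."""
    return f"{field}:{lucene_escape_fq(value)}"


if __name__ == "__main__":
    print(build_fq("authorStr", 'Smith, John "Jack"'))
```

Quoting the whole value keeps characters such as ~, :, and / literal inside the phrase, so a facet value containing them can be selected without a parser error.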

@liseli liseli requested review from aelkiss and moseshll December 15, 2025 19:26
Contributor

@moseshll moseshll left a comment

This works as expected. See the original ticket for a follow-up about the display of this previously-broken facet now that we can select it without an error -- the quotes are missing when the facet is in the "Current Filters". May be out of scope and it's just cosmetic. APPROVE

EDIT: I just put a screenshot on the ticket.

Member

@aelkiss aelkiss left a comment

The main question I have is whether we should be doing something more general for escaping queries. In particular, it doesn't seem like this will address the issues with characters like ~ and \. Given that these don't cause issues in ls, I think it's worth seeing if we can apply the more general escaping strategy.

@liseli liseli force-pushed the ETT-1200_facetSearchError branch from bc9f577 to 7f4f26d Compare January 13, 2026 23:00