
Resolve UTF-8 encoding and IPC issues in Search plugin#859

Open
LakatosMark1 wants to merge 1 commit into Ericsson:master from LakatosMark1:fix-850-clean

LakatosMark1 (Collaborator) commented May 6, 2026

Forces UTF-8 encoding in the C++ layer and rewrites the Java IOHelper to properly decode multi-byte UTF-8 characters.

Related to #850.

mcserep changed the title from "Fix #850: Resolve UTF-8 encoding and IPC issues in Search plugin" to "Resolve UTF-8 encoding and IPC issues in Search plugin" on May 12, 2026
mcserep linked an issue on May 12, 2026 that may be closed by this pull request
mcserep added the labels Kind: Bug ⚠️ Plugin: Search (issues related to the full-text search (Lucene) plugin) on May 12, 2026
mcserep removed a link to an issue on May 12, 2026
mcserep requested review from Copilot and mcserep on May 12, 2026 09:15
Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses character encoding problems in the Search plugin’s Java subprocesses by forcing UTF-8 at process launch and by fixing Java stream-to-string reading to correctly handle multi-byte UTF-8 characters.

Changes:

  • Add -Dfile.encoding=UTF-8 to the Java command lines used to launch the search service and indexer.
  • Rewrite IOHelper.readFullContent(...) to build strings via buffered char[] reads instead of byte-collecting.
  • Change file reading in Context to use an explicit UTF-8 InputStreamReader.
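The second and third changes can be illustrated with a minimal sketch. It is not the PR's actual code: the class `Utf8ReadSketch` and its `readFullContent` method are hypothetical stand-ins for `IOHelper.readFullContent(...)`, and the 4096-character buffer size is an assumption. It shows the general technique of decoding through a `Reader` with an explicit UTF-8 charset and collecting `char[]` chunks, so that multi-byte sequences are decoded by the reader rather than assembled from raw bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8ReadSketch {
  // Hypothetical stand-in for IOHelper.readFullContent; an assumption,
  // not the method body from the pull request.
  public static String readFullContent(Reader reader) throws IOException {
    StringBuilder content = new StringBuilder();
    char[] buffer = new char[4096]; // buffer size chosen arbitrarily
    int read;
    // The Reader has already decoded bytes to chars, so a multi-byte
    // UTF-8 sequence is never split at the byte level in this loop.
    while ((read = reader.read(buffer, 0, buffer.length)) != -1) {
      content.append(buffer, 0, read);
    }
    return content.toString();
  }

  public static void main(String[] args) throws IOException {
    // Non-ASCII sample text written with Unicode escapes so the source
    // file itself does not depend on the compiler's encoding setting.
    String original = "\u00e1rv\u00edzt\u0171r\u0151 \u20ac";
    byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

    // Explicit charset: the result does not depend on file.encoding.
    try (InputStream stream = new ByteArrayInputStream(utf8);
         Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
      String decoded = readFullContent(reader);
      System.out.println(decoded.equals(original));
    }
  }
}
```

Passing `StandardCharsets.UTF_8` to the `InputStreamReader` makes the decoding independent of the JVM default, which is the same property the `-Dfile.encoding=UTF-8` flag aims for at process level.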

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| plugins/search/service/include/service/serviceprocess.h | Forces UTF-8 as the default Java process encoding for the search service subprocess. |
| plugins/search/indexer/src/indexerprocess.cpp | Forces UTF-8 as the default Java process encoding for the indexer subprocess. |
| plugins/search/indexer/indexer-java/src/cc/search/indexer/util/IOHelper.java | Fixes full-stream reading to avoid corrupting multi-byte characters. |
| plugins/search/indexer/indexer-java/src/cc/search/indexer/Context.java | Switches file decoding to an explicit UTF-8 reader for indexing. |

Comment on lines 63 to 66
  try (FileInputStream stream = new FileInputStream(file_)) {
    String fileContent = IOHelper.readFullContent(
-     IOHelper.getReaderForInput(stream));
+     new InputStreamReader(stream, StandardCharsets.UTF_8));
