@rjrudin rjrudin commented Dec 10, 2025

Draft PR for now, just want to see the tests all pass.

@rjrudin rjrudin marked this pull request as ready for review December 17, 2025 19:34
Copilot AI review requested due to automatic review settings December 17, 2025 19:34
@rjrudin rjrudin merged commit 2cbfe70 into develop Dec 17, 2025
3 checks passed
@rjrudin rjrudin deleted the feature/default-chunks branch December 17, 2025 19:35

Copilot AI left a comment

Pull request overview

This PR changes the default chunking behavior: the default value of WRITE_SPLITTER_SIDECAR_MAX_CHUNKS goes from 0 (all chunks in one sidecar document) to 1 (one chunk per sidecar document). This aligns with best practices for embeddings, where each chunk/embedding should live in its own document. Test files are updated to set the option to 0 explicitly so that they keep their existing behavior.

  • Changed default chunks per sidecar document from 0 to 1
  • Updated all affected tests to explicitly use the previous default (0) to preserve test coverage
  • Added documentation explaining the version change
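
To make the effect of the new default concrete, here is a minimal Java sketch of the behavior as the overview describes it. It is not the connector's code: `SidecarChunkingSketch`, `resolveMaxChunks`, and `groupIntoSidecars` are hypothetical names, and the option key string is an assumption (the PR only names the constant `WRITE_SPLITTER_SIDECAR_MAX_CHUNKS`, so check `Options.java` for the actual value).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SidecarChunkingSketch {

    // Assumed key for WRITE_SPLITTER_SIDECAR_MAX_CHUNKS; verify against Options.java.
    static final String MAX_CHUNKS_OPTION = "spark.marklogic.write.splitter.sidecar.maxChunks";

    // New default per this PR: one chunk per sidecar document (previously 0).
    static final int DEFAULT_MAX_CHUNKS = 1;

    // Hypothetical helper: use the explicit option if present, else the default.
    static int resolveMaxChunks(Map<String, String> options) {
        String value = options.get(MAX_CHUNKS_OPTION);
        return value == null ? DEFAULT_MAX_CHUNKS : Integer.parseInt(value.trim());
    }

    // Hypothetical helper: group chunks into sidecar documents.
    // Following the overview, 0 is modelled as "all chunks in one sidecar document".
    static List<List<String>> groupIntoSidecars(List<String> chunks, int maxChunks) {
        List<List<String>> sidecars = new ArrayList<>();
        if (chunks.isEmpty()) {
            return sidecars;
        }
        if (maxChunks <= 0) {
            sidecars.add(new ArrayList<>(chunks));
            return sidecars;
        }
        for (int i = 0; i < chunks.size(); i += maxChunks) {
            int end = Math.min(i + maxChunks, chunks.size());
            sidecars.add(new ArrayList<>(chunks.subList(i, end)));
        }
        return sidecars;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("chunk-a", "chunk-b", "chunk-c");

        // No option set: the new default of 1 applies.
        int byDefault = resolveMaxChunks(Map.of());
        System.out.println(groupIntoSidecars(chunks, byDefault).size()); // prints 3

        // Option set to 0, as the updated tests do, restores the previous behavior.
        int explicit = resolveMaxChunks(Map.of(MAX_CHUNKS_OPTION, "0"));
        System.out.println(groupIntoSidecars(chunks, explicit).size()); // prints 1
    }
}
```

Under these assumptions, a job that relied on the old default would need to set the option to 0 explicitly after upgrading, which is the same adjustment the tests in this PR make.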

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test-app/build.gradle | Added mavenLocal() repository for local testing |
| marklogic-spark-connector/src/main/java/com/marklogic/spark/core/splitter/ChunkAssemblerFactory.java | Changed default value from 0 to 1 for max chunks per sidecar document |
| marklogic-spark-connector/src/main/java/com/marklogic/spark/Options.java | Updated documentation to explain the default value change in version 3.0.0 |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/splitter/SplitXmlDocumentTest.java | Added explicit sidecar max chunks option to preserve test behavior |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/splitter/SplitTextDocumentTest.java | Added explicit sidecar max chunks option to preserve test behavior |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/splitter/SplitJsonDocumentTest.java | Updated test expectations and added explicit sidecar max chunks option where needed |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/embedding/AddEmbeddingsToXmlTest.java | Added explicit sidecar max chunks option to preserve test behavior |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/embedding/AddEmbeddingsToJsonTest.java | Added explicit sidecar max chunks option to preserve test behavior |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/document/WriteExtractedTextTest.java | Added explicit sidecar max chunks option to preserve test behavior |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/classifier/ClassifyExtractedTextTest.java | Added explicit sidecar max chunks option to preserve test behavior |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/classifier/AddClassificationToXmlTest.java | Added explicit sidecar max chunks option to preserve test behavior |
| marklogic-spark-connector/src/test/java/com/marklogic/spark/writer/classifier/AddClassificationToJsonTest.java | Added explicit sidecar max chunks option to preserve test behavior |
