Conversation


@radoslavralev radoslavralev commented Dec 8, 2025

Note

Adds optional cross-encoder reranking (top-k) across evaluation/benchmark pipelines, normalizes embeddings, hardens Redis index setup, and upgrades multi-model plotting with AUC bar charts.

  • Evaluation/Benchmark CLI
    • Add --cross_encoder_model/--cross_encoder_models and --rerank_k flags to enable optional cross-encoder reranking.
    • Benchmark iterates over retrievers and optional rerankers; output paths reflect the reranker via a `_rerank_<model>` suffix.
  • Matching/Search
    • run_matching/run_matching_redis: support top-k candidate retrieval and cross-encoder reranking; batch CE scoring and selection.
    • Normalize query/cache vectors to float32 before Redis insert/search.
  • Embedding engine (NeuralEmbedding)
    • Add top-k support across large/small dataset paths, including blockwise two-set search.
    • Probe model to infer true embedding dim; extend APIs with k.
  • Redis vector index (RedisVectorIndex)
    • Trust remote code; probe to set accurate embed_dim.
    • Always recreate index (overwrite=True) to match current dims.
  • Plotting (scripts/plot_multiple_precision_vs_cache_hit_ratio.py)
    • Add grouped color mapping (retriever + darker reranker variants).
    • New 2-panel figure: precision–CHR curves + AUC comparison bar chart with theoretical baselines.
  • Metrics
    • Threshold sweep uses a fixed 200 steps, with the upper bound extended to the maximum observed similarity.
  • Scripts
    • Add run_benchmark.sh example with new flags.
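The reranking flow summarized above (retrieve top-k candidates with the retriever, then rescore the pairs with a cross-encoder and keep the best) can be sketched roughly as follows. This is not the repo's code: `ce_score` is a placeholder for a real cross-encoder's pair scorer (e.g. a sentence-transformers `CrossEncoder`), and the toy scorer exists only to make the sketch runnable:

```python
import numpy as np

def rerank_top_k(query, candidates, retriever_scores, ce_score, rerank_k=5):
    """Rerank the retriever's top-k candidates with a cross-encoder score.

    `ce_score(query, candidate) -> float` stands in for a real cross-encoder;
    the signature and names here are illustrative, not the repo's API.
    """
    # Take the rerank_k best candidates according to the retriever
    k = min(rerank_k, len(candidates))
    top_idx = np.argsort(-np.asarray(retriever_scores))[:k]

    # Score each (query, candidate) pair with the cross-encoder in one batch
    ce_scores = np.array([ce_score(query, candidates[i]) for i in top_idx])

    # Select the candidate the cross-encoder ranks highest
    best = top_idx[int(np.argmax(ce_scores))]
    return int(best), ce_scores

# Toy scorer: prefers candidates sharing more words with the query
def toy_ce(q, c):
    return len(set(q.split()) & set(c.split()))

cands = ["reset my password", "order a pizza", "change account password"]
best, _ = rerank_top_k("how do I reset my account password",
                       cands, retriever_scores=[0.2, 0.9, 0.5], ce_score=toy_ce)
```

Note that the retriever's own scores are discarded after candidate selection; only the cross-encoder's scores decide the final ranking.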

Written by Cursor Bugbot for commit 069650a.

@radoslavralev radoslavralev self-assigned this Dec 8, 2025
@radoslavralev radoslavralev added the enhancement New feature or request label Dec 8, 2025
@radoslavralev radoslavralev marked this pull request as ready for review December 8, 2025 09:16

Copilot AI left a comment


Pull request overview

This PR adds optional cross-encoder reranking functionality to improve retrieval quality, implements top-k candidate retrieval in embedding matching, normalizes embeddings before Redis operations, enhances the Redis index handling with dimension probing, and improves visualization with dual-subplot charts. The changes span the core matching logic, benchmarking infrastructure, and result visualization.

Key Changes

  • Adds cross-encoder reranking with configurable top-k retrieval for both Redis-based and standard neural embedding matching
  • Implements top-k retrieval logic in blockwise embedding matching with support for variable k values
  • Normalizes embeddings to float32 and unit length before Redis operations, with dimension probing to handle models with incorrect config dimensions
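As a rough illustration of that normalization step (a sketch, not the repository's code), casting to float32 and L2-normalizing each row before inserting into Redis might look like:

```python
import numpy as np

def normalize_for_index(vectors):
    """Cast to float32 and L2-normalize each row, guarding all-zero rows."""
    v = np.asarray(vectors, dtype=np.float32)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero for all-zero rows
    return v / norms

emb = normalize_for_index([[3.0, 4.0], [0.0, 0.0]])
```

Unit-length float32 rows matter here because cosine-distance vector search over non-normalized or mixed-precision vectors can silently return misordered results.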

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 16 comments.

Summary per file:

  • src/customer_analysis/query_engine.py: adds embedding dimension probing, enables trust_remote_code, and always overwrites the Redis index on initialization
  • src/customer_analysis/embedding_interface.py: implements top-k retrieval in the blockwise matching methods with support for variable k; updates dimension inference to probe models first
  • src/customer_analysis/data_processing.py: integrates cross-encoder reranking in both run_matching and run_matching_redis; adds embedding normalization before Redis operations
  • scripts/plot_multiple_precision_vs_cache_hit_ratio.py: creates a dual-subplot visualization with precision-CHR curves and an AUC bar chart; adds numpy version compatibility for trapezoid/trapz
  • run_benchmark.sh: provides an example shell script for running benchmarks with cross-encoder models
  • run_benchmark.py: extends the benchmark loop to iterate over cross-encoder models and include them in output paths
  • evaluation.py: adds command-line arguments for the cross-encoder model and the rerank_k parameter
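The dimension probing applied in query_engine.py and embedding_interface.py can be sketched as below. This is an assumption-laden illustration, not the repo's code: `encode` stands in for whatever embedding call the model exposes, and the fallback dimension is an assumed default used when probing fails:

```python
import numpy as np

def probe_embed_dim(encode, fallback_dim=768):
    """Infer the true embedding dimension by encoding a probe string.

    `encode(texts) -> array` is a stand-in for the model's encode call.
    Some models report an incorrect dimension in their config, so the
    probe result is trusted over the configured value when available.
    """
    try:
        probe = np.asarray(encode(["dimension probe"]))
        return int(probe.shape[-1])
    except Exception:
        return fallback_dim  # assumed default when probing fails

dim = probe_embed_dim(lambda texts: np.zeros((len(texts), 384)))
```

The probe pays one extra model call at startup in exchange for never creating a Redis index whose declared dimension disagrees with the vectors actually written to it.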
Comments suppressed due to low confidence (2)

src/customer_analysis/embedding_interface.py:414

  • The docstring for calculate_best_matches_with_cache_large_dataset does not document the new k parameter or how it affects the return value shapes. The function returns arrays with shape (num_sentences,) when k=1 but (num_sentences, k) when k>1. This should be documented to avoid confusion.
        """Large-dataset variant: find best cache match for each sentence using memmaps.

        Writes two memmaps (rows for sentences, cols for cache), normalised, and
        performs blockwise dot-products. If `sentence_offset` is provided and the
        cache corresponds to the same corpus, the self-similarity diagonal is masked.
        """

src/customer_analysis/embedding_interface.py:490

  • The docstrings for calculate_best_matches_with_cache and calculate_best_matches_from_embeddings_with_cache do not document the new k parameter or how it affects return value shapes. When k=1, arrays have shape (N,), but when k>1, they have shape (N, k). This should be documented.
        """
        Calculate the best similarity match for each sentence against all other
        sentences using a neural embedding model.
        """
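The k=1 versus k>1 shape convention the reviewer asks to have documented can be illustrated with a small stand-in (not the repository's function), assuming the squeeze-when-k-equals-1 behavior described in the comments above:

```python
import numpy as np

def top_k_matches(sim, k=1):
    """Illustrates the shape convention described in the review.

    `sim` is an (N, M) similarity matrix. Returns (indices, scores)
    with shape (N,) when k == 1 and (N, k) when k > 1.
    """
    idx = np.argsort(-sim, axis=1)[:, :k]
    val = np.take_along_axis(sim, idx, axis=1)
    if k == 1:
        return idx[:, 0], val[:, 0]  # squeeze the trailing axis to (N,)
    return idx, val

sim = np.array([[0.1, 0.9, 0.5], [0.7, 0.2, 0.4]])
i1, v1 = top_k_matches(sim, k=1)   # shapes (2,)
i2, v2 = top_k_matches(sim, k=2)   # shapes (2, 2)
```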


Comment on lines +300 to +333
```python
else:
    # Top-k logic: if this block has fewer than k columns, take all valid ones
    curr_block_size = col_end - col_start
    if curr_block_size <= k:
        top_k_in_block_idx = np.argsort(-sim, axis=1)  # sort all columns
        top_k_in_block_val = np.take_along_axis(sim, top_k_in_block_idx, axis=1)
        # May yield fewer than k candidates when the block is small
    else:
        # Use argpartition to select the k largest without a full sort
        part_idx = np.argpartition(-sim, k, axis=1)[:, :k]
        top_k_in_block_val = np.take_along_axis(sim, part_idx, axis=1)

        # Sort the selection so the top-k is ordered (needed for merging)
        sorted_sub_idx = np.argsort(-top_k_in_block_val, axis=1)
        top_k_in_block_val = np.take_along_axis(top_k_in_block_val, sorted_sub_idx, axis=1)
        top_k_in_block_idx = np.take_along_axis(part_idx, sorted_sub_idx, axis=1)

    # Merge with accumulated bests:
    #   chunk_best_scores:  (batch, k)
    #   top_k_in_block_val: (batch, min(block, k))

    # Shift block-local indices to global column indices
    top_k_in_block_idx_global = top_k_in_block_idx + col_start

    combined_vals = np.concatenate([chunk_best_scores, top_k_in_block_val], axis=1)
    combined_idxs = np.concatenate([chunk_best_indices, top_k_in_block_idx_global], axis=1)

    # Keep the k best of the combined candidates
    best_combined_args = np.argsort(-combined_vals, axis=1)[:, :k]

    chunk_best_scores = np.take_along_axis(combined_vals, best_combined_args, axis=1)
    chunk_best_indices = np.take_along_axis(combined_idxs, best_combined_args, axis=1)
```

Copilot AI Dec 9, 2025


The top-k retrieval logic in blockwise matching lacks test coverage. Since the repository has comprehensive tests and this is a significant new feature that changes the shape of return values and introduces complex merging logic, it should have tests covering: 1) k > 1 with various cache sizes, 2) edge cases where block size < k, 3) self-similarity masking with k > 1, and 4) correct sorting and merging across blocks.
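A minimal cross-check along the lines the reviewer suggests could compare a blockwise column merge against a single full argsort on random data. The function below is a simplified stand-in for the diff's merge logic, not the repository's implementation:

```python
import numpy as np

def blockwise_top_k(sim, k, block=2):
    """Reference blockwise top-k over columns, mirroring the merge in the diff.

    Processes `sim` (batch, M) in column blocks, keeping a running (batch, k)
    best set; it should agree with a full argsort over all columns.
    """
    batch, m = sim.shape
    best_vals = np.full((batch, k), -np.inf)   # sentinel: displaced by real scores
    best_idxs = np.zeros((batch, k), dtype=int)
    for col_start in range(0, m, block):
        col_end = min(col_start + block, m)
        blk = sim[:, col_start:col_end]
        # Top-k within this block, ordered best-first
        order = np.argsort(-blk, axis=1)[:, :k]
        vals = np.take_along_axis(blk, order, axis=1)
        idxs = order + col_start                # shift to global column indices
        # Merge with the running bests and keep the k largest
        cv = np.concatenate([best_vals, vals], axis=1)
        ci = np.concatenate([best_idxs, idxs], axis=1)
        take = np.argsort(-cv, axis=1)[:, :k]
        best_vals = np.take_along_axis(cv, take, axis=1)
        best_idxs = np.take_along_axis(ci, take, axis=1)
    return best_idxs, best_vals

rng = np.random.default_rng(0)
sim = rng.random((4, 7))
idxs, vals = blockwise_top_k(sim, k=3, block=2)
expected = np.argsort(-sim, axis=1)[:, :3]
```

A test like this exercises the small-block path (the final block has a single column) and the cross-block merge; the remaining cases the reviewer lists, such as self-similarity masking with k > 1, would need fixtures from the actual matching code.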

