added new testcases #35

maharajamihir · 2026-01-20T17:02:47Z

`add_import_after_use.md` - User writes code using `json.loads()`; model suggests navigating to the top of the file and adding `import json`.
`binary_toggle.md` - User writes a dark-theme `if` block and types `else {`; model completes it with the opposite light-theme values (`sun.png` vs. `moon.png`).
`comment_momentum.md` - User copies a Transformer forward pass, edits it to pre-layernorm, and starts commenting out the old version; model continues the commenting pattern.
`cross_file_navigation.md` - User adds a parameter to a function call in `main.py`; model suggests jumping to `services.py` to update the function definition.
`cross_file_python.md` - User writes code in `main.py` using functions from `utils.py`; model completes `normalize(mat_a)` based on cross-file context.
`cross_file_react.md` - User pastes a `Button` component, switches to `App.jsx`, and types a partial import; model completes `import { Button } from './components/Button'`.
`debug_typo_fix.md` - User runs an `sbatch` job and sees a `FileNotFoundError` for `trian_dataset.jsonl`; model suggests fixing the typo to `train_dataset.jsonl`.
`env_config_completion.md` - User writes `os.getenv("STRIPE_W`; model completes it with `EBHOOK_SECRET")` based on `.env.example`.
`fullstack_glue.md` - User adds a "Premium Status" column in the frontend and types the partial `accessorKey: "`; model completes it with `is_premium"` from the backend model.
`god_tier_file_context.md` - User types `report_date = _format` in a 500-line `helpers.py`; model completes it with `_format_iso_date_to_human(created_at)`, defined ~440 lines above.
`line_complete.md` - Tests basic line completion for `DataLoader` methods (`self.load_json(filename)`, `item.get(key) == value]`, `json.dump(..., indent=2)`).
`long_horizon_debug.md` - User adds breakpoints, runs the debugger, renames variables to Shazeer notation (`x_BT`, `x_BTD`, `logits_BTV`), then removes the breakpoints.
`overfit_single_batch.md` - User comments out the training loop, fetches a single batch, and types `while Tr`; model completes the overfitting loop.
`propagate_field.md` - User adds an `email` parameter to `__init__`; model propagates it to `self.email = email`, then to `user_to_dict`, then to the instantiation.
`semantic_filter.md` - User creates a `get_available_products` function and types `available_`; model infers `[p for p in products if p['in_stock']]` from the JSONL data.
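For illustration, the expected end states of two of these scenarios (`semantic_filter.md` and `propagate_field.md`) can be sketched in Python. This is a sketch under assumptions, not the content of the test files: the inlined product records, `load_products`, the `User` class, its `name` field, and the free-function form of `user_to_dict` are all hypothetical; only `get_available_products`, `in_stock`, `email`, and `user_to_dict` come from the descriptions.

```python
import json

# semantic_filter.md (sketch): hypothetical product records standing in for the JSONL file.
PRODUCTS_JSONL = """\
{"name": "keyboard", "price": 49.0, "in_stock": true}
{"name": "monitor", "price": 199.0, "in_stock": false}
{"name": "mouse", "price": 19.0, "in_stock": true}
"""


def load_products(text):
    # Hypothetical helper: parse one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def get_available_products(products):
    # Completion the model is expected to infer after the user types `available_`.
    available_products = [p for p in products if p['in_stock']]
    return available_products


# propagate_field.md (sketch): once `email` is added to __init__, it should be
# propagated to the attribute, the dict conversion, and the call site.
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email


def user_to_dict(user):
    # Hypothetical free function; in the test it may be a method instead.
    return {"name": user.name, "email": user.email}


products = load_products(PRODUCTS_JSONL)
print([p["name"] for p in get_available_products(products)])  # ['keyboard', 'mouse']
print(user_to_dict(User("Ada", "ada@example.com")))  # {'name': 'Ada', 'email': 'ada@example.com'}
```

In both cases the model has to infer the completion from context elsewhere in the workspace (the data's `in_stock` key, or the newly added parameter) rather than from the current line alone.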

Copilot

Pull request overview

This PR adds 15 new handcrafted test cases for evaluating code completion model capabilities across diverse scenarios including cross-file navigation, semantic understanding, debugging workflows, and pattern recognition.

Changes:

Added 15 markdown test case files testing various code completion scenarios
Each test includes bash commands, code snippets, and assertion validation
Tests cover Python, JavaScript/React, and configuration file scenarios
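The overview does not show how the assertions are written. As a rough illustration only, checking a model completion against an expected string might look like the sketch below. `CompletionCase`, `check_completion`, and the exact-match-after-stripping rule are hypothetical and are not taken from the test files. The example values come from `env_config_completion.md`.

```python
from dataclasses import dataclass


@dataclass
class CompletionCase:
    # Hypothetical container for one test case; the real .md files define their own format.
    name: str
    prefix: str    # text the user has typed so far
    expected: str  # completion the test asserts on


def check_completion(case: CompletionCase, model_output: str) -> bool:
    """Hypothetical matching rule: the completed line must equal the expected
    line after stripping surrounding whitespace."""
    completed = (case.prefix + model_output).strip()
    target = (case.prefix + case.expected).strip()
    return completed == target


case = CompletionCase(
    name="env_config_completion",
    prefix='os.getenv("STRIPE_W',
    expected='EBHOOK_SECRET")',
)

print(check_completion(case, 'EBHOOK_SECRET")\n'))  # True
print(check_completion(case, 'EBSITE_URL")'))       # False
```

An exact-match rule is strict. A harness could instead accept any output that starts with the expected text, which tolerates a model that keeps generating past the end of the line.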

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.


| File | Description |
|---|---|
| `semantic_filter.md` | Tests the model's ability to infer a list-comprehension filter from JSONL data context |
| `propagate_field.md` | Tests field propagation across `__init__`, dict conversion, and instantiation |
| `overfit_single_batch.md` | Tests completion of an overfitting-loop pattern for ML debugging |
| `long_horizon_debug.md` | Tests variable renaming to Shazeer notation and breakpoint cleanup |
| `line_complete.md` | Tests basic line completion for `DataLoader` methods |
| `god_tier_file_context.md` | Tests long-range context usage (~440 lines) for function completion |
| `fullstack_glue.md` | Tests cross-file field-name matching between backend and frontend |
| `env_config_completion.md` | Tests environment variable name completion from `.env.example` |
| `debug_typo_fix.md` | Tests typo identification and correction in file paths |
| `cross_file_react.md` | Tests React component import completion with the correct path |
| `cross_file_python.md` | Tests Python function-call completion using cross-file context |
| `cross_file_navigation.md` | Tests navigation and parameter addition across files |
| `comment_momentum.md` | Tests pattern recognition for continuing comment blocks |
| `binary_toggle.md` | Tests completion with opposite/toggle values (dark/light theme) |
| `add_import_after_use.md` | Tests adding a missing import after detecting its usage |


added new testcases

a544188

maharajamihir requested review from avocadoali and Copilot January 20, 2026 17:02

maharajamihir self-assigned this Jan 20, 2026

Copilot started reviewing on behalf of maharajamihir January 20, 2026 17:03

Copilot AI reviewed Jan 20, 2026


2 participants
