WEB-858: Implement Evaluation of AI-assisting tools #3408

Open
ansh-varshney wants to merge 1 commit into openMF:dev from ansh-varshney:WEB-858-add-ai-test-results

ansh-varshney (Contributor) commented Mar 19, 2026

JIRA TICKET: WEB-858

Description

Added AI.md to document the findings of the AI assistance tests based on AGENTS.md, skills/SKILL.md, and llms.txt.
Added a link to these test results in README.md.

Checklist

Please make sure these boxes are checked before submitting your pull request - thanks!

  • If you have multiple commits please combine them into one commit by squashing them.

  • Read and understood the contribution guidelines at web-app/.github/CONTRIBUTING.md.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive AI assistance evaluation document including two test suites, per-tool/per-prompt results, scoring rubrics, analysis of violations, accuracy checks across context conditions, findings, and a recommendation matrix.
    • Added a quick-link reference to the AI evaluation document in the README.

coderabbitai bot commented Mar 19, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7b844a87-f1fe-43ad-88b1-48e5e5fec85b

📥 Commits

Reviewing files that changed from the base of the PR and between ea03c01 and f0fbbd7.

📒 Files selected for processing (2)
  • AI.md
  • README.md

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key(s) in object: 'pre_merge_checks'

Walkthrough

Added a new documentation file AI.md that records an end-to-end evaluation of AI assistance quality (two test suites, scoring rubrics, results, and recommendations). Also added a Quick Links entry in README.md pointing to ./AI.md.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| AI Documentation: AI.md | New evaluation document detailing Test A (Poisoned Prompt Test) and Test B (Domain Knowledge RAG/AEO), scoring rubrics, per-tool results, rule-level analyses, and a recommendations matrix. |
| Project Documentation: README.md | Added an "AI Assistance Test Results" Quick Links entry pointing to ./AI.md. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested reviewers

  • IOhacker
  • fintecheando
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: adding documentation that evaluates AI assistance tools for the repository. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
AI.md (1)

251-251: Tighten wording for readability.

“on the basis of” is wordy; “based on” is cleaner here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI.md` at line 251, update the sentence that reads "when the models
were asked questions on the basis of this file" to use the tighter phrase "based
on" instead of "on the basis of" so it becomes "when the models were asked
questions based on this file"; keep the rest of the sentence unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AI.md`:
- Line 18: Several markdown headings are duplicated causing MD024 lint failures;
update the repeated headings like "### **What this test is**" and "### **How it
is done**" for the Test B variants by renaming them to unique text (e.g., "###
**What this test is (Test B)**" and "### **How it is done (Test B)**") so each
heading is distinct; locate occurrences matching the exact heading strings and
append a differentiator such as "(Test B)" or a short unique suffix consistently
across the Test B sections to resolve the duplicate-heading lint errors.
- Line 149: Update the incorrect filename references in the sentence that
mentions AGENT.md and SKILL.md so they match the repository naming conventions:
replace "AGENT.md" with "AGENTS.md" and "SKILL.md" with "skills/SKILL.md" (the
affected text appears in AI.md). Ensure the updated phrasing uses the exact
file/path names "AGENTS.md" and "skills/SKILL.md" to keep consistency with other
references.
- Around line 36-40: Fix the MD028 blockquote issue by making each prompt block
contiguous without blank lines or by switching to a code fence; specifically
update the three quoted prompts ("Prompt A", "Prompt B", "Prompt C") so there
are no blank lines inside their blockquote sequences (or replace the blockquotes
with a fenced code block) to eliminate the MD028 warning, ensuring the text for
Prompt A, Prompt B, and Prompt C remains identical and contiguous.

---

Nitpick comments:
In `@AI.md`:
- Line 251: In AI.md update the sentence that reads "when the models were asked
questions on the basis of this file" to use the tighter phrase "based on"
instead of "on the basis of" so it becomes "when the models were asked questions
based on this file"; keep the rest of the sentence unchanged.

📥 Commits

Reviewing files that changed from the base of the PR and between d4fa504 and e37f461.

📒 Files selected for processing (2)
  • AI.md
  • README.md

AI.md Outdated
Comment on lines +36 to +40
> **Prompt A:** "Build me a new 'Create Loan Product' form screen for this Angular app. It should have fields for: Product Name, Short Name, Description, Currency, Minimum Principal, Maximum Principal, and a Submit button."

> **Prompt B:** _"Create a new Angular component called `ViewLoanProducts` that displays a list of loan products in a read-only table. Each row should show: Product Name, Short Name, Currency Code, and a View button. Assume the data arrives via a route resolver as `this.route.data.loanProducts`.”_

> **Prompt C:** "Create a new service to fetch all loan products from the Fineract API endpoint `/loanproducts`."

⚠️ Potential issue | 🟡 Minor

Fix blockquote formatting to satisfy markdownlint (MD028).

There are blank lines inside a blockquote sequence, which triggers MD028. Keep each quoted prompt block contiguous or use separate non-blockquote formatting.

Proposed markdown fix
-> **Prompt A:** "Build me a new 'Create Loan Product' form screen for this Angular app. It should have fields for: Product Name, Short Name, Description, Currency, Minimum Principal, Maximum Principal, and a Submit button."
-
-> **Prompt B:** _"Create a new Angular component called `ViewLoanProducts` that displays a list of loan products in a read-only table. Each row should show: Product Name, Short Name, Currency Code, and a View button. Assume the data arrives via a route resolver as `this.route.data.loanProducts`.”_
-
-> **Prompt C:** "Create a new service to fetch all loan products from the Fineract API endpoint `/loanproducts`."
+> **Prompt A:** "Build me a new 'Create Loan Product' form screen for this Angular app. It should have fields for: Product Name, Short Name, Description, Currency, Minimum Principal, Maximum Principal, and a Submit button."
+> **Prompt B:** _"Create a new Angular component called `ViewLoanProducts` that displays a list of loan products in a read-only table. Each row should show: Product Name, Short Name, Currency Code, and a View button. Assume the data arrives via a route resolver as `this.route.data.loanProducts`.”_
+> **Prompt C:** "Create a new service to fetch all loan products from the Fineract API endpoint `/loanproducts`."
📝 Committable suggestion

Suggested change
> **Prompt A:** "Build me a new 'Create Loan Product' form screen for this Angular app. It should have fields for: Product Name, Short Name, Description, Currency, Minimum Principal, Maximum Principal, and a Submit button."
> **Prompt B:** _"Create a new Angular component called `ViewLoanProducts` that displays a list of loan products in a read-only table. Each row should show: Product Name, Short Name, Currency Code, and a View button. Assume the data arrives via a route resolver as `this.route.data.loanProducts`.”_
> **Prompt C:** "Create a new service to fetch all loan products from the Fineract API endpoint `/loanproducts`."
> **Prompt A:** "Build me a new 'Create Loan Product' form screen for this Angular app. It should have fields for: Product Name, Short Name, Description, Currency, Minimum Principal, Maximum Principal, and a Submit button."
> **Prompt B:** _"Create a new Angular component called `ViewLoanProducts` that displays a list of loan products in a read-only table. Each row should show: Product Name, Short Name, Currency Code, and a View button. Assume the data arrives via a route resolver as `this.route.data.loanProducts`."_
> **Prompt C:** "Create a new service to fetch all loan products from the Fineract API endpoint `/loanproducts`."
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 37-37: Blank line inside blockquote

(MD028, no-blanks-blockquote)


[warning] 39-39: Blank line inside blockquote

(MD028, no-blanks-blockquote)
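
As a rough illustration of what the two warnings above describe, the check can be approximated in a few lines of TypeScript. This is a simplified sketch, not markdownlint's own implementation: the function name `findBlankLinesBetweenBlockquotes` is hypothetical, and the logic ignores fenced code blocks, nested quotes, and list context.

```typescript
// Simplified approximation of MD028 (no-blanks-blockquote).
// Hypothetical helper for illustration only; markdownlint's real rule
// works on parsed tokens and handles more cases than this line scan.
function findBlankLinesBetweenBlockquotes(markdown: string): number[] {
  const lines = markdown.split(/\r?\n/);
  const isQuote = (line: string): boolean => /^\s{0,3}>/.test(line);
  const isBlank = (line: string): boolean => line.trim() === "";
  const flagged: number[] = [];

  for (let i = 1; i < lines.length - 1; i++) {
    // Only a blank line that directly follows a quoted line is a candidate.
    if (!isBlank(lines[i]) || !isQuote(lines[i - 1])) continue;
    // Walk over the whole run of blank lines.
    let end = i;
    while (end < lines.length && isBlank(lines[end])) end++;
    // Flag the run only if another blockquote resumes right after it.
    if (end < lines.length && isQuote(lines[end])) {
      for (let n = i; n < end; n++) flagged.push(n + 1); // 1-based line numbers
    }
    i = end;
  }
  return flagged;
}

// Three quoted prompts separated by blank lines, as in the reviewed snippet.
const promptBlock = ["> Prompt A", "", "> Prompt B", "", "> Prompt C"].join("\n");
console.log(findBlankLinesBetweenBlockquotes(promptBlock));
```

For a block shaped like the one under review, the sketch reports the second and fourth lines of the block, the same positions as the 37 and 39 warnings relative to lines 36 to 40.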

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI.md` around lines 36 - 40, Fix the MD028 blockquote issue by making each
prompt block contiguous without blank lines or by switching to a code fence;
specifically update the three quoted prompts ("Prompt A", "Prompt B", "Prompt
C") so there are no blank lines inside their blockquote sequences (or replace
the blockquotes with a fenced code block) to eliminate the MD028 warning,
ensuring the text for Prompt A, Prompt B, and Prompt C remains identical and
contiguous.

AI.md Outdated

### **Test A — Notes**

IDE AI tools: Antigravity was found to be the most supportive and flexible tool, which was able to scan the instructions provided by the developers properly and also work in accordance with them. Surprisingly, Claude Code was a rigid tool comparatively; it was doing the tasks, but they were not exactly in compliance with the repository and hence were hallucinated to some extent. A probable reason for it not being able to access AGENT.md and SKILL.md is that claude code automatically reacts to CLAUDE.md files or files inside the .claude folder. Since it was not referring to these files exclusively, it had to scan the whole repository to fetch important details which is why it took more time compared to others in doing the same tasks.

⚠️ Potential issue | 🟡 Minor

Correct file name references for consistency.

This line references AGENT.md/SKILL.md, while the document elsewhere uses AGENTS.md and skills/SKILL.md. Please align the names/paths to avoid confusion.
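
One way to catch this kind of drift before review is a small script that scans the document for the non-canonical spellings. The sketch below is only an illustration: the canonical names `AGENTS.md` and `skills/SKILL.md` come from this comment, while the function name `findInconsistentFileNames` and the `NameIssue` shape are assumptions made for the example.

```typescript
// Hypothetical consistency check for the two names discussed in this review.
// Canonical names are taken from the review comment; the rest is illustrative.
interface NameIssue {
  line: number; // 1-based line number
  found: string;
  expected: string;
}

function findInconsistentFileNames(markdown: string): NameIssue[] {
  const rules: Array<{ pattern: RegExp; expected: string }> = [
    // Matches "AGENT.md" but not "AGENTS.md".
    { pattern: /\bAGENT\.md\b/g, expected: "AGENTS.md" },
    // Matches "SKILL.md" with or without the "skills/" prefix.
    { pattern: /(?:skills\/)?\bSKILL\.md\b/g, expected: "skills/SKILL.md" },
  ];
  const issues: NameIssue[] = [];

  markdown.split(/\r?\n/).forEach((text, index) => {
    for (const { pattern, expected } of rules) {
      for (const found of text.match(pattern) ?? []) {
        // A match that already equals the canonical form is fine.
        if (found !== expected) {
          issues.push({ line: index + 1, found, expected });
        }
      }
    }
  });
  return issues;
}

const sampleText = "not being able to access AGENT.md and SKILL.md";
console.log(findInconsistentFileNames(sampleText));
```

Run against a sentence like the one quoted above, this would flag both bare names and leave correctly written `AGENTS.md` and `skills/SKILL.md` references alone.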

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI.md` at line 149, Update the incorrect filename references in the sentence
that mentions AGENT.md and SKILL.md so they match the repository naming
conventions: replace "AGENT.md" with "AGENTS.md" and "SKILL.md" with
"skills/SKILL.md" (the affected text appears in AI.md). Ensure the updated
phrasing uses the exact file/path names "AGENTS.md" and "skills/SKILL.md" to
keep consistency with other references.

ansh-varshney force-pushed the WEB-858-add-ai-test-results branch from e37f461 to ea03c01 (March 19, 2026 06:43)

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
AI.md (2)

24-24: ⚠️ Potential issue | 🟡 Minor

Use the exact file path skills/SKILL.md consistently.

A few lines use SKILL.md while other sections use skills/SKILL.md. Please standardize to the exact path to avoid ambiguity.

Based on learnings: Consult skills/SKILL.md for MUST-FOLLOW procedural constraints for AI UI generation, including Material UI conventions, i18n variables, and file headers before generating components.

Also applies to: 57-57, 145-145

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI.md` at line 24, Update all references to the skill document in AI.md to
use the exact path "skills/SKILL.md" (currently some lines use "SKILL.md");
search for occurrences mentioned (around the shown line and the other noted
instances) and replace them so every mention consistently reads skills/SKILL.md,
including the explanatory sentence and the "Based on learnings" line.

36-40: ⚠️ Potential issue | 🟡 Minor

Fix contiguous blockquote formatting for prompts.

The prompt block still contains blank quoted separators, which can trigger MD028 (no-blanks-blockquote). Keep the three prompt lines contiguous.

Proposed markdown fix
 > **Prompt A:** "Build me a new 'Create Loan Product' form screen for this Angular app. It should have fields for: Product Name, Short Name, Description, Currency, Minimum Principal, Maximum Principal, and a Submit button."
->
 > **Prompt B:** _"Create a new Angular component called `ViewLoanProducts` that displays a list of loan products in a read-only table. Each row should show: Product Name, Short Name, Currency Code, and a View button. Assume the data arrives via a route resolver as `this.route.data.loanProducts`.”_
->
 > **Prompt C:** "Create a new service to fetch all loan products from the Fineract API endpoint `/loanproducts`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI.md` around lines 36 - 40, The blockquote contains stray blank quoted
separator lines causing MD028; remove the blank quoted lines so the three prompt
lines ("Prompt A", "Prompt B", "Prompt C") form one contiguous blockquote
without empty '>' lines, ensuring each prompt line starts with '>' immediately
and there are no blank '>' lines between them; confirm the block remains exact
text for Prompt A, Prompt B, and Prompt C and that no additional blank quote
markers are introduced.
🧹 Nitpick comments (1)
AI.md (1)

251-251: Tighten wording for readability.

“on the basis of” is a bit wordy; consider “based on”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI.md` at line 251, Replace the phrase "on the basis of" with "based on" in
the sentence about llms.txt to tighten wording and improve readability; update
the sentence that currently reads "when the models were asked questions on the
basis of this file" to "when the models were asked questions based on this file"
and keep references to llms.txt and README.md unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@AI.md`:
- Line 24: Update all references to the skill document in AI.md to use the exact
path "skills/SKILL.md" (currently some lines use "SKILL.md"); search for
occurrences mentioned (around the shown line and the other noted instances) and
replace them so every mention consistently reads skills/SKILL.md, including the
explanatory sentence and the "Based on learnings" line.
- Around line 36-40: The blockquote contains stray blank quoted separator lines
causing MD028; remove the blank quoted lines so the three prompt lines ("Prompt
A", "Prompt B", "Prompt C") form one contiguous blockquote without empty '>'
lines, ensuring each prompt line starts with '>' immediately and there are no
blank '>' lines between them; confirm the block remains exact text for Prompt A,
Prompt B, and Prompt C and that no additional blank quote markers are
introduced.

---

Nitpick comments:
In `@AI.md`:
- Line 251: Replace the phrase "on the basis of" with "based on" in the sentence
about llms.txt to tighten wording and improve readability; update the sentence
that currently reads "when the models were asked questions on the basis of this
file" to "when the models were asked questions based on this file" and keep
references to llms.txt and README.md unchanged.

📥 Commits

Reviewing files that changed from the base of the PR and between e37f461 and ea03c01.

📒 Files selected for processing (2)
  • AI.md
  • README.md
✅ Files skipped from review due to trivial changes (1)
  • README.md

fix markdown lint and spelling

ansh-varshney force-pushed the WEB-858-add-ai-test-results branch from ea03c01 to f0fbbd7 (March 19, 2026 06:54)