feat: structured SQL DML parsing for INSERT/UPDATE/DELETE/DECLARE CURSOR #2842

Closed
allieseb wants to merge 16 commits into TypeCobolTeam:develop from allieseb:fix/sql-catch-all-grammar

allieseb commented Mar 12, 2026

Summary

  • Add ANTLR grammar rules and CUP parser integration for INSERT, UPDATE, DELETE, and DECLARE CURSOR SQL statements
  • Create typed CodeElement classes (InsertStatement, UpdateStatement, SqlDeleteStatement, DeclareCursorStatement) with structured properties (table name, column bindings, host variables)
  • Introduce HostVariableBinding model to track column-to-variable mappings with direction (IN/OUT), enabling column-level data lineage for COBOL DB2 programs
  • Wire AST node builders, visitor pattern, and dispatcher for all new statement types
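
To make the column-to-variable mapping concrete, here is a minimal Python sketch of what a binding model with a direction could look like. It is an illustration only, not the PR's C# code: the names `Direction`, `HostVariableBinding` fields, and `bind_insert` are assumptions, as is the positional pairing of INSERT columns with VALUES host variables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Direction(Enum):
    IN = "IN"    # host variable supplies a value to SQL (INSERT/UPDATE values)
    OUT = "OUT"  # SQL writes a value into the host variable (SELECT INTO)


@dataclass(frozen=True)
class HostVariableBinding:
    """Hypothetical stand-in for the PR's HostVariableBinding model."""
    column: Optional[str]  # None when no column can be paired with the variable
    host_variable: str
    direction: Direction


def bind_insert(columns: List[str], host_variables: List[str]) -> List[HostVariableBinding]:
    """Pair INSERT columns with VALUES host variables by position (assumed rule)."""
    if len(columns) != len(host_variables):
        raise ValueError("column list and VALUES list differ in length")
    return [
        HostVariableBinding(column, variable, Direction.IN)
        for column, variable in zip(columns, host_variables)
    ]


# INSERT INTO EMPLOYEE (EMP_ID, EMP_NAME) VALUES (:WS-EMP-ID, :WS-EMP-NAME)
bindings = bind_insert(["EMP_ID", "EMP_NAME"], [":WS-EMP-ID", ":WS-EMP-NAME"])
for binding in bindings:
    print(binding.column, binding.host_variable, binding.direction.value)
```

Keeping the direction on each binding, rather than inferring it from the statement type later, lets a lineage tool treat every binding uniformly regardless of which statement produced it.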

Motivation

Previously, INSERT/UPDATE/DELETE/DECLARE CURSOR fell into UnsupportedSqlStatement with only a raw SQL keyword available. This made it impossible to reliably extract:

  • Which tables are accessed
  • Which columns are read or written
  • Which host variables carry data in/out of SQL

This structured parsing is required for CRUD matrix extraction and data lineage analysis in downstream tools (SAST bridges, data flow analysis).
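
As a rough illustration of the downstream use, the sketch below builds a CRUD matrix from already-parsed statements. It is written in Python under stated assumptions: the `(program, kind, table)` tuples, the `CRUD_LETTER` mapping (including counting DECLARE CURSOR as a read), and the sample program and table names are all hypothetical and do not come from TypeCobol.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed mapping from statement kind to CRUD letter.
CRUD_LETTER = {
    "INSERT": "C",
    "SELECT": "R",
    "DECLARE CURSOR": "R",
    "UPDATE": "U",
    "DELETE": "D",
}


def crud_matrix(
    statements: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """Group CRUD letters by (program, table) from (program, kind, table) tuples."""
    matrix: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for program, kind, table in statements:
        matrix[(program, table)].add(CRUD_LETTER[kind])
    return dict(matrix)


# Hypothetical output of a structured parse over two programs.
parsed = [
    ("PAYROLL1", "INSERT", "EMPLOYEE"),
    ("PAYROLL1", "SELECT", "EMPLOYEE"),
    ("PAYROLL1", "DELETE", "AUDIT_LOG"),
    ("BILLING2", "UPDATE", "EMPLOYEE"),
]

matrix = crud_matrix(parsed)
for (program, table), operations in sorted(matrix.items()):
    print(program, table, "".join(sorted(operations)))
```

With only a raw SQL keyword available, the `table` element of each tuple would be missing, which is why the matrix could not be built reliably before.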

Changes

| Area | Files | What |
|---|---|---|
| Grammar | CobolCodeElements.g4 | 4 new rules: insertStatement, updateStatement, sqlDeleteStatement, declareCursorStatement |
| CodeElements | InsertStatement.cs, UpdateStatement.cs, SqlDeleteStatement.cs, DeclareCursorStatement.cs | Typed AST elements with table/column/binding properties |
| Model | HostVariableBinding.cs | Column↔variable mapping with direction enum |
| Builder | SqlCodeElementBuilder.cs | 143 lines: parse ANTLR contexts into typed statements |
| CUP | TypeCobolProgram.cup, ProgramClassBuilder.cs | Terminal wiring + builder dispatch |
| Visitor | CobolLanguageLevelVisitor.cs, NodeDispatcher, NodeListener | Visit pattern for new node types |
| AST Nodes | Insert.cs, Update.cs, SqlDelete.cs, DeclareCursor.cs | Tree node wrappers |

Test plan

  • Parse test on TriangulateSI COBOL programs (1469 files across 5 mainframe apps)
  • Verified CRUD extraction: SELECT, INSERT, UPDATE, DELETE with column bindings
  • Verified host variable binding with correct direction (IN for INSERT/UPDATE values, OUT for SELECT INTO)
  • TypeCobol unit test suite (existing tests should pass — new types only fire on EnableSqlParsing = true)

🤖 Generated with Claude Code

allieseb and others added 6 commits March 12, 2026 16:42
…very

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… infrastructure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New test file with INSERT, UPDATE, DELETE, DECLARE CURSOR, OPEN, FETCH, CLOSE
- All 7 unsupported statements parse without errors (catch-all rule works)
- Updated expected results for ExecInDataDivision and ExecSqlWithCommit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
allieseb and others added 7 commits March 15, 2026 10:42
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…SOR)

Defines architecture for extracting table names, columns, and host variable
bindings from embedded SQL to feed XREF CRUD matrix and column-level data
lineage in TriangulateSystemOfSystem.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ints

Fixes from spec review:
- Rename DeleteStatement to SqlDeleteStatement (avoids COBOL collision)
- Use existing grammar rules (tableOrViewOrCorrelationName, hostVariable)
- Use correct token types (LeftParenthesisSeparator, SQL_CommaSeparator)
- Fix WHERE clause to (hostVariable | ~END_EXEC)* for host var extraction
- Use FullSelect model in DeclareCursorStatement (not CodeElement)
- Add CodeElementType/StatementType enum steps
- Separate ANTLR listener and CUP integration steps
- Add edge cases section
- Exclude ColonSeparator from negated sets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
16 tasks across 4 chunks:
1. Foundation (HostVariableBinding, enums, nodes, visitors, CodeElements)
2. Infrastructure (CUP, builders, dispatcher, listener)
3. Grammar (ANTLR rules, SqlCodeElementBuilder, ANTLR listener)
4. Tests (test files, unit tests, regression, integration)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ucture

- Create HostVariableBinding class for tracking host variable bindings with direction
- Add CodeElementType and StatementType entries for Insert, Update, SqlDelete, DeclareCursor
- Create AST node classes (Insert, Update, SqlDelete, DeclareCursor) in Sql/Nodes
- Create CodeElement statement classes (InsertStatement, UpdateStatement, SqlDeleteStatement, DeclareCursorStatement)
- Add Visit methods for all new types to IASTVisitor and AbstractAstVisitor
- Enrich SelectStatement with IntoHostVariables and WhereHostVariables properties

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ECLARE CURSOR/SELECT INTO

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
allieseb changed the title from "Fix SQL parser hang via grammar catch-all rule" to "feat: structured SQL DML parsing for INSERT/UPDATE/DELETE/DECLARE CURSOR" Mar 15, 2026
allieseb and others added 3 commits March 16, 2026 20:29
…DELETE/DECLARE CURSOR)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix file encoding: preserve original Windows-1252 bytes in grammar
  comments (no more UTF-8 replacement characters in diff)
- Remove redundant insertValueList rule, reuse repeatedSourceValue
  as suggested by reviewer
- Add TODO comments in CobolCodeElements.g4 for all incomplete
  optional clauses (INSERT, UPDATE, DELETE, DECLARE CURSOR)
- Update SqlCodeElementBuilder to navigate sourceValue tree for
  host variable extraction (FindHostVariable helper)
- Add Comparator Visit methods for all new SQL statement types
  (InsertStatement, UpdateStatement, SqlDeleteStatement,
  DeclareCursorStatement, UnsupportedSqlStatement)
- Create dedicated rdzSQL unit tests (one per statement type):
  ExecSqlWithInsertStatement, ExecSqlWithUpdateStatement,
  ExecSqlWithDeleteStatement, ExecSqlWithDeclareCursorStatement
- Update ExecSqlWithUnsupportedStatement test: remove now-supported
  statements, keep only OPEN/FETCH/CLOSE
- Build verified: 0 errors (885 pre-existing warnings)

Addresses review from fm-117 on PR TypeCobolTeam#2838.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix Comparator: use DumpString/DumpStringList for string properties
  to avoid SqlObject.DumpProperty treating strings as IEnumerable<char>
- Regenerate expected .SQL.txt files from actual parser output
- Update ExecInDataDivision_NoEndExec expected error message to include
  new SQL DML statement types in token list
- All 63 tests pass (0 failures)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
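
The Comparator fix in the commit above concerns a pitfall that exists in many languages: a string is itself a sequence of characters, so a generic "dump any enumerable" routine will split it apart. The Python sketch below is an analogy only, not the TypeCobol code; `dump_naive` and `dump_fixed` are hypothetical names and do not reflect the signatures of `DumpProperty` or `DumpString`.

```python
from collections.abc import Iterable


def dump_naive(name, value):
    """Treat every iterable as a list, which also catches plain strings."""
    if isinstance(value, Iterable):
        return f"{name} = [" + ", ".join(str(item) for item in value) + "]"
    return f"{name} = {value}"


def dump_fixed(name, value):
    """Handle strings before the generic iterable branch."""
    if isinstance(value, str):
        return f"{name} = {value}"
    return dump_naive(name, value)


print(dump_naive("TableName", "EMP"))        # TableName = [E, M, P]
print(dump_fixed("TableName", "EMP"))        # TableName = EMP
print(dump_fixed("Columns", ["ID", "NAME"]))  # Columns = [ID, NAME]
```

Checking for the string case first keeps the generic branch usable for real collections such as column lists.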
fm-117 (Contributor) commented Apr 7, 2026

Will be recreated separately.

fm-117 closed this Apr 7, 2026