Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
107 changes: 102 additions & 5 deletions .github/BUG_FIXING_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,29 +16,126 @@ The process of fixing a bug, especially one that involves adding new syntax, fol
3. **Script Generation Update**:
* Update the script generator to handle the new AST node or enum. This typically involves modifying files in `SqlScriptDom/ScriptDom/SqlServer/ScriptGenerator/`. For the `NOT LIKE` example, this meant adding an entry to the `_booleanComparisonTypeGenerators` dictionary in `SqlScriptGeneratorVisitor.CommonPhrases.cs`.

4. **Build the Project**:
5. **Build the Project**:
* After making code changes, run a build to regenerate the parser and ensure everything compiles correctly:
```bash
dotnet build
```

5. **Add a Unit Test**:
6. **Add a Unit Test**:
* Create a new `.sql` file in `Test/SqlDom/TestScripts/` that contains the specific syntax for the new test case.

6. **Define the Test Case**:
7. **Define the Test Case**:
* Add a new `ParserTest` entry to the appropriate `Only<version>SyntaxTests.cs` files (e.g., `Only130SyntaxTests.cs`). This entry points to your new test script and defines the expected number of parsing errors for each SQL Server version.

7. **Generate and Verify Baselines**:
8. **Generate and Verify Baselines**:
This is a critical and multi-step process:
* **a. Create Placeholder Baseline Files**: Before running the test, create empty or placeholder baseline files in the corresponding `Test/SqlDom/Baselines<version>/` directories. The filename must match the test script's filename.
* **b. Run the Test to Get the Generated Script**: Run the specific test that you just added. It is *expected to fail* because the placeholder baseline will not match the script generated by the parser.
```bash
# Example filter for running a specific test
dotnet test --filter "FullyQualifiedName~YourTestMethodName"
```
* **c. Update the Baseline Files**: Copy the "Actual" output from the test failure log. This is the correctly formatted script generated from the AST. Paste this content into all the baseline files you created in step 7a.
* **c. Update the Baseline Files**: Copy the "Actual" output from the test failure log. This is the correctly formatted script generated from the AST. Paste this content into all the baseline files you created in step 8a.
* **d. Re-run the Tests**: Run the same test command again. This time, the tests should pass, confirming that the generated script matches the new baseline.

9. **⚠️ CRITICAL: Run Full Test Suite**:
* **Always run the complete test suite** to ensure your changes didn't break existing functionality:
```bash
dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c Debug
```
* **Why this is critical**: Grammar changes can have unintended side effects on other parts of the parser. Shared grammar rules are used in multiple contexts.
* **Common issues**: Modifying shared rules like `identifierColumnReferenceExpression` can cause other tests to fail because they now accept syntax that should be rejected.
* **Solution**: If tests fail, create context-specific grammar rules instead of modifying shared ones.

By following these steps, you can ensure that new syntax is correctly parsed, represented in the AST, generated back into a script, and fully validated by the testing framework without breaking existing functionality.

## Special Case: Extending Grammar Rules from Literals to Expressions

A common type of bug involves extending existing grammar rules that only accept literal values (like integers or strings) to accept full expressions (parameters, variables, outer references, etc.). This pattern was used to fix the VECTOR_SEARCH TOP_N parameter issue.

### Example: VECTOR_SEARCH TOP_N Parameter Extension

**Problem**: VECTOR_SEARCH's TOP_N parameter only accepted integer literals (`TOP_N = 10`) but needed to support parameters (`TOP_N = @k`) and outer references (`TOP_N = outerref.col`).

**Solution Steps**:

1. **Update AST Definition** (`SqlScriptDom/Parser/TSql/Ast.xml`):
```xml
<!-- Before: -->
<Member Name="TopN" Type="IntegerLiteral" Summary="..." />

<!-- After: -->
<Member Name="TopN" Type="ScalarExpression" Summary="..." />
```

2. **Update Grammar Rule** (`SqlScriptDom/Parser/TSql/TSql170.g`):
```antlr
// Before - Variable declaration:
IntegerLiteral vTopN;

// After - Variable declaration:
ScalarExpression vTopN;

// Before - Parsing rule:
vTopN = integer

// After - Parsing rule:
vTopN = expression
```

3. **Script Generator**: Usually no changes needed if using `GenerateNameEqualsValue()` or similar generic methods that work with `TSqlFragment`.

4. **Test Cases**: Add tests covering the new expression types:
```sql
-- Parameter test
TOP_N = @k

-- Outer reference test
TOP_N = outerref.max_results
```

### When to Apply This Pattern

Use this pattern when:
- ✅ Existing grammar accepts only literal values (integer, string, etc.)
- ✅ Users need to pass dynamic values (parameters, variables, computed expressions)
- ✅ The SQL Server syntax actually supports expressions in that position
- ✅ Backward compatibility must be maintained (literals still work)

### ⚠️ Critical Warning: Shared Grammar Rules

**DO NOT modify shared grammar rules** like `identifierColumnReferenceExpression` that are used throughout the codebase. This can cause:
- ✅ Other tests to fail unexpectedly
- ✅ Unintended syntax acceptance in different contexts
- ✅ Breaking changes in existing functionality

**Instead, create specialized rules** for your specific use case:
```antlr
// ❌ WRONG: Modifying shared rule
identifierColumnReferenceExpression: multiPartIdentifier[2] // Affects ALL usage

// ✅ CORRECT: Create specialized rule
vectorSearchColumnReferenceExpression: multiPartIdentifier[2] // Only for VECTOR_SEARCH
```

### Common Expression Types to Support

When extending from literals to expressions, consider supporting:
- **Parameters**: `@parameter`
- **Variables**: `@variable`
- **Column references**: `table.column`
- **Outer references**: `outerref.column`
- **Function calls**: `FUNCTION(args)`
- **Arithmetic expressions**: `value + 1`
- **Case expressions**: `CASE WHEN ... END`

### Files Typically Modified

1. **`Ast.xml`**: Change member type from specific literal to `ScalarExpression`
2. **`TSql*.g`**: Update variable declarations and parsing rules
3. **Test files**: Add comprehensive test coverage
4. **Script generators**: Usually no changes needed for well-designed generators
By following these steps, you can ensure that new syntax is correctly parsed, represented in the AST, generated back into a script, and fully validated by the testing framework.

## Special Case: Parser Predicate Recognition Issues
Expand Down
256 changes: 256 additions & 0 deletions .github/GRAMMAR_EXTENSION_PATTERNS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,256 @@
# Grammar Extension Patterns for SqlScriptDOM

This guide documents common patterns for extending the SqlScriptDOM parser grammar to support new syntax or enhance existing functionality.

## Pattern 1: Extending Literals to Expressions

### When to Use
When existing grammar rules only accept literal values but need to support dynamic expressions like parameters, variables, or computed values.

### Example Problem
Functions or constructs that currently accept only:
- `IntegerLiteral` (e.g., `TOP_N = 10`)
- `StringLiteral` (e.g., `VALUE = 'literal'`)

But need to support:
- Parameters: `@parameter`
- Variables: `@variable`
- Column references: `table.column`
- Outer references: `outerref.column`
- Function calls: `FUNCTION(args)`
- Computed expressions: `value + 1`

### ⚠️ Critical Warning: Avoid Modifying Shared Grammar Rules

**DO NOT** modify existing shared grammar rules like `identifierColumnReferenceExpression` that are used throughout the codebase. This can cause unintended side effects and break other functionality.

**Instead**, create specialized rules for your specific context.

### Solution Template

#### Step 1: Update AST Definition (`Ast.xml`)
```xml
<!-- Before: -->
<Member Name="PropertyName" Type="IntegerLiteral" Summary="Description" />

<!-- After: -->
<Member Name="PropertyName" Type="ScalarExpression" Summary="Description" />
```

#### Step 2: Create Context-Specific Grammar Rule (`TSql*.g`)
```antlr
// Create a specialized rule for your context
yourContextColumnReferenceExpression returns [ColumnReferenceExpression vResult = this.FragmentFactory.CreateFragment<ColumnReferenceExpression>()]
{
MultiPartIdentifier vMultiPartIdentifier;
}
:
vMultiPartIdentifier=multiPartIdentifier[2] // Allows table.column syntax
{
vResult.ColumnType = ColumnType.Regular;
vResult.MultiPartIdentifier = vMultiPartIdentifier;
}
;

// Use the specialized rule in your custom grammar
yourContextParameterRule returns [ScalarExpression vResult]
: vResult=signedInteger
| vResult=variable
| vResult=yourContextColumnReferenceExpression // Context-specific rule
;
```

#### Step 3: Verify Script Generator
Most script generators using `GenerateNameEqualsValue()` or similar methods work automatically with `ScalarExpression`. No changes typically needed.

#### Step 4: Add Test Coverage
```sql
-- Test parameter
FUNCTION_NAME(PARAM = @parameter)

-- Test outer reference
FUNCTION_NAME(PARAM = outerref.column)

-- Test computed expression
FUNCTION_NAME(PARAM = value + 1)
```

### Real-World Example: VECTOR_SEARCH TOP_N

**Problem**: `VECTOR_SEARCH` TOP_N parameter only accepted integer literals.

**❌ Wrong Approach**: Modify `identifierColumnReferenceExpression` to use `multiPartIdentifier[2]`
- **Result**: Broke `CreateIndexStatementErrorTest` because other grammar rules started accepting invalid syntax

**✅ Correct Approach**: Create `vectorSearchColumnReferenceExpression` specialized for VECTOR_SEARCH
- **Result**: VECTOR_SEARCH supports multi-part identifiers without affecting other functionality

**Final Implementation**:
```antlr
signedIntegerOrVariableOrColumnReference returns [ScalarExpression vResult]
: vResult=signedInteger
| vResult=variable
| vResult=vectorSearchColumnReferenceExpression // VECTOR_SEARCH-specific rule
;

vectorSearchColumnReferenceExpression returns [ColumnReferenceExpression vResult = ...]
:
vMultiPartIdentifier=multiPartIdentifier[2] // Allows table.column syntax
{
vResult.ColumnType = ColumnType.Regular;
vResult.MultiPartIdentifier = vMultiPartIdentifier;
}
;
```

**Result**: Now supports dynamic TOP_N values:
```sql
-- Parameters
VECTOR_SEARCH(..., TOP_N = @k) AS ann

-- Outer references
VECTOR_SEARCH(..., TOP_N = outerref.max_results) AS ann
```

## Pattern 2: Adding New Enum Members

### When to Use
When adding new operators, keywords, or options to existing constructs.

### Solution Template

#### Step 1: Update Enum in AST (`Ast.xml`)
```xml
<Enum Name="ExistingEnumType">
<Member Name="ExistingValue1" />
<Member Name="ExistingValue2" />
<Member Name="NewValue" /> <!-- Add this -->
</Enum>
```

#### Step 2: Update Grammar Rule (`TSql*.g`)
```antlr
// Add new token matching
| tNewValue:Identifier
{
Match(tNewValue, CodeGenerationSupporter.NewValue);
vResult.EnumProperty = ExistingEnumType.NewValue;
}
```

#### Step 3: Update Script Generator
```csharp
// Add mapping in appropriate generator file
private static readonly Dictionary<EnumType, string> _enumGenerators =
new Dictionary<EnumType, string>()
{
{ EnumType.ExistingValue1, CodeGenerationSupporter.ExistingValue1 },
{ EnumType.ExistingValue2, CodeGenerationSupporter.ExistingValue2 },
{ EnumType.NewValue, CodeGenerationSupporter.NewValue }, // Add this
};
```

## Pattern 3: Adding New Function or Statement

### When to Use
When adding completely new T-SQL functions or statements.

### Solution Template

#### Step 1: Define AST Node (`Ast.xml`)
```xml
<Class Name="NewFunctionCall" Base="PrimaryExpression">
<Member Name="Parameter1" Type="ScalarExpression" />
<Member Name="Parameter2" Type="StringLiteral" />
</Class>
```

#### Step 2: Add Grammar Rule (`TSql*.g`)
```antlr
newFunctionCall returns [NewFunctionCall vResult = FragmentFactory.CreateFragment<NewFunctionCall>()]
{
ScalarExpression vParam1;
StringLiteral vParam2;
}
:
tFunction:Identifier LeftParenthesis
{
Match(tFunction, CodeGenerationSupporter.NewFunction);
UpdateTokenInfo(vResult, tFunction);
}
vParam1 = expression
{
vResult.Parameter1 = vParam1;
}
Comma vParam2 = stringLiteral
{
vResult.Parameter2 = vParam2;
}
RightParenthesis
;
```

#### Step 3: Integrate with Existing Rules
Add the new rule to appropriate places in the grammar (e.g., `functionCall`, `primaryExpression`, etc.).

#### Step 4: Create Script Generator
```csharp
public override void ExplicitVisit(NewFunctionCall node)
{
GenerateIdentifier(CodeGenerationSupporter.NewFunction);
GenerateSymbol(TSqlTokenType.LeftParenthesis);
GenerateFragmentIfNotNull(node.Parameter1);
GenerateSymbol(TSqlTokenType.Comma);
GenerateFragmentIfNotNull(node.Parameter2);
GenerateSymbol(TSqlTokenType.RightParenthesis);
}
```

## Best Practices

### 1. Backward Compatibility
- Always ensure existing syntax continues to work
- Extend rather than replace existing rules
- Test both old and new syntax

### 2. Testing Strategy
- Add comprehensive test cases in `TestScripts/`
- Update baseline files with expected output
- Test edge cases and error conditions

### 3. Documentation
- Update grammar comments with new syntax
- Add examples in code comments
- Document any limitations or requirements

### 4. Version Targeting
- Add new features to the appropriate SQL Server version grammar
- Consider whether feature should be backported to earlier versions
- Update all relevant grammar files if syntax is version-independent

## Common Pitfalls

### 1. Forgetting Script Generator Updates
- Grammar changes often require corresponding script generator changes
- Test the round-trip: parse → generate → parse again

### 2. Incomplete Test Coverage
- Test all supported expression types when extending to `ScalarExpression`
- Include error cases and boundary conditions

### 3. Missing Version Updates
- New syntax should be added to all relevant grammar versions
- Consider SQL Server version compatibility

### 4. AST Design Issues
- Choose appropriate base classes for new AST nodes
- Consider reusing existing AST patterns where possible
- Ensure proper inheritance hierarchy

## Reference Examples

- **VECTOR_SEARCH TOP_N Extension**: Literal to expression pattern
- **REGEXP_LIKE Predicate**: Boolean parentheses recognition pattern
- **EVENT SESSION Predicates**: Function-style vs operator-style predicates

For detailed step-by-step examples, see [BUG_FIXING_GUIDE.md](BUG_FIXING_GUIDE.md).
Loading
Loading