A Python-based code analysis tool that leverages tree-sitter to parse and analyze Java code changes in Git repositories. This tool is particularly useful for identifying modified controller methods and validating Swagger annotations in Java Spring applications.
- Git Diff Analysis: Parse git diffs to identify changed files and line ranges
- Java Code Parsing: Use tree-sitter to parse Java source files and extract method declarations
- Controller Method Detection: Find modified controller methods based on annotations and file naming patterns
- Swagger Annotation Validation: Validate that controller methods have proper Swagger/OpenAPI annotations
- Reusable Components: Modular design with reusable git and parser utilities
Identify which controller methods were modified between two git revisions. This is useful for code review and impact analysis.
Automatically validate that modified controller methods follow Swagger annotation standards:
- Methods must have @ApiOperation with httpMethod and value fields
- Parameters need an @ApiParam annotation (unless annotated with @RequestBody)
- Required fields: name, value, required, and example
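The rules above can be approximated with a few regular expressions. This is a simplified sketch only, not the tool's tree-sitter based check: the name quick_swagger_check is hypothetical, and the regexes assume annotation arguments contain no closing parenthesis inside string values. It also does not detect parameters that lack @ApiParam entirely.

```python
import re

# Fields taken from the rules listed above.
API_OPERATION_FIELDS = ("httpMethod", "value")
API_PARAM_FIELDS = ("name", "value", "required", "example")


def missing_fields(annotation_args, fields):
    """Return the fields that do not appear as `field =` in the argument text."""
    return [f for f in fields if not re.search(r"\b" + f + r"\s*=", annotation_args)]


def quick_swagger_check(method_source):
    """Hypothetical helper: return (is_compliant, message) for one Java method."""
    operation = re.search(r"@ApiOperation\s*\(([^)]*)\)", method_source)
    if operation is None:
        return False, "missing @ApiOperation"
    missing = missing_fields(operation.group(1), API_OPERATION_FIELDS)
    if missing:
        return False, "@ApiOperation missing fields: " + ", ".join(missing)
    # Check every @ApiParam that is present for the required fields.
    for param in re.finditer(r"@ApiParam\s*\(([^)]*)\)", method_source):
        missing = missing_fields(param.group(1), API_PARAM_FIELDS)
        if missing:
            return False, "@ApiParam missing fields: " + ", ".join(missing)
    return True, "ok"
```

A syntax-tree based check, as the tool itself uses, is more robust than this because it does not depend on how the annotation text happens to be formatted.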
- tree-sitter: Code parsing library (requires tree-sitter Python bindings)
- subprocess: For executing git commands (built-in)
- re: For regex pattern matching (built-in)
- Git: Required for repository operations
- PyInstaller: Optional, for building standalone executables
- tree-sitter-java
- tree-sitter-python
The project includes pre-compiled tree-sitter language libraries in language/my-languages.so.
- Clone the repository:
git clone https://github.com/albert-lv/code-parser.git
cd code-parser
- Install Python dependencies:
pip install tree-sitter
- (Optional) Build the language library if you need to update it:
python init_library.py

Run the main script to check changed controller methods and their Swagger annotations:
python main.py
You'll be prompted to enter:
- Repository path
- Old version (commit SHA, branch, or tag)
- New version (commit SHA, branch, or tag)
from find_changed_controller import find_changed_controller_methods
repo_path = "/path/to/repo"
old_version = "commit-sha-1"
new_version = "commit-sha-2"
annotations = ['@RequestMapping', '@GetMapping', '@PostMapping', '@PutMapping', '@DeleteMapping']
controller_keywords = ['Controller', 'Rest', 'Api']
changed_methods = find_changed_controller_methods(
    repo_path, old_version, new_version, annotations, controller_keywords
)

from parser.parse_git_diff import parse_diff
file_changes = parse_diff(repo_path, old_version, new_version)
# Returns: {file_path: [(start_line, end_line), ...], ...}

from parser.parse_single_file import parse_changed_file
methods = parse_changed_file(repo_path, file_path, revision, annotations)
# Returns: [{'start_line': int, 'end_line': int, 'code': str}, ...]

from check_swagger_annotations import check_method_annotations
code = """
@ApiOperation(httpMethod = "POST", value = "Create user")
@PostMapping("/users")
public User createUser(@ApiParam(name = "user", value = "User info", required = true, example = "{}") User user) {
    return userService.create(user);
}
"""
is_compliant, message = check_method_annotations(code)

- Function: run_git_diff(repo_path, old_version, new_version)
- Purpose: Execute git diff command and return unified diff output
- Returns: String containing diff output
- Reusable for: Any project needing to analyze git diffs

- Function: get_single_file(repo_path, revision, file_path)
- Purpose: Retrieve file content at a specific git revision
- Returns: String containing file content
- Reusable for: Any project needing to access historical file versions
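For orientation, both git helpers can be thought of as thin wrappers around subprocess. The sketch below is an assumption about how such wrappers might look, not the project's actual code: build_diff_command, build_show_command and run_git are hypothetical names, and the exact git flags the project passes are not documented here.

```python
import subprocess


def build_diff_command(repo_path, old_version, new_version):
    """Command line for a unified diff between two revisions (flags assumed)."""
    return ["git", "-C", repo_path, "diff", old_version, new_version]


def build_show_command(repo_path, revision, file_path):
    """Command line that prints one file as it existed at `revision`.

    file_path is relative to the repository root and uses forward slashes.
    """
    return ["git", "-C", repo_path, "show", "{}:{}".format(revision, file_path)]


def run_git(command):
    """Run a git command and return its stdout as text.

    Raises subprocess.CalledProcessError if git exits with a non-zero status.
    """
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout
```

Keeping command construction separate from execution makes the argument lists easy to inspect without a repository on disk, for example `run_git(build_show_command(repo_path, "HEAD", "src/Foo.java"))`.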

- Function: init_parser(language_name)
- Purpose: Initialize a tree-sitter parser for a specific language
- Returns: Configured Parser instance
- Reusable for: Any project using tree-sitter for code parsing

- Function: parse_diff(repo_path, old_version, new_version)
- Purpose: Parse git diff output to extract changed files and line ranges
- Returns: Dictionary mapping file paths to list of (start_line, end_line) tuples
- Reusable for: Projects analyzing code changes, code review tools, CI/CD pipelines
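To illustrate the kind of work parse_diff does, here is a minimal sketch that reads hunk headers from unified diff text and produces the same dictionary shape. The function name parse_unified_diff is hypothetical and the project's own parsing may differ. The sketch takes new-side line ranges straight from the hunk headers, so with git's default context lines the ranges include unchanged context.

```python
import re
from collections import defaultdict

# Matches hunk headers such as "@@ -10,3 +10,4 @@" or "@@ -40 +41 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def parse_unified_diff(diff_text):
    """Map each changed file to a list of (start_line, end_line) tuples."""
    changes = defaultdict(list)
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].strip()
            if path == "/dev/null":
                current_file = None  # file was deleted, no new-side lines
            else:
                # Drop git's "b/" prefix on the new-side path.
                current_file = path[2:] if path.startswith("b/") else path
        elif current_file is not None:
            match = HUNK_HEADER.match(line)
            if match:
                start = int(match.group(1))
                # A missing count means the hunk covers exactly one line.
                count = int(match.group(2)) if match.group(2) is not None else 1
                if count > 0:
                    changes[current_file].append((start, start + count - 1))
    return dict(changes)
```

One known limitation of this sketch: an added source line whose own text begins with "++ " would look like a file header, so a production parser should track hunk boundaries more carefully.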

- Function: find_annotated_methods(tree, content, annotations)
- Purpose: Extract methods with specific annotations from parsed AST
- Returns: List of method information dictionaries
- Reusable for: Java static analysis tools, documentation generators, code metrics tools
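Once annotated methods and changed line ranges are both available, finding the modified controller methods comes down to a file-name filter plus an interval-overlap test. The sketch below assumes the method dictionaries have the start_line and end_line keys shown earlier. The helper names and the exact matching rule for controller keywords are assumptions, not the project's documented behavior.

```python
import os


def is_controller_file(file_path, controller_keywords):
    """Assumed rule: a Java file whose name contains any of the keywords."""
    name = os.path.basename(file_path)
    return name.endswith(".java") and any(k in name for k in controller_keywords)


def ranges_overlap(a_start, a_end, b_start, b_end):
    """True if the inclusive ranges [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end


def select_changed_methods(methods, changed_ranges):
    """Keep the methods whose line span touches at least one changed range."""
    return [
        method
        for method in methods
        if any(
            ranges_overlap(method["start_line"], method["end_line"], start, end)
            for start, end in changed_ranges
        )
    ]
```

For example, a method spanning lines 5 to 12 is selected when the changed ranges include (10, 13), while a method spanning lines 20 to 30 is not.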
To build a standalone executable using PyInstaller:
pyinstaller --onefile --name="CodeParser" --paths="/path/to/code-parser" main.py
Important: The executable requires language/my-languages.so to be present in the same directory as the executable.
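One way to locate the library next to a frozen executable is to branch on the sys.frozen attribute that PyInstaller sets at run time. This is a sketch under assumptions: language_library_path is a hypothetical helper, the language/ subdirectory layout is inferred from the project tree, and falling back to the current working directory when not frozen is a guess at how the script is normally run.

```python
import os
import sys


def language_library_path(base_dir=None):
    """Return the expected location of language/my-languages.so.

    base_dir can be passed explicitly. Otherwise it is the executable's
    directory when running as a PyInstaller bundle, or the current working
    directory when running from source (an assumption).
    """
    if base_dir is None:
        if getattr(sys, "frozen", False):
            # In a frozen app, sys.executable is the bundled executable itself.
            base_dir = os.path.dirname(os.path.abspath(sys.executable))
        else:
            base_dir = os.getcwd()
    return os.path.join(base_dir, "language", "my-languages.so")
```

Checking `os.path.exists(language_library_path())` at startup gives a clearer error than a failed library load deep inside the parser setup.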
code-parser/
├── git/ # Git operation utilities
│ ├── git_diff.py # Git diff execution
│ └── git_show_file.py # File retrieval at specific revision
├── parser/ # Code parsing utilities
│ ├── init_parser.py # Parser initialization
│ ├── parse_git_diff.py # Diff parsing logic
│ └── parse_single_file.py # Java file parsing
├── language/ # Tree-sitter language libraries
│ └── my-languages.so # Compiled language definitions
├── vendor/ # Tree-sitter grammar submodules
│ ├── tree-sitter-java/
│ └── tree-sitter-python/
├── main.py # Main entry point
├── find_changed_controller.py # Controller method finder
├── check_swagger_annotations.py # Swagger validation
└── init_library.py # Language library builder
To support additional languages:
- Add the tree-sitter grammar as a git submodule in vendor/
- Update init_library.py to include the new language
- Run python init_library.py to rebuild the language library
- Create language-specific parsing logic similar to the Java parser
Extend check_swagger_annotations.py to add custom validation rules:
def check_custom_annotation(code):
    # init_java_parser is assumed to be the helper already available in check_swagger_annotations.py
    parser = init_java_parser()
    tree = parser.parse(bytes(code, 'utf8'))
    # Add your custom logic here: walk `tree` and set is_valid and message
    is_valid, message = True, ''
    return is_valid, message

Modify the annotations list in main.py, or pass a different list when calling the functions, to support other annotation patterns:
annotations = ['@MyCustomAnnotation', '@AnotherAnnotation']

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Albert Lv