Skip to content

Conversation

@shubham-n-khanna
Copy link
Contributor

Description

Jinja style in caterpillar

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have checked downstream dependencies (e.g. ExternalTaskSensors) by searching for DAG name elsewhere in the repo

Copilot AI review requested due to automatic review settings December 4, 2025 06:07
@wiz-55ccc8b716
Copy link

wiz-55ccc8b716 bot commented Dec 4, 2025

Wiz Scan Summary

Scanner Findings
Vulnerability Finding Vulnerabilities -
Data Finding Sensitive Data -
Secret Finding Secrets -
IaC Misconfiguration IaC Misconfigurations -
SAST Finding SAST Findings 1 Medium
Total 1 Medium

View scan details in Wiz

To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Jinja-style template support to the caterpillar configuration system, enabling file inclusion and date string templating for more modular and dynamic configuration management.

Key Changes

  • Implements Jinja-style {% include 'path' %} directive processing with recursive inclusion and circular dependency detection
  • Adds ds template function to retrieve date strings from environment variables (DS, EXECUTION_DATE) or current date
  • Integrates include preprocessing before template parsing in the config loading pipeline

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +136 to +148
absPath, err := filepath.Abs(resolvedPath)
if err != nil {
return "", fmt.Errorf("failed to resolve absolute path for %s: %w", filePath, err)
}

if visited[absPath] {
return "", fmt.Errorf("circular include detected: %s", absPath)
}

// Read the included file
includedContent, err := os.ReadFile(resolvedPath)
if err != nil {
return "", fmt.Errorf("failed to read included file %s: %w", filePath, err)
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The processIncludes function is missing validation to prevent directory traversal attacks. While filepath.Clean normalizes paths, it doesn't prevent accessing files outside the intended directory structure. An attacker could use {% include '../../../etc/passwd' %} to read sensitive files.

Consider adding a check after resolving the absolute path to ensure it stays within allowed directories:

// After line 139, add validation
if !strings.HasPrefix(absPath, filepath.Clean(baseDir)) {
    return "", fmt.Errorf("include path %s is outside allowed directory", filePath)
}

Alternatively, maintain a whitelist of allowed base directories for includes.

Copilot uses AI. Check for mistakes.
Comment on lines +169 to +179
func getDateString() (string, error) {
// Check for DS environment variable first (common in data pipelines)
if ds := os.Getenv("DS"); ds != "" {
return ds, nil
}
// Check for EXECUTION_DATE (alternative common name)
if execDate := os.Getenv("EXECUTION_DATE"); execDate != "" {
return execDate, nil
}
// Default to current date
return time.Now().Format("2006-01-02"), nil
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getDateString function always returns nil error but is declared to return an error. Since this function never fails (both environment variable reads and time.Now().Format() cannot fail), the error return type is unnecessary and inconsistent with similar functions like getEnvironmentVariable which also returns (string, error) but could be simplified.

Consider either:

  1. Removing the error return type: func getDateString() string
  2. Validating the date format from environment variables and returning an error if invalid

If you keep the error return, you should validate that DS/EXECUTION_DATE values are in the expected YYYY-MM-DD format.

Suggested change
func getDateString() (string, error) {
// Check for DS environment variable first (common in data pipelines)
if ds := os.Getenv("DS"); ds != "" {
return ds, nil
}
// Check for EXECUTION_DATE (alternative common name)
if execDate := os.Getenv("EXECUTION_DATE"); execDate != "" {
return execDate, nil
}
// Default to current date
return time.Now().Format("2006-01-02"), nil
func getDateString() string {
// Check for DS environment variable first (common in data pipelines)
if ds := os.Getenv("DS"); ds != "" {
return ds
}
// Check for EXECUTION_DATE (alternative common name)
if execDate := os.Getenv("EXECUTION_DATE"); execDate != "" {
return execDate
}
// Default to current date
return time.Now().Format("2006-01-02")

Copilot uses AI. Check for mistakes.
Comment on lines +159 to +161
// Replace the include directive with the file content
result = strings.ReplaceAll(result, includePath, processedContent)
}
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using strings.ReplaceAll to replace include directives could cause incorrect replacements if the same include directive appears multiple times or if the included content itself contains text that matches other include directives in the original file.

For example, if file A has {% include 'common.yaml' %} twice, both will be replaced with the content, which is correct. However, if the included content also contains the text {% include 'common.yaml' %} as a literal string (not meant to be processed), it will be incorrectly replaced in subsequent iterations.

Consider using a different approach:

  1. Replace includes one at a time in order
  2. Build the result string progressively by replacing only the first occurrence at each iteration
  3. Or process matches in reverse order to avoid position shifts
Suggested change
// Replace the include directive with the file content
result = strings.ReplaceAll(result, includePath, processedContent)
}
// Replace only the first occurrence of the include directive with the file content
if idx := strings.Index(result, includePath); idx != -1 {
result = result[:idx] + processedContent + result[idx+len(includePath):]
}

Copilot uses AI. Check for mistakes.

var (
// Jinja-style include directive: {% include 'path/to/file' %}
includeRegex = regexp.MustCompile(`\{\%\s*include\s+['"]([^'"]+)['"]\s*\%\}`)
Copy link

Copilot AI Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern \{\%\s*include\s+['"]([^'"]+)['"]\s*\%\} doesn't properly match Jinja-style include directives that may have mixed quotes. The pattern [^'"] will match until it finds either a single or double quote, which could lead to incorrect parsing if quotes are mixed or nested.

For example, the pattern would incorrectly parse:

  • {% include "file's_name.yaml" %} (stops at the apostrophe in "file's")
  • {% include 'file"name.yaml' %} (stops at the internal quote)

Consider using separate patterns for single and double quotes:

includeRegex = regexp.MustCompile(`\{\%\s*include\s+(?:'([^']+)'|"([^"]+)")\s*\%\}`)

Then handle both capture groups in the matching logic.

Suggested change
includeRegex = regexp.MustCompile(`\{\%\s*include\s+['"]([^'"]+)['"]\s*\%\}`)
includeRegex = regexp.MustCompile(`\{\%\s*include\s+(?:'([^']+)'|"([^"]+)")\s*\%\}`)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants