Skip to content

Fix eval() security vulnerability in transform operations #26

@AliiiBenn

Description

@AliiiBenn

Title: Fix eval() security vulnerability in transform operations

Problem Description

The transform command uses eval() to evaluate user-provided expressions, which poses a security risk even with current safeguards. The validation is not comprehensive enough to prevent all attack vectors.

Current Behavior

In excel_toolkit/operations/transforming.py (line 148):

result = eval(expression, {"__builtins__": {}}, series_dict)

While the code blocks dangerous patterns like import, exec, __, etc., there are still potential attack vectors:

  1. Attribute access bypass: (1).__class__.__base__ can access Python internals
  2. Lambda expressions: Can be constructed to bypass filters
  3. List comprehensions: May allow arbitrary code execution
  4. Operator overloading: Could exploit dunder methods

Real-World Risk Scenarios

Scenario 1: Information disclosure

# Attacker could use:
xl transform data.xlsx --col "Amount" --expr "(1).__class__.__base__.__subclasses__()"

# This would reveal all available classes in the system

Scenario 2: Resource exhaustion

# Attacker could create expressions that consume massive resources
xl transform data.xlsx --col "Amount" --expr "([1 for _ in range(1000000)], 0)[1]"

Scenario 3: Bypassing filters

# Current filters block: import, exec, eval, __, org., sys., etc.
# But don't block: __class__, __base__, __subclasses__, etc.

Affected Files

  • excel_toolkit/operations/transforming.py (lines 28-46, 148-154)
  • excel_toolkit/commands/transform.py

Proposed Solutions

Option 1: Use AST Parsing and Whitelisting (Recommended)

import ast

def validate_expression_ast(expression: str) -> Result[None, InvalidExpressionError]:
    """Validate expression using AST parsing."""

    try:
        tree = ast.parse(expression, mode='eval')
    except SyntaxError:
        return err(InvalidExpressionError("Invalid syntax"))

    # Only allow specific node types
    allowed_nodes = {
        ast.Expression, ast.BinOp, ast.UnaryOp,
        ast.Compare, ast.BoolOp,
        ast.Name, ast.Constant, ast.Num, ast.Str,
        ast.Load, ast.Add, ast.Sub, ast.Mult, ast.Div,
        ast.Mod, ast.Pow, ast.USub, ast.UAdd,
        ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
        ast.And, ast.Or, ast.Not
    }

    for node in ast.walk(tree):
        if type(node) not in allowed_nodes:
            return err(InvalidExpressionError(
                f"Operation '{type(node).__name__}' is not allowed"
            ))

    return ok(None)

Option 2: Use Simple Math Expression Parser

Replace eval() with a safe expression parser:

import simpleeval

def evaluate_expression_safe(expression: str, variables: dict) -> Result[Any, InvalidExpressionError]:
    """Evaluate expression using safe parser."""

    try:
        # simpleeval only allows basic math operations
        evaluator = simpleeval.SimpleEval()
        evaluator.names.update(variables)
        result = evaluator.eval(expression)
        return ok(result)
    except Exception as e:
        return err(InvalidExpressionError(str(e)))

Requires adding dependency: simpleeval>=1.0.0

Option 3: Implement Custom Expression Language

Create a domain-specific language with only safe operations:

# Supported syntax:
# - Column references: Amount, Price
# - Basic math: Amount * 2, Price + 10
# - Functions: sum(Amount), avg(Price), round(Amount, 2)
# - Comparisons: Amount > 100, Price == 50

# Not supported:
# - Any Python code
# - Function calls
# - Imports
# - Complex expressions

Option 4: Add Comprehensive Input Validation

Keep eval() but add much stronger validation:

DANGEROUS_PATTERNS = [
    r'\bimport\b', r'\bexec\b', r'\beval\b',
    r'__class__', r'__base__', r'__subclasses__',
    r'\.__', r'\[.*__.*\]',  # Access to dunder methods
    r'\blambda\b',  # Lambda expressions
    r'\[.*for.*in.*\]',  # List comprehensions
    r'\bgetattr\b', r'\bsetattr\b', r'\bhasattr\b',
    r'\btype\b', r'\bisinstance\b',
]

Additional Safeguards

  1. Sandbox execution:
import RestrictedPython

def eval_restricted(expression: str, variables: dict):
    """Evaluate in restricted environment."""
    compiled = RestrictedPython.compile_restricted(expression, '<string>', 'eval')
    return eval(compiled.code, {'__builtins__': {}}, variables)
  1. Resource limits:
import signal

def timeout_handler(signum, frame):
    raise TimeoutError("Expression evaluation timed out")

signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(1)  # 1 second timeout
result = eval(expression, {...}, variables)
signal.alarm(0)
  1. Expression complexity limit:
def max_expression_complexity(expression: str, max_length: int = 100):
    """Limit expression complexity."""
    if len(expression) > max_length:
        raise InvalidExpressionError("Expression too complex")

Impact

  • Severity: Critical security vulnerability
  • Affected users: All users of the transform command
  • Exploitability: Medium (requires user to run malicious expression)
  • Risk: Code execution, information disclosure, DoS

Related Issues

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingcriticalCritical priority issues that block functionality

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions