Performance: hcl2.dumps() is slow for large generated configs due to Lark reconstruction pipeline #300

@alex-au-922

Description

Summary

hcl2.dumps() becomes slow when generating large HCL documents from Python dictionaries or Builder output.

From local profiling, the slow path appears to be generation/reconstruction rather than parsing. The current pipeline builds a LarkElement tree, formats it, converts it into raw lark.Tree / lark.Token objects, then recursively reconstructs text.

This creates a large number of Python and Lark objects for data that is already structured.
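The overhead of that extra hop can be shown in isolation: building a throwaway node tree and then walking it back into text does strictly more work than emitting strings directly from the structured input. A minimal, library-independent sketch (the Node class and both serializers are hypothetical stand-ins for the LarkElement → lark.Tree → text pipeline, not python-hcl2 code):

```python
import time

class Node:
    """Stand-in for an IR node (hypothetical, not python-hcl2's LarkElement)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def via_tree(pairs):
    # Stage 1: build an intermediate tree of Python objects...
    root = Node("body", [Node("attr", [Node(k), Node("="), Node(v)])
                         for k, v in pairs])
    # Stage 2: ...then recursively flatten it back into text.
    def walk(n):
        return n.label if not n.children else " ".join(walk(c) for c in n.children)
    return "\n".join(walk(c) for c in root.children)

def direct(pairs):
    # Single pass: emit text straight from the structured input.
    return "\n".join(f"{k} = {v}" for k, v in pairs)

pairs = [(f"key_{i}", f'"value_{i}"') for i in range(50_000)]
for fn in (via_tree, direct):
    t0 = time.perf_counter()
    fn(pairs)
    print(fn.__name__, round(time.perf_counter() - t0, 3), "s")
assert via_tree(pairs) == direct(pairs)
```

Both paths produce identical output; the tree path just pays for allocating and then re-traversing an object per token.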

Environment

  • python-hcl2 version / commit: main at e76c9ae
  • Python: 3.13.5
  • OS: macOS arm64

Example benchmark

Synthetic Terraform-like document with 500 resource blocks:

deserialize      ~351 ms
format            ~98 ms
to_lark          ~676 ms
reconstruct      ~142 ms
total           ~1267 ms

Skipping the format pass reduced the same case to roughly:

total            ~784 ms

In one 500-resource sample, formatting increased the IR from about 179k nodes to 228k nodes, and to_lark() then copied those nodes into new Lark objects.
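For reproduction, a generator along these lines can drive the benchmark (the exact document shape used in the original numbers is an assumption; the timed call is guarded because it needs python-hcl2 at a commit where hcl2.dumps() exists):

```python
import time

def make_doc(n_resources: int) -> dict:
    """Build a Terraform-like dict with n_resources resource blocks.
    (Illustrative shape; the profiled document may have differed.)"""
    resources = []
    for i in range(n_resources):
        resources.append({
            "aws_instance": {
                f"web_{i}": {
                    "ami": "ami-12345678",
                    "instance_type": "t3.micro",
                    "tags": {"Name": f"web-{i}", "Index": i},
                }
            }
        })
    return {"resource": resources}

doc = make_doc(500)

# Time the public entry point under test (requires python-hcl2 with dumps()):
try:
    import hcl2
    t0 = time.perf_counter()
    text = hcl2.dumps(doc)
    print(f"dumps: {time.perf_counter() - t0:.3f} s, {len(text)} chars")
except ImportError:
    print("python-hcl2 not installed; skipping the timed run")
```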

Observed hot spots

Relevant code paths:

  • hcl2.api.dumps() calls from_dict() then reconstruct()
  • from_dict() applies BaseFormatter by default
  • reconstruct() converts StartRule to raw Lark via tree.to_lark()
  • LarkRule.to_lark() and LarkToken.to_lark() allocate new lark.Tree / lark.Token objects for the whole document
  • expression strings such as "${var.x}" are reparsed as small HCL snippets in _deserialize_expression()
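A low-risk mitigation for the last item would be memoizing the snippet parse, since generated configs tend to repeat the same expression strings across many blocks. A sketch with a stand-in parser (parse_expression here is hypothetical; the real hook would be _deserialize_expression):

```python
from functools import lru_cache

def parse_expression(src: str):
    """Stand-in for the small-snippet HCL parse in _deserialize_expression."""
    # Pretend this is an expensive Lark parse; return a trivial token tuple.
    return ("interpolation", src[2:-1]) if src.startswith("${") else ("literal", src)

@lru_cache(maxsize=4096)
def parse_expression_cached(src: str):
    # Identical strings ("${var.x}" repeated across 500 blocks) parse once.
    return parse_expression(src)

assert parse_expression_cached("${var.x}") == ("interpolation", "var.x")
assert parse_expression_cached.cache_info().hits == 0
parse_expression_cached("${var.x}")
assert parse_expression_cached.cache_info().hits == 1
```

Caveat: caching only works if downstream code treats the returned parse result as immutable (or deep-copies it before mutation), since lru_cache hands back the same object on every hit.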

Metadata

Labels

  • builder — Issue is related to creating HCL2 content with Builder class
  • deserialization — Python dict to LarkElement IR (reverse path)
  • enhancement — New feature or request
