Enhanced DOCX Template System

FileUtils now includes a comprehensive DOCX template system that provides:

  • Template Support: Use existing DOCX templates with custom styles
  • Markdown Conversion: Convert markdown content to professionally formatted DOCX
  • Style Mapping: Customize how elements are styled in the output
  • Reviewer Workflow: Built-in support for document review processes
  • Provenance Tracking: Automatic metadata and source tracking

Quick Start

from FileUtils import FileUtils, OutputFileType

# Initialize with template configuration
file_utils = FileUtils(
    config_override={
        "docx_templates": {
            "template_dir": "templates",
            "templates": {
                "default": "style-template-doc.docx",  # Generic template
                "personal": "IP-template-doc.docx"      # Personal template
            }
        },
        "style_mapping": {
            "table": "IP-table_light",
            "heading_1": "Heading 1"
        }
    }
)

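# Markdown source to convert
markdown_content = "# Project Report\n\nStatus: on track."

# Convert markdown to DOCX with a template configured above
file_utils.save_document_to_storage(
    content=markdown_content,
    output_filetype=OutputFileType.DOCX,
    template="default",
    add_provenance=True,
    add_reviewer_instructions=True
)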

Features

1. Template Management

The template system supports multiple templates and automatic fallback:

# List available templates
from FileUtils.templates import DocxTemplateManager

template_manager = DocxTemplateManager(file_utils.config)
templates = template_manager.list_available_templates()
print(f"Available templates: {list(templates.keys())}")

# Get template information
info = template_manager.get_template_info("default")
print(f"Template info: {info}")

2. Markdown to DOCX Conversion

Convert markdown content to professionally formatted DOCX:

markdown_content = """# Project Report

## Key Findings

- **Important**: We've achieved 95% completion
- [ ] Complete final testing
- [x] Update documentation

| Metric | Value | Status |
|--------|-------|--------|
| Progress | 95% | ✅ On Track |
| Budget | $45,000 | ✅ Under Budget |

## Next Steps

1. Complete testing phase
2. Prepare documentation
3. Schedule review
"""

# Convert with template and options
saved_path, _ = file_utils.save_document_to_storage(
    content=markdown_content,
    output_filetype=OutputFileType.DOCX,
    template="review",
    add_provenance=True,
    add_reviewer_instructions=True,
    source_file="project_report.md"
)

3. Style Mapping

Customize how elements are styled in the output:

style_mapping = {
    "table": "IP-table_light",        # Custom table style
    "table_fallback": "IP-table",      # Fallback table style
    "heading_1": "Heading 1",         # Heading styles
    "list_bullet": "List Bullet",     # List styles
    "list_number": "List Number"
}

file_utils = FileUtils(
    config_override={"style_mapping": style_mapping}
)

4. Reviewer Workflow Support

Built-in support for document review processes:

# Enable reviewer instructions
file_utils.save_document_to_storage(
    content=content,
    output_filetype=OutputFileType.DOCX,
    template="review",
    add_reviewer_instructions=True,  # Adds reviewer instructions section
    add_provenance=True              # Adds provenance header
)

The reviewer instructions include:

  • TODO item resolution guidelines
  • Resolution field requirements
  • Review process steps
  • Document modification instructions

5. Provenance Tracking

Automatic metadata and source tracking:

# Automatic provenance header
file_utils.save_document_to_storage(
    content=content,
    output_filetype=OutputFileType.DOCX,
    source_file="source.md",  # Source file for provenance
    add_provenance=True       # Adds "Autogenerated from source.md on 2024-01-15"
)
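
The provenance line quoted in the comment above can be approximated with a few lines of standard-library Python. This is only a sketch: `build_provenance_line` is a hypothetical helper, not a FileUtils function, and the exact wording FileUtils writes may differ.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def build_provenance_line(source_file: str, on: Optional[date] = None) -> str:
    """Return a one-line provenance note for a generated document.

    Hypothetical helper for illustration; not part of the FileUtils API.
    """
    stamp = (on or date.today()).isoformat()  # ISO date such as 2024-01-15
    # Only the file name is shown, not the directory it came from.
    return f"Autogenerated from {Path(source_file).name} on {stamp}"


print(build_provenance_line("docs/source.md", date(2024, 1, 15)))
```

Passing the date explicitly keeps the output reproducible in tests; leaving it out uses today's date.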

Configuration

Template Configuration

Configure templates in your FileUtils configuration:

# Default configuration (generic template for sharing)
docx_templates:
  template_dir: "templates"
  default_template: "style-template-doc.docx"
  templates:
    default: "style-template-doc.docx"  # Generic template
    review: "style-template-doc.docx"
    report: "style-template-doc.docx"
    ip_template: "IP-template-doc.docx"  # Personal IP template
    simple: null  # Use default document

style_mapping:
  table: "IP-table_light"
  table_fallback: "IP-table"
  table_default: "Table Grid"
  heading_1: "Heading 1"
  heading_2: "Heading 2"
  list_bullet: "List Bullet"
  list_number: "List Number"

markdown_options:
  add_provenance: true
  add_reviewer_instructions: false
  preserve_formatting: true
  checkbox_symbols:
    unchecked: ""
    checked: ""
    font: "Segoe UI Symbol"
    size: 12

Switching to Personal Template

For personal use with your IP template, you can easily switch:

Option 1: Override in code

from FileUtils import FileUtils

file_utils = FileUtils(
    config_override={
        "docx_templates": {
            "default_template": "IP-template-doc.docx",
            "templates": {
                "default": "IP-template-doc.docx"
            }
        }
    }
)

Option 2: Use personal config file

Copy src/FileUtils/templates/config/personal_template_config.yaml to override the default configuration.

Option 3: Use specific template

# Use IP template for specific documents
file_utils.save_document_to_storage(
    content=markdown_content,
    output_filetype=OutputFileType.DOCX,
    template="ip_template"  # Use IP template specifically
)

Programmatic Configuration

config = {
    "docx_templates": {
        "template_dir": "custom_templates",
        "templates": {
            "corporate": "corporate-template.docx",
            "technical": "technical-template.docx"
        }
    },
    "style_mapping": {
        "table": "Corporate-Table",
        "heading_1": "Corporate-Heading-1"
    }
}

file_utils = FileUtils(config_override=config)
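
How `config_override` is combined with the default configuration is not spelled out in this document. If it is a recursive merge (an assumption), the effect would look like the sketch below, where keys you override win and untouched defaults such as the "review" template remain available. `deep_merge` is a hypothetical helper written for this illustration, not a FileUtils function.

```python
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `defaults`.

    Hypothetical illustration; FileUtils may merge its configuration differently.
    """
    merged = dict(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = value
    return merged


defaults = {
    "docx_templates": {
        "template_dir": "templates",
        "templates": {"default": "style-template-doc.docx", "review": "style-template-doc.docx"},
    }
}
override = {
    "docx_templates": {
        "template_dir": "custom_templates",
        "templates": {"corporate": "corporate-template.docx"},
    }
}
effective = deep_merge(defaults, override)
print(sorted(effective["docx_templates"]["templates"]))
```

If the real behavior is a shallow replacement instead, only the templates listed in the override would be available, so it is worth checking with `list_available_templates()`.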

Supported Markdown Features

The markdown converter supports:

Headings

# Heading 1
## Heading 2
### Heading 3

Lists

- Bullet point 1
- Bullet point 2
  - Nested bullet

1. Numbered item 1
2. Numbered item 2

Checkboxes

- [ ] Unchecked item
- [x] Checked item
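
To make the checkbox syntax concrete, here is a minimal sketch of how a task-list line can be recognized. It is not the parser FileUtils uses; `parse_checkbox` and `CHECKBOX_RE` are names invented for this example.

```python
import re
from typing import Optional, Tuple

# Matches "- [ ] text" and "- [x] text"; also accepts "*" bullets and an uppercase "X".
CHECKBOX_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def parse_checkbox(line: str) -> Optional[Tuple[bool, str]]:
    """Return (checked, text) for a task-list line, or None for any other line."""
    match = CHECKBOX_RE.match(line)
    if match is None:
        return None
    return match.group(1).lower() == "x", match.group(2).strip()


for sample in ("- [ ] Unchecked item", "- [x] Checked item", "- Bullet point 1"):
    print(repr(sample), "->", parse_checkbox(sample))
```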

Tables

| Column 1 | Column 2 | Column 3 |
|----------|----------|----------|
| Data 1   | Data 2   | Data 3   |
| Data 4   | Data 5   | Data 6   |
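
As an illustration of what a converter has to do with the table syntax above, the following sketch splits pipe-table lines into rows of cells. It is a simplified stand-in, not the FileUtils implementation, and `parse_pipe_table` is a hypothetical name; escaped pipes inside cells are not handled.

```python
from typing import List


def parse_pipe_table(lines: List[str]) -> List[List[str]]:
    """Split markdown pipe-table lines into rows of cell strings.

    The separator row (|---|---|) is dropped, so the first returned row is the header.
    """
    rows = []
    for line in lines:
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # The separator row contains only dashes and optional alignment colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


table = [
    "| Column 1 | Column 2 | Column 3 |",
    "|----------|----------|----------|",
    "| Data 1   | Data 2   | Data 3   |",
]
header, *body = parse_pipe_table(table)
print(header, body)
```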

Formatting

**Bold text**
*Italic text*
`Code text`
[Link text](https://example.com)

Line Breaks

Line 1<br>Line 2

Template Requirements

Template Structure

Your DOCX templates should:

  1. Be Regular DOCX Files: Use standard .docx files (not .dotx template files)
  2. Contain Styles: Include the styles you want to use (e.g., "IP-table_light", "Heading 1")
  3. Be Clean: Template content will be cleared, but styles are preserved
  4. Be Accessible: Place templates in the configured template directory
  5. Include Headers/Footers: Headers and footers from templates are automatically preserved

Important: FileUtils uses regular .docx files as templates, not Microsoft Word .dotx template files. The system loads the DOCX file, clears its content, and preserves the styles for use in the generated documents.
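
If you are unsure whether a file is a regular .docx or a .dotx saved under the wrong extension, the package itself can be inspected with the standard library. The sketch below is a heuristic written for this guide, not part of FileUtils: `looks_like_docx` is a hypothetical helper, and it assumes the main document part is stored at the conventional `word/document.xml` location.

```python
import zipfile

# Content type that a .docx declares for its main document part.
DOCX_MAIN = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"


def looks_like_docx(path: str) -> bool:
    """Heuristically check that `path` is a .docx package rather than a .dotx."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        if "[Content_Types].xml" not in names or "word/document.xml" not in names:
            return False
        content_types = package.read("[Content_Types].xml").decode("utf-8", "replace")
    # A .dotx declares "...wordprocessingml.template.main+xml" instead.
    return DOCX_MAIN in content_types
```

Usage: `looks_like_docx("templates/style-template-doc.docx")` returns True for a regular Word document and False for a template package or a non-zip file.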

Headers and Footers: The system automatically preserves headers and footers from your template files. This means you can create templates with company logos, page numbers, document titles, or any other header/footer content, and they will be maintained in the generated documents.

Template Files: The generic template (style-template-doc.docx) is included in the repository for sharing, while personal templates (like IP-template-doc.docx) remain private and are ignored by git.

Style Names

The system looks for these style names (configurable):

  • Tables: "IP-table_light", "IP-table", "Table Grid"
  • Headings: "Heading 1", "Heading 2", etc.
  • Lists: "List Bullet", "List Number", "List Paragraph"
  • Text: "Normal", "Title", "Subtitle"
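
The table entries above form a fallback chain. A minimal sketch of that selection logic, assuming the chain is tried in the order `table`, `table_fallback`, `table_default` (the key names come from the configuration section; the function itself is hypothetical and not a FileUtils API):

```python
from typing import Dict, Iterable, Optional


def pick_table_style(style_mapping: Dict[str, str], available: Iterable[str]) -> Optional[str]:
    """Return the first configured table style that the template provides.

    Returns None when none of the configured styles exist in the template.
    """
    available_styles = set(available)
    for key in ("table", "table_fallback", "table_default"):
        name = style_mapping.get(key)
        if name and name in available_styles:
            return name
    return None


mapping = {"table": "IP-table_light", "table_fallback": "IP-table", "table_default": "Table Grid"}
print(pick_table_style(mapping, ["Normal", "IP-table", "Table Grid"]))
```

The list of available styles would come from the template itself, for example the `available_styles` entry returned by `get_template_info`.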

Template Locations

Templates are searched in this order:

  1. {template_dir}/{template_filename}
  2. templates/{template_filename}
  3. data/templates/{template_filename}
  4. {template_filename} (current directory)
  5. src/conversion/templates/{template_filename} (legacy location)
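
The search order above can be sketched as a simple loop over candidate paths. This is an illustration rather than the FileUtils implementation: `candidate_paths` and `find_template` are hypothetical names, and the `root` parameter (the directory that relative paths are resolved against) is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def candidate_paths(template_filename: str, template_dir: str = "templates") -> List[Path]:
    """List the locations to try, in the documented order."""
    return [
        Path(template_dir) / template_filename,
        Path("templates") / template_filename,
        Path("data/templates") / template_filename,
        Path(template_filename),  # current directory
        Path("src/conversion/templates") / template_filename,  # legacy location
    ]


def find_template(template_filename: str, template_dir: str = "templates", root: str = ".") -> Optional[Path]:
    """Return the first candidate that exists under `root`, or None if none do."""
    for candidate in candidate_paths(template_filename, template_dir):
        full_path = Path(root) / candidate
        if full_path.is_file():
            return full_path
    return None
```

Because the configured `template_dir` is tried first, a file placed there shadows a file with the same name in any later location.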

Advanced Usage

Custom Style Mapper

from FileUtils.templates import StyleMapper

# Create custom style mapper
style_mapper = StyleMapper({
    "table": "My-Custom-Table",
    "heading_1": "My-Heading-1"
})

# Use with converter
from FileUtils.templates import MarkdownToDocxConverter

converter = MarkdownToDocxConverter(template_manager, style_mapper)
doc = converter.convert_markdown_to_docx(markdown_content)

Direct Template Usage

# Use specific template without markdown conversion
file_utils.save_document_to_storage(
    content={
        "title": "Document Title",
        "sections": [
            {
                "heading": "Section 1",
                "level": 1,
                "text": "Section content"
            }
        ]
    },
    output_filetype=OutputFileType.DOCX,
    template="corporate"
)

Template Validation

from FileUtils.templates import DocxTemplateManager

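template_manager = DocxTemplateManager(file_utils.config)

# Validate template (example path; point it at your own template file)
template_path = "templates/style-template-doc.docx"
is_valid = template_manager.validate_template(template_path)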

# Get template information including headers/footers
info = template_manager.get_template_info("default")
print(f"Available styles: {info['available_styles']}")
print(f"Has headers: {info['headers_footers']['has_headers']}")
print(f"Has footers: {info['headers_footers']['has_footers']}")
print(f"Header count: {info['headers_footers']['header_count']}")
print(f"Footer count: {info['headers_footers']['footer_count']}")

Flexible Template Referencing: The system supports multiple ways to reference templates:

  • Template names: "default", "report", "ip_template"
  • Filenames: "style-template-doc.docx", "IP-template-doc.docx"
  • Names without extension: "style-template-doc", "IP-template-doc"
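
One way to picture the three reference forms is as a lookup that tries each in turn. The sketch below is an assumption about the order of checks, not the FileUtils implementation, and `resolve_template_reference` is a hypothetical name.

```python
from typing import Dict, Optional


def resolve_template_reference(reference: str, templates: Dict[str, Optional[str]]) -> Optional[str]:
    """Map a template name, filename, or extension-less filename to a filename.

    Returns None both for unknown references and for names configured as null,
    which in either case means "use the default document".
    """
    # 1. Configured template name, e.g. "default" or "ip_template".
    if reference in templates:
        return templates[reference]
    filenames = [name for name in templates.values() if name]
    # 2. Exact filename, e.g. "IP-template-doc.docx".
    if reference in filenames:
        return reference
    # 3. Filename without the .docx extension, e.g. "IP-template-doc".
    with_extension = f"{reference}.docx"
    if with_extension in filenames:
        return with_extension
    return None


templates = {
    "default": "style-template-doc.docx",
    "ip_template": "IP-template-doc.docx",
    "simple": None,
}
for ref in ("default", "IP-template-doc.docx", "IP-template-doc", "missing"):
    print(ref, "->", resolve_template_reference(ref, templates))
```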

Error Handling

The system includes comprehensive error handling:

  • Template Not Found: Falls back to default document
  • Style Not Available: Uses fallback styles or defaults
  • Import Errors: Graceful degradation if template system unavailable
  • Invalid Content: Clear error messages for debugging

Migration from Old Script

If you're migrating from the old markdown_to_docx.py script:

Old Script Features → FileUtils Features

| Old Script | FileUtils |
|------------|-----------|
| MarkdownToDocxConverter | MarkdownToDocxConverter (enhanced) |
| template_file parameter | template parameter |
| Hardcoded styles | Configurable style_mapping |
| Manual template clearing | Automatic template management |
| Basic error handling | Comprehensive error handling |
| Single file conversion | Integrated with FileUtils workflow |

Migration Steps

  1. Move Templates: Place your templates in data/templates/
  2. Update Configuration: Add template config to FileUtils
  3. Update Code: Replace direct converter usage with FileUtils methods
  4. Test: Verify output matches your expectations

Example Migration

Old Script:

converter = MarkdownToDocxConverter("style-template-doc.docx")
converter.convert_file("input.md", "output.docx")

FileUtils:

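from pathlib import Path

from FileUtils import FileUtils, OutputFileType

file_utils = FileUtils(config_override={
    "docx_templates": {
        "templates": {"default": "style-template-doc.docx"}
    }
})

# Read the markdown that the old script took as a file path
markdown_content = Path("input.md").read_text(encoding="utf-8")

file_utils.save_document_to_storage(
    content=markdown_content,
    output_filetype=OutputFileType.DOCX,
    template="default"
)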

Best Practices

Template Design

  1. Use Clear Style Names: Make style names descriptive and consistent
  2. Test Fallbacks: Ensure fallback styles work if custom styles unavailable
  3. Keep Templates Clean: Remove unnecessary content, keep only styles
  4. Document Styles: Document which styles your templates provide

Configuration Management

  1. Environment-Specific: Use different templates for different environments
  2. Version Control: Keep templates in version control
  3. Validation: Validate templates before deployment
  4. Documentation: Document template requirements and usage

Content Preparation

  1. Clean Markdown: Use consistent markdown formatting
  2. Test Conversion: Test with various content types
  3. Review Output: Always review generated documents
  4. Iterate: Refine templates based on output quality

Troubleshooting

Common Issues

Template Not Found

Template 'custom' not found in configuration

Solution: Check that the template name is listed under docx_templates.templates and that the file exists in one of the template locations

Style Not Available

Style 'IP-table_light' not found, using fallback

Solution: Check template has the required styles or update style mapping

Import Errors

Template system not available, using simple conversion

Solution: Ensure python-docx is installed: pip install 'FileUtils[documents]'

Debug Mode

Enable debug logging to see template system activity:

file_utils = FileUtils(log_level="DEBUG")

Template Validation

Validate your templates:

from FileUtils.templates import DocxTemplateManager

template_manager = DocxTemplateManager(file_utils.config)
info = template_manager.get_template_info("default")
print(f"Template validation: {info}")

Examples

See examples/enhanced_docx.py for comprehensive examples including:

  • Markdown conversion with templates
  • Structured content creation
  • Template management
  • Configuration options
  • Error handling

Future Enhancements

Planned enhancements include:

  • Additional Templates: More built-in templates
  • Style Customization: More granular style control
  • Batch Processing: Convert multiple markdown files
  • Template Editor: GUI for template management
  • Advanced Formatting: More markdown features
  • Export Options: Additional output formats