Concatext

Concatext is a utility for concatenating the contents of a local directory into structured text files, each capped at a configurable token limit. It is designed to turn codebases and directories into a format that LLMs (Large Language Models) and other text processing tools can easily consume.

Features

Command Line Interface - Process directories through a simple command-line tool
Graphical User Interface - User-friendly GUI for configuring and running processes
Smart Token Management - Automatically splits output into multiple files based on token count
Configurable Output - Control how files are formatted in the output
Customizable Filtering - Skip specific directories and file patterns
Output Templating - Custom formatting of file blocks with support for placeholders
Non-text File Handling - Option to include or exclude binary/non-text files
Text Obscuration - Replace sensitive words with placeholders to protect private information

Requirements

Python 3.6+
NLTK library (for tokenization)
PyYAML (for configuration)
Tkinter (for GUI)

Installation

Clone the repository
Install the required dependencies:
```
pip install nltk pyyaml
```
Ensure Tkinter is installed (usually comes with Python)
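
One way to confirm that the dependencies listed above are importable before running either script is a small check like the following. This is a sketch and not part of Concatext; `missing_modules` is a hypothetical helper name. PyYAML is imported as `yaml`, and Tkinter as `tkinter` on Python 3.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names, not pip package names: PyYAML is "yaml", Tkinter is "tkinter".
missing = missing_modules(["nltk", "yaml", "tkinter"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```

If `tkinter` is reported missing, it usually has to be installed through the operating system's package manager rather than pip.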

Usage

Command Line Interface

Process a directory with default settings:

python concatext.py /path/to/directory

The script reads its settings from config.yaml in the current directory. A directory path given on the command line takes precedence over the one set in the config file.
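
The precedence rule can be sketched as below. This is an illustration, not the code in concatext.py: the function name `resolve_input_dir` and the config key `input_dir` are assumptions, since the README does not name the key that holds the directory.

```python
def resolve_input_dir(config, argv):
    """Pick the directory to process: command-line argument first, then config.

    `config` is the dict loaded from config.yaml and `argv` is sys.argv.
    The key name "input_dir" is a hypothetical stand-in for illustration.
    """
    if len(argv) > 1:
        return argv[1]
    directory = config.get("input_dir")
    if not directory:
        raise ValueError("no directory given on the command line or in the config")
    return directory


config = {"input_dir": "./from_config", "max_tokens": 250000}
print(resolve_input_dir(config, ["concatext.py", "/path/to/directory"]))
# /path/to/directory
print(resolve_input_dir(config, ["concatext.py"]))
# ./from_config
```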

Graphical User Interface

Launch the GUI application:

python concatext_gui.py

With the GUI, you can:

Select input and output directories
Configure token limits
Edit ignored directories and file patterns
Customize file templates and separators
Process directories with a click of a button

GUI Text Obscuration

Using the graphical interface, you can define mappings for text obscuration:

Click the "Edit" button in the "Obscuration" section under Advanced Configuration
Add a word-to-placeholder mapping for each sensitive word
- Example: adding password → XXXXX replaces every occurrence of "password" in the files with "XXXXX"
- Example: adding admin@example.com → [EMAIL] replaces every instance of that email address with "[EMAIL]"
Every occurrence of a mapped word is replaced with its placeholder in the output files

This feature is useful for:

Protecting sensitive information when sharing code
Removing personally identifiable information
Anonymizing data before processing
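
The word-to-placeholder replacement described above can be sketched as follows. This is a minimal illustration, not Concatext's own implementation: `obscure` is a hypothetical function, and it assumes plain case-sensitive substring matching, which the README does not specify.

```python
import re


def obscure(text, mapping):
    """Replace every occurrence of each key in `mapping` with its placeholder."""
    if not mapping:
        # An empty or missing mapping leaves the text unchanged.
        return text
    # Longest words first, so a longer entry wins over a shorter one it contains.
    words = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in words))
    # A single pass means inserted placeholders are never scanned again.
    return pattern.sub(lambda match: mapping[match.group(0)], text)


mapping = {"password": "XXXXX", "admin@example.com": "[EMAIL]"}
print(obscure("password reset sent to admin@example.com", mapping))
# XXXXX reset sent to [EMAIL]
```

Substring matching also rewrites longer words that contain a mapped word (for example "passwords" becomes "XXXXXs"), which is worth keeping in mind when choosing entries.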

Configuration

Concatext can be configured using YAML configuration files:

config.yaml - Configuration for the command-line version
config_gui.yaml - Configuration for the GUI version

Available Configuration Options

# Maximum number of tokens per output file
max_tokens: 250000

# Directory to save output files
output_dir: "./output"

# Include files that are not readable as text (can't be decoded)
include_non_text_files: true

# Placeholder text for non-text files
non_text_file_placeholder: "This file format is not supported or cannot be decoded."

# Separator text between files
file_separator: "\n\n"

# Directories to ignore
ignore_dirs:
  - node_modules
  - .git
  - .vscode

# File patterns to ignore
ignore_patterns:
  - ".DS_Store"
  - ".gitignore"
  - "package-lock.json"

# Template for formatting file blocks
file_template: |
  ========================================
  {path} - start
  ***========================================***
  {content}
  ***========================================***
  {name} - end
  ========================================

# Obscured words configuration
# Define mappings to replace sensitive words with placeholders in the output
# format: word: placeholder
obscured_words:
  # Example (uncomment and modify as needed):
  # password: "XXXXX"
  # username: "USER"
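
To show how the filtering and non-text options above might be applied to each file, here is a small sketch. It is not taken from concatext.py: `is_ignored` and `decode_or_placeholder` are hypothetical helpers, patterns are assumed to be matched against the file name with shell-style wildcards, and files are assumed to be decoded as UTF-8.

```python
import fnmatch
from pathlib import PurePosixPath


def is_ignored(relative_path, ignore_dirs, ignore_patterns):
    """Return True if a path (relative to the input directory) should be skipped."""
    path = PurePosixPath(relative_path)
    # Skip the file if any of its parent directories is in the ignore list.
    if any(part in ignore_dirs for part in path.parts[:-1]):
        return True
    # Skip the file if its name matches any ignore pattern.
    return any(fnmatch.fnmatchcase(path.name, p) for p in ignore_patterns)


def decode_or_placeholder(data, include_non_text_files, placeholder):
    """Return the file text, the placeholder, or None if the file is dropped."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return placeholder if include_non_text_files else None


ignore_dirs = ["node_modules", ".git", ".vscode"]
ignore_patterns = [".DS_Store", ".gitignore", "package-lock.json"]

for candidate in ["src/app.py", "node_modules/lib/index.js", "docs/.DS_Store"]:
    print(candidate, is_ignored(candidate, ignore_dirs, ignore_patterns))
# src/app.py False
# node_modules/lib/index.js True
# docs/.DS_Store True

placeholder = "This file format is not supported or cannot be decoded."
print(decode_or_placeholder(b"\xff\xfe\x00", True, placeholder))
# This file format is not supported or cannot be decoded.
```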

Output Format

The tool generates output files with a naming pattern based on the input directory name. Each file in the output contains formatted content from the source files, structured according to the template defined in the configuration.
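
As a rough illustration of how templated file blocks could be grouped into output files under max_tokens, consider the sketch below. It rests on several assumptions: the function names are hypothetical, tokens are counted as whitespace-separated words instead of with NLTK, and a file larger than the limit is kept whole in its own chunk. The actual splitting and file naming in Concatext may differ.

```python
import re


def render_block(template, path, name, content):
    """Fill the {path}, {name} and {content} placeholders in a single pass."""
    values = {"path": path, "name": name, "content": content}
    return re.sub(r"\{(path|name|content)\}", lambda m: values[m.group(1)], template)


def count_tokens(text):
    """Stand-in tokenizer: whitespace-separated words (the tool itself uses NLTK)."""
    return len(text.split())


def split_blocks(blocks, max_tokens, separator="\n\n"):
    """Group blocks into chunks whose token counts stay within max_tokens.

    A single block larger than max_tokens gets a chunk to itself.
    """
    chunks, current, current_tokens = [], [], 0
    for block in blocks:
        tokens = count_tokens(block)
        # Start a new chunk when adding this block would exceed the limit.
        if current and current_tokens + tokens > max_tokens:
            chunks.append(separator.join(current))
            current, current_tokens = [], 0
        current.append(block)
        current_tokens += tokens
    if current:
        chunks.append(separator.join(current))
    return chunks


template = "{path} - start\n{content}\n{name} - end"
blocks = [
    render_block(template, "src/a.py", "a.py", "print('a')"),
    render_block(template, "src/b.py", "b.py", "print('b')"),
]
chunks = split_blocks(blocks, max_tokens=10)
print(len(chunks))
# 2
```

Each block here counts as 7 tokens, so a limit of 10 puts the two files in separate chunks, while a limit of 14 or more would keep them together.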
