A Go tool that crawls GitHub repositories, filters files by extension, counts their lines, and saves the results as JSON. It uses goroutines, channels, and a worker pool to crawl and analyze files concurrently.
- Concurrent traversal of GitHub repository directories
- Configurable file extension filtering (fileExtensions.txt)
- Line counting for selected file types
- JSON output for file line analysis
- Logging for time measurements and progress tracking
- Fetch Directory Tree: The program starts at a GitHub repo URL and retrieves all directories and files using concurrent workers.
- Filter Files: Only files with extensions listed in fileExtensions.txt are analyzed.
- Count Lines: Each valid file is fetched and its line count is calculated.
- Output: A JSON file (analysis_result.json) is created containing per-file line counts, per-extension totals, and the overall total.
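
The filter and count steps above can be sketched with a small worker pool. This is a minimal illustration, not the tool's actual code: the `file` type, `countLines`, and `analyze` are hypothetical names, and in-memory strings stand in for file contents that the real program fetches over HTTP.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"sync"
)

// file is a hypothetical stand-in for a fetched repository file.
type file struct {
	URL     string
	Content string
}

// countLines returns the number of lines in s, counting a final line
// that has no trailing newline.
func countLines(s string) int {
	if s == "" {
		return 0
	}
	n := strings.Count(s, "\n")
	if !strings.HasSuffix(s, "\n") {
		n++
	}
	return n
}

// analyze fans files out to a fixed number of workers and returns the
// line count per URL for files whose extension is in allowed.
func analyze(files []file, allowed map[string]bool, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	type result struct {
		url   string
		lines int
	}
	jobs := make(chan file)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				if !allowed[path.Ext(f.URL)] {
					continue // extension not listed, skip the file
				}
				results <- result{url: f.URL, lines: countLines(f.Content)}
			}
		}()
	}

	// Feed the jobs, then signal the workers that no more are coming.
	go func() {
		for _, f := range files {
			jobs <- f
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	counts := make(map[string]int)
	for r := range results {
		counts[r.url] = r.lines
	}
	return counts
}

func main() {
	files := []file{
		{URL: "https://raw.githubusercontent.com/your/repo/a.go", Content: "package a\n\nfunc A() {}\n"},
		{URL: "https://raw.githubusercontent.com/your/repo/b.py", Content: "print('hi')"},
		{URL: "https://raw.githubusercontent.com/your/repo/README.md", Content: "# readme\n"},
	}
	allowed := map[string]bool{".go": true, ".py": true}
	fmt.Println(analyze(files, allowed, 10))
}
```

Only the collecting goroutine writes to the result map, so no mutex is needed; the workers communicate exclusively through channels.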
Set the following constants in main.go:
const (
	URL                  = "https://github.com/your/repo/tree/main"
	BRANCH               = "main"
	LOGGING              = true
	SAVE_RESULTS_TO_FILE = true
	NUMBER_OF_WORKERS    = 10
)

Create a fileExtensions.txt with each extension on its own line:
.go
.py
.java

The result is saved as analysis_result.json:
{
  "files and their lines": {
    "https://raw.githubusercontent.com/your/repo/file1.java": 856,
    "https://raw.githubusercontent.com/your/repo/file2.java": 428,
    "https://raw.githubusercontent.com/your/repo/file3.py": 643
  },
  "lines per language": [
    {
      "extension": ".java",
      "lines": 1284
    },
    {
      "extension": ".py",
      "lines": 643
    }
  ],
  "total amount of lines": 1927
}

- Go 1.18+
- Internet access to reach GitHub URLs
- fileExtensions.txt in the root directory
go run .
Or
main.exe
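
For reference, the analysis_result.json layout shown earlier could be produced with structs like the ones below. This is a sketch under assumptions: the type and function names (`analysisResult`, `languageLines`, `buildResult`) are hypothetical, and only the JSON keys are taken from the sample output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"sort"
)

// languageLines is one entry of the "lines per language" array.
type languageLines struct {
	Extension string `json:"extension"`
	Lines     int    `json:"lines"`
}

// analysisResult mirrors the keys of the sample analysis_result.json.
type analysisResult struct {
	Files     map[string]int  `json:"files and their lines"`
	Languages []languageLines `json:"lines per language"`
	Total     int             `json:"total amount of lines"`
}

// buildResult aggregates per-file counts into per-extension and overall totals.
func buildResult(counts map[string]int) analysisResult {
	perExt := make(map[string]int)
	total := 0
	for url, n := range counts {
		perExt[path.Ext(url)] += n
		total += n
	}
	langs := make([]languageLines, 0, len(perExt))
	for ext, n := range perExt {
		langs = append(langs, languageLines{Extension: ext, Lines: n})
	}
	// Sort by extension so the output order is stable between runs.
	sort.Slice(langs, func(i, j int) bool { return langs[i].Extension < langs[j].Extension })
	return analysisResult{Files: counts, Languages: langs, Total: total}
}

func main() {
	res := buildResult(map[string]int{
		"https://raw.githubusercontent.com/your/repo/file1.java": 856,
		"https://raw.githubusercontent.com/your/repo/file2.java": 428,
		"https://raw.githubusercontent.com/your/repo/file3.py":   643,
	})
	out, err := json.MarshalIndent(res, "", "  ")
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Writing `out` to disk with `os.WriteFile("analysis_result.json", out, 0o644)` would complete the save step.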