Skip to content

Initial implementation#1

Merged
b4prog merged 15 commits into
mainfrom
Initial-implementation
Sep 9, 2025
Merged

Initial implementation#1
b4prog merged 15 commits into
mainfrom
Initial-implementation

Conversation

@b4prog

@b4prog b4prog commented Sep 7, 2025

Copy link
Copy Markdown
Owner

Summary by CodeRabbit

  • New Features

    • Added Log2CSV CLI: reads STDIN, extracts fields via a required regex with named capture groups, emits CSV with headers, preserves line endings, and supports an unmatched-only mode.
  • Documentation

    • Expanded README with detailed description, features, usage, examples (Linux/Windows), install and build instructions.
  • Chores

    • Added CI workflow, task runner configuration, updated ignore rules, and new repository review automation configuration.

@coderabbitai

coderabbitai Bot commented Sep 7, 2025

Copy link
Copy Markdown

Warning

Rate limit exceeded

@b4prog has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 6 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 93e7ca7 and 16515dc.

📒 Files selected for processing (1)
  • .coderabbit.yaml (1 hunks)

Walkthrough

Adds a new Go module and CLI tool log2csv that reads STDIN, applies a user-supplied regex with named capture groups to produce CSV (or unique non-matching lines). Also adds CI workflow, task runner, updated .gitignore, expanded README, and a CodeRabbit config; exports ErrInvalidRegexp and ErrNoNamedCaptureGroups.

Changes

Cohort / File(s) Summary
CLI implementation
cmd/log2csv/log2csv.go
New CLI requiring -regexp with named capture groups; validates/compiles regex (returns ErrInvalidRegexp or ErrNoNamedCaptureGroups), writes CSV header and rows, preserves input EOL, supports -unmatched unique non-match output, buffered I/O, skips all-empty-group matches, and propagates scanner/write errors.
Module config
go.mod
New Go module github.com/b4prog/Log2CSV targeting Go 1.25 (no external deps).
CI workflow
.github/workflows/ci.yaml
Adds GitHub Actions CI: triggers on push/pull_request; runs on ubuntu-latest; checks gofmt, verifies go mod tidy (fails if changes), runs go vet, go build, go test (race, coverage), uploads coverage.out; concurrency configured.
Automation & tooling config
.coderabbit.yaml, taskfile.yaml
Adds CodeRabbit review config (language: en, assertive profile, high_level_summary: true, poem: false, auto_review.drafts: true) and enables checks (golangci-lint, gitleaks, yamllint, actionlint). Adds Taskfile with tasks for deps, tidy, fmt, vet, build, run, test, clean and cross-platform build/run handling.
Repository hygiene
.gitignore
Activates ignoring of .idea/ and .vscode/* while whitelisting .vscode/extensions.json, .vscode/settings.json, .vscode/tasks.json, .vscode/launch.json; adds /build/ and .task/ ignores.
Documentation
README.md
Expanded README: describes Log2CSV (Go-based), features, usage, Linux/Windows examples, install and build-from-source instructions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Shell as Shell/OS
  participant CLI as log2csv
  participant RE as RegExp Engine
  participant CSV as CSV Writer
  participant STDIN as STDIN
  participant STDOUT as STDOUT

  User->>Shell: run log2csv -regexp "<pattern>" [--unmatched]
  Shell->>CLI: start
  CLI->>RE: compile pattern
  alt compile error
    RE-->>CLI: syntax error
    CLI-->>Shell: exit ErrInvalidRegexp
  else compiled
    RE-->>CLI: compiled
    alt no named groups
      CLI-->>Shell: exit ErrNoNamedCaptureGroups
    else has named groups
      loop each input line
        STDIN-->>CLI: line
        CLI->>RE: match(line)
        alt match
          CLI->>CLI: extract named groups
          opt first matching line
            CLI->>CSV: write header (group names)
          end
          alt all groups empty
            CLI-->>CLI: skip row
          else
            CLI->>CSV: write row (preserve EOL)
            CSV-->>STDOUT: CSV line
          end
        else no match
          alt --unmatched
            CLI-->>CLI: collect unique non-match
          else
            CLI-->>CLI: count non-matches (warn later)
          end
        end
      end
      CLI->>CSV: flush
      CLI-->>Shell: exit 0
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks (2 warnings, 1 inconclusive)

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description is empty and fails to provide any context or overview of the extensive additions—including the new CLI implementation, configuration files, documentation updates, and CI/task automation—which leaves reviewers without guidance. Add a concise description summarizing the scope and intent of the changes, such as detailing the Log2CSV tool implementation, the new .coderabbit.yaml and CI workflows, documentation enhancements, and task runner setup, to orient reviewers effectively.
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The title “Initial implementation” is overly generic and does not clearly convey the primary change of adding the Log2CSV CLI tool, configuration, documentation, and CI/task automation, making it difficult for reviewers to immediately grasp the main update. Revise the title to succinctly describe the core feature, for example “Implement Log2CSV CLI for converting logs to CSV with YAML configuration and CI setup,” so reviewers can quickly understand the main change.
✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch Initial-implementation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@b4prog b4prog marked this pull request as ready for review September 7, 2025 21:57

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66d00a2 and 90c9ae4.

📒 Files selected for processing (7)
  • .coderabbit.yaml (1 hunks)
  • .github/workflows/ci.yaml (1 hunks)
  • .gitignore (1 hunks)
  • README.md (1 hunks)
  • cmd/log2csv/log2csv.go (1 hunks)
  • go.mod (1 hunks)
  • taskfile.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

  • go.mod
  • cmd/log2csv/log2csv.go
  • README.md
🔇 Additional comments (3)
go.mod (1)

3-3: Go 1.25 GA and CI support confirmed. Go 1.25 was released on August 12, 2025, and actions/setup-go@v5 supports it via the go-version input with manifest lookup and fallback to the official distribution.

.gitignore (1)

31-42: LGTM on ignores for IDEs and build artifacts.

The whitelist for specific .vscode configs and adding /build/ and .task/ look good.

.github/workflows/ci.yaml (1)

31-34: No changes needed: actions/setup-go@v5 supports Go 1.25.x
setup-go v5 resolves 1.25.x from the Go versions manifest (Go 1.25 was released Aug 12, 2025) and will install it correctly.

Comment thread .coderabbit.yaml
Comment thread .github/workflows/ci.yaml
Comment thread cmd/log2csv/log2csv.go
Comment thread cmd/log2csv/log2csv.go
Comment thread cmd/log2csv/log2csv.go
Comment thread cmd/log2csv/log2csv.go
Comment thread README.md
Comment thread README.md
Comment thread taskfile.yaml
- IPv4 (TOS/PREC/TTL/ID[/DF]) or IPv6 (TC/HOPLIMIT/FLOWLBL) through alternation
– TCP/UDP/ICMP(-v6) with optional ports and TCP flags
– the second LEN (L4 length) present on UDP

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
README.md (2)

10-16: Document duplicate named groups and header emission semantics.

Make the behavior explicit to avoid surprises when patterns reuse names and when no lines match.

 - Extracts fields using **named capture groups** (`(?P<Name>...)`).
-- The **CSV header row** is automatically generated from group names.
+- Allows **duplicate named groups**; each occurrence becomes its own CSV column in left‑to‑right order (names are not deduplicated).
+- The **CSV header row** is automatically generated from group names (including duplicates) and is emitted on the first matching line (no header if no lines match).

29-31: Remove trailing period from the Windows example (breaks copy/paste).

There’s an extra “.” at the end of Line 30 inside the code block.

-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[(?P<Kernel_Time>[\d\.]+)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S+)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)\s+(?:(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)\s+(?:(?P<DF>DF)\s+)?|(?:TC=(?P<TC>\d+)\s+)?(?:HOPLIMIT=(?P<HOPLIMIT>\d+)\s+)?(?:FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+)\s+)? )PROTO=(?P<PROTO>[A-Za-z0-9]+)\s+(?:(?:SPT|SP)=(?P<SPT>\d+)\s+)?(?:(?:DPT|DP)=(?P<DPT>\d+)\s+)?(?:WINDOW=(?P<WINDOW>\d+)\s+)?(?:RES=(?P<RES>0x[0-9A-Fa-f]{2})\s+)?(?:(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$".
+Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[(?P<Kernel_Time>[\d\.]+)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S+)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)\s+(?:(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)\s+(?:(?P<DF>DF)\s+)?|(?:TC=(?P<TC>\d+)\s+)?(?:HOPLIMIT=(?P<HOPLIMIT>\d+)\s+)?(?:FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+)\s+)? )PROTO=(?P<PROTO>[A-Za-z0-9]+)\s+(?:(?:SPT|SP)=(?P<SPT>\d+)\s+)?(?:(?:DPT|DP)=(?P<DPT>\d+)\s+)?(?:WINDOW=(?P<WINDOW>\d+)\s+)?(?:RES=(?P<RES>0x[0-9A-Fa-f]{2})\s+)?(?:(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"
cmd/log2csv/log2csv.go (4)

21-28: Exported errors in package main hinder reuse. Consider a library package.

If you plan reuse, move parsing and errors into an internal/pkg and keep main thin.


3-13: Use encoding/csv for correct CSV quoting and newlines.

Removes hand-rolled quoting; handles commas, quotes, and CRLF robustly.

@@
 import (
 	"bufio"
 	"bytes"
+	"encoding/csv"
 	"errors"
 	"flag"
 	"fmt"
 	"io"
 	"os"
 	"regexp"
 	"strings"
 )
@@
-func writeCSVRow(output io.Writer, values []string, lineEnding string) error {
-	csvValues := make([]string, len(values))
-	for idxValue, value := range values {
-		csvValues[idxValue] = `"` + strings.ReplaceAll(value, `"`, `""`) + `"`
-	}
-	csvLine := strings.Join(csvValues, string(csvSeparator)) + lineEnding
-	_, err := io.WriteString(output, csvLine)
-	return err
-}
+func writeCSVRow(output io.Writer, values []string, lineEnding string) error {
+	w := csv.NewWriter(output)
+	w.Comma = csvSeparator
+	w.UseCRLF = lineEnding == "\r\n"
+	if err := w.Write(values); err != nil {
+		return err
+	}
+	w.Flush()
+	return w.Error()
+}

Also applies to: 160-168


51-73: Precompute named-group indices; avoid per-line SubexpNames/allocations.

Cuts per-line work and fixes the currently unused parameter pattern.

@@
 	out := bufio.NewWriter(os.Stdout)
-	err = processInput(os.Stdin, re, groupNames, out)
+	groupIndices := namedGroupIndices(re)
+	err = processInput(os.Stdin, re, groupIndices, groupNames, out)
@@
-func extractGroupNames(re *regexp.Regexp) []string {
+func extractGroupNames(re *regexp.Regexp) []string {
 	names := re.SubexpNames()[1:]
 	ordered := make([]string, 0, len(names))
 	for _, name := range names {
 		if name != "" {
 			ordered = append(ordered, name)
 		}
 	}
 	return ordered
 }
+
+// namedGroupIndices returns the submatch indices for named groups (incl. duplicates) in order.
+func namedGroupIndices(re *regexp.Regexp) []int {
+	names := re.SubexpNames()
+	idxs := make([]int, 0, len(names))
+	for i := 1; i < len(names); i++ {
+		if names[i] != "" {
+			idxs = append(idxs, i)
+		}
+	}
+	return idxs
+}
@@
-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer) error {
@@
-		values, ok := processLine(line, re, groupNames)
+		values, ok := processLine(line, re, groupIndices)
@@
-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
+// Process a log line and return CSV values + true if valid, or nil + false if it should be ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
 	submatches := re.FindStringSubmatch(line)
 	if submatches == nil {
 		return nil, false
 	}
-	subNames := re.SubexpNames()
-	values := make([]string, 0, len(groupNames))
+	values := make([]string, len(groupIndices))
 	allEmpty := true
-	for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
-		name := subNames[idxSubmatch]
-		if name == "" {
-			continue
-		}
-		val := submatches[idxSubmatch]
+	for i, subIdx := range groupIndices {
+		val := submatches[subIdx]
 		if val != "" {
 			allEmpty = false
 		}
-		values = append(values, val)
+		values[i] = val
 	}
 	if allEmpty {
 		return nil, false
 	}
 	return values, true
 }

Also applies to: 86-110, 134-158


112-125: Peek-based line-ending detection can misdetect on very long first lines (>64 KiB).

Switch to ReadSlice + io.MultiReader to be exact.

-func peekForLineEnding(input io.Reader, sizeMaxPeek int) (io.Reader, string, error) {
-	inputBuffer := bufio.NewReader(input)
-	sample, err := inputBuffer.Peek(sizeMaxPeek)
-	if err != nil && err != io.EOF && !errors.Is(err, bufio.ErrBufferFull) {
-		return nil, "", err
-	}
-	if idx := bytes.IndexByte(sample, '\n'); idx >= 0 {
-		if idx > 0 && sample[idx-1] == '\r' {
-			return inputBuffer, "\r\n", nil
-		}
-		return inputBuffer, "\n", nil
-	}
-	return inputBuffer, "\n", nil
-}
+func peekForLineEnding(input io.Reader, _ int) (io.Reader, string, error) {
+	inputBuffer := bufio.NewReader(input)
+	var first []byte
+	for {
+		chunk, err := inputBuffer.ReadSlice('\n')
+		first = append(first, chunk...)
+		if errors.Is(err, bufio.ErrBufferFull) {
+			continue
+		}
+		if err != nil && err != io.EOF {
+			return nil, "", err
+		}
+		break
+	}
+	ending := "\n"
+	if idx := bytes.IndexByte(first, '\n'); idx >= 1 && first[idx-1] == '\r' {
+		ending = "\r\n"
+	}
+	return io.MultiReader(bytes.NewReader(first), inputBuffer), ending, nil
+}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90c9ae4 and 29280dd.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • cmd/log2csv/log2csv.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

  • README.md
  • cmd/log2csv/log2csv.go
🔇 Additional comments (2)
cmd/log2csv/log2csv.go (2)

30-43: Usage and examples read well. LGTM.


59-66: Good error mapping and validation.

Comment thread cmd/log2csv/log2csv.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (5)
README.md (2)

10-15: Document duplicate named groups and header emission.

Align docs with intended behavior (duplicates allowed; header once on first match).

 - The **CSV header row** is automatically generated from group names.
+- Allows **duplicate named groups**; each occurrence becomes its own CSV column in left-to-right pattern order (names are not de-duplicated).
+- The **CSV header** is emitted once, on the first matched line. No header is produced if no lines match or when `-unmatched` is used.

31-33: Remove trailing period in Windows example (breaks copy/paste).

-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$".
+Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"
cmd/log2csv/log2csv.go (3)

71-81: Precompute named-group indices; avoid SubexpNames per line.

Improves hot-path performance and removes unused parameter.

@@
-    groupNames := extractGroupNames(re)
+    groupNames := extractGroupNames(re)
+    groupIndices := namedGroupIndices(re)
@@
-    out := bufio.NewWriter(os.Stdout)
-    err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
+    out := bufio.NewWriter(os.Stdout)
+    err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched)
@@
-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error {
@@
-        values, ok := processLine(line, re, groupNames)
+        values, ok := processLine(line, re, groupIndices)
@@
-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
+// Process a log line and return CSV values + true if the line is valid, or nil + false if the line should be ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
     submatches := re.FindStringSubmatch(line)
     if submatches == nil {
         return nil, false
     }
-    subNames := re.SubexpNames()
-    values := make([]string, 0, len(groupNames))
+    values := make([]string, len(groupIndices))
     allEmpty := true
-    for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
-        name := subNames[idxSubmatch]
-        if name == "" {
-            continue
-        }
-        val := submatches[idxSubmatch]
+    for i, subIdx := range groupIndices {
+        val := submatches[subIdx]
         if val != "" {
             allEmpty = false
         }
-        values = append(values, val)
+        values[i] = val
     }
     if allEmpty {
         return nil, false
     }
     return values, true
 }
+ 
+func namedGroupIndices(re *regexp.Regexp) []int {
+    names := re.SubexpNames()
+    idxs := make([]int, 0, len(names))
+    for i := 1; i < len(names); i++ {
+        if names[i] != "" {
+            idxs = append(idxs, i)
+        }
+    }
+    return idxs
+}

Also applies to: 94-107, 168-191


146-159: Line-ending detection may misdetect on very long first lines.

Optional: switch to ReadSlice('\n') and rebuild the reader to avoid peek-size dependency.


22-29: Consider moving exported errors out of package main.

If reuse is planned, place them in an internal/pkg library and keep main thin.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29280dd and 7cc0d98.

📒 Files selected for processing (2)
  • README.md (1 hunks)
  • cmd/log2csv/log2csv.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

  • cmd/log2csv/log2csv.go
  • README.md
🪛 GitHub Check: build-test
cmd/log2csv/log2csv.go

[failure] 114-114:
Error return value of fmt.Fprintln is not checked (errcheck)

🪛 GitHub Actions: CI
cmd/log2csv/log2csv.go

[error] 114-114: Error return value of fmt.Fprintln is not checked (errcheck). Command: 'golangci-lint run --timeout=5m'.

🪛 markdownlint-cli2 (0.17.2)
README.md

48-48: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

Comment thread cmd/log2csv/log2csv.go
Comment thread cmd/log2csv/log2csv.go
Comment thread README.md Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (1)
.github/workflows/ci.yaml (1)

24-27: Add actionlint to lint the workflow itself.

Run actionlint right after checkout to fail fast on YAML/Actions issues.

       - name: Checkout
         uses: actions/checkout@v4

+      - name: Lint workflows (actionlint)
+        uses: reviewdog/action-actionlint@v1
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cc0d98 and d2a8fe1.

📒 Files selected for processing (1)
  • .github/workflows/ci.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Comment thread .github/workflows/ci.yaml
Comment thread .github/workflows/ci.yaml
Comment on lines +25 to +26
- name: Checkout
uses: actions/checkout@v4

@coderabbitai coderabbitai Bot Sep 8, 2025

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Pin GitHub Actions by commit SHA for supply‑chain security.

Avoid mutable tags; pin to immutable SHAs (update via Dependabot).

-      - name: Checkout
-        uses: actions/checkout@v4
+      - name: Checkout
+        uses: actions/checkout@<commit-sha>

-      - name: Set up Go
-        uses: actions/setup-go@v5
+      - name: Set up Go
+        uses: actions/setup-go@<commit-sha>

-      - name: Upload coverage (artifact)
-        if: always()
-        uses: actions/upload-artifact@v4
+      - name: Upload coverage (artifact)
+        if: always()
+        uses: actions/upload-artifact@<commit-sha>

Tip: enable Dependabot for GitHub Actions to auto-bump SHAs.

Also applies to: 29-29, 91-91

🤖 Prompt for AI Agents
.github/workflows/ci.yaml around lines 25-26 (also apply same change at lines 29
and 91): the workflow is using mutable action tags (e.g., actions/checkout@v4);
replace these with the action reference pinned to the corresponding immutable
commit SHA (actions/checkout@<commit-sha>) for supply-chain security, and do the
same for any other actions referenced at the noted lines; update Dependabot to
auto-bump these SHAs (or document a process) so future updates use Dependabot
PRs that refresh the pinned SHAs.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be already addressed

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current state of GitHub Actions pinning in the CI workflow
cat .github/workflows/ci.yaml

Length of output: 2584


@b4prog, I can see that the structure for pinning GitHub Actions is in place, but the actual commit SHAs haven't been filled in yet. The workflow currently uses placeholder values:

  • Line 26: actions/checkout@<commit-sha>
  • Line 29: actions/setup-go@<commit-sha>
  • Line 91: actions/upload-artifact@<commit-sha>

To complete the security improvement, these placeholders need to be replaced with actual commit SHAs. Here are the current stable commit SHAs for these actions:

- name: Checkout
  uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

- name: Set up Go
  uses: actions/setup-go@41dfa10bad2bb2ae585af6ee5bb4d7d973ad74ed # v5.1.0

- name: Upload coverage (artifact)
  uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.4.3

Would you like me to create an issue to track completing this security improvement with the actual commit SHAs?

Comment thread .github/workflows/ci.yaml
Comment on lines +28 to +35
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: "1.25.x"
check-latest: true
cache: true
cache-dependency-path: go.sum

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Drive toolchain version from go.mod for single source of truth.

Let setup-go read the version from go.mod; fewer drift issues.

-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: "1.25.x"
-          check-latest: true
-          cache: true
-          cache-dependency-path: go.sum
+      - name: Set up Go
+        uses: actions/setup-go@<commit-sha>
+        with:
+          go-version-file: go.mod
+          check-latest: true
+          cache: true
+          cache-dependency-path: go.sum

If you intentionally target Go 1.25.x regardless of go.mod, keep the current config.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Set up Go
uses: actions/setup-go@v5
with:
go-version: "1.25.x"
check-latest: true
cache: true
cache-dependency-path: go.sum
- name: Set up Go
uses: actions/setup-go@<commit-sha>
with:
go-version-file: go.mod
check-latest: true
cache: true
cache-dependency-path: go.sum
🤖 Prompt for AI Agents
.github/workflows/ci.yaml lines 28-35: the workflow pins Go with a hardcoded
go-version ("1.25.x"); change it to drive the toolchain from go.mod by replacing
the explicit go-version input with the setup-go input that reads the version
from the file (use the action's go-version-file input set to "go.mod"), keep the
other inputs (check-latest, cache, cache-dependency-path) as-is, and ensure no
hardcoded version remains so the toolchain follows go.mod as the single source
of truth.

Comment thread .github/workflows/ci.yaml
Comment on lines +80 to +85
- name: Vet
run: go vet -mod=readonly ./...

- name: Build
run: go build -mod=readonly ./...

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add golangci-lint to catch issues beyond vet.

Staticcheck, style, and common bug patterns won’t be covered by vet alone.

       - name: Vet
         run: go vet -mod=readonly ./...

+      - name: Lint (golangci-lint)
+        uses: golangci/golangci-lint-action@<commit-sha>
+        with:
+          version: latest
+          args: --timeout=5m
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Vet
run: go vet -mod=readonly ./...
- name: Build
run: go build -mod=readonly ./...
- name: Vet
run: go vet -mod=readonly ./...
- name: Lint (golangci-lint)
uses: golangci/golangci-lint-action@<commit-sha>
with:
version: latest
args: --timeout=5m
- name: Build
run: go build -mod=readonly ./...
🤖 Prompt for AI Agents
In .github/workflows/ci.yaml around lines 80 to 85 the workflow only runs go vet
and go build; add a golangci-lint step to catch staticcheck/style/bug patterns
that vet misses by invoking golangci-lint (e.g., use the golangci-lint action or
install it in the job and run golangci-lint run --timeout 5m ./...), ensure the
step runs before build, exits non-zero on issues, and optionally use caching for
the linter binary to speed CI.

Comment thread .github/workflows/ci.yaml
Comment on lines +86 to +88
- name: Test (race, coverage)
run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Shuffle tests to surface order dependencies.

Helps catch hidden global state issues.

-      - name: Test (race, coverage)
-        run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...
+      - name: Test (race, coverage)
+        run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Test (race, coverage)
run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...
- name: Test (race, coverage)
run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...
🤖 Prompt for AI Agents
In .github/workflows/ci.yaml around lines 86 to 88, the CI job runs `go test`
deterministically which can hide test order dependencies; update the `run` step
to pass the test shuffle flag so test execution order is randomized (e.g.,
append `-shuffle=time` or a seeded value to the `go test` invocation) to surface
hidden global state issues and ensure reproducibility by logging the seed if
using a time-based seed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
cmd/log2csv/log2csv.go (2)

113-118: Preserve CRLF and check write errors in -unmatched mode.

fmt.Fprintln normalizes to LF and its error is unchecked. Write with the detected lineEnding and handle errors.

Apply:

-                        // In unmatched mode, print each unique non-matching line once.
-                        fmt.Fprintln(output, line)
+                        // In unmatched mode, print each unique non-matching line once, preserving line endings.
+                        if _, err := io.WriteString(output, line+lineEnding); err != nil {
+                            return err
+                        }

23-29: Library reuse: exported errors in package main.

If you plan reuse, move parsing/writing and exported errors into an internal/pkg package and keep main thin.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2a8fe1 and af752e3.

📒 Files selected for processing (1)
  • cmd/log2csv/log2csv.go (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

  • cmd/log2csv/log2csv.go
📚 Learning: 2025-09-08T04:42:11.224Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.

Applied to files:

  • cmd/log2csv/log2csv.go
🔇 Additional comments (1)
cmd/log2csv/log2csv.go (1)

6-6: Good move to encoding/csv and single-writer usage.

Using csv.Writer once with UseCRLF based on detected line endings is correct and reduces per-row allocations.

Also applies to: 99-103, 149-151

Comment thread cmd/log2csv/log2csv.go
Comment on lines +48 to +49
fmt.Fprint(os.Stderr, msg)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Silence errcheck for informational prints.

Main/usage can’t return errors; explicitly ignore the result.

Apply:

-    fmt.Fprint(os.Stderr, msg)
+    _, _ = fmt.Fprint(os.Stderr, msg)
@@
-        fmt.Fprintf(os.Stderr, "error: %v\n", err)
+        _, _ = fmt.Fprintf(os.Stderr, "error: %v\n", err)

Also applies to: 53-54

🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 48-49 (and similarly at 53-54), the
fmt.Fprint calls that write informational usage/help text return an error which
is currently unused; explicitly ignore the returned error to silence errcheck by
assigning it to the blank identifier. Update each fmt.Fprint/Printf invocation
used for informational prints to capture the return values like _, _ =
fmt.Fprint(...) or _, _ = fmt.Fprintf(...) so the result is intentionally
ignored while preserving the existing behavior.

Comment thread cmd/log2csv/log2csv.go
Comment on lines +71 to +79
groupNames := extractGroupNames(re)
if len(groupNames) == 0 {
return ErrNoNamedCaptureGroups
}
out := bufio.NewWriter(os.Stdout)
err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
if flushErr := out.Flush(); err == nil && flushErr != nil {
err = flushErr
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid per-line SubexpNames; precompute indices. Also don’t require named groups in -unmatched mode.

  • SubexpNames() on every line is unnecessary; precompute capture indices once.
  • In -unmatched mode, named groups aren’t needed—current unconditional check blocks valid usage.

Apply:

@@
-    groupNames := extractGroupNames(re)
-    if len(groupNames) == 0 {
-        return ErrNoNamedCaptureGroups
-    }
-    out := bufio.NewWriter(os.Stdout)
-    err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
+    // Precompute indices for named groups once.
+    groupIndices := namedGroupIndices(re)
+    // Only require/compute names when emitting CSV.
+    var groupNames []string
+    if !*listUnmatched {
+        groupNames = extractGroupNames(re)
+        if len(groupNames) == 0 {
+            return ErrNoNamedCaptureGroups
+        }
+    }
+    out := bufio.NewWriter(os.Stdout)
+    err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched)
@@
 func extractGroupNames(re *regexp.Regexp) []string {
@@
 }
 
+// namedGroupIndices returns the capture indices (into FindStringSubmatch)
+// for all named groups, in declaration order. Duplicates are preserved.
+func namedGroupIndices(re *regexp.Regexp) []int {
+    names := re.SubexpNames()
+    idxs := make([]int, 0, len(names))
+    for i := 1; i < len(names); i++ {
+        if names[i] != "" {
+            idxs = append(idxs, i)
+        }
+    }
+    return idxs
+}
@@
-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error {
@@
-        values, ok := processLine(line, re, groupNames)
+        values, ok := processLine(line, re, groupIndices)
@@
-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
-    submatches := re.FindStringSubmatch(line)
-    if submatches == nil {
-        return nil, false
-    }
-    subNames := re.SubexpNames()
-    values := make([]string, 0, len(groupNames))
-    allEmpty := true
-    for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
-        name := subNames[idxSubmatch]
-        if name == "" {
-            continue
-        }
-        val := submatches[idxSubmatch]
-        if val != "" {
-            allEmpty = false
-        }
-        values = append(values, val)
-    }
-    if allEmpty {
-        return nil, false
-    }
-    return values, true
-}
+// Process a log line and return CSV values + true if valid, or nil + false if ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
+    submatches := re.FindStringSubmatch(line)
+    if submatches == nil {
+        return nil, false
+    }
+    values := make([]string, len(groupIndices))
+    allEmpty := true
+    for i, idx := range groupIndices {
+        v := submatches[idx]
+        if v != "" {
+            allEmpty = false
+        }
+        values[i] = v
+    }
+    if allEmpty {
+        return nil, false
+    }
+    return values, true
+}

Note: This preserves duplicate header names as-is (per your stated flexibility goal).

Also applies to: 83-93, 94-94, 110-110, 175-199

🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 71-79 (and also adjust the referenced
sections 83-93, 94, 110, and 175-199), avoid calling re.SubexpNames() per input
line by calling it once before processing and precomputing the slice of named
capture indices; pass that precomputed data into processInput and any per-line
logic. Also remove the unconditional check that requires named capture groups
when the -unmatched flag is set: only return ErrNoNamedCaptureGroups if
-unmatched is false and no named groups were found. Update function signatures
and callers as needed to accept the cached names/indices and ensure
flushing/error handling remains unchanged.

Comment thread cmd/log2csv/log2csv.go
Comment on lines +106 to +108
// Track unique non-matching lines when -unmatched is set.
seenUnmatched := make(map[string]struct{})
for sc.Scan() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Unbounded memory in -unmatched dedupe.

seenUnmatched grows with unique lines; huge logs could exhaust memory. Consider a flag to disable dedupe (-unmatched-all), a size cap, or a Bloom filter if approximate uniqueness is acceptable.

Would you like a patch adding -unmatched-all and an optional -unmatched-max= cap?

🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 106 to 108, the current seenUnmatched map
for deduping unmatched lines is unbounded and can exhaust memory on huge logs;
add two flags: -unmatched-all (bool) to disable dedupe entirely and
-unmatched-max=<N> (int) to cap the number of unique entries tracked; implement
logic so that when -unmatched-all is true the code does not create or consult
seenUnmatched, otherwise create seenUnmatched but enforce the cap by stopping
insertion of new unique keys once the count reaches unmatched-max (or
alternatively evict oldest if you prefer FIFO behavior), and document the
semantics and defaults (e.g., unmatched-max default 100000 or similar) and
validate the flag values on startup.

Comment thread cmd/log2csv/log2csv.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
.coderabbit.yaml (1)

6-6: Poem disabled — good follow-through on prior feedback.
Matches the earlier request; keeps reviews concise.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af752e3 and 9d5548e.

📒 Files selected for processing (1)
  • .coderabbit.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.
🔇 Additional comments (1)
.coderabbit.yaml (1)

7-9: Confirm intent to review draft PRs.
Setting drafts: true will trigger reviews on drafts; verify this is desired for your workflow.

Comment thread .coderabbit.yaml

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (12)
.github/workflows/ci.yaml (4)

29-35: Drive Go version from go.mod (single source of truth).

Let setup-go read the version from go.mod to avoid drift.

-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: "1.25.x"
-          check-latest: true
-          cache: true
-          cache-dependency-path: go.sum
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+          check-latest: true
+          cache: true
+          cache-dependency-path: go.sum

27-27: Pin GitHub Actions by commit SHA.

Improves supply-chain security; avoid mutable tags.

-        uses: actions/checkout@v4
+        uses: actions/checkout@<commit-sha>

-        uses: actions/setup-go@v5
+        uses: actions/setup-go@<commit-sha>

-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@<commit-sha>

Tip: enable Dependabot for Actions to auto-bump SHAs.

Also applies to: 30-30, 92-92


81-86: Add golangci-lint to catch issues beyond vet.

Covers errcheck, staticcheck, style, and common bugs.

       - name: Vet
         run: go vet -mod=readonly ./...

+      - name: Lint (golangci-lint)
+        uses: golangci/golangci-lint-action@<commit-sha>
+        with:
+          version: latest
+          args: --timeout=5m

87-88: Shuffle tests to surface order dependencies.

Randomizes execution order.

-        run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...
+        run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...
cmd/log2csv/log2csv.go (7)

48-49: Silence errcheck for usage print.

Explicitly ignore the write error.

-	fmt.Fprint(os.Stderr, msg)
+	_, _ = fmt.Fprint(os.Stderr, msg)

52-55: Silence errcheck for error print in main.

Keep behavior; ignore write error explicitly.

-		fmt.Fprintf(os.Stderr, "error: %v\n", err)
+		_, _ = fmt.Fprintf(os.Stderr, "error: %v\n", err)

145-147: Check stderr write for the warning.

Propagate write failures.

-		fmt.Fprintf(os.Stderr, "\nwarning: %d log line(s) did not match the pattern and were ignored\nfirst ignored line: %q\n", ignoredLines, firstIgnoredLine)
+		if _, err := fmt.Fprintf(os.Stderr, "\nwarning: %d log line(s) did not match the pattern and were ignored\nfirst ignored line: %q\n", ignoredLines, firstIgnoredLine); err != nil {
+			return err
+		}

22-29: Library vs CLI split (optional).

Exported errors in package main aren’t reusable; consider moving core logic to a library package and keep main thin.


71-76: Don’t require named groups in -unmatched mode.

Current check blocks valid -unmatched use cases.

-	groupNames := extractGroupNames(re)
-	if len(groupNames) == 0 {
-		return ErrNoNamedCaptureGroups
-	}
+	var groupNames []string
+	if !*listUnmatched {
+		groupNames = extractGroupNames(re)
+		if len(groupNames) == 0 {
+			return ErrNoNamedCaptureGroups
+		}
+	}

177-201: Avoid per-line SubexpNames; precompute named-group indices once.

Removes O(lines) allocations and speeds hot path.

-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
-	submatches := re.FindStringSubmatch(line)
-	if submatches == nil {
-		return nil, false
-	}
-	subNames := re.SubexpNames()
-	values := make([]string, 0, len(groupNames))
-	allEmpty := true
-	for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
-		name := subNames[idxSubmatch]
-		if name == "" {
-			continue
-		}
-		val := submatches[idxSubmatch]
-		if val != "" {
-			allEmpty = false
-		}
-		values = append(values, val)
-	}
-	if allEmpty {
-		return nil, false
-	}
-	return values, true
-}
+// Process a log line and return CSV values + true if valid, or nil + false if ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
+	submatches := re.FindStringSubmatch(line)
+	if submatches == nil {
+		return nil, false
+	}
+	values := make([]string, len(groupIndices))
+	allEmpty := true
+	for i, idx := range groupIndices {
+		v := submatches[idx]
+		if v != "" {
+			allEmpty = false
+		}
+		values[i] = v
+	}
+	if allEmpty {
+		return nil, false
+	}
+	return values, true
+}

And wire it through:

 func extractGroupNames(re *regexp.Regexp) []string {
@@
 }
+
+// namedGroupIndices returns capture indices for named groups in declaration order.
+func namedGroupIndices(re *regexp.Regexp) []int {
+	names := re.SubexpNames()
+	idxs := make([]int, 0, len(names))
+	for i := 1; i < len(names); i++ {
+		if names[i] != "" {
+			idxs = append(idxs, i)
+		}
+	}
+	return idxs
+}
-	groupNames := extractGroupNames(re)
+	groupIndices := namedGroupIndices(re)
+	groupNames := extractGroupNames(re)
-	err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
+	err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched)
-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error {
@@
-		values, ok := processLine(line, re, groupNames)
+		values, ok := processLine(line, re, groupIndices)

106-127: Unbounded memory for -unmatched dedupe. Add a cap.

Large logs can exhaust memory.

-	seenUnmatched := make(map[string]struct{})
+	seenUnmatched := make(map[string]struct{})
+	const unmatchedMaxDefault = 0 // 0 = unlimited
+	unmatchedMax := unmatchedMaxDefault
@@
-				if listUnmatched {
-					if _, exists := seenUnmatched[line]; !exists {
-						seenUnmatched[line] = struct{}{}
-						// In unmatched mode, print each unique non-matching line once, preserving line endings.
-						if _, err := io.WriteString(output, line+lineEnding); err != nil {
-							return err
-						}
-					}
+				if listUnmatched {
+					// If a cap is configured and reached, stop tracking new uniques and stream lines.
+					if unmatchedMax > 0 && len(seenUnmatched) >= unmatchedMax {
+						if _, err := io.WriteString(output, line+lineEnding); err != nil {
+							return err
+						}
+					} else if _, exists := seenUnmatched[line]; !exists {
+						seenUnmatched[line] = struct{}{}
+						if _, err := io.WriteString(output, line+lineEnding); err != nil {
+							return err
+						}
+					}

Follow-up: if you prefer a flag (e.g., -unmatched-max=N or -unmatched-all), I can wire it through run() and processInput.

README.md (1)

31-33: Remove trailing period from Windows example (breaks copy/paste).

The fenced command ends with an extra ".".

-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$".
+Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d5548e and 93e7ca7.

📒 Files selected for processing (3)
  • .github/workflows/ci.yaml (1 hunks)
  • README.md (1 hunks)
  • cmd/log2csv/log2csv.go (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

  • README.md
  • cmd/log2csv/log2csv.go
📚 Learning: 2025-09-08T04:42:11.224Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.

Applied to files:

  • README.md
  • cmd/log2csv/log2csv.go
🔇 Additional comments (1)
.github/workflows/ci.yaml (1)

16-17: LGTM: job timeout added.

Good safeguard against hung runs.

Comment thread cmd/log2csv/log2csv.go
Comment on lines +94 to +103
func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
inputReader, lineEnding, err := peekForLineEnding(input, logLineSizeMax)
if err != nil {
return err
}
sc := openInput(inputReader)
csvWriter := csv.NewWriter(output)
csvWriter.Comma = csvSeparator
csvWriter.UseCRLF = lineEnding == "\r\n"
firstLine := true

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Avoid creating csv.Writer in -unmatched mode.

Skip allocation/flush when not needed.

-	csvWriter := csv.NewWriter(output)
-	csvWriter.Comma = csvSeparator
-	csvWriter.UseCRLF = lineEnding == "\r\n"
+	var csvWriter *csv.Writer
+	if !listUnmatched {
+		csvWriter = csv.NewWriter(output)
+		csvWriter.Comma = csvSeparator
+		csvWriter.UseCRLF = lineEnding == "\r\n"
+	}
@@
-	csvWriter.Flush()
-	return csvWriter.Error()
+	if csvWriter != nil {
+		csvWriter.Flush()
+		return csvWriter.Error()
+	}
+	return nil

Also applies to: 151-153

🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 94 to 103 (and similarly at 151-153), the
function always allocates a csv.Writer and sets its properties even when running
in -unmatched (listUnmatched) mode where CSV output is not produced; change the
logic so that csv.Writer is only created and configured when listUnmatched is
false, and guard any later calls to csvWriter.Write and csvWriter.Flush with the
same condition so no allocation or flush is performed when listUnmatched is
true.

Comment thread README.md
Comment on lines +10 to +14
- Reads log lines from **STDIN** and writes CSV to **STDOUT**.
- Extracts fields using **named capture groups** (`(?P<Name>...)`).
- The **CSV header row** is automatically generated from group names.
- Preserves the input's line endings (LF/CRLF).
- Optional **unmatched mode** (`-unmatched`) to print **unique non-matching lines** instead of CSV.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Docs/code mismatch for -unmatched named-groups requirement.

Current code requires named groups even with -unmatched; README doesn’t say so. Either relax the code (preferred; see code review) or document the requirement.

Also applies to: 18-22

🤖 Prompt for AI Agents
In README.md around lines 10 to 14 (and similarly 18 to 22), the docs state
-unmatched prints unique non-matching lines but the code currently enforces that
the regex contains named capture groups even when -unmatched is used; relax the
code so that when -unmatched is enabled the named-group validation is skipped:
modify the validation logic to only require named groups if unmatched flag is
false, ensure unmatched mode continues to collect and print unique non-matching
input lines, update or add tests for unmatched without named groups, and keep
README unchanged (or if you prefer documenting the requirement instead, update
these README lines to state that named groups are required when not using
-unmatched).

Comment thread README.md
Comment on lines +35 to +40
## Install

```sh
go install github.com/b4prog/Log2CSV@latest
```

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Mention minimum Go version.

Improves install clarity.

 ## Install
@@
 ```sh
 go install github.com/b4prog/Log2CSV@latest

+Requires Go 1.25+.


<details>
<summary>🤖 Prompt for AI Agents</summary>

README.md around lines 35 to 40: the Install section calls "go install
github.com/b4prog/Log2CSV@latest" but doesn't state the minimum Go version;
update the Install section to explicitly mention the required minimum Go version
(e.g., "Requires Go 1.25+"), placing that note directly below the install
command and keeping phrasing concise and consistent with the document's style.


</details>

<!-- fingerprinting:phantom:poseidon:chinchilla -->

<!-- This is an auto-generated comment by CodeRabbit -->

@b4prog b4prog merged commit 96a5ad4 into main Sep 9, 2025
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant