Initial implementation by b4prog · Pull Request #1 · b4prog/Log2CSV

b4prog · 2025-09-07T21:56:35Z

Summary by CodeRabbit

New Features
- Added Log2CSV CLI: reads STDIN, extracts fields via a required regex with named capture groups, emits CSV with headers, preserves line endings, and supports an unmatched-only mode.
Documentation
- Expanded README with detailed description, features, usage, examples (Linux/Windows), install and build instructions.
Chores
- Added CI workflow, task runner configuration, updated ignore rules, and new repository review automation configuration.

coderabbitai · 2025-09-07T21:56:40Z

Warning

Rate limit exceeded

@b4prog has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 6 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 93e7ca7 and 16515dc.

📒 Files selected for processing (1)

.coderabbit.yaml (1 hunks)

Walkthrough

Adds a new Go module and CLI tool log2csv that reads STDIN, applies a user-supplied regex with named capture groups to produce CSV (or unique non-matching lines). Also adds CI workflow, task runner, updated .gitignore, expanded README, and a CodeRabbit config; exports ErrInvalidRegexp and ErrNoNamedCaptureGroups.

Changes

Cohort / File(s)	Summary
CLI implementation `cmd/log2csv/log2csv.go`	New CLI requiring `-regexp` with named capture groups; validates/compiles regex (returns `ErrInvalidRegexp` or `ErrNoNamedCaptureGroups`), writes CSV header and rows, preserves input EOL, supports `-unmatched` unique non-match output, buffered I/O, skips all-empty-group matches, and propagates scanner/write errors.
Module config `go.mod`	New Go module `github.com/b4prog/Log2CSV` targeting Go 1.25 (no external deps).
CI workflow `.github/workflows/ci.yaml`	Adds GitHub Actions CI: triggers on push/pull_request; runs on ubuntu-latest; checks `gofmt`, verifies `go mod tidy` (fails if changes), runs `go vet`, `go build`, `go test` (race, coverage), uploads `coverage.out`; concurrency configured.
Automation & tooling config `.coderabbit.yaml`, `taskfile.yaml`	Adds CodeRabbit review config (`language: en`, assertive profile, `high_level_summary: true`, `poem: false`, `auto_review.drafts: true`) and enables checks (golangci-lint, gitleaks, yamllint, actionlint). Adds Taskfile with tasks for deps, tidy, fmt, vet, build, run, test, clean and cross-platform build/run handling.
Repository hygiene `.gitignore`	Activates ignoring of `.idea/` and `.vscode/*` while whitelisting `.vscode/extensions.json`, `.vscode/settings.json`, `.vscode/tasks.json`, `.vscode/launch.json`; adds `/build/` and `.task/` ignores.
Documentation `README.md`	Expanded README: describes Log2CSV (Go-based), features, usage, Linux/Windows examples, install and build-from-source instructions.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Shell as Shell/OS
  participant CLI as log2csv
  participant RE as RegExp Engine
  participant CSV as CSV Writer
  participant STDIN as STDIN
  participant STDOUT as STDOUT

  User->>Shell: run log2csv -regexp "<pattern>" [--unmatched]
  Shell->>CLI: start
  CLI->>RE: compile pattern
  alt compile error
    RE-->>CLI: syntax error
    CLI-->>Shell: exit ErrInvalidRegexp
  else compiled
    RE-->>CLI: compiled
    alt no named groups
      CLI-->>Shell: exit ErrNoNamedCaptureGroups
    else has named groups
      loop each input line
        STDIN-->>CLI: line
        CLI->>RE: match(line)
        alt match
          CLI->>CLI: extract named groups
          opt first matching line
            CLI->>CSV: write header (group names)
          end
          alt all groups empty
            CLI-->>CLI: skip row
          else
            CLI->>CSV: write row (preserve EOL)
            CSV-->>STDOUT: CSV line
          end
        else no match
          alt --unmatched
            CLI-->>CLI: collect unique non-match
          else
            CLI-->>CLI: count non-matches (warn later)
          end
        end
      end
      CLI->>CSV: flush
      CLI-->>Shell: exit 0
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks (2 warnings, 1 inconclusive)

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description is empty and fails to provide any context or overview of the extensive additions—including the new CLI implementation, configuration files, documentation updates, and CI/task automation—which leaves reviewers without guidance.	Add a concise description summarizing the scope and intent of the changes, such as detailing the Log2CSV tool implementation, the new .coderabbit.yaml and CI workflows, documentation enhancements, and task runner setup, to orient reviewers effectively.
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title Check	❓ Inconclusive	The title “Initial implementation” is overly generic and does not clearly convey the primary change of adding the Log2CSV CLI tool, configuration, documentation, and CI/task automation, making it difficult for reviewers to immediately grasp the main update.	Revise the title to succinctly describe the core feature, for example “Implement Log2CSV CLI for converting logs to CSV with YAML configuration and CI setup,” so reviewers can quickly understand the main change.

✨ Finishing Touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch Initial-implementation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66d00a2 and 90c9ae4.

📒 Files selected for processing (7)

.coderabbit.yaml (1 hunks)
.github/workflows/ci.yaml (1 hunks)
.gitignore (1 hunks)
README.md (1 hunks)
cmd/log2csv/log2csv.go (1 hunks)
go.mod (1 hunks)
taskfile.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

📚 Learning: 2025-09-07T18:51:13.329Z

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

go.mod
cmd/log2csv/log2csv.go
README.md

🔇 Additional comments (3)

go.mod (1)

3-3: Go 1.25 GA and CI support confirmed. Go 1.25 was released on August 12, 2025, and actions/setup-go@v5 supports it via the go-version input with manifest lookup and fallback to the official distribution.

.gitignore (1)

31-42: LGTM on ignores for IDEs and build artifacts.

The whitelist for specific .vscode configs and adding /build/ and .task/ look good.

.github/workflows/ci.yaml (1)

31-34: No changes needed: actions/setup-go@v5 supports Go 1.25.x
setup-go v5 resolves 1.25.x from the Go versions manifest (Go 1.25 was released Aug 12, 2025) and will install it correctly.

- IPv4 (TOS/PREC/TTL/ID[/DF]) or IPv6 (TC/HOPLIMIT/FLOWLBL) through alternation – TCP/UDP/ICMP(-v6) with optional ports and TCP flags – the second LEN (L4 length) present on UDP

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (6)

README.md (2)

10-16: Document duplicate named groups and header emission semantics.

Make the behavior explicit to avoid surprises when patterns reuse names and when no lines match.

 - Extracts fields using **named capture groups** (`(?P<Name>...)`).
-- The **CSV header row** is automatically generated from group names.
+- Allows **duplicate named groups**; each occurrence becomes its own CSV column in left‑to‑right order (names are not deduplicated).
+- The **CSV header row** is automatically generated from group names (including duplicates) and is emitted on the first matching line (no header if no lines match).

29-31: Remove trailing period from the Windows example (breaks copy/paste).

There’s an extra “.” at the end of Line 30 inside the code block.

-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[(?P<Kernel_Time>[\d\.]+)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S+)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)\s+(?:(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)\s+(?:(?P<DF>DF)\s+)?|(?:TC=(?P<TC>\d+)\s+)?(?:HOPLIMIT=(?P<HOPLIMIT>\d+)\s+)?(?:FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+)\s+)? )PROTO=(?P<PROTO>[A-Za-z0-9]+)\s+(?:(?:SPT|SP)=(?P<SPT>\d+)\s+)?(?:(?:DPT|DP)=(?P<DPT>\d+)\s+)?(?:WINDOW=(?P<WINDOW>\d+)\s+)?(?:RES=(?P<RES>0x[0-9A-Fa-f]{2})\s+)?(?:(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$".
+Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[(?P<Kernel_Time>[\d\.]+)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S+)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)\s+(?:(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)\s+(?:(?P<DF>DF)\s+)?|(?:TC=(?P<TC>\d+)\s+)?(?:HOPLIMIT=(?P<HOPLIMIT>\d+)\s+)?(?:FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+)\s+)? )PROTO=(?P<PROTO>[A-Za-z0-9]+)\s+(?:(?:SPT|SP)=(?P<SPT>\d+)\s+)?(?:(?:DPT|DP)=(?P<DPT>\d+)\s+)?(?:WINDOW=(?P<WINDOW>\d+)\s+)?(?:RES=(?P<RES>0x[0-9A-Fa-f]{2})\s+)?(?:(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"

cmd/log2csv/log2csv.go (4)

21-28: Exported errors in package main hinder reuse. Consider a library package.

If you plan reuse, move parsing and errors into an internal/pkg and keep main thin.

3-13: Use encoding/csv for correct CSV quoting and newlines.

Removes hand-rolled quoting; handles commas, quotes, and CRLF robustly.

@@
 import (
 	"bufio"
 	"bytes"
+	"encoding/csv"
 	"errors"
 	"flag"
 	"fmt"
 	"io"
 	"os"
 	"regexp"
 	"strings"
 )
@@
-func writeCSVRow(output io.Writer, values []string, lineEnding string) error {
-	csvValues := make([]string, len(values))
-	for idxValue, value := range values {
-		csvValues[idxValue] = `"` + strings.ReplaceAll(value, `"`, `""`) + `"`
-	}
-	csvLine := strings.Join(csvValues, string(csvSeparator)) + lineEnding
-	_, err := io.WriteString(output, csvLine)
-	return err
-}
+func writeCSVRow(output io.Writer, values []string, lineEnding string) error {
+	w := csv.NewWriter(output)
+	w.Comma = csvSeparator
+	w.UseCRLF = lineEnding == "\r\n"
+	if err := w.Write(values); err != nil {
+		return err
+	}
+	w.Flush()
+	return w.Error()
+}

Also applies to: 160-168

51-73: Precompute named-group indices; avoid per-line SubexpNames/allocations.

Cuts per-line work and fixes the currently unused parameter pattern.

@@
 	out := bufio.NewWriter(os.Stdout)
-	err = processInput(os.Stdin, re, groupNames, out)
+	groupIndices := namedGroupIndices(re)
+	err = processInput(os.Stdin, re, groupIndices, groupNames, out)
@@
-func extractGroupNames(re *regexp.Regexp) []string {
+func extractGroupNames(re *regexp.Regexp) []string {
 	names := re.SubexpNames()[1:]
 	ordered := make([]string, 0, len(names))
 	for _, name := range names {
 		if name != "" {
 			ordered = append(ordered, name)
 		}
 	}
 	return ordered
 }
+
+// namedGroupIndices returns the submatch indices for named groups (incl. duplicates) in order.
+func namedGroupIndices(re *regexp.Regexp) []int {
+	names := re.SubexpNames()
+	idxs := make([]int, 0, len(names))
+	for i := 1; i < len(names); i++ {
+		if names[i] != "" {
+			idxs = append(idxs, i)
+		}
+	}
+	return idxs
+}
@@
-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer) error {
@@
-		values, ok := processLine(line, re, groupNames)
+		values, ok := processLine(line, re, groupIndices)
@@
-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
+// Process a log line and return CSV values + true if valid, or nil + false if it should be ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
 	submatches := re.FindStringSubmatch(line)
 	if submatches == nil {
 		return nil, false
 	}
-	subNames := re.SubexpNames()
-	values := make([]string, 0, len(groupNames))
+	values := make([]string, len(groupIndices))
 	allEmpty := true
-	for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
-		name := subNames[idxSubmatch]
-		if name == "" {
-			continue
-		}
-		val := submatches[idxSubmatch]
+	for i, subIdx := range groupIndices {
+		val := submatches[subIdx]
 		if val != "" {
 			allEmpty = false
 		}
-		values = append(values, val)
+		values[i] = val
 	}
 	if allEmpty {
 		return nil, false
 	}
 	return values, true
 }

Also applies to: 86-110, 134-158

112-125: Peek-based line-ending detection can misdetect on very long first lines (>64 KiB).

Switch to ReadSlice + io.MultiReader to be exact.

-func peekForLineEnding(input io.Reader, sizeMaxPeek int) (io.Reader, string, error) {
-	inputBuffer := bufio.NewReader(input)
-	sample, err := inputBuffer.Peek(sizeMaxPeek)
-	if err != nil && err != io.EOF && !errors.Is(err, bufio.ErrBufferFull) {
-		return nil, "", err
-	}
-	if idx := bytes.IndexByte(sample, '\n'); idx >= 0 {
-		if idx > 0 && sample[idx-1] == '\r' {
-			return inputBuffer, "\r\n", nil
-		}
-		return inputBuffer, "\n", nil
-	}
-	return inputBuffer, "\n", nil
-}
+func peekForLineEnding(input io.Reader, _ int) (io.Reader, string, error) {
+	inputBuffer := bufio.NewReader(input)
+	var first []byte
+	for {
+		chunk, err := inputBuffer.ReadSlice('\n')
+		first = append(first, chunk...)
+		if errors.Is(err, bufio.ErrBufferFull) {
+			continue
+		}
+		if err != nil && err != io.EOF {
+			return nil, "", err
+		}
+		break
+	}
+	ending := "\n"
+	if idx := bytes.IndexByte(first, '\n'); idx >= 1 && first[idx-1] == '\r' {
+		ending = "\r\n"
+	}
+	return io.MultiReader(bytes.NewReader(first), inputBuffer), ending, nil
+}

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90c9ae4 and 29280dd.

📒 Files selected for processing (2)

README.md (1 hunks)
cmd/log2csv/log2csv.go (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

📚 Learning: 2025-09-07T18:51:13.329Z

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

README.md
cmd/log2csv/log2csv.go

🔇 Additional comments (2)

cmd/log2csv/log2csv.go (2)

30-43: Usage and examples read well. LGTM.

59-66: Good error mapping and validation.

-unmatched to list lines that do not match

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (5)

README.md (2)

10-15: Document duplicate named groups and header emission.

Align docs with intended behavior (duplicates allowed; header once on first match).

 - The **CSV header row** is automatically generated from group names.
+- Allows **duplicate named groups**; each occurrence becomes its own CSV column in left-to-right pattern order (names are not de-duplicated).
+- The **CSV header** is emitted once, on the first matched line. No header is produced if no lines match or when `-unmatched` is used.

31-33: Remove trailing period in Windows example (breaks copy/paste).

-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$".
+Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"

cmd/log2csv/log2csv.go (3)

71-81: Precompute named-group indices; avoid SubexpNames per line.

Improves hot-path performance and removes unused parameter.

@@
-    groupNames := extractGroupNames(re)
+    groupNames := extractGroupNames(re)
+    groupIndices := namedGroupIndices(re)
@@
-    out := bufio.NewWriter(os.Stdout)
-    err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
+    out := bufio.NewWriter(os.Stdout)
+    err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched)
@@
-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error {
@@
-        values, ok := processLine(line, re, groupNames)
+        values, ok := processLine(line, re, groupIndices)
@@
-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
+// Process a log line and return CSV values + true if the line is valid, or nil + false if the line should be ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
     submatches := re.FindStringSubmatch(line)
     if submatches == nil {
         return nil, false
     }
-    subNames := re.SubexpNames()
-    values := make([]string, 0, len(groupNames))
+    values := make([]string, len(groupIndices))
     allEmpty := true
-    for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
-        name := subNames[idxSubmatch]
-        if name == "" {
-            continue
-        }
-        val := submatches[idxSubmatch]
+    for i, subIdx := range groupIndices {
+        val := submatches[subIdx]
         if val != "" {
             allEmpty = false
         }
-        values = append(values, val)
+        values[i] = val
     }
     if allEmpty {
         return nil, false
     }
     return values, true
 }
+ 
+func namedGroupIndices(re *regexp.Regexp) []int {
+    names := re.SubexpNames()
+    idxs := make([]int, 0, len(names))
+    for i := 1; i < len(names); i++ {
+        if names[i] != "" {
+            idxs = append(idxs, i)
+        }
+    }
+    return idxs
+}

Also applies to: 94-107, 168-191

146-159: Line-ending detection may misdetect on very long first lines.

Optional: switch to ReadSlice('\n') and rebuild the reader to avoid peek-size dependency.

22-29: Consider moving exported errors out of package main.

If reuse is planned, place them in an internal/pkg library and keep main thin.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29280dd and 7cc0d98.

📒 Files selected for processing (2)

README.md (1 hunks)
cmd/log2csv/log2csv.go (1 hunks)

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

📚 Learning: 2025-09-07T18:51:13.329Z

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

cmd/log2csv/log2csv.go
README.md

🪛 GitHub Check: build-test

cmd/log2csv/log2csv.go

[failure] 114-114:
Error return value of fmt.Fprintln is not checked (errcheck)

🪛 GitHub Actions: CI

cmd/log2csv/log2csv.go

[error] 114-114: Error return value of fmt.Fprintln is not checked (errcheck). Command: 'golangci-lint run --timeout=5m'.

🪛 markdownlint-cli2 (0.17.2)

README.md

48-48: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (1)

.github/workflows/ci.yaml (1)
24-27: Add actionlint to lint the workflow itself.

Run actionlint right after checkout to fail fast on YAML/Actions issues.
       - name: Checkout
         uses: actions/checkout@v4

+      - name: Lint workflows (actionlint)
+        uses: reviewdog/action-actionlint@v1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cc0d98 and d2a8fe1.

📒 Files selected for processing (1)

.github/workflows/ci.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

coderabbitai · 2025-09-08T00:12:45Z

+      - name: Checkout
+        uses: actions/checkout@v4


🛠️ Refactor suggestion

Pin GitHub Actions by commit SHA for supply‑chain security.

Avoid mutable tags; pin to immutable SHAs (update via Dependabot).

- - name: Checkout - uses: actions/checkout@v4 + - name: Checkout + uses: actions/checkout@<commit-sha> - - name: Set up Go - uses: actions/setup-go@v5 + - name: Set up Go + uses: actions/setup-go@<commit-sha> - - name: Upload coverage (artifact) - if: always() - uses: actions/upload-artifact@v4 + - name: Upload coverage (artifact) + if: always() + uses: actions/upload-artifact@<commit-sha>

Tip: enable Dependabot for GitHub Actions to auto-bump SHAs.

Also applies to: 29-29, 91-91

🤖 Prompt for AI Agents

.github/workflows/ci.yaml around lines 25-26 (also apply same change at lines 29 and 91): the workflow is using mutable action tags (e.g., actions/checkout@v4); replace these with the action reference pinned to the corresponding immutable commit SHA (actions/checkout@<commit-sha>) for supply-chain security, and do the same for any other actions referenced at the noted lines; update Dependabot to auto-bump these SHAs (or document a process) so future updates use Dependabot PRs that refresh the pinned SHAs.

This should be already addressed

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check the current state of GitHub Actions pinning in the CI workflow cat .github/workflows/ci.yaml

Length of output: 2584

@b4prog, I can see that the structure for pinning GitHub Actions is in place, but the actual commit SHAs haven't been filled in yet. The workflow currently uses placeholder values:

Line 26: actions/checkout@<commit-sha>

Line 29: actions/setup-go@<commit-sha>

Line 91: actions/upload-artifact@<commit-sha>

To complete the security improvement, these placeholders need to be replaced with actual commit SHAs. Here are the current stable commit SHAs for these actions:

- name: Checkout uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 - name: Set up Go uses: actions/setup-go@41dfa10bad2bb2ae585af6ee5bb4d7d973ad74ed # v5.1.0 - name: Upload coverage (artifact) uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.4.3

Would you like me to create an issue to track completing this security improvement with the actual commit SHAs?

coderabbitai · 2025-09-08T00:12:45Z

+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: "1.25.x"
+          check-latest: true
+          cache: true
+          cache-dependency-path: go.sum
+


🧹 Nitpick (assertive)

Drive toolchain version from go.mod for single source of truth.

Let setup-go read the version from go.mod; fewer drift issues.

- - name: Set up Go - uses: actions/setup-go@v5 - with: - go-version: "1.25.x" - check-latest: true - cache: true - cache-dependency-path: go.sum + - name: Set up Go + uses: actions/setup-go@<commit-sha> + with: + go-version-file: go.mod + check-latest: true + cache: true + cache-dependency-path: go.sum

If you intentionally target Go 1.25.x regardless of go.mod, keep the current config.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- name: Set up Go

uses: actions/setup-go@v5

with:

go-version: "1.25.x"

check-latest: true

cache: true

cache-dependency-path: go.sum

- name: Set up Go

uses: actions/setup-go@<commit-sha>

with:

go-version-file: go.mod

check-latest: true

cache: true

cache-dependency-path: go.sum

🤖 Prompt for AI Agents

.github/workflows/ci.yaml lines 28-35: the workflow pins Go with a hardcoded go-version ("1.25.x"); change it to drive the toolchain from go.mod by replacing the explicit go-version input with the setup-go input that reads the version from the file (use the action's go-version-file input set to "go.mod"), keep the other inputs (check-latest, cache, cache-dependency-path) as-is, and ensure no hardcoded version remains so the toolchain follows go.mod as the single source of truth.

coderabbitai · 2025-09-08T00:12:45Z

+      - name: Vet
+        run: go vet -mod=readonly ./...
+
+      - name: Build
+        run: go build -mod=readonly ./...
+


🛠️ Refactor suggestion

Add golangci-lint to catch issues beyond vet.

Staticcheck, style, and common bug patterns won’t be covered by vet alone.

- name: Vet run: go vet -mod=readonly ./... + - name: Lint (golangci-lint) + uses: golangci/golangci-lint-action@<commit-sha> + with: + version: latest + args: --timeout=5m

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- name: Vet

run: go vet -mod=readonly ./...

- name: Build

run: go build -mod=readonly ./...

- name: Vet

run: go vet -mod=readonly ./...

- name: Lint (golangci-lint)

uses: golangci/golangci-lint-action@<commit-sha>

with:

version: latest

args: --timeout=5m

- name: Build

run: go build -mod=readonly ./...

🤖 Prompt for AI Agents

In .github/workflows/ci.yaml around lines 80 to 85 the workflow only runs go vet and go build; add a golangci-lint step to catch staticcheck/style/bug patterns that vet misses by invoking golangci-lint (e.g., use the golangci-lint action or install it in the job and run golangci-lint run --timeout 5m ./...), ensure the step runs before build, exits non-zero on issues, and optionally use caching for the linter binary to speed CI.

coderabbitai · 2025-09-08T00:12:46Z

+      - name: Test (race, coverage)
+        run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...
+


🧹 Nitpick (assertive)

Shuffle tests to surface order dependencies.

Helps catch hidden global state issues.

- - name: Test (race, coverage) - run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./... + - name: Test (race, coverage) + run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- name: Test (race, coverage)

run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...

- name: Test (race, coverage)

run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...

🤖 Prompt for AI Agents

In .github/workflows/ci.yaml around lines 86 to 88, the CI job runs `go test` deterministically which can hide test order dependencies; update the `run` step to pass the test shuffle flag so test execution order is randomized (e.g., append `-shuffle=time` or a seeded value to the `go test` invocation) to surface hidden global state issues and ensure reproducibility by logging the seed if using a time-based seed.

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (2)

cmd/log2csv/log2csv.go (2)
113-118: Preserve CRLF and check write errors in -unmatched mode.

fmt.Fprintln normalizes to LF and its error is unchecked. Write with the detected lineEnding and handle errors.

Apply:
-                        // In unmatched mode, print each unique non-matching line once.
-                        fmt.Fprintln(output, line)
+                        // In unmatched mode, print each unique non-matching line once, preserving line endings.
+                        if _, err := io.WriteString(output, line+lineEnding); err != nil {
+                            return err
+                        }
23-29: Library reuse: exported errors in package main.

If you plan reuse, move parsing/writing and exported errors into an internal/pkg package and keep main thin.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2a8fe1 and af752e3.

📒 Files selected for processing (1)

cmd/log2csv/log2csv.go (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

📚 Learning: 2025-09-07T18:51:13.329Z

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

cmd/log2csv/log2csv.go

📚 Learning: 2025-09-08T04:42:11.224Z

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.

Applied to files:

cmd/log2csv/log2csv.go

🔇 Additional comments (1)

cmd/log2csv/log2csv.go (1)

6-6: Good move to encoding/csv and single-writer usage.

Using csv.Writer once with UseCRLF based on detected line endings is correct and reduces per-row allocations.

Also applies to: 99-103, 149-151

coderabbitai · 2025-09-08T04:58:57Z

+	fmt.Fprint(os.Stderr, msg)
+}


🧹 Nitpick (assertive)

Silence errcheck for informational prints.

Main/usage can’t return errors; explicitly ignore the result.

Apply:

- fmt.Fprint(os.Stderr, msg) + _, _ = fmt.Fprint(os.Stderr, msg) @@ - fmt.Fprintf(os.Stderr, "error: %v\n", err) + _, _ = fmt.Fprintf(os.Stderr, "error: %v\n", err)

Also applies to: 53-54

🤖 Prompt for AI Agents

In cmd/log2csv/log2csv.go around lines 48-49 (and similarly at 53-54), the fmt.Fprint calls that write informational usage/help text return an error which is currently unused; explicitly ignore the returned error to silence errcheck by assigning it to the blank identifier. Update each fmt.Fprint/Printf invocation used for informational prints to capture the return values like _, _ = fmt.Fprint(...) or _, _ = fmt.Fprintf(...) so the result is intentionally ignored while preserving the existing behavior.

coderabbitai · 2025-09-08T04:58:58Z

+	groupNames := extractGroupNames(re)
+	if len(groupNames) == 0 {
+		return ErrNoNamedCaptureGroups
+	}
+	out := bufio.NewWriter(os.Stdout)
+	err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
+	if flushErr := out.Flush(); err == nil && flushErr != nil {
+		err = flushErr
+	}


🛠️ Refactor suggestion

Avoid per-line SubexpNames; precompute indices. Also don’t require named groups in -unmatched mode.

SubexpNames() on every line is unnecessary; precompute capture indices once.

In -unmatched mode, named groups aren’t needed—current unconditional check blocks valid usage.

Apply:

@@ - groupNames := extractGroupNames(re) - if len(groupNames) == 0 { - return ErrNoNamedCaptureGroups - } - out := bufio.NewWriter(os.Stdout) - err = processInput(os.Stdin, re, groupNames, out, *listUnmatched) + // Precompute indices for named groups once. + groupIndices := namedGroupIndices(re) + // Only require/compute names when emitting CSV. + var groupNames []string + if !*listUnmatched { + groupNames = extractGroupNames(re) + if len(groupNames) == 0 { + return ErrNoNamedCaptureGroups + } + } + out := bufio.NewWriter(os.Stdout) + err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched) @@ func extractGroupNames(re *regexp.Regexp) []string { @@ } +// namedGroupIndices returns the capture indices (into FindStringSubmatch) +// for all named groups, in declaration order. Duplicates are preserved. +func namedGroupIndices(re *regexp.Regexp) []int { + names := re.SubexpNames() + idxs := make([]int, 0, len(names)) + for i := 1; i < len(names); i++ { + if names[i] != "" { + idxs = append(idxs, i) + } + } + return idxs +} @@ -func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error { +func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error { @@ - values, ok := processLine(line, re, groupNames) + values, ok := processLine(line, re, groupIndices) @@ -// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored. -func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) { - submatches := re.FindStringSubmatch(line) - if submatches == nil { - return nil, false - } - subNames := re.SubexpNames() - values := make([]string, 0, len(groupNames)) - allEmpty := true - for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ { - name := subNames[idxSubmatch] - if name == "" { - continue - } - val := submatches[idxSubmatch] - if val != "" { - allEmpty = false - } - values = append(values, val) - } - if allEmpty { - return nil, false - } - return values, true -} +// Process a log line and return CSV values + true if valid, or nil + false if ignored. +func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) { + submatches := re.FindStringSubmatch(line) + if submatches == nil { + return nil, false + } + values := make([]string, len(groupIndices)) + allEmpty := true + for i, idx := range groupIndices { + v := submatches[idx] + if v != "" { + allEmpty = false + } + values[i] = v + } + if allEmpty { + return nil, false + } + return values, true +}

Note: This preserves duplicate header names as-is (per your stated flexibility goal).

Also applies to: 83-93, 94-94, 110-110, 175-199

🤖 Prompt for AI Agents

In cmd/log2csv/log2csv.go around lines 71-79 (and also adjust the referenced sections 83-93, 94, 110, and 175-199), avoid calling re.SubexpNames() per input line by calling it once before processing and precomputing the slice of named capture indices; pass that precomputed data into processInput and any per-line logic. Also remove the unconditional check that requires named capture groups when the -unmatched flag is set: only return ErrNoNamedCaptureGroups if -unmatched is false and no named groups were found. Update function signatures and callers as needed to accept the cached names/indices and ensure flushing/error handling remains unchanged.

coderabbitai · 2025-09-08T04:58:58Z

+	// Track unique non-matching lines when -unmatched is set.
+	seenUnmatched := make(map[string]struct{})
+	for sc.Scan() {


🧹 Nitpick (assertive)

Unbounded memory in -unmatched dedupe.

seenUnmatched grows with unique lines; huge logs could exhaust memory. Consider a flag to disable dedupe (-unmatched-all), a size cap, or a Bloom filter if approximate uniqueness is acceptable.

Would you like a patch adding -unmatched-all and an optional -unmatched-max= cap?

🤖 Prompt for AI Agents

In cmd/log2csv/log2csv.go around lines 106 to 108, the current seenUnmatched map for deduping unmatched lines is unbounded and can exhaust memory on huge logs; add two flags: -unmatched-all (bool) to disable dedupe entirely and -unmatched-max=<N> (int) to cap the number of unique entries tracked; implement logic so that when -unmatched-all is true the code does not create or consult seenUnmatched, otherwise create seenUnmatched but enforce the cap by stopping insertion of new unique keys once the count reaches unmatched-max (or alternatively evict oldest if you prefer FIFO behavior), and document the semantics and defaults (e.g., unmatched-max default 100000 or similar) and validate the flag values on startup.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

.coderabbit.yaml (1)

6-6: Poem disabled — good follow-through on prior feedback.
Matches the earlier request; keeps reviews concise.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af752e3 and 9d5548e.

📒 Files selected for processing (1)

.coderabbit.yaml (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.

🔇 Additional comments (1)

.coderabbit.yaml (1)

7-9: Confirm intent to review draft PRs.
Setting drafts: true will trigger reviews on drafts; verify this is desired for your workflow.

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (12)

.github/workflows/ci.yaml (4)

29-35: Drive Go version from go.mod (single source of truth).

Let setup-go read the version from go.mod to avoid drift.

-      - name: Set up Go
-        uses: actions/setup-go@v5
-        with:
-          go-version: "1.25.x"
-          check-latest: true
-          cache: true
-          cache-dependency-path: go.sum
+      - name: Set up Go
+        uses: actions/setup-go@v5
+        with:
+          go-version-file: go.mod
+          check-latest: true
+          cache: true
+          cache-dependency-path: go.sum

27-27: Pin GitHub Actions by commit SHA.

Improves supply-chain security; avoid mutable tags.

-        uses: actions/checkout@v4
+        uses: actions/checkout@<commit-sha>

-        uses: actions/setup-go@v5
+        uses: actions/setup-go@<commit-sha>

-        uses: actions/upload-artifact@v4
+        uses: actions/upload-artifact@<commit-sha>

Tip: enable Dependabot for Actions to auto-bump SHAs.

Also applies to: 30-30, 92-92

81-86: Add golangci-lint to catch issues beyond vet.

Covers errcheck, staticcheck, style, and common bugs.

       - name: Vet
         run: go vet -mod=readonly ./...

+      - name: Lint (golangci-lint)
+        uses: golangci/golangci-lint-action@<commit-sha>
+        with:
+          version: latest
+          args: --timeout=5m

87-88: Shuffle tests to surface order dependencies.

Randomizes execution order.

-        run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...
+        run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...

cmd/log2csv/log2csv.go (7)

48-49: Silence errcheck for usage print.

Explicitly ignore the write error.

-	fmt.Fprint(os.Stderr, msg)
+	_, _ = fmt.Fprint(os.Stderr, msg)

52-55: Silence errcheck for error print in main.

Keep behavior; ignore write error explicitly.

-		fmt.Fprintf(os.Stderr, "error: %v\n", err)
+		_, _ = fmt.Fprintf(os.Stderr, "error: %v\n", err)

145-147: Check stderr write for the warning.

Propagate write failures.

-		fmt.Fprintf(os.Stderr, "\nwarning: %d log line(s) did not match the pattern and were ignored\nfirst ignored line: %q\n", ignoredLines, firstIgnoredLine)
+		if _, err := fmt.Fprintf(os.Stderr, "\nwarning: %d log line(s) did not match the pattern and were ignored\nfirst ignored line: %q\n", ignoredLines, firstIgnoredLine); err != nil {
+			return err
+		}

22-29: Library vs CLI split (optional).

Exported errors in package main aren’t reusable; consider moving core logic to a library package and keep main thin.

71-76: Don’t require named groups in -unmatched mode.

Current check blocks valid -unmatched use cases.

-	groupNames := extractGroupNames(re)
-	if len(groupNames) == 0 {
-		return ErrNoNamedCaptureGroups
-	}
+	var groupNames []string
+	if !*listUnmatched {
+		groupNames = extractGroupNames(re)
+		if len(groupNames) == 0 {
+			return ErrNoNamedCaptureGroups
+		}
+	}

177-201: Avoid per-line SubexpNames; precompute named-group indices once.

Removes O(lines) allocations and speeds hot path.

-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
-	submatches := re.FindStringSubmatch(line)
-	if submatches == nil {
-		return nil, false
-	}
-	subNames := re.SubexpNames()
-	values := make([]string, 0, len(groupNames))
-	allEmpty := true
-	for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
-		name := subNames[idxSubmatch]
-		if name == "" {
-			continue
-		}
-		val := submatches[idxSubmatch]
-		if val != "" {
-			allEmpty = false
-		}
-		values = append(values, val)
-	}
-	if allEmpty {
-		return nil, false
-	}
-	return values, true
-}
+// Process a log line and return CSV values + true if valid, or nil + false if ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
+	submatches := re.FindStringSubmatch(line)
+	if submatches == nil {
+		return nil, false
+	}
+	values := make([]string, len(groupIndices))
+	allEmpty := true
+	for i, idx := range groupIndices {
+		v := submatches[idx]
+		if v != "" {
+			allEmpty = false
+		}
+		values[i] = v
+	}
+	if allEmpty {
+		return nil, false
+	}
+	return values, true
+}

And wire it through:

 func extractGroupNames(re *regexp.Regexp) []string {
@@
 }
+
+// namedGroupIndices returns capture indices for named groups in declaration order.
+func namedGroupIndices(re *regexp.Regexp) []int {
+	names := re.SubexpNames()
+	idxs := make([]int, 0, len(names))
+	for i := 1; i < len(names); i++ {
+		if names[i] != "" {
+			idxs = append(idxs, i)
+		}
+	}
+	return idxs
+}

-	groupNames := extractGroupNames(re)
+	groupIndices := namedGroupIndices(re)
+	groupNames := extractGroupNames(re)

-	err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
+	err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched)

-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error {
@@
-		values, ok := processLine(line, re, groupNames)
+		values, ok := processLine(line, re, groupIndices)

106-127: Unbounded memory for -unmatched dedupe. Add a cap.

Large logs can exhaust memory.

-	seenUnmatched := make(map[string]struct{})
+	seenUnmatched := make(map[string]struct{})
+	const unmatchedMaxDefault = 0 // 0 = unlimited
+	unmatchedMax := unmatchedMaxDefault
@@
-				if listUnmatched {
-					if _, exists := seenUnmatched[line]; !exists {
-						seenUnmatched[line] = struct{}{}
-						// In unmatched mode, print each unique non-matching line once, preserving line endings.
-						if _, err := io.WriteString(output, line+lineEnding); err != nil {
-							return err
-						}
-					}
+				if listUnmatched {
+					// If a cap is configured and reached, stop tracking new uniques and stream lines.
+					if unmatchedMax > 0 && len(seenUnmatched) >= unmatchedMax {
+						if _, err := io.WriteString(output, line+lineEnding); err != nil {
+							return err
+						}
+					} else if _, exists := seenUnmatched[line]; !exists {
+						seenUnmatched[line] = struct{}{}
+						if _, err := io.WriteString(output, line+lineEnding); err != nil {
+							return err
+						}
+					}

Follow-up: if you prefer a flag (e.g., -unmatched-max=N or -unmatched-all), I can wire it through run() and processInput.

README.md (1)

31-33: Remove trailing period from Windows example (breaks copy/paste).

The fenced command ends with an extra ".".

-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$".
+Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d5548e and 93e7ca7.

📒 Files selected for processing (3)

.github/workflows/ci.yaml (1 hunks)
README.md (1 hunks)
cmd/log2csv/log2csv.go (1 hunks)

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.

📚 Learning: 2025-09-07T18:51:13.329Z

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.

Applied to files:

README.md
cmd/log2csv/log2csv.go

📚 Learning: 2025-09-08T04:42:11.224Z

Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.

Applied to files:

README.md
cmd/log2csv/log2csv.go

🔇 Additional comments (1)

.github/workflows/ci.yaml (1)

16-17: LGTM: job timeout added.

Good safeguard against hung runs.

coderabbitai · 2025-09-09T04:31:26Z

+func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
+	inputReader, lineEnding, err := peekForLineEnding(input, logLineSizeMax)
+	if err != nil {
+		return err
+	}
+	sc := openInput(inputReader)
+	csvWriter := csv.NewWriter(output)
+	csvWriter.Comma = csvSeparator
+	csvWriter.UseCRLF = lineEnding == "\r\n"
+	firstLine := true


🧹 Nitpick (assertive)

Avoid creating csv.Writer in -unmatched mode.

Skip allocation/flush when not needed.

- csvWriter := csv.NewWriter(output) - csvWriter.Comma = csvSeparator - csvWriter.UseCRLF = lineEnding == "\r\n" + var csvWriter *csv.Writer + if !listUnmatched { + csvWriter = csv.NewWriter(output) + csvWriter.Comma = csvSeparator + csvWriter.UseCRLF = lineEnding == "\r\n" + } @@ - csvWriter.Flush() - return csvWriter.Error() + if csvWriter != nil { + csvWriter.Flush() + return csvWriter.Error() + } + return nil

Also applies to: 151-153

🤖 Prompt for AI Agents

In cmd/log2csv/log2csv.go around lines 94 to 103 (and similarly at 151-153), the function always allocates a csv.Writer and sets its properties even when running in -unmatched (listUnmatched) mode where CSV output is not produced; change the logic so that csv.Writer is only created and configured when listUnmatched is false, and guard any later calls to csvWriter.Write and csvWriter.Flush with the same condition so no allocation or flush is performed when listUnmatched is true.

coderabbitai · 2025-09-09T04:31:26Z

+- Reads log lines from **STDIN** and writes CSV to **STDOUT**.
+- Extracts fields using **named capture groups** (`(?P<Name>...)`).
+- The **CSV header row** is automatically generated from group names.
+- Preserves the input's line endings (LF/CRLF).
+- Optional **unmatched mode** (`-unmatched`) to print **unique non-matching lines** instead of CSV.


🧹 Nitpick (assertive)

Docs/code mismatch for -unmatched named-groups requirement.

Current code requires named groups even with -unmatched; README doesn’t say so. Either relax the code (preferred; see code review) or document the requirement.

Also applies to: 18-22

🤖 Prompt for AI Agents

In README.md around lines 10 to 14 (and similarly 18 to 22), the docs state -unmatched prints unique non-matching lines but the code currently enforces that the regex contains named capture groups even when -unmatched is used; relax the code so that when -unmatched is enabled the named-group validation is skipped: modify the validation logic to only require named groups if unmatched flag is false, ensure unmatched mode continues to collect and print unique non-matching input lines, update or add tests for unmatched without named groups, and keep README unchanged (or if you prefer documenting the requirement instead, update these README lines to state that named groups are required when not using -unmatched).

coderabbitai · 2025-09-09T04:31:26Z

+## Install
+
+```sh
+go install github.com/b4prog/Log2CSV@latest
+```
+


🧹 Nitpick (assertive)

Mention minimum Go version.

Improves install clarity.

## Install @@ ```sh go install github.com/b4prog/Log2CSV@latest

+Requires Go 1.25+.

<details> <summary>🤖 Prompt for AI Agents</summary>

README.md around lines 35 to 40: the Install section calls "go install
github.com/b4prog/Log2CSV@latest" but doesn't state the minimum Go version;
update the Install section to explicitly mention the required minimum Go version
(e.g., "Requires Go 1.25+"), placing that note directly below the install
command and keeping phrasing concise and consistent with the document's style.

</details>  

…uce flakiness on slow runs.

…bot)." This reverts commit 8f01515.

b4prog added 2 commits September 7, 2025 23:51

Log2CSV initial implementation

84ccafa

improves CodeRabbit config

90c9ae4

b4prog marked this pull request as ready for review September 7, 2025 21:57

coderabbitai Bot reviewed Sep 7, 2025

View reviewed changes

Improves sample regular expression for UFW logs

29280dd

- IPv4 (TOS/PREC/TTL/ID[/DF]) or IPv6 (TC/HOPLIMIT/FLOWLBL) through alternation – TCP/UDP/ICMP(-v6) with optional ports and TCP flags – the second LEN (L4 length) present on UDP

coderabbitai Bot reviewed Sep 7, 2025

View reviewed changes

Comment thread cmd/log2csv/log2csv.go

b4prog added 3 commits September 8, 2025 00:47

Write CSV using 'encoding/csv'

d536662

Changes the behavior to mage an error when line don't match.

7cc0d98

-unmatched to list lines that do not match

removes lint tha break way too often

d2a8fe1

coderabbitai Bot reviewed Sep 8, 2025

View reviewed changes

Comment thread cmd/log2csv/log2csv.go

Comment thread cmd/log2csv/log2csv.go

Comment thread README.md Outdated

coderabbitai Bot reviewed Sep 8, 2025

View reviewed changes

Reuse a single csv.Writer; avoid per-row allocation/flush.

af752e3

coderabbitai Bot reviewed Sep 8, 2025

View reviewed changes

update CodeRabbit configuration

9d5548e

coderabbitai Bot reviewed Sep 9, 2025

View reviewed changes

Comment thread .coderabbit.yaml

b4prog added 5 commits September 9, 2025 06:18

Fix unchecked write and preserve CRLF in -unmatched mode.

49cdd39

fix readme.md empty quote

6c37960

Prevents infinite waits on stuck builds/tests.

93e7ca7

Avoid mutable tags; pin to immutable SHAs (update via Dependabot).

8f01515

update CodeRabbit config

0521b22

coderabbitai Bot reviewed Sep 9, 2025

View reviewed changes

b4prog added 2 commits September 9, 2025 06:33

update CodeRabbit config: Explicitly set GitHub Checks timeout to red…

972c22b

…uce flakiness on slow runs.

Revert "Avoid mutable tags; pin to immutable SHAs (update via Dependa…

16515dc

…bot)." This reverts commit 8f01515.

b4prog merged commit 96a5ad4 into main Sep 9, 2025
2 of 3 checks passed

		- name: Test (race, coverage)
		run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...

Conversation

b4prog commented Sep 7, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks (2 warnings, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Sep 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

b4prog Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Sep 9, 2025

b4prog commented Sep 7, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Sep 7, 2025 •

edited

Loading

coderabbitai Bot Sep 8, 2025 •

edited

Loading