Initial implementation#1
Conversation
|
Warning Rate limit exceeded@b4prog has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 6 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughAdds a new Go module and CLI tool log2csv that reads STDIN, applies a user-supplied regex with named capture groups to produce CSV (or unique non-matching lines). Also adds CI workflow, task runner, updated .gitignore, expanded README, and a CodeRabbit config; exports ErrInvalidRegexp and ErrNoNamedCaptureGroups. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Shell as Shell/OS
participant CLI as log2csv
participant RE as RegExp Engine
participant CSV as CSV Writer
participant STDIN as STDIN
participant STDOUT as STDOUT
User->>Shell: run log2csv -regexp "<pattern>" [--unmatched]
Shell->>CLI: start
CLI->>RE: compile pattern
alt compile error
RE-->>CLI: syntax error
CLI-->>Shell: exit ErrInvalidRegexp
else compiled
RE-->>CLI: compiled
alt no named groups
CLI-->>Shell: exit ErrNoNamedCaptureGroups
else has named groups
loop each input line
STDIN-->>CLI: line
CLI->>RE: match(line)
alt match
CLI->>CLI: extract named groups
opt first matching line
CLI->>CSV: write header (group names)
end
alt all groups empty
CLI-->>CLI: skip row
else
CLI->>CSV: write row (preserve EOL)
CSV-->>STDOUT: CSV line
end
else no match
alt --unmatched
CLI-->>CLI: collect unique non-match
else
CLI-->>CLI: count non-matches (warn later)
end
end
end
CLI->>CSV: flush
CLI-->>Shell: exit 0
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Pre-merge checks (2 warnings, 1 inconclusive)❌ Failed checks (2 warnings, 1 inconclusive)
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
.coderabbit.yaml(1 hunks).github/workflows/ci.yaml(1 hunks).gitignore(1 hunks)README.md(1 hunks)cmd/log2csv/log2csv.go(1 hunks)go.mod(1 hunks)taskfile.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Applied to files:
go.modcmd/log2csv/log2csv.goREADME.md
🔇 Additional comments (3)
go.mod (1)
3-3: Go 1.25 GA and CI support confirmed. Go 1.25 was released on August 12, 2025, and actions/setup-go@v5 supports it via thego-versioninput with manifest lookup and fallback to the official distribution..gitignore (1)
31-42: LGTM on ignores for IDEs and build artifacts.The whitelist for specific .vscode configs and adding /build/ and .task/ look good.
.github/workflows/ci.yaml (1)
31-34: No changes needed: actions/setup-go@v5 supports Go 1.25.x
setup-go v5 resolves1.25.xfrom the Go versions manifest (Go 1.25 was released Aug 12, 2025) and will install it correctly.
- IPv4 (TOS/PREC/TTL/ID[/DF]) or IPv6 (TC/HOPLIMIT/FLOWLBL) through alternation – TCP/UDP/ICMP(-v6) with optional ports and TCP flags – the second LEN (L4 length) present on UDP
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (6)
README.md (2)
10-16: Document duplicate named groups and header emission semantics.Make the behavior explicit to avoid surprises when patterns reuse names and when no lines match.
- Extracts fields using **named capture groups** (`(?P<Name>...)`). -- The **CSV header row** is automatically generated from group names. +- Allows **duplicate named groups**; each occurrence becomes its own CSV column in left‑to‑right order (names are not deduplicated). +- The **CSV header row** is automatically generated from group names (including duplicates) and is emitted on the first matching line (no header if no lines match).
29-31: Remove trailing period from the Windows example (breaks copy/paste).There’s an extra “.” at the end of Line 30 inside the code block.
-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[(?P<Kernel_Time>[\d\.]+)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S+)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)\s+(?:(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)\s+(?:(?P<DF>DF)\s+)?|(?:TC=(?P<TC>\d+)\s+)?(?:HOPLIMIT=(?P<HOPLIMIT>\d+)\s+)?(?:FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+)\s+)? )PROTO=(?P<PROTO>[A-Za-z0-9]+)\s+(?:(?:SPT|SP)=(?P<SPT>\d+)\s+)?(?:(?:DPT|DP)=(?P<DPT>\d+)\s+)?(?:WINDOW=(?P<WINDOW>\d+)\s+)?(?:RES=(?P<RES>0x[0-9A-Fa-f]{2})\s+)?(?:(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$". +Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[(?P<Kernel_Time>[\d\.]+)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S+)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)\s+(?:(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)\s+(?:(?P<DF>DF)\s+)?|(?:TC=(?P<TC>\d+)\s+)?(?:HOPLIMIT=(?P<HOPLIMIT>\d+)\s+)?(?:FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+)\s+)? )PROTO=(?P<PROTO>[A-Za-z0-9]+)\s+(?:(?:SPT|SP)=(?P<SPT>\d+)\s+)?(?:(?:DPT|DP)=(?P<DPT>\d+)\s+)?(?:WINDOW=(?P<WINDOW>\d+)\s+)?(?:RES=(?P<RES>0x[0-9A-Fa-f]{2})\s+)?(?:(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"cmd/log2csv/log2csv.go (4)
21-28: Exported errors in package main hinder reuse. Consider a library package.If you plan reuse, move parsing and errors into an internal/pkg and keep main thin.
3-13: Use encoding/csv for correct CSV quoting and newlines.Removes hand-rolled quoting; handles commas, quotes, and CRLF robustly.
@@ import ( "bufio" "bytes" + "encoding/csv" "errors" "flag" "fmt" "io" "os" "regexp" "strings" ) @@ -func writeCSVRow(output io.Writer, values []string, lineEnding string) error { - csvValues := make([]string, len(values)) - for idxValue, value := range values { - csvValues[idxValue] = `"` + strings.ReplaceAll(value, `"`, `""`) + `"` - } - csvLine := strings.Join(csvValues, string(csvSeparator)) + lineEnding - _, err := io.WriteString(output, csvLine) - return err -} +func writeCSVRow(output io.Writer, values []string, lineEnding string) error { + w := csv.NewWriter(output) + w.Comma = csvSeparator + w.UseCRLF = lineEnding == "\r\n" + if err := w.Write(values); err != nil { + return err + } + w.Flush() + return w.Error() +}Also applies to: 160-168
51-73: Precompute named-group indices; avoid per-line SubexpNames/allocations.Cuts per-line work and fixes the currently unused parameter pattern.
@@ out := bufio.NewWriter(os.Stdout) - err = processInput(os.Stdin, re, groupNames, out) + groupIndices := namedGroupIndices(re) + err = processInput(os.Stdin, re, groupIndices, groupNames, out) @@ -func extractGroupNames(re *regexp.Regexp) []string { +func extractGroupNames(re *regexp.Regexp) []string { names := re.SubexpNames()[1:] ordered := make([]string, 0, len(names)) for _, name := range names { if name != "" { ordered = append(ordered, name) } } return ordered } + +// namedGroupIndices returns the submatch indices for named groups (incl. duplicates) in order. +func namedGroupIndices(re *regexp.Regexp) []int { + names := re.SubexpNames() + idxs := make([]int, 0, len(names)) + for i := 1; i < len(names); i++ { + if names[i] != "" { + idxs = append(idxs, i) + } + } + return idxs +} @@ -func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer) error { +func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer) error { @@ - values, ok := processLine(line, re, groupNames) + values, ok := processLine(line, re, groupIndices) @@ -// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored. -func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) { +// Process a log line and return CSV values + true if valid, or nil + false if it should be ignored. +func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) { submatches := re.FindStringSubmatch(line) if submatches == nil { return nil, false } - subNames := re.SubexpNames() - values := make([]string, 0, len(groupNames)) + values := make([]string, len(groupIndices)) allEmpty := true - for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ { - name := subNames[idxSubmatch] - if name == "" { - continue - } - val := submatches[idxSubmatch] + for i, subIdx := range groupIndices { + val := submatches[subIdx] if val != "" { allEmpty = false } - values = append(values, val) + values[i] = val } if allEmpty { return nil, false } return values, true }Also applies to: 86-110, 134-158
112-125: Peek-based line-ending detection can misdetect on very long first lines (>64 KiB).Switch to ReadSlice + io.MultiReader to be exact.
-func peekForLineEnding(input io.Reader, sizeMaxPeek int) (io.Reader, string, error) { - inputBuffer := bufio.NewReader(input) - sample, err := inputBuffer.Peek(sizeMaxPeek) - if err != nil && err != io.EOF && !errors.Is(err, bufio.ErrBufferFull) { - return nil, "", err - } - if idx := bytes.IndexByte(sample, '\n'); idx >= 0 { - if idx > 0 && sample[idx-1] == '\r' { - return inputBuffer, "\r\n", nil - } - return inputBuffer, "\n", nil - } - return inputBuffer, "\n", nil -} +func peekForLineEnding(input io.Reader, _ int) (io.Reader, string, error) { + inputBuffer := bufio.NewReader(input) + var first []byte + for { + chunk, err := inputBuffer.ReadSlice('\n') + first = append(first, chunk...) + if errors.Is(err, bufio.ErrBufferFull) { + continue + } + if err != nil && err != io.EOF { + return nil, "", err + } + break + } + ending := "\n" + if idx := bytes.IndexByte(first, '\n'); idx >= 1 && first[idx-1] == '\r' { + ending = "\r\n" + } + return io.MultiReader(bytes.NewReader(first), inputBuffer), ending, nil +}
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
README.md(1 hunks)cmd/log2csv/log2csv.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Applied to files:
README.mdcmd/log2csv/log2csv.go
🔇 Additional comments (2)
cmd/log2csv/log2csv.go (2)
30-43: Usage and examples read well. LGTM.
59-66: Good error mapping and validation.
-unmatched to list lines that do not match
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (5)
README.md (2)
10-15: Document duplicate named groups and header emission.Align docs with intended behavior (duplicates allowed; header once on first match).
- The **CSV header row** is automatically generated from group names. +- Allows **duplicate named groups**; each occurrence becomes its own CSV column in left-to-right pattern order (names are not de-duplicated). +- The **CSV header** is emitted once, on the first matched line. No header is produced if no lines match or when `-unmatched` is used.
31-33: Remove trailing period in Windows example (breaks copy/paste).-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$". +Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"cmd/log2csv/log2csv.go (3)
71-81: Precompute named-group indices; avoid SubexpNames per line.Improves hot-path performance and removes unused parameter.
@@ - groupNames := extractGroupNames(re) + groupNames := extractGroupNames(re) + groupIndices := namedGroupIndices(re) @@ - out := bufio.NewWriter(os.Stdout) - err = processInput(os.Stdin, re, groupNames, out, *listUnmatched) + out := bufio.NewWriter(os.Stdout) + err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched) @@ -func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error { +func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error { @@ - values, ok := processLine(line, re, groupNames) + values, ok := processLine(line, re, groupIndices) @@ -// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored. -func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) { +// Process a log line and return CSV values + true if the line is valid, or nil + false if the line should be ignored. +func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) { submatches := re.FindStringSubmatch(line) if submatches == nil { return nil, false } - subNames := re.SubexpNames() - values := make([]string, 0, len(groupNames)) + values := make([]string, len(groupIndices)) allEmpty := true - for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ { - name := subNames[idxSubmatch] - if name == "" { - continue - } - val := submatches[idxSubmatch] + for i, subIdx := range groupIndices { + val := submatches[subIdx] if val != "" { allEmpty = false } - values = append(values, val) + values[i] = val } if allEmpty { return nil, false } return values, true } + +func namedGroupIndices(re *regexp.Regexp) []int { + names := re.SubexpNames() + idxs := make([]int, 0, len(names)) + for i := 1; i < len(names); i++ { + if names[i] != "" { + idxs = append(idxs, i) + } + } + return idxs +}Also applies to: 94-107, 168-191
146-159: Line-ending detection may misdetect on very long first lines.Optional: switch to ReadSlice('\n') and rebuild the reader to avoid peek-size dependency.
22-29: Consider moving exported errors out of package main.If reuse is planned, place them in an internal/pkg library and keep main thin.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
README.md(1 hunks)cmd/log2csv/log2csv.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Applied to files:
cmd/log2csv/log2csv.goREADME.md
🪛 GitHub Check: build-test
cmd/log2csv/log2csv.go
[failure] 114-114:
Error return value of fmt.Fprintln is not checked (errcheck)
🪛 GitHub Actions: CI
cmd/log2csv/log2csv.go
[error] 114-114: Error return value of fmt.Fprintln is not checked (errcheck). Command: 'golangci-lint run --timeout=5m'.
🪛 markdownlint-cli2 (0.17.2)
README.md
48-48: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
.github/workflows/ci.yaml (1)
24-27: Add actionlint to lint the workflow itself.Run actionlint right after checkout to fail fast on YAML/Actions issues.
- name: Checkout uses: actions/checkout@v4 + - name: Lint workflows (actionlint) + uses: reviewdog/action-actionlint@v1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/ci.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
| - name: Checkout | ||
| uses: actions/checkout@v4 |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Pin GitHub Actions by commit SHA for supply‑chain security.
Avoid mutable tags; pin to immutable SHAs (update via Dependabot).
- - name: Checkout
- uses: actions/checkout@v4
+ - name: Checkout
+ uses: actions/checkout@<commit-sha>
- - name: Set up Go
- uses: actions/setup-go@v5
+ - name: Set up Go
+ uses: actions/setup-go@<commit-sha>
- - name: Upload coverage (artifact)
- if: always()
- uses: actions/upload-artifact@v4
+ - name: Upload coverage (artifact)
+ if: always()
+ uses: actions/upload-artifact@<commit-sha>Tip: enable Dependabot for GitHub Actions to auto-bump SHAs.
Also applies to: 29-29, 91-91
🤖 Prompt for AI Agents
.github/workflows/ci.yaml around lines 25-26 (also apply same change at lines 29
and 91): the workflow is using mutable action tags (e.g., actions/checkout@v4);
replace these with the action reference pinned to the corresponding immutable
commit SHA (actions/checkout@<commit-sha>) for supply-chain security, and do the
same for any other actions referenced at the noted lines; update Dependabot to
auto-bump these SHAs (or document a process) so future updates use Dependabot
PRs that refresh the pinned SHAs.
There was a problem hiding this comment.
This should be already addressed
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the current state of GitHub Actions pinning in the CI workflow
cat .github/workflows/ci.yaml
Length of output: 2584
@b4prog, I can see that the structure for pinning GitHub Actions is in place, but the actual commit SHAs haven't been filled in yet. The workflow currently uses placeholder values:
- Line 26:
actions/checkout@<commit-sha> - Line 29:
actions/setup-go@<commit-sha> - Line 91:
actions/upload-artifact@<commit-sha>
To complete the security improvement, these placeholders need to be replaced with actual commit SHAs. Here are the current stable commit SHAs for these actions:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- name: Set up Go
uses: actions/setup-go@41dfa10bad2bb2ae585af6ee5bb4d7d973ad74ed # v5.1.0
- name: Upload coverage (artifact)
uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.4.3Would you like me to create an issue to track completing this security improvement with the actual commit SHAs?
| - name: Set up Go | ||
| uses: actions/setup-go@v5 | ||
| with: | ||
| go-version: "1.25.x" | ||
| check-latest: true | ||
| cache: true | ||
| cache-dependency-path: go.sum | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Drive toolchain version from go.mod for single source of truth.
Let setup-go read the version from go.mod; fewer drift issues.
- - name: Set up Go
- uses: actions/setup-go@v5
- with:
- go-version: "1.25.x"
- check-latest: true
- cache: true
- cache-dependency-path: go.sum
+ - name: Set up Go
+ uses: actions/setup-go@<commit-sha>
+ with:
+ go-version-file: go.mod
+ check-latest: true
+ cache: true
+ cache-dependency-path: go.sumIf you intentionally target Go 1.25.x regardless of go.mod, keep the current config.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Set up Go | |
| uses: actions/setup-go@v5 | |
| with: | |
| go-version: "1.25.x" | |
| check-latest: true | |
| cache: true | |
| cache-dependency-path: go.sum | |
| - name: Set up Go | |
| uses: actions/setup-go@<commit-sha> | |
| with: | |
| go-version-file: go.mod | |
| check-latest: true | |
| cache: true | |
| cache-dependency-path: go.sum |
🤖 Prompt for AI Agents
.github/workflows/ci.yaml lines 28-35: the workflow pins Go with a hardcoded
go-version ("1.25.x"); change it to drive the toolchain from go.mod by replacing
the explicit go-version input with the setup-go input that reads the version
from the file (use the action's go-version-file input set to "go.mod"), keep the
other inputs (check-latest, cache, cache-dependency-path) as-is, and ensure no
hardcoded version remains so the toolchain follows go.mod as the single source
of truth.
| - name: Vet | ||
| run: go vet -mod=readonly ./... | ||
|
|
||
| - name: Build | ||
| run: go build -mod=readonly ./... | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add golangci-lint to catch issues beyond vet.
Staticcheck, style, and common bug patterns won’t be covered by vet alone.
- name: Vet
run: go vet -mod=readonly ./...
+ - name: Lint (golangci-lint)
+ uses: golangci/golangci-lint-action@<commit-sha>
+ with:
+ version: latest
+ args: --timeout=5m📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Vet | |
| run: go vet -mod=readonly ./... | |
| - name: Build | |
| run: go build -mod=readonly ./... | |
| - name: Vet | |
| run: go vet -mod=readonly ./... | |
| - name: Lint (golangci-lint) | |
| uses: golangci/golangci-lint-action@<commit-sha> | |
| with: | |
| version: latest | |
| args: --timeout=5m | |
| - name: Build | |
| run: go build -mod=readonly ./... |
🤖 Prompt for AI Agents
In .github/workflows/ci.yaml around lines 80 to 85 the workflow only runs go vet
and go build; add a golangci-lint step to catch staticcheck/style/bug patterns
that vet misses by invoking golangci-lint (e.g., use the golangci-lint action or
install it in the job and run golangci-lint run --timeout 5m ./...), ensure the
step runs before build, exits non-zero on issues, and optionally use caching for
the linter binary to speed CI.
| - name: Test (race, coverage) | ||
| run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./... | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Shuffle tests to surface order dependencies.
Helps catch hidden global state issues.
- - name: Test (race, coverage)
- run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./...
+ - name: Test (race, coverage)
+ run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Test (race, coverage) | |
| run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./... | |
| - name: Test (race, coverage) | |
| run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./... |
🤖 Prompt for AI Agents
In .github/workflows/ci.yaml around lines 86 to 88, the CI job runs `go test`
deterministically which can hide test order dependencies; update the `run` step
to pass the test shuffle flag so test execution order is randomized (e.g.,
append `-shuffle=time` or a seeded value to the `go test` invocation) to surface
hidden global state issues and ensure reproducibility by logging the seed if
using a time-based seed.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
cmd/log2csv/log2csv.go (2)
113-118: Preserve CRLF and check write errors in -unmatched mode.fmt.Fprintln normalizes to LF and its error is unchecked. Write with the detected lineEnding and handle errors.
Apply:
- // In unmatched mode, print each unique non-matching line once. - fmt.Fprintln(output, line) + // In unmatched mode, print each unique non-matching line once, preserving line endings. + if _, err := io.WriteString(output, line+lineEnding); err != nil { + return err + }
23-29: Library reuse: exported errors in package main.If you plan reuse, move parsing/writing and exported errors into an internal/pkg package and keep main thin.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
cmd/log2csv/log2csv.go(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Applied to files:
cmd/log2csv/log2csv.go
📚 Learning: 2025-09-08T04:42:11.224Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.
Applied to files:
cmd/log2csv/log2csv.go
🔇 Additional comments (1)
cmd/log2csv/log2csv.go (1)
6-6: Good move to encoding/csv and single-writer usage.Using csv.Writer once with UseCRLF based on detected line endings is correct and reduces per-row allocations.
Also applies to: 99-103, 149-151
| fmt.Fprint(os.Stderr, msg) | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Silence errcheck for informational prints.
Main/usage can’t return errors; explicitly ignore the result.
Apply:
- fmt.Fprint(os.Stderr, msg)
+ _, _ = fmt.Fprint(os.Stderr, msg)
@@
- fmt.Fprintf(os.Stderr, "error: %v\n", err)
+ _, _ = fmt.Fprintf(os.Stderr, "error: %v\n", err)Also applies to: 53-54
🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 48-49 (and similarly at 53-54), the
fmt.Fprint calls that write informational usage/help text return an error which
is currently unused; explicitly ignore the returned error to silence errcheck by
assigning it to the blank identifier. Update each fmt.Fprint/Printf invocation
used for informational prints to capture the return values like _, _ =
fmt.Fprint(...) or _, _ = fmt.Fprintf(...) so the result is intentionally
ignored while preserving the existing behavior.
| groupNames := extractGroupNames(re) | ||
| if len(groupNames) == 0 { | ||
| return ErrNoNamedCaptureGroups | ||
| } | ||
| out := bufio.NewWriter(os.Stdout) | ||
| err = processInput(os.Stdin, re, groupNames, out, *listUnmatched) | ||
| if flushErr := out.Flush(); err == nil && flushErr != nil { | ||
| err = flushErr | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Avoid per-line SubexpNames; precompute indices. Also don’t require named groups in -unmatched mode.
- SubexpNames() on every line is unnecessary; precompute capture indices once.
- In -unmatched mode, named groups aren’t needed—current unconditional check blocks valid usage.
Apply:
@@
- groupNames := extractGroupNames(re)
- if len(groupNames) == 0 {
- return ErrNoNamedCaptureGroups
- }
- out := bufio.NewWriter(os.Stdout)
- err = processInput(os.Stdin, re, groupNames, out, *listUnmatched)
+ // Precompute indices for named groups once.
+ groupIndices := namedGroupIndices(re)
+ // Only require/compute names when emitting CSV.
+ var groupNames []string
+ if !*listUnmatched {
+ groupNames = extractGroupNames(re)
+ if len(groupNames) == 0 {
+ return ErrNoNamedCaptureGroups
+ }
+ }
+ out := bufio.NewWriter(os.Stdout)
+ err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched)
@@
func extractGroupNames(re *regexp.Regexp) []string {
@@
}
+// namedGroupIndices returns the capture indices (into FindStringSubmatch)
+// for all named groups, in declaration order. Duplicates are preserved.
+func namedGroupIndices(re *regexp.Regexp) []int {
+ names := re.SubexpNames()
+ idxs := make([]int, 0, len(names))
+ for i := 1; i < len(names); i++ {
+ if names[i] != "" {
+ idxs = append(idxs, i)
+ }
+ }
+ return idxs
+}
@@
-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error {
+func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error {
@@
- values, ok := processLine(line, re, groupNames)
+ values, ok := processLine(line, re, groupIndices)
@@
-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored.
-func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) {
- submatches := re.FindStringSubmatch(line)
- if submatches == nil {
- return nil, false
- }
- subNames := re.SubexpNames()
- values := make([]string, 0, len(groupNames))
- allEmpty := true
- for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ {
- name := subNames[idxSubmatch]
- if name == "" {
- continue
- }
- val := submatches[idxSubmatch]
- if val != "" {
- allEmpty = false
- }
- values = append(values, val)
- }
- if allEmpty {
- return nil, false
- }
- return values, true
-}
+// Process a log line and return CSV values + true if valid, or nil + false if ignored.
+func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) {
+ submatches := re.FindStringSubmatch(line)
+ if submatches == nil {
+ return nil, false
+ }
+ values := make([]string, len(groupIndices))
+ allEmpty := true
+ for i, idx := range groupIndices {
+ v := submatches[idx]
+ if v != "" {
+ allEmpty = false
+ }
+ values[i] = v
+ }
+ if allEmpty {
+ return nil, false
+ }
+ return values, true
+}Note: This preserves duplicate header names as-is (per your stated flexibility goal).
Also applies to: 83-93, 94-94, 110-110, 175-199
🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 71-79 (and also adjust the referenced
sections 83-93, 94, 110, and 175-199), avoid calling re.SubexpNames() per input
line by calling it once before processing and precomputing the slice of named
capture indices; pass that precomputed data into processInput and any per-line
logic. Also remove the unconditional check that requires named capture groups
when the -unmatched flag is set: only return ErrNoNamedCaptureGroups if
-unmatched is false and no named groups were found. Update function signatures
and callers as needed to accept the cached names/indices and ensure
flushing/error handling remains unchanged.
| // Track unique non-matching lines when -unmatched is set. | ||
| seenUnmatched := make(map[string]struct{}) | ||
| for sc.Scan() { |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Unbounded memory in -unmatched dedupe.
seenUnmatched grows with unique lines; huge logs could exhaust memory. Consider a flag to disable dedupe (-unmatched-all), a size cap, or a Bloom filter if approximate uniqueness is acceptable.
Would you like a patch adding -unmatched-all and an optional -unmatched-max= cap?
🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 106 to 108, the current seenUnmatched map
for deduping unmatched lines is unbounded and can exhaust memory on huge logs;
add two flags: -unmatched-all (bool) to disable dedupe entirely and
-unmatched-max=<N> (int) to cap the number of unique entries tracked; implement
logic so that when -unmatched-all is true the code does not create or consult
seenUnmatched, otherwise create seenUnmatched but enforce the cap by stopping
insertion of new unique keys once the count reaches unmatched-max (or
alternatively evict oldest if you prefer FIFO behavior), and document the
semantics and defaults (e.g., unmatched-max default 100000 or similar) and
validate the flag values on startup.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
.coderabbit.yaml (1)
6-6: Poem disabled — good follow-through on prior feedback.
Matches the earlier request; keeps reviews concise.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
.coderabbit.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.
🔇 Additional comments (1)
.coderabbit.yaml (1)
7-9: Confirm intent to review draft PRs.
Setting drafts: true will trigger reviews on drafts; verify this is desired for your workflow.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (12)
.github/workflows/ci.yaml (4)
29-35: Drive Go version from go.mod (single source of truth).Let setup-go read the version from go.mod to avoid drift.
- - name: Set up Go - uses: actions/setup-go@v5 - with: - go-version: "1.25.x" - check-latest: true - cache: true - cache-dependency-path: go.sum + - name: Set up Go + uses: actions/setup-go@v5 + with: + go-version-file: go.mod + check-latest: true + cache: true + cache-dependency-path: go.sum
27-27: Pin GitHub Actions by commit SHA.Improves supply-chain security; avoid mutable tags.
- uses: actions/checkout@v4 + uses: actions/checkout@<commit-sha> - uses: actions/setup-go@v5 + uses: actions/setup-go@<commit-sha> - uses: actions/upload-artifact@v4 + uses: actions/upload-artifact@<commit-sha>Tip: enable Dependabot for Actions to auto-bump SHAs.
Also applies to: 30-30, 92-92
81-86: Add golangci-lint to catch issues beyond vet.Covers errcheck, staticcheck, style, and common bugs.
- name: Vet run: go vet -mod=readonly ./... + - name: Lint (golangci-lint) + uses: golangci/golangci-lint-action@<commit-sha> + with: + version: latest + args: --timeout=5m
87-88: Shuffle tests to surface order dependencies.Randomizes execution order.
- run: go test -mod=readonly -race -covermode=atomic -coverprofile=coverage.out ./... + run: go test -mod=readonly -race -shuffle=on -covermode=atomic -coverprofile=coverage.out ./...cmd/log2csv/log2csv.go (7)
48-49: Silence errcheck for usage print.Explicitly ignore the write error.
- fmt.Fprint(os.Stderr, msg) + _, _ = fmt.Fprint(os.Stderr, msg)
52-55: Silence errcheck for error print in main.Keep behavior; ignore write error explicitly.
- fmt.Fprintf(os.Stderr, "error: %v\n", err) + _, _ = fmt.Fprintf(os.Stderr, "error: %v\n", err)
145-147: Check stderr write for the warning.Propagate write failures.
- fmt.Fprintf(os.Stderr, "\nwarning: %d log line(s) did not match the pattern and were ignored\nfirst ignored line: %q\n", ignoredLines, firstIgnoredLine) + if _, err := fmt.Fprintf(os.Stderr, "\nwarning: %d log line(s) did not match the pattern and were ignored\nfirst ignored line: %q\n", ignoredLines, firstIgnoredLine); err != nil { + return err + }
22-29: Library vs CLI split (optional).Exported errors in package main aren’t reusable; consider moving core logic to a library package and keep main thin.
71-76: Don’t require named groups in -unmatched mode.Current check blocks valid -unmatched use cases.
- groupNames := extractGroupNames(re) - if len(groupNames) == 0 { - return ErrNoNamedCaptureGroups - } + var groupNames []string + if !*listUnmatched { + groupNames = extractGroupNames(re) + if len(groupNames) == 0 { + return ErrNoNamedCaptureGroups + } + }
177-201: Avoid per-line SubexpNames; precompute named-group indices once.Removes O(lines) allocations and speeds hot path.
-// Process a log line and returns CSV values + true if the line is valid, or nil + false if the line should be ignored. -func processLine(line string, re *regexp.Regexp, groupNames []string) ([]string, bool) { - submatches := re.FindStringSubmatch(line) - if submatches == nil { - return nil, false - } - subNames := re.SubexpNames() - values := make([]string, 0, len(groupNames)) - allEmpty := true - for idxSubmatch := 1; idxSubmatch < len(submatches); idxSubmatch++ { - name := subNames[idxSubmatch] - if name == "" { - continue - } - val := submatches[idxSubmatch] - if val != "" { - allEmpty = false - } - values = append(values, val) - } - if allEmpty { - return nil, false - } - return values, true -} +// Process a log line and return CSV values + true if valid, or nil + false if ignored. +func processLine(line string, re *regexp.Regexp, groupIndices []int) ([]string, bool) { + submatches := re.FindStringSubmatch(line) + if submatches == nil { + return nil, false + } + values := make([]string, len(groupIndices)) + allEmpty := true + for i, idx := range groupIndices { + v := submatches[idx] + if v != "" { + allEmpty = false + } + values[i] = v + } + if allEmpty { + return nil, false + } + return values, true +}And wire it through:
func extractGroupNames(re *regexp.Regexp) []string { @@ } + +// namedGroupIndices returns capture indices for named groups in declaration order. +func namedGroupIndices(re *regexp.Regexp) []int { + names := re.SubexpNames() + idxs := make([]int, 0, len(names)) + for i := 1; i < len(names); i++ { + if names[i] != "" { + idxs = append(idxs, i) + } + } + return idxs +}- groupNames := extractGroupNames(re) + groupIndices := namedGroupIndices(re) + groupNames := extractGroupNames(re)- err = processInput(os.Stdin, re, groupNames, out, *listUnmatched) + err = processInput(os.Stdin, re, groupIndices, groupNames, out, *listUnmatched)-func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error { +func processInput(input io.Reader, re *regexp.Regexp, groupIndices []int, groupNames []string, output io.Writer, listUnmatched bool) error { @@ - values, ok := processLine(line, re, groupNames) + values, ok := processLine(line, re, groupIndices)
106-127: Unbounded memory for -unmatched dedupe. Add a cap.Large logs can exhaust memory.
- seenUnmatched := make(map[string]struct{}) + seenUnmatched := make(map[string]struct{}) + const unmatchedMaxDefault = 0 // 0 = unlimited + unmatchedMax := unmatchedMaxDefault @@ - if listUnmatched { - if _, exists := seenUnmatched[line]; !exists { - seenUnmatched[line] = struct{}{} - // In unmatched mode, print each unique non-matching line once, preserving line endings. - if _, err := io.WriteString(output, line+lineEnding); err != nil { - return err - } - } + if listUnmatched { + // If a cap is configured and reached, stop tracking new uniques and stream lines. + if unmatchedMax > 0 && len(seenUnmatched) >= unmatchedMax { + if _, err := io.WriteString(output, line+lineEnding); err != nil { + return err + } + } else if _, exists := seenUnmatched[line]; !exists { + seenUnmatched[line] = struct{}{} + if _, err := io.WriteString(output, line+lineEnding); err != nil { + return err + } + }Follow-up: if you prefer a flag (e.g., -unmatched-max=N or -unmatched-all), I can wire it through run() and processInput.
README.md (1)
31-33: Remove trailing period from Windows example (breaks copy/paste).The fenced command ends with an extra ".".
-Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$". +Get-Content C:\path\ufw.log | log2csv -regexp "^(?P<Timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?\+\d{2}:\d{2})\s+(?P<Hostname>\S+)\s+(?P<Facility>\S+):\s+\[\s*(?P<Kernel_Time>\d+(?:\.\d+)?)\]\s+\[(?P<Action>UFW\s+\S+)\]\s+IN=(?P<IN>\S*)\s+OUT=(?P<OUT>\S*)\s+MAC=(?P<MAC>\S*)\s+SRC=(?P<SRC>\S+)\s+DST=(?P<DST>\S+)\s+LEN=(?P<LEN>\d+)(?:(?:\s+(?:TOS=(?P<TOS>0x[0-9A-Fa-f]{2})\s+)?(?:PREC=(?P<PREC>0x[0-9A-Fa-f]{2})\s+)?(?:TTL=(?P<TTL>\d+)\s+)?ID=(?P<ID>\d+)(?:\s+(?P<DF>DF))?)|\s+TC=(?P<TC>\d+)\s+HOPLIMIT=(?P<HOPLIMIT>\d+)\s+FLOWLBL=(?P<FLOWLBL>[0-9A-Fa-fx]+))?\s+PROTO=(?P<PROTO>[A-Za-z0-9]+)(?:\s+(?:SPT|SP)=(?P<SPT>\d+))?(?:\s+(?:DPT|DP)=(?P<DPT>\d+))?(?:\s+WINDOW=(?P<WINDOW>\d+))?(?:\s+RES=(?P<RES>0x[0-9A-Fa-f]{2}))?(?:\s+(?P<TCP_Flags>(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE)(?:\s+(?:SYN|ACK|FIN|RST|PSH|URG|CWR|ECE))*))?(?:\s+URGP=(?P<URGP>\d+))?(?:\s+TYPE=(?P<ICMP_TYPE>\d+))?(?:\s+CODE=(?P<ICMP_CODE>\d+))?(?:\s+SEQ=(?P<ICMP_SEQ>\d+))?(?:\s+LEN=(?P<L4_LEN>\d+))?\s*$"
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/ci.yaml(1 hunks)README.md(1 hunks)cmd/log2csv/log2csv.go(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.
📚 Learning: 2025-09-07T18:51:13.329Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: src/log2csv.go:68-77
Timestamp: 2025-09-07T18:51:13.329Z
Learning: The Log2CSV tool (src/log2csv.go) is intentionally designed to be flexible and generic, allowing users to provide any regex pattern with named groups, including duplicate group names. The tool prioritizes user flexibility over strict validation, as the author considers the user responsible for their regex choices and believes duplicate names may have legitimate use cases.
Applied to files:
README.mdcmd/log2csv/log2csv.go
📚 Learning: 2025-09-08T04:42:11.224Z
Learnt from: b4prog
PR: b4prog/Log2CSV#1
File: cmd/log2csv/log2csv.go:146-159
Timestamp: 2025-09-08T04:42:11.224Z
Learning: In Log2CSV (cmd/log2csv/log2csv.go), the author intentionally accepts that line-ending detection may default to LF for files with extremely long first lines (>64KB), considering such files likely invalid/corrupted and not worth the implementation complexity to handle perfectly.
Applied to files:
README.mdcmd/log2csv/log2csv.go
🔇 Additional comments (1)
.github/workflows/ci.yaml (1)
16-17: LGTM: job timeout added.Good safeguard against hung runs.
| func processInput(input io.Reader, re *regexp.Regexp, groupNames []string, output io.Writer, listUnmatched bool) error { | ||
| inputReader, lineEnding, err := peekForLineEnding(input, logLineSizeMax) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| sc := openInput(inputReader) | ||
| csvWriter := csv.NewWriter(output) | ||
| csvWriter.Comma = csvSeparator | ||
| csvWriter.UseCRLF = lineEnding == "\r\n" | ||
| firstLine := true |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Avoid creating csv.Writer in -unmatched mode.
Skip allocation/flush when not needed.
- csvWriter := csv.NewWriter(output)
- csvWriter.Comma = csvSeparator
- csvWriter.UseCRLF = lineEnding == "\r\n"
+ var csvWriter *csv.Writer
+ if !listUnmatched {
+ csvWriter = csv.NewWriter(output)
+ csvWriter.Comma = csvSeparator
+ csvWriter.UseCRLF = lineEnding == "\r\n"
+ }
@@
- csvWriter.Flush()
- return csvWriter.Error()
+ if csvWriter != nil {
+ csvWriter.Flush()
+ return csvWriter.Error()
+ }
+ return nilAlso applies to: 151-153
🤖 Prompt for AI Agents
In cmd/log2csv/log2csv.go around lines 94 to 103 (and similarly at 151-153), the
function always allocates a csv.Writer and sets its properties even when running
in -unmatched (listUnmatched) mode where CSV output is not produced; change the
logic so that csv.Writer is only created and configured when listUnmatched is
false, and guard any later calls to csvWriter.Write and csvWriter.Flush with the
same condition so no allocation or flush is performed when listUnmatched is
true.
| - Reads log lines from **STDIN** and writes CSV to **STDOUT**. | ||
| - Extracts fields using **named capture groups** (`(?P<Name>...)`). | ||
| - The **CSV header row** is automatically generated from group names. | ||
| - Preserves the input's line endings (LF/CRLF). | ||
| - Optional **unmatched mode** (`-unmatched`) to print **unique non-matching lines** instead of CSV. |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Docs/code mismatch for -unmatched named-groups requirement.
Current code requires named groups even with -unmatched; README doesn’t say so. Either relax the code (preferred; see code review) or document the requirement.
Also applies to: 18-22
🤖 Prompt for AI Agents
In README.md around lines 10 to 14 (and similarly 18 to 22), the docs state
-unmatched prints unique non-matching lines but the code currently enforces that
the regex contains named capture groups even when -unmatched is used; relax the
code so that when -unmatched is enabled the named-group validation is skipped:
modify the validation logic to only require named groups if unmatched flag is
false, ensure unmatched mode continues to collect and print unique non-matching
input lines, update or add tests for unmatched without named groups, and keep
README unchanged (or if you prefer documenting the requirement instead, update
these README lines to state that named groups are required when not using
-unmatched).
| ## Install | ||
|
|
||
| ```sh | ||
| go install github.com/b4prog/Log2CSV@latest | ||
| ``` | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Mention minimum Go version.
Improves install clarity.
## Install
@@
```sh
go install github.com/b4prog/Log2CSV@latest+Requires Go 1.25+.
<details>
<summary>🤖 Prompt for AI Agents</summary>
README.md around lines 35 to 40: the Install section calls "go install
github.com/b4prog/Log2CSV@latest" but doesn't state the minimum Go version;
update the Install section to explicitly mention the required minimum Go version
(e.g., "Requires Go 1.25+"), placing that note directly below the install
command and keeping phrasing concise and consistent with the document's style.
</details>
<!-- fingerprinting:phantom:poseidon:chinchilla -->
<!-- This is an auto-generated comment by CodeRabbit -->
…uce flakiness on slow runs.
…bot)." This reverts commit 8f01515.
Summary by CodeRabbit
New Features
Documentation
Chores