MchLogToolkitGo - Envio de logs via UDP no formato GELF #15
MchLogToolkitGo - Envio de logs via UDP no formato GELF #15ramon-gaudium wants to merge 6 commits intomainfrom
Conversation
There was a problem hiding this comment.
Mole Review
🔴 0/100
🐭 Mole dug deep into this PR! Found 15 issues to review. Score: 0/100
Issues
1. mchlogcore/mchlog.go:27 — 🔴 Critical · Bugs / Race Condition
Package-level variables
udpTransport,fileOutputEnabled,udpChan,udpOnce, andudpDoneare accessed from multiple goroutines without synchronization.SetUDPTarget,SetFileOutput,CloseUDP, andLogSubjectcan all be called concurrently. For example,LogSubjectreadsudpChan(line 115) andfileOutputEnabled(line 107) without a lock, whileSetUDPTargetandcloseUDPWorkerwrite them. This is a data race. Protect all access to these shared variables with async.RWMutex.
2. mchlogcore/mchlog.go:63 — 🔴 Critical · Bugs / Race Condition
udpOnceis reassigned withudpOnce = sync.Once{}instartUDPWorker(), butsync.Oncemust not be copied after first use, and reassigning it while it might be read concurrently is a race. Moreover,udpOnceis declared but never actually used to guard anything in the current code —startUDPWorkeris called directly andudpOnce.Do(...)is never invoked. RemoveudpOnceentirely or use it correctly to guardstartUDPWorkerinvocation.
3. mchlogcore/mchlog.go:45 — 🔴 Critical · Bugs / Race Condition
In
SetUDPTarget, the checkif udpTransport != niland subsequentcloseUDPWorker()/ assignment ofudpTransportare not protected by a mutex. Concurrent calls toSetUDPTargetor concurrent calls betweenSetUDPTargetandLogSubject(which readsudpChan) create a data race. Wrap the entire function body in a mutex.
4. mchlogcore/mchlog.go:79 — 🔴 Critical · Bugs / Race Condition
closeUDPWorkerclosesudpChanand then waits onudpDone. However, ifLogSubjectis concurrently executing theselect { case udpChan <- msg: ... }block, closing a channel that another goroutine may be sending to causes a panic ('send on closed channel'). The non-blockingselectinLogSubjectmitigates this partially, but there is still a window between the nil check (if udpChan != nil) and the send where the channel can be closed. A mutex guarding both the nil check and the send, and also held during close, is required.
5. mchlogcore/mchlog.go:93 — 🔴 Critical · Bugs / Race Condition
fileOutputEnabledis a plainboolwritten bySetFileOutputand read byLogSubjectwithout any synchronization. Usesync/atomic(atomic.Boolin Go 1.19+) or protect with a mutex.
6. mchlogcore/mchlog.go:115 — 🔴 Critical · Bugs / Race Condition
udpChanis read (nil-checked and sent to) inLogSubjectwhile it can be concurrently written (set to nil or replaced) bycloseUDPWorkerandstartUDPWorker. Even the nil check followed by send is not atomic — the channel could be closed between the check and the send, causing a panic. Guard with a mutex.
7. mchloggelf/gelf.go:82 — 🟡 Attention · Bugs / Unhandled Error
os.Hostname()error is silently ignored with_. If hostname resolution fails (possible in some container environments),hostnamewill be an empty string, producing a GELF message with an emptyhostfield, which violates the GELF spec (host is required). Consider falling back to"unknown"or logging the error:hostname, err := os.Hostname(); if err != nil { hostname = "unknown" }.
8. mchloggelf/gelf.go:158 — 🟡 Attention · Bugs / Logic Error
The
defaultcase incontentToMapcallsjson.Unmarshal([]byte("{}"), new(map[string]any))and returns the error (which will be nil) alongside a nil map. This means for any unrecognized content type, the function returns(nil, nil). The caller inNewGELFMessagechecksif err != niland takes the fallback path only on error, so with a nil map and nil error it will proceed to range over a nil map (harmless in Go) but silently drop content. This should return an explicit error:return nil, fmt.Errorf("unsupported content type: %T", content).
9. mchlogcore/mchlog.go:71 — 🟡 Attention · Bugs / Unhandled Error
When
SetUDPTargetis called fromInitialize()based on the environment variable, errors are silently swallowed (the return value is not checked). If UDP setup fails (e.g., bad address format in the env var), the operator gets no feedback. Consider logging the error withlog.Printfat minimum.
10. mchloggelf/udp.go:73 — 🟡 Attention · Bugs / Logic Error
After
t.conn.Close()is called (viaClose()), subsequent calls toSendwill attemptt.conn.Write(data)on a closed connection and return an error. Howevert.connis never set to nil after close, so there is no guard. While the error is propagated, a nil-set after close would make the behavior more explicit. More importantly,t.muis not held between the nil-check oft.conninClose()and theWritecall inSend(), but since the mutex IS used in both methods this is actually fine — just note that thet.conn != nilcheck inClose()is redundant sincenet.UDPConn.Close()handles nil safely.
11. mchlogcore/mchlog.go:30 — 🟢 Suggestion · Architecture / Tight Coupling
Package-level mutable state (
udpTransport,udpChan,fileOutputEnabled, etc.) makes this package difficult to test in isolation and prevents multiple logger instances from having independent UDP configurations. Consider encapsulating this state in a struct.
12. mchloggelf/gelf.go:88 — 🟡 Attention · Bugs / Logic Error
LevelToSyslog(subject)maps thesubjectstring (which is a log level like "info", "error") to a syslog level. However, if the content map contains its own "level" field with a different value (line 109 deletes it), andsubjectis something other than a standard level (e.g., a category name), the syslog level will default toSyslogInformational. This may be intentional but is worth documenting clearly, as the semantics ofsubjectin the existing logger API may not always be a log level.
13. mchloggelf/udp.go:83 — 🟢 Suggestion · Bugs / Logic Error
chunkCountis computed as anint. WithmaxChunkDataSize = 8180and very large messages, this could theoretically overflow on 32-bit platforms, though in practice messages are limited to 128 chunks × 8180 bytes ≈ 1MB which is well within int range. ThemaxChunkscheck on line 84 correctly bounds this. No action needed but worth a comment.
14. mchlogcore/mchlog.go:116 — 🟢 Suggestion · Bugs / Unhandled Error
When
NewGELFMessagereturns an error (e.g., for an unsupported content type), the error is silently dropped (if err == nil). Consider logging a warning so operators know that GELF messages are being skipped due to content parse failures.
15. mchloggelf/gelf_test.go:47 — 🟢 Suggestion · Bugs / Logic Error
In CI environments running in containers,
os.Hostname()may return a different value depending on timing. The test directly comparesmsg.Hostto the result of a secondos.Hostname()call, which is fine, but if hostname resolution fails for both calls, both return empty string and the test passes vacuously. Consider assertinghostname != "".
Suggestions
- Introduce a
sync.RWMutexinmchlogcore/mchlog.goto protect all concurrent access toudpChan,udpTransport,udpDone, andfileOutputEnabled. The worker goroutine reads fromudpChan(safe as it's the sole reader), but writes fromLogSubjectand the close fromcloseUDPWorkermust be synchronized. - Remove
udpOncesince it is never actually used — it's declared, assigned, and reset butudpOnce.Do(...)is never called. This is dead code that creates confusion about the intended initialization guarantee. - Consider adding an integration test or example that exercises
Initialize()withMCHLOG_UDP_TARGETset to a local listener to verify the environment-variable-driven configuration path end-to-end. - The
contentToMapdefault case should return a meaningful error rather than performing a no-op unmarshal. This makes the failure mode explicit and ensures the caller's fallback logic is triggered correctly for unknown content types. - Document the thread-safety guarantees (or lack thereof) of
SetUDPTarget,SetFileOutput, andCloseUDPin their godoc comments, so callers know whether these are safe to call concurrently with logging. - Consider using
os/hostnamecaching (compute once at startup) inNewGELFMessagerather than callingos.Hostname()on every log message, which is a syscall on every invocation and could be a minor performance concern in high-throughput scenarios.
mchlogcore/mchlog.go
Outdated
| ) | ||
|
|
||
| var currentVersion = V1 | ||
| var udpTransport *mchloggelf.UDPTransport |
There was a problem hiding this comment.
🔴 Critical · Bugs / Race Condition
Package-level variables udpTransport, fileOutputEnabled, udpChan, udpOnce, and udpDone are accessed from multiple goroutines without synchronization. SetUDPTarget, SetFileOutput, CloseUDP, and LogSubject can all be called concurrently. For example, LogSubject reads udpChan (line 115) and fileOutputEnabled (line 107) without a lock, while SetUDPTarget and closeUDPWorker write them. This is a data race. Protect all access to these shared variables with a sync.RWMutex.
mchlogcore/mchlog.go
Outdated
| func startUDPWorker() { | ||
| udpChan = make(chan *mchloggelf.GELFMessage, udpBufferSize) | ||
| udpDone = make(chan struct{}) | ||
| udpOnce = sync.Once{} |
There was a problem hiding this comment.
🔴 Critical · Bugs / Race Condition
udpOnce is reassigned with udpOnce = sync.Once{} in startUDPWorker(), but sync.Once must not be copied after first use, and reassigning it while it might be read concurrently is a race. Moreover, udpOnce is declared but never actually used to guard anything in the current code — startUDPWorker is called directly and udpOnce.Do(...) is never invoked. Remove udpOnce entirely or use it correctly to guard startUDPWorker invocation.
mchlogcore/mchlog.go
Outdated
| // If compress is true, messages will be GZIP compressed. | ||
| func SetUDPTarget(address string, compress bool) error { | ||
| // Close any existing connection to avoid file descriptor leaks | ||
| if udpTransport != nil { |
There was a problem hiding this comment.
🔴 Critical · Bugs / Race Condition
In SetUDPTarget, the check if udpTransport != nil and subsequent closeUDPWorker() / assignment of udpTransport are not protected by a mutex. Concurrent calls to SetUDPTarget or concurrent calls between SetUDPTarget and LogSubject (which reads udpChan) create a data race. Wrap the entire function body in a mutex.
mchlogcore/mchlog.go
Outdated
|
|
||
| // closeUDPWorker drains the channel and waits for the worker to finish. | ||
| func closeUDPWorker() { | ||
| if udpChan != nil { |
There was a problem hiding this comment.
🔴 Critical · Bugs / Race Condition
closeUDPWorker closes udpChan and then waits on udpDone. However, if LogSubject is concurrently executing the select { case udpChan <- msg: ... } block, closing a channel that another goroutine may be sending to causes a panic ('send on closed channel'). The non-blocking select in LogSubject mitigates this partially, but there is still a window between the nil check (if udpChan != nil) and the send where the channel can be closed. A mutex guarding both the nil check and the send, and also held during close, is required.
mchlogcore/mchlog.go
Outdated
| // SetFileOutput enables or disables file-based log output. | ||
| // When disabled, logs are only sent via UDP (if configured). | ||
| func SetFileOutput(enabled bool) { | ||
| fileOutputEnabled = enabled |
There was a problem hiding this comment.
🔴 Critical · Bugs / Race Condition
fileOutputEnabled is a plain bool written by SetFileOutput and read by LogSubject without any synchronization. Use sync/atomic (atomic.Bool in Go 1.19+) or protect with a mutex.
| var udpTransport *mchloggelf.UDPTransport | ||
| var fileOutputEnabled = true | ||
|
|
||
| var udpChan chan *mchloggelf.GELFMessage |
There was a problem hiding this comment.
🟢 Suggestion · Architecture / Tight Coupling
Package-level mutable state (udpTransport, udpChan, fileOutputEnabled, etc.) makes this package difficult to test in isolation and prevents multiple logger instances from having independent UDP configurations. Consider encapsulating this state in a struct.
| Version: "1.1", | ||
| Host: hostname, | ||
| Timestamp: float64(time.Now().UnixNano()) / 1e9, | ||
| Level: LevelToSyslog(subject), |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
LevelToSyslog(subject) maps the subject string (which is a log level like "info", "error") to a syslog level. However, if the content map contains its own "level" field with a different value (line 109 deletes it), and subject is something other than a standard level (e.g., a category name), the syslog level will default to SyslogInformational. This may be intentional but is worth documenting clearly, as the semantics of subject in the existing logger API may not always be a log level.
|
|
||
| // sendChunked splits the data into GELF chunks and sends each one. | ||
| func (t *UDPTransport) sendChunked(data []byte) error { | ||
| chunkCount := (len(data) + maxChunkDataSize - 1) / maxChunkDataSize |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
chunkCount is computed as an int. With maxChunkDataSize = 8180 and very large messages, this could theoretically overflow on 32-bit platforms, though in practice messages are limited to 128 chunks × 8180 bytes ≈ 1MB which is well within int range. The maxChunks check on line 84 correctly bounds this. No action needed but worth a comment.
mchlogcore/mchlog.go
Outdated
| } | ||
|
|
||
| if udpChan != nil { | ||
| msg, err := mchloggelf.NewGELFMessage(subject, content, errLog) |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Unhandled Error
When NewGELFMessage returns an error (e.g., for an unsupported content type), the error is silently dropped (if err == nil). Consider logging a warning so operators know that GELF messages are being skipped due to content parse failures.
| t.Errorf("version = %q, want %q", msg.Version, "1.1") | ||
| } | ||
|
|
||
| hostname, _ := os.Hostname() |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
In CI environments running in containers, os.Hostname() may return a different value depending on timing. The test directly compares msg.Host to the result of a second os.Hostname() call, which is fine, but if hostname resolution fails for both calls, both return empty string and the test passes vacuously. Consider asserting hostname != "".
There was a problem hiding this comment.
Code Review
This pull request introduces support for sending logs via UDP in GELF 1.1 format, enabling integration with observability platforms like Graylog. Key changes include new configuration methods and environment variables, an asynchronous UDP worker with GZIP compression and message chunking, and updated documentation. The review feedback identifies a critical compilation error in the GELF message processing logic, dead code in the core package, and several instances of ignored errors that should be handled or logged to improve the library's reliability.
| default: | ||
| return nil, json.Unmarshal([]byte("{}"), new(map[string]any)) | ||
| } |
There was a problem hiding this comment.
O default case no switch possui um erro de sintaxe (return com 3 valores para uma função que retorna 2) que impedirá a compilação. Além disso, se um tipo de content não suportado for passado, a lógica atual pode levar a um pânico de "nil pointer dereference" em NewGELFMessage.
Para tipos não suportados, a função deveria retornar um erro. Será necessário importar o pacote fmt para isso.
| default: | |
| return nil, json.Unmarshal([]byte("{}"), new(map[string]any)) | |
| } | |
| default: | |
| return nil, fmt.Errorf("unsupported content type: %T", c) |
logger.go
Outdated
| compress = false | ||
| } | ||
| // Best-effort: if UDP setup fails, continue with file output | ||
| mchlogcore.SetUDPTarget(target, compress) |
There was a problem hiding this comment.
O erro retornado por mchlogcore.SetUDPTarget está sendo ignorado. Se a configuração do UDP a partir de variáveis de ambiente falhar (por exemplo, por um endereço inválido), a falha será silenciosa e os logs UDP simplesmente não serão enviados. É recomendável tratar este erro, no mínimo registrando uma mensagem em stderr para alertar sobre a falha na configuração.
Para isso, será necessário importar o pacote log.
if err := mchlogcore.SetUDPTarget(target, compress); err != nil {
log.Printf("[mchlog] failed to set UDP target from environment: %v", err)
}
mchlogcore/mchlog.go
Outdated
| func startUDPWorker() { | ||
| udpChan = make(chan *mchloggelf.GELFMessage, udpBufferSize) | ||
| udpDone = make(chan struct{}) | ||
| udpOnce = sync.Once{} |
| func CloseUDP() error { | ||
| closeUDPWorker() | ||
| return nil | ||
| } |
There was a problem hiding this comment.
A função CloseUDP sempre retorna nil, mas a chamada subjacente udpTransport.Close() pode retornar um erro que está sendo ignorado em closeUDPWorker. Isso pode ocultar problemas ao fechar a conexão UDP.
Sugiro refatorar closeUDPWorker para que retorne o erro (o que exigirá alterar sua assinatura) e CloseUDP o propague. A chamada em SetUDPTarget pode continuar ignorando o erro, pois o objetivo é apenas limpar a conexão anterior.
Exemplo de alteração em closeUDPWorker:
func closeUDPWorker() error {
// ... (código existente)
if udpTransport != nil {
err := udpTransport.Close()
udpTransport = nil
return err
}
return nil
}func CloseUDP() error {
return closeUDPWorker()
}
mchloggelf/gelf.go
Outdated
| // The content is expected to be JSON bytes (as produced by formatLog in logger.go) | ||
| // or a map[string]any / map[string]string. | ||
| func NewGELFMessage(subject string, content any, errLog error) (*GELFMessage, error) { | ||
| hostname, _ := os.Hostname() |
There was a problem hiding this comment.
O erro de os.Hostname() está sendo ignorado. Se essa chamada falhar, a variável hostname será uma string vazia. O campo host é obrigatório na especificação GELF, e um valor vazio pode ser rejeitado pelo servidor de logs. Recomendo fornecer um valor padrão, como "unknown", em caso de falha.
hostname, err := os.Hostname()
if err != nil {
hostname = "unknown"
}|
/mole review |
There was a problem hiding this comment.
Mole Review
🔴 42/100
🐭 Mole dug deep into this PR! Found 10 issues to review. Score: 42/100
Issues
1. mchlogcore/mchlog.go:98 — 🔴 Critical · Bugs / Logic Error
The
closeLockedfunction unlocksudpMumid-execution (line 98) and then re-locks it (line 100), but the function's callers (bothSetUDPTargetat line 52 andCloseUDPat line 118) usedefer udpMu.Unlock(). AftercloseLockedreturns, the deferred unlock will attempt to unlock a mutex that is currently held — which is correct — but the temporary unlock creates a window where another goroutine callingSetUDPTargetcan acquire the write lock, modifyudpTransport/udpChan, and thencloseLockedre-acquires and proceeds to setudpChan = nilandudpDone = nilon what may now be a different (newly started) channel. This can silently discard the new worker's channels. The unlock/re-lock pattern inside a function that is called under a lock is extremely fragile. Consider instead signaling the worker to stop without holding the lock, or using a separate 'draining' state flag so the worker can be waited on without releasing the outer lock.
2. mchlogcore/mchlog.go:139 — 🔴 Critical · Bugs / Race Condition
There is still a TOCTOU race between reading
udpChanunderRLock(lines 139-141) and sending to the capturedchvariable (line 150). AfterudpMu.RUnlock()on line 141,closeLockedcan run (acquiring the write lock), close the channel, and setudpChan = nil. The local variablechstill holds a reference to the now-closed channel. The subsequentcase ch <- msg:at line 150 will then panic with 'send on closed channel'. The read lock must be held for the entire duration of the send — from the nil check through the completion of the channel send — to preventcloseLockedfrom closing the channel while a send is in progress. MoveudpMu.RUnlock()to after theselectblock (or usedefer udpMu.RUnlock()before the send).
3. mchlogcore/mchlog.go:129 — 🟡 Attention · Bugs / Logic Error
When
fileOutputEnabledis false, file logging is completely skipped. However,mchlogcore.InitializeMchLog(called byLogger.Initialize()) still creates the log directories and file handles. This is a minor inconsistency but could confuse operators who setMCHLOG_FILE_OUTPUT=falseexpecting zero file I/O, only to find directories/files created. Consider conditionally callingInitializeMchLogbased onfileOutputEnabled, or documenting this behavior.
4. logger.go:83 — 🟡 Attention · Bugs / Logic Error
SetFileOutput(false)is called beforeInitializeMchLog(line 87). However, iffileOutputEnabledis set to false here, butInitialize()then callsmchlogcore.InitializeMchLog(...)unconditionally, file structures are still initialized. Additionally,SetFileOutputaffects a package-level global, meaning callingInitialize()on a secondLoggerinstance (with file output enabled) would not restore thefileOutputEnabled=truestate if another logger had disabled it. This is a pre-existing design concern made worse by the new code path. Document that these are global settings shared across allLoggerinstances.
5. mchloggelf/udp.go:117 — 🟡 Attention · Bugs / Logic Error
In
sendChunked, if sending chunkifails, the function returns an error immediately but the previously sent chunks (0 to i-1) have already been transmitted. The receiver will receive a partial message that can never be reassembled. While there is no way to 'recall' UDP datagrams, it would be worth documenting this behavior and considering whether a retry or a 'best effort with full failure reporting' strategy is preferable. At minimum, the error message should indicate that partial data may have been sent.
6. mchloggelf/gelf.go:101 — 🟡 Attention · Bugs / Logic Error
In
NewGELFMessage, whencontentToMapreturns an error (line 102), the function falls back to usingsubjectasshort_messageand returns(msg, nil)— swallowing the error. However,errLog(the application error) is also silently dropped in this path; theerrLog != nilblock on lines 125-127 is only reached whencontentToMapsucceeds. If the content is, say, a custom struct type, the application error attached to the log call will be silently lost. Move theerrLogattachment before the early return on line 105.
7. mchloggelf/gelf.go:95 — 🟡 Attention · Performance / Blocking I/O in Hot Path
os.Hostname()performs a syscall on every single log message. In high-throughput logging scenarios this is a measurable overhead. The hostname is static for the lifetime of the process — cache it once at package init or in async.Onceand reuse the cached value inNewGELFMessage.
8. logger.go:118 — 🟢 Suggestion · Architecture / Tight Coupling
SetUDPTargetandSetUDPTargetWithOptionsare both thin wrappers that delegate tomchlogcore.SetUDPTarget. They add no value beyond a slightly different API surface, and the comment onSetUDPTargetsays 'Must be called before Initialize()' while theInitialize()method itself also callsmchlogcore.SetUDPTargetfrom env vars. This creates a confusing dual-configuration path. Consider consolidating into a single options struct (e.g.,LoggerOptions) passed toNewLoggerorInitialize, or at least documenting that callingSetUDPTargetafterInitialize()will replace the UDP target.
9. mchloggelf/udp_test.go:172 — 🟢 Suggestion · Bugs / Logic Error
The chunking test reads chunks in a loop until a read deadline error, relying on the 2-second deadline to terminate the loop. This makes the test artificially slow in the happy path. After verifying
len(chunks) < 2, a better approach would be to know the expected chunk count in advance (based on the known payload size andmaxChunkDataSize) and read exactly that many datagrams, failing fast if any read times out.
10. mchloggelf/gelf.go:176 — 🟢 Suggestion · Bugs / Unhandled Error
In the
toStringfallback,json.Marshal(v)error is silently discarded with_. If marshaling fails (e.g., for a cyclic structure or channel type),string(b)will return an empty string, silently losing the field value. Consider returningfmt.Sprintf("%v", v)as a fallback when marshaling fails.
Suggestions
- The temporary mutex unlock inside
closeLocked(releasingudpMuto wait onudpDone) is architecturally fragile. A cleaner pattern would be to capture the channel and done-signal locally before releasing the lock, then wait outside the lock entirely — similar to howstartWorkerLockedcaptures local references. This eliminates the re-lock and the associated TOCTOU window. - Cache
os.Hostname()at package init time inmchloggelf/gelf.goto avoid repeated syscalls on every log message. - The package-level global state in
mchlogcore(udpTransport, udpChan, fileOutputEnabled) means that multipleLoggerinstances share UDP and file-output configuration. This should be prominently documented, and ideally the README example should warn against creating multiple Logger instances with different UDP targets. - Consider adding an integration test that verifies the full pipeline:
Logger.Initialize()→LogSubject→ UDP receive, to catch regressions in the wiring betweenlogger.go,mchlogcore, andmchloggelf. - The
closeLockedfunction assumes it is always called withudpMuheld as a write lock, but this invariant is not enforced or documented with a comment at the function signature. Add a prominent// MUST be called with udpMu held for writing.comment and consider adding a//nolint:...or assertion pattern for clarity.
mchlogcore/mchlog.go
Outdated
| // Release the lock while waiting for the worker to drain, | ||
| // so LogSubject is not blocked. The worker uses its own captured | ||
| // channel reference, so this is safe. | ||
| udpMu.Unlock() |
There was a problem hiding this comment.
🔴 Critical · Bugs / Logic Error
The closeLocked function unlocks udpMu mid-execution (line 98) and then re-locks it (line 100), but the function's callers (both SetUDPTarget at line 52 and CloseUDP at line 118) use defer udpMu.Unlock(). After closeLocked returns, the deferred unlock will attempt to unlock a mutex that is currently held — which is correct — but the temporary unlock creates a window where another goroutine calling SetUDPTarget can acquire the write lock, modify udpTransport/udpChan, and then closeLocked re-acquires and proceeds to set udpChan = nil and udpDone = nil on what may now be a different (newly started) channel. This can silently discard the new worker's channels. The unlock/re-lock pattern inside a function that is called under a lock is extremely fragile. Consider instead signaling the worker to stop without holding the lock, or using a separate 'draining' state flag so the worker can be waited on without releasing the outer lock.
|
|
||
| // Hold a read lock for the duration of the channel send to prevent | ||
| // the channel from being closed between the nil check and the send. | ||
| udpMu.RLock() |
There was a problem hiding this comment.
🔴 Critical · Bugs / Race Condition
There is still a TOCTOU race between reading udpChan under RLock (lines 139-141) and sending to the captured ch variable (line 150). After udpMu.RUnlock() on line 141, closeLocked can run (acquiring the write lock), close the channel, and set udpChan = nil. The local variable ch still holds a reference to the now-closed channel. The subsequent case ch <- msg: at line 150 will then panic with 'send on closed channel'. The read lock must be held for the entire duration of the send — from the nil check through the completion of the channel send — to prevent closeLocked from closing the channel while a send is in progress. Move udpMu.RUnlock() to after the select block (or use defer udpMu.RUnlock() before the send).
| mchlogcorev1.MchLog.LogSubject(subject, content, errLog, ascendStackFrame...) | ||
| } else { | ||
| mchlogcorev2.MchLog.LogSubject(subject, content, errLog, ascendStackFrame...) | ||
| if fileOutputEnabled.Load() { |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
When fileOutputEnabled is false, file logging is completely skipped. However, mchlogcore.InitializeMchLog (called by Logger.Initialize()) still creates the log directories and file handles. This is a minor inconsistency but could confuse operators who set MCHLOG_FILE_OUTPUT=false expecting zero file I/O, only to find directories/files created. Consider conditionally calling InitializeMchLog based on fileOutputEnabled, or documenting this behavior.
| } | ||
|
|
||
| // Check environment variable for file output | ||
| if v := os.Getenv(EnvFileOutput); strings.ToLower(v) == "false" { |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
SetFileOutput(false) is called before InitializeMchLog (line 87). However, if fileOutputEnabled is set to false here, but Initialize() then calls mchlogcore.InitializeMchLog(...) unconditionally, file structures are still initialized. Additionally, SetFileOutput affects a package-level global, meaning calling Initialize() on a second Logger instance (with file output enabled) would not restore the fileOutputEnabled=true state if another logger had disabled it. This is a pre-existing design concern made worse by the new code path. Document that these are global settings shared across all Logger instances.
| // Payload | ||
| chunk = append(chunk, data[start:end]...) | ||
|
|
||
| if _, err := t.conn.Write(chunk); err != nil { |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
In sendChunked, if sending chunk i fails, the function returns an error immediately but the previously sent chunks (0 to i-1) have already been transmitted. The receiver will receive a partial message that can never be reassembled. While there is no way to 'recall' UDP datagrams, it would be worth documenting this behavior and considering whether a retry or a 'best effort with full failure reporting' strategy is preferable. At minimum, the error message should indicate that partial data may have been sent.
| } | ||
|
|
||
| // Parse content into a map to extract fields | ||
| contentMap, err := contentToMap(content) |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
In NewGELFMessage, when contentToMap returns an error (line 102), the function falls back to using subject as short_message and returns (msg, nil) — swallowing the error. However, errLog (the application error) is also silently dropped in this path; the errLog != nil block on lines 125-127 is only reached when contentToMap succeeds. If the content is, say, a custom struct type, the application error attached to the log call will be silently lost. Move the errLog attachment before the early return on line 105.
| msg := &GELFMessage{ | ||
| Version: "1.1", | ||
| Host: hostname, | ||
| Timestamp: float64(time.Now().UnixNano()) / 1e9, |
There was a problem hiding this comment.
🟡 Attention · Performance / Blocking I/O in Hot Path
os.Hostname() performs a syscall on every single log message. In high-throughput logging scenarios this is a measurable overhead. The hostname is static for the lifetime of the process — cache it once at package init or in a sync.Once and reuse the cached value in NewGELFMessage.
| // SetUDPTarget configures the logger to send GELF messages via UDP to the given address. | ||
| // The address should be in "host:port" format (e.g., "graylog.example.com:12201"). | ||
| // GZIP compression is enabled by default. Must be called before Initialize(). | ||
| func (l *Logger) SetUDPTarget(address string) error { |
There was a problem hiding this comment.
🟢 Suggestion · Architecture / Tight Coupling
SetUDPTarget and SetUDPTargetWithOptions are both thin wrappers that delegate to mchlogcore.SetUDPTarget. They add no value beyond a slightly different API surface, and the comment on SetUDPTarget says 'Must be called before Initialize()' while the Initialize() method itself also calls mchlogcore.SetUDPTarget from env vars. This creates a confusing dual-configuration path. Consider consolidating into a single options struct (e.g., LoggerOptions) passed to NewLogger or Initialize, or at least documenting that calling SetUDPTarget after Initialize() will replace the UDP target.
| } | ||
|
|
||
| // Read chunks | ||
| listener.SetReadDeadline(time.Now().Add(2 * time.Second)) |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
The chunking test reads chunks in a loop until a read deadline error, relying on the 2-second deadline to terminate the loop. This makes the test artificially slow in the happy path. After verifying len(chunks) < 2, a better approach would be to know the expected chunk count in advance (based on the known payload size and maxChunkDataSize) and read exactly that many datagrams, failing fast if any read times out.
| case string: | ||
| return s | ||
| default: | ||
| b, _ := json.Marshal(v) |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Unhandled Error
In the toString fallback, json.Marshal(v) error is silently discarded with _. If marshaling fails (e.g., for a cyclic structure or channel type), string(b) will return an empty string, silently losing the field value. Consider returning fmt.Sprintf("%v", v) as a fallback when marshaling fails.
|
/mole review |
There was a problem hiding this comment.
Mole Review
🟡 75/100
🐭 Mole dug deep into this PR! Found 7 issues to review. Score: 75/100
Issues
1. mchlogcore/mchlog.go:100 — 🔴 Critical · Bugs / Logic Error
closeLockedcalls<-udpDone(line 103) while holding the write lock onudpMu(the caller holds it viadefer udpMu.Unlock()). The worker goroutine instartWorkerLockedcallstransport.Send(msg)→t.mu.Lock()without ever acquiringudpMu, so that part is fine. However, the log.Printf call inside the worker goroutine (log.Printf("[mchlog] failed to send GELF message via UDP: %v", err)) is not the issue. The real concern: if any future code path inside the worker goroutine's loop ever tries to acquireudpMu(e.g., calling a function that locks it),<-udpDonewill deadlock. More immediately:closeLockedis called fromSetUDPTarget(which holds write lock) — the<-udpDonewait will block until the worker drains and exits. If the worker's send blocks for a long time (e.g., slow UDP write), the write lock is held for the entire duration, blocking ALL readers (LogSubject) and all other callers ofSetUDPTarget/CloseUDP. Consider closing the channel and releasing the lock before waiting onudpDone, capturingdonelocally before unlocking.
2. mchloggelf/gelf.go:54 — 🟢 Suggestion · Performance / Missing Cache
MarshalJSONallocates a newmap[string]any(namedfields) on every call. For high-throughput logging, this generates significant GC pressure. Consider pre-computing the fixed GELF fields or using a struct withencoding/jsontags for the non-extra fields, and only building the combined map when extra fields are present. Alternatively, usejson.RawMessageor a custom encoder that writes directly to a buffer.
3. mchloggelf/udp.go:105 — 🟢 Suggestion · Performance / Missing Cache
Inside
sendChunked, a newchunkslice is allocated per iteration (make([]byte, 0, chunkHeaderSize+end-start)). For messages requiring many chunks, this is up to 128 allocations. Consider allocating a single chunk buffer of sizemaxChunkSizebefore the loop and reusing it by re-slicing (reset length to 0) each iteration.
4. mchloggelf/udp_test.go:49 — 🟢 Suggestion · Bugs / Unhandled Error
listener.SetReadDeadline(...)returns an error that is silently ignored throughout the test file (lines 49, 104, 172). While a test-only concern, ifSetReadDeadlinefails, the test will block indefinitely onReadFromUDP. Uset.Helper()and check:if err := listener.SetReadDeadline(...); err != nil { t.Fatalf(...) }.
5. logger.go:135 — 🟡 Attention · Bugs / Logic Error
The doc comment for
DisableFileOutputstates: 'When disabled before Initialize(), no log directories or files are created.' This is now true given the updatedInitializeMchLogguard. However, the comment onClose()(line 140) does not mention thatClose()only closes UDP resources — it does not flush or close any file handles. If file logging uses buffered writers,Close()could silently lose buffered log entries. Verify (and document) whether the file backends need their own Close/Flush, or clarify that file handles are not managed byClose().
6. mchloggelf/gelf_test.go:50 — 🟢 Suggestion · Bugs / Logic Error
The test calls
os.Hostname()a second time (line 50) to validatemsg.Host, butgetHostname()uses async.Oncethat caches the result. IfhostnameOncehas already fired (e.g., from a prior test in the same process), but the hostname changes between process start and this test call (unlikely but possible in some CI setups), the cached value and the freshos.Hostname()call can diverge. More practically: sincecachedHostnameis a package-level variable initialized viasync.Once, tests that want to verify the hostname caching behavior cannot reset it between runs. This is a test isolation issue; consider exposing a test hook or using the exportedgetHostname()directly rather than a secondos.Hostname()call.
7. mchloggelf/gelf.go:108 — 🟢 Suggestion · Bugs / Logic Error
float64(time.Now().UnixNano()) / 1e9loses sub-nanosecond precision but more importantly can lose microsecond precision for large UnixNano values due to float64 mantissa limitations (float64 has 53 bits of mantissa; UnixNano is ~63 bits in 2024). For timestamps after ~2262 this overflows, but even now precision is limited to ~microseconds. Consider usingfloat64(time.Now().Unix()) + float64(time.Now().Nanosecond())/1e9or storing seconds and fractional seconds separately:sec := time.Now().Unix(); nsec := time.Now().Nanosecond(); ts := float64(sec) + float64(nsec)/1e9. The GELF spec recommends this approach.
Suggestions
- The
closeLockedfunction blocks under the write lock while waiting for the worker goroutine to drain (<-udpDone). This means allLogSubjectcallers (which need RLock) are blocked during shutdown. Consider a pattern where you capturechanddonelocally, set the globals to nil, release the lock, then closechand wait ondoneoutside the lock. This requires careful sequencing but prevents the lock being held during potentially slow drain. - Consider adding an integration test that exercises
SetUDPTarget→LogSubject→CloseUDPunder-raceflag to catch any remaining concurrency issues not covered by unit tests. - The README example at line 90 shows
logger.SetUDPTarget(...)called beforelogger.Initialize(), but the doc comment forSetUDPTargetin logger.go (line 115-123) does not mention this ordering requirement (unlike the removed comment that said 'Must be called before Initialize()'). SinceInitialize()also callsmchlogcore.SetUDPTargetfrom env vars, callingSetUDPTargetbeforeInitialize()will result in it being called twice ifMCHLOG_UDP_TARGETis also set. Document the precedence (env var wins if set, since it's applied inInitialize()afterSetUDPTarget). - The
addrfield inUDPTransportis stored but never used after construction (the connection is established vianet.DialUDPand stored inconn). Consider removing theaddrfield to reduce struct size and avoid confusion. - For observability, consider adding a metrics counter or callback for dropped messages (when the UDP buffer is full and the
defaultbranch fires inLogSubject). Currently these are only logged vialog.Printf, which itself goes to stderr and could be missed in production.
mchlogcore/mchlog.go
Outdated
| // LogSubject goroutine is mid-send (they hold RLock during send), so closing the | ||
| // channel is safe. The worker goroutine uses captured references and does not | ||
| // need the mutex, so waiting on udpDone under the write lock cannot deadlock. | ||
| func closeLocked() { |
There was a problem hiding this comment.
🔴 Critical · Bugs / Logic Error
closeLocked calls <-udpDone (line 103) while holding the write lock on udpMu (the caller holds it via defer udpMu.Unlock()). The worker goroutine in startWorkerLocked calls transport.Send(msg) → t.mu.Lock() without ever acquiring udpMu, so that part is fine. However, the log.Printf call inside the worker goroutine (log.Printf("[mchlog] failed to send GELF message via UDP: %v", err)) is not the issue. The real concern: if any future code path inside the worker goroutine's loop ever tries to acquire udpMu (e.g., calling a function that locks it), <-udpDone will deadlock. More immediately: closeLocked is called from SetUDPTarget (which holds write lock) — the <-udpDone wait will block until the worker drains and exits. If the worker's send blocks for a long time (e.g., slow UDP write), the write lock is held for the entire duration, blocking ALL readers (LogSubject) and all other callers of SetUDPTarget/CloseUDP. Consider closing the channel and releasing the lock before waiting on udpDone, capturing done locally before unlocking.
|
|
||
| // MarshalJSON implements custom JSON marshaling that merges standard GELF fields | ||
| // with extra fields prefixed with underscore, as required by the GELF spec. | ||
| func (m *GELFMessage) MarshalJSON() ([]byte, error) { |
There was a problem hiding this comment.
🟢 Suggestion · Performance / Missing Cache
MarshalJSON allocates a new map[string]any (named fields) on every call. For high-throughput logging, this generates significant GC pressure. Consider pre-computing the fixed GELF fields or using a struct with encoding/json tags for the non-extra fields, and only building the combined map when extra fields are present. Alternatively, use json.RawMessage or a custom encoder that writes directly to a buffer.
| end = len(data) | ||
| } | ||
|
|
||
| chunk := make([]byte, 0, chunkHeaderSize+end-start) |
There was a problem hiding this comment.
🟢 Suggestion · Performance / Missing Cache
Inside sendChunked, a new chunk slice is allocated per iteration (make([]byte, 0, chunkHeaderSize+end-start)). For messages requiring many chunks, this is up to 128 allocations. Consider allocating a single chunk buffer of size maxChunkSize before the loop and reusing it by re-slicing (reset length to 0) each iteration.
|
|
||
| // Read from listener | ||
| buf := make([]byte, maxChunkSize) | ||
| listener.SetReadDeadline(time.Now().Add(2 * time.Second)) |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Unhandled Error
listener.SetReadDeadline(...) returns an error that is silently ignored throughout the test file (lines 49, 104, 172). While a test-only concern, if SetReadDeadline fails, the test will block indefinitely on ReadFromUDP. Use t.Helper() and check: if err := listener.SetReadDeadline(...); err != nil { t.Fatalf(...) }.
| // DisableFileOutput disables file-based log output. | ||
| // When called, logs are only sent via UDP (if configured). | ||
| // This is a global setting shared across all Logger instances. | ||
| // When disabled before Initialize(), no log directories or files are created. |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
The doc comment for DisableFileOutput states: 'When disabled before Initialize(), no log directories or files are created.' This is now true given the updated InitializeMchLog guard. However, the comment on Close() (line 140) does not mention that Close() only closes UDP resources — it does not flush or close any file handles. If file logging uses buffered writers, Close() could silently lose buffered log entries. Verify (and document) whether the file backends need their own Close/Flush, or clarify that file handles are not managed by Close().
| if msg.Host == "" { | ||
| t.Error("host should never be empty (GELF spec requires it)") | ||
| } | ||
| hostname, _ := os.Hostname() |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
The test calls os.Hostname() a second time (line 50) to validate msg.Host, but getHostname() uses a sync.Once that caches the result. If hostnameOnce has already fired (e.g., from a prior test in the same process), but the hostname changes between process start and this test call (unlikely but possible in some CI setups), the cached value and the fresh os.Hostname() call can diverge. More practically: since cachedHostname is a package-level variable initialized via sync.Once, tests that want to verify the hostname caching behavior cannot reset it between runs. This is a test isolation issue; consider exposing a test hook or using the exported getHostname() directly rather than a second os.Hostname() call.
| msg := &GELFMessage{ | ||
| Version: "1.1", | ||
| Host: getHostname(), | ||
| Timestamp: float64(time.Now().UnixNano()) / 1e9, |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
float64(time.Now().UnixNano()) / 1e9 loses sub-nanosecond precision but more importantly can lose microsecond precision for large UnixNano values due to float64 mantissa limitations (float64 has 53 bits of mantissa; UnixNano is ~63 bits in 2024). For timestamps after ~2262 this overflows, but even now precision is limited to ~microseconds. Consider using float64(time.Now().Unix()) + float64(time.Now().Nanosecond())/1e9 or storing seconds and fractional seconds separately: sec := time.Now().Unix(); nsec := time.Now().Nanosecond(); ts := float64(sec) + float64(nsec)/1e9. The GELF spec recommends this approach.
|
/mole review |
There was a problem hiding this comment.
Mole Review
🔴 67/100
🐭 Mole dug deep into this PR! Found 9 issues to review. Score: 67/100
Issues
1. mchlogcore/mchlog.go:221 — 🟡 Attention · Bugs / Logic Error
When
fileOutputEnabledis false, theversionNamevariable stays as"V1"(its default value, set on line 220) regardless ofcurrentVersion. The subsequent log call (after this block) will always print"V1"even ifcurrentVersion == V2and file output is disabled. TheversionNameassignment inside theelsebranch (line 225) is only reached when file output is enabled. MoveversionNameassignment outside thefileOutputEnabledguard, or replicate the version name logic unconditionally before the guard.
2. mchlogcore/mchlog.go:78 — 🟡 Attention · Bugs / Logic Error
In
SetUDPTarget, between Phase 2 (waitAndClose) and Phase 4 (acquiring the write lock), another concurrentSetUDPTargetcall could also reach Phase 4 and install a new worker. The inner detach on lines 78-84 then tears down that newly-installed worker before the current call installs its own. This means the newly created transportt(built in Phase 3 of this call) is installed while the concurrently-created transport is discarded — but the concurrently-created transport is closed viainnerTransport.Close()correctly. The subtle issue is thatinnerDoneis waited on while holding the write lock (line 80:<-innerDone). Because the write lock is held,LogSubjectcannot acquireRLockand cannot enqueue new messages, so the worker drains only what is already buffered — which is correct and bounded. However, this is a blocking wait under a write lock and can stall all logging for the entire drain period. Consider documenting this explicitly, or accept the trade-off as intentional since concurrentSetUDPTargetcalls are expected to be rare.
3. mchloggelf/gelf_test.go:75 — 🟢 Suggestion · Bugs / Logic Error
The timestamp bounds check
msg.Timestamp < now-5 || msg.Timestamp > now+1usesnowcomputed afterNewGELFMessagewas called. On a loaded system, ifNewGELFMessagetook more than 1 second (unlikely but possible in CI under load),msg.Timestamp > now+1would always be false anyway since msg.Timestamp < now. The upper bound check is logically inverted — a timestamp from beforenowcan never exceednow+1. The intent is probably to checkmsg.Timestamp > now(i.e., it was not set in the future), which the current check does correctly via the lower bound. No change needed for correctness, but consider adding a comment explaining the ±window.
4. mchloggelf/udp_test.go:153 — 🟢 Suggestion · Bugs / Logic Error
The large payload is created as
make([]byte, 10000)filled with'A', then used asmsg.ShortMessage = string(largePayload). However, the actual JSON-encoded GELF message will be significantly larger than 10000 bytes due to JSON escaping overhead and the other GELF fields. The test relies on the final marshaled output exceedingmaxChunkSize(8192 bytes) to trigger chunking. This currently works because 10000 bytes of 'A' in a JSON string is well over 8192, but it would be cleaner to document the expected chunk count or assert it precisely rather than justlen(chunks) < 2.
5. logger.go:83 — 🟡 Attention · Bugs / Logic Error
The environment variable
MCHLOG_FILE_OUTPUTis only checked for the value"false"(case-insensitive). Any other value — including"0","no","off"— is silently treated astrue(file output enabled). This is inconsistent with how some users may expect boolean env vars to behave. The doc comment and README only mention"true"/"false", so this is acceptable, but it should be explicitly documented (or"0"should also be accepted) to avoid operator confusion.
6. mchlogcore/mchlog.go:183 — 🟡 Attention · Performance / Blocking I/O in Hot Path
mchloggelf.NewGELFMessage(includingcontentToMapwith JSON unmarshaling and map allocation) is called insideLogSubjectwhile holdingudpMu.RLock(). JSON unmarshaling can be slow for large payloads. SinceRLockis held to prevent the channel from being closed mid-send, any slowNewGELFMessagecall extends the duration the read lock is held, delaying any concurrentSetUDPTargetorCloseUDPthat needs the write lock. Consider building the GELF message before acquiring the lock, then doing a quick nil-check + send under the lock. You'd need to handle the race whereudpChanbecomes nil between message creation and the lock, but this is safe since thedefaultcase in the select already handles a full buffer, and a closed-channel send is protected by the lock. Example: build message first, thenudpMu.RLock(), check nil, send or drop,udpMu.RUnlock().
7. mchloggelf/udp.go:43 — 🟡 Attention · Bugs / Resource Leak
In
NewUDPTransport, ifnet.DialUDPsucceeds but the function later panics (or a future code addition betweenDialUDPand the return fails), theconnwould leak. This is not currently a problem but could become one if error paths are added. More concretely: the current code is fine, but it's worth noting thatconnis not closed on any error path after line 43 — there are no error paths after it, so this is actually safe as written. No action needed.
8. mchloggelf/gelf.go:69 — 🟢 Suggestion · Bugs / Logic Error
The GELF spec prohibits
_idas an extra field (the code skipsk == "id"which becomes_id). However, it does not check for keys that already start with an underscore inm.Extra— such a keykwould become__k(double underscore) after the"_"+kprefix is applied. While this is technically allowed by GELF, it may produce unexpected field names if callers pass extra keys with leading underscores. Consider documenting this behavior or stripping leading underscores from extra keys before prefixing.
9. mchlogcore/mchlog.go:56 — 🟡 Attention · Bugs / Logic Error
In
SetUDPTarget, Phase 1 detaches the old worker (closes its channel, setsudpChan = nil) and releases the lock. Phase 3 creates a new transport viamchloggelf.NewUDPTransport(a network operation). If this network call fails, the function returns the error and exits — leavingudpChan = nilandudpTransport = nilpermanently. This means after a failedSetUDPTargetcall, UDP logging is silently disabled with no indication. The caller receives the error, but if they ignore it (as noted in previously-reported issue #9 for the env-var path), logging silently stops. Consider documenting that a failedSetUDPTargetdisables UDP output until a successful call is made.
Suggestions
- Consider adding an integration test that exercises the full path:
Logger.Initialize()withMCHLOG_UDP_TARGETset, sends a log message, and verifies receipt on a UDP listener. The current tests cover themchloggelfpackage in isolation but not themchlogcoreorchestration layer. - The
detachWorkerLocked+waitAndClosetwo-phase pattern is a significant improvement over the previous design. Consider adding a brief architecture comment at the top ofmchlog.goexplaining the locking protocol (write lock for detach/install, read lock held during send) to help future contributors understand why the pattern is structured this way. - The
gzipCompressfunction creates a newbytes.Bufferandgzip.Writeron every call. For high-throughput scenarios, consider using async.Poolofgzip.Writerinstances to reduce allocations, similar to how the standard library handles this internally. - The README example on line 90 shows
logger.SetUDPTarget(...)called beforelogger.Initialize(), but the comment onSetUDPTarget(logger.go line 116) says nothing about ordering relative toInitialize(). The old comment 'Must be called before Initialize()' was removed. Clarify in the doc comment whetherSetUDPTargetcan be called afterInitialize()and what the effect is (it can, and it will replace the existing target).
| } else { | ||
| versionName = "V2" | ||
| mchlogcorev2.InitializeMchLog(path) | ||
| if fileOutputEnabled.Load() { |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
When fileOutputEnabled is false, the versionName variable stays as "V1" (its default value, set on line 220) regardless of currentVersion. The subsequent log call (after this block) will always print "V1" even if currentVersion == V2 and file output is disabled. The versionName assignment inside the else branch (line 225) is only reached when file output is enabled. Move versionName assignment outside the fileOutputEnabled guard, or replicate the version name logic unconditionally before the guard.
| // worker, shut it down first (last caller wins). This wait is bounded: | ||
| // the write lock prevents new sends, so the worker only drains what is | ||
| // already in the buffer. | ||
| innerDone, innerTransport := detachWorkerLocked() |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
In SetUDPTarget, between Phase 2 (waitAndClose) and Phase 4 (acquiring the write lock), another concurrent SetUDPTarget call could also reach Phase 4 and install a new worker. The inner detach on lines 78-84 then tears down that newly-installed worker before the current call installs its own. This means the newly created transport t (built in Phase 3 of this call) is installed while the concurrently-created transport is discarded — but the concurrently-created transport is closed via innerTransport.Close() correctly. The subtle issue is that innerDone is waited on while holding the write lock (line 80: <-innerDone). Because the write lock is held, LogSubject cannot acquire RLock and cannot enqueue new messages, so the worker drains only what is already buffered — which is correct and bounded. However, this is a blocking wait under a write lock and can stall all logging for the entire drain period. Consider documenting this explicitly, or accept the trade-off as intentional since concurrent SetUDPTarget calls are expected to be rare.
| } | ||
|
|
||
| // Timestamp should be recent | ||
| now := float64(time.Now().UnixNano()) / 1e9 |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
The timestamp bounds check msg.Timestamp < now-5 || msg.Timestamp > now+1 uses now computed after NewGELFMessage was called. On a loaded system, if NewGELFMessage took more than 1 second (unlikely but possible in CI under load), msg.Timestamp > now+1 would always be false anyway since msg.Timestamp < now. The upper bound check is logically inverted — a timestamp from before now can never exceed now+1. The intent is probably to check msg.Timestamp > now (i.e., it was not set in the future), which the current check does correctly via the lower bound. No change needed for correctness, but consider adding a comment explaining the ±window.
| defer transport.Close() | ||
|
|
||
| // Create a large message that will require chunking | ||
| largePayload := make([]byte, 10000) |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
The large payload is created as make([]byte, 10000) filled with 'A', then used as msg.ShortMessage = string(largePayload). However, the actual JSON-encoded GELF message will be significantly larger than 10000 bytes due to JSON escaping overhead and the other GELF fields. The test relies on the final marshaled output exceeding maxChunkSize (8192 bytes) to trigger chunking. This currently works because 10000 bytes of 'A' in a JSON string is well over 8192, but it would be cleaner to document the expected chunk count or assert it precisely rather than just len(chunks) < 2.
| } | ||
|
|
||
| // Check environment variable for file output | ||
| if v := os.Getenv(EnvFileOutput); strings.ToLower(v) == "false" { |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
The environment variable MCHLOG_FILE_OUTPUT is only checked for the value "false" (case-insensitive). Any other value — including "0", "no", "off" — is silently treated as true (file output enabled). This is inconsistent with how some users may expect boolean env vars to behave. The doc comment and README only mention "true"/"false", so this is acceptable, but it should be explicitly documented (or "0" should also be accepted) to avoid operator confusion.
| return | ||
| } | ||
|
|
||
| msg, err := mchloggelf.NewGELFMessage(subject, content, errLog) |
There was a problem hiding this comment.
🟡 Attention · Performance / Blocking I/O in Hot Path
mchloggelf.NewGELFMessage (including contentToMap with JSON unmarshaling and map allocation) is called inside LogSubject while holding udpMu.RLock(). JSON unmarshaling can be slow for large payloads. Since RLock is held to prevent the channel from being closed mid-send, any slow NewGELFMessage call extends the duration the read lock is held, delaying any concurrent SetUDPTarget or CloseUDP that needs the write lock. Consider building the GELF message before acquiring the lock, then doing a quick nil-check + send under the lock. You'd need to handle the race where udpChan becomes nil between message creation and the lock, but this is safe since the default case in the select already handles a full buffer, and a closed-channel send is protected by the lock. Example: build message first, then udpMu.RLock(), check nil, send or drop, udpMu.RUnlock().
| return nil, fmt.Errorf("failed to resolve UDP address %q: %w", address, err) | ||
| } | ||
|
|
||
| conn, err := net.DialUDP("udp", nil, addr) |
There was a problem hiding this comment.
🟡 Attention · Bugs / Resource Leak
In NewUDPTransport, if net.DialUDP succeeds but the function later panics (or a future code addition between DialUDP and the return fails), the conn would leak. This is not currently a problem but could become one if error paths are added. More concretely: the current code is fine, but it's worth noting that conn is not closed on any error path after line 43 — there are no error paths after it, so this is actually safe as written. No action needed.
|
|
||
| // Extra fields with underscore prefix | ||
| for k, v := range m.Extra { | ||
| if k == "id" { |
There was a problem hiding this comment.
🟢 Suggestion · Bugs / Logic Error
The GELF spec prohibits _id as an extra field (the code skips k == "id" which becomes _id). However, it does not check for keys that already start with an underscore in m.Extra — such a key k would become __k (double underscore) after the "_"+k prefix is applied. While this is technically allowed by GELF, it may produce unexpected field names if callers pass extra keys with leading underscores. Consider documenting this behavior or stripping leading underscores from extra keys before prefixing.
| func SetUDPTarget(address string, compress bool) error { | ||
| // Phase 1: under write lock, close the channel and capture references | ||
| // to the old worker so we can wait on it without holding the lock. | ||
| udpMu.Lock() |
There was a problem hiding this comment.
🟡 Attention · Bugs / Logic Error
In SetUDPTarget, Phase 1 detaches the old worker (closes its channel, sets udpChan = nil) and releases the lock. Phase 3 creates a new transport via mchloggelf.NewUDPTransport (a network operation). If this network call fails, the function returns the error and exits — leaving udpChan = nil and udpTransport = nil permanently. This means after a failed SetUDPTarget call, UDP logging is silently disabled with no indication. The caller receives the error, but if they ignore it (as noted in previously-reported issue #9 for the env-var path), logging silently stops. Consider documenting that a failed SetUDPTarget disables UDP output until a successful call is made.
|
/mole deep-review |
There was a problem hiding this comment.
Mole Review
🟡 85/100
🐭 A toupeira cavou fundo nesse PR! Encontrou 3 problemas para revisar. Score: 85/100
Diagram
sequenceDiagram
participant App as Aplicação
participant Logger as Logger
participant Core as mchlogcore
participant GELF as mchloggelf
participant Worker as UDP Worker (goroutine)
participant UDP as Servidor GELF/UDP
App->>Logger: Initialize()
Logger->>Core: SetUDPTarget(address, compress)
Core->>GELF: NewUDPTransport(address, compress)
GELF-->>Core: *UDPTransport
Core->>Core: startWorkerLocked()
Core-->>Worker: goroutine iniciada
App->>Logger: Info(content)
Logger->>Core: LogSubject("info", content, nil)
Core->>Core: fileOutputEnabled.Load()?
alt File habilitado
Core->>Core: mchlogcorev1/v2.LogSubject()
end
Core->>Core: udpMu.RLock()
Core->>GELF: NewGELFMessage(subject, content, err)
GELF-->>Core: *GELFMessage
Core->>Worker: udpChan <- msg
Core->>Core: udpMu.RUnlock()
Worker->>GELF: transport.Send(msg)
GELF->>GELF: MarshalJSON + gzipCompress
GELF->>UDP: conn.Write(data)
App->>Logger: Close()
Logger->>Core: CloseUDP()
Core->>Core: udpMu.Lock()
Core->>Core: detachWorkerLocked() → close(udpChan)
Core->>Core: udpMu.Unlock()
Core->>Worker: range ch termina
Worker->>Worker: close(done)
Core->>GELF: transport.Close()
GELF->>GELF: conn.Close()
Issues
1. mchlogcore/mchlog.go:183 — 🟡 Atenção · Performance / Blocking I/O in Hot Path
NewGELFMessage(que inclui JSON unmarshal, alocações de map, etc.) é executado dentro do escopo doudpMu.RLock(). Isso significa que toda chamada aLogSubjectque envia via UDP mantém o read lock durante a criação da mensagem GELF, atrasando qualquerSetUDPTargetouCloseUDPconcorrente que precisa do write lock. Considere construir a mensagem GELF antes de adquirir o RLock, e sob o lock fazer apenas o nil-check + send. Nota: este ponto já foi mencionado (#38), mas a severidade aqui é reforçada porque o lock é mantido por toda a operação de marshaling, não apenas o send.
2. mchlogcore/mchlog.go:56 — 🟡 Atenção · Bugs / Race Condition
Entre a Phase 1 (
udpMu.Unlock()na linha 58) e a Phase 4 (udpMu.Lock()na linha 71), qualquer chamada aLogSubjectveráudpChan == nile silenciosamente descartará mensagens. Este é um período de 'blackout' de UDP durante a troca de target, que inclui uma operação de rede (NewUDPTransportna linha 65) e a espera pelo drain do worker antigo (waitAndClosena linha 62). Em ambientes de alta taxa de log, mensagens serão perdidas durante esse intervalo. Considere documentar esse comportamento ou usar um padrão que mantenha o worker antigo ativo até que o novo esteja pronto (hot-swap).
3. mchloggelf/udp.go:138 — 🟡 Atenção · Performance / Missing Cache
gzipCompressaloca um novobytes.Buffere um novogzip.Writerem cada chamada. No caminho quente de logging com compressão habilitada, isso gera pressão significativa no GC. Considere usarsync.Poolpara reutilizar buffers e writers GZIP.
| return | ||
| } | ||
|
|
||
| msg, err := mchloggelf.NewGELFMessage(subject, content, errLog) |
There was a problem hiding this comment.
🟡 Atenção · Performance / Blocking I/O in Hot Path
NewGELFMessage (que inclui JSON unmarshal, alocações de map, etc.) é executado dentro do escopo do udpMu.RLock(). Isso significa que toda chamada a LogSubject que envia via UDP mantém o read lock durante a criação da mensagem GELF, atrasando qualquer SetUDPTarget ou CloseUDP concorrente que precisa do write lock. Considere construir a mensagem GELF antes de adquirir o RLock, e sob o lock fazer apenas o nil-check + send. Nota: este ponto já foi mencionado (#38), mas a severidade aqui é reforçada porque o lock é mantido por toda a operação de marshaling, não apenas o send.
| func SetUDPTarget(address string, compress bool) error { | ||
| // Phase 1: under write lock, close the channel and capture references | ||
| // to the old worker so we can wait on it without holding the lock. | ||
| udpMu.Lock() |
There was a problem hiding this comment.
🟡 Atenção · Bugs / Race Condition
Entre a Phase 1 (udpMu.Unlock() na linha 58) e a Phase 4 (udpMu.Lock() na linha 71), qualquer chamada a LogSubject verá udpChan == nil e silenciosamente descartará mensagens. Este é um período de 'blackout' de UDP durante a troca de target, que inclui uma operação de rede (NewUDPTransport na linha 65) e a espera pelo drain do worker antigo (waitAndClose na linha 62). Em ambientes de alta taxa de log, mensagens serão perdidas durante esse intervalo. Considere documentar esse comportamento ou usar um padrão que mantenha o worker antigo ativo até que o novo esteja pronto (hot-swap).
| } | ||
|
|
||
| // gzipCompress compresses data using GZIP. | ||
| func gzipCompress(data []byte) ([]byte, error) { |
There was a problem hiding this comment.
🟡 Atenção · Performance / Missing Cache
gzipCompress aloca um novo bytes.Buffer e um novo gzip.Writer em cada chamada. No caminho quente de logging com compressão habilitada, isso gera pressão significativa no GC. Considere usar sync.Pool para reutilizar buffers e writers GZIP.
Descrição
Adiciona suporte ao envio de logs via UDP no formato GELF 1.1 (Graylog Extended Log Format), permitindo integração com plataformas de observabilidade como Graylog, Logstash/Kibana, Fluentd e Datadog.
Commit 1 - 6b09c0e - Teste de envio de logs via UDP:
Commit 2 - 490dca5 - Correções de memory leaking e performance:
Tipo de Mudança
Checklist
Como testar as alterações
Testes automatizados:
go test ./... -v
Teste manual com Graylog:
export MCHLOG_UDP_TARGET=graylog.example.com:12201
Teste manual com netcat:
nc -lu 12201 # Em um terminal
Rodar a aplicação com MCHLOG_UDP_TARGET=127.0.0.1:12201 em outro terminal
Verificar que o JSON GELF aparece no netcat
Impactos no Deploy
Dependências
Nenhuma dependência externa nova. A implementação utiliza exclusivamente a stdlib do Go (net, compress/gzip, encoding/json, crypto/rand).
Issues
N/A
Observações Adicionais