MchLogToolkitGo - Sending logs via UDP in GELF format #15

Open
ramon-gaudium wants to merge 6 commits into main from
topic/envio-via-udp

@ramon-gaudium

Description

Adds support for sending logs over UDP in GELF 1.1 (Graylog Extended Log Format), enabling integration with observability platforms such as Graylog, Logstash/Kibana, Fluentd, and Datadog.

Commit 1 - 6b09c0e - Test sending logs via UDP:

  • Created the mchloggelf/ package with GELF 1.1 formatting and UDP transport
  • Implemented chunking (fragmentation of messages > 8192 bytes) and GZIP compression
  • Modified the mchlogcore/mchlog.go facade for dual dispatch (file + UDP)
  • New public API methods: SetUDPTarget, SetUDPTargetWithOptions, DisableFileOutput, Close
  • Support for configuration through environment variables (MCHLOG_UDP_TARGET, MCHLOG_UDP_COMPRESS, MCHLOG_FILE_OUTPUT)
  • Unit tests for GELF and UDP (13 tests)
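
For readers unfamiliar with the format, a GELF 1.1 payload is a JSON object with a few fixed fields plus underscore-prefixed additional fields. The sketch below is illustrative only and is not the code in mchloggelf/; the function name buildGELF and the sample values are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// buildGELF is a hypothetical helper that returns a GELF 1.1 payload as JSON.
// Additional fields get the underscore prefix GELF uses for non-standard keys.
func buildGELF(host, shortMessage string, level int, extra map[string]any) ([]byte, error) {
	msg := map[string]any{
		"version":       "1.1",
		"host":          host,
		"short_message": shortMessage,
		"timestamp":     float64(time.Now().UnixNano()) / 1e9, // seconds since epoch
		"level":         level,                                // syslog severity
	}
	for k, v := range extra {
		msg["_"+k] = v
	}
	return json.Marshal(msg)
}

func main() {
	b, err := buildGELF("app-01", "user logged in", 6, map[string]any{"request_id": "abc123"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```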

Commit 2 - 490dca5 - Memory leak and performance fixes:

  • Fixed a connection leak: SetUDPTarget now closes the previous connection before creating a new one
  • Replaced unbounded goroutines with a buffered channel (1000) + a dedicated worker, avoiding resource exhaustion (OOM/CPU spike)
  • Added logging to stderr when a UDP send fails or the buffer is full
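
The buffered-channel approach described in commit 2 can be sketched roughly as follows. This is a minimal illustration under assumptions, not the PR's code: the type boundedQueue, its methods, and the stderr messages are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
)

// boundedQueue is a hypothetical stand-in for the buffered channel + worker.
type boundedQueue struct {
	ch   chan string
	done chan struct{}
}

// newBoundedQueue starts one worker goroutine that drains the channel.
func newBoundedQueue(size int, send func(string) error) *boundedQueue {
	q := &boundedQueue{ch: make(chan string, size), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for m := range q.ch {
			if err := send(m); err != nil {
				fmt.Fprintf(os.Stderr, "[sketch] send failed: %v\n", err)
			}
		}
	}()
	return q
}

// Enqueue never blocks: when the buffer is full the message is dropped
// and the drop is reported on stderr.
func (q *boundedQueue) Enqueue(m string) bool {
	select {
	case q.ch <- m:
		return true
	default:
		fmt.Fprintln(os.Stderr, "[sketch] buffer full, message dropped")
		return false
	}
}

// Close stops accepting messages and waits for the worker to drain the buffer.
// It must not run concurrently with Enqueue in this simplified form.
func (q *boundedQueue) Close() {
	close(q.ch)
	<-q.done
}

func main() {
	q := newBoundedQueue(1000, func(m string) error {
		fmt.Println("sent:", m)
		return nil
	})
	q.Enqueue("hello")
	q.Close()
}
```

The point of the design is that memory use is capped by the buffer size, at the cost of dropping messages during bursts.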

Type of Change

  • New Feature: Adds new functionality that enriches the project.
  • Security: Fixes or improvements that increase the project's security.
  • Tests: Adds or improves the project's tests, including unit tests, BDD, etc.

Checklist

  • I tested the code locally
  • I reviewed the code (self-review)
  • I commented my code, especially in hard-to-understand areas
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • I checked that the PR meets the acceptance criteria of the related issue

How to test the changes

Automated tests:
go test ./... -v

Manual test with Graylog:

  1. Configure a GELF UDP Input in Graylog on port 12201
  2. Set the environment variable and run the application:
    export MCHLOG_UDP_TARGET=graylog.example.com:12201
  3. Check in the Graylog dashboard that messages arrive in GELF 1.1 format

Manual test with netcat:
nc -lu 12201 # In one terminal

Run the application with MCHLOG_UDP_TARGET=127.0.0.1:12201 in another terminal

Check that the GELF JSON appears in netcat

Deploy Impact

  • No breaking changes: without SetUDPTarget, behavior is identical to before (file only)
  • New optional environment variables: MCHLOG_UDP_TARGET, MCHLOG_UDP_COMPRESS, MCHLOG_FILE_OUTPUT
  • Network: the GELF server's UDP port (e.g., 12201) must be reachable from the application's pod/container
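
As a rough illustration of how the three environment variables might be read, here is a sketch that validates the target early so a bad address is reported instead of being silently ignored. The struct udpConfig, the function loadConfig, and the default of compression being on are assumptions; the PR's actual defaults may differ.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// udpConfig is a hypothetical container for the env-driven settings.
type udpConfig struct {
	Target     string // empty means UDP output stays disabled
	Compress   bool   // ASSUMPTION: defaults to true
	FileOutput bool   // defaults to true, matching the file-only behavior
}

// loadConfig reads the settings through getenv so it can be tested without
// touching the real environment.
func loadConfig(getenv func(string) string) (udpConfig, error) {
	cfg := udpConfig{Compress: true, FileOutput: true}
	if strings.EqualFold(getenv("MCHLOG_UDP_COMPRESS"), "false") {
		cfg.Compress = false
	}
	if strings.EqualFold(getenv("MCHLOG_FILE_OUTPUT"), "false") {
		cfg.FileOutput = false
	}
	if t := getenv("MCHLOG_UDP_TARGET"); t != "" {
		// Reject targets that are not in host:port form.
		if _, _, err := net.SplitHostPort(t); err != nil {
			return cfg, fmt.Errorf("invalid MCHLOG_UDP_TARGET %q: %w", t, err)
		}
		cfg.Target = t
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```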

Dependencies

No new external dependencies. The implementation uses only the Go standard library (net, compress/gzip, encoding/json, crypto/rand).

Issues

N/A

Additional Notes

  • UDP traffic is not encrypted. If logs contain sensitive data and cross untrusted networks, consider placing Graylog in the same VPC/private network.
  • The internal 1000-message buffer protects against log spikes. Messages dropped because the buffer is full are reported on stderr.

@mole-reviewer mole-reviewer bot left a comment

Mole Review

🔴 0/100

🐭 Mole dug deep into this PR! Found 15 issues to review. Score: 0/100

Issues

1. mchlogcore/mchlog.go:27 — 🔴 Critical · Bugs / Race Condition

Package-level variables udpTransport, fileOutputEnabled, udpChan, udpOnce, and udpDone are accessed from multiple goroutines without synchronization. SetUDPTarget, SetFileOutput, CloseUDP, and LogSubject can all be called concurrently. For example, LogSubject reads udpChan (line 115) and fileOutputEnabled (line 107) without a lock, while SetUDPTarget and closeUDPWorker write them. This is a data race. Protect all access to these shared variables with a sync.RWMutex.

2. mchlogcore/mchlog.go:63 — 🔴 Critical · Bugs / Race Condition

udpOnce is reassigned with udpOnce = sync.Once{} in startUDPWorker(), but sync.Once must not be copied after first use, and reassigning it while it might be read concurrently is a race. Moreover, udpOnce is declared but never actually used to guard anything in the current code — startUDPWorker is called directly and udpOnce.Do(...) is never invoked. Remove udpOnce entirely or use it correctly to guard startUDPWorker invocation.

3. mchlogcore/mchlog.go:45 — 🔴 Critical · Bugs / Race Condition

In SetUDPTarget, the check if udpTransport != nil and subsequent closeUDPWorker() / assignment of udpTransport are not protected by a mutex. Concurrent calls to SetUDPTarget or concurrent calls between SetUDPTarget and LogSubject (which reads udpChan) create a data race. Wrap the entire function body in a mutex.

4. mchlogcore/mchlog.go:79 — 🔴 Critical · Bugs / Race Condition

closeUDPWorker closes udpChan and then waits on udpDone. However, if LogSubject is concurrently executing the select { case udpChan <- msg: ... } block, closing a channel that another goroutine may be sending to causes a panic ('send on closed channel'). The non-blocking select in LogSubject mitigates this partially, but there is still a window between the nil check (if udpChan != nil) and the send where the channel can be closed. A mutex guarding both the nil check and the send, and also held during close, is required.

5. mchlogcore/mchlog.go:93 — 🔴 Critical · Bugs / Race Condition

fileOutputEnabled is a plain bool written by SetFileOutput and read by LogSubject without any synchronization. Use sync/atomic (atomic.Bool in Go 1.19+) or protect with a mutex.

6. mchlogcore/mchlog.go:115 — 🔴 Critical · Bugs / Race Condition

udpChan is read (nil-checked and sent to) in LogSubject while it can be concurrently written (set to nil or replaced) by closeUDPWorker and startUDPWorker. Even the nil check followed by send is not atomic — the channel could be closed between the check and the send, causing a panic. Guard with a mutex.

7. mchloggelf/gelf.go:82 — 🟡 Attention · Bugs / Unhandled Error

os.Hostname() error is silently ignored with _. If hostname resolution fails (possible in some container environments), hostname will be an empty string, producing a GELF message with an empty host field, which violates the GELF spec (host is required). Consider falling back to "unknown" or logging the error: hostname, err := os.Hostname(); if err != nil { hostname = "unknown" }.

8. mchloggelf/gelf.go:158 — 🟡 Attention · Bugs / Logic Error

The default case in contentToMap calls json.Unmarshal([]byte("{}"), new(map[string]any)) and returns the error (which will be nil) alongside a nil map. This means for any unrecognized content type, the function returns (nil, nil). The caller in NewGELFMessage checks if err != nil and takes the fallback path only on error, so with a nil map and nil error it will proceed to range over a nil map (harmless in Go) but silently drop content. This should return an explicit error: return nil, fmt.Errorf("unsupported content type: %T", content).

9. mchlogcore/mchlog.go:71 — 🟡 Attention · Bugs / Unhandled Error

When SetUDPTarget is called from Initialize() based on the environment variable, errors are silently swallowed (the return value is not checked). If UDP setup fails (e.g., bad address format in the env var), the operator gets no feedback. Consider logging the error with log.Printf at minimum.

10. mchloggelf/udp.go:73 — 🟡 Attention · Bugs / Logic Error

After t.conn.Close() is called (via Close()), subsequent calls to Send will attempt t.conn.Write(data) on a closed connection and return an error. However t.conn is never set to nil after close, so there is no guard. While the error is propagated, a nil-set after close would make the behavior more explicit. More importantly, t.mu is not held between the nil-check of t.conn in Close() and the Write call in Send(), but since the mutex IS used in both methods this is actually fine — just note that the t.conn != nil check in Close() is redundant since net.UDPConn.Close() handles nil safely.

11. mchlogcore/mchlog.go:30 — 🟢 Suggestion · Architecture / Tight Coupling

Package-level mutable state (udpTransport, udpChan, fileOutputEnabled, etc.) makes this package difficult to test in isolation and prevents multiple logger instances from having independent UDP configurations. Consider encapsulating this state in a struct.

12. mchloggelf/gelf.go:88 — 🟡 Attention · Bugs / Logic Error

LevelToSyslog(subject) maps the subject string (which is a log level like "info", "error") to a syslog level. However, if the content map contains its own "level" field with a different value (line 109 deletes it), and subject is something other than a standard level (e.g., a category name), the syslog level will default to SyslogInformational. This may be intentional but is worth documenting clearly, as the semantics of subject in the existing logger API may not always be a log level.

13. mchloggelf/udp.go:83 — 🟢 Suggestion · Bugs / Logic Error

chunkCount is computed as an int. With maxChunkDataSize = 8180 and very large messages, this could theoretically overflow on 32-bit platforms, though in practice messages are limited to 128 chunks × 8180 bytes ≈ 1MB which is well within int range. The maxChunks check on line 84 correctly bounds this. No action needed but worth a comment.

14. mchlogcore/mchlog.go:116 — 🟢 Suggestion · Bugs / Unhandled Error

When NewGELFMessage returns an error (e.g., for an unsupported content type), the error is silently dropped (if err == nil). Consider logging a warning so operators know that GELF messages are being skipped due to content parse failures.

15. mchloggelf/gelf_test.go:47 — 🟢 Suggestion · Bugs / Logic Error

In CI environments running in containers, os.Hostname() may return a different value depending on timing. The test directly compares msg.Host to the result of a second os.Hostname() call, which is fine, but if hostname resolution fails for both calls, both return empty string and the test passes vacuously. Consider asserting hostname != "".

Suggestions

  • Introduce a sync.RWMutex in mchlogcore/mchlog.go to protect all concurrent access to udpChan, udpTransport, udpDone, and fileOutputEnabled. The worker goroutine reads from udpChan (safe as it's the sole reader), but writes from LogSubject and the close from closeUDPWorker must be synchronized.
  • Remove udpOnce since it is never actually used — it's declared, assigned, and reset but udpOnce.Do(...) is never called. This is dead code that creates confusion about the intended initialization guarantee.
  • Consider adding an integration test or example that exercises Initialize() with MCHLOG_UDP_TARGET set to a local listener to verify the environment-variable-driven configuration path end-to-end.
  • The contentToMap default case should return a meaningful error rather than performing a no-op unmarshal. This makes the failure mode explicit and ensures the caller's fallback logic is triggered correctly for unknown content types.
  • Document the thread-safety guarantees (or lack thereof) of SetUDPTarget, SetFileOutput, and CloseUDP in their godoc comments, so callers know whether these are safe to call concurrently with logging.
  • Consider caching the result of os.Hostname() (compute it once at startup) in NewGELFMessage rather than calling it on every log message; it is a syscall on every invocation and could be a minor performance concern in high-throughput scenarios.

)

var currentVersion = V1
var udpTransport *mchloggelf.UDPTransport

🔴 Critical · Bugs / Race Condition


Package-level variables udpTransport, fileOutputEnabled, udpChan, udpOnce, and udpDone are accessed from multiple goroutines without synchronization. SetUDPTarget, SetFileOutput, CloseUDP, and LogSubject can all be called concurrently. For example, LogSubject reads udpChan (line 115) and fileOutputEnabled (line 107) without a lock, while SetUDPTarget and closeUDPWorker write them. This is a data race. Protect all access to these shared variables with a sync.RWMutex.

func startUDPWorker() {
udpChan = make(chan *mchloggelf.GELFMessage, udpBufferSize)
udpDone = make(chan struct{})
udpOnce = sync.Once{}

🔴 Critical · Bugs / Race Condition


udpOnce is reassigned with udpOnce = sync.Once{} in startUDPWorker(), but sync.Once must not be copied after first use, and reassigning it while it might be read concurrently is a race. Moreover, udpOnce is declared but never actually used to guard anything in the current code — startUDPWorker is called directly and udpOnce.Do(...) is never invoked. Remove udpOnce entirely or use it correctly to guard startUDPWorker invocation.

// If compress is true, messages will be GZIP compressed.
func SetUDPTarget(address string, compress bool) error {
// Close any existing connection to avoid file descriptor leaks
if udpTransport != nil {

🔴 Critical · Bugs / Race Condition


In SetUDPTarget, the check if udpTransport != nil and subsequent closeUDPWorker() / assignment of udpTransport are not protected by a mutex. Concurrent calls to SetUDPTarget or concurrent calls between SetUDPTarget and LogSubject (which reads udpChan) create a data race. Wrap the entire function body in a mutex.


// closeUDPWorker drains the channel and waits for the worker to finish.
func closeUDPWorker() {
if udpChan != nil {

🔴 Critical · Bugs / Race Condition


closeUDPWorker closes udpChan and then waits on udpDone. However, if LogSubject is concurrently executing the select { case udpChan <- msg: ... } block, closing a channel that another goroutine may be sending to causes a panic ('send on closed channel'). The non-blocking select in LogSubject mitigates this partially, but there is still a window between the nil check (if udpChan != nil) and the send where the channel can be closed. A mutex guarding both the nil check and the send, and also held during close, is required.
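
One way to close the window described above is to hold a read lock across both the nil check and the send, and take the write lock to close. The sketch below uses a hypothetical sender type rather than the package-level variables in mchlogcore/mchlog.go.

```go
package main

import (
	"fmt"
	"sync"
)

// sender is a hypothetical wrapper around the channel being guarded.
type sender struct {
	mu sync.RWMutex
	ch chan string
}

// Send holds the read lock from the nil check through the send, so Close
// cannot close the channel in between. The send itself never blocks.
func (s *sender) Send(m string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if s.ch == nil {
		return false
	}
	select {
	case s.ch <- m:
		return true
	default:
		return false // buffer full
	}
}

// Close takes the write lock, which waits for in-flight Send calls.
func (s *sender) Close() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.ch != nil {
		close(s.ch)
		s.ch = nil
	}
}

func main() {
	s := &sender{ch: make(chan string, 1000)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				s.Send("msg")
			}
		}()
	}
	s.Close() // overlaps with the senders, but cannot cause a send on a closed channel
	wg.Wait()
	fmt.Println("closed without panic")
}
```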

// SetFileOutput enables or disables file-based log output.
// When disabled, logs are only sent via UDP (if configured).
func SetFileOutput(enabled bool) {
fileOutputEnabled = enabled

🔴 Critical · Bugs / Race Condition


fileOutputEnabled is a plain bool written by SetFileOutput and read by LogSubject without any synchronization. Use sync/atomic (atomic.Bool in Go 1.19+) or protect with a mutex.
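
A minimal sketch of the atomic.Bool option (Go 1.19+), assuming the flag stays a package-level variable. The helper fileOutputOn is a hypothetical name added for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fileOutputEnabled can be read and written from any goroutine without a mutex.
var fileOutputEnabled atomic.Bool

func init() {
	// The zero value of atomic.Bool is false, so the default is set explicitly.
	fileOutputEnabled.Store(true)
}

// SetFileOutput enables or disables file-based log output.
func SetFileOutput(enabled bool) {
	fileOutputEnabled.Store(enabled)
}

// fileOutputOn is a hypothetical accessor for the logging hot path.
func fileOutputOn() bool {
	return fileOutputEnabled.Load()
}

func main() {
	fmt.Println(fileOutputOn())
	SetFileOutput(false)
	fmt.Println(fileOutputOn())
}
```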

var udpTransport *mchloggelf.UDPTransport
var fileOutputEnabled = true

var udpChan chan *mchloggelf.GELFMessage

🟢 Suggestion · Architecture / Tight Coupling


Package-level mutable state (udpTransport, udpChan, fileOutputEnabled, etc.) makes this package difficult to test in isolation and prevents multiple logger instances from having independent UDP configurations. Consider encapsulating this state in a struct.

Version: "1.1",
Host: hostname,
Timestamp: float64(time.Now().UnixNano()) / 1e9,
Level: LevelToSyslog(subject),

🟡 Attention · Bugs / Logic Error


LevelToSyslog(subject) maps the subject string (which is a log level like "info", "error") to a syslog level. However, if the content map contains its own "level" field with a different value (line 109 deletes it), and subject is something other than a standard level (e.g., a category name), the syslog level will default to SyslogInformational. This may be intentional but is worth documenting clearly, as the semantics of subject in the existing logger API may not always be a log level.
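
To make the fallback behavior concrete, here is a hypothetical version of such a mapping. The real LevelToSyslog in mchloggelf/gelf.go is not shown in this thread, so the accepted names below are assumptions; only the default to the informational severity comes from the comment above.

```go
package main

import (
	"fmt"
	"strings"
)

// Standard syslog severities (RFC 5424 numbering).
const (
	SyslogError         = 3
	SyslogWarning       = 4
	SyslogInformational = 6
	SyslogDebug         = 7
)

// levelToSyslog is a hypothetical mapping from a subject string to a severity.
// Any subject that is not a recognized level name (for example a category
// name) falls back to SyslogInformational.
func levelToSyslog(subject string) int {
	switch strings.ToLower(strings.TrimSpace(subject)) {
	case "error":
		return SyslogError
	case "warn", "warning":
		return SyslogWarning
	case "debug":
		return SyslogDebug
	default:
		return SyslogInformational
	}
}

func main() {
	fmt.Println(levelToSyslog("error"), levelToSyslog("payments"))
}
```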


// sendChunked splits the data into GELF chunks and sends each one.
func (t *UDPTransport) sendChunked(data []byte) error {
chunkCount := (len(data) + maxChunkDataSize - 1) / maxChunkDataSize

🟢 Suggestion · Bugs / Logic Error


chunkCount is computed as an int. With maxChunkDataSize = 8180 and very large messages, this could theoretically overflow on 32-bit platforms, though in practice messages are limited to 128 chunks × 8180 bytes ≈ 1MB which is well within int range. The maxChunks check on line 84 correctly bounds this. No action needed but worth a comment.
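
The arithmetic can be seen in a standalone sketch of GELF chunking. Each chunk carries a 12-byte header (magic bytes 0x1e 0x0f, an 8-byte message ID, a sequence number, and a sequence count), which is why 8192 - 12 = 8180 bytes of data fit per datagram. The function name chunk is an assumption, and this is not the PR's sendChunked.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const (
	maxChunkDataSize = 8180 // 8192-byte datagram minus the 12-byte chunk header
	maxChunks        = 128  // GELF limit on chunks per message
)

// chunk is a hypothetical helper that splits data into GELF chunk datagrams.
func chunk(data []byte) ([][]byte, error) {
	// Ceiling division; bounded by the maxChunks check right below.
	count := (len(data) + maxChunkDataSize - 1) / maxChunkDataSize
	if count > maxChunks {
		return nil, fmt.Errorf("message needs %d chunks, limit is %d", count, maxChunks)
	}
	var id [8]byte // random message ID shared by all chunks
	if _, err := rand.Read(id[:]); err != nil {
		return nil, err
	}
	chunks := make([][]byte, 0, count)
	for i := 0; i < count; i++ {
		start := i * maxChunkDataSize
		end := start + maxChunkDataSize
		if end > len(data) {
			end = len(data)
		}
		pkt := make([]byte, 0, 12+end-start)
		pkt = append(pkt, 0x1e, 0x0f)           // chunk magic bytes
		pkt = append(pkt, id[:]...)             // message ID
		pkt = append(pkt, byte(i), byte(count)) // sequence number, sequence count
		pkt = append(pkt, data[start:end]...)
		chunks = append(chunks, pkt)
	}
	return chunks, nil
}

func main() {
	chunks, err := chunk(make([]byte, 20000))
	if err != nil {
		panic(err)
	}
	fmt.Println("chunks:", len(chunks))
}
```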

}

if udpChan != nil {
msg, err := mchloggelf.NewGELFMessage(subject, content, errLog)

🟢 Suggestion · Bugs / Unhandled Error


When NewGELFMessage returns an error (e.g., for an unsupported content type), the error is silently dropped (if err == nil). Consider logging a warning so operators know that GELF messages are being skipped due to content parse failures.

t.Errorf("version = %q, want %q", msg.Version, "1.1")
}

hostname, _ := os.Hostname()

🟢 Suggestion · Bugs / Logic Error


In CI environments running in containers, os.Hostname() may return a different value depending on timing. The test directly compares msg.Host to the result of a second os.Hostname() call, which is fine, but if hostname resolution fails for both calls, both return empty string and the test passes vacuously. Consider asserting hostname != "".

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for sending logs via UDP in GELF 1.1 format, enabling integration with observability platforms like Graylog. Key changes include new configuration methods and environment variables, an asynchronous UDP worker with GZIP compression and message chunking, and updated documentation. The review feedback identifies a critical compilation error in the GELF message processing logic, dead code in the core package, and several instances of ignored errors that should be handled or logged to improve the library's reliability.

Comment on lines +157 to +159
default:
return nil, json.Unmarshal([]byte("{}"), new(map[string]any))
}

critical

The default case in the switch has a syntax error (a return with 3 values in a function that returns 2) that will prevent compilation. In addition, if an unsupported content type is passed, the current logic can lead to a "nil pointer dereference" panic in NewGELFMessage.

For unsupported types, the function should return an error. This requires importing the fmt package.

Suggested change
default:
return nil, json.Unmarshal([]byte("{}"), new(map[string]any))
}
default:
return nil, fmt.Errorf("unsupported content type: %T", c)
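
For context, a complete function with an explicit error in the default case might look like the sketch below. The handled types follow the doc comment on NewGELFMessage (JSON bytes, map[string]any, map[string]string); the body is an assumption and not the actual contentToMap from mchloggelf/gelf.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// contentToMap is a hypothetical reconstruction: it normalizes the supported
// content types into a map and rejects everything else with an error.
func contentToMap(content any) (map[string]any, error) {
	switch c := content.(type) {
	case []byte:
		var m map[string]any
		if err := json.Unmarshal(c, &m); err != nil {
			return nil, fmt.Errorf("content is not a JSON object: %w", err)
		}
		return m, nil
	case map[string]any:
		return c, nil
	case map[string]string:
		m := make(map[string]any, len(c))
		for k, v := range c {
			m[k] = v
		}
		return m, nil
	default:
		// Explicit error so the caller's fallback path is actually triggered.
		return nil, fmt.Errorf("unsupported content type: %T", content)
	}
}

func main() {
	_, err := contentToMap(42)
	fmt.Println(err)
}
```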

logger.go Outdated
compress = false
}
// Best-effort: if UDP setup fails, continue with file output
mchlogcore.SetUDPTarget(target, compress)

medium

The error returned by mchlogcore.SetUDPTarget is being ignored. If UDP configuration from environment variables fails (for example, because of an invalid address), the failure is silent and UDP logs are simply not sent. It is advisable to handle this error, at minimum by writing a message to stderr to flag the configuration failure.

This requires importing the log package.

		if err := mchlogcore.SetUDPTarget(target, compress); err != nil {
			log.Printf("[mchlog] failed to set UDP target from environment: %v", err)
		}

func startUDPWorker() {
udpChan = make(chan *mchloggelf.GELFMessage, udpBufferSize)
udpDone = make(chan struct{})
udpOnce = sync.Once{}

medium

The udpOnce variable is initialized but never used. I recommend removing this line to avoid dead code.

Comment on lines +97 to +100
func CloseUDP() error {
closeUDPWorker()
return nil
}

medium

The CloseUDP function always returns nil, but the underlying udpTransport.Close() call can return an error, which closeUDPWorker ignores. This can hide problems when closing the UDP connection.

I suggest refactoring closeUDPWorker to return the error (which requires changing its signature) and having CloseUDP propagate it. The call in SetUDPTarget can keep ignoring the error, since its only purpose is to clean up the previous connection.

Example change to closeUDPWorker:

func closeUDPWorker() error {
    // ... (existing code)
	if udpTransport != nil {
		err := udpTransport.Close()
		udpTransport = nil
		return err
	}
	return nil
}
func CloseUDP() error {
	return closeUDPWorker()
}

// The content is expected to be JSON bytes (as produced by formatLog in logger.go)
// or a map[string]any / map[string]string.
func NewGELFMessage(subject string, content any, errLog error) (*GELFMessage, error) {
hostname, _ := os.Hostname()

medium

The error from os.Hostname() is being ignored. If the call fails, the hostname variable will be an empty string. The host field is required by the GELF specification, and an empty value may be rejected by the log server. I recommend providing a default value, such as "unknown", on failure.

	hostname, err := os.Hostname()
	if err != nil {
		hostname = "unknown"
	}
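
The fallback above can be combined with the caching that Mole suggested earlier, so the syscall runs once per process. A minimal sketch, assuming a sync.Once guard; the names hostOnce, hostName, and cachedHostname are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

var (
	hostOnce sync.Once
	hostName string
)

// cachedHostname resolves the hostname on first use and reuses the result.
// It falls back to "unknown" so the GELF host field is never empty.
func cachedHostname() string {
	hostOnce.Do(func() {
		h, err := os.Hostname()
		if err != nil || h == "" {
			h = "unknown"
		}
		hostName = h
	})
	return hostName
}

func main() {
	fmt.Println(cachedHostname())
}
```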

@ramon-gaudium
Author

/mole review

@mole-reviewer mole-reviewer bot left a comment

Mole Review

🔴 42/100

🐭 Mole dug deep into this PR! Found 10 issues to review. Score: 42/100

Issues

1. mchlogcore/mchlog.go:98 — 🔴 Critical · Bugs / Logic Error

The closeLocked function unlocks udpMu mid-execution (line 98) and then re-locks it (line 100), but the function's callers (both SetUDPTarget at line 52 and CloseUDP at line 118) use defer udpMu.Unlock(). After closeLocked returns, the deferred unlock will attempt to unlock a mutex that is currently held — which is correct — but the temporary unlock creates a window where another goroutine calling SetUDPTarget can acquire the write lock, modify udpTransport/udpChan, and then closeLocked re-acquires and proceeds to set udpChan = nil and udpDone = nil on what may now be a different (newly started) channel. This can silently discard the new worker's channels. The unlock/re-lock pattern inside a function that is called under a lock is extremely fragile. Consider instead signaling the worker to stop without holding the lock, or using a separate 'draining' state flag so the worker can be waited on without releasing the outer lock.

2. mchlogcore/mchlog.go:139 — 🔴 Critical · Bugs / Race Condition

There is still a TOCTOU race between reading udpChan under RLock (lines 139-141) and sending to the captured ch variable (line 150). After udpMu.RUnlock() on line 141, closeLocked can run (acquiring the write lock), close the channel, and set udpChan = nil. The local variable ch still holds a reference to the now-closed channel. The subsequent case ch <- msg: at line 150 will then panic with 'send on closed channel'. The read lock must be held for the entire duration of the send — from the nil check through the completion of the channel send — to prevent closeLocked from closing the channel while a send is in progress. Move udpMu.RUnlock() to after the select block (or use defer udpMu.RUnlock() before the send).

3. mchlogcore/mchlog.go:129 — 🟡 Attention · Bugs / Logic Error

When fileOutputEnabled is false, file logging is completely skipped. However, mchlogcore.InitializeMchLog (called by Logger.Initialize()) still creates the log directories and file handles. This is a minor inconsistency but could confuse operators who set MCHLOG_FILE_OUTPUT=false expecting zero file I/O, only to find directories/files created. Consider conditionally calling InitializeMchLog based on fileOutputEnabled, or documenting this behavior.

4. logger.go:83 — 🟡 Attention · Bugs / Logic Error

SetFileOutput(false) is called before InitializeMchLog (line 87). However, if fileOutputEnabled is set to false here, but Initialize() then calls mchlogcore.InitializeMchLog(...) unconditionally, file structures are still initialized. Additionally, SetFileOutput affects a package-level global, meaning calling Initialize() on a second Logger instance (with file output enabled) would not restore the fileOutputEnabled=true state if another logger had disabled it. This is a pre-existing design concern made worse by the new code path. Document that these are global settings shared across all Logger instances.

5. mchloggelf/udp.go:117 — 🟡 Attention · Bugs / Logic Error

In sendChunked, if sending chunk i fails, the function returns an error immediately but the previously sent chunks (0 to i-1) have already been transmitted. The receiver will receive a partial message that can never be reassembled. While there is no way to 'recall' UDP datagrams, it would be worth documenting this behavior and considering whether a retry or a 'best effort with full failure reporting' strategy is preferable. At minimum, the error message should indicate that partial data may have been sent.

6. mchloggelf/gelf.go:101 — 🟡 Attention · Bugs / Logic Error

In NewGELFMessage, when contentToMap returns an error (line 102), the function falls back to using subject as short_message and returns (msg, nil) — swallowing the error. However, errLog (the application error) is also silently dropped in this path; the errLog != nil block on lines 125-127 is only reached when contentToMap succeeds. If the content is, say, a custom struct type, the application error attached to the log call will be silently lost. Move the errLog attachment before the early return on line 105.

7. mchloggelf/gelf.go:95 — 🟡 Attention · Performance / Blocking I/O in Hot Path

os.Hostname() performs a syscall on every single log message. In high-throughput logging scenarios this is a measurable overhead. The hostname is static for the lifetime of the process — cache it once at package init or in a sync.Once and reuse the cached value in NewGELFMessage.

8. logger.go:118 — 🟢 Suggestion · Architecture / Tight Coupling

SetUDPTarget and SetUDPTargetWithOptions are both thin wrappers that delegate to mchlogcore.SetUDPTarget. They add no value beyond a slightly different API surface, and the comment on SetUDPTarget says 'Must be called before Initialize()' while the Initialize() method itself also calls mchlogcore.SetUDPTarget from env vars. This creates a confusing dual-configuration path. Consider consolidating into a single options struct (e.g., LoggerOptions) passed to NewLogger or Initialize, or at least documenting that calling SetUDPTarget after Initialize() will replace the UDP target.

9. mchloggelf/udp_test.go:172 — 🟢 Suggestion · Bugs / Logic Error

The chunking test reads chunks in a loop until a read deadline error, relying on the 2-second deadline to terminate the loop. This makes the test artificially slow in the happy path. After verifying len(chunks) < 2, a better approach would be to know the expected chunk count in advance (based on the known payload size and maxChunkDataSize) and read exactly that many datagrams, failing fast if any read times out.

10. mchloggelf/gelf.go:176 — 🟢 Suggestion · Bugs / Unhandled Error

In the toString fallback, json.Marshal(v) error is silently discarded with _. If marshaling fails (e.g., for a cyclic structure or channel type), string(b) will return an empty string, silently losing the field value. Consider returning fmt.Sprintf("%v", v) as a fallback when marshaling fails.

Suggestions

  • The temporary mutex unlock inside closeLocked (releasing udpMu to wait on udpDone) is architecturally fragile. A cleaner pattern would be to capture the channel and done-signal locally before releasing the lock, then wait outside the lock entirely — similar to how startWorkerLocked captures local references. This eliminates the re-lock and the associated TOCTOU window.
  • Cache os.Hostname() at package init time in mchloggelf/gelf.go to avoid repeated syscalls on every log message.
  • The package-level global state in mchlogcore (udpTransport, udpChan, fileOutputEnabled) means that multiple Logger instances share UDP and file-output configuration. This should be prominently documented, and ideally the README example should warn against creating multiple Logger instances with different UDP targets.
  • Consider adding an integration test that verifies the full pipeline: Logger.Initialize() → LogSubject → UDP receive, to catch regressions in the wiring between logger.go, mchlogcore, and mchloggelf.
  • The closeLocked function assumes it is always called with udpMu held as a write lock, but this invariant is not enforced or documented with a comment at the function signature. Add a prominent // MUST be called with udpMu held for writing. comment and consider adding a //nolint:... or assertion pattern for clarity.

// Release the lock while waiting for the worker to drain,
// so LogSubject is not blocked. The worker uses its own captured
// channel reference, so this is safe.
udpMu.Unlock()

🔴 Critical · Bugs / Logic Error


The closeLocked function unlocks udpMu mid-execution (line 98) and then re-locks it (line 100), but the function's callers (both SetUDPTarget at line 52 and CloseUDP at line 118) use defer udpMu.Unlock(). After closeLocked returns, the deferred unlock will attempt to unlock a mutex that is currently held — which is correct — but the temporary unlock creates a window where another goroutine calling SetUDPTarget can acquire the write lock, modify udpTransport/udpChan, and then closeLocked re-acquires and proceeds to set udpChan = nil and udpDone = nil on what may now be a different (newly started) channel. This can silently discard the new worker's channels. The unlock/re-lock pattern inside a function that is called under a lock is extremely fragile. Consider instead signaling the worker to stop without holding the lock, or using a separate 'draining' state flag so the worker can be waited on without releasing the outer lock.
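
One possible shape for avoiding the unlock/re-lock is to detach the channel references under the write lock and then close and wait entirely outside it. The sketch below uses a hypothetical worker type; it assumes senders only touch the channel while holding the read lock, which is what makes the close after unlocking safe.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical holder for the state that udpMu protects.
type worker struct {
	mu   sync.RWMutex
	ch   chan string
	done chan struct{}
}

// Start launches the worker. Sketch only: it assumes none is running yet.
func (w *worker) Start(handle func(string)) {
	ch := make(chan string, 1000)
	done := make(chan struct{})
	go func() {
		defer close(done)
		for m := range ch {
			handle(m)
		}
	}()
	w.mu.Lock()
	w.ch, w.done = ch, done
	w.mu.Unlock()
}

// Send holds the read lock for the whole non-blocking send.
func (w *worker) Send(m string) bool {
	w.mu.RLock()
	defer w.mu.RUnlock()
	if w.ch == nil {
		return false
	}
	select {
	case w.ch <- m:
		return true
	default:
		return false
	}
}

// Close detaches the channels under the lock, then drains outside it.
// Once the fields are nil no sender can reach the old channel, so nothing
// can send on it after close, and the lock is never re-acquired.
func (w *worker) Close() {
	w.mu.Lock()
	ch, done := w.ch, w.done
	w.ch, w.done = nil, nil
	w.mu.Unlock()
	if ch == nil {
		return
	}
	close(ch)
	<-done
}

func main() {
	w := &worker{}
	w.Start(func(m string) { fmt.Println("sent:", m) })
	w.Send("hello")
	w.Close()
}
```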


// Hold a read lock for the duration of the channel send to prevent
// the channel from being closed between the nil check and the send.
udpMu.RLock()

🔴 Critical · Bugs / Race Condition


There is still a TOCTOU race between reading udpChan under RLock (lines 139-141) and sending to the captured ch variable (line 150). After udpMu.RUnlock() on line 141, closeLocked can run (acquiring the write lock), close the channel, and set udpChan = nil. The local variable ch still holds a reference to the now-closed channel. The subsequent case ch <- msg: at line 150 will then panic with 'send on closed channel'. The read lock must be held for the entire duration of the send — from the nil check through the completion of the channel send — to prevent closeLocked from closing the channel while a send is in progress. Move udpMu.RUnlock() to after the select block (or use defer udpMu.RUnlock() before the send).
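The fix described above can be sketched as follows. This is an illustration under assumptions rather than the PR's implementation: enqueue and closeChan are hypothetical names, and the message type is simplified to []byte.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	udpMu   sync.RWMutex
	udpChan chan []byte // nil when UDP output is not configured
)

// enqueue reports whether msg was queued. The read lock is held from the nil
// check through the send, so a closer that needs the write lock cannot close
// the channel in between.
func enqueue(msg []byte) bool {
	udpMu.RLock()
	defer udpMu.RUnlock()
	if udpChan == nil {
		return false
	}
	select {
	case udpChan <- msg:
		return true
	default: // buffer full: drop instead of blocking the caller
		return false
	}
}

// closeChan closes and clears the channel under the write lock.
func closeChan() {
	udpMu.Lock()
	defer udpMu.Unlock()
	if udpChan != nil {
		close(udpChan)
		udpChan = nil
	}
}

func main() {
	udpChan = make(chan []byte, 1)
	fmt.Println(enqueue([]byte("a"))) // true
	fmt.Println(enqueue([]byte("b"))) // false: buffer full
	closeChan()
	fmt.Println(enqueue([]byte("c"))) // false: not configured
}
```

The non-blocking select keeps the time spent under the read lock short, so the writer is never starved by a slow consumer.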

mchlogcorev1.MchLog.LogSubject(subject, content, errLog, ascendStackFrame...)
} else {
mchlogcorev2.MchLog.LogSubject(subject, content, errLog, ascendStackFrame...)
if fileOutputEnabled.Load() {

🟡 Attention · Bugs / Logic Error


When fileOutputEnabled is false, file logging is completely skipped. However, mchlogcore.InitializeMchLog (called by Logger.Initialize()) still creates the log directories and file handles. This is a minor inconsistency but could confuse operators who set MCHLOG_FILE_OUTPUT=false expecting zero file I/O, only to find directories/files created. Consider conditionally calling InitializeMchLog based on fileOutputEnabled, or documenting this behavior.

}

// Check environment variable for file output
if v := os.Getenv(EnvFileOutput); strings.ToLower(v) == "false" {

🟡 Attention · Bugs / Logic Error


SetFileOutput(false) is called before InitializeMchLog (line 87). However, if fileOutputEnabled is set to false here, but Initialize() then calls mchlogcore.InitializeMchLog(...) unconditionally, file structures are still initialized. Additionally, SetFileOutput affects a package-level global, meaning calling Initialize() on a second Logger instance (with file output enabled) would not restore the fileOutputEnabled=true state if another logger had disabled it. This is a pre-existing design concern made worse by the new code path. Document that these are global settings shared across all Logger instances.

// Payload
chunk = append(chunk, data[start:end]...)

if _, err := t.conn.Write(chunk); err != nil {

🟡 Attention · Bugs / Logic Error


In sendChunked, if sending chunk i fails, the function returns an error immediately but the previously sent chunks (0 to i-1) have already been transmitted. The receiver will receive a partial message that can never be reassembled. While there is no way to 'recall' UDP datagrams, it would be worth documenting this behavior and considering whether a retry or a 'best effort with full failure reporting' strategy is preferable. At minimum, the error message should indicate that partial data may have been sent.

}

// Parse content into a map to extract fields
contentMap, err := contentToMap(content)

🟡 Attention · Bugs / Logic Error


In NewGELFMessage, when contentToMap returns an error (line 102), the function falls back to using subject as short_message and returns (msg, nil) — swallowing the error. However, errLog (the application error) is also silently dropped in this path; the errLog != nil block on lines 125-127 is only reached when contentToMap succeeds. If the content is, say, a custom struct type, the application error attached to the log call will be silently lost. Move the errLog attachment before the early return on line 105.

msg := &GELFMessage{
Version: "1.1",
Host: hostname,
Timestamp: float64(time.Now().UnixNano()) / 1e9,

🟡 Attention · Performance / Blocking I/O in Hot Path


os.Hostname() performs a syscall on every single log message. In high-throughput logging scenarios this is a measurable overhead. The hostname is static for the lifetime of the process — cache it once at package init or in a sync.Once and reuse the cached value in NewGELFMessage.
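A minimal sketch of the caching suggested above, using sync.Once. The "unknown" fallback is an assumption added here so that the GELF host field is never empty; the PR may handle a failed lookup differently.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

var (
	hostnameOnce   sync.Once
	cachedHostname string
)

// getHostname resolves the hostname on first use and returns the cached
// value on every later call.
func getHostname() string {
	hostnameOnce.Do(func() {
		h, err := os.Hostname()
		if err != nil || h == "" {
			h = "unknown" // assumed fallback, not taken from the PR
		}
		cachedHostname = h
	})
	return cachedHostname
}

func main() {
	fmt.Println(getHostname() == getHostname()) // prints "true"
}
```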

// SetUDPTarget configures the logger to send GELF messages via UDP to the given address.
// The address should be in "host:port" format (e.g., "graylog.example.com:12201").
// GZIP compression is enabled by default. Must be called before Initialize().
func (l *Logger) SetUDPTarget(address string) error {

🟢 Suggestion · Architecture / Tight Coupling


SetUDPTarget and SetUDPTargetWithOptions are both thin wrappers that delegate to mchlogcore.SetUDPTarget. They add no value beyond a slightly different API surface, and the comment on SetUDPTarget says 'Must be called before Initialize()' while the Initialize() method itself also calls mchlogcore.SetUDPTarget from env vars. This creates a confusing dual-configuration path. Consider consolidating into a single options struct (e.g., LoggerOptions) passed to NewLogger or Initialize, or at least documenting that calling SetUDPTarget after Initialize() will replace the UDP target.

}

// Read chunks
listener.SetReadDeadline(time.Now().Add(2 * time.Second))

🟢 Suggestion · Bugs / Logic Error


The chunking test reads chunks in a loop until a read deadline error, relying on the 2-second deadline to terminate the loop. This makes the test artificially slow in the happy path. Instead of looping until timeout and then checking len(chunks) < 2, compute the expected chunk count in advance (from the known payload size and maxChunkDataSize) and read exactly that many datagrams, failing fast if any read times out.
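The expected datagram count can be computed with a ceiling division. In this sketch the 12-byte header size follows the GELF chunk layout (2 magic bytes, 8-byte message ID, sequence number, sequence count); the actual constants in mchloggelf may differ, and expectedChunks is a hypothetical helper.

```go
package main

import "fmt"

// Assumed values: a datagram is limited to 8192 bytes, of which 12 bytes are
// chunk header when the message is chunked.
const (
	maxChunkSize     = 8192
	chunkHeaderSize  = 12
	maxChunkDataSize = maxChunkSize - chunkHeaderSize
)

// expectedChunks returns how many datagrams a payload of n bytes needs.
func expectedChunks(n int) int {
	if n <= maxChunkSize {
		return 1 // fits in a single unchunked datagram
	}
	return (n + maxChunkDataSize - 1) / maxChunkDataSize
}

func main() {
	fmt.Println(expectedChunks(10000)) // prints "2"
}
```

A test can then loop exactly expectedChunks(len(payload)) times and fail on the first read error, instead of waiting for the deadline to expire.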

case string:
return s
default:
b, _ := json.Marshal(v)

🟢 Suggestion · Bugs / Unhandled Error


In the toString fallback, json.Marshal(v) error is silently discarded with _. If marshaling fails (e.g., for a cyclic structure or channel type), string(b) will return an empty string, silently losing the field value. Consider returning fmt.Sprintf("%v", v) as a fallback when marshaling fails.
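A sketch of the suggested fallback, reconstructed from the hunk above. Only the error branch is new; the rest of the function body is assumed from the visible lines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toString renders v as a string, preferring JSON and falling back to the
// default fmt formatting when v cannot be marshaled.
func toString(v any) string {
	switch s := v.(type) {
	case string:
		return s
	default:
		b, err := json.Marshal(v)
		if err != nil {
			// e.g. channels and funcs are not JSON-serializable
			return fmt.Sprintf("%v", v)
		}
		return string(b)
	}
}

func main() {
	fmt.Println(toString(map[string]int{"a": 1})) // prints {"a":1}
	fmt.Println(toString(make(chan int)) != "")   // prints true
}
```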

@ramon-gaudium
Author

/mole review


@mole-reviewer mole-reviewer bot left a comment


Mole Review

🟡 75/100

🐭 Mole dug deep into this PR! Found 7 issues to review. Score: 75/100

Issues

1. mchlogcore/mchlog.go:100 — 🔴 Critical · Bugs / Logic Error

closeLocked waits on <-udpDone (line 103) while holding the write lock on udpMu (the caller holds it via defer udpMu.Unlock()). This cannot deadlock today: the worker goroutine started by startWorkerLocked calls transport.Send(msg), which takes t.mu, and its log.Printf on send failure never touches udpMu. It is fragile, though: if any future code path inside the worker loop tries to acquire udpMu, <-udpDone will deadlock. More immediately, closeLocked is called from SetUDPTarget, so the write lock is held until the worker drains and exits. If a send blocks for a long time (e.g., a slow UDP write), that stalls all readers (LogSubject) and all other callers of SetUDPTarget/CloseUDP. Consider capturing done locally, closing the channel, and releasing the lock before waiting on done.

2. mchloggelf/gelf.go:54 — 🟢 Suggestion · Performance / Missing Cache

MarshalJSON allocates a new map[string]any (named fields) on every call. For high-throughput logging, this generates significant GC pressure. Consider pre-computing the fixed GELF fields or using a struct with encoding/json tags for the non-extra fields, and only building the combined map when extra fields are present. Alternatively, use json.RawMessage or a custom encoder that writes directly to a buffer.

3. mchloggelf/udp.go:105 — 🟢 Suggestion · Performance / Missing Cache

Inside sendChunked, a new chunk slice is allocated per iteration (make([]byte, 0, chunkHeaderSize+end-start)). For messages requiring many chunks, this is up to 128 allocations. Consider allocating a single chunk buffer of size maxChunkSize before the loop and reusing it by re-slicing (reset length to 0) each iteration.

4. mchloggelf/udp_test.go:49 — 🟢 Suggestion · Bugs / Unhandled Error

listener.SetReadDeadline(...) returns an error that is silently ignored throughout the test file (lines 49, 104, 172). While a test-only concern, if SetReadDeadline fails, the test will block indefinitely on ReadFromUDP. Check the error at each call site: if err := listener.SetReadDeadline(...); err != nil { t.Fatalf(...) }. If the check is moved into a shared helper, call t.Helper() there so failures are reported at the caller's line.

5. logger.go:135 — 🟡 Attention · Bugs / Logic Error

The doc comment for DisableFileOutput states: 'When disabled before Initialize(), no log directories or files are created.' This is now true given the updated InitializeMchLog guard. However, the comment on Close() (line 140) does not mention that Close() only closes UDP resources — it does not flush or close any file handles. If file logging uses buffered writers, Close() could silently lose buffered log entries. Verify (and document) whether the file backends need their own Close/Flush, or clarify that file handles are not managed by Close().

6. mchloggelf/gelf_test.go:50 — 🟢 Suggestion · Bugs / Logic Error

The test calls os.Hostname() a second time (line 50) to validate msg.Host, but getHostname() caches its result in a sync.Once. If hostnameOnce has already fired (e.g., from a prior test in the same process) and the hostname changes between that first call and this test (unlikely, but possible in some CI setups), the cached value and the fresh os.Hostname() result can diverge. More practically, because cachedHostname is a package-level variable initialized via sync.Once, tests that want to verify the caching behavior cannot reset it between runs. This is a test isolation issue; consider exposing a test hook, or comparing against getHostname() directly (it is unexported, so this works only from an in-package test) rather than a second os.Hostname() call.

7. mchloggelf/gelf.go:108 — 🟢 Suggestion · Bugs / Logic Error

float64(time.Now().UnixNano()) / 1e9 cannot carry full nanosecond precision: float64 has a 53-bit significand, while current UnixNano values need about 61 bits, so the conversion rounds to a multiple of 256 ns. The resulting seconds value (about 1.7e9) has a float64 spacing of roughly 0.24 µs, so the timestamp is good to sub-microsecond precision, which is ample for GELF's fractional-second timestamps. Separately, UnixNano itself overflows int64 for dates after the year 2262. If you want to avoid the UnixNano dependency, read the clock once and combine the parts: now := time.Now(); ts := float64(now.Unix()) + float64(now.Nanosecond())/1e9. Note that this yields the same float64 precision, and that calling time.Now() twice (once for seconds, once for nanoseconds) can pair values from different instants.

Suggestions

  • The closeLocked function blocks under the write lock while waiting for the worker goroutine to drain (<-udpDone). This means all LogSubject callers (which need RLock) are blocked during shutdown. Consider a pattern where you capture ch and done locally, set the globals to nil, release the lock, then close ch and wait on done outside the lock. This requires careful sequencing but prevents the lock being held during potentially slow drain.
  • Consider adding an integration test that exercises SetUDPTarget → LogSubject → CloseUDP under the -race flag to catch any remaining concurrency issues not covered by unit tests.
  • The README example at line 90 shows logger.SetUDPTarget(...) called before logger.Initialize(), but the doc comment for SetUDPTarget in logger.go (line 115-123) does not mention this ordering requirement (unlike the removed comment that said 'Must be called before Initialize()'). Since Initialize() also calls mchlogcore.SetUDPTarget from env vars, calling SetUDPTarget before Initialize() will result in it being called twice if MCHLOG_UDP_TARGET is also set. Document the precedence (env var wins if set, since it's applied in Initialize() after SetUDPTarget).
  • The addr field in UDPTransport is stored but never used after construction (the connection is established via net.DialUDP and stored in conn). Consider removing the addr field to reduce struct size and avoid confusion.
  • For observability, consider adding a metrics counter or callback for dropped messages (when the UDP buffer is full and the default branch fires in LogSubject). Currently these are only logged via log.Printf, which itself goes to stderr and could be missed in production.

// LogSubject goroutine is mid-send (they hold RLock during send), so closing the
// channel is safe. The worker goroutine uses captured references and does not
// need the mutex, so waiting on udpDone under the write lock cannot deadlock.
func closeLocked() {

🔴 Critical · Bugs / Logic Error


closeLocked waits on <-udpDone (line 103) while holding the write lock on udpMu (the caller holds it via defer udpMu.Unlock()). This cannot deadlock today: the worker goroutine started by startWorkerLocked calls transport.Send(msg), which takes t.mu, and its log.Printf on send failure never touches udpMu. It is fragile, though: if any future code path inside the worker loop tries to acquire udpMu, <-udpDone will deadlock. More immediately, closeLocked is called from SetUDPTarget, so the write lock is held until the worker drains and exits. If a send blocks for a long time (e.g., a slow UDP write), that stalls all readers (LogSubject) and all other callers of SetUDPTarget/CloseUDP. Consider capturing done locally, closing the channel, and releasing the lock before waiting on done.


// MarshalJSON implements custom JSON marshaling that merges standard GELF fields
// with extra fields prefixed with underscore, as required by the GELF spec.
func (m *GELFMessage) MarshalJSON() ([]byte, error) {

🟢 Suggestion · Performance / Missing Cache


MarshalJSON allocates a new map[string]any (named fields) on every call. For high-throughput logging, this generates significant GC pressure. Consider pre-computing the fixed GELF fields or using a struct with encoding/json tags for the non-extra fields, and only building the combined map when extra fields are present. Alternatively, use json.RawMessage or a custom encoder that writes directly to a buffer.

end = len(data)
}

chunk := make([]byte, 0, chunkHeaderSize+end-start)

🟢 Suggestion · Performance / Missing Cache


Inside sendChunked, a new chunk slice is allocated per iteration (make([]byte, 0, chunkHeaderSize+end-start)). For messages requiring many chunks, this is up to 128 allocations. Consider allocating a single chunk buffer of size maxChunkSize before the loop and reusing it by re-slicing (reset length to 0) each iteration.
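A sketch of the buffer reuse, assuming the standard GELF chunk header (magic bytes 0x1e 0x0f, 8-byte message ID, sequence number, sequence count) and a limit of 128 chunks. The write callback stands in for t.conn.Write and is a hypothetical parameter; it must not retain the slice, because the buffer is overwritten on the next iteration. The error text also reports how many chunks were already sent.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxChunkSize     = 8192 // from the review text
	chunkHeaderSize  = 12   // assumed: 2 magic + 8 ID + 1 index + 1 count
	maxChunkDataSize = maxChunkSize - chunkHeaderSize
	maxChunks        = 128
)

// errDemo simulates a failing network write in the example below.
var errDemo = errors.New("network unreachable")

// sendChunked splits data into GELF chunks and passes each one to write,
// reusing a single buffer for every chunk.
func sendChunked(data []byte, id [8]byte, write func([]byte) error) error {
	total := (len(data) + maxChunkDataSize - 1) / maxChunkDataSize
	if total > maxChunks {
		return fmt.Errorf("message needs %d chunks, limit is %d", total, maxChunks)
	}
	buf := make([]byte, 0, maxChunkSize) // allocated once
	for i := 0; i < total; i++ {
		start := i * maxChunkDataSize
		end := start + maxChunkDataSize
		if end > len(data) {
			end = len(data)
		}
		buf = buf[:0] // reset length, keep capacity
		buf = append(buf, 0x1e, 0x0f)
		buf = append(buf, id[:]...)
		buf = append(buf, byte(i), byte(total))
		buf = append(buf, data[start:end]...)
		if err := write(buf); err != nil {
			return fmt.Errorf("chunk %d of %d failed (%d earlier chunk(s) already sent): %w",
				i+1, total, i, err)
		}
	}
	return nil
}

func main() {
	sent := 0
	err := sendChunked(make([]byte, 10000), [8]byte{1}, func(b []byte) error {
		sent++
		if sent == 2 {
			return errDemo
		}
		return nil
	})
	fmt.Println(err)
}
```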


// Read from listener
buf := make([]byte, maxChunkSize)
listener.SetReadDeadline(time.Now().Add(2 * time.Second))

🟢 Suggestion · Bugs / Unhandled Error


listener.SetReadDeadline(...) returns an error that is silently ignored throughout the test file (lines 49, 104, 172). While a test-only concern, if SetReadDeadline fails, the test will block indefinitely on ReadFromUDP. Check the error at each call site: if err := listener.SetReadDeadline(...); err != nil { t.Fatalf(...) }. If the check is moved into a shared helper, call t.Helper() there so failures are reported at the caller's line.

// DisableFileOutput disables file-based log output.
// When called, logs are only sent via UDP (if configured).
// This is a global setting shared across all Logger instances.
// When disabled before Initialize(), no log directories or files are created.

🟡 Attention · Bugs / Logic Error


The doc comment for DisableFileOutput states: 'When disabled before Initialize(), no log directories or files are created.' This is now true given the updated InitializeMchLog guard. However, the comment on Close() (line 140) does not mention that Close() only closes UDP resources — it does not flush or close any file handles. If file logging uses buffered writers, Close() could silently lose buffered log entries. Verify (and document) whether the file backends need their own Close/Flush, or clarify that file handles are not managed by Close().

if msg.Host == "" {
t.Error("host should never be empty (GELF spec requires it)")
}
hostname, _ := os.Hostname()

🟢 Suggestion · Bugs / Logic Error


The test calls os.Hostname() a second time (line 50) to validate msg.Host, but getHostname() caches its result in a sync.Once. If hostnameOnce has already fired (e.g., from a prior test in the same process) and the hostname changes between that first call and this test (unlikely, but possible in some CI setups), the cached value and the fresh os.Hostname() result can diverge. More practically, because cachedHostname is a package-level variable initialized via sync.Once, tests that want to verify the caching behavior cannot reset it between runs. This is a test isolation issue; consider exposing a test hook, or comparing against getHostname() directly (it is unexported, so this works only from an in-package test) rather than a second os.Hostname() call.

msg := &GELFMessage{
Version: "1.1",
Host: getHostname(),
Timestamp: float64(time.Now().UnixNano()) / 1e9,

🟢 Suggestion · Bugs / Logic Error


float64(time.Now().UnixNano()) / 1e9 cannot carry full nanosecond precision: float64 has a 53-bit significand, while current UnixNano values need about 61 bits, so the conversion rounds to a multiple of 256 ns. The resulting seconds value (about 1.7e9) has a float64 spacing of roughly 0.24 µs, so the timestamp is good to sub-microsecond precision, which is ample for GELF's fractional-second timestamps. Separately, UnixNano itself overflows int64 for dates after the year 2262. If you want to avoid the UnixNano dependency, read the clock once and combine the parts: now := time.Now(); ts := float64(now.Unix()) + float64(now.Nanosecond())/1e9. Note that this yields the same float64 precision, and that calling time.Now() twice (once for seconds, once for nanoseconds) can pair values from different instants.
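The precision available to a float64 timestamp can be checked directly. This sketch compares the UnixNano form with a seconds-plus-nanoseconds form for one fixed instant and prints the float64 spacing near current epoch values; gelfTimestamp, viaUnixNano, spacing, and compare are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// gelfTimestamp converts t to float64 seconds since the Unix epoch. Taking a
// single time.Time keeps seconds and nanoseconds from the same instant.
func gelfTimestamp(t time.Time) float64 {
	return float64(t.Unix()) + float64(t.Nanosecond())/1e9
}

// viaUnixNano is the form used in the hunk above, for comparison.
func viaUnixNano(t time.Time) float64 {
	return float64(t.UnixNano()) / 1e9
}

// spacing returns the gap between x and the next larger float64.
func spacing(x float64) float64 {
	return math.Nextafter(x, math.Inf(1)) - x
}

// compare returns both forms for the instant sec.nsec.
func compare(sec, nsec int64) (float64, float64) {
	t := time.Unix(sec, nsec)
	return viaUnixNano(t), gelfTimestamp(t)
}

func main() {
	a, b := compare(1700000000, 123456789)
	fmt.Printf("spacing near 1.7e9 s: %.3g s\n", spacing(b))
	fmt.Printf("difference between forms: %.3g s\n", math.Abs(a-b))
}
```

Both forms end up in a float64 of the same magnitude, so either one resolves time to roughly a quarter of a microsecond; the practical benefit of the second form is reading the clock once.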

@ramon-gaudium
Author

/mole review


@mole-reviewer mole-reviewer bot left a comment


Mole Review

🔴 67/100

🐭 Mole dug deep into this PR! Found 9 issues to review. Score: 67/100

Issues

1. mchlogcore/mchlog.go:221 — 🟡 Attention · Bugs / Logic Error

When fileOutputEnabled is false, the versionName variable stays as "V1" (its default value, set on line 220) regardless of currentVersion. The subsequent log call (after this block) will always print "V1" even if currentVersion == V2 and file output is disabled. The versionName assignment inside the else branch (line 225) is only reached when file output is enabled. Move versionName assignment outside the fileOutputEnabled guard, or replicate the version name logic unconditionally before the guard.

2. mchlogcore/mchlog.go:78 — 🟡 Attention · Bugs / Logic Error

In SetUDPTarget, between Phase 2 (waitAndClose) and Phase 4 (acquiring the write lock), another concurrent SetUDPTarget call could also reach Phase 4 and install a new worker. The inner detach on lines 78-84 then tears down that newly-installed worker before the current call installs its own. This means the newly created transport t (built in Phase 3 of this call) is installed while the concurrently-created transport is discarded — but the concurrently-created transport is closed via innerTransport.Close() correctly. The subtle issue is that innerDone is waited on while holding the write lock (line 80: <-innerDone). Because the write lock is held, LogSubject cannot acquire RLock and cannot enqueue new messages, so the worker drains only what is already buffered — which is correct and bounded. However, this is a blocking wait under a write lock and can stall all logging for the entire drain period. Consider documenting this explicitly, or accept the trade-off as intentional since concurrent SetUDPTarget calls are expected to be rare.

3. mchloggelf/gelf_test.go:75 — 🟢 Suggestion · Bugs / Logic Error

The timestamp bounds check msg.Timestamp < now-5 || msg.Timestamp > now+1 uses now sampled after NewGELFMessage returned. Barring a clock adjustment, msg.Timestamp is therefore never later than now, so the upper bound msg.Timestamp > now+1 cannot trigger; only the lower bound (the timestamp is at most 5 seconds old) does any work. The upper bound is redundant rather than wrong. No change is needed for correctness, but consider adding a comment explaining the window, or sampling a second reference time before the call so that both bounds are meaningful.

4. mchloggelf/udp_test.go:153 — 🟢 Suggestion · Bugs / Logic Error

The large payload is created as make([]byte, 10000) filled with 'A', then used as msg.ShortMessage = string(largePayload). The marshaled GELF message is slightly larger than 10000 bytes because of the other GELF fields and JSON syntax ('A' itself needs no escaping). The test relies on the bytes handed to the transport exceeding maxChunkSize (8192 bytes) to trigger chunking. If the transport compresses before chunking, this works only with compression disabled for the test, since 10000 repeated bytes compress to far less than 8192. It would be cleaner to assert the expected chunk count precisely rather than just checking len(chunks) < 2.

5. logger.go:83 — 🟡 Attention · Bugs / Logic Error

The environment variable MCHLOG_FILE_OUTPUT is only checked for the value "false" (case-insensitive). Any other value — including "0", "no", "off" — is silently treated as true (file output enabled). This is inconsistent with how some users may expect boolean env vars to behave. The doc comment and README only mention "true"/"false", so this is acceptable, but it should be explicitly documented (or "0" should also be accepted) to avoid operator confusion.

6. mchlogcore/mchlog.go:183 — 🟡 Attention · Performance / Blocking I/O in Hot Path

mchloggelf.NewGELFMessage (including contentToMap with JSON unmarshaling and map allocation) is called inside LogSubject while holding udpMu.RLock(). JSON unmarshaling can be slow for large payloads. Since RLock is held to prevent the channel from being closed mid-send, any slow NewGELFMessage call extends the duration the read lock is held, delaying any concurrent SetUDPTarget or CloseUDP that needs the write lock. Consider building the GELF message before acquiring the lock, then doing a quick nil-check + send under the lock. You'd need to handle the race where udpChan becomes nil between message creation and the lock, but this is safe since the default case in the select already handles a full buffer, and a closed-channel send is protected by the lock. Example: build message first, then udpMu.RLock(), check nil, send or drop, udpMu.RUnlock().

7. mchloggelf/udp.go:43 — 🟡 Attention · Bugs / Resource Leak

In NewUDPTransport, conn is not closed on any error path after net.DialUDP succeeds (line 43). There are currently no error paths after that point, so the code is safe as written. If a fallible step is ever added between DialUDP and the return, it must close conn before returning the error. No action needed.

8. mchloggelf/gelf.go:69 — 🟢 Suggestion · Bugs / Logic Error

The GELF spec prohibits _id as an extra field (the code skips k == "id" which becomes _id). However, it does not check for keys that already start with an underscore in m.Extra — such a key k would become __k (double underscore) after the "_"+k prefix is applied. While this is technically allowed by GELF, it may produce unexpected field names if callers pass extra keys with leading underscores. Consider documenting this behavior or stripping leading underscores from extra keys before prefixing.

9. mchlogcore/mchlog.go:56 — 🟡 Attention · Bugs / Logic Error

In SetUDPTarget, Phase 1 detaches the old worker (closes its channel, sets udpChan = nil) and releases the lock. Phase 3 creates a new transport via mchloggelf.NewUDPTransport (a network operation). If this network call fails, the function returns the error and exits — leaving udpChan = nil and udpTransport = nil permanently. This means after a failed SetUDPTarget call, UDP logging is silently disabled with no indication. The caller receives the error, but if they ignore it (as noted in previously-reported issue #9 for the env-var path), logging silently stops. Consider documenting that a failed SetUDPTarget disables UDP output until a successful call is made.
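For issue 8 above (extra keys that already start with an underscore), one option is to normalize the key before prefixing. This is a sketch of that option only; extraFieldName is a hypothetical helper, and whether to strip or merely document the behavior is the author's call.

```go
package main

import (
	"fmt"
	"strings"
)

// extraFieldName maps a caller-supplied key to its GELF additional-field
// name. It strips leading underscores so "_user" and "user" both become
// "_user", and rejects keys that would produce the reserved "_id".
func extraFieldName(k string) (string, bool) {
	k = strings.TrimLeft(k, "_")
	if k == "" || k == "id" {
		return "", false
	}
	return "_" + k, true
}

func main() {
	for _, k := range []string{"user", "_user", "id", "_id"} {
		name, ok := extraFieldName(k)
		fmt.Printf("%q -> %q %v\n", k, name, ok)
	}
}
```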

Suggestions

  • Consider adding an integration test that exercises the full path: Logger.Initialize() with MCHLOG_UDP_TARGET set, sends a log message, and verifies receipt on a UDP listener. The current tests cover the mchloggelf package in isolation but not the mchlogcore orchestration layer.
  • The detachWorkerLocked + waitAndClose two-phase pattern is a significant improvement over the previous design. Consider adding a brief architecture comment at the top of mchlog.go explaining the locking protocol (write lock for detach/install, read lock held during send) to help future contributors understand why the pattern is structured this way.
  • The gzipCompress function creates a new bytes.Buffer and gzip.Writer on every call. For high-throughput scenarios, consider using a sync.Pool of gzip.Writer instances (reset with Reset before each use) to reduce allocations.
  • The README example on line 90 shows logger.SetUDPTarget(...) called before logger.Initialize(), but the comment on SetUDPTarget (logger.go line 116) says nothing about ordering relative to Initialize(). The old comment 'Must be called before Initialize()' was removed. Clarify in the doc comment whether SetUDPTarget can be called after Initialize() and what the effect is (it can, and it will replace the existing target).
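The gzip.Writer pooling mentioned in the suggestions can be sketched as follows. The gzipCompress signature is assumed, not taken from the PR, and gunzip is a hypothetical helper included only to show the round trip.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// gzipPool holds reusable writers; each is pointed at a real buffer with
// Reset before use.
var gzipPool = sync.Pool{
	New: func() any { return gzip.NewWriter(io.Discard) },
}

// gzipCompress compresses data using a pooled gzip.Writer.
func gzipCompress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzipPool.Get().(*gzip.Writer)
	defer gzipPool.Put(zw)
	zw.Reset(&buf) // discards any state left by the previous user
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // flushes and writes the gzip footer
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzip reverses gzipCompress.
func gunzip(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	in := bytes.Repeat([]byte("A"), 10000)
	out, err := gzipCompress(in)
	if err != nil {
		panic(err)
	}
	back, err := gunzip(out)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out) < len(in), bytes.Equal(in, back)) // prints "true true"
}
```

Whether pooling pays off depends on message rate; a benchmark with realistic payloads would settle it.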

} else {
versionName = "V2"
mchlogcorev2.InitializeMchLog(path)
if fileOutputEnabled.Load() {

🟡 Attention · Bugs / Logic Error


When fileOutputEnabled is false, the versionName variable stays as "V1" (its default value, set on line 220) regardless of currentVersion. The subsequent log call (after this block) will always print "V1" even if currentVersion == V2 and file output is disabled. The versionName assignment inside the else branch (line 225) is only reached when file output is enabled. Move versionName assignment outside the fileOutputEnabled guard, or replicate the version name logic unconditionally before the guard.

// worker, shut it down first (last caller wins). This wait is bounded:
// the write lock prevents new sends, so the worker only drains what is
// already in the buffer.
innerDone, innerTransport := detachWorkerLocked()

🟡 Attention · Bugs / Logic Error


In SetUDPTarget, between Phase 2 (waitAndClose) and Phase 4 (acquiring the write lock), another concurrent SetUDPTarget call could also reach Phase 4 and install a new worker. The inner detach on lines 78-84 then tears down that newly-installed worker before the current call installs its own. This means the newly created transport t (built in Phase 3 of this call) is installed while the concurrently-created transport is discarded — but the concurrently-created transport is closed via innerTransport.Close() correctly. The subtle issue is that innerDone is waited on while holding the write lock (line 80: <-innerDone). Because the write lock is held, LogSubject cannot acquire RLock and cannot enqueue new messages, so the worker drains only what is already buffered — which is correct and bounded. However, this is a blocking wait under a write lock and can stall all logging for the entire drain period. Consider documenting this explicitly, or accept the trade-off as intentional since concurrent SetUDPTarget calls are expected to be rare.

}

// Timestamp should be recent
now := float64(time.Now().UnixNano()) / 1e9

🟢 Suggestion · Bugs / Logic Error


The timestamp bounds check msg.Timestamp < now-5 || msg.Timestamp > now+1 uses now sampled after NewGELFMessage returned. Barring a clock adjustment, msg.Timestamp is therefore never later than now, so the upper bound msg.Timestamp > now+1 cannot trigger; only the lower bound (the timestamp is at most 5 seconds old) does any work. The upper bound is redundant rather than wrong. No change is needed for correctness, but consider adding a comment explaining the window, or sampling a second reference time before the call so that both bounds are meaningful.

defer transport.Close()

// Create a large message that will require chunking
largePayload := make([]byte, 10000)

🟢 Suggestion · Bugs / Logic Error


The large payload is created as make([]byte, 10000) filled with 'A', then used as msg.ShortMessage = string(largePayload). The marshaled GELF message is slightly larger than 10000 bytes because of the other GELF fields and JSON syntax ('A' itself needs no escaping). The test relies on the bytes handed to the transport exceeding maxChunkSize (8192 bytes) to trigger chunking. If the transport compresses before chunking, this works only with compression disabled for the test, since 10000 repeated bytes compress to far less than 8192. It would be cleaner to assert the expected chunk count precisely rather than just checking len(chunks) < 2.

}

// Check environment variable for file output
if v := os.Getenv(EnvFileOutput); strings.ToLower(v) == "false" {

🟡 Attention · Bugs / Logic Error


The environment variable MCHLOG_FILE_OUTPUT is only compared against "false" (case-insensitive). Any other value, including "0", "no", and "off", is silently treated as true (file output enabled), which differs from how many operators expect boolean env vars to behave. Since the doc comment and README only mention "true"/"false", this is acceptable, but the accepted values should be documented explicitly (or "0" should also be accepted) to avoid operator confusion.
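If the more lenient route is taken, a small parsing helper keeps the accepted spellings in one place. The following is a minimal sketch, not the package's code: parseBoolEnv is a hypothetical name, and the "yes"/"no"/"on"/"off" spellings are an addition on top of what strconv.ParseBool accepts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBoolEnv interprets the raw value of a boolean environment variable.
// An empty value yields def. An unrecognized value also yields def, but with
// ok=false so the caller can warn instead of silently guessing.
func parseBoolEnv(raw string, def bool) (value bool, ok bool) {
	s := strings.ToLower(strings.TrimSpace(raw))
	switch s {
	case "":
		return def, true
	case "no", "off":
		return false, true
	case "yes", "on":
		return true, true
	}
	// After lowercasing, strconv.ParseBool handles "1", "t", "true", "0", "f", "false".
	b, err := strconv.ParseBool(s)
	if err != nil {
		return def, false
	}
	return b, true
}

func main() {
	for _, raw := range []string{"false", "0", "off", "banana", ""} {
		v, ok := parseBoolEnv(raw, true)
		fmt.Printf("%q -> value=%v recognized=%v\n", raw, v, ok)
	}
}
```

Returning ok separately lets the caller log a warning to stderr for values such as "banana" while still falling back to the default.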

return
}

msg, err := mchloggelf.NewGELFMessage(subject, content, errLog)

🟡 Attention · Performance / Blocking I/O in Hot Path


mchloggelf.NewGELFMessage (including contentToMap, with its JSON unmarshaling and map allocation) is called inside LogSubject while holding udpMu.RLock(). JSON unmarshaling can be slow for large payloads. The RLock exists to prevent the channel from being closed mid-send, so any slow NewGELFMessage call lengthens the time the read lock is held and delays a concurrent SetUDPTarget or CloseUDP that needs the write lock. Consider building the GELF message before acquiring the lock, then doing only a quick nil-check and send under the lock. udpChan may become nil between message creation and lock acquisition, but that is safe: the nil-check under the lock catches it, the default case in the select already handles a full buffer, and a send on a closed channel is prevented by the lock. In short: build the message first, then udpMu.RLock(), check for nil, send or drop, udpMu.RUnlock().
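The ordering described above could look roughly like the sketch below. It borrows the udpMu and udpChan names from the comment, but gelfMessage, buildMessage, and enqueue are hypothetical stand-ins, so this illustrates the locking pattern and is not the package's code.

```go
package main

import (
	"fmt"
	"sync"
)

// gelfMessage and buildMessage are hypothetical stand-ins for the GELF
// message type and mchloggelf.NewGELFMessage.
type gelfMessage struct{ Short string }

func buildMessage(subject, content string) *gelfMessage {
	return &gelfMessage{Short: subject + ": " + content}
}

var (
	udpMu   sync.RWMutex
	udpChan chan *gelfMessage // nil while UDP output is disabled
)

// enqueue builds the message without holding any lock, then takes the read
// lock only for the nil check and a non-blocking send. It reports whether
// the message was queued.
func enqueue(subject, content string) bool {
	msg := buildMessage(subject, content) // potentially slow, so no lock is held

	udpMu.RLock()
	defer udpMu.RUnlock()
	if udpChan == nil {
		return false // UDP disabled, or the target is being swapped
	}
	select {
	case udpChan <- msg:
		return true
	default:
		return false // buffer full: drop instead of blocking the caller
	}
}

func main() {
	fmt.Println(enqueue("info", "no target yet"))

	udpMu.Lock()
	udpChan = make(chan *gelfMessage, 1)
	udpMu.Unlock()

	fmt.Println(enqueue("info", "first"))
	fmt.Println(enqueue("info", "second, buffer is full"))
}
```

The trade-off is that a message is now built even when UDP output is disabled. If that cost matters, a cheap atomic "enabled" flag checked before buildMessage would avoid it.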

return nil, fmt.Errorf("failed to resolve UDP address %q: %w", address, err)
}

conn, err := net.DialUDP("udp", nil, addr)

🟡 Attention · Bugs / Resource Leak


In NewUDPTransport, conn is not closed on any path after net.DialUDP succeeds (line 43). There are currently no error paths after that call, so nothing leaks and the code is safe as written. If a fallible step is ever added between DialUDP and the return, it will need to close conn before returning the error. No action needed.


// Extra fields with underscore prefix
for k, v := range m.Extra {
if k == "id" {

🟢 Suggestion · Bugs / Logic Error


The GELF spec prohibits _id as an extra field, and the code handles that by skipping k == "id" (which would become _id). However, it does not check for keys in m.Extra that already start with an underscore: a key such as "_foo" becomes "__foo" (double underscore) once the "_"+k prefix is applied. This is technically allowed by GELF, but it may produce unexpected field names if callers pass extra keys with leading underscores. Consider documenting this behavior or stripping leading underscores from extra keys before prefixing.
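If stripping is preferred, the normalization fits in a few lines. This is a sketch only, with normalizeExtraKey as a hypothetical helper name. Stripping before the "id" comparison also rejects a caller-supplied "_id", which is an extension of what the comment asks for.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeExtraKey returns the GELF additional-field name for a
// caller-supplied key, or ok=false if the key must be skipped.
func normalizeExtraKey(k string) (name string, ok bool) {
	k = strings.TrimLeft(k, "_") // "_foo" and "__foo" both become "foo"
	if k == "" || k == "id" {
		return "", false // empty names and the reserved _id field are dropped
	}
	return "_" + k, true
}

func main() {
	for _, k := range []string{"trace", "_user", "id", "__id", "_"} {
		name, ok := normalizeExtraKey(k)
		fmt.Printf("%q -> %q (kept=%v)\n", k, name, ok)
	}
}
```

One caveat: after stripping, "user" and "_user" map to the same field name, so a map that contains both would have one value overwrite the other.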

func SetUDPTarget(address string, compress bool) error {
// Phase 1: under write lock, close the channel and capture references
// to the old worker so we can wait on it without holding the lock.
udpMu.Lock()

🟡 Attention · Bugs / Logic Error


In SetUDPTarget, Phase 1 detaches the old worker (closes its channel, sets udpChan = nil) and releases the lock. Phase 3 creates a new transport via mchloggelf.NewUDPTransport (a network operation). If that call fails, the function returns the error and exits, leaving udpChan = nil and udpTransport = nil until the next successful call. A failed SetUDPTarget therefore disables UDP logging entirely, including output to the previous, working target. The caller does receive the error, but if it is ignored (as noted in previously-reported issue #9 for the env-var path), logging stops silently. Consider documenting that a failed SetUDPTarget disables UDP output until a successful call is made.

@mateusmetzker
Contributor

/mole deep-review


@mole-reviewer mole-reviewer bot left a comment


Mole Review

🟡 85/100

🐭 The mole dug deep into this PR! It found 3 issues to review. Score: 85/100

Diagram

sequenceDiagram
    participant App as Application
    participant Logger as Logger
    participant Core as mchlogcore
    participant GELF as mchloggelf
    participant Worker as UDP Worker (goroutine)
    participant UDP as GELF/UDP Server

    App->>Logger: Initialize()
    Logger->>Core: SetUDPTarget(address, compress)
    Core->>GELF: NewUDPTransport(address, compress)
    GELF-->>Core: *UDPTransport
    Core->>Core: startWorkerLocked()
    Core-->>Worker: goroutine started

    App->>Logger: Info(content)
    Logger->>Core: LogSubject("info", content, nil)
    Core->>Core: fileOutputEnabled.Load()?
    alt File output enabled
        Core->>Core: mchlogcorev1/v2.LogSubject()
    end
    Core->>Core: udpMu.RLock()
    Core->>GELF: NewGELFMessage(subject, content, err)
    GELF-->>Core: *GELFMessage
    Core->>Worker: udpChan <- msg
    Core->>Core: udpMu.RUnlock()
    Worker->>GELF: transport.Send(msg)
    GELF->>GELF: MarshalJSON + gzipCompress
    GELF->>UDP: conn.Write(data)

    App->>Logger: Close()
    Logger->>Core: CloseUDP()
    Core->>Core: udpMu.Lock()
    Core->>Core: detachWorkerLocked() → close(udpChan)
    Core->>Core: udpMu.Unlock()
    Core->>Worker: range ch ends
    Worker->>Worker: close(done)
    Core->>GELF: transport.Close()
    GELF->>GELF: conn.Close()

Issues

1. mchlogcore/mchlog.go:183 — 🟡 Attention · Performance / Blocking I/O in Hot Path

NewGELFMessage (which includes JSON unmarshaling, map allocations, etc.) runs inside the scope of udpMu.RLock(). Every LogSubject call that sends via UDP therefore holds the read lock while the GELF message is being built, delaying any concurrent SetUDPTarget or CloseUDP that needs the write lock. Consider building the GELF message before acquiring the RLock, and doing only the nil-check and send under the lock. Note: this point was already raised (#38), but the severity is reinforced here because the lock is held for the entire message construction, not just the send.

2. mchlogcore/mchlog.go:56 — 🟡 Attention · Bugs / Race Condition

Between Phase 1 (udpMu.Unlock() on line 58) and Phase 4 (udpMu.Lock() on line 71), any call to LogSubject sees udpChan == nil and silently drops its message. This is a UDP "blackout" window during the target swap, and it spans a network operation (NewUDPTransport on line 65) and the wait for the old worker to drain (waitAndClose on line 62). In high-log-rate environments, messages will be lost during this interval. Consider documenting this behavior, or using a pattern that keeps the old worker active until the new one is ready (hot swap).

3. mchloggelf/udp.go:138 — 🟡 Attention · Performance / Missing Cache

gzipCompress allocates a new bytes.Buffer and a new gzip.Writer on every call. On the hot logging path with compression enabled, this puts significant pressure on the GC. Consider using sync.Pool to reuse buffers and GZIP writers.

return
}

msg, err := mchloggelf.NewGELFMessage(subject, content, errLog)

🟡 Attention · Performance / Blocking I/O in Hot Path


NewGELFMessage (which includes JSON unmarshaling, map allocations, etc.) runs inside the scope of udpMu.RLock(). Every LogSubject call that sends via UDP therefore holds the read lock while the GELF message is being built, delaying any concurrent SetUDPTarget or CloseUDP that needs the write lock. Consider building the GELF message before acquiring the RLock, and doing only the nil-check and send under the lock. Note: this point was already raised (#38), but the severity is reinforced here because the lock is held for the entire message construction, not just the send.

Contributor


@ramon-gaudium Please take a look at this comment

func SetUDPTarget(address string, compress bool) error {
// Phase 1: under write lock, close the channel and capture references
// to the old worker so we can wait on it without holding the lock.
udpMu.Lock()

🟡 Attention · Bugs / Race Condition


Between Phase 1 (udpMu.Unlock() on line 58) and Phase 4 (udpMu.Lock() on line 71), any call to LogSubject sees udpChan == nil and silently drops its message. This is a UDP "blackout" window during the target swap, and it spans a network operation (NewUDPTransport on line 65) and the wait for the old worker to drain (waitAndClose on line 62). In high-log-rate environments, messages will be lost during this interval. Consider documenting this behavior, or using a pattern that keeps the old worker active until the new one is ready (hot swap).
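For reference, a hot swap could be arranged along the lines below: create the new transport and worker first, switch the shared pointer under the write lock, and only then drain the old worker outside the lock. Everything in this sketch (transport, dial, worker, setTarget, logMsg) is a hypothetical stand-in and not the package's API. It also leaves out closing the old transport, which the real code would do after the drain.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// transport is a hypothetical stand-in for the UDP transport; it records
// messages instead of writing to the network.
type transport struct {
	mu   sync.Mutex
	sent []string
}

func (t *transport) Send(m string) {
	t.mu.Lock()
	t.sent = append(t.sent, m)
	t.mu.Unlock()
}

// dial is a hypothetical stand-in for NewUDPTransport.
func dial(addr string) (*transport, error) {
	if addr == "" {
		return nil, errors.New("empty address")
	}
	return &transport{}, nil
}

type worker struct {
	ch   chan string
	done chan struct{}
	t    *transport
}

// startWorker launches a goroutine that forwards queued messages to t.
func startWorker(t *transport) *worker {
	w := &worker{ch: make(chan string, 1000), done: make(chan struct{}), t: t}
	go func() {
		defer close(w.done)
		for m := range w.ch {
			w.t.Send(m)
		}
	}()
	return w
}

var (
	mu      sync.RWMutex
	current *worker // nil while UDP output is disabled
)

// setTarget prepares the new worker before touching shared state, so a
// failed dial leaves the previous target fully operational.
func setTarget(addr string) error {
	t, err := dial(addr) // may be slow or fail; the old worker keeps running
	if err != nil {
		return err
	}
	next := startWorker(t)

	mu.Lock()
	old := current
	current = next
	if old != nil {
		close(old.ch) // safe: senders hold RLock while sending
	}
	mu.Unlock()

	if old != nil {
		<-old.done // drain outside the lock; logging is never stalled
	}
	return nil
}

// logMsg queues m on the current worker without blocking the caller.
func logMsg(m string) bool {
	mu.RLock()
	defer mu.RUnlock()
	if current == nil {
		return false
	}
	select {
	case current.ch <- m:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(setTarget("graylog:12201"), logMsg("one"))
	fmt.Println(setTarget("") != nil, logMsg("two")) // failed swap, old target still in use
}
```

With this ordering there is no window in which current is nil during a swap, and the write lock is held only for the pointer switch and the channel close.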

Contributor


@ramon-gaudium Please take a look at this comment

}

// gzipCompress compresses data using GZIP.
func gzipCompress(data []byte) ([]byte, error) {

🟡 Attention · Performance / Missing Cache


gzipCompress allocates a new bytes.Buffer and a new gzip.Writer on every call. On the hot logging path with compression enabled, this puts significant pressure on the GC. Consider using sync.Pool to reuse buffers and GZIP writers.
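A pooled variant might look like the sketch below. It keeps the gzipCompress signature quoted in the diff, while gzipState, gzipPool, and gunzip are hypothetical names added for illustration. The important detail is copying the compressed bytes out before the buffer goes back to the pool.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// gzipState bundles a buffer and a writer so both are reused together.
type gzipState struct {
	buf bytes.Buffer
	zw  *gzip.Writer
}

var gzipPool = sync.Pool{
	New: func() any {
		return &gzipState{zw: gzip.NewWriter(io.Discard)}
	},
}

// gzipCompress compresses data using GZIP with pooled buffers and writers.
func gzipCompress(data []byte) ([]byte, error) {
	st := gzipPool.Get().(*gzipState)
	defer gzipPool.Put(st)

	st.buf.Reset()
	st.zw.Reset(&st.buf) // rebind the writer to the fresh buffer
	if _, err := st.zw.Write(data); err != nil {
		return nil, err
	}
	if err := st.zw.Close(); err != nil { // flushes the gzip footer
		return nil, err
	}
	// Copy out: the pooled buffer's backing array is reused by later calls.
	out := make([]byte, st.buf.Len())
	copy(out, st.buf.Bytes())
	return out, nil
}

// gunzip is a small helper used here only to check the round trip.
func gunzip(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	out, err := gzipCompress([]byte(`{"version":"1.1","short_message":"hello"}`))
	if err != nil {
		panic(err)
	}
	back, err := gunzip(out)
	fmt.Println(string(back), err)
}
```

Whether the pool pays off depends on log volume and message size, so a benchmark with allocation reporting before and after would be a reasonable way to confirm it.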

Contributor


@ramon-gaudium Please take a look at this comment
