KataGo not reacting to commands #1152

@takosalad

This is the OpenCL build. I'm not using the CUDA build because the latest one still requires the CUDA 12 shared libraries, even though CUDA 13 was released two months before KataGo 1.16.4 came out, and I cannot downgrade my CUDA installation for this.

$ ./katago runtests
Running rng and hash tests
Running fancy math tests
Running base64 tests
Running thread tests
Running board IO tests
Running board basic tests
Running board area tests
Running rules tests
Running board undo test
Running board handicap test
Running board stress test
Running sgf tests
Running basic symmetries tests
Running board symmetry tests
Running symmetry difference tests
Running board replay test
Not being run out of git repo, skipping config parsing tests
All tests passed

$ ./katago gtp -config default_gtp.cfg -model katago-network.bin.gz 
KataGo v1.16.4
Using TrompTaylor rules initially, unless GTP/GUI overrides this
version
quit

It seems to start up, but it does not respond to any commands and has to be terminated with Ctrl+C. As you can see, I typed the "version" and "quit" commands, and neither produced a response.
Also, is it normal that no first-time OpenCL tuning run happens at startup? Just wondering.
The model is the "stable latest - recommended" KataGo network.
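One way to reproduce the hang without an interactive terminal is to pipe the GTP commands into the engine and enforce a timeout, so a hang shows up as an error instead of requiring Ctrl+C. The following is a minimal Python sketch, not part of KataGo: `send_gtp` and `parse_gtp_responses` are hypothetical helper names, and it assumes KataGo writes its GTP responses to stdout. In the GTP v2 protocol each response starts with "=" (success) or "?" (failure) and ends with a blank line, so a healthy engine answering "version" and "quit" should yield two such responses.

```python
import subprocess
import sys


def send_gtp(cmd_argv, commands, timeout=30.0):
    """Pipe newline-terminated GTP commands into an engine and return its stdout.

    Raises subprocess.TimeoutExpired (after killing the engine) if it does not
    exit within `timeout` seconds, which is how a hang would show up.
    """
    stdin_text = "".join(c + "\n" for c in commands)
    result = subprocess.run(
        cmd_argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def parse_gtp_responses(stdout):
    """Collect GTP responses: a line starting with '=' or '?' opens a
    response, and the next blank line closes it. Other lines are ignored."""
    responses = []
    current = None
    for line in stdout.splitlines():
        if current is None:
            if line[:1] in ("=", "?"):
                current = [line]
        elif line.strip() == "":
            responses.append("\n".join(current))
            current = None
        else:
            current.append(line)
    if current is not None:  # output ended without a trailing blank line
        responses.append("\n".join(current))
    return responses


if __name__ == "__main__":
    # Hypothetical invocation mirroring the command from this report.
    argv = ["./katago", "gtp", "-config", "default_gtp.cfg",
            "-model", "katago-network.bin.gz"]
    try:
        out = send_gtp(argv, ["version", "quit"], timeout=120)
        print(parse_gtp_responses(out))
    except subprocess.TimeoutExpired:
        print("engine did not answer within the timeout", file=sys.stderr)
```

The generous 120-second timeout in the usage example is an assumption meant to leave room for model loading (and any first-run OpenCL tuning) before the first response; if the engine behaves as described in this report, the call should end in the timeout branch rather than printing two responses.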

Trying to re-tune with the "benchmark" command hangs at the following point:

$ ./katago benchmark -config default_gtp.cfg -model kata1-b28c512nbt-s12283775232-d5679728027.bin.gz
2026-01-25 13:48:15+0100: Running with following config:
allowResignation = true
lagBuffer = 1.0
logAllGTPCommunication = true
logDir = gtp_logs
logSearchInfo = true
logSearchInfoForChosenMove = false
logToStderr = false
maxTimePondering = 60.0
maxVisits = 500
numSearchThreads = 6
ponderingEnabled = false
resignConsecTurns = 3
resignThreshold = -0.90
rules = tromp-taylor
searchFactorAfterOnePass = 0.50
searchFactorAfterTwoPass = 0.25
searchFactorWhenWinning = 0.40
searchFactorWhenWinningThreshold = 0.95

2026-01-25 13:48:15+0100: Loading model and initializing benchmark...
2026-01-25 13:48:15+0100: Testing with default positions for board size: 19
2026-01-25 13:48:15+0100: nnRandSeed0 = 9546287206378380450
2026-01-25 13:48:15+0100: After dedups: nnModelFile0 = kata1-b28c512nbt-s12283775232-d5679728027.bin.gz useFP16 auto useNHWC auto
2026-01-25 13:48:15+0100: Initializing neural net buffer to be size 19 * 19 exactly

Running the above under strace shows this as the final line:

futex(0x763de08a4878, FUTEX_WAIT_PRIVATE, 1, NULL

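so KataGo may be stuck in a deadlock: a FUTEX_WAIT with a NULL timeout means a thread is blocked indefinitely, waiting on a lock or condition variable that is never released.
nvidia-smi shows almost no GPU or VRAM utilization, so this does not look like a resource shortage.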
