The stddev = sqrt(1/fan_in) rule comes from trying to keep the activation variance the same at each layer: each output is a sum of fan_in terms, so scaling the weights by sqrt(1/fan_in) keeps the summed variance at roughly 1.
https://intoli.com/blog/neural-network-initialization/
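A quick numerical sketch of that argument, assuming unit-variance Gaussian inputs and a plain linear layer (no nonlinearity), just to check that fan-in scaling keeps the output variance near 1:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 512

# Inputs with variance ~1
x = rng.normal(0.0, 1.0, size=(10000, fan_in))

# Fan-in scaled weights: stddev = sqrt(1/fan_in)
w = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_in))

# Each output is a sum of fan_in terms, each with variance 1/fan_in,
# so the output variance stays near 1 instead of blowing up by fan_in.
y = x @ w
print(np.var(x), np.var(y))  # both should be close to 1.0
```

With a naive stddev of 1 the output variance would be ~fan_in instead, and it compounds with depth.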
I suspect my convnets are not training very well because I am doing naive weight initialization. I want to try to write down a similar variance-preserving formula for a LUTNet, to see if it trains better.