I found that the output channel count of conv3_x/B_fc1 is 16, which is confusing. According to the SKNet paper, the output channel count d of the first fully connected layer (B_fc1) follows Equation (4):
d = max(C/r, L),
where C is the number of channels of the input feature, r is the reduction ratio (r = 16), and L denotes the minimal value of d (L = 32). According to this equation, the output channel count of conv3_x/B_fc1 should equal L, which is 32 (d = max(256/16, 32) = max(16, 32) = 32).
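To make the arithmetic concrete, here is a minimal sketch of the calculation. The function name `fc1_channels` is hypothetical and not taken from any SKNet code, and using integer division for C/r is my assumption.

```python
def fc1_channels(C, r=16, L=32):
    """Channel count d of B_fc1, following d = max(C/r, L).

    C: number of input channels
    r: reduction ratio (16 in the paper)
    L: minimal value of d (32 in the paper)
    Integer division is assumed here for C/r.
    """
    return max(C // r, L)


# conv3_x has C = 256, so C/r = 16, which is below the floor L = 32.
print(fc1_channels(256))  # 32
```

With r = 16 and L = 32, the floor L applies whenever C is below 512, so a value of 16 for conv3_x/B_fc1 does not match the equation unless L is set lower than 32.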