
# Model Setup

Instructions for editing `model_spec.json`.

Fields of the `network_structure` object:

| Field | Remarks |
| --- | --- |
| `type` | The network structure type, which can be one of the following values:<br>`transformer`<br>&nbsp;&nbsp;`transformer.decoder_only`<br>&nbsp;&nbsp;&nbsp;&nbsp;`transformer.llama`<br>&nbsp;&nbsp;&nbsp;&nbsp;`transformer.bloom`<br>&nbsp;&nbsp;`transformer.encoder_only`<br>&nbsp;&nbsp;`transformer.encoder_decoder` |
| `normalization_function` | |
| `activation_function` | The activation function used in the model. Inferflow currently supports `relu`, `gelu`, `silu`, and `gelu_tanh`. |
| `position_embedding` | The position-embedding algorithm used in the model. Inferflow currently supports `alibi`, `rope`, and `sinusoidal`. |
| `qk_column_order` | |
| `qkv_format` | |
| `normalize_lm_head` | Possible values: `true`, `false`. Whether to normalize the `lm_head` (i.e., `output.weight`) tensor before performing any inference operations. |
| `tensor_name_prefix` | The prefix of the tensor names. |
| `tensor_name_mapping` | A JSON object containing the mapping from the tensor names in the model file to standard tensor names. |
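To make the fields above concrete, here is a minimal, hypothetical `network_structure` fragment for a LLaMA-style model. The values are illustrative only, and the source-side tensor names in `tensor_name_mapping` follow common Hugging Face conventions purely as an example; they are not taken from any real model file.

```json
{
    "network_structure": {
        "type": "transformer.llama",
        "activation_function": "silu",
        "position_embedding": "rope",
        "normalize_lm_head": false,
        "tensor_name_prefix": "model.",
        "tensor_name_mapping": {
            "embed_tokens.weight": "dec.token_embeddings.weight",
            "norm.weight": "dec.output_norm.weight"
        }
    }
}
```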


Standard tensor names for a Transformer model:

| Standard Name | Remarks |
| --- | --- |
| `enc.token_embeddings.weight` | The token-embeddings tensor of the encoder |
| `enc.{i}.self_attn.pre_norm.weight/.bias` | The weight and bias for the normalization operation before a self-attention layer of the encoder |
| `enc.{i}.self_attn.qkv.weight/.bias` | Tensor `qkv.weight` is the concatenation of `wq.weight`, `wk.weight`, and `wv.weight`; vector `qkv.bias` is the concatenation of `wq.bias`, `wk.bias`, and `wv.bias` |
| `enc.{i}.self_attn.wq.weight/.bias` | WQ and its corresponding bias vector in a self-attention layer of the encoder |
| `enc.{i}.self_attn.wk.weight/.bias` | WK and its corresponding bias vector in a self-attention layer of the encoder |
| `enc.{i}.self_attn.wv.weight/.bias` | WV and its corresponding bias vector in a self-attention layer of the encoder |
| `enc.{i}.self_attn.wo.weight/.bias` | WO and its corresponding bias vector in a self-attention layer of the encoder |
| `enc.{i}.feed_forward.pre_norm.weight/.bias` | The weight and bias for the normalization operation before a feed-forward layer of the encoder |
| `enc.{i}.feed_forward.w1.weight/.bias` | The weight and bias of the gate-projection layer in a feed-forward layer of the encoder |
| `enc.{i}.feed_forward.w2.weight/.bias` | The weight and bias of the down-projection layer in a feed-forward layer of the encoder |
| `enc.{i}.feed_forward.w3.weight/.bias` | The weight and bias of the up-projection layer in a feed-forward layer of the encoder |
| `enc.{i}.feed_forward.post_norm.weight/.bias` | The weight and bias for the normalization operation after a feed-forward layer of the encoder |
| `enc.output_norm.weight/.bias` | The weight and bias for the normalization operation performed on the overall output of the encoder |
| `dec.token_embeddings.weight` | The token-embeddings tensor of the decoder |
| `dec.{i}.self_attn.pre_norm.weight/.bias` | The weight and bias for the normalization operation before a self-attention layer of the decoder |
| `dec.{i}.self_attn.qkv.weight/.bias` | Tensor `qkv.weight` is the concatenation of `wq.weight`, `wk.weight`, and `wv.weight`; vector `qkv.bias` is the concatenation of `wq.bias`, `wk.bias`, and `wv.bias` |
| `dec.{i}.self_attn.wq.weight/.bias` | WQ and its corresponding bias vector in a self-attention layer of the decoder |
| `dec.{i}.self_attn.wk.weight/.bias` | WK and its corresponding bias vector in a self-attention layer of the decoder |
| `dec.{i}.self_attn.wv.weight/.bias` | WV and its corresponding bias vector in a self-attention layer of the decoder |
| `dec.{i}.self_attn.wo.weight/.bias` | WO and its corresponding bias vector in a self-attention layer of the decoder |
| `dec.{i}.cross_attn.pre_norm.weight/.bias` | The weight and bias for the normalization operation before a cross-attention layer of the decoder |
| `dec.{i}.cross_attn.qkv.weight/.bias` | Tensor `qkv.weight` is the concatenation of `wq.weight`, `wk.weight`, and `wv.weight`; vector `qkv.bias` is the concatenation of `wq.bias`, `wk.bias`, and `wv.bias` |
| `dec.{i}.cross_attn.wq.weight/.bias` | WQ and its corresponding bias vector in a cross-attention layer of the decoder |
| `dec.{i}.cross_attn.wk.weight/.bias` | WK and its corresponding bias vector in a cross-attention layer of the decoder |
| `dec.{i}.cross_attn.wv.weight/.bias` | WV and its corresponding bias vector in a cross-attention layer of the decoder |
| `dec.{i}.cross_attn.wo.weight/.bias` | WO and its corresponding bias vector in a cross-attention layer of the decoder |
| `dec.{i}.feed_forward.pre_norm.weight/.bias` | The weight and bias for the normalization operation before a feed-forward layer of the decoder |
| `dec.{i}.feed_forward.w1.weight/.bias` | The weight and bias of the gate-projection layer in a feed-forward layer of the decoder |
| `dec.{i}.feed_forward.w2.weight/.bias` | The weight and bias of the down-projection layer in a feed-forward layer of the decoder |
| `dec.{i}.feed_forward.w3.weight/.bias` | The weight and bias of the up-projection layer in a feed-forward layer of the decoder |
| `dec.{i}.feed_forward.post_norm.weight/.bias` | The weight and bias for the normalization operation after a feed-forward layer of the decoder |
| `dec.output_norm.weight/.bias` | The weight and bias for the normalization operation performed on the overall output of the decoder |
| `output.weight` | The `lm_head` tensor |
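The `qkv.weight` convention above (a single tensor formed by concatenating the separate query, key, and value projection weights) can be sketched as follows. The shapes and the concatenation axis are assumptions for illustration; the actual layout in a given model depends on its dimensions and on the `qkv_format` setting.

```python
import numpy as np

hidden = 8  # illustrative hidden size; real models are far larger

# Separate projection weights, as they might appear under distinct
# wq/wk/wv tensor names in a model file (zeros, for shape illustration).
wq = np.zeros((hidden, hidden))
wk = np.zeros((hidden, hidden))
wv = np.zeros((hidden, hidden))

# qkv.weight is the concatenation of wq.weight, wk.weight, and wv.weight.
# Concatenating along the output dimension is an assumption here; the
# real axis/interleaving depends on the model's qkv_format.
qkv_weight = np.concatenate([wq, wk, wv], axis=0)
print(qkv_weight.shape)  # (24, 8)
```

A loader that encounters only a fused `qkv.weight` can recover the per-projection tensors by splitting along the same axis.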