The most important part of the library is a user-defined config yaml file. It has five separate sections of parameters: training, pruning, quantization, finetuning (currently maintained for TensorFlow only), and fitcompress. By default, the parameters in the config are the following:
Training parameters
The following table outlines the primary parameters used to configure the training process:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `epochs` | int | 200 | Total number of training epochs. |
| `fine_tuning_epochs` | int | 0 | Additional epochs for fine-tuning. |
| `pretraining_epochs` | int | 50 | Pretraining / warm-up epochs. |
| `rewind` | str | "never" | Weight rewinding policy. |
| `rounds` | int | 1 | Number of prune–fine-tune cycles. |
| `save_weights_epoch` | int | -1 | Save checkpoint at this epoch (-1 disables). |
If you require additional parameters for the training or optimization loops, please define them directly in the config.yaml file.
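As an illustration, a training section populated with the defaults above might look like the following sketch; the `training:` section key and nesting are assumptions, while the field names and values come from the table:

```yaml
# Hypothetical sketch of a training section in config.yaml; the section key
# and nesting are assumptions, the field names and defaults follow the table above.
training:
  epochs: 200                # total number of training epochs
  fine_tuning_epochs: 0      # additional fine-tuning epochs
  pretraining_epochs: 50     # pretraining / warm-up epochs
  rewind: "never"            # weight rewinding policy
  rounds: 1                  # number of prune–fine-tune cycles
  save_weights_epoch: -1     # -1 disables saving a checkpoint at a fixed epoch
```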
Quantization parameters
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `default_data_keep_negatives` | bool | 0 | Default k value for data quantization (0 = clamp negatives, 1 = keep). |
| `default_data_integer_bits` | int | 0 | Default integer bitwidth i for data quantization. |
| `default_data_fractional_bits` | int | 0 | Default fractional bitwidth f for data quantization. |
| `default_weight_keep_negatives` | bool | 0 | Default k value for weight quantization (0 or 1). |
| `default_weight_integer_bits` | int | 0 | Default integer bitwidth i for weight quantization. |
| `default_weight_fractional_bits` | int | 0 | Default fractional bitwidth f for weight quantization. |
| `quantize_input` | bool | true | Whether layer inputs are quantized by default. |
| `quantize_output` | bool | true | Whether layer outputs are quantized by default. |
| `enable_quantization` | bool | true | Global switch to enable or disable quantization. |
| `hgq_gamma` | float | 0.0 | HGQ regularization coefficient for bitwidth stability. |
| `hgq_beta` | float | 0.0 | HGQ loss coefficient scaling EBOPs. |
| `layer_specific` | dict | {} | Dictionary of per-layer quantization overrides. |
| `use_hgq` | bool | false | Enable or disable High Granularity Quantization (HGQ). |
| `use_real_tanh` | bool | false | Use a real tanh instead of a hard/approximate tanh. |
| `overflow_mode_data` | str | "SAT" | Overflow handling mode for input and output quantizers (SAT, SAT_SYM, WRAP, WRAP_SM). |
| `overflow_mode_parameters` | str | "SAT" | Overflow handling mode for weight and bias quantizers (SAT, SAT_SYM, WRAP, WRAP_SM). |
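For example, a fixed-point setup with 1 integer and 7 fractional bits for weights could be sketched as follows; the `quantization:` section key and the chosen bitwidths are illustrative assumptions, while the field names follow the table above:

```yaml
# Hypothetical sketch of a quantization section; the section key, nesting,
# and the example bitwidths are assumptions, the field names follow the table above.
quantization:
  enable_quantization: true
  default_weight_keep_negatives: 1    # keep negative weight values
  default_weight_integer_bits: 1      # example bitwidths, not library defaults
  default_weight_fractional_bits: 7
  default_data_keep_negatives: 0      # clamp negative data values
  default_data_integer_bits: 2
  default_data_fractional_bits: 6
  quantize_input: true
  quantize_output: true
  overflow_mode_data: "SAT"
  overflow_mode_parameters: "SAT"
  layer_specific: {}                  # per-layer overrides, empty by default
```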
PQuantML supports seven different pruning methods.
Method Overview
| Method | Model |
| --- | --- |
| `cs` | CSPruningModel |
| `dst` | DSTPruningModel |
| `pdp` | PDPPruningModel |
| `wanda` | WandaPruningModel |
| `autosparse` | AutoSparsePruningModel |
| `activation_pruning` | ActivationPruningModel |
| `mdmm` | MDMMPruningModel |
The following parameters are shared by all methods:
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `disable_pruning_for_layers` | List[str] | [] | Layer names to exclude from pruning. |
| `enable_pruning` | bool | true | Master pruning on/off switch. |
| `threshold_decay` | float | 0.0 | Optional pruning threshold decay term. |
Layer names in the `disable_pruning_for_layers` field must match your framework’s naming (e.g., Keras `layer.name`).
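As a sketch, assuming the shared pruning options live under a `pruning:` key, excluding two layers from pruning could look like this (the layer names are illustrative placeholders):

```yaml
# Hypothetical sketch of the shared pruning options; the section key is an
# assumption and the layer names are illustrative placeholders.
pruning:
  enable_pruning: true
  threshold_decay: 0.0
  disable_pruning_for_layers:
    - "conv2d_1"     # example Keras layer name
    - "dense_out"    # example Keras layer name
```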
More details on each pruning method are given below:
CS Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | cs | Selects this pruning schema. |
| `final_temp` | int | 200 | Target temperature at the end of the schedule. |
| `threshold_init` | int | 0 | Initial sparsification threshold. |
DST Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | dst | Selects this pruning schema. |
| `alpha` | float | 5.0e-06 | Mask dynamics update coefficient. |
| `max_pruning_pct` | float | 0.99 | Upper bound on total pruning ratio. |
| `threshold_init` | float | 0.0 | Initial threshold value. |
| `threshold_type` | str | "channelwise" | Thresholding granularity. |
PDP Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | pdp | Selects this pruning schema. |
| `epsilon` | float | 0.015 | Smoothing/regularization factor for gating. |
| `sparsity` | float | 0.8 | Target sparsity level (0–1). |
| `temperature` | float | 1.0e-05 | Annealing temperature. |
| `structured_pruning` | bool | false | Enable structured pruning. |
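For example, selecting PDP pruning with an 80% sparsity target could be sketched as follows; the `pruning:` section key and nesting are assumptions, while the field names and values follow the table above:

```yaml
# Hypothetical sketch: PDP pruning with an 80% sparsity target; section key
# and nesting are assumptions, fields follow the table above.
pruning:
  pruning_method: pdp
  sparsity: 0.8            # target fraction of pruned weights
  epsilon: 0.015           # smoothing/regularization factor for gating
  temperature: 1.0e-05     # annealing temperature
  structured_pruning: false
```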
Wanda Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | wanda | Selects this pruning schema. |
| `M` | Optional[int] | null | Optional grouping constant. |
| `N` | Optional[int] | null | Optional grouping constant. |
| `sparsity` | float | 0.9 | Target sparsity level (0–1). |
| `t_delta` | int | 100 | Window size / steps for statistics collection. |
| `t_start_collecting_batch` | int | 100 | Warm-up steps before collecting statistics. |
| `calculate_pruning_budget` | bool | true | Auto-compute pruning budget from data. |
Autosparse Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | autosparse | Selects this pruning schema. |
| `alpha` | float | 0.5 | Weight/penalty coefficient. |
| `alpha_reset_epoch` | int | 90 | Epoch at which alpha is reset/tuned. |
| `autotune_epochs` | int | 10 | Number of epochs in the tuning window. |
| `backward_sparsity` | bool | false | Apply sparsity in the backward pass (if supported). |
| `threshold_init` | float | -5.0 | Initial threshold (often in logit space). |
| `threshold_type` | str | "channelwise" | Thresholding granularity. |
Activation Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | activation_pruning | Selects this pruning schema. |
| `threshold` | float | 0.3 | Activation magnitude cutoff. |
| `t_delta` | int | 50 | Steps used to aggregate statistics. |
| `t_start_collecting_batch` | int | 50 | Steps to skip before collecting statistics. |
MDMM Pruning
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `pruning_method` | str | mdmm | Selects this pruning schema. |
| `constraint_type` | ConstraintType | "Equality" | Constraint form: equality / ≤ / ≥. |
| `target_value` | float | 0.0 | Target value for the chosen metric. |
| `metric_type` | MetricType | "UnstructuredSparsity" | Specifies which metric is constrained. |
| `target_sparsity` | float | 0.9 | Target sparsity when constraining sparsity. |
| `rf` | int | 1 | Regularization / frequency parameter. |
| `epsilon` | float | 1.0e-03 | Feasibility tolerance. |
| `scale` | float | 10.0 | Penalty scaling for constraint violation. |
| `damping` | float | 1.0 | Damping term for numerical stability. |
| `use_grad` | bool | false | Use gradient information during updates. |
| `l0_mode` | "coarse" \| "smooth" | "coarse" | L0 approximation mode. |
| `scale_mode` | "mean" \| "sum" | "mean" | Aggregation mode for penalties. |
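As a sketch, an MDMM configuration constraining unstructured sparsity to 90% might look like this (the `pruning:` section key and nesting are assumptions; the fields follow the table above):

```yaml
# Hypothetical sketch: MDMM pruning with an equality constraint on
# unstructured sparsity; section key and nesting are assumptions.
pruning:
  pruning_method: mdmm
  constraint_type: "Equality"
  metric_type: "UnstructuredSparsity"
  target_sparsity: 0.9
  epsilon: 1.0e-03     # feasibility tolerance
  scale: 10.0          # penalty scaling for constraint violations
  damping: 1.0         # damping term for numerical stability
  l0_mode: "coarse"
  scale_mode: "mean"
```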
Optionally, a FITCompress method is also implemented for PyTorch:
FITCompress method
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `enable_fitcompress` | bool | false | Master switch that enables or disables FITCompress. |
| `optimize_quantization` | bool | true | Whether FITCompress searches over quantization bit-width candidates. |
| `quantization_schedule` | List[float] | [7., 4., 3., 2.] | Candidate bit-widths evaluated during the quantization search. |
| `pruning_schedule` | dict | {start: 0, end: -3, steps: 40} | Logarithmic (base-10) pruning curve with defined start, end, and step count. |
| `compression_goal` | float | 0.10 | Target compression ratio for the search procedure. |
| `optimize_pruning` | bool | false | Whether FITCompress searches over pruning ratios. |
| `greedy_astar` | bool | true | Disable fallback in the A* search: once a candidate is selected, all others are discarded. |
| `approximate` | bool | true | Use Fisher trace approximations to speed up FIT score estimation. |
| `f_lambda` | float | 1 | Multiplicative factor λ in the distance function (g + λf). |
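For example, enabling a FITCompress search over quantization bit-widths could be sketched as follows; the `fitcompress:` section key and nesting are assumptions, and the fields follow the table above:

```yaml
# Hypothetical sketch of a fitcompress section (PyTorch only); the section key
# and nesting are assumptions, fields follow the table above.
fitcompress:
  enable_fitcompress: true
  optimize_quantization: true
  quantization_schedule: [7., 4., 3., 2.]            # candidate bit-widths
  optimize_pruning: false
  pruning_schedule: {start: 0, end: -3, steps: 40}   # log10 pruning curve
  compression_goal: 0.10                             # target compression ratio
  greedy_astar: true
  approximate: true                                  # Fisher trace approximation
  f_lambda: 1
```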
Quantization layers in PQuantML
- `PQConv*D`: Convolutional layers.
- `PQAvgPool*D`: Average pooling layers.
- `PQBatchNorm*D`: BatchNorm layers.
- `PQDense`: Linear layer.
- `PQActivation`: Activation layers (ReLU, Tanh).
Currently, PQuantML supports two quantization modes: layer-wise fixed-point quantization, where each tensor uses a single
bit-width configuration, and High-Granularity Quantization (HGQ).
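As a rough sketch, switching from the fixed-point mode to HGQ would mainly involve the HGQ-related flags from the quantization table; the section key and the `hgq_beta` value below are illustrative assumptions:

```yaml
# Hypothetical sketch: enabling High Granularity Quantization instead of
# layer-wise fixed-point quantization; key placement and the hgq_beta value
# are illustrative assumptions.
quantization:
  use_hgq: true        # switch from fixed-point mode to HGQ
  hgq_beta: 1.0e-06    # example EBOP loss coefficient (illustrative value)
  hgq_gamma: 0.0       # regularization coefficient for bitwidth stability
```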