This is the official repository for our paper accepted to the 2025 IEEE International Conference on Big Data:
"SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space"
View on arXiv
This repository contains all code and scripts used in our experiments on simultaneous weight and architecture optimization of neural networks using a GPT2-based autoencoder framework. The method encodes MLPs into a universal latent space, which enables fine-grained architecture discovery (e.g., neuron-level activation, layer width) and yields sparse, compact models with strong performance.
The workflow is divided into several stages:
- Train the AutoEncoder on synthetic datasets of MLPs.
- Train sparse MLPs on real-world or benchmark datasets (e.g., CORNN), using the trained embedding space to optimize for performant and compact MLPs.
- (Optional) Compress large networks, including networks with variable input/output sizes, using the latent-space embedding.
Install dependencies:
pip install -r requirements.txt
Navigate to the autoencoder directory and train the model:
cd train_autoencoder
python main.py --cuda 0
We provide a pretrained AutoEncoder checkpoint to help reproduce our results. The checkpoint file is not hosted in this repository, but can be downloaded from the following Google Drive link: Download pretrain_ae.pth
After downloading, place the file under the checkpoints/ directory:
checkpoints/pretrain_ae.pth
You do not need to modify any code: all scripts (main.py, train_mlp.py, etc.) automatically load the checkpoint if it exists. To retrain from scratch, remove the checkpoint or comment out the loading lines.
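The load-if-present behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function name `load_checkpoint_if_present` is hypothetical, and the sketch assumes the `.pth` file stores a PyTorch `state_dict`, which this README does not confirm.

```python
from pathlib import Path


def load_checkpoint_if_present(model, ckpt_path="checkpoints/pretrain_ae.pth"):
    """Load weights into `model` if the checkpoint file exists.

    Hypothetical helper: assumes the file holds a state_dict.
    Returns True if weights were loaded, False otherwise.
    """
    path = Path(ckpt_path)
    if not path.is_file():
        # No checkpoint found: the caller trains from scratch.
        return False

    # Imported lazily so the existence check works without PyTorch installed.
    import torch

    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    return True
```

With this pattern, deleting or renaming the checkpoint file is enough to force training from scratch, which matches the behavior described above.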
Navigate to the MLP training directory and train the sparse MLP:
cd train_mlp
python train_mlp.py --cuda 0
This training uses the CORNN dataset as the benchmark for functional approximation tasks.
We do not host the dataset in this repository. Please follow the instructions in the official CORNN GitHub repository to download and prepare the dataset.
In our code, we assume that the CORNN dataset is available as a Python module (e.g., lib/CORNN.py) in the codebase, or is otherwise installed and importable.
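Before launching training, it can help to check that the CORNN module is importable. The snippet below is a small sketch of such a check; the helper name `resolve_cornn` and the candidate module names `lib.CORNN` and `CORNN` are assumptions derived from the example path above, not names confirmed by the codebase.

```python
import importlib


def resolve_cornn(candidates=("lib.CORNN", "CORNN")):
    """Return the first importable module among `candidates`, else None.

    The candidate names are assumptions based on the lib/CORNN.py example.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError for a missing package or module.
            continue
    return None


if __name__ == "__main__":
    module = resolve_cornn()
    if module is None:
        print("CORNN not found; follow the official CORNN instructions first.")
    else:
        print(f"Using CORNN from module {module.__name__}")
```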
This section explores how SWAT-NN can compress large MLPs into smaller subnetworks using the embedding-based optimization framework. We demonstrate this by decomposing a large pretrained MLP into two smaller subnetworks and then compressing each one individually using an AutoEncoder variant that maps 4-hidden-layer networks to 2-hidden-layer networks.
Navigate into the compress_large_NN folder:
- Generate a Large MLP
python generate_dataset.py
Generates a large MLP model and dataset.
- Split into Two Subnetworks
python split_small_MLP.py
Splits the large model into two smaller subnetworks. Optional: These two steps can be skipped — we provide their checkpoints in compress_large_NN/checkpoints/.
- Compress the Subnetworks
python main_compress.py --cuda 0
Uses SWAT-NN to compress the two subnetworks individually. Note: This compression uses a different AutoEncoder from the one used in earlier sections.
AutoEncoder Checkpoint (4-to-2 variant): Please download the pretrained 4-to-2 AutoEncoder checkpoint from the following link: pretrained_4_to_2_ae.pth, and place it in the compress_large_NN/checkpoints/ folder.
- Visualize Output Consistency
python scatter_plot.py
Generates scatter plots comparing predicted vs. ground truth outputs.
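The four steps above can be chained in a small driver script. The following is a sketch only: the script names and the `--cuda` flag come from the commands listed above, while the function names, the `skip_generation` switch, and the `dry_run` switch are hypothetical conveniences that do not exist in the repository.

```python
import subprocess
import sys


def compression_commands(cuda=0, skip_generation=False):
    """Build the ordered list of commands for the compression workflow."""
    steps = []
    if not skip_generation:
        # Steps 1 and 2 are optional because checkpoints are provided.
        steps.append([sys.executable, "generate_dataset.py"])
        steps.append([sys.executable, "split_small_MLP.py"])
    steps.append([sys.executable, "main_compress.py", "--cuda", str(cuda)])
    steps.append([sys.executable, "scatter_plot.py"])
    return steps


def run_compression(workdir="compress_large_NN", cuda=0,
                    skip_generation=False, dry_run=False):
    """Run each step inside `workdir`, stopping at the first failure."""
    commands = compression_commands(cuda, skip_generation)
    for cmd in commands:
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, cwd=workdir, check=True)
    return commands
```

Calling `run_compression(skip_generation=True)` would reuse the provided checkpoints and run only the compression and plotting steps; `dry_run=True` prints the commands without executing them.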
For questions, please contact Zitong Huang at chuang95@usc.edu.