SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space

This is the official repository for our paper accepted to the 2025 IEEE International Conference on Big Data: "SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space"
View on arXiv

Overview

This repository contains all code and scripts used in our experiments on simultaneous weight and architecture optimization of neural networks using a GPT2-based autoencoder framework. The method encodes MLPs into a universal latent space, enables fine-grained architecture discovery (e.g., neuron-level activation and layer width), and finds sparse, compact models with strong performance.
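Why does neuron-level architecture discovery yield compact models? A minimal sketch, assuming (hypothetically; this README does not describe the internal representation) that each hidden neuron carries a 0/1 on/off flag: a switched-off neuron contributes nothing downstream, so deleting its row in W_k, its entry in b_k, and its column in W_{k+1} gives a smaller MLP with identical outputs. The names `forward`, `prune`, and `masks` are illustrative and do not come from the codebase.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: an MLP as a list of (W, b) pairs,
# where W[i][j] is the weight from input j to neuron i.
Layer = Tuple[List[List[float]], List[float]]


def forward(layers: List[Layer], x: List[float],
            masks: Optional[List[List[int]]] = None) -> List[float]:
    """ReLU on hidden layers, identity on the output layer.

    masks[k], if given, is a 0/1 list that switches hidden neurons of layer k off.
    """
    h = x
    for k, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if k < len(layers) - 1:
            h = [max(v, 0.0) for v in h]
            if masks is not None:
                h = [v * m for v, m in zip(h, masks[k])]
    return h


def prune(layers: List[Layer], masks: List[List[int]]) -> List[Layer]:
    """Remove switched-off hidden neurons: their rows in W_k and b_k, columns in W_{k+1}."""
    pruned = [([list(row) for row in W], list(b)) for W, b in layers]
    for k, mask in enumerate(masks):
        keep = [i for i, m in enumerate(mask) if m]
        W, b = pruned[k]
        pruned[k] = ([W[i] for i in keep], [b[i] for i in keep])
        W_next, b_next = pruned[k + 1]
        pruned[k + 1] = ([[row[i] for i in keep] for row in W_next], b_next)
    return pruned


if __name__ == "__main__":
    layers = [([[1, 0], [0, 1], [1, 1]], [0, 0, 0]), ([[1, 2, 3]], [0.5])]
    masks = [[1, 0, 1]]  # switch off the second hidden neuron
    print(forward(layers, [2, 3], masks))         # → [17.5]
    print(forward(prune(layers, masks), [2, 3]))  # → [17.5], with 2 hidden neurons instead of 3
```

The pruned network needs no mask at inference time, which is the sense in which a sparse neuron-level architecture translates directly into a smaller model.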

The workflow is divided into several stages:

Train the AutoEncoder on synthetic datasets of MLPs.
Train sparse MLPs on real-world or benchmark datasets (e.g., CORNN), optimizing in the trained embedding space for performant and compact MLPs.
(Optional) Compress large networks and handle variable input/output sizes using the latent-space embedding.
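The second stage optimizes in the trained embedding space rather than directly over weights. The core mechanic (gradient descent on a latent vector z through a frozen decoder, with the chain rule dL/dz = Dᵀ dL/dw) can be sketched with a toy linear decoder w = D z and a linear model. This is a hedged illustration only: the toy decoder stands in for the GPT2-based AutoEncoder's decoder, and `optimize_latent`, `D`, `lr`, and `steps` are hypothetical names and values, not the repository's training loop.

```python
from typing import List, Tuple

Vec = List[float]


def matvec(M: List[Vec], v: Vec) -> Vec:
    """Matrix-vector product with plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def optimize_latent(D: List[Vec], xs: List[Vec], ys: Vec, z0: Vec,
                    lr: float = 0.05, steps: int = 500) -> Tuple[Vec, Vec]:
    """Gradient descent on latent z; weights are decoded as w = D z (toy linear decoder).

    Model: y_hat = w . x; loss = mean squared error over (xs, ys).
    """
    z = list(z0)
    n = len(xs)
    for _ in range(steps):
        w = matvec(D, z)  # decode latent -> weights (decoder is frozen)
        # dL/dw = (2/n) * sum_i (w . x_i - y_i) * x_i
        grad_w = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(a * b for a, b in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
        # Chain rule through the decoder: dL/dz = D^T dL/dw
        grad_z = [sum(D[j][i] * grad_w[j] for j in range(len(D))) for i in range(len(z))]
        z = [zi - lr * g for zi, g in zip(z, grad_z)]
    return z, matvec(D, z)


if __name__ == "__main__":
    D = [[1.0, 0.0], [0.0, 2.0]]       # toy decoder
    xs = [[1, 0], [0, 1], [1, 1]]
    ys = [2, -1, 1]                    # generated by w* = [2, -1]
    z, w = optimize_latent(D, xs, ys, [0.0, 0.0])
    print(w)  # should approach [2, -1]
```

Only z is updated; the decoder stays fixed, which mirrors the idea of searching a pretrained latent space instead of the raw weight space.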

Quick Start

Install dependencies:

pip install -r requirements.txt

Train autoencoder

Navigate to the autoencoder directory and train the model:

cd train_autoencoder
python main.py --cuda 0

We provide a pretrained AutoEncoder checkpoint to help reproduce our results. The checkpoint file is not hosted in this repository, but can be downloaded from the following Google Drive link: Download pretrain_ae.pth

After downloading, place the file under the checkpoints/ directory:

checkpoints/pretrain_ae.pth

You do not need to modify any code: all scripts (main.py, train_mlp.py, etc.) automatically load the checkpoint if it exists. If you'd prefer to retrain from scratch, remove the checkpoint or comment out the loading lines.
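The "load if it exists, otherwise train from scratch" behavior described above follows a common pattern. A minimal sketch, assuming the default path checkpoints/pretrain_ae.pth from this README; `resolve_checkpoint` is a hypothetical helper, and the actual loading code in main.py or train_mlp.py may differ.

```python
import os
from typing import Optional

# Default location given in this README.
DEFAULT_CKPT = os.path.join("checkpoints", "pretrain_ae.pth")


def resolve_checkpoint(path: str = DEFAULT_CKPT) -> Optional[str]:
    """Return the checkpoint path if the file exists, else None (train from scratch)."""
    return path if os.path.isfile(path) else None


if __name__ == "__main__":
    ckpt = resolve_checkpoint()
    if ckpt is None:
        print("No checkpoint found; the AutoEncoder would be trained from scratch.")
    else:
        # In a PyTorch script, assuming the file stores a state_dict, this is where
        # model.load_state_dict(torch.load(ckpt, map_location="cpu")) would go.
        print(f"Loading pretrained AutoEncoder from {ckpt}")
```

Removing or renaming the file makes `resolve_checkpoint` return None, which corresponds to the "retrain from scratch" option above.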

Train MLPs on Benchmark Dataset

Navigate to the train_mlp directory and train the MLPs:

cd train_mlp
python train_mlp.py --cuda 0

Dataset

This training uses the CORNN dataset as the benchmark for function approximation tasks.

We do not host the dataset in this repository. Please follow the instructions in the official CORNN GitHub repository to download and prepare the dataset.

In our code, we assume that the CORNN dataset is available as a Python module (e.g., lib/CORNN.py) in the codebase, or properly installed and imported.
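Both setups described above (an installed package, or a local lib/CORNN.py) can be supported with one import helper. A minimal sketch, assuming the module is named `CORNN` as in the lib/CORNN.py example; `load_cornn` is a hypothetical helper, not part of this repository or of CORNN.

```python
import importlib
import os
import sys
from types import ModuleType


def load_cornn(lib_dir: str = "lib") -> ModuleType:
    """Import CORNN as an installed package, falling back to <lib_dir>/CORNN.py."""
    try:
        return importlib.import_module("CORNN")
    except ImportError:
        pass  # not installed; try the local copy next
    candidate = os.path.join(lib_dir, "CORNN.py")
    if not os.path.isfile(candidate):
        raise ImportError(
            f"CORNN is not installed and {candidate} was not found; "
            "see the official CORNN repository for setup instructions."
        )
    sys.path.insert(0, os.path.abspath(lib_dir))
    importlib.invalidate_caches()  # make the newly added path visible to the import system
    return importlib.import_module("CORNN")
```

Trying the installed package first means a system-wide installation takes precedence over a stale local copy.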

(Discussion) Compression of networks and variable input/output size

This section explores how SWAT-NN can compress large MLPs into smaller subnetworks using the embedding-based optimization framework. We demonstrate this by decomposing a large pretrained MLP into two smaller subnetworks and then compressing each of them individually with a 4-to-2 hidden-layer AutoEncoder variant.

Navigate into the compress_large_NN folder:

cd compress_large_NN

Generate a Large MLP

python generate_dataset.py

Generates a large MLP model and dataset.

Split into Two Subnetworks

python split_small_MLP.py

Splits the large model into two smaller subnetworks.

Optional: these two steps can be skipped, since we provide their checkpoints in compress_large_NN/checkpoints/.
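This README does not specify how split_small_MLP.py decomposes the large model. One simple decomposition that makes the "variable input/output size" point concrete is a split by depth: the first k layers form subnetwork A, the remaining layers form subnetwork B, and B's input size equals A's output width. The sketch below assumes that scheme purely for illustration; `forward`, `split_by_depth`, and `relu_last` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical representation: (W, b) per layer; W[i][j] is input j -> neuron i.
Layer = Tuple[List[List[float]], List[float]]


def forward(layers: List[Layer], x: List[float], relu_last: bool = False) -> List[float]:
    """ReLU after every layer except the last (unless relu_last=True)."""
    h = x
    for k, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if k < len(layers) - 1 or relu_last:
            h = [max(v, 0.0) for v in h]
    return h


def split_by_depth(layers: List[Layer], k: int) -> Tuple[List[Layer], List[Layer]]:
    """Subnetwork A = first k layers (keeps its final ReLU); B = remaining layers."""
    if not 0 < k < len(layers):
        raise ValueError("split point must leave at least one layer on each side")
    return layers[:k], layers[k:]


if __name__ == "__main__":
    big = [([[1, -1], [2, 0]], [0, 1]), ([[1, 1], [0, -1]], [0, 0]), ([[3, 1]], [-1])]
    a, b = split_by_depth(big, 1)
    x = [1, 2]
    print(forward(big, x))                             # → [8.0]
    print(forward(b, forward(a, x, relu_last=True)))   # → [8.0], same function
```

Because A ends at what was a hidden layer of the original network, it must keep that layer's ReLU (`relu_last=True`) for the composition B(A(x)) to reproduce the original outputs exactly.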

Compress the Subnetworks

python main_compress.py --cuda 0

Uses SWAT-NN to compress the two subnetworks individually. Note: This compression uses a different AutoEncoder from the one used in earlier sections.

AutoEncoder Checkpoint (4-to-2 variant): please download the pretrained 4-to-2 AutoEncoder checkpoint from the following link: pretrained_4_to_2_ae.pth, and place it in the compress_large_NN/checkpoints/ folder.

Visualize Output Consistency

python scatter_plot.py

Generates scatter plots comparing predicted vs. ground truth outputs.
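A scatter plot of predicted vs. ground-truth outputs is consistent when points lie close to the line y = x. As a numeric companion to the plot, a minimal sketch that summarizes the same comparison with the coefficient of determination (R²) and the maximum absolute error; `consistency_metrics` is a hypothetical helper, and scatter_plot.py is not assumed to compute these values.

```python
from typing import Dict, Sequence


def consistency_metrics(pred: Sequence[float], truth: Sequence[float]) -> Dict[str, float]:
    """R^2 and maximum absolute error between predicted and ground-truth outputs."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    mean_t = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, truth))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in truth)           # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    max_abs = max(abs(t - p) for p, t in zip(pred, truth))
    return {"r2": r2, "max_abs_err": max_abs}


if __name__ == "__main__":
    print(consistency_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect agreement: r2 = 1.0
```

R² close to 1 corresponds to a tight cluster along the diagonal, while the maximum absolute error flags any single input where the compressed network diverges.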

For questions, please contact Zitong Huang (chuang95@usc.edu).
