FAN (Foundation Encoders Are All You Need for Preference-Aware Personalization)

This repository hosts the official implementation of:

Hyungjin Kim, Seokho Ahn, and Young-Duk Seo, Foundation Encoders Are All You Need for Preference-Aware Personalization, CVPR 2026 [cvpr] [supp]

News

[2026.05.23]: CVPR paper and supplementary materials released
[2026.03.20]: Repository created

Introduction

FAN enables preference-aware personalization using only foundation encoders, without additional structures or fine-tuning. By reconstructing the self-attention mechanism of transformer-based encoders, FAN integrates user preferences while preserving target fidelity. It works with OpenCLIP and Google T5 encoders in text-to-image (T2I) diffusion models, including Stable Diffusion V1/XL/V3 and FLUX, and extends without modification to multimodal retrieval, image-conditioned generation, vision-language understanding, and group- and brand-level conditioning.

FAN consists of three key components: (a) Tailored profiling to precisely identify user preferences; (b) Personalized attention to integrate these profiles into the conditioning process; and (c) Conditioning optimization to synthesize high-quality personalized results while preserving target queries.

Performance

Qualitative results

Parameter comparison

FAN achieves personalization without any additional trainable parameters, unlike existing methods that rely on large-scale LLMs or auxiliary adapters.

Quick Start

FAN is designed for easy use with the diffusers and transformers libraries.

Setup

pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub

Usage

T2I diffusion models

import torch

from diffusers import FluxPipeline
from fan import personalized_t2i_encoder

# Load pipeline and FAN
pipeline = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype = torch.bfloat16).to("cuda")
fan = personalized_t2i_encoder(pipeline)

# Generate personalized images
with torch.no_grad():
    # Encode the target prompt together with the reference prompts
    cond, pool_cond = fan(
        "A photograph of an astronaut riding a horse",
        ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
        weight = [1.0],
        alpha = 0.4
    )
    # Pass the personalized embeddings to the pipeline in place of a text prompt
    images = pipeline(
        prompt_embeds = cond.type(pipeline.dtype),
        pooled_prompt_embeds = pool_cond.type(pipeline.dtype) if pool_cond is not None else pool_cond
    ).images

images[0].save("personalized_image.png")

unCLIP

import torch

from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image
from fan import FAN

face1 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png"
face2 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png"

size = (512, 512)
target = load_image(face1).resize(size)
ref = load_image(face2).resize(size)

# Load pipeline and FAN
pipeline = StableUnCLIPImg2ImgPipeline.from_pretrained("sd2-community/stable-diffusion-2-1-unclip", torch_dtype = torch.float16).to("cuda")
fan = FAN(pipeline.image_encoder, pipeline.feature_extractor)

# Generate personalized images
with torch.no_grad():
    cond = fan.get_image_feature(target, ref, weight = [1.0], alpha = 0.5)
    images = pipeline(image_embeds = cond).images

images[0].save("personalized_image.png")

OpenCLIP model

from transformers import CLIPModel, CLIPProcessor
from fan import FAN

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
fan = FAN(model, processor, decoder = "./weight/L.pth")  # The decoder is used only with OpenCLIP text encoders.

More applications: See inference.ipynb

Key Parameters

Parameter	Description	Value
`prompt`	Target prompt	String
`ref`	Reference prompts	List of strings
`alpha`	Personalization degree	Float (0–1)
`weight`	Per-reference preference intensity	List of floats
`sample_size`	Sampling ratio for user profiling	Float (0–1)
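
The ranges in the table above can be checked before any model is loaded. The following is a minimal sketch, not part of the `fan` package: `FanArgs` and `validate_fan_args` are hypothetical names, the rule that `weight` needs one entry per reference is an assumption inferred from the usage examples, and the default of 1.0 per reference is chosen here for illustration only. It covers the text case, where `ref` is a list of strings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FanArgs:
    """Hypothetical container for the parameters listed in the table."""
    prompt: str
    ref: List[str]
    alpha: float
    weight: Optional[List[float]] = None
    sample_size: Optional[float] = None  # None: leave the choice to the library


def validate_fan_args(args: FanArgs) -> FanArgs:
    """Check types and ranges and return a copy with `weight` filled in."""
    if not isinstance(args.prompt, str) or not args.prompt:
        raise ValueError("prompt must be a non-empty string")
    if not args.ref or not all(isinstance(r, str) for r in args.ref):
        raise ValueError("ref must be a non-empty list of strings")
    if not 0.0 <= args.alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if args.sample_size is not None and not 0.0 <= args.sample_size <= 1.0:
        raise ValueError("sample_size must lie in [0, 1]")
    # Assumption: one weight per reference, defaulting to 1.0 each.
    weight = list(args.weight) if args.weight is not None else [1.0] * len(args.ref)
    if len(weight) != len(args.ref):
        raise ValueError("weight must have one entry per reference")
    return FanArgs(args.prompt, list(args.ref), args.alpha, weight, args.sample_size)
```

A validated instance can then be unpacked into a call such as `fan(args.prompt, args.ref, weight = args.weight, alpha = args.alpha)`, matching the T2I example above.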

Supported foundation T2I models

FAN works with a wide variety of foundation T2I models that use text encoders with pretrained weights:

Architecture	Pipeline	Text encoder	Weight
Stable Diffusion V1	`runwayml/stable-diffusion-v1-5`, `prompthero/openjourney-v4`, `stablediffusionapi/realistic-vision-v51`, `stablediffusionapi/deliberate-v2`, `stablediffusionapi/anything-v5`, `WarriorMama777/AbyssOrangeMix2`, ...	`openai/clip-vit-large-patch14`	`L.pth`
Stable Diffusion XL	`stabilityai/stable-diffusion-xl-base-1.0`, ...	`openai/clip-vit-large-patch14`, `laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`	`L.pth`, `bigG.pth`
Stable Diffusion V3	`stabilityai/stable-diffusion-3.5-large`, `stabilityai/stable-diffusion-3.5-medium`, ...	`openai/clip-vit-large-patch14`, `laion/CLIP-ViT-bigG-14-laion2B-39B-b160k`, `google/t5-v1_1-xxl`	`L.pth`, `bigG.pth`
FLUX	`black-forest-labs/FLUX.1-dev`, ...	`openai/clip-vit-large-patch14`, `google/t5-v1_1-xxl`	`L.pth`
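
The table above can also be kept as a small lookup, for example to check which decoder weight files an architecture needs before loading a pipeline. This is an illustrative sketch only: `EncoderSpec`, `SUPPORTED`, and `required_weights` are hypothetical names that do not come from the `fan` package, and the entries simply transcribe the table.

```python
from typing import Dict, List, NamedTuple


class EncoderSpec(NamedTuple):
    """Text encoders and decoder weight files for one architecture."""
    text_encoders: List[str]
    decoder_weights: List[str]


CLIP_L = "openai/clip-vit-large-patch14"
CLIP_BIGG = "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k"
T5_XXL = "google/t5-v1_1-xxl"

# Transcribed from the table of supported foundation T2I models.
SUPPORTED: Dict[str, EncoderSpec] = {
    "Stable Diffusion V1": EncoderSpec([CLIP_L], ["L.pth"]),
    "Stable Diffusion XL": EncoderSpec([CLIP_L, CLIP_BIGG], ["L.pth", "bigG.pth"]),
    "Stable Diffusion V3": EncoderSpec([CLIP_L, CLIP_BIGG, T5_XXL], ["L.pth", "bigG.pth"]),
    "FLUX": EncoderSpec([CLIP_L, T5_XXL], ["L.pth"]),
}


def required_weights(architecture: str) -> List[str]:
    """Return the decoder weight file names listed for an architecture."""
    if architecture not in SUPPORTED:
        raise ValueError(f"unsupported architecture: {architecture!r}")
    return list(SUPPORTED[architecture].decoder_weights)
```

For instance, `required_weights("Stable Diffusion XL")` lists both `L.pth` and `bigG.pth`, while FLUX needs only `L.pth` because its second encoder is T5.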

Other applications

Multimodal retrieval (CLIP retrieval)
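
As a rough illustration of the ranking step in CLIP-style retrieval, the sketch below orders gallery embeddings by cosine similarity to a query embedding. It is not taken from this repository: `cosine` and `rank_gallery` are hypothetical helpers, and the vectors are toy values standing in for embeddings that would, in practice, come from a (personalized) encoder.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / norm


def rank_gallery(
    query: Sequence[float],
    gallery: Sequence[Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the top_k most similar items."""
    scores = [(i, cosine(query, item)) for i, item in enumerate(gallery)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy usage: the second gallery item points in the same direction as the query.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_gallery(query, gallery)
```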

Image-conditioned generation (unCLIP)

Vision-language understanding

Group- and brand-level generation

Degree of personalization

Citation

@InProceedings{kim2026fan,
    author    = {Kim, Hyungjin and Ahn, Seokho and Seo, Young-Duk},
    title     = {Foundation Encoders Are All You Need for Preference-Aware Personalization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026}
}
