
Burf/FAN

FAN (Foundation Encoders Are All You Need for Preference-Aware Personalization)

Framework: PyTorch | Library: diffusers | License: MIT

This repository hosts the official implementation of:

Hyungjin Kim, Seokho Ahn, and Young-Duk Seo, Foundation Encoders Are All You Need for Preference-Aware Personalization, CVPR 2026 [cvpr] [supp]

News

  • [2026.05.23]: CVPR paper and supplementary materials released
  • [2026.03.20]: Repository created

Introduction

FAN enables preference-aware personalization using only foundation encoders, with no additional structures or fine-tuning. By reconstructing the self-attention mechanism of transformer-based encoders, FAN integrates user preferences while preserving fidelity to the target. In text-to-image (T2I) diffusion, it works with the OpenCLIP and Google T5 text encoders used by Stable Diffusion V1/XL/V3 and FLUX. Without any modification, it also extends to multimodal retrieval, image-conditioned generation, vision-language understanding, and group- and brand-level conditioning.

FAN consists of three key components: (a) tailored profiling, which precisely identifies user preferences; (b) personalized attention, which integrates these profiles into the conditioning process; and (c) conditioning optimization, which synthesizes high-quality personalized results while preserving the target query.

Performance

Qualitative results

Parameter comparison

FAN achieves personalization without any additional trainable parameters, unlike existing methods that rely on large-scale LLMs or auxiliary adapters.

Quick Start

FAN is designed for easy use with the diffusers and transformers libraries.

Setup

pip install torch torchvision diffusers transformers accelerate safetensors huggingface-hub
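
After installation, it can help to confirm that every package is importable before loading a large pipeline. The snippet below is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of FAN. Note that the pip package `huggingface-hub` is imported as `huggingface_hub`.

```python
from importlib.util import find_spec

# Import names of the packages installed by the pip command above
# (the pip name "huggingface-hub" corresponds to the module "huggingface_hub").
REQUIRED = [
    "torch", "torchvision", "diffusers", "transformers",
    "accelerate", "safetensors", "huggingface_hub",
]


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```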

Usage

  • T2I diffusion models
import torch

from diffusers import FluxPipeline
from fan import personalized_t2i_encoder

# Load pipeline and FAN
pipeline = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
fan = personalized_t2i_encoder(pipeline)

# Generate personalized images
with torch.no_grad():
    # Encode the target prompt together with the reference prompt(s)
    cond, pool_cond = fan(
        "A photograph of an astronaut riding a horse",
        ["A retro-futuristic space exploration movie poster with bold, vibrant colors"],
        weight=[1.0],
        alpha=0.4,
    )
    # Pass the personalized embeddings to the pipeline in its own dtype
    images = pipeline(
        prompt_embeds=cond.type(pipeline.dtype),
        pooled_prompt_embeds=pool_cond.type(pipeline.dtype) if pool_cond is not None else None,
    ).images

images[0].save("personalized_image.png")
  • unCLIP
import torch

from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image
from fan import FAN

face1 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png"
face2 = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png"

size = (512, 512)
target = load_image(face1).resize(size)
ref = load_image(face2).resize(size)

# Load pipeline and FAN
pipeline = StableUnCLIPImg2ImgPipeline.from_pretrained("sd2-community/stable-diffusion-2-1-unclip", torch_dtype=torch.float16).to("cuda")
fan = FAN(pipeline.image_encoder, pipeline.feature_extractor)

# Generate personalized images
with torch.no_grad():
    cond = fan.get_image_feature(target, ref, weight=[1.0], alpha=0.5)
    images = pipeline(image_embeds=cond).images

images[0].save("personalized_image.png")
  • OpenCLIP model
from transformers import CLIPModel, CLIPProcessor
from fan import FAN

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
fan = FAN(model, processor, decoder="./weight/L.pth")  # The decoder uses only OpenCLIP text encoders.

Key Parameters

| Parameter | Description | Value |
| --- | --- | --- |
| prompt | Target prompt | String |
| ref | Reference prompts | List of strings |
| alpha | Personalization degree | Float (0–1) |
| weight | Per-reference preference intensity | List of floats |
| sample_size | Sampling ratio for user profiling | Float (0–1) |
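
The types and ranges above can be checked before calling FAN, so that a bad argument fails early rather than inside the encoder. The following is a sketch, not part of the `fan` package: `check_fan_args` is a hypothetical helper, and the rule that `weight` must have one entry per reference is an assumption inferred from the "per-reference" description.

```python
def check_fan_args(prompt, ref, weight, alpha, sample_size=None):
    """Validate arguments against the types and ranges in the Key Parameters table."""
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a string")
    if not isinstance(ref, (list, tuple)) or not all(isinstance(r, str) for r in ref):
        raise TypeError("ref must be a list of strings")
    # Assumption: one weight per reference prompt (not confirmed by the repository).
    if len(weight) != len(ref):
        raise ValueError(f"expected {len(ref)} weights, got {len(weight)}")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"alpha must lie in [0, 1], got {alpha}")
    if sample_size is not None and not 0.0 <= sample_size <= 1.0:
        raise ValueError(f"sample_size must lie in [0, 1], got {sample_size}")
    # Return normalized keyword arguments for the call.
    return {"weight": [float(w) for w in weight], "alpha": float(alpha)}
```

For example, `check_fan_args("A photograph of an astronaut riding a horse", ["A retro-futuristic movie poster"], [1.0], 0.4)` returns `{"weight": [1.0], "alpha": 0.4}`, which can be unpacked into the `fan(...)` call shown in Usage.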

Supported foundation T2I models

FAN works with a wide variety of foundation T2I models that use text encoders with pretrained weights:

| Architecture | Pipeline | Text encoder | Weight |
| --- | --- | --- | --- |
| Stable Diffusion V1 | runwayml/stable-diffusion-v1-5, prompthero/openjourney-v4, stablediffusionapi/realistic-vision-v51, stablediffusionapi/deliberate-v2, stablediffusionapi/anything-v5, WarriorMama777/AbyssOrangeMix2, ... | openai/clip-vit-large-patch14 | L.pth |
| Stable Diffusion XL | stabilityai/stable-diffusion-xl-base-1.0, ... | openai/clip-vit-large-patch14, laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | L.pth, bigG.pth |
| Stable Diffusion V3 | stabilityai/stable-diffusion-3.5-large, stabilityai/stable-diffusion-3.5-medium, ... | openai/clip-vit-large-patch14, laion/CLIP-ViT-bigG-14-laion2B-39B-b160k, google/t5-v1_1-xxl | L.pth, bigG.pth |
| FLUX | black-forest-labs/FLUX.1-dev, ... | openai/clip-vit-large-patch14, google/t5-v1_1-xxl | L.pth |
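
For scripts that switch between architectures, the table can be kept as a small lookup. The mapping below is transcribed from the table as a convenience sketch; the names `SUPPORTED` and `required_weights` are hypothetical and not part of the `fan` package. The table lists no weight file for `google/t5-v1_1-xxl`, so none appears here.

```python
# Text encoders and FAN weight files per architecture, transcribed from the table.
SUPPORTED = {
    "Stable Diffusion V1": {
        "text_encoders": ["openai/clip-vit-large-patch14"],
        "weights": ["L.pth"],
    },
    "Stable Diffusion XL": {
        "text_encoders": [
            "openai/clip-vit-large-patch14",
            "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k",
        ],
        "weights": ["L.pth", "bigG.pth"],
    },
    "Stable Diffusion V3": {
        "text_encoders": [
            "openai/clip-vit-large-patch14",
            "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k",
            "google/t5-v1_1-xxl",
        ],
        "weights": ["L.pth", "bigG.pth"],
    },
    "FLUX": {
        "text_encoders": ["openai/clip-vit-large-patch14", "google/t5-v1_1-xxl"],
        "weights": ["L.pth"],
    },
}


def required_weights(architecture):
    """Return the weight file names the table lists for `architecture`."""
    if architecture not in SUPPORTED:
        raise ValueError(f"Unsupported architecture: {architecture!r}")
    return list(SUPPORTED[architecture]["weights"])
```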

Other applications

  • Multimodal retrieval (CLIP retrieval)

  • Image-conditioned generation (unCLIP)

  • Vision-language understanding

  • Group- and brand-level generation

Degree of personalization
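
The `alpha` parameter sets the personalization degree in the range 0–1 (see Key Parameters). One way to compare settings is to encode the same prompt at several `alpha` values. The helper below is a sketch under stated assumptions: `alpha_sweep` is a hypothetical name, `encode` stands for any callable with the call signature used in Quick Start (`fan(prompt, refs, weight=..., alpha=...)`), and the default of equal weights per reference is an assumption.

```python
def alpha_sweep(encode, prompt, refs, alphas, weight=None):
    """Call `encode` once per alpha and return the results keyed by alpha."""
    if weight is None:
        # Assumption: weight every reference equally when none is given.
        weight = [1.0] * len(refs)
    results = {}
    for alpha in alphas:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError(f"alpha must lie in [0, 1], got {alpha}")
        results[alpha] = encode(prompt, refs, weight=weight, alpha=alpha)
    return results
```

With the objects from Quick Start, `alpha_sweep(fan, prompt, refs, [0.2, 0.4, 0.6])` would return one `(cond, pool_cond)` pair per value, each of which can be passed to `pipeline(...)` inside `torch.no_grad()` as in the T2I example.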

Citation

@InProceedings{kim2026fan,
    author    = {Kim, Hyungjin and Ahn, Seokho and Seo, Young-Duk},
    title     = {Foundation Encoders Are All You Need for Preference-Aware Personalization},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026}
}
