Intro: Text-to-Image AI-Generated Image Quality Evaluator
DL/CV techniques used: CLIP (Vision Transformer image encoder), ResNet-50
Tech stack: Python with PyTorch
Metrics: SRCC (Spearman rank correlation coefficient), PLCC (Pearson linear correlation coefficient)
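The two metrics above can be computed directly with SciPy. A minimal sketch on toy predicted/ground-truth MOS values (the numbers are illustrative, not project results):

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

mos_true = np.array([1.2, 2.5, 3.1, 4.0, 4.8])  # toy ground-truth MOS
mos_pred = np.array([1.0, 2.7, 2.9, 4.2, 4.6])  # toy model predictions

srcc, _ = spearmanr(mos_true, mos_pred)  # rank-order (monotonic) agreement
plcc, _ = pearsonr(mos_true, mos_pred)   # linear agreement
print(f"SRCC={srcc:.3f}, PLCC={plcc:.3f}")
```

SRCC rewards getting the ordering of images right, while PLCC rewards a linear fit to the exact MOS values, which is why IQA papers typically report both.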
What was done during the project: a literature review, replication of a research paper's results with our own code, and an improvement on those results through a slight modification of the methodology.
Dataset used: AGIQA-3k, which contains around 3,000 AI-generated images along with their text prompts and human-labeled MOS-style scores for two aspects: perceptual quality and text-image alignment.
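One way to picture the per-sample structure described above is a small record type; the field names and example values here are assumptions for illustration, not the dataset's actual file layout:

```python
from dataclasses import dataclass

@dataclass
class AGIQASample:
    image_path: str      # path to the AI-generated image
    prompt: str          # text prompt used to generate the image
    mos_quality: float   # human MOS for perceptual quality
    mos_align: float     # human MOS for text-image alignment

# hypothetical sample, purely illustrative
sample = AGIQASample("images/0001.png", "a cat riding a bicycle", 3.4, 4.1)
```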
For perceptual quality, similarities between the image/patch embeddings and 11 text variants were computed, and an MLP head mapped these similarity measurements to a quality score.
For text-image alignment, similarities between the image embedding and the text prompt, and between the patch embeddings and the text prompt, were computed and mapped to an alignment score by an MLP head.
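The similarity-to-score step described above can be sketched as a small PyTorch MLP head that regresses a vector of similarity features to one scalar score. The embedding dimension, layer sizes, and random inputs are illustrative assumptions, not the project's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityMLPHead(nn.Module):
    """Maps a vector of similarity features to a scalar score."""
    def __init__(self, n_similarities: int = 11, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_similarities, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one scalar score per image
        )

    def forward(self, sims: torch.Tensor) -> torch.Tensor:
        return self.mlp(sims).squeeze(-1)

# toy batch: 4 images, each compared to 11 text variants
img_emb = F.normalize(torch.randn(4, 512), dim=-1)   # assumed embedding dim
txt_emb = F.normalize(torch.randn(11, 512), dim=-1)
sims = img_emb @ txt_emb.T          # (4, 11) cosine similarities
scores = SimilarityMLPHead()(sims)  # (4,) predicted scores
```

In training, the MLP's output would be fit against the human MOS labels so the learned mapping from similarities to scores matches human judgment.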
The predicted scores were then compared with the ground-truth values.
During evaluation, SRCC and PLCC between the predicted and ground-truth scores were used to measure performance: higher correlations indicate that the model separates low-quality from high-quality images well.
For more details, see the code and report.