The most comprehensive fragrance database available — 146,400+ structured records across six interconnected CSV files with 23 language translations, plus 4.9M+ user-generated content rows in Apache Parquet companion datasets covering user reviews, editorial news articles, and community discussions. A second fragrance database (Parfumo, 219,963 perfumes) is also available in the bundle — see "Also Available: Parfumo Database" below.
Keywords: fragrance database · perfume dataset · Fragrantica · Parfumo · cross-database · record linkage · user reviews · perfume reviews · fragrance news · perfumery articles · cosmetics dataset · multilingual perfume data · scent recommendation · fragrance recommender system · perfume sentiment analysis · perfumer profiles · accord taxonomy · notes pyramid · NLP fragrance corpus · 23 languages
FragDB provides structured data for the fragrance industry:
| File | Records | Fields | Description |
|---|---|---|---|
| fragrances.csv | 132,858 | 30 | Main fragrance database |
| brands.csv | 7,927 | 54 | Brand profiles + translations |
| perfumers.csv | 3,005 | 42 | Perfumer profiles + translations |
| notes.csv | 2,550 | 55 | Fragrance notes + translations |
| accords.csv | 92 | 27 | Accords + translations |
| translations.csv | 34 | 25 | Vocabulary: gender & voting labels × 23 languages |
This repository ships Fragrantica data (132K perfumes, 23 languages). A second fragrance database — Parfumo (219,963 perfumes, English-only) — is also available as part of the bundle.
- Free sample: see `samples/parfumo/` (6 CSV files with 10 fully-populated perfumes + reference closure)
- Cross-walk between F and P: 80,968 matched perfume pairs in `samples/cross/matched_pairs_sample.csv` (first 30 shown; full set in bundle)
- Column equivalence: `samples/cross/field_equivalence_map.csv`, the schema-level F ↔ P relation (206 rows)
- Full Parfumo data + cross-walk: fragdb.net (Cross-Source bundle tier $400+)
| File | Rows | Cols | Description |
|---|---|---|---|
| samples/parfumo/perfumes.csv | 10 | 34 | Master catalog (10 fully-populated rows) |
| samples/parfumo/brands.csv | 6 | 12 | Brand catalog |
| samples/parfumo/perfumers.csv | 11 | 11 | Perfumer catalog |
| samples/parfumo/notes.csv | 60 | 11 | Notes catalog |
| samples/parfumo/notes_categories.csv | 79 | 6 | Hierarchical taxonomy (P-only) |
| samples/parfumo/accords.csv | 18 | 4 | Accord catalog |
- Fragrantica: refreshed 2026-06-10 (v5.6)
- Parfumo: 1.1 (last data prep 2026-05-26)
- Cross-walk last refreshed: 2026-05-26 (Phase 2B Fellegi-Sunter, quarterly rerun cadence; ~99% precision)
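
For readers unfamiliar with Fellegi-Sunter linkage, the sketch below shows how per-field agreement is typically turned into a composite match weight. It is only an illustration of the general model: the field names and the m/u probabilities are assumptions chosen for the example, not the parameters used to build the FragDB cross-walk.

```python
import math

# Hypothetical m/u probabilities per compared field (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "brand": (0.98, 0.05),
    "year": (0.90, 0.10),
}


def match_weight(agreements):
    """Sum log2 likelihood ratios over fields (the Fellegi-Sunter composite weight)."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if agreements[field]:
            total += math.log2(m / u)              # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return total


all_agree = match_weight({"name": True, "brand": True, "year": True})
year_differs = match_weight({"name": True, "brand": True, "year": False})
all_disagree = match_weight({"name": False, "brand": False, "year": False})
print(f"all agree: {all_agree:.2f}, year differs: {year_differs:.2f}, "
      f"all disagree: {all_disagree:.2f}")
```

Candidate pairs whose weight exceeds an upper threshold are accepted as matches and those below a lower threshold are rejected; where those thresholds sit determines the precision/recall trade-off.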
Cross-walk artifacts allow joining F and P at perfume / brand / perfumer / note / accord level. Methodology + usage notes: samples/cross/README.md. Schema-level join semantics: samples/cross/field_equivalence_map.csv (206 rows: same_format / diff_format / partial_overlap / F-only / P-only).
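
A minimal sketch of a perfume-level join through the cross-walk follows. The column names (`f_pid`, `p_id`, `p_name`) and the tiny in-memory tables are hypothetical stand-ins; check `samples/cross/README.md` for the actual column names before adapting it.

```python
import pandas as pd

# Tiny stand-ins for the real files; every column name and value here is hypothetical.
fragrantica = pd.DataFrame({"pid": ["F1", "F2", "F3"],
                            "name": ["Alpha", "Beta", "Gamma"]})
parfumo = pd.DataFrame({"p_id": ["P1", "P2"],
                        "p_name": ["Alpha", "Gamma EDP"]})
pairs = pd.DataFrame({"f_pid": ["F1", "F3"], "p_id": ["P1", "P2"]})

# Walk F -> cross-walk -> P. Left joins keep Fragrantica rows that have no match.
linked = (
    fragrantica
    .merge(pairs, left_on="pid", right_on="f_pid", how="left")
    .merge(parfumo, on="p_id", how="left")
)
linked["in_parfumo"] = linked["p_id"].notna()
print(linked[["pid", "name", "p_name", "in_parfumo"]])
```

Using `how="left"` on both merges makes unmatched perfumes visible instead of silently dropping them, which matters because only part of either catalog has a counterpart in the other.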
- 23 languages — English + 22 translations for all labels, note names, accords, countries, statuses
- Relational structure — Files linked via unique IDs
- Rich fragrance data — Notes pyramid, accords, ratings, votes
- Brand profiles — Logo, country, website, parent company (country/activity translated)
- Perfumer profiles — Photo, status, company, education, biography (status translated)
- Notes reference — 2,550 notes with translations, Latin names, groups, odor profiles
- Accords reference — Display colors + translated names
- Translation vocabulary — 34 entries for gender and voting labels
- Pipe-delimited CSV — Easy parsing, UTF-8 encoded
```python
import pandas as pd

# Load the pipe-delimited, UTF-8 encoded files
fragrances = pd.read_csv('fragrances.csv', sep='|', encoding='utf-8')
brands = pd.read_csv('brands.csv', sep='|', encoding='utf-8')
notes = pd.read_csv('notes.csv', sep='|', encoding='utf-8')
translations = pd.read_csv('translations.csv', sep='|', encoding='utf-8')

# Join fragrances with brands: the second ';'-separated part of `brand` is the brand ID
fragrances['brand_id'] = fragrances['brand'].str.split(';').str[1]
df = fragrances.merge(brands, left_on='brand_id', right_on='id', suffixes=('', '_brand'))

# Translate gender to any language (unknown IDs are left unchanged)
trans = translations.set_index('id')
df['gender_ru'] = df['gender'].map(lambda x: trans.loc[x, 'ru'] if x in trans.index else x)

# Brand country in Japanese
print(df[['name', 'name_brand', 'country_ja', 'gender_ru']].head())
```

```javascript
const { parse } = require('csv-parse/sync');
const fs = require('fs');

// Load files
const fragrances = parse(fs.readFileSync('fragrances.csv', 'utf-8'), { columns: true, delimiter: '|' });
const brands = parse(fs.readFileSync('brands.csv', 'utf-8'), { columns: true, delimiter: '|' });
const translations = parse(fs.readFileSync('translations.csv', 'utf-8'), { columns: true, delimiter: '|' });

// Build lookup maps keyed by ID
const brandsMap = new Map(brands.map(b => [b.id, b]));
const transMap = new Map(translations.map(t => [t.id, t]));

// Get a fragrance with translated fields
const frag = fragrances[0];
const [brandName, brandId] = frag.brand.split(';');
const brand = brandsMap.get(brandId);
const genderRu = transMap.get(frag.gender)?.ru || frag.gender;
console.log(`${frag.name} by ${brandName} (${brand?.country_ru}), ${genderRu}`);
```

```sql
-- Import (PostgreSQL; the target tables must already exist)
COPY fragrances FROM 'fragrances.csv' DELIMITER '|' CSV HEADER ENCODING 'UTF8';
COPY brands FROM 'brands.csv' DELIMITER '|' CSV HEADER ENCODING 'UTF8';
COPY translations FROM 'translations.csv' DELIMITER '|' CSV HEADER ENCODING 'UTF8';

-- Join and translate gender to Russian
SELECT f.name, b.name AS brand, b.country_ru, t.ru AS gender_ru
FROM fragrances f
JOIN brands b ON SPLIT_PART(f.brand, ';', 2) = b.id
JOIN translations t ON f.gender = t.id;
```

See DATA_DICTIONARY.md for complete field documentation.
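
If pandas is not available, the same files can be read with Python's standard library. The sketch below parses a two-row in-memory sample and splits the `brand` reference the same way the examples above do (name first, ID second). The sample rows and the helper names `split_ref` and `load_rows` are made up for illustration.

```python
import csv
import io

# Two made-up rows in the pipe-delimited layout; only a few columns are shown.
SAMPLE = (
    "pid|name|brand\n"
    "1|Example One|Maison Demo;b1\n"
    "2|Example Two|Maison Demo;b1\n"
)


def split_ref(value):
    """Split a 'Name;id' reference into (name, id); id is None when absent."""
    parts = value.split(";")
    return parts[0], (parts[1] if len(parts) > 1 else None)


def load_rows(handle):
    """Yield each CSV row as a dict with brand_name / brand_id added."""
    for row in csv.DictReader(handle, delimiter="|"):
        row["brand_name"], row["brand_id"] = split_ref(row["brand"])
        yield row


rows = list(load_rows(io.StringIO(SAMPLE)))
print(rows[0]["name"], "|", rows[0]["brand_name"], "|", rows[0]["brand_id"])
```

For the real file, pass `open('fragrances.csv', newline='', encoding='utf-8')` to `load_rows` instead of the `StringIO` object.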
FragDB ships with three Apache Parquet datasets containing 4.9 million rows of user-generated content and editorial coverage — the largest publicly-organized corpus of fragrance reviews and perfumery journalism. Use them for NLP, sentiment analysis, recommendation systems, market research, or training language models on fragrance-specific text.
The world's largest collection of structured fragrance reviews. Every entry includes the perfume ID (joinable with fragrances.csv), author username, posting date, full review text, avatar URL, and language code.
- 4,643,851 user reviews spanning every major perfume on Fragrantica
- 23 languages — English (1.69M reviews), Russian, Portuguese, Spanish, Korean, Turkish, Japanese, Polish, Italian, Hungarian, Serbian, Swedish, German, Hebrew, Ukrainian, French, Arabic, Greek, Czech, Chinese, Romanian, Mongolian, Dutch
- Coverage: 70.6% of all fragrances in the database have at least one review (93,305 of 132,160 PIDs)
- Deterministic global primary key — stable comment IDs survive re-scrapes
- Zero duplicate rows, zero foreign key orphans against `fragrances.csv.pid`
- Independent UGC per language: each language is genuine localized content, not machine translation
- 8 fields: `pid`, `lang`, `comment_id`, `author`, `date`, `text`, `avatar_url`, `gradient_class`
- PyArrow `large_string` format: the combined corpus exceeds the 32-bit string offset limit
Use cases: sentiment analysis · review classification · recommendation systems · perfume similarity from text · language detection benchmark · multilingual NLP training corpus · fragrance market research · author network analysis · trend detection by language
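
The coverage figure and per-language counts quoted above can be recomputed from the review table. The following is a minimal sketch on a made-up in-memory frame that uses only the documented `pid`, `lang`, and `text` fields; with the real data, `reviews` would come from `comments.parquet` and `all_pids` from `fragrances.csv`.

```python
import pandas as pd

# Stand-in for comments.parquet (subset of the documented fields; values are made up).
reviews = pd.DataFrame({
    "pid": [1, 1, 2, 2, 2, 4],
    "lang": ["en", "ru", "en", "en", "es", "en"],
    "text": ["Lovely", "Otlichno", "Too sweet", "Long lasting", "Muy fresco", "Classic"],
})
# Stand-in for the pid column of fragrances.csv.
all_pids = pd.Series([1, 2, 3, 4, 5], name="pid")

# Share of fragrances that have at least one review.
coverage = all_pids.isin(reviews["pid"]).mean()

# Review counts per fragrance, one column per language (0 where none exist).
per_lang = reviews.groupby(["pid", "lang"]).size().unstack(fill_value=0)

print(f"coverage: {coverage:.1%}")
print(per_lang)
```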
Two decades of professional fragrance journalism from Fragrantica's editorial team. Every article includes title, author, full text (plain + HTML), category, related perfumes/brands/perfumers, publication date, and main image. Foreign keys to fragrances, brands, and perfumers make this a powerful resource for content-based recommendation and knowledge graph construction.
- 24,440 editorial articles from 2008 to 2026 — the complete public archive
- 30+ categories — top: New Fragrances (34.9%), Fragrance Reviews (22.8%), Niche Perfumery (10.4%), Designer Brands, Interviews, History, Industry News, Niche Houses, and more
- Dual-format storage: `text` (plain) for NLP / search, `text_html` (preserved markup) for rich display
- Linked entities: `related_pids[]`, `related_brands[]`, `related_perfumers[]` as JSON arrays
- 0% orphans over 119,662 PID references: clean foreign keys
- Modern + archived: 63.1% archived legacy articles, 36.9% modern fully-dated articles
- 16 fields: `nid`, `title`, `category`, `author`, `url`, `is_archived`, `date_unix`, `description`, `text`, `text_html`, `main_image`, `article_images`, `related_pids`, `related_brands`, `related_perfumers`, `comments_count`
- List fields stored as JSON-encoded strings, never null (empty = `"[]"`)
Use cases: content recommendation · article search engine · perfume knowledge graph · trend analysis · author influence study · category classification · entity linking · timeline analysis · industry research · niche perfumery research · fragrance journalism corpus
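
Because the list fields are JSON-encoded strings, an "articles about this perfume" lookup needs a decoding step first. The sketch below builds an inverted index from perfume ID to article IDs. The rows are made up, and the assumption that `related_pids` decodes to a list of integers should be checked against SPEC.md.

```python
import json
from collections import defaultdict

# Stand-in rows shaped like news.parquet; titles and IDs are made up.
articles = [
    {"nid": 10, "title": "Spring launches", "related_pids": "[1, 2]"},
    {"nid": 11, "title": "Interview", "related_pids": "[]"},
    {"nid": 12, "title": "Rose roundup", "related_pids": "[2]"},
]


def articles_by_pid(rows):
    """Map each perfume ID to the list of article IDs that reference it."""
    index = defaultdict(list)
    for row in rows:
        # Empty lists are stored as "[]", so json.loads never sees a null here.
        for pid in json.loads(row["related_pids"]):
            index[pid].append(row["nid"])
    return dict(index)


pid_index = articles_by_pid(articles)
print(pid_index)
```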
Community discussions attached to editorial articles, with threading support for replies. Joinable with news.parquet via nid.
- 263,798 threaded comments across 21,820 articles (89.3% of news articles have at least one comment)
- 4.9% reply rate: threaded conversations with reply detection (`is_reply` flag)
- 100% populated timestamps: `date_unix` parsed for every comment
- 9 fields: `nid`, `comment_id`, `is_reply`, `author`, `date`, `date_unix`, `text`, `avatar_url`, `gradient`
- Zero foreign key orphans against `news.parquet.nid`
Use cases: community engagement analysis · threaded discussion mining · reply network construction · comment sentiment · author activity profiles · temporal analysis of community responses
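
As a rough illustration of working with the `is_reply` flag and `date_unix` timestamps, the sketch below computes a reply rate and converts a timestamp to a timezone-aware datetime. It assumes `is_reply` is boolean and `date_unix` is in seconds since the epoch; the rows are made up.

```python
from datetime import datetime, timezone

# Stand-in rows shaped like news_comments.parquet; values are made up.
comments = [
    {"nid": 10, "comment_id": "c1", "is_reply": False, "date_unix": 1700000000},
    {"nid": 10, "comment_id": "c2", "is_reply": True, "date_unix": 1700003600},
    {"nid": 12, "comment_id": "c3", "is_reply": False, "date_unix": 1700090000},
    {"nid": 12, "comment_id": "c4", "is_reply": False, "date_unix": 1700093600},
]


def reply_rate(rows):
    """Fraction of comments flagged as replies (0.0 for an empty input)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row["is_reply"]) / len(rows)


def to_utc(timestamp):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


rate = reply_rate(comments)
first_posted = to_utc(comments[0]["date_unix"])
print(f"reply rate: {rate:.1%}; first comment posted {first_posted.isoformat()}")
```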
The parquet datasets ship with all paid tiers except the $200 Core:
| Tier | CSV Core | Parquet Datasets |
|---|---|---|
| $200 One-Time Core | ✅ | ❌ |
| $400 One-Time Full Database | ✅ | ✅ |
| Annual Subscription | ✅ | ✅ (always latest) |
| Lifetime Access | ✅ | ✅ (always latest) |
See https://fragdb.net/#pricing for complete tier comparison.
This repository includes free parquet preview samples in samples/:
- `comments_sample.parquet`: 25 user reviews (8 fields)
- `news_sample.parquet`: 20 editorial articles (16 fields)
- `news_comments_sample.parquet`: 20 threaded news comments (9 fields)
- `SPEC.md`: full field-by-field schema documentation (Apache Parquet)
```python
import json

import pandas as pd
import pyarrow.parquet as pq

# Read user reviews
reviews = pq.read_table('comments.parquet').to_pandas()
print(reviews.head())
print(f"Total reviews: {len(reviews):,}")
print(f"Languages: {reviews['lang'].nunique()}")

# Join with CSV fragrance metadata
fragrances = pd.read_csv('fragrances.csv', sep='|')
reviews_with_frag = reviews.merge(fragrances, on='pid', how='left')

# Read news articles
news = pq.read_table('news.parquet').to_pandas()

# Parse JSON-encoded list fields
news['related_pids_list'] = news['related_pids'].apply(json.loads)
news['related_brands_list'] = news['related_brands'].apply(json.loads)
print(news[['nid', 'title', 'category', 'date_unix']].head())

# Read news comments and join with articles
news_comments = pq.read_table('news_comments.parquet').to_pandas()
discussion = news_comments.merge(news[['nid', 'title']], on='nid')
print(discussion[['nid', 'title', 'author', 'text']].head())
```

Full schema, field types, and audit statistics are documented in SPEC.md.
- Perfumer transliterations expanded to 9 languages — added Hebrew, Greek, Mongolian (was 6: ru, uk, ja, zh, ko, ar)
- `perfumers.csv`: 39 → 42 columns (3 new transliteration fields)
- Data update: 130,086 → 130,949 fragrances (+863), 7,776 → 7,815 brands, 2,960 → 2,968 perfumers, 2,517 → 2,522 notes
- 23 languages — all labels, note names, accords, countries, statuses translated
- translations.csv — vocabulary file for gender values and voting labels
- Compact notes pyramid: `note_id`, `opacity`, `weight` (name/icon via notes.csv JOIN)
- Each note name variant (Rose, Damask Rose, Turkish Rose) has its own ID
- Gender & voting fields use translation IDs instead of English text
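
Since gender and voting fields hold translation IDs, a lookup helper with a fallback keeps a UI usable when a language entry is missing. The sketch below is only an illustration: the IDs (`g_female`, `g_unisex`), the label strings, and the presence of an `en` column are assumptions, so consult DATA_DICTIONARY.md for the real vocabulary.

```python
# Stand-in for translations.csv, keyed by ID; IDs, labels, and the `en` column are assumptions.
TRANSLATIONS = {
    "g_female": {"en": "for women", "ru": "для женщин", "de": ""},
    "g_unisex": {"en": "for women and men", "ru": "для женщин и мужчин",
                 "de": "für Frauen und Männer"},
}


def translate(label_id, lang, fallback="en"):
    """Return the label in `lang`, then in `fallback`, then the raw ID as a last resort."""
    entry = TRANSLATIONS.get(label_id)
    if entry is None:
        return label_id
    return entry.get(lang) or entry.get(fallback) or label_id


print(translate("g_female", "ru"))   # direct hit
print(translate("g_female", "de"))   # empty entry, falls back to English
print(translate("g_unknown", "ru"))  # unknown ID, returned unchanged
```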
See DATA_DICTIONARY.md for complete field documentation with parsing examples.
The free sample includes 10 records per file for five of the six CSV files (translations.csv is included in full, with all 34 entries), plus parquet samples and SPEC.md:
| File | Records | Description |
|---|---|---|
| fragrances.csv | 10 | Iconic fragrances (30 fields) |
| brands.csv | 10 | Brand profiles (54 fields, 22 lang) |
| perfumers.csv | 10 | Perfumer profiles (42 fields, 22 lang + 9 translit) |
| notes.csv | 10 | Fragrance notes (55 fields, 22 lang) |
| accords.csv | 10 | Accords with colors (27 fields, 22 lang) |
| translations.csv | 34 | Gender & voting vocabulary (full, 25 fields) |
| File | Records | Description |
|---|---|---|
| comments_sample.parquet | 25 | User reviews preview (8 fields) |
| news_sample.parquet | 20 | Editorial articles preview (16 fields) |
| news_comments_sample.parquet | 20 | News comments preview (9 fields) |
| SPEC.md | — | Parquet schema documentation |
Preview: SAMPLE_PREVIEW.md
- DATA_DICTIONARY.md — Complete field documentation with parsing examples
- CHANGELOG.md — Version history
- E-commerce — Enrich product listings with detailed fragrance data, notes, accords
- Mobile Apps — Build fragrance collection managers, scent discovery apps, perfume catalog apps
- Data Analysis — Analyze fragrance industry trends by brand, country, perfumer, year
- Recommendations — Content-based or collaborative filtering systems using accord/note vectors
- Content Creation — Power blogs, videos, fragrance reviews with accurate data
- Multilingual UIs — Localized perfume catalogs in 23 languages out of the box
- Knowledge Graphs — Brand → Perfumer → Fragrance → Notes → Accords graph construction
- Market Research — Country-of-origin analysis, parent company portfolios, perfumer productivity stats
- NLP & Sentiment Analysis — Train models on 4.6M multilingual fragrance reviews
- Recommender Systems — Hybrid models combining CSV structure with review text similarity
- Language Models — Domain-specific corpus for fragrance/perfumery LLM fine-tuning
- Review Classification — Identify positive/negative reviews, fake review detection
- Trend Detection — News article timeline analysis, emerging fragrance trends
- Author Networks — Identify influential reviewers, perfumery journalists, community leaders
- Content-Based Discovery — "Articles about this perfume" — JOIN news.related_pids with fragrances.pid
- Community Analytics — Reply networks, engagement metrics on editorial content
- Cross-Language Studies — Compare review sentiment across 23 languages for the same fragrance
- Search Engines — Full-text search across reviews, articles, and structured metadata
- Entity Resolution: Match journalist's `related_brands[]` mentions with `brands.csv` IDs
- Knowledge Extraction: Mine 24K editorial articles for perfume facts, launch dates, perfumer interviews
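
To make the recommendation use case concrete, here is a minimal content-based sketch that ranks fragrances by cosine similarity of their accord vectors. The fragrance keys and accord weights are hypothetical; with real data the vectors would be built from the accord fields in `fragrances.csv`.

```python
import math

# Hypothetical accord weights per fragrance (accord name -> strength between 0 and 1).
ACCORDS = {
    "A": {"citrus": 0.9, "woody": 0.4, "fresh": 0.7},
    "B": {"citrus": 0.8, "fresh": 0.9, "aromatic": 0.3},
    "C": {"vanilla": 0.9, "sweet": 0.8, "woody": 0.2},
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, k=2):
    """Return the k fragrances closest to `target`, best first."""
    scores = [
        (other, cosine(ACCORDS[target], vector))
        for other, vector in ACCORDS.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


ranked = most_similar("A")
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Sparse dicts keep the example dependency-free; for the full catalog a matrix representation (for example one column per accord) would scale better.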
The free sample contains 10 records per file. The full FragDB database includes:
| Feature | Free Sample | Full Database |
|---|---|---|
| Fragrances | 10 | 132,858 |
| Brands | 9 (referenced) | 7,927 |
| Perfumers | 15 (referenced) | 3,005 |
| Notes | 86 (referenced) | 2,550 |
| Accords | 32 (referenced) | 92 |
| Translations | 34 (full) | 34 |
| Languages | 23 | 23 |
| Total F Records | ~186 | 146,466 |
| Parfumo perfumes (in bundle) | 10 (samples/parfumo/) | 219,963 |
| Cross-walk F↔P pairs | 30 (samples/cross/) | 80,968 |
| Updates | None | Regular |
| Commercial Use | Yes (sample) | Yes (licensed) |
Contributions are welcome! See CONTRIBUTING.md for guidelines.
- Bug fixes for code examples
- New language examples
- Documentation improvements
- Use case additions
- Sample Data & Code: MIT License
- Full Database: Commercial license (see fragdb.net)
- Website: fragdb.net
- Kaggle: kaggle.com/datasets/eriklindqvist/fragdb-fragrance-database
- Hugging Face: huggingface.co/datasets/FragDBnet/fragrance-database
- Documentation: DATA_DICTIONARY.md
- Issues: GitHub Issues
Built with data passion by the FragDB team.





