
Add results for VN-MTEB #430

Open
BaoLocPham wants to merge 24 commits into embeddings-benchmark:main from BaoLocPham:main

Conversation

@BaoLocPham
Contributor

Checklist

  • My model has a model sheet, report, or similar
  • My model has a reference implementation in mteb/models/model_implementations/; this can be implemented as an API.
  • The results submitted are obtained using the reference implementation
  • My model is available, either as a publicly accessible API or publicly on e.g., Huggingface
  • I solemnly swear that for all results submitted I have not trained on the evaluation dataset including training splits. If I have, I have disclosed it clearly.

@github-actions

github-actions bot commented Mar 5, 2026

Model Results Comparison

Reference models: intfloat/multilingual-e5-large, google/gemini-embedding-001
New models evaluated: AITeamVN/Vietnamese_Embedding, Alibaba-NLP/gte-Qwen2-7B-instruct, BAAI/bge-large-en-v1.5, BAAI/bge-m3, GreenNode/GreenNode-Embedding-E5-Large-VN-V1, GreenNode/GreenNode-Embedding-KaLM-Mini-Instruct-VN-V1, GreenNode/GreenNode-Embedding-Large-VN-Mixed-V1, GreenNode/GreenNode-Embedding-Large-VN-V1, IEITYuan/Yuan-embedding-2.0-en, Qwen/Qwen3-Embedding-0.6B, Qwen/Qwen3-Embedding-4B, SamilPwC-AXNode-GenAI/PwC-Embedding_expr, Snowflake/snowflake-arctic-embed-m-v1.5, VoVanPhuc/sup-SimCSE-VietNamese-phobert-base, athrael-soju/colqwen3.5-4.5B-v3, bflhc/Octen-Embedding-0.6B, bkai-foundation-models/vietnamese-bi-encoder, codefuse-ai/F2LLM-0.6B, codefuse-ai/F2LLM-v2-0.6B, codefuse-ai/F2LLM-v2-1.7B, codefuse-ai/F2LLM-v2-14B, codefuse-ai/F2LLM-v2-160M, codefuse-ai/F2LLM-v2-330M, codefuse-ai/F2LLM-v2-4B, codefuse-ai/F2LLM-v2-80M, codefuse-ai/F2LLM-v2-8B, contextboxai/halong_embedding, google/embeddinggemma-300m, infgrad/Jasper-Token-Compression-600M, intfloat/multilingual-e5-base, intfloat/multilingual-e5-large-instruct, intfloat/multilingual-e5-large, intfloat/multilingual-e5-large, intfloat/multilingual-e5-small, jinaai/jina-embeddings-v3, jinaai/jina-embeddings-v5-text-nano, jinaai/jina-embeddings-v5-text-small, microsoft/harrier-oss-v1-0.6b, microsoft/harrier-oss-v1-270m, microsoft/harrier-oss-v1-27b, minishlab/potion-base-32M, minishlab/potion-multilingual-128M, minishlab/potion-multilingual-128M, minishlab/potion-retrieval-32M, mteb/baseline-bm25s, nanovdr/NanoVDR-S-Multi, nomic-ai/nomic-embed-text-v1.5, perplexity-ai/pplx-embed-v1-0.6b, perplexity-ai/pplx-embed-v1-4b, qihoo360/Zhinao-ChineseModernBert-Embedding, sentence-transformers/LaBSE, sentence-transformers/all-MiniLM-L6-v2, sentence-transformers/all-mpnet-base-v2, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, sentence-transformers/static-similarity-mrl-multilingual-v1, telepix/PIXIE-Rune-v1.0, voyageai/voyage-4-nano
Tasks: AFQMC, AILAStatutes, ARCChallenge, ATEC, AfriSentiClassification, AllegroReviews, AlloProfClusteringP2P, AlloProfClusteringS2S, AlloProfClusteringS2S.v2, AlloprofReranking, AlloprofRetrieval, AlphaNLI, AmazonCounterfactualClassification, AmazonCounterfactualVNClassification, AmazonPolarityClassification, AmazonPolarityVNClassification, AmazonReviewsClassification, AmazonReviewsVNClassification, AngryTweetsClassification, AppsRetrieval, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArguAna, ArguAna-Fa, ArguAna-Fa.v2, ArguAna-NL, ArguAna-NL.v2, ArguAna-VN, ArmenianParaphrasePC, ArxivClusteringP2P, AskUbuntuDupQuestions, AskUbuntuDupQuestions-VN, BIOSSES, BIOSSES-VN, BQ, BSARDRetrieval, BUCC, BUCC.v2, Banking77Classification, Banking77VNClassification, BelebeleRetrieval, BengaliSentimentAnalysis, BeytooteClustering, BibleNLPBitextMining, BigPatentClustering.v2, BiorxivClusteringP2P, BiorxivClusteringP2P.v2, BiorxivClusteringS2S, BlurbsClusteringP2P, BlurbsClusteringS2S, BornholmBitextMining, BrazilianToxicTweetsClassification, BrightAopsRetrieval, BrightBiologyLongRetrieval, BrightBiologyRetrieval, BrightEarthScienceLongRetrieval, BrightEarthScienceRetrieval, BrightEconomicsLongRetrieval, BrightEconomicsRetrieval, BrightLeetcodeRetrieval, BrightLongRetrieval, BrightPonyLongRetrieval, BrightPonyRetrieval, BrightPsychologyLongRetrieval, BrightPsychologyRetrieval, BrightRetrieval, BrightRoboticsLongRetrieval, BrightRoboticsRetrieval, BrightStackoverflowLongRetrieval, BrightStackoverflowRetrieval, BrightSustainableLivingLongRetrieval, BrightSustainableLivingRetrieval, BrightTheoremQAQuestionsRetrieval, BrightTheoremQATheoremsRetrieval, BuiltBenchClusteringP2P, BuiltBenchClusteringS2S, BuiltBenchReranking, BuiltBenchRetrieval, BulgarianStoreReviewSentimentClassfication, CBD, CDSC-E, CDSC-R, CEDRClassification, CExaPPC, CLSClusteringP2P, CLSClusteringP2P.v2, CLSClusteringS2S, CMedQAv1-reranking, CMedQAv2-reranking, COIRCodeSearchNetRetrieval, 
CQADupstack-NL, CQADupstackAndroid-NL, CQADupstackAndroid-VN, CQADupstackAndroidRetrieval, CQADupstackAndroidRetrieval-Fa, CQADupstackEnglish-NL, CQADupstackEnglishRetrieval, CQADupstackEnglishRetrieval-Fa, CQADupstackGaming-NL, CQADupstackGamingRetrieval, CQADupstackGamingRetrieval-Fa, CQADupstackGis-NL, CQADupstackGis-VN, CQADupstackGisRetrieval-Fa, CQADupstackMathematica-NL, CQADupstackMathematica-VN, CQADupstackMathematicaRetrieval-Fa, CQADupstackPhysics-NL, CQADupstackPhysics-VN, CQADupstackPhysicsRetrieval-Fa, CQADupstackProgrammers-NL, CQADupstackProgrammers-VN, CQADupstackProgrammersRetrieval-Fa, CQADupstackRetrieval-Fa, CQADupstackStats-NL, CQADupstackStats-VN, CQADupstackStatsRetrieval-Fa, CQADupstackTex-NL, CQADupstackTex-VN, CQADupstackTexRetrieval-Fa, CQADupstackUnix-NL, CQADupstackUnix-VN, CQADupstackUnixRetrieval, CQADupstackUnixRetrieval-Fa, CQADupstackWebmasters-NL, CQADupstackWebmasters-VN, CQADupstackWebmastersRetrieval-Fa, CQADupstackWordpress-NL, CQADupstackWordpress-VN, CQADupstackWordpressRetrieval-Fa, CSFDSKMovieReviewSentimentClassification, CTKFactsNLI, CUREv1, CataloniaTweetClassification, ChemHotpotQARetrieval, ChemNQRetrieval, ClimateFEVER, ClimateFEVER-Fa, ClimateFEVER-NL, ClimateFEVER-VN, ClimateFEVERHardNegatives, CmedqaRetrieval, Cmnli, CodeEditSearchRetrieval, CodeFeedbackMT, CodeFeedbackST, CodeSearchNetCCRetrieval, CodeSearchNetRetrieval, CodeTransOceanContest, CodeTransOceanDL, Core17InstructionRetrieval, CosQA, CovidDisinformationNLMultiLabelClassification, CovidRetrieval, CyrillicTurkicLangClassification, CzechProductReviewSentimentClassification, DBPedia, DBPedia-Fa, DBPedia-NL, DBPedia-VN, DBpediaClassification, DKHateClassification, DalajClassification, DanFeverRetrieval, DanishPoliticalCommentsClassification, DeepSentiPers, DeepSentiPers.v2, DiaBlaBitextMining, DigikalamagClassification, DigikalamagClustering, DuRetrieval, DutchBookReviewSentimentClassification.v2, DutchColaClassification, 
DutchGovernmentBiasClassification, DutchNewsArticlesClassification, DutchNewsArticlesClusteringP2P, DutchNewsArticlesClusteringS2S, DutchNewsArticlesRetrieval, DutchSarcasticHeadlinesClassification, ESCIReranking, EcomRetrieval, EightTagsClustering, EmotionClassification, EmotionVNClassification, EstonianValenceClassification, FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-VN, FEVERHardNegatives, FalseFriendsGermanEnglish, FaroeseSTS, FarsTail, FarsiParaphraseDetection, Farsick, FiQA2018, FiQA2018-Fa, FiQA2018-Fa.v2, FiQA2018-NL, FiQA2018-VN, FilipinoShopeeReviewsClassification, FinParaSTS, FinancialPhrasebankClassification, FloresBitextMining, GeoreviewClassification, GeoreviewClusteringP2P, GerDaLIR, GerDaLIRSmall, GermanDPR, GermanQuAD-Retrieval, GermanSTSBenchmark, GreekLegalCodeClassification, GreenNodeTableMarkdownRetrieval, GujaratiNewsClassification, HALClusteringS2S, HALClusteringS2S.v2, HUMEArxivClusteringP2P, HUMECore17InstructionReranking, HUMEEmotionClassification, HUMEMultilingualSentimentClassification, HUMENews21InstructionReranking, HUMERedditClusteringP2P, HUMERobust04InstructionReranking, HUMESIB200ClusteringS2S, HUMESICK-R, HUMESTS12, HUMESTS22, HUMESTSBenchmark, HUMEToxicConversationsClassification, HUMETweetSentimentExtractionClassification, HUMEWikiCitiesClustering, HUMEWikipediaRerankingMultilingual, HagridRetrieval, HamshahriClustring, HeadlineClassification, HellaSwag, HindiDiscourseClassification, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-VN, HotpotQAHardNegatives, IFlyTek, IN22ConvBitextMining, IN22GenBitextMining, IconclassClassification, IconclassClusteringS2S, ImdbClassification, ImdbVNClassification, InappropriatenessClassification, IndicCrosslingualSTS, IndicGenBenchFloresBitextMining, IndicLangClassification, IndonesianIdClickbaitClassification, IsiZuluNewsClassification, ItaCaseholdClassification, JDReview, JQaRAReranking, JSICK, JSTS, JaCWIRReranking, JaCWIRRetrieval, JaGovFaqsRetrieval, 
JapaneseSentimentClassification, JaqketRetrieval, KLUE-STS, KLUE-TC, KinopoiskClassification, Ko-StrategyQA, KorHateSpeechMLClassification, KorSTS, KorSarcasmClassification, KurdishSentimentClassification, LCQMC, LEMBPasskeyRetrieval, LanguageClassification, LccSentimentClassification, LeCaRDv2, LegalBenchConsumerContractsQA, LegalBenchCorporateLobbying, LegalQANLRetrieval, LegalQuAD, LivedoorNewsClustering.v2, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MKQARetrieval, MLQARetrieval, MLSUMClusteringP2P, MLSUMClusteringS2S, MMarcoReranking, MMarcoRetrieval, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-VN, MTOPDomainClassification, MTOPDomainVNClassification, MTOPIntentClassification, MTOPIntentVNClassification, MacedonianTweetSentimentClassification, MalayalamNewsClassification, MalteseNewsClassification, MasakhaNEWSClassification, MasakhaNEWSClusteringP2P, MasakhaNEWSClusteringS2S, MassiveIntentClassification, MassiveIntentVNClassification, MassiveScenarioClassification, MassiveScenarioVNClassification, MedicalQARetrieval, MedicalRetrieval, MedrxivClusteringP2P, MedrxivClusteringP2P.v2, MedrxivClusteringS2S.v2, MewsC16JaClustering, MindSmallReranking, MintakaRetrieval, MrTidyRetrieval, MultiEURLEXMultilabelClassification, MultiHateClassification, MultiLongDocReranking, MultiLongDocRetrieval, MultilingualSentiment, MultilingualSentimentClassification, NFCorpus, NFCorpus-Fa, NFCorpus-NL, NFCorpus-NL.v2, NFCorpus-VN, NLPJournalAbsArticleRetrieval.V2, NLPJournalAbsIntroRetrieval, NLPJournalAbsIntroRetrieval.V2, NLPJournalTitleAbsRetrieval, NLPJournalTitleAbsRetrieval.V2, NLPJournalTitleIntroRetrieval, NLPJournalTitleIntroRetrieval.V2, NLPTwitterAnalysisClassification, NLPTwitterAnalysisClassification.v2, NLPTwitterAnalysisClustering, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-VN, NTREXBitextMining, NanoArguAnaRetrieval, NanoClimateFEVER-VN, NanoClimateFeverRetrieval, NanoDBPedia-VN, NanoDBPediaRetrieval, NanoFEVER-VN, 
NanoFEVERRetrieval, NanoFiQA2018Retrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNFCorpusRetrieval, NanoNQ-VN, NanoNQRetrieval, NanoQuoraRetrieval, NanoSCIDOCSRetrieval, NanoSciFactRetrieval, NanoTouche2020Retrieval, NepaliNewsClassification, NeuCLIR2023RetrievalHardNegatives, News21InstructionRetrieval, NoRecClassification, NollySentiBitextMining, NorQuadRetrieval, NordicLangClassification, NorwegianCourtsBitextMining, NorwegianParliamentClassification, NusaParagraphEmotionClassification, NusaTranslationBitextMining, NusaX-senti, NusaXBitextMining, Ocnli, OdiaNewsClassification, OnlineShopping, OpenTenderClassification, OpenTenderClusteringP2P, OpenTenderClusteringS2S, OpenTenderRetrieval, OpusparcusPC, PAC, PAWSX, PIQA, PSC, ParsinluEntail, ParsinluQueryParaphPC, PawsXPairClassification, PerShopDomainClassification, PerShopIntentClassification, PersianFoodSentimentClassification, PersianTextEmotion, PersianTextEmotion.v2, PersianWebDocumentRetrieval, PhincBitextMining, PlscClusteringP2P, PlscClusteringP2P.v2, PlscClusteringS2S, PoemSentimentClassification, PolEmo2.0-IN, PolEmo2.0-OUT, PpcPC, PubChemSMILESBitextMining, PubChemSMILESPC, PubChemSynonymPC, PubChemWikiPairClassification, PubChemWikiParagraphsPC, PublicHealthQA, PunjabiNewsClassification, QBQTC, Quail, Query2Query, Quora-NL, Quora-VN, QuoraRetrieval, QuoraRetrieval-Fa, QuoraRetrieval-Fa.v2, RARbCode, RARbMath, RTE3, RUParaPhraserSTS, RedditClustering-VN, RedditClusteringP2P-VN, RiaNewsRetrieval, RiaNewsRetrievalHardNegatives.v2, Robust04InstructionRetrieval, RomaniBibleClustering, RuBQReranking, RuBQRetrieval, RuReviewsClassification, RuSTSBenchmarkSTS, RuSciBenchGRNTIClassification, RuSciBenchGRNTIClusteringP2P, RuSciBenchOECDClassification, RuSciBenchOECDClusteringP2P, SAMSumFa, SCIDOCS, SCIDOCS-Fa, SCIDOCS-Fa.v2, SCIDOCS-NL, SCIDOCS-NL.v2, SCIDOCS-VN, SDSGlovesClassification, SIB200Classification, SIB200ClusteringS2S, SICK-E-PL, SICK-NL-STS, SICK-R, 
SICK-R-PL, SICK-R-VN, SICKFr, SICKNLPairClassification, SIDClassification, SIDClassification.v2, SIDClustring, SIQA, SNLHierarchicalClusteringP2P, SNLHierarchicalClusteringS2S, SNLRetrieval, STS12, STS13, STS14, STS15, STS17, STS22, STS22.v2, STSB, STSBenchmark, STSBenchmark-VN, STSBenchmarkMultilingualSTS, STSES, SanskritShlokasClassification, ScalaClassification, SciDocsRR-VN, SciFact, SciFact-Fa, SciFact-Fa.v2, SciFact-NL, SciFact-NL.v2, SciFact-PL, SciFact-VN, SemRel24STS, SensitiveTopicsClassification, SentimentAnalysisHindi, SentimentDKSF, SinhalaNewsClassification, SiswatiNewsClassification, SlovakMovieReviewSentimentClassification, SpanishNewsClassification.v2, SpanishPassageRetrievalS2P, SpanishPassageRetrievalS2S, SpanishSentimentClassification.v2, SpartQA, SprintDuplicateQuestions, SprintDuplicateQuestions-VN, StackExchangeClustering-VN, StackExchangeClustering.v2, StackExchangeClusteringP2P-VN, StackExchangeClusteringP2P.v2, StackOverflowDupQuestions-VN, StackOverflowQA, StatcanDialogueDatasetRetrieval, StyleClassification, SummEvalFr, SummEvalSummarization.v2, SwahiliNewsClassification, SweFaqRetrieval, SweRecClassification, SwedishSentimentClassification, SwednClusteringP2P, SwednClusteringS2S, SwednRetrieval, SwissJudgementClassification, SynPerChatbotConvSAAnger, SynPerChatbotConvSAClassification, SynPerChatbotConvSAFear, SynPerChatbotConvSAFriendship, SynPerChatbotConvSAHappiness, SynPerChatbotConvSAJealousy, SynPerChatbotConvSALove, SynPerChatbotConvSASadness, SynPerChatbotConvSASatisfaction, SynPerChatbotConvSASurprise, SynPerChatbotConvSAToneChatbotClassification, SynPerChatbotConvSAToneUserClassification, SynPerChatbotRAGFAQPC, SynPerChatbotRAGFAQRetrieval, SynPerChatbotRAGSumSRetrieval, SynPerChatbotRAGToneChatbotClassification, SynPerChatbotRAGToneUserClassification, SynPerChatbotRAGTopicsRetrieval, SynPerChatbotSatisfactionLevelClassification, SynPerChatbotSumSRetrieval, SynPerChatbotToneChatbotClassification, 
SynPerChatbotToneUserClassification, SynPerChatbotTopicsRetrieval, SynPerQAPC, SynPerQARetrieval, SynPerSTS, SynPerTextKeywordsPC, SynPerTextToneClassification, SynPerTextToneClassification.v3, SyntecReranking, SyntecRetrieval, SyntheticText2SQL, T2Reranking, T2Retrieval, TERRa, TNews, TRECCOVID, TRECCOVID-Fa, TRECCOVID-Fa.v2, TRECCOVID-NL, TRECCOVID-PL, TRECCOVID-VN, TV2Nordretrieval, TVPLRetrieval, Tatoeba, TempReasonL1, TempReasonL2Context, TempReasonL2Fact, TempReasonL2Pure, TempReasonL3Context, TempReasonL3Fact, TempReasonL3Pure, TenKGnadClusteringP2P, TenKGnadClusteringS2S, ThuNewsClusteringP2P, ThuNewsClusteringS2S, Touche2020-Fa, Touche2020-NL, Touche2020-VN, Touche2020Retrieval.v3, ToxicChatClassification, ToxicConversationsClassification, ToxicConversationsVNClassification, TswanaNewsClassification, TweetSentimentClassification, TweetSentimentExtractionClassification, TweetSentimentExtractionVNClassification, TweetTopicSingleClassification, TwentyNewsgroupsClustering-VN, TwentyNewsgroupsClustering.v2, TwitterHjerneRetrieval, TwitterSemEval2015, TwitterSemEval2015-VN, TwitterURLCorpus, TwitterURLCorpus-VN, UrduRomanSentimentClassification, VABBClusteringP2P, VABBClusteringS2S, VABBMultiLabelClassification, VABBRetrieval, VGHierarchicalClusteringP2P, VGHierarchicalClusteringS2S, VaccinChatNLClassification, VideoRetrieval, Vidore2BioMedicalLecturesRetrieval, Vidore2ESGReportsHLRetrieval, Vidore2ESGReportsRetrieval, Vidore2EconomicsReportsRetrieval, Vidore3ComputerScienceRetrieval, Vidore3ComputerScienceRetrieval.v2, Vidore3EnergyRetrieval, Vidore3EnergyRetrieval.v2, Vidore3FinanceEnRetrieval, Vidore3FinanceEnRetrieval.v2, Vidore3FinanceFrRetrieval, Vidore3FinanceFrRetrieval.v2, Vidore3HrRetrieval, Vidore3HrRetrieval.v2, Vidore3IndustrialRetrieval, Vidore3IndustrialRetrieval.v2, Vidore3NuclearRetrieval, Vidore3PharmaceuticalsRetrieval, Vidore3PharmaceuticalsRetrieval.v2, Vidore3PhysicsRetrieval, Vidore3PhysicsRetrieval.v2, Vidore3TelecomRetrieval, 
VidoreArxivQARetrieval, VidoreDocVQARetrieval, VidoreInfoVQARetrieval, VidoreShiftProjectRetrieval, VidoreSyntheticDocQAAIRetrieval, VidoreSyntheticDocQAEnergyRetrieval, VidoreSyntheticDocQAGovernmentReportsRetrieval, VidoreSyntheticDocQAHealthcareIndustryRetrieval, VidoreTabfquadRetrieval, VidoreTatdqaRetrieval, VieQuADRetrieval, VoyageMMarcoReranking, WRIMEClassification, Waimai, WebFAQBitextMiningQAs, WebFAQBitextMiningQuestions, WebFAQRetrieval, WebLINXCandidatesReranking, WebQAT2TRetrieval, WikiCitiesClustering, WikiClusteringP2P.v2, WikipediaRerankingMultilingual, WikipediaRetrievalMultilingual, WinoGrande, WisesightSentimentClassification.v2, WongnaiReviewsClassification, XLWICNLPairClassification, XMarket, XNLI, XPQARetrieval, XQuADRetrieval, ZacLegalTextRetrieval, bBSARDNLRetrieval, indonli, mMARCO-NL

Results for AITeamVN/Vietnamese_Embedding

| task_name | AITeamVN/Vietnamese_Embedding | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| AmazonCounterfactualVNClassification | 0.6197 | 0.6878 | BAAI/bge-multilingual-gemma2 | False |
| AmazonPolarityVNClassification | 0.8878 | 0.9057 | intfloat/e5-mistral-7b-instruct | False |
| AmazonReviewsVNClassification | 0.4448 | 0.4508 | intfloat/e5-mistral-7b-instruct | False |
| ArguAna-VN | 0.3793 | 0.5275 | Alibaba-NLP/gte-multilingual-base | False |
| AskUbuntuDupQuestions-VN | 0.6187 | 0.6265 | intfloat/e5-mistral-7b-instruct | False |
| BIOSSES-VN | 0.7814 | 0.8445 | Alibaba-NLP/gte-multilingual-base | False |
| Banking77VNClassification | 0.7921 | 0.8929 | BAAI/bge-multilingual-gemma2 | False |
| CQADupstackAndroid-VN | 0.4193 | 0.4682 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackGis-VN | 0.3191 | 0.3518 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackMathematica-VN | 0.2144 | 0.2526 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackPhysics-VN | 0.3552 | 0.3908 | Alibaba-NLP/gte-multilingual-base | False |
| CQADupstackProgrammers-VN | 0.3271 | 0.4042 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackStats-VN | 0.2686 | 0.2955 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackTex-VN | 0.2478 | 0.2810 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackUnix-VN | 0.3452 | 0.3994 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWebmasters-VN | 0.3167 | 0.3859 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWordpress-VN | 0.2474 | 0.3162 | intfloat/e5-mistral-7b-instruct | False |
| EmotionVNClassification | 0.4453 | 0.5023 | BAAI/bge-multilingual-gemma2 | False |
| FiQA2018-VN | 0.2994 | 0.3288 | Alibaba-NLP/gte-multilingual-base | False |
| GreenNodeTableMarkdownRetrieval | 0.3972 | 0.3502 | intfloat/e5-mistral-7b-instruct | False |
| ImdbVNClassification | 0.8306 | 0.8654 | intfloat/e5-mistral-7b-instruct | False |
| MTOPDomainVNClassification | 0.8554 | 0.9166 | BAAI/bge-multilingual-gemma2 | False |
| MTOPIntentVNClassification | 0.5801 | 0.7572 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentVNClassification | 0.6774 | 0.7259 | BAAI/bge-multilingual-gemma2 | False |
| MassiveScenarioVNClassification | 0.7285 | 0.7648 | BAAI/bge-multilingual-gemma2 | False |
| NFCorpus-VN | 0.2539 | 0.3197 | intfloat/e5-mistral-7b-instruct | False |
| NanoClimateFEVER-VN | 0.2513 | nan | | False |
| NanoDBPedia-VN | 0.4869 | nan | | False |
| NanoFEVER-VN | 0.8050 | nan | | False |
| NanoHotpotQA-VN | 0.8531 | nan | | False |
| NanoMSMARCO-VN | 0.7961 | nan | | False |
| NanoNQ-VN | 0.7969 | nan | | False |
| Quora-VN | 0.6100 | 0.5668 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClustering-VN | 0.4389 | 0.4991 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClusteringP2P-VN | 0.5616 | 0.5975 | Alibaba-NLP/gte-multilingual-base | False |
| SCIDOCS-VN | 0.1303 | 0.1523 | intfloat/e5-mistral-7b-instruct | False |
| SICK-R-VN | 0.7711 | 0.7791 | intfloat/e5-mistral-7b-instruct | False |
| STSBenchmark-VN | 0.7719 | 0.8258 | Alibaba-NLP/gte-multilingual-base | False |
| SciDocsRR-VN | 0.7996 | 0.8418 | Alibaba-NLP/gte-multilingual-base | False |
| SciFact-VN | 0.5512 | 0.6562 | Alibaba-NLP/gte-multilingual-base | False |
| SprintDuplicateQuestions-VN | 0.9568 | 0.9734 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClustering-VN | 0.5724 | 0.6080 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClusteringP2P-VN | 0.3147 | 0.4146 | intfloat/e5-mistral-7b-instruct | False |
| StackOverflowDupQuestions-VN | 0.5028 | 0.5172 | intfloat/e5-mistral-7b-instruct | False |
| TRECCOVID-VN | 0.2732 | 0.7742 | intfloat/e5-mistral-7b-instruct | False |
| TVPLRetrieval | 0.7915 | nan | | False |
| Touche2020-VN | 0.1198 | 0.2592 | intfloat/e5-mistral-7b-instruct | False |
| ToxicConversationsVNClassification | 0.6667 | 0.7319 | BAAI/bge-multilingual-gemma2 | False |
| TweetSentimentExtractionVNClassification | 0.5576 | 0.6113 | BAAI/bge-multilingual-gemma2 | False |
| TwentyNewsgroupsClustering-VN | 0.3928 | 0.4556 | intfloat/e5-mistral-7b-instruct | False |
| TwitterSemEval2015-VN | 0.6824 | 0.7332 | intfloat/e5-mistral-7b-instruct | False |
| TwitterURLCorpus-VN | 0.8500 | 0.8698 | intfloat/e5-mistral-7b-instruct | False |
| VieQuADRetrieval | 0.5564 | 0.5459 | GritLM/GritLM-7B | False |
| ZacLegalTextRetrieval | 0.8797 | nan | | False |
| Average | 0.5443 | 0.5745 | nan | - |

The model achieves the top result on these tasks: Quora-VN, VieQuADRetrieval, GreenNodeTableMarkdownRetrieval
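The bot's "Average" row and its list of top-performing tasks can be reproduced from the table with a short script. Below is a minimal sketch in plain Python, not the bot's actual implementation: a few `(score, max_result)` rows are hard-coded from the table above for illustration, and `nan` entries (tasks with no reference result) are skipped via `math.isnan`.

```python
import math

# A few (model score, best known result) pairs copied from the table above;
# the max result is NaN when no reference model has a score for the task.
rows = {
    "Quora-VN": (0.6100, 0.5668),
    "VieQuADRetrieval": (0.5564, 0.5459),
    "GreenNodeTableMarkdownRetrieval": (0.3972, 0.3502),
    "TRECCOVID-VN": (0.2732, 0.7742),
    "TVPLRetrieval": (0.7915, math.nan),
}

# The "Average" row is the mean of the submitted model's scores.
average = sum(score for score, _ in rows.values()) / len(rows)

# Tasks where the submitted model beats the best known result; rows with
# a NaN max have nothing to compare against and are excluded.
top_tasks = [
    task
    for task, (score, max_result) in rows.items()
    if not math.isnan(max_result) and score > max_result
]
```

On this subset, `top_tasks` recovers exactly the three tasks the bot highlights for AITeamVN/Vietnamese_Embedding.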


Results for Alibaba-NLP/gte-Qwen2-7B-instruct

| task_name | Alibaba-NLP/gte-Qwen2-7B-instruct | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| BrightBiologyRetrieval | 0.2847 | 0.0174 | 0.3387 | lightonai/Reason-ModernColBERT | False |
| BrightEarthScienceRetrieval | 0.3508 | 0.1506 | 0.4170 | lightonai/Reason-ModernColBERT | False |
| BrightEconomicsRetrieval | 0.1757 | 0.0706 | 0.2455 | lightonai/Reason-ModernColBERT | False |
| BrightPsychologyRetrieval | 0.2647 | 0.0879 | 0.3104 | lightonai/Reason-ModernColBERT | False |
| BrightRoboticsRetrieval | 0.148 | 0.1112 | 0.2181 | lightonai/Reason-ModernColBERT | False |
| BrightStackoverflowRetrieval | 0.1619 | 0.0694 | 0.2425 | lightonai/Reason-ModernColBERT | False |
| Average | 0.231 | 0.0845 | 0.2954 | nan | - |

Results for BAAI/bge-large-en-v1.5

| task_name | BAAI/bge-large-en-v1.5 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| BrightAopsRetrieval | 0.0600 | 0.0722 | 0.0825 | lightonai/Reason-ModernColBERT | False |
| BrightBiologyLongRetrieval | 0.1642 | 0.0194 | nan | | False |
| BrightBiologyRetrieval | 0.1167 | 0.0174 | 0.3387 | lightonai/Reason-ModernColBERT | False |
| BrightEarthScienceLongRetrieval | 0.2773 | 0.2155 | nan | | False |
| BrightEarthScienceRetrieval | 0.2456 | 0.1506 | 0.4170 | lightonai/Reason-ModernColBERT | False |
| BrightEconomicsLongRetrieval | 0.2087 | 0.1359 | nan | | False |
| BrightEconomicsRetrieval | 0.1661 | 0.0706 | 0.2455 | lightonai/Reason-ModernColBERT | False |
| BrightLeetcodeRetrieval | 0.2668 | 0.2787 | 0.3086 | lightonai/Reason-ModernColBERT | False |
| BrightPonyLongRetrieval | 0.0036 | 0.0234 | nan | | False |
| BrightPonyRetrieval | 0.0572 | 0.1302 | 0.0873 | lightonai/Reason-ModernColBERT | False |
| BrightPsychologyLongRetrieval | 0.1158 | 0.0594 | nan | | False |
| BrightPsychologyRetrieval | 0.1746 | 0.0879 | 0.3104 | lightonai/Reason-ModernColBERT | False |
| BrightRoboticsLongRetrieval | 0.1089 | 0.0792 | nan | | False |
| BrightRoboticsRetrieval | 0.1171 | 0.1112 | 0.2181 | lightonai/Reason-ModernColBERT | False |
| BrightStackoverflowLongRetrieval | 0.1325 | 0.1581 | nan | | False |
| BrightStackoverflowRetrieval | 0.1083 | 0.0694 | 0.2425 | lightonai/Reason-ModernColBERT | False |
| BrightSustainableLivingLongRetrieval | 0.1690 | 0.0810 | nan | | False |
| BrightSustainableLivingRetrieval | 0.1333 | 0.0961 | 0.2021 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQAQuestionsRetrieval | 0.1300 | 0.1296 | 0.1833 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQATheoremsRetrieval | 0.0690 | 0.0549 | 0.0929 | lightonai/Reason-ModernColBERT | False |
| Average | 0.1412 | 0.1020 | 0.2274 | nan | - |

Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, DuRetrieval, MLQARetrieval, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoNQ-VN, NanoNQRetrieval


Results for BAAI/bge-m3

| task_name | BAAI/bge-m3 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| BibleNLPBitextMining | 0.9805 | 0.9897 | 0.9819 | 0.9899 | deepvk/USER-bge-m3 | False |
| BrightAopsRetrieval | 0.0456 | nan | 0.0722 | 0.0825 | lightonai/Reason-ModernColBERT | False |
| BrightBiologyLongRetrieval | 0.1472 | nan | 0.0194 | nan | | False |
| BrightBiologyRetrieval | 0.0948 | nan | 0.0174 | 0.3387 | lightonai/Reason-ModernColBERT | False |
| BrightEarthScienceLongRetrieval | 0.2083 | nan | 0.2155 | nan | | False |
| BrightEarthScienceRetrieval | 0.1539 | nan | 0.1506 | 0.4170 | lightonai/Reason-ModernColBERT | False |
| BrightEconomicsLongRetrieval | 0.1311 | nan | 0.1359 | nan | | False |
| BrightEconomicsRetrieval | 0.1188 | nan | 0.0706 | 0.2455 | lightonai/Reason-ModernColBERT | False |
| BrightLeetcodeRetrieval | 0.2477 | nan | 0.2787 | 0.3086 | lightonai/Reason-ModernColBERT | False |
| BrightPonyLongRetrieval | 0.0046 | nan | 0.0234 | nan | | False |
| BrightPonyRetrieval | 0.1517 | nan | 0.1302 | 0.0873 | lightonai/Reason-ModernColBERT | False |
| BrightPsychologyLongRetrieval | 0.1931 | nan | 0.0594 | nan | | False |
| BrightPsychologyRetrieval | 0.1326 | nan | 0.0879 | 0.3104 | lightonai/Reason-ModernColBERT | False |
| BrightRoboticsLongRetrieval | 0.1238 | nan | 0.0792 | nan | | False |
| BrightRoboticsRetrieval | 0.1215 | nan | 0.1112 | 0.2181 | lightonai/Reason-ModernColBERT | False |
| BrightStackoverflowLongRetrieval | 0.0897 | nan | 0.1581 | nan | | False |
| BrightStackoverflowRetrieval | 0.1063 | nan | 0.0694 | 0.2425 | lightonai/Reason-ModernColBERT | False |
| BrightSustainableLivingLongRetrieval | 0.1685 | nan | 0.0810 | nan | | False |
| BrightSustainableLivingRetrieval | 0.1017 | nan | 0.0961 | 0.2021 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQAQuestionsRetrieval | 0.1262 | nan | 0.1296 | 0.1833 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQATheoremsRetrieval | 0.0434 | nan | 0.0549 | 0.0929 | lightonai/Reason-ModernColBERT | False |
| MIRACLReranking | 0.6469 | nan | 0.6544 | 0.6753 | ai-sage/Giga-Embeddings-instruct | True |
| MIRACLRetrieval | 0.7107 | nan | 0.5901 | 0.7713 | tencent/KaLM-Embedding-Gemma3-12B-2511 | True |
| MIRACLRetrievalHardNegatives.v2 | 0.6996 | nan | 0.5333 | 0.7743 | tencent/KaLM-Embedding-Gemma3-12B-2511 | True |
| MLSUMClusteringP2P | 0.4528 | nan | 0.4631 | 0.5175 | Salesforce/SFR-Embedding-2_R | False |
| MLSUMClusteringS2S | 0.4562 | nan | 0.4681 | 0.5122 | Salesforce/SFR-Embedding-2_R | False |
| MTOPDomainClassification | 0.8855 | 0.9679 | 0.8988 | 0.9679 | google/gemini-embedding-001 | False |
| MTOPIntentClassification | 0.6610 | nan | 0.6720 | 0.8844 | tencent/KaLM-Embedding-Gemma3-12B-2511 | False |
| MintakaRetrieval | 0.2190 | nan | 0.3037 | 0.4977 | openai/text-embedding-3-large | False |
| MrTidyRetrieval | 0.7261 | nan | 0.6509 | 0.6603 | infly/inf-retriever-v1-1.5b | True |
| MultiLongDocReranking | 0.7790 | nan | 0.8887 | 0.9338 | cl-nagoya/ruri-v3-310m | False |
| MultiLongDocRetrieval | 0.4216 | nan | 0.3175 | 0.4626 | cl-nagoya/ruri-v3-30m | False |
| MultilingualSentimentClassification | 0.7623 | nan | nan | 0.8219 | deepvk/USER-base | False |
| STS22 | 0.6789 | nan | 0.6823 | 0.7263 | Qwen/Qwen3-Embedding-8B | False |
| SpanishNewsClassification.v2 | 0.8825 | nan | 0.8862 | 0.2859 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | False |
| SpanishPassageRetrievalS2P | 0.4402 | nan | 0.4196 | 0.0144 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | False |
| SpanishPassageRetrievalS2S | 0.7037 | nan | 0.7232 | 0.7516 | intfloat/e5-mistral-7b-instruct | False |
| SpanishSentimentClassification.v2 | 0.9533 | nan | 0.9241 | 0.5058 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | False |
| WebFAQRetrieval | 0.7646 | nan | 0.7596 | 0.7882 | Snowflake/snowflake-arctic-embed-l-v2.0 | False |
| XPQARetrieval | 0.5298 | nan | 0.5073 | 0.6494 | openai/text-embedding-3-large | False |
| XQuADRetrieval | 0.9577 | nan | 0.9703 | 0.9723 | infly/inf-retriever-v1 | False |
| Average | 0.4103 | 0.9788 | 0.3834 | 0.5119 | nan | - |

The model achieves the top result on these tasks: MrTidyRetrieval, SpanishSentimentClassification.v2, SpanishNewsClassification.v2, BrightPonyRetrieval, SpanishPassageRetrievalS2P

Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL
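The "In Training Data" column appears to follow from comparing each evaluated task against the model's declared training datasets, as listed above. A minimal sketch of that check in plain Python, assuming a simple name-overlap rule (the bot's real logic may also resolve dataset aliases and HardNegatives variants); the names below are a small subset copied from the lists above:

```python
# Training datasets declared in the model's metadata (subset).
training_data = {"MIRACLRetrieval", "MrTidyRetrieval", "MSMARCO", "NQ", "HotpotQA"}

# Tasks the model was evaluated on (subset from the results table).
evaluated = ["MIRACLRetrieval", "MrTidyRetrieval", "STS22", "XPQARetrieval"]

# Flag each evaluated task that also appears among the training datasets,
# mirroring the True/False values in the "In Training Data" column.
in_training = {task: task in training_data for task in evaluated}
```

For BAAI/bge-m3 this reproduces the table's flags on these rows: MIRACLRetrieval and MrTidyRetrieval are marked True, STS22 and XPQARetrieval False.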


Results for GreenNode/GreenNode-Embedding-E5-Large-VN-V1

| task_name | GreenNode/GreenNode-Embedding-E5-Large-VN-V1 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| AmazonCounterfactualVNClassification | 0.5249 | 0.6878 | BAAI/bge-multilingual-gemma2 | False |
| AmazonPolarityVNClassification | 0.7334 | 0.9057 | intfloat/e5-mistral-7b-instruct | False |
| AmazonReviewsVNClassification | 0.3682 | 0.4508 | intfloat/e5-mistral-7b-instruct | False |
| ArguAna-VN | 0.3893 | 0.5275 | Alibaba-NLP/gte-multilingual-base | False |
| AskUbuntuDupQuestions-VN | 0.6038 | 0.6265 | intfloat/e5-mistral-7b-instruct | False |
| BIOSSES-VN | 0.7676 | 0.8445 | Alibaba-NLP/gte-multilingual-base | False |
| Banking77VNClassification | 0.7692 | 0.8929 | BAAI/bge-multilingual-gemma2 | False |
| CQADupstackAndroid-VN | 0.4699 | 0.4682 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackGis-VN | 0.3438 | 0.3518 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackMathematica-VN | 0.2275 | 0.2526 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackPhysics-VN | 0.4082 | 0.3908 | Alibaba-NLP/gte-multilingual-base | False |
| CQADupstackProgrammers-VN | 0.3818 | 0.4042 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackStats-VN | 0.2827 | 0.2955 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackTex-VN | 0.2561 | 0.2810 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackUnix-VN | 0.3595 | 0.3994 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWebmasters-VN | 0.3522 | 0.3859 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWordpress-VN | 0.2702 | 0.3162 | intfloat/e5-mistral-7b-instruct | False |
| EmotionVNClassification | 0.4119 | 0.5023 | BAAI/bge-multilingual-gemma2 | False |
| FiQA2018-VN | 0.3174 | 0.3288 | Alibaba-NLP/gte-multilingual-base | False |
| GreenNodeTableMarkdownRetrieval | 0.3567 | 0.3502 | intfloat/e5-mistral-7b-instruct | True |
| ImdbVNClassification | 0.7393 | 0.8654 | intfloat/e5-mistral-7b-instruct | False |
| MTOPDomainVNClassification | 0.8543 | 0.9166 | BAAI/bge-multilingual-gemma2 | False |
| MTOPIntentVNClassification | 0.5331 | 0.7572 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentVNClassification | 0.6460 | 0.7259 | BAAI/bge-multilingual-gemma2 | False |
| MassiveScenarioVNClassification | 0.7129 | 0.7648 | BAAI/bge-multilingual-gemma2 | False |
| NFCorpus-VN | 0.3123 | 0.3197 | intfloat/e5-mistral-7b-instruct | False |
| NanoClimateFEVER-VN | 0.3096 | nan | | False |
| NanoDBPedia-VN | 0.4699 | nan | | False |
| NanoFEVER-VN | 0.9244 | nan | | True |
| NanoHotpotQA-VN | 0.7323 | nan | | True |
| NanoMSMARCO-VN | 0.7998 | nan | | True |
| NanoNQ-VN | 0.8060 | nan | | True |
| Quora-VN | 0.6166 | 0.5668 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClustering-VN | 0.4988 | 0.4991 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClusteringP2P-VN | 0.5682 | 0.5975 | Alibaba-NLP/gte-multilingual-base | False |
| SCIDOCS-VN | 0.1659 | 0.1523 | intfloat/e5-mistral-7b-instruct | False |
| SICK-R-VN | 0.7809 | 0.7791 | intfloat/e5-mistral-7b-instruct | False |
| STSBenchmark-VN | 0.8092 | 0.8258 | Alibaba-NLP/gte-multilingual-base | False |
| SciDocsRR-VN | 0.8402 | 0.8418 | Alibaba-NLP/gte-multilingual-base | False |
| SciFact-VN | 0.5846 | 0.6562 | Alibaba-NLP/gte-multilingual-base | False |
| SprintDuplicateQuestions-VN | 0.9484 | 0.9734 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClustering-VN | 0.6150 | 0.6080 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClusteringP2P-VN | 0.3186 | 0.4146 | intfloat/e5-mistral-7b-instruct | False |
| StackOverflowDupQuestions-VN | 0.4971 | 0.5172 | intfloat/e5-mistral-7b-instruct | False |
| TRECCOVID-VN | 0.6439 | 0.7742 | intfloat/e5-mistral-7b-instruct | False |
| TVPLRetrieval | 0.8646 | nan | | False |
| Touche2020-VN | 0.2368 | 0.2592 | intfloat/e5-mistral-7b-instruct | False |
| ToxicConversationsVNClassification | 0.6017 | 0.7319 | BAAI/bge-multilingual-gemma2 | False |
| TweetSentimentExtractionVNClassification | 0.4940 | 0.6113 | BAAI/bge-multilingual-gemma2 | False |
| TwentyNewsgroupsClustering-VN | 0.4605 | 0.4556 | intfloat/e5-mistral-7b-instruct | False |
| TwitterSemEval2015-VN | 0.6682 | 0.7332 | intfloat/e5-mistral-7b-instruct | False |
| TwitterURLCorpus-VN | 0.8321 | 0.8698 | intfloat/e5-mistral-7b-instruct | False |
| VieQuADRetrieval | 0.5217 | 0.5459 | GritLM/GritLM-7B | False |
| Average | 0.5472 | 0.5745 | nan | - |

Model has high performance on these tasks: SICK-R-VN, StackExchangeClustering-VN, Quora-VN, CQADupstackAndroid-VN, TwentyNewsgroupsClustering-VN, CQADupstackPhysics-VN, GreenNodeTableMarkdownRetrieval, SCIDOCS-VN

Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, GreenNodeTableMarkdownRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
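For rows marked True above, the "In Training Data" column signals that a task (or a close variant of it) also appears in the model's training list. The exact matching rule used by the report generator isn't shown here; the following is a hypothetical sketch of such a check, assuming a simple match on the task name with an optional trailing language suffix (e.g. "-VN") stripped:

```python
def in_training_data(task_name: str, training_datasets: set[str]) -> bool:
    """Hypothetical check behind an 'In Training Data' flag: a task counts as
    seen if the training list contains the task name itself, or the task name
    with its trailing language suffix (e.g. '-VN') removed."""
    if task_name in training_datasets:
        return True
    # "NQ-VN" -> base "NQ"; names without a dash have an empty separator.
    base, sep, _suffix = task_name.rpartition("-")
    return bool(sep) and base in training_datasets
```

Under this rule, GreenNodeTableMarkdownRetrieval and NanoMSMARCO-VN both match the training list above, while TVPLRetrieval does not.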


Results for GreenNode/GreenNode-Embedding-KaLM-Mini-Instruct-VN-V1

task_name GreenNode/GreenNode-Embedding-KaLM-Mini-Instruct-VN-V1 Max result Model with max result In Training Data
AmazonCounterfactualVNClassification 0.5354 0.6878 BAAI/bge-multilingual-gemma2 True
AmazonPolarityVNClassification 0.7110 0.9057 intfloat/e5-mistral-7b-instruct True
AmazonReviewsVNClassification 0.3480 0.4508 intfloat/e5-mistral-7b-instruct True
ArguAna-VN 0.3424 0.5275 Alibaba-NLP/gte-multilingual-base False
AskUbuntuDupQuestions-VN 0.5575 0.6265 intfloat/e5-mistral-7b-instruct False
BIOSSES-VN 0.7534 0.8445 Alibaba-NLP/gte-multilingual-base False
Banking77VNClassification 0.7833 0.8929 BAAI/bge-multilingual-gemma2 True
CQADupstackAndroid-VN 0.3856 0.4682 intfloat/e5-mistral-7b-instruct False
CQADupstackGis-VN 0.3022 0.3518 intfloat/e5-mistral-7b-instruct False
CQADupstackMathematica-VN 0.1842 0.2526 intfloat/e5-mistral-7b-instruct False
CQADupstackPhysics-VN 0.3374 0.3908 Alibaba-NLP/gte-multilingual-base False
CQADupstackProgrammers-VN 0.3070 0.4042 intfloat/e5-mistral-7b-instruct False
CQADupstackStats-VN 0.2558 0.2955 intfloat/e5-mistral-7b-instruct False
CQADupstackTex-VN 0.1982 0.2810 intfloat/e5-mistral-7b-instruct False
CQADupstackUnix-VN 0.2833 0.3994 intfloat/e5-mistral-7b-instruct False
CQADupstackWebmasters-VN 0.2964 0.3859 intfloat/e5-mistral-7b-instruct False
CQADupstackWordpress-VN 0.1841 0.3162 intfloat/e5-mistral-7b-instruct False
EmotionVNClassification 0.3361 0.5023 BAAI/bge-multilingual-gemma2 True
FiQA2018-VN 0.2166 0.3288 Alibaba-NLP/gte-multilingual-base True
GreenNodeTableMarkdownRetrieval 0.3642 0.3502 intfloat/e5-mistral-7b-instruct True
ImdbVNClassification 0.7225 0.8654 intfloat/e5-mistral-7b-instruct True
MTOPDomainVNClassification 0.8584 0.9166 BAAI/bge-multilingual-gemma2 True
MTOPIntentVNClassification 0.6424 0.7572 BAAI/bge-multilingual-gemma2 True
MassiveIntentVNClassification 0.6693 0.7259 BAAI/bge-multilingual-gemma2 True
MassiveScenarioVNClassification 0.7062 0.7648 BAAI/bge-multilingual-gemma2 True
NFCorpus-VN 0.2626 0.3197 intfloat/e5-mistral-7b-instruct True
NanoClimateFEVER-VN 0.2604 nan False
NanoDBPedia-VN 0.4128 nan True
NanoFEVER-VN 0.6641 nan True
NanoHotpotQA-VN 0.6926 nan True
NanoMSMARCO-VN 0.6692 nan True
NanoNQ-VN 0.6812 nan True
Quora-VN 0.5072 0.5668 Alibaba-NLP/gte-multilingual-base False
RedditClustering-VN 0.4341 0.4991 Alibaba-NLP/gte-multilingual-base False
RedditClusteringP2P-VN 0.5671 0.5975 Alibaba-NLP/gte-multilingual-base False
SCIDOCS-VN 0.1279 0.1523 intfloat/e5-mistral-7b-instruct False
SICK-R-VN 0.7003 0.7791 intfloat/e5-mistral-7b-instruct False
STSBenchmark-VN 0.6984 0.8258 Alibaba-NLP/gte-multilingual-base False
SciDocsRR-VN 0.7950 0.8418 Alibaba-NLP/gte-multilingual-base False
SciFact-VN 0.5518 0.6562 Alibaba-NLP/gte-multilingual-base True
SprintDuplicateQuestions-VN 0.9173 0.9734 Alibaba-NLP/gte-multilingual-base False
StackExchangeClustering-VN 0.5333 0.6080 Alibaba-NLP/gte-multilingual-base False
StackExchangeClusteringP2P-VN 0.3264 0.4146 intfloat/e5-mistral-7b-instruct False
StackOverflowDupQuestions-VN 0.4517 0.5172 intfloat/e5-mistral-7b-instruct False
TRECCOVID-VN 0.6371 0.7742 intfloat/e5-mistral-7b-instruct True
TVPLRetrieval 0.8420 nan False
Touche2020-VN 0.2353 0.2592 intfloat/e5-mistral-7b-instruct False
ToxicConversationsVNClassification 0.5608 0.7319 BAAI/bge-multilingual-gemma2 True
TweetSentimentExtractionVNClassification 0.4755 0.6113 BAAI/bge-multilingual-gemma2 True
TwentyNewsgroupsClustering-VN 0.4110 0.4556 intfloat/e5-mistral-7b-instruct False
TwitterSemEval2015-VN 0.6202 0.7332 intfloat/e5-mistral-7b-instruct False
TwitterURLCorpus-VN 0.8295 0.8698 intfloat/e5-mistral-7b-instruct False
VieQuADRetrieval 0.4847 0.5459 GritLM/GritLM-7B False
Average 0.5025 0.5745 nan -

Model has high performance on these tasks: GreenNodeTableMarkdownRetrieval

Training datasets: ATEC, AmazonCounterfactualClassification, AmazonCounterfactualVNClassification, AmazonPolarityClassification, AmazonPolarityClassification.v2, AmazonPolarityVNClassification, AmazonReviewsClassification, AmazonReviewsVNClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArxivClusteringP2P, ArxivClusteringP2P.v2, ArxivClusteringS2S, BQ, Banking77Classification, Banking77Classification.v2, Banking77VNClassification, BiorxivClusteringP2P, BiorxivClusteringP2P.v2, BiorxivClusteringS2S, BiorxivClusteringS2S.v2, CQADupstack, CodeFeedbackMT, CodeFeedbackST, ContractNLIConfidentialityOfAgreementLegalBenchClassification, ContractNLIExplicitIdentificationLegalBenchClassification, ContractNLIInclusionOfVerballyConveyedInformationLegalBenchClassification, ContractNLILimitedUseLegalBenchClassification, ContractNLINoLicensingLegalBenchClassification, `ContractNLINoticeOnCompell


Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.

"manhattan_spearman": 0.425955,
"euclidean_pearson": 0.327813,
"euclidean_spearman": 0.412371,
"main_score": 0.412371,
Member

@Samoed Samoed Mar 5, 2026

Hm. Very big gap between v1 and v2: 0.79 vs 0.41.
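A regression like this can be surfaced mechanically by diffing the main_score values between two result files before upload. A minimal sketch (not part of this PR; it only assumes the nested dict/list shape of MTEB result JSON, with main_score keys somewhere inside):

```python
def collect_main_scores(node, prefix=()):
    """Recursively yield (path, value) for every numeric 'main_score' key in a
    nested result structure made of dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "main_score" and isinstance(value, (int, float)):
                yield prefix + (key,), float(value)
            else:
                yield from collect_main_scores(value, prefix + (key,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from collect_main_scores(item, prefix + (str(i),))


def flag_gaps(old_result, new_result, threshold=0.1):
    """Return {path: (old, new)} for main_score entries present in both results
    whose absolute difference exceeds `threshold`."""
    old_scores = dict(collect_main_scores(old_result))
    new_scores = dict(collect_main_scores(new_result))
    return {
        path: (old_scores[path], new_scores[path])
        for path in old_scores.keys() & new_scores.keys()
        if abs(old_scores[path] - new_scores[path]) > threshold
    }
```

Loading the v1 and v2 JSON for this task (e.g. with json.load) and passing them to flag_gaps would flag the 0.79 vs 0.41 drop immediately.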

Contributor Author

That's a lot. I don't know if they updated the model weights between these two versions.

Member

Maybe this is caused by embeddings-benchmark/mteb#4085. Can you try to rerun this task with "mteb<2.9.0"?

Contributor Author

Hi @Samoed, with my resource constraints I don't think I can rerun the task anymore.

"cv_recall_at_20": 0.16556,
"cv_recall_at_100": 0.28146,
"cv_recall_at_1000": 0.51656,
"main_score": 0.07924,
Member

And here too, but I don't understand how they got these results in the first place. The results for bge-m3 are the same.

@KennethEnevoldsen
Contributor

Seems like this one has gotten stale - @BaoLocPham, do you need help rerunning any of the tasks?

@BaoLocPham
Contributor Author

Seems like this one has gotten stale - @BaoLocPham, do you need help rerunning any of the tasks?

Hi @KennethEnevoldsen and @Samoed, I need help rerunning the non-Vietnamese text embedding models in this commit, because apparently the Vietnamese text embedding models don't show a big gap between versions.
The Vietnamese text embedding models are:

  • AITeamVN/Vietnamese_Embedding
  • bkai-foundation-models/vietnamese-bi-encoder
  • contextboxai/halong_embedding
  • GreenNode/GreenNode-Embedding-E5-Large-VN-V1
  • GreenNode/GreenNode-Embedding-KaLM-Mini-Instruct-VN-V1
  • GreenNode/GreenNode-Embedding-Large-VN-Mixed-V1
  • GreenNode/GreenNode-Embedding-Large-VN-V1
  • VoVanPhuc/sup-SimCSE-VietNamese-phobert-base

@KennethEnevoldsen KennethEnevoldsen changed the title [ADD] results VN-MTEB Add results for VN-MTEB Mar 23, 2026
@KennethEnevoldsen
Contributor

@BaoLocPham I would be fine helping you with these, but may I ask you to create an issue for it and remove all the invalid results here? (That way we can merge what is valid.)

@BaoLocPham
Contributor Author

@BaoLocPham I would be fine helping you with these, but may I ask you to create an issue for it and remove all the invalid results here? (That way we can merge what is valid.)

Okay. I'll create it later.

@BaoLocPham BaoLocPham force-pushed the main branch 2 times, most recently from 7369c96 to 4f1974b Compare April 1, 2026 16:03
bwang-pplx and others added 21 commits April 1, 2026 16:07
…embed-v1-0.6b (embeddings-benchmark#426)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* Add model_meta.json

* Add Vidore2BioMedicalLecturesRetrieval.json

* Add Vidore2ESGReportsHLRetrieval.json

* Add VidoreArxivQARetrieval.json

* Add VidoreDocVQARetrieval.json

* Add VidoreInfoVQARetrieval.json

* Add VidoreShiftProjectRetrieval.json

* Add VidoreSyntheticDocQAAIRetrieval.json

* Add VidoreSyntheticDocQAEnergyRetrieval.json

* Add VidoreSyntheticDocQAGovernmentReportsRetrieval.json

* Add VidoreSyntheticDocQAHealthcareIndustryRetrieval.json

* Add VidoreTabfquadRetrieval.json

* Add VidoreTatdqaRetrieval.json

* Add Vidore2EconomicsReportsRetrieval.json

* Add Vidore2ESGReportsRetrieval.json

* Add Vidore3ComputerScienceRetrieval.v2.json

* Add Vidore3EnergyRetrieval.v2.json

* Add Vidore3FinanceEnRetrieval.v2.json

* Add Vidore3FinanceFrRetrieval.v2.json

* Add Vidore3HrRetrieval.v2.json

* Add Vidore3IndustrialRetrieval.v2.json

* Add Vidore3PharmaceuticalsRetrieval.v2.json

* Add Vidore3PhysicsRetrieval.v2.json

* Update Vidore2ESGReportsHLRetrieval.json with fixed MTEB wrapper results

* Update VidoreArxivQARetrieval.json with fixed MTEB wrapper results

* Update VidoreDocVQARetrieval.json with fixed MTEB wrapper results

* Update VidoreInfoVQARetrieval.json with fixed MTEB wrapper results

* Update VidoreShiftProjectRetrieval.json with fixed MTEB wrapper results

* Update VidoreSyntheticDocQAAIRetrieval.json with fixed MTEB wrapper results

* Update VidoreSyntheticDocQAEnergyRetrieval.json with fixed MTEB wrapper results

* Update VidoreSyntheticDocQAGovernmentReportsRetrieval.json with fixed MTEB wrapper results

* Update VidoreSyntheticDocQAHealthcareIndustryRetrieval.json with fixed MTEB wrapper results

* Update VidoreTabfquadRetrieval.json with fixed MTEB wrapper results

* Update VidoreTatdqaRetrieval.json with fixed MTEB wrapper results

* Update Vidore2BioMedicalLecturesRetrieval.json with fixed MTEB wrapper results

* Update Vidore2EconomicsReportsRetrieval.json with fixed MTEB wrapper results

* Update Vidore2ESGReportsRetrieval.json with fixed MTEB wrapper results

* Update Vidore3ComputerScienceRetrieval.v2.json with fixed MTEB wrapper results

* Update Vidore3EnergyRetrieval.v2.json with fixed MTEB wrapper results

* Update Vidore3FinanceEnRetrieval.v2.json with fixed MTEB wrapper results

* Update Vidore3FinanceFrRetrieval.v2.json with fixed MTEB wrapper results

* Update Vidore3HrRetrieval.v2.json with fixed MTEB wrapper results

* Update Vidore3IndustrialRetrieval.v2.json with fixed MTEB wrapper results

* Update Vidore3PharmaceuticalsRetrieval.v2.json with fixed MTEB wrapper results

* Update Vidore3PhysicsRetrieval.v2.json with fixed MTEB wrapper results
* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* Add files via upload

* f2llm-v2-160m

* f2llm-v2-330m & 0.6B

* f2llm-v2-1.7B & 4B

* f2llm-v2-8B & 14B

* correct model meta file

* reduce sizes

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
* Add Vidore3ComputerScienceRetrieval.json

* Add Vidore3EnergyRetrieval.json

* Add Vidore3FinanceEnRetrieval.json

* Add Vidore3FinanceFrRetrieval.json

* Add Vidore3HrRetrieval.json

* Add Vidore3PharmaceuticalsRetrieval.json

* Add Vidore3PhysicsRetrieval.json

* Add Vidore3IndustrialRetrieval.json

* Fix model_meta.json revision

* Revert model_meta.json to original
…-benchmark#447)

* Add results for nanovdr/NanoVDR-S-Multi on ViDoRe v1/v2/v3

* Update model_meta.json: add n_embedding_parameters and loader
…hmark#454)

* Added MTEB results for potion-base-32m and potion-retrieval-32m

* Update metadata

* Removed external folders
* Create Results PR Comment for Results diff

* Added 1 more results file for wide comparison

* Update argument and docstring in script

* Update command in yaml file

* update yaml file

* Revert results

* remove unnecessary model revision

* Updated table format and added it as a tests

* update table format

* Skip comment if nothing is change

* use TaskResult for results fetching

* Remove PR comment generation part and add to only test

* remove changes in yaml file

* Fix CLI command in comment

* fix cli example

* change main score extraction strategy

* Declare and used MTEB_SCORE_EPSILON variable

* modify results for checking tests

* Update tests/test_results_diff.py

Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>

* Added correct fix

* correct namings

* Delete results/aari1995__German_Semantic_STS_V2/22912542b0ec7a7ef369837e28ffe6352a27afc9/AmazonCounterfactualClassification.json

* Revert "Delete results/aari1995__German_Semantic_STS_V2/22912542b0ec7a7ef369837e28ffe6352a27afc9/AmazonCounterfactualClassification.json"

This reverts commit 9fd4099.

* Moved functions in test file

* correct import

* rollback results file

---------

Co-authored-by: Roman Solomatin <36135455+Samoed@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kenevoldsen@pm.me>
…ngs-benchmark#458)

Add Thai-language task results to support the MTEB(tha, v1) benchmark.
All changes are purely additive — no existing files or scores deleted.
Tasks that already exist in other revisions are skipped to avoid duplicates.

New model (28 Thai tasks):
- voyageai/voyage-4-nano

Thai entries added to 20 existing models (retained revision only):
- New task files where no revision had that task
- Thai subset entries merged into existing multilingual task files

Reference models included: intfloat/multilingual-e5-small,
sentence-transformers/static-similarity-mrl-multilingual-v1,
minishlab/potion-multilingual-128M, mteb/baseline-bm25s

Hardware: LANTA HPC (ThaiSC) — NVIDIA A100-SXM4-40GB
Software: MTEB 2.10.0, sentence-transformers 5.2.3

Co-authored-by: anu <201861+anusoft@users.noreply.github.com>
…hmark#456)

Co-authored-by: Clemente <clemente@Clementes-MacBook-Pro.local>
…s-benchmark#463)

* Add results for microsoft/harrier-oss-v1 (0.6b, 270m, 27b)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update harrier-oss-v1-27b model_meta: remove experiment_kwargs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reduce large FloresBitextMining JSON files for all harrier-oss-v1 variants

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nchmark#466)

* Adding more minishlab/potion-multilingual-128M results

* Adding more minishlab/potion-multilingual-128M results
…ark#469)

* Adding more intfloat/multilingual-e5-small results

* Removing the duplications
@github-actions

This pull request has been automatically marked as stale due to inactivity.

@github-actions github-actions bot added the stale label Apr 16, 2026
@KennethEnevoldsen
Contributor

@BaoLocPham are you still working on this PR?

@BaoLocPham
Contributor Author

@KennethEnevoldsen I'm still working on it; let me remove the files and update my commit later.

@KennethEnevoldsen
Contributor

Great to hear!

@github-actions github-actions bot removed the stale label Apr 17, 2026
@minhnguyent546

Hi, could we also have results for the jinaai/jina-embeddings-v5-text model family? Thanks.
