Add results for VN-MTEB (#430)
Model Results Comparison

Results for AITeamVN/Vietnamese_Embedding
| task_name | AITeamVN/Vietnamese_Embedding | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| AmazonCounterfactualVNClassification | 0.6197 | 0.6878 | BAAI/bge-multilingual-gemma2 | False |
| AmazonPolarityVNClassification | 0.8878 | 0.9057 | intfloat/e5-mistral-7b-instruct | False |
| AmazonReviewsVNClassification | 0.4448 | 0.4508 | intfloat/e5-mistral-7b-instruct | False |
| ArguAna-VN | 0.3793 | 0.5275 | Alibaba-NLP/gte-multilingual-base | False |
| AskUbuntuDupQuestions-VN | 0.6187 | 0.6265 | intfloat/e5-mistral-7b-instruct | False |
| BIOSSES-VN | 0.7814 | 0.8445 | Alibaba-NLP/gte-multilingual-base | False |
| Banking77VNClassification | 0.7921 | 0.8929 | BAAI/bge-multilingual-gemma2 | False |
| CQADupstackAndroid-VN | 0.4193 | 0.4682 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackGis-VN | 0.3191 | 0.3518 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackMathematica-VN | 0.2144 | 0.2526 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackPhysics-VN | 0.3552 | 0.3908 | Alibaba-NLP/gte-multilingual-base | False |
| CQADupstackProgrammers-VN | 0.3271 | 0.4042 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackStats-VN | 0.2686 | 0.2955 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackTex-VN | 0.2478 | 0.2810 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackUnix-VN | 0.3452 | 0.3994 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWebmasters-VN | 0.3167 | 0.3859 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWordpress-VN | 0.2474 | 0.3162 | intfloat/e5-mistral-7b-instruct | False |
| EmotionVNClassification | 0.4453 | 0.5023 | BAAI/bge-multilingual-gemma2 | False |
| FiQA2018-VN | 0.2994 | 0.3288 | Alibaba-NLP/gte-multilingual-base | False |
| GreenNodeTableMarkdownRetrieval | 0.3972 | 0.3502 | intfloat/e5-mistral-7b-instruct | False |
| ImdbVNClassification | 0.8306 | 0.8654 | intfloat/e5-mistral-7b-instruct | False |
| MTOPDomainVNClassification | 0.8554 | 0.9166 | BAAI/bge-multilingual-gemma2 | False |
| MTOPIntentVNClassification | 0.5801 | 0.7572 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentVNClassification | 0.6774 | 0.7259 | BAAI/bge-multilingual-gemma2 | False |
| MassiveScenarioVNClassification | 0.7285 | 0.7648 | BAAI/bge-multilingual-gemma2 | False |
| NFCorpus-VN | 0.2539 | 0.3197 | intfloat/e5-mistral-7b-instruct | False |
| NanoClimateFEVER-VN | 0.2513 | nan | nan | False |
| NanoDBPedia-VN | 0.4869 | nan | nan | False |
| NanoFEVER-VN | 0.8050 | nan | nan | False |
| NanoHotpotQA-VN | 0.8531 | nan | nan | False |
| NanoMSMARCO-VN | 0.7961 | nan | nan | False |
| NanoNQ-VN | 0.7969 | nan | nan | False |
| Quora-VN | 0.6100 | 0.5668 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClustering-VN | 0.4389 | 0.4991 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClusteringP2P-VN | 0.5616 | 0.5975 | Alibaba-NLP/gte-multilingual-base | False |
| SCIDOCS-VN | 0.1303 | 0.1523 | intfloat/e5-mistral-7b-instruct | False |
| SICK-R-VN | 0.7711 | 0.7791 | intfloat/e5-mistral-7b-instruct | False |
| STSBenchmark-VN | 0.7719 | 0.8258 | Alibaba-NLP/gte-multilingual-base | False |
| SciDocsRR-VN | 0.7996 | 0.8418 | Alibaba-NLP/gte-multilingual-base | False |
| SciFact-VN | 0.5512 | 0.6562 | Alibaba-NLP/gte-multilingual-base | False |
| SprintDuplicateQuestions-VN | 0.9568 | 0.9734 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClustering-VN | 0.5724 | 0.6080 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClusteringP2P-VN | 0.3147 | 0.4146 | intfloat/e5-mistral-7b-instruct | False |
| StackOverflowDupQuestions-VN | 0.5028 | 0.5172 | intfloat/e5-mistral-7b-instruct | False |
| TRECCOVID-VN | 0.2732 | 0.7742 | intfloat/e5-mistral-7b-instruct | False |
| TVPLRetrieval | 0.7915 | nan | nan | False |
| Touche2020-VN | 0.1198 | 0.2592 | intfloat/e5-mistral-7b-instruct | False |
| ToxicConversationsVNClassification | 0.6667 | 0.7319 | BAAI/bge-multilingual-gemma2 | False |
| TweetSentimentExtractionVNClassification | 0.5576 | 0.6113 | BAAI/bge-multilingual-gemma2 | False |
| TwentyNewsgroupsClustering-VN | 0.3928 | 0.4556 | intfloat/e5-mistral-7b-instruct | False |
| TwitterSemEval2015-VN | 0.6824 | 0.7332 | intfloat/e5-mistral-7b-instruct | False |
| TwitterURLCorpus-VN | 0.8500 | 0.8698 | intfloat/e5-mistral-7b-instruct | False |
| VieQuADRetrieval | 0.5564 | 0.5459 | GritLM/GritLM-7B | False |
| ZacLegalTextRetrieval | 0.8797 | nan | nan | False |
| Average | 0.5443 | 0.5745 | nan | - |
The model has high performance on these tasks: Quora-VN, VieQuADRetrieval, GreenNodeTableMarkdownRetrieval
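The "Max result", "Model with max result", and high-performance flags in these tables can be reproduced from per-task scores. A minimal sketch, assuming scores are available as plain dicts — the second reference model's numbers below are illustrative placeholders, not real leaderboard values:

```python
# Sketch: given per-task scores for a candidate model and several reference
# models, find the best reference result per task and flag tasks where the
# candidate beats every reference ("high performance" tasks).
candidate = {"Quora-VN": 0.6100, "ArguAna-VN": 0.3793, "BIOSSES-VN": 0.7814}
references = {
    # Scores from the table above for gte-multilingual-base; the e5-mistral
    # scores here are placeholders for illustration only.
    "Alibaba-NLP/gte-multilingual-base": {
        "Quora-VN": 0.5668, "ArguAna-VN": 0.5275, "BIOSSES-VN": 0.8445,
    },
    "intfloat/e5-mistral-7b-instruct": {
        "Quora-VN": 0.5421, "ArguAna-VN": 0.5010, "BIOSSES-VN": 0.8012,
    },
}

rows = []
high_performance = []
for task, score in candidate.items():
    # Best reference score for this task and the model that produced it.
    best_model, best = max(
        ((m, s[task]) for m, s in references.items() if task in s),
        key=lambda pair: pair[1],
    )
    rows.append((task, score, best, best_model))
    if score > best:
        high_performance.append(task)

print(high_performance)  # -> ['Quora-VN']
```

This matches the table's behaviour on Quora-VN, where the candidate's 0.6100 exceeds the listed max of 0.5668.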
Results for Alibaba-NLP/gte-Qwen2-7B-instruct
| task_name | Alibaba-NLP/gte-Qwen2-7B-instruct | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| BrightBiologyRetrieval | 0.2847 | 0.0174 | 0.3387 | lightonai/Reason-ModernColBERT | False |
| BrightEarthScienceRetrieval | 0.3508 | 0.1506 | 0.4170 | lightonai/Reason-ModernColBERT | False |
| BrightEconomicsRetrieval | 0.1757 | 0.0706 | 0.2455 | lightonai/Reason-ModernColBERT | False |
| BrightPsychologyRetrieval | 0.2647 | 0.0879 | 0.3104 | lightonai/Reason-ModernColBERT | False |
| BrightRoboticsRetrieval | 0.148 | 0.1112 | 0.2181 | lightonai/Reason-ModernColBERT | False |
| BrightStackoverflowRetrieval | 0.1619 | 0.0694 | 0.2425 | lightonai/Reason-ModernColBERT | False |
| Average | 0.231 | 0.0845 | 0.2954 | nan | - |
Results for BAAI/bge-large-en-v1.5
| task_name | BAAI/bge-large-en-v1.5 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|
| BrightAopsRetrieval | 0.0600 | 0.0722 | 0.0825 | lightonai/Reason-ModernColBERT | False |
| BrightBiologyLongRetrieval | 0.1642 | 0.0194 | nan | nan | False |
| BrightBiologyRetrieval | 0.1167 | 0.0174 | 0.3387 | lightonai/Reason-ModernColBERT | False |
| BrightEarthScienceLongRetrieval | 0.2773 | 0.2155 | nan | nan | False |
| BrightEarthScienceRetrieval | 0.2456 | 0.1506 | 0.4170 | lightonai/Reason-ModernColBERT | False |
| BrightEconomicsLongRetrieval | 0.2087 | 0.1359 | nan | nan | False |
| BrightEconomicsRetrieval | 0.1661 | 0.0706 | 0.2455 | lightonai/Reason-ModernColBERT | False |
| BrightLeetcodeRetrieval | 0.2668 | 0.2787 | 0.3086 | lightonai/Reason-ModernColBERT | False |
| BrightPonyLongRetrieval | 0.0036 | 0.0234 | nan | nan | False |
| BrightPonyRetrieval | 0.0572 | 0.1302 | 0.0873 | lightonai/Reason-ModernColBERT | False |
| BrightPsychologyLongRetrieval | 0.1158 | 0.0594 | nan | nan | False |
| BrightPsychologyRetrieval | 0.1746 | 0.0879 | 0.3104 | lightonai/Reason-ModernColBERT | False |
| BrightRoboticsLongRetrieval | 0.1089 | 0.0792 | nan | nan | False |
| BrightRoboticsRetrieval | 0.1171 | 0.1112 | 0.2181 | lightonai/Reason-ModernColBERT | False |
| BrightStackoverflowLongRetrieval | 0.1325 | 0.1581 | nan | nan | False |
| BrightStackoverflowRetrieval | 0.1083 | 0.0694 | 0.2425 | lightonai/Reason-ModernColBERT | False |
| BrightSustainableLivingLongRetrieval | 0.1690 | 0.0810 | nan | nan | False |
| BrightSustainableLivingRetrieval | 0.1333 | 0.0961 | 0.2021 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQAQuestionsRetrieval | 0.1300 | 0.1296 | 0.1833 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQATheoremsRetrieval | 0.0690 | 0.0549 | 0.0929 | lightonai/Reason-ModernColBERT | False |
| Average | 0.1412 | 0.1020 | 0.2274 | nan | - |
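The "Average" row appears to be computed only over tasks that have a value, skipping `nan` entries (e.g. the "Max result" column averages only the rows with a listed max). A minimal sketch with illustrative values:

```python
import math

# Sketch: nan-aware column average, as presumably used for the "Average" row
# where some tasks have no "Max result". Values below are illustrative.
max_results = [0.0825, math.nan, 0.3387, math.nan, 0.4170]

valid = [v for v in max_results if not math.isnan(v)]
average = sum(valid) / len(valid) if valid else math.nan
print(round(average, 4))  # -> 0.2794
```

Note that a naive `sum(max_results) / len(max_results)` would propagate `nan` and poison the whole average.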
Training datasets: AmazonReviewsClassification, AmazonReviewsVNClassification, DuRetrieval, MLQARetrieval, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoNQ-VN, NanoNQRetrieval
Results for BAAI/bge-m3
| task_name | BAAI/bge-m3 | google/gemini-embedding-001 | intfloat/multilingual-e5-large | Max result | Model with max result | In Training Data |
|---|---|---|---|---|---|---|
| BibleNLPBitextMining | 0.9805 | 0.9897 | 0.9819 | 0.9899 | deepvk/USER-bge-m3 | False |
| BrightAopsRetrieval | 0.0456 | nan | 0.0722 | 0.0825 | lightonai/Reason-ModernColBERT | False |
| BrightBiologyLongRetrieval | 0.1472 | nan | 0.0194 | nan | nan | False |
| BrightBiologyRetrieval | 0.0948 | nan | 0.0174 | 0.3387 | lightonai/Reason-ModernColBERT | False |
| BrightEarthScienceLongRetrieval | 0.2083 | nan | 0.2155 | nan | nan | False |
| BrightEarthScienceRetrieval | 0.1539 | nan | 0.1506 | 0.4170 | lightonai/Reason-ModernColBERT | False |
| BrightEconomicsLongRetrieval | 0.1311 | nan | 0.1359 | nan | nan | False |
| BrightEconomicsRetrieval | 0.1188 | nan | 0.0706 | 0.2455 | lightonai/Reason-ModernColBERT | False |
| BrightLeetcodeRetrieval | 0.2477 | nan | 0.2787 | 0.3086 | lightonai/Reason-ModernColBERT | False |
| BrightPonyLongRetrieval | 0.0046 | nan | 0.0234 | nan | nan | False |
| BrightPonyRetrieval | 0.1517 | nan | 0.1302 | 0.0873 | lightonai/Reason-ModernColBERT | False |
| BrightPsychologyLongRetrieval | 0.1931 | nan | 0.0594 | nan | nan | False |
| BrightPsychologyRetrieval | 0.1326 | nan | 0.0879 | 0.3104 | lightonai/Reason-ModernColBERT | False |
| BrightRoboticsLongRetrieval | 0.1238 | nan | 0.0792 | nan | nan | False |
| BrightRoboticsRetrieval | 0.1215 | nan | 0.1112 | 0.2181 | lightonai/Reason-ModernColBERT | False |
| BrightStackoverflowLongRetrieval | 0.0897 | nan | 0.1581 | nan | nan | False |
| BrightStackoverflowRetrieval | 0.1063 | nan | 0.0694 | 0.2425 | lightonai/Reason-ModernColBERT | False |
| BrightSustainableLivingLongRetrieval | 0.1685 | nan | 0.0810 | nan | nan | False |
| BrightSustainableLivingRetrieval | 0.1017 | nan | 0.0961 | 0.2021 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQAQuestionsRetrieval | 0.1262 | nan | 0.1296 | 0.1833 | lightonai/Reason-ModernColBERT | False |
| BrightTheoremQATheoremsRetrieval | 0.0434 | nan | 0.0549 | 0.0929 | lightonai/Reason-ModernColBERT | False |
| MIRACLReranking | 0.6469 | nan | 0.6544 | 0.6753 | ai-sage/Giga-Embeddings-instruct | True |
| MIRACLRetrieval | 0.7107 | nan | 0.5901 | 0.7713 | tencent/KaLM-Embedding-Gemma3-12B-2511 | True |
| MIRACLRetrievalHardNegatives.v2 | 0.6996 | nan | 0.5333 | 0.7743 | tencent/KaLM-Embedding-Gemma3-12B-2511 | True |
| MLSUMClusteringP2P | 0.4528 | nan | 0.4631 | 0.5175 | Salesforce/SFR-Embedding-2_R | False |
| MLSUMClusteringS2S | 0.4562 | nan | 0.4681 | 0.5122 | Salesforce/SFR-Embedding-2_R | False |
| MTOPDomainClassification | 0.8855 | 0.9679 | 0.8988 | 0.9679 | google/gemini-embedding-001 | False |
| MTOPIntentClassification | 0.6610 | nan | 0.6720 | 0.8844 | tencent/KaLM-Embedding-Gemma3-12B-2511 | False |
| MintakaRetrieval | 0.2190 | nan | 0.3037 | 0.4977 | openai/text-embedding-3-large | False |
| MrTidyRetrieval | 0.7261 | nan | 0.6509 | 0.6603 | infly/inf-retriever-v1-1.5b | True |
| MultiLongDocReranking | 0.7790 | nan | 0.8887 | 0.9338 | cl-nagoya/ruri-v3-310m | False |
| MultiLongDocRetrieval | 0.4216 | nan | 0.3175 | 0.4626 | cl-nagoya/ruri-v3-30m | False |
| MultilingualSentimentClassification | 0.7623 | nan | nan | 0.8219 | deepvk/USER-base | False |
| STS22 | 0.6789 | nan | 0.6823 | 0.7263 | Qwen/Qwen3-Embedding-8B | False |
| SpanishNewsClassification.v2 | 0.8825 | nan | 0.8862 | 0.2859 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | False |
| SpanishPassageRetrievalS2P | 0.4402 | nan | 0.4196 | 0.0144 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | False |
| SpanishPassageRetrievalS2S | 0.7037 | nan | 0.7232 | 0.7516 | intfloat/e5-mistral-7b-instruct | False |
| SpanishSentimentClassification.v2 | 0.9533 | nan | 0.9241 | 0.5058 | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | False |
| WebFAQRetrieval | 0.7646 | nan | 0.7596 | 0.7882 | Snowflake/snowflake-arctic-embed-l-v2.0 | False |
| XPQARetrieval | 0.5298 | nan | 0.5073 | 0.6494 | openai/text-embedding-3-large | False |
| XQuADRetrieval | 0.9577 | nan | 0.9703 | 0.9723 | infly/inf-retriever-v1 | False |
| Average | 0.4103 | 0.9788 | 0.3834 | 0.5119 | nan | - |
The model has high performance on these tasks: MrTidyRetrieval, SpanishSentimentClassification.v2, SpanishNewsClassification.v2, BrightPonyRetrieval, SpanishPassageRetrievalS2P
Training datasets: CMedQAv1-reranking, CMedQAv2-reranking, CmedqaRetrieval, CodeSearchNet, DuRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, LeCaRDv2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MMarcoReranking, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, T2Reranking, T2Retrieval, mMARCO-NL
Results for GreenNode/GreenNode-Embedding-E5-Large-VN-V1
| task_name | GreenNode/GreenNode-Embedding-E5-Large-VN-V1 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| AmazonCounterfactualVNClassification | 0.5249 | 0.6878 | BAAI/bge-multilingual-gemma2 | False |
| AmazonPolarityVNClassification | 0.7334 | 0.9057 | intfloat/e5-mistral-7b-instruct | False |
| AmazonReviewsVNClassification | 0.3682 | 0.4508 | intfloat/e5-mistral-7b-instruct | False |
| ArguAna-VN | 0.3893 | 0.5275 | Alibaba-NLP/gte-multilingual-base | False |
| AskUbuntuDupQuestions-VN | 0.6038 | 0.6265 | intfloat/e5-mistral-7b-instruct | False |
| BIOSSES-VN | 0.7676 | 0.8445 | Alibaba-NLP/gte-multilingual-base | False |
| Banking77VNClassification | 0.7692 | 0.8929 | BAAI/bge-multilingual-gemma2 | False |
| CQADupstackAndroid-VN | 0.4699 | 0.4682 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackGis-VN | 0.3438 | 0.3518 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackMathematica-VN | 0.2275 | 0.2526 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackPhysics-VN | 0.4082 | 0.3908 | Alibaba-NLP/gte-multilingual-base | False |
| CQADupstackProgrammers-VN | 0.3818 | 0.4042 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackStats-VN | 0.2827 | 0.2955 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackTex-VN | 0.2561 | 0.2810 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackUnix-VN | 0.3595 | 0.3994 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWebmasters-VN | 0.3522 | 0.3859 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWordpress-VN | 0.2702 | 0.3162 | intfloat/e5-mistral-7b-instruct | False |
| EmotionVNClassification | 0.4119 | 0.5023 | BAAI/bge-multilingual-gemma2 | False |
| FiQA2018-VN | 0.3174 | 0.3288 | Alibaba-NLP/gte-multilingual-base | False |
| GreenNodeTableMarkdownRetrieval | 0.3567 | 0.3502 | intfloat/e5-mistral-7b-instruct | True |
| ImdbVNClassification | 0.7393 | 0.8654 | intfloat/e5-mistral-7b-instruct | False |
| MTOPDomainVNClassification | 0.8543 | 0.9166 | BAAI/bge-multilingual-gemma2 | False |
| MTOPIntentVNClassification | 0.5331 | 0.7572 | BAAI/bge-multilingual-gemma2 | False |
| MassiveIntentVNClassification | 0.6460 | 0.7259 | BAAI/bge-multilingual-gemma2 | False |
| MassiveScenarioVNClassification | 0.7129 | 0.7648 | BAAI/bge-multilingual-gemma2 | False |
| NFCorpus-VN | 0.3123 | 0.3197 | intfloat/e5-mistral-7b-instruct | False |
| NanoClimateFEVER-VN | 0.3096 | nan | nan | False |
| NanoDBPedia-VN | 0.4699 | nan | nan | False |
| NanoFEVER-VN | 0.9244 | nan | nan | True |
| NanoHotpotQA-VN | 0.7323 | nan | nan | True |
| NanoMSMARCO-VN | 0.7998 | nan | nan | True |
| NanoNQ-VN | 0.8060 | nan | nan | True |
| Quora-VN | 0.6166 | 0.5668 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClustering-VN | 0.4988 | 0.4991 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClusteringP2P-VN | 0.5682 | 0.5975 | Alibaba-NLP/gte-multilingual-base | False |
| SCIDOCS-VN | 0.1659 | 0.1523 | intfloat/e5-mistral-7b-instruct | False |
| SICK-R-VN | 0.7809 | 0.7791 | intfloat/e5-mistral-7b-instruct | False |
| STSBenchmark-VN | 0.8092 | 0.8258 | Alibaba-NLP/gte-multilingual-base | False |
| SciDocsRR-VN | 0.8402 | 0.8418 | Alibaba-NLP/gte-multilingual-base | False |
| SciFact-VN | 0.5846 | 0.6562 | Alibaba-NLP/gte-multilingual-base | False |
| SprintDuplicateQuestions-VN | 0.9484 | 0.9734 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClustering-VN | 0.6150 | 0.6080 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClusteringP2P-VN | 0.3186 | 0.4146 | intfloat/e5-mistral-7b-instruct | False |
| StackOverflowDupQuestions-VN | 0.4971 | 0.5172 | intfloat/e5-mistral-7b-instruct | False |
| TRECCOVID-VN | 0.6439 | 0.7742 | intfloat/e5-mistral-7b-instruct | False |
| TVPLRetrieval | 0.8646 | nan | nan | False |
| Touche2020-VN | 0.2368 | 0.2592 | intfloat/e5-mistral-7b-instruct | False |
| ToxicConversationsVNClassification | 0.6017 | 0.7319 | BAAI/bge-multilingual-gemma2 | False |
| TweetSentimentExtractionVNClassification | 0.4940 | 0.6113 | BAAI/bge-multilingual-gemma2 | False |
| TwentyNewsgroupsClustering-VN | 0.4605 | 0.4556 | intfloat/e5-mistral-7b-instruct | False |
| TwitterSemEval2015-VN | 0.6682 | 0.7332 | intfloat/e5-mistral-7b-instruct | False |
| TwitterURLCorpus-VN | 0.8321 | 0.8698 | intfloat/e5-mistral-7b-instruct | False |
| VieQuADRetrieval | 0.5217 | 0.5459 | GritLM/GritLM-7B | False |
| Average | 0.5472 | 0.5745 | nan | - |
The model has high performance on these tasks: SICK-R-VN, StackExchangeClustering-VN, Quora-VN, CQADupstackAndroid-VN, TwentyNewsgroupsClustering-VN, CQADupstackPhysics-VN, GreenNodeTableMarkdownRetrieval, SCIDOCS-VN
Training datasets: FEVER, FEVER-FaHardNegatives, FEVER-NL, FEVER-PL, FEVER-VN, FEVERHardNegatives, FEVERHardNegatives.v2, GreenNodeTableMarkdownRetrieval, HotpotQA, HotpotQA-Fa, HotpotQA-FaHardNegatives, HotpotQA-NL, HotpotQA-PL, HotpotQA-PLHardNegatives, HotpotQA-VN, HotpotQAHardNegatives, HotpotQAHardNegatives.v2, MIRACLJaRetrievalLite, MIRACLReranking, MIRACLRetrieval, MIRACLRetrievalHardNegatives, MIRACLRetrievalHardNegatives.v2, MSMARCO, MSMARCO-Fa, MSMARCO-FaHardNegatives, MSMARCO-PL, MSMARCO-PLHardNegatives, MSMARCO-VN, MSMARCOHardNegatives, MSMARCOv2, MrTidyRetrieval, MrTyDiJaRetrievalLite, NQ, NQ-Fa, NQ-FaHardNegatives, NQ-NL, NQ-PL, NQ-PLHardNegatives, NQ-VN, NQHardNegatives, NanoFEVER-VN, NanoFEVERRetrieval, NanoHotpotQA-VN, NanoHotpotQARetrieval, NanoMSMARCO-VN, NanoMSMARCORetrieval, NanoNQ-VN, NanoNQRetrieval, XQuADRetrieval, mMARCO-NL
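The "In Training Data" column appears to flag evaluated tasks whose dataset occurs in the model's declared training list (e.g. GreenNodeTableMarkdownRetrieval and the Nano*-VN tasks above). A hedged sketch of that check — real matching in MTEB may also account for dataset variants and splits; this does exact name lookup only:

```python
# Sketch: flag tasks as "in training data" when the task name appears
# verbatim in the declared training-dataset list. Names taken from the
# training-dataset list above; the evaluated-task list is a small sample.
training_datasets = {
    "GreenNodeTableMarkdownRetrieval", "NanoFEVER-VN", "NanoHotpotQA-VN",
    "NanoMSMARCO-VN", "NanoNQ-VN", "MSMARCO-VN", "NQ-VN",
}
evaluated_tasks = ["GreenNodeTableMarkdownRetrieval", "ArguAna-VN", "NanoFEVER-VN"]

in_training = {task: task in training_datasets for task in evaluated_tasks}
print(in_training)
```

Scores on flagged tasks should be read with caution, since train/test overlap inflates them.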
Results for GreenNode/GreenNode-Embedding-KaLM-Mini-Instruct-VN-V1
| task_name | GreenNode/GreenNode-Embedding-KaLM-Mini-Instruct-VN-V1 | Max result | Model with max result | In Training Data |
|---|---|---|---|---|
| AmazonCounterfactualVNClassification | 0.5354 | 0.6878 | BAAI/bge-multilingual-gemma2 | True |
| AmazonPolarityVNClassification | 0.7110 | 0.9057 | intfloat/e5-mistral-7b-instruct | True |
| AmazonReviewsVNClassification | 0.3480 | 0.4508 | intfloat/e5-mistral-7b-instruct | True |
| ArguAna-VN | 0.3424 | 0.5275 | Alibaba-NLP/gte-multilingual-base | False |
| AskUbuntuDupQuestions-VN | 0.5575 | 0.6265 | intfloat/e5-mistral-7b-instruct | False |
| BIOSSES-VN | 0.7534 | 0.8445 | Alibaba-NLP/gte-multilingual-base | False |
| Banking77VNClassification | 0.7833 | 0.8929 | BAAI/bge-multilingual-gemma2 | True |
| CQADupstackAndroid-VN | 0.3856 | 0.4682 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackGis-VN | 0.3022 | 0.3518 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackMathematica-VN | 0.1842 | 0.2526 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackPhysics-VN | 0.3374 | 0.3908 | Alibaba-NLP/gte-multilingual-base | False |
| CQADupstackProgrammers-VN | 0.3070 | 0.4042 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackStats-VN | 0.2558 | 0.2955 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackTex-VN | 0.1982 | 0.2810 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackUnix-VN | 0.2833 | 0.3994 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWebmasters-VN | 0.2964 | 0.3859 | intfloat/e5-mistral-7b-instruct | False |
| CQADupstackWordpress-VN | 0.1841 | 0.3162 | intfloat/e5-mistral-7b-instruct | False |
| EmotionVNClassification | 0.3361 | 0.5023 | BAAI/bge-multilingual-gemma2 | True |
| FiQA2018-VN | 0.2166 | 0.3288 | Alibaba-NLP/gte-multilingual-base | True |
| GreenNodeTableMarkdownRetrieval | 0.3642 | 0.3502 | intfloat/e5-mistral-7b-instruct | True |
| ImdbVNClassification | 0.7225 | 0.8654 | intfloat/e5-mistral-7b-instruct | True |
| MTOPDomainVNClassification | 0.8584 | 0.9166 | BAAI/bge-multilingual-gemma2 | True |
| MTOPIntentVNClassification | 0.6424 | 0.7572 | BAAI/bge-multilingual-gemma2 | True |
| MassiveIntentVNClassification | 0.6693 | 0.7259 | BAAI/bge-multilingual-gemma2 | True |
| MassiveScenarioVNClassification | 0.7062 | 0.7648 | BAAI/bge-multilingual-gemma2 | True |
| NFCorpus-VN | 0.2626 | 0.3197 | intfloat/e5-mistral-7b-instruct | True |
| NanoClimateFEVER-VN | 0.2604 | nan | nan | False |
| NanoDBPedia-VN | 0.4128 | nan | nan | True |
| NanoFEVER-VN | 0.6641 | nan | nan | True |
| NanoHotpotQA-VN | 0.6926 | nan | nan | True |
| NanoMSMARCO-VN | 0.6692 | nan | nan | True |
| NanoNQ-VN | 0.6812 | nan | nan | True |
| Quora-VN | 0.5072 | 0.5668 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClustering-VN | 0.4341 | 0.4991 | Alibaba-NLP/gte-multilingual-base | False |
| RedditClusteringP2P-VN | 0.5671 | 0.5975 | Alibaba-NLP/gte-multilingual-base | False |
| SCIDOCS-VN | 0.1279 | 0.1523 | intfloat/e5-mistral-7b-instruct | False |
| SICK-R-VN | 0.7003 | 0.7791 | intfloat/e5-mistral-7b-instruct | False |
| STSBenchmark-VN | 0.6984 | 0.8258 | Alibaba-NLP/gte-multilingual-base | False |
| SciDocsRR-VN | 0.7950 | 0.8418 | Alibaba-NLP/gte-multilingual-base | False |
| SciFact-VN | 0.5518 | 0.6562 | Alibaba-NLP/gte-multilingual-base | True |
| SprintDuplicateQuestions-VN | 0.9173 | 0.9734 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClustering-VN | 0.5333 | 0.6080 | Alibaba-NLP/gte-multilingual-base | False |
| StackExchangeClusteringP2P-VN | 0.3264 | 0.4146 | intfloat/e5-mistral-7b-instruct | False |
| StackOverflowDupQuestions-VN | 0.4517 | 0.5172 | intfloat/e5-mistral-7b-instruct | False |
| TRECCOVID-VN | 0.6371 | 0.7742 | intfloat/e5-mistral-7b-instruct | True |
| TVPLRetrieval | 0.8420 | nan | nan | False |
| Touche2020-VN | 0.2353 | 0.2592 | intfloat/e5-mistral-7b-instruct | False |
| ToxicConversationsVNClassification | 0.5608 | 0.7319 | BAAI/bge-multilingual-gemma2 | True |
| TweetSentimentExtractionVNClassification | 0.4755 | 0.6113 | BAAI/bge-multilingual-gemma2 | True |
| TwentyNewsgroupsClustering-VN | 0.4110 | 0.4556 | intfloat/e5-mistral-7b-instruct | False |
| TwitterSemEval2015-VN | 0.6202 | 0.7332 | intfloat/e5-mistral-7b-instruct | False |
| TwitterURLCorpus-VN | 0.8295 | 0.8698 | intfloat/e5-mistral-7b-instruct | False |
| VieQuADRetrieval | 0.4847 | 0.5459 | GritLM/GritLM-7B | False |
| Average | 0.5025 | 0.5745 | nan | - |
The model has high performance on these tasks: GreenNodeTableMarkdownRetrieval
Training datasets: ATEC, AmazonCounterfactualClassification, AmazonCounterfactualVNClassification, AmazonPolarityClassification, AmazonPolarityClassification.v2, AmazonPolarityVNClassification, AmazonReviewsClassification, AmazonReviewsVNClassification, ArXivHierarchicalClusteringP2P, ArXivHierarchicalClusteringS2S, ArxivClusteringP2P, ArxivClusteringP2P.v2, ArxivClusteringS2S, BQ, Banking77Classification, Banking77Classification.v2, Banking77VNClassification, BiorxivClusteringP2P, BiorxivClusteringP2P.v2, BiorxivClusteringS2S, BiorxivClusteringS2S.v2, CQADupstack, CodeFeedbackMT, CodeFeedbackST, ContractNLIConfidentialityOfAgreementLegalBenchClassification, ContractNLIExplicitIdentificationLegalBenchClassification, ContractNLIInclusionOfVerballyConveyedInformationLegalBenchClassification, ContractNLILimitedUseLegalBenchClassification, ContractNLINoLicensingLegalBenchClassification, `ContractNLINoticeOnCompell
Note: Content truncated due to GitHub API limits. See the full report in the workflow artifacts.
```json
"manhattan_spearman": 0.425955,
"euclidean_pearson": 0.327813,
"euclidean_spearman": 0.412371,
"main_score": 0.412371,
```
Hm, a very big gap between v1 and v2: 0.79 vs 0.41.
That's a lot. I don't know if they updated the model weights between these two versions.
Maybe this was caused by embeddings-benchmark/mteb#4085. Can you try to run this task with "mteb<2.9.0"?
Hi @Samoed, with my resource constraints I don't think I can rerun the task anymore.
```json
"cv_recall_at_20": 0.16556,
"cv_recall_at_100": 0.28146,
"cv_recall_at_1000": 0.51656,
"main_score": 0.07924,
```
And here too, but I don't understand how they got these results in the first place. The results for bge-m3 are the same.
Seems like this one has gotten stale. @BaoLocPham, do you need help rerunning any of the tasks?

Hi @KennethEnevoldsen and @Samoed, I need help rerunning the non-Vietnamese text embedding models in this commit.

@BaoLocPham I would be fine helping you with these, but may I ask you to create an issue for it and remove all the invalid results here? That way we can merge what is valid.

Okay, I'll create it later.
Force-pushed from 7369c96 to 4f1974b.
This pull request has been automatically marked as stale due to inactivity.

@BaoLocPham are you still working on this PR?

@KennethEnevoldsen I'm still working on it, let me remove the files and update my commit later.

Great to hear!

Hi, could we also have results for the jinaai/jina-embeddings-v5-text model family? Thanks.
Checklist

- Model implementation added under mteb/models/model_implementations/ (this can be as an API).