This repository contains pre-trained Assamese GloVe embedding models generated from a large Assamese corpus. It includes three files with different vector sizes.
This work was inspired by the GloVe project, an unsupervised learning algorithm for obtaining vector representations for words, developed by the Stanford Natural Language Processing Group. I would like to acknowledge the GloVe project for their significant contributions to the field of Natural Language Processing.
To use the pre-trained Assamese GloVe embedding models in your project, you can import them with the `gensim` library in Python. Since the files are in GloVe format, you will first need to convert them to Word2Vec format. Here are the steps to do so:
- Install the `gensim` library using pip:

  ```shell
  pip install gensim
  ```
- Import the `glove2word2vec` function from the `gensim.scripts` module:

  ```python
  from gensim.scripts.glove2word2vec import glove2word2vec
  ```
- Define the input and output file paths:

  ```python
  glove_input_file = 'GloVe.400k.300d.txt'
  word2vec_output_file = 'word2vec.400k.300d.txt'
  ```
- Use the `glove2word2vec` function to convert the GloVe model to Word2Vec format:

  ```python
  glove2word2vec(glove_input_file, word2vec_output_file)
  ```
Once you have converted the GloVe model to Word2Vec format, you can load it into gensim using the `KeyedVectors.load_word2vec_format` method.
- Load the converted Assamese GloVe embedding model using the `KeyedVectors.load_word2vec_format` method:

  ```python
  from gensim.models import KeyedVectors

  model = KeyedVectors.load_word2vec_format('word2vec.400k.300d.txt', binary=False)
  ```
Now you can use the loaded model to perform various NLP tasks, such as word similarity, analogy, and more.
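Word similarity here means cosine similarity between the two word vectors. As a minimal stdlib-only sketch of the formula that `model.similarity` computes (the function name and the toy 3-dimensional vectors are illustrative; the real vectors in this model are much larger):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration; in practice these would come from
# model[word1] and model[word2].
sim = cosine_similarity([0.1, 0.2, 0.3], [0.2, 0.1, 0.3])
```

The result ranges from -1 (opposite directions) to 1 (same direction); gensim computes the same quantity internally on the stored vectors.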
Converting a GloVe model to a word2vec model with the `gensim.scripts.glove2word2vec` function does not change the embedding values themselves. It only changes the file format, from the GloVe format to the word2vec format, so that the model can be loaded with the `KeyedVectors.load_word2vec_format()` method.
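The only difference between the two text formats is the header: a word2vec text file starts with a `<vocab_size> <vector_size>` line, while a GloVe text file begins directly with the vectors. A stdlib-only sketch of what the conversion does (the function name is illustrative, not part of gensim):

```python
def glove_to_word2vec(glove_path, word2vec_path):
    """Prepend the "<vocab_size> <vector_size>" header required by the
    word2vec text format; the embedding lines are copied unchanged."""
    with open(glove_path, encoding='utf-8') as f:
        lines = f.readlines()
    vocab_size = len(lines)
    vector_size = len(lines[0].split()) - 1  # first token on each line is the word
    with open(word2vec_path, 'w', encoding='utf-8') as f:
        f.write(f"{vocab_size} {vector_size}\n")
        f.writelines(lines)
```

Recent gensim versions (4.x) can also skip the conversion entirely by loading the GloVe file directly with `KeyedVectors.load_word2vec_format(path, binary=False, no_header=True)`.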
To illustrate how to use the model, here are two examples:
- Compute the cosine similarity between two Assamese words:

  ```python
  word1 = 'মেকুৰী'  # cat
  word2 = 'কুকুৰ'  # dog
  cos_sim = model.similarity(word1, word2)
  print(f"Cosine similarity between '{word1}' and '{word2}': {cos_sim:.4f}")
  ```

  Output:

  ```
  Cosine similarity between 'মেকুৰী' and 'কুকুৰ': 0.5876
  ```
- Find the 10 most similar words to a given word:

  ```python
  word = 'আহাৰ'  # food
  topn = 10
  similar_words = model.similar_by_word(word, topn=topn)
  print(f"{topn} most similar words to '{word}':")
  for i, (w, sim) in enumerate(similar_words):
      print(f"{i + 1}. {w} ({sim:.4f})")
  ```

  Output:

  ```
  10 most similar words to 'আহাৰ':
  1. দুপৰীয়াৰ (0.6657)
  2. আহাৰৰ (0.6602)
  3. জুমিয়ে (0.6398)
  4. সুষম (0.6215)
  5. আহাৰো (0.5948)
  6. খাদ্য (0.5947)
  7. শাওণমহীয়া (0.5946)
  8. নিৰামিষ (0.5889)
  9. তিনিসাজ (0.5861)
  10. ভিটামিনযুক্ত (0.5848)
  ```

When using the Assamese GloVe embeddings from this repository, please cite and acknowledge the creators' work with the appropriate link. The pre-trained embeddings can be used for a variety of natural language processing tasks in Assamese, and their use is highly encouraged. Feedback is also welcome, to support the research and development of NLP tools for under-resourced languages.
This work is licensed under the MIT License. Please refer to the LICENSE file for more information.
If you have any questions or suggestions, feel free to create an issue or contact me directly.
Thank you!