Thank you for your interest in contributing to the Multi Natural Language Inference (MNLI) Approach project! This document provides guidelines and instructions for contributing.

## Table of Contents
- Code of Conduct
- Getting Started
- Development Environment
- Coding Standards
- Adding New Features
- Pull Request Process
- Reporting Issues
- Documentation
- Testing
## Code of Conduct

Please be respectful and considerate of others when contributing to this project. We aim to foster an inclusive and welcoming community.
## Getting Started

- Fork the repository on GitHub
- Clone your fork locally:

  ```bash
  git clone https://github.com/mlengineershub/MNLI-MultiModel-Benchmark
  cd MNLI-MultiModel-Benchmark
  ```

- Set up the upstream remote:

  ```bash
  git remote add upstream https://github.com/mlengineershub/MNLI-MultiModel-Benchmark
  ```

- Create a new branch for your feature or bugfix:

  ```bash
  git checkout -b feature/your-feature-name
  ```
## Development Environment

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Install the package in development mode:

  ```bash
  pip install -e .
  ```

- Make sure you can run the existing models:

  ```bash
  python src/main.py --train --models decision_tree
  ```
## Coding Standards

- Follow PEP 8 style guidelines for Python code
- Use meaningful variable and function names
- Add docstrings to all functions, classes, and modules
- Keep functions focused on a single responsibility
- Comment complex code sections
- Use type hints where appropriate
Example:

```python
import string

# Minimal illustrative stopword list; a real implementation might use NLTK's.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}


def preprocess_text(text: str, remove_stopwords: bool = True) -> str:
    """
    Preprocess the input text by converting to lowercase, removing punctuation,
    and optionally removing stopwords.

    Args:
        text: The input text to preprocess
        remove_stopwords: Whether to remove stopwords

    Returns:
        The preprocessed text
    """
    # Lowercase and strip punctuation
    processed_text = text.lower().translate(str.maketrans("", "", string.punctuation))
    if remove_stopwords:
        processed_text = " ".join(
            word for word in processed_text.split() if word not in STOPWORDS
        )
    return processed_text
```

## Adding New Features

### Adding a New Model

- Create a new file in the `src/` directory for your model (e.g., `src/your_model.py`)
- Implement your model following the existing patterns
- Update `src/models.py` to include your model in the model factory
- Add appropriate hyperparameters to `config/configuration.yaml`
- Add tests for your model
- Update documentation to reflect the new model
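As a rough sketch of the factory pattern described above (the names `get_model`, `MODEL_REGISTRY`, and `YourModel` are illustrative assumptions, not the project's actual API), registering a model might look like:

```python
class YourModel:
    """Illustrative model wrapper following an assumed fit/predict interface."""

    def __init__(self, max_depth: int = 5):
        # Hyperparameter, e.g. loaded from config/configuration.yaml
        self.max_depth = max_depth

    def fit(self, X, y):
        # Trivial placeholder: remember the most frequent label
        self.majority_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


# Hypothetical registry mirroring how src/models.py might map names to classes
MODEL_REGISTRY = {
    "your_model": YourModel,
}


def get_model(name: str, **hyperparams):
    """Instantiate a registered model by name."""
    return MODEL_REGISTRY[name](**hyperparams)
```

With a registry like this, the CLI can stay unchanged when new models are added: `--models your_model` would simply look the name up in the mapping.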
### Adding a New Dataset

- Ensure the dataset follows the same format as existing datasets
- Add preprocessing code if needed
- Update documentation to include information about the new dataset
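A minimal sketch of a format check for a new dataset, assuming a tab-separated file with `premise`, `hypothesis`, and `label` columns (the exact schema is an assumption; match whatever the existing datasets use):

```python
import csv

# Assumed column schema for an NLI dataset; verify against the existing data.
REQUIRED_COLUMNS = {"premise", "hypothesis", "label"}


def validate_dataset(path: str) -> int:
    """Check that a TSV dataset has the expected columns; return the row count."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
        return sum(1 for _ in reader)
```

Running such a check before opening a pull request catches schema mismatches early, before the training pipeline fails on them.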
## Pull Request Process

- Update your fork with the latest changes from the upstream repository:

  ```bash
  git fetch upstream
  git rebase upstream/main
  ```

- Make sure your code passes all tests:

  ```bash
  # Add test command here
  ```

- Push your changes to your fork:

  ```bash
  git push origin feature/your-feature-name
  ```

- Create a pull request on GitHub with a clear title and description:
  - Describe what changes you've made
  - Reference any related issues
  - Explain how to test your changes
  - Include any necessary documentation updates

- Address any feedback from code reviews
## Reporting Issues

When reporting issues, please include:
- A clear and descriptive title
- Steps to reproduce the issue
- Expected behavior
- Actual behavior
- Environment information (OS, Python version, package versions)
- Any relevant logs or error messages
## Documentation

- Update the README.md when adding new features or changing existing functionality
- Add docstrings to all new functions, classes, and modules
- Include examples where appropriate
- Update any diagrams or visualizations if necessary
## Testing

- Write tests for all new functionality
- Ensure existing tests pass with your changes
- Test your changes with different hyperparameter configurations
- Verify that your model works with the existing pipeline
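For instance, tests for a new model might check its interface rather than its exact predictions. The sketch below uses a stand-in `MajorityClassModel` and an assumed fit/predict interface (both illustrative, not the project's actual classes):

```python
# Illustrative pytest-style tests; MajorityClassModel is a stand-in for a
# real model in src/, with an assumed fit/predict interface.
class MajorityClassModel:
    def fit(self, X, y):
        self.majority_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def test_predict_length_matches_input():
    model = MajorityClassModel().fit([[0], [1]], ["neutral", "neutral"])
    assert len(model.predict([[2], [3], [4]])) == 3


def test_predictions_are_valid_labels():
    model = MajorityClassModel().fit([[0], [1]], ["entailment", "entailment"])
    assert set(model.predict([[2]])) <= {"entailment", "neutral", "contradiction"}
```

Interface-level tests like these stay stable across hyperparameter configurations, which makes them a good fit for the sweep-style testing suggested above.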
When adding a new model, please include:
- Performance metrics on the development and test sets
- Confusion matrices
- Comparison with existing models
- Analysis of strengths and weaknesses
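As a pure-Python sketch of the kind of reporting expected (a real submission would typically use scikit-learn's `accuracy_score` and `confusion_matrix`; the three-label set is the standard NLI one):

```python
from collections import Counter

# Standard NLI label set, used as both row and column order below.
LABELS = ["entailment", "neutral", "contradiction"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Reporting the same matrix for the development and test sets makes it easy to compare a new model against the existing ones per label, not just in aggregate.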
Thank you for contributing to the MNLI Approach project!