@venkamita

Adds a baseline machine learning project to classify bacterial genes as essential or non-essential using DNA sequence data from the macwiatrak/bacbench-essential-genes-dna dataset.
Files Added:
main.py – Implements the Logistic Regression pipeline with 4-mer feature extraction.
requirements.txt – Lists all Python dependencies needed to run the project.
README.md – Project overview, dataset description, preprocessing steps, model evaluation, and usage instructions.
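For readers unfamiliar with k-mer features, here is a minimal sketch of how a DNA sequence can be turned into a fixed-length 4-mer count vector. It is not the code from main.py; the names `kmer_vocab` and `kmer_counts`, the `step` parameter, and the choice to drop k-mers containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_vocab(k=4):
    """All 4**k possible k-mers over A/C/G/T, in a fixed order."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_counts(seq, k=4, step=None):
    """Return a count vector aligned with kmer_vocab(k).

    step=k (the default here) reads non-overlapping windows;
    step=1 reads overlapping windows.
    """
    if step is None:
        step = k
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(0, len(seq) - k + 1, step))
    counts = Counter(windows)
    # Windows containing N or other ambiguity codes are not in the
    # vocabulary, so they are silently dropped (an assumption).
    return [counts.get(kmer, 0) for kmer in kmer_vocab(k)]


vocab = kmer_vocab(4)
features = kmer_counts("ATGCATGCAT")              # non-overlapping windows
overlapping = kmer_counts("ATGCATGCAT", step=1)   # overlapping windows
```

Each sequence maps to a vector of length 256 regardless of gene length, which is what lets a linear model such as logistic regression consume it directly.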
Notes:

Serves as a simple baseline for essential gene prediction.
First ML project attempt; AI was used only for debugging assistance.
Possible follow-up improvements: handling class imbalance, using overlapping k-mers, and trying more advanced models.
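As one possible starting point for the class-imbalance follow-up, the sketch below computes per-class weights with the `n_samples / (n_classes * class_count)` heuristic, the same formula scikit-learn applies when `class_weight="balanced"` is passed to `LogisticRegression`. The helper name `balanced_class_weights`, the toy labels, and the 1 = essential encoding are assumptions, not taken from this PR.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Toy labels, made up for illustration: 1 = essential, 0 = non-essential.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(labels)
```

With these weights, errors on the rarer class cost more during training, so the model is less inclined to predict the majority class for everything.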

@github-actions

github-actions bot commented Dec 7, 2025

👋 @venkamita
Thank you for raising your pull request.
Please make sure you have followed our contributing guidelines. We will review it as soon as possible.
