frontier-evals-harness is a lightweight framework for benchmarking frontier language models. It provides deterministic suite versioning, modular adapters, standardized scoring, and paired statistical comparisons with confidence intervals. Built for regression tracking and analysis, it enables reproducible evaluation without dedicated infrastructure.
Quantitative performance benchmarking of machine learning classifiers for phishing detection, utilizing precision, recall, and F1-score optimization on high-dimensional labeled datasets.
Classification models for detecting fake reviews and predicting software bugs. Includes implementations of decision trees, bagging, random forests, logistic regression, and Naive Bayes, with statistical evaluation using McNemar's test.
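As a rough illustration of the McNemar's test mentioned above, the sketch below compares two classifiers on the same labeled test set. It is not taken from the repository: the function name `mcnemar_test` and its arguments are hypothetical, and it assumes the common chi-square form of the test with continuity correction, using only the Python standard library.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (chi-square form, continuity-corrected) for two classifiers.

    Hypothetical helper for illustration only. Returns (b, c, statistic, p_value),
    where b counts examples that A gets right and B gets wrong, and c counts the
    reverse. Only these discordant pairs carry information about the difference.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)

    if b + c == 0:
        # The classifiers never disagree in correctness: no evidence of a difference.
        return b, c, 0.0, 1.0

    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return b, c, statistic, p_value


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    pred_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1]
    pred_b = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
    b, c, statistic, p_value = mcnemar_test(y_true, pred_a, pred_b)
    print(f"b={b} c={c} statistic={statistic:.3f} p={p_value:.4f}")
```

When the number of discordant pairs (b + c) is small, the chi-square approximation is unreliable and an exact binomial version of the test is usually preferred.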