Mike-E-Log/README.md


Mike Ilog · AI Engineer (agents + quality testing)

I pick up a new field fast and turn it into working AI: agents that actually do the work, not demos. And I don't ship any of it until I've proven it works with independent checks, because untested AI is just a bluff.


Tools & methods

  • Building AI agents: Claude Agent SDK, MCP, Anthropic SDK
  • Checking AI quality: agreement scoring (Cohen's kappa, Kendall's tau), benchmarks (MT-Bench), AI graded by AI (LLM-as-judge)
  • Languages: Python, TypeScript
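
As a quick illustration of the agreement scoring mentioned above, here is a minimal sketch of Cohen's kappa between human labels and an LLM judge's labels. It is not taken from any of the pinned repositories; the function name `cohens_kappa` and the sample labels are assumptions made up for this example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Hypothetical helper for illustration only.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # If both raters always use the same single label, kappa is undefined
    # (0/0); this sketch reports that case as perfect agreement.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Made-up labels: a human reviewer versus an LLM judge on four outputs.
human = ["pass", "pass", "fail", "fail"]
judge = ["pass", "fail", "fail", "fail"]
print(cohens_kappa(human, judge))
```

Kappa corrects raw agreement for the agreement two raters would reach by chance, which is why it is a common first check on whether an LLM judge tracks human labels.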

Contact

cooperation FTW · located on Earth

Pinned

  1. ai-eval-toolkit Public

    Eval toolkit for LLM-as-judge calibration: Cohen's kappa, Kendall's tau, regression gates.

    Python

  2. agentic-eval-harness Public

    Eval-gated runner driving Claude Code through phases with cross-vendor decision-support gates.

    Python 1

  3. ai-engineer-best-practices Public

    Field-tested patterns for shipping LLM systems: prompts, evals, agents, observability.

    Python