---
layout: post
date: 2025-03-09
lastchange: "v004 + sport :ReasonThruAI.md"
file: software-evaluation-criteria
title: Reason Thru AI
excerpt: "Ecosystem for students to use, analyze, and maintain benchmarks of AI tools"
tags:
- AI
- Software
image:
  feature:
  credit:
  creditlink:
comments: true
created: 2017-03-21
---

{{ page.excerpt }} {% include l18n.html %} {% include _toc.html %}

When generative AI burst onto the scene in 2022, the models were not able to count the number of "r"s in the word "strawberry".

But by 2025, "reasoning" capabilities have been added so that LLMs (Large Language Models) can explain how they solve multi-step word problems (using "Chain of Thought" processes).

To assess whether one AI version is better than another, word problems from academic competitions and exams (such as the SAT) are being reused.

My notes at https://bomonike.github.io/ai-benchmarks describe how individual researchers are constructing specific benchmarks (before moving to other jobs).

My concern is that this creates a reinforcement cycle where the same problems are used over and over again, leading to a lack of variety and depth in the assessment.
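One way a benchmark maintainer might quantify this kind of reuse is to flag newly submitted problems that already appear in an existing set. The following is a minimal sketch under stated assumptions. It treats two problems as "the same" only when their text matches after lowercasing and collapsing whitespace. The function names are hypothetical, and paraphrased duplicates would slip through this simple check.

```python
import hashlib
import re


def normalize(problem: str) -> str:
    """Lowercase and collapse whitespace so trivially reformatted copies match (assumed rule)."""
    return re.sub(r"\s+", " ", problem.strip().lower())


def fingerprint(problem: str) -> str:
    """Stable SHA-256 hash of the normalized problem text."""
    return hashlib.sha256(normalize(problem).encode("utf-8")).hexdigest()


def find_reused(new_problems: list[str], existing_problems: list[str]) -> list[str]:
    """Return the new problems whose normalized text already appears in the existing set."""
    seen = {fingerprint(p) for p in existing_problems}
    return [p for p in new_problems if fingerprint(p) in seen]


if __name__ == "__main__":
    existing = ["How many r's are in the word strawberry?"]
    submitted = [
        "How many  R's are in the word   strawberry?",  # same problem, different spacing/case
        "How many s's are in the word Mississippi?",    # genuinely new problem
    ]
    print(find_reused(submitted, existing))
```

Hashing the normalized text keeps the lookup cheap even for a large pool of contributed problems. Detecting reworded duplicates would require a fuzzier comparison than this sketch attempts.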

MY PROPOSAL: Make AI benchmarking a sport

Since high school students (and their teachers) are studying SAT problems anyway, give them a way to enter those problems and compare how different AI versions solve them.

This also provides opportunities for students to:

  • improve their prompt crafting skills
  • show off their skill at solving math, chemistry, biology and language translation problems
  • show potential employers that they can craft text prompts that extract value from AI tools
  • provide the world precise explanations of where AI can go wrong
  • generate a (potentially) large source of problems to benchmark and test AI tools
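To make "input and compare" concrete, here is a minimal sketch of how student submissions could be recorded and tallied per AI version. The `Submission` fields, the model labels `model-a` and `model-b`, and the exact-match grading rule are all hypothetical choices for illustration. They do not describe any existing site or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Submission:
    """One student-recorded attempt: a problem posed to one AI version (hypothetical schema)."""
    problem: str
    model: str      # hypothetical label, e.g. "model-a"
    expected: str   # answer key supplied by the student or the exam
    ai_answer: str  # final answer the AI gave after its Chain of Thought

    def is_correct(self) -> bool:
        # Assumed grading rule: exact match after trimming and lowercasing
        return self.ai_answer.strip().lower() == self.expected.strip().lower()


def accuracy_by_model(submissions: list[Submission]) -> dict[str, float]:
    """Fraction of submissions each model answered correctly."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for s in submissions:
        totals[s.model] += 1
        if s.is_correct():
            correct[s.model] += 1
    return {m: correct[m] / totals[m] for m in totals}


if __name__ == "__main__":
    question = "How many r's are in the word strawberry?"
    expected = str("strawberry".count("r"))  # ground truth computed directly: "3"
    runs = [
        Submission(question, "model-a", expected, "2"),
        Submission(question, "model-b", expected, "3"),
        Submission("What is 17 * 24?", "model-a", str(17 * 24), "408"),
        Submission("What is 17 * 24?", "model-b", str(17 * 24), "408"),
    ]
    print(accuracy_by_model(runs))  # {'model-a': 0.5, 'model-b': 1.0}
```

Real word problems would usually need a more forgiving comparison than exact string matching, such as numeric tolerance or accepted alternative phrasings. Keeping each attempt as its own record also preserves the evidence of where an AI went wrong.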

Prospects for high-paying work are now greatest for those who can harness AI.

Thus, I am creating a YouTube channel and related collaborative websites that accept and display student videos comparing the Chain of Thought of different AI tools, whether the reasoning works or not.

This provides diligent students a competitive advantage at getting paid work managing AI.

Connect with me and message me if you are interested in making this happen for students.

// Wilson Mar https://linkedin.com/in/wilsonmar

GenAI providers

Student contributions provide actionable feedback to GenAI providers and users, so I am contacting AI companies to sponsor credits for our students:

AI in Education

  • VIDEO: Education in the age of AI (Artificial Intelligence) | Dale Lane | TEDxWinchester

  • VIDEO: AI Will Set Education Back 2500 Years... And That’s a Good Thing | Robert Clapperton | TEDxUW

  • VIDEO: AI AND THE FUTURE OF EDUCATION (Full Documentary) by Plastico Film: [19m13s] What are skills of the future?

    • "Analytical thinking and innovation"
    • active listening and learning strategies
    • technology use monitoring
    • creativity, originality, and initiative.