Merged
15 changes: 8 additions & 7 deletions app/layout.tsx
@@ -30,26 +30,27 @@ export const metadata: Metadata = {
},
manifest: '/learn/site.webmanifest',
openGraph: {
title: 'Multicorn — The trusted layer between humans and AI agents',
title: 'Multicorn - AI agent governance for production teams',
description:
'Consent screens, spending controls, and activity logging for every AI agent. Open-source SDK, enterprise-grade controls.',
'Consent screens, spending controls, and audit trails for every AI agent you deploy.',
url: 'https://multicorn.ai',
siteName: 'Multicorn',
type: 'website',
images: [
{
url: 'https://multicorn.ai/images/og-image.png',
url: 'https://multicorn.ai/og-image.png',
width: 1200,
height: 630,
alt: 'Multicorn — The trusted layer between humans and AI agents',
alt: 'Multicorn - AI agent governance for production teams',
},
],
},
twitter: {
card: 'summary_large_image',
title: 'Multicorn — The trusted layer between humans and AI agents',
description: 'Consent screens, spending controls, and activity logging for every AI agent.',
images: ['https://multicorn.ai/images/og-image.png'],
title: 'Multicorn - AI agent governance for production teams',
description:
'Consent screens, spending controls, and audit trails for every AI agent you deploy.',
images: ['https://multicorn.ai/og-image.png'],
},
}

19 changes: 17 additions & 2 deletions components/InstallBanner.tsx
@@ -1,6 +1,6 @@
'use client'

import { useState, useRef, useEffect } from 'react'
import { useState, useRef, useEffect, useLayoutEffect } from 'react'

interface BeforeInstallPromptEvent extends Event {
prompt: () => Promise<void>
@@ -10,8 +10,23 @@ interface BeforeInstallPromptEvent extends Event {
export function InstallBanner() {
const [canInstall, setCanInstall] = useState(false)
const [dismissed, setDismissed] = useState(false)
const [isStandalone, setIsStandalone] = useState(false)
const [standaloneChecked, setStandaloneChecked] = useState(false)
const installEventRef = useRef<BeforeInstallPromptEvent | null>(null)

useLayoutEffect(() => {
const mq = window.matchMedia('(display-mode: standalone)')
const sync = () => {
const running =
mq.matches || (navigator as Navigator & { standalone?: boolean }).standalone === true
setIsStandalone(running)
setStandaloneChecked(true)
}
sync()
mq.addEventListener('change', sync)
return () => mq.removeEventListener('change', sync)
}, [])

useEffect(() => {
const handleBeforeInstall = (e: Event) => {
e.preventDefault()
@@ -39,7 +54,7 @@ export function InstallBanner() {
if (outcome === 'accepted') setCanInstall(false)
}

if (!canInstall || dismissed) return null
if (!standaloneChecked || isStandalone || !canInstall || dismissed) return null

return (
<div className="fixed bottom-0 left-0 right-0 z-50 flex items-center justify-between bg-violet-600 px-4 py-3 text-white">
61 changes: 61 additions & 0 deletions content/blog/what-minimax-m27-actually-does.mdx
@@ -0,0 +1,61 @@
---
title: "What MiniMax M2.7 Actually Does (And What It Doesn't)"
date: '2026-03-20'
author: 'Rachelle Rathbone'
excerpt: "MiniMax's M2.7 is a genuinely interesting release. Here's what the self-evolution claims actually mean, what the benchmarks show, and why the 'there goes software engineering' reaction misses the more important question."
tags: ['ai-models', 'agents', 'software-engineering']
---

MiniMax released M2.7 yesterday and the discourse immediately split into two camps: people calling it the end of software engineering, and people dismissing it as Chinese AI hype. Both reactions miss what's actually interesting about this release.

Let's look at what M2.7 actually did.

## The self-evolution loop

The headline capability is that M2.7 participated in its own improvement. MiniMax gave the model access to its own reinforcement learning scaffold and let it run. The loop looked like this: analyze failure trajectories, plan code changes, modify the scaffold, run evaluations, compare results, decide to keep or revert. It ran that loop autonomously for over 100 rounds.
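In code terms, that loop is a hill climb over the scaffold itself. Here's a minimal sketch; every name is hypothetical, since MiniMax hasn't published the actual interface:

```typescript
// Hypothetical sketch of the keep-or-revert loop described above.
// None of these names come from MiniMax; they only make the shape concrete.

interface Scaffold {
  evaluate(): number // run the internal evals, return a score
  applyPatch(patch: string): void
  revertPatch(patch: string): void
}

function selfImprove(
  scaffold: Scaffold,
  proposePatch: (round: number) => string, // "analyze failures, plan a change"
  rounds = 100,
): number {
  let best = scaffold.evaluate()
  for (let round = 0; round < rounds; round++) {
    const patch = proposePatch(round)
    scaffold.applyPatch(patch) // modify the scaffold
    const next = scaffold.evaluate() // rerun evaluations
    if (next > best) {
      best = next // keep the change
    } else {
      scaffold.revertPatch(patch) // revert it
    }
  }
  return best
}
```

The loop itself is ordinary engineering. The notable part is that the model, not a human, was writing the patches.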

The result was a 30% improvement on internal evaluation sets.

That's worth unpacking. "Internal evaluation sets" means MiniMax's own benchmarks, not independent ones. That's a real caveat. A model optimizing against its own evals is doing something genuinely useful, but it's not the same as a 30% improvement on SWE-Bench or any external measure. The improvement is real. The scope of it is narrower than the headline implies.

What the model was actually doing during those 100 rounds is more concrete than "improving itself" sounds. It was finding better sampling parameters. It was writing more specific workflow guidelines. It was adding loop detection to its own agent scaffold. Useful, iterative, engineering work. Impressive that a model did it autonomously. Not magic.
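"Adding loop detection" is a good example of how mundane these changes are. A toy version, purely illustrative:

```typescript
// Illustrative only: flag when the agent's last few actions exactly
// repeat the window before them, i.e. it is going in circles.
function isLooping(actions: string[], window = 3): boolean {
  if (actions.length < window * 2) return false
  const recent = actions.slice(-window).join('|')
  const previous = actions.slice(-window * 2, -window).join('|')
  return recent === previous
}
```

A scaffold that bails out when something like `isLooping` fires stops burning rounds on dead ends, and that shows up directly on an internal eval.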

## The benchmarks that matter

The external benchmark numbers are more straightforwardly impressive. M2.7 scored 56.22% on SWE-Pro, which tests real-world software engineering tasks in live codebases. That puts it at the top of the current field, matching the best Western models. On MLE Bench Lite, running on a single A30 GPU over 24 hours, it hit a 66.6% medal rate across machine learning competitions, tying with Gemini 3.1.

For OpenClaw tool usage specifically, M2.7 approaches Sonnet 4.6 on MiniMax's own MM-Claw evaluation. That's a notable data point if you're building or deploying agents on OpenClaw.

These numbers represent a real capability jump, particularly from a Chinese lab that was considered behind 12 months ago. The gap has closed.

## What this means for software engineering jobs

The "there goes software engineering" reaction is understandable but imprecise.

M2.7 handling 30-50% of MiniMax's own R&D workflow autonomously covers log reading, debugging, metric analysis, and code repairs. That's real work. It's also well-defined, repeatable work with clear success criteria. The model can evaluate whether a fix worked. It can compare metrics before and after. It can follow a structured loop.

What it's not doing: talking to your product manager about what the feature should actually do. Making the call to revert a change that technically passes evals but feels wrong for the product. Deciding which technical debt is worth paying down now versus later. Understanding why a customer is churning and what to build next.

The tasks that are genuinely at risk are the ones that look like "execute this well-specified thing repeatedly." The tasks that aren't are the ones where the specification itself is the hard part.

That's been true for a while. M2.7 moves the line, but it doesn't eliminate it.

The more honest framing is that the shape of software engineering work is changing faster than most people expected. Junior tasks that used to be good entry points are being automated. That's a real problem for people trying to build experience, and it deserves a more serious conversation than either "we're all fine" or "it's over."

## The question nobody's asking

Here's what I think is actually the most important thing about M2.7, and it's not in any of the takes I've seen.

MiniMax ran this self-improvement loop inside their own infrastructure. They had full visibility into every round. They could see what the model changed, evaluate the results, and stop it. The governance was implicit in the setup.

Most teams deploying agents right now don't have that setup.

As models get more capable, the question isn't just "can this agent do the task?" It's "when it does the task autonomously, do I know what it did, can I approve what matters, and can I explain it afterward?"

M2.7 doing 100 rounds of autonomous code modification is impressive inside a controlled research environment. The same capability deployed in a production engineering workflow, touching real codebases, with no audit trail and no approval layer, is a different situation entirely.
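To make the gap concrete, here's the smallest possible form of an approval layer plus audit trail. The names are hypothetical, not any real SDK's API, Multicorn's included:

```typescript
// Hypothetical minimal governance wrapper: every action is logged,
// and actions matching the policy require an explicit human decision.
type Action = { kind: string; detail: string }
type AuditEntry = { action: Action; approved: boolean; at: number }

class GovernedAgent {
  private trail: AuditEntry[] = []

  constructor(
    private needsApproval: (a: Action) => boolean, // policy: what matters enough to gate
    private askHuman: (a: Action) => boolean, // stand-in for a consent screen
  ) {}

  perform(action: Action, run: () => void): boolean {
    const approved = this.needsApproval(action) ? this.askHuman(action) : true
    this.trail.push({ action, approved, at: Date.now() }) // log attempts, not just successes
    if (approved) run()
    return approved
  }

  audit(): readonly AuditEntry[] {
    return this.trail // "can I explain it afterward?"
  }
}
```

Anything this thin answers all three questions: what the agent did, whether a human approved what mattered, and what the record says afterward. What most teams ship today answers none of them.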

More capable agents doing more real work makes that question more urgent, not less. The capability curve and the governance curve need to move together. Right now they're not.

---

_Sources: [MiniMax M2.7 release post](https://www.minimax.io/news/minimax-m27-en), [VentureBeat](https://venturebeat.com/technology/new-minimax-m2-7-proprietary-ai-model-is-self-evolving-and-can-perform-30-50), [CnTechPost](https://cntechpost.com/2026/03/18/minimax-releases-next-gen-ai-model-m2-7-self-evolution-capabilities/)_
28 changes: 0 additions & 28 deletions public/images/og-card.svg

This file was deleted.

Binary file added public/og-image.png