[Security] Security llm matrix automation by spong · Pull Request #6960 · elastic/docs-content

spong · 2026-06-17T19:01:25Z

Note

Currently iterating on artifact generation with @dhru42 & @patrykkopycinski over on elastic/kibana#273827. Once we have consensus on the generated artifact, I'll sort out the bucket configuration and update this PR, and then we can do a proper review. In the meantime, let me know if anything here doesn't follow current best practices; I tried as best I could to match the current automations.

My only open question is whether this is too much noise once we scale up to multiple serverless releases per week (which will put us out of sync with our weekly eval runs, so that's probably a larger conversation anyway).

Summary

Replaces the hand-maintained Security LLM performance matrix tables with auto-generated CSVs embedded via :::{csv-include}, and adds a keyless-WIF GitHub Action that keeps them current from the Elastic Security LLM evaluation pipeline. Companion to the Kibana PR (elastic/kibana#273827) that generates the matrix (closes elastic/security-team#16394).

What changed

solutions/security/ai/large-language-model-performance-matrix.md — the two Markdown tables are replaced by :::{csv-include} directives (same pattern as eis-supported-models.md).
solutions/security/ai/llm-performance-matrix/{proprietary,open-source}-models.csv — generated from a real golden-cluster run (branch main, 2026-06-15); refreshed automatically going forward.
.github/workflows/sync-llm-matrix-keyless.yml — weekly schedule (serverless latest → PR to main) + manual workflow_dispatch (version input → PR to the <version> branch). Mirrors sync-sheets-keyless.yml.
.github/scripts/llm-matrix/sync_matrix.sh — pulls the CSVs from GCS.

Note: this also moves the page to the new Agent Builder column taxonomy (Alert Triage / Detection Engineering / Investigation / KB Retrieval / Workflow Execution / Overall), replacing the previous columns.

Required repo configuration

Reuses the existing sheet2docs keyless-WIF variables (GCP_WORKLOAD_IDENTITY_PROVIDER, GCP_SERVICE_ACCOUNT_EMAIL, GCP_PROJECT_ID). One new variable: LLM_MATRIX_GCS_BUCKET (bucket name, no gs://). The reader identity needs roles/storage.objectViewer on that bucket.
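
A minimal sketch of a pre-flight check for the variables listed above, assuming they are exposed to the job as environment variables. The function name `check_matrix_config` is hypothetical and not part of this PR; only the variable names and the "no gs://" rule come from the description.

```python
# Variable names taken from the PR description.
REQUIRED_VARS = (
    "GCP_WORKLOAD_IDENTITY_PROVIDER",
    "GCP_SERVICE_ACCOUNT_EMAIL",
    "GCP_PROJECT_ID",
    "LLM_MATRIX_GCS_BUCKET",
)


def check_matrix_config(env):
    """Return a list of human-readable problems; an empty list means the config looks usable.

    `env` is any mapping of variable name to value (hypothetical helper, for illustration only).
    """
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    bucket = env.get("LLM_MATRIX_GCS_BUCKET", "")
    if bucket.startswith("gs://"):
        # The description asks for the bare bucket name.
        problems.append("LLM_MATRIX_GCS_BUCKET must be the bare bucket name, without gs://")
    return problems
```

Calling `check_matrix_config(os.environ)` at the start of a job would surface a missing or malformed variable before any GCS access is attempted. It does not verify that the reader identity actually holds roles/storage.objectViewer.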

Updating a versioned (released) Stack matrix

The weekly job keeps serverless/latest (→ main) current automatically. To refresh a released version (e.g. a new model in 9.2):

Ensure the model exists in that version's matrix config and run the eval suites against the <version> branch so results land on the golden cluster.
Run the Kibana kibana-evals-security-matrix pipeline with MATRIX_BRANCH=<version> + MATRIX_VERSION=<version> (overwrites gs://<bucket>/security/<version>/).
Run this workflow via workflow_dispatch with version=<version> → opens a PR against the <version> docs branch.
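
The routing described above (scheduled run to main, manual run to the <version> branch) can be sketched as a small mapping. This is an illustration only: `SyncTarget` and `resolve_target` are hypothetical names, and the serverless prefix `security/serverless/latest/` is an assumption extrapolated from the `gs://<bucket>/security/<version>/` layout, not something the PR confirms.

```python
from typing import NamedTuple, Optional


class SyncTarget(NamedTuple):
    gcs_prefix: str   # where the CSVs are read from
    base_branch: str  # docs-content branch the PR is opened against


def resolve_target(bucket: str, version: Optional[str] = None) -> SyncTarget:
    """Map the optional `version` input to a GCS prefix and a PR base branch."""
    if not version:
        # Scheduled run: serverless latest -> PR to main.
        # ASSUMPTION: the serverless prefix follows the same security/<...>/ layout.
        return SyncTarget(f"gs://{bucket}/security/serverless/latest/", "main")
    # Manual run: gs://<bucket>/security/<version>/ -> PR to the <version> branch.
    return SyncTarget(f"gs://{bucket}/security/{version}/", version)
```

For example, `resolve_target("my-bucket", "9.2")` yields the prefix `gs://my-bucket/security/9.2/` and the base branch `9.2`, where `my-bucket` is a placeholder.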

Reviewer notes (scores — cc @dhru42 / @patrykkopycinski )

Values are from a real run and the math is verified end-to-end (GPT OSS 120B's Alert Triage 7.31 reproduces exactly from raw per-evaluator means). Open scoring decisions for eval owners (numbers may move as suites are tuned):

Alert Triage saturates at 10 for most proprietary models — honest, but the column blends AttachmentReadCompliance (tool compliance) with answer-quality criteria, weighted equally. Keep the blend, or exclude AttachmentReadCompliance to make it quality-only.
criteria counts N/A as pass and each triage scenario has a single example → all-or-nothing per criterion.
Only quality evaluators feed the matrix (observability metrics excluded). Confirm the desired set.
Cluster metadata mislabeled (family: "Claude" / provider: "Elastic" for every model) — matrix unaffected, but flag to the eval-ingestion owner.
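
To make the first two scoring decisions concrete, here is a small sketch of how an equal-weight blend behaves with and without AttachmentReadCompliance. It assumes a column score is the unweighted mean of per-evaluator means (each in 0..1) scaled to 0..10; the real aggregation lives in the Kibana pipeline and may differ. The function names and all numbers below are hypothetical and are not taken from the real run.

```python
from statistics import mean


def criterion_pass_rate(verdicts):
    """Fraction of examples that pass; N/A is counted as a pass, as noted above."""
    return mean(1.0 if v in ("pass", "n/a") else 0.0 for v in verdicts)


def column_score(evaluator_means, exclude=()):
    """Equal-weight mean of per-evaluator means (0..1), scaled to 0..10 (ASSUMED formula)."""
    kept = [m for name, m in evaluator_means.items() if name not in exclude]
    return round(10 * mean(kept), 2)


# HYPOTHETICAL per-evaluator means, for illustration only.
alert_triage = {
    "AttachmentReadCompliance": 1.0,  # tool compliance
    "criteria": 0.5,                  # answer quality
}

blended = column_score(alert_triage)
quality_only = column_score(alert_triage, exclude=("AttachmentReadCompliance",))
```

With these made-up inputs the blended score is higher than the quality-only score, which is the effect the first note describes: a saturated compliance evaluator pulls the column up. With a single example per scenario, `criterion_pass_rate` can only return 0.0 or 1.0, which is the all-or-nothing behaviour flagged in the second note.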

Test plan / how to validate

Confirm both :::{csv-include} tables render on the page preview.
Run the workflow with dry_run=true once the bucket exists; verify the CSV diff before enabling the schedule.
Sanity-check the model lineup + column taxonomy with eval owners.
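
The CSV diff and lineup checks in the steps above could be partly automated. The sketch below compares two matrix CSVs and reports header and lineup changes. It assumes the first column holds the model name under a `Model` header, which the PR does not state; `read_matrix` and `compare_matrices` are hypothetical helpers, not part of sync_matrix.sh.

```python
import csv
import io

# Column taxonomy from the PR description; the leading "Model" column is an ASSUMPTION.
EXPECTED_COLUMNS = [
    "Model",
    "Alert Triage",
    "Detection Engineering",
    "Investigation",
    "KB Retrieval",
    "Workflow Execution",
    "Overall",
]


def read_matrix(csv_text):
    """Parse CSV text into (header, {model_name: row})."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], {}
    header, body = rows[0], rows[1:]
    return header, {row[0]: row for row in body if row}


def compare_matrices(old_text, new_text):
    """Summarise header validity and model lineup changes between two matrix CSVs."""
    _, old = read_matrix(old_text)
    header, new = read_matrix(new_text)
    return {
        "header_ok": header == EXPECTED_COLUMNS,
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(m for m in set(old) & set(new) if old[m] != new[m]),
    }
```

A summary like this is a complement to reading the raw diff during a dry run, not a replacement for the sign-off from eval owners.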

Open items

Provision the GCS bucket + grant the docs-content WIF reader access; set LLM_MATRIX_GCS_BUCKET.
Merge the companion Kibana PR ([Security Solution] Create artifact generation pipeline for generating LLM Performance Matrix kibana#273827) first so the pipeline can publish real artifacts.

Generative AI disclosure

Did you use a generative AI (GenAI) tool to assist in creating this contribution?

Yes
No

If you answered "Yes" to the previous question, please specify the tool(s) and model(s) used (e.g., Google Gemini, OpenAI ChatGPT-4, etc.).

Tool(s) and model(s) used: PR developed with Cursor + Claude Opus 4.8


github-actions · 2026-06-17T19:03:16Z

Elastic Docs AI PR menu

Check the box to run an AI review for this pull request.

Review docs changes (docs-review). Status: not started.

Powered by GitHub Agentic Workflows and docs-actions. For more information, reach out to the docs team.

github-actions · 2026-06-17T19:04:01Z

🔍 Preview links for changed docs

solutions/security/ai/large-language-model-performance-matrix.md

github-actions · 2026-06-17T19:04:13Z

✅ Elastic Docs Style Checker (Vale)

No issues found on modified lines!

The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.

spong added 2 commits June 17, 2026 12:42

Add llm perf matrix workflow automation

f168f2e

Merge branch 'main' of github.com:elastic/docs-content into security-…

042449e

…llm-matrix-automation

spong requested review from dhru42 and patrykkopycinski June 17, 2026 19:01

spong requested review from a team as code owners June 17, 2026 19:01

spong requested a review from theletterf June 17, 2026 19:01

github-actions Bot deployed to docs-preview June 17, 2026 19:06

spong wants to merge 2 commits into main from security-llm-matrix-automation
