Add automatic prod failure monitor workflow #2

@RickardHF

Description

Add a GitHub Actions workflow that automatically creates an issue whenever a workflow run fails on the main (or master) branch.

What to do

Create .github/workflows/prod-failure-monitor.yml with the following content:

name: Prod Failure Monitor

on:
  workflow_run:
    types: [completed]

permissions:
  issues: write
  actions: read

jobs:
  create-issue-on-failure:
    if: >-
      github.event.workflow_run.conclusion == 'failure' &&
      (github.event.workflow_run.head_branch == 'main' || github.event.workflow_run.head_branch == 'master')
    runs-on: ubuntu-latest
    steps:
      - name: Create issue for failed workflow
        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
        with:
          script: |
            const run = context.payload.workflow_run;
            const title = `🔴 Workflow "${run.name}" failed on ${run.head_branch}`;

            // Ensure the workflow-failure label exists
            try {
              await github.rest.issues.getLabel({
                owner: context.repo.owner,
                repo: context.repo.repo,
                name: 'workflow-failure'
              });
            } catch (e) {
              if (e.status === 404) {
                await github.rest.issues.createLabel({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  name: 'workflow-failure',
                  color: 'B60205',
                  description: 'Automatically created for workflow failures on main'
                });
              }
            }

            // Check for existing open issue for this workflow
            const existingIssues = await github.rest.issues.listForRepo({
              owner: context.repo.owner,
              repo: context.repo.repo,
              state: 'open',
              labels: 'workflow-failure',
              per_page: 100 // default page size is 30; a larger page makes a missed duplicate less likely
            });

            const existing = existingIssues.data.find(i => i.title === title);
            if (existing) {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: existing.number,
                body: [
                  `⚠️ **Workflow failed again**`,
                  ``,
                  `- **Run**: ${run.html_url}`,
                  `- **Commit**: \`${run.head_sha.substring(0, 7)}\``,
                  `- **Time**: ${run.updated_at}`,
                  `- **Triggered by**: @${run.actor?.login || 'unknown'}`
                ].join('\n')
              });
              return;
            }

            // Get failed job details
            const jobs = await github.rest.actions.listJobsForWorkflowRun({
              owner: context.repo.owner,
              repo: context.repo.repo,
              run_id: run.id
            });

            const failedJobs = jobs.data.jobs.filter(j => j.conclusion === 'failure');
            let jobDetails = '';
            for (const job of failedJobs) {
              jobDetails += `### ❌ Job: \`${job.name}\`\n`;
              jobDetails += `- **URL**: ${job.html_url}\n`;
              const failedSteps = (job.steps || []).filter(s => s.conclusion === 'failure');
              if (failedSteps.length > 0) {
                jobDetails += `- **Failed steps**:\n`;
                for (const step of failedSteps) {
                  jobDetails += `  - \`${step.name}\`\n`;
                }
              }
              jobDetails += '\n';
            }

            const body = [
              `## Workflow Failure on \`${run.head_branch}\``,
              ``,
              `| Field | Value |`,
              `|-------|-------|`,
              `| **Workflow** | ${run.name} |`,
              `| **Branch** | \`${run.head_branch}\` |`,
              `| **Commit** | \`${run.head_sha.substring(0, 7)}\` |`,
              `| **Run** | [#${run.run_number}](${run.html_url}) |`,
              `| **Triggered by** | @${run.actor?.login || 'unknown'} |`,
              `| **Time** | ${run.updated_at} |`,
              ``,
              `## Failed Jobs`,
              ``,
              jobDetails || '_No job details available._',
              ``,
              `> 💡 Check the [workflow run logs](${run.html_url}) for full error details.`,
              ``,
              `## Action Required`,
              ``,
              `Please investigate the failure and fix the issue. Check the workflow run logs for detailed error messages.`,
              ``,
              `---`,
              `_This issue was automatically created by the Prod Failure Monitor workflow._`
            ].join('\n');

            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: title,
              body: body,
              labels: ['workflow-failure']
            });
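
The job-level `if:` condition above can be checked outside of Actions with a small sketch like the one below. `shouldReport` is a hypothetical helper name, not part of the workflow, and `run` is assumed to have the same `conclusion` and `head_branch` fields the script reads from `context.payload.workflow_run`.

```javascript
// Hypothetical helper mirroring the job-level `if:` expression.
// `run` is assumed to look like the `workflow_run` object in the event payload.
function shouldReport(run) {
  const watchedBranches = ['main', 'master'];
  return run.conclusion === 'failure' && watchedBranches.includes(run.head_branch);
}

// Example payload fragments (made up for illustration).
console.log(shouldReport({ conclusion: 'failure', head_branch: 'main' }));
console.log(shouldReport({ conclusion: 'success', head_branch: 'main' }));
console.log(shouldReport({ conclusion: 'failure', head_branch: 'feature/x' }));
```

One difference to keep in mind: string comparison in Actions expressions ignores case, while this sketch compares case-sensitively.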

Why

This workflow watches for CI/CD failures on main (or master) and opens an issue for each failing workflow so that failures can be tracked and fixed promptly. Each new issue includes:

  • Failed job names and steps
  • Links to the workflow run logs
  • Commit SHA and the user who triggered the run
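
The failed-job summary listed above can be sketched as a standalone function, which makes the formatting easy to test without triggering a real failure. `formatFailedJobs` is a hypothetical name, and the input is assumed to have the same shape as the `jobs` array the script reads from `listJobsForWorkflowRun`.

```javascript
// Hypothetical helper: mirrors the job-summary loop in the workflow script.
// Each job is assumed to carry `name`, `conclusion`, `html_url`, and optional `steps`.
function formatFailedJobs(jobs) {
  return jobs
    .filter((job) => job.conclusion === 'failure')
    .map((job) => {
      const lines = [`### ❌ Job: \`${job.name}\``, `- **URL**: ${job.html_url}`];
      const failedSteps = (job.steps || []).filter((s) => s.conclusion === 'failure');
      if (failedSteps.length > 0) {
        lines.push('- **Failed steps**:');
        for (const step of failedSteps) {
          lines.push(`  - \`${step.name}\``);
        }
      }
      // One trailing blank line separates consecutive job sections.
      return lines.join('\n') + '\n\n';
    })
    .join('');
}
```

An empty result corresponds to the `_No job details available._` fallback in the issue body.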

Duplicate issues are avoided: if an open issue labeled workflow-failure with the same title already exists, a comment is added to it instead of creating a new issue.
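
The duplicate check can be illustrated with a minimal sketch. `planIssueAction` is a hypothetical helper, and `openIssues` is assumed to be the list of open issues carrying the `workflow-failure` label, each with a `number` and a `title`.

```javascript
// Hypothetical helper: decide whether to comment on an existing issue
// or open a new one, based on an exact title match.
function planIssueAction(openIssues, title) {
  const existing = openIssues.find((issue) => issue.title === title);
  return existing
    ? { action: 'comment', issueNumber: existing.number }
    : { action: 'create', title };
}
```

Because the title contains both the workflow name and the branch, the same workflow failing on main and on master would be tracked in two separate issues.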

Context

This is part of a cross-repo initiative to add production failure monitoring. See PersonalAgent#8.

Also needed on:

  • RickardHF/hema-ai
  • RickardHF/langapp

Generated by Personal Assistant Agent for issue #8
