Commit 4ecee28

Merge pull request #7 from grove-platform/55611-gh-metrics-cronjob
DOCSP-55611: Add Kanopy cronjob deployment for GitHub metrics collection
2 parents 2e966d7 + a90dcae commit 4ecee28

6 files changed: 354 additions & 8 deletions

github-metrics/.drone.yml

Lines changed: 77 additions & 0 deletions (new file)

```yaml
---
kind: pipeline
type: kubernetes
name: github-metrics

trigger:
  branch:
    - main
  event:
    - push

steps:
  - name: check-changes
    image: alpine/git
    commands:
      - |
        # Check if any files in github-metrics/ directory changed
        git diff --name-only $DRONE_COMMIT_BEFORE $DRONE_COMMIT_AFTER | grep -q "^github-metrics/" && echo "Changes detected" || (echo "No changes in github-metrics/, skipping" && exit 78)

  - name: test
    image: node:20-alpine
    commands:
      - cd github-metrics
      - npm ci
      - node --version
      - npm --version
      - echo "Validating package.json and dependencies..."

  - name: publish
    image: plugins/kaniko-ecr
    settings:
      create_repository: true
      registry: 795250896452.dkr.ecr.us-east-1.amazonaws.com
      repo: docs/github-metrics
      tags:
        - git-${DRONE_COMMIT_SHA:0:7}
        - latest
      access_key:
        from_secret: ecr_access_key
      secret_key:
        from_secret: ecr_secret_key
      context: github-metrics
      dockerfile: github-metrics/Dockerfile

  - name: deploy
    image: quay.io/mongodb/drone-helm:v3
    settings:
      chart: mongodb/cronjobs
      chart_version: 1.21.2
      add_repos: [ mongodb=https://10gen.github.io/helm-charts ]
      namespace: docs
      release: github-metrics
      values: image.tag=git-${DRONE_COMMIT_SHA:0:7},image.repository=795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/github-metrics
      values_files: [ 'github-metrics/cronjobs.yml' ]
      api_server: https://api.prod.corp.mongodb.com
      kubernetes_token:
        from_secret: kubernetes_token

  - name: notify-slack
    image: alpine/curl
    environment:
      SLACK_WEBHOOK:
        from_secret: slack_webhook_url
    commands:
      - |
        if [ "$DRONE_BUILD_STATUS" = "success" ]; then
          STATUS_MSG="✅ *GitHub Metrics CronJob Deploy Succeeded*"
        else
          STATUS_MSG="❌ *GitHub Metrics CronJob Deploy Failed*"
        fi
        curl -X POST -H 'Content-type: application/json' \
          --data "{\"text\": \"$STATUS_MSG\n*Repo:* $DRONE_REPO_NAME\n*Branch:* $DRONE_BRANCH\n*Commit:* ${DRONE_COMMIT_SHA:0:7}\n*Author:* $DRONE_COMMIT_AUTHOR\n*Build:* <$DRONE_BUILD_LINK|#$DRONE_BUILD_NUMBER>\"}" \
          "$SLACK_WEBHOOK"
    when:
      status:
        - success
        - failure
```
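In the `check-changes` step, exit code 78 is Drone's convention for stopping the pipeline early while reporting success, and the grep pattern is anchored with `^` so only paths inside `github-metrics/` count. A minimal JavaScript sketch of that prefix test (`hasMetricsChanges` is an illustrative helper name, not something the pipeline defines):

```javascript
// The grep in the check-changes step matches "^github-metrics/".
// The equivalent anchored prefix test, expressed in JavaScript.
// hasMetricsChanges is a hypothetical helper, not part of the pipeline.
function hasMetricsChanges(changedFiles) {
  return changedFiles.some((file) => file.startsWith('github-metrics/'));
}

console.log(hasMetricsChanges(['github-metrics/index.js', 'README.md'])); // true
console.log(hasMetricsChanges(['docs/intro.md'])); // false
```

Note that `startsWith` (like the `^` anchor) rejects paths that merely contain the substring, such as `not-github-metrics/x.js`.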

github-metrics/Dockerfile

Lines changed: 29 additions & 0 deletions (new file)

```dockerfile
FROM node:20-alpine

# Set working directory
WORKDIR /app

# Copy package files first (for better Docker layer caching)
COPY package.json package-lock.json ./

# Install dependencies (use ci for reproducible builds)
RUN npm ci --only=production

# Copy the rest of the application files
COPY . .

# Create a non-root user for security best practices
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 && \
    chown -R nodejs:nodejs /app

# Switch to non-root user
USER nodejs

# Set NODE_ENV to production
ENV NODE_ENV=production

# Command to run the application
# This will be executed by the Kubernetes CronJob
CMD ["node", "index.js"]
```
github-metrics/README.md

Lines changed: 109 additions & 5 deletions

````diff
@@ -2,9 +2,9 @@
 
 This directory contains tooling to enable us to track various GitHub project metrics programmatically.
 
-Currently, it contains a PoC for a simple pipeline to pull metrics from GitHub into MongoDB Atlas.
+This tool runs as a Kubernetes CronJob on Kanopy, automatically collecting metrics from GitHub approximately every 13-14 days and storing them in MongoDB Atlas.
 
-Planned future work: 
+Planned future work:
 
 - Add logic to work with pulled maintenance metrics once available in the test repo
 - Set up Atlas Charts to visualize the data
@@ -14,7 +14,7 @@ Planned future work:
 ### Get metrics from GitHub
 
 This is a simple PoC that uses [octokit](https://github.com/octokit/octokit.js) to get the following data out of GitHub
-for a given repository over a trailing 14 day period:
+for a given repository over a trailing 14-day period:
 
 - Views
 - Unique Views
@@ -24,7 +24,7 @@ for a given repository over a trailing 14 day period:
 - Top 10 referral sources
 - Top 10 paths/destinations in the repo
 
-The intent is to also get the following maintenance-related stats for a given repository over a trailing 14 day period:
+The intent is to also get the following maintenance-related stats for a given repository over a trailing 14-day period:
 
 - Code frequency
 - Commit count
@@ -119,7 +119,7 @@ For this project, as a MongoDB org member, you must also auth your PAT with SSO.
    npm install
    ```
 
-3. **Run the utility**
+3. **Manually run the utility**
 
    From the root of the directory, run the following command to run the utility:
 
````

The final hunk appends a new section to the end of the README:

````markdown
## Automated Deployment (Kanopy CronJob)

This tool is deployed as a Kubernetes CronJob on Kanopy that runs automatically approximately every 13-14 days.

### Deployment Architecture

The deployment consists of three main components:

1. **Dockerfile**: Containerizes the Node.js application
2. **cronjobs.yml**: Helm values file that configures the CronJob schedule and resources
3. **.drone.yml**: CI/CD pipeline that builds, publishes, and deploys the application

### CronJob Schedule

The cronjob is **scheduled to run weekly on Mondays at 8:00 AM UTC** (`0 8 * * 1`), but the application includes guard logic to prevent it from running too frequently:

- The cronjob triggers every Monday
- The application checks whether 13 days have passed since the last successful run
- If fewer than 13 days have passed, the job exits early without collecting metrics
- If 13 or more days have passed, it collects metrics and updates the timestamp

The last run timestamp is stored in a persistent volume (`/data/last-run.json`) that survives between cronjob executions.

#### Environment Variables

The following environment variables can be configured:

- **`ATLAS_CONNECTION_STRING`** (required): MongoDB Atlas connection string for storing metrics
- **`GITHUB_TOKEN`** (required): GitHub Personal Access Token with `repo` permissions
- **`STATE_FILE_PATH`** (optional): Path to the state file that tracks the last run timestamp. Default: `/data/last-run.json`
- **`MIN_DAYS_BETWEEN_RUNS`** (optional): Minimum number of days between metric collection runs. Default: `13`

The required secrets (`ATLAS_CONNECTION_STRING` and `GITHUB_TOKEN`) are configured in `cronjobs.yml` as Kubernetes secrets.

### Deployment Process

The deployment is fully automated via Drone CI/CD with the following steps:

1. **Check Changes**: Verifies whether any files in the `github-metrics/` directory changed
2. **Test**: Validates dependencies with `npm ci`
3. **Build**: Builds the Docker image with Kaniko and publishes it to ECR
4. **Deploy**: Deploys to the production Kanopy cluster using Helm
5. **Notify**: Sends a Slack notification on success or failure

The pipeline runs only on pushes to the `main` branch and skips if no github-metrics files changed.

### Manual Deployment

To manually trigger a deployment:

1. Push changes to the `main` branch
2. Drone automatically runs the test, build, and deploy steps

### Manually Triggering the CronJob

To manually run the cronjob outside of its schedule:

```bash
# Find the cronjob
kubectl get cronjobs -n docs

# Create a one-time job from the cronjob
kubectl create job --from=cronjob/github-metrics-collection \
  github-metrics-manual-$(date +%s) -n docs

# Check the job status
kubectl get jobs -n docs

# View logs
kubectl logs -n docs job/github-metrics-manual-<timestamp>
```

### Monitoring

To check the status of the cronjob:

```bash
# View cronjob details
kubectl get cronjob github-metrics-collection -n docs

# View recent job runs
kubectl get jobs -n docs | grep github-metrics

# View logs from the most recent run
kubectl logs -n docs -l job-name=<job-name>

# Check the last run timestamp (requires exec into a pod)
kubectl exec -n docs <pod-name> -- cat /data/last-run.json
```

The logs show whether the job ran or was skipped:

- `Skipping run - only X days since last run (need 13)` - the job skipped because not enough time has passed
- `Proceeding with run - X days since last run` - the job is collecting metrics

### Configuration Changes

To modify the cronjob configuration:

1. **Change the schedule**: Edit `cronjobs.yml` and update the `schedule` field
2. **Change resources**: Edit `cronjobs.yml` and update the `resources` section
3. **Change the repositories tracked**: Edit `repo-details.json`

After making changes, commit and push to the `main` branch. Drone will automatically deploy the updates.
````
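The "approximately every 13-14 days" cadence falls out of combining the weekly Monday trigger with the 13-day minimum. A minimal simulation (assuming every eligible run succeeds) shows the job effectively fires every other Monday:

```javascript
// Simulate the weekly Monday trigger gated by the 13-day minimum.
// Days are counted from the first Monday the cronjob fires.
const MIN_DAYS_BETWEEN_RUNS = 13;

const runs = [];
let lastRun = null;
for (let day = 0; day <= 56; day += 7) { // every Monday across 8 weeks
  if (lastRun === null || day - lastRun >= MIN_DAYS_BETWEEN_RUNS) {
    runs.push(day); // the application proceeds and stamps the state file
    lastRun = day;
  } // otherwise the job exits early without collecting metrics
}

console.log(runs); // [ 0, 14, 28, 42, 56 ] - one run every 14 days
```

The 7-day Monday after a run is always skipped (7 < 13), so in practice the gap is exactly 14 days unless a run fails and pushes collection to a later Monday.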

github-metrics/check-last-run.js

Lines changed: 80 additions & 0 deletions (new file)

```javascript
import fs from 'fs';
import { writeFile, mkdir } from 'fs/promises';
import path from 'path';

// Path to the state file (mounted from persistent volume)
// Can be overridden via STATE_FILE_PATH environment variable
const STATE_FILE_PATH = process.env.STATE_FILE_PATH || '/data/last-run.json';

// Minimum days between runs (13 days to account for timing variations with weekly Monday runs)
// Can be overridden via MIN_DAYS_BETWEEN_RUNS environment variable
const MIN_DAYS_BETWEEN_RUNS = parseInt(process.env.MIN_DAYS_BETWEEN_RUNS || '13', 10);

/**
 * Check if enough time has passed since the last run
 * @returns {boolean} true if should run, false if should skip
 */
export function shouldRun() {
  try {
    // Check if state file exists
    if (!fs.existsSync(STATE_FILE_PATH)) {
      console.log('No previous run found. Running for the first time.');
      return true;
    }

    // Read the last run timestamp
    const stateData = JSON.parse(fs.readFileSync(STATE_FILE_PATH, 'utf8'));
    const lastRunTime = new Date(stateData.lastRun);
    const now = new Date();

    // Calculate days since last run
    const daysSinceLastRun = (now - lastRunTime) / (1000 * 60 * 60 * 24);

    console.log(`Last run: ${lastRunTime.toISOString()}`);
    console.log(`Days since last run: ${daysSinceLastRun.toFixed(2)}`);
    console.log(`Minimum days required: ${MIN_DAYS_BETWEEN_RUNS}`);

    if (daysSinceLastRun < MIN_DAYS_BETWEEN_RUNS) {
      console.log(`⏭️ Skipping run - only ${daysSinceLastRun.toFixed(2)} days since last run (need ${MIN_DAYS_BETWEEN_RUNS})`);
      return false;
    }

    console.log(`✅ Proceeding with run - ${daysSinceLastRun.toFixed(2)} days since last run`);
    return true;

  } catch (error) {
    console.error('Error checking last run time:', error.message);
    console.log('Proceeding with run due to error reading state file');
    return true; // Run if we can't read the state file
  }
}

/**
 * Update the state file with the current timestamp
 */
export async function updateLastRun() {
  try {
    const now = new Date();
    const stateData = {
      lastRun: now.toISOString(),
      timestamp: now.getTime()
    };

    // Ensure the directory exists
    const dir = path.dirname(STATE_FILE_PATH);
    if (!fs.existsSync(dir)) {
      await mkdir(dir, { recursive: true });
    }

    // Write the state file
    await writeFile(STATE_FILE_PATH, JSON.stringify(stateData, null, 2), 'utf8');
    console.log(`✅ Updated last run timestamp: ${now.toISOString()}`);

  } catch (error) {
    console.error('Error updating last run time:', error.message);
    // Don't throw - we don't want to fail the job just because we can't write the state file
  }
}

export { MIN_DAYS_BETWEEN_RUNS };
```
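The `index.js` entrypoint that calls this module is not part of this diff, but the state file it manages is plain JSON with a `lastRun` ISO string. A self-contained sketch of the elapsed-days arithmetic `shouldRun` performs, using hard-coded example timestamps rather than real data:

```javascript
// Example state as updateLastRun() would write it (hard-coded for illustration)
const stateData = {
  lastRun: '2025-01-06T08:00:00.000Z',
  timestamp: 1736150400000
};

// The same elapsed-days computation shouldRun() performs
const lastRunTime = new Date(stateData.lastRun);
const now = new Date('2025-01-20T08:00:00.000Z'); // a "current" time, two Mondays later
const daysSinceLastRun = (now - lastRunTime) / (1000 * 60 * 60 * 24);

console.log(daysSinceLastRun); // 14 - clears the 13-day minimum, so the job runs
```

Subtracting two `Date` objects coerces both to millisecond timestamps, which is why dividing by `1000 * 60 * 60 * 24` yields fractional days.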

github-metrics/cronjobs.yml

Lines changed: 35 additions & 0 deletions (new file)

```yaml
---
# `image` can be skipped if the values are being set in your .drone.yml file
image:
  repository: 795250896452.dkr.ecr.us-east-1.amazonaws.com/docs/github-metrics
  tag: latest

# global secrets are references to k8s Secrets
globalEnvSecrets:
  GITHUB_TOKEN: github-token
  ATLAS_CONNECTION_STRING: atlas-connection-string

cronJobs:
  - name: github-metrics-collection
    # Run weekly on Mondays at 8am UTC
    # The application checks if it ran in the last 13 days and skips if so
    # Cron format: minute hour day-of-month month day-of-week
    # 0 = Sunday, 1 = Monday, etc.
    schedule: "0 8 * * 1"
    command:
      - node
      - index.js
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    # Persistent volume to store last run timestamp
    persistence:
      enabled: true
      storageClass: "standard"
      accessMode: ReadWriteOnce
      size: 1Gi
      mountPath: /data
```
