
ci: Optimizing costs for AWS self-hosted runners #1353

Merged
ricardosalveti merged 1 commit into qualcomm-linux:master from crucible-runner:decomSSD
Jan 9, 2026


@sampra2025
Contributor

@sampra2025 sampra2025 commented Jan 5, 2026

With this update, the AWS self-hosted runner instance type will change from "c5ad.8xlarge" to "m8a.4xlarge". This adjustment will improve build performance by at least 20% and lower the hourly cost by 30%.

The runner instance has been tested in the Stage environment. For reference, here is a sample build workflow:
https://github.com/qualcomm-linux-stg/aws-meta-qcom/actions/runs/20283368465
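For readers who want to combine the two figures above into a per-build cost, here is a minimal sketch. It assumes cost per build is simply the hourly rate multiplied by build time, and it treats "20% better performance" in two possible ways, since the description does not say which is meant. The function names are hypothetical and no real AWS prices are used, only the relative factors quoted above.

```python
def relative_cost_per_build(hourly_cost_reduction, build_time_reduction):
    """Return the new cost per build as a fraction of the old cost.

    Assumes cost per build = hourly rate * build hours, so the two
    reduction factors multiply.
    """
    rate_factor = 1.0 - hourly_cost_reduction
    time_factor = 1.0 - build_time_reduction
    return rate_factor * time_factor


def time_reduction_from_speedup(throughput_gain):
    """Convert a throughput gain (0.20 = 20% more builds per hour)
    into the equivalent reduction in time per build."""
    return 1.0 - 1.0 / (1.0 + throughput_gain)


# Reading 1 (assumption): each build takes 20% less wall-clock time.
shorter = relative_cost_per_build(0.30, 0.20)

# Reading 2 (assumption): throughput is 20% higher, so each build
# takes 1/1.2 of the original time.
faster = relative_cost_per_build(0.30, time_reduction_from_speedup(0.20))

print(f"20% less build time: {shorter:.3f} of the old cost per build")
print(f"20% more throughput: {faster:.3f} of the old cost per build")
```

Under either reading the cost per build drops by roughly 42% to 44%, provided the quoted figures hold for real workloads.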


Contributor

@lumag lumag left a comment


Please start your commit message by describing the issue that you are fixing. Then continue with describing what has to be done. Don't describe patch contents, it's obvious from the patch itself.

@ndechesne
Contributor

I am actually surprised by the results. I was expecting the SSD to have a bigger impact, but if we save time and money, sure. Can we run a few more comparative tests?

    With this change, an AWS self hosted runner instance type will be
    changed to "m8a.4xlarge" from the current instance type
    "c5ad.8xlarge". With this change we can increase a build performance
    by at least 20% while reducing an hourly cost by 30%.

    This runner instance has been verified on Stage environment, here
    is a sample build workflow for reference.
    https://github.com/qualcomm-linux-stg/aws-meta-qcom/actions/runs/20283368465

Signed-off-by: Satish Mhaske <smhaske@qti.qualcomm.com>
@lumag
Contributor

lumag commented Jan 6, 2026

Commit message is even more weirdly formatted and it still doesn't describe the reasons for the change. Please start by describing the problem, then describe the solution. Avoid phrases like "This change" or "This patch".

@ricardosalveti
Contributor

I am actually surprised by the results. I was expecting the SSD to have a bigger impact, but if we save time and money, sure. Can we run a few more comparative tests?

Yeah, curious if it will also be better on a clean build, without any previous sstate cache.

@sampra2025 sampra2025 changed the title from "ci: Replace AWS Runner with non-SSD attached" to "ci: Optimizing costs for AWS self-hosted runners" Jan 6, 2026
@sampra2025 sampra2025 requested a review from lumag January 6, 2026 19:22
@sampra2025
Contributor Author

I am actually surprised by the results. I was expecting the SSD to have a bigger impact, but if we save time and money, sure. Can we run a few more comparative tests?

Yeah, curious if it will also be better on a clean build, without any previous sstate cache.

Thank you for your feedback. With the recent build, we are seeing similar performance. I agree that running a clean build is a good idea. Please let me know the best approach for running a clean build.

@github-actions

github-actions Bot commented Jan 6, 2026

Test run workflow

Test jobs for commit b627afa

@test-reporting-app

Test Results

11 files   -   8  11 suites   - 57   18m 26s ⏱️ - 39m 11s
11 tests  -  25  11 ✅  -  25  0 💤 ±0  0 ❌ ±0 
89 runs   - 566  89 ✅  - 563  0 💤  - 3  0 ❌ ±0 

Results for commit b627afa. ± Comparison against base commit 19a15a0.

This pull request removes 25 tests.
0_hotplug ‑ hotplug
1_CPUFreq_Validation ‑ CPUFreq_Validation
2_Interrupts ‑ Interrupts
3_cdsp_remoteproc ‑ cdsp_remoteproc
4_adsp_remoteproc ‑ adsp_remoteproc
5_WiFi_Firmware_Driver ‑ WiFi_Firmware_Driver
6_WiFi_OnOff ‑ OpenCV
6_WiFi_OnOff ‑ WiFi_OnOff
7_OpenCV ‑ OpenCV
8_irq ‑ irq
…

@ricardosalveti
Contributor

Thank you for your feedback. With the recent build, we are seeing similar performance. I agree that running a clean build is a good idea. Please let me know the best approach for running a clean build.

In your staging environment you can just erase the sstate cache entirely before doing a new build with both worker types, then you can easily compare.
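A rough sketch of that comparison, to be run once on each worker type, could look like the following. It assumes the sstate cache is a plain directory that is safe to delete and that the build is started by a single command. The helper names are hypothetical, and the path and build command are placeholders to be supplied by whoever runs it.

```python
import shutil
import subprocess
import time
from pathlib import Path


def clear_cache(cache_dir):
    """Delete everything under cache_dir so the next build starts cold."""
    cache_dir = Path(cache_dir)
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
    # Recreate the empty directory so the build can repopulate it.
    cache_dir.mkdir(parents=True, exist_ok=True)


def timed_run(command):
    """Run command (a list of arguments), raise if it fails,
    and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(command, check=True)
    return time.monotonic() - start


def clean_build_seconds(cache_dir, build_command):
    """Wipe the cache, then time one build from scratch."""
    clear_cache(cache_dir)
    return timed_run(build_command)
```

Calling `clean_build_seconds(cache_dir, build_command)` with the same inputs on the old and the new runner gives two directly comparable timings. A single run per runner is noisy, so repeating it a few times would make the comparison more trustworthy.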

@sampra2025
Contributor Author

Thank you for your feedback. With the recent build, we are seeing similar performance. I agree that running a clean build is a good idea. Please let me know the best approach for running a clean build.

In your staging environment you can just erase the sstate cache entirely before doing a new build with both worker types, then you can easily compare.

Great. I will do it in the stage environment and share the results with you.

@sampra2025
Contributor Author

I am actually surprised by the results. I was expecting the SSD to have a bigger impact, but if we save time and money, sure. Can we run a few more comparative tests?

Yeah, curious if it will also be better on a clean build, without any previous sstate cache.

Thank you for your feedback. With the recent build, we are seeing similar performance. I agree that running a clean build is a good idea. Please let me know the best approach for running a clean build.

I have completed a clean build after removing the sstate cache. Here is the link for your reference. Please share your feedback.
https://github.com/qualcomm-linux-stg/aws-meta-qcom/actions/runs/20768000476

Note: There are four revisions of builds for this workflow. The latest revision is after removing the sstate cache, while the first revision was done before its removal.

@ricardosalveti
Contributor

Do you have a similar run on hand, but using the old runner instead?

Contributor

@ricardosalveti ricardosalveti left a comment


We're facing way too many build issues (segfaults, which could be caused by OOM). Let's switch and validate that it helps the current situation.

@ricardosalveti ricardosalveti merged commit dc5f692 into qualcomm-linux:master Jan 9, 2026
115 checks passed
4 participants