@cabutlermit cabutlermit commented Sep 29, 2025

Purpose and background context

This is the first step in migrating this repository to optionally build for either AMD64 or ARM64 CPU architecture for deployment in AWS. The following changes are made to this repository to align with Steps 1-3 in How-To: Update a Container App to new CPU Architecture. Other than shifting this repository to the new shared workflows, this does not change anything for the container build nor the CPU architecture (which still defaults to X86_64/AMD64).

Note: There were some minor modifications to the Makefile that are different from the current Makefile template in the mitlib-tf-workloads-ecr repository. The minor changes made here will appear in a new PR on that repo in the next day or two.

How can a reviewer manually see the effects of these changes?

  1. Check the Actions job for opening a PR here. This represents the dev-build.yml workflow. Check the marimo-launcher ECR Repository and verify that the build from the GHA workflow has been uploaded to ECR with the correct tags.
  2. Check out this branch locally and run make dist-dev, make publish-dev, and make docker-clean. Check the marimo-launcher ECR Repository and verify that the builds from the Makefile commands were properly uploaded to ECR with the expected tags.
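
A rough sketch of the second check as a script, assuming the AWS CLI is installed with credentials and a default region already configured. The function name `verify_dev_build`, the `MAKE_CMD`/`AWS_CMD` override variables, and the default repository name `marimo-launcher-dev` (taken from the image listing later in this thread) are illustrative assumptions, not part of this PR.

```shell
# Hypothetical helper: run the three make targets named in this PR, then list
# the tags currently present in the ECR repository for manual inspection.
# MAKE_CMD and AWS_CMD are assumed override hooks (e.g. set to "echo" for a dry run).
verify_dev_build() {
  repo="${1:-marimo-launcher-dev}"   # assumed repository name

  ${MAKE_CMD:-make} dist-dev || return 1
  ${MAKE_CMD:-make} publish-dev || return 1
  ${MAKE_CMD:-make} docker-clean || return 1

  # Print every image tag in the repository; compare against the expected tags.
  ${AWS_CMD:-aws} ecr describe-images \
    --repository-name "$repo" \
    --query 'imageDetails[].imageTags[]' \
    --output text
}
```

Calling `MAKE_CMD=echo AWS_CMD=echo verify_dev_build` prints the commands that would run without touching Docker or AWS, which is a cheap way to review the sequence first.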

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
This is the first step in migrating this repository to optionally build
for either AMD64 or ARM64 CPU Architecture for deployment in AWS.

How this addresses that need:
* Update the `Makefile` with the new output from the
mitlib-tf-workloads-ecr repository (and then make some further
modifications that will be reflected back in the mitlib-tf-workloads-ecr
repository soon)
* Update the dev-build.yml workflow with the new output from the
mitlib-tf-workloads-ecr repository
* Update the stage-build.yml workflow with the new output from the
mitlib-tf-workloads-ecr repository
* Update the prod-promote.yml workflow with the new output from the
mitlib-tf-workloads-ecr repository
* Update the README with notes about building and deploying in AWS

Side effects of this change:
None. Since we did not create an `.aws-architecture` file, the builds
for AWS will still default to AMD64 as before.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1448
@cabutlermit cabutlermit marked this pull request as ready for review September 29, 2025 16:26
@cabutlermit cabutlermit requested a review from a team as a code owner September 29, 2025 16:26
@cabutlermit
Contributor Author

@MITLibraries/dataeng -- this is really for @ghukill to review (since I've already been in touch with him about this).

@ghukill ghukill self-assigned this Sep 29, 2025
Contributor

@ghukill ghukill left a comment

It's looking good to me, but I had a question.

I have created a .aws-architecture file locally with linux/arm64. I have confirmed that make check-arch exits cleanly.

Then I perform a build with make dist-dev.

Lastly, I'm trying to start a container locally and pass a specific --platform. When I use --platform linux/amd64 it works, but I get the following error for linux/arm64:

$ docker run \
-p "2718:2718" \
-v "./tests/fixtures:/tmp/fixtures" \
-e NOTEBOOK_MOUNT="/tmp/fixtures/inline_deps" \
--platform linux/arm64 \
marimo-launcher-dev:latest

Unable to find image 'marimo-launcher-dev:latest' locally
docker: Error response from daemon: pull access denied for marimo-launcher-dev, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

It's almost as if docker run --platform linux/arm64 ... is unable to find an image that matches that platform, even though I built it with a .aws-architecture file in place. Am I missing something here?

I don't doubt it would work as expected in AWS, thinking mostly about local development when we are trying out different architectures.

-t $(ECR_URL_DEV):`git describe --always` \
-t $(ECR_NAME_DEV):latest .
### Terraform-generated Developer Deploy Commands for Dev environment ###
check-arch:
Contributor

I like this check-arch command. After some cycles with all this, might be neat to weave this into either pre-commit, or make lint, or something.


dist-dev: check-arch ## Build docker container (intended for developer-based manual build)
@ARCH_TAG=$$(cat .arch_tag); \
docker buildx inspect $(ECR_NAME_DEV) >/dev/null 2>&1 || docker buildx create --name $(ECR_NAME_DEV) --use; \
Contributor

@cabutlermit - so you did end up going with buildx? Am I correct in remembering you were kind of waffling between approaches for a bit? Sounds like maybe this approach won out?

Contributor Author

Yeah, I still ended up with docker buildx in the end for the Makefile commands (the shared workflows only use docker build). I kept buildx in the Makefile mostly because I wanted to ensure that the make commands would build either AMD64 or ARM64 regardless of the CPU architecture of the developer's machine.
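
To illustrate the point about building for a target architecture independent of the host, here is a minimal sketch using `docker buildx build` with an explicit `--platform`. It is not the Makefile recipe from this PR; the function name `build_for_platform` and the `DOCKER_CMD` override variable are hypothetical, and the image name in the usage note is borrowed from the thread.

```shell
# Hypothetical helper: build the current directory for one explicit platform
# and load the result into the local Docker image store.
# DOCKER_CMD is an assumed override hook (e.g. set to "echo" for a dry run).
build_for_platform() {
  platform="$1"   # e.g. linux/amd64 or linux/arm64
  image="$2"      # e.g. marimo-launcher-dev:latest-arm64

  # Only the two architectures discussed in this PR are accepted here.
  case "$platform" in
    linux/amd64|linux/arm64) ;;
    *) echo "unsupported platform: $platform" >&2; return 1 ;;
  esac

  # --platform selects the target architecture; --load makes the single-platform
  # result available to plain "docker run" on the developer's machine.
  ${DOCKER_CMD:-docker} buildx build --platform "$platform" --load -t "$image" .
}
```

For example, `build_for_platform linux/arm64 marimo-launcher-dev:latest-arm64` requests an ARM64 image even on an AMD64 laptop, provided the local buildx builder supports emulation for that platform.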

@ghukill
Contributor

ghukill commented Sep 29, 2025

It's looking good to me, but I had a question.

Ah, interesting, a followup.

I did get this to work by using latest-arm64:

% docker run \
-p "2718:2718" \
-v "./tests/fixtures:/tmp/fixtures" \
-e NOTEBOOK_MOUNT="/tmp/fixtures/inline_deps" \
--platform linux/arm64 \
marimo-launcher-dev:latest-arm64  #<------------------

Here are my images:

docker image ls | grep "marimo"
222053980223.dkr.ecr.us-east-1.amazonaws.com/marimo-launcher-dev            latest-arm64        c3a4fd51c74f   6 minutes ago    438MB
222053980223.dkr.ecr.us-east-1.amazonaws.com/marimo-launcher-dev            make-1383aca        c3a4fd51c74f   6 minutes ago    438MB
222053980223.dkr.ecr.us-east-1.amazonaws.com/marimo-launcher-dev            make-latest-arm64   c3a4fd51c74f   6 minutes ago    438MB
marimo-launcher-dev                                                         latest-arm64        c3a4fd51c74f   6 minutes ago    438MB
222053980223.dkr.ecr.us-east-1.amazonaws.com/marimo-launcher-dev            latest              536a44c9d3cb   12 minutes ago   417MB
222053980223.dkr.ecr.us-east-1.amazonaws.com/marimo-launcher-dev            make-latest         536a44c9d3cb   12 minutes ago   417MB
marimo-launcher-dev                                                         latest              536a44c9d3cb   12 minutes ago   417MB

Maybe my question is: why does it appear that the latest tag gets linux/amd64? My apologies if you've outlined this pretty clearly somewhere.
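
One way to confirm which platform a given local tag actually holds is to ask Docker directly. The sketch below wraps `docker image inspect` with a Go template that prints the image's OS and architecture; the function name `image_platform` and the `DOCKER_CMD` override variable are assumptions made for illustration.

```shell
# Hypothetical helper: print "<os>/<architecture>" for a local image tag.
# DOCKER_CMD is an assumed override hook (e.g. set to "echo" for a dry run).
image_platform() {
  ${DOCKER_CMD:-docker} image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}
```

Running `image_platform marimo-launcher-dev:latest` and `image_platform marimo-launcher-dev:latest-arm64` side by side should show whether the two tags point at images built for different architectures, which would explain why `--platform linux/arm64` only works with the second tag.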

@cabutlermit
Contributor Author

Ahh... So, this is not fully documented yet, but will be when we get to the next phase in the process (steps 4 - ... in the Confluence doc). For now, let's avoid trying to build for a different architecture, since the focus of this is to switch to the new make commands and the new GHA workflows without changing anything else.

But, to your question: as soon as you introduce the .aws-architecture file, the make commands and the workflows switch to a slightly different tagging convention. This is mostly because of the infrastructure updates to ensure that the CPU architecture defined by Terraform matches the CPU architecture chosen by the developer and documented in the .aws-architecture file. Once we explicitly name a CPU architecture, the latest tag goes away and we start using an architecture-specific tag (either latest-amd64 or latest-arm64).

In the long run, if you ever see a container in ECR with latest as a tag, you know that it is from an application that has defaulted to AMD64 (not to be confused with an application that has explicitly chosen AMD64). If you see latest-amd64, then you know that the developer has explicitly chosen AMD64 as the architecture. And if you see latest-arm64, then you know that the developer has explicitly chosen ARM64 as the architecture.
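
The tagging convention described above can be sketched as a small shell function. This is only an illustration of the rule as stated in this comment, not the logic of the actual Makefile or shared workflows; the function name `latest_tag_for` is hypothetical, and the file contents (`linux/amd64` or `linux/arm64`) are assumed from the example earlier in the thread.

```shell
# Hypothetical helper: derive the "latest" tag name from the architecture file.
#   no file            -> latest        (defaulted to AMD64)
#   file: linux/amd64  -> latest-amd64  (explicitly chose AMD64)
#   file: linux/arm64  -> latest-arm64  (explicitly chose ARM64)
latest_tag_for() {
  arch_file="${1:-.aws-architecture}"

  # Without the file, the pre-existing default tag is kept.
  if [ ! -f "$arch_file" ]; then
    echo "latest"
    return 0
  fi

  # Strip whitespace and newlines so a trailing newline does not matter.
  arch=$(tr -d '[:space:]' < "$arch_file")

  case "$arch" in
    linux/amd64) echo "latest-amd64" ;;
    linux/arm64) echo "latest-arm64" ;;
    *) echo "unsupported architecture in $arch_file: $arch" >&2; return 1 ;;
  esac
}
```

Called with no arguments from the repository root, `latest_tag_for` reads `.aws-architecture` if it exists. Keeping plain `latest` distinct from `latest-amd64` is what lets a reader of ECR tell a defaulted application from one that explicitly chose AMD64.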

@ghukill
Contributor

ghukill commented Sep 29, 2025

Makes sense! Thanks.

@ghukill ghukill self-requested a review September 29, 2025 18:06
Contributor

@ghukill ghukill left a comment

Looks great. Looking forward to following and working through next steps.

Thanks for the response to my question; I better understand now that it's kind of a phased approach.

@cabutlermit
Contributor Author

Yup. My approach to this is to have a plan (steps 1-3 in the Confluence doc) to move all our application repos to the new workflows without having to do any architecture testing or code changes. Just update the workflows and keep using AMD64 by default. This way, we can get rid of the old shared workflow files from the .github repository more quickly.

Then, we move on to the optional phase of determining if it makes sense to move a repo from AMD64 to ARM64 and do the steps 4 - ... in the Confluence doc.

@cabutlermit cabutlermit merged commit bccbcd4 into main Sep 29, 2025
4 checks passed