Description
The current EarthBuild/buildkit fork is behind upstream moby/buildkit.
A merge strategy needs to be established to simplify the upgrade process, and the EarthBuild-specific changes should be documented in fork-specific MD files to ease future maintenance.
The last merge was probably:
$ git log origin/main --merges -n 1
commit 531b303aa8ec03c29c2ceaa140eb0a6d32e6f6f3 (upstream/earthly-main)
Merge: 594835b59 e163acdbb
Author: Brandon Schurman <brandon@earthly.dev>
Date: Wed May 15 16:05:21 2024 -0400
Merge pull request #59 from earthly/brandon/http-proxy
Allows option for client to use grpc default dialer. The builtin dialer in gRPC has support for things like HTTP connect proxy which we need to support BYOC satellites in some environments. Buildkit was previously forcing it's own dialer implementation.
Additional details by @earthly
Changes we made to Buildkit as part of @earthly:
- Custom exporter (earthlyoutputs) that enables the following:
  - Outputting multiple images and artifacts within the same build context (as opposed to having to create different build contexts for each one)
  - Exporting via an embedded registry that we added into the buildkit process + a ping-pull mechanism (this allows loading resulting images into docker via docker pull, which is very efficient - it only transfers the layers that have changed and nothing else). This registry has custom storage drivers to be able to access buildkit storage directly.
- Turning off some provenance-related functionality that was causing issues with huge builds (Earthly builds tend to be significantly bigger than run-of-the-mill Docker builds).
- Implement LOCALLY as a series of fake RUN commands that buildkit recognizes and treats in a special way, in order to call back into the client (earthly) to execute the commands there. This had to be done in the Buildkit context, because the core of the scheduling is performed there, and LOCALLY had to be integrated with all that - e.g. to be able to use intermediate files, etc.
- GC stats reported via the gRPC API. Useful to debug GC-related issues, and to get a sense of the size of the cache on disk.
- Atomic shutdown sequence - a gRPC call that attempts to shut down buildkit, but only if there are no active sessions. This operation is atomic, and this helps with drain + hibernate that we were using in Earthly Cloud.
- Limits on number of parallel builds.
- Increased gRPC message sizes.
- gRPC retries.
- Dynamic local dirs (local dirs from which the build can COPY). In Earthly, the build context can be very dynamic (the complete list of directories is not known statically at the beginning of the build), hence this is needed.
- Host bind mounts - the ability to bind-mount a host directory into the build environment. "Host" here actually means the container in which buildkit executes. This is needed to make WITH DOCKER work performantly. It's important for dockerd to be able to mount the data root in a native directory - and Earthly facilitated this across multiple layers: docker volume mounted in buildkit container, then mounted via buildkit host bind mounts (that we added to buildkit) into buildkit builds.
- Socket mounting - used for the ability to drop into a shell on a failing build via interactive mode (earthly -i)
- Support for LFS in git contexts (when running a build from another repo - e.g. earthly github.com/foo/bar+build)
- sessionTimeout - a safety mechanism to introduce time limits on builds (useful for our free tier)
- Container stats reporting
- The ability to export some results in the middle of the build. Buildkit was originally designed to only produce the resulting image at the end of the build, whereas Earthly needs partial results in some situations - e.g. in order to be able to load images into the WITH DOCKER context, you have to export the images in the middle of the build.
- gRPC Healthchecks (used to manage buildkit in a fleet)
- Credential redacting in git addresses
- Lots of git-related debugging settings
- An entire alternate solver, code-named ticktock, that we re-implemented from scratch to replace the official one. This fixed the random "inconsistent graph state" errors. It lives in a separate branch called earthly-next: earthly/buildkit@earthly-main...earthly-next. A significant number of our users and customers have tried it out and it worked well, but we never got around to promoting it to the new default. It would likely improve overall stability, especially in big builds. It can be enabled in earthly via the hidden --ticktock feature flag; under the hood, earthly then uses the -ticktock version of the buildkit container, e.g. https://hub.docker.com/repository/docker/earthly/buildkitd/tags/v0.8.15-ticktock/sha256-e2b11830bda91e2a71930b212b555db7e6ab25710390b070d87b9009b0d33c15
- Various bug fixes that we were in the process of upstreaming to official buildkit
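To make the "atomic shutdown" item above more concrete for whoever documents it: the key property is that the "are there active sessions?" check and the "commit to shutting down" decision happen under the same lock, so no session can slip in between them. The following is only a minimal sketch of that idea using the Go standard library. The `sessionTracker` type and its method names are hypothetical and are not taken from the fork's code; the real implementation is exposed as a gRPC call and may be structured quite differently.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionTracker is a hypothetical illustration, not the fork's actual type.
// It counts active sessions and records whether shutdown has been committed.
type sessionTracker struct {
	mu       sync.Mutex
	active   int
	shutdown bool
}

// Begin registers a new session. It is refused once shutdown is committed.
func (t *sessionTracker) Begin() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.shutdown {
		return false
	}
	t.active++
	return true
}

// End marks one session as finished.
func (t *sessionTracker) End() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.active > 0 {
		t.active--
	}
}

// TryShutdown commits to shutting down only if no session is active.
// The check and the state change share one critical section, which is
// what makes the operation atomic with respect to Begin.
func (t *sessionTracker) TryShutdown() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.active > 0 {
		return false
	}
	t.shutdown = true
	return true
}

func main() {
	t := &sessionTracker{}
	t.Begin()
	fmt.Println(t.TryShutdown()) // false: one session is still active
	t.End()
	fmt.Println(t.TryShutdown()) // true: idle, shutdown committed
	fmt.Println(t.Begin())       // false: new sessions are now refused
}
```

A caller implementing drain + hibernate would presumably retry `TryShutdown` until it succeeds, rather than forcing termination while builds are running.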
Tip: searching for "earthly" in the earthly/buildkit codebase reveals where all of the above is implemented
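As an example of the kind of small, self-contained change that is worth documenting per feature, credential redacting in git addresses could look roughly like the sketch below. This is an assumption about the general approach, not the fork's actual code: `redactGitURL` is a hypothetical helper, and the `xxxxx` placeholder is chosen here only for illustration. It replaces the whole userinfo part, since tokens are sometimes passed in the username position.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactGitURL is a hypothetical helper (not taken from the fork).
// It strips credentials from a URL-style git address before logging.
func redactGitURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		// Addresses that do not parse as URLs (for example scp-style
		// remotes) are hidden entirely rather than risk leaking a secret.
		return "<unparseable git address>"
	}
	if u.User == nil {
		return raw // nothing to redact
	}
	u.User = url.User("xxxxx") // drop both username and password
	return u.String()
}

func main() {
	fmt.Println(redactGitURL("https://user:token@github.com/foo/bar.git"))
	// prints "https://xxxxx@github.com/foo/bar.git"
	fmt.Println(redactGitURL("https://github.com/foo/bar.git"))
	// prints "https://github.com/foo/bar.git"
}
```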
Note that at some point we abandoned the idea that we would follow the upstream buildkit project because the goals of the projects were too different. Buildkit was designed to be an image builder, whereas Earthly was designed to be a CI/CD framework. You can see in all of the above how the changes we made tried to compensate for this gap: e.g. ability to output multiple things in a single build, LOCALLY, exporting things in the middle of the build, ability to run WITH DOCKER in a buildkit build. None of these things are on Buildkit's roadmap - and likely never will be. The architecture of buildkit never accounted for these possibilities, hence we had to use a lot of hacks to make it all work (e.g. LOCALLY in particular is the hackiest IMO).
(Side note: In our future roadmap we were planning on ripping out and replacing buildkit entirely. But that was deemed a complete rewrite that would have taken several years with several full-time engineers on it. We explored going with buildah + daemonless with some basic prototypes. There were also questions about mac compatibility without requiring the user to run podman, etc. I don't remember if we resolved those questions.)