Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# Repository Guidelines

## Project Structure & Module Organization
- `lib/remote_persistent_term.ex` defines the core behaviour and `use RemotePersistentTerm` macro.
- `lib/remote_persistent_term/fetcher/` holds fetcher implementations (HTTP, S3, Static) and helpers like HTTP cache logic.
- Tests live in `test/` with paths mirroring modules (e.g., `test/remote_persistent_term/fetcher/http_test.exs`).
- Generated artifacts (`_build/`, `deps/`, `doc/`, `cover/`) are outputs of Mix tasks and should not be edited by hand.

## Build, Test, and Development Commands
- `mix deps.get` installs dependencies.
- `mix compile` builds the library.
- `mix test` runs the full ExUnit suite.
- `mix test test/remote_persistent_term/fetcher/http_test.exs:30` runs a focused test by file and line.
- `mix format` applies the project formatter (see `.formatter.exs`).
- `mix docs` generates ExDoc output into `doc/`.

## Coding Style & Naming Conventions
- Follow `mix format` output (Elixir defaults to 2-space indentation).
- Modules use `CamelCase`; files and functions use `snake_case` (predicates end in `?`).
- Keep option keys consistent with `RemotePersistentTerm` options and fetcher configuration keys.

## Testing Guidelines
- Tests use ExUnit; Mox mocks the ExAws client and Bypass is used for HTTP fetcher tests.
- Prefer deterministic tests and keep external network access mocked or bypassed.
- Name tests with clear behaviour statements; add coverage for new fetcher logic and option validation.

## Commit & Pull Request Guidelines
- Commit messages are short, imperative, and capitalized (e.g., “Improve logs”, “Fix tests”, “Increment version”).
- Keep commits scoped to one change; avoid unrelated refactors in the same commit.
- PRs should include a concise summary, motivation, and the tests you ran (or note if none).
- If a change affects public behaviour or configuration, update documentation and mention it in the PR.

## Notes for Contributors
- CI runs `mix test` on recent OTP/Elixir versions; ensure your changes pass locally before pushing.
- When touching S3 or HTTP fetchers, prefer tests that use Mox/Bypass rather than real services.
63 changes: 47 additions & 16 deletions lib/remote_persistent_term.ex
Original file line number Diff line number Diff line change
Expand Up @@ -268,21 +268,43 @@ defmodule RemotePersistentTerm do
start_meta,
fn ->
{status, version} =
with {:ok, current_version} <- state.fetcher_mod.current_version(state.fetcher_state),
true <- state.current_version != current_version,
:ok <- download_and_store_term(state, deserialize_fun, put_fun) do
{:updated, current_version}
if function_exported?(state.fetcher_mod, :download_if_changed, 2) do
case state.fetcher_mod.download_if_changed(
state.fetcher_state,
state.current_version
) do
{:ok, term, new_version} ->
case store_term(state, deserialize_fun, put_fun, term) do
:ok ->
{:updated, new_version}

{:error, reason} ->
log_update_error(state.name, reason)
{:not_updated, state.current_version}
end

{:not_modified, version} ->
Logger.info("#{state.name} - up to date")
{:not_updated, version || state.current_version}

{:error, reason} ->
log_update_error(state.name, reason)
{:not_updated, state.current_version}
end
else
false ->
Logger.info("#{state.name} - up to date")
{:not_updated, state.current_version}

{:error, reason} ->
Logger.error(
"#{state.name} - failed to update remote term, reason: #{inspect(reason)}"
)

{:not_updated, state.current_version}
with {:ok, current_version} <- state.fetcher_mod.current_version(state.fetcher_state),
true <- state.current_version != current_version,
:ok <- download_and_store_term(state, deserialize_fun, put_fun) do
{:updated, current_version}
else
false ->
Logger.info("#{state.name} - up to date")
{:not_updated, state.current_version}

{:error, reason} ->
log_update_error(state.name, reason)
{:not_updated, state.current_version}
end
end

{version, Map.put(start_meta, :status, status)}
Expand All @@ -305,9 +327,18 @@ defmodule RemotePersistentTerm do
@doc false
def validate_options(opts), do: NimbleOptions.validate(opts, @opts_schema)

defp log_update_error(name, reason) do
Logger.error("#{name} - failed to update remote term, reason: #{inspect(reason)}")
end

defp download_and_store_term(state, deserialize_fun, put_fun) do
with {:ok, term} <- state.fetcher_mod.download(state.fetcher_state),
{:ok, decompressed} <- maybe_decompress(state, term),
with {:ok, term} <- state.fetcher_mod.download(state.fetcher_state) do
store_term(state, deserialize_fun, put_fun, term)
end
end

defp store_term(state, deserialize_fun, put_fun, term) do
with {:ok, decompressed} <- maybe_decompress(state, term),
{:ok, deserialized} <- deserialize_fun.(decompressed) do
put_fun.(deserialized)
end
Expand Down
10 changes: 10 additions & 0 deletions lib/remote_persistent_term/fetcher.ex
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ defmodule RemotePersistentTerm.Fetcher do
@type state :: term()
@type opts :: Keyword.t()
@type version :: String.t()
@type download_if_changed_result ::
{:ok, term(), version()} | {:not_modified, version() | nil} | {:error, term()}

@doc """
Initialize the implementation specific state of the Fetcher.
Expand All @@ -26,4 +28,12 @@ defmodule RemotePersistentTerm.Fetcher do
Download the term from the remote source.
"""
@callback download(state()) :: {:ok, term()} | {:error, term()}

@doc """
Optionally download the term only if it has changed. When implemented, it should
return `{:not_modified, current_version}` for an unchanged term or `{:ok, term, new_version}`.
"""
@callback download_if_changed(state(), version() | nil) :: download_if_changed_result

@optional_callbacks download_if_changed: 2
end
163 changes: 131 additions & 32 deletions lib/remote_persistent_term/fetcher/s3.ex
Original file line number Diff line number Diff line change
@@ -1,6 +1,26 @@
defmodule RemotePersistentTerm.Fetcher.S3 do
@moduledoc """
A Fetcher implementation for AWS S3.

## Versioned vs. non-versioned buckets

This fetcher works with both versioned and non-versioned buckets. It uses the object's
`ETag` as a change token and performs conditional GETs with `If-None-Match` to avoid
re-downloading unchanged data.

- **Versioned buckets**: `HEAD`/`GET` responses include `ETag`; the fetcher uses it for
change detection. The latest object is always whatever S3 returns for the key (no explicit
version ID required).
- **Non-versioned buckets**: only `ETag` is available, which is sufficient to detect
content changes. Overwriting an object with identical bytes may keep the same `ETag`,
which is fine because the content is unchanged.

## S3-compatible services

S3-compatible providers (e.g., DigitalOcean Spaces, Linode Object Storage) should work
as long as they support standard S3 headers: `ETag`, `If-None-Match`, and `304 Not Modified`.
If a provider ignores conditional requests, the fetcher will still function but will
download on every refresh.
"""
require Logger

Expand Down Expand Up @@ -69,6 +89,8 @@ defmodule RemotePersistentTerm.Fetcher.S3 do
"""
@impl true
def init(opts) do
ensure_http_client()

with {:ok, valid_opts} <- NimbleOptions.validate(opts, @opts_schema) do
{:ok,
%__MODULE__{
Expand All @@ -82,17 +104,20 @@ defmodule RemotePersistentTerm.Fetcher.S3 do

@impl true
def current_version(state) do
with {:ok, versions} <- list_object_versions(state),
{:ok, %{etag: etag, version_id: version}} <- find_latest(versions) do
with {:ok, %{headers: headers}} <- head_object(state),
{:ok, version} <- extract_version(headers) do
Logger.info(
bucket: state.bucket,
key: state.key,
version: version,
message: "Found latest version of object"
)

{:ok, etag}
{:ok, version}
else
{:error, {:http_error, 404, _}} ->
{:error, "could not find s3://#{state.bucket}/#{state.key}"}

{:error, {:unexpected_response, %{body: reason}}} ->
{:error, reason}

Expand Down Expand Up @@ -133,60 +158,134 @@ defmodule RemotePersistentTerm.Fetcher.S3 do
end
end

defp list_object_versions(state) do
@impl true
def download_if_changed(state, current_version) do
res =
aws_client_request(
&ExAws.S3.get_bucket_object_versions/2,
get_object_request(
state,
prefix: state.key
if_none_match_opts(current_version),
&failover_on_error?/1
)

with {:ok, %{body: %{versions: versions}}} <- res do
{:ok, versions}
case res do
{:ok, %{status_code: 304}} ->
{:not_modified, current_version}

{:error, {:http_error, 304, _}} ->
{:not_modified, current_version}

{:ok, %{body: body, headers: headers}} ->
with {:ok, version} <- extract_version(headers) do
{:ok, body, version}
end

{:error, reason} ->
{:error, inspect(reason)}
end
end

defp get_object(state) do
aws_client_request(&ExAws.S3.get_object/2, state, state.key)
get_object_request(state, [])
end

defp get_object_request(state, opts, failover_on_error? \\ fn _ -> true end) do
aws_client_request(
fn bucket, request_opts -> ExAws.S3.get_object(bucket, state.key, request_opts) end,
state,
opts,
failover_on_error?
)
end

defp head_object(state) do
aws_client_request(&ExAws.S3.head_object/2, state, state.key)
end

defp find_latest([_ | _] = contents) do
Enum.find(contents, fn
%{is_latest: "true"} ->
true
defp extract_version(headers) do
case header_value(headers, "etag") do
nil -> {:error, :not_found}
value -> {:ok, normalize_etag(value)}
end
end

defp header_value(headers, name) do
downcased = String.downcase(name)

Enum.find_value(headers, fn
{key, value} when is_binary(key) and is_binary(value) ->
if String.downcase(key) == downcased, do: value, else: nil

_ ->
false
nil
end)
|> case do
res when is_map(res) -> {:ok, res}
_ -> {:error, :not_found}
end

defp normalize_etag(value) when is_binary(value) do
value
|> String.trim()
|> String.trim("\"")
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weak ETags are incorrectly normalized causing match failures

Low Severity

The normalize_etag function uses String.trim("\"") which only removes quotes from both ends if present. For weak ETags like W/"abc123", this removes only the trailing quote (since the first char is W), producing the malformed value W/"abc123 (missing closing quote). Subsequently, quote_etag wraps this as "W/\"abc123" instead of the correct W/"abc123", causing conditional GET requests to never match for S3-compatible services using weak validators.

Additional Locations (1)

Fix in Cursor Fix in Web


defp if_none_match_opts(nil), do: []
defp if_none_match_opts(etag), do: [if_none_match: quote_etag(etag)]

defp quote_etag(etag) do
etag = String.trim(etag)

if String.starts_with?(etag, "\"") and String.ends_with?(etag, "\"") do
etag
else
"\"#{etag}\""
end
end

defp failover_on_error?({:http_error, 304, _}), do: false
defp failover_on_error?(_reason), do: true

defp ensure_http_client do
case Application.get_env(:ex_aws, :http_client) do
nil ->
Application.put_env(:ex_aws, :http_client, RemotePersistentTerm.Fetcher.S3.HttpClient)

_ ->
:ok
end
end

defp find_latest(_), do: {:error, :not_found}
defp aws_client_request(op, state, opts) do
aws_client_request(op, state, opts, fn _ -> true end)
end

defp aws_client_request(op, %{failover_buckets: nil} = state, opts) do
defp aws_client_request(op, %{failover_buckets: nil} = state, opts, _failover_on_error?) do
perform_request(op, state.bucket, state.region, opts)
end

defp aws_client_request(
op,
%{
failover_buckets: [_|_] = failover_buckets
failover_buckets: [_ | _] = failover_buckets
} = state,
opts
opts,
failover_on_error?
) do
with {:error, reason} <- perform_request(op, state.bucket, state.region, opts) do
Logger.error(%{
bucket: state.bucket,
key: state.key,
region: state.region,
reason: inspect(reason),
message: "Failed to fetch from primary bucket, attempting failover buckets"
})

try_failover_buckets(op, failover_buckets, opts, state)
case perform_request(op, state.bucket, state.region, opts) do
{:error, reason} = error ->
if failover_on_error?.(reason) do
Logger.error(%{
bucket: state.bucket,
key: state.key,
region: state.region,
reason: inspect(reason),
message: "Failed to fetch from primary bucket, attempting failover buckets"
})

try_failover_buckets(op, failover_buckets, opts, state)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failover buckets ignore 304 handling during conditional GETs

Medium Severity

The failover_on_error? predicate is used to prevent treating 304 responses as failures for the primary bucket, but it's not passed to try_failover_buckets. When download_if_changed triggers failover (e.g., primary returns 500) and a failover bucket returns 304 (not modified), the response is incorrectly treated as an error. This causes unnecessary requests to all remaining failover buckets and can result in "All buckets failed" errors with misleading log messages, even when the content is actually unchanged across all failover buckets.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

planning to work on the failover logic in a separate PR

else
error
end

result ->
result
end
end

Expand Down
26 changes: 26 additions & 0 deletions lib/remote_persistent_term/fetcher/s3/http_client.ex
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
defmodule RemotePersistentTerm.Fetcher.S3.HttpClient do
@moduledoc """
ExAws HTTP client implementation for Req.
"""

@behaviour ExAws.Request.HttpClient

@impl ExAws.Request.HttpClient
def request(method, url, body, headers, _http_opts) do
request = Req.new(decode_body: false, retry: false)

case Req.request(request, method: method, url: url, body: body, headers: headers) do
{:ok, response} ->
response = %{
status_code: response.status,
headers: Req.get_headers_list(response),
body: response.body
}

{:ok, response}

{:error, reason} ->
{:error, %{reason: reason}}
end
end
end
Loading