Skip to content

Record ETag and Last-Modified for each URL#76

Draft
DilumAluthge wants to merge 3 commits into
mainfrom
dpa/record-etag-and-lastmodified
Draft

Record ETag and Last-Modified for each URL#76
DilumAluthge wants to merge 3 commits into
mainfrom
dpa/record-etag-and-lastmodified

Conversation

@DilumAluthge
Copy link
Copy Markdown
Member

@DilumAluthge DilumAluthge commented Apr 19, 2026

This is needed for #67. We need the ETag and Last-Modified to determine whether the information we have is fresh or stale.

Cross-ref: #51

hat-tip: @ararslan, who suggested using ETag and Last-Modified, because we use S3.

@DilumAluthge DilumAluthge force-pushed the dpa/record-etag-and-lastmodified branch from 86c2c4a to 88f6e3b Compare April 19, 2026 05:50
@DilumAluthge DilumAluthge marked this pull request as ready for review April 19, 2026 06:08
@DilumAluthge DilumAluthge marked this pull request as draft April 19, 2026 06:09
@DilumAluthge DilumAluthge force-pushed the dpa/record-etag-and-lastmodified branch from 421a18c to 16b16b3 Compare April 19, 2026 06:10
@DilumAluthge DilumAluthge marked this pull request as ready for review April 19, 2026 06:11
@DilumAluthge DilumAluthge force-pushed the dpa/record-etag-and-lastmodified branch 2 times, most recently from 153883d to 769fc34 Compare April 19, 2026 09:58
Comment thread src/VersionsJSONUtil.jl Outdated
Comment thread src/VersionsJSONUtil.jl Outdated
@DilumAluthge DilumAluthge requested a review from ararslan April 22, 2026 18:03
@DilumAluthge DilumAluthge added this pull request to the merge queue Apr 24, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 24, 2026
@DilumAluthge DilumAluthge force-pushed the dpa/record-etag-and-lastmodified branch from 9c1fac9 to 09bf910 Compare April 24, 2026 14:28
@DilumAluthge DilumAluthge added this pull request to the merge queue Apr 25, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 26, 2026
@DilumAluthge DilumAluthge added this pull request to the merge queue Apr 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 26, 2026
@DilumAluthge DilumAluthge added this pull request to the merge queue Apr 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Apr 27, 2026
@DilumAluthge DilumAluthge marked this pull request as draft April 27, 2026 00:22
Copy link
Copy Markdown
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how useful my review here actually is but here it is. It makes sense to me to include these values. Does the JSON schema need to be updated for the inclusion?

Comment thread src/VersionsJSONUtil.jl
Comment on lines +156 to +161
local response = nothing
try
response = HTTP.head(url)
catch
error("Encountered error when making HEAD request to URL: $url")
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the idea behind intercepting an exception thrown by HTTP.head is to be able to ensure the URL is logged—is that right? As written, you lose what the actual error was. Instead, you could do something like this:

Suggested change
local response = nothing
try
response = HTTP.head(url)
catch
error("Encountered error when making HEAD request to URL: $url")
end
response = try
HTTP.head(url)
catch
@error "Encounted error when making HEAD request to URL: $url"
rethrow()
end

That said, errors from HTTP.jl generally do tell you what the URL was, so you could alternatively just do

Suggested change
local response = nothing
try
response = HTTP.head(url)
catch
error("Encountered error when making HEAD request to URL: $url")
end
response = HTTP.head(url)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the original error still appears in the stacktrace, right? It'll be something like [our error] "caused by" [original error].

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh does it? I remember at some point the output from some errors doubled in length but it was never clear to me why or what makes it do that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me double-check locally to make sure.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tested on Julia 1.10, and the original error is still shown.

julia> try
       sqrt(-1)
       catch
       error("Encountered an error: [my debugging info]")
       end
ERROR: Encountered an error: [my debugging info]
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] top-level scope
   @ REPL[1]:4

caused by: DomainError with -1.0:
sqrt was called with a negative real argument but will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
 [1] throw_complex_domainerror(f::Symbol, x::Float64)
   @ Base.Math ./math.jl:33
 [2] sqrt
   @ ./math.jl:686 [inlined]
 [3] sqrt(x::Int64)
   @ Base.Math ./math.jl:1578
 [4] top-level scope
   @ REPL[1]:2

Copy link
Copy Markdown
Member Author

@DilumAluthge DilumAluthge May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As to why I want to throw my own error, you're right, it's so that I can see the full URL easily.

If I just do the call to HTTP.head(), here's what I get:

julia> import HTTP

julia> HTTP.head("https://example.com/foo/bar/baz")
ERROR: HTTP.Exceptions.StatusError(404, "HEAD", "/foo/bar/baz", HTTP.Messages.Response:
"""
HTTP/1.1 404 Not Found
Date: Fri, 01 May 2026 02:29:11 GMT
Content-Type: text/html
Connection: keep-alive
Server: cloudflare
Age: 8057
cf-cache-status: HIT
Content-Encoding: gzip
CF-RAY: 9f4b5b6c49bbcc9b-BOS

""")
Stacktrace: [elided]

So the error only shows the path (/foo/bar/baz), and I have to go back and look in the code to remind myself what the host was, which is annoying.

Comment thread src/VersionsJSONUtil.jl Outdated
Comment thread src/VersionsJSONUtil.jl Outdated
Comment thread src/VersionsJSONUtil.jl Outdated
Comment thread src/VersionsJSONUtil.jl Outdated
Comment thread src/VersionsJSONUtil.jl Outdated
Comment thread src/VersionsJSONUtil.jl
file_dict["etag"] = headinfo.etag
end
if !isnothing(headinfo.last_modified)
file_dict["last-modified"] = headinfo.last_modified
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ease of downstream comparison, should we parse this into a DateTime and write it out as ISO-8601?

If we want to handle all of the formats mentioned in RFC 9110, this should work:

let fmts = [dateformat"e, d u y H:M:S \G\M\T",  # IMF-fixdate (RFC 5322)
            dateformat"E, d-u-y H:M:S \G\M\T",  # RFC 850
            dateformat"e u d H:M:S y"]          # ANSI C asctime()
    global function parse_http_date(dt)
        dt = replace(dt, r"\s+" => " ")  # asctime left-pads days with space instead of 0
        for fmt in fmts
            x = tryparse(DateTime, dt, fmt)
            x !== nothing && return x
        end
        throw(ArgumentError("date is not in a recognized format: $dt"))
    end
end

But I think the RFC 5322 format is what we can expect to receive from any server that doesn't think it's currently 1994.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought about parsing it, but I don't think we'll ever actually do date comparisons or arithmetic. I.e. my plan is to just see if the value of Last-Modified has changed, instead of comparing the timestamp to now().

@ararslan
Copy link
Copy Markdown
Member

Thank you for doing this! I think this will be quite helpful, especially since a recent from-scratch rebuild I did timed out at 6 hours, and it wasn't even that close to being done.

@DilumAluthge DilumAluthge force-pushed the dpa/record-etag-and-lastmodified branch 3 times, most recently from 18dc798 to 4809fab Compare May 1, 2026 04:49
@DilumAluthge DilumAluthge force-pushed the dpa/record-etag-and-lastmodified branch from 4809fab to d2808ad Compare May 3, 2026 00:07
@DilumAluthge DilumAluthge force-pushed the dpa/record-etag-and-lastmodified branch from d2808ad to 2d97116 Compare May 3, 2026 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants