Record ETag and Last-Modified for each URL
#76
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,6 +4,7 @@ using HTTP, JSON, Pkg.BinaryPlatforms, WebCacheUtilities, SHA, Lazy | |
| using Tar: Tar | ||
| import Pkg.BinaryPlatforms: triplet, arch | ||
| import Pkg.PlatformEngines: exe7z | ||
| using URIs: URIs, URI | ||
|
|
||
| "Wrapper types to define three jlext methods for portable, tarball and installer Windows" | ||
| struct WindowsPortable | ||
|
|
@@ -143,6 +144,31 @@ function get_tags() | |
| JSON.parse(String(read(tags_json_path))) | ||
| end | ||
|
|
||
| ##### -------------------------------------------------------------------------------------- | ||
| ##### Get ETag and Last-Modified, so we know if we need to re-download and re-checksum files | ||
|
|
||
| Base.@kwdef struct HeadInfo | ||
| url::URI | ||
| etag::Union{String, Nothing} | ||
| last_modified::Union{String, Nothing} | ||
| end | ||
|
|
||
| function HeadInfo(url) | ||
| local response = nothing | ||
| try | ||
| response = HTTP.head(url) | ||
| catch | ||
| error("Encountered error when making HEAD request to URL: $url") | ||
| end | ||
| etag = HTTP.header(response, "ETag", nothing) | ||
| etag === nothing && @warn "ETag not provided in response from $url" | ||
| last_modified = HTTP.header(response, "Last-Modified", nothing) | ||
| last_modified === nothing && @warn "Last-Modified not provided in response from $url" | ||
| return HeadInfo(; url=URI(url), etag, last_modified) | ||
| end | ||
|
|
||
| ##### -------------------------------------------------------------------------------------- | ||
|
|
||
| function main(out_path) | ||
| tags = get_tags() | ||
| tag_versions = filter(x -> x !== nothing, [vnum_maybe(basename(t["ref"])) for t in tags]) | ||
|
|
@@ -240,6 +266,8 @@ function main(out_path) | |
|
|
||
| end | ||
|
|
||
| headinfo = HeadInfo(url) | ||
|
|
||
| # Build up metadata about this file | ||
| file_dict = Dict( | ||
| "triplet" => triplet(platform), | ||
|
|
@@ -260,6 +288,13 @@ function main(out_path) | |
| file_dict["asc"] = asc_signature | ||
| end | ||
|
|
||
| if !isnothing(headinfo.etag) | ||
| file_dict["etag"] = headinfo.etag | ||
| end | ||
| if !isnothing(headinfo.last_modified) | ||
| file_dict["last-modified"] = headinfo.last_modified | ||
|
Member
For ease of downstream comparison, should we parse this into a `DateTime`? If we want to handle all of the formats mentioned in RFC 9110, this should work:

```julia
# Requires `using Dates` for DateTime and the dateformat"..." string macro.
let fmts = [dateformat"e, d u y H:M:S \G\M\T",  # IMF-fixdate (RFC 5322)
            dateformat"E, d-u-y H:M:S \G\M\T",  # RFC 850
            dateformat"e u d H:M:S y"]          # ANSI C asctime()
    global function parse_http_date(dt)
        dt = replace(dt, r"\s+" => " ")  # asctime left-pads days with space instead of 0
        for fmt in fmts
            x = tryparse(DateTime, dt, fmt)
            x !== nothing && return x
        end
        throw(ArgumentError("date is not in a recognized format: $dt"))
    end
end
```

But I think the RFC 5322 format is what we can expect to receive from any server that doesn't think it's currently 1994.
Member
Author
Yeah, I thought about parsing it, but I don't think we'll ever actually do date comparisons or arithmetic. I.e. my plan is to just see if the value of
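
As a rough illustration of the string-comparison idea (no date parsing, no arithmetic), the check could look something like the sketch below. The function name `needs_refresh` and the `cached` dictionary are hypothetical and not part of this PR; the only things taken from the diff are the `"etag"` and `"last-modified"` keys written into `file_dict`.

```julia
# Hypothetical helper, not part of the PR: decide whether a file should be
# re-downloaded and re-checksummed by comparing the raw header strings stored
# in a previously written file_dict (`cached`) against freshly fetched values.
function needs_refresh(cached::AbstractDict, etag, last_modified)
    # If the server sent neither validator, we cannot show the file is unchanged.
    etag === nothing && last_modified === nothing && return true
    # Plain equality on the strings; a missing key compares as `nothing`.
    return get(cached, "etag", nothing) != etag ||
           get(cached, "last-modified", nothing) != last_modified
end

# Example values are made up for illustration.
cached = Dict("etag" => "\"abc123\"",
              "last-modified" => "Sun, 06 Nov 1994 08:49:37 GMT")

unchanged = needs_refresh(cached, "\"abc123\"", "Sun, 06 Nov 1994 08:49:37 GMT")
changed   = needs_refresh(cached, "\"def456\"", "Sun, 06 Nov 1994 08:49:37 GMT")
unknown   = needs_refresh(cached, nothing, nothing)
```

Treating "no validators at all" as "refresh" is a conservative assumption made for this sketch; the PR itself only records the headers and warns when they are missing.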
||
| end | ||
|
|
||
| # Right now, all we have are archives, but let's be forward-thinking | ||
| # and make this an array of dictionaries that is easy to extensibly match | ||
| push!(meta[version]["files"], file_dict) | ||
|
|
||
I assume the idea behind intercepting an exception thrown by `HTTP.head` is to be able to ensure the URL is logged. Is that right? As written, you lose what the actual error was. Instead, you could do something like this:

That said, errors from HTTP.jl generally do tell you what the URL was, so you could alternatively just do
I believe the original error still appears in the stacktrace, right? It'll be something like [our error] "caused by" [original error].
Oh does it? I remember at some point the output from some errors doubled in length but it was never clear to me why or what makes it do that.
Let me double-check locally to make sure.
Yeah, I tested on Julia 1.10, and the original error is still shown.
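
For anyone who wants to check this without making a network call, here is a small sketch of the same pattern. It assumes Julia 1.7 or later for `current_exceptions()`; `head_or_error` and `failing_request` are hypothetical stand-ins for the PR's `HeadInfo` constructor and `HTTP.head`, and the URL is a made-up example.

```julia
# Same shape as the try/catch in the PR: swallow whatever the request threw
# and raise our own error that includes the full URL.
function head_or_error(request, url)
    try
        return request(url)
    catch
        error("Encountered error when making HEAD request to URL: $url")
    end
end

# Hypothetical stand-in for HTTP.head that always fails.
failing_request(url) = throw(ArgumentError("simulated network failure"))

# An error thrown inside a `catch` block is pushed on top of the exception
# still being handled, so an outer handler can see both of them.
stack = try
    head_or_error(failing_request, "https://example.com/foo/bar/baz")
    nothing
catch
    current_exceptions()
end

# Oldest exception first, most recently thrown last.
messages = [sprint(showerror, item.exception) for item in stack]
```

The REPL prints this same stack as our error followed by "caused by" and the original one, which is why the original message is not lost.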
As to why I want to throw my own error, you're right, it's so that I can see the full URL easily.
If I just do the call to `HTTP.head()`, here's what I get:

So the error only shows the path (`/foo/bar/baz`), and I have to go back and look in the code to remind myself what the host was, which is annoying.