sebastian-nagel
left a comment
Hi @lfoppiano, thanks! The fix looks good to me.
However, the unit test failed when I tried to run it. Creating an "HTTP 304" response object, or rather the Content object it contains, might be challenging. See the attached data. It's up to you whether to read the response from the segment or to study it to figure out what it should look like. Use nutch readseg for inspection. There are other Nutch unit tests that use segments.
Otherwise, it's much appreciated that the WARC writer will now get its first "deeper" unit test.
@sebastian-nagel the test was left failing on purpose, because I first wanted to get the test to run. I managed (or, better, Claude Opus managed; other models weren't able) to find the cause. For now I've separated the
Thanks! That's a leftover of NUTCH-2852. It's unclear why it hits our fork but not upstream Nutch. It should be fixed upstream anyway. I'll also comment on PR #44.
sebastian-nagel
left a comment
Hi @lfoppiano, thanks! The fix to change the Content-Type in the WARC header looks good.
See the inline comments about the unit test code.
0f0c4b3 to b0110c9
@sebastian-nagel I believe the test now works fine with data from a real case. 🌮 🎉
String payloadDigest = "sha1:abc123";
String blockDigest = "sha1:def456";
The digests are calculated one level higher, in the WarcRecordWriter, and then passed to the WarcWriter. There is no need to test them as part of this PR.
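For readers following along, the layering described here can be sketched roughly as below. This is only an illustration, not the actual Nutch code: the class DigestSketch and its methods are hypothetical, and it assumes digests written as "sha1:" followed by the Base32-encoded SHA-1 hash, a common convention in WARC files. The caller computes both digests and the writer stand-in only formats the strings it is given, which is why placeholder values like "sha1:abc123" are enough for a writer-level test.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: NOT the Nutch WarcRecordWriter/WarcWriter API.
class DigestSketch {
    private static final String BASE32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Minimal RFC 4648 Base32 encoder without padding (SHA-1 needs none: 160 bits = 32 chars).
    static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0;
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xff);
            bits += 8;
            while (bits >= 5) {
                sb.append(BASE32.charAt((buffer >> (bits - 5)) & 31));
                bits -= 5;
            }
        }
        if (bits > 0) {
            sb.append(BASE32.charAt((buffer << (5 - bits)) & 31));
        }
        return sb.toString();
    }

    // "One level higher": the caller computes the digest string (assumed format).
    static String sha1Digest(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return "sha1:" + base32(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Writer stand-in: only formats the digests it receives, it does not compute them.
    static String digestHeaders(String payloadDigest, String blockDigest) {
        return "WARC-Payload-Digest: " + payloadDigest + "\r\n"
             + "WARC-Block-Digest: " + blockDigest + "\r\n";
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] block = "HTTP/1.1 200 OK\r\n\r\nhello".getBytes(StandardCharsets.UTF_8);
        System.out.print(digestHeaders(sha1Digest(payload), sha1Digest(block)));
    }
}
```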
sebastian-nagel
left a comment
Thanks @lfoppiano! Looks good to me.
Please merge!
This PR fixes issue #40. I replaced the content type message/http with application/http; msgtype=response.
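To show where the changed value ends up, here is a minimal sketch of a WARC response record header that carries the new Content-Type. It is not the code from this PR: the class WarcHeaderSketch, its method, the WARC/1.0 version line, and the example URI and date are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: NOT the WarcWriter changed in this PR.
class WarcHeaderSketch {
    // Content type for a record block that holds a complete HTTP response.
    static final String HTTP_RESPONSE_TYPE = "application/http; msgtype=response";

    // Builds a simplified header block for a "response" record.
    static String responseHeader(String recordId, String targetUri,
                                 String date, long contentLength) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("WARC-Type", "response");
        fields.put("WARC-Record-ID", "<urn:uuid:" + recordId + ">");
        fields.put("WARC-Target-URI", targetUri);
        fields.put("WARC-Date", date);
        fields.put("Content-Type", HTTP_RESPONSE_TYPE);
        fields.put("Content-Length", Long.toString(contentLength));

        StringBuilder sb = new StringBuilder("WARC/1.0\r\n"); // version is an assumption
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append("\r\n");
        }
        sb.append("\r\n"); // an empty line ends the header block
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(responseHeader(UUID.randomUUID().toString(),
                "http://example.com/", "2024-01-01T00:00:00Z", 0L));
    }
}
```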