fix(test): resolve rate-limit failures when downloading datasets #9658
```diff
@@ -20,6 +20,7 @@ import (
 	"runtime"
 	"strconv"
 	"strings"
+	"time"

 	"github.com/pkg/errors"
```
```diff
@@ -41,10 +42,10 @@ func (c *LocalCluster) HostDgraphBinaryPath() string {
 }

 var datafiles = map[string]string{
-	"1million.schema":  "https://github.com/dgraph-io/dgraph-benchmarks/blob/main/data/1million.schema?raw=true",
-	"1million.rdf.gz":  "https://github.com/dgraph-io/dgraph-benchmarks/blob/main/data/1million.rdf.gz?raw=true",
-	"21million.schema": "https://github.com/dgraph-io/dgraph-benchmarks/blob/main/data/21million.schema?raw=true",
-	"21million.rdf.gz": "https://github.com/dgraph-io/dgraph-benchmarks/blob/main/data/21million.rdf.gz?raw=true",
+	"1million.schema":  "https://raw.githubusercontent.com/dgraph-io/dgraph-benchmarks/refs/heads/main/data/1million.schema",
+	"1million.rdf.gz":  "https://media.githubusercontent.com/media/dgraph-io/dgraph-benchmarks/refs/heads/main/data/1million.rdf.gz",
+	"21million.schema": "https://raw.githubusercontent.com/dgraph-io/dgraph-benchmarks/refs/heads/main/data/21million.schema",
+	"21million.rdf.gz": "https://media.githubusercontent.com/media/dgraph-io/dgraph-benchmarks/refs/heads/main/data/21million.rdf.gz",
 }

 type DatasetType int
```
|
```diff
@@ -604,11 +605,22 @@ func (d *Dataset) ensureFile(filename string) string {
 }

 func downloadFile(fname, url string) error {
-	cmd := exec.Command("wget", "-O", fname, url)
-	cmd.Dir = datasetFilesPath
-
-	if _, err := cmd.CombinedOutput(); err != nil {
-		return fmt.Errorf("error downloading file %s: %w", fname, err)
+	const maxRetries = 3
+	fpath := filepath.Join(datasetFilesPath, fname)
+	for attempt := 1; attempt <= maxRetries; attempt++ {
+		cmd := exec.Command("wget", "--tries=3", "--waitretry=5", "--retry-connrefused", "-O", fname, url)
+		cmd.Dir = datasetFilesPath
+
+		if out, err := cmd.CombinedOutput(); err != nil {
+			log.Printf("attempt %d/%d failed to download %s: %v\n%s", attempt, maxRetries, fname, err, string(out))
+			if attempt < maxRetries {
+				time.Sleep(time.Duration(attempt*5) * time.Second)
+				continue
+			}
+			_ = os.Remove(fpath)
+			return fmt.Errorf("error downloading file %s after %d attempts: %w", fname, maxRetries, err)
+		}
+		return nil
 	}
 	return nil
 }
```

**Review comment (Contributor):** The Go-level retry loop (3 attempts) wraps a `wget` invocation that itself retries 3 times (`--tries=3`), so a persistently failing download can be attempted up to 9 times. Consider either relying on `wget`'s built-in retries alone, or dropping `--tries=3` and keeping all retry logic in Go. Having both layers is redundant and makes the failure timeline hard to reason about.
```diff
@@ -1159,7 +1159,27 @@ var rdfFileNames = [...]string{
 	"workAt_0.rdf"}

 var ldbcDataFiles = map[string]string{
-	"ldbcTypes.schema": "https://github.com/dgraph-io/dgraph-benchmarks/blob/main/ldbc/sf0.3/ldbcTypes.schema?raw=true",
+	"ldbcTypes.schema": "https://media.githubusercontent.com/media/dgraph-io/dgraph-benchmarks/refs/heads/main/ldbc/sf0.3/ldbcTypes.schema",
 }

+func wgetWithRetry(fname, url, dir string) error {
+	const maxRetries = 3
+	fpath := filepath.Join(dir, fname)
+	for attempt := 1; attempt <= maxRetries; attempt++ {
+		cmd := exec.Command("wget", "--tries=3", "--waitretry=5", "--retry-connrefused", "-O", fname, url)
+		cmd.Dir = dir
+		if out, err := cmd.CombinedOutput(); err != nil {
+			fmt.Printf("attempt %d/%d failed to download %s: %v\n%s\n", attempt, maxRetries, fname, err, string(out))
+			if attempt < maxRetries {
+				time.Sleep(time.Duration(attempt*5) * time.Second)
+				continue
+			}
+			_ = os.Remove(fpath)
+			return fmt.Errorf("failed to download %s after %d attempts: %w", fname, maxRetries, err)
+		}
+		return nil
+	}
+	return nil
+}

 func downloadDataFiles() {
```

**Review comment:** This function is a near-exact duplicate of the retrying download helper added in the other file. That is especially concerning here because the function contents have clearly had to evolve over time as different failure causes were found and more guards were added. Every place it's duplicated is one more place to find and fix (or forget to fix) when the next issue arises. The duplication is compounded by the double-retry approach, since both copies pass `--tries=3` to `wget` from inside their own retry loops.
```diff
@@ -1168,12 +1188,13 @@ func downloadDataFiles() {
 		return
 	}
 	for fname, link := range datafiles {
-		cmd := exec.Command("wget", "-O", fname, link)
-		cmd.Dir = *tmp
-
-		if out, err := cmd.CombinedOutput(); err != nil {
-			fmt.Printf("Error %v\n", err)
-			panic(fmt.Sprintf("error downloading a file: %s", string(out)))
+		fpath := filepath.Join(*tmp, fname)
+		if fi, err := os.Stat(fpath); err == nil && fi.Size() > 0 {
+			fmt.Printf("Skipping %s (already exists)\n", fname)
+			continue
+		}
+		if err := wgetWithRetry(fname, link, *tmp); err != nil {
+			panic(fmt.Sprintf("error downloading %s: %v", fname, err))
 		}
 	}
 }
```
```diff
@@ -1189,20 +1210,26 @@ func downloadLDBCFiles(dir string) {
 	}

+	start := time.Now()
+	sem := make(chan struct{}, 5)
 	var wg sync.WaitGroup
 	for fname, link := range ldbcDataFiles {
+		fpath := filepath.Join(dir, fname)
+		if fi, err := os.Stat(fpath); err == nil && fi.Size() > 0 {
+			fmt.Printf("Skipping %s (already exists)\n", fname)
+			continue
+		}
 		wg.Add(1)
-		go func(fname, link string, wg *sync.WaitGroup) {
+		go func(fname, link string) {
 			defer wg.Done()
-			start := time.Now()
-			cmd := exec.Command("wget", "-O", fname, link)
-			cmd.Dir = dir
-			if out, err := cmd.CombinedOutput(); err != nil {
-				fmt.Printf("Error %v\n", err)
-				panic(fmt.Sprintf("error downloading a file: %s", string(out)))
+			sem <- struct{}{}
+			defer func() { <-sem }()
+
+			dlStart := time.Now()
+			if err := wgetWithRetry(fname, link, dir); err != nil {
+				panic(fmt.Sprintf("error downloading %s: %v", fname, err))
 			}
-			fmt.Printf("Downloaded %s to %s in %s \n", fname, dir, time.Since(start))
-		}(fname, link, &wg)
+			fmt.Printf("Downloaded %s to %s in %s \n", fname, dir, time.Since(dlStart))
+		}(fname, link)
 	}
 	wg.Wait()
+	fmt.Printf("Downloaded %d files in %s \n", len(ldbcDataFiles), time.Since(start))
```

**Review comment (Contributor):** Calling `panic` from inside a goroutine tears down the whole process without giving the caller any chance to handle the failure. Consider collecting errors and returning them from the function instead, so the caller decides whether to abort.
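The panic-in-goroutine pattern flagged in review can be replaced with explicit error propagation using only the standard library. A sketch under assumed names (`fetch` stands in for the real per-file download; the bounded concurrency of 5 mirrors the PR's semaphore):

```go
package main

import (
	"fmt"
	"sync"
)

// fetch simulates a per-file download; the real code would shell out to wget.
func fetch(name string) error {
	if name == "bad.rdf" {
		return fmt.Errorf("download failed: %s", name)
	}
	return nil
}

// downloadAll runs downloads concurrently but returns errors to the caller
// instead of panicking inside goroutines.
func downloadAll(files []string) error {
	var wg sync.WaitGroup
	errCh := make(chan error, len(files)) // buffered so goroutines never block
	sem := make(chan struct{}, 5)         // bounded concurrency, as in the PR
	for _, f := range files {
		wg.Add(1)
		go func(f string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			if err := fetch(f); err != nil {
				errCh <- err
			}
		}(f)
	}
	wg.Wait()
	close(errCh)
	for err := range errCh {
		return err // surface the first failure; caller decides how to exit
	}
	return nil
}

func main() {
	fmt.Println(downloadAll([]string{"a.rdf", "b.rdf"}) == nil)  // true
	fmt.Println(downloadAll([]string{"a.rdf", "bad.rdf"}) != nil) // true
}
```

Because failures travel back through the channel, the caller can still panic (or exit cleanly) at the top level, but a single bad file no longer crashes the process from an arbitrary goroutine.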
```diff
@@ -1387,7 +1414,9 @@ func run() error {
 	needsData := testSuiteContainsAny("load", "ldbc", "all")
 	if needsData && *tmp == "" {
 		*tmp = filepath.Join(os.TempDir(), "dgraph-test-data")
-		x.Check(testutil.MakeDirEmpty([]string{*tmp}))
+	}
+	if needsData {
+		x.Check(os.MkdirAll(*tmp, 0755))
 	}
 	if testSuiteContainsAny("load", "all") {
 		downloadDataFiles()
```
```diff
@@ -1449,7 +1478,6 @@ func main() {
 	procId = rand.Intn(1000)

 	err := run()
 	_ = os.RemoveAll(*tmp)
 	if err != nil {
 		os.Exit(1)
 	}
```

**Review comment (Contributor):** Removing `*tmp` here unconditionally deletes the downloaded data on every run. Rather than removing this unconditionally, add a new flag to control it:

```go
keepData = pflag.Bool("keep-data", false,
	"Preserve downloaded test data after run (for CI caching). Default cleans up.")
```

Then in `main`:

```go
err := run()
if !*keepData {
	_ = os.RemoveAll(*tmp)
}
```

The CI workflows would then pass:

```sh
cd t; ./t --suite=load --tmp=${{ github.workspace }}/test-data --keep-data
```

This fits the existing flag pattern (similar to …).
**Review comment (Contributor):** @shiva-istari one thing I'm concerned about here is that the version of the dataset fixtures used is non-deterministic, and there's no way to know for sure which version of which fixture was used in a particular build if the cache was used. Which build seeded that cache? Which version of which fixture was cached when it did?

There's a hardcoded "-v1" suffix on the key, which AFAIK doesn't refer to any version of the test data being used, just the version of the key name itself.

I think the cache key should instead be tied to the actual fixture versions, for example by pinning the benchmark data to a release and embedding that tag in the key (e.g. `dataset-dgraphtest-<release-tag>`). This will ensure builds get the same version of the test data they would on a fresh checkout even when a cached copy is used, that builds are idempotent, and that test results are reproducible. And it ensures that any version of a given test data asset is downloaded only once, then cached and reused by every subsequent build that uses it.
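One concrete way to make the cache key track fixture versions, sketched below, is to derive it from the dataset URL map itself: any change to a fixture URL then produces a new key automatically. The `cacheKey` helper and the `dataset-dgraphtest-` prefix format are assumptions for illustration, not code from this PR or its workflows:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// cacheKey hashes the (sorted) name->URL pairs so that the key is stable
// across runs but changes whenever any fixture URL changes.
func cacheKey(files map[string]string) string {
	names := make([]string, 0, len(files))
	for n := range files {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for a stable digest
	var b strings.Builder
	for _, n := range names {
		fmt.Fprintf(&b, "%s=%s\n", n, files[n])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return fmt.Sprintf("dataset-dgraphtest-%x", sum[:8])
}

func main() {
	files := map[string]string{
		"1million.schema": "https://raw.githubusercontent.com/dgraph-io/dgraph-benchmarks/refs/heads/main/data/1million.schema",
	}
	fmt.Println(cacheKey(files)) // identical URL sets always yield the same key
}
```

This gets the reproducibility property the reviewer asks for without a manual "-v1" bump: the key only changes when the fixtures it protects actually change.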
**Additional concern:** The file-exists check that guards re-downloading (`fi.Size() > 0`) won't catch corrupted or truncated files from a previous failed run. Combined with replacing `MakeDirEmpty` with `os.MkdirAll` (which preserves existing content), there's no mechanism to detect or recover from a bad cached file. A checksum validation or known-size check would improve reliability here.