wip: random ddl-dml test #4012
base: master
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Summary of Changes
Hello @hongyunyan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request significantly expands the testing capabilities for TiCDC by introducing a random DDL and DML test runner. This new utility, integrated into several new weekly tests, aims to validate TiCDC's resilience and correctness under diverse and challenging database workloads, including dynamic schema changes, data modifications, and system disruptions like capture failovers. The focus is on simulating realistic operational scenarios to uncover potential issues in data replication and consistency.
Code Review
This pull request introduces a comprehensive random DDL/DML test runner, which is a great addition for improving testing coverage. The framework is well-structured with separate components for configuration, modeling, workload generation, and verification. I've made a few suggestions to improve code clarity, maintainability, and consistency, primarily in the shell scripts and some of the Go helper functions. Specifically, I've pointed out opportunities to refactor duplicated code in shell scripts, replace magic numbers with constants, and simplify some complex implementations in the Go code. Overall, this is a solid piece of work.
| if command -v rg >/dev/null 2>&1; then | ||
| if rg -n -i "panic|fatal|data race" "$WORK_DIR"/runner.log "$WORK_DIR"/ddl_trace.log "$WORK_DIR"/stdout*.log "$WORK_DIR"/cdc*.log "$WORK_DIR"/cdc_*_consumer*.log "$WORK_DIR"/cdc_*_consumer_stdout*.log 2>/dev/null | head -n 20 | rg -n . >/dev/null 2>&1; then | ||
| echo "log scan: panic/fatal/race detected" | ||
| rg -n -i "panic|fatal|data race" "$WORK_DIR"/runner.log "$WORK_DIR"/ddl_trace.log "$WORK_DIR"/stdout*.log "$WORK_DIR"/cdc*.log "$WORK_DIR"/cdc_*_consumer*.log "$WORK_DIR"/cdc_*_consumer_stdout*.log 2>/dev/null | head -n 50 || true | ||
| exit 1 | ||
| fi | ||
| fi |
The log scanning logic here is a bit complex and can be simplified. By capturing the output of rg and checking if it's non-empty, you can avoid running rg twice. This also makes the code more readable.
This comment also applies to weekly_rand_multi_failover/run.sh and weekly_rand_single/run.sh which have identical logic.
Suggested change:
| if command -v rg >/dev/null 2>&1; then | |
| log_files_to_scan=( | |
| "$WORK_DIR"/runner.log "$WORK_DIR"/ddl_trace.log "$WORK_DIR"/stdout*.log | |
| "$WORK_DIR"/cdc*.log "$WORK_DIR"/cdc_*_consumer*.log | |
| "$WORK_DIR"/cdc_*_consumer_stdout*.log | |
| ) | |
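| # rg exits non-zero (2) when a listed path cannot be read, e.g. an unmatched glob, even if other files matched, so test the captured output rather than the exit status. | |
| matches=$(rg -n -i "panic|fatal|data race" "${log_files_to_scan[@]}" 2>/dev/null || true) | |
| if [ -n "$matches" ]; then | |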
| echo "log scan: panic/fatal/race detected" | |
| echo "$matches" | head -n 50 | |
| exit 1 | |
| fi | |
| fi |
|
|
||
| run_sql "SET GLOBAL tidb_enable_external_ts_read = off;" ${DOWN_TIDB_HOST} ${DOWN_TIDB_PORT} | ||
|
|
||
| echo "[$(date)] <<<<<< run test case $TEST_NAME success! >>>>>>" |
This script is missing the final log scan for panics, which is present in other run.sh scripts in this PR. This is inconsistent. While the Go runner has its own log scanning, the shell script scan covers logs that the Go runner might not, and it runs at a different point in the test execution. It's good practice to have it for consistency and robustness. Consider adding it before the final echo.
| nextDML := activeDML | ||
| nextDDL := activeDDL | ||
|
|
||
| if sinceAdvance >= soft || successRate < 0.10 { |
| return autoTuneResult{nextDML: nextDML, nextDDL: nextDDL} | ||
| } | ||
| if nextDML > 1 { | ||
| nextDML -= 8 |
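The review summary mentions replacing magic numbers with constants, and the two quoted fragments contain the literals 0.10 and 8. A minimal sketch of what that could look like follows. The constant names, the `backOffDML` helper, and the floor of one worker are all assumptions made for illustration; the quoted code is truncated, so whether the PR clamps the value after subtracting is not visible here.

```go
package main

import "fmt"

// Hypothetical names; only the values 0.10 and 8 come from the quoted snippets.
const (
	minSuccessRate = 0.10 // below this success rate the tuner backs off
	dmlBackoffStep = 8    // DML workers removed per back-off step
	minDMLWorkers  = 1    // assumed floor; the quoted code only shows the `> 1` guard
)

// shouldBackOff mirrors the shape of the quoted condition with a named threshold.
func shouldBackOff(sinceAdvance, soft int, successRate float64) bool {
	return sinceAdvance >= soft || successRate < minSuccessRate
}

// backOffDML lowers the worker count by one step without going below the floor.
func backOffDML(active int) int {
	if active <= minDMLWorkers {
		return active
	}
	next := active - dmlBackoffStep
	if next < minDMLWorkers {
		next = minDMLWorkers
	}
	return next
}

func main() {
	fmt.Println(shouldBackOff(3, 10, 0.05)) // low success rate triggers back-off
	fmt.Println(backOffDML(32), backOffDML(5), backOffDML(1))
}
```

The explicit clamp is a design choice of this sketch: subtracting 8 from any value between 2 and 8 would otherwise produce zero or a negative count.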
| func mathMaxInt32() int64 { | ||
| return int64(^uint32(0) >> 1) | ||
| } |
| carry := make([]byte, 0, maxPatternLen) | ||
| scratch := make([]byte, 0, 256*1024) | ||
| tmp := make([]byte, 0, 256*1024) | ||
| lineMatched := false | ||
|
|
||
| for { | ||
| part, isPrefix, err := reader.ReadLine() | ||
| if err != nil { | ||
| if err == io.EOF { | ||
| break | ||
| } | ||
| _ = f.Close() | ||
| return err | ||
| } | ||
|
|
||
| if !lineMatched { | ||
| tmp = append(tmp[:0], part...) | ||
| for i := range tmp { | ||
| c := tmp[i] | ||
| if c >= 'A' && c <= 'Z' { | ||
| tmp[i] = c + ('a' - 'A') | ||
| } | ||
| } | ||
|
|
||
| scratch = append(scratch[:0], carry...) | ||
| scratch = append(scratch, tmp...) | ||
|
|
||
| for i, p := range patternBytes { | ||
| if bytes.Contains(scratch, p) { | ||
| hits = append(hits, hit{file: filepath.Base(path), line: lineNo + 1, pat: lowerPatterns[i]}) | ||
| lineMatched = true | ||
| break | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if !isPrefix { | ||
| lineNo++ | ||
| carry = carry[:0] | ||
| lineMatched = false | ||
| continue | ||
| } | ||
| // Keep a small suffix from the previous fragment to detect patterns spanning boundaries. | ||
| if len(scratch) == 0 { | ||
| carry = carry[:0] | ||
| continue | ||
| } | ||
| keep := maxPatternLen - 1 | ||
| if keep <= 0 { | ||
| carry = carry[:0] | ||
| continue | ||
| } | ||
| if keep > len(scratch) { | ||
| keep = len(scratch) | ||
| } | ||
| carry = append(carry[:0], scratch[len(scratch)-keep:]...) | ||
| } |
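The carry buffer in the loop above exists because `ReadLine` can return a long line in several fragments, and a pattern may straddle two of them. The sketch below isolates that idea in a standalone function. The name `containsAcrossChunks` is hypothetical, and the sketch handles a single pattern rather than the pattern list used in the PR.

```go
package main

import (
	"bytes"
	"fmt"
)

// containsAcrossChunks reports whether pat occurs in the concatenation of
// chunks while only ever holding one chunk plus a carried suffix of
// len(pat)-1 bytes in memory.
func containsAcrossChunks(chunks [][]byte, pat []byte) bool {
	if len(pat) == 0 {
		return true
	}
	keep := len(pat) - 1
	carry := make([]byte, 0, keep)
	var window []byte
	for _, chunk := range chunks {
		// Search the carried suffix followed by the new chunk.
		window = append(window[:0], carry...)
		window = append(window, chunk...)
		if bytes.Contains(window, pat) {
			return true
		}
		// Keep the last len(pat)-1 bytes: enough to complete a match
		// that starts in this window and ends in the next chunk.
		k := keep
		if k > len(window) {
			k = len(window)
		}
		carry = append(carry[:0], window[len(window)-k:]...)
	}
	return false
}

func main() {
	split := [][]byte{[]byte("xxpa"), []byte("nicyy")}
	fmt.Println(containsAcrossChunks(split, []byte("panic")))
}
```

Carrying exactly `len(pat)-1` bytes is the smallest suffix that cannot miss a boundary-spanning match, and it is too short to contain a complete match of its own.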
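The implementation for scanning logs is quite complex. It manually handles line splitting with ReadLine and lowercasing. This can be simplified significantly by using bufio.Scanner to split lines and bytes.ToLower for case-insensitive matching, which would make the code more readable and maintainable. One caveat: bufio.Scanner stops with bufio.ErrTooLong when a line exceeds its buffer limit (64 KiB by default), so the limit would need to be raised with scanner.Buffer if log lines can be longer than that.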
Here's an example of a simpler approach:
scanner := bufio.NewScanner(f)
lineNo := 0
for scanner.Scan() {
	lineNo++
	lowerLine := bytes.ToLower(scanner.Bytes())
	for i, p := range patternBytes {
		if bytes.Contains(lowerLine, p) {
			hits = append(hits, hit{file: filepath.Base(path), line: lineNo, pat: lowerPatterns[i]})
			break // Found a match on this line
		}
	}
}
if err := scanner.Err(); err != nil {
	_ = f.Close()
	return err
}
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note