Fix restore order: wait for replicated DB sync before restoring tables#302
aalexfvk merged 22 commits into yandex:main from
Conversation
Try to fix readonly replicated tables in-place via ZooKeeper metadata cleanup and SYSTEM RESTORE REPLICA before falling back to table recreation
Reviewer's Guide

Refactors the restore workflow to first synchronize replicated databases via ZooKeeper log_ptr tracking, attempts in-place repair of readonly replicated tables using ZooKeeper metadata cleanup plus SYSTEM RESTORE REPLICA before falling back to table recreation, and centralizes table-drop logic while skipping table operations in replicated databases unless explicitly requested.

Sequence diagram for updated restore workflow with replicated DB sync

sequenceDiagram
actor Operator
participant CLI as ch_backup_restore
participant Restore as RestoreWorkflow
participant DBSync as wait_sync_replicated_databases
participant SyncOne as _sync_database
participant ZKData as DatabaseZookeeperData
participant ZK as ZookeeperCTL
participant CH as ClickhouseCTL
participant TblMgr as TableBackupManager
Operator->>CLI: invoke restore
CLI->>Restore: _restore(context, sources,...)
Restore->>DBSync: wait_sync_replicated_databases(context, restored_databases, keep_going=True)
loop for each replicated Database
DBSync->>SyncOne: _sync_database(context, db, deadline)
SyncOne->>ZKData: DatabaseZookeeperData.resolve(context, db)
ZKData->>CH: get_macros()
CH-->>ZKData: macros
ZKData->>ZK: zk_client.get(zk_root_path + zk_path + /max_log_ptr)
ZK-->>ZKData: max_log_ptr
ZKData->>ZK: zk_client.get(zk_root_path + zk_path + /replicas/shard|replica/log_ptr)
ZK-->>ZKData: log_ptr
alt log_ptr >= max_log_ptr
SyncOne-->>DBSync: sync complete for db
else log_ptr < max_log_ptr
loop until deadline
SyncOne->>ZKData: get_log_ptr()
ZKData->>ZK: zk_client.get(...log_ptr)
ZK-->>ZKData: log_ptr
alt log_ptr >= max_log_ptr
SyncOne-->>DBSync: sync complete
SyncOne-->>SyncOne: break out of loop
else log_ptr stalled
SyncOne->>CH: get_ddl_queue_unfinished_status(db.name)
CH-->>SyncOne: pending DDL entries
alt entries have errors
SyncOne-->>DBSync: raise RuntimeError
else no errors
SyncOne->>SyncOne: sleep(poll_interval)
end
end
end
end
end
DBSync-->>Restore: sync_results per db (DONE/FAILED)
Restore->>TblMgr: restore(..., restore_tables_in_replicated_database)
TblMgr-->>Restore: tables restored
Restore->>DBSync: wait_sync_replicated_databases(context, failed_databases, keep_going)
DBSync-->>Restore: updated sync_results
Restore-->>CLI: restore finished / errors
CLI-->>Operator: output status
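The polling loop in the diagram above can be sketched as follows. This is a hypothetical simplification, not the PR's actual code: `get_log_ptr`, `ddl_queue_has_errors`, and the toy replica stand in for the ZooKeeper reads and the `get_ddl_queue_unfinished_status` check.

```python
import time

def wait_for_log_ptr(get_log_ptr, max_log_ptr, deadline, poll_interval=0.01,
                     ddl_queue_has_errors=lambda: False):
    """Poll until the replica's log_ptr catches up with max_log_ptr.

    Returns True on success, False if the deadline expires; raises if the
    replicated DDL queue reports failed entries (no point in waiting).
    """
    while time.monotonic() < deadline:
        if get_log_ptr() >= max_log_ptr:
            return True
        if ddl_queue_has_errors():
            raise RuntimeError("replicated DDL queue has failed entries")
        time.sleep(poll_interval)
    return False

# Toy replica whose log_ptr advances by one on every read.
state = {"log_ptr": 0}
def fake_log_ptr():
    state["log_ptr"] += 1
    return state["log_ptr"]

synced = wait_for_log_ptr(fake_log_ptr, max_log_ptr=5,
                          deadline=time.monotonic() + 1.0)
print(synced)  # True: the fake replica catches up well within the deadline
```

Checking the DDL queue only when progress stalls (as in the diagram) would add a "last seen log_ptr" comparison inside the loop; it is omitted here for brevity.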
Sequence diagram for readonly replicated table in-place repair during restore

sequenceDiagram
participant TblMgr as TableBackupManager
participant Prep as _preprocess_tables_to_restore
participant FixRO as _get_remaining_readonly_tables_after_fix_attempt
participant DropTbl as _drop_existing_table
participant MC as MetadataCleaner
participant CH as ClickhouseCTL
TblMgr->>Prep: _preprocess_tables_to_restore(context, tables, metadata_cleaner,...)
Prep->>FixRO: _get_remaining_readonly_tables_after_fix_attempt(context, tables, metadata_cleaner)
FixRO->>CH: get_replicas(readonly=True)
CH-->>FixRO: readonly replicas list
alt metadata_cleaner is None or no readonly tables
FixRO-->>Prep: (existing_readonly_tables, empty_set)
else some replicated readonly tables
FixRO->>MC: clean_tables_metadata(replicated_readonly_tables)
loop for each replicated_readonly_table
FixRO->>CH: restore_replica(table)
alt restore_replica fails
CH-->>FixRO: error
FixRO->>FixRO: log warning, continue
else restore_replica succeeds
CH-->>FixRO: ok
end
end
FixRO->>CH: get_replicas(readonly=True)
CH-->>FixRO: still_readonly set
FixRO->>FixRO: compute already_cleaned
FixRO-->>Prep: (remaining_readonly_tables, already_cleaned)
end
loop for each table to restore
Prep->>Prep: find existing_table
alt db is replicated and restore_tables_in_replicated_database is False
Prep->>Prep: log skip restore/drop
else existing_table does not exist
Prep->>Prep: append table to result
else existing_table exists
Prep->>DropTbl: _drop_existing_table(context, table, existing_table)
alt drop succeeds
DropTbl-->>Prep: None
Prep->>Prep: drop_performed=True, append table
else drop raises
DropTbl--xPrep: Exception
alt keep_going is False
Prep-->>TblMgr: raise
else keep_going is True
Prep->>Prep: log exception, continue
end
end
end
alt table is replicated and metadata_cleaner and not drop_performed and (db, name) not in readonly_cleaned_metadata
Prep->>Prep: tables_to_clean_metadata.append(table)
end
end
alt metadata_cleaner and tables_to_clean_metadata not empty
Prep->>MC: clean_tables_metadata(tables_to_clean_metadata)
end
Prep-->>TblMgr: list of tables to actually restore
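The readonly-repair step above reduces to: clean ZooKeeper metadata for the readonly replicated tables, attempt `SYSTEM RESTORE REPLICA` per table, then re-query which tables are still readonly. A minimal sketch, with `clean_metadata`, `restore_replica`, and `get_readonly` as stand-ins for the `MetadataCleaner` and `ClickhouseCTL` calls:

```python
def fix_readonly_tables(readonly, clean_metadata, restore_replica, get_readonly):
    """Return (still_readonly, already_cleaned) as sets of (db, table)."""
    if not readonly:
        return set(), set()
    clean_metadata(readonly)          # drop stale replica metadata in ZooKeeper
    for table in readonly:
        try:
            restore_replica(table)    # SYSTEM RESTORE REPLICA <db>.<table>
        except Exception:
            pass                      # log and fall back to recreation later
    still_readonly = get_readonly()   # re-query system.replicas
    already_cleaned = set(readonly) - still_readonly
    return still_readonly, already_cleaned

# Toy run: two readonly tables, one of which the repair fixes.
still, cleaned = fix_readonly_tables(
    readonly={("db1", "t1"), ("db1", "t2")},
    clean_metadata=lambda tables: None,
    restore_replica=lambda t: None,
    get_readonly=lambda: {("db1", "t2")},
)
print(sorted(still), sorted(cleaned))  # [('db1', 't2')] [('db1', 't1')]
```

Returning `already_cleaned` alongside the remaining set is what lets the caller skip redundant metadata cleanup later in `_preprocess_tables_to_restore`.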
Class diagram for new replicated database sync and readonly repair helpers

classDiagram
class BackupContext {
+config: Dict
+config_root: Dict
+ch_ctl_conf: Dict
+ch_ctl: ClickhouseCTL
+zk_ctl: ZookeeperCTL
}
class ClickhouseCTL {
+get_macros() Dict
+get_replicas(readonly: bool) List~Dict~
+restore_replica(table: Table) None
+get_ddl_queue_unfinished_status(db_name: str) List~Dict~
+drop_dictionary_if_exists(table: Table) None
+drop_table_if_exists(table: Table) None
+rename_table(src: Table, new_name: str) None
}
class ZookeeperCTL {
+zk_client
+zk_root_path: str
}
class Database {
+name: str
+engine_full: str
+is_replicated_db_engine() bool
}
class Table {
+database: str
+name: str
+uuid: str
+create_statement: str
+is_dictionary() bool
+is_replicated() bool
+make_dummy(database: str, name: str) Table
}
class MetadataCleaner {
+clean_tables_metadata(tables: List~Table~) None
}
class DatabaseZookeeperData {
+zk_path: str
+shard: str
+replica: str
+max_log_ptr: int
+get_log_ptr() int
+get_max_log_ptr() int
+parse_replicated_db_zk_info(db: Database, macros: Dict~str,str~) tuple
+resolve(context: BackupContext, db: Database) DatabaseZookeeperData
-_read_int_node(path: str) int
-_full_path(suffix: str) str
}
DatabaseZookeeperData --> ZookeeperCTL : uses
class SyncStatus {
<<enumeration>>
DONE
FAILED
}
class DatabaseSyncModule {
+wait_sync_replicated_databases(context: BackupContext, databases: Iterable~Database~, keep_going: bool) Dict~str,SyncStatus~
-_sync_database(context: BackupContext, db: Database, deadline: float) None
-_check_ddl_queue_for_errors(context: BackupContext, db_name: str) bool
}
DatabaseSyncModule --> BackupContext
DatabaseSyncModule --> DatabaseZookeeperData
DatabaseSyncModule --> ClickhouseCTL
class TableBackupManager {
+restore(context: BackupContext, ...) None
-_preprocess_tables_to_restore(context: BackupContext, tables: List~Table~, metadata_cleaner: MetadataCleaner, restore_tables_in_replicated_database: bool, keep_going: bool) List~Table~
-_get_remaining_readonly_tables_after_fix_attempt(context: BackupContext, tables: List~Table~, metadata_cleaner: MetadataCleaner) Tuple~Set~Tuple~str,str~~,Set~Tuple~str,str~~~
-_drop_existing_table(context: BackupContext, table: Table, existing_table: Table) None
}
TableBackupManager --> BackupContext
TableBackupManager --> ClickhouseCTL
TableBackupManager --> MetadataCleaner
TableBackupManager --> Table
BackupContext --> ClickhouseCTL
BackupContext --> ZookeeperCTL
BackupContext --> Database
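The `DatabaseZookeeperData` helper from the class diagram essentially reads two integer nodes under the database's ZooKeeper path. A hypothetical sketch with an in-memory fake client (`FakeZk` mimics kazoo's `get`, which returns a `(data, stat)` pair); the real node layout and macro expansion may differ:

```python
class FakeZk:
    """Dict-backed stand-in for the kazoo client held by ZookeeperCTL."""
    def __init__(self, nodes):
        self.nodes = nodes
    def get(self, path):
        return self.nodes[path].encode(), None  # kazoo returns (data, stat)

class DatabaseZookeeperData:
    def __init__(self, zk, zk_path, shard, replica):
        self.zk, self.zk_path = zk, zk_path
        self.shard, self.replica = shard, replica
    def _read_int(self, suffix):
        data, _ = self.zk.get(self.zk_path + suffix)
        return int(data)
    def get_max_log_ptr(self):
        return self._read_int("/max_log_ptr")
    def get_log_ptr(self):
        # Replicated database engines key replicas as "<shard>|<replica>".
        return self._read_int(f"/replicas/{self.shard}|{self.replica}/log_ptr")

zk = FakeZk({
    "/clickhouse/db1/max_log_ptr": "7",
    "/clickhouse/db1/replicas/shard1|replica1/log_ptr": "5",
})
db = DatabaseZookeeperData(zk, "/clickhouse/db1", "shard1", "replica1")
print(db.get_log_ptr() < db.get_max_log_ptr())  # True: this replica lags
```

In the real code, `resolve()` would build `zk_path`, `shard`, and `replica` from the database's `engine_full` arguments expanded with `get_macros()`.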
Hey - I've found 1 issue, and left some high level feedback:
- Several new log statements use `{}`-style placeholders with `logging.*` (e.g. in `_try_fix_readonly_tables`), but the standard logging API expects `%`-style formatting or f-strings, so these messages will not be formatted as intended; consider switching to f-strings or `%s` placeholders.
- The `_try_fix_readonly_tables` helper assumes it is given replicated tables (per the docstring) but does not enforce or validate this; consider either asserting `table.is_replicated()` inside or relaxing the wording in the docstring to match its actual behavior.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Several new log statements use `{}`-style placeholders with `logging.*` (e.g. in `_try_fix_readonly_tables`), but the standard logging API expects `%`-style formatting or f-strings, so these messages will not be formatted as intended; consider switching to f-strings or `%s` placeholders.
- The `_try_fix_readonly_tables` helper assumes it is given replicated tables (per the docstring) but does not enforce or validate this; consider either asserting `table.is_replicated()` inside or relaxing the wording in the docstring to match its actual behavior.
## Individual Comments
### Comment 1
<location path="ch_backup/logic/table.py" line_range="691-700" />
<code_context>
+ not restore_tables_in_replicated_database
+ and is_in_replicated_db
+ ):
+ logging.info(
+ f"Skipping table {table_name_for_logs} because it is in replicated database and --restore-tables-in-replicated-database flag is not set",
+ )
</code_context>
<issue_to_address>
**issue:** Use standard logging formatting instead of `{}` placeholders to ensure arguments are interpolated.
The stdlib logger won’t interpolate `{}` placeholders; they’ll be logged literally and the extra args will be ignored. Please switch these calls to either f-strings or `%`-style placeholders, for example:
```python
logging.info(
"Attempting to fix %d readonly table(s) via metadata cleanup + SYSTEM RESTORE REPLICA",
len(readonly_tables),
)
```
Apply the same change to the other `logging.debug` / `logging.warning` / `logging.info` calls using `{}` placeholders.
</issue_to_address>
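The interpolation behavior the review describes is easy to verify: the stdlib logger applies `%`-style interpolation (`msg % args`), so a `{}`-message must be formatted before being passed in. A small self-contained demo:

```python
import io
import logging

# Capture log output in a buffer instead of stderr.
buf = io.StringIO()
logger = logging.getLogger("fmt_demo")
logger.handlers = [logging.StreamHandler(buf)]
logger.setLevel(logging.INFO)
logger.propagate = False

# Correct: %-style placeholders are interpolated lazily by the logger.
logger.info("fixed %d readonly table(s)", 3)
# Also correct: format the message yourself (f-string or str.format).
logger.info("fixed {} readonly table(s)".format(3))

print(buf.getvalue().splitlines())
# ['fixed 3 readonly table(s)', 'fixed 3 readonly table(s)']
```

A call like `logger.info("fixed {} table(s)", 3)` by contrast leaves the braces uninterpolated, and the mismatched positional argument triggers a formatting error inside the logging machinery.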
```python
    )
    continue

if need_drop and existing_table:
```
need_drop looks superfluous
```python
    for replica in context.ch_ctl.get_replicas(readonly=True)
}

if metadata_cleaner and existing_readonly_tables:
    replicated_readonly_tables = []

    for table in tables:
        table_key = (table.database, table.name)
        if table.is_replicated() and table_key in existing_readonly_tables:
            replicated_readonly_tables.append(table)

    if replicated_readonly_tables:
        existing_readonly_tables -= self._try_fix_readonly_tables(
            context,
            replicated_readonly_tables,
            metadata_cleaner,
        )
```
Let's extract this as a separate function as well.
Hey - I've found 1 issue, and left some high level feedback:
- The new logging calls in `_get_remaining_readonly_tables_after_fix_attempt` use `{}`-style placeholders (`logging.info(..., len(...))` etc.), but the standard Python logger expects `%s`-style placeholders, so these messages will log the braces literally rather than the interpolated values; consider switching to `%s` placeholders or f-strings for those new messages.
- Readonly replicated tables that are successfully repaired are passed to `metadata_cleaner.clean_tables_metadata` both inside `_get_remaining_readonly_tables_after_fix_attempt` and again later via `tables_to_clean_metadata`; if this double-cleaning is not required for correctness, you might want to avoid re-adding already-cleaned tables to reduce redundant work.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new logging calls in `_get_remaining_readonly_tables_after_fix_attempt` use `{}`-style placeholders (`logging.info(..., len(...))` etc.), but the standard Python logger expects `%s`-style placeholders, so these messages will log the braces literally rather than interpolated values; consider switching to `%s` placeholders or f-strings for those new messages.
- Readonly replicated tables that are successfully repaired are passed to `metadata_cleaner.clean_tables_metadata` both inside `_get_remaining_readonly_tables_after_fix_attempt` and again later via `tables_to_clean_metadata`; if this double-cleaning is not required for correctness, you might want to avoid re-adding already-cleaned tables to reduce redundant work.
## Individual Comments
### Comment 1
<location path="ch_backup/logic/table.py" line_range="669-678" />
<code_context>
+ # For tables in replicated databases, skip unless explicitly requested.
+ if not restore_tables_in_replicated_database and is_in_replicated_db:
+ logging.info(
+ f"Skipping table {table_name_for_logs} because it is in replicated database and --restore-tables-in-replicated-database flag is not set",
+ )
</code_context>
<issue_to_address>
**issue (bug_risk):** Logging messages use `{}` placeholders but `logging` default formatting is `%`-style, so these arguments likely won’t be interpolated.
In this helper you’re passing extra positional arguments to `logging.info`/`logging.warning` while using `{}` in the message, so those arguments won’t be interpolated and the braces will appear literally. Please either use f-strings and drop the extra args, or switch the placeholders to `%s` (e.g. `"Attempting to fix %s readonly table(s)...", len(replicated_readonly_tables)`). Apply the same correction to all new logging calls in this block that use `{}` with positional args.
</issue_to_address>
…cated DB

Track already-cleaned tables to avoid redundant re-cleaning; defer the drop for replicated DB tables only when the flag is not set, instead of skipping the restore entirely.
Reproducing the test case for deleting a table in read-only mode is difficult, because the script's query that fetches read-only tables has to hit exactly the moment when a table has already been added by the database but has not yet been activated by the restart thread.
…estore

Track log_ptr via ZooKeeper to detect a stalled DDL worker; retry sync after table restore.
Remove system sync branch, stall threshold and intermediate sync statuses
@sourcery-ai review
Hey - I've found 2 issues, and left some high level feedback:
- In `ch_backup.py` the `logging.info` call for retrying sync uses `{}` placeholders (`"Retrying sync for {} database(s) that failed: {}"`) instead of `%s`, so the arguments will not be interpolated correctly; switch to `%s` or `str.format` to log the actual counts and names.
- In `GET_DDL_QUEUE_STATUS_SQL` the filter uses `WHERE cluster = '{db_name}'`, but the argument passed is a database name; if the intention is to query DDL entries for that database, this should likely filter by the `database` column (or by the actual cluster name derived elsewhere) rather than `cluster`.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `ch_backup.py` the `logging.info` call for retrying sync uses `{}` placeholders (`"Retrying sync for {} database(s) that failed: {}"`) instead of `%s`, so the arguments will not be interpolated correctly; switch to `%s` or `str.format` to log the actual counts and names.
- In `GET_DDL_QUEUE_STATUS_SQL` the filter uses `WHERE cluster = '{db_name}'`, but the argument passed is a database name; if the intention is to query DDL entries for that database, this should likely filter by the `database` column (or by the actual cluster name derived elsewhere) rather than `cluster`.
## Individual Comments
### Comment 1
<location path="ch_backup/ch_backup.py" line_range="631-632" />
<code_context>
+ if sync_results.get(db.name) == SyncStatus.FAILED
+ ]
+ if failed_databases:
+ logging.info(
+ "Retrying sync for {} database(s) that failed: {}",
+ len(failed_databases),
+ ", ".join(db.name for db in failed_databases),
</code_context>
<issue_to_address>
**issue (bug_risk):** The logging call uses '{}' formatting but passes arguments separately, so placeholders won't be interpolated.
Here `logging.info` is given a `{}`-style format string plus separate arguments, but the stdlib logger only applies `%`-style interpolation. This will log the literal `{}` instead of the values. Please either format the string yourself (e.g. with an f-string) or switch to `%s` placeholders and let the logger format the message.
</issue_to_address>
### Comment 2
<location path="ch_backup/clickhouse/control.py" line_range="279" />
<code_context>
+ exception_code,
+ exception_text
+ FROM system.distributed_ddl_queue
+ WHERE cluster = '{db_name}'
+ AND status != 'Finished'
+ ORDER BY entry
</code_context>
<issue_to_address>
**issue (bug_risk):** Filtering distributed_ddl_queue by `cluster = '{db_name}'` looks conceptually wrong and may never match entries.
In `GET_DDL_QUEUE_STATUS_SQL`, `cluster = '{db_name}'` is compared against `db.name`, but `cluster` in `system.distributed_ddl_queue` normally stores the *cluster name*, not the database name. This will likely return no rows and hide real DDL errors. Revisit the predicate: either filter by the correct column for the replicated database, or drop the `cluster` filter and apply any DB/schema filtering in Python instead.
</issue_to_address>
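One way to follow Comment 2 above would be to drop the `cluster` predicate from the SQL and filter the returned entries by database in Python. This is a hypothetical sketch, not the PR's code; `unfinished_ddl_for_db` and the substring match on the `query` column are illustrative assumptions:

```python
# Fetch unfinished entries without the (likely wrong) cluster filter;
# filtering by the target database happens client-side below.
GET_DDL_QUEUE_STATUS_SQL = """
    SELECT entry, status, exception_code, exception_text, query
    FROM system.distributed_ddl_queue
    WHERE status != 'Finished'
    ORDER BY entry
"""

def unfinished_ddl_for_db(rows, db_name):
    """Keep only queue entries whose DDL text targets the given database."""
    needle = f"{db_name}."
    return [row for row in rows if needle in row.get("query", "")]

# Illustrative rows shaped like system.distributed_ddl_queue output.
rows = [
    {"entry": "query-1", "status": "Active", "query": "ALTER TABLE db1.t1 DROP PARTITION '2024'"},
    {"entry": "query-2", "status": "Active", "query": "ALTER TABLE db2.t9 DROP PARTITION '2024'"},
]
print([r["entry"] for r in unfinished_ddl_for_db(rows, "db1")])  # ['query-1']
```

Matching on the DDL text is a heuristic; if the server version exposes a reliable per-entry database field, filtering on that column directly in SQL would be cleaner.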
Can we reproduce a test case by deleting the table metadata in ZooKeeper after the first restore and then running a second restore?
```gherkin
"""
3
"""
When we execute query on clickhouse02
```
Let's check that table is not readonly anymore

Summary by Sourcery
Improve handling of replicated databases and readonly replicated tables during restore by syncing via ZooKeeper log pointers and attempting in-place fixes before table recreation.