Propose Solution for Efficient Sync with Many Delete Entries #2066

@VJag

Description

Is your feature request related to a problem? Please describe.

The current atServer synchronization process becomes inefficient when it has to handle a large number of deletions, which hurts the performance and scalability of the system. The objective of this ticket is to propose a solution that keeps synchronization efficient when there are many delete entries, scales well, and does not depend on Hive.

Current Design Overview:

CRUD Operations:

Data Storage: atServer stores data as key-value pairs.
Key Management: Keys can be created, deleted, or automatically expired using the ttl (time to live) parameter.
Expired Key Cleanup: A cron job deletes expired keys.
Key Storage: All keys are stored in a Hive box named KeyStore.
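The key lifecycle described above can be illustrated with a minimal in-memory sketch. It is not the atServer implementation (which stores keys in Hive): the `KeyStore` class, its method names, the injectable `clock`, and the use of seconds as the ttl unit are all assumptions made for illustration. `remove_expired` stands in for the cron job.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Entry:
    value: str
    expires_at: Optional[float]  # absolute time in seconds; None means no ttl


class KeyStore:
    """Hypothetical in-memory stand-in for the Hive-backed key store."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._data: Dict[str, Entry] = {}
        self._clock = clock  # injectable so expiry can be exercised without sleeping

    def put(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        # ttl is assumed to be in seconds here; the real unit may differ.
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = Entry(value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        return entry.value if entry else None

    def delete(self, key: str) -> bool:
        # Returns True only if the key existed.
        return self._data.pop(key, None) is not None

    def remove_expired(self) -> List[str]:
        # The job a periodic cleanup task would run: drop every key past its expiry.
        now = self._clock()
        expired = [
            key
            for key, entry in self._data.items()
            if entry.expires_at is not None and entry.expires_at <= now
        ]
        for key in expired:
            del self._data[key]
        return expired
```

Each key returned by `remove_expired` is a deletion that, in the current design, connected clients later have to sync.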

Commit Log:

Operation Logging: Key creation or updates are logged in a Hive box called CommitLog with an auto-generated sequence number.
Recording Changes: Each operation is recorded with a new sequence number.
Single Entry per Key: The CommitLog maintains one entry per unique key.
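A rough sketch of a commit log with these properties is shown below. The names `CommitEntry`, `CompactCommitLog`, `record`, and `entries_since` are hypothetical, and the sketch assumes that deletions are logged the same way as creations and updates. Every operation consumes a new sequence number, but only the latest entry for each key is retained.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CommitEntry:
    commit_id: int
    key: str
    operation: str  # "update" or "delete" (assumed operation names)


class CompactCommitLog:
    """Hypothetical commit log that keeps one entry per unique key."""

    def __init__(self) -> None:
        self._next_id = 0
        self._latest: Dict[str, CommitEntry] = {}

    def record(self, key: str, operation: str) -> int:
        # Every operation gets a fresh sequence number...
        commit_id = self._next_id
        self._next_id += 1
        # ...and replaces any earlier entry for the same key.
        self._latest[key] = CommitEntry(commit_id, key, operation)
        return commit_id

    def entries_since(self, commit_id: int) -> List[CommitEntry]:
        # Entries a client at `commit_id` has not seen yet, oldest first.
        newer = (e for e in self._latest.values() if e.commit_id > commit_id)
        return sorted(newer, key=lambda e: e.commit_id)

    def last_commit_id(self) -> int:
        # -1 means nothing has been committed yet.
        return self._next_id - 1
```

Note that a deleted key still occupies an entry in such a log, which is why a log dominated by deletions stays large.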

In-Memory Compact CommitLog:

In-Memory Representation: atServer keeps an in-memory map of the CommitLog to optimize synchronization.
Sync Efficiency: This map supports efficient synchronization operations.

Sync Process:

Client Connections: Multiple clients can be connected to an atServer.
Data Synchronization: Clients sync data with the atServer, which assigns a commit ID. Clients record this ID locally.
Sync Status: A data item with a server commit ID indicates it is synced.
Managing Sync Differences: If a client's local commit ID is lower than the server's latest commit ID, the client must first bring its local commit ID up to date before pushing new data.
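The pull-before-push rule can be sketched as follows. This is a simplified model under stated assumptions, not the atServer or client SDK API: `Server`, `Client`, `changes_since`, `pull`, and `push` are hypothetical names, a value of `None` is assumed to represent a deletion, and catching up is assumed to mean pulling the missed changes.

```python
from typing import Dict, List, Optional, Tuple

# (commit_id, key, value); a value of None is assumed to mean "deleted".
Change = Tuple[int, str, Optional[str]]


class Server:
    def __init__(self) -> None:
        self.latest_commit_id = -1
        self._changes: Dict[str, Change] = {}  # latest change per key

    def commit(self, key: str, value: Optional[str]) -> int:
        self.latest_commit_id += 1
        self._changes[key] = (self.latest_commit_id, key, value)
        return self.latest_commit_id

    def changes_since(self, commit_id: int) -> List[Change]:
        newer = (c for c in self._changes.values() if c[0] > commit_id)
        return sorted(newer, key=lambda c: c[0])


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.local_commit_id = -1
        self.data: Dict[str, str] = {}

    def _apply(self, key: str, value: Optional[str]) -> None:
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value

    def pull(self) -> int:
        # Apply every change newer than the local commit ID; return how many.
        changes = self.server.changes_since(self.local_commit_id)
        for commit_id, key, value in changes:
            self._apply(key, value)
            self.local_commit_id = commit_id
        return len(changes)

    def push(self, key: str, value: Optional[str]) -> int:
        # Catch up first if the server is ahead of this client.
        if self.local_commit_id < self.server.latest_commit_id:
            self.pull()
        commit_id = self.server.commit(key, value)
        self._apply(key, value)
        self.local_commit_id = commit_id  # the item now counts as synced
        return commit_id
```

In this model a brand-new client starts at commit ID -1, so its first `pull` receives every retained entry, including deletions of keys it never held.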

Current Design Issues:

Inefficient Sync with Many Deletions:

  • Excessive Syncing of Deleted Keys: New clients must sync all keys, including numerous deletions, leading to significant time and space inefficiencies.

  • Impact on Sync Performance: Syncing a large number of deletions consumes bandwidth and processing resources, reducing overall efficiency.

  • Inefficiencies in Key Expiry: Clients that created keys with a ttl still sync the deletions produced when those keys expire, even though they could handle the expiry locally.
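To make the cost concrete, the sketch below counts what a new client would receive from a compact log in which most keys have been deleted. The helper names and the numbers are illustrative assumptions, not measurements from atServer.

```python
from typing import Dict, Iterable, Tuple

# key -> (commit_id, operation); one retained entry per key.
CompactLog = Dict[str, Tuple[int, str]]


def build_compact_log(live_keys: Iterable[str], deleted_keys: Iterable[str]) -> CompactLog:
    live, deleted = list(live_keys), list(deleted_keys)
    log: CompactLog = {}
    commit_id = 0
    # Every key is created once...
    for key in live + deleted:
        log[key] = (commit_id, "update")
        commit_id += 1
    # ...and some are later deleted, overwriting their entry with a delete.
    for key in deleted:
        log[key] = (commit_id, "delete")
        commit_id += 1
    return log


def full_sync_cost(log: CompactLog) -> Tuple[int, int]:
    # Returns (entries a new client must sync, how many of them are deletes).
    total = len(log)
    deletes = sum(1 for _, operation in log.values() if operation == "delete")
    return total, deletes


if __name__ == "__main__":
    log = build_compact_log(
        live_keys=[f"live{i}" for i in range(10)],
        deleted_keys=[f"gone{i}" for i in range(1000)],
    )
    total, deletes = full_sync_cost(log)
    print(f"new client syncs {total} entries, {deletes} of them deletes")
```

With 10 live keys and 1000 deleted keys, the new client transfers 1010 entries to end up holding 10 keys.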

Describe the solution you'd like

Propose a Solution for Efficient Sync with Many Delete Entries: Scalable and Hive-Agnostic

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

enhancement: New feature or request
