This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Basic sync support#563

Merged
grigoryk merged 6 commits into master from grisha/sync-downloader on Sep 8, 2018

Conversation

@grigoryk
Contributor

@grigoryk grigoryk commented Feb 16, 2018

Fast-forward syncing with support for "merging" the bootstrap transaction.

Issues more or less resolved with this: #513, #514, #511

Collaborator

@rnewman rnewman left a comment

Keep going!

src/conn.rs Outdated
let mut last_tx_uuid = None;
// Giant hack: we're expecting 'parts' to not actually contain metadata
// about any transactions - just their datoms! we generate metadata ourselves
// while transacting, thus losing when transactions were made originally /o\
Collaborator

Yep, for that you'll need to go a little lower, and also renumber (#494).

src/conn.rs Outdated
match tx_list {
Some(list) => {
for tx in list {
let in_progress = self.begin_transaction()?;
Collaborator

I think you should lift this out of the for loop — you can transact multiple times against an in_progress, and they'll all share a single SQLite transaction.
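
The suggested shape can be sketched with toy types (illustrative only, not Mentat's real API): begin one `InProgress` outside the loop so every incoming transaction shares a single enclosing SQLite transaction, committed once at the end.

```rust
// Toy stand-in for Mentat's InProgress: it only records what was transacted.
struct InProgress {
    applied: Vec<String>,
}

impl InProgress {
    fn transact(&mut self, tx: &str) {
        // In Mentat this would write datoms; here we just record the tx.
        self.applied.push(tx.to_string());
    }
    fn commit(self) -> Vec<String> {
        self.applied
    }
}

fn apply_all(txs: &[&str]) -> Vec<String> {
    let mut in_progress = InProgress { applied: Vec::new() }; // begin once
    for tx in txs {
        in_progress.transact(tx); // every tx shares the one transaction
    }
    in_progress.commit() // one commit covers every incoming tx
}

fn main() {
    assert_eq!(apply_all(&["tx-1", "tx-2", "tx-3"]).len(), 3);
}
```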

src/conn.rs Outdated
let mut tx_list = None;
{
let mut db_tx = self.sqlite.transaction()?;
if let Some(list) = Syncer::flow(&mut db_tx, server_uri, &uuid)? {
Collaborator

If you implement flow on InProgress, you get the transaction for free… the only tricky bit is that you'd ideally like to not do any work (including establishing an exclusive transaction!) if there's no data to write.

src/conn.rs Outdated
db_tx.commit()?;
}
let mut last_tx_entid = None;
let mut last_tx_uuid = None;
Collaborator

Use a pair to indicate that these two things must be set together.

let mut last_tx = None;
…
last_tx = Some((uuid, entid));

#[derive(Debug)]
pub struct Tx {
pub tx: Uuid,
pub parts: Vec<TxPart>
Collaborator

Nit: trailing comma, please.

@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 5 times, most recently from 05db24b to c42d100 on February 28, 2018 00:34
@grigoryk grigoryk force-pushed the grisha/sync-downloader branch from c42d100 to f6ee832 on March 1, 2018 00:19
@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 2 times, most recently from c751faf to d9d2b3a on March 8, 2018 07:21
@grigoryk grigoryk changed the title from "WIP: Rough cut of a sync downloader and an incoming tx transactor" to "Fast-forward syncing" on Mar 8, 2018
src/conn.rs Outdated
})
}

pub fn fast_forward_user_partition(&mut self, new_head: Entid) -> Result<()> {
Collaborator

What I expect to see here is essentially a compare-and-set: if my local partition map is this, then expand it to this one.

Doing that work requires writing to the DB, so it shouldn't be on Store, it should be on InProgress; if the DB write fails the metadata will be discarded.

And you want to advance all parts — typically user and tx — at the same time, atomically, right? Remember that the data itself will refer to tx IDs — it's not enough to only work with the user partition!
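
The compare-and-set the reviewer describes might look like this (all names are hypothetical, not Mentat's real API): expand every partition head — user *and* tx — in one atomic step, and only if the caller's view of the map is still current.

```rust
use std::collections::BTreeMap;

type Entid = i64;
// Partition name -> next entid to allocate.
type PartitionMap = BTreeMap<String, Entid>;

fn cas_partition_map(
    current: &mut PartitionMap,
    expected: &PartitionMap,
    next: PartitionMap,
) -> Result<(), String> {
    if *current != *expected {
        return Err("partition map changed concurrently".to_string());
    }
    *current = next; // install the expanded heads for all parts at once
    Ok(())
}

fn main() {
    let mut current = PartitionMap::new();
    current.insert(":db.part/user".to_string(), 65536);
    current.insert(":db.part/tx".to_string(), 268435456);

    let expected = current.clone();
    let mut next = current.clone();
    next.insert(":db.part/user".to_string(), 65540);
    next.insert(":db.part/tx".to_string(), 268435460);

    // Matches: the swap succeeds and both heads advance together.
    assert!(cas_partition_map(&mut current, &expected, next).is_ok());
    assert_eq!(current[":db.part/user"], 65540);

    // Stale view: the swap is refused.
    assert!(cas_partition_map(&mut current, &expected, PartitionMap::new()).is_err());
}
```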

src/conn.rs Outdated
pub fn fast_forward_user_partition(&mut self, new_head: Entid) -> Result<()> {
let mut metadata = self.conn.metadata.lock().unwrap();
metadata.partition_map.expand_up_to(":db.part/user", new_head);
db::update_partition_map(&mut self.sqlite, &metadata.partition_map).map_err(|e| e.into())
Collaborator

I would rather see the metadata change as a side-effect of updating the DB. The db:: code is where we'll be able to do compare-and-set, and we already have code to update the metadata after a transact.

Collaborator

Relatedly: it might be worth thinking about what you're doing here as a special case of renumbering.

Renumbering is the process of taking a space of new remote identifiers, moving any conflicting local identifiers out of the way, and making space locally for the remote identifiers. If there are no conflicting local identifiers, then the second step involves no work.

This is a nice unification because it reduces the number of different code paths…

src/lib.rs Outdated
#[cfg(feature = "syncable")]
pub mod sync;

pub fn get_name() -> String {
Collaborator

I'm pretty sure I deleted this… it was just there for testing, long ago.

src/sync.rs Outdated
@@ -0,0 +1,133 @@
// Copyright 2016 Mozilla
Collaborator

2018

src/sync.rs Outdated
// partition's next allocation will not overlap with just-inserted datoms.
// To allow for "holes" in the user partition (due to data excision),
// we track the highest incoming entid we saw, and expand our
// local partition to match.
Collaborator

Again, I think this process would benefit from splitting out renumbering.

  1. Download the new remote parts table.
  2. Renumber locally, which also makes space locally for remotely allocated identifiers. (In the fast-forward case no work is done here.)

Now we get into the state machine. For fast-forwarding:

  1. Apply remote datoms (which now don't exceed the parts table!).

Make sense?

@grigoryk grigoryk self-assigned this May 30, 2018
@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 2 times, most recently from 4e6bd1a to 4fc5e47 on June 8, 2018 01:27
@ncalexan ncalexan force-pushed the grisha/sync-downloader branch from adfebc5 to 92d1550 on June 8, 2018 21:07
@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 4 times, most recently from 1f85186 to 9d1f0fa on June 26, 2018 00:51
@grigoryk grigoryk requested a review from ncalexan June 26, 2018 01:04
@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 2 times, most recently from eac23fa to d35fcbc on June 27, 2018 02:31
@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 2 times, most recently from b90ccf8 to 70f5120 on June 28, 2018 23:24
Member

@ncalexan ncalexan left a comment

We'll talk more tomorrow, @grigoryk. Thanks for pushing the spear on this!

use std::collections::BTreeSet;

use rusqlite;
use uuid;
Member

Lift this commit out and land it, since it's mechanical.

Ok(())
}

pub fn get_partitions(tx: &rusqlite::Transaction) -> Result<PartitionMap> {
Member

We have

fn read_partition_map(conn: &rusqlite::Connection) -> Result<PartitionMap> {

just waiting to be generalized; let's use it.

// For now, only information about the user partition is tracked.
// We rely on the transactor to advance the tx partition.
// Support for migrating the db partition forward (that is, bootstrap schema) is TBD.
tx.execute("INSERT OR IGNORE INTO tolstoy_parts VALUES (?, ?, ?)", &[&PARTITION_USER, &USER0, &USER0])?;
Member

We also have

static ref V1_PARTS: [(symbols::Keyword, i64, i64); 3] = {

}
}

let new_idx = USER0 + 1;
Member

There's a function for writing the partition map in general, too.

(_, _) => { panic!("Wrong number of results."); },
}

let partitions = SyncMetadataClient::get_partitions(&tx).unwrap();
Member

In general, we can't be :db.part/user-centric, 'cuz :db.part/tx advances too. I think this will get simpler as you generalize, not more complicated.

}

// Annotate first datom in the series with the user partition information.
// TODO this is obviously wrong - we want to read partition info without
Member

It feels like this is at the wrong place in the stack. You want to change the flow from Vec<Tx> to (PartitionMap, Vec<Tx>), but what you've done is change it to Vec<(Option<PartitionMap>, Tx)> and an assumption about option.

It feels to me like we need some notion of a chunk that's more general than what we have now; some chunks are transaction data, yes; some are transactions themselves; but some are also partition maps.

I think you're probably aware of this, and tomorrow we can talk through the difficulty of changing the underlying model more broadly.
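
The "more general chunk" idea could be sketched like this (hypothetical types, not Tolstoy's real model): some chunks carry datoms, some mark transaction boundaries, and some carry a whole partition map, so the consumer recovers `(PartitionMap, transactions)` without the `Vec<(Option<PartitionMap>, Tx)>` shape.

```rust
// A hypothetical generalized chunk in the download stream.
#[derive(Debug, PartialEq)]
enum Chunk {
    Datom { e: i64, a: i64, v: String, added: bool },
    TxBoundary { tx: String },        // stand-in for a transaction Uuid
    PartitionMap(Vec<(String, i64)>), // partition name -> next entid
}

// Split the stream into (latest partition map, number of datoms seen).
fn digest(stream: &[Chunk]) -> (Option<&Vec<(String, i64)>>, usize) {
    let mut parts = None;
    let mut datoms = 0;
    for chunk in stream {
        match chunk {
            Chunk::PartitionMap(p) => parts = Some(p),
            Chunk::Datom { .. } => datoms += 1,
            Chunk::TxBoundary { .. } => {}
        }
    }
    (parts, datoms)
}

fn main() {
    let stream = vec![
        Chunk::PartitionMap(vec![(":db.part/user".to_string(), 65540)]),
        Chunk::Datom { e: 65536, a: 10, v: "foo".to_string(), added: true },
        Chunk::TxBoundary { tx: "uuid-1".to_string() },
    ];
    let (parts, datoms) = digest(&stream);
    assert!(parts.is_some());
    assert_eq!(datoms, 1);
}
```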

Contributor Author

I'm viewing this particular change as purely temporary - I really don't like having partition information one step removed from the transaction - and following the path of least resistance for now. The goal is to move this out of chunks entirely, and lift it up into transactions. They should be annotated with partition maps, and the only reason that's not the way it is now is lack of server support.

Currently, I don't think there's any other need to generalize a chunk, and doing so seems like a distraction.

My higher level view of this - let's settle on the overall flow first - how data generally moves around, what do we need to move around, etc - then re-shape the server and clients to match. In the meantime, iterate quickly and maintain some kind of momentum.

"Nothing is as permanent as the temporary", but I hope that won't apply 👯‍♂️


// N.b., after renumbering, the on-disk data is not reflected in the `TestConn`!
let mut db = renumber(&local.sqlite,
let mut db;
Member

I think something has gotten confused. Everything here is going to take place inside an encapsulating SQL transaction; we shouldn't be managing transactions in this helper function at all. It's up to consumers to manage isolation, etc; and sync (like transact) will do so.

Reason to yourself: suppose renumbering succeeds but something later fails -- how will I roll back renumbering? If you commit, you can't. (Without using nested transactions or checkpoints or other complicated things, which I really don't think we need.)

@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 10 times, most recently from e7f56e6 to c7bab02 on August 30, 2018 23:44
@grigoryk
Contributor Author

Wow, this has been open for 7 months. This is a good example of how "not to PR". Anyway...

The current state is as follows.

Part 1 is very non-controversial. It clears up the code base, introduces a bunch of internal types and solidifies the general flow of things. As for the mechanics of a sync beyond fast-forward scenarios, only a "no-op rebase of local on top of remote" is supported. That is, if a merge generates any assertions/retractions, we bail out.

A merge is essentially: "rewind local transactions, transact the remote transactions, then transact local. succeed if local transactions had no effect, bail out otherwise".

Part 2 takes things further, and generally allows anything one can throw at a transactor (sans implementation bugs and some TODOs). Except for schema alterations. It's fine to add new vocabulary (including at the same time, on both "remote" and "local", as long as it's defined the same way), but we bail out if we see changes to existing vocabulary. This is done largely so that we can address syncing of schema alterations by themselves.

A merge is essentially: "rewind local transactions, transact the remote transactions, then ensure local may be transacted, perform some modifications, transact them, and request a follow-up sync if that operation produced any new datoms."
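
That merge loop can be sketched with toy types (a "transaction" is just a string here; all names are illustrative, not Tolstoy's real API): rewind local txs, apply remote txs, re-transact local, and request a follow-up sync iff re-transacting produced anything new.

```rust
struct SyncReport {
    follow_up_required: bool,
}

fn merge(remote_txs: &[&str], local_txs: &[&str], log: &mut Vec<String>) -> SyncReport {
    // 1. Rewind: remove local transactions from the log.
    log.retain(|t| !local_txs.contains(&t.as_str()));
    // 2. Transact the remote transactions.
    log.extend(remote_txs.iter().map(|t| t.to_string()));
    // 3. Re-transact local; an upsert against a :db/unique attribute is
    //    modeled here as "already present in the log".
    let mut new_datoms = 0;
    for t in local_txs {
        if !log.iter().any(|l| l.as_str() == *t) {
            log.push(t.to_string());
            new_datoms += 1;
        }
    }
    // A non-empty merge transaction must be uploaded in a follow-up sync.
    SyncReport { follow_up_required: new_datoms > 0 }
}

fn main() {
    let mut log = vec!["root".to_string(), "local-a".to_string()];
    assert!(merge(&["remote-b"], &["local-a"], &mut log).follow_up_required);

    // If the local change is identical to the remote one, nothing new is
    // produced and no follow-up is needed.
    let mut log2 = vec!["root".to_string(), "same".to_string()];
    assert!(!merge(&["same"], &["same"], &mut log2).follow_up_required);
}
```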

Certain choices were made around handling of entids and retractions, which are non-optimal in some cases but produce understandable results - and are natural given the current approach.

General approach is to let the transactor handle merges, letting the schema dictate how that happens. A concrete manifestation of this:

  • :db/unique attributes will upsert. Asserting entities against non-unique attributes will naturally produce duplicates.

During a rebase, local entids are replaced with tempids. Retractions pose a challenge: a retraction won't allocate, so if a tempid mentioned in the retraction doesn't resolve, the transaction will fail. If a retraction datom is by itself, its tempids will never resolve. The current approach is to replace tempids with lookup-refs whenever we see retractions that use tempids that we recognize won't resolve. Additionally, a lookup-ref mentioned in a retraction must resolve, and so we drop on the floor any retractions that fail to meet that criterion.
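
A hypothetical sketch of that retraction rewrite (names are illustrative): a local retraction whose entid became a tempid survives the rebase only as a lookup-ref over a `:db/unique` attribute; otherwise it's dropped.

```rust
#[derive(Debug, PartialEq)]
enum RetractionTarget {
    LookupRef { attr: String, value: String },
}

// `unique_av` is the (:db/unique attribute, value) pair for the retracted
// entity, if it has one; `None` means no resolvable lookup-ref can be built,
// so the retraction is dropped on the floor.
fn rewrite_retraction(unique_av: Option<(&str, &str)>) -> Option<RetractionTarget> {
    unique_av.map(|(attr, value)| RetractionTarget::LookupRef {
        attr: attr.to_string(),
        value: value.to_string(),
    })
}

fn main() {
    // Entity identified by a unique attribute: rewrite to a lookup-ref.
    assert!(rewrite_retraction(Some((":person/email", "a@example.com"))).is_some());
    // No unique attribute: the retraction is dropped.
    assert_eq!(rewrite_retraction(None), None);
}
```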

While all of this has certain limitations, it has a very nice general property: an intuition about the transactor's behaviour applies directly to sync's behaviour. Additionally, expanding what the transactor accepts and supports will expand sync's capabilities with little additional effort. For example, if we ever add compound uniqueness, the above sync mechanics should "just work" as you'd expect.

@grigoryk
Contributor Author

Currently tests are failing due to a bug in the transactor: #818

@grigoryk grigoryk force-pushed the grisha/sync-downloader branch from e411bd8 to 498f2da on September 6, 2018 22:53
Grisha Kruglov added 4 commits September 7, 2018 13:46
Since timeline move operations use a transactor, they generate a
"phantom" 'tx' and a 'txInstant' assertion. It is "phantom" in the sense
that it was never present in the 'transactions' table, and is entirely
synthetic as far as our database is concerned.
It's an implementation artifact, and we were not cleaning it up.

It becomes a problem when we start inserting transactions after a move.
Once the transactor clashes with the phantom 'tx', it will retract the
phantom 'txInstant' value, leaving the transactions log in an incorrect state.

This patch adds a test for this scenario and elects the easy way out: simply
remove the offending 'txInstant' datom.
A "side-effect" is defined here as a mutation of a remote state as part
of the sync.

If, during a sync, we determine that the remote state needs to be changed, we bail out.

This generally supports different variations of "baton-passing" syncing, where clients
will succeed syncing if each change is non-conflicting.
This patch introduces a concept of a follow-up sync. If a sync generated
a "merge transaction" (a regular transaction that contains assertions
necessary for local and remote transaction logs to converge), then
this transaction needs to be uploaded in a follow-up sync.

Generated SyncReport indicates if a follow-up sync is required.

Follow-up sync itself is just a regular sync. If remote state did not change,
it will result in a simple RemoteFastForward. Otherwise, we'll continue
merging and requesting a follow-up.

Schema alterations are explicitly not supported.

As local transactions are rebased on top of remote, the following changes happen:
- entids are changed into tempids, letting transactor upsert :db/unique values
- entids for retractions are changed into lookup-refs if we're confident they'll succeed
-- otherwise, retractions are dropped on the floor
@grigoryk grigoryk force-pushed the grisha/sync-downloader branch 2 times, most recently from 8b1cfdc to 0c0e794 on September 7, 2018 22:30

## Overview
### Very briefly
Tolstoy will synchronize a local Mentat database against a remote server, modifying local state if necessary, and uploading changes to the server if necessary. Schema additions are allowed (adding vocabulary). Schema mutations are currently not implemented (changing vocabulary). Mentat's core schema must be the same on all participating clients (i.e., core schema alterations are unsupported).
Member

nit: "[must be the same] on all participating clients (i.e., core schema alterations are unsupported)."

### In more detail...
Syncing is defined in terms of coming to an agreement between local and remote states. A local state is what's present on the local instance. A remote state is what's present on the server.

Transaction logs are what we synchronize, and so we may think about the primitive operations in the context of smushing together two logs - local and remote.
Member

nit: this sentence got caught in an edit.

Contributor Author

Oh man, it did indeed.

Internally, Tolstoy tracks the "locally known remote HEAD" and the "last-synced local transaction", which gives us three basic primitives:
- a shared root, a state which is at the root of both local and remote
- incoming changes - what remote changed on top of the shared root
- local changes on top of the shared root.
Member

nit: "[of] the [shared root]".

- a remote fast-forward - there are local changes, but no remote changes
- a merge - there are both local and remote changes.

The first three cases are "trivial" - we either do nothing, or we download and transact remote transactions, or we upload local transactions and advance the remote HEAD.
Member

nit: "The [first three] ...".
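
The four outcomes the document enumerates reduce to a tiny decision on two facts: did the remote log move past the shared root, and did the local log? A sketch (names are illustrative, not Tolstoy's real types):

```rust
#[derive(Debug, PartialEq)]
enum SyncAction {
    NoOp,              // neither side changed
    LocalFastForward,  // only remote changed: download and transact
    RemoteFastForward, // only local changed: upload and advance remote HEAD
    Merge,             // both changed
}

fn decide(remote_changed: bool, local_changed: bool) -> SyncAction {
    match (remote_changed, local_changed) {
        (false, false) => SyncAction::NoOp,
        (true, false) => SyncAction::LocalFastForward,
        (false, true) => SyncAction::RemoteFastForward,
        (true, true) => SyncAction::Merge,
    }
}

fn main() {
    assert_eq!(decide(false, false), SyncAction::NoOp);
    assert_eq!(decide(true, true), SyncAction::Merge);
}
```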

Member

@ncalexan ncalexan left a comment

Looking good. Some small changes, then bombs away!

2. remote transactions are transacted on top of the shared root
3. local transactions are transacted

Generally, intuition about the transactor's behaviour applies to reasoning about Tolstoy's sync as well. If a transaction "makes sense", it will be applied.
Member

nit: "[about] the [transactor's]"


Remote transactions are applied "as-is", with the exception of the `txInstant` - it must be preserved, and so the datom describing it is re-written prior to application to use the `(transaction-tx)` transaction function.

Local transactions are rewritten to use tempids instead of their entids if they are assertions, and the `(lookup-ref a v)` form in the case of retractions - but only if the `lookup-ref` is guaranteed to succeed; otherwise retractions are dropped on the floor. Cases where local retractions are dropped:
Member

s/additions/assertions/

Remote transactions are applied "as-is", with the exception of the `txInstant` - it must be preserved, and so the datom describing it is re-written prior to application to use the `(transaction-tx)` transaction function.

Local transactions are rewritten to use tempids instead of their entids if they are assertions, and the `(lookup-ref a v)` form in the case of retractions - but only if the `lookup-ref` is guaranteed to succeed; otherwise retractions are dropped on the floor. Cases where local retractions are dropped:
- we're retracting an entity which isn't `:db/unique`
Member

:db/unique (include colon)


Local transactions are rewritten to use tempids instead of their entids if they are assertions, and the `(lookup-ref a v)` form in the case of retractions - but only if the `lookup-ref` is guaranteed to succeed; otherwise retractions are dropped on the floor. Cases where local retractions are dropped:
- we're retracting an entity which isn't `:db/unique`
- we're retracting an entity which was already retracted by the remote client.
Member

by the remote client

## Explicitly not supported - will abort with a NotYetImplemented
This alpha implementation doesn't support some cases, but it recognizes them and gracefully aborts (leaving local and remote states untouched):
- Syncing against a Mentat instance which uses a different core schema version.
- Syncing with schema mutations. Schema additions are fine, but transactions which change the set of attributes that define a user-defined `:db/ident` will cause sync to abort.
Member

nit: :db/ident (throughout -- keep the colons).


## Misc operational properties
- All sync operations happen in the context of an `InProgress` - an internal Mentat transaction representation. If sync succeeds, all necessary operations are committed to the underlying database in a single SQLite transaction. Similarly, an aborting sync will simply drop an uncommitted transaction.
- "Follow-up" syncing is currently supported in a basic manner: if there are local changes arising from a merge operation, they are committed to the local store, and a full sync is requested, which is expected to fast-forward the remote state in the optimal case - and, if we lost the race to the server, to merge the local "merged state" with further remote changes.
Member

"in a basic manner"

- "Follow-up" syncing is currently supported in a basic manner: if there are local changes arising from a merge operation, they are committed to the local store, and a full sync is requested, which is expected to fast-forward the remote state in the optimal case - and, if we lost the race to the server, to merge the local "merged state" with further remote changes.

## Server
Tolstoy operates against an instance of [Mentat Sync Prototype Server](https://github.com/rfk/mentat-sync-prototype). That repository defines a transaction-oriented API, which is all that Tolstoy expects of the server.
Member

Can you include a hash, so that we know where this left off?

