Conversation
src/conn.rs
Outdated
| let mut last_tx_uuid = None;
| // Giant hack: we're expecting 'parts' to not actually contain metadata
| // about any transactions - just their datoms! we generate metadata ourselves
| // while transacting, thus losing when transactions were made originally /o\
Yep, for that you'll need to go a little lower, and also renumber (#494).
src/conn.rs
Outdated
| match tx_list {
| Some(list) => {
| for tx in list {
| let in_progress = self.begin_transaction()?;
I think you should lift this out of the for loop — you can transact multiple times against an in_progress, and they'll all share a single SQLite transaction.
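The suggested shape, sketched with stand-in types (the `InProgress` here is a toy, not Mentat's real API): one in-progress transaction wraps every incoming transact, so the whole list shares a single SQLite commit.

```rust
// Illustrative stand-in for Mentat's InProgress; names are assumptions.
struct InProgress {
    applied: Vec<String>, // batches applied inside one open transaction
}

impl InProgress {
    fn transact(&mut self, tx: &str) {
        self.applied.push(tx.to_string());
    }
    // Committing consumes the in-progress state: one COMMIT for all batches.
    fn commit(self) -> Vec<String> {
        self.applied
    }
}

fn apply_all(txs: &[&str]) -> Vec<String> {
    // Lifted out of the loop: one transaction for the whole incoming list.
    let mut in_progress = InProgress { applied: Vec::new() };
    for tx in txs {
        in_progress.transact(tx);
    }
    in_progress.commit()
}

fn main() {
    let committed = apply_all(&["tx1", "tx2", "tx3"]);
    assert_eq!(committed.len(), 3);
}
```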
src/conn.rs
Outdated
| let mut tx_list = None;
| {
| let mut db_tx = self.sqlite.transaction()?;
| if let Some(list) = Syncer::flow(&mut db_tx, server_uri, &uuid)? {
If you implement flow on InProgress, you get the transaction for free… the only tricky bit is that you'd ideally like to not do any work (including establishing an exclusive transaction!) if there's no data to write.
src/conn.rs
Outdated
| db_tx.commit()?;
| }
| let mut last_tx_entid = None;
| let mut last_tx_uuid = None;
Use a pair to indicate that these two things must be set together.
let mut last_tx = None;
…
last_tx = Some((uuid, entid));
tolstoy/src/syncer.rs
Outdated
| #[derive(Debug)]
| pub struct Tx {
| pub tx: Uuid,
| pub parts: Vec<TxPart>
Nit: trailing comma, please.
src/conn.rs
Outdated
| })
| }
| pub fn fast_forward_user_partition(&mut self, new_head: Entid) -> Result<()> {
What I expect to see here is essentially a compare-and-set: if my local partition map is this, then expand it to this one.
Doing that work requires writing to the DB, so it shouldn't be on Store, it should be on InProgress; if the DB write fails the metadata will be discarded.
And you want to advance all parts — typically user and tx — at the same time, atomically, right? Remember that the data itself will refer to tx IDs — it's not enough to only work with the user partition!
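The compare-and-set the reviewer describes might look like the following toy, with the partition map modeled as a plain `BTreeMap` and the function name invented for illustration; the point is that all parts advance together, and a stale read fails rather than silently overwriting.

```rust
use std::collections::BTreeMap;

// Simplified stand-in for Mentat's partition map: part name -> next index.
type PartitionMap = BTreeMap<String, i64>;

// Compare-and-set sketch: advance the parts only if the map still matches
// `expected`. All parts (typically :db.part/user and :db.part/tx) move together.
fn cas_advance(
    map: &mut PartitionMap,
    expected: &PartitionMap,
    new_heads: &PartitionMap,
) -> Result<(), String> {
    if *map != *expected {
        return Err("partition map changed underneath us; retry".into());
    }
    for (part, head) in new_heads {
        map.insert(part.clone(), *head);
    }
    Ok(())
}

fn main() {
    let mut map = PartitionMap::new();
    map.insert(":db.part/user".into(), 65536);
    map.insert(":db.part/tx".into(), 268435456);

    let expected = map.clone();
    let mut new_heads = PartitionMap::new();
    new_heads.insert(":db.part/user".into(), 65540);
    new_heads.insert(":db.part/tx".into(), 268435460);

    assert!(cas_advance(&mut map, &expected, &new_heads).is_ok());
    // A stale `expected` now fails: the map advanced since we read it.
    assert!(cas_advance(&mut map, &expected, &new_heads).is_err());
}
```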
src/conn.rs
Outdated
| pub fn fast_forward_user_partition(&mut self, new_head: Entid) -> Result<()> {
| let mut metadata = self.conn.metadata.lock().unwrap();
| metadata.partition_map.expand_up_to(":db.part/user", new_head);
| db::update_partition_map(&mut self.sqlite, &metadata.partition_map).map_err(|e| e.into())
I would rather see the metadata change as a side-effect of updating the DB. The db:: code is where we'll be able to do compare-and-set, and we already have code to update the metadata after a transact.
Relatedly: it might be worth thinking about what you're doing here as a special case of renumbering.
Renumbering is the process of taking a space of new remote identifiers, moving any conflicting local identifiers out of the way, and making space locally for the remote identifiers. If there are no conflicting local identifiers, then the second step involves no work.
This is a nice unification because it reduces the number of different code paths…
src/lib.rs
Outdated
| #[cfg(feature = "syncable")]
| pub mod sync;
| pub fn get_name() -> String {
I'm pretty sure I deleted this… it was just there for testing, long ago.
src/sync.rs
Outdated
| @@ -0,0 +1,133 @@
| // Copyright 2016 Mozilla
src/sync.rs
Outdated
| // partition's next allocation will not overlap with just-inserted datoms.
| // To allow for "holes" in the user partition (due to data excision),
| // we track the highest incoming entid we saw, and expand our
| // local partition to match.
Again, I think this process would benefit from splitting out renumbering.
- Download the new remote parts table.
- Renumber locally, which also makes space locally for remotely allocated identifiers. (In the fast-forward case no work is done here.)
Now we get into the state machine. For fast-forwarding:
- Apply remote datoms (which now don't exceed the parts table!).
Make sense?
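The renumbering step above can be modeled as a runnable toy (entids as plain `i64`s; real renumbering also rewrites datoms and schema, which is elided here):

```rust
// Move conflicting local identifiers past the remote range so remote datoms
// can be applied verbatim. Returns the (old, new) moves performed.
fn renumber(local_ids: &mut Vec<i64>, remote_ids: &[i64]) -> Vec<(i64, i64)> {
    let remote_max = remote_ids.iter().copied().max().unwrap_or(0);
    let mut next_free = remote_max + 1;
    let mut moves = Vec::new();
    for id in local_ids.iter_mut() {
        if remote_ids.contains(id) {
            // Conflicting local identifier: move it out of the way.
            moves.push((*id, next_free));
            *id = next_free;
            next_free += 1;
        }
    }
    moves // empty in the fast-forward case: nothing local conflicts
}

fn main() {
    // Fast-forward: local allocated nothing in the remote range, so no work.
    let mut local = vec![65536, 65537];
    assert!(renumber(&mut local, &[65600, 65601]).is_empty());

    // Conflict: local 65600 collides with a remote id and is moved past it.
    let mut local = vec![65536, 65600];
    let moves = renumber(&mut local, &[65600, 65601]);
    assert_eq!(moves, vec![(65600, 65602)]);
    assert_eq!(local, vec![65536, 65602]);
}
```

This is why fast-forward falls out as the degenerate case: with no conflicting local identifiers, renumbering does no work and the remote datoms apply directly.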
| use std::collections::BTreeSet;
| use rusqlite;
| use uuid;
Lift this commit out and land it, since it's mechanical.
tolstoy/src/metadata.rs
Outdated
| Ok(())
| }
| pub fn get_partitions(tx: &rusqlite::Transaction) -> Result<PartitionMap> {
We have
Line 443 in 5fe4f12
tolstoy/src/schema.rs
Outdated
| // For now, only information about the user partition is tracked.
| // We rely on the transactor to advance the tx partition.
| // Support for migrating the db partition forward (that is, bootstrap schema) is TBD.
| tx.execute("INSERT OR IGNORE INTO tolstoy_parts VALUES (?, ?, ?)", &[&PARTITION_USER, &USER0, &USER0])?;
| }
| }
| let new_idx = USER0 + 1;
There's a function for writing the partition map in general, too.
tolstoy/src/schema.rs
Outdated
| (_, _) => { panic!("Wrong number of results."); },
| }
| let partitions = SyncMetadataClient::get_partitions(&tx).unwrap();
In general, we can't be :db.part/user-centric, 'cuz :db.part/tx advances too. I think this will get simpler as you generalize, not more complicated.
tolstoy/src/syncer.rs
Outdated
| }
| // Annotate first datom in the series with the user partition information.
| // TODO this is obviously wrong - we want to read partition info without
It feels like this is at the wrong place in the stack. You want to change the flow from Vec<Tx> to (PartitionMap, Vec<Tx>), but what you've done is change it to Vec<(Option<PartitionMap>, Tx)> and an assumption about option.
It feels to me like we need some notion of a chunk that's more general than what we have now; some chunks are transaction data, yes; some are transactions themselves; but some are also partition maps.
I think you're probably aware of this, and tomorrow we can talk through the difficulty of changing the underlying model more broadly.
I'm viewing this particular change as purely temporary - I really don't like having partition information one step removed from the transaction - and following the path of least resistance for now. The goal is to move this out of chunks entirely, and lift it up into transactions. They should be annotated with partitionmaps, and the only reason that's not the way it is now is lack of server support.
Currently, I don't think there's any other need to generalize a chunk, and doing so seems like a distraction.
My higher level view of this - let's settle on the overall flow first - how data generally moves around, what do we need to move around, etc - then re-shape the server and clients to match. In the meantime, iterate quickly and maintain some kind of momentum.
"Nothing is as permanent as the temporary", but I hope that won't apply 👯♂️
db/src/renumber.rs
Outdated
| // N.b., after renumbering, the on-disk data is not reflected in the `TestConn`!
| let mut db = renumber(&local.sqlite,
| let mut db;
I think something has gotten confused. Everything here is going to take place inside an encapsulating SQL transaction; we shouldn't be managing transactions in this helper function at all. It's up to consumers to manage isolation, etc; and sync (like transact) will do so.
Reason to yourself: suppose renumbering succeeds but something later fails -- how will I roll back renumbering? If you commit, you can't. (Without using nested transactions or checkpoints or other complicated things, which I really don't think we need.)
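The rollback argument can be made concrete with a toy (the "database" is a `Vec`, and the SQL transaction is modeled as stage-then-swap; all names are made up): the helper only mutates state it is handed, the caller owns the transaction, and a failure after renumbering discards the renumbering too.

```rust
// The helper knows nothing about transactions; it just mutates what it's given.
fn renumber_step(db: &mut Vec<i64>, offset: i64) {
    for id in db.iter_mut() {
        *id += offset;
    }
}

fn sync(db: &mut Vec<i64>, later_step_fails: bool) -> Result<(), String> {
    let mut staged = db.clone(); // begin "transaction"
    renumber_step(&mut staged, 1000);
    if later_step_fails {
        // No commit: the renumbering is discarded along with everything else.
        return Err("later step failed".into());
    }
    *db = staged; // commit
    Ok(())
}

fn main() {
    let mut db = vec![1, 2];
    assert!(sync(&mut db, true).is_err());
    assert_eq!(db, vec![1, 2]); // renumbering rolled back with the failure

    assert!(sync(&mut db, false).is_ok());
    assert_eq!(db, vec![1001, 1002]);
}
```

Had the helper committed its own transaction, the failing later step could not undo it; keeping one encapsulating transaction makes the whole sync atomic.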
Wow, this has been open for 7 months. This is a good example of how "not to PR". Anyway... the current state is as follows.

Part 1 is very non-controversial. It cleans up the code base, introduces a bunch of internal types, and solidifies the general flow of things. As for the mechanics of a sync beyond fast-forward scenarios, only a "no-op rebase of local on top of remote" is supported. That is, if a merge generates any assertions/retractions, we bail out. A merge is essentially: "rewind local transactions, transact the remote transactions, then transact local; succeed if local transactions had no effect, bail out otherwise".

Part 2 takes things further, and generally allows anything one can throw at a transactor (sans implementation bugs and some TODOs) - except for schema alterations. It's fine to add new vocabulary (including at the same time, on both "remote" and "local", as long as it's defined the same way), but we bail out if we see changes to existing vocabulary. This is done largely so that we can address syncing of schema alterations separately. A merge is essentially: "rewind local transactions, transact the remote transactions, then ensure local may be transacted, perform some modifications, transact them, and request a follow-up sync if that operation produced any new datoms."

Certain choices were made around handling of entids and retractions, which are non-optimal in some cases but produce understandable results - and are natural given the current approach. The general approach is to let the transactor handle merges, letting the schema dictate how that happens. A concrete manifestation of this:
During a rebase, local entids are replaced with tempids. Retractions pose a challenge: a retraction won't allocate, so if a tempid mentioned in the retraction doesn't resolve, the transaction will fail. If a retraction datom is by itself, its tempids will never resolve. The current approach is to replace tempids with lookup-refs whenever we see retractions that use tempids we recognize won't resolve. Additionally, a lookup-ref mentioned in a retraction must resolve, and so we drop on the floor any retractions that fail to meet that criterion.

While all of this has certain limitations, it has a very nice general property: an intuition about the transactor's behaviour applies directly to sync's behaviour. Additionally, expanding what the transactor accepts and supports will expand sync's capabilities with little additional effort. For example, if we ever add compound uniqueness, the above sync mechanics should "just work" as you'd expect.
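The entid/retraction handling described here can be sketched as a runnable toy; attribute uniqueness is passed in as a flag rather than read from a schema, and all names are illustrative:

```rust
#[derive(Debug, PartialEq)]
enum Rewritten {
    Add { tempid: String, a: String, v: String },
    RetractByLookupRef { a: String, v: String },
    Dropped,
}

fn rewrite(e: i64, a: &str, v: &str, added: bool, attr_is_unique: bool) -> Rewritten {
    if added {
        // Additions: the entid becomes a tempid so :db/unique values can upsert.
        Rewritten::Add { tempid: format!("t{}", e), a: a.into(), v: v.into() }
    } else if attr_is_unique {
        // Retractions of unique attributes become (lookup-ref a v); a real
        // implementation first checks that the lookup-ref will resolve.
        Rewritten::RetractByLookupRef { a: a.into(), v: v.into() }
    } else {
        // Nothing to anchor the retraction to: drop it on the floor.
        Rewritten::Dropped
    }
}

fn main() {
    assert_eq!(
        rewrite(65540, ":person/name", "Grisha", true, false),
        Rewritten::Add {
            tempid: "t65540".into(),
            a: ":person/name".into(),
            v: "Grisha".into(),
        }
    );
    // A lone retraction of a non-unique attribute cannot resolve: dropped.
    assert_eq!(rewrite(65540, ":person/age", "30", false, false), Rewritten::Dropped);
}
```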
Currently tests are failing due to a bug in the transactor: #818
Since timeline move operations use a transactor, they generate a "phantom" 'tx' and a 'txInstant' assertion. It is "phantom" in the sense that it was never present in the 'transactions' table, and is entirely synthetic as far as our database is concerned. It's an implementation artifact, and we were not cleaning it up.

It becomes a problem when we start inserting transactions after a move. Once the transactor clashes with the phantom 'tx', it will retract the phantom 'txInstant' value, leaving the transactions log in an incorrect state.

This patch adds a test for this scenario and elects the easy way out: simply remove the offending 'txInstant' datom.
A "side-effect" is defined here as a mutation of the remote state as part of the sync. If, during a sync, we determine that the remote state needs to be changed, bail out. This generally supports different variations of "baton-passing" syncing, where clients will succeed in syncing as long as each change is non-conflicting.
This patch introduces the concept of a follow-up sync. If a sync generated a "merge transaction" (a regular transaction that contains assertions necessary for the local and remote transaction logs to converge), then this transaction needs to be uploaded in a follow-up sync. The generated SyncReport indicates if a follow-up sync is required.

A follow-up sync itself is just a regular sync. If the remote state did not change, it will result in a simple RemoteFastForward. Otherwise, we'll continue merging and requesting a follow-up.

Schema alterations are explicitly not supported. As local transactions are rebased on top of remote, the following changes happen:
- entids are changed into tempids, letting the transactor upsert :db/unique values
- entids for retractions are changed into lookup-refs if we're confident they'll succeed -- otherwise, retractions are dropped on the floor
tolstoy/README.md
Outdated
| ## Overview
| ### Very briefly
| Tolstoy will synchronize a local Mentat database against a remote server, modifying local state if necessary, and uploading changes to the server if necessary. Schema additions are allowed (adding vocabulary). Schema mutations are currently not implemented (changing vocabulary). Mentat's core schema must be the same.
nit: "[must be the same] on all participating clients (i.e., core schema alterations are unsupported)."
tolstoy/README.md
Outdated
| ### In more detail...
| Syncing is defined in terms of coming to an agreement between local and remote states. A local state is what's currently present on the current instance. A remote state is what's currently present on a server.
| We're synchronizing transaction logs are synchronized, and so we may think about the primitive operations in context of smushing together two logs - local and remote.
nit: this sentence got caught in an edit.
Oh man, it did indeed.
tolstoy/README.md
Outdated
| Internally, Tolstoy tracks the "locally known remote HEAD" and the "last-synced local transaction", which gives us three basic primitives:
| - a shared root, a state which is at the root of both local and remote
| - incoming changes - what remote changed on top of the shared root
| - local changes on top of shared root.
tolstoy/README.md
Outdated
| - a remote fast-forward - there are local changes, but no remote changes
| - a merge - there are both local and remote changes.
| First three cases are "trivial" - we either do nothing, or we download and transact remote transactions, or we upload local transactions and advance remote HEAD.
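The sync cases quoted above reduce to a two-bit decision over "did local change" and "did remote change" relative to the shared root; a minimal sketch (names are illustrative):

```rust
#[derive(Debug, PartialEq)]
enum SyncAction {
    NoOp,             // no changes on either side
    LocalFastForward, // remote changed, local did not: download and transact
    RemoteFastForward, // local changed, remote did not: upload, advance remote HEAD
    Merge,            // both changed: rewind, transact remote, transact local
}

fn decide(local_changed: bool, remote_changed: bool) -> SyncAction {
    match (local_changed, remote_changed) {
        (false, false) => SyncAction::NoOp,
        (false, true) => SyncAction::LocalFastForward,
        (true, false) => SyncAction::RemoteFastForward,
        (true, true) => SyncAction::Merge,
    }
}

fn main() {
    assert_eq!(decide(false, false), SyncAction::NoOp);
    assert_eq!(decide(false, true), SyncAction::LocalFastForward);
    assert_eq!(decide(true, false), SyncAction::RemoteFastForward);
    assert_eq!(decide(true, true), SyncAction::Merge);
}
```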
ncalexan
left a comment
Looking good. Some small changes, then bombs away!
tolstoy/README.md
Outdated
| 2. remote transactions are transacted on top of the shared root
| 3. local transactions are transacted
| Generally, intuition about transactor's behaviour applies to reasoning about Tolstoy's sync as well. If a transaction "makes sense", it will be applied.
nit: "[about] the [transactor's]"
tolstoy/README.md
Outdated
| Remote transactions are applied "as-is", with an exception of the `txInstant` - it must be preserved, and so the datom describing it is re-written prior to application to use the `(transaction-tx)` transaction function.
| Local transactions are rewritten to use tempids instead of their entids if they are additions, and `(lookup-ref a v)` form in cases of retractions - but only if `lookup-ref` is guaranteed to succeed, otherwise retractions are dropped on the floor. Cases where local retractions are dropped:
tolstoy/README.md
Outdated
| Remote transactions are applied "as-is", with an exception of the `txInstant` - it must be preserved, and so the datom describing it is re-written prior to application to use the `(transaction-tx)` transaction function.
| Local transactions are rewritten to use tempids instead of their entids if they are additions, and `(lookup-ref a v)` form in cases of retractions - but only if `lookup-ref` is guaranteed to succeed, otherwise retractions are dropped on the floor. Cases where local retractions are dropped:
| - we're retracting an entity which isn't `db/unique`
tolstoy/README.md
Outdated
| Local transactions are rewritten to use tempids instead of their entids if they are additions, and `(lookup-ref a v)` form in cases of retractions - but only if `lookup-ref` is guaranteed to succeed, otherwise retractions are dropped on the floor. Cases where local retractions are dropped:
| - we're retracting an entity which isn't `db/unique`
| - we're retracting an entity which was already retracted by `remote`.
tolstoy/README.md
Outdated
| ## Explicitly not supported - will abort with a NotYetImplemented
| This alpha implementation doesn't support some cases, but it recognizes them and gracefully aborts (leaving local and remote states untouched):
| - Syncing against a Mentat instance which uses a different core schema version.
| - Syncing with schema mutations. Schema additions are fine, but transactions which change a set of attributes that define a user-defined `db/ident` will cause sync to abort.
nit: :db/ident (throughout -- keep the colons).
tolstoy/README.md
Outdated
| ## Misc operational properties
| - All sync operations happen in a context of an `InProgress` - an internal Mentat transaction representation. If sync succeeds, all necessary operations are committed to the underlying database in a single SQLite transaction. Similarly, an aborting sync will simply drop an uncommitted transaction.
| - "Follow-up" syncing is currently supported in a most basic way: if there are local changes arising from a merge operation, they are committed to the local store, and a full sync is requested which is expected to fast-forward remote state in an optimal case, and if we lost the race to the server - to merge the local "merged state" with further remote changes.
tolstoy/README.md
Outdated
| - "Follow-up" syncing is currently supported in a most basic way: if there are local changes arising from a merge operation, they are committed to the local store, and a full sync is requested which is expected to fast-forward remote state in an optimal case, and if we lost the race to the server - to merge the local "merged state" with further remote changes.
| ## Server
| Tolstoy operates against an instance of [Mentat Sync Prototype Server](https://github.com/rfk/mentat-sync-prototype). That repository defines a transaction-oriented API, which is all that Tolstoy expects of the server.
Can you include a hash, so that we know where this left off?
Fast-forward syncing with support for "merging" the bootstrap transaction.
Issues more or less resolved with this: #513, #514, #511