86 changes: 75 additions & 11 deletions docs/changeset-format.md
@@ -1,25 +1,38 @@

# Changeset Format

The format for changesets is borrowed from SQLite3 session extension's internal format
and it is currently 100% compatible with it. Below are details of the format, extracted
from SQLite3 source code.
The format for changesets is based on the SQLite3 session extension's internal
format. Below are details of the format:

## Summary

A changeset is a collection of DELETE, UPDATE and INSERT operations on
one or more tables. Operations on a single table are grouped together,
but may occur in any order (i.e. deletes, updates and inserts are all
mixed together).
A changeset is a linear list of entries of various types, each identified by a
one-byte tag:

Each group of changes begins with a table header:
- Table record (`'T'`)
- Data entry (`18` for INSERT, `23` for UPDATE, `9` for DELETE)
- Create table entry (`'a'`)
- Drop table entry (`'A'`)
- Add column entry (`'c'`)
- Drop column entry (`'C'`)

Data operations on a single table are grouped together, preceded by a single
table record. The operations are processed as if they were executed
sequentially.
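
As a minimal sketch of how a reader might dispatch on the one-byte tag, the helper below maps each tag value listed above to a readable name. The function name `entryKindForTag` and the returned strings are hypothetical and only serve the illustration; they are not part of geodiff.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: maps a one-byte entry tag to a readable name.
// The tag values are the ones listed in the summary above.
inline std::string entryKindForTag( std::uint8_t tag )
{
  switch ( tag )
  {
    case 'T': return "table record";
    case 18:  return "insert";        // 0x12
    case 23:  return "update";        // 0x17
    case 9:   return "delete";        // 0x09
    case 'a': return "create table";
    case 'A': return "drop table";
    case 'c': return "add column";
    case 'C': return "drop column";
    default:  return "unknown";
  }
}
```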

## Table record

The table record identifies the table and its columns:

- 1 byte: Constant 0x54 (capital 'T')
- Varint: Number of columns in the table.
- nCol bytes: 0x01 for PK columns, 0x00 otherwise.
- N bytes: Unqualified table name (encoded using UTF-8). Nul-terminated.
- N bytes: Unqualified table name (encoded using UTF-8). Null-terminated.
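
The layout above can be sketched as a small encoder. This is an illustration under stated assumptions, not geodiff's writer: `appendVarint` and `encodeTableRecord` are hypothetical names, and the varint helper only covers values below 2^28 (SQLite-style big-endian groups of 7 bits with the high bit set on every byte but the last), which is enough for a column count.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Appends v as a SQLite-style varint: 7 bits per byte, most significant
// group first, high bit set on every byte except the last.
// Sketch only: handles values below 2^28 (at most four bytes).
inline void appendVarint( std::vector<std::uint8_t> &out, std::uint32_t v )
{
  assert( v < ( 1u << 28 ) );
  std::uint8_t groups[4];
  int n = 0;
  do
  {
    groups[n++] = static_cast<std::uint8_t>( v & 0x7f );
    v >>= 7;
  } while ( v != 0 );
  while ( n > 0 )
  {
    --n;
    const std::uint8_t continuation = ( n > 0 ) ? 0x80 : 0x00;
    out.push_back( static_cast<std::uint8_t>( groups[n] | continuation ) );
  }
}

// Encodes a table record: 'T', column count, one PK flag byte per column,
// then the UTF-8 table name followed by a terminating zero byte.
inline std::vector<std::uint8_t> encodeTableRecord( const std::string &name,
                                                    const std::vector<bool> &isPrimaryKey )
{
  std::vector<std::uint8_t> out;
  out.push_back( 0x54 );
  appendVarint( out, static_cast<std::uint32_t>( isPrimaryKey.size() ) );
  for ( bool pk : isPrimaryKey )
    out.push_back( pk ? 0x01 : 0x00 );
  out.insert( out.end(), name.begin(), name.end() );
  out.push_back( 0x00 );
  return out;
}
```

For example, a table named `t` with two columns, the first of which is the primary key, would encode to the bytes `54 02 01 00 74 00`.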

Followed by one or more changes to the table.
## Data entry

A data entry is a DELETE, UPDATE or INSERT operation on one table (identified
by the most recent table record):

- 1 byte: Either SQLITE_INSERT (0x12), UPDATE (0x17) or DELETE (0x09).
- 1 byte: The "indirect-change" flag.
@@ -48,6 +61,44 @@ with table columns modified by the UPDATE change contain the new
values. Fields associated with table columns that are not modified
are set to "undefined".

## Create table entry

This entry creates a new empty table:

- 1 byte: Constant 0x61 (lowercase 'a')
- Null-terminated string: Table name
- Varint: Number of columns in the table.
- nCol entries: Table column info.

## Drop table entry

This entry deletes an existing table by name. The table must be empty. Column
information is kept for the purpose of rebasing and inverting the changeset.

- 1 byte: Constant 0x41 (uppercase 'A')
- Null-terminated string: Table name
- Varint: Number of columns in the table.
- nCol entries: Table column info.

## Add column entry

This entry adds a new column to an existing table. All existing rows will have
`NULL` filled in.

- 1 byte: Constant 0x63 (lowercase 'c')
- Null-terminated string: Table name
- Table column info.

## Drop column entry

This entry deletes an existing column from a table. All existing rows must have
`NULL` values in this column. Column information is kept for the purpose of
rebasing and inverting the changeset.

- 1 byte: Constant 0x43 (uppercase 'C')
- Null-terminated string: Table name
- Table column info.

# Record Format

Unlike the SQLite database record format, each field is self-contained -
@@ -69,7 +120,7 @@ is followed by:
- Text values:
A varint containing the number of bytes in the value (encoded using
UTF-8). Followed by a buffer containing the UTF-8 representation
of the text value. There is no nul terminator.
of the text value. There is no null terminator.

- Blob values:
A varint containing the number of bytes in the value, followed by
@@ -82,6 +133,19 @@ is followed by:
An 8-byte big-endian IEEE 754-2008 real value.


# Table column info

- Null-terminated string: column name
- 1 byte: Column type (same as record)
- 1 byte: Flags packed as bits, starting from the least significant bit (LSb):
  - is primary key
  - is autoincrement
  - is geometry column
  - geometry has Z coordinate
  - geometry has M coordinate
- Null-terminated string: geometry type (`POINT`, `LINE`, ...)
- Varint: SRS ID for geometry
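
The flags byte can be sketched as follows, assuming the flags occupy bits 0 to 4 in the order listed, starting at the least significant bit. The `ColumnFlags` struct and both function names are hypothetical stand-ins for illustration, not geodiff's `TableColumnInfo`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the flag portion of a column description.
struct ColumnFlags
{
  bool isPrimaryKey = false;
  bool isAutoIncrement = false;
  bool isGeometry = false;
  bool geomHasZ = false;
  bool geomHasM = false;
};

// Packs the flags into one byte; bit 0 (LSb) is the first flag in the list.
inline std::uint8_t packColumnFlags( const ColumnFlags &f )
{
  std::uint8_t b = 0;
  if ( f.isPrimaryKey )    b |= 1u << 0;
  if ( f.isAutoIncrement ) b |= 1u << 1;
  if ( f.isGeometry )      b |= 1u << 2;
  if ( f.geomHasZ )        b |= 1u << 3;
  if ( f.geomHasM )        b |= 1u << 4;
  return b;
}

// Inverse of packColumnFlags; bits above bit 4 are ignored in this sketch.
inline ColumnFlags unpackColumnFlags( std::uint8_t b )
{
  ColumnFlags f;
  f.isPrimaryKey    = ( b & ( 1u << 0 ) ) != 0;
  f.isAutoIncrement = ( b & ( 1u << 1 ) ) != 0;
  f.isGeometry      = ( b & ( 1u << 2 ) ) != 0;
  f.geomHasZ        = ( b & ( 1u << 3 ) ) != 0;
  f.geomHasM        = ( b & ( 1u << 4 ) ) != 0;
  return f;
}
```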

# Varint Format

Varint values are encoded in the same way as varints in the SQLite
26 changes: 26 additions & 0 deletions docs/schema-changes.md
@@ -0,0 +1,26 @@
# Schema changes

Geodiff supports diffing databases with different schemata. It identifies table
and column additions/deletions.

Tables and columns are always created empty and any data present in the
database is recreated manually via `INSERT`/`UPDATE` entries, written after the
schema change entry. Likewise, deletion entries expect the table/column to be
empty, so `DELETE`/`UPDATE` entries clearing the data are written beforehand.
This simplifies inverting and rebasing, since the schema change entries work
separately from e.g. the ID renaming machinery.
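
The ordering rule described above can be sketched with simplified stand-in types. Everything in this block (`SetValue`, `SetNull`, `AddColumn`, `DropColumn`, `planAddColumn`, `planDropColumn`) is hypothetical and far simpler than the real changeset entries; it only illustrates that data entries follow an addition and precede a deletion.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Simplified stand-ins, not the real geodiff entry structs.
struct AddColumn  { std::string table, column; };
struct DropColumn { std::string table, column; };
struct SetValue   { std::string table, column; long long rowId; std::string value; };
struct SetNull    { std::string table, column; long long rowId; };
using Entry = std::variant<AddColumn, DropColumn, SetValue, SetNull>;

// Adding a column: the schema entry comes first (the column starts empty),
// then one UPDATE-like entry per row fills in the data.
inline std::vector<Entry> planAddColumn( const std::string &table, const std::string &column,
                                         const std::vector<std::pair<long long, std::string>> &rows )
{
  std::vector<Entry> plan;
  plan.push_back( AddColumn{ table, column } );
  for ( const auto &row : rows )
    plan.push_back( SetValue{ table, column, row.first, row.second } );
  return plan;
}

// Dropping a column: every row is cleared first, then the now-empty
// column is removed by the schema entry.
inline std::vector<Entry> planDropColumn( const std::string &table, const std::string &column,
                                          const std::vector<long long> &rowIds )
{
  std::vector<Entry> plan;
  for ( long long id : rowIds )
    plan.push_back( SetNull{ table, column, id } );
  plan.push_back( DropColumn{ table, column } );
  return plan;
}
```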

## Limitations and pitfalls

Since we only look at the final state of the database, default values in
columns are not supported. Any default specified during creation of the column
will be simulated by an `UPDATE` for each row. This means that only the rows
present in the modified database will get the "default" value, and the default
won't be propagated when the diff is applied onto base.

Renaming a column is supported only as a deletion followed by an addition. This
has pitfalls similar to those of default values: on rebase, values in the second
database won't be moved. The same applies to renaming tables.

The intermediate states created by applying the resulting diff (e.g. "nulling
out" a column before dropping it) may conflict with database constraints.
2 changes: 2 additions & 0 deletions geodiff/CMakeLists.txt
@@ -153,6 +153,8 @@ SET(geodiff_src
src/driver.h
src/tableschema.cpp
src/tableschema.h
src/tableschemadiff.cpp
src/tableschemadiff.hpp

src/drivers/sqlitedriver.cpp
src/drivers/sqlitedriver.h
67 changes: 61 additions & 6 deletions geodiff/src/changeset.h
@@ -9,8 +9,11 @@
#include <assert.h>
#include <memory>
#include <string>
#include <variant>
#include <vector>

#include "tableschema.h"


/**
* Representation of a single value stored in a column.
@@ -200,9 +203,23 @@ struct ChangesetTable
size_t columnCount() const { return primaryKeys.size(); }
};

/**
* Types of supported changeset records.
*/
enum class ChangesetEntryType
{
OpTableRecord = 'T', //!< corresponds to ChangesetTable
OpInsert = 18, //!< corresponds to ChangesetDataEntry
OpUpdate = 23, //!< corresponds to ChangesetDataEntry
OpDelete = 9, //!< corresponds to ChangesetDataEntry
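OpCreateTable = 'a', //!< corresponds to ChangesetCreateTableEntry
OpDropTable = 'A', //!< corresponds to ChangesetDropTableEntry
OpAddColumn = 'c', //!< corresponds to ChangesetAddColumnEntry
OpDropColumn = 'C', //!< corresponds to ChangesetDropColumnEntry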
};

/**
* Details of a single change within a changeset
* Details of a single data change within a changeset
*
* Contents of old/new values array based on operation type:
* - INSERT - new values contain data of the row to be inserted, old values array is invalid
@@ -212,7 +229,7 @@ struct ChangesetTable
* columns of old value are always present (but new value of pkey columns is undefined
* if the primary key is not being changed).
*/
struct ChangesetEntry
struct ChangesetDataEntry
{
enum OperationType
{
@@ -231,17 +248,17 @@ struct ChangesetEntry
* Optional pointer to the source table information as stored in changeset.
*
* When the changeset entry has been read by ChangesetReader, the table will always be set to a valid
* instance. Do not delete the instance - it is owned by ChangesetReader.
* instance.
*
* When the changeset entry is being passed to ChangesetWriter, the table pointer is ignored
* and it does not need to be set (writer has an explicit beginTable() call to set table).
*/
ChangesetTable *table = nullptr;
std::shared_ptr<ChangesetTable> table;

//! a quick way for tests to create a changeset entry
static ChangesetEntry make( ChangesetTable *t, OperationType o, const std::vector<Value> &oldV, const std::vector<Value> &newV )
static ChangesetDataEntry make( std::shared_ptr<ChangesetTable> t, OperationType o, const std::vector<Value> &oldV, const std::vector<Value> &newV )
{
ChangesetEntry e;
ChangesetDataEntry e;
e.op = o;
e.oldValues = oldV;
e.newValues = newV;
@@ -250,4 +267,42 @@ struct ChangesetEntry
}
};

//! Entry for CREATE TABLE command
struct ChangesetCreateTableEntry
{
std::string tableName;
std::vector<TableColumnInfo> columns;
};

//! Entry for DROP TABLE command
struct ChangesetDropTableEntry
{
std::string tableName;
std::vector<TableColumnInfo> columns;
};

//! Entry for ALTER TABLE ... ADD COLUMN command
struct ChangesetAddColumnEntry
{
std::string tableName;
TableColumnInfo column;
};

//! Entry for ALTER TABLE ... DROP COLUMN command
struct ChangesetDropColumnEntry
{
std::string tableName;
TableColumnInfo column;
};

struct ChangesetEntry : public std::variant <
ChangesetDataEntry,
ChangesetCreateTableEntry,
ChangesetDropTableEntry,
ChangesetAddColumnEntry,
ChangesetDropColumnEntry >
{
using variant::variant; // Inherit std::variant's constructors
};

#endif // CHANGESET_H