MS-SQL identity insert error (8106): SET IDENTITY_INSERT fails because remove_mssql_identity strips IDENTITY from DDL

## Bug

When running `datafaker create-data` against an MS-SQL destination, the following error is raised:

```
('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]
Table 'mimic100_synthetic.person' does not have the identity property.
Cannot perform SET operation. (8106) (SQLExecDirectW)")
[SQL: SET IDENTITY_INSERT mimic100_synthetic.person ON]
```

Datafaker halts before inserting any rows.

## Root cause

There are two systems in conflict:

1. **`remove_mssql_identity` DDL hook** (`create.py:46–57`) strips `IDENTITY` from every `CREATE TABLE` statement on MS-SQL, so the physical column is a plain `INTEGER` with no identity property.

2. **SQLAlchemy's insert path** sees `Column(Integer, primary_key=True)` with `autoincrement='auto'` (the SQLAlchemy default). When an explicit PK value is supplied, SQLAlchemy wraps the INSERT with `SET IDENTITY_INSERT <table> ON`. SQL Server rejects this with error 8106 because the column has no identity property — it was stripped in step 1.

The result is that the table and the metadata disagree about whether the column is an identity column, so either the table creation or the insert path will always be wrong.

## Options considered

### Option A — Let the database generate PKs ✅ **(chosen)**
- Remove `remove_mssql_identity` so tables are created with `IDENTITY(1,1)`.
- Stop supplying explicit PK values in generated `df.py` files. The DB generates them, and SQLAlchemy's `return_defaults()` captures the generated value for story generators.

**Pros:**
- PKs are guaranteed unique even across concurrent datafaker runs.
- No `SELECT MAX()` query on first insert.
- Simpler generated `df.py` — no `increment()` boilerplate per identity column.
- Works correctly when datafaker is run multiple times against a non-empty destination.

**Cons:**
- Generated PK values are not deterministic — cannot reproduce the exact same synthetic dataset.
- Existing hand-edited `df.py` files must be updated to remove `increment()` calls.

---

### Option B — Suppress autoincrement in MetaData, keep `increment()`
- Add `autoincrement=False` to `Column(...)` in `dict_to_column` (`serialize_metadata.py`). SQLAlchemy stops emitting `SET IDENTITY_INSERT`; `increment()` in `df.py` supplies explicit values as before.

**Pros:**
- Smallest possible code change (one keyword argument).
- Deterministic/reproducible PK values.
- No need to regenerate or edit `df.py` files.

**Cons:**
- Does not give automatic generation — every `df.py` must still wire up `increment()` per identity PK.
- In-memory accumulator is not concurrency-safe: two parallel datafaker processes both query `SELECT MAX()` and start from the same value → PK collision.

---

### Option C — Keep IDENTITY on tables, keep `increment()` (minimal surgical)
- Remove `remove_mssql_identity` so tables have `IDENTITY`. Keep `increment()` in `df.py` to supply explicit values. `SET IDENTITY_INSERT ON/OFF` now works because the column actually has IDENTITY.

**Pros:**
- Smallest change to `create.py` (delete ~10 lines). No changes to `df.py` or `serialize_metadata.py`.

**Cons:**
- Two systems manage the same counter. The DB's IDENTITY sequence falls behind explicit inserts; if anything ever inserts without `IDENTITY_INSERT ON`, the DB-generated value collides with a previously-inserted one.
- `SET IDENTITY_INSERT` is one-table-at-a-time per session; story generators touching multiple identity-PK tables in one pass hit this limit.

## Fix (Option A)

The following files are changed in the associated PR:

- **`datafaker/create.py`**: Remove `remove_mssql_identity` hook — MS-SQL tables now get `IDENTITY(1,1)`.
- **`datafaker/make.py`**: `_get_default_generator` returns `None` for single-column integer PKs with no FK; `_get_generator_for_table` excludes them from both `nonnull_columns` and `row_gens`. The existing comment ("we presume that primary keys are populated automatically") already documented this intent.
- **`examples/omop-mssql/df.py`**: Remove manual `increment()` call for `person_id`.
- **`tests/test_create_mssql.py`**: Invert `TestMSSQLRemoveIdentity` → `TestMSSQLIdentityPresent`; add insert-without-PK test.
- **`tests/test_make.py`**: Add `TestGetDefaultGenerator` unit tests.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MS-SQL identity insert error (8106): SET IDENTITY_INSERT fails because remove_mssql_identity strips IDENTITY from DDL #104

Bug

Root cause

Options considered

Option A — Let the database generate PKs ✅ (chosen)

Option B — Suppress autoincrement in MetaData, keep `increment()`

Option C — Keep IDENTITY on tables, keep `increment()` (minimal surgical)

Fix (Option A)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

MS-SQL identity insert error (8106): SET IDENTITY_INSERT fails because remove_mssql_identity strips IDENTITY from DDL #104

Description

Bug

Root cause

Options considered

Option A — Let the database generate PKs ✅ (chosen)

Option B — Suppress autoincrement in MetaData, keep increment()

Option C — Keep IDENTITY on tables, keep increment() (minimal surgical)

Fix (Option A)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Option B — Suppress autoincrement in MetaData, keep `increment()`

Option C — Keep IDENTITY on tables, keep `increment()` (minimal surgical)