Problem
When datafaker reads a table schema from an MS-SQL source, string columns that
carry a collation (e.g. VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS)
cause a parsy.ParseError when the orm.yaml is later loaded:
```
Failed to parse VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS
parsy.ParseError: ...
```
Root cause
The string_type parser in datafaker/serialize_metadata.py
only handled quoted collation names, as rendered by the PostgreSQL dialect (COLLATE "name").
MS-SQL renders collation names without quotes:
```
COLLATE SQL_Latin1_General_CP1_CI_AS
```
so the parser always failed on any MS-SQL string column with a collation.
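To illustrate the failure mode, here is a minimal sketch that uses Python's standard `re` module as a stand-in for the parsy grammar. The pattern and the `old_collation` helper are hypothetical and only mimic the quoted-only clause: a clause that insists on a double quote after `COLLATE` accepts the PostgreSQL spelling and rejects the MS-SQL one.

```python
import re
from typing import Optional

# Hypothetical stand-in for the quoted-only collation clause:
# ' COLLATE "' + any non-quote characters + a closing '"'.
QUOTED_ONLY = re.compile(r' COLLATE "([^"]*)"')


def old_collation(suffix: str) -> Optional[str]:
    """Return the collation name, or None if the clause does not match."""
    match = QUOTED_ONLY.fullmatch(suffix)
    return match.group(1) if match else None


# PostgreSQL-style quoted name: matches.
print(old_collation(' COLLATE "C"'))  # prints C
# MS-SQL-style bare identifier: no match. In the real grammar,
# this is the kind of input that raised parsy.ParseError.
print(old_collation(" COLLATE SQL_Latin1_General_CP1_CI_AS"))  # prints None
```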
Fix
Extended the collation clause in string_type to accept both forms using
parsy.alt:
```python
collation: str | None = yield parsy.alt(
    # PostgreSQL: COLLATE "name" (quoted)
    parsy.string(' COLLATE "') >> parsy.regex(r'[^"]*') << parsy.string('"'),
    # MS-SQL: COLLATE name (unquoted identifier)
    parsy.string(" COLLATE ") >> parsy.regex(r'\S+'),
).optional()
```
The quoted path is tried first, so PostgreSQL behaviour is unchanged.
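A minimal sketch of the same two-branch logic, using Python's standard `re` module as a stand-in for parsy. The `parse_collation` name and the pattern list are hypothetical and not part of datafaker. The alternatives are tried in order and the first match wins, which is how `parsy.alt` picks a branch.

```python
import re
from typing import Optional

# Alternatives in the same order as the parsy.alt call in the fix.
COLLATION_ALTERNATIVES = [
    # PostgreSQL: COLLATE "name" (quoted); the capture excludes the quotes.
    re.compile(r' COLLATE "([^"]*)"'),
    # MS-SQL: COLLATE name (unquoted identifier, no whitespace).
    re.compile(r" COLLATE (\S+)"),
]


def parse_collation(suffix: str) -> Optional[str]:
    """Return the collation name from a ' COLLATE ...' suffix.

    Returns None when no alternative matches, including an empty
    suffix (the no-collation case that .optional() covers in parsy).
    """
    for pattern in COLLATION_ALTERNATIVES:
        match = pattern.fullmatch(suffix)
        if match:
            return match.group(1)
    return None


print(parse_collation(' COLLATE "C"'))  # prints C
print(parse_collation(" COLLATE SQL_Latin1_General_CP1_CI_AS"))  # prints SQL_Latin1_General_CP1_CI_AS
```

Ordering matters here: the unquoted branch's `\S+` would also match `"C"`, so if it were tried first, a quoted PostgreSQL name would come back with its quotes still attached.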
Tests added
Four new tests in tests/test_serialize_metadata_mssql.py:
test_varchar_with_mssql_collation — VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS
test_nvarchar_with_mssql_collation — NVARCHAR(100) COLLATE Latin1_General_CI_AS
test_char_with_mssql_collation — CHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS
test_varchar_with_quoted_collation_still_works — regression test confirming the PostgreSQL quoted form still works