
PostgreSQL type conversion should default to NATIONAL CHARACTER #574

@daniel-skovenborg

Description

PostgreSQL type conversion in PostgreSQLJDBCDatatypeImporter imports varchar as CHARACTER VARYING and text as CHARACTER LARGE OBJECT. However, because SQL:1999 distinguishes between CHARACTER and NATIONAL CHARACTER while PostgreSQL does not, the conversion should default to NATIONAL CHARACTER.
I haven't tried it, but I believe this could break migration of SIARD archives from PostgreSQL databases to databases that distinguish between VARCHAR and NVARCHAR if cells contain non-ASCII characters.
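To make the proposal concrete, here is a minimal sketch of the suggested default mapping. This is illustrative Python only; the real importer is Java code in the Database Preservation Toolkit, and the function name `map_pg_type` and its `national` flag are made up for this example:

```python
def map_pg_type(pg_type: str, national: bool = True) -> str:
    """Map a PostgreSQL character type name to a SQL:1999 type name.

    Since PostgreSQL does not distinguish CHARACTER from NATIONAL
    CHARACTER, the proposal is to default to the NATIONAL variants.
    """
    mapping = {
        # pg type -> (proposed default, current behaviour)
        "varchar": ("NATIONAL CHARACTER VARYING", "CHARACTER VARYING"),
        "text": ("NATIONAL CHARACTER LARGE OBJECT", "CHARACTER LARGE OBJECT"),
    }
    national_type, plain_type = mapping[pg_type]
    return national_type if national else plain_type

print(map_pg_type("varchar"))                  # NATIONAL CHARACTER VARYING
print(map_pg_type("text", national=False))     # CHARACTER LARGE OBJECT
```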

Of course, NATIONAL CHARACTER is not always what you want, e.g. if the database encoding is not a Unicode encoding or the column is just an enum.
I suggest that the type conversion methods take the schema, table, and column as arguments, and that the PostgreSQL importer get an option to run the following query to determine whether a text column holds national characters:

select exists(select from SCHEMA_NAME.TABLE_NAME where COLUMN_NAME::text ~ '[^\x01-\x7F]');

This will of course slow down the import considerably, so it should probably be opt-in.
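For illustration, the same check the query performs can be mirrored client-side: a column needs a NATIONAL type if any cell matches `[^\x01-\x7F]`. A hedged Python sketch of that predicate (the helper name is made up, and in practice you would of course run the check in the database rather than fetch all values):

```python
import re

# Same character class as the SQL query: anything outside \x01-\x7F
# (i.e. any non-ASCII character, excluding NUL) counts as "national".
NON_ASCII = re.compile(r"[^\x01-\x7F]")

def needs_national_character(values):
    """Return True if any cell value contains a non-ASCII character,
    i.e. the column should map to a NATIONAL CHARACTER type."""
    return any(NON_ASCII.search(v) for v in values if v is not None)

print(needs_national_character(["abc", "def"]))    # False
print(needs_national_character(["café", None]))    # True
```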
