Bug: column_value provider uses random() which is not recognised by MS-SQL

## Problem

`ColumnValueProvider.column_value()` in [`datafaker/providers.py:27–36`](../../datafaker/providers.py#L27-L36) selects a random row using:

```python
query = select(orm_class).order_by(functions.random()).limit(1)
```

SQLAlchemy's `functions.random()` compiles to `random()` for all dialects. MS-SQL does not have a `random()` function and raises an error at runtime:

```
sqlalchemy.exc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 18
for SQL Server][SQL Server]'random' is not a recognized built-in function name.")
```

The correct MS-SQL equivalents are `RAND()` (returns a scalar float, usable in `ORDER BY`) or `NEWID()` (generates a random GUID, commonly used for random row ordering).

## Root cause

`functions.random()` is not dialect-aware — SQLAlchemy emits `random()` unconditionally regardless of the connected engine. PostgreSQL and DuckDB both expose a `random()` function, so this works on those engines, but fails on MS-SQL.

## Impact

Any `df.py` that calls `column_value` against an MS-SQL **source** database will fail. This includes OMOP-style schemas where many FK columns are populated via `column_value` lookups against vocabulary tables (e.g. `concept`).

## Proposed fix

Detect the dialect at runtime inside `column_value` and substitute the appropriate random expression:

```python
@staticmethod
def column_value(
    db_connection: Connection, orm_class: Any, column_name: str
) -> Any:
    """Return a random value from the column specified."""
    dialect = db_connection.dialect.name
    if dialect == "mssql":
        random_fn = func.newid()
    else:
        random_fn = functions.random()
    query = select(orm_class).order_by(random_fn).limit(1)
    random_row = db_connection.execute(query).first()

    if random_row:
        return getattr(random_row, column_name)
    return None
```

`NEWID()` is preferred over `RAND()` for MS-SQL row ordering because `RAND()` returns the same value for every row in a single query execution, giving a non-random sort. `NEWID()` produces a unique value per row and is the standard MS-SQL idiom for `ORDER BY random()`.

## Reference

The existing dialect-specific hooks in [`datafaker/create.py`](../../datafaker/create.py) (e.g. `remove_mssql_identity`, `remove_mssql_on_delete_cascade`) show the established pattern for handling dialect differences in this codebase.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: column_value provider uses random() which is not recognised by MS-SQL #102

Problem

Root cause

Impact

Proposed fix

Reference

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Bug: column_value provider uses random() which is not recognised by MS-SQL #102

Description

Problem

Root cause

Impact

Proposed fix

Reference

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions