[BUG] Error when using crosstab() when variable names contain spaces #305

@PetrosKots-UoS

Description

I am running acro.crosstab() in JupyterLab with suppress=True and margins=True. ACRO raises an error when the columns passed have names that include a space character, e.g. 'Heart Rate', and it does not return the output table as it should. The error does not occur if the column names are single words, e.g. 'Sex', or if either suppress or margins is set to False.

The output I get is:

INFO:acro:get_summary(): fail; threshold: 7 cells suppressed; 
INFO:acro:outcome_df:
------------------------------------------------------------------------------|
Chest |pain |type |1            |2            |3            |4            |All|
Sex   |     |     |             |             |             |             |   |
------------------------------------------------------------------------------|
0     |     |     | threshold;  | threshold;  | threshold;  | threshold;  | ok|
1     |     |     | threshold;  | threshold;  |          ok |          ok | ok|
All   |     |     | threshold;  |          ok |          ok |          ok | ok|
------------------------------------------------------------------------------|

Traceback (most recent call last):

  File ~/miniconda/lib/python3.13/site-packages/IPython/core/interactiveshell.py:3699 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[20], line 1
    acro.crosstab(data['Sex'],data['Chest pain type'],margins=True)

  File ~/miniconda/lib/python3.13/site-packages/acro/acro_tables.py:201 in crosstab
    table = crosstab_with_totals(

  File ~/miniconda/lib/python3.13/site-packages/acro/acro_tables.py:1495 in crosstab_with_totals
    data = data.query(f"not ({query})")

  File ~/miniconda/lib/python3.13/site-packages/pandas/core/frame.py:4834 in query
    res = self.eval(expr, **kwargs)

  File ~/miniconda/lib/python3.13/site-packages/pandas/core/frame.py:4960 in eval
    return _eval(expr, inplace=inplace, **kwargs)

  File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/eval.py:339 in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)

  File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:809 in __init__
    self.terms = self.parse()

  File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:828 in parse
    return self._visitor.visit(self.expr)

  File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:409 in visit
    raise e

  File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:405 in visit
    node = ast.fix_missing_locations(ast.parse(clean))

  File ~/miniconda/lib/python3.13/ast.py:50 in parse
    return compile(source, filename, mode, flags,

  File <unknown>:1
    not ((Sex ==1 )and (Chest pain type ==1 ))
                        ^
SyntaxError: Python keyword not valid identifier in numexpr query
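
The last frames of the traceback above can be reproduced without ACRO or pandas. The following is a minimal sketch, assuming only that the query string ends up in Python's `ast.parse` as the traceback shows. The helper name `parses_as_python` is hypothetical, and the `Thallium ==3` expression is an illustrative single-word counterpart, not a string taken from ACRO.

```python
import ast


def parses_as_python(expr: str) -> bool:
    """Return True if expr is syntactically valid Python source, else False."""
    try:
        ast.parse(expr)
        return True
    except SyntaxError:
        return False


# The expression from the traceback: "Chest pain type" is three bare tokens.
failing = "not ((Sex ==1 )and (Chest pain type ==1 ))"
# Same shape with a single-word column name (illustrative, not from ACRO).
working = "not ((Sex ==1 )and (Thallium ==3 ))"

print(parses_as_python(failing))  # False
print(parses_as_python(working))  # True
```

This suggests the failure is a plain syntax problem in the generated expression rather than anything specific to the data values.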

To Reproduce
My code is:

import pandas as pd
from acro import ACRO, add_constant, utils
from sksurv.datasets import load_veterans_lung_cancer
import matplotlib.pyplot as plt
import numpy as np
import time

acro = ACRO(suppress=True,config='custom_config')

data=pd.read_csv('./Heart_Disease_Prediction.csv')

acro.crosstab(data['Sex'],data['Chest pain type'],margins=True)

Expected behavior
The following example of the expected output is produced when running:

acro.crosstab(data['Sex'],data['Thallium'],margins=True)
INFO:acro:get_summary(): fail; threshold: 4 cells suppressed; 
INFO:acro:outcome_df:
----------------------------------------------|
Thallium |3   |6            |7            |All|
Sex      |    |             |             |   |
----------------------------------------------|
0        | ok | threshold;  | threshold;  | ok|
1        | ok | threshold;  |          ok | ok|
All      | ok | threshold;  |          ok | ok|
----------------------------------------------|

INFO:acro:records:add(): output_0

Sex / Thallium | 3 | 7 | All
-- | -- | -- | --
0 | 74 | NaN | 74
1 | 78 | 91.0 | 169
All | 152 | 91.0 | 243

Desktop:

  • OS: Ubuntu 22.04
  • App: JupyterLab
  • Python version: 3.13.9
  • ACRO version: 0.4.11

Additional context
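After a bit of research, it seems that the error occurs because the expression passed to pandas query() or eval() is parsed as Python code (the traceback ends in ast.parse), so a column name containing spaces, such as 'Chest pain type', is not a valid identifier. pandas query() only accepts such names when they are wrapped in backticks.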
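
The behaviour can be shown with pandas alone. The sketch below uses made-up toy data and a simplified version of the failing expression. It is not ACRO's code, and whether either approach fits into `crosstab_with_totals` is an assumption that has not been tested against ACRO.

```python
import pandas as pd

# Toy data; the values are made up and only the column names matter.
df = pd.DataFrame({"Sex": [0, 1, 1, 0], "Chest pain type": [1, 1, 2, 3]})

# Unquoted: the space-separated name is not a valid Python identifier.
try:
    df.query("(Sex == 1) and (Chest pain type == 1)")
    unquoted_ok = True
except SyntaxError:
    unquoted_ok = False

# Backtick-quoted: pandas resolves the quoted name to the column.
quoted = df.query("(Sex == 1) and (`Chest pain type` == 1)")

# Alternative: rename the columns so that no quoting is needed.
renamed = df.rename(columns=lambda name: name.replace(" ", "_"))
via_rename = renamed.query("(Sex == 1) and (Chest_pain_type == 1)")

print(unquoted_ok, len(quoted), len(via_rename))
```

Until a fix is available, renaming the columns before calling acro.crosstab() might serve as a user-side workaround, although that is untested here.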
