I am running the command acro.crosstab() in JupyterLab with suppress=True and margins=True. ACRO raises an error when passed columns whose names include a space character, e.g. 'Heart Rate', and it does not return the output table as it should. The error does not occur if the column names are a single word, e.g. "Sex", or if either suppress or margins is set to False.
The output I get is:
INFO:acro:get_summary(): fail; threshold: 7 cells suppressed;
INFO:acro:outcome_df:
------------------------------------------------------------------------------|
Chest |pain |type |1 |2 |3 |4 |All|
Sex | | | | | | | |
------------------------------------------------------------------------------|
0 | | | threshold; | threshold; | threshold; | threshold; | ok|
1 | | | threshold; | threshold; | ok | ok | ok|
All | | | threshold; | ok | ok | ok | ok|
------------------------------------------------------------------------------|
Traceback (most recent call last):
File ~/miniconda/lib/python3.13/site-packages/IPython/core/interactiveshell.py:3699 in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
Cell In[20], line 1
acro.crosstab(data['Sex'],data['Chest pain type'],margins=True)
File ~/miniconda/lib/python3.13/site-packages/acro/acro_tables.py:201 in crosstab
table = crosstab_with_totals(
File ~/miniconda/lib/python3.13/site-packages/acro/acro_tables.py:1495 in crosstab_with_totals
data = data.query(f"not ({query})")
File ~/miniconda/lib/python3.13/site-packages/pandas/core/frame.py:4834 in query
res = self.eval(expr, **kwargs)
File ~/miniconda/lib/python3.13/site-packages/pandas/core/frame.py:4960 in eval
return _eval(expr, inplace=inplace, **kwargs)
File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/eval.py:339 in eval
parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:809 in __init__
self.terms = self.parse()
File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:828 in parse
return self._visitor.visit(self.expr)
File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:409 in visit
raise e
File ~/miniconda/lib/python3.13/site-packages/pandas/core/computation/expr.py:405 in visit
node = ast.fix_missing_locations(ast.parse(clean))
File ~/miniconda/lib/python3.13/ast.py:50 in parse
return compile(source, filename, mode, flags,
File <unknown>:1
not ((Sex ==1 )and (Chest pain type ==1 ))
^
SyntaxError: Python keyword not valid identifier in numexpr query
To Reproduce
My code is:
import pandas as pd
from acro import ACRO, add_constant, utils
from sksurv.datasets import load_veterans_lung_cancer
import matplotlib.pyplot as plt
import numpy as np
import time
acro = ACRO(suppress=True,config='custom_config')
data=pd.read_csv('./Heart_Disease_Prediction.csv')
acro.crosstab(data['Sex'],data['Chest pain type'],margins=True)
Expected behavior
The following example of the expected output is produced when running
acro.crosstab(data['Sex'],data['Thallium'],margins=True)
INFO:acro:get_summary(): fail; threshold: 4 cells suppressed;
INFO:acro:outcome_df:
----------------------------------------------|
Thallium |3 |6 |7 |All|
Sex | | | | |
----------------------------------------------|
0 | ok | threshold; | threshold; | ok|
1 | ok | threshold; | ok | ok|
All | ok | threshold; | ok | ok|
----------------------------------------------|
INFO:acro:records:add(): output_0
Thallium | 3 | 7 | All
-- | -- | -- | --
0 | 74 | NaN | 74
1 | 78 | 91.0 | 169
All | 152 | 91.0 | 243
Desktop:
- OS: Ubuntu 22.04
- App: JupyterLab
- Python version: 3.13.9
- ACRO version: 0.4.11
Additional context
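After a bit of research, it seems that the error occurs because pandas query() and eval() parse the expression as Python code before evaluating it (with numexpr where available), so a column name containing spaces, such as Chest pain type, is not a valid identifier. The traceback shows the query string being built from the raw column names: not ((Sex ==1 )and (Chest pain type ==1 )).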
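One possible workaround, sketched below, relies on the backtick quoting that pandas query() supports for column names that are not valid Python identifiers. This is only an illustration of the idea on a small made-up DataFrame, not ACRO's actual code or fix; the helper quote_column is a hypothetical name introduced here.

```python
import pandas as pd

# Small made-up frame that mimics the two columns from the report.
df = pd.DataFrame(
    {
        "Sex": [0, 1, 1, 0],
        "Chest pain type": [1, 1, 2, 3],
    }
)


def quote_column(name: str) -> str:
    """Wrap a column name in backticks so DataFrame.query() accepts spaces.

    Hypothetical helper for illustration; not part of ACRO or pandas.
    """
    return f"`{name}`"


# Same shape as the query in the traceback, but with quoted column names.
query = f"({quote_column('Sex')} == 1) and ({quote_column('Chest pain type')} == 1)"

# Drop the rows matching the query, as crosstab_with_totals appears to do.
filtered = df.query(f"not ({query})")
print(filtered)
```

Backticks are also accepted around single-word names such as Sex, so quoting every column name unconditionally should be safe in this sketch.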