fix(oocana): use duck-typing for DataFrame detection instead of class name #458
leavesster wants to merge 1 commit into main from
Conversation
Replaced hardcoded 'DataFrame' string check with __is_dataframe_like() method that uses duck-typing. Now correctly supports DataFrame subclasses like GeoDataFrame.
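A minimal sketch (not from the PR) of why the duck-typed check catches subclasses that the class-name comparison missed; `FakeGeoDataFrame` is a hypothetical stand-in for `geopandas.GeoDataFrame`:

```python
import pandas as pd

class FakeGeoDataFrame(pd.DataFrame):
    """Hypothetical stand-in for geopandas.GeoDataFrame (a DataFrame subclass)."""
    pass

gdf = FakeGeoDataFrame({"a": [1, 2]})

# Old check: fails for subclasses, because the class name is not exactly "DataFrame".
print(gdf.__class__.__name__ == "DataFrame")            # False

# Duck-typed check: passes, because the subclass inherits to_pickle() and copy().
print(callable(getattr(gdf, "to_pickle", None)) and
      callable(getattr(gdf, "copy", None)))              # True
```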
Walkthrough: A private helper method was added to the Context class.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes. Pre-merge checks: ✅ 2 passed.
Actionable comments posted: 0
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
oocana/oocana/context.py (1)
387-399: ⚠️ Potential issue | 🟠 Major

Prevent silent failures in the background thread when `to_pickle` does not accept the `compression` argument.

The duck-typing check `__is_dataframe_like` matches any object that has `to_pickle` and `copy` methods, but the `to_pickle` signatures of different implementations may be incompatible. When `copy_value.to_pickle(serialize_path, compression=compression)` is called, an object whose `to_pickle` does not accept a `compression` parameter will raise `TypeError`. The key problems:
- The call happens in a background thread (line 398).
- The current try-except only catches `IOError`, not `TypeError` (line 401); an exception raised in the thread does not propagate to the main thread, so it fails silently.
- Yet `serialize_path` is set to a non-None value before the thread starts (line 392), so the caller ends up with a path pointing to a file that does not exist.

Suggested fix: add a fallback.

```diff
 def write_pickle():
-    copy_value.to_pickle(serialize_path, compression=compression)
+    try:
+        copy_value.to_pickle(serialize_path, compression=compression)
+    except TypeError:
+        # If to_pickle does not support the compression parameter, write without compression
+        copy_value.to_pickle(serialize_path)
```
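A minimal, self-contained sketch (names are illustrative, not from oocana) of why the `TypeError` is swallowed: an exception raised inside a plain `threading.Thread` target never reaches the caller, so the caller is left holding a path to a file that was never written:

```python
import os
import tempfile
import threading

class NoCompressionFrame:
    """Illustrative object whose to_pickle() does not accept a compression kwarg."""
    def to_pickle(self, path):
        with open(path, "wb") as f:
            f.write(b"data")

serialize_path = os.path.join(tempfile.mkdtemp(), "cache-demo.pkl")
value = NoCompressionFrame()

def write_pickle():
    # Raises TypeError: to_pickle() got an unexpected keyword argument 'compression'
    value.to_pickle(serialize_path, compression="gzip")

t = threading.Thread(target=write_pickle)
t.start()
t.join()

# The TypeError never propagates to this thread (it is only reported via
# threading.excepthook), so the caller still holds serialize_path even though
# no file was ever written.
print(os.path.exists(serialize_path))  # False
```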
Pull request overview
Updates DataFrame detection in Context.__wrap_output_value() from a hardcoded class-name check to a duck-typed helper so that DataFrame subclasses can be serialized for cache.
Changes:
- Added `Context.__is_dataframe_like()` helper to detect DataFrame-like objects via the presence of `to_pickle()` and `copy()`.
- Replaced the `value.__class__.__name__ == "DataFrame"` conditional with the new helper when deciding whether to serialize outputs for cache.
Comments suppressed due to low confidence (1)
oocana/oocana/context.py:392
- The condition allows `len(self.__block_info.stacks) < 2`, but the body unconditionally indexes `self.__block_info.stacks[-1]`. Since `node_id` explicitly handles `stacks` being empty, this path can raise `IndexError` when `stacks` is empty and caching is enabled. Update the condition to require at least one stack entry (e.g., `0 < len(...) < 2`), or compute `flow_node` without indexing when `stacks` is empty.
```python
if len(self.__block_info.stacks) < 2 and output_def.need_serialize_var_for_cache() and self.__is_dataframe_like(value):
    from .serialization import compression_suffix, compression_options
    suffix = compression_suffix(context=self)
    compression = compression_options(context=self)
    flow_node = self.__block_info.stacks[-1].get("flow", "unknown") + "-" + self.node_id
    serialize_path = f"{self.pkg_data_dir}/.cache/{string_hash(flow_node)}/{handle}{suffix}"
```
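A standalone sketch of the guard this comment suggests; `resolve_flow_node` is a hypothetical helper that mirrors the `0 < len(...) < 2` condition so the `stacks[-1]` index is always valid:

```python
from typing import Any, Dict, List, Optional

def resolve_flow_node(stacks: List[Dict[str, Any]], node_id: str) -> Optional[str]:
    """Compute flow_node only when there is exactly one stack entry,
    so stacks[-1] cannot raise IndexError on an empty list."""
    if 0 < len(stacks) < 2:
        return stacks[-1].get("flow", "unknown") + "-" + node_id
    return None  # empty (or nested) stacks skip the cache path

print(resolve_flow_node([], "node-1"))                    # None, no IndexError
print(resolve_flow_node([{"flow": "flow-a"}], "node-1"))  # "flow-a-node-1"
```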
```python
def __is_dataframe_like(self, value: Any) -> bool:
    """Check if value is DataFrame-like by duck-typing.

    Supports pandas DataFrame and subclasses (GeoDataFrame, etc.)
    """
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )
```
`__is_dataframe_like()` returns True based only on the presence/callability of `to_pickle`/`copy`, but the serialization path later calls `to_pickle(..., compression=compression)`. Objects with a `to_pickle()` method that doesn’t accept the `compression` kwarg (or whose `copy()` returns an object without `to_pickle`) will pass this check and then fail at runtime. Consider either tightening the duck-type check to the actual required API (including kwarg support), or making the serialization call resilient (e.g., a fallback call without compression, or catching `TypeError` and skipping caching).
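One way to tighten the check as suggested, a sketch only (not part of the PR): use `inspect.signature` to verify that `to_pickle` actually accepts a `compression` keyword before treating the object as cacheable.

```python
import inspect
from typing import Any

def is_dataframe_like_strict(value: Any) -> bool:
    """Sketch: duck-type check that also requires to_pickle() to accept
    a `compression` kwarg (or **kwargs), per the review suggestion."""
    to_pickle = getattr(value, "to_pickle", None)
    copy = getattr(value, "copy", None)
    if not callable(to_pickle) or not callable(copy):
        return False
    try:
        params = inspect.signature(to_pickle).parameters
    except (TypeError, ValueError):
        return False  # signature not introspectable; be conservative
    return "compression" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
```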
Summary
- Replaced hardcoded `'DataFrame'` string check with duck-typing method `__is_dataframe_like()`
Problem
Original code used hardcoded class name check:
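(Reconstructed from the conditional quoted in the Copilot overview above; surrounding code elided:)

```python
# Old detection: only the exact class name "DataFrame" matched,
# so subclasses such as GeoDataFrame were never serialized for cache.
if value.__class__.__name__ == "DataFrame":
    ...
```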
This fails for DataFrame subclasses such as `GeoDataFrame`, whose class name is not `"DataFrame"`.
Solution
Use duck-typing to check for required methods:
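(The helper as it appears in the diff above:)

```python
def __is_dataframe_like(self, value: Any) -> bool:
    """Check if value is DataFrame-like by duck-typing.

    Supports pandas DataFrame and subclasses (GeoDataFrame, etc.)
    """
    return (
        hasattr(value, 'to_pickle') and callable(getattr(value, 'to_pickle', None)) and
        hasattr(value, 'copy') and callable(getattr(value, 'copy', None))
    )
```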
Test Plan
- Objects with `to_pickle()` and `copy()` methods will be handled correctly
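A hedged sketch of what such a test could look like (pytest-style; `DuckFrame` is a hypothetical stub, and `is_dataframe_like` is a standalone copy of the private helper's logic, since name-mangled methods are awkward to call directly):

```python
import pandas as pd

def is_dataframe_like(value) -> bool:
    """Standalone copy of the helper's duck-typing logic for testing."""
    return (
        hasattr(value, "to_pickle") and callable(getattr(value, "to_pickle", None)) and
        hasattr(value, "copy") and callable(getattr(value, "copy", None))
    )

class DuckFrame:
    """Stub with the two required methods but no pandas ancestry."""
    def to_pickle(self, path, compression=None):
        pass
    def copy(self):
        return DuckFrame()

def test_detects_dataframes_and_duck_typed_objects():
    assert is_dataframe_like(pd.DataFrame({"a": [1]}))
    assert is_dataframe_like(DuckFrame())
    assert not is_dataframe_like({"a": [1]})     # dict has copy() but no to_pickle()
    assert not is_dataframe_like("not a frame")  # str has neither method
```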