Extend Support for RaggedTensorToTensor #613
apaniukov merged 4 commits into openvinotoolkit:master
Conversation
Adjust shape infer, add new mask logic to avoid segfaults
Pull request overview
Extends the TensorFlow frontend’s RaggedTensorToTensor translation to support an additional ragged representation (two ragged dimensions) and updates internal ragged ops to handle non-1D data tensors during densification.
Changes:
- Added support for `row_partition_types == {"ROW_SPLITS", "VALUE_ROWIDS"}` in `translate_ragged_tensor_to_tensor` (3D densification via two `RaggedToDense` passes).
- Updated `RaggedToDense` to accept `data` tensors with rank ≥ 1 and to propagate trailing dense dimensions into output shape + runtime evaluation.
- Adjusted `RaggedToRagged` type/shape inference and added a new ragged-input validator helper.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/utils.hpp | Declares new helper for validating ragged inputs with any-rank data. |
| src/utils.cpp | Implements the new ragged validation helper (rank ≥ 1 for data). |
| src/tensorflow_translators.cpp | Extends RaggedTensorToTensor translation to handle {ROW_SPLITS, VALUE_ROWIDS} and reshapes pad value to scalar. |
| src/ragged_to_ragged.hpp | Adds evaluate_lower/upper overrides returning false. |
| src/ragged_to_ragged.cpp | Changes output shape inference logic based on first_dim_size. |
| src/ragged_to_dense.cpp | Broadens data rank support and updates runtime evaluation to handle trailing dense dimensions. |
Comments suppressed due to low confidence (1)
src/ragged_to_dense.cpp:66
`pad_right` input validation allows any integral element type (`is_integral()`), but `evaluate()` reads it as `bool` (`inputs[5].data<bool>()`). If a caller passes an integral (e.g., i32) scalar, this will interpret the buffer as bool and can produce incorrect results/UB. Tighten the check to require a boolean element type (or read as the validated type and convert to bool).
```cpp
if (get_input_size() == 3 + 1 + 1 + 1) {
    OPENVINO_ASSERT(get_input_partial_shape(5).is_dynamic() ||
                        get_input_partial_shape(5).is_static() && get_input_partial_shape(5).rank().get_length() == 0,
                    "RaggedToDense: pad_right should be a boolean scalar.");
    OPENVINO_ASSERT(get_input_element_type(5).is_integral(),
                    "RaggedToDense: pad_right should be a boolean value.");
}
```
```cpp
// Check whether input 1 is a Constant node, otherwise fall back to the lower-bound tensor
PartialShape out_shape({ Dimension::dynamic() });
auto infer_from_int32_tensor = [&](const ov::Tensor& t) {
    if (t && t.get_element_type() == element::i32 && t.get_size() >= 1) {
        out_shape = PartialShape({ static_cast<Dimension::value_type>(t.data<const int32_t>()[0]) });
    }
};
if (const auto* first_dim_const = dynamic_cast<const Constant*>(get_input_node_ptr(1))) {
    auto vals = first_dim_const->cast_vector<int32_t>();
    if (!vals.empty()) {
        out_shape = PartialShape({ static_cast<Dimension::value_type>(vals[0]) });
    }
} else {
    infer_from_int32_tensor(get_input_tensor(1).get_lower_value());
```
In validate_and_infer_types(), inferring a static output length from get_input_tensor(1).get_lower_value() can over-constrain the output shape (e.g., lower bound may be 0/unknown while runtime first_dim_size is larger). This can lead to incorrect compile-time shapes and downstream validation failures. Safer approach: only set a concrete out_shape when input 1 is a Constant; otherwise keep it dynamic (or only specialize when lower==upper and is known).
Suggested change:
```diff
-// Check whether input 1 is a Constant node, otherwise fall back to the lower-bound tensor
-PartialShape out_shape({ Dimension::dynamic() });
-auto infer_from_int32_tensor = [&](const ov::Tensor& t) {
-    if (t && t.get_element_type() == element::i32 && t.get_size() >= 1) {
-        out_shape = PartialShape({ static_cast<Dimension::value_type>(t.data<const int32_t>()[0]) });
-    }
-};
-if (const auto* first_dim_const = dynamic_cast<const Constant*>(get_input_node_ptr(1))) {
-    auto vals = first_dim_const->cast_vector<int32_t>();
-    if (!vals.empty()) {
-        out_shape = PartialShape({ static_cast<Dimension::value_type>(vals[0]) });
-    }
-} else {
-    infer_from_int32_tensor(get_input_tensor(1).get_lower_value());
+// Infer output shape: only specialize when input 1 is a Constant; otherwise keep it dynamic.
+PartialShape out_shape({ Dimension::dynamic() });
+if (const auto* first_dim_const = dynamic_cast<const Constant*>(get_input_node_ptr(1))) {
+    auto vals = first_dim_const->cast_vector<int32_t>();
+    if (!vals.empty()) {
+        out_shape = PartialShape({ static_cast<Dimension::value_type>(vals[0]) });
+    }
```
Suggested change:
```diff
 // Suppose validate was called and set correct output shape
 // Take a target shape value for ragged dimension
-size_t target_dim = inputs[3].data<const int32_t>()[0];
+const size_t target_dim = static_cast<size_t>(inputs[3].data<const int32_t>()[0]);
```
`padding_size`/`target_dim` is validated as any integral type (`get_input_element_type(3).is_integral_number()`), but `evaluate()` unconditionally reads it as `int32_t`. This is inconsistent and will misread i64/u64 inputs. Either restrict input 3 to i32 in `validate_and_infer_types()` or handle other integral element types in `evaluate()` (e.g., via a type switch and conversion to `size_t`).
```cpp
for (size_t i = 0; i < nelems; ++i) {
    const size_t data_len = static_cast<size_t>(ends[i] - begins[i]);
    size_t target_len = (std::min(data_len, target_dim) * (1 - m_pad_max_length) +
                         target_dim * m_pad_max_length);

    const auto begin = elems + elem_size * inner_elems * static_cast<size_t>(begins[i]);
    const auto end = begin + elem_size * inner_elems * target_len;
    out_elems = std::copy(begin, end, out_elems);
```
When `m_pad_max_length` is true, `target_len` becomes `target_dim` regardless of `data_len`. If `data_len < target_dim`, the subsequent `std::copy(begin, end, ...)` will read past the end of `inputs[2]` (OOB). This is especially likely when padding to a fixed max length (see `pad_max_length` usage). Consider always using `copy_len = min(data_len, target_dim)` for the copy, and use default-value padding to reach `target_dim` when `data_len < target_dim`.
```diff
 for (size_t i = 0; i < nelems; ++i) {
     const size_t data_len = static_cast<size_t>(ends[i] - begins[i]);
     size_t target_len = (std::min(data_len, target_dim) * (1 - m_pad_max_length) +
                          target_dim * m_pad_max_length);
     const size_t pad_len = target_dim - target_len;

     for (size_t j = 0; j < pad_len; ++j) {
-        out_elems = std::copy(default_value, default_value + elem_size, out_elems);
+        fill_default_block(out_elems);
     }
     // fill actual values
-    auto begin = elems + elem_size * begins[i];
-    auto end = begin + elem_size * target_len;
+    const auto begin = elems + elem_size * inner_elems * static_cast<size_t>(begins[i]);
+    const auto end = begin + elem_size * inner_elems * target_len;
+    out_elems = std::copy(begin, end, out_elems);

     // construct padding mask
-    out_mask = std::fill_n(out_mask, pad_len, char(0));
-    out_mask = std::fill_n(out_mask, target_dim - pad_len, char(1));
+    mask_fill(out_mask, pad_len * inner_elems, char(0));
+    mask_fill(out_mask, (target_dim - pad_len) * inner_elems, char(1));
 }
```
Same `m_pad_max_length`/`target_len` issue exists in the left-padding branch: if `m_pad_max_length` is true and `data_len < target_dim`, `target_len` becomes `target_dim` and the copy uses `target_len`, which can read past the input buffer. Align the copy length with the available `data_len` and pad with `default_value` to reach `target_dim`.
```cpp
// Number of dense elements per one ragged element (trailing dense dimensions).
// For 1D data, this equals 1.
const auto& data_shape = inputs[2].get_shape();
size_t inner_elems = 1;
for (size_t idx = 1; idx < data_shape.size(); ++idx) {
    inner_elems *= data_shape[idx];
}
This change adds support for data tensors with rank > 1 (via inner_elems and appending trailing data_shape dims). There are existing tests for RaggedToDense, but they appear to only cover 1D data. Add test coverage for multi-rank data (e.g., values shaped [N, D] and [N, D1, D2]) and for both pad_right settings to ensure padding/mask behavior is correct across trailing dims.
CVS-181575