Extend Support for RaggedTensorToTensor #613
apaniukov merged 4 commits into openvinotoolkit:master
Conversation
Adjust shape infer, add new mask logic to avoid segfaults
Pull request overview
Extends the TensorFlow frontend’s RaggedTensorToTensor translation to support an additional ragged representation (two ragged dimensions) and updates internal ragged ops to handle non-1D data tensors during densification.
Changes:
- Added support for `row_partition_types == {"ROW_SPLITS", "VALUE_ROWIDS"}` in `translate_ragged_tensor_to_tensor` (3D densification via two `RaggedToDense` passes).
- Updated `RaggedToDense` to accept `data` tensors with rank ≥ 1 and to propagate trailing dense dimensions into output shape + runtime evaluation.
- Adjusted `RaggedToRagged` type/shape inference and added a new ragged-input validator helper.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/utils.hpp | Declares new helper for validating ragged inputs with any-rank data. |
| src/utils.cpp | Implements the new ragged validation helper (rank ≥ 1 for data). |
| src/tensorflow_translators.cpp | Extends RaggedTensorToTensor translation to handle {ROW_SPLITS, VALUE_ROWIDS} and reshapes pad value to scalar. |
| src/ragged_to_ragged.hpp | Adds evaluate_lower/upper overrides returning false. |
| src/ragged_to_ragged.cpp | Changes output shape inference logic based on first_dim_size. |
| src/ragged_to_dense.cpp | Broadens data rank support and updates runtime evaluation to handle trailing dense dimensions. |
Comments suppressed due to low confidence (1)
src/ragged_to_dense.cpp:66
`pad_right` input validation allows any integral element type (`is_integral()`), but `evaluate()` reads it as `bool` (`inputs[5].data<bool>()`). If a caller passes an integral (e.g., i32) scalar, this will interpret the buffer as bool and can produce incorrect results/UB. Tighten the check to require a boolean element type (or read as the validated type and convert to bool).
```cpp
if (get_input_size() == 3 + 1 + 1 + 1) {
    OPENVINO_ASSERT(get_input_partial_shape(5).is_dynamic() ||
                        get_input_partial_shape(5).is_static() && get_input_partial_shape(5).rank().get_length() == 0,
                    "RaggedToDense: pad_right should be a boolean scalar.");
    OPENVINO_ASSERT(get_input_element_type(5).is_integral(),
                    "RaggedToDense: pad_right should be a boolean value.");
}
```
```cpp
// Check whether input 1 is a Constant node, otherwise fall back to the lower-bound tensor
PartialShape out_shape({ Dimension::dynamic() });
auto infer_from_int32_tensor = [&](const ov::Tensor& t) {
    if (t && t.get_element_type() == element::i32 && t.get_size() >= 1) {
        out_shape = PartialShape({ static_cast<Dimension::value_type>(t.data<const int32_t>()[0]) });
    }
};
if (const auto* first_dim_const = dynamic_cast<const Constant*>(get_input_node_ptr(1))) {
    auto vals = first_dim_const->cast_vector<int32_t>();
    if (!vals.empty()) {
        out_shape = PartialShape({ static_cast<Dimension::value_type>(vals[0]) });
    }
} else {
    infer_from_int32_tensor(get_input_tensor(1).get_lower_value());
```
In validate_and_infer_types(), inferring a static output length from get_input_tensor(1).get_lower_value() can over-constrain the output shape (e.g., lower bound may be 0/unknown while runtime first_dim_size is larger). This can lead to incorrect compile-time shapes and downstream validation failures. Safer approach: only set a concrete out_shape when input 1 is a Constant; otherwise keep it dynamic (or only specialize when lower==upper and is known).
Suggested change:
```diff
-// Check whether input 1 is a Constant node, otherwise fall back to the lower-bound tensor
-PartialShape out_shape({ Dimension::dynamic() });
-auto infer_from_int32_tensor = [&](const ov::Tensor& t) {
-    if (t && t.get_element_type() == element::i32 && t.get_size() >= 1) {
-        out_shape = PartialShape({ static_cast<Dimension::value_type>(t.data<const int32_t>()[0]) });
-    }
-};
-if (const auto* first_dim_const = dynamic_cast<const Constant*>(get_input_node_ptr(1))) {
-    auto vals = first_dim_const->cast_vector<int32_t>();
-    if (!vals.empty()) {
-        out_shape = PartialShape({ static_cast<Dimension::value_type>(vals[0]) });
-    }
-} else {
-    infer_from_int32_tensor(get_input_tensor(1).get_lower_value());
+// Infer output shape: only specialize when input 1 is a Constant; otherwise keep it dynamic.
+PartialShape out_shape({ Dimension::dynamic() });
+if (const auto* first_dim_const = dynamic_cast<const Constant*>(get_input_node_ptr(1))) {
+    auto vals = first_dim_const->cast_vector<int32_t>();
+    if (!vals.empty()) {
+        out_shape = PartialShape({ static_cast<Dimension::value_type>(vals[0]) });
+    }
```
Suggested change:
```diff
 // Suppose validate was called and set correct output shape
 // Take a target shape value for ragged dimension
-size_t target_dim = inputs[3].data<const int32_t>()[0];
+const size_t target_dim = static_cast<size_t>(inputs[3].data<const int32_t>()[0]);
```
`padding_size`/`target_dim` is validated as any integral type (`get_input_element_type(3).is_integral_number()`), but `evaluate()` unconditionally reads it as `int32_t`. This is inconsistent and will misread i64/u64 inputs. Either restrict input 3 to i32 in `validate_and_infer_types()` or handle other integral element types in `evaluate()` (e.g., via a type switch and conversion to `size_t`).
```cpp
for (size_t i = 0; i < nelems; ++i) {
    const size_t data_len = static_cast<size_t>(ends[i] - begins[i]);
    size_t target_len = (std::min(data_len, target_dim) * (1 - m_pad_max_length) +
                         target_dim * m_pad_max_length);

    const auto begin = elems + elem_size * inner_elems * static_cast<size_t>(begins[i]);
    const auto end = begin + elem_size * inner_elems * target_len;
    out_elems = std::copy(begin, end, out_elems);
```
When `m_pad_max_length` is true, `target_len` becomes `target_dim` regardless of `data_len`. If `data_len < target_dim`, the subsequent `std::copy(begin, end, ...)` will read past the end of `inputs[2]` (OOB). This is especially likely when padding to a fixed max length (see `pad_max_length` usage). Consider always using `copy_len = min(data_len, target_dim)` for the copy, and use default-value padding to reach `target_dim` when `data_len < target_dim`.
```diff
 for (size_t i = 0; i < nelems; ++i) {
     const size_t data_len = static_cast<size_t>(ends[i] - begins[i]);
     size_t target_len = (std::min(data_len, target_dim) * (1 - m_pad_max_length) +
                          target_dim * m_pad_max_length);
     const size_t pad_len = target_dim - target_len;

     for (size_t j = 0; j < pad_len; ++j) {
-        out_elems = std::copy(default_value, default_value + elem_size, out_elems);
+        fill_default_block(out_elems);
     }
     // fill actual values
-    auto begin = elems + elem_size * begins[i];
-    auto end = begin + elem_size * target_len;
+    const auto begin = elems + elem_size * inner_elems * static_cast<size_t>(begins[i]);
+    const auto end = begin + elem_size * inner_elems * target_len;
+    out_elems = std::copy(begin, end, out_elems);

     // construct padding mask
-    out_mask = std::fill_n(out_mask, pad_len, char(0));
-    out_mask = std::fill_n(out_mask, target_dim - pad_len, char(1));
+    mask_fill(out_mask, pad_len * inner_elems, char(0));
+    mask_fill(out_mask, (target_dim - pad_len) * inner_elems, char(1));
 }
```
Same `m_pad_max_length`/`target_len` issue exists in the left-padding branch: if `m_pad_max_length` is true and `data_len < target_dim`, `target_len` becomes `target_dim` and the copy uses `target_len`, which can read past the input buffer. Align the copy length with the available `data_len` and pad with `default_value` to reach `target_dim`.
```cpp
// Number of dense elements per one ragged element (trailing dense dimensions).
// For 1D data, this equals 1.
const auto& data_shape = inputs[2].get_shape();
size_t inner_elems = 1;
for (size_t idx = 1; idx < data_shape.size(); ++idx) {
    inner_elems *= data_shape[idx];
}
This change adds support for data tensors with rank > 1 (via inner_elems and appending trailing data_shape dims). There are existing tests for RaggedToDense, but they appear to only cover 1D data. Add test coverage for multi-rank data (e.g., values shaped [N, D] and [N, D1, D2]) and for both pad_right settings to ensure padding/mask behavior is correct across trailing dims.
CVS-181575