Commit 5e2de0a

Documenting string and bool tensor types + validate bool blob (#842)
Add documentation for BOOL and STRING tensors, and validate that every element of a BOOL tensor blob is 0 or 1 (with tests)
1 parent d2565ac commit 5e2de0a

4 files changed: 119 additions, 15 deletions

docs/commands.md

Lines changed: 80 additions & 14 deletions
@@ -50,11 +50,38 @@ OK
5050
```
5151

5252
!!! note "Uninitialized Tensor Values"
53-
As both `BLOB` and `VALUES` are optional arguments, it is possible to use the `AI.TENSORSET` to create an uninitialized tensor.
53+
As both `BLOB` and `VALUES` are optional arguments, it is possible to use `AI.TENSORSET` to create a tensor without initial data; all of its entries will be zero.
5454

5555
!!! important "Using `BLOB` is preferable to `VALUES`"
5656
While it is possible to set the tensor using binary data or numerical values, it is recommended that you use the `BLOB` option. It requires fewer resources and performs better compared to specifying the values individually.
5757

58+
### Boolean Tensors
59+
The possible values for a tensor of type `BOOL` are `0` and `1`. In a blob, every bool element occupies exactly 1 byte, and `AI.TENSORSET` returns an error if any byte is neither `0` nor `1`.
60+
61+
**Examples**
62+
63+
Here are two ways of creating the following boolean tensor: $A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$
64+
```
65+
redis> AI.TENSORSET my_bool_tensor BOOL 2 2 VALUES 0 1 0 1
66+
OK
67+
redis> AI.TENSORSET my_bool_tensor BOOL 2 2 BLOB "\x00\x01\x00\x01"
68+
OK
69+
```
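As a minimal sketch, assuming a C client that keeps its data in a `bool` array, the blob for $A$ (`"\x00\x01\x00\x01"`) could be built by writing one explicit `0`/`1` byte per element in row-major order. `pack_bool_blob` is a hypothetical helper, not part of the RedisAI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a RedisAI API): pack a C bool array into the
 * BOOL blob layout, one byte per element, each byte 0 or 1.
 * Returns the number of bytes written, which equals n_elements. */
size_t pack_bool_blob(const bool *values, size_t n_elements, uint8_t *out) {
    for (size_t i = 0; i < n_elements; i++) {
        /* Write an explicit 0/1 byte instead of copying the in-memory bool. */
        out[i] = values[i] ? 1 : 0;
    }
    return n_elements;
}
```

The resulting buffer has exactly one byte per element, so its length always matches the tensor's shape.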
70+
71+
### String Tensors
72+
String tensors are tensors in which every element is a single UTF-8 string, which may or may not be null-terminated. A string element can be of any length, but it cannot contain a null character other than its terminating one (if it is null-terminated).
73+
A string tensor blob contains the encoded string elements concatenated, with the null character serving as a delimiter. Note that, unlike blobs of other tensor types, the size of a string tensor blob is not determined by the tensor's shape: it equals the total size of its elements, including their null delimiters.
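To make the size rule concrete, here is a small sketch that builds such a blob, assuming every element is written followed by a `\0` delimiter as in the examples. `build_string_blob` is a hypothetical helper, not part of RedisAI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: concatenate string elements into a STRING tensor
 * blob, writing each element followed by a '\0' delimiter.
 * If out is NULL, only the required blob size is computed.
 * Returns the blob size in bytes. */
size_t build_string_blob(const char *const *elements, size_t n_elements, char *out) {
    size_t size = 0;
    for (size_t i = 0; i < n_elements; i++) {
        size_t len = strlen(elements[i]);
        if (out != NULL) {
            memcpy(out + size, elements[i], len);
            out[size + len] = '\0';
        }
        size += len + 1; /* element bytes plus its null delimiter */
    }
    return size;
}
```

Two 2x2 tensors therefore need different blob sizes: `first`, `second`, `third`, `fourth` yield 26 bytes, while `a`, `b`, `c`, `d` yield 8.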
74+
75+
**Examples**
76+
77+
Here are two ways of creating the same 2x2 string tensor:
78+
```
79+
redis> AI.TENSORSET my_str_tensor STRING 2 2 VALUES first second third fourth
80+
OK
81+
redis> AI.TENSORSET my_str_tensor STRING 2 2 BLOB "first\x00second\x00third\x00fourth\x00"
82+
OK
83+
```
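Reading such a blob back amounts to splitting it at every null character. The sketch below records the start offset of each element, in the spirit of the comment on `_RAI_TensorParseStringsBlob` in `tensor.c`; `parse_string_offsets` is a hypothetical name, and it assumes every element, including the last, is followed by `\0`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the start offset of every '\0'-terminated
 * element of a STRING tensor blob in offsets (room for tensor_len entries).
 * Returns 0 on success, or -1 if the element count differs from
 * tensor_len or trailing bytes are not terminated. */
int parse_string_offsets(const char *blob, size_t blob_len, size_t tensor_len,
                         uint64_t *offsets) {
    size_t count = 0;
    size_t start = 0;
    for (size_t i = 0; i < blob_len; i++) {
        if (blob[i] == '\0') {
            if (count == tensor_len) {
                return -1; /* more elements than the shape allows */
            }
            offsets[count++] = start;
            start = i + 1; /* next element begins after the delimiter */
        }
    }
    /* Every byte must belong to a terminated element, and counts must match. */
    if (start != blob_len || count != tensor_len) {
        return -1;
    }
    return 0;
}
```

For `"first\x00second\x00third\x00fourth\x00"` the offsets are 0, 6, 13 and 19. A blob such as `"C-string\0followed by a non C-string"` with tensor length 2 is rejected, which is the case exercised in `tests_common.py`.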
84+
5885
## AI.TENSORGET
5986
The **`AI.TENSORGET`** command returns a tensor stored as key's value.
6087

@@ -81,51 +108,69 @@ Depending on the specified reply format:
81108
1. The tensor's shape as an Array consisting of an item per dimension
82109
* **BLOB**: the tensor's binary data as a String. If used together with the **META** option, the binary data string is placed after the metadata in the array reply.
83110
* **VALUES**: Array containing the tensor's elements (numbers, or strings for a `STRING` tensor). If used together with the **META** option, the values array is placed after the metadata in the array reply.
84-
* Default: **META** and **BLOB** are returned by default, in case that non of the arguments above is specified.
111+
* Default: if none of the arguments above is specified, **META** and **BLOB** are returned.
85112

86113

87114
**Examples**
88115

89-
Given a tensor value stored at the 'mytensor' key:
116+
Given tensors stored at the 'my_tensor' and 'my_str_tensor' keys:
90117

91118
```
92-
redis> AI.TENSORSET mytensor FLOAT 2 2 VALUES 1 2 3 4
119+
redis> AI.TENSORSET my_tensor FLOAT 2 2 VALUES 1 2 3 4
120+
OK
121+
redis> AI.TENSORSET my_str_tensor STRING 2 2 VALUES first second third fourth
93122
OK
94123
```
95124

96-
The following shows how to retrieve the tensor's metadata:
125+
The following shows how to retrieve the tensors' metadata:
97126

98127
```
99-
redis> AI.TENSORGET mytensor META
128+
redis> AI.TENSORGET my_tensor META
100129
1) "dtype"
101130
2) "FLOAT"
102131
3) "shape"
132+
4) 1) (integer) 2
133+
2) (integer) 2
134+
135+
redis> AI.TENSORGET my_str_tensor META
136+
1) "dtype"
137+
2) "STRING"
138+
3) "shape"
103139
4) 1) (integer) 2
104140
2) (integer) 2
105141
```
106142

107-
The following shows how to retrieve the tensor's values as an Array:
143+
The following shows how to retrieve the tensors' values as an Array:
108144

109145
```
110-
redis> AI.TENSORGET mytensor VALUES
146+
redis> AI.TENSORGET my_tensor VALUES
111147
1) "1"
112148
2) "2"
113149
3) "3"
114150
4) "4"
151+
152+
redis> AI.TENSORGET my_str_tensor VALUES
153+
1) "first"
154+
2) "second"
155+
3) "third"
156+
4) "fourth"
115157
```
116158

117-
The following shows how to retrieve the tensor's binary data as a String:
159+
The following shows how to retrieve the tensors' binary data as a String:
118160

119161
```
120-
redis> AI.TENSORGET mytensor BLOB
162+
redis> AI.TENSORGET my_tensor BLOB
121163
"\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@"
164+
165+
redis> AI.TENSORGET my_str_tensor BLOB
166+
"first\x00second\x00third\x00fourth\x00"
122167
```
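The FLOAT blob above holds four little-endian IEEE 754 single-precision values: `?` is byte `0x3F` and `@` is `0x40`, so the first four bytes `00 00 80 3F` encode `1.0`. A minimal decoding sketch, assuming the blob is little-endian as in this example and the host `float` is IEEE 754 binary32 (`decode_float32_le` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode n little-endian float32 values from a blob.
 * Bytes are assembled explicitly, so the result does not depend on the
 * host's byte order (it still assumes an IEEE 754 binary32 float). */
void decode_float32_le(const unsigned char *blob, size_t n, float *out) {
    for (size_t i = 0; i < n; i++) {
        const unsigned char *p = blob + 4 * i;
        uint32_t bits = (uint32_t)p[0]
                      | ((uint32_t)p[1] << 8)
                      | ((uint32_t)p[2] << 16)
                      | ((uint32_t)p[3] << 24);
        memcpy(&out[i], &bits, sizeof bits); /* reinterpret the bit pattern */
    }
}
```

Applied to the 16 bytes returned for `my_tensor`, it yields 1, 2, 3 and 4, matching the `VALUES` reply.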
123168

124169

125-
The following shows how the combine the retrieval of the tensor's metadata, and the tensor's values as an Array:
170+
The following shows how to combine the retrieval of the tensors' metadata and their values as an Array:
126171

127172
```
128-
redis> AI.TENSORGET mytensor META VALUES
173+
redis> AI.TENSORGET my_tensor META VALUES
129174
1) "dtype"
130175
2) "FLOAT"
131176
3) "shape"
@@ -136,19 +181,40 @@ redis> AI.TENSORGET mytensor META VALUES
136181
2) "2"
137182
3) "3"
138183
4) "4"
184+
185+
redis> AI.TENSORGET my_str_tensor META VALUES
186+
1) "dtype"
187+
2) "STRING"
188+
3) "shape"
189+
4) 1) (integer) 2
190+
2) (integer) 2
191+
5) "values"
192+
6) 1) "first"
193+
2) "second"
194+
3) "third"
195+
4) "fourth"
139196
```
140197

141-
The following shows how the combine the retrieval of the tensor's metadata, and binary data as a String:
198+
The following shows how to combine the retrieval of the tensors' metadata and their binary data as a String:
142199

143200
```
144-
redis> AI.TENSORGET mytensor META BLOB
201+
redis> AI.TENSORGET my_tensor META BLOB
145202
1) "dtype"
146203
2) "FLOAT"
147204
3) "shape"
148205
4) 1) (integer) 2
149206
2) (integer) 2
150207
5) "blob"
151208
6) "\x00\x00\x80?\x00\x00\x00@\x00\x00@@\x00\x00\x80@"
209+
210+
redis> AI.TENSORGET my_str_tensor META BLOB
211+
1) "dtype"
212+
2) "STRING"
213+
3) "shape"
214+
4) 1) (integer) 2
215+
2) (integer) 2
216+
5) "blob"
217+
6) "first\x00second\x00third\x00fourth\x00"
152218
```
153219

154220
!!! important "Using `BLOB` is preferable to `VALUES`"

src/redis_ai_objects/tensor.c

Lines changed: 26 additions & 0 deletions
@@ -122,6 +122,21 @@ static int _RAI_TensorFillWithValues(int argc, RedisModuleString **argv, RAI_Ten
122122
return REDISMODULE_OK;
123123
}
124124

125+
static int _RAI_TensorParseBooleansBlob(const char *tensor_blob, size_t blob_len, size_t tensor_len,
126+
RAI_Error *err) {
127+
128+
// if we encounter a non-boolean value - return an error
129+
for (size_t i = 0; i < blob_len; i++) {
130+
if (tensor_blob[i] != 0 && tensor_blob[i] != 1) {
131+
if (err) {
132+
RAI_SetError(err, RAI_ETENSORSET, "ERR BOOL tensor elements must be 0 or 1");
133+
}
134+
return REDISMODULE_ERR;
135+
}
136+
}
137+
return REDISMODULE_OK;
138+
}
139+
125140
// This will populate the offsets array with the start position of every string element in the blob
126141
static int _RAI_TensorParseStringsBlob(const char *tensor_blob, size_t blob_len, size_t tensor_len,
127142
uint64_t *offsets, RAI_Error *err) {
@@ -300,6 +315,13 @@ RAI_Tensor *RAI_TensorCreateFromBlob(DLDataType data_type, const size_t *dims, i
300315
"ERR data length does not match tensor shape and type");
301316
return NULL;
302317
}
318+
if (data_type.code == kDLBool) {
319+
if (_RAI_TensorParseBooleansBlob(tensor_blob, blob_len, tensor_len, err) !=
320+
REDISMODULE_OK) {
321+
RAI_TensorFree(new_tensor);
322+
return NULL;
323+
}
324+
}
303325
}
304326

305327
// Copy the blob. We must copy instead of increasing the ref count since we don't have
@@ -656,6 +678,10 @@ int RAI_TensorSetData(RAI_Tensor *t, const char *data, size_t len) {
656678
}
657679
RedisModule_Free(RAI_TensorData(t));
658680
t->tensor.dl_tensor.data = RedisModule_Alloc(len);
681+
} else if (data_type.code == kDLBool) {
682+
if (_RAI_TensorParseBooleansBlob(data, len, RAI_TensorLength(t), NULL) != REDISMODULE_OK) {
683+
return 0;
684+
}
659685
}
660686
memcpy(RAI_TensorData(t), data, len);
661687
t->blobSize = len;
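For reference, the BOOL check can be exercised outside RedisAI with a standalone sketch like the one below (`bool_blob_valid` is a hypothetical name, not a RedisAI function). Its loop runs over all `blob_len` bytes, so a blob whose last byte is invalid, such as `"\x01\x02"`, is rejected, and an empty blob does not underflow the loop bound.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone version of the BOOL blob check: returns 1 if
 * every byte in the blob is 0 or 1, and 0 otherwise. All blob_len bytes
 * are visited, including the last one. */
int bool_blob_valid(const char *blob, size_t blob_len) {
    for (size_t i = 0; i < blob_len; i++) {
        if (blob[i] != 0 && blob[i] != 1) {
            return 0;
        }
    }
    return 1;
}
```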

tests/flow/tests_common.py

Lines changed: 4 additions & 0 deletions
@@ -119,6 +119,10 @@ def test_common_tensorset_error_replies(env):
119119
check_error_message(env, con, "Number of string elements in data blob does not match tensor length",
120120
'AI.TENSORSET', 'z{0}', 'STRING', 2, 'BLOB', 'C-string\0followed by a non C-string')
121121

122+
# ERR in bool tensor blob - element is not 0/1
123+
check_error_message(env, con, "BOOL tensor elements must be 0 or 1",
124+
'AI.TENSORSET', 'z{0}', 'BOOL', 2, 'BLOB', "\x02\x01")
125+
122126

123127
def test_common_tensorget(env):
124128
con = get_connection(env, '{0}')

tests/module/LLAPI.c

Lines changed: 9 additions & 1 deletion
@@ -313,13 +313,21 @@ int RAI_llapi_CreateTensor(RedisModuleCtx *ctx, RedisModuleString **argv, int ar
313313

314314
// create an empty tensor and validate that it contains zeros
315315
t = RedisAI_TensorCreate("INT8", dims, n_dims);
316-
int8_t expected_blob[8] = {0};
316+
int8_t expected_blob[4] = {0};
317317
if (t == NULL || RedisAI_TensorLength(t) != dims[0] * dims[1] ||
318318
memcmp(RedisAI_TensorData(t), expected_blob, 4) != 0) {
319319
return RedisModule_ReplyWithSimpleString(ctx, "empty tensor create test failed");
320320
}
321321
RedisAI_TensorFree(t);
322322

323+
// create an invalid bool tensor
324+
t = RedisAI_TensorCreate("BOOL", dims, n_dims);
325+
uint8_t data_blob[4] = {2, 0, 0, 0}; // This value is invalid for bool type
326+
// RedisAI_TensorSetData is expected to reject this blob and return 0
if (RedisAI_TensorSetData(t, (const char *)data_blob, 4) != 0) {
327+
return RedisModule_ReplyWithSimpleString(ctx, "invalid bool tensor data set test failed");
328+
}
329+
RedisAI_TensorFree(t);
330+
323331
// This should fail since the blob contains only one null-terminated string, while the tensor's
324332
// len should be 4.
325333
RAI_Tensor *t1 = RedisAI_TensorCreate("STRING", dims, n_dims);
