Commit 1daeff5

Document parse_array primitive and CSV reduce workflow
- add parse_array to primitives reference and examples
- document parse_array->reduce chain for serialized arrays
- include delimiter examples for pipe and newline
1 parent cfdff1b commit 1daeff5

1 file changed

Lines changed: 30 additions & 0 deletions

README.md

@@ -135,6 +135,7 @@ All settings are provided in the operation dict for rule serialization.
| `normalize_boolean` | Normalize truthy/falsy values to booleans. | `truthy` (list, optional; defaults below)<br>`falsy` (list, optional; defaults below)<br>`strict` (bool, default `true`)<br>`default` (optional; used when `strict=false`) |
| `normalize_text` | Apply a single text normalization. | `normalization` (`strip`, `lower`, `upper`, `remove_accents`, `remove_punctuation`, `remove_special_characters`) |
| `offset` | Add an offset to numeric values. | `offset` (number) |
| `parse_array` | Parse array-like values into a list for downstream operations. | `format` (`json` (default) or `delimiter`)<br>`delimiter` (string; used for `delimiter` format, default `\|`, supports `\\n` for newline)<br>`item_type` (`auto`, `string`, `integer`, `float`, `boolean`)<br>`strict` (bool, default `true`)<br>`default` (optional; used when `strict=false`)<br>`allow_singleton` (bool, default `false`) |
| `reduce` | Reduce a list of values to one value. | `reduction` (`any`, `none`, `all`, `one-hot`, `sum`); expects a list/tuple input; one-hot returns index or None |
| `round` | Round numeric values to a given precision. | `precision` (int, >=0); uses Python `round` semantics |
| `scale` | Multiply numeric values by a factor. | `scaling_factor` (number) |
@@ -162,6 +163,7 @@ Each operation is represented by a JSON-friendly dict. Examples:
| `normalize_boolean` | `{"operation":"normalize_boolean","truthy":["yes","y","1"],"falsy":["no","n","0"],"strict":true}` |
| `normalize_text` | `{"operation":"normalize_text","normalization":"lower"}` |
| `offset` | `{"operation":"offset","offset":2.5}` |
| `parse_array` | `{"operation":"parse_array","format":"json","item_type":"integer","strict":true}` |
| `reduce` | `{"operation":"reduce","reduction":"one-hot"}` |
| `round` | `{"operation":"round","precision":2}` |
| `scale` | `{"operation":"scale","scaling_factor":0.453592}` |
@@ -176,3 +178,31 @@ If you use the `normalize_boolean` primitive without specifying `truthy` or

- truthy: `["true", "t", "yes", "y", "1", 1, true, "on"]`
- falsy: `["false", "f", "no", "n", "0", 0, false, "off", ""]`

### ParseArray + Reduce for CSV data

When arrays are serialized as text in a CSV cell (for example `"[8,8,8,8,6]"` or
`"8|8|8|8|6"`), chain `parse_array` before `reduce` so that `reduce` receives a list rather than a string:

```json
{
  "source": "week_hours",
  "target": "total_hours",
  "operations": [
    {"operation": "parse_array", "format": "json", "item_type": "integer", "strict": true},
    {"operation": "reduce", "reduction": "sum"}
  ]
}
```
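
To make the data flow concrete, here is a minimal Python sketch of what this chain computes for one CSV row. It is not the library's implementation: `apply_rule`, `parse_array_json`, `reduce_values`, and `HANDLERS` are hypothetical stand-ins, and only the `json` format and the `sum` reduction are modeled.

```python
import json

# Hypothetical casts for a subset of the documented item_type values.
CASTS = {"string": str, "integer": int, "float": float}


def parse_array_json(value, item_type="integer", **_ignored):
    """Stand-in for parse_array with format="json": text in, list out."""
    items = json.loads(value)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    return [CASTS[item_type](item) for item in items]


def reduce_values(values, reduction="sum", **_ignored):
    """Stand-in for reduce; only the "sum" reduction is sketched here."""
    if reduction != "sum":
        raise NotImplementedError(reduction)
    return sum(values)


HANDLERS = {"parse_array": parse_array_json, "reduce": reduce_values}


def apply_rule(rule, row):
    """Feed the source value through each operation in order."""
    value = row[rule["source"]]
    for op in rule["operations"]:
        params = {k: v for k, v in op.items() if k != "operation"}
        value = HANDLERS[op["operation"]](value, **params)
    return {rule["target"]: value}


RULE = {
    "source": "week_hours",
    "target": "total_hours",
    "operations": [
        {"operation": "parse_array", "format": "json", "item_type": "integer", "strict": True},
        {"operation": "reduce", "reduction": "sum"},
    ],
}

print(apply_rule(RULE, {"week_hours": "[8,8,8,8,6]"}))  # {'total_hours': 38}
```

Without the `parse_array` step, the `reduce` stand-in would receive the raw string `"[8,8,8,8,6]"` and fail, which is the situation the chain is meant to avoid.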

For pipe-delimited input such as `"8|8|8|8|6"`, use this `parse_array` step instead:

```json
{"operation": "parse_array", "format": "delimiter", "delimiter": "|", "item_type": "integer"}
```
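
As a rough illustration of the delimiter format, the sketch below splits a cell on the delimiter and casts each item. `parse_delimited` is a hypothetical helper, not the library's API. In particular, the way `strict=False` falls back to `default` for the whole value is an assumption; the parameter table only states that `default` is used when `strict=false`.

```python
# Hypothetical casts for a subset of the documented item_type values.
CASTS = {"string": str, "integer": int, "float": float}


def parse_delimited(value, delimiter="|", item_type="integer", strict=True, default=None):
    """Split a delimited string and cast every item to item_type."""
    parts = [part.strip() for part in value.split(delimiter)]
    try:
        return [CASTS[item_type](part) for part in parts]
    except ValueError:
        if strict:
            raise  # strict mode: surface the bad item to the caller
        return default  # assumed fallback: replace the whole value


print(parse_delimited("8|8|8|8|6"))  # [8, 8, 8, 8, 6]
print(parse_delimited("8|n/a|6", strict=False, default=[]))  # []
```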

For newline-separated input, use:

```json
{"operation": "parse_array", "format": "delimiter", "delimiter": "\\n", "item_type": "integer"}
```
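
One detail worth spelling out: in JSON, `"\\n"` decodes to two characters (a backslash followed by `n`), not to a newline character. The sketch below shows that decoding and one way the escape could be mapped to a real newline before splitting. `resolve_delimiter` is a hypothetical helper, and the assumption that `parse_array` performs this translation rests only on the table's note that the delimiter "supports `\\n` for newline".

```python
import json

# The operation exactly as written in the rule file (raw string keeps the backslashes).
op = json.loads(
    r'{"operation": "parse_array", "format": "delimiter", "delimiter": "\\n", "item_type": "integer"}'
)
delimiter = op["delimiter"]
print(len(delimiter))  # 2: a backslash and the letter n


def resolve_delimiter(delimiter):
    """Assumed translation of the escaped form into a real newline."""
    return "\n" if delimiter == "\\n" else delimiter


cell = "8\n8\n8\n8\n6"  # a CSV cell holding one value per line
hours = [int(part) for part in cell.split(resolve_delimiter(delimiter))]
print(hours)  # [8, 8, 8, 8, 6]
```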
