Skip to content
203 changes: 203 additions & 0 deletions doc/specs/2026-05-19-insert-operation-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
# Insert Operation for Block Sequences

**Date:** 2026-05-19
**Issue:** #24

## Problem

There is no way to insert an item at a specific position in a YAML sequence while preserving comments on existing items. The only options are `append` (end only) or `replace` on the entire list (loses internal comments).

Use case: in usethis, pre-commit hooks have a defined ordering. Adding a new hook repo at a specific position requires inserting between existing repos.

## Approach

Direct string surgery in yamltrip's Rust layer (`document.rs`), following the same pattern as `apply_complex_replace`. The operation is intercepted in `apply_patches_impl` before reaching yamlpatch, since `yamlpatch::Op` has no `InsertAt` variant and we cannot upstream to yamlpatch.

## Python API

### Document (immutable)

```python
doc = doc.insert('repos', index=1, value={'repo': 'https://...', 'hooks': [...]})
doc = doc.insert('repos', index=-1, value='before_last') # negative index
doc = doc.insert('repos', index=0, value='prepend') # prepend
```

**Signature:** `insert(*keys: KeyPart, index: int, value: Any) -> Document`

### Editor (mutable)

```python
with Editor('file.yaml') as ed:
ed.insert('repos', index=1, value=new_repo)
```

**Signature:** `insert(*keys: KeyPart, index: int, value: Any) -> None`

### Index Semantics

Follows Python `list.insert()`:
- Non-negative indices insert before the item at that position.
- Negative indices count from the end: `-1` means before the last item, `-2` before the second-to-last, etc.
- Out-of-range indices clamp: `index >= len` acts like append, `index <= -len` acts like prepend.

## Rust Layer

### The Problem with `yamlpatch::Op`

`yamlpatch::Op` is an external enum we cannot extend. We need a way to represent `InsertAt` in the patch pipeline.

### Solution: Extend `PyOp` with a local variant

Introduce an internal enum in `ops.rs` that wraps both yamlpatch ops and local-only ops:

```rust
/// Operations that yamltrip handles locally (not in yamlpatch).
#[derive(Clone, Debug)]
pub enum LocalOp {
InsertAt { index: i64, value: serde_yaml::Value },
}

/// The inner representation of a PyOp.
#[derive(Clone, Debug)]
pub enum OpInner {
Yamlpatch(yamlpatch::Op<'static>),
Local(LocalOp),
}
```

`PyOp` changes from holding `yamlpatch::Op<'static>` directly to holding `OpInner`. The existing `#[staticmethod]` constructors (`replace`, `add`, `remove`, `append`, etc.) wrap in `OpInner::Yamlpatch(...)`. A new `insert_at` constructor wraps in `OpInner::Local(LocalOp::InsertAt { ... })`.

### Patch Routing in `apply_patches_impl`

The existing logic batches yamlpatch ops and flushes them together. `InsertAt` (and any future local ops) flush the batch first, then execute via dedicated string surgery — identical to the complex-replace pattern:

```
for each patch:
if patch is Local(InsertAt { index, value }):
flush yamlpatch batch
apply_insert_at(&current_doc, &route, index, &value)
else:
add to yamlpatch batch
flush remaining batch
```

### `Op.insert_at` Python-facing constructor

```python
op = Op.insert_at(index=1, value={'repo': '...'})
```

### `kind` getter

Returns `"insert_at"` for the new variant.

## Algorithm: `apply_insert_at`

Input: `yamlpath::Document`, `yamlpath::Route` (pointing to the sequence), `index: i64`, `serde_yaml::Value`.

1. **Query the sequence feature** — `document.query_exact(&route)` → verify `FeatureKind::BlockSequence`. Error if not found, or if it's a flow sequence or non-sequence.

2. **Count items** — Iterate `route + [0]`, `route + [1]`, ... until `query_exists` returns false. Let `len` = item count.

3. **Resolve index (Python semantics):**
```
if index < 0:
resolved = max(0, len + index)
else:
resolved = min(index, len)
```

4. **Delegate to append if `resolved == len`:**
Build a `yamlpatch::Op::Append { value }` patch and apply via yamlpatch. This reuses existing append logic including its indentation handling.

5. **Locate insertion byte position:**
- Query `route + [resolved]` with `query_exact` to get the item feature.
- The insertion point is the byte offset of the item's `- ` dash (accounting for leading whitespace on its line).
- Walk backward from `feature.location.byte_span.0` to the start of the line to include the leading whitespace/indentation.

6. **Determine indentation:**
- Extract the indentation string from the target item's line (characters from line start to `- `).
- This becomes the indentation for the new item.

7. **Serialize new value as a sequence item:**
- Scalar: `- value\n`
- Mapping: `- key1: val1\n key2: val2\n...` (indented relative to the dash)
- Sequence: `- - item1\n - item2\n...` (nested sequence)
- Use `serde_yaml::to_string(&value)` then reformat as a sequence item with correct indentation.

8. **String splice** — Insert the formatted item text at the computed byte position (before the target item's line start).

9. **Re-parse** — `yamlpath::Document::new(patched_source)` to validate.

### Serialization Detail

The new item text must be a complete block sequence entry. Given base indent `I` (spaces before `-`):

- For a scalar value `v`: `"{I}- {v}\n"`
- For a mapping `{k1: v1, k2: v2}`:
```
{I}- k1: v1
{I} k2: v2
```
- For a sequence `[a, b]`:
```
{I}- - a
{I} - b
```

Use `serde_yaml::to_string` for the value portion, then prefix the first line with `{I}- ` and subsequent lines with `{I} ` (indent = base + 2).

## Error Cases

| Condition | Error |
|-----------|-------|
| Path doesn't exist | `PatchError("Path not found: ...")` |
| Target is not a sequence | `PatchError("Value at ... is not a sequence")` |
| Target is a flow sequence | `PatchError("insert requires a block sequence; flow sequences are not supported")` |

No `IndexError` is raised — indices always clamp (matching `list.insert()` behavior).

## Design Decisions

**Leading comments on sequence items are not tracked by insertion.**

When a sequence item has comment lines immediately above it (e.g. `# description` before `- value`), `query_exact` for that item returns only the value's span — the comment is not part of the feature location. The insertion point is computed as the start of the item's `- ` line, so a new item inserted before index N will appear *between* any leading comments and the item at index N. The comments visually become associated with the newly inserted item.

This is by design because:
- A comment above a block sequence item may describe the entire block (the sequence as a whole, or a group of items), not just the item on the next line. There is no way to determine intent from syntax alone.
- yamlpath provides no mechanism to determine which comments "belong to" a given sequence item (comments are free-floating in YAML)
- The existing behavior is predictable: insertion always occurs at the `- ` line boundary

Users who need to preserve comment–item associations should structure their insertions to avoid splitting comments from their items (e.g. insert *after* the commented item rather than before it).

## Files Changed

| File | Change |
|------|--------|
| `src/ops.rs` | Add `LocalOp` enum, `OpInner` wrapper, `insert_at` constructor, update `kind`/`__repr__` |
| `src/document.rs` | Add `apply_insert_at` function, update `apply_patches_impl` routing |
| `src/yamltrip/document.py` | Add `Document.insert()` method |
| `src/yamltrip/editor.py` | Add `Editor.insert()` method |
| `src/yamltrip/_core.pyi` | Add type stub for `Op.insert_at` |
| `tests/test_core_ops.py` | Tests for `Op.insert_at` construction |
| `tests/test_document.py` | Tests for `Document.insert()` |
| `tests/test_editor.py` | Tests for `Editor.insert()` |

## Test Cases

1. Insert at beginning (`index=0`) of multi-item list
2. Insert in middle (`index=1` of 3-item list)
3. Insert at end (`index=len`) — should produce same result as append
4. Negative index (`index=-1` inserts before last)
5. Negative out-of-range (`index=-100` prepends)
6. Positive out-of-range (`index=100` appends)
7. Complex value (dict with nested structure)
8. Comment preservation — comments remain in the document (note: leading comments above an item may visually shift to the inserted item; see Known Limitations)
9. Empty sequence (`index=0`)
10. Single-item sequence (insert before and after)
11. Value with special YAML characters (strings needing quoting)
12. Nested sequence insertion (list within list)
13. Target is not a sequence → error
14. Target is a flow sequence → error
15. Path doesn't exist → error
153 changes: 131 additions & 22 deletions src/document.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
use pyo3::prelude::*;

use crate::ops::PyPatch;
use crate::ops::{LocalOp, OpInner, PyPatch};
use crate::types::{PyFeature, PyFeatureKind, PyLocation, PyRoute};

/// A parsed YAML document.
Expand Down Expand Up @@ -161,38 +161,47 @@ pub(crate) fn apply_patches_impl(
let mut batch: Vec<usize> = Vec::new();

for (idx, patch) in patches.iter().enumerate() {
let is_complex_replace = matches!(
&patch.operation.inner,
yamlpatch::Op::Replace(v) if matches!(v, serde_yaml::Value::Mapping(_) | serde_yaml::Value::Sequence(_))
);
let needs_direct_handling = match &patch.operation.inner {
OpInner::Yamlpatch(yamlpatch::Op::Replace(v)) => {
matches!(
v,
serde_yaml::Value::Mapping(_) | serde_yaml::Value::Sequence(_)
)
}
OpInner::Local(_) => true,
_ => false,
};

if is_complex_replace {
if needs_direct_handling {
// Flush any pending yamlpatch batch first
if !batch.is_empty() {
let yaml_patches: Vec<yamlpatch::Patch<'_>> = batch
.iter()
.map(|&i| yamlpatch::Patch {
route: patches[i].route.to_yamlpath_route(),
operation: patches[i].operation.inner.clone(),
.filter_map(|&i| {
patches[i]
.operation
.as_yamlpatch_op()
.map(|op| yamlpatch::Patch {
route: patches[i].route.to_yamlpath_route(),
operation: op.clone(),
})
})
.collect();
current_doc = yamlpatch::apply_yaml_patches(&current_doc, &yaml_patches)
.map_err(|e| e.to_string())?;
batch.clear();
}

// Apply the complex replace directly.
// NOTE: This re-parses the document for each complex replace (O(N) parses).
// This is consistent with yamlpatch::apply_yaml_patches, which also
// re-parses per patch via apply_single_patch. A bottom-up single-pass
// approach could avoid this, but isn't warranted until profiling shows
// it matters.
let route = patch.route.to_yamlpath_route();
let value = match &patch.operation.inner {
yamlpatch::Op::Replace(v) => v,
match &patch.operation.inner {
OpInner::Yamlpatch(yamlpatch::Op::Replace(value)) => {
current_doc = apply_complex_replace(&current_doc, &route, value)?;
}
OpInner::Local(LocalOp::InsertAt { index, value }) => {
current_doc = apply_insert_at(&current_doc, &route, *index, value)?;
}
_ => unreachable!(),
};
current_doc = apply_complex_replace(&current_doc, &route, value)?;
}
} else {
batch.push(idx);
}
Expand All @@ -202,9 +211,14 @@ pub(crate) fn apply_patches_impl(
if !batch.is_empty() {
let yaml_patches: Vec<yamlpatch::Patch<'_>> = batch
.iter()
.map(|&i| yamlpatch::Patch {
route: patches[i].route.to_yamlpath_route(),
operation: patches[i].operation.inner.clone(),
.filter_map(|&i| {
patches[i]
.operation
.as_yamlpatch_op()
.map(|op| yamlpatch::Patch {
route: patches[i].route.to_yamlpath_route(),
operation: op.clone(),
})
})
.collect();
current_doc = yamlpatch::apply_yaml_patches(&current_doc, &yaml_patches)
Expand All @@ -214,6 +228,101 @@ pub(crate) fn apply_patches_impl(
Ok(current_doc)
}

fn apply_insert_at(
doc: &yamlpath::Document,
route: &yamlpath::Route<'_>,
index: i64,
value: &serde_yaml::Value,
) -> Result<yamlpath::Document, String> {
let source = doc.source();

// 1. Query the sequence feature and verify it's a block sequence
let feature = doc
.query_exact(route)
.map_err(|e| format!("Query failed: {e}"))?
.ok_or_else(|| "insert_at: sequence not found at route".to_string())?;

if feature.kind() != yamlpath::FeatureKind::BlockSequence {
return Err(format!(
"insert_at: expected BlockSequence, got {:?}",
feature.kind()
));
}

// 2. Count items in the sequence
let mut len: i64 = 0;
loop {
let item_route = route.with_key(yamlpath::Component::Index(len as usize));
if !doc.query_exists(&item_route) {
break;
}
len += 1;
}

// 3. Resolve index with Python list.insert semantics
let resolved = if index < 0 {
0i64.max(len + index)
} else {
index.min(len)
} as usize;

// 4. Delegate to append if inserting at the end
if resolved as i64 == len {
let patch = yamlpatch::Patch {
route: route.clone(),
operation: yamlpatch::Op::Append {
value: value.clone(),
},
};
return yamlpatch::apply_yaml_patches(doc, &[patch]).map_err(|e| e.to_string());
}

// 5. Locate insertion byte position
let item_route = route.with_key(yamlpath::Component::Index(resolved));
let item_feature = doc
.query_exact(&item_route)
.map_err(|e| format!("Query failed: {e}"))?
.ok_or_else(|| format!("insert_at: item at index {resolved} not found"))?;

let item_start = item_feature.location.byte_span.0;
let line_start = source[..item_start]
.rfind('\n')
.map(|nl| nl + 1)
.unwrap_or(0);
Comment thread
nathanjmcdougall marked this conversation as resolved.

// 6. Determine indentation from the existing item's line prefix
let prefix = &source[line_start..item_start];
let dash_pos = prefix.find('-').unwrap_or(prefix.len());
let base_indent = &prefix[..dash_pos];

// 7. Serialize new value as a sequence item
let serialized =
serde_yaml::to_string(value).map_err(|e| format!("Failed to serialize YAML: {e}"))?;
let trimmed = serialized.trim_end_matches('\n');

let mut item_text = String::new();
for (i, line) in trimmed.lines().enumerate() {
if i == 0 {
item_text.push_str(base_indent);
item_text.push_str("- ");
item_text.push_str(line);
} else {
item_text.push('\n');
item_text.push_str(base_indent);
item_text.push_str(" ");
item_text.push_str(line);
}
}
item_text.push('\n');

// 8. String splice — insert before the target item's line
let mut result = source.to_string();
result.insert_str(line_start, &item_text);

// 9. Re-parse to validate
yamlpath::Document::new(result).map_err(|e| format!("Failed to re-parse YAML: {e}"))
}

fn apply_complex_replace(
doc: &yamlpath::Document,
route: &yamlpath::Route<'_>,
Expand Down
Loading
Loading