Commit 87d3628

Add RFC-0201 implementation plan and two mission files
- docs/plans/: RFC-0201 Phase 2a-2e (BYTEA core) and Phase 2f (DFP/BigInt) design
- missions/open/rfc-0201-phase-2a-2b-2c-2e-bytea-core.md: BYTEA core blob type mission
- missions/open/rfc-0201-phase-2f-dfp-bigint.md: DFP/BigInt dispatcher integration mission

Key design decisions:

- Value::Blob as first-class variant (not Extension), wire tag 12
- compare_blob with BlobOrdering (bytes-first, length tiebreaker)
- SchemaColumn.blob_length for BYTEA(N) constraint
- DFP wire tag 13 (RFC-0104), BigInt wire tag 14 (RFC-0110)
- NUMERIC_SPEC_VERSION bump to 2 after BigInt
1 parent 50029b8 commit 87d3628

3 files changed

Lines changed: 629 additions & 0 deletions

Lines changed: 341 additions & 0 deletions
# Plan: RFC-0201 Blob Implementation Missions

## Context

RFC-0201 (Binary BLOB Type for Deterministic Hash Storage) has been moved to Accepted status in the CipherOcto repository. The spec defines native BYTEA/BLOB support for cryptographic hash storage (SHA256, HMAC-SHA256). Implementation must happen in the **stoolap** codebase (external dependency at `github.com:CipherOcto/stoolap`, branch `feat/blockchain-sql`).

Two separate missions are needed:

- **Mission A** (Phase 2a/2b/2c/2e): Core Blob (parser, DataType, Value, serialization, comparison, projection)
- **Mission B** (Phase 2f): DFP/BigInt wire format integration

---

## Mission A: RFC-0201 Phase 2a/2b/2c/2e (BYTEA Core Blob Type)

### 1. DataType Enum (`src/core/types.rs`)

Add `Blob = 10` as the next free variant:

```rust
/// Binary large object for cryptographic hashes and binary data
Blob = 10,
```

Update `FromStr` to parse BYTEA/BLOB/BINARY/VARBINARY:

```rust
"BYTEA" | "BLOB" | "BINARY" | "VARBINARY" => Ok(DataType::Blob),
```

Update `is_numeric` → no change (Blob is not numeric). Update `is_orderable` → keep `DataType::Blob` out of its exclusion list, so it stays `!matches!(..., DataType::Json | DataType::Vector)`: Blob IS orderable via byte comparison.

**Note**: `DataType::as_u8()` follows directly from `#[repr(u8)]` (a plain `self as u8` cast). `from_u8()` is not generated by `#[repr(u8)]`; if it is written as a per-variant match, it needs an explicit `10 => DataType::Blob` arm.
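
The DataType changes can be checked in isolation. The sketch below uses a trimmed-down, hypothetical `DataType`: only `Blob = 10` comes from this plan, while the other variants and their discriminants are placeholders. It shows the parse arm, the orderability rule, and the explicit `from_u8` arm.

```rust
use std::str::FromStr;

// Hypothetical, trimmed-down DataType. Only Blob = 10 is from the plan;
// the other variants and discriminants are placeholders.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Integer = 1,
    Text = 3,
    Json = 8,
    Vector = 9,
    Blob = 10,
}

impl DataType {
    pub fn as_u8(self) -> u8 {
        // Free with #[repr(u8)]: the cast yields the discriminant.
        self as u8
    }

    pub fn from_u8(tag: u8) -> Option<Self> {
        // The reverse direction is not derived; each variant needs an arm.
        match tag {
            1 => Some(DataType::Integer),
            3 => Some(DataType::Text),
            8 => Some(DataType::Json),
            9 => Some(DataType::Vector),
            10 => Some(DataType::Blob),
            _ => None,
        }
    }

    pub fn is_numeric(self) -> bool {
        matches!(self, DataType::Integer)
    }

    pub fn is_orderable(self) -> bool {
        // Blob is deliberately absent: it orders bytewise.
        !matches!(self, DataType::Json | DataType::Vector)
    }
}

impl FromStr for DataType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_uppercase().as_str() {
            "INTEGER" | "INT" => Ok(DataType::Integer),
            "TEXT" => Ok(DataType::Text),
            "JSON" => Ok(DataType::Json),
            "VECTOR" => Ok(DataType::Vector),
            "BYTEA" | "BLOB" | "BINARY" | "VARBINARY" => Ok(DataType::Blob),
            other => Err(format!("unknown data type: {other}")),
        }
    }
}
```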

### 2. SchemaColumn Extension (`src/core/schema.rs`)

Add `blob_length: Option<u32>` to `SchemaColumn`:

```rust
/// Fixed length for BLOB columns (None = variable length)
pub blob_length: Option<u32>,
```

Initialize to `None` in all constructors. Add builder method:

```rust
pub fn with_blob_length(mut self, len: u32) -> Self {
    self.blob_length = Some(len);
    self
}
```
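
`blob_length` only records the declared N; something still has to enforce it when a value is written. The following is a minimal sketch, assuming BYTEA(N) means exactly N bytes (the field comment says "fixed length"). It uses a hypothetical `check_blob` helper on a stripped-down `SchemaColumn`.

```rust
/// Hypothetical, stripped-down SchemaColumn with only the fields this sketch needs.
#[derive(Debug, Clone)]
pub struct SchemaColumn {
    pub name: String,
    /// Fixed length for BLOB columns (None = variable length)
    pub blob_length: Option<u32>,
}

impl SchemaColumn {
    pub fn new(name: &str) -> Self {
        SchemaColumn { name: name.to_string(), blob_length: None }
    }

    pub fn with_blob_length(mut self, len: u32) -> Self {
        self.blob_length = Some(len);
        self
    }

    /// Assumed semantics: BYTEA(N) requires exactly N bytes; None accepts any length.
    pub fn check_blob(&self, data: &[u8]) -> Result<(), String> {
        match self.blob_length {
            Some(n) if data.len() != n as usize => Err(format!(
                "column {}: expected {} bytes, got {}",
                self.name,
                n,
                data.len()
            )),
            _ => Ok(()),
        }
    }
}
```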

### 3. Value::Blob Variant (`src/core/value.rs`)

Add a first-class Blob variant (NOT an Extension):

```rust
/// Binary large object, stored as CompactArc<[u8]> for zero-copy sharing.
/// INVARIANT: The Arc is always heap-allocated; there is no inline/blob case.
Blob(CompactArc<[u8]>),
```

**Remove** the comment at line 68 of `src/core/value.rs` that mentions "Blob" as a future Extension type.

### 4. Blob Constructors in Value

```rust
impl Value {
    /// Create a Blob from a byte slice (copies into CompactArc)
    pub fn blob(data: &[u8]) -> Self {
        Value::Blob(CompactArc::from(data))
    }

    /// Create a Blob from an owned Vec (takes ownership of the Vec; the
    /// conversion may still move the bytes into a new allocation that also
    /// holds the reference count, as it does for std's Arc<[u8]>)
    pub fn blob_from_vec(data: Vec<u8>) -> Self {
        Value::Blob(CompactArc::from(data))
    }

    /// Create a Blob from a CompactArc (zero-copy)
    pub fn blob_from_arc(data: CompactArc<[u8]>) -> Self {
        Value::Blob(data)
    }

    /// Extract blob data as byte slice
    pub fn as_blob(&self) -> Option<&[u8]> {
        match self {
            Value::Blob(data) => Some(data),
            _ => None,
        }
    }
}
```
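
The three constructors differ only in how the bytes reach the shared allocation. This sketch substitutes `std::sync::Arc<[u8]>` for stoolap's `CompactArc<[u8]>` so that it runs on its own; the plan does not show `CompactArc`'s API. `Arc::strong_count` makes the zero-copy sharing of `blob_from_arc` visible.

```rust
use std::sync::Arc;

// Stand-in for the real Value enum; Arc<[u8]> replaces CompactArc<[u8]>.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Blob(Arc<[u8]>),
}

impl Value {
    /// Copies the slice into a fresh shared allocation.
    pub fn blob(data: &[u8]) -> Self {
        Value::Blob(Arc::from(data))
    }

    /// Consumes the Vec; Arc<[u8]>::from(Vec<u8>) moves the bytes into a new
    /// allocation that also stores the reference counts.
    pub fn blob_from_vec(data: Vec<u8>) -> Self {
        Value::Blob(Arc::from(data))
    }

    /// Shares an existing allocation; only the reference count changes.
    pub fn blob_from_arc(data: Arc<[u8]>) -> Self {
        Value::Blob(data)
    }

    pub fn as_blob(&self) -> Option<&[u8]> {
        match self {
            Value::Blob(data) => Some(&data[..]),
            _ => None,
        }
    }
}
```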

### 5. compare_blob and BlobOrdering (`src/core/value.rs`)

Per the Comparison Semantics section of RFC-0201:

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum BlobOrdering {
    Less,
    Equal,
    Greater,
}

/// Compare two blobs byte-by-byte in deterministic order
///
/// Algorithm:
/// 1. Compare bytes in ascending index order until a difference is found
/// 2. If all compared bytes are equal, compare lengths (shorter = less)
///
/// Determinism: This ordering is canonical and reproducible.
fn compare_blob(a: &[u8], b: &[u8]) -> BlobOrdering {
    // zip stops after min(a.len(), b.len()) bytes; iterating instead of
    // indexing also avoids clippy's needless_range_loop under -D warnings.
    for (x, y) in a.iter().zip(b) {
        match x.cmp(y) {
            Ordering::Less => return BlobOrdering::Less,
            Ordering::Greater => return BlobOrdering::Greater,
            Ordering::Equal => {}
        }
    }
    match a.len().cmp(&b.len()) {
        Ordering::Less => BlobOrdering::Less,
        Ordering::Greater => BlobOrdering::Greater,
        Ordering::Equal => BlobOrdering::Equal,
    }
}
```

**Important**: `BlobOrdering` is NOT `Ordering`; the RFC intentionally uses a separate type. The derived `Ord` on `BlobOrdering` exists for BTree contexts, but `compare_blob` itself returns `BlobOrdering`, so callers must map it to `Ordering` explicitly.
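
Because "bytes first, length as tiebreaker" is exactly lexicographic order, `compare_blob` can be cross-checked against Rust's built-in `Ord` for `[u8]`. The `From<BlobOrdering> for Ordering` conversion below is an illustrative addition for this check, not something the RFC specifies.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum BlobOrdering {
    Less,
    Equal,
    Greater,
}

/// Bytes first (ascending index), then length as the tiebreaker.
pub fn compare_blob(a: &[u8], b: &[u8]) -> BlobOrdering {
    // zip stops at the shorter input, i.e. after min(a.len(), b.len()) bytes.
    for (x, y) in a.iter().zip(b) {
        match x.cmp(y) {
            Ordering::Less => return BlobOrdering::Less,
            Ordering::Greater => return BlobOrdering::Greater,
            Ordering::Equal => {}
        }
    }
    match a.len().cmp(&b.len()) {
        Ordering::Less => BlobOrdering::Less,
        Ordering::Equal => BlobOrdering::Equal,
        Ordering::Greater => BlobOrdering::Greater,
    }
}

// Illustrative conversion (not from the RFC) used to compare against std.
impl From<BlobOrdering> for Ordering {
    fn from(o: BlobOrdering) -> Self {
        match o {
            BlobOrdering::Less => Ordering::Less,
            BlobOrdering::Equal => Ordering::Equal,
            BlobOrdering::Greater => Ordering::Greater,
        }
    }
}
```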

### 6. Value::compare Integration (`src/core/value.rs`)

In `compare_same_type`, add:

```rust
(Value::Blob(a), Value::Blob(b)) => {
    Ok(match compare_blob(a, b) {
        BlobOrdering::Less => Ordering::Less,
        BlobOrdering::Equal => Ordering::Equal,
        BlobOrdering::Greater => Ordering::Greater,
    })
}
```

In `PartialEq` for Value:

```rust
(Value::Blob(a), Value::Blob(b)) => a == b,
```

In `Ord` for Value (this must agree with `compare_blob`, which holds only if `CompactArc<[u8]>` implements `Ord` by comparing byte contents, as `std::sync::Arc<[u8]>` does):

```rust
(Value::Blob(a), Value::Blob(b)) => a.cmp(b),
```

In `Hash` for Value:

```rust
Value::Blob(data) => {
    // Include discriminant (10) and blob data in hash.
    // `state` is already `&mut H`, so write to it directly: rebinding it with
    // `let mut hasher = state;` triggers rustc's unused_mut warning.
    state.write_u8(10);
    state.write(data);
}
```
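
The `Hash` arm must stay consistent with `PartialEq` (equal blobs must hash equally), and `Ord` must agree with `compare_blob`. The following is a minimal, self-contained sketch that assumes a two-variant stand-in `Value`. `std::sync::Arc<[u8]>` replaces `CompactArc<[u8]>`, and tag 1 for the integer variant is a placeholder.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Stand-in Value: Arc<[u8]> replaces CompactArc<[u8]>; tag 1 is a placeholder.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Integer(i64),
    Blob(Arc<[u8]>),
}

impl Hash for Value {
    fn hash<H: Hasher>(&self, state: &mut H) {
        match self {
            Value::Integer(i) => {
                state.write_u8(1);
                state.write_i64(*i);
            }
            Value::Blob(data) => {
                // Discriminant 10 (as in the plan), then the raw bytes.
                state.write_u8(10);
                state.write(data);
            }
        }
    }
}

/// Every DefaultHasher::new() instance hashes identically, so results are comparable.
pub fn hash_of(v: &Value) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}
```

The derived `Ord` compares `Arc<[u8]>` by contents, so a shorter blob that is a prefix of a longer one sorts first, matching `compare_blob`.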

### 7. Display and as_string for Blob

In `fmt::Display` (these snippets call `hex::encode` from the `hex` crate, which stoolap must depend on):

```rust
Value::Blob(data) => {
    // Display as hex string (first 16 bytes + "..." if longer)
    if data.len() <= 16 {
        write!(f, "{}", hex::encode(data))
    } else {
        write!(f, "{}...", hex::encode(&data[..16]))
    }
}
```

In `as_string`:

```rust
Value::Blob(data) => Some(hex::encode(data)),
```

`as_str` → Blob does NOT implement `as_str` (binary data is not UTF-8).
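
A std-only version of the Display rule, useful for checking the truncation boundary without pulling in the `hex` crate. `to_hex` and `BlobDisplay` are illustrative names, not stoolap APIs:

```rust
use std::fmt::{self, Write as _};

/// Lowercase hex encoding; a std-only stand-in for `hex::encode`.
pub fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail.
        write!(out, "{b:02x}").unwrap();
    }
    out
}

/// Wrapper that applies the plan's Display rule to raw blob bytes.
pub struct BlobDisplay<'a>(pub &'a [u8]);

impl fmt::Display for BlobDisplay<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.0.len() <= 16 {
            write!(f, "{}", to_hex(self.0))
        } else {
            // First 16 bytes (32 hex characters), then an ellipsis.
            write!(f, "{}...", to_hex(&self.0[..16]))
        }
    }
}
```

For a 32-byte SHA-256 digest, the displayed string is 32 hex characters plus `...`, which is only the first half of the hash. Callers that need the whole value should use the untruncated hex form.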

### 8. Type Coercion for Blob

In `cast_to_type` → `DataType::Blob`: pass through if the value is already a Blob; return an error otherwise.

In `cast_to_type` FROM Blob → Text: hex encoding.
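
The following sketch shows the two cast rules over hypothetical two-variant `Value` and `DataType` enums. These are stand-ins for the real types, with `Arc<[u8]>` in place of `CompactArc<[u8]>`.

```rust
use std::sync::Arc;

// Hypothetical two-variant stand-ins for stoolap's DataType and Value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Text,
    Blob,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    Blob(Arc<[u8]>),
}

pub fn cast_to_type(value: &Value, target: DataType) -> Result<Value, String> {
    match (value, target) {
        // Already a Blob: share the existing allocation.
        (Value::Blob(bytes), DataType::Blob) => Ok(Value::Blob(Arc::clone(bytes))),
        // Blob -> Text: lowercase hex encoding.
        (Value::Blob(bytes), DataType::Text) => {
            Ok(Value::Text(bytes.iter().map(|b| format!("{b:02x}")).collect()))
        }
        (Value::Text(s), DataType::Text) => Ok(Value::Text(s.clone())),
        // Anything else -> Blob is rejected.
        (other, DataType::Blob) => Err(format!("cannot cast {other:?} to BYTEA")),
    }
}
```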

### 9. Serialization (`src/storage/mvcc/persistence.rs`)

**Tag 12** is the next free tag for Blob:

```rust
Value::Blob(data) => {
    buf.push(12);
    // `as u32` silently truncates lengths above u32::MAX; such blobs must be
    // rejected before reaching this point.
    buf.extend_from_slice(&(data.len() as u32).to_le_bytes());
    buf.extend_from_slice(data);
}
```

**Deserialization** for tag 12:

```rust
12 => {
    // Blob: u32 little-endian length followed by `len` raw bytes
    if rest.len() < 4 {
        return Err(Error::internal("missing blob length"));
    }
    let len = u32::from_le_bytes(rest[..4].try_into().unwrap()) as usize;
    // Compare against the remaining bytes instead of computing `4 + len`,
    // which could overflow on 32-bit targets.
    if rest.len() - 4 < len {
        return Err(Error::internal("missing blob data"));
    }
    let blob_data = CompactArc::from(&rest[4..4 + len]);
    Ok(Value::Blob(blob_data))
}
```
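
The tag-12 layout (1 tag byte, a 4-byte little-endian length, then the raw bytes) can be exercised with a self-contained round trip. This sketch differs from the snippets above in three ways. It returns `String` errors instead of stoolap's `Error::internal`, it reports how many bytes were consumed, and it rejects blobs longer than `u32::MAX` instead of truncating the length. `serialize_blob` and `deserialize_blob` are hypothetical names.

```rust
use std::sync::Arc;

const TAG_BLOB: u8 = 12;

/// Appends tag 12, the length as u32 little-endian, then the raw bytes.
pub fn serialize_blob(buf: &mut Vec<u8>, data: &[u8]) -> Result<(), String> {
    // Reject oversized blobs instead of silently truncating the length prefix.
    let len = u32::try_from(data.len()).map_err(|_| "blob longer than u32::MAX bytes".to_string())?;
    buf.push(TAG_BLOB);
    buf.extend_from_slice(&len.to_le_bytes());
    buf.extend_from_slice(data);
    Ok(())
}

/// Reads one tag-12 value; returns the bytes and how much input was consumed.
pub fn deserialize_blob(input: &[u8]) -> Result<(Arc<[u8]>, usize), String> {
    let (&tag, rest) = input.split_first().ok_or("empty input")?;
    if tag != TAG_BLOB {
        return Err(format!("expected tag {}, found {}", TAG_BLOB, tag));
    }
    let len_bytes: [u8; 4] = rest
        .get(..4)
        .ok_or("missing blob length")?
        .try_into()
        .expect("exactly 4 bytes");
    let len = u32::from_le_bytes(len_bytes) as usize;
    // Checked slicing: no `4 + len` arithmetic that could overflow.
    let data = rest[4..].get(..len).ok_or("missing blob data")?;
    Ok((Arc::from(data), 1 + 4 + len))
}
```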

### 10. DDL Parser (`src/executor/ddl.rs`)

Currently at line ~1131: `"BLOB" | "BINARY" | "VARBINARY" => Ok(DataType::Text)`. Change to:

```rust
"BYTEA" | "BLOB" | "BINARY" | "VARBINARY" => Ok(DataType::Blob),
```

Handle the `BYTEA(N)` length constraint via a regex in the DDL column-parsing path, storing N in `SchemaColumn.blob_length`.
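
The plan suggests a regex for `BYTEA(N)`, but a plain string split also works and avoids a regex dependency. The following is a minimal sketch; the name `parse_blob_type` is hypothetical. It only separates the type name from its length, and mapping the base name to `DataType::Blob` stays with the existing match.

```rust
/// Splits a DDL type such as "BYTEA(32)" into an upper-cased base name and an
/// optional length. Std-only; whitespace around the pieces is tolerated.
pub fn parse_blob_type(spec: &str) -> Result<(String, Option<u32>), String> {
    let spec = spec.trim();
    match spec.find('(') {
        None => Ok((spec.to_ascii_uppercase(), None)),
        Some(open) => {
            // Everything between '(' and a trailing ')' is the length.
            let inner = spec[open + 1..]
                .strip_suffix(')')
                .ok_or_else(|| format!("missing ')' in {spec}"))?;
            let len = inner
                .trim()
                .parse::<u32>()
                .map_err(|_| format!("invalid length in {spec}"))?;
            Ok((spec[..open].trim().to_ascii_uppercase(), Some(len)))
        }
    }
}
```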

### 11. Projection/Selection (Phase 2c)

`Value::Blob` must serialize correctly in result set encoding. The `Display` impl for `Value` renders Blob as hex, but it truncates anything longer than 16 bytes (a 32-byte SHA-256 digest would be cut in half), so result set encoding should use the full hex from `as_string` (or the raw bytes) rather than `Display`.

### 12. Equality in Expression Evaluation (Phase 2b)

`Value::compare` handles Blob once the new arm is added to `compare_same_type`. The expression VM calls `col_val.compare(val)`, so no changes are needed in the VM, only in Value's comparison logic.

### 13. Phase 2a: Hash Index for Blob Columns

The existing `HashIndex` uses ahash (not SipHash). Per RFC-0201:

- **Acceptable for Phase 2a**: ahash is fine for non-consensus use. SipHash with a persistent key is the production requirement for the hash index, but ahash is acceptable while verifying correctness first.
- **Implementation**: `HashIndex` already handles arbitrary `Value` types via `Value::hash`. Because it stores `Value::Blob` as an ordinary key, no structural changes are needed. Only the hasher would differ (SipHash vs ahash), which is a Phase 2a follow-up.

Acceptance for Phase 2a: `CREATE INDEX ... USING HASH ON blob_column` creates a functional hash index that correctly resolves `WHERE blob_column = $1` lookups.
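
An equality-only index keyed by blob bytes can be sketched with std's `HashMap`. `BlobHashIndex` is a hypothetical stand-in for stoolap's `HashIndex`. Note that std's default hasher is randomly seeded per process; std documents the algorithm as currently SipHash-1-3, subject to change. It therefore does not meet the RFC's persistent-key requirement either.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Toy equality index: blob key -> row ids.
#[derive(Default)]
pub struct BlobHashIndex {
    rows: HashMap<Arc<[u8]>, Vec<u64>>,
}

impl BlobHashIndex {
    pub fn insert(&mut self, key: &[u8], row_id: u64) {
        self.rows.entry(Arc::from(key)).or_default().push(row_id);
    }

    /// Serves `WHERE blob_column = $1`: Arc<[u8]> borrows as [u8], so a plain
    /// slice can be used for the lookup without allocating.
    pub fn lookup(&self, key: &[u8]) -> &[u64] {
        self.rows.get(key).map(Vec::as_slice).unwrap_or(&[])
    }
}
```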

### 14. Null Handling

Per RFC-0201: `ALTER TABLE ADD COLUMN BYTEA ... NOT NULL` and `ALTER TABLE ADD COLUMN BYTEA ... NULL` are both **rejected** until null bitmap integration is complete. The schema validation layer must reject any `CREATE TABLE` or `ALTER TABLE` that introduces a BYTEA column with a clear error: "BYTEA columns not supported: null bitmap integration required".

### 15. Tests

Per RFC-0201 test vectors, implement:
- Blob round-trip: `Value::Blob(bytes)` → serialize → deserialize → `Value::Blob(same_bytes)`
- `compare_blob` deterministic ordering (bytes-first, length as tiebreaker)
- `BYTEA` parsing in the SQL parser
- `CREATE TABLE t(key_hash BYTEA(32) NOT NULL)` rejected
- Hash index creation and lookup for BYTEA column

---

## Mission B: RFC-0201 Phase 2f (DFP and BigInt Dispatcher Integration)

Phase 2f implements `serialize_dfp`/`deserialize_dfp` and `serialize_bigint`/`deserialize_bigint` in the RFC-0201 dispatcher, replacing the `Err(DCS_INVALID_STRUCT)` stubs. Both RFC-0104 (DFP, 24-byte canonical format) and RFC-0110 (BigInt, little-endian limb array) are Accepted.

### Prerequisites

- `octo-determin` crate (already a stoolap dependency, used for `Dfp` and `Dqa`)
- RFC-0104 and RFC-0110 wire format specs must be available

### DFP (RFC-0104)

The `octo_determin::Dfp` type is already used in stoolap (e.g. via `Value::dfp()`). The missing piece is the **dispatcher integration**:

In the RFC-0201 dispatcher pseudocode (implemented in stoolap's query/serialization layer):

```rust
(Value::Dfp(dfp_val), ColumnType::DeterministicFloat) => {
    let encoding = DfpEncoding::from_dfp(dfp_val).to_bytes();
    Ok(serialize_dfp(&encoding))
}
```

The wire format per RFC-0104 is **24 bytes**: sign(1) + exponent(2) + mantissa(21). `octo_determin::DfpEncoding` handles the conversion.

### BigInt (RFC-0110)

The `octo_determin::BigInt` type may not be in scope in stoolap yet. Per RFC-0110, the wire format is:
- 4-byte little-endian limb count N
- N × 8-byte little-endian limbs, least-significant first

```rust
(Value::BigInt(bigint_val), ColumnType::BigInt) => {
    Ok(serialize_bigint(bigint_val))
}
```
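
The RFC-0110 layout described above can be sketched over a plain `&[u64]` limb slice. This illustration rests on assumptions. The plan does not say how RFC-0110 encodes the sign, so only the unsigned magnitude is handled. The functions `serialize_bigint` and `deserialize_bigint` here are hypothetical free functions, not the `octo_determin` API.

```rust
/// Encodes an unsigned magnitude as: u32 LE limb count N, then N u64 LE limbs
/// (least-significant limb first). Sign handling is deliberately left out.
pub fn serialize_bigint(limbs: &[u64]) -> Result<Vec<u8>, String> {
    let n = u32::try_from(limbs.len()).map_err(|_| "too many limbs".to_string())?;
    let mut out = Vec::with_capacity(4 + 8 * limbs.len());
    out.extend_from_slice(&n.to_le_bytes());
    for limb in limbs {
        out.extend_from_slice(&limb.to_le_bytes());
    }
    Ok(out)
}

/// Decodes the layout above; returns the limbs and the number of bytes consumed.
pub fn deserialize_bigint(bytes: &[u8]) -> Result<(Vec<u64>, usize), String> {
    let count: [u8; 4] = bytes
        .get(..4)
        .ok_or("missing limb count")?
        .try_into()
        .expect("exactly 4 bytes");
    let n = u32::from_le_bytes(count) as usize;
    // Guard the N * 8 multiplication, then take exactly that many bytes.
    let body = n
        .checked_mul(8)
        .and_then(|len| bytes[4..].get(..len))
        .ok_or("truncated limb array")?;
    let limbs: Vec<u64> = body
        .chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().expect("8-byte chunk")))
        .collect();
    Ok((limbs, 4 + body.len()))
}
```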

### Dispatcher Integration Points

The "dispatcher" in RFC-0201 terminology maps to stoolap's query/serialization layer. Specifically:

1. **`serialize_value`** (in `src/storage/mvcc/persistence.rs`): currently has no DFP or BigInt arm. Add:
   ```rust
   Value::Dfp(dfp) => {
       buf.push(13);
       buf.extend_from_slice(&DfpEncoding::from_dfp(dfp).to_bytes());
   }
   Value::BigInt(bigint) => { /* limb serialization */ }
   ```

2. **`deserialize_value`**: currently returns `Err` for unknown tags. Add deserialization arms for tags 13 (DFP) and 14 (BigInt).

3. **`Value::from_typed`** and **`cast_to_type`**: add DFP and BigInt coercion paths.
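
A deserializer needs only the tag byte to know how many bytes follow. Tags 12 and 14 carry a 4-byte length prefix, while tag 13 is a fixed 24 bytes. The following is a self-contained sketch of that dispatch. `Decoded` is a hypothetical stand-in, and the DFP payload is kept as raw bytes because the plan does not show how `DfpEncoding` decodes.

```rust
/// Hypothetical decoded forms for the three new tags.
#[derive(Debug, PartialEq)]
pub enum Decoded {
    Blob(Vec<u8>),
    DfpBytes([u8; 24]),
    BigIntLimbs(Vec<u64>),
}

/// Splits off exactly `n` bytes or reports which field was truncated.
fn take<'a>(input: &'a [u8], n: usize, what: &str) -> Result<(&'a [u8], &'a [u8]), String> {
    if input.len() < n {
        return Err(format!("truncated {what}"));
    }
    Ok(input.split_at(n))
}

/// Reads a u32 little-endian prefix (blob length or limb count).
fn read_u32_le<'a>(input: &'a [u8], what: &str) -> Result<(usize, &'a [u8]), String> {
    let (head, rest) = take(input, 4, what)?;
    let arr: [u8; 4] = head.try_into().expect("exactly 4 bytes");
    Ok((u32::from_le_bytes(arr) as usize, rest))
}

/// Decodes one tagged value and returns it together with the unread tail.
pub fn deserialize_tagged(input: &[u8]) -> Result<(Decoded, &[u8]), String> {
    let (&tag, rest) = input.split_first().ok_or("empty input")?;
    match tag {
        12 => {
            let (len, rest) = read_u32_le(rest, "blob length")?;
            let (data, rest) = take(rest, len, "blob data")?;
            Ok((Decoded::Blob(data.to_vec()), rest))
        }
        13 => {
            // RFC-0104 DFP: fixed 24-byte canonical encoding.
            let (raw, rest) = take(rest, 24, "DFP encoding")?;
            Ok((Decoded::DfpBytes(raw.try_into().expect("exactly 24 bytes")), rest))
        }
        14 => {
            let (n, rest) = read_u32_le(rest, "limb count")?;
            let byte_len = n.checked_mul(8).ok_or("limb count overflow")?;
            let (raw, rest) = take(rest, byte_len, "limbs")?;
            let limbs = raw
                .chunks_exact(8)
                .map(|c| u64::from_le_bytes(c.try_into().expect("8-byte chunk")))
                .collect();
            Ok((Decoded::BigIntLimbs(limbs), rest))
        }
        other => Err(format!("unknown value tag {other}")),
    }
}
```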

### NUMERIC_SPEC_VERSION

Per the RFC-0201 Phase 1 item and RFC-0110 governance, bump `NUMERIC_SPEC_VERSION` to 2 after implementing BigInt. This is a configuration constant in the serialization layer.

---

## Dependencies

- **Mission A**: No external RFC dependencies. RFC-0127 (DCS Blob Amendment) is already Accepted and provides the wire format foundation.
- **Mission B**: RFC-0104 (DFP wire format) and RFC-0110 (BigInt wire format) are both Accepted.

---

## Verification

After Mission A:
- `cargo test` passes with new Blob tests
- `cargo clippy --all-targets --all-features -- -D warnings` passes
- `CREATE TABLE t(key_hash BYTEA(32))` parses without error
- `SELECT * FROM t WHERE key_hash = $1` uses the hash index

After Mission B:
- DFP and BigInt round-trip through serialize/deserialize
- `NUMERIC_SPEC_VERSION = 2` after BigInt implementation
