| 1 | +# Storage Comparison for Short-Duration VM Instances |
| 2 | + |
| 3 | +## Context |
| 4 | +Short-duration VM instances (e.g., GitHub Actions runners, Railway deploys, ephemeral containers) discard their local disk when they terminate, so StackMemory needs a storage strategy that keeps state beyond the life of a single instance. |
| 5 | + |
| 6 | +## Storage Options Comparison |
| 7 | + |
| 8 | +### 1. SQLite (Current Implementation) |
| 9 | +**Pros:** |
| 10 | +- Zero network latency (local file) |
| 11 | +- No external dependencies |
| 12 | +- ACID compliant transactions |
| 13 | +- Good for reads (very fast) |
| 14 | +- Single file portability |
| 15 | + |
| 16 | +**Cons:** |
| 17 | +- Lost when the VM terminates unless the database file lives on a persistent volume |
| 18 | +- No built-in replication |
| 19 | +- File size grows over time (~5-50MB typical) |
| 20 | +- Requires disk I/O |
| 21 | +- Writes are serialized (one writer at a time), so concurrent writes do not scale |
| 22 | + |
| 23 | +**Best for:** Development, testing, single-instance apps with <1000 frames |
| 24 | + |
| 25 | +### 2. Git Storage (Files + Commits) |
| 26 | +**Pros:** |
| 27 | +- Automatic versioning |
| 28 | +- Survives VM restarts (pushed to repo) |
| 29 | +- Zero infrastructure cost |
| 30 | +- Works with any Git provider |
| 31 | +- Natural branching/merging |
| 32 | + |
| 33 | +**Cons:** |
| 34 | +- Slow for queries (file scanning) |
| 35 | +- Not suitable for frequent writes |
| 36 | +- Git history bloat |
| 37 | +- No indexing/relationships |
| 38 | +- Poor for structured queries |
| 39 | + |
| 40 | +**Best for:** Configuration, skills, small datasets (<100 items) |
| 41 | + |
| 42 | +### 3. Skills.md / JSON Files |
| 43 | +**Pros:** |
| 44 | +- Human readable |
| 45 | +- Easy to edit manually |
| 46 | +- Version control friendly |
| 47 | +- No dependencies |
| 48 | +- Fast to implement |
| 49 | + |
| 50 | +**Cons:** |
| 51 | +- No querying capability |
| 52 | +- Full file must be parsed |
| 53 | +- No concurrent access |
| 54 | +- Limited to small datasets |
| 55 | +- No relationships |
| 56 | + |
| 57 | +**Best for:** Static configuration, learned patterns, skills definitions |
| 58 | + |
| 59 | +### 4. Hosted Database (PostgreSQL/MySQL) |
| 60 | +**Pros:** |
| 61 | +- Data persists across deployments |
| 62 | +- Professional grade performance |
| 63 | +- Concurrent access |
| 64 | +- Advanced queries |
| 65 | +- Proper indexing |
| 66 | + |
| 67 | +**Cons:** |
| 68 | +- Network latency (5-50ms per query) |
| 69 | +- Requires connection management |
| 70 | +- External dependency |
| 71 | +- Cost ($5-50/month) |
| 72 | +- Connection limits on free tiers |
| 73 | + |
| 74 | +**Best for:** Production apps, multi-instance, >1000 frames |
| 75 | + |
| 76 | +## Recommendations by Use Case |
| 77 | + |
| 78 | +### GitHub Actions / CI/CD (Duration: 5-60 minutes) |
| 79 | +**Recommended: Git Storage + JSON Files** |
| 80 | +```yaml |
| 81 | +Strategy: |
| 82 | + - Store skills in skills.json |
| 83 | + - Save frames as JSON in .stackmemory/frames/ |
| 84 | + - Commit and push important state |
| 85 | + - Use git as persistence layer |
| 86 | +``` |
| 87 | + |
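The "save frames as JSON in `.stackmemory/frames/`" step could look roughly like the sketch below. The `Frame` shape and the `saveFrame`/`loadFrame` helpers are illustrative assumptions, not StackMemory's actual API; the sketch also assumes frame IDs are safe to use as file names.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical frame shape; the real StackMemory Frame type may differ.
interface Frame {
  id: string;
  content: string;
  createdAt: string;
}

// One file per frame keeps git diffs small. Assumes frame.id is filesystem-safe.
function saveFrame(rootDir: string, frame: Frame): string {
  const dir = path.join(rootDir, ".stackmemory", "frames");
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${frame.id}.json`);
  fs.writeFileSync(file, JSON.stringify(frame, null, 2) + "\n", "utf8");
  return file;
}

// Returns null when the frame file does not exist.
function loadFrame(rootDir: string, id: string): Frame | null {
  const file = path.join(rootDir, ".stackmemory", "frames", `${id}.json`);
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, "utf8")) as Frame;
}

// Usage: round-trip one frame through a temporary directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "stackmemory-"));
saveFrame(demoRoot, { id: "frame-1", content: "hello", createdAt: "2024-01-01T00:00:00Z" });
console.log(loadFrame(demoRoot, "frame-1"));
```

In a CI job the root directory would be the checked-out repository, so the written files can be committed and pushed in a later workflow step.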
| 88 | +### Railway / Vercel (Duration: Hours to Days) |
| 89 | +**Recommended: Hosted PostgreSQL** |
| 90 | +```yaml |
| 91 | +Strategy: |
| 92 | + - Use Railway's PostgreSQL addon |
| 93 | + - Connection pooling with pg-pool |
| 94 | + - Implement caching layer (Redis) |
| 95 | + - Fallback to SQLite for local dev |
| 96 | +``` |
| 97 | + |
| 98 | +### Docker Containers (Duration: Variable) |
| 99 | +**Recommended: Hybrid Approach** |
| 100 | +```yaml |
| 101 | +Strategy: |
| 102 | + - SQLite for temporary data |
| 103 | + - Volume mount for persistence |
| 104 | + - Periodic export to JSON |
| 105 | + - S3/GCS for long-term storage |
| 106 | +``` |
| 107 | + |
| 108 | +### Development Environment |
| 109 | +**Recommended: SQLite** |
| 110 | +```yaml |
| 111 | +Strategy: |
| 112 | + - Simple setup |
| 113 | + - No external dependencies |
| 114 | + - Easy to reset/clear |
| 115 | + - Good enough performance |
| 116 | +``` |
| 117 | + |
| 118 | +## Implementation Strategy for Short-Duration VMs |
| 119 | + |
| 120 | +### Optimal Hybrid Architecture |
| 121 | +```typescript |
| 122 | +class HybridStorage { |
| 123 | +  // Tiers in priority order, fastest first. MemoryCache, SQLiteStorage, |
| 124 | +  // GitStorage and HostedDB are assumed to be defined elsewhere and to |
| 125 | +  // expose async save(frame) and load(id) methods. |
| 126 | +  private readonly storage = [ |
| 127 | +    new MemoryCache(),   // L1: In-memory (fastest) |
| 128 | +    new SQLiteStorage(), // L2: Local disk |
| 129 | +    new GitStorage(),    // L3: Persistent |
| 130 | +    new HostedDB()       // L4: Fallback |
| 131 | +  ]; |
| 132 | + |
| 133 | +  async save(data: Frame) { |
| 134 | +    // Write to memory and SQLite immediately |
| 135 | +    await Promise.all([ |
| 136 | +      this.storage[0].save(data), |
| 137 | +      this.storage[1].save(data) |
| 138 | +    ]); |
| 139 | + |
| 140 | +    // Fire-and-forget write to Git; it may not finish if the VM stops first |
| 141 | +    setImmediate(() => { |
| 142 | +      this.storage[2].save(data).catch(console.error); |
| 143 | +    }); |
| 144 | +  } |
| 145 | + |
| 146 | +  async load(id: string) { |
| 147 | +    // Try each tier in order; a failing tier is skipped |
| 148 | +    for (const store of this.storage) { |
| 149 | +      try { |
| 150 | +        const data = await store.load(id); |
| 151 | +        if (data) return data; |
| 152 | +      } catch (e) { |
| 153 | +        continue; |
| 154 | +      } |
| 155 | +    } |
| 156 | +    return null; |
| 157 | +  } |
| 158 | +} |
| 159 | +``` |
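The class above relies on tier classes that are not shown. As a minimal sketch of the contract they would need, the example below defines a hypothetical `StorageTier` interface and uses in-memory stand-ins (`MapTier`, `BrokenTier`) so the fall-through read can be run without SQLite or Git. All names here are assumptions for illustration.

```typescript
// Hypothetical frame shape and tier interface; not taken from the StackMemory codebase.
interface Frame {
  id: string;
  content: string;
}

interface StorageTier {
  save(frame: Frame): Promise<void>;
  load(id: string): Promise<Frame | null>;
}

// In-memory stand-in for any tier, so the example has no external dependencies.
class MapTier implements StorageTier {
  private frames = new Map<string, Frame>();
  async save(frame: Frame): Promise<void> {
    this.frames.set(frame.id, frame);
  }
  async load(id: string): Promise<Frame | null> {
    return this.frames.get(id) ?? null;
  }
}

// A tier that always fails, modelling an unreachable backend.
class BrokenTier implements StorageTier {
  async save(): Promise<void> {
    throw new Error("tier unavailable");
  }
  async load(): Promise<Frame | null> {
    throw new Error("tier unavailable");
  }
}

// Try tiers from fastest to slowest; a failing tier is skipped instead of aborting the read.
async function loadFromTiers(tiers: StorageTier[], id: string): Promise<Frame | null> {
  for (const tier of tiers) {
    try {
      const frame = await tier.load(id);
      if (frame) return frame;
    } catch {
      // Fall through to the next tier.
    }
  }
  return null;
}
```

With `[new MapTier(), new BrokenTier(), coldTier]`, a frame stored only in `coldTier` is still found: the empty first tier returns `null`, the broken tier throws and is skipped, and the last tier answers.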
| 160 | + |
| 161 | +### Size Considerations |
| 162 | + |
| 163 | +| Storage Type | Typical Size | 1000 Frames | 10000 Frames | |
| 164 | +|-------------|--------------|-------------|--------------| |
| 165 | +| SQLite | 5-50MB | ~5MB | ~50MB | |
| 166 | +| JSON Files | 2-20MB | ~2MB | ~20MB | |
| 167 | +| Git Repo | 10-100MB | ~10MB | ~100MB | |
| 168 | +| PostgreSQL | N/A (remote) | ~3MB | ~30MB | |
| 169 | + |
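The table scales linearly, so each row implies a per-frame size (for example, ~5MB / 1000 frames is about 5KB per frame for SQLite). A small sketch of that arithmetic follows; the constants are read off the table and are rough estimates, and the names `KB_PER_FRAME` and `estimateSizeMB` are made up for this example.

```typescript
// Per-frame sizes in KB, derived from the 1000-frame column of the table above.
// These are rough estimates, not measurements.
const KB_PER_FRAME = {
  sqlite: 5,
  json: 2,
  git: 10,
  postgres: 3,
} as const;

type StorageKind = keyof typeof KB_PER_FRAME;

// Linear estimate in MB, using 1MB = 1000KB to match the table's round numbers.
function estimateSizeMB(kind: StorageKind, frameCount: number): number {
  return (KB_PER_FRAME[kind] * frameCount) / 1000;
}

console.log(estimateSizeMB("sqlite", 10000)); // 50, matching the 10000-frame column
console.log(estimateSizeMB("git", 2500));     // 25
```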
| 170 | +### Performance Metrics (Approximate Latency per Operation) |
| 171 | + |
| 172 | +| Operation | SQLite | JSON | Git | PostgreSQL | |
| 173 | +|-----------|--------|------|-----|------------| |
| 174 | +| Write | 5ms | 10ms | 100ms | 20ms | |
| 175 | +| Read | 1ms | 5ms | 50ms | 10ms | |
| 176 | +| Query | 2ms | N/A | N/A | 5ms | |
| 177 | +| Startup | 10ms | 1ms | 100ms | 500ms | |
| 178 | + |
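To see what these latencies mean for a whole run, the sketch below adds up startup, write, and read costs for a given workload. It assumes operations run sequentially and that the table's figures stay constant under load, which is a simplification; `LATENCY_MS` and `estimateRunMs` are illustrative names.

```typescript
// Approximate per-operation latencies in milliseconds, copied from the table above.
const LATENCY_MS = {
  sqlite: { write: 5, read: 1, startup: 10 },
  json: { write: 10, read: 5, startup: 1 },
  git: { write: 100, read: 50, startup: 100 },
  postgres: { write: 20, read: 10, startup: 500 },
} as const;

type Backend = keyof typeof LATENCY_MS;

// Total time for one run, assuming sequential operations and constant latencies.
function estimateRunMs(backend: Backend, writes: number, reads: number): number {
  const t = LATENCY_MS[backend];
  return t.startup + writes * t.write + reads * t.read;
}

for (const backend of Object.keys(LATENCY_MS) as Backend[]) {
  console.log(backend, estimateRunMs(backend, 500, 500), "ms");
}
```

For 500 writes and 500 reads this gives about 3 seconds for SQLite and about 75 seconds for Git, which is why per-frame Git writes are better batched into a single commit at the end of a run.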
| 179 | +## Final Recommendation |
| 180 | + |
| 181 | +For **short-duration VM instances**, use a **three-tier strategy**: |
| 182 | + |
| 183 | +1. **Hot Data**: In-memory cache (last 100 frames) |
| 184 | +2. **Warm Data**: JSON files in `.stackmemory/` directory |
| 185 | +3. **Cold Data**: Git commits or external API |
| 186 | + |
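The hot tier ("last 100 frames") can be approximated with a bounded map that evicts the oldest entry when it is full. This is a minimal sketch under the assumption that "last" means most recently written; `RecentFrameCache` and the `Frame` shape are hypothetical names, not part of StackMemory.

```typescript
// Hypothetical frame shape; the real StackMemory Frame type may differ.
interface Frame {
  id: string;
  content: string;
}

class RecentFrameCache {
  private frames = new Map<string, Frame>();

  constructor(private readonly maxFrames = 100) {}

  set(frame: Frame): void {
    // Delete first so a re-saved frame moves to the newest position.
    this.frames.delete(frame.id);
    this.frames.set(frame.id, frame);
    if (this.frames.size > this.maxFrames) {
      // Map iterates in insertion order, so the first key is the oldest frame.
      const oldest = this.frames.keys().next().value;
      if (oldest !== undefined) this.frames.delete(oldest);
    }
  }

  get(id: string): Frame | undefined {
    return this.frames.get(id);
  }

  get size(): number {
    return this.frames.size;
  }
}
```

Evicted frames are not lost in the three-tier design, because each frame is also written to the warm tier; the cache only bounds memory use.
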
| 187 | +```typescript |
| 188 | +// Optimized for short-duration VMs |
| 189 | +const storage = process.env.VM_DURATION === 'short' |
| 190 | +  ? new GitBackedJSONStorage()  // Best for CI/CD |
| 191 | +  : new SQLiteStorage();        // Best for local dev |
| 192 | + |
| 193 | +// With automatic persistence |
| 194 | +if (process.env.PERSIST_ON_EXIT) { |
| 195 | +  process.on('SIGTERM', async () => { |
| 196 | +    await storage.exportToGit(); |
| 197 | +    process.exit(0); |
| 198 | +  }); |
| 199 | +} |
| 200 | +``` |
| 201 | + |
| 202 | +This approach: |
| 203 | +- Minimizes dependencies |
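| 204 | +- Works offline (commits are local; only the push needs network access) |
| 205 | +- Survives VM termination once the state has been committed and pushed |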
| 206 | +- Costs nothing |
| 207 | +- Scales to ~10,000 frames |
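| 208 | +- Keeps lookups by frame ID; structured queries still need SQLite or PostgreSQL (JSON and Git are N/A for queries in the table above) |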