From 03de742eb2f0365b288035ce76f6e7ac8e2cbb3b Mon Sep 17 00:00:00 2001
From: Manuel Saelices <msaelices@gmail.com>
Date: Sun, 22 Mar 2026 20:51:15 +0100
Subject: [PATCH 1/4] [Skills] Add mojo-stdlib-contributing skill

Add a skill capturing patterns and pitfalls for contributing to the Mojo
standard library, distilled from 30+ reviewed PRs. Covers process,
assertion semantics, optimization verification with compile_info,
benchmarks, testing, SIMD safety, and Writable trait patterns.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
---
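Note for reviewers (commentary below the `---` marker, not part of the commit): the optimization workflow this patch documents (save the IR, apply the change, compare) can be sketched in shell as below. This is a minimal sketch under stated assumptions: `ir_verdict` and the `.ll` file names are hypothetical, and the files stand in for IR text captured from `compile_info[..., emission_kind="llvm-opt"]()` before and after a change. A byte-for-byte comparison is stricter than semantic equivalence, so a "changed" verdict only means benchmarks are worth running.

```shell
#!/bin/sh
# ir_verdict BEFORE AFTER: hypothetical helper, not part of the Modular repo.
# Prints "no-op" when the two IR dumps are byte-identical, "changed" otherwise.
ir_verdict() {
    if cmp -s "$1" "$2"; then
        echo "no-op"
    else
        echo "changed"
    fi
}

# Stand-in IR dumps; real ones would be captured from compile_info output
# before and after the change under review.
tmp=$(mktemp -d)
printf 'define float @f(float %%x) {\n  %%r = fadd float %%x, %%x\n  ret float %%r\n}\n' > "$tmp/before.ll"
cp "$tmp/before.ll" "$tmp/after.ll"

ir_verdict "$tmp/before.ll" "$tmp/after.ll"
```

A "no-op" verdict corresponds to step 3 of the workflow (do not submit); "changed" corresponds to step 4 (run benchmarks).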
 README.md                         |   9 ++
 mojo-stdlib-contributing/SKILL.md | 169 ++++++++++++++++++++++++++++++
 2 files changed, 178 insertions(+)
 create mode 100644 mojo-stdlib-contributing/SKILL.md

diff --git a/README.md b/README.md
index 6775602..100ce0e 100644
--- a/README.md
+++ b/README.md
@@ -82,6 +82,15 @@ triggered when Python types are used Mojo or a Python module needs to interact
 with Mojo code. Many capabilities of Mojo - Python interoperability are fairly
 new, and existing coding agents don't handle them correctly without guidance.
 
+### `mojo-stdlib-contributing`
+
+[This skill](mojo-stdlib-contributing/SKILL.md) captures patterns and pitfalls
+for contributing to the Mojo standard library, distilled from 30+ reviewed PRs.
+It covers process (GitHub issues before new APIs, draft PRs), assertion
+semantics (`assert_mode="safe"` is intentional), benchmark patterns, testing
+requirements, SIMD/memory safety, and common reviewer feedback. Use when
+modifying code under `mojo/stdlib/`.
+
 ## Examples
 
 Once these skills are installed, you can use them for many common tasks.
diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md
new file mode 100644
index 0000000..8b1ce67
--- /dev/null
+++ b/mojo-stdlib-contributing/SKILL.md
@@ -0,0 +1,169 @@
+---
+name: mojo-stdlib-contributing
+description: Patterns and pitfalls for contributing to the Mojo standard library. Use when modifying code under mojo/stdlib/, writing tests or benchmarks for stdlib, or preparing PRs to the modular/modular repository. Distilled from 30+ reviewed PRs.
+---
+
+<!-- EDITORIAL GUIDELINES FOR THIS SKILL FILE
+Distilled from real reviewer feedback on 30+ PRs to the Mojo stdlib.
+Every entry reflects an actual rejection or correction. Only include
+patterns that are non-obvious or that a model would get wrong. -->
+
+## Process — before writing code
+
+- **New APIs require a GitHub issue first.** Do not add methods to existing types (String, List, Deque, etc.) without prior consensus. Python parity alone is not justification. Open the issue, then create a draft PR linking it.
+- **Keep PRs minimal and focused.** One logical change per PR. Never mix unrelated changes (e.g. don't add `@always_inline` to Set/Dict in a Deque fix PR). Benchmark utilities and their usage belong in separate PRs.
+- **Always create draft PRs.** Never mark a PR as ready for review yourself.
+- **Branch from `upstream/main`.** Never work on `main` directly. Each PR branch must contain only commits for that specific change.
+
+## Assertion semantics — critical
+
+The `assert` statement desugars to `debug_assert()` with `assert_mode="none"`. The default `ASSERT_MODE` is `"safe"` (from `get_defined_string["ASSERT", "safe"]()`). This means:
+
+| Form | Runs when `ASSERT_MODE="safe"` (default) | Runs when `-D ASSERT=all` |
+|---|---|---|
+| `debug_assert[assert_mode="safe"](...)` | Yes | Yes |
+| `debug_assert(...)` (default `assert_mode="none"`) | No | Yes |
+| `assert cond, msg` (desugars to above) | No | Yes |
+
+**Do NOT downgrade `debug_assert[assert_mode="safe"]` to `debug_assert` or `assert`.** These are intentional safety invariants. The maintainers want aggressive checking on operations like `byte=` indexing. Users who need to bypass safety should use the existing unsafe escape hatches (e.g. `.as_bytes().unsafe_get(idx)` for byte access).
+
+## Optimizations — verify codegen and benchmark before submitting
+
+**Every optimization PR must include both IR evidence and benchmark results.**
+
+### Verify codegen with `compile_info`
+
+Use `std.compile.compile_info` to inspect the generated IR *before* and *after* your change. If the IR is identical, the optimization is a no-op regardless of what the source looks like.
+
+```mojo
+from std.compile import compile_info
+
+# Define the function to inspect
+def my_function(x: SIMD[DType.float32, 4]) -> SIMD[DType.float32, 4]:
+    return x + x
+
+# Inspect optimized LLVM IR
+comptime info = compile_info[my_function, emission_kind="llvm-opt"]()
+print(info)  # prints optimized IR
+
+# Check for specific instructions
+assert "fadd" in compile_info[my_function, emission_kind="llvm-opt"]()
+
+# Write IR to file for detailed comparison
+compile_info[my_function, emission_kind="llvm-opt"]().write_text("after.ll")
+```
+
+Supported `emission_kind` values:
+
+| Kind | Output |
+|---|---|
+| `"asm"` | Assembly (default) |
+| `"llvm"` | Unoptimized LLVM IR |
+| `"llvm-opt"` | Optimized LLVM IR (use this to compare) |
+| `"llvm-bitcode"` | LLVM bitcode |
+
+**Workflow for optimization PRs:**
+1. Write a small test that calls `compile_info` on the function before your change. Save the IR.
+2. Apply your change.
+3. Compare IR. If identical, the optimization does nothing. Do not submit.
+4. If IR differs, run benchmarks to confirm the improvement is measurable.
+
+### Common pitfalls
+
+- **Do not add manual fast paths that LLVM already optimizes.** For example, `1 if b < 128 else _utf8_first_byte_sequence_length(b)` produces identical IR to just calling `_utf8_first_byte_sequence_length(b)` because LLVM sees through the branch.
+- **Measure before optimizing compile-time-evaluated paths.** Format strings may be evaluated at compile time, so runtime SIMD scanning provides no benefit.
+- **Verify alignment claims with data.** Cache-line alignment (e.g. 64-byte `@align`) requires benchmark evidence. Do not apply speculatively.
+- **Include before/after comparisons in the PR description.** Show the benchmark numbers and, if relevant, the IR diff.
+
+## Code design
+
+- **Reuse existing stdlib primitives.** For SIMD byte search, use `_memchr`/`_memchr_impl` rather than writing a new loop. If a new algorithm is better, update the existing primitive so all callers benefit. Use `clamp`, `Int64.__xor__`, etc. instead of reimplementing.
+- **Implement fast paths on the lowest-level type** (`Span`), not higher-level types (`List`). Then delegate: `List.__contains__` calls `Span.__contains__`.
+- **Generalize type constraints.** Don't hard-code a single type (e.g. `Byte`). Use trait conformance like `TrivialRegisterPassable` to cover all applicable types.
+- **Use move semantics (`^`)** when transferring ownership, not `.copy()`.
+- **Use `uninit_copy_n`, `uninit_move_n`, `destroy_n`** for bulk operations on trivial types instead of manual loops.
+- **Prefer lazy evaluation (iterators) over eager allocation** for collection operations like slicing. E.g., `deque[0:1:2]` should return an iterator, not allocate a new `Deque`.
+- **Check if traits synthesize default methods** before writing explicit implementations. E.g., `Equatable` may already synthesize `__ne__` from `__eq__`.
+- **Lift constants into traits** when they vary across implementations (e.g., `MAX_NAME_SIZE` into a `_DirentLike` trait).
+- **Keep t-string expressions simple.** Assign complex sub-expressions to local variables before interpolating.
+- **Do not add redundant logic** just for Python API parity if Mojo already has a better primitive.
+- **Question two-pass patterns.** A "collect then delete" approach with an extra allocation may be slower than a single-pass approach. Always benchmark to confirm.
+- **Ensure PR description matches reality.** If `discard()` internally uses try/except, don't claim "avoids exception overhead".
+
+## Benchmarks
+
+```mojo
+# Correct benchmark pattern:
+@parameter
+def bench_something(mut b: Bencher) raises:
+    var data = setup_data()
+    @always_inline
+    def call_fn() unified {read}:
+        var result = black_box(data).some_operation(black_box(arg))
+        keep(result)
+    b.iter(call_fn)
+```
+
+- **Always wrap inputs with `black_box()` and results with `keep()`** to prevent the compiler from optimizing away the benchmark.
+- **Do not include construction inside the hot loop.** Setup goes before `call_fn`. Use `iter_with_setup` for destructive benchmarks.
+- **Data must match the described scenario.** If the docstring says "element in the middle", verify it is actually there.
+- **Use `comptime` instead of `@parameter`** for compile-time tuples/loops in benchmark `main()`. (`@parameter` is still used as a function decorator on benchmark functions themselves.)
+- **Setup functions passed to `iter_with_setup` must not raise.**
+- **Provide benchmarks for performance PRs**, covering both small and large inputs.
+
+## Testing
+
+- **Add unicode test cases** for any string/byte operation.
+- **Cover both hit and miss paths** in search/contains tests.
+- **Add edge cases:** `start > end`, out-of-bounds indices, empty input, exact-width (no padding).
+- **Use `debug_assert`** for internal preconditions on values that must be non-negative by invariant.
+- **Tests for `Writable` must use `check_write_to`**, not `String(x)` or `repr(x)`.
+
+## SIMD / memory safety
+
+- **Validate pointer arithmetic** before low-level memory access. Clamp/normalize `[start, end]` into `[0, len]` before `memcmp`.
+- **Check for negative `pos_in_block`** when computing leading/trailing zeros in reverse SIMD scans.
+- **Verify `vectorized_end`** accounts for full SIMD block width to prevent OOB reads.
+- **SIMD test comments must be platform-agnostic.** Write "exercises both SIMD path and scalar tail" instead of "one full 64-byte SIMD block + scalar tail".
+- **Consider endianness** when working with bit-level SIMD operations (e.g. `count_trailing_zeros` on packed bitmasks). Implementations may need a branch for big-endian targets.
+
+## Writable trait patterns
+
+```mojo
+# Correct signature (not generic [W: Writer]):
+def write_to(self, mut writer: Some[Writer]):
+    writer.write_string("literal")  # not writer.write("literal")
+    writer.write(self.field)        # non-literals use .write()
+
+def write_repr_to(self, mut writer: Some[Writer]):
+    writer.write("TypeName.", self)  # include type name for enums
+```
+
+- **`write_to` and `write_repr_to` must not allocate.** No `repr()`, `String(x)`, or heap-allocating calls.
+- **Use `FormatStruct`** from `format._utils` for consistent repr formatting of structured types.
+- **Add an explicit `else` fallback** in enum-style `write_to`.
+- **`Stringable` and `Representable` are deprecated.** Only implement `Writable`. Do not add `__str__` or `__repr__`.
+
+## Changelog
+
+- **Do NOT add changelog entries for NFC/implementation-detail changes.** Only user-facing behavior changes belong in the changelog. Internal optimization, refactoring, or performance improvements that don't change the public API are not changelisted.
+
+## Build and lint
+
+```bash
+# Build stdlib
+./bazelw build //mojo/stdlib/std
+
+# Run specific tests
+./bazelw test //mojo/stdlib/test/collections/string:test_string_slice.mojo.test
+
+# Run all stdlib tests
+./bazelw test mojo/stdlib/test/...
+
+# Format check (run before pushing)
+./bazelw run format
+```
+
+- **Always run `./bazelw run format` before pushing.** The CI lint check will reject unformatted code.
+- **Sign commits** with `git commit -s`.
+- **Include `Assisted-by: AI`** as a trailer in every PR description per `AI_TOOL_POLICY.md`.

From cd08f91a63371a0b590c027c0815e63e874f97df Mon Sep 17 00:00:00 2001
From: Manuel Saelices <msaelices@gmail.com>
Date: Sun, 22 Mar 2026 20:55:59 +0100
Subject: [PATCH 2/4] [Skills] Ask human to review before marking PR as ready

Maintainer review cycles are expensive. The skill should instruct
the agent to ask the human to review the diff, PR description, and
benchmarks before converting from draft to ready for review.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
---
 mojo-stdlib-contributing/SKILL.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md
index 8b1ce67..7dc33c0 100644
--- a/mojo-stdlib-contributing/SKILL.md
+++ b/mojo-stdlib-contributing/SKILL.md
@@ -12,7 +12,7 @@ patterns that are non-obvious or that a model would get wrong. -->
 
 - **New APIs require a GitHub issue first.** Do not add methods to existing types (String, List, Deque, etc.) without prior consensus. Python parity alone is not justification. Open the issue, then create a draft PR linking it.
 - **Keep PRs minimal and focused.** One logical change per PR. Never mix unrelated changes (e.g. don't add `@always_inline` to Set/Dict in a Deque fix PR). Benchmark utilities and their usage belong in separate PRs.
-- **Always create draft PRs.** Never mark a PR as ready for review yourself.
+- **Always create draft PRs.** Never mark a PR as ready for review yourself. Before the human marks it as ready, ask them to review the diff, PR description, and benchmark results carefully. Maintainer review cycles are expensive -- catching issues before requesting review avoids wasted rounds.
 - **Branch from `upstream/main`.** Never work on `main` directly. Each PR branch must contain only commits for that specific change.
 
 ## Assertion semantics — critical

From 6469dd93b1f19c99f3e080a0618cf38188a9dc96 Mon Sep 17 00:00:00 2001
From: Manuel Saelices <msaelices@gmail.com>
Date: Sun, 22 Mar 2026 21:00:23 +0100
Subject: [PATCH 3/4] [Skills] Simplify and align with existing skill
 conventions

- Use standard editorial comment template from other skills
- Add opening paragraph consistent with mojo-syntax/gpu-fundamentals
- Remove content duplicated from mojo-syntax (Writable basics,
  deprecated APIs, comptime vs @parameter)
- Remove general git/build commands already in CLAUDE.md
- Consolidate compile_info section, cut niche bullets
- Merge changelog into process section
- Reorder: process, assertions, optimizations, design, benchmarks,
  testing, SIMD, writable
- Reduce from 169 to 111 lines (34% smaller)

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
---
 mojo-stdlib-contributing/SKILL.md | 154 ++++++++++--------------------
 1 file changed, 48 insertions(+), 106 deletions(-)

diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md
index 7dc33c0..8a6ec3c 100644
--- a/mojo-stdlib-contributing/SKILL.md
+++ b/mojo-stdlib-contributing/SKILL.md
@@ -4,99 +4,78 @@ description: Patterns and pitfalls for contributing to the Mojo standard library
 ---
 
 <!-- EDITORIAL GUIDELINES FOR THIS SKILL FILE
-Distilled from real reviewer feedback on 30+ PRs to the Mojo stdlib.
-Every entry reflects an actual rejection or correction. Only include
-patterns that are non-obvious or that a model would get wrong. -->
+This file is loaded into an agent's context window as a correction layer for
+pretrained contribution knowledge. Every line costs context. When editing:
+- Be terse. Use tables and inline code over prose where possible.
+- Never duplicate information from the mojo-syntax skill.
+- Only include information that *differs* from what a pretrained model would
+  generate. Don't document things models already get right.
+- Prefer one consolidated code block over multiple small ones.
+- If adding a new section, ask: "Would a model get this wrong?" If not, skip it.
+These same principles apply to any files this skill references.
+-->
 
-## Process — before writing code
+Contributing to the Mojo stdlib has non-obvious patterns that differ from typical open-source projects. **Always follow this skill to avoid common rejection reasons.**
 
-- **New APIs require a GitHub issue first.** Do not add methods to existing types (String, List, Deque, etc.) without prior consensus. Python parity alone is not justification. Open the issue, then create a draft PR linking it.
-- **Keep PRs minimal and focused.** One logical change per PR. Never mix unrelated changes (e.g. don't add `@always_inline` to Set/Dict in a Deque fix PR). Benchmark utilities and their usage belong in separate PRs.
-- **Always create draft PRs.** Never mark a PR as ready for review yourself. Before the human marks it as ready, ask them to review the diff, PR description, and benchmark results carefully. Maintainer review cycles are expensive -- catching issues before requesting review avoids wasted rounds.
-- **Branch from `upstream/main`.** Never work on `main` directly. Each PR branch must contain only commits for that specific change.
+## Process
 
-## Assertion semantics — critical
+- **New APIs require a GitHub issue first.** Do not implement new methods on existing types without prior consensus. Open the issue, then create a draft PR linking it.
+- **Keep PRs focused.** One logical change per PR. Benchmark utilities and their usage belong in separate PRs.
+- **Always create draft PRs.** Never mark a PR as ready yourself. Ask the human to review the diff, description, and benchmarks before requesting maintainer review.
+- **No changelog entries for internal changes.** Only user-facing behavior changes belong in the changelog.
 
-The `assert` statement desugars to `debug_assert()` with `assert_mode="none"`. The default `ASSERT_MODE` is `"safe"` (from `get_defined_string["ASSERT", "safe"]()`). This means:
+## Assertion semantics
 
-| Form | Runs when `ASSERT_MODE="safe"` (default) | Runs when `-D ASSERT=all` |
+The default `ASSERT_MODE` is `"safe"` (from `get_defined_string["ASSERT", "safe"]()`).
+
+| Form | Runs by default (`"safe"`) | Runs with `-D ASSERT=all` |
 |---|---|---|
 | `debug_assert[assert_mode="safe"](...)` | Yes | Yes |
-| `debug_assert(...)` (default `assert_mode="none"`) | No | Yes |
-| `assert cond, msg` (desugars to above) | No | Yes |
+| `debug_assert(...)` / `assert cond, msg` (default `assert_mode="none"`) | No | Yes |
 
-**Do NOT downgrade `debug_assert[assert_mode="safe"]` to `debug_assert` or `assert`.** These are intentional safety invariants. The maintainers want aggressive checking on operations like `byte=` indexing. Users who need to bypass safety should use the existing unsafe escape hatches (e.g. `.as_bytes().unsafe_get(idx)` for byte access).
+**Do NOT downgrade `debug_assert[assert_mode="safe"]`.** These are intentional safety invariants, not performance bugs.
 
-## Optimizations — verify codegen and benchmark before submitting
+## Optimizations
 
 **Every optimization PR must include both IR evidence and benchmark results.**
 
-### Verify codegen with `compile_info`
-
-Use `std.compile.compile_info` to inspect the generated IR *before* and *after* your change. If the IR is identical, the optimization is a no-op regardless of what the source looks like.
+Use `std.compile.compile_info` to compare generated IR before and after your change:
 
 ```mojo
 from std.compile import compile_info
 
-# Define the function to inspect
 def my_function(x: SIMD[DType.float32, 4]) -> SIMD[DType.float32, 4]:
     return x + x
 
-# Inspect optimized LLVM IR
+# Inspect optimized LLVM IR ("llvm" for unoptimized, "asm" for assembly)
 comptime info = compile_info[my_function, emission_kind="llvm-opt"]()
-print(info)  # prints optimized IR
+print(info)
 
-# Check for specific instructions
+# Pattern-match on IR content
 assert "fadd" in compile_info[my_function, emission_kind="llvm-opt"]()
-
-# Write IR to file for detailed comparison
-compile_info[my_function, emission_kind="llvm-opt"]().write_text("after.ll")
 ```
 
-Supported `emission_kind` values:
-
-| Kind | Output |
-|---|---|
-| `"asm"` | Assembly (default) |
-| `"llvm"` | Unoptimized LLVM IR |
-| `"llvm-opt"` | Optimized LLVM IR (use this to compare) |
-| `"llvm-bitcode"` | LLVM bitcode |
-
-**Workflow for optimization PRs:**
-1. Write a small test that calls `compile_info` on the function before your change. Save the IR.
-2. Apply your change.
-3. Compare IR. If identical, the optimization does nothing. Do not submit.
-4. If IR differs, run benchmarks to confirm the improvement is measurable.
+**Workflow:** save the IR before your change, apply the change, then compare. If the IR is identical, the optimization is a no-op and should not be submitted. If the IR differs, run benchmarks to confirm a measurable improvement.
 
-### Common pitfalls
-
-- **Do not add manual fast paths that LLVM already optimizes.** For example, `1 if b < 128 else _utf8_first_byte_sequence_length(b)` produces identical IR to just calling `_utf8_first_byte_sequence_length(b)` because LLVM sees through the branch.
-- **Measure before optimizing compile-time-evaluated paths.** Format strings may be evaluated at compile time, so runtime SIMD scanning provides no benefit.
-- **Verify alignment claims with data.** Cache-line alignment (e.g. 64-byte `@align`) requires benchmark evidence. Do not apply speculatively.
-- **Include before/after comparisons in the PR description.** Show the benchmark numbers and, if relevant, the IR diff.
+- **Do not add manual fast paths that LLVM already optimizes.** E.g., `1 if b < 128 else _utf8_first_byte_sequence_length(b)` produces identical IR to just calling the function.
+- **Verify alignment claims with benchmarks.** Cache-line alignment requires evidence.
+- **Include before/after benchmark numbers in the PR description.**
 
 ## Code design
 
-- **Reuse existing stdlib primitives.** For SIMD byte search, use `_memchr`/`_memchr_impl` rather than writing a new loop. If a new algorithm is better, update the existing primitive so all callers benefit. Use `clamp`, `Int64.__xor__`, etc. instead of reimplementing.
-- **Implement fast paths on the lowest-level type** (`Span`), not higher-level types (`List`). Then delegate: `List.__contains__` calls `Span.__contains__`.
-- **Generalize type constraints.** Don't hard-code a single type (e.g. `Byte`). Use trait conformance like `TrivialRegisterPassable` to cover all applicable types.
-- **Use move semantics (`^`)** when transferring ownership, not `.copy()`.
-- **Use `uninit_copy_n`, `uninit_move_n`, `destroy_n`** for bulk operations on trivial types instead of manual loops.
-- **Prefer lazy evaluation (iterators) over eager allocation** for collection operations like slicing. E.g., `deque[0:1:2]` should return an iterator, not allocate a new `Deque`.
-- **Check if traits synthesize default methods** before writing explicit implementations. E.g., `Equatable` may already synthesize `__ne__` from `__eq__`.
-- **Lift constants into traits** when they vary across implementations (e.g., `MAX_NAME_SIZE` into a `_DirentLike` trait).
-- **Keep t-string expressions simple.** Assign complex sub-expressions to local variables before interpolating.
-- **Do not add redundant logic** just for Python API parity if Mojo already has a better primitive.
-- **Question two-pass patterns.** A "collect then delete" approach with an extra allocation may be slower than a single-pass approach. Always benchmark to confirm.
-- **Ensure PR description matches reality.** If `discard()` internally uses try/except, don't claim "avoids exception overhead".
+- **Reuse existing stdlib primitives.** Use `_memchr`/`_memchr_impl`, `clamp`, etc. rather than reimplementing. If a new algorithm is better, update the existing primitive.
+- **Implement fast paths on `Span`**, not `List`. Then delegate upward.
+- **Generalize type constraints.** Use trait conformance (e.g. `TrivialRegisterPassable`) rather than hard-coding a single type.
+- **Use move semantics (`^`)** when transferring ownership, not `.copy()`. Use `uninit_copy_n`, `uninit_move_n`, `destroy_n` for bulk operations on trivial types instead of manual loops.
+- **Question two-pass patterns.** A "collect then delete" approach may be slower than single-pass. Benchmark to confirm.
 
 ## Benchmarks
 
 ```mojo
-# Correct benchmark pattern:
+# CORRECT
 @parameter
 def bench_something(mut b: Bencher) raises:
-    var data = setup_data()
+    var data = setup_data()  # setup outside hot loop
     @always_inline
     def call_fn() unified {read}:
         var result = black_box(data).some_operation(black_box(arg))
@@ -104,66 +83,29 @@ def bench_something(mut b: Bencher) raises:
     b.iter(call_fn)
 ```
 
-- **Always wrap inputs with `black_box()` and results with `keep()`** to prevent the compiler from optimizing away the benchmark.
-- **Do not include construction inside the hot loop.** Setup goes before `call_fn`. Use `iter_with_setup` for destructive benchmarks.
-- **Data must match the described scenario.** If the docstring says "element in the middle", verify it is actually there.
-- **Use `comptime` instead of `@parameter`** for compile-time tuples/loops in benchmark `main()`. (`@parameter` is still used as a function decorator on benchmark functions themselves.)
-- **Setup functions passed to `iter_with_setup` must not raise.**
+- **Always `black_box()` inputs and `keep()` results.** Otherwise the compiler may optimize away the benchmark.
+- **Setup goes outside `call_fn`.** Use `iter_with_setup` for destructive benchmarks; setup functions passed to it must not raise.
+- **Data must match the described scenario.** If the docstring says "element in the middle", verify that the element is actually there.
 - **Provide benchmarks for performance PRs**, covering both small and large inputs.
 
 ## Testing
 
 - **Add unicode test cases** for any string/byte operation.
 - **Cover both hit and miss paths** in search/contains tests.
-- **Add edge cases:** `start > end`, out-of-bounds indices, empty input, exact-width (no padding).
-- **Use `debug_assert`** for internal preconditions on values that must be non-negative by invariant.
+- **Add edge cases:** `start > end`, out-of-bounds indices, empty input, exact-width (no padding).
 - **Tests for `Writable` must use `check_write_to`**, not `String(x)` or `repr(x)`.
 
 ## SIMD / memory safety
 
-- **Validate pointer arithmetic** before low-level memory access. Clamp/normalize `[start, end]` into `[0, len]` before `memcmp`.
-- **Check for negative `pos_in_block`** when computing leading/trailing zeros in reverse SIMD scans.
+- **Validate pointer arithmetic** before low-level memory access. Clamp `[start, end]` into `[0, len]` before `memcmp`.
+- **Check for negative `pos_in_block`** in reverse SIMD scans with `count_leading_zeros`.
 - **Verify `vectorized_end`** accounts for full SIMD block width to prevent OOB reads.
-- **SIMD test comments must be platform-agnostic.** Write "exercises both SIMD path and scalar tail" instead of "one full 64-byte SIMD block + scalar tail".
-- **Consider endianness** when working with bit-level SIMD operations (e.g. `count_trailing_zeros` on packed bitmasks). Implementations may need a branch for big-endian targets.
-
-## Writable trait patterns
+- **SIMD test comments must be platform-agnostic.** Write "exercises both SIMD path and scalar tail" instead of assuming 64-byte width.
 
-```mojo
-# Correct signature (not generic [W: Writer]):
-def write_to(self, mut writer: Some[Writer]):
-    writer.write_string("literal")  # not writer.write("literal")
-    writer.write(self.field)        # non-literals use .write()
+## Writable trait (stdlib-specific additions to mojo-syntax)
 
-def write_repr_to(self, mut writer: Some[Writer]):
-    writer.write("TypeName.", self)  # include type name for enums
-```
+The `mojo-syntax` skill covers basic `Writable` patterns. Additional rules for stdlib contributions:
 
-- **`write_to` and `write_repr_to` must not allocate.** No `repr()`, `String(x)`, or heap-allocating calls.
+- **`write_to` and `write_repr_to` must not allocate.** No `repr()`, `String(x)`, or heap-allocating calls inside these methods.
 - **Use `FormatStruct`** from `format._utils` for consistent repr formatting of structured types.
 - **Add an explicit `else` fallback** in enum-style `write_to`.
-- **`Stringable` and `Representable` are deprecated.** Only implement `Writable`. Do not add `__str__` or `__repr__`.
-
-## Changelog
-
-- **Do NOT add changelog entries for NFC/implementation-detail changes.** Only user-facing behavior changes belong in the changelog. Internal optimization, refactoring, or performance improvements that don't change the public API are not changelisted.
-
-## Build and lint
-
-```bash
-# Build stdlib
-./bazelw build //mojo/stdlib/std
-
-# Run specific tests
-./bazelw test //mojo/stdlib/test/collections/string:test_string_slice.mojo.test
-
-# Run all stdlib tests
-./bazelw test mojo/stdlib/test/...
-
-# Format check (run before pushing)
-./bazelw run format
-```
-
-- **Always run `./bazelw run format` before pushing.** The CI lint check will reject unformatted code.
-- **Sign commits** with `git commit -s`.
-- **Include `Assisted-by: AI`** as a trailer in every PR description per `AI_TOOL_POLICY.md`.

From f65062182783a4fdf17783719d37a913c02ff63c Mon Sep 17 00:00:00 2001
From: Manuel Saelices <msaelices@gmail.com>
Date: Sun, 22 Mar 2026 21:29:19 +0100
Subject: [PATCH 4/4] [Skills] Add references to contributing docs in modular
 repo

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
---
 mojo-stdlib-contributing/SKILL.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mojo-stdlib-contributing/SKILL.md b/mojo-stdlib-contributing/SKILL.md
index 8a6ec3c..52ddc4f 100644
--- a/mojo-stdlib-contributing/SKILL.md
+++ b/mojo-stdlib-contributing/SKILL.md
@@ -17,6 +17,8 @@ These same principles apply to any files this skill references.
 
 Contributing to the Mojo stdlib has non-obvious patterns that differ from typical open-source projects. **Always follow this skill to avoid common rejection reasons.**
 
+**Read before contributing:** `mojo/CONTRIBUTING.md` (accepted/avoided changes, proposal process), `mojo/stdlib/docs/development.md` (building, testing), `mojo/stdlib/docs/style-guide.md` (coding standards), and `AI_TOOL_POLICY.md` (labeling, human-in-the-loop requirements).
+
 ## Process
 
 - **New APIs require a GitHub issue first.** Do not implement new methods on existing types without prior consensus. Open the issue, then create a draft PR linking it.