This document tracks current limitations and known issues in SlopCodeBench. We're actively working on improvements and welcome feedback.
Note: This is an initial release. Some rough edges are expected as we iterate toward production-quality software.
- Severity: Medium
- Issue: Dashboard functionality is undocumented
- Impact: Users cannot use advanced visualization features without reverse-engineering them
- Status: Partially documented (see `viz` command)
- Workaround: Use `slop-code viz diff` for diff visualization. See viz command documentation for details.
- Tracking: [Issue #TBD]
- Note: The `viz diff` command provides an interactive diff viewer. Full dashboard documentation is still needed.
- Severity: Medium
- Issue: Models, providers, and agents configuration is complex and not intuitive
- Impact: Steep learning curve for new users
- Status: Open to redesign suggestions
- Feedback welcome: We acknowledge this is hard to use and are open to changes
Current workarounds:
- See Agent Guide for detailed setup
- Use provided config files in `configs/` as templates
- Check FAQ for common configurations
Due to specification evolution across checkpoints, some reference solutions may not pass all tests. However, the test cases themselves are verified to be correct. We prioritized test case accuracy over fixing all reference solutions.
- Severity: Medium
- Checkpoints affected: 2, 3
- Issue: Reference solution doesn't solve all cases
- Impact: Tests are correct; solution needs updating
- Workaround: Use tests as ground truth
Details:
- Checkpoint 2: Current solution incomplete but tests verified
- Checkpoint 3: Same as checkpoint 2
- Severity: Low
- Checkpoints affected: 1, 2, 3
- Issue: Reference solution fails some cases
- Impact: Expected answers verified against actual market data
- Workaround: Tests are authoritative
Details:
- Checkpoints 1/2: Solution wrong for a few cases, but expected outputs verified with market data
- Checkpoint 3: Exact yields off by 1-2 units in solution, but verifier correctly calculates reprocessing yields
- Severity: Low
- Checkpoints affected: 5
- Issue: Incorrect rounding for recursive jobs in reference solution
- Impact: Tests are correct
- Status: Will be fixed in future release
- Workaround: None needed; tests are accurate
- Severity: Medium
- Checkpoints affected: 4
- Issue: Reference solution has incorrect code generation for C++/Rust
- Impact: Tests are identical across all languages and verified
- Status: Will be fixed
- Workaround: Use test cases as ground truth
- Severity: Low
- Checkpoints affected: 6
- Issue: Solution fails some cases
- Impact: Cases verified to match specification exactly
- Workaround: Tests represent correct behavior per spec
The following are planned for future releases:
- Fix all reference solutions to pass tests
- Document dashboard functionality
- Simplify agent/model/provider configuration
- Add parallel execution support
- Improve error messages and debugging
Found a bug or limitation not listed here?
- Check GitHub Issues to see if it's already reported
- If not, open a new issue with:
  - Description of the issue
  - Steps to reproduce
  - Expected vs actual behavior
  - Your environment (OS, Python version, Docker version)
  - Relevant logs or error messages
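
To make the environment details above easier to gather, here is a minimal sketch that prints the OS, Python version, and Docker version as a list you can paste into an issue. It is not part of SlopCodeBench; the function names (`docker_version`, `collect_environment`, `format_report`) are hypothetical, and it assumes the `docker` CLI is on your `PATH` if Docker is installed.

```python
import platform
import shutil
import subprocess


def docker_version() -> str:
    """Return the output of `docker --version`, or a note if it is unavailable."""
    if shutil.which("docker") is None:
        return "not found on PATH"
    try:
        result = subprocess.run(
            ["docker", "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run docker: {exc}"
    # Fall back to stderr, then to a placeholder, if stdout is empty.
    return result.stdout.strip() or result.stderr.strip() or "unknown"


def collect_environment() -> dict[str, str]:
    """Gather the environment fields requested in the issue checklist."""
    return {
        "OS": platform.platform(),
        "Python": platform.python_version(),
        "Docker": docker_version(),
    }


def format_report(env: dict[str, str]) -> str:
    """Render the fields as a Markdown bullet list."""
    return "\n".join(f"- {key}: {value}" for key, value in env.items())


if __name__ == "__main__":
    print(format_report(collect_environment()))
```

Run it with the same Python interpreter you use for SlopCodeBench so the reported version matches your setup.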
We appreciate your patience and feedback as we improve SlopCodeBench!