I think that in order to truly understand an idea or a practice you must actually try it and live with it for a while.
A while back I realized that I was watching a lot of conference talks and reading a lot of blog posts about various software development practices. This led to me developing some opinions that did not hold up well when theory crashed into reality.
This project is a sandbox where I can try out various ideas and practices to figure out what works and what doesn't.
To learn and explore how best to develop software.
To me, "best" currently means creating software that is correct (it has no bugs), that I am confident is correct (I can confidently say it has no bugs), and that is easy to change.
- Learn how to efficiently refactor code
- Learn how to efficiently test my code, and test my tests
- Live with the code
Some ideas seem really convenient at the start but become a tangled mess later on. Others seem overly verbose initially but pay off in the long run. One of the reasons I selected RSM for this porting project is because I knew it would be a large project, and I would have to live with any decisions I made.
This project is a port of the Reference Standard M implementation maintained by David Wicksell from C to Rust.
I have always found language design interesting. When I was learning how M works at Epic, I thought it seemed like a "simple" language and wondered if I could write an interpreter for it. I quickly realized that M was not as simple as I had assumed, especially when I tried to add indirection and goto support to my half-baked interpreter. Eventually I ended up poking around online looking at how other interpreters worked and found RSM. On a whim, I experimented with converting some of the code to Rust, and after a while the project became the main codebase I would use to try things out.
The purpose of this crate is to store/manage the original C code from RSM and the Foreign Function Interface (i.e., making C and Rust play nicely together in the same binary).
This crate is responsible for:
- Building the C code
- Generating the ffi struct and function definitions
- Exposing an Unsafe API to the C code
- Exposing a Safe API to the C code
Most of the other crates in this project currently use the Unsafe API, but I would like to move towards only exposing a Safe API.
The original RSM code was a single-threaded/multi-process application. Because of this, there are many references to global variables without any sort of synchronization control. Rust by default runs all of its unit tests in parallel (multi-threaded), so care must be taken whenever the Rust unit tests call out to C in order to avoid race conditions.
Additionally, the C code uses a shared memory segment to manage cross-process communication.
There are of course all the normal C vs Rust type considerations. However, in addition to those, the original C code makes heavy use of unsized types and quasi-self-referential types.
One fairly common dynamically sized type used in the original C code is `CSTRING`.
The struct definition claims to hold a `[u_char; 65535]`; however, in practice that is only the maximum size for this type.
If the string is smaller than 65535 bytes (which it is most of the time), then the C code only allocates enough space to hold the string.
Rust does have a way of creating dynamically sized types, but I have not explored that yet.
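To make the size mismatch concrete, here is a hedged sketch of reading only the initialized portion of such a type. The struct and function names are illustrative, not the real bindgen output:

```rust
use std::slice;

// Illustrative mirror of the C CSTRING shape; the real generated
// bindings declare the buffer at its maximum size.
#[repr(C)]
pub struct CStringLike {
    pub len: u16,
    pub buf: [u8; 65535], // declared max; C may allocate far less
}

/// Copy out only the `len` bytes that were actually written.
///
/// SAFETY: `ptr` must point to a valid header followed by at least
/// `len` initialized buffer bytes; the full 65535 bytes may NOT exist.
pub unsafe fn bytes_of(ptr: *const CStringLike) -> Vec<u8> {
    let len = usize::from((*ptr).len);
    // addr_of! avoids creating a reference to the (possibly short) array
    let data = std::ptr::addr_of!((*ptr).buf) as *const u8;
    slice::from_raw_parts(data, len).to_vec()
}

fn main() {
    // A fully sized instance on the stack is safe to demonstrate with.
    let mut c = CStringLike { len: 5, buf: [0; 65535] };
    c.buf[..5].copy_from_slice(b"hello");
    let bytes = unsafe { bytes_of(&c) };
    assert_eq!(bytes, b"hello");
}
```

The key point is that a safe wrapper must consult `len` rather than trust the declared array length.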
Self-references in Rust are challenging since, unless you use `Pin`, Rust assumes that everything can be moved. Fortunately, the C types are only kind of self-referential. They frequently assume that type A will be immediately followed by type B, so we end up with the logical composite type AB. This is mostly just a logical construction, and it is fairly easy to spot and handle once you know what to look for.
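A hedged sketch of that pattern, using invented types: the C side conceptually allocates a header and its records in one block, and consumers compute the records' address from the header's.

```rust
use std::mem::size_of;

// Invented illustration types: C allocates a Header and `count` Entry
// records in a single block, with no field linking them together.
#[repr(C)]
struct Header {
    count: u32,
}

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entry {
    key: u32,
    val: u32,
}

// Emulates the C layout for the demo below.
#[repr(C)]
struct Composite {
    h: Header,
    e: [Entry; 2],
}

/// Recover the implicit "AB" composite: the entries live immediately
/// after the header in memory.
///
/// SAFETY: `h` must point to a header followed by `count` valid entries.
unsafe fn entries(h: *const Header) -> Vec<Entry> {
    let count = (*h).count as usize;
    let first = (h as *const u8).add(size_of::<Header>()) as *const Entry;
    std::slice::from_raw_parts(first, count).to_vec()
}

fn main() {
    let c = Composite {
        h: Header { count: 2 },
        e: [Entry { key: 1, val: 10 }, Entry { key: 2, val: 20 }],
    };
    let found = unsafe { entries(&c.h) };
    assert_eq!(found[1], Entry { key: 2, val: 20 });
}
```

Because the composite is only logical, moving the `Header` alone would silently orphan its entries, which is why these types must stay where C put them.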
For this project, I have chosen to use a tree-sitter parser. This crate is responsible for:
- Using JavaScript to specify the grammar
- Using an external scanner to deal with indentation
- Running the `tree-sitter-cli` as part of the build script
- Generating:
  - A C library that contains the parser
  - A Rust crate that wraps that C library
  - A `node-types.json` file that describes the grammar's structure
This crate holds the Rust type wrappers for each of the nodes in the M grammar. The `models.rs` file is generated from the `node-types.json` using a separate personal project.
IR stands for intermediate representation; this crate holds the abstract syntax tree definition that is output by the frontend and consumed by the backend.
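As a flavor of what such an IR can look like, here is a heavily reduced, hypothetical shape (not the crate's actual definitions), along with a toy backend step that folds the tree:

```rust
/// Hypothetical, heavily reduced expression IR; the real crate
/// covers the full M grammar.
#[derive(Debug, PartialEq)]
enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
}

/// Toy consumer of the IR: fold the tree down to a value.
fn eval(e: &Expr) -> f64 {
    match e {
        Expr::Number(n) => *n,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}

fn main() {
    // A frontend would produce a tree like this for "1+2+3".
    let tree = Expr::Add(
        Box::new(Expr::Add(
            Box::new(Expr::Number(1.0)),
            Box::new(Expr::Number(2.0)),
        )),
        Box::new(Expr::Number(3.0)),
    );
    assert_eq!(eval(&tree), 6.0);
}
```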
The Frontend is responsible for taking in text input, invoking the tree-sitter parser, and converting the result into IR.
The Backend is responsible for taking the IR and converting it into bytecode.
M has only one primitive value type. This crate is responsible for defining that type and all the associated primitive operations, such as addition, division, and concatenation.
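As a rough illustration of what "one primitive type" implies: every value is a string, and numeric operators coerce its leading numeric prefix. This is a simplified sketch with invented helper names; real M canonical-number rules are more involved.

```rust
/// Take the longest leading prefix of `s` that reads as a number;
/// non-numeric strings coerce to 0 (simplified vs. real M rules).
fn numeric_prefix(s: &str) -> f64 {
    let b = s.as_bytes();
    let mut end = 0;
    if end < b.len() && (b[end] == b'-' || b[end] == b'+') {
        end += 1;
    }
    let mut seen_dot = false;
    while end < b.len() {
        match b[end] {
            b'0'..=b'9' => end += 1,
            b'.' if !seen_dot => {
                seen_dot = true;
                end += 1;
            }
            _ => break,
        }
    }
    s[..end].parse().unwrap_or(0.0)
}

/// M-style `+`: both operands coerce to numbers.
fn m_add(a: &str, b: &str) -> f64 {
    numeric_prefix(a) + numeric_prefix(b)
}

/// M-style `_` (concatenate): both operands stay strings.
fn m_concat(a: &str, b: &str) -> String {
    format!("{a}{b}")
}

fn main() {
    assert_eq!(m_add("3 apples", "4"), 7.0); // "3 apples" coerces to 3
    assert_eq!(m_concat("3 apples", "4"), "3 apples4");
    assert_eq!(numeric_prefix("hello"), 0.0); // no numeric prefix -> 0
}
```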
This crate is responsible for managing:
- What variables are in scope
- Variable keys and sub-keys
- Shadowing/restoring variables
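A hedged sketch of the shadow/restore mechanics, modeled on M's NEW command (the API here is hypothetical; the real crate also handles subscripted keys): NEW saves a variable's current value in the active stack frame, and leaving the frame restores it.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of M-style NEW/restore.
struct SymbolTable {
    vars: HashMap<String, String>,
    // One map per stack frame: variable -> value it shadowed (None if unset).
    frames: Vec<HashMap<String, Option<String>>>,
}

impl SymbolTable {
    fn new_table() -> Self {
        SymbolTable { vars: HashMap::new(), frames: vec![HashMap::new()] }
    }

    fn set(&mut self, k: &str, v: &str) {
        self.vars.insert(k.to_string(), v.to_string());
    }

    fn get(&self, k: &str) -> Option<&String> {
        self.vars.get(k)
    }

    /// M's NEW: shadow `k` in the current frame; it becomes undefined.
    fn new_var(&mut self, k: &str) {
        let prev = self.vars.remove(k);
        if let Some(frame) = self.frames.last_mut() {
            // Only the first NEW in a frame records the outer value.
            frame.entry(k.to_string()).or_insert(prev);
        }
    }

    fn push_frame(&mut self) {
        self.frames.push(HashMap::new());
    }

    /// Leaving a frame (QUIT) restores every shadowed variable.
    fn pop_frame(&mut self) {
        if let Some(frame) = self.frames.pop() {
            for (k, prev) in frame {
                match prev {
                    Some(v) => { self.vars.insert(k, v); }
                    None => { self.vars.remove(&k); }
                }
            }
        }
    }
}

fn main() {
    let mut st = SymbolTable::new_table();
    st.set("i", "1");
    st.push_frame();
    st.new_var("i"); // shadowed: now undefined
    assert_eq!(st.get("i"), None);
    st.set("i", "2");
    assert_eq!(st.get("i").unwrap(), "2");
    st.pop_frame(); // restore outer value
    assert_eq!(st.get("i").unwrap(), "1");
}
```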
Currently, the crate is responsible for:
- Creating a database file
- Setting up the shared memory segment
This is a language server for M and was a spur-of-the-moment weekend project that only provides some basic syntax highlighting/syntax error detection. There are many useful features I would like to see from an M language server; however, the rest of the project will have to mature before I can start working on those features.
Future feature ideas:
- Find all assumed variables and indirection calls
- Rename variables
- Find all references
- Lint for unused and assumed variables
- Extract method
- Introduce package scoping
One of the biggest roadblocks I see to refactoring in M is the dynamic scoping of variables.
Example of dynamic scoping:
```
A()
 s i=i+1 ; tag A references variable i without initializing it
 q
B()
 s i=0
 d A()
 q ; i now has the value of 1
C()
 s i=9
 d A()
 q ; i now has the value of 10
```
Dynamic scoping makes it difficult to locally reason about the code. This makes it rather challenging to create automatic refactoring tools even for relatively simple operations like "rename variable".
This project does not currently produce a working executable. If you need a working M interpreter, please see Reference-Standard-M. Any bugs that I find during the course of creating this clone will be reported back upstream to RSM.
NOTE: check the GitHub actions for the versions of the CLI tools
```shell
cargo install tree-sitter-cli --version <version> --locked
cargo install cargo-mutants --version <version> --locked
```

- You will need clang installed (a requirement of bindgen); see the bindgen documentation for more details.
```shell
cargo test
```
NOTE: currently fuzzing is only done in the symbol table crate
```shell
cargo fuzz list
cargo fuzz run <fuzzing target>
```
NOTE: this can take a while
```shell
cd <crate name>
cargo mutants
```
The more unit tests I write, the more useful I realize unit tests are, and the less they seem to be about double checking my work.
Unit tests are code fragments that describe how a "unit" of code is invoked and what behavior is expected from that "unit".
I think unit tests should be:
- Descriptive: Well written unit tests should be able to serve as documentation.
- Small: If you need more than 20 lines of code to write a unit test, you are probably violating the single responsibility heuristic.
- Simple: It should take less than two minutes for someone to look at a unit test, understand what it is verifying, and why that behavior is correct.
- Fast and deterministic: Unit tests should be run frequently. I normally run them every couple of minutes.
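For instance, a test in the style described above, written against a hypothetical helper (the validator and its rules are simplified for illustration; real M name rules also cap the length):

```rust
/// Hypothetical helper: M names start with `%` or a letter,
/// followed by alphanumerics (simplified).
fn is_valid_m_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c == '%' || c.is_ascii_alphabetic() => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric())
}

// Descriptive, small, simple, fast: the name states the rule being
// verified, and the body fits well under 20 lines.
#[test]
fn names_start_with_percent_or_letter_and_reject_leading_digits() {
    assert!(is_valid_m_name("%tmp"));
    assert!(is_valid_m_name("count1"));
    assert!(!is_valid_m_name("1count"));
    assert!(!is_valid_m_name(""));
}

fn main() {
    // (main only so this sketch runs standalone)
    assert!(is_valid_m_name("%tmp"));
    assert!(!is_valid_m_name("1count"));
}
```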
In college, unit testing was primarily presented as an afterthought, a way to verify your code was correct before turning in the assignment. However, waiting to write/run unit tests until after the code is already in a finished state robs unit tests of most of their utility.
As I see it, there are two main benefits to writing unit tests before writing your code.
- First, it allows you to imagine how your code will be called. If the unit tests are hard to write, then the application code is going to be hard to write/maintain.
- Second, once code behavior has been pinned down with unit tests, you can fearlessly refactor without worrying about breaking changes. Frequently, it is only after a first-draft solution that I truly understand the problem I am trying to solve. Therefore, I will nearly always want to refactor my code at some point in the future. With a robust set of unit tests, this is a fairly painless and simple process. Without them, I have to be hyper-aware of every change I make, as any change could introduce a bug.
The goal of Mutation Testing is to check how well a test suite defines the behavior of a codebase. This goal is accomplished by introducing mutations into the source code (e.g., small changes like replacing addition with subtraction, or less than with less than or equal to). If the mutated code can still pass the test suite, then the tests are not fully specifying the system's behavior. (It is possible for a mutation to not change the system's behavior, but in this project that should be fairly rare.) The main downside to mutation testing is that it takes time to run: for each mutation, we may have to run the entire test suite.
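A tiny illustration, using a hypothetical function, of the kind of mutant a tool like cargo-mutants introduces and the boundary assertion that kills it:

```rust
/// Hypothetical function under test.
fn is_adult(age: u32) -> bool {
    age >= 18 // a typical mutant replaces `>=` with `>`
}

fn main() {
    // Only the boundary check distinguishes `>=` from the `>` mutant;
    // without it, the mutant would survive the suite.
    assert!(is_adult(18));
    assert!(is_adult(30));
    assert!(!is_adult(17));
}
```

If the suite only checked ages 30 and 17, the `>` mutant would pass every test, revealing that the boundary behavior was never actually specified.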
I am currently using cargo-mutants to run mutation testing. Since mutation testing can take quite a while to run, the CI/CD pipeline currently only introduces mutations into the edited hunks of code. The expectation is that all mutants must be killed or time out before a branch can be merged into main.
Mutation testing is a low effort technique that dramatically increases my confidence in my unit test suite.
The first time I ran cargo mutants, I ended up finding a bug in the test code that would have been impossible to detect using traditional unit testing.
A foreign function is simply a function that was written in a different programming language. In this case, I am calling C code from Rust and vice versa. Calling code that was written in a different language requires some extra care:
- Parameters must match the target language's memory layout
- From the Rust compiler's viewpoint, calling into C code is a black box that could do anything; therefore, every FFI call is inherently unsafe, since the Rust compiler cannot verify that Rust's safety invariants are upheld.
In this project, Rust is responsible for matching the C ABI when cross-language calls occur. The bindgen and cbindgen tools do most of the heavy lifting by automating the generation of type and function definitions. However, there are a few project specific things that must be kept in mind:
- Don't blindly trust the generated type definitions. The original C code uses dynamically sized types; however, the header files/generated Rust types assume these types occupy their max size.
- Pay extra attention to pointers/pointer arithmetic. The original C code sometimes allocates memory for multiple structs of different types at once. This pattern is particularly prominent in the shared memory segment and is problematic when translated verbatim into Rust, since Rust assumes every struct can be moved. However, as long as you are aware of this issue, it is fairly easy to work around.
- The C code assumes it is single-threaded. The C code uses a lot of global variables, and since it assumes it is single-threaded, there are no synchronization guards in place (Atomics, Mutexes, etc.). However, Rust unit tests are multi-threaded by default.
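One way to keep parallel Rust tests from racing on the C globals is to serialize every test's C access behind a single process-wide lock. A minimal sketch of that pattern follows; the guard name and the stand-in C call are hypothetical, not this project's actual API:

```rust
use std::sync::Mutex;

// One process-wide lock; every test that touches C state must hold it.
static C_GUARD: Mutex<()> = Mutex::new(());

/// Stand-in for an `unsafe extern "C"` call that mutates C globals.
fn call_into_c() -> u32 {
    42
}

fn exercise_c_code() -> u32 {
    let _lock = C_GUARD.lock().unwrap(); // serializes with other tests
    call_into_c()
}

fn main() {
    assert_eq!(exercise_c_code(), 42);
}
```

Alternatives include running the affected tests with a single test thread, but a shared guard keeps the rest of the suite parallel.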
Sometimes you just need functionality that was written in another programming language. There are a lot of invariants that need to be upheld, but it is manageable with the bindgen and cbindgen build tools. FFI is not something I would introduce into a project on a whim, but I would also not be afraid of adding it if I needed some specialized functionality.
The idea behind Property Based Testing is that we want to verify some invariant is upheld for all inputs. So we plug in a bunch of random inputs and verify that the invariant holds true.
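A minimal sketch of that loop, using a hand-rolled generator and a stand-in invariant (a real suite would use a crate like `proptest` or `quickcheck`, which also shrink failing inputs):

```rust
/// Invariant under test: concatenation length equals the sum of the
/// input lengths (a stand-in for this project's real invariants).
fn m_concat(a: &str, b: &str) -> String {
    format!("{a}{b}")
}

/// Tiny hand-rolled pseudo-random generator (an LCG), just for the demo.
fn lcg(seed: &mut u64) -> u64 {
    *seed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *seed
}

fn random_ascii(seed: &mut u64, max_len: usize) -> String {
    let len = (lcg(seed) as usize) % (max_len + 1);
    (0..len).map(|_| ((lcg(seed) % 26) as u8 + b'a') as char).collect()
}

/// Plug in many random inputs and check the invariant on each.
fn concat_invariant_holds(iterations: u32) -> bool {
    let mut seed = 0xDEAD_BEEF;
    for _ in 0..iterations {
        let a = random_ascii(&mut seed, 64);
        let b = random_ascii(&mut seed, 64);
        if m_concat(&a, &b).len() != a.len() + b.len() {
            return false;
        }
    }
    true
}

fn main() {
    assert!(concat_invariant_holds(1_000));
}
```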
There are a couple of different testing paradigms that can be viewed as verifying an invariant.
| Technique | Invariant |
|---|---|
| Regression testing | The new code behaves just like the old code did. |
| Fuzz testing | The code does not crash, and there are no memory access violations. |
When I started this project, I was primarily combining regression testing with property based testing. I would run both the original code and my port and compare their results. Regression testing in this manner is fairly situation specific and only really applies when porting legacy code.
I have been moving away from using that technique as a primary means of testing since I figure I will learn more by focusing on other forms of testing.
That being said, I think regression testing can be put to great use checking how well I converted/tested a module of code. If bugs are slipping past my unit tests and are only being caught once I add the regression tests, this is an indication that my unit test writing skills need additional work.