
Welcome to the farstorm wiki! While writing documentation is definitely on the todo list, I suppose something more helpful is to explain the philosophy behind farstorm, aka:

Why Were We Mad Enough To Implement an ORM From Scratch?

It all started with nullability. At my previous employer, I mostly wrote in Java using JPA/Hibernate as the ORM. This was fine. Then at Atrocit I started using TypeScript for our backend, and of course, being used to Hibernate, I went looking for a similar ORM for NodeJS. I ended up with TypeORM, and this turned out to be the start of many frustrations. TypeORM is very similar to Hibernate in many respects, and that turns out not to work well with TypeScript.

Type systems tend to fulfill two functions. Historically they helped the compiler figure out how to compile code, for example in the various flavors of C and Java. More recently, however, type systems act as a soft proof mechanism for properties of your code: a correctness measure rather than a benefit to the compiler. TypeScript is probably the purest example of that: all of the types are stripped away at runtime, so it provides no benefit in terms of compilation/interpretation or runtime speed.

In the compiler-oriented view, NULL values make a lot of sense: if you allocate some memory and haven't initialized it, that's a null, and you're going to have to deal with that somehow. In correctness-focused type systems like TypeScript, you generally want strict null checks turned on everywhere, so that you are always explicitly dealing with the possibility of null or undefined as part of the correctness checks.
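
As a minimal plain-TypeScript illustration (no ORM involved) of what strict null checks force you to do:

// With "strictNullChecks": true in tsconfig.json
function assigneeName(assignee: { name: string } | null): string {
    // Accessing assignee.name directly would be a compile error: 'assignee' is possibly 'null'
    if (assignee === null) return '(unassigned)';
    return assignee.name; // narrowed to non-null, so this compiles
}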

Stop using annotated classes in TypeScript

In Java, a typical pattern is to define a class and have the ORM fill it up with data. This works fine, since you start off with all values being null, which is simply the natural state of that programming environment. In TypeScript, however, it does not make sense to preallocate a fixed data structure full of null values just to fill it in later: you can expand which properties exist on a type as you go and avoid the null problem entirely. TypeScript has a much better model of 'this property doesn't exist on this object right now', since it doesn't require you to specify classes up front, but can just infer the shape of each object and merge shapes seamlessly.
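
A small sketch of that model (plain TypeScript, nothing farstorm-specific):

// No class, no preallocated nulls: the shape is inferred from the literal
const base = { id: 1, title: 'Write docs' };

// Expanding the shape later via the spread operator; TypeScript infers
// the merged type { id: number; title: string; done: boolean }
const withStatus = { ...base, done: false };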

Most ORMs ignore this, and still require you to specify classes and decorate/annotate the properties within each class. There are a number of drawbacks with that approach:

  1. You cannot null check properly with decorators: either every property on the class has to allow null/undefined, or you have to turn off TypeScript's strictPropertyInitialization check, weakening the guarantees
  2. Types in the type system and the types specified in decorators can diverge, and this mismatch cannot be detected at compile time (a concrete sketch follows this list)
  3. Decorators are still not ECMAScript-approved and remain at stage 3 as of writing; adoption has been incredibly slow.
  4. Composition using classes is not that easy, whereas with plain JavaScript objects you can just use the spread operator to compose objects to your heart's desire
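
To make the second drawback concrete, here is a hedged sketch using TypeORM-style decorators (the exact decorator options may differ per ORM):

import { Column, Entity } from 'typeorm';

@Entity()
class TodoItem {
    // The decorator declares the column nullable, but the TypeScript type
    // says string; the two disagree and the compiler cannot detect it
    @Column({ type: 'text', nullable: true })
    title!: string;

    // The '!' definite-assignment assertion is needed to satisfy
    // strictPropertyInitialization, silently weakening the null checks
    @Column({ type: 'int' })
    priority!: number;
}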

There are other solutions to this, like Prisma, which defines a DSL for specifying models and then runs a codegen step. I personally find that inelegant: it would have been fairly simple to specify the models in plain TypeScript instead and derive the types through inference, removing the need for a codegen step entirely.
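
A hedged sketch of what that could look like: the model is a plain TypeScript value, and the entity type is derived from it purely through inference (the helper types here are illustrative, not the actual farstorm API):

// Hypothetical model definition written directly in TypeScript
const TodoItemModel = {
    id: { type: 'int', nullable: false },
    title: { type: 'text', nullable: false },
    dueDate: { type: 'date', nullable: true },
} as const;

// Map a column type to its TypeScript type
type FieldType<F> =
    F extends { type: 'int' } ? number :
    F extends { type: 'text' } ? string :
    F extends { type: 'date' } ? Date :
    never;

// Derive the entity type from the model by inference alone,
// so no codegen step is needed to keep the two in sync
type Entity<M> = {
    -readonly [K in keyof M]: M[K] extends { nullable: true }
        ? FieldType<M[K]> | null
        : FieldType<M[K]>;
};

type TodoItem = Entity<typeof TodoItemModel>;
// => { id: number; title: string; dueDate: Date | null }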

Transactions

There are a ton of ORMs that treat transactions as optional. We think that if you care about your code being correct at all, this is not a sensible stance. Forcing transactions also made other design considerations much simpler, such as:

First Access Retrieval

In most ORMs, relations can be specified to be fetched either eagerly or lazily. With eager fetching, the SELECT query is usually expanded with a JOIN, and multiple entities are fetched in one go. With lazy fetching, the related entity is fetched whenever the code tries to access it.

The lazy problem: N+1 queries

If you fetch items lazily, you quickly run into the famous N+1 query problem: if you have a list of n TodoItems and you fetch the assignee for each of them, you run 1 query to get your initial list of TodoItems and n queries against the User table to get the assignees. This is problematic, since queries usually carry a large network overhead. If the database is not running on the same machine, a good rule of thumb is that every query takes at least 1ms. This means that a page loading 1000 TodoItems will never be faster than 1001ms, no matter how fast the backend, how efficient the machine, or how clever the programmer. This is clearly not good.
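
In code the trap looks innocent enough (a sketch with a hypothetical lazy-loading repository, not the farstorm API):

const todoItems = await todoItemRepository.find(); // 1 query

for (const todoItem of todoItems) {
    // Each access lazily triggers its own SELECT against "user": n more queries
    console.log(await todoItem.assignee);
}
// Total: 1 + n queries; at ~1ms each, 1000 TodoItems cost at least ~1001ms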

The eager problem: N * M

So instead we can eagerly fetch things. For instance, a TodoItem may have a one-to-many relation with n Comments, and perhaps another with m AuditLogEvents. If we make both of these relations eager, we end up with a huge result set: for each TodoItem, SQL returns the Cartesian product of its Comment and AuditLogEvent results, i.e. n * m rows per TodoItem. This adds up to a lot of data really fast and overloads the connection between the database and the application with redundant data.
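
Roughly, the eager query would look like this (illustrative SQL, using the same sql helper as the snippets below):

// One eager query joining both one-to-many relations at once
const rows = await runQuery(sql`
    select * from todo_item
    left join comment on comment.todo_item_id = todo_item.id
    left join audit_log_event on audit_log_event.todo_item_id = todo_item.id
`);
// A TodoItem with n comments and m audit log events comes back as n * m rows,
// repeating the TodoItem's columns (and each Comment's) over and over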

First Access Retrieval

We noticed that lots of data access patterns look somewhat like this:

const todoItems = await fetchTodoItems();

for (const todoItem of todoItems) {
    console.log(await todoItem.assignee);
}  

Meaning that if you access the assignee of one of the TodoItems, chances are you are about to access all of them. So the moment you access a relation of an entity, we look at all entities of that type currently loaded within the transaction, and fetch the relation for all of them. In the above example we'd internally do something like this:

// on accessing todoItem.assignee
const allRawTodoItemsLoadedInTransaction = /* ORM internals omitted */;

// Grab all user IDs
const assigneeIdsToLoad = allRawTodoItemsLoadedInTransaction.map(todoItem => todoItem['assignee_id']);

// Fetch all of those users and store them in memory
const usersToLoadIntoTransaction = await runQuery(sql`select * from "user" where id in (${assigneeIdsToLoad})`);

// Then resolve the user currently being accessed

Then the next time todoItem.assignee is accessed for a different TodoItem we can resolve it from the in-memory cache. The scope of this is always the current transaction, and any write to an entity involved in the relation will invalidate the local cache and force it to be refetched entirely.
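
Conceptually, the transaction-scoped cache looks something like this (a simplified sketch, not the actual internals):

// Per-transaction cache: one entry per relation, mapping entity id -> related data
const relationCache = new Map<string, Map<number, unknown>>();

function invalidateRelation(entityType: string, relationName: string) {
    // Any write to an entity involved in the relation drops the whole
    // cached relation, forcing a full refetch on the next access
    relationCache.delete(`${entityType}.${relationName}`);
}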

The beauty of this pattern is that the number of queries does not scale with the number of entities fetched, but with the number of relations accessed, which is usually fairly low per transaction, even for deeply nested data structures. The number of rows fetched also never gets duplicated, since we never join on one-to-many relations. In fact, there is not a single JOIN in the code base: we always run plain SELECT statements by either the primary key or, for inverse relations, the foreign key.

The farstorm design

We started with a number of requirements:

  1. Nullability in TypeScript should be identical to the nullability in the database itself. If a column is marked NOT NULL, then the type system should complain if you pass an object to persist that may have a null value in the corresponding property.
  2. Nullability of relations should reflect which side the relation is defined on: Promise<X> | null is different from Promise<X | null>. You use the first if you have a column with an ID that can be null, and the latter if you have an inverse relation that you need to fetch before you know whether there are any results (a sketch follows this list).
  3. Both lazy and eager fetching are dumb most of the time, so we use the First Access Retrieval strategy by default
  4. When you state you are fetching something by ID and it doesn't exist in the database, don't return null. The default behavior for findOne(id) should be to crash if the item doesn't exist. There are rarely use cases for fetching by ID and then dealing with NULL in any way other than crashing/erroring.
  5. Queries should be executed in batch as much as possible. When inserting or updating entities, prefer a single query updating all entities over a ton of individual update statements. The network is slow, the database is fast.
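
For requirement 2 in particular, the difference looks like this in the types (a sketch with hypothetical entity shapes, not the farstorm API):

type User = { id: number; name: string };
type Comment = { id: number; text: string };

type TodoItem = {
    id: number;
    // Owning side with a nullable assignee_id column: the relation itself
    // may be absent, so there may be no promise at all
    assignee: Promise<User> | null;
    // Inverse side (a to-one here for illustration): there is always a
    // promise, but only awaiting it tells you whether a row was found
    latestComment: Promise<Comment | null>;
};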

All of this culminated in farstorm as it stands today.