This repository was archived by the owner on Sep 2, 2022. It is now read-only.

Prisma join performances #4744

@Hebilicious

Is your feature request related to a problem? Please describe.
I'm working on a large-ish IoT project with a non-trivial amount of data. The project uses Prisma as a DAL on AWS Fargate, with an Aurora (Postgres) database. For some of the project's data retrieval needs, I had to fall back to raw SQL for performance reasons. Some of the queries that I try to run with the Prisma client end up crashing the Prisma server without returning any data.

Describe the solution you'd like
I would like a way to retrieve data from different tables (call it relations or joins) using Prisma or Prisma2 (which I haven't tried for this project yet) efficiently: without crashing the server and without taking more than 30s to run.

Describe alternatives you've considered
Raw SQL / low-level tools (knex, pg), which defeats the point a little.
In a GraphQL server context, overriding the resolvers provided by nexus-prisma.

Additional context
I'll provide as much information as I'm allowed to.
This is a simplified version of the data structure. The missing fields are mostly strings and irrelevant to the issue, and each table has createdAt and updatedAt fields defined in the datamodel.

type Device {
    id: ID! @id
    deviceUpdates: [DeviceUpdate!]!
}
type DeviceUpdate {
    id: ID! @id
    device: Device!
    sensorUpdates: [SensorUpdate!]! @relation(onDelete: CASCADE)
}

type SensorUpdate {
    id: ID! @id
    sensor: Sensor!
    deviceUpdate: DeviceUpdate!
}
type Sensor {
    id: ID! @id
    sensorUpdate: [SensorUpdate!]
}

Two of those tables are 'growing': deviceUpdate and sensorUpdate get a considerable number of new entries regularly.
The device table is expected to have on average thousands of entries (will scale up to 50000 entries).
On average each device makes 10 updates a day, so the deviceUpdate table grows by roughly the number of devices * 10 every day.
The sensorUpdate table is between 1 and 5 times the size of the deviceUpdate table.
The sensor table is roughly a hundred entries.
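
To put rough numbers on that growth, here is a back-of-the-envelope sketch in Python. It assumes the upper bound of 50000 devices and reads the 1x to 5x ratio as new sensorUpdate rows per new deviceUpdate row; the variable names are only for illustration.

```python
# Back-of-the-envelope daily growth, using the figures quoted above.
devices = 50_000                  # upper bound of the device table
updates_per_device_per_day = 10   # average quoted above

device_updates_per_day = devices * updates_per_device_per_day

# Assumption: sensorUpdate grows at 1x to 5x the rate of deviceUpdate.
sensor_updates_per_day_min = device_updates_per_day * 1
sensor_updates_per_day_max = device_updates_per_day * 5

print(device_updates_per_day)       # 500000
print(sensor_updates_per_day_min)   # 500000
print(sensor_updates_per_day_max)   # 2500000
```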

The type of query that I'm trying to run looks like this:

query{
  devices(first: 10){
    id
    deviceUpdates(first: 100){
      id
      sensorUpdates{
        id
        sensor{
          id
        }
      }
    }
  }
}

with potentially more query parameters, such as filtering and ordering.
This type of query takes ages to complete and, in most cases, ends up crashing the Prisma server.
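
For scale, the amount of data this query asks for is small compared to the table sizes. A rough upper bound, sketched in Python: since sensorUpdates has no `first` limit in the query, the last figure assumes at most 5 sensor updates per device update, in line with the table ratio given earlier.

```python
# Upper bound on the rows requested by the example query.
devices_requested = 10     # devices(first: 10)
updates_per_device = 100   # deviceUpdates(first: 100)

max_device_updates = devices_requested * updates_per_device

# Assumption: at most 5 sensor updates per device update.
max_sensor_updates = max_device_updates * 5

print(max_device_updates)   # 1000
print(max_sensor_updates)   # 5000
```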
The data retrieval can be expressed with the following SQL queries:

Very slow query (minutes):

SELECT *
FROM "Device" d
LEFT JOIN "DeviceUpdate" du ON d.id = du.device
LEFT JOIN "SensorUpdate" su ON du.id = su."deviceUpdate"
LEFT JOIN "Sensor" s ON s.id = su.sensor
WHERE d.id IN(...)
AND s.id IN(...);

But the same result can be achieved in a much more performant way.
Fast query (seconds):

SELECT *
FROM "Sensor" s
INNER JOIN "SensorUpdate" su ON su.sensor = s.id
INNER JOIN "DeviceUpdate" du ON du.id = su."deviceUpdate"
INNER JOIN "Device" d ON d.id = du.device AND d.id IN(...)
WHERE s.id IN(...);

I'm sure it's possible to write a more performant SQL query, or to tune indexes to achieve the desired performance, but I don't see a way to do either with Prisma.
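
The equivalence of the two join orders, and one place where indexes could go, can be sketched outside Prisma. The Python snippet below is only an illustration under several assumptions: it uses an in-memory SQLite database rather than Aurora Postgres, it assumes the foreign keys are plain columns named device, deviceUpdate and sensor (as in the SQL above), and the index names are made up. It says nothing about real timings; it only checks that both query shapes return the same rows for the same filters.

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres schema (assumption: foreign keys
# are inline columns named "device", "deviceUpdate" and "sensor").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Device"       (id TEXT PRIMARY KEY);
CREATE TABLE "Sensor"       (id TEXT PRIMARY KEY);
CREATE TABLE "DeviceUpdate" (
    id TEXT PRIMARY KEY,
    device TEXT REFERENCES "Device"(id)
);
CREATE TABLE "SensorUpdate" (
    id TEXT PRIMARY KEY,
    sensor TEXT REFERENCES "Sensor"(id),
    "deviceUpdate" TEXT REFERENCES "DeviceUpdate"(id)
);
-- Hypothetical indexes on the join columns; the names are invented.
CREATE INDEX du_device_idx       ON "DeviceUpdate"(device);
CREATE INDEX su_sensor_idx       ON "SensorUpdate"(sensor);
CREATE INDEX su_deviceupdate_idx ON "SensorUpdate"("deviceUpdate");
""")

# A tiny data set: du2 has no sensor updates, su2 belongs to another sensor.
conn.executemany('INSERT INTO "Device" VALUES (?)', [("d1",), ("d2",)])
conn.executemany('INSERT INTO "Sensor" VALUES (?)', [("s1",), ("s2",)])
conn.executemany(
    'INSERT INTO "DeviceUpdate" VALUES (?, ?)',
    [("du1", "d1"), ("du2", "d1"), ("du3", "d2")],
)
conn.executemany(
    'INSERT INTO "SensorUpdate" VALUES (?, ?, ?)',
    [("su1", "s1", "du1"), ("su2", "s2", "du1"), ("su3", "s1", "du3")],
)

# Explicit column list so both queries return columns in the same order.
COLUMNS = "d.id, du.id, su.id, s.id"

# Device-first shape (LEFT JOINs, filters in WHERE).
device_first = f"""
SELECT {COLUMNS}
FROM "Device" d
LEFT JOIN "DeviceUpdate" du ON d.id = du.device
LEFT JOIN "SensorUpdate" su ON du.id = su."deviceUpdate"
LEFT JOIN "Sensor" s ON s.id = su.sensor
WHERE d.id IN (?) AND s.id IN (?)
"""

# Sensor-first shape (INNER JOINs, device filter in the join condition).
sensor_first = f"""
SELECT {COLUMNS}
FROM "Sensor" s
INNER JOIN "SensorUpdate" su ON su.sensor = s.id
INNER JOIN "DeviceUpdate" du ON du.id = su."deviceUpdate"
INNER JOIN "Device" d ON d.id = du.device AND d.id IN (?)
WHERE s.id IN (?)
"""

# In both statements the first placeholder is the device id, the second the sensor id.
params = ("d1", "s1")
rows_device_first = sorted(conn.execute(device_first, params).fetchall())
rows_sensor_first = sorted(conn.execute(sensor_first, params).fetchall())

print(rows_device_first == rows_sensor_first)   # True
print(rows_sensor_first)
```

On the real database, the effect of such indexes would have to be checked with EXPLAIN ANALYZE. Postgres does not automatically create an index on the referencing column of a foreign key, so whether these indexes already exist depends on how the schema was generated.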
