This unearths a problem, however, since some public API functions applied to untyped tables will return keys, values, and blobs that retain unused memory. Inspecting RawBytes, we see that it is a newtype wrapper around a primitive vector:
Ideally, public API functions should never return slices, but some do.
We wrote a reproducer program that inserts a bunch of keys/values/blobs into an untyped table, then performs lookups and range lookups (i.e., cursor reads) while retrieving blobs, and prints the results. The reproducer program and its output is included below:
Inserting keys, values, and blobs
Printing inserted keys, values, and blobs as (bytes, bytes, Maybe bytes)
([0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0],Just [0,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,1],[1,0,0,0,0,0,0,0],Just [1,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,2],[2,0,0,0,0,0,0,0],Just [2,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,3],[3,0,0,0,0,0,0,0],Just [3,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,4],[4,0,0,0,0,0,0,0],Just [4,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,5],[5,0,0,0,0,0,0,0],Just [5,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,6],[6,0,0,0,0,0,0,0],Just [6,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,7],[7,0,0,0,0,0,0,0],Just [7,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,8],[8,0,0,0,0,0,0,0],Just [8,0,0,0,0,0,0,0])
([0,0,0,0,0,0,0,9],[9,0,0,0,0,0,0,0],Just [9,0,0,0,0,0,0,0])
Performing lookups
Printing looked up values as (bytes, offset, size)
([0,0,0,0,0,0,0,0],0,8)
([1,0,0,0,0,0,0,0],0,8)
([2,0,0,0,0,0,0,0],0,8)
([3,0,0,0,0,0,0,0],0,8)
([4,0,0,0,0,0,0,0],0,8)
([5,0,0,0,0,0,0,0],0,8)
([6,0,0,0,0,0,0,0],0,8)
([7,0,0,0,0,0,0,0],0,8)
([8,0,0,0,0,0,0,0],0,8)
([9,0,0,0,0,0,0,0],0,8)
Printing looked up blobs as (bytes, offset, size)
([0,0,0,0,0,0,0,0],0,8)
([1,0,0,0,0,0,0,0],8,8)
([2,0,0,0,0,0,0,0],16,8)
([3,0,0,0,0,0,0,0],24,8)
([4,0,0,0,0,0,0,0],32,8)
([5,0,0,0,0,0,0,0],40,8)
([6,0,0,0,0,0,0,0],48,8)
([7,0,0,0,0,0,0,0],56,8)
([8,0,0,0,0,0,0,0],64,8)
([9,0,0,0,0,0,0,0],72,8)
Performing a range lookup
Printing looked up keys as (bytes, offset, size)
([0,0,0,0,0,0,0,0],106,8)
([0,0,0,0,0,0,0,1],114,8)
([0,0,0,0,0,0,0,2],122,8)
([0,0,0,0,0,0,0,3],130,8)
([0,0,0,0,0,0,0,4],138,8)
([0,0,0,0,0,0,0,5],106,8)
([0,0,0,0,0,0,0,6],114,8)
([0,0,0,0,0,0,0,7],122,8)
([0,0,0,0,0,0,0,8],130,8)
([0,0,0,0,0,0,0,9],138,8)
Printing looked up values as (bytes, offset, size)
([0,0,0,0,0,0,0,0],146,8)
([1,0,0,0,0,0,0,0],154,8)
([2,0,0,0,0,0,0,0],162,8)
([3,0,0,0,0,0,0,0],170,8)
([4,0,0,0,0,0,0,0],178,8)
([5,0,0,0,0,0,0,0],146,8)
([6,0,0,0,0,0,0,0],154,8)
([7,0,0,0,0,0,0,0],162,8)
([8,0,0,0,0,0,0,0],170,8)
([9,0,0,0,0,0,0,0],178,8)
Printing looked up blobs as (bytes, offset, size)
([0,0,0,0,0,0,0,0],0,8)
([1,0,0,0,0,0,0,0],8,8)
([2,0,0,0,0,0,0,0],16,8)
([3,0,0,0,0,0,0,0],24,8)
([4,0,0,0,0,0,0,0],32,8)
([5,0,0,0,0,0,0,0],40,8)
([6,0,0,0,0,0,0,0],48,8)
([7,0,0,0,0,0,0,0],56,8)
([8,0,0,0,0,0,0,0],64,8)
([9,0,0,0,0,0,0,0],72,8)
What we can see here is that lookups do not return slices, but blob retrieval and range lookups (i.e., cursor reads) do. We should fix this, presumably by forceing primitive vectors before returning them as raw bytes from public API functions. We should also write tests for this.
lsm-treecan be used in an "untyped" way by setting types of keys, values, and blobs as raw bytes. For example, a user could decide on either of the two following options:The latter option bypasses
lsm-tree's serialisation framework, which is perfectly valid to do, and it is useful in some cases wherelsm-tree's serialisation classes are too restrictive. The caveat is that the onus for (de-)serialisation now falls on the user oflsm-tree. For exampleouroboros-consensusdoes this because they want to (de-)serialise keys and values with additional context.This unearths a problem, however, since some public API functions applied to untyped tables will return keys, values, and blobs that retain unused memory. Inspecting
RawBytes, we see that it is a newtype wrapper around a primitive vector:A
RawBytesthat retains unused memory is slice of a primitive vector. For example:Ideally, public API functions should never return slices, but some do.
We wrote a reproducer program that inserts a bunch of keys/values/blobs into an untyped table, then performs lookups and range lookups (i.e., cursor reads) while retrieving blobs, and prints the results. The reproducer program and its output is included below:
What we can see here is that lookups do not return slices, but blob retrieval and range lookups (i.e., cursor reads) do. We should fix this, presumably by
forceing primitive vectors before returning them as raw bytes from public API functions. We should also write tests for this.Reproducer
{-# OPTIONS_GHC -Wno-orphans #-} module Main (main) where import Control.Tracer import Data.Maybe import qualified Data.Vector as V import qualified Data.Vector.Primitive as VP import Data.Word import Database.LSMTree main :: IO () main = do foo foo :: IO () foo = withOpenSessionIO nullTracer "_temp" $ \s -> withTableWith conf s $ \(t :: Table IO RawBytes RawBytes RawBytes) -> do do putStrLn "Inserting keys, values, and blobs" let entries = V.fromList [ ( serialiseKey k , serialiseValue v , fmap serialiseValue mb ) | (k, v, mb) <- contents ] inserts t entries putStrLn "Printing inserted keys, values, and blobs as (bytes, bytes, Maybe bytes)" mapM_ print entries do putStrLn "Performing lookups" res <- lookups t $ V.fromList [ serialiseKey k | (k, _, _) <- contents ] let values = V.map (fromJust . getValue) res let blobRefs = V.mapMaybe getBlob res blobs <- retrieveBlobs s blobRefs putStrLn "Printing looked up values as (bytes, offset, size)" mapM_ print $ fmap deconstruct values putStrLn "Printing looked up blobs as (bytes, offset, size)" mapM_ print $ fmap deconstruct blobs do putStrLn "Performing a range lookup" res <- rangeLookup t $ FromToIncluding (serialiseKey @Word64 minBound) (serialiseKey @Word64 maxBound) let keys = V.map getEntryKey res let values = V.map getEntryValue res let blobRefs = V.mapMaybe getEntryBlob res blobs <- retrieveBlobs s blobRefs putStrLn "Printing looked up keys as (bytes, offset, size)" mapM_ print $ fmap deconstruct keys putStrLn "Printing looked up values as (bytes, offset, size)" mapM_ print $ fmap deconstruct values putStrLn "Printing looked up blobs as (bytes, offset, size)" mapM_ print $ fmap deconstruct blobs where conf = defaultTableConfig { confWriteBufferAlloc = AllocNumEntries 5 } contents :: [(Word64, Word64, Maybe Word64)] contents = [(i, i, Just i) | i <- [0..9]] instance SerialiseKey RawBytes where serialiseKey = id deserialiseKey = id instance SerialiseValue RawBytes where serialiseValue = id deserialiseValue = id instance ResolveValue RawBytes where resolve = const deconstruct :: RawBytes -> (RawBytes, Int, Int) deconstruct rb@(RawBytes (VP.Vector off sz _)) = (rb, off, sz)