Introducing simple-store and simple-cell

2017-06-23
haskelldb

simple-store1 provides persistence, atomicity and consistency for a shared data type. simple-cell2 uses simple-store and directed-keys3 to create multiple atomic values with the unique keys. We will review a few concepts before discussing these packages further.

Review

Concurrency control

Concurrency control4 ensures that concurrent operations are safe and fast and solves the following problems:

  • lost update
  • dirty read
  • incorrect summary

We will only discuss one type here, optimistic concurrency control.

Optimistic concurrency control (OCC)

OCC5 assumes that multiple transactions can be made frequently without interfering with each other. OCC does not use locks6. Rather, before committing each transaction, it verifies that no other transaction has modified the data it has read. If the check reveals that modifications have occurred, the committing transaction rolls back and can be restarted. OCC is good for environments with low data contention.

OCC phases

  • begin: record a timestamp which marks the beginning of a transaction.
  • modify: read db values, tentatively write changes.
  • validate: check if other transactions have modified the data that the transaction has read/wrote.
  • commit/rollback: if there is no conflict, make all changes take effect, otherwise abort the changes.

Software transactional memory (STM)

STM7 is a type of OCC. A thread modifies shared memory without concern for what other threads may be doing to that memory. STM makes the reader responsible for making sure nothing is operating on the shared memory.

Concurrent Haskell and STM

MVar

MVar t8 is a mutable location that is empty or contains a value t.

  • putMVar sets the value in an MVar.
  • readMVar gets the value in an MVar and sets it to the value it just took.
  • takeMVar gets the value in an MVar and sets the value to empty.

TMVar

TMVar t9 is the STM version of MVar and is thread safe.

  • putTMVar sets the value in an TMVar.
  • readTMVar gets the value in an TMVar and sets it to the value it just took.
  • takeTMVar gets the value in an TMVar and sets the value to empty.

Dependencies

cereal

cereal10 is a package the performs binary serialization. By declaring an instance of the Serialize type class, we can perform serialization and deserialization on a type. It can also use GHC.Generics to automatically declare a Serialize instance.

directed-keys

directed-keys provides a data type and functions to serialize data to and from Base6411.

  • DirectedKeyRaw is data type that has a unique key, a source url/file path, a destination url/file path and a time. They all need an instance of Hashable and Serialize.
  • DirectedKey is a ByteString of the data in DirectedKeyRaw.
  • encodeKeyRaw is like DKeyRaw constructor, but it enforces the Serialize restriction on the keys.
  • encodeKey and decodeKey to an from a base64 ByteString.
  • parseFilename and decodeFilename escape and unescape the necesssary Unix characters in a file path.

simple-store

The main data type is SimpleStore. You do not need to manipulate SimpleStore records directly. simple-store provides a set of functions to save and retrieve data via the SimpleStore data type and filesystem.

The most important functions are:

  • makeSimpleStore save a serializable type to a file
  • openSimpleStore read a deserializable type from a file
  • getSimpleStore get the data type value from the SimpleStore.
  • modifySimpleStore
  • updateSimpleStore

simple-cell

A SimpleCell takes a function to retrieve keys and a function to make those keys into filenames. It maintains a key-value pair of filename to SimpleStore. By convention, simple-cell uses a type suffixed Store as a newtype wrapped entity of an entity with DB specific properties (like external keys).

The general work flow is:

  • Declare CellKey value for the data type we want to store.
  • Generate functions with Template Haskell.
  • Initialize a SimpleStore.
  • Pass the cell around manually or with ReaderT.
  • Insert data with insert<Type>SC.
  • Perform batch operations with foldlWithKey<Type>SC
  • Get stores out the cell with get<Type>SC.

We need to define three functions for CellKey for looking up, decoding and encoding a DirectedKeyRaw.

Then we use Template Haskell to produce a set of type specific functions.

makeStoreCell generates the following functions, but you have to provide the type signature for each of them to help the Template Haskell.

  • get<Type>SC
  • update<Type>SC
  • createCheckpointAndClose<Type>SC
  • traverseWithKey<Type>SC_
  • insert<Type>SC
  • delete<Type>SC
  • foldlWithKey<Type>SC
  • initialize<Type>SC

We generally do not need to manipulate the SimpleCell data type directly, but it is helpful to know what it contains.

  • cellCore is an in-memory representation of the map of keys and SimpleStores.
  • cellKey is the key provided by the user.
  • cellParentFP is the file path that the root of the project is in.
  • cellRootFP is the file path that the cell occupies.

Types to remember for simple-cell:

  • DirectedKeyRaw
  • CellKey
  • SimpleCell
  • SimpleStore

For a complete example, take a look at the simple-cell tests.

References


  1. Github :: simple-store

  2. Github :: simple-cell

  3. Github :: directed-keys

  4. Wikipedia :: Concurrency control

  5. Wikipedia :: Optimistic concurrency control

  6. Wikipedia :: Lock (computer science)

  7. Real World Haskell :: Software transactional memory

  8. Hackage :: stm :: MVar

  9. Hackage :: stm :: TMVar

  10. Hackage :: cereal

  11. Wikipedia :: Base64