Data Store Guidelines
Overview
Modularity and flexibility are two of the core principles of Splinter. To support these, it is almost always desirable to define data stores as Rust traits without assumptions about the implementation or how the data is actually stored. By using Rust traits, you can provide various implementations of the data store that are swappable depending on the target environment and requirements like performance, scalability, and safety guarantees.
Data Store Basics
A data store in Splinter has three key components:
- Data structure(s) that represent the information being stored
- Rust trait(s) that define a backend-agnostic interface for the store
- Implementation(s) of the Rust traits, where each implementation provides a different type of backend storage
The data structures are arbitrary; the information they contain depends entirely on what data the store is designed to hold. These data structures are used throughout the store: in the traits, the implementations, and by the consumers of the store.
The Rust traits provide the methods that the store’s consumers interact with. These traits define the various operations for adding, modifying, and reading data, as well as some basic requirements for the store like thread-safety.
The implementations of the Rust traits provide the concrete mechanisms for storing data. These implementations can use a variety of backends for storage, including system memory, the filesystem, a database, or even the internet.
Interface vs. Implementation
The public interface that is used by other components is made up exclusively of the data structures and traits. Together, these two components encapsulate the functionality of writing to and reading from the data store without exposing any details about the type of storage (implementation) used. It is very important that the data structures and traits do not include any arguments, fields, or details about a specific implementation; this would violate the encapsulation that data stores are intended to provide.
Because of the encapsulation provided by a data store’s interface, the consumer of the data store should not interact directly with the concrete implementations of the Rust traits. By using only the public interface and not specific implementations, the backing storage can easily be swapped out without any changes to the code that consumes it.
Concrete store implementations should only be used directly when instantiating the store, which is typically done on startup (like in the Splinter daemon). Once a store is created, it should be used generically via the Rust traits.
Module Structure
Splinter stores should be broken up into a few different modules. The top level of the store's module, the mod.rs file, should contain the public data structures and traits that make up the store's interface. For a store of `Record` items, for example, the `pub struct Record` defines the type of data being stored (data structure) and the `pub trait RecordStore` is the store's trait that will be implemented for different backends.
Any errors that will be used or exposed by the store's public interface should be defined in an `error` module (an error.rs file). In the `RecordStore` example, the `pub enum RecordStoreError` would be returned by the methods of the `RecordStore`, so it belongs in the `error` module.
Each implementation of the store's interface should be in its own sub-module of the store. These sub-modules should be self-contained, meaning all data, types, and implementations for that kind of storage should exist only in that sub-module. The `RecordStore`, for example, might have two implementations: a `memory` implementation and a `diesel` implementation.
Designing Data Stores
This section describes the general patterns that should be used when designing a Splinter data store.
Traits
Data stores defined as traits allow for various backends to be implemented for the data store. The data store's API is then consistent across various backends using these traits. Data stores should have clear names that describe the types of things that are stored, regardless of implementation details. If a data store only stores one data structure, it should be named after that data structure; for example, a store of `Record` structs would be called `RecordStore`. Here's a basic example of what a simple data store trait may look like for the `RecordStore`:
pub trait RecordStore {
    fn add_record(&self, record: Record) -> Result<(), RecordStoreError>;
    fn remove_record(&self, record_id: &str) -> Result<(), RecordStoreError>;
}
In some cases, stores will hold multiple different data structures; in these cases, the name should reflect the overall functionality of the store. Be careful not to make a store that does too much; sometimes a store should be broken down into separate stores.
Another important factor to consider when designing a store is whether the store should be defined by a single trait or by distinct reader and writer traits. If the store will be used strictly as a reader in some cases, or if the store is shared across threads, then it will likely be useful to have separate traits. If you are splitting a store called `MyStore` into reader and writer traits, the appropriate names for the traits would be `MyStoreReader` and `MyStoreWriter`.
Implementations
Data store implementations should have clear and concise names that tell you the type of store it is and what kind of backend storage is used. For example, an implementation of the `RecordStore` that uses system memory for storage should be called something like `MemoryRecordStore`.
Data Structures
The simplest data stores are used to store some data structure that is identified by a unique ID. IDs generally help stores when implementations include operations used to fetch data. For example, you may want to store a `struct Record` that's defined as follows:

struct Record {
    id: String,
    metadata: HashMap<String, String>,
}
Sometimes it is desirable for the ID to be more complicated than a string; for example, an ID that is composed of two parts with different meanings might be represented by a struct. In this case, the ID will be the struct rather than a string, and the store methods will take and return this struct (or a reference to it) rather than a `String` (or `&str`) for the ID.
In other cases, it may not make sense for the data being stored to have an associated ID. This may have an impact on the kinds of operations that the store can support or how efficient the implementations can be, so keep this in mind when designing your data stores.
All data structures should have a designated builder struct; for example, `Record` would have a builder struct `RecordBuilder`. In addition to providing a convenient way to create the structs, the builder can assist the user by indicating if the struct has any missing or invalid values that will need to be fixed before the struct can be added to a store.
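A minimal sketch of this builder pattern might look like the following; the specific fields and validation rules (a required, non-empty `id`) are assumptions for illustration:

```rust
use std::collections::HashMap;

// Hypothetical `Record` and its builder; validation rules are illustrative.
#[derive(Debug)]
pub struct Record {
    pub id: String,
    pub metadata: HashMap<String, String>,
}

#[derive(Default)]
pub struct RecordBuilder {
    id: Option<String>,
    metadata: HashMap<String, String>,
}

impl RecordBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn with_id(mut self, id: &str) -> Self {
        self.id = Some(id.to_string());
        self
    }

    pub fn with_metadata_entry(mut self, key: &str, value: &str) -> Self {
        self.metadata.insert(key.to_string(), value.to_string());
        self
    }

    // `build` reports missing or invalid values before the struct ever
    // reaches a store.
    pub fn build(self) -> Result<Record, String> {
        let id = self.id.ok_or_else(|| "missing required field: id".to_string())?;
        if id.is_empty() {
            return Err("id must not be empty".to_string());
        }
        Ok(Record { id, metadata: self.metadata })
    }
}

fn main() {
    let record = RecordBuilder::new()
        .with_id("record-1")
        .with_metadata_entry("owner", "alice")
        .build()
        .unwrap();
    assert_eq!(record.id, "record-1");

    // A builder with no ID surfaces the problem at build time.
    assert!(RecordBuilder::new().build().is_err());
}
```

Catching invalid values in `build` means the store's add methods rarely need to reject malformed structs.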
Error Types
The structure and handling patterns of Splinter store errors should reflect the patterns used throughout Splinter.
Data Store Methods
Adding Items
Adding items to a data store will generally be accomplished with an `add_X` method, where `X` is the name of the item to add. For instance, a method for adding a `Record` to the store would be defined as follows:
/// Adds a `Record` to the underlying storage
///
/// # Arguments
///
/// * `record` - The record to be added
///
/// # Errors
///
/// * Returns an error if a record with the same unique ID already exists
fn add_record(&mut self, record: Record) -> Result<(), MyStoreError>;
When the data structure being added must be unique, the add method should return an error when a duplicate entry already exists in the store. Other errors may also be returned in other cases depending on the requirements of the store and what is being stored. For instance, the add method may return an error when some additional uniqueness constraints are violated, or when the object to add is invalid or missing values. In the case of invalid or missing values, the struct’s builder should check these values when building the struct to reduce the likelihood of these errors occurring.
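To make the duplicate-entry case concrete, here is a sketch of how an in-memory implementation might enforce the uniqueness constraint; the `MemoryRecordStore` and its error enum are hypothetical:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct Record {
    pub id: String,
    pub metadata: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
pub enum MyStoreError {
    DuplicateId(String),
}

#[derive(Default)]
pub struct MemoryRecordStore {
    records: HashMap<String, Record>,
}

impl MemoryRecordStore {
    // Rejects duplicates instead of silently overwriting.
    pub fn add_record(&mut self, record: Record) -> Result<(), MyStoreError> {
        if self.records.contains_key(&record.id) {
            return Err(MyStoreError::DuplicateId(record.id));
        }
        self.records.insert(record.id.clone(), record);
        Ok(())
    }
}

fn main() {
    let mut store = MemoryRecordStore::default();
    let record = Record { id: "r1".into(), metadata: HashMap::new() };
    assert!(store.add_record(record.clone()).is_ok());
    // Adding the same unique ID again is an error, not an overwrite.
    assert_eq!(
        store.add_record(record),
        Err(MyStoreError::DuplicateId("r1".into()))
    );
}
```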
In addition to a method for adding a single item, it's sometimes desirable to add multiple items at a time. A method for adding multiple `Record` items would be defined as follows:
/// Adds multiple `Record`s to the underlying storage
///
/// # Arguments
///
/// * `records` - The new records to be added
///
/// # Errors
///
/// * Returns an error if the unique ID of any of the records already exists
fn add_records(&mut self, records: Vec<Record>) -> Result<(), MyStoreError>;
The same error cases as when adding a single item will generally apply when adding multiple items.
Updating Items
In the simplest case, updating an item in a store will look similar to adding an item. An existing item is often replaced with a new definition. For example, a method for replacing a `Record` with a new definition would be defined as follows:
/// Updates an existing `Record` with a matching ID in the underlying
/// storage
///
/// # Arguments
///
/// * `record` - The record to be updated
///
/// # Errors
///
/// * Returns an error if a record with the same unique ID does not already exist
fn update_record(&mut self, record: Record) -> Result<(), MyStoreError>;
In this case, an error will be returned if a `Record` with the same unique ID as the one passed in does not already exist. Otherwise, the matching `Record` in the store will be replaced by the one that was passed in.
In general, update operations only make sense where there is some unique identifier for the items in the store.
It is important to note that the add and update methods are intentionally distinct; this is in contrast to many of the data structures in the Rust standard library, which provide an `insert` method for both adding and updating existing items. The reason for this is that some store implementations behave differently for adds and updates, and they may not be able to infer which action is required when a generic "insert" is used.
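The distinction can be sketched with a toy in-memory store whose add and update methods fail in opposite situations; the types and error strings here are assumptions for illustration:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct Record {
    pub id: String,
    pub description: String,
}

#[derive(Default)]
pub struct MemoryRecordStore {
    records: HashMap<String, Record>,
}

impl MemoryRecordStore {
    // Fails if the item already exists.
    pub fn add_record(&mut self, record: Record) -> Result<(), String> {
        if self.records.contains_key(&record.id) {
            return Err(format!("record {} already exists", record.id));
        }
        self.records.insert(record.id.clone(), record);
        Ok(())
    }

    // Fails if the item does not exist yet.
    pub fn update_record(&mut self, record: Record) -> Result<(), String> {
        if !self.records.contains_key(&record.id) {
            return Err(format!("record {} does not exist", record.id));
        }
        self.records.insert(record.id.clone(), record);
        Ok(())
    }
}

fn main() {
    let mut store = MemoryRecordStore::default();
    let v1 = Record { id: "r1".into(), description: "first".into() };
    let v2 = Record { id: "r1".into(), description: "second".into() };

    // `update` on a missing item fails; `add` succeeds.
    assert!(store.update_record(v1.clone()).is_err());
    assert!(store.add_record(v1).is_ok());

    // `add` on an existing item fails; `update` replaces it.
    assert!(store.add_record(v2.clone()).is_err());
    assert!(store.update_record(v2).is_ok());
    assert_eq!(store.records["r1"].description, "second");
}
```

A database-backed implementation would map these to distinct INSERT and UPDATE statements, which is exactly why a generic "insert" cannot serve both.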
Removing Items
Items will generally be removed from a store using a `remove_X` method that takes the desired item's unique ID as an argument. For example, a method for removing a `Record` from a data store would be defined as follows:
/// Removes a `Record` with the given ID from the underlying storage
///
/// # Arguments
///
/// * `id` - The ID of the record to remove
///
/// # Errors
///
/// * Returns an error if a record with the given ID does not exist in the store
fn remove_record(&mut self, id: &str) -> Result<(), MyStoreError>;
Getting Individual Items
Getting individual items from a data store is usually accomplished with a `get_X` method. For example, a method for getting a `Record` from a data store would be defined as follows:
/// Gets the `Record` with the given ID from the underlying storage
///
/// # Arguments
///
/// * `id` - The ID of the record to get
fn get_record(&self, id: &str) -> Result<Option<Record>, MyStoreError>;
If a `Record` with the given ID does not exist in the store, the method should return an `Ok(None)` value.
Listing Items
Listing items in a data store is usually accomplished with a `list_Xs` method. For example, a method for listing all `Record` items in a data store would be defined as follows:
/// Lists all `Record`s in the underlying storage
fn list_records(
    &self,
) -> Result<Box<dyn ExactSizeIterator<Item = Record>>, MyStoreError>;
It is almost always best to return an iterator from a list method; this is more efficient than a `Vec` for some implementations. When using databases, for instance, a `Vec` would require loading all items into memory to be returned. When returning an iterator, the iterator's implementation can load items into memory as needed, which would be more efficient.
An `ExactSizeIterator` is the same as a standard Rust `Iterator`, except that it has a known size and provides a `len` method. This is often very useful and should be provided whenever possible; however, a standard `Iterator` may be used if `ExactSizeIterator` is not feasible.
When listing items in a store, it may be desirable to allow filtering which items are returned. Typically, filtering should be supported by adding some optional predicates to the list method. For example, a list method for `Record` items that supports filtering by some predicates may be defined as follows:
/// Lists some or all `Record`s in the underlying storage
///
/// # Arguments
///
/// * `predicates` - A list of predicates to be applied to the resulting
/// list. These are applied as an AND, from a query perspective. If the list
/// is empty, it is the equivalent of no predicates (i.e. return all).
fn list_records(
    &self,
    predicates: &[RecordPredicate],
) -> Result<Box<dyn ExactSizeIterator<Item = Record>>, MyStoreError>;
In this example, `RecordPredicate` would be an enum whose variants would provide various ways to filter a `Record` based on the fields of the `Record` items in the store.
In some situations, it may be sufficient (and more convenient) to provide separate list methods instead of using predicates; this is the case when there are a few well-defined subsets of items within the store. For example, if the `Record` items in a store can be either "active" or "inactive", you may provide `list_active_records` and `list_inactive_records` methods in addition to the `list_records` method.
Existence Methods
It's often useful to provide convenient methods for checking the existence of some items in a store. This will generally be accomplished with a `has_X` method. For example, a method for checking if a `Record` with a specific ID exists in the store may be defined as follows:
/// Checks if the `Record` with the given ID exists in the underlying storage
///
/// # Arguments
///
/// * `id` - The ID of the record to check for
fn has_record(&self, id: &str) -> Result<bool, MyStoreError>;
More Complex Store Patterns
Internal IDs
In many cases, the data structure that is being stored provides its own unique identifier. However, this is not always the case; sometimes the store itself needs to provide unique identifiers for the structures being stored. In these situations, the store's `add_X` methods must be modified. Other operations aren't affected by this change, since these methods will take in the regular data structs, which contain the ID generated by the store.
If the store is generating the object IDs, the object passed to the `add_X` method doesn't need to have an ID field, since an ID for the object does not exist before it's added to the store. This intermediate object can be represented by a separate struct which is named for the object it represents, prefixed by `New`. For example, a `Record` struct without an ID would be called `NewRecord`; these would be defined as follows:
pub struct Record {
    pub id: String,
    pub description: String,
}

pub struct NewRecord {
    pub description: String,
}
The operation would take the fields from the `NewRecord` to construct a `Record` with a newly generated ID. The method for adding a `Record` to the store using a `NewRecord` would be defined as follows:
/// Adds a `Record` to the underlying storage
///
/// # Arguments
///
/// * `new_record` - The record to be added
///
/// # Errors
///
/// * Returns an error if the record could not be added to the underlying storage
fn add_record(&mut self, new_record: NewRecord) -> Result<(), MyStoreError>;
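A toy in-memory version of store-generated IDs might use a simple counter; a real backend could use a database sequence or a UUID instead. Note that this sketch returns the generated ID to the caller (a design choice not mandated by the guideline's `Result<(), MyStoreError>` signature), so the added item can be referred to later:

```rust
use std::collections::HashMap;

pub struct Record {
    pub id: String,
    pub description: String,
}

pub struct NewRecord {
    pub description: String,
}

#[derive(Default)]
pub struct MemoryRecordStore {
    next_id: u64,
    records: HashMap<String, Record>,
}

impl MemoryRecordStore {
    // The store, not the caller, mints the unique ID from its counter.
    pub fn add_record(&mut self, new_record: NewRecord) -> Result<String, String> {
        self.next_id += 1;
        let id = self.next_id.to_string();
        self.records.insert(
            id.clone(),
            Record { id: id.clone(), description: new_record.description },
        );
        Ok(id)
    }
}

fn main() {
    let mut store = MemoryRecordStore::default();
    let first = store.add_record(NewRecord { description: "one".into() }).unwrap();
    let second = store.add_record(NewRecord { description: "two".into() }).unwrap();
    // Each add receives a distinct, store-generated ID.
    assert_ne!(first, second);
    assert_eq!(store.records[&first].description, "one");
}
```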
Atomic Operations
Some stores that contain multiple types of related data will need to support atomic updates. When atomic updates are needed, they should be combined into a single method instead of requiring the user to call multiple methods to get the desired result.
For example, if you have a store that contains both `RecordA` and `RecordB` items, the implementation of the store will likely save the two structs to different locations. If you need to provide a way to atomically remove a `RecordA` while adding a `RecordB`, you would define a method like this:
/// Removes a `RecordA` with the given ID and adds a `RecordB` to the
/// underlying storage
///
/// # Arguments
///
/// * `recorda_id` - The ID of the `RecordA` to remove
/// * `recordb` - The `RecordB` to be added
///
/// # Errors
///
/// Returns an error if a `RecordA` with the given ID does not exist in the
/// store. Returns an error if a `RecordB` with the same ID already exists in
/// the store.
fn remove_recorda_and_add_recordb(
    &mut self,
    recorda_id: &str,
    recordb: RecordB,
) -> Result<(), MyStoreError>;
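The following is a sketch of the atomic semantics using an in-memory store: every precondition is checked before anything is mutated, so the operation either fully applies or leaves the store untouched. A database implementation would use a transaction instead; the types here are hypothetical:

```rust
use std::collections::HashMap;

pub struct RecordA { pub id: String }
pub struct RecordB { pub id: String }

#[derive(Default)]
pub struct MemoryStore {
    recordas: HashMap<String, RecordA>,
    recordbs: HashMap<String, RecordB>,
}

impl MemoryStore {
    // Validate every precondition before mutating anything, so the store is
    // never left with only half of the operation applied.
    pub fn remove_recorda_and_add_recordb(
        &mut self,
        recorda_id: &str,
        recordb: RecordB,
    ) -> Result<(), String> {
        if !self.recordas.contains_key(recorda_id) {
            return Err(format!("RecordA {} does not exist", recorda_id));
        }
        if self.recordbs.contains_key(&recordb.id) {
            return Err(format!("RecordB {} already exists", recordb.id));
        }
        self.recordas.remove(recorda_id);
        self.recordbs.insert(recordb.id.clone(), recordb);
        Ok(())
    }
}

fn main() {
    let mut store = MemoryStore::default();
    store.recordas.insert("a1".into(), RecordA { id: "a1".into() });

    // Failure leaves the store untouched: "a1" is still present.
    assert!(store
        .remove_recorda_and_add_recordb("missing", RecordB { id: "b1".into() })
        .is_err());
    assert!(store.recordas.contains_key("a1"));

    // Success applies both halves together.
    assert!(store
        .remove_recorda_and_add_recordb("a1", RecordB { id: "b1".into() })
        .is_ok());
    assert!(!store.recordas.contains_key("a1"));
    assert!(store.recordbs.contains_key("b1"));
}
```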
Database Implementations
Most Splinter stores should provide a database-backed implementation. Databases are widely supported and are the most likely storage type to be used in production environments. This section describes the design patterns for database-backed Splinter stores.
Diesel
Splinter stores should be implemented using the Diesel library. Diesel is an ORM and query builder that allows stores to interact with different types of databases in a generic way. With Diesel, a single store implementation can support multiple backend databases.
The name of a Splinter store implemented using Diesel should follow the standard store naming convention. For example, an implementation of a `RecordStore` that uses Diesel would typically be called `DieselRecordStore`.
Types of Databases
The Splinter library provides support for PostgreSQL and SQLite databases. In general, database-backed store implementations should support at least these two database types.
Module Structure
Because the database-backed implementations of Splinter stores are designed to work with multiple database types, the implementation should be as modular as possible; this means breaking the implementation up into different sub-modules for different concerns. Each of these modules is covered in the following sections.
Top Level (mod.rs)
Diesel store implementations should be in a `diesel` sub-module of the store. This module should be guarded by the `diesel` Rust feature, which ensures that it is only compiled when database support is required. For example, the `diesel` module would be defined in the top level of a store module like this:
#[cfg(feature = "diesel")]
pub mod diesel;
The top level of the `diesel` module is defined by the diesel/mod.rs file, which should contain the Rust struct that implements the store, as well as the implementations of the store traits on that struct. For example, the diesel/mod.rs file for a `RecordStore` would look something like this:
use diesel::r2d2::{ConnectionManager, Pool};
use operations::add_record::RecordStoreAddRecordOperation as _;
use operations::RecordStoreOperations;
pub struct DieselRecordStore<C: diesel::Connection + 'static> {
    connection_pool: Pool<ConnectionManager<C>>,
}

impl<C: diesel::Connection> DieselRecordStore<C> {
    fn add_record(&self, record: Record) -> Result<(), RecordStoreError> {
        RecordStoreOperations::new(&*self.connection_pool.get()?).add_record(record)
    }

    ...
}
The `RecordStoreOperations` struct and the `operations` sub-module are covered in the next section, Operations.
It is important to note that the store is defined for a generic `diesel::Connection` type. This allows the Diesel store to be used with different database types; each database type will have a different connection type. Some operations, however, may be defined for specific database connection types; in these situations, the store trait will need to be defined separately for each supported connection type. See the Implementing Operations for Individual Database Types section of this document for more details.
Operations
Each operation that is performed by the store is represented by a trait specific to the operation. Defining operations with traits allows for the operations to be implemented differently for different database types, while still being able to use them interchangeably in the store implementation. Here is an example of the trait for the "add record" operation of the `DieselRecordStore`, which would be defined in the diesel/operations/add_record.rs file:
pub trait RecordStoreAddRecordOperation {
    fn add_record(&self, record: Record) -> Result<(), RecordStoreError>;
}
Notice that the trait does not have any requirements for the type of database that is used; that is entirely up to the implementation. All operations for a given store should be implemented by a single struct in the top level of the `operations` module. Here is an example of the operations struct for the `RecordStore`, which would be defined in diesel/operations/mod.rs:
pub struct RecordStoreOperations<'a, C> {
    conn: &'a C,
}

impl<'a, C: diesel::Connection> RecordStoreOperations<'a, C> {
    pub fn new(conn: &'a C) -> Self {
        RecordStoreOperations { conn }
    }
}
This struct will be able to perform the database queries using a generic database `Connection`, which may be for any of the database types that are supported by Diesel.
The implementations of the individual operations should be done in the appropriate operation's module. For instance, the implementation of the `RecordStoreAddRecordOperation` trait shown above should be in the diesel/operations/add_record.rs file, right after the definition of the trait:
impl<'a, C> RecordStoreAddRecordOperation for RecordStoreOperations<'a, C>
where
    C: diesel::Connection,
{
    fn add_record(&self, record: Record) -> Result<(), RecordStoreError> {
        ...
    }
}
By implementing the operation traits on a single operations struct that holds the database connection, we can refer to the connection using `self.conn` in the body of the method implementations.
Implementing Operations for Individual Database Types
A trait for each operation allows for the implementation details to be concealed and offers flexibility in the database connection type being used. Ideally, each operation trait uses a generic `Connection` type, which allows the operation to be implemented for all Diesel connection types. However, there are some cases where this is not possible due to limitations of certain database types, or is not desirable because of optimizations for different databases.
One situation where databases require separate implementations is when Diesel's `insert` operation is used; this operation requires that the `Connection` implements the `SupportsDefaultKeyword` trait, which is not implemented for `SqliteConnection`. In this case, Diesel needs to know specifically which type of connection is used, which requires separate implementations for each database type.
When separate implementations are required, the operations should be implemented for each of the connection types, where each connection type is guarded by the corresponding Rust feature. For example, implementing the `RecordStoreListRecordsOperation` separately for SQLite and PostgreSQL would look something like this:
#[cfg(feature = "postgres")]
impl<'a> RecordStoreListRecordsOperation
    for RecordStoreOperations<'a, diesel::pg::PgConnection>
{
    fn list_records(
        &self,
    ) -> Result<Box<dyn ExactSizeIterator<Item = Record>>, RecordStoreError> {
        ...
    }
}

#[cfg(feature = "sqlite")]
impl<'a> RecordStoreListRecordsOperation
    for RecordStoreOperations<'a, diesel::sqlite::SqliteConnection>
{
    fn list_records(
        &self,
    ) -> Result<Box<dyn ExactSizeIterator<Item = Record>>, RecordStoreError> {
        ...
    }
}
When separate implementations are used for different connection types, the Diesel store must implement the store's traits for each connection type as well. For example, the `DieselRecordStore` that uses the `RecordStoreOperations` struct defined above would look like this:
#[cfg(feature = "postgres")]
impl DieselRecordStore<diesel::pg::PgConnection> {
    fn add_record(&self, record: Record) -> Result<(), RecordStoreError> {
        RecordStoreOperations::new(&*self.connection_pool.get()?).add_record(record)
    }

    ...
}

#[cfg(feature = "sqlite")]
impl DieselRecordStore<diesel::sqlite::SqliteConnection> {
    fn add_record(&self, record: Record) -> Result<(), RecordStoreError> {
        RecordStoreOperations::new(&*self.connection_pool.get()?).add_record(record)
    }

    ...
}
Database Models and Schema
Models and schemas define the structure of the database implementation and offer a native Rust representation of the data stored in the database. The models and schemas also directly correspond to how the database migrations are defined. This data must be accessible to the migrations and operations, regardless of the backend, so they should be stored in the diesel/models.rs and diesel/schemas.rs files, respectively.
Models and schemas must account for what is able to be represented by the backend, as each database uniquely represents data. For instance, lists must be represented in a way that all supported databases can store. SQLite does not support lists, so special consideration must be taken for data that contains lists. Lists should be represented using separate database tables and foreign keys.
For example, if a `Record` contains a list of strings, the strings should be stored as individual database entries, where each entry in the list is associated with its `Record`. The Rust struct may look like this:
pub struct Record {
    pub id: String,
    pub description: String,
    pub data: Vec<String>,
}
The corresponding database entries would be represented using the following models:
pub struct RecordModel {
    pub id: String,
    pub description: String,
}

pub struct RecordDataModel {
    pub id: String,
    pub data: String,
}
The `RecordModel` requires an `id` and `description`, since the `data` being held in the Rust representation cannot be stored as a list in SQLite databases. The `RecordDataModel` represents an entry in the `data` field of the Rust struct. The `RecordDataModel` is associated with its `RecordModel` via the `id` attribute. When querying the database, the ID allows for the `RecordDataModel` entries to be fetched into a list, which can then be parsed and organized into the original `Record` representation.
Database Migrations
Database migrations are necessary to apply the structure defined by models and schemas to the database itself. Migrations are written directly in the database’s query language, so they are defined separately for each database type.
All migrations should be in a `migrations` module. The diesel/migrations/mod.rs file should contain anything that is applicable to all migrations, such as errors for issues that arise while running migrations. A `MigrationError` is typically defined in this file and is used for migrations of the various database implementations.
Each implemented backend should have its own migrations directory. For example, migrations for PostgreSQL databases would be in the diesel/migrations/postgres directory. The module that this directory comprises (the `migrations::postgres` module in this example) should be guarded with the appropriate feature for that database type (the `postgres` feature in this example).
The mod.rs file in each database type's migrations folder should define a function for running the migrations. This function requires a database connection that is specific to the backend implementation used. For example, the diesel/migrations/postgres/mod.rs file would look something like this:
embed_migrations!("./src/admin/store/diesel/migrations/postgres/migrations");

use diesel::pg::PgConnection;

use super::MigrationError;

pub fn run_migrations(conn: &PgConnection) -> Result<(), MigrationError> {
    embedded_migrations::run(conn).map_err(|err| MigrationError {
        context: "Failed to apply migrations".to_string(),
        source: Box::new(err),
    })?;

    info!("Successfully applied PostgreSQL migrations");

    Ok(())
}
Diesel requires migration data to be contained within a directory titled `migrations`, so the migrations for a PostgreSQL store, for example, would be in the directory diesel/migrations/postgres/migrations.
For more on Diesel migrations, see Diesel’s Getting Started Guide.