RefSet
Summary
The RefSet
provides a singular source of objects, similar to the Sawtooth
BlockManager
in its scope. This generalized RefSet
can provide a singular
reference to an object held in storage, using the ID as its reference.
Motivation
This RFC intends to simplify and generalize the BlockManager for use with other objects, such as batches. This will provide broader usage for library consumers.
Guide-level explanation
The RefSet
provides references to data objects. For example, it can be used
to store blocks or batches. The object manager’s role is to ensure that only a
single instance of an object exists in memory.
The managed objects need to provide some value as an identifier. This ID will be
used to reference the object by other components. A reference may be requested
from the RefSet
using a raw ID, but the manager will return a reference only
if it contains the referenced object.
The RefSet
will provide access to its managed objects via leases. These
leases allow the caller to reference the object, but not modify it.
Modifications may only be done by creating a new instance, with a new ID.
Unlike the current BlockManager
, the RefSet
will rely on the underlying
store to provide strong guarantees that an object with any references will not
be deleted. Initial store implementations will rely on RDBSs to provide those
guarantees, via foreign key constraints.
Moving the referential integrity to the underlying storage system does mean that a component holding onto an object reference is not necessarily guaranteed that the object exists. It does require that the component has performed a necessary upgrade of a raw ID before attempting to acquire a lease to the object.
If the component holding a reference requires referential integrity, it must
persist its own reference. For example, all chain heads of interest must be
persisted, to guarantee that blocks stored via a RefSet<Block>
are preserved.
In a SQL environment, this can be accomplished with two foreign key constraints.
The first constraint is between a block ID and a block’s previous_block_id
in
the primary block table. The second constraint is between a chain_head
record’s block ID and a block in the primary block table. With these
constraints, an individual block in the primary table cannot be deleted without
the referencing record being deleted.
References are held through the use of a RefToken
. These tokens convey
ownership of a reference to an item in the RefSet
. That ownership may be
transferred to another component. In order to do so, the transferring component
may provide a Command
to be executed when the transfer is complete. The
transferee may also provide a Command
on acquisition of the token. These two
commands will be executed within a transactional context, such that they are
executed atomically.
Reference-level explanation
Managed Objects
Each object, T
, with an identifier ID
, stored in a RefSet
has the
following requirements:
where
T: Identifiable<ID>,
ID: Send + Clone + Eq + Hash,
The trait Identifiable
is defined as follows:
pub trait Identifiable<ID> {
/// Return an identifier for the implementor.
fn id(&self) -> &ID;
}
RefSet
The RefSet
struct itself has the following API:
pub struct RefSet<T, ID> {
...
}
impl<T, ID> RefSet<T, ID>
where
T: Identifiable<ID>,
ID: Send + Clone + Eq + Hash,
{
/// Construct a new `RefSet`
///
/// Args
/// * set_store: The underlying persistences store for the objects in the
/// set
pub fn new(set_store: impl SetStore<Object=T, ID = ID>) -> Self {
...
}
/// Return a lease to an item in the set, if it exists.
///
/// ## Args
///
/// * obj_ref: an shared id reference with which to request the item
///
/// ## Returns
///
/// A lease to the item, if it exists in the set.
///
/// ## Errors
///
/// Returns an `InternalError` if there is an issue with the underlying
/// storage system.
pub fn get(
&self,
obj_ref: &RefToken<ID>
) -> Result<Option<Lease<'_, T>>, InternalError> {
...
}
/// Add an item to the set.
///
/// This method adds an item to the set, which results in the item being
/// persisted via this instance's `SetStore`.
///
/// ## Args
///
/// * obj: an shared id reference with which to request the item
///
/// ## Returns
///
/// A ShareableRef to the ID of the item, based on its definition of
/// `Identifiable`.
///
/// ## Errors
///
/// * Returns an `InvalidStateOrInternalError` if the item is a duplicate,
/// based on its identifier
/// * Returns an `InternalError` if there is an issue with the underlying
/// storage system
pub fn add(&self, obj: T)
-> Result<RefToken<ID>, InvalidStateOrInternalError>
{
...
}
/// Upgrade an ID to a RefToken, if it exists in the set.
///
/// ## Args
///
/// * id: the identifier to be upgraded
///
/// ## Returns
///
/// A ShareableRef to the ID, if it exists in the set.
///
/// ## Errors
///
/// Returns an `InternalError` if there is an issue with the underlying
/// storage system.
pub fn upgrade(&self, id: &ID)
-> Result<Option<RefToken<ID>>, InternalError>
{
...
}
}
Leases
An Lease
provides read access to an item in the RefSet
. It has the following
signature:
struct Lease<'a, T>
It implements Deref<Target = T>
, which allows the lease to be used as such:
// This set uses strings, where Identifiable has been implemented on a
// string returning itself, with a hypothetical store.
let set: RefSet<String, String> = RefSet::new(StringStore);
let my_ref = set.add(String::from("my batch"))?;
assert_eq!(Some(&String::from("my batch")), set.get(&my_ref)?.as_deref());
This lease is !Send
, explicitly so that it can’t be sent to other threads.
Only a RefToken
and the RefSet
may be sent to other threads.
Reference Tokens
References are held through the use of a RefToken
. These tokens convey
ownership of a reference to an item in the RefSet
. That ownership may be
transferred to another component. In order to do so, the transferring component
may provide a Command
to be executed when the transfer is complete. The
transferee may also provide a Command
on acquisition of the token. These two
commands will be executed within a transactional context, such that they are
executed atomically.
The details of the RefToken
and its transfer life-cycle are as follows:
pub struct RefToken<T>(T);
impl<T: Send> RefToken<T> {
pub fn value(&self) -> &T {
&self.0
}
pub fn release<C>(
self,
release_commands: Vec<Box<dyn Command<T, C> + Send>>,
) -> TransferToken<T, C, Released> {
TransferToken {
id: self.0,
commands: release_commands,
_state: std::marker::PhantomData,
}
}
}
pub trait TransferState {}
pub struct Released;
impl TransferState for Released {}
pub struct Acquired;
impl TransferState for Acquired {}
pub struct TransferToken<T: Send, C, S: TransferState> {
id: T,
commands: Vec<Box<dyn Command<T, C> + Send>>,
_state: std::marker::PhantomData<S>
}
impl<T, C, S> TransferToken<T, C, S> {
pub fn value(&self) -> &T {
&self.id
}
}
impl<T: Send, C> TransferToken<T, C, Released> {
pub fn acquire(
self,
mut acquire_commands: Vec<Box<dyn Command<T, C> + Send>>,
) -> TransferToken<T, C, Aqcuired> {
let mut commands = self.commands;
commands.append(&mut acquire_commands);
TransferToken {
id: self.id,
commands,
_state: std::marker::PhantomData,
}
}
}
impl<T: Send, C> TransferToken<T, C, Acquired> {
pub fn complete(self, context: &C) -> Result<RefToken<T>, InternalError> {
for command in self.commands {
command.execute(&self.id, context)?;
}
Ok(RefToken(self.id))
}
}
pub trait Command<T, C> {
fn execute(&self, t: &T, context: &C) -> Result<(), InternalError>;
}
Persistence
Persistence is provided to the RefSet
via a SetStore
implementation. This
trait provides the RefSet
with operations that include storing, loading, and
checking for the existence of objects in the set. It is defined as follows:
pub trait SetStore: Sync + Send {
type ID;
type Object: Identifiable<Self::ID>;
fn persist(&self, obj: Self::Object) -> Result<(), InternalError>;
fn get(&self, id: &Self::ID) -> Result<Option<Self::Object>, InternalError>;
fn has(&self, id: &Self::ID) -> Result<bool, InternalError>;
}
A Lease
has a lifetime, which limits it to local (i.e. single-threaded) use.
The lease provides Deref<Target = T>
for accessing the data.
A RefToken
provides the reference that should be used or held by the caller
to retrieve a full reference to the data.
Example: Replacing the BlockManager
The existing BlockManager
can be replaced by an instance of a RefSet
, backed
by an implementation of the SetStore
on Block
structures.
We can introduce several type definitions for the block management purposes:
type HeaderSignature = String;
type BlockRef = RefToken<HeaderSignature>;
type BlockManager = RefSet<Block, HeaderSignature>;
The SetStore
implementation would be backed by a table. The parts of the table
relevant to this discussion are described here:
CREATE TABLE blocks (
header_signature VARCHAR(128) NOT NULL PRIMARY KEY,
--- ... details omitted
previous_block_id VARCHAR(128),
previous_block_ref VARCHAR(128),
FOREIGN KEY(previous_block_ref) REFERENCES blocks(header_signature)
);
In sawtooth, there are two main components that hold on to block references: the Completer and the Chain Controller. The former is used to retrieve blocks the local node doesn’t have. The latter is used to track the various chain heads in question.
In order to preserve referential integrity of the blocks being completed, the Completer should persist a list of blocks pending completion:
CREATE TABLE pending_blocks (
block_id VARCHAR(128) NOT NULL PRIMARY KEY,
pending_previous_block_id: VARCHAR(128) NOT NULL,
FOREIGN KEY(block_id) REFERENCES blocks(header_signature)
);
When it inserts a block into the BlockManager
(again, just a RefSet
), the
underlying SetStore
implementation will insert the block without a
previous_block_ref
if the previous block isn’t yet stored. If this is
the case, the completer will insert an entry into the pending_blocks
table for
the block awaiting its predecessor.
Once the block has been completed, it will update the blocks
table to add the
previous_block_ref
to create the reference link between the blocks.
The RefToken
would be released, with a command to remove the reference from
the pending_blocks
table. The TransferToken
would be sent to the chain
controller, which would acquire the token with a command to commit the block to
its own table of references. When the transfer is completed, both changes are
committed atomically.
The chain controller would likewise commit a list of chain heads. This simple table would be as follows:
CREATE TABLE chain_heads (
chain_head VARCHAR(128) NOT NULL PRIMARY KEY,
FOREIGN KEY(chain_head) REFERENCES blocks(header_signature)
);
Any chains being considered would have a record in this table. When a chain is no longer of interest, it can be deleted.
Note, this simplifies the current system to these two components, but omits the reference needs of intermediate systems, such as block validation.
Finally, when there are no references to a block in the blocks
table,
including via previous_block_id
values, then the block may be deleted.
Drawbacks
A limiting factor of this implementation is that users of the RefSet
must
persist their references if they want to maintain referential integrity. It
strongly implies that any strictly in-memory operations must gracefully handle
missing items from the set.
However, the simplicity of this solution as well as relying on tried and true technologies outweigh the cost. This is especially true in the areas where referential integrity is required. The benefits of persisting this information greatly outweigh the loss of in-memory integrity.
Rationale and alternatives
One alternative is to maintain all the references in memory. This requires a complex set of explicit and implicit references, as well as additional extension points for implementing dependencies between items in the set - blocks and their predecessors, or blocks and their batches. These definitions depend on the implementation of a “block” which may differ depending on library consumer.
This alternative was considered, but it was determined that move towards storing all data in SQL provided the opportunity to make use of existing technologies to implement inter-component references in a simplified manner.
Prior art
The current BlockManager
solves the problem of reference integrity via
in-memory reference counting.
Unresolved questions
-
How is clean-up handled? There are several ideas here (triggers, background clean process) but currently a best solution has not presented itself.
-
How is atomicity of storage handled?
An example, a block is added to a
RefSet<Block>
and consequently written to its underlying SQL-backedSetStore
. After this operation, a set of transaction receipts are written to the database. These operations should be done within the context of a database transaction, but it is unclear how to add that with the storage abstraction one layer below the call site of adding the block.