Disclaimer: These are planning documents. The functionalities described here may be unimplemented, partially implemented, or implemented differently than the original design.

Scabbard Back Pressure

Summary

When scabbard is under heavy load the send queue can fill up. This resulted in the following error that halted all future batch processing:

[2021-04-13 13:06:28.911] T["Orchestrator Outgoing"] ERROR [splinter::orchestrator::runnable]
 erminating orchestrator  outgoing thread due to error: an orchestration error occurred:
connection 20ae95a4-b2f0-4a99-8174-29c123420c00 send queue is full

Instead of falling over, when the batch queue fills up scabbard should start rejecting new batches until it can handle the batches it has already accepted.

Guide-level explanation

A simple back pressure will be added to the scabbard batch submission by returning a 429 Too Many Requests if the service’s batch queue reaches 30 or more batches.

/scabbard/{circuit}/{service_id}/batches:
    post:
      summary: Submit a list of batches to the Scabbard service
      description: |
        This endpoint can be used to submit batches to a Scabbard service. The
        body of the request must be a list of valid Sabre batches. If the
        batches are submitted successfully, the response will contain a link for
        checking the status of the submitted batches.

        This endpoint requires the permission "scabbard.write".

      . . .  

    429:
      description: Too many requests have been made to process batches

Any APIs written against the scabbard REST API should handle the 429 response by waiting and resubmitting the batch at a future time. Scabbard will start accepting requests again once the queue reaches half its max size, 15 batches.

Reference-level explanation

If the pending batch queue gets bigger than 30 batches, any new batches that are submitted will be rejected with TooManyRequests until the queue size is reduced by half.

The services coordinator is currently the only service that keeps track of the pending batches, therefore the coordinator needs to tell the other members of the circuit that there are too many batches. The following two message types will be added

    message ScabbardMessage {
        enum Type {
            UNSET = 0;
            CONSENSUS_MESSAGE = 1;
            PROPOSED_BATCH = 2;
            NEW_BATCH = 3;

        +   TOO_MANY_REQUESTS = 10;
        +   ACCEPTING_REQUESTS = 11;
        }
    }

When the coordinator’s queue reaches 30 batches, it will notify its peer services by sending a ScabbardMessageType::TOO_MANY_REQUESTS. During this time the coordinator will stop accepting any batches from its REST API but it will still accept pending batches from the other services to account for any that may be accepted before the TOO_MANY_REQUEST message is handled.

Non-coordinators will stop accepting batches when they receive ScabbardMessageType::TOO_MANY_REQUESTS. The service will only start accepting batches again when it receives a ScabbardMessageType::ACCEPTING_REQUESTS from the coordinator.

Once back pressure is enabled, new batches will not be accepted until the queue gets to half the size of the limit. The coordinator will then send a ScabbardMessageType::ACCEPTING_REQUESTS messages to the non coordinator and all services will begin accepting batches again.

The batch queue limit does not change as this is intended to just be a simple implementation.