Quiver: Introduction

Motivation

In 2019, the Walmart Control Services¹ (WCS) team was tasked with creating a new system that would serve as Walmart’s homegrown Warehouse Control and Execution System (WCS + WES). Its vision was a big undertaking in a critical path of the supply chain flow that would impact customer satisfaction and associate productivity. The team had to build systems that would be leveraged in all the different types of fulfillment and distribution centers.

To achieve this, a functionally scalable architecture with a solid technical foundation that can provide the basis for building these systems was required. A foundation that fulfills all the technical challenges that come with being present in such a varied set of environments and interact with the varied set of automation systems at the warehouses as well as the different software systems in the Walmart ecosystem.

Introduction

For such a system to be architected which scales both functionally and technically, it was important to build a framework that would allow quick development of business functionality while leaving the core concerns to be solved at the low level. A technical foundation that would fit the needs of such a system would be:

  • Latency sensitive
  • Geared to run on commodity hardware
  • Designed to have low CPU and memory footprints
  • Equipped with common abstractions for interactions with different infrastructures
  • Rich in feature sets without being overly prescriptive on applications
  • Able to handle common concerns like configuration, authentication, logging, metrics, etc.
  • Resilient to failures, with data processing guarantees, recovery, etc.

Quiver, the homegrown microservices framework designed and developed by the WCS team, was built with these considerations in mind and much more.

Key Ideas

Quiver is packed with features a modern distributed system needs to build an efficient, scalable, and customizable application.

Abstraction forms the baseline of several core offerings found within the framework. Abstraction is how applications use available features, support infrastructure, and address common concerns.

Unified Access Pattern (UAP)

Often the application in a distributed environment needs to interact with multiple different types and variants of infrastructure components. For example, a typical application deployed in cloud might consume messages from a Kafka cluster or AMQ/IBM MQ server, or write messages to AMQ/IBM MQ, or query from MS SQL Server, or Maria DB, etc. And while the semantics of consuming from Kafka topics and AMQ queues are different from an application’s point of view, it is receiving a stream of messages from a system that the application needs to process. The same applies to other operations.

This idea is what translates to Unified Access Pattern (UAP) in Quiver, where the application can treat each supported infrastructure uniformly with respect to the operations it can perform.

Figure 1: Unified Access Pattern

Given this paradigm, applications can be developed by focusing on business needs rather than underlying communications mechanisms. It also makes possible the creation of a uniform way of writing such business logic that makes the codebase more maintainable.

Resource Definition

Each supported infrastructure is associated with a Uniform Resource Identifier (URI) and the application can use the URI’s scheme to define it. It also allows applications to use additional parameters for controlling behavior while interacting with a resource.

kafka://broker-list/test-topic?consumerGroup=test.quiver.consumer

ibmmq-ps://host:51670/TOPIC?queueManager=QM1&channel=CHANNEL&subscriberName=SUBSCRIBER

Then the resource URI is internally converted into a Resource Definition, and the supported infrastructure implementations use it to initialize and support operations. This abstraction allows applications to only deal with a single type regardless of the underlying infrastructure. This enables applications to use a single façade to parse and obtain the resource definition.

public static ResourceDefinition parse(final String uri);

Message Object

Since applications interact with the infrastructure through abstracted resource definitions, it’s important to have a common way to pass incoming and outgoing data to and from the application. The Message object is the domain object used for such interaction. The Message object allows applications and the Quiver framework to have a common understanding and data, metadata, trace information, and more. The components within the Message object that carry these elements have data types that they are able to cover all possibilities.

The definition type is below:

class Message {
Metadata metadata;
TraceInfo traceInfo;
OperationInfo operationInfo;
String id;
String name;
byte[] data;
ResourceProperties resourceProperties;
ZonedDateTime timestamp;
String version;
String partitionId;
}

Supported Operations

While there are many different types of infrastructure supported by Quiver, the operations it supports can be grouped into the following categories:

Consume

One of the most common operations performed by a microservice is getting messages from an unbounded stream and processing them. This operation is generically referred to as “consume” in Quiver which allows applications to provide information about where to get the messages from and the message processing pipeline.

<Decoded, Handled> ConsumeToken consume(ConsumeProperties properties,
ResourceDefinition resourceDefinition,
BiFunction<Message, OperationContext, Optional<Decoded>> decoder,
BiFutureFunction<Decoded, OperationContext, Optional<Handled>> handler,
BiFutureFunction<Handled, OperationContext, Void> interpreter);

ConsumeToken consume(ConsumeProperties properties,
ResourceDefinition resourceDefinition,
BiFutureFunction<Message, OperationContext, Void> singleStage);

Write

This operation represents an outgoing message posted onto a system. Typically, when a message is processed in the “consume” operation, side effects are produced and sent out for the next service to pick up through this operation. Quiver supports both single message “write” as well as their bulk counterparts for multiple messages.

CompletableFuture<WriteResult> write(OperationContext context,
WriteProperties properties,
ResourceDefinition definition,
Message message);

Read

The “read” operation represents fetching a specific message from a supporting infrastructure. Typically, this is used by an application when it’s not consuming all the messages but wants to make a point lookup to grab a specific message or data.

CompletableFuture<ReadResult> read(OperationContext context,
ReadProperties properties,
ResourceDefinition definition);

Execute

While interacting with persistence stores, applications often need to run queries to obtain, insert, or update data. All such operations can be performed using the “execute” operation. This operation, just like the others above, abstracts out the underlying persistence store against, which the query is going to be executed by providing the result set back in the common domain language as the other operations.

CompletableFuture<QueryResult> execute(OperationContext context,
QueryProperties properties,
ResourceDefinition rd,
String query,
Map<String, Object> queryParameters);

CompletableFuture<QueryResult> execute(OperationContext context,
QueryProperties properties,
ResourceDefinition rd,
String query,
Object queryParameters,
Map<String, Object> additionalQueryParameter);

Although these operations are generally supported by Quiver, some of the resources may not internally support it. The following table shows which operations are applicable to each infrastructure.

Figure 2: Supported Operations

Key Features

Along with the above mentioned core operations Quiver supports, there are a lot of features that it provides that come out of the box for the applications to use. All of them are configurable by the applications to tune it for their needs and as required.

Measurements

Every Quiver operation is measured. If an operation is performed through pipeline stages pipeline, each stage is measured as well. These measurements allow applications to determine processing time and optimize as needed. The measurements are done transparently for the applications but are available to be exposed as metrics.

Metrics

Metrics are collected through measurements and can be exposed by different means. Common metrics stores, like Prometheus, are supported out of the box along with the ability to log metrics, etc. System metrics are collected automatically and cover key framework components. The application can also configure any additional app-specific custom metrics.

Resilience

Quiver provides multiple ways for the application to deal with inevitable failures. Operations like read, write, and execute allow them to be executed under a retry logic that provides the ability to recover from transient failures and try the operation again until a configurable limit is reached. Quiver also provides mechanisms to deal with messages that are ingested but cannot be processed. Such messages are consumed under configurable failure behaviors that give the application the flexibility to choose the reaction.

Figure 3: Failure Behaviors

Message Completion and Guarantees

For the “consume” operation, Quiver manages the message lifecycle so the application does not have to worry about any message drops or data loss. Quiver offers tight guarantees for message delivery to the application and managing message completion and checkpoint. It also provides flexibility to manually execute these responsibilities to enable more complex processing if needed.

Objects Management

Quiver also comes with the ability to allow any high allocation objects to be reused to ease garbage collector pressure. Such objects are managed using pre-allocated and elastic object pools that provide ways to clean and recycle objects. The details of pool usage, new allocations, garbage collection, and more are captured as part of the system metrics mentioned above, giving the application a detailed view of how the objects, and in turn the heap, is managed.

Advanced Features

The robust Quiver framework offers all the mentioned features and more. Other powerful features include:

  • LMAX Disruptor² at system boundaries
  • Uniform Configuration Management
  • Uniform Authentication Modes
  • Uniform Filtering on Streams
  • Advanced Parallelism
  • Isolated Application Environment
  • And much more

References

1. Walmart Control Services (WCS) powers Next Generation Fulfillment Centers https://www.linkedin.com/pulse/walmart-control-services-wcs-powers-next-generation-

2. Thompson, M., Farley, D., Barker, M., Gee, P., & Stewart, A. (2011, May). LMAX Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads. https://lmax-exchange.github.io/disruptor/disruptor.html

Quiver: Introduction was originally published in Walmart Global Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Article Link: Quiver: Introduction. Motivation | by Rajeev Ravindranath | Walmart Global Tech Blog | Feb, 2023 | Medium