Pipes
Pipes are uni-directional conduits for reliably transmitting data of a particular type between the publishers and consumers attached at the ends of the pipe.
pipe WeatherForecast is {
options rate(1000), paritions(7),
transmits type Forecast
}
In the foregoing, a pipe named WeatherForecast
is defined to transmit the
data type named Forecast
and with two options:
- rate - an expected sustained rate of 1000 data points per second
- partitions - a minimum number of partitions on the data of 7
Pipes can transmit any data type that RIDDL can specify. There is only one data type that flows in a pipe. The transmission type is often used with an alternation of message types such as the commands and queries that an entity might receive.
Pipes may play a large role in the resiliency of a reactive system so we permit a variety of options to be specified no them. These options are intended only as advice to the translators converting the pipe into useful code. For example, a pipe may or may not need to be persistent. If a pipe has the burden of persistence removed, it is likely much more performant because the latency of storage is not involved.
The messages flowing through the pipe are persisted to stable, durable storage, so they cannot be lost even in the event of system failure or shutdown. This arranges for a kind of bulkhead in the system that retains published data despite failures on either end of the pipe.
With this option, pipes support the notion of being commitable. This means the consuming processors of a pipe’s data may commit to the pipe that they have completed their work on one or more data items. The pipe then guarantees that it will never transmit those data items to that processor again. This is helpful when the processor is starting up to know where it left off from its previous incarnation.
For scale purposes, a pipe must be able to partition the data by some data
value that is in each data item (a key) and assign the consumption of the
data to corresponding members of a consumer group. This permits multiple
instances of a consuming processor to handle the data in parallel. The n
value is the minimum recommended number of partitions which defaults to 5
if not specified
By default, pipes provide the guarantee that they will deliver each data item
at least once. The implementation must then arrange for data items to be
idempotent so that the effect of running the event two or more times is the
same as running it once. To counteract this assumption a pipe can be use the
lossy
option which reduces the guarantee to merely best reasonable effort,
which could mean loss of data. This may increase throughput and lower overhead
and is warranted in situations where data loss is not catastrophic to the
system. Some IoT systems can have this characteristic.
Attached to the ends of pipes are producers and consumers. These are processors of data and may originate, terminate or flow data through them, connecting two pipes together. Producers provide the data, consumers consume the data. Sometimes we call producers sources because they originate the data. Sometimes we call consumers sinks because they terminate the data.
graph LR; Producers --> P{{Pipe}} --> Consumers Source --> P1{{Pipe 1}} --> Flow --> P2{{Pipe 2}} --> Sink
Pipes may have multiple publishers (writers of data to the pipe) and multiple consumers (readers of data from the pipe). In fact, because of the partitioned consumption principle, there can be multiple groups of consumers, each group getting each data item from the pipe.
When a pipe has multiple consumers, they are organized into subscriptions. Each subscription gets every datum the pipe carries. Consumers attach to a subscription and there is generally one consumer per partition of the subscription. Sometimes subscriptions are known as consumer groups as is the case for Kafka.