¿What are Reactive Streams?
Reactive Streams is a standard for asynchronous data processing in a streaming fashion with non-blocking back pressure.
Then you have to understand what is "processing in a streaming fashion", "asynchronous", "back pressure", and "standard".
Stream Processing
The figure below compares traditional data processing vs. stream processing.
The traditional method on the left in each request/response saves the data that is queried in the database in the application memory. If the size of the requested data is larger than the size of available memory, an "out of memory" error will occur as a result. There is another scenario where the service or application receives many simultaneous requests and a large amount of GC (garbage collection) is triggered in a short period of time, causing the server to not respond normally.
On the other hand, there is the stream processing method, where large amounts of data can be processed with little system memory. With this type of processing, you can create a pipeline that subscribes to any incoming data, processes the data, and then publishes that data. Thanks to this, the server is capable of processing large amounts of data elastically.
Asynchronous Method
Let's compare the asynchronous method with the synchronous method. The following figure shows the process of both methods:
In the synchronous method, a request sent by the client is blocked until the server sends a response. Being "blocked" means that the current thread cannot execute another task and has to go into a waiting state. If two requests are sent to servers A and B, the request must receive a response from server A before moving to server B. However, with the asynchronous method, the current thread is not blocked and can execute other tasks without waiting for a response. The thread can be used for other tasks after sending a request to server A or sending a separate request to server B. The advantages of asynchronous methods compared to synchronous methods are:
- Speed - you can get quick responses by sending two simultaneous requests
- Less resource usage - more requests can be processed with fewer threads since threads can execute tasks in parallel without being blocked.
Back pressure
Before going into more details about "back pressure", let's look at the observer pattern, push method, and pull method made famous by RxJava.
Push Method
In the observer pattern, a publisher transfers data by pushing it to a subscriber. If the publisher sends 100 messages per second, and the subscriber can only process 10 messages in 1 second, what would happen? The subscriber will have to save the pending events in a queue.
The amount of server memory is limited. If 100 messages per second are sent to the server, the buffer will fill instantly.
If you have a static buffer, new messages will be rejected, they will have to be sent again and that will cause additional processing and network and CPU load.
If you have a variable length buffer, the server will at some point have an "out of memory" error due to the attempt to save the events.
How do we solve this problem? Can we have a publisher that only sends a number of messages that the subscriber can handle? This is what is called back pressure.
Pull method
With the pull method, a subscriber can process 10 operations at a time with only 10 requests to the publisher. The publisher can send the requested amount, and the subscriber is sure not to have any "out of memory" errors.
Additionally, if the same subscriber is currently processing 8 operations, it can request 2 more operations so that the number of messages does not exceed the limits of what it can process. With the pull method, subscribers are free to choose the size of the data they receive. The method that allows subscribers to dynamically pull data requests within their capacity is backpressure.
Standardization
Reactive Streams is a standardized API.
Development of Reactive Streams began in 2013 by engineers from Netflix, Pivotal, and Lightbend. Netflix is responsible for the development of RxJava, Pivotal for WebFlux, and Akka's Lightbend, an implementation of distributed processing actor models. What these companies had in common was that they all required streaming APIs. However, streams only make sense if they combine and flow organically. In order for data to flow uninterrupted, these different companies needed to use shared specifications, or in other words, they needed standardization.
In April 2015, Reactive Streams released their 1.0.0 specifications that can be used in the JVM. In September 2017, Java 9 added the Flow API, which includes the Reactive Streams API, specifications, and pull method, and packaged it in java.util.concurrent. Reactive Streams, which was a shared effort between community members and some companies, has been officially recognized and added as an official part of Java. Three months later, Reactive Streams released an adapter that supports Flow, allowing existing libraries to be used.
Reactivate Extensions
Reactive Extensions (Reactive X) is a family of cross-platform frameworks for handling synchronous or asynchronous event streams, originally created by Erik Meijer at Microsoft. The Reactive Extensions implementation for Java is the Netflix RxJava framework.
In simple terms, Reactive Extensions are a combination of the Observer pattern, the iterator pattern, and functional programming. Thanks to the Observer pattern, the ability for consumers to subscribe to producer events is taken. Thanks to the Iterator pattern, the ability to handle the three types of stream events (data, error, completion) is taken. Thanks to functional programming, the ability to handle stream events with chained methods (filter, transform, combine, etc.) is acquired.
Reactivate Streams
It is an additional development of Reactive Extensions, in which backpressure is used to achieve a balance of performance between the producer and consumer. In simple terms, it is a combination of Reactive Extensions and batching.
The main difference between them is who initiates the exchange. In Reactive Extensions, a producer sends events to a subscriber as soon as they are available and in any number. In Reactive Streams, a producer must send events to a subscriber-only after they have been requested and no more than the number requested.
Advantages:
- The consumer can initiate the exchange at any time.
- The consumer can stop the exchange at any time.
- The consumer can determine when the producer will finish generating events
- The latency is lower than synchronous pull communication because the product sends the events to the consumer as soon as they are available.
- The consumer can uniformly handle events from streams of three types (data, error, and completion).
- Handle stream events with chained methods that are simpler than implementing nested event handlers.
- Implementing concurrent producers and consumers is not a simple task.
Disadvantages:
- A slow consumer can be overloaded with events, due to a fast producer.
- Implementing concurrent producers and consumers is not a simple task.
Reactive Stream Specification
- Reactive streams can be unicast and multicast: a publisher can send events to one or more consumers.
- Reactive streams are potentially infinite: they can handle zero, one, many, or an infinite number of events.
- Reactive flows are sequential: a consumer processes events in the same order as a producer sends them.
- Reactive flows can be synchronous or asynchronous: they can use computing resources for parallel processing in separate stages.
- Reactive streams are non-blocking: they do not waste computing resources if the performance of a producer and a consumer are different.
- Reactive flows use mandatory backpressure: a consumer can request events from a producer according to its processing speed.
- Reactive flows use bounded buffers: they can be implemented without unbounded buffers, avoiding out-of-memory errors.
Reactive Streams API
- The Publisher<T> interface represents a producer of data and control events.
- The Subscriber<T> interface represents an event consumer.
- The Subscription interface represents a connection between a Publisher and a Subscriber.
- The Processor<T,R> interface represents an event processor that acts as a subscriber and publisher.
- Publishers only have a subscribe API that allows subscribers to subscribe.
- Subscribers have onNext to process received data, onError to process errors, onComplete to complete tasks, and onSubscribe API to subscribe with parameters.
- For subscription, there is a request API to request data and a cancel API to cancel subscriptions.
0 comentarios:
Publicar un comentario