Photo Credit: SerenaWong
Performing computations on demand while fetching and aggregating data from multiple other systems in sub-second time was something that we had always wanted to do. To solve this problem, we turned to Java.
In our quest to find the best way to perform many tasks simultaneously in Java, we evaluated and compared various options. We conducted our study as a proof of concept (POC), making around 600 HTTP calls with each approach.
Before we dive into what we observed, let’s examine the offerings we explored.
Sequential Processing
Sequential processing is the most straightforward way to perform a task, wherein one call is made after the other. This approach takes the most time.
Java Parallel Streams
Introduced in Java 8, parallel streams process the elements of a large dataset concurrently. When a stream is parallel, the Java runtime splits the work into multiple subtasks and executes them in parallel. Parallel streams run on the common ForkJoinPool, obtained through ForkJoinPool.commonPool(). By default, the size of this pool equals the number of logical CPU cores minus one, because the calling thread also takes part in the work. The size can be changed with the system property java.util.concurrent.ForkJoinPool.common.parallelism.
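As a minimal sketch of the idea, the snippet below squares a range of numbers with a parallel stream and prints the parallelism of the common pool. The class name and the expensiveSquare() helper are hypothetical stand-ins for illustration, not code from the POC.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamSketch {

    // Hypothetical stand-in for an expensive per-element task.
    static int expensiveSquare(int n) {
        return n * n;
    }

    // Squares 1..count with a parallel stream. The work is split across
    // the common ForkJoinPool workers and the calling thread.
    static List<Integer> squaresInParallel(int count) {
        return IntStream.rangeClosed(1, count)
                .parallel()
                .map(ParallelStreamSketch::expensiveSquare)
                .boxed()
                .collect(Collectors.toList()); // collect keeps encounter order
    }

    public static void main(String[] args) {
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println(squaresInParallel(5));
    }
}
```

To try a different common pool size, the JVM can be started with, for example, -Djava.util.concurrent.ForkJoinPool.common.parallelism=8.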
Parallel Streams Uses
Consider using parallel streams when:
- The task is more expensive than the overhead of data splitting and thread management
- There is enough data to work on
- The amount of computation required per data point is large enough (the NQ model, where N is the number of data elements and Q is the amount of work per element)
Parallel Streams Side Effects
Parallel streams don’t guarantee the order in which elements are processed, so they should only be used when processing doesn’t need to happen in a particular order. When parallel streams perform a blocking or long-running operation, all threads of the common ForkJoinPool can be tied up, which affects every other task that relies on commonPool().
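One commonly used way to keep blocking work out of the common pool is to start the parallel stream from inside a dedicated ForkJoinPool. The sketch below assumes this behavior, which is an implementation detail of the JDK rather than a documented guarantee of the Stream API. The blockingFetch() method is a hypothetical placeholder that sleeps instead of making a real HTTP call.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CustomPoolStreamSketch {

    // Hypothetical blocking call; a real version might perform an HTTP GET.
    static String blockingFetch(int id) {
        try {
            Thread.sleep(20); // simulate I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + id;
    }

    // Starts the parallel stream from a task running in a dedicated pool,
    // so the blocking calls occupy that pool's workers instead of the
    // common pool's (assumption: relies on undocumented JDK behavior).
    static List<String> fetchAll(int count, int poolSize) {
        ForkJoinPool pool = new ForkJoinPool(poolSize);
        try {
            return pool.submit(() ->
                    IntStream.rangeClosed(1, count)
                            .parallel()
                            .mapToObj(CustomPoolStreamSketch::blockingFetch)
                            .collect(Collectors.toList())
            ).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(10, 4));
    }
}
```

The individual calls may run in any order, but collect() still assembles the results in encounter order, so the returned list is ordered by id.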
CompletableFuture
CompletableFuture is an extension of Java's Future that can be used to perform tasks asynchronously. It lets the user execute a task on a separate thread, which then reports the task's completion or failure back to the caller. The idea is that a long-running task (T1) runs in the background while other tasks continue on the main thread. Because those other tasks are no longer blocked by the long-running one, overall execution time goes down.
If you want to perform a task and don’t need to return anything, use the runAsync() API, which returns a CompletableFuture<Void>. However, if you need to return something, use supplyAsync(), which takes a Supplier<T> and returns a CompletableFuture<T>; calling get() or join() on that future yields the T.
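The difference between the two methods can be sketched as follows. The class and method names are illustrative assumptions; only runAsync(), supplyAsync(), and join() come from the CompletableFuture API.

```java
import java.util.concurrent.CompletableFuture;

class AsyncBasicsSketch {

    // supplyAsync: runs the Supplier on another thread and yields a value.
    static CompletableFuture<Integer> computeAsync(int a, int b) {
        return CompletableFuture.supplyAsync(() -> a + b);
    }

    // runAsync: runs the Runnable on another thread; there is no result.
    static CompletableFuture<Void> logAsync(String message) {
        return CompletableFuture.runAsync(
                () -> System.out.println("log: " + message));
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> sum = computeAsync(2, 3);
        CompletableFuture<Void> log = logAsync("started");
        // The main thread is free to do other work here.
        log.join();                     // waits; the result is always null
        System.out.println(sum.join()); // 5
    }
}
```

join() behaves like get() but throws an unchecked CompletionException instead of checked exceptions, which keeps short examples compact.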
By default, CompletableFuture uses the same common ForkJoinPool (ForkJoinPool.commonPool()) as parallel streams, but a custom thread pool can be supplied as a second argument to the runAsync()/supplyAsync() methods.
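A minimal sketch of supplying a custom pool, assuming a fixed-size executor and a hypothetical fetch() method that sleeps in place of a real HTTP call:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class CustomExecutorSketch {

    // Hypothetical blocking call standing in for an HTTP GET.
    static String fetch(int id) {
        try {
            Thread.sleep(20); // simulate I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + id;
    }

    // Submits every call to a custom fixed-size pool, then waits for all.
    static List<String> fetchAll(int count, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (int id = 1; id <= count; id++) {
                final int requestId = id;
                // The second argument selects the pool that runs the task.
                futures.add(CompletableFuture.supplyAsync(
                        () -> fetch(requestId), executor));
            }
            // Join only after all tasks have been submitted.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(8, 4));
    }
}
```

Submitting all tasks before joining any of them matters: joining inside the submission loop would wait for each call in turn and make the work sequential again.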
CompletableFuture also supports callbacks, such as thenApply(), thenAccept(), and exceptionally(), which can be used to perform certain actions once the future completes.
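As a small illustration of callbacks, the sketch below chains a transformation and a fallback onto an asynchronous task. The class and method names are hypothetical; thenApply() and exceptionally() are standard CompletableFuture methods.

```java
import java.util.concurrent.CompletableFuture;

class CallbackSketch {

    // Computes the length of the text asynchronously, formats it in a
    // callback, and falls back to an error string if any stage fails.
    static CompletableFuture<String> describeLength(String text) {
        return CompletableFuture
                .supplyAsync(() -> text.length())    // throws if text is null
                .thenApply(len -> "length=" + len)   // runs on success
                .exceptionally(ex -> "failed: " + ex.getMessage());
    }

    public static void main(String[] args) {
        System.out.println(describeLength("hello").join()); // length=5
        System.out.println(describeLength(null).join());
    }
}
```

When an earlier stage throws, thenApply() is skipped and exceptionally() supplies the replacement value, so join() does not throw in either case here.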
We created 600 HTTP GET requests using each of these techniques to compare the performance, overhead, and time to complete the task. Here are our findings.
Compared to sequential processing, we saw an 89% reduction in time using parallel streams with commonPool(). Furthermore, using a custom thread pool of 25 threads reduced the time to half of that required with commonPool().
We saw a similar trend while using CompletableFuture for the same task, which resulted in a 96% time reduction with a custom thread pool and an 86% reduction with commonPool().
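The figures above come from real HTTP calls. For readers who want to observe the general trend locally, here is a rough timing sketch that replaces the network call with a fixed sleep. It is not the code used in the POC; the class, the simulatedCall() method, and the 20 ms latency are assumptions, and the numbers it prints will differ from the ones reported here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class TimingSketch {

    // Hypothetical stand-in for one HTTP GET: sleeps to simulate latency.
    static String simulatedCall(int id) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + id;
    }

    // One call after the other on the calling thread.
    static long timeSequentialMillis(int calls) {
        long start = System.nanoTime();
        for (int id = 1; id <= calls; id++) {
            simulatedCall(id);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // All calls submitted to a fixed-size custom pool, then joined.
    static long timeCustomPoolMillis(int calls, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<CompletableFuture<String>> futures =
                    IntStream.rangeClosed(1, calls)
                            .mapToObj(id -> CompletableFuture.supplyAsync(
                                    () -> simulatedCall(id), executor))
                            .collect(Collectors.toList());
            futures.forEach(CompletableFuture::join);
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential ms:  " + timeSequentialMillis(50));
        System.out.println("custom pool ms: " + timeCustomPoolMillis(50, 25));
    }
}
```

A single run like this is only indicative; a careful comparison would repeat the measurement several times and include JVM warm-up.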
Parallel Processing in Java was originally published in Walmart Global Tech Blog on Medium.
Article Link: Parallel Processing in Java. Performing computations on demand while… | by bhagyashree gadekar | Walmart Global Tech Blog | Sep, 2022 | Medium