Multithreading in Java vs Node.JS

Azat Satklyčov · Published in Modern Mainframe · Oct 15, 2020

Day to day, developers deal with improving the performance of slow operations. The challenge is how to get the most performant result, pick the right concurrency model, and choose the programming language that offers the best solution.

Introduction

Nowadays, threads (in computing) are used in many real-world applications: online banking transactions, buying airline tickets, ordering food, and many others. Doing multithreading correctly requires deep knowledge of all the components involved (e.g. Java, database transactions or Mainframe CICS) as well as business knowledge (banking, etc.). Understanding concurrent programming principles and concurrency models, and learning how multithreading works in depth, demonstrates maturity and technical depth in a developer.

In this article you will learn how to handle slow operations, and about multithreading and concurrency concepts like deadlocks, thread safety, thread pools, completable futures, non-blocking I/O, the event loop, etc. It also discusses the concurrency models supported by Java and Node.js, and concludes with a comparison matrix showing best|average|worst elapsed-time results for Java vs Node.js.

This article is derived from the output of a technical webinar; its source code is publicly available here.

Handling Slow Operations

Suppose a logistics company generates a quotation report, a slow task consisting of three subtasks: T1 (download rates), T2 (process them based on complex algorithms), and T3 (generate the report).

The performance of slow operations can be improved by one of several mechanisms, depending on the use case.

So which mechanism should I use to improve the performance of my slow operations?

Before answering this question, let’s understand the concepts of multitasking, context-switching, multithreading, process, thread, concurrency, and parallelism.

In earlier times, computers were not very powerful, had only a single CPU, and could therefore run only a single task at a time (e.g. work in Excel to draw charts, then close it and open a Word file to write an article).

A single CPU can run only a single program at a time

Later on, this limitation was resolved by inventing a context-switching mechanism in which CPU time is shared across all running processes, making it possible to execute multiple programs or tasks at the same time (e.g. working in Excel and Word simultaneously). The operating system switches between the running programs, executing each of them for a little while before switching. This switching happens so fast that, to us, they appear to run at the same time.

A single CPU can run multiple programs by switching between executions.

Multitasking is the concurrent execution of multiple tasks over a certain period of time. It comes in two types: process-based and thread-based. A process is an instance of an application or program; it owns memory and has at least one thread.

Multithreading is an execution model in which multiple threads (lightweight processes) run independently within a process. A thread of execution (thread) is a small set of programmed instructions designed to be scheduled and executed by a CPU using the process's resources, including memory and open files.

Multithreading follows the same principle as multitasking, but it happens within a single application (program, process). A multithreaded program contains two or more parts that can run concurrently, to achieve maximum utilization of available resources, better responsiveness, and a fair division of CPUs between different tasks.

Multiple threads executed on a single CPU. Each application (process) has two threads here.

Modern computers come with multiple CPUs, or a single CPU with multiple cores, and the principle is the same: each core can run a separate task, making it possible to run multiple tasks (threads) in parallel.

Multiple threads executed on multiple CPUs, or a CPU with multiple cores

Concurrency is when two or more tasks can be processed at the same time (together, concurrently) in overlapping time periods, using context switching. Parallelism, on the other hand, happens when tasks literally run at the same time (e.g. on a multicore processor), with the application dividing its tasks into subtasks that can be processed in parallel.

So concurrency deals with how an application handles multiple tasks: it works by performing one task at a time or by performing multiple tasks concurrently. Parallelism deals with how an application handles each individual task: it may process tasks serially, or in parallel on multiple CPUs.

Now, back to the question above, applied to our quotation report and its three subtasks.

1-way (easiest): Execute each subtask synchronously, simply running one subtask after another (blocking operation). This gives the slowest performance, especially for heavy tasks.

2-way: Multithreaded execution (deadlocks and race conditions can happen)

2a) A thread-per-subtask approach (parallel workers) running on a CPU with at least three cores. If the threads share state, performance may suffer; otherwise they run in parallel, which gives the fastest performance.

2b) Multithreaded execution on a single-core CPU, where the OS performs preemptive switching and the tasks run concurrently. Keep in mind that context switching is an expensive operation.

3-way: Asynchronous execution (non-blocking by nature, single threaded, so synchronization issues like deadlocks and race conditions do not happen). From the surface it looks like solution 2b, but inside it works differently: cooperative scheduling occurs. This produces good performance, but depends on the async engine that handles the job.

Java Core Thread API

Java has provided built-in support for multithreaded programming since Java 1.0 (Jan. 1996) via the Object and Thread classes and the Runnable interface. Implementing a multithreaded program with this Core Thread API requires a solid understanding of the basic concepts (thread execution, thread states, the thread scheduler, thread priority, interruptions, monitor objects, inter-thread communication, etc.) as well as a clear understanding of the application's performance goals. Also be aware that the JVM and CPU can reorder the instructions in your code to achieve better performance, constrained by a set of rules called the 'happens-before guarantee' that relate to 'volatile' variables and variables accessed from within synchronized blocks. Otherwise, thread synchronization issues like deadlocks, race conditions, starvation, and visibility problems are inevitable.
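As a quick refresher, here is a minimal sketch (not taken from the webinar code; class and thread names are illustrative) of creating and starting threads via the Runnable interface:

```java
public class HelloThreads {
    public static void main(String[] args) throws InterruptedException {
        // A task defined as a Runnable (since Java 8, a lambda works too)
        Runnable task = () -> System.out.println(
                "Running in " + Thread.currentThread().getName());

        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();   // schedules the thread; never call run() directly
        t2.start();
        t1.join();    // wait for both threads to finish
        t2.join();
        System.out.println("Done in " + Thread.currentThread().getName());
    }
}
```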

Let's look at an example of how to handle deadlock situations. A deadlock is a situation where Thread-A holds a key needed by Thread-B, and Thread-B holds a key needed by Thread-A. When we are debugging Java code, the JVM can help us detect deadlocks; otherwise, the only alternative to resolve the issue is to restart the JVM.
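A minimal sketch of such a deadlock (the lock1/lock2 names match the fix discussed next; the sleep helper only makes the deadlock easy to reproduce):

```java
public class DeadlockDemo {
    private static final Object lock1 = new Object();
    private static final Object lock2 = new Object();

    public static void main(String[] args) {
        // Thread-A takes lock1 first, then waits for lock2
        new Thread(() -> {
            synchronized (lock1) {
                sleep(100); // give Thread-B time to grab lock2
                synchronized (lock2) {
                    System.out.println("Thread-A acquired both locks");
                }
            }
        }, "Thread-A").start();

        // Thread-B takes lock2 first, then waits for lock1 -> deadlock
        new Thread(() -> {
            synchronized (lock2) {
                sleep(100);
                synchronized (lock1) {
                    System.out.println("Thread-B acquired both locks");
                }
            }
        }, "Thread-B").start();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```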

There are three techniques to prevent deadlocks: 1) lock ordering, 2) lock timeout, and 3) deadlock detection. The deadlock above can be fixed with the lock ordering technique, by making both threads acquire lock1 and lock2 in the same order.
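A minimal sketch of the fixed version, where both threads follow the same lock order (class name is illustrative):

```java
public class DeadlockFixedDemo {
    private static final Object lock1 = new Object();
    private static final Object lock2 = new Object();

    public static void main(String[] args) {
        // Both threads now acquire lock1 first, then lock2: a circular
        // wait is impossible, so the program always terminates.
        Runnable orderedLocking = () -> {
            synchronized (lock1) {
                synchronized (lock2) {
                    System.out.println(Thread.currentThread().getName()
                            + " acquired both locks");
                }
            }
        };
        new Thread(orderedLocking, "Thread-A").start();
        new Thread(orderedLocking, "Thread-B").start();
    }
}
```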

To dive deeper into core Java multithreading and thread synchronization examples (creating threads, race conditions, thread join, and inter-thread communication via the wait-and-notify mechanism), play with all the main methods under the 'concurrency.java.core.api' package here

Java Synchronization Limitations

Java's synchronization limitations derive mainly from its complex design. For example, the Core Thread API's intrinsic locks cause a lot of performance overhead. You also need to handle race conditions, invisible writes, deadlocks, crashes, starvation, and nested-monitor lockouts. Moreover, keep in mind the overhead of context switching and resource consumption, which are other important considerations in multithreading. Additionally, in-depth knowledge or awareness of the Java memory model is important, since it is mapped to hardware memory: the Java heap and stack concepts map onto the same main memory (RAM) in hardware, and the volatile keyword interacts directly with main memory.
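As an illustration of the visibility guarantee that volatile provides, a minimal sketch (class and field names are illustrative):

```java
public class VisibilityDemo {
    // Without 'volatile' the reader thread may cache 'running' and spin
    // forever; 'volatile' forces reads and writes to go through main memory.
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (running) { /* busy-wait */ }
            System.out.println("Stopped");
        });
        reader.start();
        Thread.sleep(500);
        running = false;   // this write is immediately visible to the reader
        reader.join();
    }
}
```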

Thread Safety Techniques and Java Concurrency API

There are thread safety techniques that help prevent or limit the different forms of thread synchronization issues. Moreover, the Java Concurrency API (the java.util.concurrent utilities) contains high-level concurrency features, introduced in Java 5 and enhanced in versions 7, 8, and 9, to help with this. These utility classes (thread safe, with high performance) are designed to be used as building blocks for creating concurrent classes or applications; a small sketch follows the list below.

  • Use stateless and immutable implementations (an immutable class, or a library of immutable data structures such as Clojure's)
  • Prefer concurrent collections (ConcurrentHashMap, etc.) over synchronized collections (Collections.synchronizedMap(), etc.)
  • Avoid using Strings or other reusable objects for locking purposes (they may be pooled/cached)
  • Volatile fields — ensure a value is read from main memory, preventing visibility issues
  • Use Java 5 Locks (ReentrantLock, etc.), which offer more flexible locking and help prevent starvation
  • Atomic variables — minimize synchronization and help avoid memory consistency errors
  • Executors — define a high-level API for launching and managing threads
  • Synchronizers — Semaphore, CountDownLatch, CyclicBarrier, ReentrantLock
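A small sketch of two of these building blocks working together, a concurrent collection plus an atomic variable (names and counts are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SafeCounters {
    public static void main(String[] args) throws InterruptedException {
        Map<String, AtomicInteger> hits = new ConcurrentHashMap<>();

        Runnable task = () -> {
            for (int i = 0; i < 1_000; i++) {
                // computeIfAbsent is atomic; incrementAndGet is lock-free
                hits.computeIfAbsent("page", k -> new AtomicInteger())
                    .incrementAndGet();
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(hits.get("page")); // always 2000, no lost updates
    }
}
```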

Java Executor Service, Thread Pools

In the Core Thread API, a thread is created on demand and dies when its task completes, so it is not reusable. In the Java Concurrency API, thread creation and many other concerns are abstracted away behind the ExecutorService. It can tune the thread count and lifecycle, schedule task execution, and keep incoming tasks (Runnable, Callable) in a queue; a sketch follows the list below.

  • java.util.concurrent.ExecutorService — an async execution mechanism: give it a task, get back a Future
  • Future — represents the result of an asynchronous computation, created when the asynchronous task is submitted
  • Cached thread pool — keeps creating threads, starting from 0 up to a maximum of 2³¹-1. The idea is that a task should not wait (SynchronousQueue) for execution; the only limit is available system resources. Idle threads are removed after 1 minute. Good for short-lived tasks.
  • Fixed thread pool — keeps adding tasks to a queue (LinkedBlockingQueue) when all threads (a fixed number) are busy. Good for controlling resource consumption, stack size, and tasks with unpredictable execution times.
  • Scheduled thread pool — schedules commands to run after a given delay, or to execute periodically.
  • Single thread executor — used to execute tasks sequentially.
  • ForkJoinPool — the fork/join framework provides a parallel mechanism for CPU intensive tasks, using a work-stealing algorithm.
  • Customizable thread pools — for control over resource consumption, use ThreadPoolExecutor or ScheduledThreadPoolExecutor, which are extensible thread pool implementations with many parameters and hooks for fine-tuning.
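A minimal sketch of submitting Callables to a fixed thread pool and collecting Futures; the three tasks mirror the quotation-report subtasks from earlier, and the strings are placeholders:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3); // 3 reusable threads

        List<Callable<String>> tasks = List.of(
                () -> "T1: rates downloaded",
                () -> "T2: rates processed",
                () -> "T3: report generated");

        // invokeAll runs the tasks and returns a Future per task
        List<Future<String>> results = pool.invokeAll(tasks);
        for (Future<String> f : results) {
            System.out.println(f.get()); // get() blocks until the task is done
        }
        pool.shutdown(); // stop accepting new tasks, let queued ones finish
    }
}
```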

To dive deeper into the Java Concurrency API examples, run all the main methods under the 'concurrency.java.concurrent.api' package here

Enhanced Java Concurrency (new era)

When you need to wait for another process to complete without blocking, it can be useful to go asynchronous. This approach helps improve the usability and performance of applications. Since Java 5, the Future interface has represented the result of an asynchronous computation, but it lacks methods to combine computation steps or handle possible errors. Moreover, its get() method blocks.

Java keeps enriching its concurrency features: parallel streams (Java 8), and support for composing futures via CompletableFuture. Java 9 brought support for reactive programming, or distributed asynchronous programming, via the publish/subscribe protocol that forms the basis of the Flow API — SubmissionPublisher and Flow [Publisher, Subscriber, Subscription, and Processor]. The Java 11 HTTP/2 client also embraces these concurrency and reactive programming ideas.

CompletableFuture implements both the Future and CompletionStage interfaces and promotes non-blocking asynchronous programming models, where CompletionStage lets you attach callbacks that are executed on completion. CompletableFuture pipelines multiple async operations and merges them into a single async computation, with the help of more than 60 different methods for composing, combining, and executing asynchronous computation steps and handling errors.
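A minimal sketch of such a pipeline, applied to the quotation-report subtasks from earlier (the step bodies are placeholders):

```java
import java.util.concurrent.CompletableFuture;

public class QuotationReport {
    public static void main(String[] args) {
        CompletableFuture<String> report = CompletableFuture
                .supplyAsync(() -> "rates")                   // T1: download (async)
                .thenApply(rates -> rates + " -> processed")  // T2: callback on completion
                .thenApply(data -> data + " -> report")       // T3: generate report
                .exceptionally(ex -> "fallback report: " + ex.getMessage());

        // join() only here at the edge; the pipeline itself never blocked
        System.out.println(report.join());
    }
}
```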

The advent of multicore processors has helped speed up application performance. The idea behind CompletableFuture and reactive programming is to reach parallelism using non-blocking operations, and ForkJoinPool (in the fork/join framework) is designed for this purpose. It understands that tasks depend on other tasks, so it avoids blocking threads and avoids switching threads (context switches are expensive). It is an ideal candidate for writing asynchronous systems.
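To make the fork/join idea concrete, a minimal RecursiveTask sketch that splits a summation until the chunks are small enough to compute directly (class name and threshold are illustrative):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private final long from, to;
    SumTask(long from, long to) { this.from = from; this.to = to; }

    @Override
    protected Long compute() {
        if (to - from <= 10_000) {              // small enough: compute directly
            long sum = 0;
            for (long i = from; i <= to; i++) sum += i;
            return sum;
        }
        long mid = (from + to) / 2;             // otherwise fork into halves
        SumTask left = new SumTask(from, mid);
        SumTask right = new SumTask(mid + 1, to);
        left.fork();                            // run the left half asynchronously
        return right.compute() + left.join();   // compute right, then join left
    }

    public static void main(String[] args) {
        System.out.println(
                ForkJoinPool.commonPool().invoke(new SumTask(1, 1_000_000)));
    }
}
```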

To dive deeper into enhanced Java concurrency — CompletableFuture with Java 11 HTTP/2 client examples — run all the main methods under the 'concurrency.completablefuture' package here

Reactive, Event driven systems

Reactive programming is creating systems that are responsive to events; in other words, it is programming with asynchronous data streams. A reactive application is based on the asynchronous processing of one or more flows of events conveyed by reactive streams (responsive, resilient, elastic, and message-driven).

Monolithic applications working with big data suffer from slow response times, and offline maintenance is a real issue. Reactive programming addresses these issues by letting you process and combine streams of data items coming from different systems and sources asynchronously, via microservices.
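A minimal sketch of the Java 9 Flow API mentioned above: a SubmissionPublisher pushing items to a Subscriber that requests them one at a time (simple back-pressure). The names are illustrative, and the closing sleep merely keeps the demo alive for the asynchronous delivery:

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowDemo {
    public static void main(String[] args) throws InterruptedException {
        SubmissionPublisher<String> publisher = new SubmissionPublisher<>();

        publisher.subscribe(new Flow.Subscriber<String>() {
            private Flow.Subscription subscription;

            @Override public void onSubscribe(Flow.Subscription s) {
                subscription = s;
                subscription.request(1);         // back-pressure: ask for one item
            }
            @Override public void onNext(String item) {
                System.out.println("Received: " + item);
                subscription.request(1);         // ask for the next item
            }
            @Override public void onError(Throwable t) { t.printStackTrace(); }
            @Override public void onComplete() { System.out.println("Done"); }
        });

        publisher.submit("event-1");
        publisher.submit("event-2");
        publisher.close();                        // signals onComplete
        Thread.sleep(500);                        // let async delivery finish
    }
}
```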

Reactive libraries/frameworks: Akka (Lightbend), RxJava (Netflix), Vert.x (Red Hat), Reactor (Pivotal), Spring WebFlux Reactive REST API, Spring Cloud, Kafka, RabbitMQ

1. To dive deeper into enhanced Java concurrency — reactive streams with Flow API examples — run all the main methods under the 'concurrency.reactive' package here

2. To dive deeper into Spring Boot — @Async with CompletableFuture examples — run all the main methods under the 'concurrency.completablefuture.springboot.asyncmethod' package here

Node.js Concurrency — Event Loop

Node.js is a runtime environment (VM) based on Chrome's V8 JavaScript engine. The biggest advantage of Node.js is its non-blocking nature. In a Node.js application, concurrency is handled by an event dispatcher called the Event Loop (which is single threaded, so it has no race conditions or deadlock issues). Once a task is written in one of the asynchronous styles (callbacks, promises, or async/await), it runs asynchronously via the Event Loop. The Event Loop hands work to queues processed by the libuv thread pool, then picks events from the event queue and pushes their callbacks onto the call stack. libuv (an asynchronous I/O library) maintains a pool of worker threads on which the I/O tasks received from V8 are executed.

The default size of the thread pool is 4, the maximum size is 128, and it can be changed at Node.js startup time by setting the UV_THREADPOOL_SIZE environment variable. Node.js is good for I/O intensive tasks but not for CPU intensive tasks: the Event Loop is single threaded, so a CPU intensive operation blocks it and forces all other tasks to wait. Read the 'Concurrency Measures — Comparison1' section for a more detailed comparison on this concern.

To run the example code below, first install all the libraries in the nodejs-application via > npm ci

To dive deeper into Node.js sync and async examples, run all the scripts under 'multithreading-java-vs-nodejs\nodejs-app\src' here

> Sync: multithreading-java-vs-nodejs\nodejs-app\src>node validate-domain-sync.js

> Async: multithreading-java-vs-nodejs\nodejs-app\src>node validate-domain-async3.js

> Async Axios: multithreading-java-vs-nodejs\nodejs-app\src>node validate-domain-axios.js

> Async Got: multithreading-java-vs-nodejs\nodejs-app\src>node validate-domain-got.js

Multithreading in Node.js — Worker Threads

Any time consuming task (excluding I/O) is considered a CPU intensive operation, and it blocks the main thread in Node.js. But by passing heavy tasks to other threads, we can significantly increase the server's throughput. Until worker threads were introduced in 2018, Node.js relied on workaround solutions: spawning processes via the 'child_process' module, using the cluster module, or using third-party libraries like Napa.js.

new Worker(..) represents an independent JS execution thread. The 'worker_threads' module enables the use of threads that execute JS in parallel. Each worker thread owns its own instance of V8 and its own event loop, by way of a V8 isolate. A V8 isolate is an independent instance of the Chrome V8 runtime with its own JS heap and microtask queue; it allows each Node.js worker to run its JS code completely isolated from other workers. The drawback of this approach is that workers cannot directly access each other's heaps. Still, unlike a child process or a cluster, worker threads can share memory, and they can communicate via a MessageChannel.

Message channel between the parent and the child workers

To dive deeper into Node.js worker threads examples, run the related scripts under 'multithreading-java-vs-nodejs\nodejs-app\src' here

> multithreading-java-vs-nodejs\nodejs-app\src> node worker-parent.js

Using worker threads in a thread pool to compute the CPU intensive operation above:

> multithreading-java-vs-nodejs\nodejs-app\src> node worker-parent-pool.js

Concurrency Measures — Comparison1

To compare performance measures, I solved a simple task in Java and Node.js using synchronous and asynchronous methods, applying different solutions (java-solutions and nodejs-solutions). The task definition is quite straightforward: read a given file consisting of 10 URLs (lightweight), 100 URLs (moderate), or 500 URLs (heavyweight) and parse it for validation.

The table below contains elapsed-time results for the best, average, and worst cases from running the same solution multiple times (to account for the dependency on network, CPU, and drive speed). E.g. when the task was executed multiple times the Java async way (Java 11 HttpClient async) for 100 lines, the results were: best 7985 ms, average 9111 ms, worst 11413 ms.

From the above results, it is quite clear that for small tasks Node.js has the best performance for both synchronous and asynchronous methods: 250 ms and 819 ms respectively. For parsing and validating 100 URLs, Java's CompletableFuture (which uses ForkJoinPool internally) outperforms the Node.js asynchronous solution: 7985 ms < 10557 ms. When solving the heavy task of 500 URLs, Java's CompletableFuture with different Executors running in parallel achieves the best performance, 12146 ms, whereas Node.js took 11104 ms just to process 449 lines (51 lines fewer). As the number of URLs increases, Java's advantage grows.

The last column shows how the solutions behave when a CPU intensive operation is added: it is the result of the computation on 100 URLs with a heavySum operation added during the parsing of each URL.

With CPU intensive operations, Java provides the best performance, 79055 ms, using CompletableFuture with different Executors running in parallel. As described in the 'Node.js Concurrency — Event Loop' section above, and as this computation result confirms, Node.js performs poorly on CPU intensive tasks. For this kind of operation, a well-designed solution should use dedicated worker threads.

Concurrency Measures — Comparison2

The table below lists concurrency related factors for Java and Node.js. But there are other factors, besides concurrency, which can impact performance as well, including memory management, application design, data structures, algorithms, network communication, scalability, and hardware capability (CPU, RAM, and disk drive).

Concurrency Models

Now let's see which concurrency model (how the threads in a system collaborate with each other) is used by Java and Node.js. The different concurrency models listed below are quite similar to the different architectures used in distributed systems: in a concurrent system, threads communicate with each other, whereas in a distributed system, different processes communicate with each other (on the same or different servers).

  • Shared state vs. separate state — In a shared-state model, threads share some state (data) between them, so race conditions or deadlocks may happen. In a separate-state model, threads do not share state and communicate by exchanging immutable objects or sending copies, avoiding those concurrency issues. E.g. the Java Core Thread API can be used to build either model.
  • Parallel workers — Incoming tasks (jobs) are assigned to workers (threads). State is shared between threads, which may slow performance. The java.util.concurrent package is designed on this model, e.g. ExecutorService. In Node.js it is handled by worker threads.
  • Shared nothing (asynchronous, reactive, or event driven) — Each worker has its own duty, like in a factory. Workers are stateful and job ordering is possible with this model. Threads do not share state between them and are designed to use non-blocking I/O. Java implements this model via the Java 9 Flow API to support reactive programming; the asynchronous style is supported by Java 8's CompletableFuture, and in Node.js the Event Loop handles this activity.
  • Functional parallelism — Uses function calls as agents which are executed independently on the CPUs. Java implements it with fork/join and parallel streams, and libraries like Akka and RxJava build on similar ideas. The disadvantages are the learning curve and the extra care needed when splitting tasks for parallelism. (A parallel-streams sketch follows this list.)
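A minimal sketch of functional parallelism with parallel streams; the summation is an arbitrary example:

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        // Sequential: one thread walks the whole range
        long seq = LongStream.rangeClosed(1, 10_000_000).sum();

        // Parallel: the range is split into subtasks and summed on the
        // common ForkJoinPool, roughly one worker per available core
        long par = LongStream.rangeClosed(1, 10_000_000).parallel().sum();

        System.out.println(seq == par); // same result, different execution model
    }
}
```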

Summary

This article tackled the topic of multithreading in Java vs Node.js, beginning by formulating a slow task. The following sections described the concurrency features and capabilities of both Java and Node.js, and examined practical scenarios and examples that employ threads to improve performance. This allowed us to compare concurrency related factors and concurrency models more concisely. Finally, the comparison matrix (generated by executing the same task with different solutions) showing best|average|worst elapsed-time results gave a better understanding of how to achieve better performance when solving slow operations, and which concurrency factors should be taken into consideration.


Source-code repository: https://github.com/asatklichov/multithreading-java-vs-nodejs
