Java Concurrency in Practice: Reading Notes [1]
Before we talk about good practices, we should ask ourselves: why do we need Java concurrency? Does the problem at hand really need to be solved with concurrency?
As a developer, I have written some concurrent code, and my lesson is this: before burying your head in the concurrency details, first weigh the strengths and drawbacks of using threads.
What does thread concurrency bring us?
I would say that almost any engineering problem can be solved serially; it is just too slow in some cases. For example, if you are implementing an Android app, you do not want compute-intensive methods to run on the UI thread, because that hurts UI responsiveness. Similarly, if 100 million independent files need to be processed, an obvious solution is to process them in parallel. These performance gains come from the multi-core CPUs that threads can exploit, and from the shared data space among threads.
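As a rough illustration (my own sketch, not from the book), the parallel file processing could look something like the following; processFile is a hypothetical stand-in for the real per-file work.

import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFileProcessor {
    // Process independent files in parallel, one task per file.
    public static void processAll(List<Path> files) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (Path file : files) {
            pool.submit(() -> processFile(file)); // hypothetical per-file work
        }
        pool.shutdown();                          // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for submitted tasks to finish
    }

    private static void processFile(Path file) {
        // placeholder for the compute-intensive work on one file
    }
}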
What are the drawbacks of threads?
Just as light casts shadow, the strength of threads comes with side effects. First, threads take advantage of multiple CPUs, but the cost of context switching across threads can be huge. I once worked on a project with an independent task like this:
public void independentTask(Object obj1, Object obj2) {
    if (!precondition(obj1, obj2)) {
        return; // most inputs fail the precondition and return almost immediately
    }
    longCompute(obj1, obj2);
}
This task had to be run one million times. My initial implementation was to create a thread pool and submit each task to it. However, the performance was even worse than serial execution, where the tasks simply run in a loop. After some investigation I found that most inputs (obj1, obj2) could not satisfy the precondition, so the task returned early. That means the threads in the pool had to context-switch frequently, and the time spent submitting new tasks and switching between threads exceeded the time spent actually running the tasks, so performance got worse.
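The comparison below is a reconstruction from memory rather than the original project code; the Runnable stands in for a call to independentTask with some inputs, and the task count is illustrative. It contrasts the pool-based version with the plain serial loop.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SubmissionOverheadDemo {
    static final int TASK_COUNT = 1_000_000;

    // Pool-based version: per-task submission and context-switch overhead
    // dominates when most tasks return early.
    static void runWithPool(Runnable task) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < TASK_COUNT; i++) {
            pool.submit(task);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    // Serial version: no submission or switching cost at all.
    static void runSerially(Runnable task) {
        for (int i = 0; i < TASK_COUNT; i++) {
            task.run();
        }
    }
}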
A takeaway from that project is that not all independent tasks need to run concurrently. Tasks that return quickly can be very fast in a plain for loop, without any threads at all.
How do you determine whether a task needs concurrency? Measurement and a priori knowledge. You can add simple instrumentation to the code, such as periodically probing the number of active threads and queued tasks in the pool, to get a rough picture of performance, or you can use an open source or commercial tool to monitor CPU and memory usage. Measurement is the source of confidence and the best way to benchmark performance. The other option is a priori knowledge: in the project mentioned above, if I had known in advance that most tasks could not satisfy the precondition and would return early, I would not have bothered optimizing with a thread pool at all.
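As one example of that kind of in-code probing (my own sketch, with the sampling interval chosen arbitrarily), ThreadPoolExecutor already exposes counters you can sample on a schedule:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolProbe {
    // Periodically log how busy the pool is: active threads, queued tasks,
    // and completed tasks give a rough sense of whether the threads help.
    public static ScheduledExecutorService startProbing(ThreadPoolExecutor pool) {
        ScheduledExecutorService probe = Executors.newSingleThreadScheduledExecutor();
        probe.scheduleAtFixedRate(() -> System.out.printf(
                "active=%d queued=%d completed=%d%n",
                pool.getActiveCount(),
                pool.getQueue().size(),
                pool.getCompletedTaskCount()),
                0, 1, TimeUnit.SECONDS);
        return probe; // caller shuts this down when measurement is over
    }
}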
Second, the shared data space increases the overhead in development and QA. A classic example is below:
class A {
    int val = 0;

    public void incrementVal() {
        val++; // read-modify-write: not atomic
    }
}
If multiple threads share an instance of class A, there is a high risk that val will be mutated by several threads at the same time and increments will be lost, which can cause weird bugs. Unit tests and manual QA rarely catch this kind of bug because it happens randomly, depending on accidental timing.
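To make that concrete, here is a small demonstration I added (assuming class A from above is in the same package): two threads each increment the shared counter 100,000 times, and the printed value typically comes out below 200,000 because increments are lost.

public class RaceDemo {
    public static void main(String[] args) throws InterruptedException {
        A shared = new A();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                shared.incrementVal(); // unsynchronized read-modify-write
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 200000, but the result is usually smaller due to lost updates.
        System.out.println("val = " + shared.val);
    }
}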
There are many ways in Java to manage shared data access in concurrent scenarios, and I will cover them in later posts.