Making sense of Multitouch with RxJava

Mahendra Chouhan
Published in MindOrks
8 min read · Aug 16, 2018

Most software today deals with data that’s available only over time: websites load remote resources and respond to complex user interactions, servers are distributed across multiple physical locations, and people have mobile devices that they expect to work at all times, whether on high-speed Wi-Fi or spotty cellular networks. Any serious application involves many moving asynchronous parts that need to be efficiently coordinated, and it is very hard to combine these moving pieces together using multithreading/callbacks.

Reactive programming offers an alternative paradigm for handling these asynchronous pieces.

From Wikipedia:

“Reactive programming is a programming paradigm oriented around data flows and the propagation of change. This means that it should be possible to express static or dynamic data flows with ease in the programming languages used, and that the underlying execution model will automatically propagate changes through the data flow.”

RxJava is a Java implementation of ReactiveX (Reactive Extensions): a library for composing asynchronous and event-based programs by using observable sequences.

This post assumes you already know Android and Rx and are looking for some less obvious use cases than making asynchronous requests to a backend server.

The Problem:

If you have made any mobile game, you already know how important it is to handle touch input properly. As an example, let’s look at the UI of one of Android’s most popular games, PUBG.

Notice how many things are happening asynchronously: dozens of players are moving around the arena, shooting at other players, and jumping or crouching to dodge bullets. Each player can see the status of their team members, reload or switch weapons, and throw grenades at other players at any time.
And this is just scratching the surface; many more actions that we are not aware of happen inside the system to let millions of players play the game in real time.

For simplicity, we will focus only on touch input for now. To be more specific, I want to allow my player to perform only these actions in the game:

Jump on Swipe Up
Shoot on Double Tap
Move on Finger Movement
Crouch on Swipe Down

What do all these actions have in common?

They all require user input and are asynchronous, i.e. they do not depend on each other, and more than one action can be in progress at any point in time.

For example: I can jump while moving and keep shooting in the air.

As you can see, it’s a complex problem to solve, especially because a single input source (the screen) generates all of these actions. That means we need to map one input source to different user actions, and since each action can be in multiple states, we need to keep track of the state of all these actions as well.

We need a way to segregate the data coming from touch input so that we can detect these actions concurrently and in real time.

Start Simple!

Let’s say you have one rectangle on screen, Rect A, and you want to display it at the user’s touch location.

A basic implementation will look like:

Rect A = new Rect();
public void onTouch(MotionEvent ev) {
    // Rect.offsetTo takes ints, so cast the float touch coordinates
    A.offsetTo((int) ev.getX(), (int) ev.getY());
}

Now, let’s say you have two rectangles, A and B, and you only want to move the one that is currently under your finger. A simple approach requires us to store state outside our function to keep track of the currently selected item. Something like:

Rect A, B;
Rect currentSelection;

public void onTouch(MotionEvent ev) {

    int x = (int) ev.getX();
    int y = (int) ev.getY();
    switch (ev.getAction()) {
        case ACTION_DOWN:
            if (A.contains(x, y)) currentSelection = A;
            else if (B.contains(x, y)) currentSelection = B;
            break;
        case ACTION_MOVE:
            if (currentSelection != null)
                currentSelection.offsetTo(x, y);
            break;
        case ACTION_UP:
            currentSelection = null;
            break;
    }
}

What if we want to move multiple objects independently?
Extending the previous method, we end up with:

ArrayList<Rect> rectArrayList;
HashMap<Integer, Rect> selections;

// Unfortunately, this code will not work as expected:
// an ACTION_MOVE event does not tell us which specific
// pointer moved, so we would also need to store the last
// touched location of each pointer to figure that out.
public void onTouch(MotionEvent ev) {

    int x = (int) ev.getX();
    int y = (int) ev.getY();
    int index = ev.getActionIndex();
    int id = ev.getPointerId(index);
    switch (ev.getAction()) {
        case ACTION_DOWN:
            for (Rect rectangle : rectArrayList)
                if (rectangle.contains(x, y))
                    selections.put(id, rectangle);
            break;
        case ACTION_MOVE:
            selections.forEach((key, rect) -> {
                if (index == key)
                    rect.offsetTo(x, y);
            });
            break;
        case ACTION_UP:
            if (selections.get(id) != null)
                selections.remove(id);
            break;
    }
}

See this if you want to know more about multi-touch handling in Android.
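The comment in the snippet above hints at the missing piece: remembering the last known location of every pointer and comparing it against each new snapshot. A minimal plain-Java sketch of that bookkeeping, with no Android dependency, might look like the following. `PointerTracker` and its `id -> {x, y}` snapshot format are hypothetical stand-ins for what you would read from `MotionEvent.getPointerId(i)`, `getX(i)` and `getY(i)` in a real handler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Android SDK or the original repo.
final class PointerTracker {
    // Last known {x, y} for every pointer id currently on screen.
    private final Map<Integer, int[]> lastPositions = new HashMap<>();

    // Call when a pointer goes down.
    void down(int pointerId, int x, int y) {
        lastPositions.put(pointerId, new int[] {x, y});
    }

    // Call when a pointer goes up.
    void up(int pointerId) {
        lastPositions.remove(pointerId);
    }

    // A move event carries a snapshot of all pointers: id -> {x, y}.
    // Returns the ids whose position differs from the stored one
    // and updates the stored positions.
    List<Integer> move(Map<Integer, int[]> snapshot) {
        List<Integer> moved = new ArrayList<>();
        for (Map.Entry<Integer, int[]> entry : snapshot.entrySet()) {
            int[] previous = lastPositions.get(entry.getKey());
            int[] current = entry.getValue();
            if (previous == null) continue; // unknown pointer, ignore it
            if (previous[0] != current[0] || previous[1] != current[1]) {
                moved.add(entry.getKey());
                lastPositions.put(entry.getKey(), new int[] {current[0], current[1]});
            }
        }
        return moved;
    }
}
```

Note that this works, but it is one more map of mutable state living outside the touch handler.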

Notice that we are now storing a collection of states, one per finger, and by keeping that state outside the function we have introduced a side effect, something we should try to minimize.
Let’s say I want to reset the object’s position to its initial state when the gesture ends. Then I will again be storing one more piece of state outside the function to remember each pointer’s initial touch location.
You can see how quickly our class gets cluttered with variables whose scope should be limited to particular functions, but which are exposed to code that has nothing to do with that state. In other words, we should minimize global state in order to keep our concerns separated.

You might ask what the problem is: if I am not using that state in my own scope, I don’t need to worry about it, right?
No. Even if you don’t use it, it is very likely that some function you call depends on that state. As your program gets bigger and more complex, and as more people start working on it, it becomes harder to keep track of your objects’ state. Every new piece of state you introduce puts additional strain on developers, who now need to keep track of ‘one more thing’ while coding, even when it is unrelated to the use case they are trying to solve.

Coming back to the previous example, here is another way to do it:

// check out the git repo to see how this stream is created from touch input
pointers$ = touch$.groupBy(event -> event.pointer);

touchEvent$ = pointers$
    .flatMap(pointer$ -> {
        down$ = pointer$
            .filter(event -> event.type == DOWN);
        up$ = pointer$
            .filter(event -> event.type == UP);
        return pointer$
            .distinctUntilChanged()
            .window(down$, (item) -> up$);
    });
touchEvent$.subscribe(this::processGestures);

private void processGestures(Observable<TouchInput> touchInput$) {

    touchInput$
        .subscribe(new Observer<TouchInput>() {
            // state is scoped to this one gesture
            Point initial = new Point();
            Rect rectangle;

            @Override
            public void onSubscribe(Disposable d) {
                Log.i(TAG, "onStart[Move]");
            }

            @Override
            public void onNext(TouchInput event) {

                int x = (int) event.x;
                int y = (int) event.y;

                switch (event.type) {
                    case DOWN:
                        for (Rect rect : objList) {
                            if (rect.contains(x, y)) {
                                rectangle = rect;
                                initial.set(
                                    rectangle.left,
                                    rectangle.top);
                            }
                        }
                        break;
                    case MOVE:
                        if (rectangle != null)
                            rectangle.offsetTo(x, y);
                        break;
                }
            }

            // Observer requires onError to be implemented as well
            @Override
            public void onError(Throwable e) {
                Log.e(TAG, "Gesture stream failed", e);
            }

            @Override
            public void onComplete() {
                // gesture finished: move the rectangle back
                if (rectangle != null)
                    rectangle.offsetTo(initial.x, initial.y);
            }
        });
}

The code might look daunting to someone not used to Rx, so we will go step by step:

pointers$ = touch$.groupBy(event -> event.pointer)
I am applying the groupBy operator to break the stream into multiple sub-streams based on the pointer id, which means each sub-stream will emit events belonging to one finger only.
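To make the effect of groupBy concrete without pulling in Rx, here is a plain-Java sketch that partitions an already recorded list of events by pointer id. It is only a batch analogue: the real operator emits each sub-stream lazily as events arrive. The nested `Event` class is a hypothetical stand-in for the post's `TouchInput`, with just the fields needed here, and `groupByPointer` is a made-up name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch analogue of touch$.groupBy(event -> event.pointer).
final class GroupByPointerSketch {
    // Stand-in for TouchInput; only the fields used in this sketch.
    static final class Event {
        final int pointer;
        final String type; // "DOWN", "MOVE" or "UP"

        Event(int pointer, String type) {
            this.pointer = pointer;
            this.type = type;
        }
    }

    // Splits one interleaved event list into one list per pointer id,
    // preserving the order of events within each pointer.
    static Map<Integer, List<Event>> groupByPointer(List<Event> events) {
        Map<Integer, List<Event>> groups = new LinkedHashMap<>();
        for (Event event : events) {
            groups.computeIfAbsent(event.pointer, id -> new ArrayList<>()).add(event);
        }
        return groups;
    }
}
```

Feeding it the interleaved events of two fingers yields two lists, one per finger, each still in DOWN, MOVE, UP order.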

.flatMap(pointer$ -> {
    down$ = pointer$
        .filter(event -> event.type == DOWN);
    up$ = pointer$
        .filter(event -> event.type == UP);
    return pointer$
        .distinctUntilChanged()
        .window(down$, (item) -> up$);
})

What we want is to deal with each user action (click, double tap, swipe) separately. Notice that each user action starts with a DOWN event and ends with an UP event. That’s why I use the window operator to split each sub-stream further into sequential streams, each of which starts with a DOWN event and ends with an UP event. distinctUntilChanged() emits an item only if it differs from the previous one, i.e. only if the location of the pointer changes, which avoids firing repetitive events.
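As a rough illustration of what this step produces, the plain-Java sketch below walks over the recorded events of a single pointer, drops consecutive duplicates, and cuts the rest into DOWN-to-UP gestures. It is a batch approximation under my own assumptions, not how Rx implements window: the `Event` class, its `sameAs` comparison and `splitIntoGestures` are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batch analogue of
// distinctUntilChanged().window(down$, item -> up$) for one pointer.
final class GestureWindowSketch {
    static final class Event {
        final String type; // "DOWN", "MOVE" or "UP"
        final int x;
        final int y;

        Event(String type, int x, int y) {
            this.type = type;
            this.x = x;
            this.y = y;
        }

        // Assumed notion of equality: same type at the same location.
        boolean sameAs(Event other) {
            return type.equals(other.type) && x == other.x && y == other.y;
        }
    }

    static List<List<Event>> splitIntoGestures(List<Event> events) {
        List<List<Event>> gestures = new ArrayList<>();
        List<Event> current = null; // null while no finger is down
        Event previous = null;
        for (Event event : events) {
            // distinctUntilChanged: skip an event equal to the one before it
            if (previous != null && event.sameAs(previous)) continue;
            previous = event;
            // a DOWN opens a new window (an unfinished one is discarded)
            if (event.type.equals("DOWN")) current = new ArrayList<>();
            if (current == null) continue; // event outside any window
            current.add(event);
            // an UP closes the window and completes the gesture
            if (event.type.equals("UP")) {
                gestures.add(current);
                current = null;
            }
        }
        return gestures;
    }
}
```

Each inner list plays the role of one windowed observable: it begins with DOWN, ends with UP, and its end corresponds to onComplete.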

Then finally we subscribe to the stream and process each action individually.

A few key points to note:

  • Each new user action produces a separate stream/observable.
  • There is no global state for keeping track of selections; each piece of state is encapsulated and exposed to the relevant functions only.
  • The order of events is also guaranteed: we will always receive events in SUBSCRIBE → DOWN → MOVE → UP → COMPLETE order.

Notice that we ended up writing a lot more code to achieve similar results. Wasn’t the whole idea to reduce complexity?
Let’s increase the complexity even more and see what happens. Remember our problem statement?

Jump on Swipe Up
Shoot on Double Tap
Move on Finger Movement
Crouch on Swipe Down

Let’s look at double tap logic:

pointers$
    .flatMap(pointer$ ->
        pointer$.filter(event -> event.type == DOWN)
            .buffer(800, TimeUnit.MILLISECONDS)
            .filter(items -> items.size() == 2))
    .subscribe(list -> {
        // the second DOWN event is the one that completes the double tap
        TouchInput touchInput = list.get(1);
        Log.i(TAG, "Double tap at " + touchInput.toString());
    });

To detect a double tap, we need to check whether two consecutive DOWN events occurred within a short span of time. The first filter keeps only DOWN events, and buffer bundles the events that occurred within a defined time window into a single list and emits that list when the window expires. filter(items -> items.size() == 2) then keeps only double taps and rejects everything else.
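The bucketing behaviour is easy to see in a plain-Java sketch that works on recorded timestamps instead of a live stream. It assumes the timestamps are milliseconds since subscription and that the windows are fixed and back to back, which is how buffer with a single time span behaves. `DoubleTapSketch` and `detectDoubleTaps` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch analogue of
// buffer(800, MILLISECONDS).filter(items -> items.size() == 2).
final class DoubleTapSketch {
    static final long WINDOW_MS = 800;

    // downTimes: timestamps (ms since subscription) of one pointer's DOWN events.
    // Returns the timestamp of the second tap of every detected double tap.
    static List<Long> detectDoubleTaps(List<Long> downTimes) {
        // Put each timestamp into its fixed, non-overlapping window.
        Map<Long, List<Long>> buckets = new TreeMap<>();
        for (long time : downTimes) {
            buckets.computeIfAbsent(time / WINDOW_MS, k -> new ArrayList<>()).add(time);
        }
        List<Long> doubleTaps = new ArrayList<>();
        for (List<Long> bucket : buckets.values()) {
            // exactly two DOWN events in one window count as a double tap
            if (bucket.size() == 2) doubleTaps.add(bucket.get(1));
        }
        return doubleTaps;
    }
}
```

One consequence of fixed windows is worth knowing: two taps that straddle a window boundary, say at 700 ms and 900 ms, land in different buffers and are not reported as a double tap, even though they are only 200 ms apart.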

Swipe detection:

touchEvent$
    .flatMapMaybe(touchInput$ ->
        Maybe.zip(
            touchInput$.lastElement(),
            touchInput$.firstElement(),
            (last, first) -> {
                float ydiff = last.y - first.y;
                float delta = last.time - first.time;
                float velocity = ydiff / delta;
                // reject if the gesture took more than
                // 1 second to finish
                if (delta > 1000) return 0f;
                return velocity;
            })
    )
    .subscribe(velocity -> {
        if (velocity > 1.5f)
            Log.i(TAG, "SWIPE_DOWN");
        else if (velocity < -1.5f)
            Log.i(TAG, "SWIPE_UP");
    });

For simplicity, we classify a gesture as a swipe up or swipe down based on the following criteria:

  • The gesture should finish within 1 second
  • The pointer velocity should exceed a specified threshold

The firstElement and lastElement operators emit the first and last items from the stream respectively. In our case, that means the DOWN and UP events. Velocity is calculated as distance/time, and the action is fired if the velocity exceeds the cutoff value.
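Pulled out of the Rx pipeline, the classification rule is a small pure function. The sketch below restates it in plain Java so the thresholds can be tried in isolation. `SwipeClassifier` and `classify` are hypothetical names, the units (pixels and milliseconds) are my assumption, and the guard against a zero duration is an addition that the snippet above does not have.

```java
// Hypothetical stand-alone version of the swipe rule described above.
final class SwipeClassifier {
    static final float THRESHOLD = 1.5f;      // assumed unit: pixels per millisecond
    static final long MAX_DURATION_MS = 1000; // gesture must finish within 1 second

    // Screen y grows downwards, so a positive velocity means the finger moved down.
    static String classify(float firstY, long firstTime, float lastY, long lastTime) {
        long duration = lastTime - firstTime;
        // reject gestures that are too slow, and avoid dividing by zero
        if (duration <= 0 || duration > MAX_DURATION_MS) return "NONE";
        float velocity = (lastY - firstY) / duration;
        if (velocity > THRESHOLD) return "SWIPE_DOWN";
        if (velocity < -THRESHOLD) return "SWIPE_UP";
        return "NONE";
    }
}
```

For instance, a finger that travels 400 pixels downwards in 200 ms has a velocity of 2 pixels per millisecond and is classified as SWIPE_DOWN, while the same distance covered in 1.5 seconds is rejected.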

That’s it!
By treating our input events as a stream, we had to write more code up front to set up the pipeline. But once we were able to segregate our input into separate sub-streams, every new piece of functionality became easy to reason about, and we ended up minimizing our global state as well!

In conclusion,

If you think of your input as a data source pushing values into your system over time, as explained by Erik Meijer in one of his papers, then Rx becomes a provider of tools for querying this data source.
In other words, the operators we used in the examples above to transform our stream can be thought of as the SQL commands we use to build a query, and subscribe can be thought of as a way of executing the query, the only difference being that the results of the query arrive over time.

You can check out the code here.

Mahendra Chouhan
MindOrks

A Software Design Enthusiast working as Senior Software Engineer @Atlassian. Past : Google, Moonfrog Labs, Ola Cabs, Appdynamics