# Zooming at the Mouse Coordinates with Affine Transformations

Mar 21 · 6 min read

Providing zoom functionality in a graphical application is as simple as applying a scale matrix. If, however, the scale operation should occur at a specific point (about the user’s mouse location or about the center of the scene, for example), then some linear algebra and a handful of affine transformations can get the job done. In this article I’ll go over an algorithm for zooming at a point, with some graphics for visual aid. There’s also some example JavaScript code that shows how to zoom in a 2D scene rendered in a canvas. The same algorithm works in 3D, but I’ve used a 2D scene in the example code for brevity and simplicity.

You can see an animated GIF of the code in action at the top of this article. To follow along, you should have the following in your tool belt.

1. A basic understanding of linear algebra, including vectors, matrix multiplication, and translation and scale matrices.
2. The ability to read JavaScript code. The zoom algorithm is generic — it will work with OpenGL, WebGL, DirectX, canvas, SVG, or whatever you use to render your scenes — but the demo code is written in JavaScript.

## The Algorithm

Given a point `P` (for example, the coordinates of the mouse), zooming about that point using affine transformations is a four-step process.

1. Apply any existing world-/scene-wide transformation(s). An existing world transformation might be a previous scale or pan operation, a skew, a flip, or any transformation operation that applies to every object in the world. Initially, the world transformation is generally the identity matrix.
2. Translate the world such that `P` is at the origin.
3. Scale the world.
4. Translate the world back such that `P` is at its initial location.
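The four steps map directly onto code. Below is a minimal sketch in plain JavaScript rather than glMatrix, so it runs on its own; the helper names (`multiply`, `translation`, `scaling`, `applyToPoint`, `zoomAt`) are hypothetical. Matrices are stored as six-element arrays `[a, b, c, d, e, f]`, the layout used by `CanvasRenderingContext2D#transform(a, b, c, d, e, f)` and by glMatrix’s `mat2d`.

```javascript
// 2D affine matrices as [a, b, c, d, e, f], i.e. the 3x3 matrix
//   | a c e |
//   | b d f |
//   | 0 0 1 |
const identity = () => [1, 0, 0, 1, 0, 0];
const translation = (tx, ty) => [1, 0, 0, 1, tx, ty];
const scaling = (sx, sy) => [sx, 0, 0, sy, 0, 0];

// Returns m × n, so n is applied to a point first, then m.
function multiply(m, n) {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2,
    b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2,
    b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1,
    b1 * e2 + d1 * f2 + f1,
  ];
}

// Maps the point [x, y] through m.
function applyToPoint(m, [x, y]) {
  const [a, b, c, d, e, f] = m;
  return [a * x + c * y + e, b * x + d * y + f];
}

// Zoom by factor s about point [px, py]. Read the list bottom to top:
// 1. existing transform T, 2. move P to the origin, 3. scale, 4. move P back.
function zoomAt(T, [px, py], s) {
  return [
    translation(px, py),   // step 4
    scaling(s, s),         // step 3
    translation(-px, -py), // step 2
    T,                     // step 1
  ].reduce(multiply);
}

const T = zoomAt(identity(), [40, 40], 1.25);
console.log(T);                         // [1.25, 0, 0, 1.25, -10, -10]
console.log(applyToPoint(T, [40, 40])); // [40, 40]: P does not move
```

Because `reduce(multiply)` multiplies left to right and matrix multiplication is associative, the result is the single matrix translate × scale × translate × T, computed once per zoom.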

Below are some images that graphically show the process. Let’s say the world is a square 100 units (pixels/fathoms/parsecs/whatever) on each side, and the user wants to zoom in at point `P={40,40}` by a factor of 1.25, making the world appear 25% larger. The images below use the browser convention where the origin `(0,0)` is at the top-left corner, but that makes no difference mathematically. The 100x100 square represents the world, and the circle at `(40,40)` represents point `P`.

Here’s the initial scene with no transformation. (More specifically, the world with an initial identity matrix as its transformation.)

The world is translated by `{-40,-40}` so that `P` is at the origin.

A scale is applied such that the world is 125x125 in screen coordinates. Since a scale operation is about the origin, the point `P` remains at the origin.

Lastly, the world is translated back by `{40,40}`, restoring point `P` to its original location.

The world transformation matrix `T` is now the following product:

`T = translate(40, 40) * scale(1.25, 1.25) * translate(-40, -40)`
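Multiplying the product out is a useful sanity check. In homogeneous coordinates with column vectors (so the rightmost matrix acts first), the three matrices collapse into one:

$$
T =
\begin{bmatrix} 1 & 0 & 40 \\ 0 & 1 & 40 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1.25 & 0 & 0 \\ 0 & 1.25 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -40 \\ 0 & 1 & -40 \\ 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1.25 & 0 & -10 \\ 0 & 1.25 & -10 \\ 0 & 0 & 1 \end{bmatrix}
$$

$$
T\begin{bmatrix}40\\40\\1\end{bmatrix} = \begin{bmatrix}40\\40\\1\end{bmatrix},\qquad
T\begin{bmatrix}0\\0\\1\end{bmatrix} = \begin{bmatrix}-10\\-10\\1\end{bmatrix},\qquad
T\begin{bmatrix}100\\100\\1\end{bmatrix} = \begin{bmatrix}115\\115\\1\end{bmatrix}
$$

`P` stays where it was, and the world now spans from -10 to 115, i.e. 125 units on each side.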

Keep in mind that matrix multiplication is not commutative, and the transformations take effect from right to left. That is, the last matrix (`translate(-40, -40)`) is applied first. If the user zooms a second time, whether at the same point or a different one, the existing world transformation matrix `T` must be preserved by multiplying it in on the right:

`T = translate(40, 40) * scale(1.25, 1.25) * translate(-40, -40) * T`
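Both points (order matters, and each new zoom is multiplied onto the existing `T`) are easy to check numerically. This is a self-contained sketch with hypothetical helpers in the same six-element `[a, b, c, d, e, f]` layout as canvas’s `transform()`; it is not code from the demo.

```javascript
const translation = (tx, ty) => [1, 0, 0, 1, tx, ty];
const scaling = (sx, sy) => [sx, 0, 0, sy, 0, 0];

// m × n: n is applied to a point first.
function multiply(m, n) {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1,
  ];
}
const applyToPoint = (m, [x, y]) => [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];

// Not commutative: the same two matrices in a different order.
const scaleFirst = multiply(translation(40, 40), scaling(1.25, 1.25));
const translateFirst = multiply(scaling(1.25, 1.25), translation(40, 40));
console.log(applyToPoint(scaleFirst, [0, 0]));     // [40, 40]
console.log(applyToPoint(translateFirst, [0, 0])); // [50, 50]

// One zoom step about P, composed onto the existing world transform T.
const zoomStep = (T, [px, py], s) =>
  [translation(px, py), scaling(s, s), translation(-px, -py), T].reduce(multiply);

let T = [1, 0, 0, 1, 0, 0];      // initial identity
T = zoomStep(T, [40, 40], 1.25); // first zoom
T = zoomStep(T, [40, 40], 1.25); // second zoom builds on the first
console.log(T);                         // [1.5625, 0, 0, 1.5625, -22.5, -22.5]
console.log(applyToPoint(T, [40, 40])); // [40, 40]
```

Two 1.25x zooms at the same point compose into a single 1.5625x zoom about that point, which is what the user expects to see.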

## Example Code

Take a look at the following plunk which shows the algorithm in action.

The code is also available on GitHub in case you want to run it on your own machine. There’s one NPM dependency, glMatrix, which is used for vector and matrix operations. I’ll go over the code briefly, but let me know in the comments section if more detail or clarity is needed.

First, the world is made up of four rectangles. The `Rectangle` class simply encapsulates a color and a transformation matrix. There are two methods to size and position the rectangle, `setSize` and `setLocation`, respectively. Again, matrix multiplication is not commutative, so these operations have to be applied in order: scale, then translate. The code responsible for drawing the rectangles is `RectangleRenderer`. Its `render` method applies the rectangle’s transformation matrix, then renders it as a 1x1 square centered at the origin. (Rendering at the origin is common practice, especially in more complex scenes where models are made using a tool like Blender or Maya. Models are created in their own local “model” coordinates and then transformed to “world” coordinates.)
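A rough sketch of what such a `Rectangle` and `RectangleRenderer` could look like is below. It is not the demo’s source: it skips glMatrix, stores matrices as `[a, b, c, d, e, f]` arrays, and keeps size and location as fields so the model matrix can always be rebuilt as translate × scale (scale applied first). The renderer uses only standard canvas calls (`save`, `transform`, `fillStyle`, `fillRect`, `restore`).

```javascript
// m × n for [a, b, c, d, e, f] affine matrices (n is applied first).
function multiply(m, n) {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1,
  ];
}

class Rectangle {
  constructor(color) {
    this.color = color;
    this.width = 1;
    this.height = 1;
    this.x = 0;
    this.y = 0;
  }
  setSize(width, height) { this.width = width; this.height = height; }
  setLocation(x, y) { this.x = x; this.y = y; }
  // Model matrix: scale the unit square first, then translate it into place.
  get matrix() {
    const scale = [this.width, 0, 0, this.height, 0, 0];
    const translate = [1, 0, 0, 1, this.x, this.y];
    return multiply(translate, scale);
  }
}

class RectangleRenderer {
  render(ctx, rect) {
    ctx.save();
    ctx.transform(...rect.matrix);  // multiplied onto whatever is already on the context
    ctx.fillStyle = rect.color;
    ctx.fillRect(-0.5, -0.5, 1, 1); // 1x1 square centered at the origin (model coordinates)
    ctx.restore();
  }
}
```

In a browser this would be called as `new RectangleRenderer().render(canvas.getContext("2d"), rect)`; `save`/`restore` keep one rectangle’s matrix from leaking into the next.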

Rectangles are added to the `World`, which is just a collection of renderable objects (rectangles, in this case). The world has its own transformation matrix and a `zoom` method that implements the algorithm described above. The world is rendered by the `WorldRenderer` class, which applies the world’s transformation matrix (in this case, the result of any zoom operations) to the rendering context, then renders each object in the world.
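Here is one possible shape for a `World` with a `zoom` method and a matching `WorldRenderer`, again as a hedged sketch rather than the demo’s code. The `renderObject` callback parameter is an assumption that lets the renderer stay agnostic about what kind of objects the world holds; the zoom is the four-step composition from the algorithm section.

```javascript
// m × n for [a, b, c, d, e, f] affine matrices (n is applied first).
function multiply(m, n) {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1,
  ];
}
const translation = (tx, ty) => [1, 0, 0, 1, tx, ty];
const scaling = (sx, sy) => [sx, 0, 0, sy, 0, 0];

class World {
  constructor() {
    this.objects = [];
    this.matrix = [1, 0, 0, 1, 0, 0]; // identity until the user zooms
  }
  add(object) { this.objects.push(object); }
  // Zoom by `factor` about `point` ([x, y] in canvas coordinates).
  zoom([px, py], factor) {
    this.matrix = [
      translation(px, py),
      scaling(factor, factor),
      translation(-px, -py),
      this.matrix, // keep every earlier zoom
    ].reduce(multiply);
  }
}

class WorldRenderer {
  // renderObject(ctx, object) draws one object in its own model coordinates.
  render(ctx, world, renderObject) {
    ctx.save();
    ctx.setTransform(1, 0, 0, 1, 0, 0); // start from identity
    ctx.transform(...world.matrix);     // world transform goes on first
    for (const object of world.objects) renderObject(ctx, object);
    ctx.restore();
  }
}
```

Clearing the canvas before each frame is left to the caller here.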

Note that the `CanvasRenderingContext2D#transform` method multiplies the current transformation matrix by the matrix supplied as an argument, with the argument on the right. So the world transformation is applied to the context first, and each rectangle’s matrix is then multiplied onto it, giving world × rectangle. Because the rightmost matrix acts first, each rectangle’s own transformation is applied to its vertices before the world transformation. Depending on the rendering context, that multiplication may need to be done manually.
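When a rendering API has no `transform()`-style matrix stack (for example, when vertices are transformed on the CPU before upload), the composition can be done by hand. In this illustrative sketch, `world` is the matrix `T` from the zoom at `(40,40)`, and `model` is a hypothetical rectangle scaled to 20x20 and moved to `(50,50)`. The combined matrix has to be world × model so that the model transform reaches each vertex first.

```javascript
// m × n for [a, b, c, d, e, f] affine matrices (n is applied first).
function multiply(m, n) {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1,
  ];
}
const applyToPoint = (m, [x, y]) => [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];

const world = [1.25, 0, 0, 1.25, -10, -10]; // zoom by 1.25 about (40, 40)
const model = [20, 0, 0, 20, 50, 50];       // scale to 20x20, then move to (50, 50)

// Same product canvas builds with transform(...world) followed by transform(...model).
const combined = multiply(world, model);
const wrongOrder = multiply(model, world);

const corner = [0.5, 0.5]; // bottom-right corner of the unit square in model coordinates
console.log(applyToPoint(combined, corner));   // [65, 65]
console.log(applyToPoint(wrongOrder, corner)); // [-137.5, -137.5]
```

The correct result can be checked step by step: the model matrix sends the corner to `(60,60)`, and the world zoom sends `(60,60)` to `(65,65)`.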

The main entry point for the code is `main.js`. It instantiates the four rectangles and colors, sizes, and positions each one. It also wires up a listener for the `wheel` event. When the user rolls the mouse wheel, the code calls the `World#zoom` method, supplying the mouse coordinates as a 2D vector.
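A sketch of that wiring: the listener converts the wheel event’s viewport coordinates to canvas coordinates and hands them to a `zoom(point, factor)` method. The function names (`wheelToFactor`, `toCanvasPoint`, `attachWheelZoom`) and the 1.25 step are assumptions for illustration, not the demo’s actual `main.js`. Only standard DOM APIs are used (`addEventListener`, `getBoundingClientRect`, `WheelEvent.deltaY`).

```javascript
const ZOOM_STEP = 1.25; // hypothetical zoom increment per wheel notch

// Scrolling up (negative deltaY) zooms in; scrolling down zooms out.
function wheelToFactor(deltaY) {
  if (deltaY === 0) return 1;
  return deltaY < 0 ? ZOOM_STEP : 1 / ZOOM_STEP;
}

// Converts viewport (client) coordinates to canvas pixel coordinates, accounting
// for CSS scaling when the drawing buffer differs from the displayed size.
function toCanvasPoint(clientX, clientY, rect, canvasWidth, canvasHeight) {
  return [
    ((clientX - rect.left) * canvasWidth) / rect.width,
    ((clientY - rect.top) * canvasHeight) / rect.height,
  ];
}

// Browser wiring. `world.zoom(point, factor)` and `redraw()` are assumed to exist.
function attachWheelZoom(canvas, world, redraw) {
  canvas.addEventListener(
    "wheel",
    (event) => {
      event.preventDefault(); // keep the page from scrolling while zooming
      const rect = canvas.getBoundingClientRect();
      const point = toCanvasPoint(event.clientX, event.clientY, rect, canvas.width, canvas.height);
      world.zoom(point, wheelToFactor(event.deltaY));
      redraw();
    },
    { passive: false } // explicitly non-passive so preventDefault() is honored
  );
}
```

Using `1 / ZOOM_STEP` for zooming out means one notch in followed by one notch out at the same spot returns to the previous view.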

## Benefits of using Affine Transformations for Zooming

The same zoom operation can, of course, be accomplished in other ways. In the simple scene presented here, basic algebra would be sufficient. But the algebra becomes complicated when additional transformations are applied to the scene. For example, if the user can zoom, they will likely want to pan as well, and the algebra grows more involved as pan and zoom operations stack up.
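For comparison, the algebraic version for this simple scene is short. Expanding `T`, zooming by a factor $s$ about $\mathbf{P}$ maps every point $\mathbf{x}$ to

$$
\mathbf{x}' = \mathbf{P} + s\,(\mathbf{x} - \mathbf{P}) = s\,\mathbf{x} + (1 - s)\,\mathbf{P}
$$

With $s = 1.25$ and $\mathbf{P} = (40, 40)$ this is $x' = 1.25x - 10$ per coordinate, matching the translation column of `T`. Once a pan by $\mathbf{d}$ precedes the zoom, the formula becomes

$$
\mathbf{x}' = s\,(\mathbf{x} + \mathbf{d}) + (1 - s)\,\mathbf{P}
$$

and every further pan or zoom wraps the previous expression again, whereas the matrix form simply gains one more factor.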

Further, using algebra to apply a zoom may be inefficient. When a scene is made up of many objects, as in a video game or drawing application, the equivalent algebra would need to be worked out and applied to each object in the world, and that per-object work grows with every stacked operation. Affine transformations simplify the process because a parent transformation matrix (the world transformation in the example above) can be applied uniformly to each object in the world, just as it is applied to each rectangle in the example. The zoom composes four matrices (three multiplications), but that product only has to be computed once per zoom, no matter how many objects are in the scene. In a more complex scenario like a first-person shooter, a player may be able to zoom about the crosshairs of their sniper rifle. For efficiency, that zoom matrix should be computed once and then applied to each object in the world as a matrix operation directly on the GPU.
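The sketch below illustrates that cost structure on the CPU: the zoom matrix is composed once, then the same six numbers are applied to every vertex in a flat coordinate array. On a GPU, that per-vertex loop is the job a vertex shader does. The function names are hypothetical.

```javascript
// m × n for [a, b, c, d, e, f] affine matrices (n is applied first).
function multiply(m, n) {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1,
  ];
}

// Compose the zoom once per wheel event...
function zoomMatrix(T, [px, py], s) {
  return [[1, 0, 0, 1, px, py], [s, 0, 0, s, 0, 0], [1, 0, 0, 1, -px, -py], T].reduce(multiply);
}

// ...then apply it to every vertex. `coords` is a flat [x0, y0, x1, y1, ...] array.
function transformAll(m, coords) {
  const [a, b, c, d, e, f] = m;
  const out = new Float64Array(coords.length);
  for (let i = 0; i < coords.length; i += 2) {
    const x = coords[i];
    const y = coords[i + 1];
    out[i] = a * x + c * y + e;
    out[i + 1] = b * x + d * y + f;
  }
  return out;
}

const T = zoomMatrix([1, 0, 0, 1, 0, 0], [40, 40], 1.25);
const worldCorners = [0, 0, 100, 0, 100, 100, 0, 100];
console.log(Array.from(transformAll(T, worldCorners)));
// [-10, -10, 115, -10, 115, 115, -10, 115]
```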

Another benefit is that using affine transformations abstracts well between 2D, 3D, and beyond. The example renders the world in a canvas, but the exact same algorithm could be applied in CSS, Three.js, Unity, or anything that works with transformation matrices.
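To illustrate the 2D/3D point, here is the same zoom-at-a-point composition with 4x4 matrices. It is a hypothetical sketch using the column-major layout that WebGL and glMatrix’s `mat4` use (stored in plain arrays here); only the matrix size changes, not the algorithm.

```javascript
// 4x4 matrices stored column-major: element (row r, column c) is m[c * 4 + r].
const identity4 = () => [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
const translation4 = (x, y, z) => [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1];
const scaling4 = (s) => [s, 0, 0, 0, 0, s, 0, 0, 0, 0, s, 0, 0, 0, 0, 1];

// Returns a × b (b is applied first).
function multiply4(a, b) {
  const out = new Array(16).fill(0);
  for (let c = 0; c < 4; c++) {
    for (let r = 0; r < 4; r++) {
      for (let k = 0; k < 4; k++) out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
    }
  }
  return out;
}

// Transforms a 3D point with w = 1 (sufficient for affine matrices).
function applyToPoint3(m, [x, y, z]) {
  return [0, 1, 2].map((r) => m[r] * x + m[4 + r] * y + m[8 + r] * z + m[12 + r]);
}

// Same four steps as in 2D: existing T, move P to the origin, scale, move P back.
function zoomAt3(T, [px, py, pz], s) {
  return [translation4(px, py, pz), scaling4(s), translation4(-px, -py, -pz), T].reduce(multiply4);
}

const T = zoomAt3(identity4(), [10, 20, 30], 2);
console.log(applyToPoint3(T, [10, 20, 30])); // [10, 20, 30]: P stays put
console.log(applyToPoint3(T, [11, 20, 30])); // [12, 20, 30]
```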

This algorithm works even when other transformation matrices are applied to the world, because matrix multiplication is associative: any chain of transformations collapses into a single matrix, and the zoom simply becomes one more factor in that chain. In a more complex scene there are usually many transformation matrices involved; typically there is at least a model, view, and projection matrix. Plus, the scene may be rendered from multiple vantage points, e.g. from two cameras in a split-screen game, or from each light’s perspective to generate shadows.
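A small illustration of that composability: a pan is just another translation multiplied onto the world matrix, and a later zoom at the mouse still keeps the world point under the cursor in place. The `pan` and `zoomAt` helpers are hypothetical and use the same `[a, b, c, d, e, f]` layout as canvas’s `transform()`.

```javascript
// m × n for [a, b, c, d, e, f] affine matrices (n is applied first).
function multiply(m, n) {
  const [a1, b1, c1, d1, e1, f1] = m;
  const [a2, b2, c2, d2, e2, f2] = n;
  return [
    a1 * a2 + c1 * b2, b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2, b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1, b1 * e2 + d1 * f2 + f1,
  ];
}
const translation = (tx, ty) => [1, 0, 0, 1, tx, ty];
const scaling = (sx, sy) => [sx, 0, 0, sy, 0, 0];
const applyToPoint = (m, [x, y]) => [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];

// Pan: shift everything already on screen by (dx, dy).
const pan = (T, dx, dy) => multiply(translation(dx, dy), T);
// Zoom by s about the screen point [px, py].
const zoomAt = (T, [px, py], s) =>
  [translation(px, py), scaling(s, s), translation(-px, -py), T].reduce(multiply);

let T = [1, 0, 0, 1, 0, 0];
T = pan(T, 30, 0);                     // user drags the world 30 px to the right
const mouse = [100, 50];
const worldPointUnderMouse = [70, 50]; // the pan maps (70, 50) to (100, 50)
T = zoomAt(T, mouse, 2);               // then zooms in at the mouse
console.log(T);                                     // [2, 0, 0, 2, -40, -50]
console.log(applyToPoint(T, worldPointUnderMouse)); // [100, 50]
```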

Finally, matrices are potentially faster because they can be used directly by the hardware: GPUs are highly optimized for matrix multiplication.

## Need a Developer?

Get in touch! We’re a small software company in Folsom, California, and we would love to help you out.
