Viewfinder: Framing Views in Augmented Reality

Kohzy
Jan 10, 2017

This was originally published on my thesis blog: Small Big Cities.

I’m halfway through my second year at SVA Interaction Design, and that also means I’m halfway through my grad thesis process. I’ve been thinking a bunch about augmented reality (AR) in public space. Interacting with AR elements produces behavior that can confuse an observing bystander, and this confusion is amplified in public space, where numerous bystanders from different backgrounds meet. Can we create public AR that minimizes such confusion? I believe so, and I spent a few weeks at the end of last semester creating Viewfinder (described later in this piece) as one attempt to do so.

The promise of augmented reality

It is exciting to think about what AR can do to draw attention to details in our surroundings. We’re already interacting with and learning about locations in interesting ways through technology: a whole body of map-based data visualizations exists that marries location with other data to draw insights about a dataset’s relationship with place. AR promises software the ability to insert itself into a higher-fidelity world, to call attention to more granular details of our surroundings. We get to go beyond working with locations on a 2D map, and actually interact with points within 3D space. Instead of just sensing that the noise level at a street corner is high, our applications can now pinpoint which particular source is contributing to the noise. Instead of detecting that there is heavy foot traffic at a park, our apps can drill down to a specific feature in that park that people visit.

Unfortunately, existing applications that annotate points in public space haven’t been compelling enough to see broad adoption. One example is the use of AR in local discovery, as seen in Yelp’s Monocle feature: an AR browser that lets you scan your surroundings to view the details and ratings of businesses around you. While initially exciting to fiddle with, it hasn’t become an essential step in discovering local businesses. Users at a particular location can quickly spot surrounding businesses through their signage, without the help of AR, and look up the details separately in the Yelp app.

Are there more compelling uses of AR annotations in public space?

Framing views in space

What if we pulled back from interacting with specific ‘points’ in space, and instead thought about ‘views’? Views are a widely used concept. Tourists and photographers seek out good views. Birdwatchers have their favorite spots to catch wildlife in action, and military personnel are trained to pick locations that let them best surveil an area. A particular location can command high land value because of the views it provides, and locals enact rules to keep those views unblocked. But unless you can permanently build a window for others to enjoy, a view is a fleeting thing that appears only to those who know of it or have someone to point it out. Beyond physical windows and lookout points with signage, there are no existing ways to permanently frame and label a view in public space. Can we build that permanent window with AR?

How would we represent a view in AR? In 3D modeling tools and game engines such as Unity, the view is a key component, set up through ‘cameras’ oriented in digital space. The field of view is represented by a structure that looks like a sideways pyramid (the technical term is a viewing frustum) projecting out from the camera location.
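To make the frustum concrete, here’s a rough Python sketch of how a camera’s field of view fans out into that sideways pyramid. The function and numbers are illustrative, not from Unity or any particular engine:

```python
import math

def frustum_corners(fov_v_deg: float, aspect: float, distance: float):
    """Corners of the frustum's cross-section at a given distance in
    front of a camera sitting at the origin, looking down the +z axis."""
    half_h = distance * math.tan(math.radians(fov_v_deg) / 2)
    half_w = half_h * aspect
    return [
        (-half_w, -half_h, distance),  # bottom-left
        ( half_w, -half_h, distance),  # bottom-right
        ( half_w,  half_h, distance),  # top-right
        (-half_w,  half_h, distance),  # top-left
    ]

# e.g. a 60-degree, 16:9 camera: the rectangle it sees 10 meters out
print(frustum_corners(60, 16 / 9, 10))
```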

In the physical world, however, our devices are the actual cameras, with all the field-of-view information already built in. What we need to track a view in space is an AR frame that we can easily align with our viewfinders. Here’s my hypothesis: by saving AR “frames” in space, we make it possible for users to discover and share good views.

Frames in AR space

Viewfinder: Helping photographers and location scouts save good shots

Photographers and location scouts are always on the lookout for a good shot, so why not build something that helps them save good views? There are two approaches to scouting for views today: onsite and remote. The onsite approach involves wandering around with your eyes wide open and a camera around your neck, hoping to find something shot-worthy. Or you could scout remotely by searching on sites such as Flickr and Instagram. Those photos are geotagged, so you can build up a map of points where good photos have been taken. You then go onsite when you are ready to shoot, and consult your saved photos to find the views you are looking for.

Flickr’s Photo Map feature

What if we could make both processes even easier by saving the frames of exactly where the photos were taken in space? Applying the ‘frame’ approach above, I designed Viewfinder, a conceptual app that lets serious photographers and location scouts save, track, and discover good views. Let’s walk through two ways that Viewfinder could be used.

The first approach (UX flow below) caters to the onsite shot-discovery behavior. Let’s say you’re a photography hobbyist, and you’re wandering around looking for a good shot. You pop open the Nearby tab of Viewfinder, where you see photos that were taken nearby. A blue ring around your location indicates the radius within which photos can be activated as augmented reality frames within your viewfinder. Tapping the shaded region, or the frame button at the bottom right, activates the viewfinder, where you can see a group of popular views around you. The blue frames are views that can be captured from where you are standing; the lighter frames are further away but can be approached.

Tapping on each frame brings up more details on the views that have been captured through that frame. Having found a view you like, you orient your camera to the frame, and hit the red button to take the picture.

The second approach caters to the remote scouting behavior. Here, you are a location scout doing some initial scouting remotely. You’ve heard that a particular location has good shots, so you pull open Google Street View at that location to confirm. Spotting some interesting views, you click the view-saving plugin button to the right, drag a rectangle to frame the shot, and save it to a collection on Viewfinder.

After some more remote scouting, you’ve gathered a collection of views within Manhattan that you’ve called “Art Deco”. You now decide to visit the locations to confirm their suitability for your project. On arrival at one of your destinations, you open the Art Deco collection within Viewfinder. Blue circles indicate that two of the views you previously saved are in the vicinity. You tap on one of the blue circles, which enlarges into a button that lets you switch on the AR viewfinder for that particular view.

Switching on the AR viewfinder, you see only the frame for that particular view, along with your notes on that frame. You align your camera with the frame to inspect the view: it is exactly what you wanted. You’re happy with the Art Deco collection you’ve compiled, and share it with your colleagues.

That last action, sharing, highlights one area where these AR frames are very useful. Saving these frames empowers collaborators to asynchronously share good views. The location scout doesn’t have to be onsite to point things out to the director. A local doesn’t have to be there to call attention to where exactly people should be looking. Just save a collection of these views, share it, and the recipient can walk through these views in their own time.

The wonderful properties of the frame

I want to dwell a little on the location-based frame. While straightforward in form, the frame is anything but simple. Here are six properties of a frame (sketched in code after the list):

  • Geolocation: the longitude-latitude position. This is needed along with…
  • Altitude: …the altitude, to position the frame accurately in 3D space. If a frame’s longitude, latitude, and altitude don’t fall within a sphere of a certain radius around the user, the frame should not be accessible, to avoid AR clutter.
  • Orientation/“pose”: beyond geolocation and altitude, the frame needs orientation information to capture its true position; Google Tango calls this the pose. This matters because angling the device differently produces a completely different view.
  • Size of frame: moving on to the characteristics of the view itself, size is one of them: is this a close-up or a wide shot? The shape of the frame also matters, since photos can be square, landscape, or even oddly shaped. For Viewfinder, I restrict frames to rectangles to match how photos are traditionally framed.
  • Depth of camera (kept constant): since this is a mobile app, I assume that users are shooting with the cameras on their devices, which offer a constant depth of vision through a fixed aperture. The simple rectangular form of the frame works here: users merely have to align their viewfinders with the frame to ensure that they capture the view the frame is, uh, framing.
  • Resulting pictures: the frame is also associated with the pictures taken through it. These give users a reference for judging whether a frame is interesting without actually looking through it, especially if the frame is some distance away. The pictures also capture the passage of time: different moments offer different views through the frame (e.g. daytime versus nighttime, or summer versus winter).
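To pull these six properties together, here’s a rough sketch of how a frame might be represented in code. The field names, units, and alignment tolerance are my own illustrative assumptions, not a spec of the actual app:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    lat: float           # geolocation: latitude, in degrees
    lon: float           # geolocation: longitude, in degrees
    altitude_m: float    # altitude, in meters, to place the frame in 3D space
    yaw_deg: float       # orientation/"pose": compass heading the frame faces
    pitch_deg: float     # orientation/"pose": tilt up or down
    width_m: float       # size of frame: close-up versus wide shot
    height_m: float      # rectangular, to match traditional photo framing
    photo_urls: list[str] = field(default_factory=list)  # resulting pictures

    def is_aligned(self, device_yaw_deg: float, device_pitch_deg: float,
                   tolerance_deg: float = 5.0) -> bool:
        """Rough check that a device's viewfinder lines up with this frame."""
        d_yaw = abs((device_yaw_deg - self.yaw_deg + 180) % 360 - 180)
        d_pitch = abs(device_pitch_deg - self.pitch_deg)
        return d_yaw <= tolerance_deg and d_pitch <= tolerance_deg
```

Note that depth of camera doesn’t appear as a field: since it’s held constant, aligning with a frame reduces to matching position and orientation.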

These are properties of each frame. But what happens when you encounter many frames at once? Without adjustments, this would surely be an overwhelming experience. UX patterns are needed to reduce visual clutter, create hierarchy, and minimize confusion.

  1. Only show the frames that matter
    First, Viewfinder only surfaces frames within a certain radius of the user. Frames beyond that radius don’t matter: the user couldn’t capture those photos from that distance. Second, even within the vicinity, only popular frames are revealed. The user is shown the number of photos taken from a particular frame, and can tap into it to view the photos.
  2. Vary opacity with distance
    Among the frames in the vicinity, the closest frames render most clearly, while more distant frames are visible but faded. This conveys depth, and creates an information hierarchy in which the nearest frames are the most noticeable (see the sketch after this list).
  3. Indicate the right side of the frame
    A drawback of the frame’s planar rectangular form is that it’s not easy to tell which side the user should be on. To make that obvious, the user encounters an opaque frame if they are on the wrong side. The user can also be shown an alert banner.
Combined, the above UX patterns alleviate the problem of visual clutter when there are multiple frames. Users now see at most 3 popular frames nearby within their viewfinder, with a few that can be discovered in the distance. They know when they are on the right side of the frames.

Beyond photography and location scouting

A common piece of feedback I’ve gotten about Viewfinder is that there may be more compelling use cases for this concept of frame-saving. Do photo enthusiasts and location scouts really need to overlay AR frames on their surroundings to find good views? Photographers pride themselves on seeking out views and moments when they are on location: they train their eyes to do so. They get improvisational. They seek out moments, which are hard to track with location-based frames. I’m not a photography enthusiast myself, and I have yet to test this idea on location scouts, so I take this feedback seriously.

One application where I could see frames being more useful is travel. Unlike photographers and location scouts, travelers aren’t trained to look around and spot key views. Instead, they rely on guidebooks and people to point things out on location. There is some friction to using travel guides: you get to a particular location, and have to look around to find “that colonial building” or “the tiny statue” described in the guide. What if you could instantly scan your surroundings for points of interest? AR frames could be useful for that. Nature travel is a particularly relevant case: saving views as AR frames could be useful for indicating good spots for birdwatching, or good vantage points for catching certain natural phenomena.

Photocred: 6sqft

AR in public space

As mentioned at the beginning, Viewfinder is an exploration I’m doing as part of my thesis research on the discontents and opportunities of placing AR in public space. How does Viewfinder relate to this research?

When I started working on Viewfinder, I’d identified several imperfections of placing AR elements in public space. Many of these problems were made obvious by the widespread gameplay of Pokémon Go last summer. I’d found that interaction with public AR could lead to misinterpretation and misunderstanding, especially if user actions resemble actions that mean something else, as when people attacked Pokémon Go players in the belief that the players were taking photos of them. AR elements can be placed in public locations that are undesirable either for the user or for surrounding people: threatening users’ personal safety, encouraging trespassing, or encouraging behavior deemed inappropriate at the particular location (e.g. gameplay in a place of worship). AR experiences can also be very individual and engrossing, conducive neither to social interaction with others nor to engagement with the surroundings.

Keiichi Matsuda’s HYPER REALITY presents a world where AR elements overwhelm our field of vision.

With the above in mind, I set out to design Viewfinder in a way that would also mitigate these imperfections. Specifically, I wanted to take the following principles into consideration:

1. Encourages user interaction with the surrounding environment

Viewfinder draws the attention of users to interesting views around them, so that even after they put down their device, they can still appreciate the view in front of them. In fact, it could be argued that it increases appreciation of the surrounding environment.

2. Doesn’t clash with the existing lexicon of gestures

Users of Viewfinder are searching for views to capture through their devices, which is in line with normal photography behavior.

3. Situated in locations that do not endanger users’ personal safety or encourage trespassing into private property

Since the AR frames in Viewfinder are generated from previously taken photos, these frames would be situated in locations where it is most likely acceptable for users to linger. These frames would also be less likely to put users in harm’s way (e.g. by being situated in the middle of a highway).

4. Situated in locations where user behavior isn’t deemed inappropriate

For similar reasons to the above, it is less likely that photo-taking will be problematic behavior where the frames are located.

5. Encourages social behavior where multiple users can interact with same element

With multiple users able to approach the same AR frames, and to contribute to the database of frames by taking photos, Viewfinder encourages interaction between the initial photographer and followers, or among multiple photographers on site interacting with the same frames.

Future applications of frame-saving

What I find most fascinating about saving frames in AR space is what can be built on this concept. When we begin tracking where every camera is pointed, and where every view is captured, we can start measuring which views people like to capture, and from where. By saving the views that people capture at scale, we could, in effect, eye-track the world. Knowing what people like to see in their environments could be interesting to the people who construct those environments: architects, urban designers, landscape designers. It would also allow public advertising to track eyeballs and measure ad performance the same way digital advertising currently does.
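As a toy illustration, “eye-tracking the world” could start as something as simple as binning saved frames into a coarse grid and counting which cells attract the most views. The cell size below is an arbitrary assumption, and the code reuses the Frame sketch from earlier:

```python
from collections import Counter

def view_heatmap(frames, cell_deg=0.001):
    """Bin frame locations into a lat/lon grid (roughly 100-meter cells)
    and count how many saved views land in each cell."""
    counts = Counter()
    for f in frames:
        counts[(round(f.lat / cell_deg), round(f.lon / cell_deg))] += 1
    return counts.most_common(10)  # the ten most-photographed cells
```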

And what if human users aren’t actually the target audience of these AR frames, but machines? Another area where saved views would be useful is steering autonomous cameras rigged on drones or robots. Imagine a director of photography who composes a set of views for a particular scene, strings them together, and uploads them to a camera drone. The drone can then chart an autonomous path through these views, consistently capturing the same shot over and over again. Autonomous camera drones such as the Lily and the Hexo+ are already becoming popular. These drones are designed to track a single target subject right now (see 3DR’s description of how its SHIFT computer vision helps drones auto-frame shots), but they could quickly be applied to filming other views if these frames can be saved.
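A collection of saved frames already contains everything a naive flight plan would need. As a toy sketch (none of these field names come from Lily, Hexo+, or 3DR’s actual APIs, and it again reuses the earlier Frame sketch):

```python
def frames_to_waypoints(frames):
    """Turn an ordered collection of frames into a naive flight plan:
    fly to each frame's position, then face its saved orientation."""
    return [
        {
            "lat": f.lat,
            "lon": f.lon,
            "alt_m": f.altitude_m,
            "heading_deg": f.yaw_deg,
            "gimbal_pitch_deg": f.pitch_deg,
        }
        for f in frames
    ]
```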
