Making Sense of Skyboxes in VR Design
How the physics of our real world translate to VR design practices.
Imagine standing on a mountain, looking up and around with a clear view of the sky and clouds surrounding you. This is our real-world skybox. It’s what we perceive as far as the eye can see. It’s all around us, above and to the horizon in every direction. It’s our backdrop.
As a design element, the skybox is really just a sphere of imagery that can be made from a photo, texture, or rendered artwork. When it’s placed in a scene, it extends out to infinity without a perceivable edge, giving an illusion of depth and reality. The skybox is an essential component in virtual reality design — it’s like a panoramic wrapper that projects an entire background scene onto your interface.
As I started designing my first VR experience, I wasn’t planning on creating a traditional game with a character solving puzzles or running through a forest shooting zombies. Instead, I wanted to let users explore the interiors of cars using a non-room scale system. I could put someone in the center of a scene, wrap them up in a cozy skybox blanket, project the car image around them, and let them look around. I quickly realized that skyboxes could be used for more than just blue skies and fluffy clouds.
As a result, I’ve been thinking more about skyboxes — what they are, the types of materials and methods used to make them, and how they can be used in VR as well as in less typical applications — along with how to create depth and realism in a 360° scene as a whole. So I thought I’d share some of my thoughts, research, and resources with you to see what you think.
So, where else have we seen skyboxes?
You’ve definitely seen skyboxes in other places — like at the movies. Imagine a story where the characters are wrapped in a giant bubble separating them from the real world. They don’t know anything beyond it. We know their surroundings are fake, but those poor fools don’t have a clue. Hmm… where have we seen this before?
I really hope you didn’t guess Bio-Dome.
It’s The Hunger Games, of course. (I’ll give you points for guessing The Truman Show, too, though it took Truman most of the movie to figure out he was stuck in a bubble.)
Anything the characters see above them and at a distance is actually their skybox. It feels like their reality. Anything they walk on is their terrain or plane. And any items they encounter are the equivalent of 3D assets.
In the end, they manage to work their way to the edge to burst the bubble and climb out, which totally breaks the idea of the skybox, but let’s not get into that small Hollywood technicality…
Let’s put aside fiction and get technical for a moment. While writing this article, I spent a lot of time thinking about what skyboxes “really are.” As I dug around looking for info, I realized that skyboxes aren’t unique to game design environments. In astronomy, there is the celestial sphere — an imaginary sphere of infinite radius, centered at the observer.
This looks really similar to a skybox, doesn’t it? The celestial sphere correlates pretty nicely to designing in VR: if you imagine placing a camera at the center, the infinite sphere surrounding it would be your game skybox. You’d project an image onto the inside of the sphere to create the skybox scenery.
Within this scope, distances and line of sight also become important considerations. How far can the player in your VR world see? What should they see in the distance where the land appears to meet the sky? How can the player’s position affect their range of view? This all has to do with the horizon.
Calculating the apparent horizon
In the diagram below, imagine a sphere S, which could be the surface of Earth. The observer is positioned at O with eye height h. The Earth has radius R and center C.
Let’s say the observer is standing on a beach, looking out at a sunset. At a certain point, G, the sky appears to meet land. Calculating the length of line OG gives us the apparent horizon distance: a rough measure of how far someone can see, based on their height above the Earth’s surface.
In the diagram above, notice the relationship between the observer O, the horizon point G, and the sphere center C — together they form a right triangle at point G. Also, the line OG is tangent to the surface of the Earth. So, we can calculate the length of the apparent horizon OG by simply revisiting our old friend from high school, the Pythagorean Theorem:
if: c² = a² + b²
then: (R + h)² = R² + OG²
expand: R² + 2Rh + h² = R² + OG²
solve for OG: OG = √(2Rh + h²)
Now let’s plug in some values. If the radius of the Earth R is 6378 km and the average observer’s eye height at sea level h is 1.7 meters, then the apparent horizon OG works out to be about 4.7 km (2.9 mi).
That’s a pretty good estimate assuming atmospheric homogeneity. In the real world, air tends to be denser at the surface causing a refraction effect, or ray bending of the line OG, that lets you see further than normal. Extreme refraction can cause looming and sinking, which makes objects appear higher or lower than normal. There are modified calculations you can use to find these values, but let’s not get too bogged down here — check it out if you’re feeling technically inclined or want to dust off your scientific calculator.
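If you’d rather let code do the arithmetic, here’s a quick sketch of the calculation above in Python (the constant and function names are just my own):

```python
import math

EARTH_RADIUS_M = 6_378_000  # Earth's radius R, in meters

def horizon_distance(eye_height_m):
    """Apparent horizon distance OG = sqrt(2*R*h + h^2), ignoring refraction."""
    return math.sqrt(2 * EARTH_RADIUS_M * eye_height_m + eye_height_m ** 2)

# Observer standing at sea level with eyes 1.7 m above the ground:
print(round(horizon_distance(1.7) / 1000, 1), "km")  # -> 4.7 km
```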
Some general ideas to keep in mind
Obstructions like trees, buildings, and mountains can interfere with the sight line to the apparent horizon. The angle of lighting can also create a silhouette effect, letting you see objects on the horizon that you normally wouldn’t. An island that’s invisible at midday can appear when it’s backlit at sunset under the right atmospheric conditions. You see this in coastal Southern California, where Catalina Island seems to pop into view at sunset even though it’s hidden during the day.
Properly estimating the height and position of a player is critical when designing an experience, especially in a system like Oculus Rift or PlayStation VR, where the tracking sensor is not located on the HMD itself. The folks over at Owlchemy Labs, creators of Job Simulator, have a neat way of doing this: they record a user’s wingspan and translate that to the user’s standing height. Smart, right? That way the camera is always placed at the proper height and the player’s view feels natural.
Also, a player’s height above sea level affects how far they can see: per the formula, the horizon distance grows roughly with the square root of their height. It makes sense now why someone perched on a mountaintop can see further than someone at sea level. You could calculate this distance and use it to position assets in your game scene. At high elevation, the total height is large (total h = observer’s eye height + mountain height) and the h² term matters, so use the full formula:
OG = √(2Rh + h²)
At sea level, h is tiny compared to R, so the h² term becomes negligible and the formula simplifies to:
OG ≈ √(2Rh)
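A quick numeric check (a Python sketch, with my own function names) shows why dropping h² is safe near sea level but elevation still matters:

```python
import math

R = 6_378_000  # Earth's radius in meters

def og_exact(h):
    return math.sqrt(2 * R * h + h ** 2)   # full formula

def og_approx(h):
    return math.sqrt(2 * R * h)            # h^2 term dropped

# At sea level (h = 1.7 m) the two agree to well under a millimeter:
print(og_exact(1.7) - og_approx(1.7))
# From a 3,000 m peak (eye height included), you can see roughly 196 km:
print(round(og_exact(3001.7) / 1000))  # -> 196
```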
So how do we apply this to VR design and skyboxes? Piles of mathematics are probably overkill, but using the player’s height to estimate the horizon distance can help you design a realistic interface. In general, keeping these principles in mind with a proper frame of reference might be enough when designing a skybox and placing 3D objects in your scene.
Scene elements and depth of layers
So now that we have a good idea about skyboxes and the relationship between player and horizon, let’s think about creating depth and priority in a scene. Imagine that your player’s 360° world is divided into a series of layers at varying distances that contain objects or imagery. You could think of these as distances that correlate to visual 3D comfort zones: foreground (1–10 m), midground (10–20 m), and background/skybox (20 m+). [3, 4] Objects are perceived as having a strong sense of 3D in the foreground and moderate 3D in the midground, then become flat as the distance increases to infinity.
In the foreground you might have interactable 3D objects: buttons the player can touch to deploy a menu, logs of wood they can pick up, or wild wolves that attack them. In the midground, you might have distant messages or scenery that convey scale and depth. And lastly, the skybox resides in the background, expanding out to infinity and wrapping the scene. At SDC 2014, Alex Chu talked about this as primacy of content, where the interactable items that should garner the most attention are prioritized in the scene at a comfortable distance from the user’s eye. Together, these layers create a sense of reality and depth.
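As a rough illustration, you could encode these comfort zones in a tiny helper when laying out a scene (the function and zone names here are hypothetical; the boundaries are the guideline values above):

```python
def depth_zone(distance_m):
    """Bucket an object's distance from the camera into a comfort zone."""
    if distance_m <= 10:
        return "foreground"   # interactable objects, strong sense of 3D
    elif distance_m <= 20:
        return "midground"    # scenery conveying scale, moderate 3D
    else:
        return "background"   # skybox territory, perceived as flat

print(depth_zone(2))    # -> foreground
print(depth_zone(15))   # -> midground
print(depth_zone(500))  # -> background
```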
Perspective and mobility
In a room-scale experience (HTC Vive), the player is highly mobile. In a non-room-scale VR experience (Oculus Rift in the pre-Touch-controller era, Google Cardboard, Samsung Gear VR, etc.), you, as the player, remain stationary — seated in a chair or standing still. But the character you’re controlling with a joystick could be either mobile or stationary. Both cases serve specific, equally valuable purposes. Let’s look at some examples.
Mobile character experiences
In a VR universe with a mobile character, they can traverse laterally along a terrain or fly in an x/y motion, and jump or scale objects in a z motion. Depending on the design of the experience, the player could reach the finite edge of a terrain (like a floating island), but they would never encounter the edge of the skybox. Since the skybox is implemented as an environment lighting effect rather than an actual asset like the other parts of your scene, it feels infinite.
In “Land’s End”, you are guided through a world of puzzles as a first-person player. The camera is mounted on the main character, so you experience their perspective. You appear to navigate as if you were really there, so it can feel very immersive.
Other experiences, like “Adventure Time”, are designed from a third-person viewpoint. The camera sits slightly away from the action while closely following it, providing a top-down perspective. The player controls the characters in the scene rather than themselves.
Stationary character experiences
In contrast, some experiences only let the user pivot in 360°; their character never actually moves through the world along x, y, or z. Let’s think of these as stationary experiences.
You’ll often see something like this in a home menu, where the user simply turns their head to view and select items. A similar fixed interface can let the user watch a 360° video or look at a 360° photo. In both cases, the periphery is really just a skybox that wraps the scene. There’s no actual player mobility. They’re just looking around, sometimes with interactive elements to click.
In some cases, there are no interactive elements at all. For instance, the animated short “Invasion!” is purely a cinematic storytelling experience. There is no true physical interaction on the part of the user, yet they are actively watching the story unfold. This sub-genre of VR presents its own challenges related to moviemaking in 360° (sound, direction, empathy, cues, cuts, etc.), which could be an entire series of articles in itself, not to mention the spirited disputes as to whether it’s technically VR, AR, or AV.
In all of these examples, skyboxes are a critical foundation. Whether they start from 360° photos or rendered artwork, they share some common methods of implementation. First, though, we need to capture the images, and the best method for doing this varies with the medium. Let’s take a look at some of these cases.
The most common sources for skybox images are a cylindrical panorama, a spherical panorama, or a 6-sided cube. Let’s review what they are and how to capture them.
Capturing a 360° cylindrical panorama is simple using a mobile app like Google Cardboard Camera or the Photo Sphere feature on Android. It’s similar to taking a normal panorama photo, except that the image encompasses a complete rotation. You hold the camera in front of you, slowly pivot in a complete 360° turn, and the camera dynamically stitches the frames together into a single continuous cylindrical panorama along a horizontal axis.
It’s a pretty easy method, but one problem here is continuity. As you can imagine, objects and people might move in the scene during the time it takes you to sweep around in a full circle, so you’ll have some unavoidable misalignments and anomalies like in the example above.
Also, the resulting image isn’t spherical in all directions — it’s a horizontal cylinder, so you’ll be missing content at the top and bottom (think of it like sitting inside a tube with open ends). You can force the gaps to stitch together when making the skybox in Unity, but it will default to a distorted star-like shape and won’t look perfect. You could also cover the holes with an opaque circle, which is functional but not too elegant. Cylindrical panos are workable, but they definitely aren’t ideal if you need detailed clarity in every direction.
In comparison, a spherical panorama captures a full 360 × 180 equirectangular image (360° horizontally × 180° vertically). This is your best bet if your VR scene depends on seeing exact details in every direction. However, you’ll need a specialized camera to do it.
It feels like there are suddenly tons of devices on the market, using different types and numbers of lenses, rigged to capture 360° images. The higher-end Jaunt, GoPro 360, and Facebook’s Surround 360, among others, are intense contraptions for commercial-grade work, with 20+ lenses that can capture still images and/or video. Multiple lenses can mean multiple output frames and a lot of stitching work in post-production. They can also be incredibly pricey.
There are also consumer-friendly options. The Samsung Gear 360 and Ricoh Theta both use two fisheye cameras mounted back-to-back that simultaneously capture a scene and output a single, full sphere pano image. These devices offer the smooth continuity of a simultaneous single capture and can record both video and still images. Pretty cool.
A spherical mosaic is another way to capture a full-sphere pano. You can use a simple mobile phone feature like Surround Shot on the Samsung Galaxy or a similar feature on other Android phones. It directs you to pivot around a single point, capturing a series of about 60 photos at various angles in the shape of a sphere, and at the end the app stitches them into an equirectangular pano. This is a really neat method — no special device needed other than your phone, and the output is a full 360 × 180 panorama.
However, the disadvantage here, as with the cylindrical pano we talked about earlier, is the potential for artifacts as objects move in the scene during the time it takes to shoot all those photos. It’s also challenging to pivot perfectly through six full circles without your feet drifting, so your center point can move and the images may not stitch exactly.
There are also ways to recreate a 360 × 180 panorama from a series of standard, flat DSLR photos that are strategically shot and then stitched manually or with photo-editing software in post-production. Taking and manipulating each photo by hand has a steep learning curve, so I’ll leave that to the pros.
3D Cube Panoramas
With the 3D cube method, six cameras are placed at the center of a scene, each rotated 90° from its neighbors, and photos are taken simultaneously so that all six sides of a 3D cube are captured.
You could do this out in the real world if you’re hiking in a forest and want to capture everything around you in 360°. But it would require some special rigging and multiple cameras — it’s probably not the easiest method. Instead, I’d go with one of the friendly handheld spherical-pano devices we discussed previously.
However, if you want to create a VR scene from rendered artwork, the 3D cube idea is your best option. You’ll mount six virtual cameras in the middle of your artwork to capture each face of a cube (shown above). Note that this doesn’t work with just any 2D art or photograph — it needs to be 3D art made with modeling software like Cinema 4D, Blender, etc. We’ll talk more about how to do this in my next article.
Whether you’re working with photos or 3D art, the cube-face images can be imported into Unity, stitched into a sphere, and projected onto a skybox. The output quality depends on the quality of your cube images — sometimes you’ll need to do some retouching in Photoshop so the images stitch smoothly.
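Under the hood, going between cube faces and a sphere is just a change of coordinates. Here’s an illustrative sketch (not Unity’s actual importer, and the face layout is just one common convention) of how a point on a cube face maps to (u, v) coordinates on an equirectangular panorama:

```python
import math

def face_to_equirect(face, x, y):
    """Map a point (x, y in [-1, 1]) on a named cube face to (u, v) in [0, 1]
    on an equirectangular panorama."""
    # Direction vector from the camera through the face point.
    directions = {
        "front":  (x, -y, 1.0),
        "back":   (-x, -y, -1.0),
        "right":  (1.0, -y, -x),
        "left":   (-1.0, -y, x),
        "up":     (x, 1.0, y),
        "down":   (x, -1.0, -y),
    }
    dx, dy, dz = directions[face]
    lon = math.atan2(dx, dz)                    # longitude in [-pi, pi]
    lat = math.atan2(dy, math.hypot(dx, dz))    # latitude in [-pi/2, pi/2]
    u = lon / (2 * math.pi) + 0.5
    v = 0.5 - lat / math.pi
    return u, v

# The center of the front face lands at the center of the panorama:
print(face_to_equirect("front", 0.0, 0.0))  # -> (0.5, 0.5)
```

Stitching software runs this kind of mapping per pixel (in one direction or the other) and resamples the source image accordingly.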
Broadening our concept of skyboxes
So, a skybox can really be anything that creates the outermost limit of the scene you are designing. It could be a solid color sitting behind the menu in a static scene. It could be a real photograph or a playing 360° video. It could be the outer limit of a rendered 3D universe — you could be swimming underwater, floating in outer space, or walking around an animated experience. There are a lot of possibilities here.
Now that we’ve covered some basic skybox principles, let’s dive into the details of how to create them! In my next article, I’ve put together a few tutorials that walk through different cases and methods for making skyboxes.
If you want to try your hand at working with skyboxes in Unity, check out my next article, How to Design VR Skyboxes.
[1] The Celestial Sphere, Astronomy 201, Cornell University.
[2] Distance to the Horizon, Andrew T. Young, San Diego State University, Astronomy Department.
[3] VR Interface Design Pre-Visualisation Methods, Mike Alger.
[4] VR Design: Transitioning from a 2D to 3D Design Paradigm, Alex Chu, Samsung Developer Conference, 2014.
[5] Storytelling: From Rectangular Screen to VR, Eric Darnell, Samsung Developer Conference, 2016.
[6] Hands-on with Android 4.2’s Photo Sphere.