Exploring YouTube Drawing Tutorials with ARCore
Lately, there have been a lot of cool examples of developers using augmented reality to help people tap into their creative side. Whether that’s using AR to make art on a sketchpad, AR-enabled coloring books, or using AR to make art in the 3D space around you, something really delightful happens when you merge sketchpad and screen.
It got me thinking about how AR could help me with one of my new artistic hobbies: learning hand lettering. Hand lettering is a nice way to relax, but it’s also surprisingly tricky. Making lovely transitions between words and letters involves a lot of subtle tweaks: holding the pen correctly, applying pressure effectively, and making the correct arc shapes on the letterforms. Everything between your shoulder and the paper matters.
Without an IRL hand lettering teacher, my options were learning from worksheets or videos, but neither of them was exactly what I wanted. Printed worksheets help with the broad strokes (no pun intended), but for me, their step-by-step nature, divorced from the artist’s hand, doesn’t provide all that extra, subtle information that I find so useful. (Further, not all of my favorite hand letterers even have the bandwidth to make worksheets.) And while video tutorials have all that juicy detail, they have their own problem: namely, that I have to keep looking back and forth between what I’m doing and what’s on the screen. As you might expect, that doesn’t usually end well.
After I lamented several failed holiday cards at work, a few of us started talking about this back-and-forth glancing between the video and the notepad. We wondered: what if instead of having to choose between worksheets and videos, we could use AR and see those videos on the page in front of us? What would it be like to take some of these already-made YouTube tutorials around doodling, hand-lettering, and calligraphy, and draw along with them in AR? Could we crack open some cool learning experiences, without any extra effort on the video creator’s part? And what could we learn about AR video UX in the process? We decided to test it out and share the results.
To get insight from an expert, we invited the prolific doodler AmandaRachLee to come and help us find out. We were already fans of her, and we’re not alone: she has 1.3 million subscribers to her channel, which is all about breaking down the sometimes-tricky processes of calligraphy, drawing, and bullet-journaling into super-accessible pieces. Her friendly and helpful teaching style has cultivated a massive community of folks actively reproducing, practicing with, and remixing designs from her videos to make their own unique creations.
Read on to see how we made our prototype, what it felt like to use it, and some of the best practices and UI/UX affordances we found helpful along the way, which you might want to take for your own projects.
Making the prototype
There were a few technical considerations early on, before we even started thinking about UI/UX of YouTube in AR: placing the video on paper; retaining very high fidelity tracking; and getting YouTube videos in Unity.
Placing the video on paper
To set up our AR world, we created a new Unity ARCore project. From there, we were able to jump off from the HelloAR example very easily. The HelloAR example includes code for finding and visualizing planes, as well as for raycasting to anchor virtual objects on those surfaces. All we had to do was replace the default prefab with our AR video “tracing paper” (we’ll get more in depth on that asset below).
However, while placing the video is easy, keeping it in place — especially for a use case as precise as drawing — brings in some complexity.
Retaining very high fidelity tracking
Drawing is a tricky use case for augmented reality, because typically, we want to draw on smooth, blank surfaces — but a smooth, blank surface is frequently kryptonite for AR.
An AR app typically gets its understanding of the world by detecting what are called feature points. A feature point is any visually distinct feature in the captured camera image that can help the AR app orient itself. Areas that have unique textures, contours, and/or color changes make good feature points. It’s a bit similar to trying to orient yourself in an unfamiliar place in pitch blackness versus on a sunny day: it’s much harder to keep track of where you are if you have no distinct points to orient yourself with.
The thing about a sketchpad is that it typically has no feature points by design: it’s meant to be blank, flat, and uniform. So what can the AR app grab on to in order to orient itself precisely? And heavy emphasis on “precisely,” because the level of AR accuracy that you need to trace something is super high. While you might not notice if an AR chair in your living room shifts an inch or two, that amount of movement while trying to trace a drawing is massive.
This problem of high-fidelity tracking is one that AR developers have tackled in many different ways. Here are just a few of the many techniques that can help compensate:
- Locking the image to a point on-screen. Some AR art developers have bypassed all tracking issues by simply locking whatever image you want to trace to a particular screen location. It’s almost like a camera lucida (which is what one popular AR drawing app is named after). This technique typically requires a perfectly still camera, though (e.g. a tripod), and we knew we wanted our UX to support folks holding their phone in their hands and rotating, zooming, and dramatically changing viewing angles on the fly.
- AR markers made before drawing. Another tactic that AR art developers have used is unique markers, which help lock virtual objects to physical space by associating them with markers. These markers can be printed images or hand-drawn. AR markers are typically very strong anchors, and you can use multiple markers to make the tracking super-robust: if one or two markers are not in view, those apps still track. However, we chose not to use markers because they typically require an extra step from the user to start, and they may also require that the drawing surface be otherwise blank or an exact shape/size.
- Progressive AR markers and other computer vision/machine learning techniques. There’s been some super-interesting work around creating computer vision and machine learning algorithms that use the user’s own drawing as a set of progressive markers, which can rectify the tutorial image against the drawn image and keep everything in place. The folks behind SketchAR have been diving deep into these algorithms and have some neat write-ups on their approaches here and here. I think that approach is super-cool — but because we were just making a fast experiment to spark discussion with video creators, and share some UI/UX thoughts around YouTube drawing tutorials in AR, we decided to leave the hardcore CV to the product folks, and make our experiment with already-open-source resources instead.
- Stabilizing the environment. The last approach is to simply make sure the drawing environment is well-suited for what the tech can do well. In this case, that means simply putting the sketchpad on a surface that has texture and color differences, and keeping that surface at least a little in frame whenever possible.
At this point, we had to decide: was this project about solving the computer vision “white paper problem,” for a product-oriented approach? Or was it more of a hack to explore and spark discussion about the future of YouTube education in AR? We were most excited by the latter: getting a bunch of diverse folks in a room to poke at this prototype, dream up new use cases, test our various AR video UX affordances, and share the resulting insights with other developers. Since simply stabilizing the environment was the speediest tactic, and already worked fine for the videos of Amanda’s that we were using, we decided to leave the computer vision problems to the engineers and just go with the last option on the list: stabilizing the environment.
Getting videos into AR
To speed up development time further, we grabbed a YouTube player component from the Unity Asset Store. (Depending on your use case, you could grab the same asset, write your own custom asset, use a different video service, or just load in your own static videos with the Unity VideoPlayer.) This asset let us search YouTube and stream videos in our app.
From that point, we could build right on top of the sample HelloAR project, replacing the default Android prefab with our custom video player. That player prefab could then be placed right on top of a horizontal or vertical surface, normals aligned and ready to trace.
We had two main areas of interest when dreaming up our prototype:
- What new, AR-first video experiences could video creators like Amanda dream up?
- What features might video creators need to turn their already-existing videos into AR-friendly experiences, without requiring any video editing?
The second challenge was especially interesting to us. Amanda alone has been making videos for 5 years — that’s a lot of uploads! How could we make sure the process of making her prior videos AR-accessible required little-to-no work on her part? We focused on 6 main areas to help make that pipeline as easy as possible for creators and learners alike, which you may find handy for your own AR video projects.
Adjusting video position
Position adjustment has a pretty obvious use case — you might want to change where you’re drawing without losing your place, especially if you have a tracking hiccup and need to realign the position of the video. But we also saw it as a way to play with AR drawing as embellishment.
We saw two primary ways to use AR tracing paper: as a way to make one standalone drawing, and as a way to embellish an already-existing piece of art. Consider a video that shows you how to draw many things — like Amanda’s 50 Bullet Journal Doodle Ideas. In the first case, you might want to trace a drawing of a bike on a blank surface to make a standalone piece of art. In the second case, you might have a piece of art, or something already decorated (e.g. a bullet journal spread) that you want to add embellishments to. There, the AR can act as a kind of scalable and movable stencil. You might place the AR video down and trace a succulent here, then pause and move or scale the video to trace a quote there. It allowed for a looseness we really liked.
Most of Amanda’s videos are shot from directly overhead, but for videos that weren’t, we also considered adding skew/distort/perspective functionality. That could help compensate for videos that might be shot at a slight angle (or hey, to open up new avenues for retro word art!).
Compensating for the video’s original speed
Different YouTube artists film in different ways: some stick to real-time tutorials, while others like to show a timelapse of their process. We knew we would want to give the user control over the video’s speed, so viewers could blow past things they already understood and slow down things they wanted to focus on.
We also considered adding hold-to-pause functionality: i.e., if the user held a finger down on any part of the screen, the video would pause until they released. In our case, the pause button at the bottom of the screen was easy enough to hit that we didn’t feel we needed it, but it would be a nice addition for folks who hold their phones differently and might not have the precision to hit that button at the exact moment they need to.
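As a rough illustration of the speed control described above (a minimal sketch, not the prototype’s actual code), here is how a playback rate could be worked out in Python. The function name, the `timelapse_factor` parameter, and the clamp limits are all hypothetical, made up for this example. The idea is simply that a video filmed as a 4x timelapse needs to play at a quarter speed for the strokes to appear at roughly the pace they were drawn, and the viewer’s own speed preference is applied on top of that.

```python
def playback_speed(timelapse_factor, user_speed=1.0, min_speed=0.1, max_speed=4.0):
    """Return a playback rate for a tutorial video.

    timelapse_factor: how much the creator sped up the footage
        (1.0 = real time, 4.0 = a 4x timelapse). Hypothetical input:
        in practice the viewer would probably pick this by feel.
    user_speed: the viewer's own preference (2.0 = skim, 0.5 = study).
    The result is clamped to [min_speed, max_speed], assumed limits
    for whatever video player is in use.
    """
    if timelapse_factor <= 0:
        raise ValueError("timelapse_factor must be positive")
    speed = user_speed / timelapse_factor
    return max(min_speed, min(max_speed, speed))


# A 4x timelapse played back so the pen moves at roughly real-time pace.
print(playback_speed(4.0))       # 0.25
# A real-time tutorial the viewer wants to skim at double speed.
print(playback_speed(1.0, 2.0))  # 2.0
```

In a Unity project, a value like this could plausibly be assigned to the `playbackSpeed` property of the built-in VideoPlayer, though a third-party YouTube player asset may expose speed differently.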
Avoiding “gorilla arm”
Whenever you work in VR, AR, or gestural interfaces, you have to think about the so-called Minority Report problem: if the user needs to hold their arms up for a long time, their arms will probably get tired! Given that most of the drawing videos we looked at were at least 5–10 minutes long, we knew we could not simply assume that all users would be comfortable holding their arms out that long. We had to think critically about where AR mode was truly useful; where it maybe wasn’t needed; and how to avoid any AR-induced aches and pains.
We decided to add a fullscreen button, which would allow you to switch between AR mode and a more YouTube fullscreen-style mode. When you weren’t drawing, you could simply switch back to fullscreen mode and watch the video as usual. For fullscreen mode, we simply overlaid the video on top of the camera feed. The tracking was still safe and secure behind the scenes, and toggling back and forth was fast and easy.
It sounds simple, but thinking about AR as a short-term activity, rather than one used for a full 5–10 minute drawing video, sparked some interesting discussion around additional ways to use the app, outside of just continually holding the phone out over paper. For example:
- You could watch the bulk of the video comfortably in fullscreen mode, then switch to AR mode when you see what you want to draw.
- You could follow along in fullscreen (not AR) mode, and switch to AR to compare the video creator’s end product with your own, as a way to see exactly where your techniques might have differed.
A big part of designing an AR app is being honest about how long your users can and want to stay in AR. If your users are starting to lose circulation in their arms, that’s a problem! Think about ways that you can effectively toggle between an AR/non-AR experience to help make things more comfortable. You might find some interesting use cases along the way.
Seeing what you’re doing
One of the trickiest parts of turning already-existing YouTube videos into traceable AR objects is user visibility: the user needs to be able to see the video to follow along with it, but the user also needs to have a very clear sense of where their hand is in space. Most of us can’t draw very well with our eyes closed: when you can’t see what you’re doing, it’s easy to get disoriented and mess up.
So how could we make sure that our AR object allowed our user to see the video clearly and also their own hand clearly? We tackled this with two simple techniques: opacity and chroma keying.
Opacity refers to how solid the overall image appears, which is controlled by its alpha value. A higher opacity means the user will see more of the video and less of the drawing surface. A lower opacity means the opposite: the image becomes more transparent, and the user can more easily see through to the surface below. Our opacity slider lets a user set all the pixels of the AR image to somewhere between 0 (totally transparent) and 1 (totally opaque).
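To make the slider concrete, here is a minimal Python sketch of uniform opacity as a standard alpha blend of one video pixel over one camera pixel. This is an illustration under assumptions, not the shader from the prototype: the function name and the 0 to 255 RGB tuples are chosen just for this example.

```python
def blend(video_rgb, camera_rgb, opacity):
    """Composite a video pixel over a camera pixel with uniform opacity.

    opacity is the slider value: 0.0 shows only the camera feed,
    1.0 shows only the video. Colors are (r, g, b) tuples in 0..255.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(
        round(opacity * v + (1.0 - opacity) * c)
        for v, c in zip(video_rgb, camera_rgb)
    )


# At half opacity, every video pixel is mixed equally with the surface below.
print(blend((200, 100, 0), (0, 100, 200), 0.5))  # (100, 100, 100)
```

Note that the same opacity applies to every pixel, whether it belongs to an ink line or to the white paper in the video.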
But this didn’t quite solve our problem. Most drawing tutorials were set on white paper — and those white pixels were also getting projected on top of the user’s drawing surface. Those white pixels are unneeded information in this context: when a user is learning how to draw, what they need to see are the lines, and perhaps the artist’s hand — not the surface they’re drawing on. Opacity would allow us to make that white paper more transparent, but since it also applied to all the other pixels (not just the white ones), it also made the drawing harder to see — meaning opacity was not a good enough tool on its own.
This is where we started to explore chroma keying (which we refer to as “See-Through” mode in the app). Chroma keying is a technique that lets you replace a particular color in one image or video feed with something else: a different color, a computer generated background, etc. It’s effectively the same thing as green-screening. We realized that the uniform white surface could work to our advantage, because its uniformity meant we could key it out with relative ease: we could make those pixels transparent, so that the camera feed of the user’s surface showed through instead. The colored pixels, which represented the actual drawing and the artist’s hand, would remain. (We used white, because Amanda was drawing on white paper, but you could do this for any color.) This was a major help in making it feasible to trace along with already-made YouTube videos. Even the hackiest, jankiest chroma key shader made it a dramatically better experience.
A feature we didn’t end up implementing was using this shader to compensate for lines that, after chroma keying, were too faint to see. (We didn’t add this only because we could already see Amanda’s lines easily, and didn’t want to add complexity to what was supposed to be a quick experiment.) Let’s say the artist in the video is drawing with pale yellow ink, and our viewer wants to copy this on white paper. Even after the chroma key, the pale yellow might be hard to see on the white surface. No worries: we could just add a color tint to our shader, making the opaque pixels darker and easier to see. It’s a feature that could also be helpful for folks who generally need higher contrast to see well.
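For a feel of how a simple key like this can work, here is a minimal Python sketch, written per pixel the way a shader would run. It is an assumption-laden illustration rather than the prototype’s shader: the function names, the `threshold` and `softness` values, and the use of plain RGB distance are all choices made for this example. Pixels close to the key color become fully transparent, pixels far from it stay fully opaque, and a short ramp in between softens the edge.

```python
import math


def key_alpha(pixel_rgb, key_rgb=(255, 255, 255), threshold=60.0, softness=40.0):
    """Return 0.0 (transparent) to 1.0 (opaque) for one video pixel.

    The further a pixel is from the key color, the more opaque it stays.
    threshold and softness are hypothetical tuning values.
    """
    distance = math.dist(pixel_rgb, key_rgb)
    if distance <= threshold:
        return 0.0
    if distance >= threshold + softness:
        return 1.0
    return (distance - threshold) / softness


def see_through(video_rgb, camera_rgb, opacity=1.0):
    """Key out the paper color, then blend what is left over the camera feed."""
    alpha = opacity * key_alpha(video_rgb)
    return tuple(
        round(alpha * v + (1.0 - alpha) * c)
        for v, c in zip(video_rgb, camera_rgb)
    )


desk = (120, 90, 60)  # a camera pixel showing the user's own surface
print(see_through((250, 250, 250), desk))  # paper: (120, 90, 60), the desk shows through
print(see_through((20, 20, 20), desk))     # ink: (20, 20, 20), the line stays visible
```

Because the opacity slider only scales the alpha of the pixels that survive the key, the two controls can be used together.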
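Since we never built this feature, the following is only a sketch of one way such a tint could work, with a hypothetical function name and a made-up default strength. It pulls each remaining pixel part of the way toward a tint color (black by default), which darkens faint ink.

```python
def tint(pixel_rgb, tint_rgb=(0, 0, 0), strength=0.6):
    """Pull a kept (non-keyed) pixel toward a tint color.

    strength 0.0 leaves the pixel unchanged; 1.0 replaces it with tint_rgb.
    Both the default tint and the default strength are assumptions.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return tuple(
        round((1.0 - strength) * p + strength * t)
        for p, t in zip(pixel_rgb, tint_rgb)
    )


pale_yellow = (250, 240, 150)
print(tint(pale_yellow, strength=0.5))  # (125, 120, 75), a darker olive tone
```

A tint like this would be applied only to the pixels that are still opaque after the chroma key, so the keyed-out paper stays transparent.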
By combining opacity and chroma key, we were able to make it much easier to see where you were in relation to a given point in a drawing tutorial, in turn making it much easier for a viewer to follow along — without adding any burden to the video creator.
We had a blast testing our prototype out with Amanda, using everything from standard brush pens on paper to attempting to ice a cake in AR. As we tried out different videos, we talked about how this kind of video AR app might come in handy for both viewers and video creators.
A few topics came up that apply both to Amanda’s videos and to other kinds of educational content folks might make AR apps for.
The first was using AR videos as a way to communicate hand positions and angles. Amanda mentioned she frequently gets comments from folks who have a hard time understanding how to hold their pen and how to transition between different angles to lead to different line thicknesses. (I can confirm that this is a lot harder than it might look!) We really liked how this video approach let viewers see those small details, rather than just having static steps that don’t show the process in between.
We also liked how it could be used to generate AR practice worksheets for any subject the user likes. We used our AR app to practice our cursive English letters, but it also has potential for anyone who is trying to learn a different alphabet. For example, in Arabic, letters look different depending on where they are in a word (isolated, initial, medial, final) and which letters they’re next to. Letters can also take on entirely different shapes and flourishes depending on the calligraphic or handwriting style in use. Likewise, with languages like Chinese and Japanese, Hanzi and kanji have a particular stroke order to learn.
For newbies like me, who might have a hard time keeping track of where they are in a character or a word, worksheets really help build that muscle memory — but creators like Amanda just don’t have the time to translate their videos to paper and make worksheets for everything. Further, traditional worksheets can really only communicate the width and directionality of the resulting stroke — not how to get there. We loved the idea that this could serve as a kind of living worksheet generator for any educational scenario including curves, lines, and pens.
We also really enjoyed how bringing the video to AR personalized the experience, even without changing the content of the video. We could retain the helpful AR overlay while having the freedom to move around, zoom in, and adjust the video to fit whatever we wanted to draw on. It had a way of making it feel like the video creator was in the room with us. There was something consistently awesome about seeing Amanda’s title cards doodle to life on top of our own sketchbooks. All of us are really excited to see how that develops as more creators make AR-first educational videos.
We had a blast doodling with Amanda, and we hope this blog post has given you some useful ideas about possibilities for video learning in AR. You can read more about ARCore here — or if you’re just in it for the hand-lettering, you can subscribe to Amanda here. If you develop something cool, submit it to our Experiments with Google page. Have fun!