Screencast: what, why, and how?

Published in

Learning is FUN

9 min read · Mar 1, 2018

Are you a frustrated instructional designer looking for the perfect tool for your tutorial video? Have you ever had an existential crisis about what it really means to make a screencast video? You have come to the right place.

In this article, I will do my best to cover the what, the why and the how. I will discuss the origin of the idea, its underlying philosophies, the different types of screencasts, the tools currently available on the market, workflow problems and, finally, my thoughts on alternative practices, in case you decide after reading all this not to pursue this route!

Where does the idea come from?

Screen capture has been a handy tool since the advent of the computer.

A bit of clarification first, and some history.

When we talk about screen capture we may refer to the act of:

Producing a “hardcopy” of the text-based terminal in front of you by pressing the “PrtSc” button on your keyboard (a key that persists to this day!)
Capturing a static image of your whole desktop.
Making a video file of what you see on your screen, including your mouse movement, and sometimes a narration.

It is obviously the third kind that we are primarily concerned with here.

I have yet to see a computer historian do the dirty work, but to my knowledge, the first tool that could capture your screen in a video format came in the 1990s. It was a product called ScreenCam, by the now-defunct Lotus company. A May 9, 1994 InfoWorld article describes it as follows:

Launching ScreenCam displays a small, VCR-like control panel, with Record and Stop buttons. When you are recording, everything you do in Windows is captured into an animated movie file that precisely record your actions.

Sounds familiar? Yes, it is very much the same thing we do these days. We are still stuck with the VCR metaphor.

One important feature of the ScreenCam software is that it records Windows events instead of every single frame displayed on the screen. This is important, and it calls for a fourth definition of the term screen capture.

4. Making a series of static images (screen shots), complemented by human interactions (mouse movements and keyboard clicks) between them.

In the case of ScreenCam, this notion of screencast served to reduce the computational power needed. Today, when that is no longer a concern, it serves to break down a chain of actions into smaller units that can be precisely defined, and it therefore lends itself to a mode of technology training called simulation. More on this later.
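To make the fourth definition more concrete, here is a minimal sketch of how such a recording might be represented: a list of steps, each holding one static screenshot and the user actions performed on it. The names `InputEvent`, `Step` and `split_into_steps`, the file names, and the rule of starting a new step after every click are all assumptions made for illustration; this is not how ScreenCam or any particular tool stores its data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputEvent:
    """One recorded user action (hypothetical format)."""
    time_s: float  # seconds since the recording started
    kind: str      # e.g. "move", "click", "key"
    detail: str    # free-form description of the action


@dataclass
class Step:
    """A static screenshot plus the actions performed on top of it."""
    screenshot: str
    events: List[InputEvent] = field(default_factory=list)


def split_into_steps(events: List[InputEvent]) -> List[Step]:
    """Start a new step after every click (an assumed, simplistic rule)."""
    steps = [Step(screenshot="shot_000.png")]
    for event in events:
        steps[-1].events.append(event)
        if event.kind == "click":
            steps.append(Step(screenshot=f"shot_{len(steps):03d}.png"))
    # Drop a trailing step that never received any events.
    if len(steps) > 1 and not steps[-1].events:
        steps.pop()
    return steps


log = [
    InputEvent(0.5, "move", "x=120,y=48"),
    InputEvent(1.2, "click", "File menu"),
    InputEvent(2.0, "move", "x=130,y=90"),
    InputEvent(2.4, "click", "Save As"),
    InputEvent(3.1, "key", "report.txt"),
]
steps = split_into_steps(log)
for number, step in enumerate(steps, start=1):
    print(number, step.screenshot, [e.kind for e in step.events])
```

Because each step is a small, self-contained unit, it can be replayed, reordered or turned into a "try it yourself" exercise, which is roughly what a simulation needs.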

A recording of your screen is not a screencast

Right now let us stay with the third definition: you produce a video file of what happens on your screen.

However, simply recording your screen is not enough to bring your audience’s attention to where it should be in the elearning context. Certain features of the tools have therefore been fine-tuned. Hence the suite of what I call “special effects”, which are increasingly regarded as essential additions to the simple and straightforward screencast. These effects include, but are not limited to:

zooming in and out to highlight a certain area of the screen.
panning to highlight how the mouse is moved or how one area is connected to another.
magnifying the mouse pointer (or changing it into some other shape) and its trace to make it more discernible.
using markers (text and icons) to explain a specific thing on screen.
using an exaggerated clicking sound to imitate mouse clicks or keyboard input.
using animations (flashes) to imitate mouse clicks.
sectioning with lower thirds, such as captions for steps, chapters etc.
branding with a logo or watermark.
slowing down: this is presumably done by manipulating the timeline.

If we are dealing with a touch screen, you need to visually indicate gestures in a way that is self-explanatory: here is a tap, here is a swipe, here is a pinch, etc.

More importantly, the role that a screencast video may play needs to be contextualized. Many people tend to think of the screencast video as a standalone resource, as if the only thing we need to do is make such a video and throw it out there. Technically this is true, and it certainly reflects the humble origin of the tool, as well as the whole experience of convenience and complete passivity popularized by YouTube.

But as recent research has found, video is not the solution to everything. People still prefer to read when it is a more efficient way of gathering information. Seriously, if you have been watching tutorial videos, count the time you have wasted listening to all sorts of nonsense!

Even if video were superior to words (in some cases), better still is a video that invites active participation. This could be the future of YouTube. And Facebook is actively seeking to incorporate video as a community-based, collaborative and highly interactive feature.

Workflow Dilemmas

To record a screencast, the first thing to decide is whether we need a camera feed. In other words, do you want a talking head, sometimes taking up the whole screen and sometimes appearing as picture-in-picture (PIP)? The personal touch is useful under certain circumstances but is generally not needed in the elearning context.

The second concern is the voiceover. Many people record their screen and just throw some text on it, or use a background music track. This shows how intimidating the task of creating a voiceover can be. It requires careful planning, skillful execution, reasonably competent hardware (which we have to purchase), and often painful post-production.

The final piece of the puzzle is the subtitle. If you are a YouTuber who just throws videos out there, this is not your concern at all. But many organizations require ADA-compliant subtitles on the videos they produce.

Looking at the three pieces (video, voice, subtitle) together, we ask: in what order do we produce these? The answer is often: it depends.

If a screencast is produced in a professional scenario, you often need a script written and approved in advance. You could use this script to make the video first, and then add narration. The order could well be: subtitle, video, voice.
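If the approved script is the starting point, it can be turned into a subtitle file almost mechanically. The sketch below writes script lines as cues in the common SubRip (SRT) layout: a running number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line. The function names and the sample timings are assumptions for illustration; before the voice is recorded, the timings can only be estimates that need adjusting later.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def script_to_srt(cues: List[Tuple[float, float, str]]) -> str:
    """Turn (start_seconds, end_seconds, text) cues into SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical script lines with estimated timings.
cues = [
    (0.0, 3.5, "Open the File menu."),
    (3.5, 7.25, "Choose Save As and type a file name."),
]
srt_text = script_to_srt(cues)
print(srt_text)
```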

I have also heard that some prefer to record the narration first and then play it back while recording the video. This amounts to “do what is said”. It sounds easy, but you do need to manipulate the video a bit if the system cannot keep up with your pace.

On the other hand, one could argue that this is precisely the reason to do narration first. If you are recording while looking at the video, how do you manage to look at the script?

Either way the fact that we have a script creates a dilemma we cannot completely obliterate: reading from a written script would sound unnatural. But improvising when recording or watching the video is not an acceptable option either.

Where is the Challenge?

Making a screencast is a time-consuming process. This is not because it is difficult to record your screen while mumbling; numerous online tutorials will show you that it is not. But talking while recording does not produce the kind of screencast I can afford to use.

To achieve a certain quality standard, I need the voiceover (VO) and the video to be recorded separately, and by different people. I also need to write a script beforehand to make sure I cover all the ground. Finally, I may change the script a little after a first attempt at the video, as this is when I may notice ordering problems or missing points. Despite all these challenges, the thing can still be done, but it becomes a time-consuming process. You may need to spend a whole afternoon establishing precise synchronization between your video and your VO script, and that is too long by current industry standards.

Here are some of the most commonly encountered issues in the process:

The VO is still talking about something while the video moves on to other things. Solution: freeze the video (sometimes called extending the frame) until the VO is done.
The VO has finished talking while the video is still going. The solution is simple: split the audio and leave a gap in the soundtrack.
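The two fixes above boil down to comparing, segment by segment, how long the video runs and how long the voiceover runs. Here is a minimal sketch of that bookkeeping; the function name `sync_plan`, the segment labels and the durations are made up for illustration, and in practice you would read the durations off your editor's timeline.

```python
from typing import List, Tuple


def sync_plan(segments: List[Tuple[str, float, float]]) -> List[str]:
    """For each (label, video_seconds, voiceover_seconds), say what to pad."""
    plan = []
    for label, video_s, vo_s in segments:
        diff = round(vo_s - video_s, 2)
        if diff > 0:
            # The voiceover runs longer: hold the picture until it is done.
            plan.append(f"{label}: freeze the last video frame for {diff:.2f}s")
        elif diff < 0:
            # The video runs longer: leave silence after the voiceover.
            plan.append(f"{label}: leave a {-diff:.2f}s gap in the soundtrack")
        else:
            plan.append(f"{label}: already in sync")
    return plan


# Hypothetical segments: (label, video length, voiceover length) in seconds.
segments = [
    ("Open menu", 4.0, 6.5),
    ("Save file", 9.0, 7.0),
    ("Close app", 3.0, 3.0),
]
plan = sync_plan(segments)
for line in plan:
    print(line)
```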

A more challenging problem is that, while you are recording, you realize you should have said something else. You want to change the script. If this is a simple matter of “let’s add one more point”, you can probably squeeze it into the soundtrack, but if it involves doing things in a different order, then the video part is not as easy to fix as the audio.

This calls for the following workflow: rehearse the video using a text-to-speech (TTS) voice; if you find that something needs tweaking, change the TTS text and do it again. Nothing is lost. Of course, the pacing of the TTS voice will be a little different from that of the real voice. When the real voice arrives, you need to sync the video to it again.

Choices of Tools

Even in the context of elearning, there are many tools that can deal with screencast. The problem is: these tools overlap in terms of features and it is not always easy to figure out a workflow that fits your scenario. For this purpose I have divided the tools into three separate categories:

Prototyping tools: rapid and hassle-free recording.

Peek: available both on Windows and Mac. It records and uploads immediately so you can share it with somebody else. No editing.
SnagIt: a tool by TechSmith that is far superior to anything in its range. It captures images and video. It can record panoramic or scrolling images. It can do basic video trimming. It can produce GIFs. It can do simple markup. It even has an asset management system, which is really useful if you are making tons of captures. This is an essential tool.

Specialized tools: these tools can not only record, but also do some post-production editing. However, they are not full-fledged video editing tools. Rather, they want to strike a sweet balance between features and ease of use.

Replay (a standalone product in Articulate 360): Windows only; a recording tool that can take three tracks (screen, webcam, lower thirds) and conveniently switch between them, or use PIP. Basic timeline editing is provided.
Camtasia: one of the best-known tools of the trade. It is positioned as a video editing tool, but it is very different from general-purpose video editors such as Premiere, Final Cut Pro or even iMovie. It really addresses the special needs of screencast production.
ScreenFlow: a native Mac app that is similar to Camtasia. I tried it and found that even the Mac version of Camtasia (which everyone knows is a little brother to the Windows version) is far superior.

Heavy-weight tools: these are not screencast software per se but have the functionality built in. More importantly, they don’t record the screen the same way. The output is not a single video file, but a combination of several files: one for the visuals, one for keyboard and mouse actions.

In these tools, the philosophy is this: since working with in-slide interactions already involves sophisticated manipulation of the timeline, zooms, pans and markers simply add more elements to that timeline. Making a screencast should therefore involve minimal additional learning for designers.

Storyline (Windows only): the screen recording feature inside Storyline gives you some extra features compared to Replay, which are essential in some cases:
* Record system sounds.
* Record the video as step-by-step slides and, later, as simulations.
* Move new windows: this is designed to deal with interactions across multiple windows.
* Zoom and pan.
* Add markers to a particular point on the screen and timeline.
* Support subtitles.
Adobe Captivate: achieves a similar set of functionalities.

It is worth mentioning that in these two elearning authoring tools, screen recording is basically understood as adding something to the slide timeline. There already is a timeline! Therefore you don’t need to learn anything new.

This philosophy makes perfect sense, assuming that we are talking about people who are already deeply entrenched in building slide interactions. But a large percentage of IDs are not. In fact, slide-based elearning is no longer something we can take for granted. This may come as a shock to some, who instantly picture elearning as slides, just as people think of PowerPoint when they think of a presentation.

But for the topic at hand, the biggest difference is that, in the context of a slide, a video is just one element on the timeline. By adding other elements to the timeline we can work “within” the video, showing things or changing the video itself according to timecode. In the webpage-based format, however, the video remains an indivisible unit. It is embedded in a web page. While we can jump to any point in the video, all other activities on the web page are suspended, as they exist essentially in a timeless fashion.
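To illustrate what working “within” the video means, here is a minimal sketch of a slide timeline in which the screen recording is just one element among others, each with its own start and end timecode. The class `TimelineElement`, the function `active_at` and the sample elements are hypothetical; neither Storyline nor Captivate is claimed to store its timeline this way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimelineElement:
    """Anything placed on the slide timeline (hypothetical model)."""
    name: str
    start_s: float  # timecode at which the element appears
    end_s: float    # timecode at which the element disappears


def active_at(timeline: List[TimelineElement], time_s: float) -> List[str]:
    """Return the names of all elements visible at the given timecode."""
    return [e.name for e in timeline if e.start_s <= time_s < e.end_s]


timeline = [
    TimelineElement("screen recording", 0.0, 30.0),
    TimelineElement("zoom on toolbar", 5.0, 9.0),
    TimelineElement("marker: Save button", 7.0, 12.0),
]
print(active_at(timeline, 8.0))
print(active_at(timeline, 20.0))
```

On a plain web page there is no such shared timeline: the embedded video knows its own timecode, but the surrounding text and images do not react to it unless someone builds that link by hand.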