List of skills and equipment required to start an Augmented or Virtual Reality side-project

Andriy Levitskyy
4 min read · Apr 17, 2020


  • 3D modelling and animation software
    You will need this for both Augmented Reality (AR) and Virtual Reality (VR): you either create objects to insert into the real world or design your own virtual worlds. And the answer to all your needs is always BLENDER! It is top-of-the-class and OPEN-SOURCE, which means FREE. And there are so many tutorials on how to use it. All you need to master the technology is your time. This course takes around a week of full-time work to complete; add another week to practice. Two weeks and you are ready to go!
  • VR headset (for VR only)
    I cannot really say much about VR because my experience is mostly in the AR space, but the cheapest headset you can get is Google Cardboard, which costs only around 40 dollars. You would also need to learn a bit of Unity, as Unity seems to be the best way to build a VR project. To use Unity you need to bring your C# skills up to at least an intermediate level, so add another month to your learning schedule.
  • Video processing
    You need to be able to break a video into frames, apply some operation frame by frame, and then write the result back out as video. FFmpeg (the default program for working with video) is fairly simple to use, so the time needed to master video processing depends on how ambitious you are about what to do. Simple operations, like polygon or circle detection, can be done using out-of-the-box OpenCV instruments and, given good knowledge of Python, should also take a couple of weeks to master. If you have never programmed in Python, add another few weeks to get some grasp of the language.

    If you want to do something more interesting or advanced, you will soon find out that what classical computer vision techniques can achieve is quite limited. To achieve more, one needs to learn convolutional neural networks and deep learning. At this point you may start thinking that my claim about the accessibility of the technology was a bit of an overstatement if one has to learn AI to master it. I still believe it is accessible, and I will make another big claim: learning deep learning (no pun intended) is no harder than learning any other computer technology, such as React.js or cloud computing.

    Essentially, if you check out fast.ai, the first seven lessons should be enough to let you train neural networks on your own dataset, and that would take you only a couple of hundred lines of Python code. The free Practical Deep Learning for Coders course would also give you a good understanding of what state-of-the-art AI is capable of. As developments in deep learning move faster than books, university courses, and tutorials are written, you need to keep following the community (including the fast.ai forums, OpenDataScience, the PyTorch forums, and kaggle.com), as some of the hacks known to practitioners may not be available in blog posts or tutorials.

    To conclude: given some knowledge of Python, you need around one month to get a basic understanding of video-processing techniques, but then you need to keep learning continuously. You may also need a computer or server with a GPU, though you can use some of the free resources provided by Google Colab.
  • Camera tracking (depends on the project)
    The problem of camera tracking is the reason why Extended Reality is still a low-competition space, and the reason why there are not many successful startups out there. The formulas behind the camera model are very simple and are described here. There are also many ways to find the position of the camera in the world, including sensors such as gyroscopes and accelerometers, or using markers (static features of the scene). However, compared to deep learning, where you typically use very similar algorithms in different environments, the process of camera calibration is much more creative, not very well documented, and dependent on the environment you are working with.

    For some environments, where a person just sits and rotates their head, a gyroscope is enough. In some environments the camera won't be moving, which makes things easier. In other situations the camera will be moving and zooming, and there won't be many markers. The environment you pick will determine how much time and effort you will need to put in. For more technical details about the camera calibration process, check out this blog post of mine.

    The cool thing is that many things which were impossible before are possible now thanks to advances in deep learning and differentiable rendering techniques. These techniques were developed very recently and are becoming publicly available right now, before the market has had time to react! Camera tracking is full of tricks known only to a small circle of practitioners, and it is also wide open to new ideas. Thus it is again of the utmost importance to engage with the community and find some mentors.

    To start with, I would spend the equivalent of around two weeks of full-time work going through the OpenCV tutorials, understanding how the tracking extensions in Blender work, and getting to know some theory from this Bible (don't worry if you don't understand most of it; read it after getting familiar with OpenCV).
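To make the "very simple formulas" behind the camera model concrete, here is a minimal sketch of the pinhole projection p ~ K(RX + t), where K is the intrinsic matrix and R, t the camera pose. The focal length and principal point below are illustrative numbers, not calibrated values.

```python
import numpy as np


def project_points(K, R, t, pts_3d):
    """Pinhole projection: map Nx3 world points to Nx2 pixel coordinates."""
    cam = pts_3d @ R.T + t           # world frame -> camera frame
    uvw = cam @ K.T                  # apply the intrinsic matrix
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide by depth


# Illustrative intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)  # camera at the world origin, no rotation

pts = np.array([[0.0, 0.0, 2.0],   # point straight ahead, 2 m away
                [0.4, 0.3, 2.0]])  # point offset right and down
pixels = project_points(K, R, t, pts)
print(pixels)  # the first point lands exactly on the principal point (320, 240)
```

Camera tracking is essentially the inverse problem: given the observed pixel coordinates, recover R and t.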

Once you have the background, it is all learning by doing and interacting with the community. The OpenDataScience community was very helpful for me; without it, I would have spent ten times longer getting to the level I am at now. I think it should take around a month to become more confident about what you are doing.
