The Stack of Any Live Augmented Reality Application Across Platforms
The absence of adequate information on Augmented Reality (AR) application architecture, combined with the plethora of confusing marketing articles about various frameworks, leaves many new application architects and product managers unsure about the real problem their product solves in the AR domain. They struggle to find the right solution for their specific business problem. Before deciding among the many software packages, technologies and frameworks available, one must understand the basic components involved in building any live AR application and what their corresponding responsibilities are. Hence, in this article I shall enumerate the major components involved in the architecture of any live AR application, their individual responsibilities, and how they interact in a pipeline to offer a complete solution. Below are the layers which interact with each other to make the immersive world a reality on any platform.
Business Logic Layer
This is the layer where the core of the solution runs, i.e. the nectar carrying the life of the sales, marketing and delivery heads. Jokes apart, it is where the actual problem of the user is resolved; it is the USP of the whole product. For example, if the solution is education using immersive technologies, then the business logic revolves around the enhanced learning scores of the students enrolled per class, the count of subject-matter specialists, and the status updates and communication between students and subject-matter specialists using immersive technologies.
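As a rough illustration of what "business logic" means for the education example above, the layer could be modelled around a few simple entities. The names (`Student`, `LearningSession`, `averageScore`, etc.) are purely hypothetical and only sketch the idea, not a prescribed schema:

```typescript
// Hypothetical domain model for an education-focused AR product.
// Names and fields are illustrative assumptions, not a prescribed schema.
interface Student {
  id: string;
  classId: string;
  learningScore: number; // enhanced learning score tracked per student
}

interface SubjectMatterSpecialist {
  id: string;
  subject: string;
  online: boolean; // availability for live immersive sessions
}

interface LearningSession {
  studentId: string;
  specialistId: string;
  statusUpdates: string[]; // communication exchanged during the session
}

// Example of business logic: average learning score for a given class.
function averageScore(students: Student[], classId: string): number {
  const enrolled = students.filter(s => s.classId === classId);
  if (enrolled.length === 0) return 0;
  return enrolled.reduce((sum, s) => sum + s.learningScore, 0) / enrolled.length;
}
```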
Graphics Rendering Layer
This layer assists in rendering graphics on surfaces such as mobiles, browsers, desktops, Head Mounted Displays (HMDs), etc. Many frameworks are available in the market, but the major share is taken by OpenGL, which assists rendering on iOS and Android mobile devices as well as desktops. A stripped-down but more optimized version is also available in web browsers under the name WebGL. WebGL is based on OpenGL ES 2, not plain OpenGL (the ES stands for "Embedded Systems"). OpenGL ES is essentially a subset of OpenGL, and WebGL is almost the same as OpenGL ES 2, with some subtle differences.
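A minimal sketch of what this layer boils down to in a browser: obtaining a WebGL context and clearing the drawing surface. It assumes an HTML page containing a `<canvas id="ar-canvas">` element (the id is an assumption for illustration):

```typescript
// Minimal WebGL setup sketch (assumes a <canvas id="ar-canvas"> in the page).
const canvas = document.getElementById('ar-canvas') as HTMLCanvasElement;
const gl = canvas.getContext('webgl'); // WebGL ~ OpenGL ES 2 in the browser

if (!gl) {
  throw new Error('WebGL is not supported on this browser/device');
}

// Clear the surface to an opaque colour; a real AR app would instead draw
// the camera feed as a background and render virtual objects on top of it.
gl.viewport(0, 0, canvas.width, canvas.height);
gl.clearColor(0.0, 0.0, 0.0, 1.0);
gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
```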
Communication Layer
This layer handles the real-time requirements of an AR application. It performs many jobs: capturing the media stream, processing media, generating images/frames from the live feed in contextual order, and exchanging these frames over the network for further processing. For a long period this space was reserved for commercial players, but the advent of open-source technologies like WebRTC resolved multiple concerns such as network security, adaptive and scalable media codec processing, network (NAT) traversal, and media resource management. It acts as an input to the AR layer, providing consistent frames with the required temporal-spatial information at different sizes and resolutions. There are also multiple commercial players in this field, such as Frozen Mountain, Agora.io and OpenTok, which provide more polished solutions at a significant cost.
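A hedged sketch of what this layer does with WebRTC in a browser: capture the camera stream and attach its tracks to a peer connection so frames can be delivered to the remote side. Signalling (exchanging SDP offers/answers and ICE candidates) is assumed to exist elsewhere and is omitted:

```typescript
// Sketch: capture the camera and hand its tracks to a WebRTC peer connection.
// Signalling (exchanging SDP offers/answers and ICE candidates) is assumed
// to be handled elsewhere and is intentionally omitted.
async function startCapture(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { width: 1280, height: 720 }, // resolution hint for the frames
    audio: false,
  });

  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
  });

  // Each video track becomes a stream of frames delivered to the remote
  // peer (e.g. a server doing the heavy AR processing).
  stream.getTracks().forEach(track => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // offer.sdp would now be sent to the remote peer via the signalling channel.
}
```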
AR Framework Layer
This is the layer where the main AR-related processing is done. AR involves creating a scene or virtual object, positioning it on a real object, detecting points of interest on real objects, and tracking them in the real 3D world from their spatial geometry. This information is extracted from the frames received from the communication layer, mixed with the shared context. At its core this layer deals with computer vision algorithms for image processing, applying Machine Learning/Deep Learning (ML/DL) algorithms to build models, and later using these models for object detection and tracking on newly received images. As this is a lot of work in itself, many players, both open-source and commercial, saw an opportunity here and developed really good frameworks such as ARCore, ARKit, ARFrame and AR.js. All of them fundamentally use computer vision in one form or another to build feature statistics for feature detection and tracking of objects in real time.
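As a browser-side illustration of what this layer exposes to an application, here is a sketch using the WebXR Hit Test API, one of several possible entry points into AR frameworks (ARCore and ARKit expose analogous concepts natively). It assumes a WebXR-capable browser and casts to `any` where the standard DOM typings do not yet cover WebXR:

```typescript
// Sketch: start an AR session and use hit testing to find where a virtual
// object could be anchored on a detected real-world surface.
// Casts to `any` are used where standard DOM typings lack WebXR types.
async function startArSession(): Promise<void> {
  const xr = (navigator as any).xr;
  const session = await xr.requestSession('immersive-ar', {
    requiredFeatures: ['hit-test'],
  });

  const viewerSpace = await session.requestReferenceSpace('viewer');
  const refSpace = await session.requestReferenceSpace('local');
  const hitTestSource = await session.requestHitTestSource({ space: viewerSpace });

  session.requestAnimationFrame((time: number, frame: any) => {
    const hits = frame.getHitTestResults(hitTestSource);
    if (hits.length > 0) {
      // The pose describes where the device's view ray intersects a detected
      // real-world surface; a renderer would place the virtual object there.
      const pose = hits[0].getPose(refSpace);
      console.log('Surface hit at', pose.transform.position);
    }
  });
}
```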
Hardware Layer
To enable upper-layer functionalities such as object detection and tracking computation, capturing user movement, capturing wide-angled images, etc., the hardware layer comes to the rescue. It provides multiple capabilities: higher-order computation using GPUs and TPUs, tracking changes in motion using the accelerometer, a live feed of the user's surrounding environment on which AR objects can be overlaid using the camera, and measuring the angular velocity and orientation/inclination of the device using the gyroscope.
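In a browser, some of these hardware signals are exposed through standard web APIs; a small sketch is below. The accelerometer and gyroscope readings come from `DeviceMotionEvent`, and some browsers (e.g. iOS Safari) require the user to grant permission first, a step assumed to be handled already:

```typescript
// Sketch: reading accelerometer and gyroscope data in the browser.
// Permission prompts required by some browsers are assumed to be handled.
window.addEventListener('devicemotion', (event: DeviceMotionEvent) => {
  // Accelerometer: change in motion, including gravity, in m/s^2.
  const accel = event.accelerationIncludingGravity;
  // Gyroscope: angular velocity around each axis, in degrees per second.
  const rotation = event.rotationRate;

  if (accel && rotation) {
    console.log(`accel x=${accel.x} y=${accel.y} z=${accel.z}`);
    console.log(`gyro alpha=${rotation.alpha} beta=${rotation.beta} gamma=${rotation.gamma}`);
  }
});

// Camera: the live feed of the surroundings onto which AR content is overlaid.
navigator.mediaDevices
  .getUserMedia({ video: { facingMode: 'environment' } })
  .then(stream => {
    const video = document.createElement('video');
    video.srcObject = stream;
    return video.play();
  });
```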
All these layers work in conjunction to bring a real-time AR experience to users. The objective of this article was to separate out the concerns as different layers and help product owners and architects choose the relevant framework for each concern, rather than getting confused into thinking that a communication-layer framework or solution can resolve an AR-layer issue, or vice versa. These are the things marketing articles generally hide, for obvious reasons. With this I end the article here. Happy ARing!