Community Spotlight

AI for AR: Role of Artificial Intelligence in Augmented Reality

An attempt to bring AR projections to life from text using Jina

Rishi R
Published in Jina AI
5 min read · Sep 6, 2022

Mixed Reality, The future?

Background

Artificial Intelligence (AI) is already used in sectors such as gaming, manufacturing, the military, and law enforcement. With the rapid growth of augmented reality (AR) development tools and the steady expansion of their functionality, developers expect AI to play an important role in building AR applications in the coming years. They also expect the technology to reach well beyond today's use cases as it matures.

Over the past decade, Augmented Reality has grown from a fringe concept into a technology that is about to change our lives. It is everywhere now, from smartphones to smart glasses to web browsers, and it is set to become even more ubiquitous. Usage of AR increases each day, which has pushed industry leaders such as Microsoft and Google to step up their AR efforts with products like the HoloLens and Google Glass. At the same time, some of the most successful companies in Augmented Reality, such as Meta and Magic Leap, have raised millions in investment for mixed reality.

What are AR Projections?

Augmented Reality (AR) projections are images overlaid on a real-world object or scene. They are a way of creating immersive visualizations in real time, allowing the user to see and interact with 3D content in the real world, whether that content is a virtual room or a representation of an actual person. These projections have applications in many fields.

Companies and marketers increasingly use AR projections to visualize their ideas and concepts in 3D. AR projections are known for giving a realistic view of an environment, and people can interact with them using their smartphones or other mobile devices. They are currently considered one of the most popular visualization types because they make it possible to add motion graphics, animation, and sound effects to an existing video or holographic projection.

What are some benefits of using AR projections to bring text to life?

Note-taking methods and reading from books have some limitations.

Traditional note-taking methods do not help students build their understanding of the materials in class. Reading from a book does not help students focus efficiently on the important information either, because they cannot visualize the context of a particular topic.

Data is recorded in multiple formats, and text has long been one of the most common ways of storing it. Textual data comes with many challenges, one of them being the lack of visualization, and the ability to visualize data plays an important role in analyzing it. The problem statement therefore focuses on converting two-dimensional text data into three-dimensional visualizations and then projecting them into a real-world environment through augmented reality.

Here are a few examples:

Obtaining a 3D mesh of a dinosaur from keywords such as ‘Dinosaur’ and ‘T-Rex’.

3D Mesh of a Dinosaur

Obtaining a 3D mesh of an airplane from keywords such as ‘Airplane’ and ‘Aircraft’.

3D Mesh of an Airplane

AR projections for an immersive reader’s experience

The next-generation software we are trying to create combines the power of artificial intelligence and augmented reality to bring AR projections to life from text. But how do we get to these AR projections for an immersive reader’s experience, and where does a neural search framework like Jina come into the picture?

The main focus here is to bring AR projections to users as seamlessly as possible. To achieve this, we will use a search engine to fetch the AR models, and not just any search engine but a neural search engine. Since the input can be anything from text to a picture, a neural search engine lets the user fetch the required output from any kind of input.

Where Jina fits in becomes clear from the overall approach of the application, which is as follows:

  • The user first scans content using a smartphone camera.
  • The text in the scan is recognized, extracted, and analyzed.
  • The extracted text is checked for contextual relevance (content filtering), so that only text that is feasible to project is kept.
  • The filtered candidates are shown to the user, who chooses the preferred output. This selection is the final input for the remaining steps.
  • After the user’s selection, the text is processed in one of two ways:
  • First, the input is run through a neural search framework such as Jina. If an existing 3D model is found, it is sent directly for projection.
  • If no open 3D model is available, the input is run through the neural search engine to find and collect 2D images, which are then converted into a 3D model that is ready for projection.
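
The last steps of this list, content filtering followed by the two-way lookup, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the prototype's code: the in-memory dictionaries stand in for a neural search index, and every name and path in it (`MODEL_INDEX`, `IMAGE_INDEX`, `images_to_mesh`, the `.glb` and `.png` files) is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-ins: a real system would query a neural search index
# (for example one built with Jina) instead of these in-memory dictionaries.
MODEL_INDEX = {
    "dinosaur": "models/dinosaur.glb",
    "airplane": "models/airplane.glb",
}
IMAGE_INDEX = {
    "bicycle": ["images/bicycle_front.png", "images/bicycle_side.png"],
}


@dataclass
class Projection:
    keyword: str
    model_path: str
    source: str  # "existing_model" or "reconstructed_from_images"


def extract_candidates(scanned_text: str) -> List[str]:
    """Content filtering: keep only words that can lead to a projection."""
    words = {w.strip(".,!?'\"").lower() for w in scanned_text.split()}
    return sorted(w for w in words if w in MODEL_INDEX or w in IMAGE_INDEX)


def images_to_mesh(image_paths: List[str]) -> str:
    """Placeholder for 2D-to-3D reconstruction; returns a made-up path."""
    return f"generated/{len(image_paths)}_views.glb"


def resolve(keyword: str) -> Optional[Projection]:
    """Prefer an existing 3D model, otherwise fall back to 2D images."""
    if keyword in MODEL_INDEX:
        return Projection(keyword, MODEL_INDEX[keyword], "existing_model")
    images = IMAGE_INDEX.get(keyword)
    if images:
        mesh = images_to_mesh(images)
        return Projection(keyword, mesh, "reconstructed_from_images")
    return None  # nothing to project for this keyword


# Usage: the scanned sentence yields three candidates; each one is resolved
# through the first branch that succeeds.
for keyword in extract_candidates("The dinosaur saw an airplane and a bicycle."):
    print(resolve(keyword))
```

Keeping the fallback inside a single `resolve` function means the projection step never needs to know whether a mesh was found directly or rebuilt from images.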

A quick look at the architecture diagram 👇

Planned Architecture of the prototype

Why Neural Search?

A neural search framework like Jina becomes crucial for problem statements where creating and maintaining a conventional database is not feasible. A neural search engine lets us retrieve the specific type of data we need regardless of the input format, whether that is text, images, tables, or anything else we feed it.
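
To make the idea of format-independent retrieval concrete, here is a minimal sketch of the mechanism neural search generally relies on: every item and every query is turned into a vector, and retrieval is a nearest-neighbour lookup by cosine similarity. It does not use Jina's API. The three-dimensional vectors are invented by hand and stand in for what a trained encoder would produce, and the names `INDEX`, `cosine`, and `search` are assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

# Hand-made vectors standing in for encoder output. In a real setup a trained
# model would embed text and images into the same vector space; these numbers
# and file names are invented for illustration only.
INDEX: Dict[str, List[float]] = {
    "models/dinosaur.glb": [0.9, 0.1, 0.0],
    "models/airplane.glb": [0.0, 0.2, 0.9],
    "models/car.glb": [0.1, 0.9, 0.3],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: List[float], top_k: int = 1) -> List[Tuple[str, float]]:
    """Return the top_k indexed items most similar to the query vector."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in INDEX.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# A query could come from a text encoder (the word "T-Rex") or from an image
# encoder (a photo of a plane); by this point it is just a vector either way.
text_query = [0.8, 0.2, 0.1]
image_query = [0.1, 0.1, 0.8]

print(search(text_query)[0][0])
print(search(image_query)[0][0])
```

Because both queries are plain vectors by the time they reach `search`, the same index serves text and image input without a separate schema for each format.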

The proposed methodology would not only help us obtain AR objects from text but also open up new possibilities for other formats, such as images, tables, and graphs. This would let us create a new environment where anything can be visualized.

Challenges

Every process comes with a few challenges. Here, they are:

  • Precisely identifying potential input objects
  • Handling AR projections that are not available anywhere
  • Converting 2D images to 3D models
  • Real-time precise projection positioning

Once these challenges are addressed, the application as a whole should work seamlessly and efficiently.

Conclusion

AR is an exciting technology with endless possibilities across different industries and fields, and it is only set to become more widespread. Given its potential in many areas, from medicine to education, the technology is unlikely to go away anytime soon. If you have a camera on your device (and who doesn’t these days?), you can easily try these effects for yourself, and this method of bringing anything to life using AR would make the experience more seamless than anything before it.

Student | Open Source Contributor | Open Source Developer | Ardent Programmer | GitHub Account - https://github.com/Rishi0812