In pursuit of a decentralized large-scale simulation of the world
--
Background
More than a year ago, Visualix from Berlin, Germany embarked on a remarkable journey into the visual perception space. We created an intelligent multi-camera system, enabling us to capture semantic meaning (mostly movements and item interactions) from visual input. The system has been tested by our first customers and has attracted the attention of new partners. Along the way, we created deep learning models for person re-identification and learned a great deal about synchronization, optimization and multi-view geometry. We then used our system to create our first shared (persistent) AR experience (see below).
We experimented with descriptors, synchronization in the multi-camera system and calibration. On the mobile side, we built on top of the existing solutions from Apple (ARKit version 1) and Google (ARCore SDK for Android preview). We have known ARKit and ARCore from the very beginning, and building on top of them allowed us to win the first deals in the space. But those technologies alone would not solve large-scale navigation problems for our customers. We had to look further.
The need for robust re-localization at scale made us focus even more on algorithms, optimization and scalability. Our goal was to enable persistent large-scale AR use cases by advancing CV-based visual perception for localization and navigation in fast-changing environments.
Since then, our team has grown to around twelve people (mostly computer vision engineers and physicists), we have built our product suite and sold it to large enterprise customers around the world. Our technology has strong scientific foundations, with around ten patent-pending technologies in the localization and mapping space.
Let’s start from the beginning…
First Steps
We started in “garage” mode, using regular cameras, and created an intelligent multi-camera system that gained an understanding of the spatial attributes of the world and the objects moving within it. That was a rewarding experience!
We were, indeed, reconstructing interactions in the world from camera input. It took quite a bit of technology to make multi-camera systems intelligent under different lighting conditions, in spaces with heavy shadows or reflective surfaces, and so on.
Then we realised the potential (and limitations) of mobile phones, and understood that our technology could power both on-device computation (some of our algorithms run on the mobile side) and edge-computing settings where computing power is distributed (the default situation, in which heavy computing is not done on the mobile alone). We also realised that we could be one of the very few players able to crowdsource mapping (given the limitations of mobile phones) by creating innovative user- or business-facing experiences. While our solution was being tested at medium scale, we kept working on making it more accurate and robust.
Eventually, we combined the best of the learnings from the regular-camera world and the mobile world into a device-agnostic solution. The heavy computation takes place on the Visualix server (which handles feature extraction, indexing and matching, cluster identification, outlier removal etc.), while only a few lightweight algorithms run on the mobile device. This lets mobiles achieve more, for longer, without hurting user experience (low battery drain, no “running hot”). The Visualix server can run in the cloud as well as on premise. The new capabilities we bring can serve as one of the backbone technologies for existing infrastructure solutions, as well as empower solutions in a more decentralised world. The ability to map, reconstruct and simulate could grant superpowers to otherwise regular nodes in the global network.
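To make that split concrete, here is a minimal sketch of what a single client-server localization round trip could look like. Everything in it is an assumption for illustration: the endpoint URL, the payload fields and the query_pose helper are hypothetical, not our actual API.

```python
import requests  # standard HTTP client; the endpoint below is a placeholder

SERVER_URL = "https://visualix.example/api/localize"  # hypothetical endpoint

def query_pose(jpeg_bytes: bytes, intrinsics: dict) -> dict:
    """Send one camera frame to the server and get a 6-DoF pose back.

    The phone only captures and compresses the frame; feature extraction,
    indexing/matching and outlier removal happen server-side, which is
    what keeps battery drain and heat low on the device.
    """
    response = requests.post(
        SERVER_URL,
        files={"frame": ("frame.jpg", jpeg_bytes, "image/jpeg")},
        data={k: intrinsics[k] for k in ("fx", "fy", "cx", "cy")},
        timeout=5.0,
    )
    response.raise_for_status()
    # Illustrative reply: a rotation quaternion and a translation vector,
    # both expressed in the coordinate frame of the stored map.
    return response.json()  # e.g. {"rotation": [...], "translation": [...]}
```

Sending a compressed frame is just one option for such a design; a client could equally extract compact descriptors on-device (the “some algorithms on the mobile” case) and send those instead.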
Vision and Current Capabilities
We believe we can push forward the creation of a digital twin of the world. To achieve that, we have created software that enables businesses to:
- Map large spaces (and store those maps in the cloud, on an on-premise server or in a decentralised fashion)
- Understand the semantics of interactions in fast-changing environments (we create algorithms to understand spatial context better, mostly for shared experience use cases)
- Enable robust reconstruction and localization at scale
- Create shared experiences on top of existing AR solutions for Android and iOS devices (our solutions provide higher accuracy and robustness, and address battery drain and “running hot” issues)
To reach our vision, we built a product suite for persistent AR experiences, which has been deployed by large European and international companies.
Our Product
We have developed the Visualix Product Suite (on the mobile side we use ARKit/ARCore as well as our patent-pending technologies; on the server side, Visualix Server), which allows businesses to:
- map their venues with mobile phones using our API (it can also run with a regular stereo camera, if you wish; see the sketch after this list)
- place their AR content using our CMS (content management system; we currently offer a simple one with a 2D view, and soon a 3D view rendered in the browser)
- integrate our AR viewer into their existing applications (you can see this part in nearly all of our videos; our technology can be integrated into virtually any application)
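As a rough, hypothetical illustration of the first two steps, a workflow along these lines is plausible. Our real API surface may differ; every endpoint, field and asset URL below is made up for the sketch.

```python
import requests  # all endpoints and fields below are hypothetical illustrations

BASE = "https://visualix.example/api"  # placeholder base URL

# 1. Start a mapping session for a venue; frames recorded on the phone
#    (or a stereo camera) would then be uploaded against this session.
session = requests.post(f"{BASE}/venues/demo-venue/mapping-sessions").json()

# 2. Once the map is built, anchor a piece of AR content at a 3D position
#    in the map's coordinate frame (this is what the CMS 2D view edits).
requests.post(
    f"{BASE}/venues/demo-venue/content",
    json={
        "asset_url": "https://example.com/arrow.glb",   # placeholder 3D asset
        "position": [12.4, 0.0, -3.1],    # metres, map coordinate frame
        "rotation": [0.0, 0.0, 0.0, 1.0], # identity quaternion
    },
).raise_for_status()
```

The key property of such a workflow is that content is anchored in the map's coordinate frame, not to a single device session, which is what makes the experience persistent and shareable.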
For now, you can request a demo directly on our website.
Direction
Our big goal is to enable a better life experience for the entire world. We build technology that enables businesses to solve their efficiency problems (navigation in large venues, information attached to space rather than to product labels) using augmented information, navigation, the ability to simulate, shared AR experiences, products based on AR cloud data and more, all grounded in a digital twin of the real world. This way, we can enable new services and better solutions to the most important problems (certain problems can be better solved in the digital space, and certain problems will cease to exist once we couple the digital and the real worlds).
In order to reach our big goal, we are working hard on the following:
- Enable large-scale re-localization in fast-changing environments (the robustness and accuracy required at scale mean dealing with dynamic environments, “working” loop closure, “better” bundle adjustment, different lighting conditions, reflective surfaces etc.; see the sketch after this list)
- Reconstruct the static world in the AR cloud (we can already reconstruct indoor and outdoor scenes, but for now we focus on indoor use cases, which are a bit different and pose different problems)
- Reconstruct interactions and capture the dynamics of environments (we train machine learning models to understand depth, semantic meaning etc.)
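To give a flavour of the “better” bundle adjustment mentioned in the first bullet, here is a minimal, textbook-style sketch (not our patent-pending variant): camera poses and 3D points are refined jointly by minimizing reprojection error. The toy scene, the fixed focal length f and all variable names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(points, rvecs, tvecs, cam_idx, pt_idx, f=500.0):
    """Pinhole projection of each observed 3D point into its observing camera."""
    rotated = Rotation.from_rotvec(rvecs[cam_idx]).apply(points[pt_idx])
    p = rotated + tvecs[cam_idx]
    return f * p[:, :2] / p[:, 2:3]  # perspective divide; toy intrinsics

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, observed):
    """Reprojection error of all observations, flattened for the solver."""
    rvecs = params[: n_cams * 3].reshape(n_cams, 3)
    tvecs = params[n_cams * 3 : n_cams * 6].reshape(n_cams, 3)
    points = params[n_cams * 6 :].reshape(n_pts, 3)
    return (project(points, rvecs, tvecs, cam_idx, pt_idx) - observed).ravel()

# Toy scene: 3 cameras, 12 points, every camera sees every point.
rng = np.random.default_rng(0)
n_cams, n_pts = 3, 12
cam_idx = np.repeat(np.arange(n_cams), n_pts)
pt_idx = np.tile(np.arange(n_pts), n_cams)
points_true = rng.uniform([-1, -1, 4], [1, 1, 6], (n_pts, 3))  # in front of cameras
rvecs_true = rng.normal(0.0, 0.05, (n_cams, 3))                # small rotations
tvecs_true = rng.normal(0.0, 0.10, (n_cams, 3))
observed = project(points_true, rvecs_true, tvecs_true, cam_idx, pt_idx)

# Perturb the ground truth and let the solver pull it back.
x0 = np.concatenate([rvecs_true.ravel(), tvecs_true.ravel(), points_true.ravel()])
x0 = x0 + rng.normal(0.0, 0.01, x0.size)
result = least_squares(residuals, x0,
                       args=(n_cams, n_pts, cam_idx, pt_idx, observed))
print("final RMS reprojection error:", np.sqrt(np.mean(result.fun ** 2)))
```

In a real system the hard part is not this core optimization but making it robust at scale: deciding which observations to trust in dynamic scenes, closing loops correctly and keeping the problem sparse enough to solve quickly.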
After that, we plan to expand our technology to capture more subtle information about interactions in local scenes, so that those too can be stored in our AR cloud. At that point, we will understand the world to an extent at which we can actually simulate life!
To achieve all of that, we are growing our team further. We are hiring amazing computer vision engineers and are always looking for new people excited about the future we want to create!
Visualix Team In Action
Check out some more demo videos from our team:
- the first one shows a large-scale navigation system (stable movement, no jitter, very high accuracy, with an on-premise server)
- the second one shows the accuracy of our shared experiences involving human avatars
Author
Michael is the CTO of Visualix. He loves building passionate, determined teams that use technology to help humanity solve important global problems faster.
He has a software engineering background, with a focus on creating ambitious products in a fixed time. He likes to think about writing reusable code, algorithms in computer vision and applied (scalable) machine learning. You can also see his avatar in one of the videos.
Happily Married… with Children :)