Summary of Moonshot Meetup#4: Eye on AI 👀 🤖

--

A few weeks ago, we invited three experts in the AI field to SCB 10X’s latest collaborative space, DistrictX, for a two-hour panel session on the hottest topic in technology: artificial intelligence.

MOONSHOT MEETUP #4: Eye on AI

🗣Our distinguished speakers of the session:

  • Dr. Ithipan Methasate, CTO at ZTRUS
  • Dr. Sanparith Marukatat, Senior Researcher at NECTEC
  • Dr. Andre Esteva, Co-founder & CEO at Artera

As artificial intelligence advances at an unprecedented rate, computer vision has emerged as one of its most powerful branches, as seen in many promising applications around the world. From autonomous vehicles to medical imaging, computer vision has transformed the way we interact with our environment and opened up a world of possibilities for industries of all kinds.

This article summarizes the insights and ideas the speakers shared at the event, specifically on natural language processing, human-computer interaction, and ethics and privacy. Their diverse backgrounds and expertise, ranging from smart cities, natural language processing, and speech recognition to AI and machine learning in healthcare, provided valuable perspectives on the latest trends, technologies, and applications in this exciting and rapidly growing field.

🔑Key Takeaways:

  • Technological advances are rarely defensible in the long term. Companies often focus instead on building a moat around branding, commercial footprints with long-term contracts, and ecosystems of startups or apps that rely on their platform.
  • Human-centered design and user experience are highly important. This involves understanding the needs and expectations of the users and designing interfaces and interactions that are intuitive and easy to use.
  • Incorporate multimodal inputs, such as audio and natural language, to allow for more natural and intuitive interactions between humans and machines. This involves not just computer vision but also natural language processing and other fields.
  • Privacy and security remain a big concern, particularly in applications such as facial recognition, where the potential for misuse is high.

Computer vision is a field of computer science concerned with understanding visual information, whether images, scans, or even ultrasound images. It has a wide variety of applications, including document analysis, CCTV cameras, and X-ray images, which also means broad potential for business opportunities.

🇹🇭Local View (Thailand)

Dr. Ithipan discussed the current trends in the field of computer vision in Thailand, noting that the COVID situation has led to an increase in the adoption of technology by Thai people. For example, face recognition and eKYC are now more common, and every bank uses face verification. Many AI services are also becoming more prevalent, such as document automation, which many companies are investing in to reduce manual labor.

Dr. Ithipan

There are also Thai companies making AI, computer vision, and NLP products, and Thai NLP groups are collaborating to make Thai accessible globally, which could potentially spark integration into more applications, including those from Thai companies.

Accessing large amounts of data is a problem in Thailand, and there are government-led initiatives to enable data sharing among private companies to improve AI. The Association of AI Companies in Thailand is working on creating its own AI services and tackling data issues. There are also connections with academic associations and government organizations to collaborate and provide funding for the AI community.

🏰Building defense

Dr. Andre Esteva discussed the challenges of building a moat around an AI business given the availability of open-source models and the convergence of AI approaches. He noted that technological advances are rarely defensible in the long term. Companies can create a moat through access to proprietary or hard-to-access data sets, but this offers only intermediate-term defensibility, as there will always be innovation on top of whatever is out there. Instead, he suggested that companies focus on building a moat around branding, commercial footprints with long-term contracts, and ecosystems of startups or apps that rely on their platform. By doing so, they can not only create defensibility but also sell bundles to customers that include a range of offerings.

💡Challenges faced in the computer vision field

Dr. Ithipan mentioned that the biggest challenges in computer vision are managing computing power and making applications meet human expectations. The goal is to reach the same level as human ability, as in the case of autonomous vehicles. The challenge is to develop technology that helps people work more efficiently without replacing them entirely, which requires a lot of training and development.

Dr. Andre noted that, although the history of computer vision shows that many fundamental tasks have largely been solved in recent years, there are still high-level tasks that machines struggle with, such as total scene understanding. There are also challenges in becoming more multimodal, learning from smaller amounts of data, and intersecting with other fields such as NLP and reinforcement learning (RL). As tasks in individual domains are solved, more tasks will arise at the intersection of these fields, which will be a big challenge for building artificial general intelligence.

Speaking from the US context, he said the hardware challenges in AI depend on the specific use case. Autonomous driving is a potential challenge because data must be collected from a fleet of cars before effective models can be trained. In general, building and deploying AI systems in the commercial space requires collecting data from the system and training models with it. However, if the hardware is purely in the cloud, then hardware is not a challenge, and scaling with GPUs is straightforward.

Dr. Sanparith said that data limitation is a huge challenge in this space. Large language models require massive training data, yet some public datasets contain inappropriate content, and obtaining large datasets for facial recognition testing is difficult due to data protection laws. The difficulty of accessing large amounts of data in Thailand, described in the local view above, is one example.

Dr. Sanparith

📈Trends and new promising technologies today

Some of the technologies that were mentioned include generative AI for generating images and videos, multimodal computer vision for fusing image or video streams with language or other data, self-supervised and unsupervised learning, and few-shot learning.

Some of the applications of these technologies include predicting patient outcomes in healthcare, better video search for security and surveillance purposes, and natural language querying of thousands of cameras in real time for law enforcement purposes. The speakers emphasized that the intersection of technology and application in computer vision is incredibly powerful. City management, particularly crime monitoring, can benefit from computer vision. However, indexing data for search can be a challenge, and traditional image search requires indexing each image by its image features. CLIP, a newer model from OpenAI, can process both text and images and match their features. This makes it possible to index videos with image features and search them using text features, which was not possible 20 years ago. It also raises the question of whether this technology can be turned into a profitable business.
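
The idea of indexing by image features and searching by text features can be sketched in a few lines. This is a minimal illustration, not CLIP itself: the three-dimensional vectors are made-up stand-ins for the embeddings a model such as CLIP would produce, and the frame names and the `search` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical index: frame id -> image embedding. The toy values are
# assumed to come from an image encoder that shares one vector space
# with the text encoder.
frame_index = {
    "cam01_frame_0012": [0.9, 0.1, 0.0],
    "cam02_frame_0340": [0.1, 0.9, 0.1],
    "cam03_frame_0077": [0.0, 0.2, 0.9],
}


def search(text_embedding, index, top_k=2):
    """Rank indexed frames by similarity to a text query embedding."""
    scored = [(cosine(text_embedding, emb), frame) for frame, emb in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [frame for _, frame in scored[:top_k]]


# Toy embedding standing in for an encoded text query.
query = [0.8, 0.2, 0.1]
results = search(query, frame_index)
```

With these toy values, `results` lists `cam01_frame_0012` first because its vector points in nearly the same direction as the query. In a real system, the expensive part is computing and storing the image embeddings once, after which each text query only needs a similarity lookup.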

In the context of smart city solutions, the speakers cited the example of Bangkok using license plate snapshots to detect speeding. They also discussed the challenge of facial recognition and the difference between detection, verification, and recognition tasks. They suggested that while detection is now easy, verification and recognition are still problematic, particularly for large-scale implementation.
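
The difference between verification and recognition can be illustrated with a small sketch. It assumes faces have already been detected and converted into embedding vectors; the vectors, the names, the threshold value, and the helper functions `verify` and `identify` are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two face embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


THRESHOLD = 0.5  # assumed value; real systems tune this on evaluation data


def verify(probe, claimed_template):
    """Verification (1:1): does the probe match the one claimed identity?"""
    return distance(probe, claimed_template) < THRESHOLD


def identify(probe, gallery):
    """Recognition (1:N): find the closest enrolled identity, if any.

    Every enrolled template is compared, so both the cost and the chance
    of a false match grow with the size of the gallery.
    """
    best_name, best_dist = None, float("inf")
    for name, template in gallery.items():
        d = distance(probe, template)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < THRESHOLD else None


# Toy gallery of enrolled identities and a newly captured face.
gallery = {
    "alice": [0.1, 0.9, 0.2],
    "bob": [0.8, 0.1, 0.3],
}
probe = [0.15, 0.85, 0.25]
```

Here `verify(probe, gallery["alice"])` makes a single comparison, while `identify(probe, gallery)` has to compare against everyone enrolled, which is one reason recognition is harder to run at large scale.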

Design computer vision systems that are perceptive and user-friendly and that also allow for natural interaction between human and machine…

Dr. Ithipan discussed the recent buzzword “vision transformer” in academic circles: the transformer architecture, traditionally used for natural language processing (NLP), is now being applied to computer vision. He noted that this technique can help address the classic problem of lacking data for training models, and highlighted various learning techniques such as few-shot, zero-shot, one-shot, and many-shot learning. He believes that these techniques are game changers in making model building easier and more efficient.

Dr. Sanparith believes that combining text with images is the right direction for designing user-friendly computer vision systems that allow for natural interaction between humans and machines, because it lets users express their needs and wants more naturally. To achieve this, however, it is important to understand local dialects, which can be a challenge. He believes that 100% accuracy in AI is possible but has not yet been attained. The focus should be on designing interactions between humans and technology that are easier and require less effort. He emphasizes the importance of human interlock, which involves transferring human intelligence and knowledge to AI, in improving AI systems, and warns that failure to do so will result in the eventual demise of AI.

Dr. Andre discussed the intersection of computer vision and natural language and how it can greatly support designing computer vision systems that are both perceptive and user-friendly. He also emphasized the importance of machines understanding natural human behavior and predicting it, which can allow for more natural interaction between humans and machines in various environments. Dr. Andre mentioned that machines need to understand humans at a much deeper level than they currently do.

❔Question of Ethics

Dr. Sanparith emphasizes the need for AI literacy among those who use AI to understand its limitations.

Dr. Ithipan believes that ethics depends on various parameters such as culture and situation and that guidelines for the use of computer vision are still in the early stages.

Dr. Andre emphasizes the need for policies to regulate the development and deployment of AI technologies, citing HIPAA as an example of outdated policies that need to catch up with the latest technology.

All three speakers agree that there is a need to balance security and privacy concerns when using computer vision, particularly with regard to surveillance and facial recognition systems in public spaces.

👩‍⚕👨‍⚕Medical professionals and their thoughts on AI

Dr. Andre, who has worked and interacted with many medical professionals, said that their opinions on AI in their careers vary. Those in the earlier stages of their careers are excited about the technology, while those later in their careers are split in their opinions.

The consensus is that AI will not replace physicians but will augment their abilities, especially in diagnostic tasks where AI has shown superior performance to humans alone.

The adoption of AI in medicine still faces barriers such as security, privacy, and regulations. The technology is available, however, and integration into workflows remains an open question that excites some practitioners more than others.

Dr. Andre

⭐️What does the future hold

The three speakers had different views on how computer vision will shape society in the coming years.

Dr. Ithipan emphasized that AI is becoming more prevalent in everyday life and gave examples such as CCTV cameras with AI detection and virtual assistants. He also mentioned that AI OCR (optical character recognition) is not intended to reduce employment but to help employees work smarter.

Dr. Andre predicted that in 20 years, AGI, or artificial general intelligence, will be a reality, which will fundamentally change society. He believes computer vision will be one of the most important ways in which AGI interacts with us in the real world, allowing robots to interact with us and autonomous driving to function. He also sees potential for computer vision to revolutionize industries such as healthcare, agriculture, and construction by providing real-time data and insights to workers.

Dr. Sanparith believes that it’s difficult to predict how society will change in the next 20 years, given the rapid pace of technological advancements. He notes that AI technology adoption is quick and cites examples of how it has been accepted in different fields such as medicine. He mentions that doctors are comfortable using AI black boxes as long as they understand the limitations of the models and how to use them effectively. He further emphasizes that AI can be a useful copilot for humans, as seen in his work with doctors on addressing operator-dependent problems through AI analysis of images.

While advances in AI excite some more than others, these technologies hold immense potential for improving our lives and solving complex problems. They also come with significant risks and ethical concerns that must be addressed.

🔹Thank you for tuning in, and don’t forget to check out the full video today.
Feel free to share this post and leave your thoughts in the comments!

🏢SCB 10X is hiring, interested?

Check out at https://www.scb10x.com/careers

