Some thoughts on AI

Hao Guan
May 16, 2024

This year, one of the significant changes in my work has been transitioning to AI-related projects. It’s just the beginning, with many directions to explore and a lot to learn. Having worked in business system development for over a decade, I find this opportunity to do something different both interesting and challenging.

My blog has been neglected for many years. Numerous times, I wanted to write something, but it was always put on hold for various reasons. Recently, I’ve been thinking, could I try using AI to write my blog? This way, I could experiment with recent technology while keeping my blog updated. Perhaps in the future, I might only need to have some ideas, and AI could help me turn these ideas into articles, eliminating the struggle of writing.

As for topics, let’s start with AI.

AI has developed rapidly in recent years, achieving significant breakthroughs in many areas. Even someone like me, who wasn’t originally in this field, regularly comes across AI news through various channels. A few years ago, GitHub Copilot emerged and frequently surprised me in my daily coding work. I often wonder: how will AI change our future? The recent appearance of various new models sparks my imagination, so I’ll start by recording some thoughts today.

Currently, multimodal models can understand images and videos. In the future, there might be AI models trained specifically to understand and interact with application UIs, potentially matching or even surpassing human understanding of them. Humans learn how to use applications by reading manuals and help documents, and AI can do the same.

I believe AI will soon break the boundaries of human-computer interaction, bringing significant changes to the software development field. When we develop applications now, we provide users with a UI, either a GUI or a CLI, so that human users can interact with the program and use its features. The UI is a gateway to using the program: it is both a path and a limitation. Users can only use the program according to the UI’s rules, much like how a large sofa might not fit through a small door. If an application only offers a UI without an API, its functionality cannot be accessed by other programs. Of course, we often use various methods to bypass this limitation, such as web scraping or RPA. However, these methods are not universal solutions; they usually require custom development for each scenario and can easily break when the UI changes. Overall, when developing applications today, we can assume the users are human, so we don’t need to worry much about extreme usage patterns.

With the continued development of multimodal models, the future situation will be entirely different. AI will be able to directly understand a program’s UI and operate the program through it. With this capability, interaction between programs will no longer rely on APIs; the UI itself becomes the API. Although the AI would still operate according to the UI’s rules, it would not be bound by human limitations. Additionally, this approach is different from wrapping the UI as an API through web scraping or scripts; it can be universal and does not require customization for each scenario.

One significant impact, as I see it, would be to break down the closed nature and boundaries of applications. This impact can be viewed from two angles.

A robot using a mobile phone. Generated by SDXL.

Optimistically, many old systems can be integrated into the modern world without upgrades. Things that were previously disconnected because they required human interaction can suddenly be linked together. For example, reimbursement or insurance claims often require manual processing, which is cumbersome despite being straightforward. Electronic invoices are now widespread, and theoretically, we could develop a program to automatically collect invoices and submit reimbursement claims. However, gaps between systems make it challenging to develop such an automation tool: electronic invoices might be sent via WeChat, requiring a browser to download the PDF copy; submitting a reimbursement claim might involve another system, like filling out a web form and attaching the invoice file. These systems do not provide APIs since they were designed for manual operation, making it difficult to develop an automation tool that connects them. Even if we achieve this through reverse engineering or automation scripts, it is hard to scale, since there are countless ways to obtain invoices from different merchants, each of which may use a different system with a different UI. But as AI models progress and can understand UIs, implementing such an automation tool becomes possible. Writing code might not even be necessary; a simple prompt like “help me find today’s lunch invoice from WeChat and submit a reimbursement claim” could suffice.

On the other hand, the considerations for developing applications will become more extensive and complex. We can no longer assume that our application’s users are human; they could be machines using the application in ways beyond human limits. For example, we might currently think, “No human can click this button so quickly,” and leave out validations, rate limits, and debounce measures. That assumption will no longer hold: humans can’t click that fast, but machines using our applications can. Many systems now use captchas to prevent bots, but many captchas can no longer stop AI. Captchas have evolved from recognizing text to images, puzzles, and word selection, but they cannot become indefinitely complex: humans can’t solve overly complicated ones, whereas AI might do better. Some anti-scraping techniques will become meaningless, such as using data obfuscation and image splicing to display product prices on webpages. Prices need to be visible to humans, and if humans can see them, so can AI. Many app features designed for user engagement, like lotteries, check-ins, and daily tasks, will lose their meaning. I can simply ask AI to “check in on all the apps on my phone.”
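To make the rate-limit point concrete, here is a minimal sketch of a server-side limiter that does not care whether the caller is a human or a machine. It assumes a token bucket, which is just one common approach I picked for illustration; the `TokenBucket` class and its `rate`, `capacity`, and `clock` parameters are hypothetical names, not part of any library.

```python
import time


class TokenBucket:
    """Hypothetical limiter: bursts of up to `capacity` actions,
    refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A button handler could call `allow()` once per click and reject the request when it returns `False`. The limit then lives on the server rather than in an assumption about how fast the user can click.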

Thus, not only will there be significant technical challenges, but product design will also undergo transformative changes. We can no longer rely on exploiting human weaknesses to design products.

AI technology is evolving rapidly, making the future world more interesting and complex. Whether we anticipate it or not, significant changes are coming soon, and everyone must be prepared.

Hao Guan

Shanghai-based software engineer with over a decade in backend systems. Skilled in Java, Kotlin, Python, and JavaScript, now exploring AI applications.