Navigating Modern AI — What goes in and how does it come out?
Whether you’re a tech enthusiast or a curious bystander, AI is already part of your daily life, and its role will only continue to grow. Understanding its impact on our lives will be vital to deciding what to trust as AI is adopted at scale.
This article is designed as a user-friendly roadmap to the fundamentals of veracity in modern AI, focusing on how it learns and what it learns from. Below, we have a simplified view of one of modern AI’s most important use cases: language processing. It has three main components that affect you as a user: what goes into the AI model, what you enter into the model as a query, and what the model outputs.
As seen above, AI language models can take information from just about anywhere and can learn just about anything. An important part of the training process is gathering a large and varied body of data so that a model doesn’t pigeonhole itself in a specific domain or language style (unless it’s a domain-specific application, e.g. legal). With a diverse dataset, the model is exposed to numerous ways of writing, dialects, expressions, and other language elements. This exposure allows your input or query to be fairly general rather than follow a rigid format (e.g., you can say something like “write me an email” or “compose a digital letter” and the model will likely interpret both in a similar way). Many organizations also build upon the base models to enhance the outputs, using your inputs as a guide. This involves using your input data in subsequent model training to improve how the models answer your specific questions. This is why many AI model policies need to specify who owns the input and output data.
How does this affect you?
Each stage of an AI system, from input to output, requires a different type of attention from you. Large AI model training requires your passive attention: ensuring that your digital profile is either well protected or gives out only information you are comfortable sharing. Your input or query requires proactive attention: ensuring that you’re not putting in any proprietary information. Any output then requires reactive attention in the form of assessing and validating the information that is generated for you. This assessment can consist of checking important information against a known source of truth.
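For readers comfortable with a little code, the proactive step can be partly automated. The sketch below is a minimal illustration, not a complete safeguard: it scrubs two kinds of personal detail from a query before it is sent to any AI model. The pattern list and the `redact` function are hypothetical names made up for this example, and two simple patterns will miss many kinds of sensitive or proprietary content.

```python
import re

# Hypothetical patterns chosen for illustration only; real sensitive data
# takes many more forms than these two.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(prompt: str) -> str:
    """Replace every match of a known pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


print(redact("Email jane.doe@example.com or call 555-123-4567 about the contract."))
# prints: Email [EMAIL] or call [PHONE] about the contract.
```

A filter like this only catches what it has been told to look for, so it supports your own judgment about what to share rather than replacing it.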
To highlight these three forms of attention and some tangible steps to tackle them, we have related them to current events. The table below outlines examples of how AI models are using your data, along with the different ways you can prepare yourself.
As AI evolves over the coming years, so will the need to protect your own information and validate the information AI gives you. Keep secret data to yourself and keep checking for truth by staying vigilant about what you need to give the model, what data you have lying around, and how you use what AI gives you.
Contributors:
Benji Christie is passionate about creating more explainable large language models for enterprise.
Len Zapalowski has entrepreneurial success in enterprise software, web infrastructure, robotics, and AI.