Introduction to AI — Learning AI from scratch — Part One

Yeabsira Tesfaye - 王夏雨
Sep 25, 2024 · 24 min read


Hello everyone! Today we will learn about AI fundamentals and the key terminology and basic concepts we should know. In this series, my main aim is to work our way toward connectionist AI, mainly neural networks and deep learning.

Introduction to AI

Artificial intelligence (AI) is technology that enables computers and machines to simulate human learning, comprehension, problem solving, decision making, creativity, and autonomy. In today's world, two things have kept the hype around AI alive and made it grow exponentially. The first is the availability of vast amounts of digital data for AI models to learn from, and the second is the growth of computational power driven by increasingly powerful and efficient microprocessors.

We can define AI, in simple terms, as a program that can surpass or match human intelligence. To do this, the program must satisfy three criteria:

  1. Discovery: The ability to discover new information.
  2. Information: The ability to work with data or information that was not explicitly stated.
  3. Reasoning: The ability to take inputs, reason over them, and produce outputs.

Why did math become so important in the field of AI?

The main aim of AI is to create a machine that can think and reason as humans do. This raises the question of how to put human reasoning into a mechanized, formal form that machines can learn and work with. To do this, the intermediate language has to meet the following requirements:

  1. It needs a physical symbol system.
  2. It should be able to create relationships within itself and express them.
  3. It should be able to be manipulated to create a set of expressions.

Since mathematics can do all of this, it was chosen as the formal language of AI.

History of AI (Highlights Version)

The name Artificial Intelligence was coined by four scientists at the Dartmouth Conference, who believed in the development of computer applications capable of performing tasks at a human level.

The Logic Theorist

The Logic Theorist (LT) was one of the first artificial intelligence programs, developed by Allen Newell and Herbert A. Simon in 1956. It was designed to mimic human problem-solving and reasoning skills, specifically focusing on proving theorems in symbolic logic.

Key Aspects:

  1. Purpose: The Logic Theorist was created to demonstrate how machines could replicate human logical reasoning, proving mathematical theorems automatically.
  2. Method: The program used a method called heuristic search, which is a strategy to explore and eliminate possible solutions based on learned or estimated strategies, much like how a human would approach a problem.
  3. Significance: The LT was able to prove 38 of the first 52 theorems in Principia Mathematica, a seminal work on mathematical logic by Alfred North Whitehead and Bertrand Russell. In some cases, it even found more elegant proofs than those originally presented.
  4. Impact: The development of the Logic Theorist is often considered the birth of AI as a formal discipline. It laid the foundation for future work in automated reasoning and problem-solving in artificial intelligence.

AI Winter refers to periods in the history of artificial intelligence when enthusiasm, funding, and research interest in AI significantly declined due to unmet expectations and slow progress.

Deep Blue was a chess-playing computer developed by IBM that famously defeated world chess champion Garry Kasparov in 1997. It became known as a milestone in artificial intelligence, particularly in the domain of game-playing AI. Here’s how Deep Blue worked:

Key Components of Deep Blue:

1. Brute Force Search: Deep Blue used a brute force search algorithm to evaluate potential chess moves. This involved looking ahead at millions of possible sequences of moves (both for itself and its opponent) to find the best outcome. It could evaluate up to 200 million positions per second, making it capable of looking several moves ahead to foresee the consequences of its actions.

2. Minimax Algorithm: Deep Blue employed the minimax algorithm, a decision rule used for minimizing the possible loss in a worst-case scenario. The algorithm worked by considering the best move for Deep Blue and the best possible counter-move by Kasparov, then selecting moves that would minimize the worst possible outcome.

3. Alpha-Beta Pruning: To optimize the search process, Deep Blue used alpha-beta pruning, a technique that reduces the number of nodes evaluated by the minimax algorithm. It eliminated moves that wouldn’t be played by either player, thus speeding up the decision-making process. (A minimal sketch of minimax with alpha-beta pruning follows this list.)

4. Specialized Hardware: Deep Blue was powered by custom hardware, including 480 specialized chess chips designed specifically for evaluating chess positions quickly. This hardware allowed Deep Blue to process many more positions per second than a general-purpose computer.

5. Heuristics & Chess Knowledge: In addition to brute force, Deep Blue had a large library of pre-programmed chess knowledge, including opening book moves, endgame databases, and positional heuristics.

Human chess experts helped program Deep Blue with strategic knowledge to handle specific situations that brute force alone could not effectively evaluate (like complex endgames).

6. Learning: Deep Blue was not a learning system. Unlike modern AI systems like AlphaZero, it did not improve by playing or learning from games. It relied on pre-programmed algorithms and human-engineered knowledge.
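
To make the search idea concrete, here is a minimal Python sketch of minimax with alpha-beta pruning. This is not Deep Blue's actual code (that ran on custom chess hardware); the toy game tree and the evaluate/children functions below are invented purely for illustration.

```python
# A minimal, hypothetical sketch of minimax with alpha-beta pruning.
import math

def alphabeta(state, depth, alpha, beta, maximizing, evaluate, children):
    """Return the minimax value of `state`, pruning branches that cannot matter."""
    succ = list(children(state))
    if depth == 0 or not succ:
        return evaluate(state)

    if maximizing:
        value = -math.inf
        for child in succ:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, evaluate, children))
            alpha = max(alpha, value)
            if alpha >= beta:          # opponent would never allow this branch
                break
        return value
    else:
        value = math.inf
        for child in succ:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True, evaluate, children))
            beta = min(beta, value)
            if beta <= alpha:          # we would never choose this branch
                break
        return value

# Toy usage on a hand-made game tree whose leaves are numeric scores:
tree = {"A": ["B", "C"], "B": [3, 5], "C": [2, 9]}
value = alphabeta("A", 2, -math.inf, math.inf, True,
                  evaluate=lambda s: s if isinstance(s, int) else 0,
                  children=lambda s: tree.get(s, []) if isinstance(s, str) else [])
print(value)  # 3 — and the second leaf of branch C is pruned, since C is already worse than B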

How It Defeated Kasparov:

  • Deep Blue’s ability to calculate deeply, combined with its massive computational power, enabled it to explore more possible moves than a human could in a short amount of time.
  • It followed principles of positional play, making moves that did not always appear aggressive but effectively constrained Kasparov’s options.

Core Concepts in AI

Machine learning is the backbone of AI: algorithms learn from data without being explicitly programmed. It involves training an algorithm on a data set, allowing it to improve over time and make predictions or decisions based on new data. The more data, the better.

Neural networks are computational models that mimic the complex functions of the human brain. They consist of interconnected nodes, or neurons, that process and learn from data, enabling tasks such as pattern recognition and decision making in machine learning.

Deep learning is a type of machine learning that uses artificial neural networks to learn from large amounts of data. It mimics how the human brain works to recognize patterns and make decisions, powering tasks like image recognition, speech processing, and language understanding.

In some cases we might not even be able to trace back and understand how the model reached its conclusions, because the layers become deeper and more complex; this is called the black-box problem.

You might be asking: then what is the difference between machine learning and deep learning at all? The difference lies in how these models learn. Machine learning requires a human to select and prepare the features so they are suitable for the machine to learn from, while deep learning learns the features directly from the raw data.

  • Features

Features are individual measurable properties or characteristics of the data that are used as input to a model. They help the model make predictions or classifications. Each feature represents a variable in the dataset.

Example: Features might include email length, presence of certain words, sender address, and frequency of punctuation marks.
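
As a tiny, hypothetical illustration (the email text, sender, and feature names below are made up), extracting such features in code might look like this:

```python
# A small, hypothetical sketch of turning a raw email into a feature vector.
def extract_features(email_text: str, sender: str) -> dict:
    spam_words = {"free", "winner", "prize", "urgent"}
    words = email_text.lower().split()
    return {
        "length": len(email_text),                               # email length
        "spam_word_count": sum(w.strip("!.,") in spam_words for w in words),
        "exclamation_marks": email_text.count("!"),              # punctuation frequency
        "sender_is_unknown": int(not sender.endswith("@known-company.com")),
    }

features = extract_features("URGENT!!! You are a winner, claim your FREE prize!",
                            sender="promo@randomsite.biz")
print(features)
```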

Natural Language Processing (NLP) enables machines to understand, interpret, and generate human language (both written and spoken). The goal of NLP is to bridge the gap between human communication and computer understanding.

Cognitive computing is a branch of AI that aims to create machines capable of mimicking the cognitive abilities of humans, including perception, learning, reasoning, and problem-solving. Its main aim is to create systems that understand and interpret complex information the way humans do.

  • Expert systems

These are a type of artificial intelligence (AI) program that emulates the decision-making ability of a human expert in a specific domain. They are designed to solve complex problems by reasoning through knowledge, represented primarily in the form of “if-then” rules, facts, and heuristics.
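
Here is a minimal, hypothetical sketch of the idea: a handful of invented "if-then" rules and a loop that keeps firing them until no new facts can be derived (forward chaining).

```python
# A tiny, hypothetical expert system as forward-chaining "if-then" rules.
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "shortness_of_breath"}, "see_doctor"),
]

def infer(facts: set) -> set:
    """Repeatedly fire rules whose conditions are satisfied until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(infer({"fever", "cough", "shortness_of_breath"}))
# {'fever', 'cough', 'shortness_of_breath', 'flu_suspected', 'see_doctor'}
```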

Reasoning as search

This is a process of solving problems or making decisions by systematically exploring a space of possible solutions. The AI systems use search algorithms to explore this space, applying logical rules or heuristics to narrow down the possible options and find a valid or optimal solution.

Key Concepts in Reasoning as Search

  • Search Space: The set of all possible states or configurations the system can explore. This includes the initial state, intermediate states, and goal states.
  • Initial State: The starting point from which the reasoning process begins.
  • Goal State: The target solution that the system is trying to reach.
  • Operators: Actions or rules that transition the system from one state to another within the search space.
  • Search Algorithm: The method or strategy used to explore the search space (e.g., breadth-first search, depth-first search, A* algorithm).
  • Heuristics: Rules or strategies that guide the search process, helping the system prioritize which paths to explore first.

Heuristics

Heuristics are problem-solving techniques or rules of thumb that help make decisions or find solutions more quickly by focusing on the most promising options. In AI, heuristics guide the search process, helping the system prioritize paths that are likely to lead to a solution, rather than exploring every possibility.

Heuristics play a very important role in preventing unbounded backtracking and wasted resources by prioritizing the most promising paths.
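
As a tiny, hypothetical example, a heuristic for moving around a grid could be the Manhattan distance to the goal; the search then looks at the most promising positions first:

```python
# A small, hypothetical heuristic: Manhattan distance on a grid.
def manhattan(pos, goal):
    """Cheap estimate of the remaining cost — not exact, but usually informative."""
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

goal = (4, 4)
candidates = [(0, 1), (3, 4), (2, 2)]
# Explore the most promising candidate (lowest estimated distance) first.
print(sorted(candidates, key=lambda p: manhattan(p, goal)))  # [(3, 4), (2, 2), (0, 1)]
```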

General Problem Solver

The General Problem Solver (GPS) was another groundbreaking artificial intelligence program developed by Allen Newell and Herbert A. Simon in the late 1950s, following the success of the Logic Theorist. The GPS aimed to solve a wide variety of problems, rather than being limited to specific domains like theorem proving.

Key Aspects of the General Problem Solver:

  1. Purpose: GPS was designed to be a more flexible and general-purpose problem-solving system. Unlike the Logic Theorist, which focused on formal logic, the GPS sought to tackle problems that could be defined by rules, goals, and sub-goals — essentially, any problem that could be described in terms of operators, states, and goals.
  2. Structure:
  • Initial State: The starting point or current situation in a problem.
  • Goal State: The desired solution or end state.
  • Operators: Actions that could transform the problem from one state to another.
  • Heuristics: Strategies or rules that help choose which operator to apply at any given step to move toward the goal.

3. Means-End Analysis: The core strategy used by GPS, called means-end analysis, involved the following (a small code sketch of the idea appears after this list):

  • Comparing the current state with the goal state.
  • Identifying the difference between the two.
  • Applying an operator that reduces the difference. This is similar to how humans break down problems by focusing on the differences between where they are and where they want to be, then applying steps to close the gap.

4. Domain Generality: GPS was designed to work on various types of problems, from puzzles to simple mathematical tasks, by representing them in a similar formal structure of states and operations. This versatility was one of the program’s most innovative features.

5. Challenges: While GPS was a major step forward in AI, its performance was often limited by the complexity of real-world problems and the lack of domain-specific knowledge. The heuristics it used worked well for smaller, well-defined problems but struggled with larger, more complex domains where specialized knowledge was necessary.
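
Here is a small, hypothetical sketch of means-end analysis (referenced in point 3 above): states are sets of facts, each made-up operator has preconditions and effects, and the program reduces the difference to the goal by recursively achieving the preconditions of a useful operator.

```python
# A minimal, hypothetical sketch of GPS-style means-end analysis.
operators = {
    "walk_to_store": {"pre": {"at_home"}, "add": {"at_store"}, "del": {"at_home"}},
    "buy_groceries": {"pre": {"at_store"}, "add": {"have_groceries"}, "del": set()},
}

def achieve(state, fact, depth=5):
    """Return (new_state, plan) that makes `fact` true, or (state, None) on failure."""
    if fact in state:
        return state, []
    if depth == 0:
        return state, None
    for name, op in operators.items():
        if fact in op["add"]:                          # this operator reduces the difference
            plan, cur = [], state
            for pre in op["pre"]:                      # subgoal: achieve its preconditions first
                cur, sub = achieve(cur, pre, depth - 1)
                if sub is None:
                    break
                plan += sub
            else:
                cur = (cur - op["del"]) | op["add"]    # apply the operator
                return cur, plan + [name]
    return state, None

state, plan = achieve({"at_home"}, "have_groceries")
print(plan)   # ['walk_to_store', 'buy_groceries']
```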

Gelernter’s Geometry Theorem Prover

The main goal of this program was to prove geometric theorems the way a human would, and the model was able to combine symbolic and diagrammatic (graphical) reasoning.

Key Features of Gelernter’s Theorem Prover:

  1. Diagram-Based Reasoning: It used geometric diagrams as part of the problem-solving process, which made it different from many other AI theorem provers of the time that only worked with symbolic logic.

2. Symbolic and Numeric Representation: The prover combined both symbolic reasoning (using formal logic) and numeric reasoning (with coordinates) to understand geometric relations. It represented points, lines, and figures in both a logical form and a coordinate system.

3. Heuristic Search: Gelernter’s system used heuristics (problem-solving shortcuts) to decide which steps to take when proving a theorem. These heuristics helped narrow down the search space of possible solutions, making the process more efficient.

4. Forward and Backward Reasoning: The system used both forward reasoning (starting from known facts to derive new conclusions) and backward reasoning (starting from the desired conclusion and working backward to the known facts). This mimicked how humans often approach complex problems.

SAINT (Symbolic Automatic INTegrator)

The program was called SAINT (Symbolic Automatic INTegrator), and its purpose was to solve integration problems in calculus, simulating how a human would reason through symbolic integration.

Key Aspects of the SAINT Program:

  1. Symbolic Integration: SAINT was designed to perform symbolic integration, a key operation in calculus, where functions are integrated in a step-by-step manner. It tackled problems like finding the integral of algebraic expressions.

2. Heuristics-Based Problem Solving: SAINT used heuristic techniques to guide its problem-solving process. Rather than blindly applying integration rules, it employed strategies that human mathematicians often use, such as pattern matching and trial and error to simplify integrals and choose the appropriate techniques.

3. Cognitive Simulation: SAINT aimed to mimic human-like reasoning. It simulated the way a student might work through a calculus problem by breaking it down into smaller parts and applying known integration techniques (e.g., integration by parts, substitution, etc.).

4. Step-by-Step Problem Solving: Like human problem solvers, SAINT would tackle integration problems step by step, applying integration rules one at a time, and it could explain the steps it took to reach the final solution.

5. Error Handling and Learning: One interesting feature of SAINT was its ability to deal with errors and incorrect solutions. When the program encountered mistakes, it could backtrack and attempt alternative strategies, which gave it a learning-like quality.
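
SAINT itself was an early-1960s program, but as a modern point of comparison, this is what symbolic integration looks like today with the SymPy library (a sketch of the task, not of SAINT's actual method):

```python
# Symbolic integration with SymPy — a modern illustration of what SAINT attempted.
from sympy import symbols, integrate, sin

x = symbols("x")

print(integrate(x**2, x))        # x**3/3
print(integrate(x * sin(x), x))  # -x*cos(x) + sin(x), found via integration by parts
```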

Bobrow’s NLP Processing (STUDENT Program)

It is a program that was capable of processing high-school-level algebra word problems and solving them.

Key Aspects of Bobrow’s NLP Processing (STUDENT Program)

  1. Natural Language Understanding: The STUDENT program was designed to read and understand simple word problems stated in natural language (English), such as algebra problems, and then convert them into algebraic equations that could be solved mathematically.

2. Parsing and Interpretation: The program used early NLP techniques to break down sentences into their grammatical structure and recognize keywords that indicated mathematical relationships (e.g., “sum,” “product,” “more than”). This allowed it to interpret the problem and represent it symbolically.

3. Symbolic Representation: Once the program understood the natural language problem, it would translate it into a symbolic form, typically algebraic equations, which could then be solved using standard mathematical methods.

4. Problem Solving and Logic: After translating the problem into algebraic form, STUDENT would apply standard algebraic rules to solve the equations. For instance, if the word problem was about finding two numbers whose sum is given, the system would convert it into an equation like x + y = 10, then solve it for the unknowns.
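
As a toy sketch in the spirit of STUDENT (not its actual rules), the snippet below recognizes two hard-coded English patterns, turns them into equations, and solves them with SymPy:

```python
# A toy, hypothetical word-problem translator in the spirit of STUDENT.
import re
from sympy import symbols, Eq, solve

x, y = symbols("x y")

def parse(problem: str):
    """Recognize two hard-coded sentence patterns and emit equations."""
    equations = []
    m = re.search(r"sum of two numbers is (\d+)", problem)
    if m:
        equations.append(Eq(x + y, int(m.group(1))))
    m = re.search(r"difference is (\d+)", problem)
    if m:
        equations.append(Eq(x - y, int(m.group(1))))
    return equations

eqs = parse("The sum of two numbers is 10 and their difference is 4.")
print(solve(eqs, [x, y]))   # {x: 7, y: 3}
```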

Shakey

It combined physical movement with decision-making and problem-solving capabilities. Shakey was designed to navigate its environment, interpret instructions, and break down complex tasks into simpler steps.

Key Features of Shakey:

  1. Physical Movement: Shakey was a mobile robot equipped with wheels and sensors, allowing it to move around its environment, avoid obstacles, and interact with objects. It had a camera, touch sensors, and a rangefinder to perceive its surroundings.

2. Artificial Intelligence and Planning: Shakey could reason about its actions using AI. It was one of the first systems to integrate problem-solving with physical actions, using a form of reasoning known as STRIPS (Stanford Research Institute Problem Solver). STRIPS allowed Shakey to plan a sequence of actions to achieve a specific goal by breaking it down into smaller steps.

3. Autonomous Decision-Making: Unlike previous robots that followed predefined routines, Shakey could make decisions based on what it “saw” and plan its actions dynamically. If it encountered a new obstacle, it could figure out an alternate route to achieve its goal.

4. Task Decomposition: Shakey used a hierarchical approach to problem-solving. Given a high-level command like “move the block to the next room,” Shakey would break this down into smaller steps like moving toward the block, pushing it, finding the door, navigating through it, and so on.

5. Perception and Environment Interaction: Shakey used its sensors to perceive its environment and could analyze the layout of its surroundings to plan paths and avoid obstacles. It used both vision and touch to interact with objects.
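
To illustrate the STRIPS idea mentioned above, here is a minimal, hypothetical sketch (not Shakey's actual code): a state is a set of facts, and an action whose preconditions are satisfied deletes some facts and adds others.

```python
# A minimal, hypothetical STRIPS-style action sketch.
from typing import NamedTuple, FrozenSet

class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]

def apply_action(state: frozenset, action: Action) -> frozenset:
    """Apply a STRIPS action: remove deleted facts, add new ones."""
    assert action.preconditions <= state, "preconditions not satisfied"
    return (state - action.del_effects) | action.add_effects

push_block = Action(
    name="push-block-into-room2",
    preconditions=frozenset({"robot-in-room1", "block-in-room1", "door-open"}),
    add_effects=frozenset({"robot-in-room2", "block-in-room2"}),
    del_effects=frozenset({"robot-in-room1", "block-in-room1"}),
)

state = frozenset({"robot-in-room1", "block-in-room1", "door-open"})
print(sorted(apply_action(state, push_block)))
# ['block-in-room2', 'door-open', 'robot-in-room2']
```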

ELIZA

ELIZA was an early AI program created in the mid-1960s by Joseph Weizenbaum. It was one of the first computer programs designed to simulate human conversation.

Key Features of ELIZA:

  1. Simulated Conversation: ELIZA could mimic the interaction of a human therapist by responding to user input in natural language. It simulated a specific kind of therapist, known as a Rogerian psychotherapist, by encouraging the user to keep talking, often repeating their own statements back as questions.
  2. Pattern Matching: ELIZA worked by using simple pattern-matching rules. It did not understand the conversation but used predefined templates and keywords to generate responses. For example, if the user said, “I feel sad,” ELIZA might respond, “Why do you feel sad?”
  3. No Understanding of Content: ELIZA didn’t actually “understand” the user’s input in the way modern AI systems attempt to understand language. It relied purely on recognizing specific patterns in the user’s text and responding based on those patterns. Despite this, many users felt as though they were having meaningful conversations.
  4. Psychotherapy Simulation: The program’s default script was designed to simulate therapy sessions, asking reflective questions and encouraging users to talk more about their thoughts and feelings. This gave the illusion that the program was empathetic and insightful, even though it was merely following simple rules.
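
A tiny, hypothetical ELIZA-style responder can be written in a few lines; the real ELIZA used a much richer script language, but the pattern-matching idea is the same:

```python
# A tiny, hypothetical ELIZA-style responder using keyword patterns.
import re

rules = [
    (r"i feel (.+)", "Why do you feel {}?"),
    (r"i am (.+)",   "How long have you been {}?"),
    (r"my (.+)",     "Tell me more about your {}."),
]

def respond(user_input: str) -> str:
    text = user_input.lower().strip(".!? ")
    for pattern, template in rules:
        match = re.match(pattern, text)
        if match:
            return template.format(match.group(1))
    return "Please, go on."            # default when nothing matches

print(respond("I feel sad"))           # Why do you feel sad?
print(respond("I am tired"))           # How long have you been tired?
```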

The ELIZA Effect

The ELIZA effect refers to the tendency of people to attribute human-like understanding, intelligence, or emotional capabilities to computers or AI systems, even when the systems are following simple, predefined rules and lack true comprehension.

ELIZA was programmed to engage users in conversation by using basic pattern-matching techniques and responding with pre-scripted, generic responses. Despite the simplicity of its design, many users believed ELIZA was genuinely understanding and engaging with them in a meaningful way, even though the program merely echoed their statements or asked leading questions without real insight.

Key Aspects of the ELIZA Effect:

  1. Attributing Intelligence: Users often ascribe deeper intelligence or understanding to a computer program simply because it provides responses that appear thoughtful, even if those responses are generated by simple, rule-based algorithms. In the case of ELIZA, people believed they were interacting with an empathetic system, even though ELIZA had no true comprehension.
  2. Cognitive Illusion: The ELIZA effect creates an illusion where users think the machine “understands” them because it responds in a way that seems appropriate. This illusion is enhanced when the system’s output aligns well with the user’s expectations or when it uses natural language.
  3. Emotional Response: Some users even formed emotional connections with ELIZA, feeling as though they were in a real therapy session. This shows how easy it is for people to project human-like qualities onto machines, especially when they interact in a conversational, human-friendly way.
  4. Misunderstanding of AI Capabilities: The ELIZA effect highlights the gap between how AI systems are perceived and how they actually function. Many people assume that a system capable of engaging in conversation must possess some level of understanding or intelligence, even when the system is based on shallow techniques like keyword recognition or script-following.

Combinatorial explosion

Combinatorial explosion in AI refers to the rapid growth in the number of possible combinations or solutions as the size or complexity of a problem increases. This occurs in many AI systems, especially in search problems, decision-making, and optimization tasks, where exploring all possible options becomes computationally infeasible due to their vast number.

Key Concepts:

  1. Exponential Growth of Possibilities: As the number of variables or components in a problem increases, the number of possible combinations grows exponentially. For example, in a chess game, the number of possible moves grows exponentially with each additional move, leading to an enormous number of possible game states.
  2. Search Problems: In AI, many tasks require searching through a large space of possible solutions (e.g., pathfinding, planning, game playing). When the number of choices expands exponentially, the search becomes computationally expensive or impossible to handle exhaustively. This is the heart of the combinatorial explosion problem.
  3. AI Planning and Decision Making: In areas like AI planning, where a system must decide a sequence of actions to achieve a goal, the number of possible action sequences can explode as the number of actions or possible states increases. This makes finding the optimal sequence of actions highly challenging.
  4. Optimization: In optimization problems, such as finding the best solution from many possible configurations, the combinatorial explosion makes it extremely difficult to evaluate all potential solutions in a reasonable amount of time.
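
A quick back-of-the-envelope illustration: assuming roughly 35 legal moves per chess position (a commonly quoted rough figure), the number of possible move sequences explodes with search depth:

```python
# Exponential growth of the search space with depth, assuming ~35 moves per position.
branching_factor = 35
for depth in [2, 4, 6, 8]:
    print(depth, f"{branching_factor ** depth:,}")
# 2 1,225
# 4 1,500,625
# 6 1,838,265,625
# 8 2,251,875,390,625
```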

How can it be solved?

  1. Heuristics: AI systems often use heuristics (rules of thumb) to narrow down the number of possible choices, helping to focus the search on more likely solutions. For example, in chess, AI can evaluate which moves are most likely to lead to a win and avoid less promising moves.
  2. Search Pruning Techniques: Techniques like alpha-beta pruning in game-playing AI or A* search in pathfinding help reduce the number of possibilities by cutting off large portions of the search tree that are unlikely to lead to optimal solutions.
  3. Approximation Algorithms: In cases where exact solutions are computationally infeasible due to the combinatorial explosion, AI often uses approximation algorithms to find near-optimal solutions within a reasonable time frame.
  4. Constraint Satisfaction: Limiting the problem space by imposing constraints on possible solutions can help reduce the number of combinations to explore. AI techniques like constraint satisfaction focus on finding solutions that meet a set of predefined conditions.

Moravec’s Paradox

Moravec’s Paradox is the observation that, in artificial intelligence (AI) and robotics, tasks that humans find difficult, such as complex reasoning or solving math problems, tend to be easier for computers to perform. Conversely, tasks that humans find effortless, like sensory perception, motor skills, and understanding language, are extremely challenging for computers to replicate.

Key Points of Moravec’s Paradox:

  1. Hard for Humans, Easy for Machines:
  • Tasks like playing chess, solving mathematical equations, or performing logical reasoning are hard for humans but relatively easy for AI. Computers excel at structured, rule-based tasks where they can leverage their computational speed and accuracy.

2. Easy for Humans, Hard for Machines:

  • Tasks like walking, recognizing faces, understanding natural language, or navigating a dynamic environment are simple for humans but extremely complex for AI systems. These tasks require fine motor skills, sensory integration, perception, and learning from experience — skills that are deeply rooted in millions of years of biological evolution.

3. Evolutionary Perspective:

  • Moravec suggested that the tasks humans find easy, such as perception and motor control, are skills honed over millions of years of evolution. These abilities are deeply embedded in our brain’s “older” structures, which evolved long before complex reasoning. On the other hand, tasks like abstract reasoning are more recent evolutionary developments and are not as deeply ingrained in our biological systems, making them cognitively harder for humans but simpler for machines to replicate using explicit rules.

This paradox emphasizes the evolutionary basis of human intelligence: motor and sensory skills are deeply rooted in the older, so-called reptilian parts of the brain, while reasoning and other computations happen in the frontal lobe, the evolutionarily newest part of the brain.

Anatomical explanation

1. Motor Skills and Sensory Processing (Older Brain Structures):

  • Tasks like walking, perceiving the world through our senses, and coordinating muscle movements are managed by older brain structures, such as the cerebellum, basal ganglia (the so-called reptilian brain), and parts of the brainstem. These areas have evolved over millions of years and are highly efficient at handling these tasks with minimal conscious effort.

For example:

The Cerebellum: Plays a crucial role in fine motor control, balance, and coordination. It helps us perform complex movements smoothly and automatically without much conscious thought.

The Visual Cortex (part of the occipital lobe) processes visual information, allowing us to recognize faces, navigate environments, and perceive depth almost instantly.

  • These abilities are instinctual and refined by evolution because survival often depended on quick responses to physical stimuli (e.g., moving away from danger or catching prey).

2. Abstract Reasoning (Newer Brain Structures):

  • On the other hand, tasks like solving complex math problems, logical reasoning, and planning are handled by the prefrontal cortex, part of the neocortex — the brain’s most recently developed region.
  • The Prefrontal Cortex: It is responsible for decision-making, problem-solving, planning, and abstract thinking. These cognitive tasks are much more resource-intensive and require conscious thought because they have only recently evolved in our ancestors, making them less efficient and more “effortful” for humans.

Perceptron

A perceptron is the simplest type of artificial neuron or neural unit, introduced by Frank Rosenblatt in 1958. It models a biological neuron and is used in binary classifiers, where it predicts whether an input belongs to one of two classes.

Understanding Bias in depth

  • Bias is a constant term in the mathematical model of a perceptron that is added to the weighted sum of the inputs. It allows the model to adjust its output independently of the input values.
  • Think of bias as an adjustment factor that helps the perceptron better classify data points by shifting the decision boundary.

Importance of Bias

  1. Flexibility in Decision Boundary:
  • Without bias, the decision boundary (the line or plane that separates different classes) is restricted to passing through the origin (0, 0). This can be problematic if the data is not centered around the origin.
  • By including bias, the decision boundary can be shifted up or down (in 2D) or left or right (in higher dimensions), allowing the perceptron to better fit the data.

2. Modeling Real-World Scenarios:

  • In many real-world situations, it’s common for some inputs to be zero, but the model still needs to make a prediction. Bias helps address this issue by providing a baseline output.

3. Improving Model Accuracy:

  • Including bias often leads to a more accurate model, as it gives the perceptron more freedom to learn from the training data.

Mathematical Explanation

Let’s break down the mathematical components with a focus on how bias works.

  1. Weighted Sum Calculation: The output of a perceptron is calculated as:

z = w₁x₁ + w₂x₂ + b

  • w₁​ and w₂​ are weights assigned to inputs x₁​ and x₂.
  • b is the bias term.
  • This sum z is then passed through an activation function to determine the output.

2. Impact of Bias:

  • If b is positive, it effectively raises the decision boundary.
  • If b is negative, it lowers the decision boundary.

Example
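
Here is a minimal sketch of a perceptron in plain Python (an illustration, not a full library). It learns the logical AND function with the classic perceptron learning rule, and the bias term lets the decision boundary move away from the origin:

```python
# A minimal perceptron sketch: learning the logical AND function.
def predict(x1, x2, w1, w2, b):
    z = w1 * x1 + w2 * x2 + b          # weighted sum plus bias
    return 1 if z > 0 else 0           # step activation function

# Training data for AND: output is 1 only when both inputs are 1.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1 = w2 = 0.0
b = 0.0
lr = 0.1                               # learning rate

for epoch in range(20):                # perceptron learning rule
    for (x1, x2), target in data:
        error = target - predict(x1, x2, w1, w2, b)
        w1 += lr * error * x1
        w2 += lr * error * x2
        b  += lr * error               # the bias is updated like a weight whose input is 1

for (x1, x2), target in data:
    print((x1, x2), predict(x1, x2, w1, w2, b), "expected", target)
```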

Hopfield Networks

A Hopfield network is a form of recurrent neural network, introduced by John Hopfield in 1982, in which every neuron is connected to every other one; it stores patterns as stable states and can recall a complete pattern from a partial or noisy input, acting as an associative memory.

How does AI Work?

Well, now let's get into everyone's question. We know how to use AI through GPT or some other model, but have you ever stopped to think about how this thing actually works? In this part, we will look at an abstracted version of the process.

  1. Data Collection and Preparation: gather the data the model will learn from and clean and prepare it.
  2. Training the Algorithm: the algorithm is trained on the prepared data so it can pick up the underlying patterns.
  3. Model Building: we build a simulation or representation of a real-world system, process, or phenomenon using algorithms and mathematical equations executed by a computer.
  4. Learning and Optimization: the computer uses the training algorithm and the model we built to learn from the data and optimize the model by updating its parameters and features.
  5. Prediction and Inference: the trained model makes predictions based on newly available data.
  6. Feedback Loop and Improvement: the feedback from those predictions and their actual outcomes is used to improve the model (see the sketch after this list).
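
As a toy end-to-end sketch of this workflow (the data and the tiny linear model below are invented for illustration): collect data, train the model with gradient descent, then use it to predict on unseen input.

```python
# A toy, hypothetical walk-through: data -> train (gradient descent) -> predict.
data = [(1, 3), (2, 5), (3, 7), (4, 9)]      # 1. collected data: y = 2x + 1

w, b = 0.0, 0.0                               # 3. the model: y_hat = w*x + b
lr = 0.01

for step in range(5000):                      # 2 & 4. training / optimization loop
    grad_w = grad_b = 0.0
    for x, y in data:
        error = (w * x + b) - y
        grad_w += error * x
        grad_b += error
    w -= lr * grad_w / len(data)
    b -= lr * grad_b / len(data)

print(round(w, 2), round(b, 2))               # close to 2.0 and 1.0
print(round(w * 10 + b, 1))                   # 5. prediction for unseen input x=10, about 21.0
```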

Generalization, reasoning, and problem solving are the basic things that make AI what it is.

Generalization in AI refers to the ability of a machine learning model to apply what it has learned from training data to new, unseen data.

How does GPT work?

1. Tokenization

First, I break down your input into smaller parts called tokens. A token can be a word, part of a word, or even punctuation. For example, if you input “How do you process things?”, I might break it into tokens like:

  • “How”
  • “do”
  • “you”
  • “process”
  • “things”
  • “?”

Each token is assigned a unique numerical representation so it can be processed by the model.
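
As a toy sketch only (real GPT models use a learned byte-pair-encoding vocabulary, not this), mapping text to token IDs can look like:

```python
# A toy, hypothetical tokenizer: split text into words/punctuation and assign IDs.
import re

vocab = {}   # token -> integer id, built on the fly for this example

def tokenize(text: str):
    tokens = re.findall(r"\w+|[^\w\s]", text)     # words and punctuation marks
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]

print(tokenize("How do you process things?"))      # [0, 1, 2, 3, 4, 5]
```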

2. Context Understanding via Embeddings

Next, I convert these tokens into embeddings, which are numerical vectors that capture the meaning and relationships of the tokens in a multi-dimensional space. Embeddings help me represent not just the words themselves but also their relationships and context.

For instance, “process” in “process things” will be treated differently from “process” in “legal process” because of how the words around it influence its meaning.

3. Attention Mechanism

Once I have the embeddings, I use an attention mechanism to weigh the importance of each token in the context of the whole input. This helps me figure out what parts of the input are more relevant to generating the best response. For example, if you ask, “How do you process things?”, I might give more attention to the words “process” and “how” because they guide the meaning of your question.

The attention mechanism helps me focus on different parts of the input in order to understand its structure and meaning.
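
Here is a minimal NumPy sketch of scaled dot-product attention over toy 3-token, 4-dimensional embeddings; real transformers add learned projection matrices and many attention heads:

```python
# A minimal sketch of scaled dot-product attention with NumPy.
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V — each output row is a weighted mix of V's rows."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                             # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))          # 3 tokens, 4-dimensional toy embeddings
out = attention(X, X, X)             # self-attention: queries, keys, values all from X
print(out.shape)                     # (3, 4)
```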

4. Predicting the Next Word (Generating a Response)

Using my trained neural network, I then predict the most likely next word or phrase based on the input. I do this step-by-step, generating one word at a time, while considering the entire context of your question.

Here’s how it works:

  • I calculate the probability of every possible word that could come next based on the input.
  • I choose the word with the highest probability, then move on to predicting the next word in the sequence.

For example:

  • After processing “How do you”, I predict that “process” is likely to follow.
  • Then, after “How do you process”, I predict that “things” is a strong next candidate.

I repeat this prediction process until the response is complete.
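
As a toy sketch of this last step (the tiny vocabulary and scores are invented), the model's raw scores are turned into probabilities with a softmax and the most likely token is picked:

```python
# A toy sketch of greedy next-word prediction over an invented vocabulary.
import math

vocab = ["process", "things", "cake", "?"]
logits = [2.0, 3.5, 0.1, 1.0]        # pretend these came from the trained network

exp = [math.exp(v) for v in logits]
probs = [v / sum(exp) for v in exp]  # softmax: probabilities sum to 1

best = max(range(len(vocab)), key=lambda i: probs[i])
print(vocab[best], round(probs[best], 2))   # 'things' is chosen as the next token
```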

5. Context Continuity

During the conversation, I use previous exchanges as context. This helps me maintain consistency in responses. For instance, if we’ve been discussing AI for several turns, I recognize that your input is probably related to AI and adjust my responses accordingly.

However, my memory is limited to the current session, and I don’t “remember” information once a conversation ends unless stored in some external context.

6. Returning the Response

Finally, I convert the generated sequence of predicted words back into text and display the response to you.

Summary of Steps:

  1. Tokenization: Break your input into smaller units (tokens).
  2. Embeddings: Convert tokens into numerical representations that capture meaning.
  3. Attention Mechanism: Focus on the most relevant parts of the input.
  4. Word Prediction: Generate the most likely response word-by-word based on learned patterns.
  5. Context Continuity: Maintain context throughout the conversation.
  6. Response Output: Display the final response to you.

While this process mimics understanding, it’s all based on pattern recognition and mathematical models rather than actual comprehension or reasoning.

Types of AI

We can classify AI based on various categories.

Based on Functionality

  1. Reactive Machines: These AI systems have no memory or ability to learn from past experiences. They simply respond to the same input in the same way every time.
  2. Limited Memory: These AI systems can look into the past and make decisions based on historical data, but they cannot form memories or learn over time.
  3. Theory of Mind: This refers to AI that has an understanding of emotions, beliefs, and intentions of others. Such AI can engage in social interactions.
  4. Self-aware AI: The highest form of AI where machines possess self-consciousness and awareness similar to human beings.

Based on Learning Capability

  1. Supervised Learning: AI systems are trained on labeled data, where the input-output pairs are clearly defined, and the model learns by comparing its output with the correct label. Image recognition algorithms are an example of this type of learning.
  2. Unsupervised Learning: AI works on unlabeled data and finds hidden patterns without specific guidance on what to look for. We can see clustering as an example of this.
  3. Reinforcement Learning: This AI learns through trial and error, receiving rewards or penalties based on its actions and adjusting its behavior accordingly. A famous example of this is AlphaGo.
  4. Semi-supervised Learning: This is the case where the AI learns from both labeled and unlabeled data.

Based on Intelligence Level

  • Narrow AI (Weak AI): AI systems designed to perform specific tasks with a high level of efficiency, but they cannot perform tasks beyond their predefined scope.
  • Artificial General Intelligence (AGI) (Strong AI): AI systems that possess human-level intelligence across a variety of domains and can perform any cognitive task that a human can do.
  • Artificial Superintelligence (ASI): A level of AI that surpasses human intelligence in all fields, including creativity, problem-solving, and social skills.

The singularity in artificial intelligence refers to a hypothetical point in the future where machines surpass human intelligence, leading to unpredictable or uncontrollable changes in society and technology. This concept is often associated with superintelligent AI that can improve itself at an exponential rate, far beyond human comprehension.

Overfitting is a common problem in machine learning and statistics that occurs when a model learns the training data too well, capturing noise and fluctuations instead of the underlying patterns. This leads to a model that performs very well on the training data but poorly on unseen or test data.

Underfitting is the opposite of overfitting. It occurs when a machine learning model is too simple to capture the underlying patterns in the training data, leading to poor performance on both training and unseen data. Essentially, the model fails to learn enough from the data.

Outliers are data points that differ significantly from the majority of a dataset. They can be much higher or lower than the rest of the values and can arise due to variability in the data or may indicate experimental errors or anomalies.

Foundation Models

A foundation model is a type of large-scale artificial intelligence model trained on vast amounts of data and designed to be highly versatile. These models serve as the foundational layer upon which various AI applications can be built by fine-tuning them for specific tasks.

Key Characteristics

1. Pre-trained on Large Data Sets: Foundation models are typically pre-trained on massive amounts of diverse data, which could include text, images, audio, or a combination. The model learns general patterns, relationships, and knowledge from this broad dataset before being fine-tuned for specific applications.

2. General-Purpose Capabilities: Because of their extensive pre-training, foundation models are general-purpose. They can be adapted to various tasks with minimal additional training, such as text classification, translation, image recognition, or even generation of novel content (like writing, art, or music).

3. Scalability: Foundation models tend to be extremely large, often with billions or trillions of parameters. This scalability allows them to capture complex relationships in the data and perform sophisticated tasks.

4. Fine-tuning: After pre-training, these models can be fine-tuned with smaller, task-specific datasets. This fine-tuning tailors the model for specific uses like sentiment analysis, medical diagnosis, or language translation without needing to train a new model from scratch.

5. Modality Versatility: Some foundation models can work across different modalities (e.g., text, images, and speech), making them flexible in handling multiple types of data inputs. This allows for building multi-functional systems from a single model base.

Examples of Foundation Models:

  • GPT-3 (used by ChatGPT) is an example of a foundation model for natural language processing (NLP). It was pre-trained on a massive amount of text and can be adapted for various language-related tasks.
  • BERT (Bidirectional Encoder Representations from Transformers) is another popular foundation model for NLP tasks like question answering or text summarization.
  • CLIP (Contrastive Language–Image Pre-training) is a foundation model that bridges language and image understanding, enabling tasks like image captioning and text-based image generation.
