The Mathematical Foundations of Artificial Intelligence

Gani Çalışkan
Turk Telekom Bulut Teknolojileri
5 min read · May 5, 2022

Different definitions of AI

In some CS courses, artificial intelligence (AI), sometimes called machine intelligence, is defined as intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. Some AI textbooks define the field as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term “artificial intelligence” is often used to describe machines (or computers) that mimic “cognitive” functions that humans associate with the human mind, such as “learning” and “problem solving”.

Russell, Stuart J.; Norvig, Peter (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, New Jersey: Prentice Hall. ISBN 978-0-13-604259-4.

Functional Analysis

  • Establishes the domain of the model or “hypothesis”
  • Defines operations within the domain and transformations into adjacent domains
  • Provides measures of completeness: orthonormal function sets, vector projection
  • Simplifies to more tractable implementations: linear algebra, matrix arithmetic, Fourier series
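As a small illustration of completeness via orthonormal sets, a vector can be projected onto an orthonormal basis and reconstructed exactly from its projection coefficients (a minimal sketch in R², with an arbitrarily chosen rotated basis):

```python
import numpy as np

# An orthonormal basis of R^2 (a 45-degree rotation of the standard basis).
e1 = np.array([1.0, 1.0]) / np.sqrt(2)
e2 = np.array([1.0, -1.0]) / np.sqrt(2)

v = np.array([3.0, 4.0])

# Projection coefficients: c_i = <v, e_i>
c1, c2 = v @ e1, v @ e2

# Completeness: an orthonormal basis reconstructs v exactly.
reconstruction = c1 * e1 + c2 * e2
print(reconstruction)  # [3. 4.]
```

The same projection idea underlies Fourier series, where the basis functions are sines and cosines instead of vectors.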

Numerical Methods

Solutions to multivariate classification problems often require optimization routines:

  • Establishment of cost and gradient functions
  • Numerical search strategies
  • Linearization or determinization of stochastic processes
  • Application of heuristics and ontologies
  • Numerical integration and differentiation, required for ill-defined data or “complicated” regions
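A minimal sketch of two of these ideas together: a central-difference numerical derivative driving a gradient search on a hypothetical one-dimensional cost function (the cost, step size, and learning rate below are illustrative assumptions):

```python
# Hypothetical cost function: f(w) = (w - 3)^2, minimum at w = 3.
def cost(w):
    return (w - 3.0) ** 2

# Central-difference numerical gradient, useful when the analytic
# derivative is unavailable or the data are ill-defined.
def num_grad(f, w, h=1e-5):
    return (f(w + h) - f(w - h)) / (2 * h)

# Simple gradient-descent search strategy.
w = 0.0
for _ in range(200):
    w -= 0.1 * num_grad(cost, w)

print(round(w, 4))  # converges toward 3.0
```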

Probability Theory

  • Establishes performance bounds upon stochastic classifiers: Bayesian networks, Particle Filters, Markov Chains, Maximum Likelihood, Parameter Estimation, Statistical Analysis of Physical Parameters
  • Accommodates stochastic processes and multivariate data — employing measures such as the Mahalanobis distance and the Bregman divergence
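As a sketch, the Mahalanobis distance rescales a point’s offset from the mean by the covariance of the data, so the same Euclidean offset counts for less along a high-variance direction (the mean and covariance below are toy illustrative values):

```python
import numpy as np

# Mahalanobis distance of a point x from a distribution with mean mu
# and covariance S: d = sqrt((x - mu)^T S^{-1} (x - mu)).
def mahalanobis(x, mu, cov):
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])  # larger variance along the first axis

# The same Euclidean offset is "closer" along the high-variance axis.
print(mahalanobis(np.array([2.0, 0.0]), mu, cov))  # 1.0
print(mahalanobis(np.array([0.0, 2.0]), mu, cov))  # 2.0
```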

Convolutional neural networks are a specialized type of artificial neural networks that use a mathematical operation called convolution in place of general matrix multiplication in at least one of their layers.[13] They are specifically designed to process pixel data and are used in image recognition and processing.
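A minimal sketch of the convolution operation over pixel data (implemented as cross-correlation, as most deep-learning libraries do; the image and kernel are arbitrary toy values):

```python
import numpy as np

# Minimal 2-D "valid" convolution: slide the kernel over the image
# and take the elementwise product-sum at each position.
def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # simple diagonal-difference filter

print(conv2d(image, kernel))
```

In a CNN layer this product-sum replaces the general matrix multiplication of a fully connected layer, with the kernel weights shared across all positions.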

Ian Goodfellow and Yoshua Bengio and Aaron Courville (2016). Deep Learning. MIT Press. p. 326.

Questions

  • Is the image sufficiently sampled to capture “high frequency” effects (Nyquist criterion)?
  • Does the discretization of the convolution function compromise the output?
  • How much data is lost when using max-pool compression?
  • Is the fidelity of the training data sufficient?
  • Would alternate approaches (DCT, for example) provide sufficient compression and maintain fidelity?
  • What would be the difference in compute resource requirements?
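The max-pool question above can be made concrete: a 2×2 max pool keeps only the largest value in each block and discards the other three (a toy sketch with arbitrary values):

```python
import numpy as np

# 2x2 max pooling: reshape into 2x2 blocks and take the max of each.
def max_pool_2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0],
              [9.0, 1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0, 7.0]])

pooled = max_pool_2x2(x)
print(pooled)                     # [[6. 8.] [9. 7.]]
print(1 - pooled.size / x.size)   # 0.75 of the values are discarded
```

Whether that 75% of discarded values carried useful signal is exactly the fidelity question posed above.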

NNs and Numerical Methods

A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes.[1] Thus, a neural network is either a biological neural network, made up of biological neurons, or an artificial neural network, used for solving artificial intelligence (AI) problems. The connections of the biological neuron are modeled in artificial neural networks as weights between nodes. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be −1 and 1.
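The weighted sum and activation described above can be sketched as a single artificial neuron (the inputs, weights, and bias below are arbitrary illustrative values; a sigmoid is assumed as the activation):

```python
import math

# A single artificial neuron: a weighted sum of the inputs (a linear
# combination) followed by a sigmoid activation that bounds the
# output to (0, 1).
def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

# Positive weights are excitatory, negative weights inhibitory.
out = neuron(inputs=[1.0, 0.5], weights=[2.0, -1.0], bias=0.0)
print(round(out, 3))  # 0.818
```

Replacing the sigmoid with tanh would bound the output to (−1, 1) instead, matching the second range mentioned above.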

Hopfield, J. J. (1982). “Neural networks and physical systems with emergent collective computational abilities”. Proc. Natl. Acad. Sci. U.S.A. 79 (8): 2554–2558. Bibcode:1982PNAS…79.2554H. doi:10.1073/pnas.79.8.2554. PMC 346238. PMID 6953413.

Forward Propagation

Formula of forward propagation
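A minimal sketch of forward propagation, layer by layer: each layer applies a linear combination followed by an activation, a(l) = σ(W(l) a(l−1) + b(l)). The weights below are random illustrative values and a sigmoid activation is assumed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward propagation: each dense layer computes
# a_out = activation(W @ a_in + b), fed into the next layer.
def forward(a, layers):
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 2)), np.zeros(3)),   # 2 -> 3 units
          (rng.standard_normal((1, 3)), np.zeros(1))]   # 3 -> 1 unit

output = forward(np.array([0.5, -0.2]), layers)
print(output.shape)  # (1,)
```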

Back propagation: minimize the cost with respect to θ using the gradient derivatives

Searching for the minima

  • Classic optimization theory
  • Conjugate gradient
  • Simplex
  • Direct search
  • Stochastic Gradient
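The last strategy in the list, stochastic gradient descent, can be sketched on a toy least-squares fit: each step uses the gradient from a single random sample rather than the whole dataset (the data, learning rate, and iteration count are illustrative assumptions):

```python
import random

# Toy dataset generated by y = 2*x, so the true parameter is w = 2.
data = [(x, 2.0 * x) for x in range(1, 11)]

random.seed(0)
w = 0.0
lr = 0.001
for _ in range(2000):
    x, y = random.choice(data)          # one random sample per step
    grad = 2 * (w * x - y) * x          # d/dw of the sample loss (w*x - y)^2
    w -= lr * grad

print(round(w, 2))  # 2.0
```

The per-sample noise in the updates is also what produces the oscillatory behavior listed under the challenges below.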

Challenges

  • Well-behaved cost surfaces and locating global minima
  • Oscillatory behavior
  • Regularization
  • Convergence Rate

BBNs and Probability Theory

Bayesian belief networks (also known as belief networks, causal probabilistic networks, causal nets, graphical probability networks, probabilistic cause–effect models, and probabilistic influence diagrams) provide decision-support for a wide range of problems involving uncertainty and probabilistic reasoning. The underlying theory of BBNs is Bayesian probability theory and the notion of propagation.
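A minimal sketch of propagation on a two-node belief network, Rain → WetGrass (the structure and all probabilities below are made-up illustrative values):

```python
# Prior and conditional probabilities for the hypothetical network.
p_rain = 0.2
p_wet_given_rain = 0.9
p_wet_given_dry = 0.1

# Propagation: marginal probability of wet grass.
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)

# Bayes' rule: updated belief about Rain after observing wet grass.
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet

print(round(p_wet, 2))             # 0.26
print(round(p_rain_given_wet, 3))  # 0.692
```

Real BBN libraries propagate such updates through arbitrarily large graphs, but the mechanism at each node is this same Bayesian update.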

An example of this theory is shown below.

Naïve Bayes Probability Condition
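The naïve Bayes condition — features assumed conditionally independent given the class, so P(y | x₁…xₙ) ∝ P(y) · Π P(xᵢ | y) — can be sketched with toy spam/ham numbers (all probabilities below are illustrative assumptions):

```python
# Toy priors and per-word likelihoods (purely illustrative numbers).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.6},
}

# Naive Bayes: multiply the prior by each word likelihood, treating
# the words as conditionally independent given the class.
def posterior(words, cls):
    p = priors[cls]
    for w in words:
        p *= likelihoods[cls][w]
    return p

words = ["offer"]
scores = {c: posterior(words, c) for c in priors}
total = sum(scores.values())
print({c: round(p / total, 2) for c, p in scores.items()})  # {'spam': 0.7, 'ham': 0.3}
```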

Vapnik’s Learning Model

  • A generator of random vectors x ∈ ℝⁿ, drawn independently from a fixed but unknown probability distribution function F(x)
  • A supervisor who returns an output value y for every input vector x according to a conditional distribution function F(y|x), also fixed but unknown
  • A learning machine capable of implementing a set of functions f(x, α), α ∈ Λ, where Λ is a set of parameters
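Vapnik’s setting can be sketched in miniature: since F(x, y) itself is unknown, the learning machine chooses the α ∈ Λ that minimizes the empirical risk over the observed samples (the toy data, function class, and grid search below are illustrative assumptions):

```python
# Toy supervisor output: noisy samples of y ≈ 2x.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# The learning machine's set of functions f(x, alpha).
def f(x, alpha):
    return alpha * x

# Empirical risk: average squared loss over the observed samples.
def empirical_risk(alpha):
    return sum((y - f(x, alpha)) ** 2 for x, y in samples) / len(samples)

# Crude direct search over a finite parameter set Lambda.
Lambda = [i / 100 for i in range(0, 401)]
best = min(Lambda, key=empirical_risk)
print(best)  # near the least-squares optimum of about 2.04
```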

Thank you for reading my article about the mathematical background of AI. I hope you enjoyed it. All sources are listed below.

https://r1.ieee.org/maine/wp-content/uploads/sites/29/

https://www.researchgate.net/publication/229674541_Bayesian_Belief_Networks_BBNs
