Image created with Dall·E by the author.

Exploring the Uncharted Territories of Math in Data Science

Beyond the usual topics like statistics, calculus, and linear algebra

Hennie de Harder
Published in
6 min read · May 27, 2024


Data science isn’t just about crunching numbers or mastering common areas like statistics and linear algebra. It’s also about exploring other math topics, such as chaos theory and Fourier analysis. Many data scientists are unfamiliar with these topics, but they can help us see and solve problems in unexpected ways.

One nice thing about math is that it often starts as something people explore for fun and then turns out to be useful for real-world problems. Concepts born from curiosity often end up being very practical. In this post, we point out lesser-known math topics that are closely related to data science. You will not find them in most existing data science curricula. For each topic, we collected easily accessible posts, together with more advanced content if you’d like to dig deeper.

Are you ready to explore and see how these topics can power up data science? Let’s dive in!

This post is co-created with my colleague Joris Bukala. We are both data scientists with a background in math.

Chaos Theory

Chaos theory is important for data scientists when modeling and forecasting complex systems. In chaotic systems, small variations in the starting conditions can drastically alter outcomes. It’s relevant in areas such as weather forecasting, stock market analysis, and complex system simulations, providing insights into the behavior of dynamic systems over time.

A key takeaway from this (beginner-friendly) post: if you are working with a truly chaotic system, trying to make long-term predictions is a waste of your time!

Logistic map. For an r-value between 2.4 and 3.0, the population stabilizes. For higher values, the population oscillates between two or more values, and beyond roughly r ≈ 3.57 its behavior becomes largely chaotic.
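To see sensitivity to starting conditions for yourself, here is a minimal sketch that iterates the logistic map x(n+1) = r · x(n) · (1 − x(n)). The r-values and starting points are arbitrary illustrative choices: r = 2.8 lies in the stable range from the figure, and r = 4.0 is deep in the chaotic range. In the chaotic case, two starting points that differ by only 1e-10 drift apart until the trajectories have nothing in common.

```python
def logistic_map(r, x0, n):
    """Iterate x_{k+1} = r * x_k * (1 - x_k) n times and return the full trajectory."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Stable regime: for r = 2.8 the population settles at the fixed point 1 - 1/r.
stable = logistic_map(2.8, 0.2, 200)
print(stable[-1], 1 - 1 / 2.8)

# Chaotic regime: r = 4.0, two starting points that differ by only 1e-10.
a = logistic_map(4.0, 0.2, 60)
b = logistic_map(4.0, 0.2 + 1e-10, 60)
for k in range(0, 61, 10):
    # The gap roughly doubles every step until it is as large as the values themselves.
    print(k, abs(a[k] - b[k]))
```

A forecast made with a measurement error of 1e-10 is therefore useless after a few dozen steps, which is exactly why long-term predictions of chaotic systems are a waste of time.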

Game Theory

You have probably heard about SHAP values for interpreting the predictions of a machine learning model, but did you know they come from game theory?

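In general, game theory helps us understand strategic interactions in competitive and cooperative environments. SHAP (SHapley Additive exPlanations) values, which build on Shapley values from cooperative game theory, are a breakthrough in explaining the output of machine learning models: each feature is treated as a “player” that receives a fair share of the difference between the model’s prediction and a baseline prediction. This application of game theory helps make models more interpretable and fair.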
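The Shapley value of a player is its marginal contribution averaged over all possible orders in which players can join the coalition. The sketch below computes exact Shapley values by brute force for a small, hypothetical model. It is not the SHAP library: it assumes a simple value function in which features outside a coalition are set to a single baseline input, and it enumerates every coalition, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n):
    """Exact Shapley values for players 0..n-1, given a coalition value function."""
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight = fraction of join orders in which exactly this coalition precedes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi += weight * (value_fn(s | {i}) - value_fn(s))
        phis.append(phi)
    return phis

def model(x):
    # Hypothetical toy model: feature 0 is additive, features 1 and 2 interact.
    return 3 * x[0] + 2 * x[1] * x[2]

x = [1.0, 1.0, 1.0]          # instance to explain
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input

def value(coalition):
    # Features in the coalition keep their real value; the others are set to the baseline.
    z = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(z)

phis = shapley_values(value, len(x))
print([round(p, 6) for p in phis])  # [3.0, 1.0, 1.0]
```

Feature 0 gets its full additive effect, the interaction term is split fairly between features 1 and 2, and the values sum to model(x) − model(baseline) = 5. This “efficiency” property is what makes Shapley-based explanations add up to the prediction.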

But SHAP isn’t the only connection between game theory and data science. Game theory can support decision making and provides a structure for analyzing competitive scenarios. If you are unfamiliar with the basics of game theory, you can read this post for common terms and visualizations of a game.
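A central concept in analyzing such games is the Nash equilibrium: a combination of strategies in which no player can improve their payoff by changing only their own strategy. As a small sketch, the code below checks every strategy pair of a two-player game for pure-strategy Nash equilibria. The payoffs are illustrative values for the classic prisoner’s dilemma, and mixed strategies are not considered.

```python
from itertools import product

def pure_nash_equilibria(payoffs, rows, cols):
    """Return strategy pairs where neither player gains by deviating on their own."""
    equilibria = []
    for r, c in product(rows, cols):
        # Row player: is r a best response to c?
        row_best = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in rows)
        # Column player: is c a best response to r?
        col_best = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

strategies = ["cooperate", "defect"]
# Prisoner's dilemma with illustrative payoffs (negative numbers = years in prison).
payoffs = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-3, 0),
    ("defect", "cooperate"): (0, -3),
    ("defect", "defect"): (-2, -2),
}
print(pure_nash_equilibria(payoffs, strategies, strategies))  # [('defect', 'defect')]
```

The only equilibrium is mutual defection, even though both players would be better off cooperating. Spotting this kind of tension is one way game theory adds structure to competitive scenarios.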

If you think about games and AI, Google DeepMind may come to mind. And indeed, they use game theory to train AI agents that play games. (Remember Go and StarCraft?) Interestingly, they also apply their findings outside gaming. An example is this post, in which DeepMind explains how they reformulated PCA as a competitive multi-agent game, making it faster to solve for massive datasets.

Fourier Analysis

Fourier analysis is a mathematical tool that decomposes a function or a signal into its component frequencies. It’s like breaking down a complex musical chord into the individual notes that, when played together, form the original sound. The technique is named after Jean-Baptiste Joseph Fourier, who introduced the idea that periodic functions can be represented as (possibly infinite) series of sines and cosines.

By transforming complex data into a format that shows its fundamental frequencies, Fourier analysis enables data scientists to uncover patterns, make predictions, and compress data for more efficient storage and transmission.
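Here is a minimal sketch of the “chord into notes” idea using NumPy’s FFT functions. The signal is synthetic: a 5 Hz and a 12 Hz sine wave with different amplitudes, sampled at an assumed rate of 100 Hz for one second. The spectrum shows two clear peaks at exactly those frequencies.

```python
import numpy as np

fs = 100                    # sampling rate in Hz (illustrative choice)
t = np.arange(fs) / fs      # one second of sample times
# A "chord": a 5 Hz wave with amplitude 1.0 plus a 12 Hz wave with amplitude 0.5.
signal = 1.0 * np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

spectrum = np.fft.rfft(signal)                    # FFT for real-valued input
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)    # frequency in Hz of each bin
amplitudes = 2 * np.abs(spectrum) / len(signal)   # rescale to the original amplitudes

# The two largest peaks sit at the component frequencies.
top = np.sort(freqs[np.argsort(amplitudes)[-2:]])
print(top.tolist())  # [5.0, 12.0]
```

Because nearly all of the signal’s energy sits in two frequency bins, you could store just those two coefficients instead of 100 samples. That is the basic intuition behind Fourier-based compression.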

This post from Joris Bukala introduces the topic from scratch and shows how Fourier analysis and Fourier transforms can be used in data science projects.

Waves. Image created with Dall·E by the author.

Graph Theory

Have you ever wondered how Google Maps finds the fastest route or how Facebook suggests friends you might know? Graph theory is at the heart of both. It helps us understand and navigate the connections and pathways within massive datasets. It’s very useful in data science and operations research, because many problems can be formulated as a graph. Here is a beginner-friendly post to understand common graph terms and algorithms.
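A classic graph algorithm for route finding is Dijkstra’s shortest-path algorithm. The sketch below is a textbook version using Python’s heapq; it is not how Google Maps works internally, which the post does not describe. The road network and travel times are made up, and the algorithm assumes all edge weights are non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest travel times from source in a graph with non-negative edge weights."""
    dist = {source: 0}
    queue = [(0, source)]  # (distance so far, node)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path to this node was already found
        for neighbor, weight in graph[node].items():
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(queue, (new_d, neighbor))
    return dist

# Hypothetical road network: travel times in minutes between intersections.
roads = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Note that the fastest route from A to D (A → C → B → D, 4 minutes) is not the one with the fewest roads, which is exactly why a weighted shortest-path algorithm is needed.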

There is a lot more to this topic! If your data has a graph-like structure, you can use graph properties to create new features and improve your predictions. Some examples are node degree (the number of connected edges), centrality measures (how important a node is), community membership, and path lengths. If you are unfamiliar with Graph Data Science, the guide from Neo4j might be a good place to start.
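As a small sketch of graph-based feature engineering, the code below computes three per-node features with plain Python: degree, degree centrality (degree divided by the number of other nodes), and closeness centrality (based on the average hop distance to the other nodes, computed with breadth-first search). The friendship network is hypothetical, and the closeness formula assumes a connected, unweighted graph. Libraries such as NetworkX or the Neo4j Graph Data Science library offer many more measures.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def node_features(adj):
    """Per-node features: degree, degree centrality, and closeness centrality."""
    n = len(adj)
    features = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        features[node] = {
            "degree": len(adj[node]),
            "degree_centrality": len(adj[node]) / (n - 1),
            # (number of other reachable nodes) / (sum of their distances)
            "closeness": (len(dist) - 1) / total if total else 0.0,
        }
    return features

# Hypothetical undirected friendship network, stored as adjacency sets.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "eve"},
    "eve": {"dan"},
}
for name, feats in node_features(friends).items():
    print(name, feats)
```

Each row of this output can be joined to a regular feature table, so a model can learn, for example, that well-connected users behave differently from users on the edge of the network.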

A more advanced part of Graph Data Science is Graph Neural Networks (GNNs). Because GNNs operate directly on graph-structured data and pass information between connected nodes, they can capture complex relationships better than traditional neural networks, which expect fixed-size inputs such as vectors or images. According to this post from NVIDIA, they are a hot area in deep learning research. There are many methods and applications of GNNs.
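To make “passing information between connected nodes” concrete, here is a minimal NumPy sketch of one common GNN building block: a graph convolutional layer in the style of Kipf and Welling, H′ = ReLU(D^(−1/2) (A + I) D^(−1/2) H W). The graph, features, and weights are random, untrained placeholders. A real GNN stacks several such layers and learns W from data.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt       # symmetric normalization
    # Average each node with its neighbors, apply a linear map, then ReLU.
    return np.maximum(0.0, norm_adj @ features @ weights)

rng = np.random.default_rng(0)
# Hypothetical 4-node path graph 0-1-2-3 with 3 input features per node.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))   # untrained weights, for illustration only
H_next = gcn_layer(A, H, W)
print(H_next.shape)  # (4, 2)
```

After one layer, each node’s new representation depends only on itself and its direct neighbors. Stacking k layers lets information travel k hops through the graph.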

A cool application from DeepMind is GraphCast, a GNN-based machine learning model that makes 10-day global weather predictions. According to DeepMind, GraphCast is, at the time of writing, the most accurate 10-day global weather forecasting system in the world!

Hyperbolic Geometry

Hyperbolic geometry is an area of mathematics that explores the properties and structures of hyperbolic space, which differs greatly from the flat Euclidean geometry we’re used to. Hyperbolic space has constant negative curvature: locally, it curves like a horse saddle. In such a space, familiar measurements behave differently. For example, the angles of a triangle add up to less than 180 degrees, and the circumference of a circle grows exponentially with its radius instead of linearly.

This might sound a bit vague if the topic is new to you, so let’s make it more practical by looking at data science applications. Hyperbolic space can efficiently model relationships in complex networks and hierarchies: the available space grows exponentially with the distance from a point, just as the number of nodes in a tree grows exponentially with its depth. This makes it a more natural framework for embedding and analyzing such data than traditional Euclidean space. For example, social networks, biological networks, and the structure of languages can be better represented in hyperbolic space due to their intrinsic hierarchical nature.

This post takes you from curvatures to embeddings and is a very good place to start if you are interested in this topic (the post explains the findings of this paper from Facebook).
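One popular model of hyperbolic space for embeddings is the Poincaré ball: all points live inside the unit ball, and distances blow up near its boundary. The sketch below implements the standard Poincaré distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), and compares it to Euclidean distance for a hypothetical two-dimensional “tree” with a root at the centre and two leaves near the boundary.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    def sq_norm(w):
        return sum(c * c for c in w)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * diff / ((1 - sq_norm(u)) * (1 - sq_norm(v))))

# Hypothetical 2-D embedding: a root at the centre, two leaves near the boundary.
root = (0.0, 0.0)
leaf_a = (0.95, 0.0)
leaf_b = (0.0, 0.95)

print(round(math.dist(leaf_a, leaf_b), 3))   # Euclidean leaf-to-leaf distance: 1.344
print(poincare_distance(root, leaf_a))       # hyperbolic root-to-leaf distance
print(poincare_distance(leaf_a, leaf_b))     # hyperbolic leaf-to-leaf distance
```

In hyperbolic space, the distance between the two leaves is almost as long as the path through the root, just like the path between two leaves in a tree. In Euclidean space, the leaves end up noticeably closer together than that path. This is the property that lets hierarchies be embedded in very few dimensions.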

If you want to dig deeper: This paper provides an overview of geometric deep learning problems together with solutions, difficulties and applications.

Different saddle surfaces. Image by author.

Conclusion

Hopefully you enjoyed this trip through lesser-known mathematical topics! If you are sometimes hesitant to learn more math, I hope this post takes away some of your doubts. It showed how seemingly unrelated topics turn out to be very useful in data science: they can help with decision making and feature engineering, and they can change how you look at projects. Besides that, math is fun ;) (Okay, I know not everyone will agree on that…)

Thanks for reading, and until next time!

Are you a data scientist or data engineer and interested in a new opportunity? At BigData Republic, a data consultancy company in the Netherlands, we are hiring!


Hennie de Harder

📈 Data Scientist with a passion for math 💻 Currently working at IKEA and BigData Republic 💡 I share tips & tricks and fun side projects