The Language of the Mechanical Universe

Mathphye
Aug 18, 2023


How could we understand the universe in such a way that we could predict the future by knowing the past and the present? Let me introduce you to that language: the code behind the Mechanical Universe.

Laplace’s demon and angel meet Laplace’s basis: exp(st).

[Spoiler alert] What you will find in this article: intuition for the Laplace transform and its utility, how to use it to solve Ordinary Differential Equations (ODEs), and how it describes the behavior of mechanical systems, making it, for its simplicity, the language of the Mechanical Universe.

Context

In our universe, we as human beings have many questions, and one of the most constant throughout history has been to understand our universe, the place where we are.

It is no secret today that most of nature's laws can be written as mathematical statements. In general, we could say mathematics is the language. Still, we could fine-tune that statement a little more, and that is why we are here today.

We could describe the Mechanical Universe by describing each detail, at a high cost in memory; or we could define a simple rule that, though short, could describe almost anything in our universe. In this context, a language is a way to describe many things without relying on memorization.

A concrete math example: the difference between storing (1, 2, 4, 8, 16, …) in memory (which would grow infinitely large) and simply using an algorithm that computes 2^n to generate the same sequence. Here the algorithm acts as a "natural language", since it describes the same information with a small number of expressions or symbols. (Related to Occam's razor, reductionism, and entropy.)

Memory storage versus generating algorithm
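To make the contrast concrete, here is a minimal Python sketch of the two approaches; the stored list is just a hypothetical snippet of the sequence:

```python
# Memory approach: store every value explicitly (the full sequence
# would grow without bound).
stored = [1, 2, 4, 8, 16, 32]

# Algorithm approach: one short rule, 2**n, regenerates the sequence.
def powers_of_two(n):
    return [2**k for k in range(n)]

print(powers_of_two(6))  # same information, far fewer symbols
```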

Requirements

This is mostly for those who have the fundamentals of linear algebra and calculus. Today I share a panoramic view of how they relate to the description of the Mechanical Universe.

It is focused on those curious ones who want to understand the Universe. Even if you don't meet the requirements, I hope you get the intuition.

Linear Algebra ~ Basis of information

How do we begin to explore that "language of the universe"? First of all, we must think about what is common to each and every object. Think about it: if we want to describe all of nature by simple rules, what we are really looking for is what all objects have in common, and write it down.

The Scene: Spacetime

First of all, the house: the location, the "where" everything is in.

We call it space, a three-dimensional habitat; some might argue there is more that we can't see… For this case, let's focus on what we actually see. The key point here is that there exists something in nature that preserves magnitude along three different directions, which we call spatial dimensions.

Now, objects are not only statically in a location; they change while time makes its tick-tock. We add time to make it possible to keep track of changes in the scenario. — Even if an object hasn't moved, you could see changes from what it was initially.

Evolution in discrete (points) or continuous (curve) time

The Actors: Objects ~ Particles

We already have the place and the nature of the scene, and we not only render a single image of the scene but run a movie over time. Next, we need to know more about the actors in the scene: the objects.

What do all objects have in common? They have a list of qualities and quantities attached to them that work as descriptors of properties. Whether it is A or B, an apple or an orange (a difference in quality), or a 3 pH orange versus a 4 pH orange (a difference in quantity).

Note: We could also use each dimension as a quality, and the distance along each of them as the quantity.

Math representation

With math, we can encapsulate qualities as directional vectors, while the quantities rely on the magnitude or coefficient attached to each vector. In this case, we call the directional vector a basis; the direction of the vector represents its uniqueness.

Geometric algebra notation for x, y, and z position

Here ‘x’ head (^) represents a directional vector or basis for the x-dimension, and so on for the others.

This notation is similar to Dirac (bra-ket) notation, and we could use an equivalent matrix notation:

Matrix notation for x, y, and z position

— If you are familiar with dictionaries in Python or objects in JavaScript: here, each item in a vector could represent a key (in this case a dimension), while the number is the value associated with that key.
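A tiny sketch of that analogy in Python (the key names and numbers are illustrative):

```python
# A position vector as a dictionary: each key is a "quality" (a dimension)
# and each value is the "quantity" along that dimension.
position = {"x": 3.0, "y": 2.0, "z": 5.0}

# The matrix/column notation is just the values read off in a fixed key order.
basis = ["x", "y", "z"]
column = [position[k] for k in basis]
print(column)  # [3.0, 2.0, 5.0]
```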

Until here, any actor/object could be described as a bunch of properties and how much of each property it has. For example, a fruit salad:

A fruit salad made of 3 oranges, 2 apples, and 5 pears. Geometric algebra notation.

Projection and Perspective

Although we might have different names (coming from different perspectives), we might find two vectors with the same "meaning" or direction; conversely, we may find that two vectors are entirely unrelated. It is also possible to find a percentage value in between, which is called correlation.

Off-topic: Analogous to two words with the same or similar meaning seen in word embeddings for neural networks.

The way we calculate how much a vector is related/similar to another is the dot product.

Dot product as conversion from orange to vitamin A

The dot product here tells us how much vitamin A is related to the fruit orange; it acts as a quality/unit conversion (a transformation, for future reference) from fruit-orange to vitamin-A.

We can concatenate multiple vitamins vertically, and hence we get something that can be represented as a matrix multiplication.

Writing “qualities”, fruits and vitamins, as vector embeddings

As a matrix multiplication, the matrix acts as a transform of information from fruits to vitamins:

Matrix notation: transform fruits information to vitamins information
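As a sketch of that idea, here is the fruits-to-vitamins transform in Python; the vitamin contents are made-up numbers, only the structure matters:

```python
# Hypothetical vitamin content per fruit (rows: vitamins, columns: fruits).
# The numbers are invented for illustration only.
fruits = ["orange", "apple", "pear"]
vitamins = ["A", "C"]
content = [
    [0.2, 0.1, 0.05],   # vitamin A per unit of each fruit
    [0.7, 0.3, 0.20],   # vitamin C per unit of each fruit
]

salad = [3, 2, 5]       # 3 oranges, 2 apples, 5 pears

# Matrix-vector product: each output entry is one dot product, i.e. a
# projection of the salad onto one vitamin "basis".
def transform(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

print(transform(content, salad))  # the salad described in vitamin coordinates
```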

What we should appreciate here is that a transformation (the kind we care about) is a projection of information; in other words, it describes the same thing with different concepts (bases) or from different perspectives. In this case, a salad is described either by the fruits it contains or by the vitamins it contains.

There are exceptions with singular matrices: they erase information, do not allow recovery, and hence yield a non-invertible transformation.

Extending to continuous functions, we could see f(x) as a vector […, f(0·h), f(1·h), f(2·h), …] where we later make h tend to an infinitesimally tiny value. Everything we have seen about projections and transforms still holds. From a math perspective, functions are vectors with high density.

Discrete vectors’ relation with continuous functions

Rotations and Orthonormal Matrices

Rotations are a special case of transformations, and they play a very important role: they preserve magnitude, and the vectors/basis stay isolated from each other (read: linearly independent). Here, more than rotations, we are interested in orthonormal matrices as transformations, but they are similar to rotations, at least for the sake of intuition.

A single position described from two different perspectives or basis

Here two vectors that look different are actually telling us the same position, just in different bases: one in (x,y) and the other in (u,v). What a transformation does is link them, (x,y) -> (u,v), or reversibly, (u,v) -> (x,y).

How do we find those transformations? We are now mainly interested in orthonormal matrices, as they allow straightforward calculations for both the transformation and its reverse, the anti-transformation.

Let’s say we write a function on any orthonormal basis:

A function is described as a linear combination of others. Geometric algebra similar-like notation.

How do we find the coefficients C_1, C_2, and C_3? Well, we apply the dot product:

Projecting a function into a single basis O(x)

Making use of orthonormality, almost all the terms cancel and we keep only the coefficient of interest:

Finding unknown coefficient for basis O(x)

If we do the same for discrete cases we get:

Discrete equivalent notation for a function described as a linear combination of others.

Where the C_n are the correlation coefficients related to each basis vector V_n. We could say it is a transformation from C_n space to b space (as in the fruit salad example, from fruits to vitamins).

The explicit solution to find any C_n/C_m

This would be the anti-transform equivalent: we get C_n from b. We can see this structure in the Taylor series, Laurent series, Fourier series, and so on…

Common series which are similar to basis projection
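The projection recipe above can be sketched numerically. As a minimal example, take two orthonormal vectors in the plane (a 45-degree rotation of the x, y basis), find the coefficients with dot products, and rebuild the original vector:

```python
import math

# Two orthonormal basis vectors (a 45-degree rotation of x and y).
v1 = [1 / math.sqrt(2),  1 / math.sqrt(2)]
v2 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

b = [3.0, 1.0]

# Thanks to orthonormality, each coefficient is just one dot product.
c1, c2 = dot(b, v1), dot(b, v2)

# Reconstruction: b = c1*v1 + c2*v2
rebuilt = [c1 * x + c2 * y for x, y in zip(v1, v2)]
print(rebuilt)  # recovers [3.0, 1.0] up to rounding
```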

While transforms are represented by matrices in discrete cases, in continuous cases they are two-dimensional functions g(x,s), and the matrix multiplication becomes an integral that computes each "row" for a particular value of s. For the Laplace transform we have:

Laplace transform, continuous definition.

Here, as an example, we use the "matrix" transform directly as g(t,s) = exp(st), but for general purposes it could be a different one.

From now on we choose exp(st) as the basis, which yields the Laplace transform. Why this basis? First, it has low entropy; second, it is orthonormal; and lastly, it is an eigenfunction for ODEs, which means its derivative is proportional to itself. A little more about exp(x) and its relation to the "most beautiful equation in math":

For future reference, the Laplace transform is the change of basis from flat space to frequency space (or dual space), a duality analogous to that between Cartesian and polar coordinates.
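As a rough numeric sketch, the transform integral can be approximated by a Riemann sum. Note that I use the common textbook sign convention e^(-st) here, which is the exp(st) above with s negated; for f(t) = e^(-t) the exact transform is 1/(s + 1):

```python
import math

# Laplace transform F(s) = integral from 0 to infinity of f(t) e^{-st} dt,
# approximated by a left Riemann sum truncated at t = T.
def laplace(f, s, T=50.0, n=200_000):
    h = T / n
    return sum(f(k * h) * math.exp(-s * k * h) * h for k in range(n))

F2 = laplace(lambda t: math.exp(-t), s=2.0)
print(F2)  # close to the exact value 1/(2 + 1) = 1/3
```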

Summary Part 1: Linear algebra

We got the intuition of qualities/keys and quantities/values to describe space and objects with linear algebra. We can use different bases to represent the same information; the tool that changes basis is the basis transform. How much two vectors or basis elements are related to each other is given by the dot product.

Changes ~ Calculus

A more accurate title for this part would be "the language of changes", whose words are the basis functions or eigenfunctions of ordinary differential equations (ODEs). It is strongly related to the way the universe has become what it is: any state our universe is in is actually the accumulation of changes over time. (We are ignoring chemical reactions and changes within atoms or molecules.)

If you walk from A to B, or if any object changes from one state to another… what defines that change are the overall changes along the way, all the little steps that take you closer and closer until you finally reach the final state.

Above the scene and actors, we actually have changes. Changes in spacetime or in the evolution of the object itself.

To understand the Mechanical Universe we should understand how things change, and the initial state.

Time: the ruler of changes

Changes are strongly related not only to where something is located but also to its evolution through time. The relation is so strong that without the passing of time, we cannot conceive of any change.

Let’s say we want to discover the value any property takes at any time. This means y = f(t) in the continuous approach, or x[n] in the discrete one.

As said above, "language" is the low-entropy replacement for memory: we could store each pair of values [t0,t1,t2,t3,t4,t5,…] -> [5,3,6,4,6,8,5,…] in memory, or have a tiny rule instead. Here t3 represents 3 seconds, and the matching position holds the value 4 inside [x0,x1,x2,4,…].

Where would be next (x_[n+1]) based on the previous value (x_[n])
The same relation is defined from backward steps instead of forward steps. (Implicit equation of the entire sequence)

This is like discovering a map in a game while walking: you may not know all the values of x[n] over n (time) explicitly, but you can figure them out step by step. In other words, this rule holds implicit information over all n, but you need to walk each step to figure out the big-picture shape of x[n] over time.

Looking at multiple backward steps.

The value a gets repeated m times for m steps backward:

Looking at ‘m’ backward steps.

If we ground x[n-m] as the initial state, letting n - m = 0, then

Setting it to begin from zero. (Explicit equation of the entire sequence)

Now we know every single value in our x-n relation; we only need to know an initial state, not the nearest neighbor. This is the difference between having the solution explicitly and implicitly.
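A quick sketch of the two formulations for x[n+1] = a·x[n]: the implicit rule must walk step by step, while the explicit solution a^n·x[0] jumps straight to any n:

```python
a, x0 = 2, 1

def walk(n):          # implicit rule: repeat x -> a*x, one step at a time
    x = x0
    for _ in range(n):
        x = a * x
    return x

def jump(n):          # explicit solution: x[n] = a**n * x[0]
    return a ** n * x0

print(walk(10), jump(10))  # both give 1024
```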

Similarly for continuous functions, but we shrink what was n+1 to n+h, where h tends to an infinitesimal value.

Continuous function analogy.

If we take m=100 points for x from 1 to 3:

Trying the same process for discrete formulation.

There is no way out from here… For continuous functions, to avoid troubles with infinity, we use proportions: instead of using f(x+h) we use the proportion (f(x+h)-f(x))/h, which is not infinite.

Definition of derivative
Looking one step forward for continuous functions.

If we repeat 1/h = 100 steps from 0 to b, we get b·100 or b/h multiplications, that is, the step factor raised to the power of 100.

Euler, the Laplace basis appears naturally.

Note this is almost the definition of Euler’s number. Starting from the initial state at time zero:

Solution of simple ODE y’=y

So, having a single point (call it the initial state) and a next-step algorithm, we have, in fact, the behavior throughout all time.
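We can watch the Laplace basis appear numerically: stepping y' = y with the rule y(t+h) ≈ y(t)·(1+h) from y(0) = 1 gives (1+h)^(1/h) at t = 1, which tends to Euler's number as h shrinks. A small sketch:

```python
import math

def euler_exp(t, h=1e-5):
    # Repeatedly apply one step forward: y <- y * (1 + h)
    y = 1.0
    for _ in range(int(t / h)):
        y *= (1 + h)
    return y

print(euler_exp(1.0), math.e)  # the stepped value approaches e
```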

Note: Let’s keep something in mind. First, qualities don’t change; they might be interchanged, but in that case there is an equivalence in the exchange, rather than the quality changing by itself.

For example, if something changes from movement to heat, instead of the quality changing, we say the movement has become 0 while the heat has increased in a fair exchange.

Functions

Functions are a link between two (or more) qualities or two (or more) dimensions (space-time for example).

Functions as a link between two different qualities or sequences.

We are now trying to figure out the rules of change, so we want to know the relation between a quality and time (a graph of x vs. time): t:[0,1,2,3,…] -> x:[0,1,4,8,…]. From this relation we know the property x is 4 when the property t is 2, and likewise for each pair of values.

First, there is a single state for each quality: for example, being at x:5 and only at that point, or having acid level 7… and so on. If there are “many”, what we actually do is use a different quality for each of them.

In short, functions are mapping from one list to another, but there are some conditions we already assumed but never wrote:

  1. A one-to-one relation from domain to image
  2. There is always a next-step neighbor

If there were one-to-many, many-to-one, or even worse many-to-many mappings, information would be lost (as the Null Space article referenced above says). If a value tends to two different values, it becomes nondeterministic, as we don’t know for sure which of them it would be (stochastic branches); and if you try to get the initial information back, you don’t know which value it was, since a link to many could be any of them.

We also need continuity, to avoid hitting a break while taking little steps, because a^m won’t work if x[n+1] = a·x[n] doesn’t hold for all n.

Transformations

Here we have not only the link between two (or more) qualities, as in functions, but the relation between two links (functions).

Transformations as a link between functions written on a different basis

Derivatives and changes over time/iteration

Geometrically, the derivative is the velocity vector: if it is positive it means going up, if it is negative it means going down, and its magnitude tells us how fast it goes.

Derivative of a function as the inclination.
Derivative of a multiple variable function: An object with a velocity

By knowing only the velocity at each location, we could know everything: how the object moves, and where it would be located.

Ordinary Differential Equations (ODEs)

We have functions to describe objects, and although they can describe how things are over time, in many cases we don’t have that relation explicitly.

For example, we usually want to know our property as f(x) = expression, where the expression consists of known and easy math steps, usually arithmetic.

But what we actually know is not that, but a differential equation, which is an implicit solution. Those equations were discovered through rigorous observation (Newton’s laws, the Schrödinger equation, …).

Second order Ordinary Differential Equation

A differential equation already tells us how things behave, but not explicitly; by solving a differential equation we find an explicit relation between the two qualities (one of them usually time).

Eigenfunctions for ODEs

We already saw that exp(x) acts as an eigenfunction for derivatives. This means its derivative, no matter how many times you apply it, is the same function, or at least proportional to itself.

If we use it as f(t)=exp(st):

We use the basis or eigenfunction exp(st) in a second-order differential equation.

This eigenfunction allows us to undo the distribution, factoring out f(t) = exp(st) easily:

As it is proportional to its derivatives, it allows us to factor out the core kernel, the function.

Then we get the explicit solution:

Clearing the eigenfunction
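In code, the whole trick reduces to solving the characteristic polynomial a·s² + b·s + c = 0 that is left after factoring exp(st) out. A sketch with a hypothetical example, y'' − 3y' + 2y = 0:

```python
import cmath

def characteristic_roots(a, b, c):
    # Roots of a*s**2 + b*s + c = 0; each root s gives a solution exp(s*t).
    disc = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + disc) / (2 * a), (-b - disc) / (2 * a))

# y'' - 3y' + 2y = 0  ->  s = 2 and s = 1  ->  y = C1*e^{2t} + C2*e^{t}
print(characteristic_roots(1, -3, 2))
```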

Does this work only for eigenfunctions? What if we want other functions that are not eigenfunctions exp(st)?

Here comes the first part: we could write almost any function in terms of the exponential basis (let’s call it the Laplace basis), and find each coefficient with orthonormal projections.

Writing a function from basis exp(st) perspective (aka Laplace’s space)

Using math symbols to write it as a summation, we have:

Equivalent to Laplace’s transform.

Note: Calculus is restricted to continuous and differentiable functions.

Once we have our desired function written as a summation of eigenfunctions (in Fourier or Laplace space), we can make use of linear independence and orthonormality to isolate each of them, solve them individually, and after getting the result for each, add them back together at the end:

Write the function with Laplace-basis in the ODE.

Thanks to linearity, we know the solution looks exactly the same for general a, b, and c parameters:

Explicit solution using Laplace-basis, found with algebra.

We already know the procedure to find c_n here: using the dot product as before, almost all terms inside the summation vanish to zero, and we know the value of each of them.

Now, about linearity: more than us using linearity to simplify calculations, it appears nature also chooses most of its behavior to be linear. However, there are cases where non-linearity exists.

It turns out that non-linearity is actually linearity making use of infinity: we could say non-linearity is built of infinite linear chunks.

Others: PDEs and non-linear

Partial differential equations are multidimensional. This keeps many things similar to ODEs, but they need additional considerations that are outside the scope of this article.

Non-linearity is in most cases linearizable: if we keep continuity and differentiability, we can linearize locally with terms of the Taylor series.

Laplace transform

In college, solving differential equations goes through many different methods, but the one that solves almost all of them systematically, with just linear algebra, is the Laplace transform.

— If differential equations are the book describing how things behave, but an encrypted one, then discovering the eigenfunction is almost the unlocking of that encryption.

In terms of math and within the context of this article, the words of the “natural language” would be equivalent to eigenvectors or eigenfunctions.

The Laplace transform allows us to find the coefficient of each eigenfunction, so we can solve ODEs easily with nothing more than basic algebra.

The intuition of Laplace’s transform

We already know that the Laplace transform gives us a new function; the way we make sense of those functions is:

Wherever a characteristic function F(s) has poles, the flat-space solution contains the corresponding eigenfunction. Poles are where the function gives us infinity, that is, where the denominator is zero.

y(t) flat space explicit solution, Y(s) laplace space explicit solution

Let’s take a look at the common table of Laplace transforms:

Cosine and sine after Laplace’s transform

Note here that sine and cosine have poles such that they describe the respective function as a sum over the eigenfunction basis, sum(c_n·exp(st)):

Cosine written in the Laplace basis exp(st)
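A numeric sketch of that claim: Y(s) = s/(s² + 1) has simple poles at s = ±i, and the (numerically estimated) residue at each pole is the coefficient of that exp(st) basis function. Summing them rebuilds cos(t):

```python
import cmath, math

F = lambda s: s / (s**2 + 1)   # the Laplace transform of cos(t)
poles = [1j, -1j]

def residue(s0, eps=1e-6):
    # Crude numerical residue of F at a simple pole s0.
    return F(s0 + eps) * eps

def y(t):
    # y(t) = sum of c_k * exp(p_k * t) over the poles, with c_k = residue.
    return sum(residue(p) * cmath.exp(p * t) for p in poles).real

print(y(0.7), math.cos(0.7))  # the two values agree closely
```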

The final intuition from the Laplace transform is that each pole represents a Laplace-basis function in flat space. Suppose we have a transfer or characteristic function, maybe the solution of an ODE in Laplace space:

An example of transfer function in Laplace-space

Then its explicit solution in flat space would be an eigenfunction located at each pole:

An example of the transfer function in both: Laplace-space and flat-space

Laplace’s demon and angel

According to determinism, if someone (the demon) knows the precise location and momentum of every atom in the universe, their past and future values for any given time are entailed ~ Laplace’s demon — Wikipedia

Laplace’s demon was proposed as a black box: an unknown mechanism that makes use of perfect information about the past and the present of a system, then applies its knowledge/algorithm of the observed laws of the universe (physics) and calculates the future.

Following that metaphor, we may define Laplace’s angel as a white box that allows us to see clearly that “unknown” schema we assumed the demon had. A much clearer way to represent that black box.

Laplace’s demon and Laplace’s angel metaphor.

Why is it related to all mechanisms? The mechanical universe

From heat transfer, the equations of motion/Newton's laws, the wave equation and Maxwell's electromagnetic equations, to the Schrödinger equation, Dirac’s equation, and Einstein's equations…

Almost anything you can see can be described with a differential equation, and the way they communicate with each other is with “signals” or information exchange.

We could assemble them into a single model with simple additions and multiplications of transfer functions / characteristic functions, and model the entire system (or the mechanical universe) with a single characteristic function.

A bigger function could be the assembly of many internal systems.

Example: A car

A car has 4 wheels, each with a suspension system. Given a linear model where m is the mass, b is the damping coefficient, and k is the spring elastic constant:

The transfer function for a spring-mass damper system

The input signal for this system is the reaction force from the ground, which comes from the weight of the car: the normal force.
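A minimal time-domain sketch of one wheel, m·x'' + b·x' + k·x = F, stepped with semi-implicit Euler. The parameter values are illustrative, not from a real car; under a constant load the spring settles at the static deflection F/k:

```python
m, b, k = 250.0, 1500.0, 20000.0   # quarter-car mass (kg), damping, stiffness
F = m * 9.81                       # constant input: this wheel's share of the weight

x, v, h = 0.0, 0.0, 0.001          # start at rest; 1 ms time step
for _ in range(20_000):            # simulate 20 seconds
    a_ = (F - b * v - k * x) / m   # Newton's law for the suspension
    v += a_ * h                    # semi-implicit Euler: update v first,
    x += v * h                     # then move x with the new velocity

print(x, F / k)                    # settled position vs. static deflection
```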

As we have 4 wheels and it is a 3D problem, we should take the rotational inertia into account. On a first try, let’s assume a perfectly symmetric distribution.

Diagram for a car and its 4-wheel damped s-functions.

We could connect the engine, which is a heat system also described by differential equations (say a steam engine, for simplicity):

Steam engine connected to wheels through a signal.

In short, I want to say we could attach individual systems simply by multiplying them (if the output of one is the input of the next) or adding them (if they share a common input or output).

To the car model we could add as many details as we wish, like wave equations for the sound of the engine, or the circuits inside the car that may control the heat gradient…

*This is an overview; I won’t detail specific math solutions, as this story would become much longer. I will probably be redundant in future publications, going into details and specific explanations or examples.

Conclusion

In general, this works to describe most of the mechanical systems we see. The natural way to see the world through these transfer-function boxes is to decouple individual systems and connect them in parallel or in series. If we wish to know the equivalent transfer function for a bigger system (or the universe), we can use algebra to add and multiply all the coupled transfer functions and get the biggest one.

Limitations

The limitations that come to mind are chaos theory and uncertainty-principle behaviors in quantum mechanics (maybe others).

Chaos theory is still predictable; the difference is that it is non-linear. So, in the end, the same applies here, only that there are so many (infinite) poles to handle that it is predictable in theory but not in practice. You could use linear approximations (like the first n terms of a Taylor, Fourier, or Laplace series).

Quantum mechanics is still linear, at least from the perspective of the evolution of the probability distribution. Determinism fails at the time of measurement. In general, entropy saves determinism here: since we live on a macro scale, each individual particle can be unpredictable on its own, but the mean or average property over many particles (what we see) is really predictable.

In other words, “the language of the Mechanical Universe” would work here too for the calculation of the distribution; what it won’t tell you is which random measurement you would get from that distribution.

Personal Message

I didn’t expect to write this any time soon, as I wished to share it in little chunks until the panoramic view took shape. However, SoME3 looks like a good opportunity to share the panoramic view of my initial wish.

It has been some months since my last publication, back when I tried to publish a story each month, haha. I think it was not a good idea, as I try to explore knowledge and provide new quality, not explore current knowledge and provide new quantity.

However, I admit: first, setting a deadline is worth it, so as not to keep delaying a submission; second, providing new quality is a little naive, as there is always someone who already discovered it and made it. So I will focus on things apparently no one has shared, though probably someone already did, haha. Lastly, regarding monetization, I hope to win SoME3 :D. Medium doesn’t allow me to monetize due to my location, which is one of the reasons I left the goal of monthly publications.

Future works

So, that is why I am not sure whether to keep publishing, at least regularly. I may do it every 3–6 months. However, let me share what I have in mind next:

As this one is an overview,

  1. I wish to go into a little more detail on some specific subjects, plus some similar new insights about quantum mechanics and general relativity.
  2. About entropy, information, and how it is related to what I have written (including this article).
  3. Complex numbers and their evolution to geometric algebra (still have to learn more).
  4. The connection between the discrete and continuous world (in math and physics).
  5. And lastly, the residue theorem and its implications for all this stuff (still have to learn more).

I usually like to connect everything to a deep background, as you could appreciate, but it is not restricted to that*; I might add some non-connected subjects.

Hope you enjoyed it and learned a new way of viewing the world. Don’t be as cold as Laplace’s demon; be happy to satisfy the curiosity that so many people worked on to bring us here, to this level of understanding of the universe we live in. Thanks for reading!
