Chaos and Control, or the art of randomness

Jorge Castro
Published in Cook php
Dec 27, 2018
Not that Chaos, but it’s an excuse for heresy.

So, here’s the deal. I am working on a new system (I work for the Machine God), with some neat charts and such. Let’s say that it’s a sales system that sells products (drinks); it has customers and sales (obviously). However, do you know how a chart looks without data?

It is an empty chart. Yes, it’s MEH.

I can’t present a new system without data.

So, I added some random data. Is it easy? Not really. For example, I added random data and this is the result:

So much chaos

The chart has information, and it is better than an empty chart. However, it is too much chaos: it’s just garbage data and it looks like garbage, i.e. it is garbage.

Lesson learned: random data without a flow looks awful.
It doesn’t look fine, so let’s try using a math formula: sine (ok, it’s a bad pun).

It’s math heresy
It is a sine chart.

A sine chart generates a flow. It is controlled, but it is not realistic; the information looks rigged (ok, it is rigged, it is fake data after all). However, there is a trend, and some real information does move in a cycle.

So, let’s mix it all together.

Fine but it’s not perfect

Ok, it is better, it has a trend and it has some random factor. It is far from perfect but it works.
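The mix can be sketched in a few lines of plain PHP. This is only an illustration of the idea (a sine trend plus bounded noise), not the code behind the charts above; the function name, the period and the amplitudes are all assumptions made for the example.

```php
<?php
// Hypothetical helper (not part of any library): a sine trend plus bounded noise.
function trendWithNoise(int $i, int $period, float $base, float $amplitude, float $noise): float
{
    // The sine term gives the repeating cycle (the "control").
    $trend = $base + $amplitude * sin(2 * M_PI * $i / $period);
    // mt_rand() / mt_getrandmax() is uniform in [0, 1]; rescale it to [-noise, +noise] (the "chaos").
    $jitter = (mt_rand() / mt_getrandmax() * 2 - 1) * $noise;
    // Clamp at zero so the series never goes negative.
    return max(0.0, $trend + $jitter);
}

mt_srand(42); // fixed seed, so repeated runs produce the same series
$series = [];
for ($i = 0; $i < 60; $i++) {
    // Assumed values: a cycle of 30 points around 100, swinging by 40, with noise of up to 15.
    $series[] = trendWithNoise($i, 30, 100.0, 40.0, 15.0);
}
```

A larger noise value moves the chart toward the chaotic version, and a smaller one moves it toward the pure sine.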

Working in PHP

Now, I created this library (free, open source). The example is in the example folder (exampledb.php).

Its objective is to fill a database with random (but controlled) information.

How does it work? It’s simple: you set a table, you set some columns, you set some logic and setters, and that’s it.

Context

Now, if you have worked in Business Intelligence, you know there are some key factors (or dimensions, if you want to call them that) that are used regularly. The most common are time, money, location and subject (or noun).

Money is easy; it could be a random (but controlled) value. The same applies to amount, quantity and price. However, it has some restrictions that could sound obvious but are easy to forget: money can’t be negative (or too big).

Location and subject (noun) refer to something (a row, for example) that must already exist in the database. And of course, those rows must be filled with some information.

Time is tricky, I will explain it later.

So, let’s create some subjects (products)

Products

Code (where Products::$products is a list of products):

$chaos = new ChaosMachineOne();
$chaos->table('products', count(Products::$products)) // one row per product
    ->setDb($db)
    ->field('idproduct', 'int', 'identity', 0, 0, 1000)
    ->field('name', 'string', 'database', '', 0, 45)
    ->field('price', 'decimal', 'database', 2, 0, 100)
    ->setArray('productname', Products::$products)
    // price: random value from 0.5 to 20, in steps of 0.1
    ->gen('when always set price.value=random(0.5,20,0.1)')
    // name: taken from the array registered as "productname"
    ->gen('when always set name.value=arrayindex("productname")')
    ->insert();

The names of the products come from an array (called productname). The price is a random value from 0.5 to 20 (step 0.1) and the id is generated by the database. The “magic” is done by the “gen()” function.

This is part of the result:

Note: Why Coca-Cola? It’s Xmas and Coca-Cola invented it (sarcasm). Actually, it is because I took the list of drinks from Wikipedia, and the first list was about Coca-Cola.
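To make "a random value from 0.5 to 20 (step 0.1)" concrete, here is a minimal sketch in plain PHP. It is not the library's implementation; randomStep() is a hypothetical helper written for this example, and it also covers the rule that money can't be negative or too big, because the result always stays between the two bounds.

```php
<?php
// Hypothetical helper: pick a random value in [$min, $max] that sits on a grid of size $step.
function randomStep(float $min, float $max, float $step): float
{
    // Number of whole steps that fit between min and max (the small epsilon absorbs float error).
    $steps = (int) floor(($max - $min) / $step + 1e-9);
    // random_int() is inclusive on both ends.
    $k = random_int(0, $steps);
    // Round to hide floating-point noise such as 0.30000000000000004.
    return round($min + $k * $step, 4);
}

$price = randomStep(0.5, 20, 0.1);
echo $price, "\n";
```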

Customers

And we want customers. Now, it is a bit tricky.

$chaos->table('customers', 1000)
    ->setDb($db)
    ->field('idcustomer', 'int', 'identity', 0, 0, 1000)
    ->field('name', 'string', 'database', '', 0, 45)
    ->field('datecreation', 'datetime', 'database', $chaos->now())
    ->setArray('namemale', PersonContainer::$firstNameMale)
    ->setArray('lastname', PersonContainer::$lastName)
    ->setArray('namefemale', PersonContainer::$firstNameFemale)
    ->setFormat('fullnameformat', ['{{namemale}} {{lastname}}', '{{namefemale}} {{lastname}}'])
    // the date advances by a random number of seconds per row
    ->gen('when always set datecreation.speed=random(5000,86400)')
    // the name is built from one of the two formats, picked at random
    ->gen('when always set name.value=randomformat("fullnameformat")')
    ->insert();

Here, we set the table (1000 rows) and the fields, we set some arrays (with first names and last names), and we set a format (template) to mix first names and last names. We use two formats:

  • {{namemale}} {{lastname}}
  • {{namefemale}} {{lastname}}

But what is “namemale”? It is a random value taken from the array called namemale. The result is the full name.

The date is different. We are not setting a value directly; instead, we are setting the speed, as a random value. The speed is calculated for each row. The date is in seconds, so if, for example, we add 86400 to the date, we are adding a day. In this case, the date advances by a random step that varies between 5000 seconds (about 1.4 hours) and 86400 seconds (a day).
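The idea of a date that moves at a random speed can be sketched without the library. The snippet below assumes that each row simply advances the clock by the speed, in seconds; the start date and the number of rows are arbitrary values chosen for the example.

```php
<?php
// Sketch: each new row moves the clock forward by a random number of seconds.
$timestamp = strtotime('2018-01-01 00:00:00 UTC'); // arbitrary start date for the example
$dates = [];
for ($row = 0; $row < 5; $row++) {
    $dates[] = gmdate('Y-m-d H:i:s', $timestamp);
    // "speed" is the step to the next row: between 5000 s and 86400 s (one day).
    $speed = random_int(5000, 86400);
    $timestamp += $speed;
}
echo implode("\n", $dates), "\n";
```

Because every step is positive, the dates always come out in increasing order, which is what a creation date column needs.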

Sales

Now, this is a real challenge.

// Assumption: $countProducts holds the number of products inserted earlier,
// for example count(Products::$products).
$chaos->table('sales', 5000)
    ->setDb($db)
    ->field('idsales', 'int', 'identity', 0)
    ->field('idproduct', 'int', 'database', 1)
    ->field('idcustomer', 'int', 'database', 1)
    ->field('amount', 'int', 'database', 1, 1, 100)
    ->field('date', 'datetime', 'database', $chaos->now())
    ->gen('when date.weekday>=1 and date.weekday<=5 then date.speed=random(5000,50000)')
    ->gen('when date.weekday>=6 then date.speed=random(3000,10000)')
    ->gen('when date.hour>18 then date.skip="day" and date.add="8h"')
    // Concatenate the count: PHP does not expand variables inside single quotes.
    ->gen('when always then idproduct.value=random(1,' . $countProducts . ')'
        . ' and idcustomer.value=random(1,1000) and amount.value=random(1,10)')
    ->insert();

First, we set the table and fields. In simple terms, a sale is when a customer buys an amount of a product on a specific date.

The product and the customer are easy: each one is one of the values previously entered. The amount is also a random value.

However, the date is hard to control. This library considers that the week starts on Monday (1) and ends on Sunday (7). We have the following premises:

  • During working days, sales are slow. A higher speed makes the date advance faster between rows, so fewer sales land on each day.

when date.weekday>=1 and date.weekday<=5 then date.speed=random(5000,50000)

  • Sales increase during the weekend (weekdays 6 and 7). The number of sales per day increases by decreasing the speed of the date. For reference, a speed of 60 would mean one sale per minute.

when date.weekday>=6 then date.speed=random(3000,10000)

  • There are no sales in the evening. When the hour is past 18 (6pm), we jump to the next day and add 8 hours, so we start at 8am of the next day.

when date.hour>18 then date.skip="day" and date.add="8h"

It is even possible to fine-tune the random values, for example, to set more sales during the morning.
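The three rules can also be written as plain PHP, which may help when tuning the numbers. This is a sketch under assumptions: nextSaleDate() is a hypothetical function, not part of the library, and it applies the speed first and the evening rule second, which may differ from the order the library uses internally.

```php
<?php
// Sketch of the three rules in plain PHP. Names are hypothetical, not library API.
function nextSaleDate(DateTimeImmutable $date): DateTimeImmutable
{
    $weekday = (int) $date->format('N'); // ISO-8601: 1 = Monday ... 7 = Sunday
    // Working days: large steps (few sales). Weekend: small steps (many sales).
    $speed = $weekday <= 5 ? random_int(5000, 50000) : random_int(3000, 10000);
    $next = $date->modify("+{$speed} seconds");
    // After hour 18, jump to 08:00 of the following day.
    if ((int) $next->format('G') > 18) {
        $next = $next->modify('tomorrow')->setTime(8, 0);
    }
    return $next;
}

// Usage: generate 200 sale dates starting on a Monday morning.
$date = new DateTimeImmutable('2018-12-24 08:00:00', new DateTimeZone('UTC'));
$sales = [];
for ($i = 0; $i < 200; $i++) {
    $sales[] = $date;
    $date = nextSaleDate($date);
}
```

Because the rules only check for hours after 18, a long step can still land in the small hours of the next day; the sketch mirrors the rules as written and does not add a guard for that.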

Data Analysis

It is impossible to do data analysis with a bunch of random data. However, our random data is not so random after all.

Data Analysis is gold, especially if you have the monopoly of it.

Now, here is the analysis:

This is the chart of sales per date.

The data is right but the chart is off, clearly off (yuck)

Now, let’s group by month:

It is fine, but there is no trend. That is because we don’t consider the month as a factor, and it should be one. For example, people drink more during summer. So, it is a pending task.

And now, by day of the week:

Now, that’s a trend.
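Grouping by day of the week is a small aggregation. Here is a minimal sketch in plain PHP; the four rows are made-up sample data for illustration only (not the data behind the charts), and in practice the rows would come from the sales table.

```php
<?php
// Hypothetical sample rows; in practice these would be read from the sales table.
$sales = [
    ['date' => '2018-12-24 10:15:00', 'amount' => 3], // Monday
    ['date' => '2018-12-29 09:05:00', 'amount' => 5], // Saturday
    ['date' => '2018-12-29 11:40:00', 'amount' => 2], // Saturday
    ['date' => '2018-12-30 12:30:00', 'amount' => 4], // Sunday
];

// Start every weekday (1 = Monday ... 7 = Sunday) at zero, so empty days still show up.
$totals = array_fill(1, 7, 0);
foreach ($sales as $sale) {
    $weekday = (int) date('N', strtotime($sale['date']));
    $totals[$weekday] += $sale['amount'];
}
print_r($totals);
```

The same loop works for months by swapping the 'N' format character for 'n' (month number, 1 to 12) and sizing the array accordingly.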

Conclusion

March is a good month but not by much, around 5–10% (we know that there is no trend per month).

However, the weekends are good for business (+100%), so that is when we should invest in more people and squeeze more lemons.

Lemonade stand because Khorne needs income.

To-Do

We marked a trend during the weekends. It was easy. However, we could also mark a trend for some months. There is no trend for customers, and I don’t think there should be one. However, it is different for products: there must be a trend for products, and also for the price of the products.

Comments and suggestions

They are welcome.
