Image Credit: Bing image creator plus some photoshop magic

Data Playground Part 1: Crafting Realism the Fun Way

Sencer ÖZTÜFEKÇİ
12 min read · Dec 24, 2023

When I decided to write this article, this was not the topic I had in mind. I wanted to write about SQL databases working inefficiently. What I wanted to draw attention to was that, even on the same hardware, “Your database is probably running slower than it could.”

But then I realized something, and then something else. The first thing I realized was that I needed data to prove my “attention-grabbing thesis”. So explaining how to create a package that generates random data could be a good article topic. The second thing I realized was that sharing this package could both give the community another option and, again, make a good article topic.

So here’s the deal. We’re going to break down those topics into three parts. Part one, which you are reading right now, is all about creating a fake data generator package. Part two involves publishing our package on a package manager, like — obviously — NPM. The last part is where I’ll share my experiences regarding SQL database architectures, queries, indexes, PKs, etc., which directly affect their performance. Let’s start the first part.

The Data Types We Crave

In this article, when I refer to a ‘data type’, I mean a distinct kind of data that, within a relational database architecture, should be read in a different way when performance matters. This should not be confused with column types like VARCHAR, INTEGER, or BOOLEAN.

Let’s divide the data types we will need to examine into two categories based on their purpose of use. The first data type is what we call encyclopedic data or metadata, which changes rarely or never and contains information about the processes or objects that we keep records of. One common example of this data type is a user’s information, including their name, surname, city of residence, and phone number. We will soon explore how to write functions that generate this information randomly.

The second data type is time series data that changes frequently and is measured over time. This data type can encompass any information accompanied by a timestamp. Imagine the evolving coordinates of a vehicle throughout its journey, the average temperature of an area changing over the months, or the weight of livestock increasing or decreasing over the course of its life.

Besides these, there are data types that serve various purposes. However, I specifically selected these data types to test the architecture and methods that we will discuss in the third part of our article series.

The Generators We Need

To create random metadata, we will write a generator for creating random persons. A random person could have these properties:

  1. Name
  2. Surname
  3. Gender
  4. E-Mail
  5. Phone Number
  6. Password
  7. Street Address
  8. City
  9. Zipcode

Sounds good, right? And for the time series, we’re going to need signal generators and a timestamp generator. Let’s imitate the signal types frequently used in the electronics world:

  1. Sawtooth Wave
  2. Sine Wave
  3. Square Wave
  4. Triangle Wave

And the timestamp generator.

Person Generator

This will be the output when we are done

That’s enough talking; let’s write some code. The first property of a person is ‘gender’ because the name depends on it. There could be three types of gender, let’s say: ‘male,’ ‘female,’ and ‘non-binary’. This is pretty straightforward as it involves only a number generated by Math.random().

const _gender = Math.floor(Math.random() * 3);

Considering three functions named randomMaleName, randomFemaleName and randomNonBinaryName, the name determination process should be as follows:

const name =
_gender === 0
? randomMaleName()
: _gender === 1
? randomFemaleName()
: randomNonBinaryName();

Now we can start with the randomMaleName function.

const maleNames = [
"John",
"Michael",
"William",
"David",
"James",
"Joseph"
//add many more names here...
];

const randomMaleName = () => {
/**
* @description Get a random male name
* @return {string}
*/
return maleNames[Math.floor(Math.random() * maleNames.length)];
};

export default randomMaleName;

The same structure can be applied to female names.

const femaleNames = [
"Amanda",
"Anna",
"Barbara",
"Dorota",
"Ewa",
"Grażyna",
//add many more names here...
];

const randomFemaleName = () => {
/**
* @description Get a random female name
* @return {string}
*/
return femaleNames[Math.floor(Math.random() * femaleNames.length)];
};

export default randomFemaleName;

For the randomNonBinaryName function, its operation is straightforward: it flips a coin and selects a male or female name based on the result.

function randomNonBinaryName() {
/**
* @description Get a random non-binary name
* @return {string}
*/
const binary = Math.floor(Math.random() * 2);
return binary ? randomMaleName() : randomFemaleName();
}

OK, so far so good! The next property is the surname. Again, the same structure applies.

const lastNames = [
"Smith",
"Johnson",
"Williams",
"Brown",
"Jones",
//add many more names here...
];

const generateLastName = () => {
/**
* @description Get a random last name
* @return {string}
*/
return lastNames[Math.floor(Math.random() * lastNames.length)];
};

export default generateLastName;
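Since every list-based generator so far repeats the same Math.floor(Math.random() * length) lookup, the pattern could be pulled into one small helper. The sketch below is only an illustration: pickRandom is a hypothetical name that does not appear in the original code, and the short name lists are placeholders.

```javascript
// Hypothetical helper (not part of the original package):
// returns one uniformly chosen element of a non-empty array.
const pickRandom = (items) => {
  if (!Array.isArray(items) || items.length === 0) {
    throw new Error("pickRandom expects a non-empty array");
  }
  return items[Math.floor(Math.random() * items.length)];
};

// Placeholder lists, shortened for the example.
const sampleFirstNames = ["John", "Anna", "Dorota"];
const sampleLastNames = ["Smith", "Johnson", "Brown"];

const sampleFullName = `${pickRandom(sampleFirstNames)} ${pickRandom(sampleLastNames)}`;
console.log(sampleFullName);
```

With a helper like this, each generator would shrink to a single call, and the empty-array check makes a forgotten name list fail loudly instead of returning undefined.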

We have another property that is dependent on others. The E-Mail address. Most people use email addresses that combine their first and last names. To create an email address, we will need not only the name and surname but also the domain of the email address, top-level domain (TLD), and, if applicable, a regional extension. Apart from these, people can separate their names and surnames with a separator like dot or underscore.

const tlds = ["com", "net", "org", "edu", "gov", "mil", "biz", "info"];

const fakeDomains = [
"example",
"test",
"fake",
"sample",
//add more domains if you like
];

const generateEmailAddress = (firstName, lastName) => {
/**
* Generates a fake email address.
* @param {string} firstName
* @param {string} lastName
* @return {string}
*/

const _firstName = firstName.toLowerCase().latinize();
const _lastName = lastName.toLowerCase().latinize();
const _seperators = [".", "_", ""]; // the separators between name and surname
const _seperator =
_seperators[Math.floor(Math.random() * _seperators.length)];
const countryCodes = [
".pl",
"",
".de",
"",
".fr",
"",
".es",
"",
".it",
"",
".gb",
"",
".tr",
"",
".ru",
"",
];
const countryCode =
countryCodes[Math.floor(Math.random() * countryCodes.length)];

const _domain =
fakeDomains[Math.floor(Math.random() * fakeDomains.length)] +
"." +
tlds[Math.floor(Math.random() * tlds.length)] +
countryCode;

const _emailAddress = `${_firstName}${_seperator}${_lastName}@${_domain}`;

return _emailAddress;
};

export default generateEmailAddress;

As you can see, I lazily added empty string values into the region codes array so that emails without region codes can be generated 😄.
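An alternative to padding the array with empty strings is to decide first whether a country code is used at all, and only then pick one. The following is a rough sketch of that idea, not the package's code; maybeCountryCode and its probability parameter are names assumed for illustration.

```javascript
// Country codes without the empty-string padding.
const countryCodesOnly = [".pl", ".de", ".fr", ".es", ".it", ".gb", ".tr", ".ru"];

// Hypothetical helper: with the given probability returns a random
// country code, otherwise an empty string.
const maybeCountryCode = (probability = 0.5) => {
  if (Math.random() >= probability) {
    return "";
  }
  return countryCodesOnly[Math.floor(Math.random() * countryCodesOnly.length)];
};

console.log(`example.com${maybeCountryCode()}`);
```

The default of 0.5 mirrors the original array, where half of the entries are empty strings; changing the ratio then only means changing one number.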

You may have noticed that, when using the names, we call a function named ‘latinize’ on them.

const _firstName = firstName.toLowerCase().latinize();
const _lastName = lastName.toLowerCase().latinize();

The purpose of this function is to convert special characters found in names, such as the letter ‘ż’ in the name ‘Grażyna,’ to their closest English alphabet equivalents.

The code I used for this can be found in this Stack Overflow answer.
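For readers who do not want to extend String.prototype, a similar effect can be approximated with the built-in String.prototype.normalize. This is a simplified sketch, not the code from the Stack Overflow answer: the function name stripDiacritics is made up here, and it only handles letters that decompose into a base letter plus a combining mark (so ‘ż’ becomes ‘z’, but a letter such as ‘ł’ is left unchanged).

```javascript
// Hypothetical stand-in for latinize, written as a plain function.
const stripDiacritics = (text) =>
  text
    .normalize("NFD") // split letters such as "ż" into "z" + combining mark
    .replace(/[\u0300-\u036f]/g, ""); // drop the combining marks

console.log(stripDiacritics("Grażyna".toLowerCase())); // "grazyna"
```

A lookup-table approach covers more characters, which is a reasonable trade-off when the name lists contain letters without a Unicode decomposition.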

Now that ‘Name,’ ‘Surname,’ ‘E-Mail,’ and ‘Gender’ are done, it’s time to handle the simpler properties: ‘Password,’ ‘ZipCode,’ and ‘Phone Number,’ plus a tax number generator as a bonus. All of them involve generating random digits or letters.

const generateZipCode = () => {
/**
* Generates a random zip code
* @return {string}
*/
const zipCode = Math.floor(Math.random() * 100000);
return zipCode.toString().padStart(5, "0");
};

const generateTaxNumber = () => {
/**
* Generates a random tax number
* @return {number}
*/
const taxNumber = Math.floor(Math.random() * 1000000000);
return taxNumber;
};

const generatePassword = () => {
/**
* Generate a random password
* @return {string}
*/
let password = "";
const characters =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()_+";
const charactersLength = characters.length;
for (let i = 0; i < 16; i++) {
password += characters.charAt(Math.floor(Math.random() * charactersLength));
}

return password;
};

const generatePhoneNumber = () => {
/**
* Generates a random phone number in the format (XXX) XXX-XXXX
* @return {string}
*/
// Pad each part with leading zeros so every number keeps the same length
const areaCode = String(Math.floor(Math.random() * 1000)).padStart(3, "0");
const firstThree = String(Math.floor(Math.random() * 1000)).padStart(3, "0");
const lastFour = String(Math.floor(Math.random() * 10000)).padStart(4, "0");
return `(${areaCode}) ${firstThree}-${lastFour}`;
};

Here comes the fun part — the Street Address and City. In general, places are named after locations combined with directions or prefixes like ‘Old’ or ‘New,’ and then we like to add some suffixes like ‘shire,’ ‘ville,’ ‘burg,’ etc. We are going to code the exact same logic.

In fact, in this section, I drew strong inspiration from a retro game legend, ‘Transport Tycoon Deluxe’. In this game, city names are generally determined randomly using precisely this method. By the way, if you’re not familiar, it’s worth pointing out that the game has an up-to-date open-source implementation. However, you cannot pull off tricks like attaching rails to the backside of the opponent’s train station and blowing up their locomotives :)

Let’s start with city names:

const citySuffixes = [
"town",
"ville",
" City",
"borough",
"ton",
// more suffixes here
];

const cityPrefixes = [
"Fort ",
"",
"New ",
"",
"Old ",
//.. more directions here
"Lower South ",
];

const cityNames = [
"Park",
"Ovmont",
"Lewood",
"Lakeview",
"Lake",
//.. more city names here
];

const randomCity = () => {
/**
* @description Generates a random city name
* @return {string}
*/
const _city = [];
const citySuffixesLength = citySuffixes.length;
const cityPrefixesLength = cityPrefixes.length;
const cityNamesLength = cityNames.length;

const prefix = cityPrefixes[Math.floor(Math.random() * cityPrefixesLength)];
const name = cityNames[Math.floor(Math.random() * cityNamesLength)];
const suffix = citySuffixes[Math.floor(Math.random() * citySuffixesLength)];

_city.push(prefix);
_city.push(name);
_city.push(suffix);
const city = _city.join("");

return city;
};

And the streets…

const streetSuffixes = [
"Avenue",
"Boulevard",
"Circle",
"Court",
//.. more street Suffixes here
];
const streetPrefixes = [
"North",
"East",
"West",
"South",
"Old",
//.. more street prefixes here
];

const randomStreetName = () => {
/**
* Return a random string from the streets array.
* @return {string}
*/
const streetPrefix =
streetPrefixes[Math.floor(Math.random() * streetPrefixes.length)];
const streetSuffix =
streetSuffixes[Math.floor(Math.random() * streetSuffixes.length)];
const streetName = `${streetPrefix} ${streetSuffix}`;
return streetName;
};

Thus, we have finished the part of producing fake person data. The final generator code that uses the functions we wrote should be as follows:

const randomPerson = () => {
const _gender = Math.floor(Math.random() * 3);

const gender =
_gender === 0 ? "Male" : _gender === 1 ? "Female" : "Non-Binary";

const name =
_gender === 0
? randomMaleName()
: _gender === 1
? randomFemaleName()
: randomNonBinaryName();
const lastName = generateLastName();
const email = generateEmailAddress(name, lastName);
const phoneNumber = generatePhoneNumber();
const password = generatePassword();
const streetAddress = randomStreetName();
const city = randomCity();
const zipCode = generateZipCode();

const person = {
gender,
name,
lastName,
email,
phoneNumber,
password,
streetAddress,
city,
zipCode,
};
return person;
};
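To get a whole data set rather than a single record, the generator can simply be called in a loop. Here is a minimal sketch, assuming a hypothetical helper named generateMany; the stand-in factory below exists only to keep the example self-contained, and in practice randomPerson would be passed instead.

```javascript
// Hypothetical helper: calls a generator function n times and collects the results.
const generateMany = (factory, n) => Array.from({ length: n }, () => factory());

// Stand-in for randomPerson so the snippet runs on its own.
const fakePersonStub = () => ({
  zipCode: String(Math.floor(Math.random() * 100000)).padStart(5, "0"),
});

const people = generateMany(fakePersonStub, 5);
console.log(people.length); // 5
```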

Let’s run it and check the results…

{
gender: 'Non-Binary',
name: 'Agnieszka',
lastName: 'Kelly',
email: 'agnieszka_kelly@example.biz.fr',
phoneNumber: '(403) 164-5054',
password: 'tMupD_!8@r1AKf7V',
streetAddress: 'Lake Loop',
city: 'Lower East Lewoodfurt',
zipCode: '48045'
},
{
gender: 'Male',
name: 'Andrew',
lastName: 'Parker',
email: 'andrew.parker@demo.mil',
phoneNumber: '(98) 343-7448',
password: 'IOv!L6Xk+6+m_z$b',
streetAddress: 'Fifth Loop',
city: 'Central Parkton',
zipCode: '81771'
},
{
gender: 'Male',
name: 'Brian',
lastName: 'Stewart',
email: 'brian.stewart@localhost.biz',
phoneNumber: '(554) 316-951',
password: '^ql!JrAg&HXn^ukm',
streetAddress: 'Highland Loop',
city: 'Upper West Otterhaven',
zipCode: '87026'
},
{
gender: 'Non-Binary',
name: 'Karolina',
lastName: 'Kelly',
email: 'karolina_kelly@site.mil.ru',
phoneNumber: '(106) 715-8846',
password: 'MZ(ycofcg6IEN!j+',
streetAddress: 'Hickory Place',
city: 'Upper South Southchester',
zipCode: '04399'
},
{
gender: 'Female',
name: 'Sylwia',
lastName: 'Murphy',
email: 'sylwiamurphy@foo.biz.it',
phoneNumber: '(774) 329-8553',
password: 'vF_c(g^*0#DXjwIo',
streetAddress: 'Yale Trail',
city: 'Upper South Southhills',
zipCode: '83112'
},
{
gender: 'Female',
name: 'Lidia',
lastName: 'Lopez',
email: 'lidia_lopez@localhost.com',
phoneNumber: '(115) 384-685',
password: 'AnPkDk)eUf7TM)7g',
streetAddress: 'Eighth Parkway',
city: 'Old Xeniaville',
zipCode: '02596'
},
{
gender: 'Male',
name: 'Henry',
lastName: 'Hill',
email: 'henry_hill@xyzdomain.net.tr',
phoneNumber: '(584) 870-9874',
password: 'sQ6(M$N*20pzPBYV',
streetAddress: 'Sixth Court',
city: 'Lower Mnoliahaven',
zipCode: '90948'
},
{
gender: 'Non-Binary',
name: 'Levi',
lastName: 'Baker',
email: 'levibaker@yoursite.gov.ru',
phoneNumber: '(122) 114-3132',
password: 'f9blves4JV!+11P8',
streetAddress: 'Fifth Trail',
city: 'Old Eastborough',
zipCode: '10041'
},
{
gender: 'Female',
name: 'Dorota',
lastName: 'Thompson',
email: 'dorota.thompson@abc.info',
phoneNumber: '(42) 402-5596',
password: 'Ng9zjCd$rrM7U!v^',
streetAddress: 'Lake Boulevard',
city: 'Lower West Yuanitaville',
zipCode: '32715'
}

Wow! Look at this beauty! Some email addresses have separators, while others do not. Also, there are email addresses with country codes. Look at the city names, like ‘Lower West Yuanitaville’ or ‘Central Parkton’ — our functions are working as they should.

Wave Generators

If you are still with me at this point in the article, well, congratulations. Now it’s time for the signal generator functions. In this section we will actually use a specific kind of function: the generator function.

Alright, imagine a generator function in JavaScript is like a special recipe that allows you to pause and resume the cooking process whenever you want. It’s not like your regular function that cooks a dish from start to finish in one go. Instead, it’s like a chef who can take a break, chat with someone, and then come back to cooking.

Calling a generator function does not run its body right away; it hands you back an iterator. Everything happens when you call the next() method on this iterator. It's like saying, 'Okay, let's go to the next step of the recipe.' The generator function runs until it encounters a yield statement. When it hits that, it pauses and gives you back the result of that step.
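To make the pause-and-resume behavior concrete, here is a tiny illustrative generator. The recipe function and its step names are invented for this example only.

```javascript
// A tiny generator with three steps.
function* recipe() {
  yield "chop";
  yield "cook";
  yield "serve";
}

const steps = recipe(); // nothing runs yet; we only get an iterator

console.log(steps.next()); // { value: 'chop', done: false }
console.log(steps.next()); // { value: 'cook', done: false }
console.log(steps.next()); // { value: 'serve', done: false }
console.log(steps.next()); // { value: undefined, done: true }
```

Each next() call resumes the function right after the previous yield, and done only becomes true once the function body has finished.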

So here’s our Sine Wave Generator code:

function* sinwave() {
let t = 0;
while (true) {
yield Math.sin(t);
t += 0.1;
}
}

const sin = sinwave();

const nLongSinWave = (n) => {
/**
* @param {number} n - number of steps
* @return {number[]}
*/
let result = [];
for (let i = 0; i < n; i++) {
result.push(sin.next().value);
}
return result;
};

What we do here is keep a running value t inside the generator function, starting at 0 and increasing by 0.1 on each iteration. On every step, the generator yields Math.sin(t). The nLongSinWave function takes a parameter n, the number of values we want to produce.
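Because nLongSinWave reads from the single shared sin iterator, a second call continues where the first one stopped. If independent arrays are wanted, one option is a small generic helper that accepts any iterator. The sketch below assumes a hypothetical helper named take; it is not part of the original code.

```javascript
function* sinwave() {
  let t = 0;
  while (true) {
    yield Math.sin(t);
    t += 0.1;
  }
}

// Hypothetical helper: collects the next n values from any iterator.
const take = (iterator, n) => {
  const result = [];
  for (let i = 0; i < n; i++) {
    const { value, done } = iterator.next();
    if (done) break; // stop early if a finite iterator runs out
    result.push(value);
  }
  return result;
};

const firstWave = take(sinwave(), 100); // fresh generator, starts at t = 0
const secondWave = take(sinwave(), 100); // another fresh generator

console.log(firstWave.length, firstWave[0] === secondWave[0]); // 100 true
```

The same helper would work unchanged for the other wave generators in this section.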

The first call to nLongSinWave(100) returns 100 samples covering roughly one and a half cycles of the sine wave.

The same structure can be applied to Sawtooth, Square and Triangle Waves.

function* sawtooth() {
let i = 0;
while (true) {
yield i;
i = (i + 0.1) % 1;
}
}

In the sawtooth wave, we again increase the value by 0.1 between iterations, but this time we take the remainder of division by 1, so the value wraps around instead of growing past 1.
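One caveat: repeatedly adding 0.1 accumulates floating-point rounding error, so the running value can land just below 1 (for example 0.9999999999999999) instead of wrapping exactly to 0. If exact repetition matters, a variant driven by an integer step counter avoids this. The following is a sketch under that assumption; sawtoothSteps and its stepsPerCycle parameter are illustrative names, not part of the original package.

```javascript
// Hypothetical variant: counts whole steps and divides once per sample,
// so rounding errors never accumulate between iterations.
function* sawtoothSteps(stepsPerCycle = 10) {
  let step = 0;
  while (true) {
    yield step / stepsPerCycle;
    step = (step + 1) % stepsPerCycle;
  }
}

const saw = sawtoothSteps(10);
const sawValues = Array.from({ length: 12 }, () => saw.next().value);
console.log(sawValues);
// [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0, 0.1]
```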

Sawtooth Wave Function Output

The squarewave function is pretty interesting:

function* squarewave() {
let i = 0;
while (true) {
yield i < 2 ? 1 : 0;
i = (i + 1) % 4;
}
}

But its output, not so much: it simply repeats 1, 1, 0, 0.

For the trianglewave function, I thought it might be better to add some parameters:

function* trianglewave(period, amplitude, offset) {
while (true) {
for (let i = 0; i < period; i++) {
yield (amplitude / period) * i + offset;
}
for (let i = period; i > 0; i--) {
yield (amplitude / period) * i + offset;
}
}
}

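With the code above, period is the number of steps the wave takes to rise from its lowest to its highest value (it takes the same number of steps to fall back, so a full cycle is twice as long), amplitude is the height of the triangle, and offset shifts the whole wave up or down along the Y axis. A call like the one below

const triangle = trianglewave(10, 1, 0);

yields values that rise from 0 to 1 in steps of roughly 0.1 and then fall back toward 0, over and over.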

Timestamp Generator

To complete the process, we require suitable timestamps for the arrays we generate to form a time series.

const generateTimeSeries = (start, n, stepSize) => {
/**
* @param {string} start - start date
* @param {number} n - number of steps
* @param {number} stepSize - step size in seconds
* @return {string[]}
*/
let result = [];
let t = new Date(start);
for (let i = 0; i < n; i++) {
result.push(t.toISOString());
t = new Date(t.getTime() + stepSize * 1000);
}
return result;
};

As indicated in the JSDoc, our timestamp function has three parameters: start represents the initial date, n is the count of timestamps we intend to generate, and stepSize denotes the interval between timestamps in seconds. A call like the one below

const timestamps = generateTimeSeries("2023-12-24 19:34:15", 20, 0.1);

returns 20 ISO 8601 timestamps spaced 100 milliseconds apart.
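To actually form a time series, each timestamp has to be paired with a signal value. The sketch below shows one way to do that; toTimeSeriesRows is a hypothetical helper name, and the start date is written in ISO 8601 form with an explicit UTC marker, which is an assumption made here so the result does not depend on the machine's time zone.

```javascript
// Same logic as generateTimeSeries above, repeated so the snippet runs on its own.
const generateTimeSeries = (start, n, stepSize) => {
  const result = [];
  let t = new Date(start);
  for (let i = 0; i < n; i++) {
    result.push(t.toISOString());
    t = new Date(t.getTime() + stepSize * 1000);
  }
  return result;
};

function* squarewave() {
  let i = 0;
  while (true) {
    yield i < 2 ? 1 : 0;
    i = (i + 1) % 4;
  }
}

// Hypothetical helper: pairs each timestamp with the next value of a signal.
const toTimeSeriesRows = (timestamps, signal) =>
  timestamps.map((timestamp) => ({ timestamp, value: signal.next().value }));

// An explicit UTC start ("Z") keeps the result independent of the local time zone.
const rows = toTimeSeriesRows(
  generateTimeSeries("2023-12-24T19:34:15Z", 4, 0.1),
  squarewave()
);
console.log(rows);
// [
//   { timestamp: '2023-12-24T19:34:15.000Z', value: 1 },
//   { timestamp: '2023-12-24T19:34:15.100Z', value: 1 },
//   { timestamp: '2023-12-24T19:34:15.200Z', value: 0 },
//   { timestamp: '2023-12-24T19:34:15.300Z', value: 0 }
// ]
```

Rows shaped like this map naturally onto a table with a timestamp column and a value column.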

Wow! That’s it! When I started writing, I didn’t expect it to take this long. I would be glad if you read through to the end and found something useful in what I shared.

In the next part, we will see how we can publish this package to NPM with a few optimizations and additions and how we can use it in future projects.

I’ll be sharing the package in a GitHub repository at the end of the next part. If you’re interested in including various types of artificial data in the package, feel free to contribute. Your input is welcome!

If you have any questions or think something needs correcting, do not hesitate to leave a comment.

Until next time, happy coding!

Update:

Part 2 is now available at the following link:

Data Playground Part 2: Publishing a New NPM Package
