Generating mock data using Mimesis: Part I

The ability to generate mock but valid data comes in handy in app development whenever you work with databases. Filling a database by hand is time-consuming and tedious, and it typically takes three stages: gathering the necessary information, post-processing the data, and coding the data generator itself. It gets really complicated when you need to generate not 10–15 users but 100–150 thousand users (or other types of data). In this article and the two that follow, we will introduce a tool that greatly simplifies generating mock data, initial database loading, and testing in general.

Mimesis is a Python library that helps generate mock data for various purposes. It is written using only the Python standard library, so it has no third-party dependencies. The library currently supports 32 languages and 19 class providers that supply various kinds of data.

Installation

The usual way to install Mimesis is via pip:

➜  ~ pip install mimesis

If for some reason you cannot install the package with pip, try installing it manually, as shown below:

(venv) ➜  ~ git clone https://github.com/lk-geimfari/mimesis.git
(venv) ➜ cd mimesis/
(venv) ➜ python3 setup.py install
# or
(venv) ➜ make install

Please note that the library only runs on Python 3.3+. The developers have no plans to add Python 2.7 support.

Generating data

Initially, we planned to show data generation using a small Flask web application, but we decided against it because not everyone is familiar with Flask or wants to learn it. Therefore, we are going to show everything in plain Python. If you want to carry this over to your Flask or Django project, you simply need to define a static method that performs all the manipulations related to the current model and call it whenever you need the initial database loading, as in the example below.

A model for Flask (Flask-SQLAlchemy) would look like this:

Now let’s switch to shell mode:

(venv) ➜ python3 manage.py shell

Now we can generate data. First, we need to make sure that the database and the model in question are available.

>>> db
<SQLAlchemy engine='sqlite:///db.sqlite'>
>>> Patient
<class 'app.models.Patient'>
>>> Patient()._bootstrap(count=40000, locale='en')  # generate 40k entries in English.
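
The static-method pattern described above can be sketched without Flask, using only the standard library. This is a minimal illustration under stated assumptions, not the model from the project: the Patient class, its table layout, the bootstrap() signature and the fake_patient() helper are all hypothetical, and sqlite3 stands in for Flask-SQLAlchemy. In a real project the row generator would call a Mimesis provider instead of random.choice().

```python
import random
import sqlite3


class Patient:
    """Hypothetical model: a thin wrapper around one SQLite table."""

    @staticmethod
    def create_table(conn):
        conn.execute(
            "CREATE TABLE IF NOT EXISTS patient ("
            "id INTEGER PRIMARY KEY, full_name TEXT, email TEXT)"
        )

    @staticmethod
    def bootstrap(conn, count, make_row):
        """Insert `count` generated rows; `make_row` returns (full_name, email)."""
        rows = (make_row() for _ in range(count))
        with conn:  # commit once at the end, or roll back if an insert fails
            conn.executemany(
                "INSERT INTO patient (full_name, email) VALUES (?, ?)", rows
            )


def fake_patient():
    """Stand-in generator (assumption); a real project would use Mimesis here."""
    first = random.choice(["Ada", "Grace", "Linus"])
    last = random.choice(["Morrison", "Martinez", "Mcfarland"])
    email = "{}{}@example.com".format(first.lower(), random.randint(1000, 9999))
    return "{} {}".format(first, last), email


conn = sqlite3.connect(":memory:")
Patient.create_table(conn)
Patient.bootstrap(conn, count=1000, make_row=fake_patient)
row_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
print(row_count)  # 1000
```

Passing the row generator in as an argument keeps the loading logic independent of where the data comes from, so the stand-in can be swapped for any provider call later.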

Introduction

It is worth noting that we will only show the basic capabilities of the library and use a few of the most common class providers, since there are too many of them to cover each one in detail. If the article sparks your interest in the library, you can visit the useful links listed at the end of the article and find out more.

The library is pretty simple. All you need to do to start working with the data is to create a class provider. The most common type of data in apps is users’ personal data, such as first name, last name, credit card info, etc. There is a special class provider for this type of data, Personal(), which takes a language code as a string, as shown below:

>>> from mimesis import Personal
>>> person = Personal('is')
>>> for _ in range(0, 3):
...     person.full_name(gender='male')
...
'Karl Brynjúlfsson'
'Rögnvald Eiðsson'
'Vésteinn Ríkharðsson'

Almost every web application requires an e-mail address for registration. Naturally, the library can generate e-mail addresses with the email() method of the Personal() class, as below:

>>> person.email(gender='female')
'lvana6108@gmail.com'

>>> person.email(gender='male')
'john2454@yandex.com'

There is a small problem with the approach above: the code can get slightly “dirty” if the app uses more than one class provider. In that situation you should use the Generic() object, which gives access to all providers from a single object:

>>> from mimesis import Generic
>>> g = Generic('pl')  # 'pl' is the ISO 639-1 language code for Polish.
>>> g.personal.full_name()
'Lonisława Podsiadło'
>>> g.datetime.birthday(readable=True)
'Listopad 11, 1997'
>>> g.code.imei()
'011948003071013'
>>> g.food.fruit()
'Cytryna'
>>> g.internet.http_method()
'PUT'
>>> g.science.math_formula()
'A = (h * (a + b)) / 2'

Combining data gives you a vast field for experimentation. For example, you can create mock (female) holders of Visa (or Maestro, or MasterCard) credit cards:

>>> user = Personal('en')
>>> def get_card(sex='female'):
...     owner = {
...         'owner': user.full_name(sex),
...         'exp_date': user.credit_card_expiration_date(maximum=21),
...         'card_number': user.credit_card_number(card_type='visa')
...     }
...     return owner
...
>>> for _ in range(0, 3):
...     get_card()
...
{'exp_date': '02/20', 'owner': 'Laverna Morrison', 'card_number': '4920 3598 2121 3328'}
{'exp_date': '11/19', 'owner': 'Melany Martinez', 'card_number': '4980 9423 5464 1201'}
{'exp_date': '01/19', 'owner': 'Cleora Mcfarland', 'card_number': '4085 8037 5801 9703'}

As mentioned above, the library supports 19 class providers with data for all possible situations (if not, your PR correcting such an awful injustice is more than welcome). For example, if you are working on an app dedicated to transportation and logistics and you need to generate vehicle models, you can easily do this with the Transport() class provider, which contains data related to transportation:

>>> from mimesis import Transport
>>> trans = Transport()
>>> for _ in range(0, 5):
...     trans.truck()
...
'Seddon-2537 IM'
'Karrier-7799 UN'
'Minerva-5567 YC'
'Hyundai-2808 XR'
'LIAZ-7174 RM'

Or you can specify a mask for the model name:

>>> for _ in range(0, 5):
...     # Here # (hash) is a placeholder for digits, @ for letters.
...     trans.truck(model_mask="##@")
...
'Henschel-16G'
'Bean-44D'
'Unic-82S'
'Ford-05Q'
'Kalmar-58C'
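
To make the mask idea concrete, here is a minimal sketch of how such a mask could be expanded. It is not the library's implementation: the expand_mask() function and its parameter names are hypothetical, and the sketch assumes that letters are uppercase ASCII, as in the output above.

```python
import random
import string


def expand_mask(mask, digit="#", char="@"):
    """Replace each `digit` placeholder with a random digit and each `char`
    placeholder with a random uppercase ASCII letter; copy everything else."""
    out = []
    for symbol in mask:
        if symbol == digit:
            out.append(random.choice(string.digits))
        elif symbol == char:
            out.append(random.choice(string.ascii_uppercase))
        else:
            out.append(symbol)
    return "".join(out)


model = "Ford-" + expand_mask("##@")
print(model)  # for example 'Ford-05Q'; the value differs on every run
```

Characters that are not placeholders pass through unchanged, so a mask can also carry fixed separators or prefixes.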

Quite often when testing web applications (a blog is an excellent example) you need to generate text data (text, sentences, tags, etc.). Typing text in by hand is slow and boring, and Mimesis lets you avoid it with the Text() class provider:

>>> from mimesis import Text
>>> text = Text('en')
>>> text.text(quantity=3)
'Language includes means for creating light parallel processes  and their interactions via exchanging asynchronous messages according to the actors’ model. Python supports several programming paradigms, including structural, object-oriented, functional, imperative and aspect-oriented. For instance, some functions that use comparison of examples to choose one calculating option or extracting data points looks similar to an equation.'

You can get a list of random words:

>>> text = Text('pt-br')
>>> text.words(quantity=5)
['poder', 'de', 'maior', 'só', 'cima']

Generate a street address:

>>> from mimesis import Address
>>> address = Address('en')
>>> address.address()
'77 Shephard Trace'

Get the name of a state, region, or province for the chosen locale. In this case it is a US state:

>>> address.state()
'Texas'

The library also has means to romanize Cyrillic text (for the moment only Russian and Ukrainian are supported):

>>> from mimesis.decorators import romanized
>>> @romanized('ru')
... def name_ru():
...     return 'Вероника Денисова'
...
>>> @romanized('uk')
... def name_uk():
...     return 'Емілія Акуленко'
...
>>> name_ru()
'Veronika Denisova'
>>> name_uk()
'Emіlіja Akulenko'
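
The decorator approach can be sketched as follows. This is an illustration only, not the library's code: the romanize_result() decorator is hypothetical, and the transliteration table is an assumption that deliberately covers just the letters needed for one sample name.

```python
import functools

# Deliberately tiny table (assumption): only the letters of the sample name.
# A real transliteration table covers the whole alphabet.
CYRILLIC_TO_LATIN = {
    "В": "V", "Д": "D",
    "а": "a", "в": "v", "е": "e", "и": "i", "к": "k",
    "н": "n", "о": "o", "р": "r", "с": "s",
}


def romanize_result(func):
    """Hypothetical decorator: transliterate the string returned by `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        text = func(*args, **kwargs)
        # Characters missing from the table (spaces, Latin letters) pass through.
        return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)
    return wrapper


@romanize_result
def name_ru():
    return "Вероника Денисова"


print(name_ru())  # Veronika Denisova
```

Wrapping the function rather than the string means any generator that returns Cyrillic text can be romanized without changing its body.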

In reality there are many more possibilities, and you can come up with plenty of great use cases where the data would look more useful than in our examples. We are looking forward to hearing about them from our users, and we would be happy to read how you are applying the library in your projects.

Useful links

A web version of the library is being built with Sanic and GraphQL.

A plugin for pytest is available.

Documentation is available on ReadTheDocs.

The project is available on GitHub: https://github.com/lk-geimfari/mimesis
