Announcing Randomkey.io: Your Test Data Under Control, Finally
Creating realistic data sets on demand to improve data security in software development
Foreword: To support our creative community at Glolent we encourage projects like Randomkey and publish relevant and interesting guest articles on our channel. Learn more about us at www.glolent.com.
Test data management has been haunting developers for decades. An integral part of every software development project, test data generation is complicated and almost always despised. A test data set has to look like a real data set and, at the same time, be completely fictitious. A Mary Boggles based in Birmingham, UK, surely sounds like a real person (and she might indeed exist), but as long as nobody identified by those personal attributes (name, surname, address, date of birth, passport number, etc.) lives in our database, she is all fiction to us. Data masking of this kind is done to limit the exposure of customers’ data within an organisation, and so reduce the probability of a data breach.
A disconcertingly common answer to the problem has been to ignore it and use a subset of production data instead (a YOLO approach). A more elaborate method involves building an in-house test data generator that returns realistic values and takes care of the relationships between the data sets. This strategy often proves cumbersome: such a solution becomes another piece of software to maintain, and someone has to build it in the first place.
With the growing adoption of data security regulations around the world, ignoring the problem is no longer an option. One of the software security cornerstones introduced by the EU’s GDPR is guaranteeing privacy by design and by default. Organisations processing European data are required by law to protect the sensitive information of their customers. Leaving it out in the open for testers goes against the regulation and invites a data leak.
Randomkey.io is here to help with the problem. We aim to become a developer’s toolkit for data privacy, focusing specifically on test data management. The application produces test data sets that are realistic but fictional, and regional where need be. It preserves the data sets’ referential integrity by respecting geographical hierarchies and by always producing the same answer for each specific input value. Some of the app’s main features and our philosophy are outlined below. We encourage you to comment or get in touch if this is of interest to your organisation.
Randomkey, in essence, creates test data sets. It produces syntactically and semantically correct substitutes for the information it receives. In other words, it translates a user’s input into a realistic output. If you send ‘Monica’ to the app, you might get ‘Rachel’ back. If you send ‘Cardiff’, you might get any British city back; ‘Inverness’ and ‘Cambridge’ are some of the valid candidates. The same goes for numbers (integers, doubles, and numbers with leading zeros). The substitutes are generated randomly and are unique to each registered user. Referential integrity is maintained: you get ‘Rachel’ every single time a substitute for ‘Monica’ is requested.
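To make the "same input, same substitute, per user" property concrete, here is a minimal sketch in Python. It is not how Randomkey is implemented; it only illustrates one common way to get repeatable, user-specific substitutes, by hashing the input with a per-user secret (HMAC) and using the result to pick from a list of candidates. The candidate list, the keys, and the `substitute` function are all hypothetical.

```python
import hashlib
import hmac
from typing import Sequence

# Hypothetical candidate pool; a real service would use much larger, regional lists.
FIRST_NAMES = ["Rachel", "Phoebe", "Emily", "Janice", "Carol", "Susan"]


def substitute(value: str, user_key: bytes, candidates: Sequence[str]) -> str:
    """Pick a replacement that depends only on the input value and the user's key."""
    # HMAC-SHA256 gives a keyed, repeatable digest of the input value.
    digest = hmac.new(user_key, value.encode("utf-8"), hashlib.sha256).digest()
    # Map the first 8 bytes of the digest onto an index into the candidate pool.
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


# The same user asking twice for the same value gets the same substitute.
first = substitute("Monica", b"alice-secret-key", FIRST_NAMES)
again = substitute("Monica", b"alice-secret-key", FIRST_NAMES)

# A different user key is hashed independently, so its mapping is unrelated.
other_user = substitute("Monica", b"bob-secret-key", FIRST_NAMES)
```

With a pool this small, two different inputs (or two different users) can land on the same substitute. A production-grade generator would need larger pools or explicit collision handling, which this sketch leaves out.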
How does it work? Randomkey is a REST API. It couldn’t be simpler: you send a request to the app with the value you wish to substitute, and a couple of milliseconds later you receive a response. Using your authentication key in the request ensures that values are always translated the same way. Different endpoints serve different purposes: there is a separate URL for every type of data translation. The first iteration of the app covers first names, last names, dates of birth, National Insurance Numbers, locations (cities and states, with post codes to follow shortly), and a variety of numeric types.
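As a rough illustration of the request flow described above, the following Python sketch builds (but does not send) a request for a substitute value. The base URL, the `first-names` endpoint path, the `value` query parameter, and the `Authorization` header format are all placeholders invented for this example; the actual API may look quite different.

```python
import urllib.parse
import urllib.request

# Placeholder base URL, not the real service address.
BASE_URL = "https://api.example.invalid/v1"


def build_request(endpoint: str, value: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for one substitution; every name here is hypothetical."""
    # URL-encode the value so spaces and diacritics survive the trip.
    query = urllib.parse.urlencode({"value": value})
    url = f"{BASE_URL}/{endpoint}?{query}"
    # The authentication key ties the request to one user's stable mapping.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_request("first-names", "Monica", "YOUR-KEY")
# Sending it would be: urllib.request.urlopen(request)
# It is left out here because the URL above is only a placeholder.
```

One endpoint per data type keeps each call simple: the client only has to choose the right URL for a column (first name, city, and so on) and pass the original value.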
Localisation, or creating regionally realistic data sets, often poses a challenge to static data masking projects. Consider an insurance provider that covers all of Eastern Europe: they need to test their software with data that reflects their customers’ geographical distribution and covers a variety of alphabets and diacritical marks. Test data software is often US-centric and does not account for those regional differences. Randomkey’s goal is to cover all European languages to provide quality data to our customers. Our current focus is on the UK and Germany; other countries will follow shortly after. You can review our plans on the Roadmap or sign up for the newsletter and follow the app’s development.
Randomkey is currently under development, with some endpoints already in beta testing. In December 2019 we will open the app to public testers. We welcome everyone’s feedback: if you are a company that struggles with test data management, please reach out to us at info@randomkey.io. With your help we can make our app relevant to its users, and make test data management easy and safe.