What if I told you that all this data was manufactured?

Staging to Sell: the seduction & danger of demo data

Building data collection and visualization systems is easy. Showing them off is hard. How smoky should that mirror be? Some things I’ve learned about building and using staging data to sell software.

Your killer product idea is taking real shape. It’s moved from sketches and wires and storyboards into the world of real code. Pages and workflows, with thoughtfully designed empty states and obsessively kerned layouts, are starting to appear. It’s an exciting time. But it also comes with a realization: that clever lorem ipsum generator, or the developers typing asdfasdf in every form field, isn’t showing the power or the vision. You need some demo data!

Ok, sure. Easy is relative. It’s more accurate to say that building empty data collection and visualization systems is the easy part, and that things get at least geometrically harder and more complicated once data is flowing through the pipes and being visualized on the screens. How then do you ensure that at least some of the data shows off the value of the system, hopefully without resorting to Photoshop shims, slideware or other vaporware cover?

In some ways, this is analogous to staging a home for sale. Just as you want the furnishings to enable a prospective buyer to imagine themselves living there, you want the data you show to prospects to let them imagine their own data in your tool and what they’d do with it. Importantly, both kinds of staging require planning, design and investment.

Scaling your data — Ain’t no mountain high enough

How much data do you need? Just as importantly, where do you get it, and how do you make it cost-effective and time-effective to include when you need it? How do you tailor it to the audience who’s going to get excited and buy and say things like “I didn’t even know I needed that view, now I can’t live without it!”

Depending on your product, you’re likely to need a fairly good lot of data, and the more you need (amount and frequency) the less you want anyone to have to enter it by hand.

Homogenous == Dangerous — Same as it ever was

Depending on your testing or QA practices, you may have the capability to generate a ton of data right at your fingertips — maybe with some extra tweaking or scripting. If not, you may need better testing practices.

Unfortunately though, generated data for testing tends to be massively redundant. We need to fill out every field in every form to create 100 user accounts? Say hello to user001, user002 and their 98 family members through user100, all from address1,city1,ST,00001. Does your testing need to check pagination or infinite scroll at a 100 record page size? No problem, because numbers can go that high — just turn your extrapolation to eleven.

Very thorough test data banks or automation scripts will throw in the occasional O’Neil or double-byte-character name to test boundary conditions, but in general, the purpose of this data is to represent and test ‘pseudo-scaling’: does your awesome product grow big well or grow big poorly? Do your visualizations lose meaning when there’s lots of data? Do your lists get too long to sort or render? Do the layers of your product correctly escape that tick mark or handle that double-byte character? These are crucial questions for this data to answer, but the data won’t match your real customers’ data. It has scale, and it makes all your pages “full”, but it has no resonance. Your prospects will get that ‘not really what I need’ or, worse, ‘not quite ready for prime time’ taste in their mouths if you just expose your QA data in a demo context.
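To make the pattern concrete, here is a rough sketch of the kind of generator that produces this data. The function name, field names and edge-case list are all made up for illustration; your own QA tooling will look different.

```python
# Hypothetical edge-case surnames: an apostrophe, a CJK name (double-byte in
# legacy encodings), and an umlaut. Swap in whatever your product must survive.
EDGE_CASE_SURNAMES = ["O'Neil", "田中", "Müller"]


def make_test_users(count, edge_every=25):
    """Build `count` near-identical users, swapping in an edge case periodically."""
    users = []
    for i in range(1, count + 1):
        surname = f"user{i:03d}"
        if i % edge_every == 0:
            # Rotate through the edge cases so each one shows up eventually.
            surname = EDGE_CASE_SURNAMES[(i // edge_every - 1) % len(EDGE_CASE_SURNAMES)]
        users.append({
            "id": i,
            "surname": surname,
            "address": "address1",
            "city": "city1",
            "state": "ST",
            "zip": "00001",
        })
    return users


users = make_test_users(100)
```

Great for checking pagination and escaping. Not so great in front of a prospect: every one of those hundred users lives at the same address.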

Borrowing data from the “real” world — careful, Icarus

Ok, what then? If part or all of what your app does involves merging data from various sources together (particularly if some or all of those sources are ready to hand), it can be tempting to just grab a real data set and plug it in. Presto! Can’t get much more resonance than that, can we?

Here be dragons, and their names are: Information Privacy, Personally Identifiable Information, Records Protection Acts, Data Sovereignty Laws and Data Protection Directives. Unless you’re sure that any user data you’re grabbing is free of information about children 13 or younger, any real personally identifiable information, medical or educational records, names, addresses, social security numbers &c, AND you’re confident that the data can be used in your jurisdiction for your demo purposes, you may expose your company, even innocently, to criticism, fines or even litigation.

So how do you split the difference? How do you get a safe-to-use yet resonant set of data? You’ll need a bit of a fake-out, depending on what data set you’re starting with, and where it needs to show up in your application. Generally, this fake-out is called pseudonymization. It’s your basic goal if you already have a set of QA or real-world data about people and their activity. Bottom line: keep the key metadata ABOUT the people, and scrub their names, addresses, pictures and other identifiers.

User Generators like http://www.fakenamegenerator.com/order.php or others. I like FNG because it can do big data sets and can handle multi-country data, but there are many of these that operate on the same principle — delivering a delimited file suitable for a quick scriptable import as a staging or demo table, or for UPDATE commands into your QA or real-world data sets for pseudonymization.

Robohash https://robohash.org/ is a handy resource for (among other things having to do with tamper detection) generating a pseudo-random series of avatar face images, using cartoons of robots. As a bonus, the service supports generating them on the fly at render time, or you can store the images in your app. Obviously, this exposes the fakiness, but so do those stock photos, actually. One effective approach can be to use robots for lots of “filler” users, but swap in a human photo for the users that illuminate the demo path best. As a bonus, this can act as a nice aid for your demo guy/gal too.

Random User API https://randomuser.me/ is, like Robohash, an interface hack, but it has more data fields available than just an avatar: you can unpack its JSON directly into whatever parts of your app show user data. One big caution though: ideally, you want this process to function in your code exactly like the real process of unpacking user data from your app’s storage, so this tool may only be appropriate for design stages and prototyping rather than for demoing a live app.
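Here is a minimal sketch of the scripted-import idea, using Python’s built-in sqlite3 and csv modules. The users table, its columns, the sample rows and the CSV column names are all invented for illustration; match them to your own schema and to whatever your generator actually exports. The Robohash URL pattern (any text in the path, plus .png) is my reading of how that service works, so check it against the site before relying on it.

```python
import csv
import io
import sqlite3
from urllib.parse import quote

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, "
    "plan TEXT, logins_last_30d INTEGER, avatar_url TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, name, email, plan, logins_last_30d) VALUES (?, ?, ?, ?, ?)",
    [(1, "Real Person A", "a@real.example", "pro", 42),
     (2, "Real Person B", "b@real.example", "free", 3)],
)

# Stand-in for a delimited file exported from a fake-identity generator.
# The column names are assumptions, not any particular tool's format.
FAKE_IDENTITIES_CSV = """name,email
Avery Stone,avery.stone@demo.example
Jordan Vale,jordan.vale@demo.example
"""


def pseudonymize(conn, fake_csv_text):
    """Overwrite identifying columns; leave the behavioural metadata untouched."""
    fakes = list(csv.DictReader(io.StringIO(fake_csv_text)))
    ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
    if len(fakes) < len(ids):
        raise ValueError("not enough fake identities for every user")
    for user_id, fake in zip(ids, fakes):
        # A stable key in the URL path should give each user a stable robot avatar.
        avatar = "https://robohash.org/" + quote(f"demo-user-{user_id}") + ".png"
        conn.execute(
            "UPDATE users SET name = ?, email = ?, avatar_url = ? WHERE id = ?",
            (fake["name"], fake["email"], avatar, user_id),
        )
    conn.commit()


pseudonymize(conn, FAKE_IDENTITIES_CSV)
rows = conn.execute(
    "SELECT name, email, plan, logins_last_30d, avatar_url FROM users ORDER BY id"
).fetchall()
```

Note what survives: the plan and the login counts, which are the metadata ABOUT the people that makes the data resonate. Only the identifiers get replaced.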

Building a time machine — really, what’s the rush?

If you are representing the fourth dimension in your data — and who isn’t showing progress or change over time in some aspect? — you’ve got a further problem in demonstrating: simulating the passage of time, or at minimum representing the past. Test data banks generally are little help here, particularly if they are automated for a workflow like setup + test + teardown + report results.

Here, you’re likely to need a specialized bespoke tool or script that can populate your product’s “history” tables or equivalent storage in your app, backwards, as though your data had been collecting for the past n weeks, months or years. This is another fake-out. Look at the boundaries of your demo data set and the story you are building for demonstrations and make some decisions about how far back you want to go, and what trends are important to tell your story — is it “sales are up”, or the correlation of two trends, or identifying drop-offs or troubling signs? Make your time data “flow” in the appropriate direction to tell the story.
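As a rough sketch of what such a script might look like, assuming a single weekly metric and a “sales are up” story: the function name, parameters and numbers below are all hypothetical, and a real version would write into your product’s own history tables rather than return a list.

```python
import random
from datetime import date, timedelta


def backfill_history(end_day, weeks, start_value, weekly_growth, noise=0.05, seed=7):
    """Return (day, value) rows for the `weeks` weeks ending at `end_day`, oldest first."""
    rng = random.Random(seed)  # fixed seed so every rebuild tells the same story
    rows = []
    for week in range(weeks):
        day = end_day - timedelta(weeks=weeks - 1 - week)
        trend = start_value * (1 + weekly_growth) ** week  # compounding upward trend
        jitter = 1 + rng.uniform(-noise, noise)  # a little wobble so it looks lived-in
        rows.append((day, round(trend * jitter)))
    return rows


# Twelve weeks of history, as though the product had been collecting all along.
history = backfill_history(date(2016, 6, 6), weeks=12, start_value=1000, weekly_growth=0.04)
```

Flip the sign on weekly_growth and the same script tells a “troubling signs” story instead. In practice you would probably pass date.today() as end_day.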

Don’t mix the colors — gotta keep ’em separated

Apart from things you’ve added specific to the demo set, you should try to remain as faithful as possible to the way things do/will work for your real customer data. This saves effort, as well as reducing fakery. Critically though, you must have a way to isolate the demo data from customer production data — that may mean a special instance of your product, or a special demo customer ID or login credential, or all of the above. But it also means that you don’t want this data (or your QA data) infecting your own analytics on what your customers are doing with your product. You may need custom filters implemented for internal reporting purposes, depending on the tenancy of your architectural stack.
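One simple shape for that internal filter, sketched under the assumption that every event carries a tenant ID and that demo and QA data live under reserved IDs. The IDs, field names and events here are made up for illustration.

```python
# Hypothetical tenant IDs reserved for demo and QA data. A real system would
# probably keep these in configuration rather than in code.
NON_PRODUCTION_TENANTS = frozenset({"demo", "qa"})


def production_events(events, excluded=NON_PRODUCTION_TENANTS):
    """Yield only the events that belong to real customers."""
    for event in events:
        if event["tenant_id"] not in excluded:
            yield event


events = [
    {"tenant_id": "acme", "action": "login"},
    {"tenant_id": "demo", "action": "login"},
    {"tenant_id": "qa", "action": "export"},
    {"tenant_id": "globex", "action": "export"},
]
real = list(production_events(events))
```

If your demo runs on a completely separate instance, you may not need this at all. If it shares a stack with production, something like it belongs in front of every internal report.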

Customizing — starting with the man in the mirror

“We’re demoing to BigCorp’s Government Services Division on Tuesday, can you change all the data to say Critical Intelligence Agency?”

Be judicious here. If there are easy, singleton things to change (logo, client name), do those. If the story you want the data to tell has to change for the customer’s context, then either you may need a whole new and different staging data set for that market segment or vertical, or you may not have a good product fit for that client.

Also crucial but easily overlooked: try to provide a lever/button for your sales/marketing team to recreate a “fresh” demo data set on demand. No way do you want to craft your demo data set this week, and watch the data recede into the past before you land the sales meeting or launch the webinar. In extreme cases of this, you could end up showing 2012 data in 2016, and have to blow dust off the cushions of your technology with every demo.
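One cheap way to build that lever, sketched here with made-up data: keep the demo set as it is, and shift every date forward so the newest record lands on today. The function name and the (day, value) row shape are assumptions for illustration.

```python
from datetime import date


def rebase_dates(rows, today):
    """Shift every (day, value) row so the most recent day lands on `today`."""
    if not rows:
        return []
    offset = today - max(day for day, _ in rows)
    # Adding the same offset everywhere preserves the gaps between records.
    return [(day + offset, value) for day, value in rows]


stale = [(date(2012, 3, 1), 10), (date(2012, 3, 8), 14), (date(2012, 3, 15), 19)]
fresh = rebase_dates(stale, today=date(2016, 6, 6))
```

Because the spacing between records is preserved, the trend you crafted survives the move. Wire it to a button and pass date.today(), and the dust never settles in the first place.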

Mapping data to the sales/demo story — it’s a matter of trust

Ultimately, once your demo generator is running the way you want, you can use it to generate collateral, storyboards, etc. It can even be an effective tool to iterate internally on designs and find subtle bugs. But when using the demo for a prospect or customer audience, it’s super important to be straightforward about what you’re showing to the audience. Take credit for what’s running for real, and be honest about what got faked. “This is example data in a real production system” or “This is a scrubbed data set of pseudonyms” goes a long way to establishing this trust.

After all, any demo that isn’t using the customer’s own data (with permission!) is asking them to exercise their imaginations. But the more resonant our staging data is, the less work that should be.

Written by Dan Rinzel

Software maker for education. Husband & daddy. DC native, Capitals fan. Wannabe world traveler. Sometimes talk ABOUT my employer, but not FOR them.
