The Fluidity & Weight of Data

I’ve spent the last few years exploring the different forms of value that can be drawn from a range of data sets: from clients’ “big data”, to using tools such as KimonoLabs to parse the web, to building out our own sensor networks, to more traditional qualitative techniques. There’s significant hype around the impact of data on the innovation process, and it takes time to tease out the value for our clients.

Our studio works in sensitive environments, on highly sensitive topics. The consequences of failure are not abstract: people we know will be impacted in ways that range from social shame through to violence and imprisonment. In a world where much of the talk is of anonymising or pseudo-anonymising data, we consider knowing the source of the data we are working with an asset. We want the people handling that data to understand both their moral and legal obligations to its source.


One branch of this exploration is an analysis of data gathered through traditional field research methods. The following is an analysis of a photo archive taken from a six-week project that included three weeks of international field study time. With thanks to Dorothy Xu for wading through the archive. The analysis focussed on understanding the relative cost of collecting photos versus their use, the effort required for proper archival management, and the image-capturing cadence of the team. These are the top-line numbers:

We typically generate 10,000 to 25,000 photos on a multi-location foundational research project, fewer for design and implementation. We took a representative study whose unfiltered archive held 13,514 photos. We shoot in RAW.

It took ~6.4 hours of transfer time to import these files onto the computer, although the cost in time and researcher attention is far higher once the number of imports and the mental load of tracking them are taken into account.

Of the 13,514 photos in the unfiltered archive, 5,604 duplicate, blurred, or unwanted photos (41% of the total) were deleted, leaving 7,910 photos. The process took a trained intern 3 days: 1,868 photos per day, or roughly 4.5 photos per minute, working 7 hours/day. The hit-rate of usable photos can obviously be higher with slower recording equipment (we use rapid-fire 5D Mark IIs) and a more competent, more cautious photographer. However, the skill level on this team was fairly consistent with studies where designers, strategists and other ‘non-research’ professions are put to work in-field.
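As a rough check on the filtering figures, the following Python sketch recomputes them from the numbers quoted above. The variable names are illustrative assumptions, not part of the original analysis.

```python
# Figures quoted in the text; the names are illustrative only.
unfiltered = 13_514      # photos in the unfiltered archive
deleted = 5_604          # duplicate, blurred, or unwanted photos
days = 3                 # time the intern spent filtering
hours_per_day = 7        # working hours per day

kept = unfiltered - deleted
deleted_share = deleted / unfiltered
per_day = deleted / days
per_minute = per_day / (hours_per_day * 60)

print(f"kept: {kept}")
print(f"deleted share: {deleted_share:.0%}")
print(f"deleted per day: {per_day:.0f}")
print(f"deleted per minute: {per_minute:.2f}")
```

The per-minute rate comes out at about 4.45, which the text rounds to 4.5.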

The camera shutters for all cameras (two 5D Mark IIs, and some point-and-shoots) used over the entire course of the study were open for 7.4 minutes. For the filtered catalog of 7,910 photos the camera shutters were open for a total of 260 seconds (4.3 minutes). The filtering process obviously starts with where we decide to point the cameras and when.

The camera shutters were open for 7.4 minutes over a six-week project.

Full daily backups of the archive to an external drive took up to 4 hours.

An estimated 1,000 additional photos were deleted from the cameras prior to importing into the unfiltered archive, typically by the photographer in a car on the way from one intensive interview session to another.

The processing time to obtain 100 ready-to-use photos from this study is around 24 hours: transfer 13,481 photos from memory card to photo management software (6.4 hrs); export them as JPEGs from the photo management software (19 hrs); compress those photos into a zip file (4.5 mins); and copy them to a hard drive (5 mins) or upload them to a cloud backup service (10 mins with a decent connection). This doesn’t include time spent selecting key photos.
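To see where that time goes, here is a minimal sketch that adds up the steps as quoted, assuming the local hard-drive copy rather than the cloud upload. The step labels and variable names are illustrative, not taken from the original analysis.

```python
# Step durations in hours, as quoted in the text. Labels are illustrative.
steps_hours = {
    "transfer from memory card": 6.4,
    "export as JPEG": 19.0,
    "compress to zip": 4.5 / 60,
    "copy to hard drive": 5 / 60,   # assumption: local copy, not cloud upload
}

total_hours = sum(steps_hours.values())
export_share = steps_hours["export as JPEG"] / total_hours

for step, hours in steps_hours.items():
    print(f"{step}: {hours:.2f} h ({hours / total_hours:.1%})")
print(f"total: {total_hours:.1f} h")
```

On these inputs the steps sum to about 25.6 hours, slightly above the rounded figure in the text, and the JPEG export alone accounts for roughly three quarters of it.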

How much time did the team spend capturing their surroundings? For the original catalog of 13,481 photos the camera shutters were open for a total of 444 seconds (7.4 minutes), an average shutter speed of 1/30th of a second. This is a relatively long time for a camera, where a minimum of 1/60th of a second is considered the benchmark to obtain a clear photo in good lighting without a tripod. The archive included a few super-long exposures, mostly errors by the photographer.

The camera flash was not used in the entire study. It is normally too disruptive to the research process except in situations where research-theatre is required.

A team member who has worked in-field on at least some of the data collection can comfortably browse a folder of ~600 photos if they are properly named and contain minimal metadata. For a team member with no in-field experience on that project, a comfortable folder size is closer to 200 photos, the difference being related to familiarity with the material and the mental cues required to scan effectively. Making effective use of, and printing, more than ~100 photos from the field research can be a painful experience. A single home visit generates between 50 and 800 photos.

Around 200 photos were used in the deliverables, representing a total capture time of 6.6 seconds. (Six weeks is 3,628,800 seconds of potential capture time for one researcher figuring out where to point the camera and when to press the shutter.) Any conversation around filtering photos that enters the work stream has to start with the photographer-researcher’s ability to know where to go to optimise relevant data collection, what to point the camera at and when to shoot.
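The exposure figures can be put side by side with a short sketch. It uses the photo counts and shutter-open times quoted in this piece; the dictionary names are illustrative assumptions, and the deliverables count of 200 is the approximate figure given above.

```python
# Six weeks expressed in seconds.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60
project_seconds = 6 * SECONDS_PER_WEEK

# Total shutter-open time (seconds) and photo counts quoted in the text.
exposure_seconds = {
    "unfiltered archive": 444.0,
    "filtered catalog": 260.0,
    "deliverables": 6.6,
}
photo_counts = {
    "unfiltered archive": 13_481,
    "filtered catalog": 7_910,
    "deliverables": 200,   # approximate count
}

# Average exposure per photo, and share of the six weeks spent exposing.
averages = {k: exposure_seconds[k] / photo_counts[k] for k in exposure_seconds}
shares = {k: v / project_seconds for k, v in exposure_seconds.items()}

for name in exposure_seconds:
    print(f"{name}: about 1/{1 / averages[name]:.0f} s per photo, "
          f"{shares[name]:.4%} of the six weeks")
```

All three sets average close to 1/30th of a second per photo, and even the unfiltered archive represents only around a hundredth of one percent of the project's duration.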

By our estimation, 5 photos from the study were widely used in the organisation: less than a second of exposure time.

The things that affect the number of photos taken include: how many cities the team has already visited on the same project (the team tends to be less trigger-happy as the study progresses because they have a better idea of where the value lies and a more focussed appreciation of the weight of data); the team on the ground; and whether the location and participants are photogenic.

Consider these two concepts:

The fluidity of data is its ability to travel in the project team, amongst stakeholders and in the client organisation. The atomistic unit that has the highest signal-to-noise ratio, and is most widely shared, is typically one photo + one observation, where the insight is relatively obvious and doesn’t require much explanation. Time pressures mean that the team typically needs to rapidly optimise photographs for fluidity.

The weight of data is the time it takes to collect, parse, process, and apply that data. For the researcher there is a psychological component: data that is collected but has not been analysed or otherwise brought into the process becomes a mental burden of work that remains to be done. It is the mental equivalent of stacking up at a buffet but leaving the plate untouched.

Anyone can collect data.

Not everyone is fit to do so.

And there’s an art to turning data into insight.

Download a sample photo archive from a foundational research project in Myanmar. The author, Jan Chipchase, is the founder of Studio D Radiodurans, @studio_d_rad.