Interval of Confidence

Measuring success at our first industry show

Rocket Code Team
Jul 18, 2016

By Andrew Watkins and Ray Sylvester

If you’ve been following things recently on Thinkship, you’ll know that we took 15 of our team members to the Internet Retailer Conference & Exhibition (IRCE) in Chicago this past June. Besides being a big adventure, IRCE was also a big experiment. Before the show, we asked ourselves what a conference that claimed to be the hub of the ecommerce industry could do for us that simply working and networking online could not — and we wanted to answer that question by measuring our experience in some way.

We also knew that since most companies bring a small cadre of sales-focused employees to shows like this, it would be important to prove (or disprove) the value of taking a big team with us. Our goal was to show that taking 15 humans to IRCE would be better than leaving them at the office to work all week.

How exactly did we do that? Put on your nerd glasses and your lab coats, and we’ll show you!

Materials & methods

We thought long and hard about how to measure our success (or lack thereof) from sending most of the team to IRCE.

Initially, we considered measuring it in terms of externally focused networking: looking at the leads we got at the show that could turn into new clients or partners. But it’s simply too early to say whether any of those leads will turn into fruitful relationships.

So instead, we pivoted our focus internally to employee confidence. Would the conference experience lead to more self-assurance among Rocket Code staff? We even gave our experiment a veneer of scientific seriousness! Today, we’re going to delve into how we structured, executed, and measured this research question.

As mentioned above, rather than sending a “SWAT” team of our most sales-inclined, we hypothesized that it would be a stronger play in terms of building employee confidence and leadership qualities to “clear the bench” and go as a team.

Although we took 15 people to the show, we only surveyed the 13 who wouldn’t be involved in creating or analyzing the experiment. That sub-team consisted of two interns, three people from the operations team, two from the business team, five engineers, and a UI/UX designer.

To measure confidence, we surveyed all 13 participants both before and after the conference. We asked six questions scored on a five-point Likert scale, from 1 (“not confident at all”) to 5 (“very confident”). The questions were slightly altered for each survey, but only enough to control for natural bias that might come with answering the same exact question twice in different contexts.

The first six questions in each survey were purely quantitative and designed to produce a confidence score that could then be averaged for the group. We then compared the scores from the two surveys to determine whether the group’s mean confidence increased.
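For the nerd-glasses crowd, here’s roughly what that scoring could look like in code. This is a minimal sketch in Python; the response values are made up for illustration, not our actual survey data.

```python
# A minimal sketch of the quantitative scoring. The numbers are illustrative,
# not our actual survey responses.

# Each row is one participant's answers to the six Likert questions (1-5).
precon_responses = [
    [4, 3, 4, 3, 4, 3],
    [3, 3, 2, 4, 3, 3],
    [5, 4, 4, 4, 3, 4],
    # ... one row per participant (13 in total)
]

# Average each participant's six answers, then average across the group.
per_person_means = [sum(row) / len(row) for row in precon_responses]
group_mean = sum(per_person_means) / len(per_person_means)

print(f"Group mean confidence (precon): {group_mean:.2f}")
```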

We also included a question that asked participants how they would describe our company to a friend, colleague or family member who was not familiar with the ecommerce industry. Instead of asking people to score their responses on a 1–5 Likert scale, this was an open-ended question.

As it turns out, the qualitative data we garnered from the seventh question provided a rich and welcome contrast to the basic — and ultimately inconclusive — data from the first six quantitative questions.

We’ll refer to the pre-conference version of the survey as the precon, and the post-conference version as the postcon.

Let’s jump right into the results.

Results

How did our team score themselves on the first six questions from survey to survey?

We saw an increase in mean confidence score from 3.56 to 3.85, a gain of 0.29 points. <NERD GLASSES> We used a paired samples t-test to check whether that increase in mean confidence score within the group, before and after the conference, was statistically significant: t(77)=1.375, p=0.17. The first number, the t-statistic, doesn’t tell us much on its own, beyond the fact that mean reported confidence moved in a positive direction after IRCE. The p-value is the crux here: 0.17 means that if attending IRCE had no real effect, we’d still expect to see an increase at least this large roughly 17% of the time by chance alone. To call the effect statistically significant by the usual standard, we’d need that probability to fall below 5% (p < 0.05). </NERD GLASSES>
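For anyone who wants to run the same kind of test, here’s a minimal sketch using SciPy’s paired samples t-test (scipy.stats.ttest_rel). The scores below are placeholders rather than our raw data; pairing the precon and postcon item-level scores (13 people times 6 questions, or 78 pairs) is what would give the 77 degrees of freedom reported above.

```python
# A rough sketch of a paired-samples t-test on item-level Likert scores.
# The arrays below are placeholders; in practice each precon item score is
# paired with the corresponding postcon item score (78 pairs in total).
from scipy import stats

precon_scores = [4, 3, 4, 3, 4, 3, 3, 3, 2, 4]   # ... 78 item-level scores
postcon_scores = [4, 4, 4, 3, 5, 3, 3, 4, 3, 4]  # ... 78 item-level scores

t_stat, p_value = stats.ttest_rel(postcon_scores, precon_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Conventional threshold: only call the difference significant if p < 0.05.
if p_value < 0.05:
    print("Statistically significant difference")
else:
    print("Not statistically significant")
```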

Thus, our analysis determined that this result was not statistically significant. So, although we’d like to claim a causal relationship, statistically we can’t rule out that the increase in mean confidence score was due to random chance. Boo.

But, fortunately for us, we also collected a rich sample of qualitative data to view our quandary from another angle. The final question we asked in each survey was the following:

“How would you describe Rocket Code to a friend who is unfamiliar with our company?”

Here’s what we found after analyzing people’s responses.

1. Answers in both surveys described Rocket Code’s work at a high level — but postcon answers were more focused.

The answers to question 7 (the “final question” shared above) in the precon described our work at a high level — what you’d expect in an elevator pitch — but some of those answers were ambiguous in their descriptions of what we achieve for our clients:

  • “We also build custom software solutions that … allow them to do more with their ecommerce presence than was previously possible.”
  • We “help create a seamless and interesting digital framework for our clients.”
  • “…we help brands sell more by creating a better digital experience.”

Although the answers in the postcon also tended to explain our work in a general sense, they were more dialed in. Some examples:

  • Rocket Code “specializes in building solutions that enable loyal customer relationships.”
  • We help businesses “interested in developing a better shopping experience for their customers.”
  • “…we have our hands in the entire life cycle of an ecommerce experience”
  • “This allows our clients the freedom to make better, data informed decisions, knowing that features we deliver add real value.”

2. Answers got shorter in the postcon.

We compared the word counts of the answers across the two surveys, and the postcon answers came in shorter overall.

First off, we ran a quick analysis and found these changes were not statistically significant, so we can’t say for sure that the drop in word count was meaningful and not just due to chance.
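Here’s roughly what that quick check could look like in code; the answer strings below are hypothetical stand-ins, not the actual survey responses.

```python
# A quick sketch of the word-count comparison. The answer strings are
# hypothetical stand-ins for the actual survey responses.
from scipy import stats

precon_answers = [
    "We love what we do, who we do it with, and whom we do it for.",
    "A small, innovative company that builds custom ecommerce software.",
    # ... one answer per participant
]
postcon_answers = [
    "We help brands sell more by creating better digital experiences.",
    "An agency focused on the entire life cycle of an ecommerce experience.",
    # ... one answer per participant
]

precon_counts = [len(answer.split()) for answer in precon_answers]
postcon_counts = [len(answer.split()) for answer in postcon_answers]

print(f"Mean words (precon): {sum(precon_counts) / len(precon_counts):.1f}")
print(f"Mean words (postcon): {sum(postcon_counts) / len(postcon_counts):.1f}")

# Same paired-samples approach as before, this time on word counts per person.
t_stat, p_value = stats.ttest_rel(precon_counts, postcon_counts)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```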

That caveat aside: People definitely seemed to waffle a bit in the precon, and less so in the postcon. This could have been due to survey fatigue — a reluctance to provide a lengthy answer, especially if they had already given one the first time around. Two people also prefaced their answers in the precon. (More on that later.) But the drop in word count could have also been due to improved confidence in people’s ability to describe Rocket Code’s work in fewer words.

3. There was a shift from an internal to an external focus across the two surveys.

We noticed a personal, inward focus in the precon answers. Four of the answers talked about the experience of working at Rocket Code — how we feel about ourselves as a company, about our work, and about our clients — as opposed to a more external, objective focus on what we do:

  • “We love what we do, who we do it with, and whom we do it for.”
  • “We love our clients’ customers…”
  • “A small, innovative company…”
  • “A hard-working, tight-knit group…”

Two respondents also prefaced their answers in the precon; one started off stating “I certainly believe context would be key in a situation like this, but if I was forced to provide an elevator pitch…” and the other said, “My answer is also for someone who is unfamiliar with e-commerce.”

In the postcon, no one prefaced their answers. Again, this could have been because they’d already done so in the first survey, and/or because they simply felt more confident that their description could stand on its own. It’s tough to say exactly which reason holds more sway.

The postcon answers, however, also showed less inward focus. For better or worse, there was no mention of things or people “we love,” or of any of the endearing characteristics of our company or its people.

We also tracked word-use trends across the two surveys. Staying on the idea of internal focus, we saw an interesting shift in the use of the word “we”: the same number of people used the term in each survey, but there were 16 overall mentions in the first survey versus just 10 in the second.
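If you’re curious how those word-use tallies could be produced, here’s a small sketch; again, the answer strings are hypothetical stand-ins rather than the real responses.

```python
# A small sketch of the word-use tallying behind the trends we describe.
# The answer strings are hypothetical stand-ins, not the real responses.
import re

precon_answers = [
    "We love what we do, who we do it with, and whom we do it for.",
    "We build custom software solutions for our clients.",
]
postcon_answers = [
    "We help brands sell more by creating better digital experiences.",
    "A digital agency focused on the whole ecommerce experience.",
]

def term_stats(answers, term):
    """Return (total mentions, number of people using the term at least once)."""
    counts = [len(re.findall(rf"\b{re.escape(term)}\b", a.lower())) for a in answers]
    return sum(counts), sum(1 for c in counts if c > 0)

for term in ["we", "digital", "agency", "software", "service"]:
    pre_total, pre_people = term_stats(precon_answers, term)
    post_total, post_people = term_stats(postcon_answers, term)
    print(f"{term!r}: precon {pre_total} ({pre_people} people) -> "
          f"postcon {post_total} ({post_people} people)")
```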

So, less “we” and less “love.” Did the hippie love fest of precon fall away to reveal a boardroom of Gordon Gekko acolytes? Not quite, but the overall mood in postcon was definitely a bit more grounded and business-minded.

4. There were some other interesting linguistic observations.

The frequency of most of the key terms trended downward, in line with the overall shift toward succinctness we talked about in point #2. A couple of terms, though, bumped upward. Along with “digital,” “agency” became more popular. At the same time, there was a drop in the use of “service(s),” a term that often goes together with “agency” in descriptions of our work; i.e., we’re a “services agency.”

But perhaps the most interesting shift was the drop in the use of the term “software” and the growth in popularity of “digital.” We don’t believe this was a coincidence. We think team members started to grok how the “digital experiences” we create — which, yes, are facilitated largely through “software” — matter more than the software itself.

Discussion

If we were to do this again, we would certainly look to develop our quantitative reporting by collecting more personal or demographic data — things like age, employment level, and personality type. This would give us an opportunity to understand the relationships and correlations between different variables, and to better learn what makes the team tick.

For a while at least, we’ll be working with a small pool of participants, which in statistics is traditionally an Achilles heel for valid, statistically significant results. That said, there is a hidden opportunity in working with a small group of participants in these kinds of analyses. By developing the qualitative side of our reporting, we potentially have more to gain than by simply collecting a set of averages to parse.

As a small business, we can argue that learning what drives our team to grow into its own identity is more important than arbitrary group averages. Why? As fashionable as statistical significance, A/B testing, and large data sets may be, the truth is that it can be difficult to spot new trends when they get rolled up into an average value.

People are starting to think about statistics in a different light, and this framework of thinking says we should not put so much trust in averages derived from big data sets. Rather, it’s the personal, “small” data that’s truly meaningful. Where big data looks for correlations within the averages, small, qualitative data not only lets us hear the opinions, thoughts, and observations of our team, but also lets us see how they change over time.

Conclusion

We couldn’t possibly hope to capture or describe the full range of professional and personal development that took place during our four days in Chicago.

We didn’t hit statistical significance in any of the things we did measure. So we can’t say with scientific assuredness that our team’s confidence grew as a result of attending the conference. But we were able to draw some qualitative conclusions that suggest that it may have.

What we’ll be wearing to IRCE next year.

Was attending the conference, in the end, worth the cost? Absolutely. It was a great time. Will we make the decision to go to IRCE again on the data we collected? Probably not.

We will go because we want to take advantage of another opportunity to grow our network, our brand, our influence, and our confidence.

By the time the next industry show rolls around, our company will have evolved, and we will probably have some new goals. But we will still approach the conference and others like it with the same inquisitive attitude, with new hypotheses to prove or disprove, because we know there is huge value in such organizational introspection.
