Chatting Through Data

Published in

FIU Lee Caplin School of Journalism & Media’s Interactive Visualization Course

Sep 26, 2023

I interviewed Paulo Ventura, who was first exposed to data analytics as an undergraduate studying economics. He was always a math guy, and he later moved to Australia to do a master’s in data science, which is where he found what he wanted to do for the rest of his life. He then got a job as a data analyst at Sedwick, where he still works today.

In this ten-minute interview, we touched on four different graphs that he worked on. He explained how to read them, what they meant, and why he put what he put on each graph. We finished our conversation with me asking him, “How do you balance the aesthetic of the graph and its facts?” Paulo responded in the best way possible! Check out our interview! Enjoy.

data chat transcript

[00:00:00] So this is a very interesting kind of, um, way of visualizing it. Um, the way it works is that, so my hypothesis is that depending on how we communicate with the customer, it might impact the likelihood of that customer to put a complaint forward or not. So I look into two different ways of communicating with the customer.

The first one here is the number of contacts initiated by the customer: as opposed to us calling them with information, they are calling us requesting information. So my hypothesis is that if they feel the need to do that, that means that they’re pretty frustrated and they might put a complaint forward.

So how this works is that the y axis is just the number of contacts initiated by them. And the x axis is the claim life, or how long the claim has been open for. The reason I do this is because it needs to be a fair comparison, right? If the claim is [00:01:00] open for longer, the customer is going to, you know, there’s going to be more communications on that claim.

So that number is supposed to be high anyway. Um, so these curves here, let’s start with the blue one. This is what we call a density plot. So, uh, these contours here kind of indicate where the majority of the claims are. So in this group, claims with complaints, we can see that the smaller, uh, region here, which is the gray area, is where the majority of the claims are: around 200 days and, you know, up to five contacts.

And then once it starts, you kind of, you see that the color starts to kind of become a little bit more clear. Yeah, so this is like a region where there are not a lot of customers in it. Um, and same thing with the other one. So my point using these charts here is [00:02:00] that I’m comparing the distribution of the, uh, number of calls initiated by the customer from claims with complaints versus claims without complaints.

And we can see that for claims without complaints, where the customer is supposedly happy, for the same claim life there are a lot less contacts initiated by them. So you can see that this whole curve is a lot more squished to the bottom. So let’s see, for claims at, uh, 200 days, it barely reaches like three contacts that you should have with a customer, right?

Whereas 200 days in this one, uh, it goes up to like five or seven, you know? So it, it kind of supports my hypothesis very clearly visually. So this makes explaining the correlation between [00:03:00] the two variables a lot easier for the, uh, users to understand, right? Mm hmm.

Um, and then same thing with percentage of, uh, contacts that are from telephone as opposed to emails. And you can see that in here, it’s not as clear as this one, but it still shows some kind of correlation in parts of the, uh, of the curve, which is kind of a different insight, but just as interesting in my opinion. So these ones are actually kind of a bit trickier to explain to people because I think 95 percent of people have never seen something like this.
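The "fair comparison" idea in this part of the conversation (comparing customer-initiated contacts only between claims of similar claim life) can be sketched in a few lines of Python. This is not Paulo's code and it is not a density plot: it swaps in a simpler binned average, and the function name, the 100-day bucket size, and the sample claims are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_contacts_by_claim_life(claims, bucket_days=100):
    """Average customer-initiated contacts per (claim-life bucket, complaint flag).

    Each claim is a tuple: (days_open, contacts_by_customer, has_complaint).
    The bucket size is an illustrative assumption, not a value from the interview.
    """
    grouped = defaultdict(list)
    for days_open, contacts, has_complaint in claims:
        # Claims are only compared with claims of similar age (same bucket).
        bucket = (days_open // bucket_days) * bucket_days
        grouped[(bucket, has_complaint)].append(contacts)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical sample data, invented for this sketch.
claims = [
    (150, 5, True), (180, 7, True), (190, 6, True),
    (160, 2, False), (170, 3, False), (195, 1, False),
    (320, 9, True), (350, 4, False),
]

summary = mean_contacts_by_claim_life(claims)
for (bucket, has_complaint), avg in sorted(summary.items()):
    label = "with complaint" if has_complaint else "without complaint"
    print(f"{bucket}-{bucket + 99} days, {label}: {avg} contacts on average")
```

With this invented data, claims with complaints show more customer-initiated contacts than claims without complaints inside the same claim-life bucket, which is the pattern the density plots are meant to make visible.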

Whenever I show this, I have to really take my time and explain it in a way that is going to make sense for them. Um, so it’s something that I only use very, um, scarcely as well. Usually I tend to keep like the charts very, you know, clean and easy to [00:04:00] understand, with just not a lot of, uh, information on them. That makes the audience kind of, it makes the information more digestible, you know? Like, there is a concept when you’re talking about storytelling and data visualization that, um, you need to kind of really micromanage the user’s attention, right?

You need to guide them to the information that you want them to see and that is part of your narrative. So if I just vomit a bunch of information onto them, they don’t know where to look, right? They don’t know if they should look into this really huge bar in April or if they should look into the increasing trend on the black line, you know? So it’s hard for them to kind of get the point of the slide.

I kind of have to, it needs to go with some kind of explanation so I can guide them verbally throughout this chart. So the first one is just a number of [00:05:00] complaints split by, um, complaint type, so if the issue is with the supplier, which is just the builder that performs the repairs, or if it’s a service issue, people kind of calling the customer from Orange or, you know, the amount of the settlement and whatnot.

So this is comparing the number of complaints from one month to the other. So I’m color encoding the months here and just breaking it down by, um, types. This one is month-on-month change. Because what I want to see is: is some type of complaint being more predominant in a month? So, you see how this, um, service issue 3, um, green bar is higher than the, um, red bar?

Mm-hmm. That means that the, uh, data for [00:06:00] August was higher than the data for July. Mm-hmm. And by how, so in this chart, it’s hard to see by how much it increased. Mm. So the one on the draw is just the, uh, portion of the increase, right? Like, so this bar is 50%. So, you know, the number of complaints for August for this type of complaint increased by 50%.
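The month-on-month change described here is simple arithmetic: the difference between the two months divided by the earlier month. A minimal Python sketch follows, assuming complaint counts are held in plain dictionaries. The function name, the complaint type labels, and the counts are invented for illustration and do not come from Paulo's data.

```python
def month_on_month_change(previous, current):
    """Percent change per complaint type from one month to the next.

    Types with zero complaints in the previous month map to None,
    because a percentage change from zero is undefined.
    """
    changes = {}
    for complaint_type, prev_count in previous.items():
        curr_count = current.get(complaint_type, 0)
        if prev_count == 0:
            changes[complaint_type] = None
        else:
            changes[complaint_type] = (curr_count - prev_count) / prev_count * 100
    return changes


# Hypothetical counts, invented for this sketch.
july = {"supplier": 20, "service": 10, "settlement": 8}
august = {"supplier": 15, "service": 15, "settlement": 8}

changes = month_on_month_change(july, august)
for complaint_type, pct in changes.items():
    print(f"{complaint_type}: {pct:+.0f}%")
```

In this made-up example, "service" complaints go from 10 to 15, which is the kind of 50% increase that is hard to read off the side-by-side bars but obvious on the percentage chart.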

So this one just complements this one in terms of which complaints have increased from one month to the other and which types of complaints have decreased from one month to the other. I’ll show you another one that I’ve done. But I think you can probably like, all right, so I really love doing maps, and not only maps, but interactive maps.

So what this does is that, for this event, this map just shows us all of the open claims that we still [00:07:00] have to work on. And people can zoom in and out to see where they’re located, because the location is very important, because they need to kind of think about the logistics of getting people to these claims, right?

So like one person might go to this region here and the other person might go to the region on the bottom. Yeah, and also they can click the icons to see information on the claim, right? Mm hmm. So I’m packing a lot of information here, but it’s very easily digested, right? The other thing that I’m coding here is color.

So the, uh, stakeholder gave me kind of some rules to prioritize the claims. So the red claims are kind of critical cases that we need to look into ASAP. Mm-hmm. Either the customer is a vulnerable customer, they have some kind of disability, or, uh, they’re experiencing some difficulties and we need to prioritize them over other customers, or the [00:08:00] claim has been open for like too long and we need to get them rolling.

Mm-hmm. Um, and then the green ones are the ones that we should look into, but not as quickly. And then the gray ones are the ones that can wait. So this really helps them kind of strategize and make decisions in terms of how to distribute the resources. Yeah. So, yeah, this is a very easy thing to do, but has a huge impact on the business.
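The color rules described here amount to a small decision function. The Python sketch below is one possible reading, not the stakeholder's actual rules: the interview only says that red means a vulnerable customer or a claim open too long, so the `Claim` fields, the 180-day and 60-day thresholds, and the rule separating green from gray are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    days_open: int
    vulnerable_customer: bool


def priority_color(claim, critical_days=180, review_days=60):
    """Map a claim to a marker color for the map.

    Thresholds are hypothetical defaults; the real rules were set by the
    stakeholder and are not given in the interview.
    """
    # Red: vulnerable customer, or the claim has been open too long.
    if claim.vulnerable_customer or claim.days_open >= critical_days:
        return "red"
    # Green: should be looked into, but not as urgently (assumed rule).
    if claim.days_open >= review_days:
        return "green"
    # Gray: can wait.
    return "gray"


# Hypothetical claims, invented for this sketch.
claims = [
    Claim("A-1", days_open=30, vulnerable_customer=True),
    Claim("A-2", days_open=200, vulnerable_customer=False),
    Claim("A-3", days_open=90, vulnerable_customer=False),
    Claim("A-4", days_open=10, vulnerable_customer=False),
]
colors = {claim.claim_id: priority_color(claim) for claim in claims}
print(colors)
```

Keeping the thresholds as parameters makes it cheap to rerun the coloring when the rules change, for example raising `critical_days` if the map shows too many red markers.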

For the first version of this that I sent, the stakeholder didn’t really like the, uh, rules for color encoding, so she asked me, okay, can you adjust this to, you know, show red for these claims and not those claims, because it was showing way too many reds, and she just wanted kind of a more distributed, um, rule.

It’s always a, you know, uh, um, iterative process, right? Like we send them something, they, you know, review it, they [00:09:00] ask for changes and we do the changes, and this kind of keeps going until they get what they need to make their decisions.

Well, the facts are the most important thing, right? Like if the facts are wrong, it doesn’t matter if your chart is, uh, beautiful or not because you’re just going to be misleading the audience with, uh, wrong data.

So this is step number one: getting your data, getting your message right. I think for me, what I think about, and other people might have different priorities, is, first, think about the purpose of the visualization. What is the message that you’re trying to send? Because a lot of people just do visualizations for the sake of illustrating something, and they’re not really sure what they want to, you know, communicate to the user.

You see how in my charts I kind of highlight things with colors or arrows? Yeah. And I have a clear, each one has a clear objective. Like, get their attention to these facts. Um, and then second, is it a pie chart? Is it a bar chart? I personally hate pie charts. It’s really hard to kind of make sense of sizes, like it’s not [00:10:00] really comparable.

So choosing the right visualization. And then the last step is just aesthetics for me. Have a nice day. All right. Have a great night. Thank you. Bye.


Written by Luriaacostaa