# Part II: What Would You Say… Ya Do Here?

In the last post, we started exploring the ping data I have been collecting using the TagTime application on my Android phone.

We looked at frequency distributions to get a sense of how many different activities were indicated more than once, and how little I was actually responding to pings.

Now let’s lay out some statistical theory that will help us determine whether these ping data are useful.

### Definitions

#### Population

In statistics, a population is a complete set of items that share at least one property in common that is the subject of a statistical analysis. https://en.wikipedia.org/wiki/Statistical_population

As I mentioned in the first article in this series, I am looking to establish a baseline of what tasks I am doing throughout the day, week, month, etc., as a basis for improvement. So, mapping to the definition above, moments in time are the items, and the property they share is that an activity is taking place.

#### Sample

…[A] data sample is a set of data collected and/or selected from a statistical population by a defined procedure. https://en.wikipedia.org/wiki/Sample_(statistics)

The TagTime pings are the sample in this case. The selection procedure for the “ping time” is discussed by the authors of the TagTime introductory article, but suffice it to say that this is a random sample, and therefore unbiased. Except for…

#### Non-response Bias

Non-response bias occurs in statistical surveys if the answers of respondents differ from the potential answers of those who did not answer. https://en.wikipedia.org/wiki/Non-response_bias

As we saw in the last post, I have quite a few “non-responses.” In fact, about 70% of my pings went unanswered (and that proportion has grown since I took this snapshot of my TagTime logs, which might be the topic for yet another post):

```
> length(pings[pings$Activity == "nonresponse",]$t)
[1] 990
> length(pings$t)
[1] 1411
> length(pings[pings$Activity == "nonresponse",]$t) / length(pings$t)
[1] 0.70163
```
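The same proportion can be computed a bit more compactly by taking the mean of a logical vector. The sketch below is only an illustration: the `pings` data frame here is a tiny made-up stand-in (not my real log), and it assumes the same `t` and `Activity` columns used in the snippet above.

```r
# Toy stand-in for the real TagTime log (hypothetical data);
# column names follow the snippet above.
pings <- data.frame(
  t = 1:10,
  Activity = c("nonresponse", "work", "nonresponse", "nonresponse", "email",
               "nonresponse", "nonresponse", "work", "nonresponse", "nonresponse"),
  stringsAsFactors = FALSE
)

# TRUE for every ping that was never answered
is_nonresponse <- pings$Activity == "nonresponse"

# Number of unanswered pings and total number of pings
sum(is_nonresponse)
nrow(pings)

# The mean of a logical vector is the proportion of TRUE values
nonresponse_rate <- mean(is_nonresponse)
nonresponse_rate
```

Run against the real log, `mean(pings$Activity == "nonresponse")` should give the same value as the longer `length(...) / length(...)` expression.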

After talking with a former colleague and statistical Tyrannosaurus, I confirmed that even a much lower non-response rate would be grounds for dismissing the entire data set.

Bummer.

But, it is useful to know when data cannot be used to draw conclusions; too often what amounts to anecdotal evidence is touted as a valid sample (e.g., basically any online opinion poll).
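To make the problem concrete, here is a minimal sketch of how non-response can distort a sample. Every number in it is invented for illustration (the activity names, the true shares, and the response probabilities are all assumptions, not values from my logs). The point is only that when the chance of answering a ping depends on the activity, the answered pings no longer reflect how the time was really spent.

```r
# Hypothetical true share of time spent on each activity (not from my logs)
true_share <- c(work = 0.50, sleep = 0.30, leisure = 0.20)

# Hypothetical probability of answering a ping during each activity
response_prob <- c(work = 0.60, sleep = 0.00, leisure = 0.30)

# Fraction of all pings that are both in an activity and answered
answered <- true_share * response_prob

# Overall response rate across all pings
response_rate <- sum(answered)

# Activity distribution as it appears among answered pings only
observed_share <- answered / response_rate

# Compare the truth with what the answered pings would suggest
round(rbind(true_share, observed_share), 2)
```

Under these made-up numbers, work accounts for half of the time but about 83% of the answered pings, and sleep disappears entirely. Nothing in the answered pings alone reveals that distortion, which is why a high non-response rate undermines the whole sample.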

## Next Steps

I have some ideas for increasing my response rate, which I will attempt to add to the TagTime application(s) in the coming weeks.