A Stab in the Dark

What the heck is a “stab”?

Background

It all started with an innocent Twitter conversation about the “most stabbed person in history”, which, it turns out, is not a fun topic to research because most stabbings are actually quite sad. But! In the course of the conversation we got side-tracked by a different question: what actually counts as a stabbing in this context? Is someone who is shot full of arrows in the running for “world’s best stabbing victim”? Or do they compete in their own category? If I trip and fall into the seconds bin at a knife factory, do I get a prize, or just fatal lacerations? It’s difficult to find the line between a genuine, accept-no-substitute stabbing, and a cheap jab or cut-price puncture.

No one seemed to agree on a definition, so I set out to take a more descriptive approach to the problem. Here I’m drawing on linguistics (a field I know almost nothing about), and in particular on its practice of describing language as it is used, rather than prescribing how it should be used. In other words, I’m not looking to find the “true” meaning of the word “stab”, and I’m not looking to score points by proving that my definition is superior. What I want to do is understand the “shape” of the word’s definition as people actually use it. I want to work out where the edges are: which uses are unambiguously correct, and which are generally considered errors. Where do people get uncomfortable about using the word “stab” to describe something? What are the boundaries of that?

So I made a survey! I devised 12 sentences that I hoped would mix uncontroversial and very unusual uses of the word “stab”. For each sentence, I asked respondents whether it was “Totally fine”, “A bit weird, but ok”, or “Incorrect usage”. Here are the 12 sentences:

  1. “He picked up the knife and stabbed me.”
  2. “I tripped and fell, and the knife on the floor stabbed me.”
  3. “There was an arrow wedged in the floor, and I got stabbed when I fell into it.”
  4. “The knife fell off the counter, and stabbed my foot.”
  5. “I got stabbed by a knife that was lying on the floor.”
  6. “I was stabbed by a falling arrow.”
  7. “He threw the knife, which spun across the room and stabbed me.”
  8. “He picked up an arrow from the ground, and stabbed me with it.”
  9. “The archers in the treetops stabbed me with arrows from their bows.”
  10. “He stabbed me by throwing the knife at me.”
  11. “I was inadvertently stabbed as we tussled.”
  12. “The soldiers hurled their spears at me, stabbing me several times.”

There were a few things I learned just from writing the sentences. I was trying to include examples of stabbings “at range”, but to do so I found I had to make it very explicit what was going on. To me, “The archers stabbed me with their arrows” implies that the archers are standing right in front of me, holding their arrows in their hands. To make it clear this was a long-distance stab, I had to add the clarifications “in the treetops” and “from their bows”. I also found it hard to write sentences that were, in my eyes, clear examples of incorrect usage while still being clear and grammatical.

I also tried to vary the sentences along several axes that I hoped would shed light on the limits of the word’s definition. These were:

  1. Distance: Is the stabber still holding the implement, or has it left their hand?
  2. Implement: Is this a classic stabbing weapon, like a knife, or a more unusual choice, like an arrow?
  3. Volition: Does the stabbing have an active agent, or is it perpetrated by cruel fate alone?

Ideally I suppose I would have had maybe twice as many sentences in order to provide exhaustive combinations of these elements and to also include other kinds of stabbing implement. I opted to keep the survey short, guessing that others’ patience for digging into minutiae of English usage might not match my own.

Results

There were 53 respondents to the survey, of whom all but one answered every question. The charts below show the overall results for each of the sentences:

Fig. 1: Overall results

There are some results I expected: near-unanimous support for “He picked up the knife and stabbed me” (bar one holdout, whom I presume to be perpetually confused). And there are some results that were more surprising: widespread opprobrium for stabs from falling arrows, while the same behaviour from a knife is largely acceptable.

To understand these surprising results, I looked at how the responses split across the axes I had built into the sentences: Distance, implement, and volition.

Regression

I wanted to isolate each of the three factors I’d built into the sentences, to understand how each of these contributed to the overall acceptability of a sentence.

I made a data set containing an overall “acceptability score” for each sentence: the proportion of respondents who answered “Totally fine”, minus the proportion who answered “Incorrect usage”. That means that each sentence got a score between 1 (universal acclaim) and -1 (absolute hatred).

That misses a lot of nuance, since it scores “A bit weird, but ok” as zero, exactly as if the respondent had no opinion, but it simplifies the analysis considerably. I also gave each sentence three flags, indicating whether it:

  1. Had an active “stabber” who was perpetrating the stab
  2. Described a long-range stabbing
  3. Described a stab with a knife (the classic stabbing weapon)
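A minimal Python sketch of the acceptability score described above, assuming each sentence’s responses arrive as a list of the three answer labels, with `None` standing in for a skipped question. The post doesn’t say how the one incomplete response was handled, so dropping skipped answers is an assumption, and the function name is hypothetical.

```python
from collections import Counter

# The three answer options offered in the survey.
TOTALLY_FINE = "Totally fine"
A_BIT_WEIRD = "A bit weird, but ok"
INCORRECT = "Incorrect usage"


def acceptability_score(responses):
    """Share of "Totally fine" minus share of "Incorrect usage".

    "A bit weird, but ok" counts towards the total but adds nothing,
    so the result always lies between -1 and 1. Skipped answers
    (None) are dropped before computing the shares (an assumption).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answers for this sentence")
    counts = Counter(answered)
    return (counts[TOTALLY_FINE] - counts[INCORRECT]) / len(answered)


# Made-up example: two "fine", one "weird", one "incorrect".
example = [TOTALLY_FINE, TOTALLY_FINE, A_BIT_WEIRD, INCORRECT]
print(acceptability_score(example))  # 0.25
```

Note how the “weird” answer still dilutes the score: it counts in the denominator without moving the numerator.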

This data set let me use a technique called linear regression, which calculates a “coefficient” for each input feature (the three flags), indicating how much that factor contributes to a sentence’s overall score.
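For readers who want to see the mechanics, here is a minimal ordinary-least-squares sketch using NumPy. It assumes one row of 0/1 flags per sentence (active stabber, long range, knife) and fits an intercept alongside the three coefficients. The post doesn’t say which tool was used, so `np.linalg.lstsq` is my stand-in. The scores in the example are synthetic, generated from the coefficients reported below plus an arbitrary intercept of 0.1, purely to show that the fit recovers them; they are not the survey data.

```python
import itertools

import numpy as np


def fit_linear_regression(flags, scores):
    """Ordinary least squares with an intercept term.

    flags:  (n_sentences, n_features) array of 0/1 indicators
    scores: (n_sentences,) array of acceptability scores
    Returns (intercept, coefficients).
    """
    X = np.asarray(flags, dtype=float)
    y = np.asarray(scores, dtype=float)
    # A leading column of ones lets the model learn a baseline score.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


# Synthetic check: every combination of (active, long_range, knife),
# scored with made-up "true" coefficients and no noise.
true_intercept = 0.1
true_coefs = np.array([0.42, -0.76, 0.10])
flags = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
scores = true_intercept + flags @ true_coefs

intercept, coefs = fit_linear_regression(flags, scores)
print(round(intercept, 2), np.round(coefs, 2))
```

With real responses the fit won’t be exact. With only twelve sentences and four parameters, the coefficients are best read as rough indications rather than precise effect sizes.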

The coefficients generated were as follows:

  1. Active stabber: 0.42 (An active stabber in the sentence increases the score by 0.42)
  2. Long-range stabbing: -0.76 (reduces score by 0.76)
  3. A knife: 0.10 (increases score by 0.10)

So what this tells us is that, of the three factors analysed, “range” seems to be by far the most important, with a large negative effect. Having an active participant in the stabbing seems to help the score, while the presence or absence of a knife appears to have little effect on its own.

I wanted to check how well these coefficients explained the responses I saw in the data. The coefficients can be used to create a predicted score for a sentence (even previously unseen sentences), and we can look at the difference between these predictions and the actual scores to check how well our model is working.
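Written out, the prediction is a weighted sum of the three flags. Here $a$, $r$ and $k$ are 1 if the sentence has an active stabber, describes a long-range stab, or involves a knife (0 otherwise). $s$ is a sentence’s actual score, and $\beta_0$ is the model’s intercept, which isn’t reported in the post:

```latex
\hat{s} = \beta_0 + 0.42\,a - 0.76\,r + 0.10\,k,
\qquad e = s - \hat{s}
```

$e$ is the residual that the goodness-of-fit check looks at. Because $\beta_0$ cancels when comparing two sentences, any pair that differs only in range is predicted to differ by 0.76. For example, assuming the sentences were flagged the obvious way, “He picked up the knife and stabbed me” ($a=1, r=0, k=1$) should score 0.76 higher than “He stabbed me by throwing the knife at me” ($a=1, r=1, k=1$).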

Fig. 2: Regression analysis: Goodness of fit

The chart shows how well our linear regression model fits the data. The X axis is the actual score we saw in the data, and the Y axis is the score predicted by our model. The R² (“r2”) figure measures how well the predictions fit the observed data. Here it tells us that our model explains only about 60% of the variance in scores.
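As a sketch of what that figure measures: R² compares the squared prediction errors with the squared spread of the actual scores around their mean, so an R² of about 0.6 means roughly 40% of the variance is left unexplained. This is a plain-Python illustration with a hypothetical function name, not the author’s code.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of at least two scores")
    mean = sum(actual) / len(actual)
    # Squared errors left over after using the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared errors if we had just predicted the mean every time.
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual scores have no variance")
    return 1 - ss_res / ss_tot


# Toy values: predicting the mean for everything gives R² = 0.
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```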

Respondents

One of the things I wanted to understand was whether there were differences in the population of respondents in their overall propensity to rule usages in or out. Are there stab hardliners and filthy stab liberals? Or do we all have similar levels of overall comfort but differ in where we draw the lines?

I created a histogram that scores respondents by the net sum of their responses: each “Totally fine” adds one to their score, and each “Incorrect usage” subtracts one.
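The per-respondent tally is simple enough to sketch in a few lines of Python. This assumes each respondent’s answers are a list of the survey labels, and the function names are hypothetical. With twelve sentences, the net score can range from -12 to +12.

```python
from collections import Counter

NET_VALUE = {"Totally fine": 1, "A bit weird, but ok": 0, "Incorrect usage": -1}


def net_score(answers):
    """+1 per "Totally fine", -1 per "Incorrect usage"; anything else
    (including "A bit weird, but ok" or a skipped question) counts 0."""
    return sum(NET_VALUE.get(a, 0) for a in answers)


def net_score_histogram(all_answers):
    """Map each net score to the number of respondents who got it."""
    return Counter(net_score(answers) for answers in all_answers)


# Two made-up respondents, three sentences each.
respondents = [
    ["Totally fine", "Totally fine", "Incorrect usage"],
    ["A bit weird, but ok", "Incorrect usage", None],
]
print(sorted(net_score_histogram(respondents).items()))  # [(-1, 1), (1, 1)]
```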

Fig. 3: Respondents by Net of Responses

There’s quite a spread! The distribution looks roughly normal around a mean of just over 2, with a slight downward skew. There are a small number of stab fundamentalists, while the population as a whole spans a pretty wide range, from one generous respondent who answered “totally ok” for nine of the twelve questions, to the stabber-hater who felt that eight of the twelve were “incorrect usage”.

I wanted to go deeper, so I tried a clustering technique to group respondents according to how they answered the different questions. The algorithm, “k-means clustering”, takes a chosen number of clusters and splits the dataset into that many groups, looking for the split that minimises the total squared distance between each respondent and the average (the “centroid”) of their group.

I let the algorithm split respondents into 4 groups, and then looked at the characteristics of each of the groups.
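Here is a minimal NumPy sketch of k-means (Lloyd’s algorithm). It assumes each respondent is a row of numbers with answers coded as +1 (“Totally fine”), 0 (“A bit weird, but ok”) and -1 (“Incorrect usage”). That coding, and filling any skipped answer with 0, are my assumptions, since the post doesn’t say how responses were encoded. The toy data is made up. k-means only finds a local optimum, so results depend on the starting centroids. This sketch uses a deterministic farthest-first start (a simpler cousin of k-means++). Libraries such as scikit-learn’s `KMeans` instead run several initialisations (`n_init`) and keep the best.

```python
import numpy as np


def farthest_first_init(X, k):
    """Start at row 0, then repeatedly add the row farthest from all
    centroids chosen so far (deterministic; k-means++ randomises this)."""
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)


def k_means(X, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        # Squared distance from each row to each centroid: shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centroids


# Made-up answers for six respondents on three sentences.
answers = np.array([
    [1, 1, 1], [1, 1, 0], [1, 0, 1],         # generous
    [-1, -1, -1], [-1, -1, 0], [0, -1, -1],  # strict
])
labels, centroids = k_means(answers, k=2)
print(labels)  # [0 0 0 1 1 1]
```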

Group 1: Hold me closer (22 respondents — 41%)

This group was markedly more comfortable than the majority with stabbings that had no active agent (“There was an arrow wedged in the floor, and I got stabbed when I fell into it.”, “I got stabbed by a knife that was lying on the floor.”, “I was stabbed by a falling arrow.”), and markedly less comfortable with stabs committed by an active agent at a distance (“The soldiers hurled their spears at me, stabbing me several times.”, “He stabbed me by throwing the knife at me.”).

Group 2: Any way you want it (14 respondents — 26%)

This group were unique in being strongly supportive of “He stabbed me by throwing the knife at me.” and “The soldiers hurled their spears at me, stabbing me several times.”, and they were also more supportive than average on almost every other question, suggesting a very liberal approach to stabbing.

Group 3: Do it to me (10 respondents — 19%)

In many ways diametrically opposed to Group 1, this group had extremely strong support for “The archers in the treetops stabbed me with arrows from their bows.” and “The soldiers hurled their spears at me, stabbing me several times.”, while being more opposed to cases with no active agent, such as “There was an arrow wedged in the floor, and I got stabbed when I fell into it.”

Group 4: Playa Hater (7 respondents — 13%)

This group were the opposite of Group 2: They were more opposed than the average on almost every question, most especially “I got stabbed by a knife that was lying on the floor.”, “The soldiers hurled their spears at me, stabbing me several times.” and “He stabbed me by throwing the knife at me.”

The clustering analysis gives some support to the idea that both range and the activity of the agent are important factors in how we define a stab. But there were inconsistencies as well. Group 3 liked stabbing at range, but they didn’t like every sentence with that formulation. Group 1 were mostly against long-distance stabbings, but they were relatively ok with some of them. It seemed like there was more going on than I was seeing.

Clustering Sentences

I wanted to get a deeper understanding of how respondents’ opinions correlated with each other — are there questions that respondents typically answered the same way? I hoped that this would help me better understand some of the confusing things I’d seen in the data.

To do this, I used a similar approach to the clustering I did for the respondent groups, but on a slightly different view of the data. I looked at the correlation coefficient of every sentence to every other sentence — in other words, for every sentence in the survey, how much did the responses to that sentence correlate with the responses to each other sentence. This produced a 12x12 matrix of correlations, which would form the basis of my clustering — sentences where respondents typically answered the same way would be grouped together.
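A sketch of that step, assuming answers are coded numerically (+1, 0, -1, my assumption) and arranged as one row per respondent and one column per sentence. NumPy’s `np.corrcoef` with `rowvar=False` treats columns as the variables, so twelve columns give a 12x12 matrix of Pearson correlations. A sentence that every respondent answered identically would have zero variance and produce `nan` entries, which would need handling before clustering. The example data is made up.

```python
import numpy as np


def sentence_correlations(answers):
    """Pearson correlation between every pair of sentences.

    answers: (n_respondents, n_sentences) array of coded responses.
    Returns an (n_sentences, n_sentences) symmetric matrix whose
    diagonal is 1; row i is sentence i's "profile" for clustering.
    """
    A = np.asarray(answers, dtype=float)
    return np.corrcoef(A, rowvar=False)


# Made-up answers: four respondents, three sentences. Sentence 2 mirrors
# sentence 1 exactly; sentence 3 is its opposite.
answers = [
    [1, 1, -1],
    [0, 0, 0],
    [-1, -1, 1],
    [1, 1, -1],
]
print(np.round(sentence_correlations(answers), 2))
```

Each row of the resulting matrix then serves as a sentence’s feature vector, so sentences that respondents treated alike end up close together.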

To visualise this, I used a technique called “Principal Component Analysis” (PCA), which reduces the 12x12 matrix to a 12x2 one: instead of twelve numbers per sentence (its correlation with each sentence in the survey), there are just two, which we can plot in two-dimensional space.
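The projection itself can be sketched with NumPy’s singular value decomposition: centre each column, then project every row onto the two directions of greatest variance. This is a generic PCA sketch with a hypothetical function name, not necessarily how the original chart was produced. The returned fractions show how much of the total variance the two plotted dimensions keep, which is a quick way to judge how much a flat picture leaves out.

```python
import numpy as np


def pca_2d(M):
    """Project each row of M onto its first two principal components.

    Returns (coords, explained): coords has shape (n_rows, 2), and
    explained holds the fraction of total variance each component keeps.
    """
    X = np.asarray(M, dtype=float)
    centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by singular value.
    _, s, Vt = np.linalg.svd(centred, full_matrices=False)
    coords = centred @ Vt[:2].T
    explained = s**2 / np.sum(s**2)
    return coords, explained[:2]


# Toy check: points lying on a straight line need only one component.
coords, explained = pca_2d([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(coords.shape)  # (4, 2)
```

Fed a 12x12 correlation matrix, the same function would return one (x, y) point per sentence.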

Fig. 4: Clustering of sentences by response correlation

Bear in mind that by reducing to two dimensions, the scatter plot has lost some of the information that was available to the clustering algorithm. Still, we can see that we’ve achieved some reasonably sensible results. Group 1, in green, contains mostly the sentences with an ambiguous actor. Group 2, over on the far right, is the two “Gimme” sentences, which almost everyone answered the same way. Group 3, in yellow at the top left, contains most of the “range” sentences, and Group 4 is a bit of an “all the rest” group.

The fascinating thing for me from this is the treatment of “He threw the knife, which spun across the room and stabbed me.” It’s sitting almost halfway between Group 1 and Group 3, which on reflection, makes sense.

At first I was surprised to see it there, hanging out with the Group 1 crowd when you might expect it to be hanging out with its “stab at range” buddies in Group 3. But when you look, there’s something else going on with the sentence: Like all the other sentences in Group 1, this sentence moves the responsibility for the stabbing from an active stabber, to the implement of the stabbing. This is markedly different from “He stabbed me by throwing the knife at me”, and when we look at the overall scores, respondents were much more likely to approve of this construction (35% “totally ok” vs. 21%). Even the cluster of respondents who most disliked long-range stabbings were relatively accepting of this sentence.

This introduces a factor I hadn’t previously considered, and which I think is a crucial insight of this analysis: one of the key factors in how respondents felt about each sentence was their willingness to accept an inanimate object as the author of the stabbing. People who accepted the knife, rather than its wielder, as the architect of a stabbing were much happier with that knife doing a stab even when thrown, and even when falling. It seems that what matters here is sentence structure more than the fictional circumstances: even when an actor is implied, what people responded to was the way the sentence assigned responsibility.

Conclusions

If I’m honest I’m a little more confused than I was before.

The regression analysis shows that range is a clearly negative factor for most respondents, but there was a group for whom this was less of a problem. Similarly, having an active stabber was a positive factor, but some respondents didn’t seem to care about this either. It seemed like the actual implement doing the stabbing had only a weak effect on responses.

What we learned from the correlations between the responses is that there are more complex relationships than the three axes I originally envisioned. Sentence construction matters, and fine parsing of intent and action can have big impacts on the responses of some groups.

I’m sure there’s more in there to find; it’s a super-rich dataset. You can find all my code and the data on my GitHub, and I welcome correspondence!