The four quadrants from hell

Dalcash Dvinsky
Published in The Bunny Years
10 min read · Jul 3, 2022

One of the scientific foundations of dog training is so-called operant conditioning. Going back to Skinner, it’s an extension of Pavlov’s classical conditioning. Operant conditioning means that a dog changes his or her behaviour through reward and punishment. Both can be either positive or negative — the four quadrants of operant conditioning.

Two quadrants are for reinforcing behaviour that we like, either by adding something (positive) or by taking something away (negative). Positive reinforcement just means that good things happen when the dog is doing what we want him or her to do. Say sit, dog sits, gets treat. But the reward can be something more intangible, like, walking or sniffing or opening a door. Negative reinforcement means when the dog does something we like, we take something away that has been bothering him or her. To do that, we need to bother the dog in some way, for example, with a leash.

The other two quadrants are for punishing behaviour that we don’t like. It’s useful to abstract from the word punishment. In this context, it does not necessarily mean anything bad is going to happen. But it can. Punishment can again be achieved by adding something (positive) or by taking something away (negative). Positive punishment is to inflict pain or discomfort when the undesired behaviour occurs. This could be anything from an electric shock to a raised voice. Negative punishment could simply be standing still when the dog starts pulling: You take away the forward movement that the dog craves. Or closing the door when the dog tries to walk away without a cue: You take away the access to the world.

The concept of operant conditioning is visualised as a circle, divided into four equally sized parts, the quadrants. When training the dog, all interactions apparently fall into one of these four quadrants. Or even when not training the dog: the dog learns anyway, it doesn’t matter if you call it a training session or not. This is how dogs learn, that’s the idea. Or all mammals, for that matter.

Everybody uses at least two of these quadrants when working with their dog. Those two quadrants are positive reinforcement and negative punishment. Those are the quadrants where we do not inflict pain or inconvenience upon our dogs, we just either give them something good, or we withhold something good. Maybe a lot of owners think they only use two. It’s a nice thing to think. But I presume it must be a very lucky dog owner who really can say that they only use those two. I would strongly suspect that most dogs experience all four, but hopefully not at equal weight.

Dog trainers often sort their methods, and the methods of other dog trainers, into quadrants of operant conditioning. Some call themselves ‘balanced’ trainers because they use all four quadrants. That still includes lots of positive reinforcement, but the other quadrants are not off the table. They still don’t torture dogs, by and large. The balanced trainers sneer at the so-called ‘positive only’ trainers who, so they say, only use positive reinforcement. These ‘positive only’ trainers, by and large, do use doors and leashes for negative punishment. But they exclude the other two quadrants, on principle. They, in turn, sneer at the ‘balanced’ trainers and say they are not balanced at all. Both sides claim that the other side is cruel to dogs. It’s a debate that is played out all over the internet, either in the quadrants of operant conditioning, or along the frontline between ‘aversive’ and ‘force-free’ methods. But that’s a different topic. Be that as it may.

Bunny has his own thoughts on this matter. He likes to transcend the merry-go-round of operant conditioning. He likes to break the quadrants. He overwrites the rules that Skinner made for us. He makes it borderline impossible to stick with one quadrant, or with one method, or with one type of training. Consistency, this golden rule of dog training, be damned. He easily finds ways from one quadrant to the next, and to the next, and all the way around. Everything is free-flowing communication. And at some point it is not clear anymore who is the trainer and who is the trainee. Am I training him, or is he training me? Am I reacting to rewards that he is offering me for behaviours he likes? Spend a few days with Bunny and try to seriously work with him, and you will question your place in the Universe.

Leash walking is a good example. I have trained ‘walk with me’ entirely with positive reinforcement. At home, on a secure field, just with treats, no leash, nothing else. That’s how we train at home anyway, ‘purely positive’. Sometimes I stand in his way to throw in some negative punishment. Walking is easy: Stay with me, somewhere in my immediate vicinity, no matter what I’m doing, and you get treats. Walk away, and, well, nothing. He can walk left or right, even a metre ahead, that’s fine. Really, I want him to be ‘with me’, to pay attention to what I’m doing. ‘Heel’ is then just a strict version of ‘with me’. He is supposed to do all that, until I release him with ‘okay’. At home, this is completely easy and straightforward, our relationship on a secure foundation.

But outside, with the leash on, with all the distractions, he is not that interested in food anymore. He wants to get somewhere. He wants to sniff somewhere. There is so much to explore. So, he walks faster and faster. He drifts away from me, left and right, but primarily ahead. He doesn’t pay attention. He is not with me. He starts pulling on the leash. My counter is to stop, a well-trodden path to eliminating pulling. Stop whenever the dog is not with you anymore. The leash halts his progress. This might be negative punishment (I take his movement away) or positive punishment (I jerk on the leash by stopping). Or both. Now he has a problem. He is bothered by the lack of movement. He is frustrated. But he knows how to solve the problem. First he tries looking at me. Then he will move a tiny bit to make the leash go slack. That usually gets me going again: negative reinforcement in action. All four quadrants within a few seconds.

So far, so confusing. But he knows of course that a tight leash makes me stop. Hence, if he doesn’t forget, he will try to walk just so that the leash is very close to being tight, but not quite. It’s a fine line. While we are in motion, fast, dynamic, it’s not so easy to figure out the exact moment when the leash is not loose anymore. Timing is important, and when I stop at the wrong moment, I send the wrong signals. He starts nudging me forward, just a tiny bit. It’s annoying, but the leash is not really tight. So I don’t stop. A few more nudges, and I’m walking faster, without noticing, and the nudging eventually stops. Now he is training me, with negative reinforcement. Eventually I realise what’s happening and stop anyway. Same game all over again. “Are you walking him or is he walking you?” is a question I hear a lot. I used to get upset about it, but not anymore. We are walking each other. We are learning from one another. We are a team, desperately trying to figure out how to deal with the confusing environment.

When nothing else is happening around us, he often comes to me, walks next to me, and looks at me. This is exactly the behaviour that I want, although I haven’t even said anything. But I reinforce it anyway. He gets the treat. He keeps walking next to me, looking at me like a very good dog. It’s like he is giving me a cue and I show a desired behaviour, I hand out treats. That’s what he wants from me, and he rewards it by walking with me. This is what I want from him, and I reward it by giving him food. The interaction is completely symmetric. I try to make him walk ten steps before I hand out the reward, my reward. Then twenty steps. And at some point he loses interest and runs away. Then I have to stop again, and the game starts from scratch.

This all goes out of the window when something comes our way that he is very interested in. It could be a squirrel, a dog he gets upset about, or a rabbit. Or a horse. Or just the smell thereof. It could be a glimpse of something that was here a few minutes ago. He pulls and leaps and yelps. All the routines and habits which we have, supposedly, ‘conditioned’, are gone. What’s left are much stronger routines, the reptile brain of the dog, formed through thousands of years of selective pressure. We call them primitive urges, hunting, protecting, procreating, but they are more important to the dog than the weird habit of sitting down when a human says so. To my dog, at least. Sometimes he is not yet jumping; instead, he stands in the way, like a statue, alert, majestic, beautiful, prepared for everything. Completely tuning in to the environment. Completely tuning out my cues and signals.

When he is in that mode, the only way to control him is to hold him back, harshly, if needed, and to pull him in a different direction. Basically, we have to get out of the situation, as quickly as possible. Or at least, get something between me and the distraction. At the minimum, hold onto him, for dear life. Some trainers like to call this ‘management’, and that’s somehow not training anymore. Some trainers like to say this is not supposed to happen. Some trainers say the dog is doing the pulling, not me. But given that I’m inflicting discomfort and pain as a result of an undesired behaviour, I might as well go ahead and call it positive punishment, the forbidden quadrant of the circle, the quadrant where prong and choke collars live. It’s painful for me as well, to hold him and displace him, maybe more than for the dog, who is in the zone anyway. In a way we are both punishing each other for undesired behaviour. We are both getting upset, angry, sweaty. It’s the violent part of operant conditioning.

It’s a fun exercise to constantly think about all your interactions with the dog in terms of operant conditioning. Which quadrant am I in? And now? And now? And which quadrant is he in? What does he perceive as reward, what as punishment? And is that the only thing he perceives? Who else is involved? What are the squirrels trying to teach us? When we are on a walk, the transitions happen so fast, and so often, that it becomes very difficult to react precisely, and consistently, as I should. I can’t help but lose track of the wheel of operant conditioning. It runs faster and faster until it spins out of control and careens through the countryside. We start skidding. The entire circle of learning turns into a slide into mayhem.

Quadrants are a concept from the lab where you can control the parameters and conditions of the experiment. They are an abstract layer on top of the real world, held in place by assumptions and presuppositions. They are not a concept for real life. Not for dogs, not for anyone else. They are also a trap.

Human learning can be examined through the lens of operant conditioning as well. But weirdly, when we teach humans, we rarely talk about reinforcement and punishment. Instead, we talk about engagement, initiative, creativity, problem solving. There is no room for all that in the circle of operant conditioning, no room for independent thinking. Dogs are not supposed to do these things anyway. They are supposed to exhibit specific behaviours and to shun those behaviours we don’t like. The relationship is supposed to be asymmetric, we give the cues, the dog reacts. We like our dogs obedient, which is why Malamutes always end up near the bottom whenever someone ranks dogs according to their intelligence.

Malamutes are independent thinkers. They have a lot of opinions. We call them ‘stubborn’ and ‘obstinate’, but that’s just because we don’t know any better. There is no place for opinions in operant conditioning. There is only place for habitual reactions, automatisms, rehearsed over and over again. The four quadrants are like cages, they keep dogs locked in a world where dogs are furry robots, without their own ideas and thoughts, without agency. Also, without real two-way interaction with the humans. You can’t have a two-way interaction with someone who has just a bunch of habits formed by you. And if not by you, then by evolution.

Sometimes I stop and think. What if they are right? What if it is really just that, a bunch of instincts and reactions and habits, and nothing else? But then again, if it’s just that, there is really no point in having a dog. There is no way to really buy into that. And so, we leave operant conditioning behind.
