REINFORCEMENT SCHEDULES
By Dr. Ian Dunbar – Courtesy www.dogstardaily.com
© Ian Dunbar
Laboratory study has revealed a variety of reinforcement schedules. Puppy training has revealed that most of these are notoriously ineffective, or impossible to administer in practice, with the notable exceptions of variable ratio and, especially, differential reinforcement. Yet educators and trainers persist in using these relatively ineffective schedules of reinforcement when trying to teach children and employees and when attempting to train husbands and dogs. Wake up! Puppy training has taught us that most of this stuff doesn’t work too well.
Continuous Reinforcement (CR) — the dog is rewarded after every correct response, for example, the dog is rewarded after every sit
Ironically, continuous reinforcement is the biggest problem in reward-based education and training today. The dog receives far too many rewards, usually food. Certainly CR temporarily increases the frequency of behavior, but CR is hopeless for maintaining the frequency and quality of behavior and CR is absolutely no good for improving the quality of behavior. If you reward a dog for every correct response, approximately 50% of the time you will reward the dog for above-average responses and 50% of the time you will reward the dog for below-average responses. Consequently, the quality of the behavior will not improve. It is simply too silly to reward a dog for below-average responses.
To make matters worse, continuous reinforcement often causes behavior to decrease in frequency and quality. Since the dog knows that he will always be rewarded … eventually, there’s no need to hurry, and so the dog eventually does it in his own way, in his own good time, and his slow, sloppy behavior gets rewarded. Spoiled dog syndrome.
And it gets even worse. Rewarding the dog for every correct response makes it very difficult to phase out food rewards in training and usually, response-reliability becomes dependent on the owner having food in their hand or on their person. The dog may deign to work for you if he feels like it and if you have food, but the first time you don’t come up with a food reward, he’ll go on strike.
Basically, think food vending machine. You only use it when you want to (when you’re hungry) and if it fails to deliver food on just a single occasion, you get mad at the machine and then never use it again.
NEVER use a continuous reinforcement schedule.
However, please do not confuse continuous reinforcement with the classical relationship between a secondary and primary reinforcer, e.g., between a click and a treat. If you click, you must always treat; however, you should progressively refine your criteria so that you click no more than 50% of previously correct responses. Also, you can never give a dog too much praise, too many hugs, or too many pieces of food when classically conditioning the dog to like people (especially children, men, and strangers) and other dogs. But you must kick this food habit if you would like to train dogs to respond reliably to verbal cues.
Fixed Duration Reinforcement (FD) — the dog is rewarded after a specific time, for example, after every five seconds of sit-stay (FD5)
FD is no good for improving the quality of performance. In fact, FD produces marked inconsistencies in the quality of behavior: performance-quality tends to drop off immediately after each reward. Performance-quality progressively improves as each expected reward-time comes closer, but immediately after the dog is rewarded, attention and quality of behavior decrease (because the dog knows that the next reward is sometime in the future).
Fixed Ratio Reinforcement (FR) — the dog is rewarded after a specific number of responses, for example after every five sits (FR5)
Initially FR is very good at increasing the frequency of behavior. However, performance-quality often takes a nose-dive and the dog rushes through repetitions to get another reward. Also, if the ratio is stretched too much and too many responses are required for a single reward, the dog may slow down after being rewarded. If the ratio is stretched even more, the dog may give up altogether.
Fixed schedules are pretty hopeless in dog training. Fixed schedules are pretty useless for reliably increasing the frequency or duration of behavior and they do nothing to increase quality. Fixed schedules do nothing to specifically instruct the dog how to do better and they do not reinforce the dog for improving the quality of behavior. Performance-quality becomes inconsistent and usually decreases over time.
I would never use any fixed schedule of reinforcement to train a puppy. However, amazingly, the entire world’s work force is maintained on fixed schedules: FD Pay Day and FR Piece Rate. You simply cannot motivate people on an FD schedule. The reward (pay day or year-end bonus) is now expected. Quality of performance may improve as pay day or the year-end bonus approaches, but you’ll still get Monday-morning mourning and January-blahs. Similarly, FR Piece Rate may increase speed of production, but usually quality control takes a beating as workers rush to meet their quota. And of course, the workers will strike if ever the quota is too much to ask for limited pay. Fixed schedules are no way to motivate and reinforce puppies, or the world’s work force.
Variable Duration Reinforcement (VD) — the dog is rewarded after unpredictable length durations, for example, for a VD5, the dog is rewarded after varying durations of sit-stay that average out to be five seconds.
Variable duration reinforcement is really good at getting dogs to perform for increasing lengths of time and preparing them to work without the prospect of reinforcement. VD makes it much easier to phase out training rewards. Since the reward-time is unpredictable, the dog’s behavior does not drop off immediately after each reward because the next one could be just one second later.
However, few people could calculate the schedule and train the dog at the same time. For example, to reinforce a dog’s sit-stay using a VD5, we would have to reward the dog after, say, 5, 1, 7, 2, 6, 5, 9, 3, 4, and 8 seconds (ten durations that total 50 seconds, for an average of five). I can do this in my head because I am good at mental arithmetic. But what’s the point? Dog training shouldn’t comprise a mathematics test. Dog training should be relaxing and enjoyable. Much easier would be a Random Duration Reinforcement schedule. Just reward your dog after random lengths of sit-stay and progressively increase the average duration over time. Ahh! Now we’re getting there. We are going to rapidly increase the duration of stays and gradually phase out food rewards at the same time. Also, the dog will give you more attention. But … variable duration reinforcement does not improve the quality of performance.
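For readers who would rather let a computer do the arithmetic, here is a minimal Python sketch of the VD5 example above. It checks that the ten durations quoted in the text average five seconds, and it draws a fresh list of random durations with the same long-run average. The function names and the choice of a uniform range from 1 to 2 × average − 1 seconds are assumptions made for illustration, not part of the original method.

```python
import random

# The ten sit-stay durations (in seconds) quoted in the text for a VD5.
example_durations = [5, 1, 7, 2, 6, 5, 9, 3, 4, 8]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def random_durations(average_seconds, count, seed=None):
    """Draw `count` whole-second durations between 1 and 2*average - 1.

    Assumption: a uniform range is used here because its long-run mean
    equals `average_seconds`; any other spread with the same mean would do.
    """
    rng = random.Random(seed)
    upper = 2 * average_seconds - 1
    return [rng.randint(1, upper) for _ in range(count)]


print(mean(example_durations))  # 5.0
```

To lengthen stays over time, one could simply call `random_durations` with a gradually larger `average_seconds`.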
Variable Ratio Reinforcement (VR) — the dog is rewarded after an unpredictable number of responses, for example, for a VR10, the dog is rewarded after varying numbers of sits that average out to be ten sits per reward.
VR reinforcement is wonderful for maintaining high frequencies of behavior for longer and longer durations and for fewer and fewer rewards. VR makes it much easier to phase out food rewards because the dog gets used to working for an increasing number of repetitions without reward. Think slot machine. What do you do when it hasn’t paid out on your last seven dollars? You take your eighth dollar and rub it and kiss it because you’re absolutely certain that this is the one. And then after three more dollars without a payout, you get five dollars back and the machine has you hooked.
Of course, we have the same problem as with all variable schedules: few human brains could calculate the schedule and train the dog at the same time. But you know what? A Random Ratio schedule is just as good. Just reward recalls and sits at random and your dog is going to keep coming and sitting forever.
I just love the concept of random reinforcement: the notion that we can be utterly random, consistently inconsistent, a total ditz even, yet still maintain motivated levels of high-frequency responding in our dogs. Love it. However, VR does not improve the quality of performance because you are still reinforcing as many below-average responses as above-average responses.
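A Random Ratio schedule can be sketched in a few lines of Python. The version below is only an illustration under one stated assumption: each response is rewarded independently with probability 1 / ratio, so that over many responses the dog receives roughly one reward per `average_ratio` responses (for example, about one in ten for the VR10 described above). The function names are hypothetical.

```python
import random


def random_ratio_rewards(num_responses, average_ratio, seed=None):
    """Return one boolean per response; True means 'reward this one'.

    Assumption: each response is rewarded independently with probability
    1 / average_ratio, which gives the desired ratio in the long run.
    """
    rng = random.Random(seed)
    chance = 1 / average_ratio
    return [rng.random() < chance for _ in range(num_responses)]


def responses_per_reward(rewards):
    """Observed number of responses per reward (infinite if none given)."""
    given = sum(rewards)
    return len(rewards) / given if given else float("inf")


# Simulate a long run of sits on a random schedule that approximates VR10.
session = random_ratio_rewards(10_000, 10, seed=0)
print(round(responses_per_reward(session), 1))
```

Note that a schedule like this pays no attention to how good each response was, which is exactly the limitation described above.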
Differential Reinforcement (DR) — the dog is given different valued rewards that reflect the quality of the performance, for example, only reward the dog for above-average responses, give better rewards for better responses and give the best rewards for the best responses.
Years ago, I picked up my son from Montessori school and he showed me his previous night’s homework with glee: a gold star. I was furious. I explained to the teacher that the homework was rubbish and that it didn’t deserve a gold star, or a silver star, or a bronze star, or an oblong, or a triangle, or any geometric shape of any color. The homework deserved a massive red “F”. I wanted the grade to reflect the quality of the work. I wanted Jamie to realize that stellar homework was worth a gold star, but rubbish homework was barely worth the ink in the “F”.
Right from the outset (the puppy’s very first lesson), differential reinforcement is the only way to go to continually and progressively increase the reliability, frequency, panache and pizzazz of performance. Basically, the value of the reward varies according to the quantitative and qualitative aspects of performance. As a guideline, never reward a dog for more than 50% of correct responses. Approximately 50% of responses will be below average and there is absolutely no point in rewarding the dog for those unless you want his behavior to worsen.
For example, time a dog doing ten recalls and then work out his average recall time. Then only reward your dog for faster-than-average recalls. Recalculate his average after every ten recalls and you will find it is steadily improving as training proceeds. For every ten recalls, you will find that five or six are faster than average. (This is because of the long tail of misbehavior: a single lengthy recall considerably biases the average.)
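The recall exercise above can be worked through in a short Python sketch. The ten recall times below are invented purely for illustration (they are not data from the article), and the function name is hypothetical. The one very slow recall drags the average up, which is why six of the ten recalls end up counting as faster than average.

```python
def faster_than_average(recall_times):
    """Return the average time and, per recall, whether it beat the average."""
    average = sum(recall_times) / len(recall_times)
    return average, [t < average for t in recall_times]


# Hypothetical recall times in seconds for one block of ten recalls.
# The final 12-second recall is the "long tail" that biases the average.
times = [4.0, 4.5, 5.0, 5.0, 5.5, 5.5, 6.0, 6.0, 6.5, 12.0]

average, reward_flags = faster_than_average(times)
print(average)            # 6.0
print(sum(reward_flags))  # 6 recalls were faster than average
```

After each block of ten recalls, the same function can be run on the new times so that the bar for a reward moves with the dog’s improving average.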