
Bayes, Base Rates, and Bad Beats: Avoiding a Common Pitfall in Probability

The Base Rate Fallacy is a psychological trap that leads us to ignore general information (the base rate) in favor of specific details. Let's look at an example to clarify what I mean: say you meet a shy, thin, detail-oriented man who wears glasses, and he asks you to guess his occupation. Is it more likely that he is a construction worker, or a marine biologist? Most people would say it's more probable that he's a marine biologist, but that judgment ignores the base rate. There are about 100 times more construction workers than marine biologists. A person chosen at random is more likely to be a construction worker, even if specific details about them seem to match the characteristics of a typical marine biologist.


While the base rate fallacy is about ignoring base rates, you might also choose the wrong base rate to reference. Imagine a rookie baseball player with a high batting average after a few games. It might be tempting to look at this performance and assume that’s his baseline, ignoring the fact that most rookies’ stats tend to level off closer to league averages over time. It might be better to look at rookies in general, or even better, rookies that had very similar characteristics to this player or were graded similarly. Understanding base rates helps you look beyond the hype and make decisions based on statistically sound information.


Acknowledging base rates is one thing, but counteracting the base rate fallacy just doesn’t come naturally. One effective method to avoid these errors is to use Bayesian reasoning, which explicitly accounts for base rates. Bayesian reasoning, or Bayesian statistics, is an approach that takes what we already know (called “priors”, which are often taken from base rates) and combines it with new evidence to arrive at a more accurate, revised understanding. This process is grounded in Bayes’ theorem, a mathematical formula developed by Thomas Bayes in the 1700s, which essentially gives us a recipe for weighing new information in light of our priors to produce a new, adjusted probability (called the “posterior” probability). Bear with me on the math for a minute: Bayes’ theorem actually turns out to be fairly easy, just plug and play. Here’s the equation:


P(A|B) = [P(A) * P(B|A)] / P(B)

P(A|B) = Probability of A conditional on B = Posterior Probability.

P(A) = Probability of A = Prior probability.

P(B|A) = Probability of B conditional on A.

P(B) = Probability of B.


In plain English, our adjusted probability based on the new information is calculated from the probability of our outcome times the probability of the new information if the outcome is true, all divided by the probability of the new information in general.
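Since the formula really is just plug and play, it can be sketched as a one-line Python function. The function name `bayes_posterior` and the toy numbers below are my own, chosen only to illustrate the mechanics:

```python
def bayes_posterior(p_a, p_b_given_a, p_b):
    """Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)."""
    return p_a * p_b_given_a / p_b

# Toy numbers: a 30% prior, a 60% chance of seeing the evidence if the
# outcome is true, and a 40% overall chance of seeing the evidence.
print(bayes_posterior(0.3, 0.6, 0.4))  # 0.45, i.e. a 45% posterior
```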


So, to use Bayes’ Theorem to find the posterior probability, you need three other probability estimates. Here’s an example relevant to sports betting:


Let’s say a basketball player, who typically plays every game, has suffered a minor injury two days before game day. Based on the type of injury, there’s initially a 20% chance that he might still play. However, we get new information from the team's medical staff the next day, who say the player participated in a full practice session. We’re interested in the new probability that he plays, given that he practiced: that’s P(A|B).


Let’s figure out those A’s and B’s to make this a little easier to think through. A is our outcome of interest, the player playing in the game, and its initial probability (20%) is our prior. We’ll write that as P(Play). B is the new information, the player participating in a full practice. We’ll write that as P(Practice). Now we can assign probabilities.


  1. P(Play): Your initial belief was that the player had a 20% probability of playing based on his injury. This is the number we need to adjust using the new information.

  2. P(Practice | Play): The probability that the player participated in a full practice the day before the game, assuming they play. Let's assume this probability is high, say 0.90. Most players with injuries who play in the game will have had a test run with a full practice the day before.

  3. P(Practice): The probability that this player was going to complete a full practice regardless of whether they will play in the next game. Let’s assume this is 0.25. His chances of fully practicing were a little better than playing in the game.


Now we have everything we need to plug and play, so let’s apply Bayes’ Theorem.


P(Play | Practice) = [P(Play)*P(Practice | Play)] / P(Practice)

P(Play | Practice) = (0.20 × 0.90) / 0.25 = 0.72 = 72%


So, based on the initial 20% estimate that he would play, and accounting for the fact that he practiced, we now estimate he has a 72% chance of playing in the game. Notice how this prevents us from rushing to the conclusion that he’s definitely going to play.

 

Also notice how changing the values changes our answer. If the injury were even more minor and there was a 50% initial probability of playing, we’d also have to adjust P(Practice), to maybe 55%, and then we get a final result of 81.8%. If we instead keep the 20% prior but make P(Practice) 99%, implying players go through a full practice even when they’re injured, then practicing adds almost no information and we get an answer of ~18%, barely moved from the prior. Also note that it doesn’t make sense, at least to me, to make P(Practice) lower than P(Play) here, though it could be if the practice were several days before the game and he had more time to recover before game time.
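As a quick sanity check, the three scenarios above can be reproduced with a few lines of Python. This is just a sketch using the article's numbers; the function name is my own:

```python
def bayes_posterior(prior, likelihood, evidence):
    """P(Play | Practice) = P(Play) * P(Practice | Play) / P(Practice)."""
    return prior * likelihood / evidence

# Original scenario: 20% prior, 90% chance he practices if he plays,
# 25% overall chance of a full practice.
print(round(bayes_posterior(0.20, 0.90, 0.25), 3))  # 0.72  -> 72%

# More minor injury: 50% prior, P(Practice) bumped up to 55%.
print(round(bayes_posterior(0.50, 0.90, 0.55), 3))  # 0.818 -> 81.8%

# Practice is uninformative: nearly everyone practices (99%),
# so the posterior barely moves from the 20% prior.
print(round(bayes_posterior(0.20, 0.90, 0.99), 3))  # 0.182 -> ~18%
```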


Tying this back to our first example when we were guessing the man’s occupation, our prior was the relative probability of a person being a construction worker vs a marine biologist. There was initially a very low probability he was a marine biologist, but the new information about his characteristics pushed us in that direction. Let’s use Bayes’ Theorem again.


P(Marine Biologist): There are ~1.3 million male construction workers and ~13,000 male marine biologists in the U.S., so we’re looking at a sample of about 1.313 million people. The probability of one of those people being a marine biologist is 13,000 / 1,313,000 ≈ 0.0099, or about 1%.


P(Characteristics | Marine Biologist): Let’s say most male marine biologists have the characteristics of being thin, shy, etc. The probability that he has those characteristics, given that he’s a marine biologist, might be 75%.


P(Characteristics): We’ll say 10% of all people in the U.S. are thin and shy.


P(Marine Biologist | Characteristics) = (0.0099 * 0.75) / 0.1 = 7.4%


With our estimated probabilities, there is still only a 7.4% chance he is a marine biologist. Even if 99% of marine biologists have those characteristics, we still only get up to 9.8%.
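The same plug-and-play function covers this example too, using the exact base rate of 13,000 / 1,313,000 (≈ 0.0099) rather than the rounded 1%:

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior * likelihood / evidence

# Base rate: share of marine biologists among the combined group.
base_rate = 13_000 / 1_313_000

# 75% of marine biologists fit the description; 10% of everyone does.
print(round(bayes_posterior(base_rate, 0.75, 0.10), 3))  # 0.074 -> 7.4%

# Even at 99%, the posterior only reaches ~9.8%.
print(round(bayes_posterior(base_rate, 0.99, 0.10), 3))  # 0.098 -> 9.8%
```

Notice the base rate dominates: raising the likelihood from 75% to 99% barely moves the answer, because marine biologists are simply rare.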


By combining the prior belief with the new evidence, you get a revised, posterior belief that’s more balanced and informed. You might not stick with your prior, but you also don't take the new information at face value. Your updated belief isn’t just based on your initial assumption or the latest data in isolation but is a synthesis of both. Ultimately, the base rate fallacy can be seen as our tendency to disregard prior probabilities, leading to poor judgments. Bayesian reasoning provides a framework to continuously refine our beliefs based on the quality and quantity of information we receive, making it particularly valuable in areas like sports betting, where new data (e.g., player stats, team news, market odds) constantly emerges.



For more on Bayesian thinking and other intersections of psychology and sports betting, go to this free psychology course.

