What is meant by the term "prior" in machine learning

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

machine-learning terminology cs stackoverflow pattern-recognition bayesian-statistics

Problem

I am new to machine learning. I have read several papers that apply deep learning to various applications, and most of them use the term "prior" when describing the model design, for example a prior in human body pose estimation. Can someone explain what it actually means? I could only find the mathematical formulation of prior and posterior in the tutorials.

Solution

Put simply, and without any mathematical symbols, a prior is your initial belief about an event, expressed as a probability distribution. You then set up an experiment, collect some data, and "update" your belief (and hence the probability distribution) according to the outcome of the experiment. The updated belief is the posterior probability distribution.

Example:
Assume we are given two coins, but we don't know which one is fake. Coin 1 is unbiased (HEADS and TAILS each have 50% probability), and Coin 2 is biased: we know it gives HEADS with probability 60%. Mathematically:

Coin 1 gives HEADS with probability 0.5, $$p(H | Coin_1) = 0.5$$ and Coin 2 gives HEADS with probability 0.6, $$p(H| Coin_2) = 0.6$$

That is all we know before we set up an experiment.

Now we are going to pick a coin, toss it, and, based on the outcome (H or T), guess which coin we picked (Coin 1 or Coin 2).

Initially we assume $p(Coin_1) = p(Coin_2) = 0.5$: both coins are equally likely, because we have no information yet. This is our prior. It is a uniform distribution.

Now we pick one coin at random, toss it, and get HEADS. This is the moment where the update happens. We compute the posterior probability distribution using Bayes' formula:
$$p(Coin_1 | H) = \frac{p(H | Coin_1)p(Coin_1)}{p(H | Coin_1)p(Coin_1) + p(H | Coin_2)p(Coin_2)} = \frac{0.5\times0.5}{0.5\times0.5 + 0.6\times0.5} \approx 0.45$$

$$p(Coin_2 | H) = \frac{p(H | Coin_2)p(Coin_2)}{p(H | Coin_1)p(Coin_1) + p(H | Coin_2)p(Coin_2)} = \frac{0.6\times0.5}{0.5\times0.5 + 0.6\times0.5} \approx 0.55$$

So, initially we had probability $0.5$ for each coin, but after the experiment our beliefs have changed: we now believe that the coin is Coin 1 with probability about 0.45 and Coin 2 with probability about 0.55. This is our posterior distribution, a Bernoulli distribution (a distribution over two outcomes).
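
The same single update can be checked numerically. Below is a minimal Python sketch, assuming the coin biases stated in the example (Coin 1 gives HEADS with probability 0.5, Coin 2 with probability 0.6). The function name `posterior` and the dictionary keys are illustrative choices, not part of the original answer.

```python
def posterior(prior, likelihood):
    """Return p(hypothesis | data) for each hypothesis via Bayes' rule.

    prior:      dict mapping hypothesis -> p(hypothesis)
    likelihood: dict mapping hypothesis -> p(data | hypothesis)
    """
    # Unnormalised posterior: likelihood times prior for each hypothesis.
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # The evidence p(data) is the sum over all hypotheses.
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}


# Uniform prior: no reason to favour either coin before tossing.
prior = {"coin_1": 0.5, "coin_2": 0.5}
# Probability of HEADS under each coin (fair vs. biased), as assumed above.
p_heads = {"coin_1": 0.5, "coin_2": 0.6}

post = posterior(prior, p_heads)
for coin, p in post.items():
    print(f"p({coin} | H) = {p:.3f}")
```

Passing a different `prior` (for instance, one that already favours Coin 1) changes the result for the same observed toss, which is exactly the role the prior plays.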

This is the basic principle of Bayesian inference and statistics as used in machine learning.
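
To illustrate how this principle carries over to more than one observation, here is a small sketch in which the posterior after each toss is reused as the prior for the next one. It assumes the same two coins as in the example. The toss sequence `"HHTHH"` and the names `P_HEADS` and `update` are made up for illustration.

```python
# Probability of HEADS for each coin, as in the example above.
P_HEADS = {"coin_1": 0.5, "coin_2": 0.6}


def update(belief, outcome):
    """One Bayesian update of the belief over coins after a single toss."""
    # Likelihood of the observed outcome ("H" or "T") under each coin.
    like = {
        c: P_HEADS[c] if outcome == "H" else 1.0 - P_HEADS[c]
        for c in belief
    }
    joint = {c: like[c] * belief[c] for c in belief}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}


belief = {"coin_1": 0.5, "coin_2": 0.5}  # uniform prior
tosses = "HHTHH"  # hypothetical outcomes, chosen only for illustration
for outcome in tosses:
    # The posterior from this toss becomes the prior for the next toss.
    belief = update(belief, outcome)
    print(outcome, {c: round(p, 3) for c, p in belief.items()})
```

With these biases, each HEADS shifts the belief towards Coin 2 and each TAILS shifts it back towards Coin 1, so the prior matters less as more data arrives.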

Context

StackExchange Computer Science Q#76647, answer score: 17
