How would you explain the concept of p-value to a layman?
Answers ( 24 )
If you are into Data Science, then you must have heard about p-value.
I could have started it with a very superficial definition strolling around probability and significance and null hypothesis, etc. But that’s already there on multiple blogs.
We want to simplify this term in order to make you “understand” rather than remember things.
We will start with the null hypothesis. What is a null hypothesis?
So, Nitin was the monitor of Class VIII B. He had one job: to write down the names of classmates who made noise in the teacher's absence.
One day he wrote Tahseen's name on the blackboard. The teacher asked Tahseen whether he had been making noise.
As usual, Tahseen denied it. Now the teacher had to believe either the monitor or Tahseen.
He assumed that Tahseen did not make the noise. Why? Because it's easier to disprove this.
See, it's always easier to disprove something with an example than to prove it. For example, if the teacher catches Tahseen making noise, then the null hypothesis, i.e. "Tahseen did not make the noise", is disproved.
But if we take the null hypothesis as “Tahseen made noise” and the teacher does not catch him making noise on one occasion, that does not prove the null hypothesis.
Coming back to the question
Teacher had this null hypothesis – Tahseen did not make the noise
Alternate hypothesis – Tahseen made noise
The next day Nitin again complained that Tahseen was making noise, which Tahseen again denied.
On the next three days his name was also written on the blackboard. Now the teacher has reached a threshold where he can say with confidence: “Dude, you were making noise, because you have crossed my benchmark of complaints and that is statistically significant evidence against my null hypothesis. Thank you Nitin :)”
The benchmark, set before starting the experiment, is the significance level (alpha). The p-value is the number compared against it: the probability of seeing evidence at least this strong if the null hypothesis were true.
In general a p-value < 0.05 is treated as statistically significant, which means that evidence this strong would turn up less than 5% of the time if the null hypothesis were true.
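To put a rough number on the teacher's reasoning, here is a minimal sketch. It assumes, purely for illustration, that even a quiet Tahseen would be wrongly named on any given day with probability 0.3, and that the days are independent. Neither number comes from the story; `FALSE_REPORT_RATE` and `p_value_all_days` are made-up names.

```python
# Hypothetical assumption: even when Tahseen is quiet (the null hypothesis),
# the monitor still writes his name on any given day with probability 0.3,
# independently from day to day.
FALSE_REPORT_RATE = 0.3
DAYS_NAMED = 5  # the first day, the next day, and three more days
ALPHA = 0.05    # benchmark chosen before looking at the complaints


def p_value_all_days(false_report_rate: float, days: int) -> float:
    """Probability of being named on every one of `days` days if the null is true."""
    return false_report_rate ** days


p_value = p_value_all_days(FALSE_REPORT_RATE, DAYS_NAMED)
print(f"p-value = {p_value:.5f}")
print("reject the null" if p_value < ALPHA else "fail to reject the null")
```

Under these assumptions, five complaints in a row would be very unlikely for a quiet student, which is why the teacher stops believing the null hypothesis.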
To understand what a p-value is, you first need to understand what a null hypothesis is. When running a hypothesis test/experiment, the null hypothesis says that there is no difference or no change between the two groups being compared. The alternate hypothesis is the opposite of the null hypothesis and states that there is a difference between the two groups. The goal of the experiment is usually to find evidence against the null hypothesis, and so to support the alternate hypothesis. Let me illustrate this with some examples.
If you are trying to test whether a new marketing campaign generates more revenue, the null hypothesis is that there is no change in the revenue as a result of the new marketing campaign. And the alternate hypothesis is that the new marketing campaign performs better (or worse) than the previous campaign. If you are trying to prove that a new drug lowers cholesterol, the null hypothesis states that there is no difference in cholesterol between the group with the drug and the group without, while the alternate hypothesis states that the new drug does have an effect on cholesterol levels. If you are trying to test whether a new server version has better or worse performance than the previous version, the null hypothesis is that both server versions have equal performance. And the alternate hypothesis is that there is a meaningful difference in the performance of the old and new server.
So what is the simple layman’s definition of p-value? The p-value is the probability of seeing a result at least as extreme as the one you observed, assuming the null hypothesis is true. That’s it. Note that it is not the probability that the null hypothesis is true.
In the example where we are trying to test whether a new marketing campaign generates more revenue, the p-value is calculated assuming that the null hypothesis, which states that there is no change in the revenue as a result of the new marketing campaign, is true. If the p-value is 0.25, then a revenue difference at least as large as the one we observed would show up about 25% of the time through random variation alone, even if the campaign had no real effect. If the p-value is 0.04, such a difference would show up only about 4% of the time. As you can surmise, the lower the p-value, the harder it is to explain the data by chance alone, and the stronger the evidence for the alternate hypothesis, which, in this case, means that the new marketing campaign causes an increase or decrease in revenue.
So what do p-values really tell us? p-values tell us how easily an observation could be explained by random occurrences rather than by the change that was made. In order to call a test result significant we want the p-value to be low. How low, you ask? Well, that depends on what standard you want to set/follow. In most fields, acceptable p-values should be under 0.05, while in other fields a p-value of under 0.01 is required. This cut-off number is known in statistics as the alpha, and results from experiments with p-values below this threshold are considered to be statistically significant. So when a result has a p-value of 0.05 or lower, we can say that a difference this large would occur no more than 5% of the time if there were no actual difference between the two observations and only random variation were at work. As a result, we have reasonable grounds to reject the null hypothesis in favor of the alternate hypothesis.
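One way to see where such a number can come from is a permutation test: shuffle the revenue figures between the two campaigns many times and count how often chance alone produces a gap at least as large as the real one. The sketch below is only an illustration. The revenue figures are invented, and `permutation_p_value` is a hypothetical helper, not something from a particular library.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_b) / n_b - sum(group_a) / n_a)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the campaign labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[n_a:]) / n_b - sum(pooled[:n_a]) / n_a)
        if diff >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical daily revenue figures, invented purely for illustration.
old_campaign = [102, 98, 110, 95, 105, 99, 101, 97, 103, 100]
new_campaign = [108, 112, 104, 115, 109, 111, 107, 113, 106, 110]

p = permutation_p_value(old_campaign, new_campaign)
print(f"p-value = {p:.4f}")
```

With these made-up numbers the gap between campaigns is large compared with the day-to-day noise, so very few shuffles match it and the p-value falls well below 0.05.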
Suppose a child in the family goes to school daily, and one day his teacher writes to his mother in the school diary: "Your son is very naughty and he was found fighting with another kid." This situation is quite common in schools and, like any mother, this child’s mother says: "No, my son is not naughty; it must have been the other kid who provoked him." Simple…agreed?
Now, if you wish to do a Hypothesis test whether this child is really naughty or not, you would presume that he is actually naughty (that would be your hypothesis of interest) but you would try to formulate another hypothesis opposite to this (i.e. your Null Hypothesis) that says that the child is not naughty.
So your two hypotheses would be:
Null: The child is not naughty
Alternative: The child is naughty
Then you would try to find evidence by collecting data and putting it to test.
Coming back to the story from the child’s school…
After some days, there is another complaint from the school of this child fighting with another kid and, again the mother of the kid in question does not accept the fault of her child.
The situation repeats itself a third time with a third kid, and now the mother begins to suspect that her child probably is naughty.
Interestingly, this third instance in our example plays the role of what is statistically called the threshold of significance (or the level of significance).
When the same child is again reported for fighting with another kid, the mother has no option but to accept that her child really is naughty and finds ways to fight with other kids. This is when the evidence is said to be “statistically significant”: it has gone beyond the level of significance.
The above situation is purely hypothetical but could be a subject of a Statistical analysis somewhere.
But where does the p-value come in here?
It is the probability of seeing this many complaints, or more, if the null hypothesis (the child is not naughty) were true.
Basically, the p-value is a probability value which measures the evidence against a certain assumption.
So the lower the p-value, the stronger the evidence that the assumption is wrong.
As it is a probability value, it ranges from 0 to 1.
Basically, the p-value is the probability of observing a statistic at least as extreme as the one actually observed, assuming the null hypothesis is true.
Assuming the layman knows the concept of probability:
Let’s say we are trying to test whether a new marketing campaign generates more revenue. The p-value is calculated assuming that the null hypothesis, which states that there is no change in the revenue as a result of the new marketing campaign, is true. If the p-value is 0.25, then a revenue difference at least as large as the one observed would show up about 25% of the time through random variation alone, even with no real increase or decrease in revenue from the new campaign. If the p-value is 0.04, such a difference would show up only about 4% of the time. As you can surmise, the lower the p-value, the stronger the evidence for the alternate hypothesis, which, in this case, means that the new marketing campaign causes an increase or decrease in revenue.
The next question is: how low should the p-value be?
Well, that depends on what standard you want to set/follow. In most fields, acceptable p-values should be under 0.05 while in other fields a p-value of under 0.01 is required. This cut-off number is known in statistics as the alpha, and results from experiments with p-values below this threshold are considered to be statistically significant.
The p-value is the lowest significance level at which the null hypothesis can be rejected. It is also described as the exact probability of committing a Type I error if the null hypothesis is rejected on the basis of this result. Rather than fixing alpha at some arbitrary level, it is often more informative to simply report the p-value of the test statistic.
Definition :
We define p-value as the probability of obtaining a sample like ours or more extreme than that of ours given that the null hypothesis is true.
Approach:
We assume the null hypothesis to be true and then determine how different our sample really is from what it predicts. If it’s not that different, then we fail to reject the null hypothesis, and the alternate hypothesis (the one we’re trying to prove) remains unsupported.
A low p-value indicates that by sheer luck alone, it’d be highly unlikely to procure a sample like ours if the null hypothesis were true. So we reject the null hypothesis in favor of the alternate hypothesis.
A high p-value indicates that it’s quite normal to procure a sample like ours given that the null hypothesis is true. Hence, we fail to reject the null hypothesis. This does not prove that the null hypothesis is true.
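A coin-flip example (not part of the answer above, just a common illustration) makes the "sample like ours or more extreme" idea concrete. The null hypothesis is that the coin is fair. The sketch below adds up the probabilities of every outcome at least as far from 50/50 as the one observed; `binomial_two_sided_p_value` is a hypothetical helper name.

```python
from math import comb


def binomial_two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for a fair-coin null (probability of heads = 0.5)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Count the equally likely sequences whose head count is at least as far
    # from the expected count as the observed head count.
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme / 2 ** flips


p_60 = binomial_two_sided_p_value(60, 100)  # fairly lopsided sample
p_52 = binomial_two_sided_p_value(52, 100)  # very ordinary sample
print(f"60 heads in 100 flips: p-value = {p_60:.4f}")
print(f"52 heads in 100 flips: p-value = {p_52:.4f}")
```

Getting 52 heads is entirely ordinary for a fair coin, so its p-value is high. Getting 60 heads is much less common, and its p-value lands just above the usual 0.05 cut-off.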
Let’s take the example of Horlicks. Suppose you want to give the container a new look, and you want to test whether the new look is attractive enough to generate more revenue. The null hypothesis is that there is no change in the revenue as a result of the new look. The alternate hypothesis is that the new look performs better (or worse) than the previous look.
So in simple terms, the p-value is the probability of seeing a revenue difference at least as large as the one observed if the null hypothesis were true.
In this example, the p-value is calculated assuming that the null hypothesis, which states that there is no change in the revenue as a result of the new look, is true. If the p-value is 0.25, then a revenue difference this large would show up about 25% of the time by chance alone, even with no real increase or decrease in revenue from the new look. If the p-value is 0.04, such a difference would show up only about 4% of the time. Lastly, the lower the p-value, the stronger the evidence for the alternate hypothesis, which, in this case, means that the new look causes an increase or decrease in revenue.
Assume an investor claims that their investment portfolio’s performance is equivalent to that of the Standard & Poor’s (S&P) 500 Index. To determine this, the investor conducts a two-tailed test. The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s returns over a specified period, while the alternative hypothesis states that the portfolio’s returns and the S&P 500’s returns are not equivalent. (If the investor conducted a one-tailed test, the alternative hypothesis would state a single direction: that the portfolio’s returns are less than the S&P 500’s returns, or that they are greater, but not both.)
One commonly used significance level is 0.05. If the investor finds that the p-value is less than 0.05, then there is evidence against the null hypothesis. As a result, the investor would reject the null hypothesis in favor of the alternative hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis. Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio’s returns and the S&P 500’s returns are not equivalent.
Conversely, a p-value that is greater than 0.05 indicates that there is (at best) weak evidence against the conjecture, so the investor would fail to reject the null hypothesis. In this case, the differences observed between the investment portfolio data and the S&P 500 data are explainable by chance alone.
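The difference between the two-tailed and one-tailed versions can be sketched as follows. This assumes the investor has already boiled the comparison down to a z statistic that is approximately standard normal under the null hypothesis. The value 2.1 is hypothetical, and the function names are made up for this example.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_tailed_p(z: float) -> float:
    """Probability of a statistic at least this far from 0 in either direction."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def upper_tailed_p(z: float) -> float:
    """Probability of a statistic at least this large (one direction only)."""
    return 1 - STANDARD_NORMAL.cdf(z)


# Hypothetical test statistic: the portfolio's average return sits
# 2.1 standard errors above the index's average return.
z = 2.1
ALPHA = 0.05

for label, p in (("two-tailed", two_tailed_p(z)), ("one-tailed (greater)", upper_tailed_p(z))):
    decision = "reject the null" if p < ALPHA else "fail to reject the null"
    print(f"{label}: p-value = {p:.4f} -> {decision}")
```

For a positive z the two-tailed p-value is exactly twice the one-tailed one, which is why the choice between the two has to be made before looking at the data.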
To understand the concept of p-value, we first need to understand what hypothesis testing is. Hypothesis testing is basically evaluating two mutually exclusive statements about a population using sample data. These are the steps required to perform it.
1. Make an initial assumption, which is H0 (Null Hypothesis) – A person is innocent.
2. Collect data – The data collected are nothing but evidence.
3. Use the evidence to reject or not reject the null hypothesis.
H1 (Alternate Hypothesis) – A person is not innocent.
A p-value is a probability, so it lies between 0 and 1. It is compared against a chosen significance level. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
A p-value at or below 0.05 (≤ 0.05) is typically treated as statistically significant. It indicates strong evidence against the null hypothesis, as results this extreme would occur less than 5% of the time if the null were correct (and the results were just random). Therefore, we reject the null hypothesis in favor of the alternative hypothesis.
A p-value higher than 0.05 (> 0.05) is not statistically significant. It does not show that the null hypothesis is true; it only means that the evidence against it is weak. We therefore fail to reject the null hypothesis.
The p-value tells us how likely it is to get a result like this, or a more extreme one, if the null hypothesis is true.
Example:
Ramesh is a dried fruit sweets manufacturer. He specifies that each sweet contains 30 grams of cashews, but customers are complaining that they are getting less than 30 grams of cashews in their sweets.
Now Ramesh cannot open every sweet packet that he manufactured to check the complaints, so he takes a sample from the manufactured sweets.
Let's assume he manufactured 100 sweets, out of which he took 30 for testing.
So here are the two hypotheses:
• Null Hypothesis (H0): Each sweet contains 30 grams of cashews.
• Alternate Hypothesis (H1): Each sweet does not contain 30 grams of cashews.
So, for the 30 sampled sweets, Ramesh will calculate the mean amount of cashews present.
And he decides on a significance level of 0.05.
Now, on the basis of the sample mean (and how much the sample varies), he will find the p-value.
If the p-value is greater than 0.05, we fail to reject the Null Hypothesis (H0): the sample is consistent with the 30 gram claim.
If the p-value is less than or equal to 0.05, we reject the Null Hypothesis (H0) in favor of the Alternate Hypothesis (H1).
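Ramesh's calculation could look roughly like the sketch below. The 30 weights are invented for illustration. To stay within the Python standard library, the sketch uses a normal approximation for the test statistic; a one-sample t-test would be the more usual textbook choice for a sample of this size. `approx_two_sided_p_value` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev


def approx_two_sided_p_value(sample, claimed_mean):
    """Two-sided p-value for 'the true mean equals claimed_mean'.

    Uses a normal approximation to the distribution of the sample mean.
    """
    standard_error = stdev(sample) / len(sample) ** 0.5
    z = (mean(sample) - claimed_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical cashew weights (grams) for the 30 sampled sweets.
cashew_grams = [
    29.1, 28.7, 30.2, 29.5, 28.9, 29.8, 30.1, 29.0, 28.5, 29.6,
    29.3, 29.9, 28.8, 29.4, 30.0, 29.2, 28.6, 29.7, 29.1, 29.5,
    28.9, 30.3, 29.0, 29.4, 28.7, 29.8, 29.2, 29.6, 28.8, 29.3,
]
ALPHA = 0.05

p_value = approx_two_sided_p_value(cashew_grams, claimed_mean=30.0)
print(f"sample mean = {mean(cashew_grams):.2f} g, p-value = {p_value:.2g}")
print("reject H0" if p_value <= ALPHA else "fail to reject H0")
```

With these made-up weights the sample mean sits well below 30 grams relative to its standard error, so the p-value is tiny and H0 is rejected.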
P-value: the area outside the confidence interval is the significance level (alpha), where the confidence interval is the central region in which results are most likely to fall when the null hypothesis is true. The p-value is the tail area beyond the test statistic you actually observed.
Comparing the p-value with alpha tells us whether the Null Hypothesis is rejected or not rejected.
For example, Ho = person having cancer (Null hypothesis)
Ha = person not having cancer
and we are taking the confidence level as 95%,
so the significance level is 1 − 0.95 = 0.05. Now if the
p-value is less than 0.05, then Ho is rejected (p-value low, Null will go)
p-value is more than 0.05, then Ho is not rejected (p-value high, Null will fly)
All the above examples explain the p-value in layman's terms more or less accurately.
Here is my learning from them –
Let’s say we are conducting a lab experiment to gauge the effect of a certain medicine on rats.
The p-value measures how statistically significant the results of that experiment are.
To assess that, we form a Null Hypothesis which states that the medicine is not having any effect on the rats.
And an Alternate Hypothesis stating that there is some effect on the rats.
A low p-value (usually < 0.05) indicates that, if the medicine had no effect, there would be less than a 5% chance of seeing results this extreme merely through randomness. That means we can reject our Null Hypothesis in favor of the Alternate Hypothesis. With a higher p-value, we fail to reject the Null Hypothesis.
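The "5% chance through randomness" idea can be checked with a small simulation. The sketch below runs many imaginary rat experiments in which the medicine truly does nothing, and counts how often the p-value still dips below 0.05. Everything here is assumed for illustration: the group size, the noise model (standard normal measurements with a known spread), and the function names.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample, null_mean, sigma):
    """Two-sided z-test p-value when the noise standard deviation is known."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=2000, n_rats=25, alpha=0.05, seed=1):
    """Share of no-effect experiments that still come out 'significant'."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # The medicine does nothing: every measurement is pure noise around 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_rats)]
        if z_test_p_value(sample, null_mean=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / n_experiments


rate = false_positive_rate()
print(f"share of no-effect experiments with p < 0.05: {rate:.3f}")
```

The share should come out close to 0.05. That is what the cut-off means: when the null hypothesis is true, roughly one experiment in twenty will look significant by luck.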
The p-value is the probability of getting results at least as extreme as the ones observed, assuming the null hypothesis is true.
Now, what is a null hypothesis?
When conducting a hypothesis test or experiment, the null hypothesis says there is no difference or change between the two things being compared.
The alternate hypothesis, as the name suggests, says there is a difference between them.
For example, if we order from Domino's, it claims that the pizza will be delivered within 30 minutes.
Now the null hypothesis is the ideal scenario, that is, the pizza will be delivered within 30 minutes.
But this is not always the case due to traffic jams, accidents, etc.
So the alternate hypothesis will be that the pizza will not be delivered within 30 minutes.
Now we run our experiment on a certain sample of orders, say 100.
And we take a 5% level of significance, that is, 0.05 is our cutoff to reject or fail to reject the claim.
Now if the p-value is less than 0.05, then we reject our null hypothesis. Therefore we can reject the claim, and our result is statistically significant.
If the p-value is greater than 0.05, then we fail to reject our null hypothesis.
P-Value is the smallest level of significance at which we can still reject the null hypothesis, given the observed sample statistics.
Let’s say we have a dataset for which we rejected the null hypothesis at significance levels of 0.05 and 0.01. The p-value comes into the picture when we want to know how much lower we can go before we can no longer reject the null hypothesis.
P-value is a universal concept and it works with almost all distributions.
The null hypothesis is rejected if the p-value is lower than the level of significance.
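A tiny sketch of that idea: take one p-value and try it against stricter and stricter significance levels. The p-value of 0.003 is hypothetical, chosen so that the null hypothesis is rejected at both 0.05 and 0.01. `reject_null` is a made-up helper that applies the rule stated above.

```python
def reject_null(p_value: float, alpha: float) -> bool:
    """Reject the null hypothesis when the p-value is lower than alpha."""
    return p_value < alpha


p_value = 0.003  # hypothetical result of some test
alphas = (0.05, 0.01, 0.005, 0.001)

decisions = {alpha: reject_null(p_value, alpha) for alpha in alphas}
for alpha, rejected in decisions.items():
    verdict = "reject" if rejected else "fail to reject"
    print(f"alpha = {alpha}: {verdict} the null hypothesis")
```

The decision flips once alpha drops below the p-value, so the p-value marks the boundary between the significance levels at which we reject and those at which we cannot.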
The p-value is a probability computed assuming the null hypothesis is true, so it always ranges between 0 and 1.
The higher the p-value, the weaker the evidence against the null hypothesis, and the less reason there is to reject it.