Hypothesis testing is explained here in simple steps, with easy-to-understand examples. Hypothesis testing is one of the most fundamental and important concepts of statistics used in Six Sigma and data analysis. This article also explains the P-value and how to interpret it. We have tried to avoid manual calculations and to make sure that you understand the concept.
A statistical hypothesis test is a method of making statistical decisions using data. It is sometimes called confirmatory data analysis.
Why do Hypothesis Testing
- To improve processes, there is a need to identify Xs which impact the mean or standard deviation.
- Once these Xs are identified and adjustments are made, the actual improvement needs to be validated.
- Sometimes it cannot be decided graphically, or from calculated statistics (sample mean and standard deviation), whether there is a statistically significant difference between processes (Pre and Post).
- In such cases, the decision would be subjective.
- Hence the need for a formal statistical hypothesis test to decide objectively whether there is a difference.
Hypothesis Testing – definition
- A set of statistical tools that quantifies your confidence about the ‘real’ difference based on the measurements.
- It is a method of making a statistical decision using experimental data.
- This is also called statistical significance testing.
Hypothesis Testing – key concepts
Hypothesis testing helps determine whether the variation between or among groups of data is due to a true difference or is simply the result of sampling variation. Using sample data, we form assumptions about the population and then test those assumptions statistically. This is called hypothesis testing.
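To see what sampling variation looks like, the minimal Python sketch below draws two samples of 250 from one hypothetical population. The population's mean of 70 kg and standard deviation of 10 kg are assumed purely for illustration. Because both samples come from the same population, any gap between their means is sampling variation and not a true difference.

```python
import random
import statistics

random.seed(3)
# Hypothetical population: weights with mean 70 kg and standard deviation 10 kg.
sample_1 = [random.gauss(70, 10) for _ in range(250)]
sample_2 = [random.gauss(70, 10) for _ in range(250)]

# Both samples come from the SAME population, so any gap is sampling variation.
gap = statistics.mean(sample_1) - statistics.mean(sample_2)
print(f"Sample 1 mean: {statistics.mean(sample_1):.2f} kg")
print(f"Sample 2 mean: {statistics.mean(sample_2):.2f} kg")
print(f"Difference caused only by sampling: {gap:.2f} kg")
```

The gap is almost never exactly zero. A hypothesis test asks whether an observed gap is larger than sampling variation alone would plausibly produce.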
- Ho = Null Hypothesis
Statement of ‘no effect’ or ‘no difference’ or the “status quo”.
- Ha = Alternative Hypothesis
The statement or claim we are trying to prove, i.e., to find evidence for.
The burden of proof rests with Ha.
Testing a hypothesis is similar to a court trial. Here, the null hypothesis is that the defendant is not guilty, and the defendant is presumed innocent until proven guilty. In the same way, a null hypothesis can only be rejected or not rejected. It cannot be accepted, because a lack of evidence to reject it does not prove it true.
If the means of two populations really are different, the null hypothesis of equality can be rejected once enough data are collected. When the null hypothesis is rejected, the alternative hypothesis is accepted.
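The sketch below illustrates the claim that a real difference becomes detectable once enough data are collected. It assumes two hypothetical populations whose means differ by only 1 kg. For increasing sample sizes it computes a two-sided P-value using Welch's two-sample statistic with a normal approximation. That approximation is our simplification; Minitab and other packages use the exact t distribution.

```python
import math
import random
import statistics

def two_sided_p_value(a, b):
    """Welch two-sample statistic with a large-sample normal approximation (a simplification)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal Z

random.seed(42)
TRUE_MEAN_A, TRUE_MEAN_B, SD = 70.0, 71.0, 10.0  # hypothetical populations, 1 kg apart

p_values = {}
for n in (50, 500, 5000, 20000):  # sample size per group
    a = [random.gauss(TRUE_MEAN_A, SD) for _ in range(n)]
    b = [random.gauss(TRUE_MEAN_B, SD) for _ in range(n)]
    p_values[n] = two_sided_p_value(a, b)
    print(f"n = {n:>5} per group: P-value = {p_values[n]:.4g}")
```

The seed makes the numbers reproducible. The pattern to look for is that the P-value tends to shrink as n grows, even though the real difference stays at 1 kg.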
Real and Measured difference
- Real Difference – The ‘real’ difference is the difference you would find if you measured everything, including future output. This complete set is called the ‘population’.
- Measured Difference – The ‘measured’ difference is the difference you calculate from the results of your test (or sometimes from historical data). These data are called ‘samples’.
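A hypothetical simulation can make the two terms concrete. The sketch below treats two made-up lists of 100,000 weights as complete populations, so the real difference can be computed exactly. It then draws samples of 250 from each population to get measured differences. All numbers are illustrative assumptions, not data from this article.

```python
import random
import statistics

random.seed(7)
# Hypothetical, fully measured "populations" of 100,000 weights (kg) each.
population_a = [random.gauss(70, 10) for _ in range(100_000)]
population_b = [random.gauss(72, 10) for _ in range(100_000)]

# Real difference: computed from every member of both populations.
real_difference = statistics.fmean(population_b) - statistics.fmean(population_a)

# Measured differences: computed from repeated samples of 250 per population.
measured_differences = []
for _ in range(5):
    sample_a = random.sample(population_a, 250)
    sample_b = random.sample(population_b, 250)
    measured_differences.append(statistics.fmean(sample_b) - statistics.fmean(sample_a))

print(f"Real difference:       {real_difference:.2f} kg")
for i, d in enumerate(measured_differences, 1):
    print(f"Measured difference {i}: {d:.2f} kg")
```

Each measured difference scatters around the real one. Hypothesis testing uses the measured difference to draw a conclusion about the real difference.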
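Difference between Null and Alternate hypothesis:
It is summarized here because it is very important to understand the difference.
- Ho (Null Hypothesis): the statement of ‘no effect’, ‘no difference’ or the status quo. It is presumed true and is either rejected or not rejected; it is never proven.
- Ha (Alternative Hypothesis): the claim we are trying to prove. The burden of proof rests with Ha, and it is accepted only when Ho is rejected.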
Steps in Hypothesis Testing
As step 1, let us take an example and learn how to form the null and alternate hypothesis statements.
- The histograms below show the weights of people in countries A and B.
- Both samples are of size 250, the scale is the same, and the unit of measurement is kilograms.
Question: Are the people of country B heavier than those of country A?
In the previous section, we read that the null hypothesis is about the status quo, or no difference. So here, too, the null hypothesis is µA = µB (mean weight of country A = mean weight of country B). In simple words, it states that there is no difference between the average weights of the populations of countries A and B.
The hypotheses are always statements about the population parameters.
- Formulate the Null Hypothesis (Ho)
Ho: The average weight of citizens in country A is equal to the average weight of citizens in country B (µA = µB)
- Formulate the Alternative Hypothesis (Ha)
Ha: The average weight of citizens in country A is not equal to the average weight of citizens in country B (µA ≠ µB)
- Test the Alternative Hypothesis with a statistical test (a 2-sample t-test can be used here)
- Based on the test result, reject or fail to reject the null hypothesis Ho
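The steps above can be sketched in Python. The histogram data are not reproduced here, so the example generates hypothetical weights: 250 per country, with assumed means of 70 and 75 kg and a standard deviation of 10 kg. It then runs a two-sided two-sample t-test in Welch's form, which does not assume equal variances. `welch_t_test` is our own illustrative helper, not a library function. Its P-value uses a normal approximation, which is close for samples of this size. In practice you would run the test in a package such as Minitab, SPSS or R.

```python
import math
import random
import statistics

def welch_t_test(a, b):
    """Two-sided two-sample t-test that does not assume equal variances (Welch).

    For simplicity the P-value uses a normal approximation to the t distribution,
    which is close for large samples such as 250 per group but not for small ones.
    """
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # n - 1 denominators
    se = math.sqrt(var_a / len(a) + var_b / len(b))  # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided: P(|Z| >= |t|)
    return t, p

# Hypothetical data standing in for the histograms: 250 weights (kg) per country.
random.seed(2024)
country_a = [random.gauss(70, 10) for _ in range(250)]
country_b = [random.gauss(75, 10) for _ in range(250)]

# Ho: muA = muB    Ha: muA != muB   (two-sided, matching Ha above)
t_stat, p_value = welch_t_test(country_a, country_b)
print(f"t = {t_stat:.2f}, P-value = {p_value:.3g}")
```

A negative t means only that the sample mean for country A is lower than that for country B. The P-value is what drives the decision about Ho.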
Now that we have formed the hypotheses, we have to decide which statistical test to perform. To keep the topic simple, and to make sure the reader understands the basic concepts and is able to perform the tests, we are not covering related concepts such as one-tailed and two-tailed tests or alpha and beta risks here.
- Before deciding the test type, we need to identify the data types of Y and the Xs.
- Then we need to decide whether we are checking the mean, variance, proportion, etc.
- One simple example is shared in the figure below: if Y is discrete and X is also discrete, then we can use the Chi-square test.
- After identifying the test type, we can use a software package such as Minitab, SPSS or R to run the test.
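The data-type logic can be written as a small lookup. This is a simplified sketch, and `suggest_test` is a hypothetical helper name. Only two branches come from this article: the Chi-square case (discrete Y, discrete X) and the 2-sample t-test case (continuous Y such as weight, with X = country at two levels). The other branches are common textbook choices for comparing means and may differ from the figure or from your own roadmap. The sketch does not cover the variance checks mentioned above.

```python
def suggest_test(y_type, x_type, x_groups=2):
    """Suggest a commonly used test from the data types of Y and X.

    y_type, x_type: "continuous" or "discrete".
    x_groups: number of levels of a discrete X (for example, 2 countries).
    Simplified: only the Chi-square and 2-sample t-test branches come from the article.
    """
    valid = {"continuous", "discrete"}
    if y_type not in valid or x_type not in valid:
        raise ValueError("data types must be 'continuous' or 'discrete'")
    if x_type == "discrete" and x_groups < 2:
        raise ValueError("a discrete X needs at least 2 groups to compare")

    if y_type == "discrete" and x_type == "discrete":
        return "Chi-square test"
    if y_type == "continuous" and x_type == "discrete":
        # Comparing means: two groups -> 2-sample t-test, more groups -> one-way ANOVA.
        return "2-sample t-test" if x_groups == 2 else "One-way ANOVA"
    if y_type == "continuous" and x_type == "continuous":
        return "Correlation / regression"
    return "Logistic regression"  # discrete Y, continuous X


print(suggest_test("discrete", "discrete"))       # Chi-square test
print(suggest_test("continuous", "discrete", 2))  # 2-sample t-test (weight vs. country)
```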
Minitab is a popular application for applied statistics, and whenever we perform a statistical test in it, the P-value is one of the outputs. Key concepts that will help you interpret the test output:
- Minitab calculates the P-value assuming that the null hypothesis Ho is true.
- The P-value is a measure of how much evidence we have against the null hypothesis. The smaller the P-value, the more evidence we have against Ho.
- The P-value always lies between 0 and 1.
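The interpretation can be expressed as a simple decision rule. This sketch assumes a cut-off (significance level) of 0.05, which is a common convention. Choosing that value relates to the alpha risk, which this article deliberately does not cover. `decide` is a hypothetical helper.

```python
def decide(p_value, alpha=0.05):
    """Turn a P-value into the decision wording used in hypothesis testing.

    alpha = 0.05 is an assumed, commonly used cut-off, not a universal rule.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    # A small P-value is strong evidence against Ho; Ho is never "accepted".
    return "reject Ho" if p_value < alpha else "fail to reject Ho"


for p in (0.001, 0.03, 0.20, 0.85):
    print(f"P = {p:<5} -> {decide(p)}")
```

The wording "fail to reject Ho" mirrors the court-trial analogy: not enough evidence is different from proof of no difference.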
I hope you have understood the above concepts. If you want to learn more such tools, consider a Six Sigma course from Simplilearn. The course is aligned to the IASSC and ASQ exams and integrates Lean and DMAIC methodologies using case studies and real-life examples.
There is another good online Six Sigma Green Belt course from Coursera. This course is from the University System of Georgia and is well recognized.
If you want to learn new-age data science techniques, one good starting point is the Data Science course from Simplilearn. Data science is emerging very fast, and early movers will always have an advantage.