3 Types of ANOVA analysis

What is ANOVA and when to use it

This article explains the concept of ANOVA and its different types in a simple way, with examples. You will also learn how to use Minitab for ANOVA and how to interpret its output. ANOVA is an important and versatile technique for analyzing relationships in data. It is used when X is categorical and Y is continuous.

Definition: ANOVA (analysis of variance) is an analysis of the variation present in an experiment. It is used to examine differences in the mean values of the dependent variable associated with the effect of the independent variables. Essentially, ANOVA is a test of means for two or more populations.

The tests in an ANOVA are based on the F-ratio: the variation due to an experimental treatment or effect divided by the variation due to experimental error.

Before we move ahead, we need to understand the following four terms very clearly:

  • Dependent Variable – Analysis of variance must have a dependent variable that is continuous. This is our “Y” (for example, total sales); its value depends on the levels of the “X” or “Xs” in our experiment or analysis.
  • Independent Variable – ANOVA must have one or more categorical independent variables, such as Sales promotion. These variables are also called Factors.
  • Null hypothesis – All category means are equal.
  • Factor level – Each Factor can have multiple levels. For example, Heavy, Medium and Low are three levels of Sales promotion.
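To make these four terms concrete, here is a minimal Python sketch. The sales figures and the names `sales_by_promotion` and `group_means` are made up for illustration; they are not data from this article.

```python
from statistics import mean

# Hypothetical data (illustration only): total sales (Y, continuous)
# recorded under each level of the factor "Sales promotion" (X, categorical).
sales_by_promotion = {
    "Heavy": [10, 9, 10, 8, 9],
    "Medium": [8, 8, 7, 9, 6],
    "Low": [5, 7, 6, 4, 5],
}


def group_means(groups):
    """Return the mean of Y for every factor level."""
    return {level: mean(values) for level, values in groups.items()}


means = group_means(sales_by_promotion)
for level, level_mean in means.items():
    print(f"{level}: mean sales = {level_mean}")
```

The null hypothesis states that these level means are equal in the population. ANOVA asks whether the differences observed in the sample are larger than chance variation alone would explain.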

Different forms of ANOVA

There are three types of ANOVA. Which one to use depends on the number and type of independent variables (Xs). The dependent variable (Y) is always continuous.

Fig 1 explains the types of ANOVA with an example. In this example, “Y” is the total sales of a general store in $, a continuous variable that is common to all three examples.

Eta square: The strength of the effect of X on Y is measured by Eta square, the ratio of SS(between) to SS(y). Its value varies between 0 and 1.

F Statistic: The null hypothesis that the category means are equal in the population is tested by an F statistic, the ratio of the mean square related to X to the mean square related to error.

Mean square: The mean square is the sum of squares divided by the appropriate degrees of freedom.

SS(between): This is the variation in Y related to the variation in the means of the categories of X. It represents the variation between the categories of X, or the portion of the sum of squares in Y related to X.

SS(within): Also referred to as SS(error), this is the variation in Y due to the variation within each of the categories of X. This variation is not accounted for by X.

SS(y): The total variation in Y.
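The quantities defined above can be computed by hand for the one-way case (a single factor X). The sketch below is an illustration, not Minitab's implementation. The function name `one_way_anova` and the sample values are hypothetical, and the degrees of freedom used are the standard one-way ones: k - 1 for X and N - k for error, with k categories and N observations.

```python
from statistics import mean


def one_way_anova(groups):
    """Compute one-way ANOVA quantities for a dict of level -> list of Y values."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = mean(all_values)
    n_total = len(all_values)  # N: total number of observations
    k = len(groups)            # k: number of categories of X

    ss_between = 0.0  # variation of category means around the grand mean
    ss_within = 0.0   # variation of observations around their own category mean
    for values in groups.values():
        level_mean = mean(values)
        ss_between += len(values) * (level_mean - grand_mean) ** 2
        ss_within += sum((y - level_mean) ** 2 for y in values)

    # SS(y): total variation of Y around the grand mean.
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    # Mean square = sum of squares / degrees of freedom.
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_statistic": ms_between / ms_within,
        "eta_square": ss_between / ss_total,
    }


# Hypothetical data (illustration only): Y values for three levels of X.
example = {
    "Heavy": [10, 9, 10, 8, 9],
    "Medium": [8, 8, 7, 9, 6],
    "Low": [5, 7, 6, 4, 5],
}
result = one_way_anova(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Note that SS(y) equals SS(between) plus SS(within), which is why Eta square always falls between 0 and 1. Turning the F statistic into a p-value requires the F distribution, which Minitab (or a statistics library) supplies.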

Objective: To test the effect of cause X on the CTQ (critical-to-quality characteristic) Y.

Usage: When cause X is categorical (grouped) and CTQ Y is continuous data.

  • A project was undertaken to reduce processing time.
  • One of the suspected causes was lack of experience.

The following data on processing time was collected for 3 levels of experience. Analyze the data and verify whether lack of experience is a cause of high processing time.