Two-Way Analysis of Variance (ANOVA)
An understanding of the one-way ANOVA is crucial to understanding the two-way ANOVA, so be sure that the concepts involved in the one-way ANOVA are clear. Important background information and review of concepts in ANOVA can be found in Ray Ch. 9, so be sure to read that chapter carefully.
The sort of experiment that produces data for analysis by a two-factor ANOVA is one in which there are two factors (independent variables). In Ray’s example (p. 182 ff.), an
experimenter is interested in assessing the impact of housing (the first factor) and feeding schedule (the second factor) on errors made in running a maze (the dependent variable). In this experiment, the housing factor can take on two levels (enriched or standard) and the feeding schedule can take on two levels (ad lib or once a day). Thus, this experiment is a 2x2 independent groups design, which means that there are 4 unique conditions to the experiment. Of the 40 mice in the experiment, 20 are randomly assigned to the enriched housing and 20 are assigned to the standard housing. Of the 20 mice assigned to the enriched housing, 10 are fed ad lib and 10 are fed once a day. Likewise, of the 20 mice in the standard housing, 10 are fed ad lib and 10 are fed once a day. Schematically, the design would look like the table below:
Enriched Housing Standard Housing
Ad Lib Feeding n = 10 n = 10
Once a Day Feeding n = 10 n = 10
Of course, we could conduct two separate experiments with our 40 mice (or think of this experiment as two separate one-way independent groups analyses). For instance, we could put 40 mice into a single factor experiment, with 20 exposed to enriched housing and 20 exposed to standard housing. We would be testing the simple $H_0: \mu_{\text{Enriched}} = \mu_{\text{Standard}}$. Were we to do so, our source table would look like this:
Source SS df MS F
Housing 640 1 640 33.13
Error 734 38 19.3
Total 1374 39
These results would lead us to reject $H_0$ and conclude that there was a significant effect of housing, F(1,38) = 33.13, MSE = 19.3, p < .05. The mice in the enriched environment make significantly fewer errors (M = 10) than those in the standard environment (M = 18).
Alternatively, we could imagine the 40 mice in a second single factor experiment, with 20 exposed to ad lib feeding and 20 exposed to once a day feeding. In this case, we would be testing the simple $H_0: \mu_{\text{Ad Lib}} = \mu_{\text{Once a Day}}$. Were we to do so, our source table would look like this:
Source SS df MS F
Feeding 0 1 0 0
Error 1374 38 36.2
Total 1374 39
These results would lead us to retain $H_0$ and conclude that there was no significant effect of feeding, F(1,38) = 0, MSE = 36.2, p > .05. The mice fed once a day did not differ in number of errors (M = 14) compared to those fed on an ad lib basis (M = 14).
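The F ratios in the two one-way source tables above can be rechecked by hand, since each F is just the effect mean square divided by the error mean square. The short Python sketch below redoes that arithmetic; the helper name `f_ratio` is made up for illustration, and the SS and df values are copied from the tables.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (MS effect, MS error, F) for one effect row of a source table."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error

# Housing-only analysis: SS = 640 on 1 df, error SS = 734 on 38 df
housing = f_ratio(640, 1, 734, 38)
# Feeding-only analysis: SS = 0 on 1 df, error SS = 1374 on 38 df
feeding = f_ratio(0, 1, 1374, 38)

for label, (ms_effect, ms_error, f) in [("Housing", housing), ("Feeding", feeding)]:
    print(f"{label}: MS = {ms_effect:.1f}, MSE = {ms_error:.1f}, F = {f:.2f}")
```

Notice that the error term in the feeding-only analysis is much larger, because all of the variability due to housing has been left in the error SS.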
The advantage of a two-factor design is that not only can we assess the independent impact of our two factors (as in the two separate single-factor designs), but also we can assess the interaction of the two factors in their effect on the DV. Thus, with the same data we would be able to test three different null hypotheses:
Null Hypothesis Alternative Hypothesis
$H_0: \mu_{\text{enriched housing}} = \mu_{\text{standard housing}}$ $H_1$: Not $H_0$
$H_0: \mu_{\text{ad lib feeding}} = \mu_{\text{once-a-day feeding}}$ $H_1$: Not $H_0$
$H_0$: no interaction between the two factors $H_1$: Not $H_0$
The concept of interaction is a difficult one, but it is essential that you come to grasp it. Here’s one definition of an interaction: An interaction occurs when the effect of one of the factors is not the same across all levels of the other factor. (Does that make sense to you, even after re-reading it several times?) Read the portion of Ray’s chapter on interaction (pp. 182-195), and we’ll return to a discussion of interaction effects shortly. Before doing so, however, let’s complete the analysis of the data provided by Ray.
First of all, let’s look at the summary table of the mean number of errors made by each group of mice.
Enriched Housing Standard Housing Marginal Means
Ad Lib Feeding 6 22 14
Once a Day Feeding 14 14 14
Marginal Means 10 18
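If it helps to see where the marginal means come from, here is a minimal Python sketch that averages the four cell means in the table. It assumes equal cell sizes (n = 10 per cell, as in this design), which is what allows a simple average of cell means; the variable names are illustrative only.

```python
# Cell means from the table above, organized as cell_means[feeding][housing]
cell_means = {
    "Ad Lib": {"Enriched": 6, "Standard": 22},
    "Once a Day": {"Enriched": 14, "Standard": 14},
}
housing_levels = ["Enriched", "Standard"]

# With equal n per cell, each marginal mean is the plain average of its cell means.
feeding_marginals = {
    feeding: sum(row.values()) / len(row) for feeding, row in cell_means.items()
}
housing_marginals = {
    housing: sum(row[housing] for row in cell_means.values()) / len(cell_means)
    for housing in housing_levels
}

print("Feeding marginal means:", feeding_marginals)
print("Housing marginal means:", housing_marginals)
```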
Which two means do we compare to test the null hypothesis about the Housing factor? You should see that we would compare the marginal means for housing (10 vs. 18), because 10 represents the mean error score for all of the mice who were raised in the enriched housing and 18 represents the mean error score for all of the mice who were raised in standard housing. Likewise, to test the null hypothesis regarding Feeding, we would compare the marginal means for feeding (14 vs. 14). On the face of it, it would certainly appear that there is no
difference in number of errors whether we chose to feed the mice in an ad lib fashion or once a day. Well, let’s actually compute the ANOVA using SPSS and then see how we would
interpret the results.
You would need 3 columns to enter your data as seen below on the left (but only a portion of the data is shown). You would need a column for each factor (just as you would for a one-way ANOVA) and then use unique names or numbers to define the levels of the factor. In this case, I’ve chosen to use names as labels for the levels of the two IVs, so the first two columns are string variables. (This approach wouldn’t work with One-Way ANOVA
in SPSS, but is just fine for the procedure you’ll use for a two-way ANOVA.) The final
column holds the Error scores for each mouse. So the first mouse is raised in Enriched Housing and received Ad Lib Feeding, which leads that mouse to make 6 errors on the maze.
Once your data are entered, choose General Linear Model and then Univariate…
from the Analyze menu. Doing so will prompt the window seen above on the right. I’ve dragged Errors into the Dependent variable slot and both Feeding and Housing into the
Fixed Factor(s) slot. Before proceeding, however, you need to make some additional choices. On the right of the window, you’ll notice buttons for Options… and Plots… Clicking on
those buttons will reveal the windows seen below.
Note in the Options window on the left that I’ve chosen a number of options by
clicking on appropriate boxes. Moreover, so that I’ll be able to see means for the interaction
as well as the main effects, at the top of the window I’ve moved variables from the left to the right. As you’ll see later, choosing Descriptive statistics will give me the information for each of the four conditions (the interaction means), but otherwise I’d need to compute the marginal means. Choosing to display all the means may make my life easier later on.
Note in the Plot window on the right that I’ve placed the Feeding variable on the X axis and the Housing variable as separate lines in the graph. The cursor arrow is poised above the Add button and you need to actually click on that button to produce the graph. Then, click on the Continue button.
Below you’ll find the Source Table and the Descriptive statistics.
First of all, you should note that the source table is fairly complex. That’s because it includes a lot of information that you can readily ignore. You need look only at the rows for the two main effects, the interaction, the error term, and the corrected total. Thus, you can ignore the rows labeled Corrected Model, Intercept, and Total. To the right of the source table you’ll find Partial Eta Squared, which is an effect size measure (closer to 1.0 indicates a larger effect), and Observed Power, which is an estimate of power (closer to 1.0 is better). (You can ignore the Noncentrality Parameter.) Looking at the Significance levels tells us that we can reject two of our null hypotheses, while retaining the remaining null hypothesis. Because the p-value for Housing is less than .05, we would reject $H_0$ for Housing. Because the p-value for Feeding is greater than .05, we would retain $H_0$ for Feeding. Finally, because the p-value for the AB (Housing x Feeding) interaction is less than .05, we would reject that $H_0$.
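To see where the rows of the two-way source table come from, the sketch below rebuilds the sums of squares from the four cell means. It is only an illustration under stated assumptions: it takes n = 10 per cell and the total SS of 1374 from the one-way tables shown earlier, and it uses the standard equal-n partition (SS for each main effect from the marginal means, interaction SS as what is left of the between-cells SS). The variable names are made up for this example.

```python
n = 10           # mice per cell
ss_total = 1374  # total SS, taken from the one-way source tables earlier

# Cell means keyed by (housing, feeding)
means = {
    ("Enriched", "Ad Lib"): 6, ("Enriched", "Once a Day"): 14,
    ("Standard", "Ad Lib"): 22, ("Standard", "Once a Day"): 14,
}
housing_levels = ["Enriched", "Standard"]
feeding_levels = ["Ad Lib", "Once a Day"]

grand = sum(means.values()) / len(means)
housing_mean = {h: sum(means[h, f] for f in feeding_levels) / 2 for h in housing_levels}
feeding_mean = {f: sum(means[h, f] for h in housing_levels) / 2 for f in feeding_levels}

# Each marginal mean is based on 2 cells x n scores.
ss_housing = 2 * n * sum((m - grand) ** 2 for m in housing_mean.values())
ss_feeding = 2 * n * sum((m - grand) ** 2 for m in feeding_mean.values())
ss_cells = n * sum((m - grand) ** 2 for m in means.values())
ss_interaction = ss_cells - ss_housing - ss_feeding
ss_error = ss_total - ss_cells

df_error = len(means) * (n - 1)  # 4 cells x 9 = 36
ms_error = ss_error / df_error

# Every effect has 1 df in a 2x2 design, so MS = SS for each effect.
f_housing = ss_housing / ms_error
f_feeding = ss_feeding / ms_error
f_interaction = ss_interaction / ms_error

print(f"MSE = {ms_error:.3f}")
print(f"Housing F(1,{df_error}) = {f_housing:.3f}")
print(f"Feeding F(1,{df_error}) = {f_feeding:.3f}")
print(f"Housing x Feeding F(1,{df_error}) = {f_interaction:.3f}")
```

Compared with the housing-only one-way analysis, the error SS drops from 734 to 94 because the interaction variability is no longer lumped into error, which is why the F for Housing is so much larger here.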
Examining the output shows the Levene test, which indicates that there is little reason to be concerned that the data violate the homogeneity of variance assumption (p > .05). As a result, it seems reasonable to use the standard alpha-level of .05.
In a multi-factor ANOVA where interactions are present, we need to concentrate first on explaining the interaction, so let’s do so here. What has produced the significant interaction between Housing and Feeding? To understand the source of the interaction, we need to look at the means that are unique to the 4 conditions (that is, those in the interior of our means table). For instance, what happens to the error scores for mice in the Enriched Housing when we compare those getting Ad Lib Feeding with those getting Once-a-Day Feeding? As you should be able to see, the error scores are greater (M = 14) when the mice are fed once a day
compared to error scores for mice fed on an ad lib basis (M = 6). So when raised in an
enriched environment, mice do better when fed on an ad lib basis. Is that also true of mice raised in a standard environment? No, it is not. As you can see, for mice raised in the standard housing, errors were higher when they were fed on an ad lib basis (M = 22) compared to the
once-a-day feeding (M = 14). This is what we mean by the effects of one factor not being the same at all levels of the other factor. In this case, error scores are higher for ad lib feeding compared to once-a-day feeding for mice in a standard environment, but error scores are lower for ad lib feeding compared to once-a-day feeding for mice in the enriched environment. (Note that in describing an interaction, you’ll typically say “but,” “however,” “on the other hand,” etc.)
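So far, all that we’ve done is to look at the means to determine the pattern of the interaction. To determine the statistical significance of these differences, we need to compute Tukey’s HSD: $HSD = q\sqrt{MS_{Error}/n}$, where q is the studentized range value for the number of conditions being compared and $df_{Error}$, and n is the number of scores in each condition.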
Before we use Tukey’s HSD, however, let’s look at the interaction graphically. When an interaction is present, the lines used to connect the conditions in a graph will not be parallel, as seen in the graph below left. Unfortunately, SPSS uses color lines, which don’t come out well on a black-and-white copy. Thus, I’ve used another piece of software (Kaleidagraph) to generate a graph that comes out equally well in color or black-and-white.
Can you see how to translate the table of means into the graphs seen above? One factor (Feeding) is shown on the x-axis, while the other factor (Housing) is shown within the body of the graph by using different symbols. The y-axis is used to show scores on the DV (Errors).
Lines are used to connect the means for the two levels of the Housing factor. That is, one line connects the two means for Standard Housing (22 and 14) and another line connects the two means for Enriched Housing (6 and 14). The fact that those two lines are not parallel is an indication that there is an interaction between the two factors.
If we were to use Tukey’s HSD to analyze the means portrayed in the graph (or in the table), we would arrive at the same conclusion we’d arrived at simply by eye-balling the data.
(That won’t always be the case.) That is, we could look at the effect of feeding under
Enriched Housing (called a simple effect), where we would learn that the error scores are greater (M = 14) when the mice are fed once a day compared to error scores for mice fed on an ad lib basis (M = 6), because the difference (14 – 6 = 8) is greater than the critical mean
difference of 1.94. Examining the other simple effect (effect of feeding at Standard Housing) we find that when mice are raised in Standard Housing, the errors were higher when they
were fed on an ad lib basis (M = 22) compared to the once-a-day feeding (M = 14), because
the difference is greater than the HSD critical mean difference of 1.94.
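The two simple effects of feeding described above can be laid side by side in a few lines of Python. This is only a sketch: it takes the critical mean difference of 1.94 as given in the text rather than recomputing it, and the variable names are illustrative.

```python
HSD = 1.94  # critical mean difference quoted in the text

# Cell means organized as cell_means[housing][feeding]
cell_means = {
    "Enriched": {"Ad Lib": 6, "Once a Day": 14},
    "Standard": {"Ad Lib": 22, "Once a Day": 14},
}

# Simple effect of feeding at each level of housing (Ad Lib minus Once a Day)
simple_effects = {
    housing: row["Ad Lib"] - row["Once a Day"] for housing, row in cell_means.items()
}

for housing, diff in simple_effects.items():
    verdict = "significant" if abs(diff) > HSD else "not significant"
    print(f"{housing} Housing: Ad Lib - Once a Day = {diff:+d} ({verdict})")
```

Both differences exceed the critical value, but they have opposite signs, which is the interaction in a nutshell.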
It would also be possible to consider the interaction from a different perspective. That is, you could look at the simple effect of housing at Ad Lib feeding (finding that mice make significantly more errors under Standard Housing compared to Enriched Housing) and the simple effect of housing at Once-a-Day Feeding (finding that mice make an equal number of errors regardless of type of housing). Note that you would be telling a different “story,” but it would still be an “interaction” story. That is, you could say that when mice are fed on an ad lib basis they make significantly more errors when they live in standard housing compared to enriched housing. However, when they are fed on a once-a-day basis, the mice make an equal number of errors regardless of the type of housing.
Now that we’ve gotten a good understanding of the source of the interaction, we can look at the two main effects (independent effects of each factor). First of all, would you be willing to conclude that there is no effect of the Feeding factor simply because there is no main effect for that factor? You should see that although there is no difference at all between the means for the two levels of that factor (both are 14) that is not an indication that there is
no effect of Feeding. In fact, Feeding is a very influential factor, but it works very differently in each of the two environments examined in this experiment, and the effects cancel one another out. So although there is no main effect for Feeding (because the p-value for the main effect is > .05), the presence of the significant interaction tells us that Feeding is, in fact, an influential factor.
So let’s look at the other main effect (Housing). On the basis of the significant main
effect, you might be tempted to conclude that Standard Housing (M = 18) leads to
significantly more errors than Enriched Housing (M = 10). Would you be justified in that
global conclusion? What about mice who are fed once a day? Once again, you should see that the presence of the interaction would lead us to qualify our interpretation of a main effect. Mice who are fed on an ad lib basis do make far fewer errors when raised in Enriched Housing
compared to those raised in Standard Housing, but that difference disappears when mice are fed once a day.
A complete interpretation of the outcome of this experiment, then, would hinge on the interpretation of the interaction. When a significant interaction is present, if you explain the interaction well, you will have made great strides toward interpreting the outcome of the experiment. In this case, in your Results section, you might say something like:
There was a significant main effect of housing, F(1,36) = 245.106, MSE = 2.611, p
< .001. There was also a significant interaction between housing and feeding, F(1,36) =
245.106, p < .001. Post hoc analyses using Tukey’s HSD indicated that mice raised in an enriched environment make fewer errors on a maze when fed on an ad lib basis (M = 6)
compared to mice fed once a day (M = 14). However, mice raised in a standard environment
make fewer errors on a maze when fed once a day (M = 14) compared to mice fed on an ad lib
basis (M = 22). So, to enhance learning, your feeding should depend on the mouse’s housing.
In your Discussion section, you’d want to talk about why you think those results
emerged. That is, you might hypothesize that when the mice are fed on an ad lib basis, they don’t worry about where their next meal is coming from, so they have more time to explore
and benefit from the enriched environment. Because the standard environment is so boring, those in that condition simply sit around waiting for their next meal. On the other hand, mice fed once a day may spend a portion of the day hungry and focused more on trying to find food. As a result, they don’t take advantage of the enriched environment, but they move around both environments looking for food.
Another (more complex) Example
The experiment described above is the simplest possible two-way design (2x2). You should also be able to analyze and interpret experiments in which the two factors might each contain more than two levels (e.g., 2x3, 2x4, 3x3, 3x4). In the above example, you may be able to completely interpret the outcome of the experiment without recourse to post hoc tests (if the interaction is not significant). However, with increasingly complex designs (more than two levels of either or both factors) you would have to use Tukey’s test to complete the analysis.
For example, Schachter, Christenfeld, Ravina and Bilous (1991) studied the presence of speech fillers (um, ah,...) in faculty from different disciplines (Natural Science, Social Science, and Humanities). [You should note that this factor is a non-manipulated participant characteristic.] Schachter et al. thought that when lecturing in a classroom setting, faculty in more “precise” disciplines would be inclined to use fewer fillers. However, when being interviewed (e.g., about their ongoing work with graduate students), Schachter et al. thought that there would be little difference among the faculty. I’m going to make up a little data set (below left) that would be consistent with the results they obtained, analyze it using SPSS, and then interpret the results. Just so you see how the data would be input in SPSS, below right is a portion of the data file. For the Situation under which the data are collected, I’ve used the labels Lecture and Interview. For the Discipline, I’ve used the abbreviated labels Nat Sci, Soc Sci, and Human.
Lecture Interview
Nat Sci Soc Sci Hum Nat Sci Soc Sci Hum
1 4 4 5 4 5
2 4 5 5 5 5
1 4 4 5 5 6
1 4 5 6 5 5
2 4 5 6 5 6
Below you can also see the source table and the means table for the ANOVA. (I’ve also
included a graph.)
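For readers who want to check the SPSS output by hand, here is a rough Python sketch of the equal-n two-way ANOVA computed directly from the data set above. It is not the SPSS procedure itself, just the standard sums-of-squares partition. One assumption to flag: the first three data columns are treated as the Lecture Situation and the last three as the Interview Situation, which is the assignment that reproduces the cell means discussed in the text. All variable names are made up for illustration.

```python
# Filler counts keyed by (situation, discipline), five faculty per cell.
# Assumption: first three table columns are Lecture, last three are Interview.
data = {
    ("Lecture", "Nat Sci"): [1, 2, 1, 1, 2],
    ("Lecture", "Soc Sci"): [4, 4, 4, 4, 4],
    ("Lecture", "Human"): [4, 5, 4, 5, 5],
    ("Interview", "Nat Sci"): [5, 5, 5, 6, 6],
    ("Interview", "Soc Sci"): [4, 5, 5, 5, 5],
    ("Interview", "Human"): [5, 5, 6, 5, 6],
}
situations = ["Lecture", "Interview"]
disciplines = ["Nat Sci", "Soc Sci", "Human"]
n = 5  # scores per cell


def mean(xs):
    return sum(xs) / len(xs)


grand = mean([x for cell in data.values() for x in cell])
cell_mean = {key: mean(scores) for key, scores in data.items()}
sit_mean = {
    s: mean([x for (s2, _), v in data.items() if s2 == s for x in v])
    for s in situations
}
dis_mean = {
    d: mean([x for (_, d2), v in data.items() if d2 == d for x in v])
    for d in disciplines
}

# Sums of squares (equal n, so the partition is straightforward)
ss_sit = n * len(disciplines) * sum((m - grand) ** 2 for m in sit_mean.values())
ss_dis = n * len(situations) * sum((m - grand) ** 2 for m in dis_mean.values())
ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
ss_int = ss_cells - ss_sit - ss_dis
ss_err = sum((x - cell_mean[key]) ** 2 for key, v in data.items() for x in v)

# Degrees of freedom
df_sit = len(situations) - 1
df_dis = len(disciplines) - 1
df_int = df_sit * df_dis
df_err = len(data) * (n - 1)

ms_err = ss_err / df_err
f_sit = (ss_sit / df_sit) / ms_err
f_dis = (ss_dis / df_dis) / ms_err
f_int = (ss_int / df_int) / ms_err

print(f"MSE = {ms_err:.3f}")
print(f"Situation: F({df_sit},{df_err}) = {f_sit:.3f}")
print(f"Discipline: F({df_dis},{df_err}) = {f_dis:.3f}")
print(f"Situation x Discipline: F({df_int},{df_err}) = {f_int:.3f}")
```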
Note that because there are only two levels of the Situation factor, you would be able to reject the $H_0$ and conclude that there were significantly more fillers in the Interview than in the Lecture. But because there is a significant interaction, we would probably want to hold off on that interpretation until we’d explained the interaction, so let’s try to do so now.
To see which of the 6 particular means differed, we would have to compute Tukey’s HSD. We know what $MS_{Error}$ is (.233) and we know what n is (5), so we only have to look up q to compute HSD. In this case, because there are 6 unique conditions, we would look up q with 6 treatments and 24 $df_{Error}$ (so q = 4.37).
Armed with that value, we can now interpret the interaction. First, looking at the Interview Situation, none of the means for the three Disciplines (5.4, 5.4, and 4.8) are significantly different, because their differences are all less than .94. However, looking at the Lecture Situation, both the Social Science (M = 4.0) and Humanities (M = 4.6) faculty used
significantly more fillers than the Natural Science faculty (M = 1.4), although they don’t differ
from one another.
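The critical difference of .94 used above follows from the values already given (q = 4.37, MS Error = .233, n = 5) and the usual Tukey formula, HSD = q times the square root of MS Error / n. The Python sketch below recomputes it and flags which pairs of means differ within each Situation. The per-discipline Interview means are taken from the made-up data set earlier (Nat Sci 5.4, Soc Sci 4.8, Human 5.4), and the helper `significant_pairs` is a hypothetical name for illustration.

```python
import math
from itertools import combinations

q = 4.37          # studentized range value quoted in the text (6 treatments, 24 df)
ms_error = 0.233  # from the source table
n = 5             # scores per condition

hsd = q * math.sqrt(ms_error / n)
print(f"HSD = {hsd:.2f}")

# Cell means by discipline within each situation
lecture = {"Nat Sci": 1.4, "Soc Sci": 4.0, "Human": 4.6}
interview = {"Nat Sci": 5.4, "Soc Sci": 4.8, "Human": 5.4}


def significant_pairs(means, critical):
    """Return the pairs of conditions whose mean difference exceeds the critical value."""
    return [
        (a, b) for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > critical
    ]


print("Lecture:", significant_pairs(lecture, hsd))
print("Interview:", significant_pairs(interview, hsd))
```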
Another way to look at the source of the interaction is to compare each discipline at the two different situations. That is, we would be examining the simple effects of Situation at each level of Discipline. Using this approach, we can see that faculty in the Natural Sciences use significantly more fillers when in the Interview Situation (M = 5.4) than in the Lecture Situation (M = 1.4), but faculty in the Social Sciences or Humanities don’t differ in the number of fillers present in the Interview Situation or in the Lecture Situation (although these differences, .8 in each case, are close to the critical difference of .94 and would become significant with more power). Notice that whichever approach we take, we get a sense of the source of the interaction. That is, the effects of Situation are not the same at all levels of Discipline (or the effects of Discipline are not the same at all levels of Situation).
Once we’ve interpreted the interaction, you should see that the main effect for Discipline is of little interest. Although we could use a Tukey’s HSD to see which Disciplines differ in the number of fillers present, the analysis of the interaction showed us that the Disciplines differed only in the Lecture Situation and not in the Interview Situation, so we should be reluctant to make any overall statement about differences among the Disciplines. Note that the same logic might lead us to be uncomfortable stating that there were significantly more fillers in the Interview Situation compared to the Lecture Situation (but the fact that the differences were almost significant might lead us to conclude that with a larger sample size we might well find that faculty use significantly more fillers in an Interview Situation compared to a Lecture Situation).
Thus, for this experiment, it appears that the results are entirely consistent with the predictions of Schachter et al., who essentially predicted an interaction. One might report the results as:
There was a significant main effect of discipline, F(2,24) = 28.0, MSE = .233, p < .001.
There was also a significant main effect of situation, F(1,24) = 112.0, p < .001. There was
also a significant interaction between discipline and situation, F(2,24) = 36.571, p < .001. Post
hoc analyses using Tukey’s HSD indicated that in the Interview Situation, the mean numbers of fillers for the three Disciplines do not differ (M = 5.4, 4.8, and 5.4 for Humanities, Social Sciences, and Natural Sciences, respectively). However, in the Lecture Situation, both the Social Sciences (M = 4.0) and Humanities (M = 4.6) faculty used significantly more fillers than the Natural Science faculty (M = 1.4), although they do not differ from one another.
How would you talk about these results in a Discussion section?
Making Sure That You Understand df
Suppose that there were more than five participants per condition. How would the df change? Answer the questions below to ensure that you understand the computation of df for a two-factor ANOVA.
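Conditions stay the same, but n = 10
Source df
Situation
Discipline
Situation x Discipline
Error
Total

Introduce a 3rd Situation (e.g., party), with n = 15
Source df
Situation
Discipline
Situation x Discipline
Error
Total

Introduce a 4th Discipline, with n = 10
Source df
Situation
Discipline
Situation x Discipline
Error
Total

Introduce a 4th Discipline and a 3rd Situation, with n = 20
Source df
Situation
Discipline
Situation x Discipline
Error
Total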
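As a way of checking your answers, the df rules for a two-factor independent groups design can be written as a small Python function. This is just a sketch of the standard rules, with a hypothetical function name, and it is demonstrated only on the original design (2 Situations x 3 Disciplines, n = 5), whose error df of 24 appears in the results reported earlier.

```python
def two_way_df(levels_a, levels_b, n_per_cell):
    """Degrees of freedom for a two-factor independent groups ANOVA with equal n."""
    df_a = levels_a - 1
    df_b = levels_b - 1
    cells = levels_a * levels_b
    return {
        "A": df_a,
        "B": df_b,
        "A x B": df_a * df_b,               # interaction df is the product
        "Error": cells * (n_per_cell - 1),  # n - 1 within each cell
        "Total": cells * n_per_cell - 1,    # total scores minus 1
    }


# Original design: A = Situation (2 levels), B = Discipline (3 levels), n = 5
print(two_way_df(2, 3, 5))
```

Try working each exercise by hand first, and then call the function with the new numbers of levels and n to confirm. The four effect and error df should always sum to the total df.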
Two-Factor Independent Groups Designs:
A researcher was interested in the impact of a particular drug (Smart-O) on rats’ performance in a maze. She
decided to run an independent groups design, comparing Smart-O with a placebo. She also thought that the type of maze (simple vs. complex) might have an impact, so she introduced this second factor into the design — producing a 2x2 independent groups design. Her budget was pretty flush, so she decided to run 25 rats in each condition. She chose to use the number of errors the rats made (going down blind alleys) as her dependent variable. On completion of the study, she ran an analysis of the data, but absent-mindedly left her output where the rats could get to it and they nibbled away parts of the source table. Generate the missing parts of the table.
Source SS df MS F
Drug (Drug vs. Placebo) 10
Maze (Simple vs. Complex) 20
Drug x Maze
Error (Within) 192