How to Design and Report a Likert Scale
First, how to design one:
(See the many different types of Likert scales at http://www.socialresearchmethods.net/kb/scallik.htm, a mini job-aid with good examples.)
Likert scale: A Likert scale (pronounced 'lick-ert') is a type of psychometric response
scale often used in questionnaires, and is the most widely used scale in survey research. When responding to a Likert questionnaire item, respondents specify their level of agreement to a statement. The scale is named after Rensis Likert, who published a report
describing its use (Likert, 1932).
Sample Question presented using a five-point Likert Scale
A typical test item in a Likert scale is a statement; the respondent is asked to indicate their degree of agreement with the statement. Traditionally a five-point scale is used; however, many psychometricians advocate using a seven- or nine-point scale.

Ice cream is good for breakfast.
1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree
Likert scaling is a bipolar scaling method, measuring either a positive or a negative
response to a statement. Sometimes Likert scales are used in a forced choice method
where the middle option of "Neither agree nor disagree" is not available. Likert scales
may be subject to distortion from several causes. Respondents may avoid using
extreme response categories (central tendency bias); agree with statements as
presented (acquiescence response bias); or try to portray themselves or their group in
a more favorable light (social desirability bias).
Scoring and analysis (http://www.answers.com/topic/likert-scale): After the questionnaire is completed, each item may be analyzed separately, or item responses may be summed to create a score for a group of items. Hence, Likert scales are often called summative scales.
Responses to a single Likert item are normally treated as ordinal data, because, especially
when using only five levels, one cannot assume that respondents perceive the difference between adjacent levels as equidistant. When treated as ordinal data, Likert
responses can be analyzed using non-parametric tests, such as the Mann-Whitney test, the Wilcoxon signed-rank test, and the Kruskal-Wallis test.
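As a concrete illustration of the first of these tests, the Mann-Whitney U statistic can be computed from ranks alone. The sketch below is a minimal, standard-library Python implementation with made-up ratings, not the output of any particular statistics package:

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of positions i+1 .. j+1
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic for two independent samples of ordinal ratings."""
    ranks = average_ranks(list(a) + list(b))
    r1 = sum(ranks[:len(a)])              # rank sum of the first sample
    u1 = r1 - len(a) * (len(a) + 1) / 2
    return min(u1, len(a) * len(b) - u1)  # report the smaller U

# hypothetical 5-point ratings from two groups (1 = strongly agree)
print(mann_whitney_u([1, 2, 2, 3], [2, 4, 4, 5]))  # → 2.0
```

The smaller U is then compared against a critical value, or converted to a normal approximation for a p value; a real statistics package handles those steps for you.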
When responses to several Likert items are summed, they may be treated as interval data measuring a latent
variable. If the summed responses are normally distributed, parametric statistical tests such as the analysis
of variance can be applied.
- Example: an Attitudes Toward Computers scale (20 questions, but each participant gets one summed score)
- Some personality tests are scored the same way
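The summing-and-ANOVA idea above can be sketched in a few lines of standard-library Python. The scores and group sizes here are hypothetical, and a real analysis would also convert F to a p value:

```python
def summated_score(item_ratings):
    # a respondent's scale score is simply the sum of their item ratings
    return sum(item_ratings)

def one_way_anova_f(groups):
    """F statistic = between-group mean square / within-group mean square."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# hypothetical summed 20-item scores for two groups of respondents
group_a = [72, 68, 75, 70]
group_b = [64, 66, 61, 69]
print(round(one_way_anova_f([group_a, group_b]), 2))  # → 7.72
```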
Data from Likert scales are sometimes reduced to the nominal level by combining all agree and all disagree responses into two categories, "accept" and "reject". Cochran's Q test and the McNemar test are common statistical procedures used after this transformation.
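Here is a minimal sketch of that transformation, assuming paired before/after ratings from the same respondents (the usual setting for the McNemar test) and a cutoff that counts ratings of 1-2 as "accept"; the data are invented for illustration:

```python
def collapse(rating, cutoff=2):
    # with 1 = strongly agree ... 5 = strongly disagree, treat 1-2 as "accept"
    # and everything else as "reject" (how to handle the neutral 3 is a choice)
    return "accept" if rating <= cutoff else "reject"

def mcnemar_chi2(pre, post):
    """Continuity-corrected McNemar statistic on paired accept/reject outcomes."""
    b = sum(p == "accept" and q == "reject" for p, q in zip(pre, post))
    c = sum(p == "reject" and q == "accept" for p, q in zip(pre, post))
    return (abs(b - c) - 1) ** 2 / (b + c)

# hypothetical pre/post ratings from the same eight respondents
pre  = [collapse(r) for r in [1, 2, 1, 2, 2, 4, 1, 5]]
post = [collapse(r) for r in [4, 5, 4, 4, 5, 2, 1, 5]]
print(mcnemar_chi2(pre, post))  # → 1.5
```

The resulting statistic is compared against a chi-square distribution with one degree of freedom; Cochran's Q generalizes the same idea to three or more matched conditions.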
Example of a Likert Scale (ordinal) Survey and Data Analysis
Data set: the one posted on Course Website: Cultural differences in online Learning
See the Survey at: http://surveymonkey.com/s.asp?u=61201883523 (Question 4—
Students’ Perceptions on Teachers and Teaching in General)
Mini-Data Analysis for two Class Periods
- Set up the data correctly (Data, Variable)
- Analyze Data_Practice_continuous (Descriptive Analysis; the first time, leave "factor" unchecked; the second time, check it and compare the results)
- Look at the results and see what conclusions you can draw
- Analyze Data_Practice_Ordinal (Inferential statistics)
Summary of Data: (Descriptive analysis)—generated by SurveyMonkey
. Following are several questions about your perceptions on or expectations about teachers and teaching in general; please click on the button to indicate your choice:
Statement | Strongly agree | Agree | Undecided | Disagree | Strongly disagree | Response Average
I typically consider my teachers to have wisdom. | 34% (45) | 58% (78) | 7% (9) | 1% (2) | 0% (0) | 1.76
I usually have a great deal of respect for my teachers. | 29% (39) | 58% (78) | 9% (12) | 4% (5) | 0% (0) | 1.87
I feel me and my teachers are essentially equals. | 23% (30) | 40% (53) | 9% (12) | 26% (35) | 2% (3) | 2.69
I think there should be express rules of conduct in every class which all students should follow. | 27% (36) | 50% (66) | 17% (23) | 6% (8) | 0% (0) | 2.02
I expect my teachers to be recognized experts in the field which they teach. | 43% (57) | 45% (60) | 8% (11) | 3% (4) | 1% (1) | 1.71
I am more comfortable when my teacher conducts class in a formal manner rather than ... | 30% (40) | 35% (47) | 7% (9) | 23% (31) | 5% (7) | 3.02

Total Respondents: 134
(filtered out): 3
(skipped this question): 1
Note: this summary does not give us the mean and SD, however, so I still needed to analyze the raw data.
Data: (part of the raw data; SurveyMonkey will give you a numerical version)

Open-Ended Response | Q1Wisdom | Q2Respect | Q3Equal | Q4rulesconduct | Q5experts | Q6formalmanner
Chinese  | 2 | 1 | 2 | 2 | 1 | 5
Chinese  | 2 | 2 | 2 | 1 | 1 | 4
Chinese  | 2 | 2 | 2 | 2 | 2 | 2
Chinese  | 1 | 1 | 2 | 1 | 1 | 2
Chinese  | 1 | 1 | 2 | 2 | 1 | 2
Chinese  | 3 | 1 | 2 | 2 | 4 | 4
Chinese  | 1 | 1 | 1 | 1 | 2 | 4
American | 2 | 2 | 4 | 2 | 1 | 3
American | 1 | 1 | 3 | 1 | 2 | 2
American | 2 | 4 | 4 | 2 | 3 | 4
American | 1 | 1 | 4 | 1 | 1 | 1
american | 2 | 2 | 3 | 1 | 1 | 3
American | 1 | 1 | 2 | 2 | 1 | 4
American | 2 | 2 | 3 | 2 | 2 | 2
american | 2 | 2 | 2 | 2 | 3 | 5
American | 2 | 2 | 4 | 1 | 2 | 1
American | 2 | 2 | 1 | 4 | 1 | 4
Descriptive Analysis: (from 1 strongly agree to 5 strongly disagree)
n Mean SD
Q1Wisdom 128 1.750 0.6148
Q2Respect 128 1.875 0.6987
Q3Equal 127 2.669 0.9682
Q4rulesconduct 127 2.031 0.8351
Q5experts 127 1.701 0.7592
Q6formalmanner 128 3.047 1.0338
Note: the Likert-scale items have to be set as continuous data in order for Analyse-it to run descriptive statistics; this is not strictly accurate, but it is acceptable. The more rigorous analysis is to get a weighted mean, which Analyse-it does not do. Oftentimes, researchers go right into inferential statistics and skip the descriptive statistics, since the latter are less informative.
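For readers without Analyse-it, the same descriptive statistics can be computed from the raw data with Python's standard library. This sketch uses only the 17 Q1 ratings visible in the raw-data snippet above, so its figures differ slightly from the full-sample values (n = 128, mean 1.750, SD 0.6148):

```python
from statistics import mean, stdev

# Q1 ("wisdom") ratings transcribed from the raw-data snippet above;
# this is 17 of the 128 responses, not the full dataset
chinese_q1  = [2, 2, 2, 1, 1, 3, 1]
american_q1 = [2, 1, 2, 1, 2, 1, 2, 2, 2, 2]
q1 = chinese_q1 + american_q1

print(len(q1), round(mean(q1), 3), round(stdev(q1), 3))  # → 17 1.706 0.588
```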
[End of Descriptive Analysis]
Inferential Analysis on the differences among the three groups (American, Chinese, Korean); for later discussion
1) Participants’ perceptions on teacher and teaching in general (pre-survey): Item 4 on the pre-
survey assessed participants’ perceptions and expectations on teacher and teaching in general. The three questions that are closely related to sense of Power Distance were analyzed inferentially with the Kruskal-Wallis Analysis of Variance test, with cultural identity being the independent variable. The results indicate that:
a) there were significant differences in participants’ perceptions about being equal with their instructor. The Korean group had the highest mean rank (45.53) on a scale of 1 (strongly agree) to 5 (strongly disagree). By contrast, the Anglo-American group had the lowest mean rank (29.77) and therefore perceived their instructors more as equals.
b) There was no significant difference in participants’ perceptions about rules of conduct in
online classes. The Chinese group had the lowest mean rank (29.36), an indication of a stronger agreement about implementing specific rules of conduct. This result aligned with some of their narrative comments about “feeling lost” and hoping for more guidance.
And c) there were highly significant differences in their perceptions on course conduct. Again, the Chinese group had the lowest mean rank, an indication of stronger agreement about conducting courses in a formal manner.
2) Post-survey: approaching a superior and peers when completing individual assignments and teamwork: Other responses to the post-survey that reflect the impact of Power Distance include: a) learners' comfort level in approaching the instructor/facilitator/TA for help with individual assignments and/or teamwork; and b) their comfort level in approaching peers for help with individual assignments and/or teamwork. Participants rated their comfort level from very comfortable (1), to somewhat comfortable (2), uncomfortable (3), and very uncomfortable (4); the lower the mean rating, the higher the comfort level. The Kruskal-Wallis Analysis of Variance was used again to compare the differences in participants' rankings of comfort level in approaching a "superior" or their peers when completing individual assignments and teamwork, where applicable.
[Note: because the regular mean of a Likert scale does not make much sense, I skipped the descriptive analysis and went right into the inferential analysis: Analysis of Variance using the non-parametric Kruskal-Wallis statistic.]
Item 1. Individual Assignment: Approaching Superior for Help (two-tailed test)

O. IA: Approach "Superior" by Group | n | Rank sum | Mean rank
American | 31 | 950.0 | 30.65
Chinese | 15 | 682.5 | 45.50
Korean | 29 | 1217.5 | 41.98

Kruskal-Wallis statistic: 7.15
p: 0.0280 (corrected for ties)
When the level of significance is set at 0.05 (α), the small p value (0.028) indicates a significant difference in participants' ratings for approaching a "superior" on individual assignments. The American group, not surprisingly, had the lowest mean rank (30.65), an indication of greater comfort in approaching the instructors for help; the Chinese group had the highest mean rank (45.50), and thus a lower comfort level in approaching their instructors.
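The tables in this section were produced by Analyse-it. For readers who want to see the mechanics, here is a standard-library Python sketch of the Kruskal-Wallis H statistic (average ranks for ties, plus the tie correction); reproducing the exact values above (e.g., H = 7.15) would require the full dataset:

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """H statistic: pool, rank (average ranks for ties), apply tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    order = sorted(range(n), key=lambda i: pooled[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over the run of tied values
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # average of positions i+1..j+1
        i = j + 1
    # sum of (rank sum squared / group size) over the groups
    h, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        h += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    # correction for ties
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))
```

For example, kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]) gives about 3.857; a real analysis then compares H against a chi-square distribution with k - 1 degrees of freedom to obtain the p value.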
Item 2. Individual Assignment: Approaching Peer for Help (two-tailed test)

n: 73 (2 cases excluded due to missing values)

P. IA: Approach Peer by Group | n | Rank sum | Mean rank
American | 31 | 911.5 | 29.40
Chinese | 15 | 372.5 | 24.83
Korean | 27 | 1417.0 | 52.48

Kruskal-Wallis statistic: 26.46
p: <0.0001 (corrected for ties)
When α = 0.05, the small p value (<0.0001) indicates highly significant differences in participants' comfort level in approaching peers for help with individual assignments. The Chinese group had the lowest mean rank (24.83, a higher comfort level), while the Korean group had the highest mean rank (52.48, a lower comfort level).
Item 3. Teamwork: Approaching Superior for Help

n: 58 (17 cases excluded due to missing values)

U. Team: Approach "Superior" by Group | n | Rank sum | Mean rank
American | 31 | 814.5 | 26.27
Chinese | 15 | 509.0 | 33.93
Korean | 12 | 387.5 | 32.29

Kruskal-Wallis statistic: 2.88
p: 0.2364 (chi-square approximation, corrected for ties)
p = 0.236 (> α = 0.05) indicates no significant difference in participants' comfort level in approaching superiors for help when completing teamwork.
Item 4. Teamwork: Approaching Peer for Help (two-tailed test)

n: 58 (17 cases excluded due to missing values)

V. Team: Approach Peer by Group | n | Rank sum | Mean rank
American | 31 | 806.5 | 26.02
Chinese | 15 | 319.5 | 21.30
Korean | 12 | 585.0 | 48.75

Kruskal-Wallis statistic: 24.80
p: <0.0001 (chi-square approximation, corrected for ties)
The high Kruskal-Wallis statistic (24.8) and the small p value (<0.0001) again indicate a highly significant difference in participants' comfort level in approaching peers for help with teamwork. The Korean group (mean rank = 48.75) contributed greatly to this difference. However, the statistical power may have been reduced in this test because of the 17 missing rating values from the Korean group. As mentioned in the curriculum analysis, many of the Korean courses did not involve teamwork, and many participants chose "not applicable" for this survey question.
Summary: influence of Power Distance evidenced by the four tests: Consistent with the existing findings about Power Distance, the American group (mainly Anglo-American) had the lowest PDI score, while the Chinese group had the highest PDI score. Possibly because of their sense of Power Distance, the American group felt the most comfortable in approaching their instructors for help, while the Korean group felt the most uncomfortable in doing so. Chinese students, because of their large class sizes, did not have much opportunity to interact with their instructors; still, their reported comfort level in approaching the instructors was low. As to approaching their peers for help, the Chinese group felt the most comfortable in completing both individual assignments and teamwork, the American group felt comfortable, and the Korean group felt the least comfortable. Again, the Koreans' cultural perceptions of CMC might have influenced their ratings here. As some of the Korean participants commented, peers or classmates online can be "strangers." As to the high comfort level of the Chinese, it is worth noting that most of these Chinese students worked in self-formed teams, and they were therefore comfortable approaching their peers for help.
The four Kruskal-Wallis analyses of the post-survey items yielded revealing results. Although there was no significant difference in the three groups' comfort level in approaching superiors for help with teamwork, there were significant differences in their ratings for approaching superiors on individual assignments, and there were highly significant differences in their levels of comfort in approaching peers for help with both individual assignments and teamwork. Power Distance indeed affected students' ways of approaching instructors and peers. At the same time, individuals were able to overcome their sense of Power Distance when working as a group; in other words, individuals became "braver" about approaching their instructors for help when working as a team.
(From Wang’s Cultural Studies of Online Learning, British Journal of Educational ...)
More Likert Scale Examples:
Defining the Focus. As in all scaling methods, the first step is to define what it is you are trying to measure. Because this is a unidimensional scaling method, it is assumed that the concept you want to measure is one-dimensional in nature. You might operationalize the definition as an instruction to the people who are going to create or generate the initial set of candidate items for your scale.
Generating the Items. Next, you have to create the set of potential scale items. These should be items that can be rated on a 1-to-5 or 1-to-7 disagree-agree response scale. Sometimes you can create the items by yourself, based on your intimate understanding of the subject matter. But, more often than not, it's helpful to engage a number of people in the item-creation step. For instance, you might use some form of brainstorming to create the items. It's desirable to have as large a set of potential items as possible at this stage; about 80-100 would be best.
Rating the Items. The next step is to have a group of judges rate the items.
Usually you would use a 1-to-5 rating scale where:
1. = strongly unfavorable to the concept
2. = somewhat unfavorable to the concept
3. = undecided
4. = somewhat favorable to the concept
5. = strongly favorable to the concept
Administering the Scale. You're now ready to use your Likert scale. Each
respondent is asked to rate each item on some response scale. For instance, they could rate each item on a 1-to-5 response scale where:
1. = strongly disagree
2. = disagree
3. = undecided
4. = agree
5. = strongly agree
There are a variety of possible response scales (1-to-7, 1-to-9, 0-to-4). All of these odd-numbered scales have a middle value that is often labeled Neutral or Undecided. It is also possible to use a forced-choice response scale with an even number of responses and no middle neutral or undecided choice. In this situation, the respondent is forced to decide whether they lean more toward the agree or the disagree end of the scale for each item.
The final score for the respondent on the scale is the sum of their ratings for all of the items (this is why this is sometimes called a "summated" scale). On some scales, you will have items that are reversed in meaning from the overall direction of the scale. These are called reversal items. You will need to reverse the response value for each of these items before summing for the total. That is, if the respondent gave a 1, you make it a 5; if they gave a 2 you make it a 4; 3 = 3; 4 = 2; and, 5 = 1.
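The reversal arithmetic above is mechanical enough to sketch. Assuming a 5-point scale coded 1-5, reverse-scoring and summing look like this (the item indices and ratings are hypothetical):

```python
def reverse(rating, points=5):
    # flip a rating on a 1..points scale: 1→5, 2→4, 3→3, 4→2, 5→1
    return points + 1 - rating

def total_score(ratings, reversal_items=frozenset()):
    # sum all item ratings, flipping the reversal items first
    return sum(reverse(r) if i in reversal_items else r
               for i, r in enumerate(ratings))

# hypothetical 4-item response where item 2 (0-based index) is a reversal item
print(total_score([4, 5, 2, 4], reversal_items={2}))  # 4 + 5 + reverse(2) + 4 → 17
```

The same `reverse` helper works for a 7-point scale by passing `points=7` (so 1→7, 7→1).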
Example: The Employment Self Esteem Scale
Here's an example of a ten-item Likert Scale that attempts to estimate the level of self esteem a person has on the job. Notice that this instrument has no center or neutral point -- the respondent has to declare whether he/she is in agreement or disagreement with the item.
INSTRUCTIONS: Please rate how strongly you agree or disagree with each of the following statements by placing a check mark in the appropriate box.
Each statement is rated on the four-point forced-choice scale: Strongly Disagree / Somewhat Disagree / Somewhat Agree / Strongly Agree.

1. I feel good about my work on the job.
2. On the whole, I get along well with others at work.
3. I am proud of my ability to cope with difficulties at work.
4. When I feel uncomfortable at work, I know how to handle it.
5. I can tell that other people at work are glad to have me there.
6. I know I'll be able to cope with work for as long as I want.
7. I am proud of my relationship with my supervisor at work.
8. I am confident that I can handle my job without constant assistance.
9. I feel like I make a useful contribution at work.
10. I can tell that my coworkers respect me.
Usability Glossary: Likert scale
a type of survey question where respondents are asked to rate the level at which they agree or disagree with a given statement. For example:
I find this software easy to use.
strongly disagree 1 2 3 4 5 6 7 strongly agree
A Likert scale is used to measure attitudes, preferences, and subjective reactions. In
software evaluation, we can often objectively measure efficiency and effectiveness with performance metrics such as time taken or errors made. Likert scales and other attitudinal scales help get at the emotional and preferential responses people have to the
design. Is it attractive, fun, professional, easy?
Producing Means and Standard Deviations:
The DESCRIPTIVES procedure in SPSS produces means and standard deviations for variables. It also prints the minimum and maximum values. It is appropriate to print means for Likert scale questions, since the coded number can give us a feel for which direction the average answer leans. The standard deviation is also important, as it gives us an indication of the average distance from the mean. A low standard deviation means that most observations cluster around the mean; a high standard deviation means that there was a lot of variation in the answers. A standard deviation of 0 is obtained when all responses to a question are the same. The minimum and maximum values tell us the range of answers given by our survey population. The following code produces descriptive statistics for questions 1 to 20:

DESCRIPTIVES VARIABLES=q1 TO q20.
Variable  Mean  Std Dev  Minimum  Maximum    N  Label
Q1        4.65      .66        2        5   80  question 1
Q2        4.59      .66        2        5   85  question 2
Q3        4.36      .75        2        5   90  question 3
Q4        4.72      .51        3        5   74  question 4
Q5        3.89     1.11        1        5   92  question 5
Q6        3.26     1.45        1        5  101  question 6
Q7        3.92     1.14        1        5   88  question 7
Q8        4.26      .90        1        5   94  question 8
Q9        4.32      .88        2        5   90  question 9
Q10       4.45      .86        2        5   75  question 10
Q11       3.86     1.45        1        5   95  question 11
Q12       3.71     1.26        1        5  110  question 12
Q13       4.62      .71        2        5   90  question 13
Q14       4.37      .85        2        5   97  question 14
Q15       3.08     1.39        1        5  109  question 15
Q16       4.45      .89        1        5   91  question 16
Q17       4.56      .81        1        5   79  question 17
Q18       2.68     1.34        1        5  116  question 18
Q19       4.54      .74        2        5   90  question 19
Q20       4.39      .76        2        5   96  question 20
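For readers without SPSS, here is a rough standard-library Python equivalent of this DESCRIPTIVES run; the variable names and responses below are invented for illustration, not the survey's actual data:

```python
from statistics import mean, stdev

def descriptives(data):
    """Print Mean, Std Dev, Minimum, Maximum and N per variable, mirroring
    the columns of the SPSS DESCRIPTIVES output shown above."""
    print("Variable  Mean  Std Dev  Min  Max    N")
    for name, values in data.items():
        print(f"{name:<8}  {mean(values):4.2f}  {stdev(values):7.2f}"
              f"  {min(values):3d}  {max(values):3d}  {len(values):3d}")

# hypothetical responses for two 5-point items (not the survey's actual data)
descriptives({"q1": [5, 4, 5, 3, 5, 4], "q2": [2, 3, 4, 3, 3, 1]})
```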