Appropriate Analysis and Presentation of Ordered Categorical Data
Physicians often use ordered categorical (ordinal)
scales to approximate an objective evaluation of outcome
variables for which precise measurements on continuous
scales are not available. One advantage of assigning
numerical values is that the severity of conditions can
be ranked. However, these numerical values cannot be
analysed as continuous data, because the values assigned
in ordered categorical scales describe a ranking, not a
measurement.
Clinical outcomes measured with ordinal
scales are often presented and analysed with inappropriate
statistical methods in the medical literature. In the fields of
anaesthesia, rheumatology, and nursing, several studies
indicated that ordinal data were presented appropriately
in only 39–49% and analysed appropriately in 57–63% of
journal articles [1–3]. The most common presentation error was
reporting a mean value for ordinal data, and the most common
analysis error was the use of ANOVA to analyse ordinal data.
Other problems included graphs of ordinal data in
which data points were connected by lines, and failure to report
the raw data required to reanalyse the data appropriately.
Ordinal data may be graded with scales incorporating
groups such as 'extremely satisfied', 'satisfied',
'neutral', 'unsatisfied', and 'extremely unsatisfied'.
The main limitation of these data is that the interval or
distance between the groups is unknown. It is therefore inappropriate
to calculate the 'mean' satisfaction in such a group.
Although the grading scores are mutually exclusive and
encompass all possible outcomes, they do not represent
equal spacing between adjacent ranks, as occurs in interval
or ratio data. Accordingly, calculating the sum, product,
mean, or standard deviation of ordinal data is not appropriate
because these functions assume that there is equal spacing
between adjacent values [6].
Ranking ordinal data into alphabetical categories (e.g. A
through E) makes these limitations more intuitive than
ranking the same data numerically (e.g. 1 through 5); adding
or multiplying letters together does not make sense.
Descriptive Statistics
Both a measure of "central tendency" and one of variation need to be given.
When data are ordinal and skewed, medians and interquartile ranges are
appropriate. The median represents the middle value of an ordered data set
(i.e. half of the values will be lower and half of the values
will be higher than the median) [7].
For example, assume a satisfaction score with five possible
values, 1–5, signifying satisfied through dissatisfied. If 100
patients were surveyed during a quality improvement project and
the median satisfaction score was 4 before the project and 2
after the project, it would mean that at least half of the
questionnaires after the project were scored as a 1 or a 2,
while the remainder would include scores of 2, 3, 4 and/or 5.
It would not, however, be justified to state that a score of 2
after the project represents a 50% reduction from the
satisfaction score of 4 before the project, as this
would require multiplication or division of ordinal data.
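The summary described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names `nearest_rank` and `describe_ordinal` and the ten scores are hypothetical, and the nearest-rank rule is one of several quantile conventions, chosen here because it always returns an observed category rather than an average of two labels.

```python
import math
from collections import Counter

def nearest_rank(ordered, p):
    """Smallest observed value with at least a fraction p of the data at or below it."""
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]

def describe_ordinal(values, categories=(1, 2, 3, 4, 5)):
    """Summarise ordinal scores without averaging category labels."""
    ordered = sorted(values)
    counts = Counter(ordered)
    return {
        "median": nearest_rank(ordered, 0.50),
        "q1": nearest_rank(ordered, 0.25),
        "q3": nearest_rank(ordered, 0.75),
        # Percentage of responses within each category
        "percent": {c: 100 * counts[c] / len(ordered) for c in categories},
    }

# Hypothetical scores (1 = satisfied ... 5 = dissatisfied)
scores = [1, 1, 2, 2, 2, 2, 3, 4, 4, 5]
summary = describe_ordinal(scores)
print(summary)
```

For these made-up scores the median is 2 with an interquartile range of 2 to 4, and the percentage table shows how the responses are spread across the five categories.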
Parametric statistical methods (e.g. t-tests and
ANOVA) that are used to analyse interval or ratio data assume
normality of the data [6]. However, ordinal data do not follow
a normal (Gaussian) distribution and cannot be analysed
with these methods. Presentation and analysis of ordered
categorical data with methods that are inconsistent with the
structure of the data may lead to unjustified implications
and conclusions. The inappropriate use of the t-test
on simulated data sets led to a type I (false positive) error
rate whose confidence interval indicated that the
t-test rejected the null hypothesis more often than it should [7].
An increased type I error rate is cause for concern
because incorrect conclusions about treatments are made.
Researchers would too often conclude that two treatment
groups were significantly different when in fact there was
no difference.
| Category | Method |
| Analysis | Wilcoxon signed rank |
| | Wilcoxon rank sum |
| | Mann–Whitney U |
| | Kruskal–Wallis |
| | Spearman rank correlation |
| | Kendall's rank correlation |
| | Logistic regression |
| | Cohen's kappa |
| Presentation | Median |
| | Range or interquartile range |
| | Percentage within each rank of a numerical rating scale |
| Two-group comparisons [8] | (paired design) Wilcoxon signed rank test |
| | (unpaired design) Wilcoxon rank sum test |
| | (unpaired design) Mann–Whitney U-test [less usual] |
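As an illustration of the unpaired two-group comparison listed in the table, the following sketch computes the Mann–Whitney U statistic from midranks and a two-sided p-value from a tie-corrected normal approximation. The function names and the two small groups are hypothetical, no continuity correction is applied, and for samples this small an exact test from an established statistics package would normally be preferred.

```python
import math
from collections import Counter
from statistics import NormalDist

def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for x, U for y, two-sided p) using a normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Tie correction: each group of t tied values reduces the variance
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every value identical, so no evidence of a difference
        return u1, u2, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, u2, p

# Hypothetical satisfaction scores from two independent groups
group_a = [1, 2, 2, 3]
group_b = [3, 4, 4, 5]
u_a, u_b, p_value = mann_whitney_u(group_a, group_b)
print(u_a, u_b, round(p_value, 4))
```

Only the ordering of the scores enters the calculation, so recoding the categories with any other increasing labels would leave the result unchanged.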
Single Attribute Correlation for Nonparametric Distributions
Linear regression assumes Normality and constant standard deviation of the
outcome variable for given values of the explanatory
variable. The Pearson correlation coefficient is based
on a Normal distribution of both variables and is
heavily influenced by outliers. Data should always be plotted first,
as only if the relation is at least
approximately linear is it sensible to use either linear
regression or Pearson's correlation. Nonparametric correlation coefficients,
Spearman's or Kendall's, should be
used when the assumptions are violated.
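One way to see why a rank-based coefficient suits ordinal data is to compute Spearman's coefficient directly as the Pearson correlation of midranks. The sketch below is illustrative only: the function names and the scores are hypothetical, and it assumes neither variable is constant (otherwise the denominator is zero).

```python
import math

def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation applied to midranks."""
    rx, ry = midranks(x), midranks(y)
    mean_rank = (len(x) + 1) / 2  # midranks always average to (n + 1) / 2
    sxy = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    sxx = sum((a - mean_rank) ** 2 for a in rx)
    syy = sum((b - mean_rank) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: the same ordering, but different category labels
courtesy = [1, 2, 2, 4, 5]
overall = [1, 3, 3, 4, 5]
rho_same_order = spearman_rho(courtesy, overall)

# Hypothetical scores in exactly reversed order
reversed_scores = [5, 4, 4, 2, 1]
rho_reversed = spearman_rho(courtesy, reversed_scores)
print(rho_same_order, rho_reversed)
```

Because only the ranks are used, the coefficient depends on the ordering of the responses and not on the unknown spacing between categories.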
Importance
Satisfaction surveys contain a series of "attributes", which
are rating scales for a series of specific statements or questions (e.g. courtesy, accuracy, timeliness).
Whether some attributes are more important than others can be assessed in
two ways:
- "stated importance": by asking customers how important an item is
- "derived importance": by calculating the relationship between attributes and overall satisfaction
Stated importance: Asking about importance adds unnecessary questionnaire length because,
for each attribute, you would need to ask not only "were you satisfied (on a scale of 1 to 5)?"
but also "how important is this attribute (on a scale of 1 to 5)?", doubling the questionnaire from
approximately 50 attribute questions to more than 100. This can make the person filling in the
questionnaire irritable or unwilling to complete the form. The results may also lead to erroneous improvement
strategies, because what customers say is important may not be one of the drivers of whether they will be
satisfied.
Derived importance: This approach uncovers the items which are most important to the satisfaction of customers.
These attributes will not always be the same attributes that a customer would identify as being most important,
but they would be the ones which, if improved upon, will result in higher levels of
satisfaction. It is not difficult to calculate the relative importance of a series of attributes, provided that the
questionnaire also includes a satisfaction measure of some sort. The basic process is to conduct a
correlation analysis, eliminate attributes which have high correlation coefficients with each other
and are saying "much the same thing", and then to run a nonparametric regression analysis.
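The first two steps of this process (correlation analysis, then removal of attributes that overlap heavily) might look like the sketch below. Everything here is an assumption made for illustration: the attribute names and scores are invented, Kendall's tau-b is used as the rank correlation, the 0.85 cut-off is arbitrary, and the rule of keeping whichever overlapping attribute is more strongly associated with overall satisfaction is one possible choice. The final regression step is not shown.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts for ties in either variable."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: contributes to neither count
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0

def screen_attributes(attributes, overall, threshold=0.85):
    """Rank attributes by association with overall satisfaction, dropping near-duplicates."""
    importance = {name: kendall_tau_b(vals, overall) for name, vals in attributes.items()}
    kept = []
    # Visit attributes from most to least associated with overall satisfaction
    for name in sorted(importance, key=importance.get, reverse=True):
        overlaps = any(
            abs(kendall_tau_b(attributes[name], attributes[k])) >= threshold for k in kept
        )
        if not overlaps:
            kept.append(name)
    return [(name, importance[name]) for name in kept]

# Hypothetical responses from six customers (higher = more satisfied)
overall = [5, 4, 2, 5, 3, 1]
attributes = {
    "timeliness": [4, 3, 1, 5, 2, 1],
    "promptness": [4, 3, 2, 5, 2, 2],  # says much the same thing as timeliness
    "courtesy": [5, 5, 4, 5, 5, 4],
}
ranked = screen_attributes(attributes, overall)
print(ranked)
```

In this invented data set "promptness" is dropped because it is almost perfectly rank-correlated with "timeliness", leaving "timeliness" and "courtesy" ranked by their association with overall satisfaction.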
Quadrant Analysis: Importance vs Perception
Prioritization
The Priority Index is an ordered list of survey items that shows the areas needing the most improvement.
Survey items are arranged from the "first item to work on" to the "last item to work on". The index
reflects service issues on which the hospital is performing relatively poorly but that are important to the
patients. Survey items that have low top-box scores and high rank correlations with overall satisfaction
will have high priority index scores.
It is calculated as follows. Questions are first rank ordered according to their top-box score, and the questions
with the lowest scores are given the highest point values. The questions are then ranked again by their rank correlation
(Kendall's τ) with "overall satisfaction". Summing the two ranks gives each question's overall position in the
priority index. Items with the highest totals point to the specific questions where there is the most room for
improvement and where improvement is most likely to have the greatest impact on overall satisfaction.
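The calculation can be sketched as follows. Several details are assumptions rather than part of the description above: the top-box score is taken to be the percentage of respondents choosing the highest category (coded 5 here), the question with the highest correlation receives the most points in the second ranking, ties between questions are broken by their order in the input, and the survey data are invented.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts for ties in either variable."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0

def top_box_score(values, top=5):
    """Percentage of respondents who chose the top category (assumed coding)."""
    return 100 * sum(v == top for v in values) / len(values)

def priority_index(attributes, overall, top=5):
    """Sum two sets of rank points: low top-box score and high tau both score high."""
    names = list(attributes)
    top_box = {n: top_box_score(attributes[n], top) for n in names}
    tau = {n: kendall_tau_b(attributes[n], overall) for n in names}
    # Highest top-box score gets 1 point, lowest gets the most points
    by_score = sorted(names, key=top_box.get, reverse=True)
    score_points = {n: i + 1 for i, n in enumerate(by_score)}
    # Lowest correlation gets 1 point, highest gets the most points
    by_tau = sorted(names, key=tau.get)
    tau_points = {n: i + 1 for i, n in enumerate(by_tau)}
    totals = {n: score_points[n] + tau_points[n] for n in names}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical responses from eight patients (5 = most satisfied)
overall = [5, 4, 2, 5, 3, 1, 4, 2]
attributes = {
    "courtesy": [5, 5, 4, 5, 5, 4, 5, 4],
    "timeliness": [4, 3, 1, 5, 2, 1, 3, 2],
    "accuracy": [5, 5, 5, 4, 5, 5, 5, 5],
}
priorities = priority_index(attributes, overall)
print(priorities)
```

With these invented responses, "timeliness" comes first because it has both the lowest top-box score and the strongest rank correlation with overall satisfaction.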
Multiple Attributes & Ordinal Logistic Regression