library(tidyverse)
d <- read_csv('https://tinyurl.com/R-practice-data')
Scale construction
How to create and validate scales
What is a scale?
A scale is a composite measure that combines multiple items into a single score. For example, in order to measure the complex construct “happiness”, you could ask people multiple questions related to happiness, such as “How often do you feel happy?”, “How satisfied are you with your life?”, and “How often do you feel joy?”.
The idea is that by combining multiple items into a single score, you can get a more reliable and valid measure of the underlying construct. If we just asked people the single question “How happy are you?”, we might not get a very accurate measure of their happiness, because happiness is a complex and multi-faceted construct. By breaking it down into a multi-item scale, we can get a more nuanced and accurate measure.
If you’ve ever taken a survey, you’ve probably encountered scales before. For example, you might have been asked to rate your agreement with a series of statements on a scale from 1 to 5, where 1 means “strongly disagree” and 5 means “strongly agree”. This is also called a Likert scale, and it’s a common way to gather data on multiple items, with the goal of combining them into a single score for a complex construct.
You will also hear scales referred to as latent variables. The word latent means hidden or concealed, and it refers here to the fact that the construct we are trying to measure is not directly observable. We can only measure it properly by looking at multiple observable indicators (items) that are related to the construct.
How to create a scale
To create a scale, you combine multiple columns in your data (i.e. the variables for the scale items) into a single score, for instance by taking the average of the values in these columns. However, before we can do that, we need to make sure that the scale is reliable and valid. This requires a few steps:
1. Choose the items based on theory
First, you need to think carefully about which items to include in your scale, and this needs to be grounded in theory. There might also already be existing scales that you can use.
For example, in our practice data we have a construct called “trust in journalism”, which we measure with five items, based on the items proposed by Strömbäck et al. (2020). Participants were asked to rate their agreement with the following items on a scale from 1 to 10:
- Journalists are fair when covering the news
- Journalists are unbiased when covering the news
- Journalists do not tell the whole story when covering the news
- Journalists are accurate when covering the news
- Journalists separate facts from opinion when covering the news
Note that item 3 is inversed, so that higher values indicate lower trust. Keep this in mind, because to create the scale we will first need to invert this item, so that all items point in the same direction (we do this in step 4 below).
2. Inspect your data
Once you have collected your data, always check whether everything is in order. In the Inspect and clean chapter we looked at how to do this. Here we just use the dfSummary function from the summarytools package to get a quick overview of our data.
First though, let’s load our data and select the columns that we’re interested in.
In the practice data we have two scales: trust_t1 and trust_t2, each with five items (trust_t1_item1 to trust_t1_item5 and trust_t2_item1 to trust_t2_item5). For this tutorial we’ll just focus on trust_t1. The following code selects the five items for trust_t1, and then uses the dfSummary function to get a summary of these columns.
library(summarytools)

d |>
  select(trust_t1_item1:trust_t1_item5) |>
  dfSummary() |>
  view()
This looks good. There are no missing values, all values are within the expected range (1-10), and the distributions look reasonable.
3. Look at the correlations
The idea behind a scale is that the items are related to each other, because they all measure the same underlying construct. A good way to check this is by looking at the correlations between the items.
For this we’ll be using the sjPlot package, which has a function tab_corr (tabulate correlations) that creates a nice table with the correlations between all columns in a data frame. We again use this on the five items for trust_t1. In tab_corr we also set p.numeric=TRUE to show the p-values for the correlations as numbers (instead of stars).
library(sjPlot)

d |>
  select(trust_t1_item1:trust_t1_item5) |>
  tab_corr(p.numeric=TRUE)
|                | trust_t1_item1 | trust_t1_item2 | trust_t1_item3 | trust_t1_item4 | trust_t1_item5 |
|----------------|----------------|----------------|----------------|----------------|----------------|
| trust_t1_item1 |                | 0.335 (<.001)  | -0.707 (<.001) | 0.715 (<.001)  | 0.801 (<.001)  |
| trust_t1_item2 | 0.335 (<.001)  |                | -0.301 (<.001) | 0.288 (<.001)  | 0.321 (<.001)  |
| trust_t1_item3 | -0.707 (<.001) | -0.301 (<.001) |                | -0.633 (<.001) | -0.756 (<.001) |
| trust_t1_item4 | 0.715 (<.001)  | 0.288 (<.001)  | -0.633 (<.001) |                | 0.744 (<.001)  |
| trust_t1_item5 | 0.801 (<.001)  | 0.321 (<.001)  | -0.756 (<.001) | 0.744 (<.001)  |                |

Computed correlation used pearson-method with listwise-deletion.
Here we see that the correlations between the items are mostly quite strong, and all significant at the 0.001 level. The only notable exception in terms of strength is that the correlations of item2 with the other items are much weaker. This suggests that our items indeed measure a common underlying construct, but that item2 (about bias) might measure a somewhat different aspect of trust in journalism.
One very important thing to notice is that the correlation of trust_t1_item3 with the other items is negative! So when the score on trust_t1_item3 goes up, the scores on the other items tend to go down. This makes complete sense if we remember that trust_t1_item3 is inversed, so that higher values indicate lower trust.
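As an aside: if you just want the correlations as a plain numeric matrix, for instance to compute with them, a quick sketch using the cor function from base R does the job (since our data has no missing values, the default settings are fine):

# Pearson correlation matrix of the five items (base R)
d |>
  select(trust_t1_item1:trust_t1_item5) |>
  cor()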
Another common way to check whether the items in your scale are related is by using factor analysis. This is a statistical technique that can help you identify the underlying factors that explain the correlations between the items. We’ll cover factor analysis in a later tutorial.
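As a small preview, and purely as a sketch: the psych package (which we also use below for reliability) has an fa function that fits such a factor model. Here we ask for a single factor, since we expect one underlying construct.

library(psych)

# Exploratory factor analysis with one factor; the printed loadings show how
# strongly each item relates to that factor
d |>
  select(trust_t1_item1:trust_t1_item5) |>
  fa(nfactors = 1)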
4. Invert items if necessary
In the correlation analysis we saw that the third item (trust_t1_item3) is negatively correlated with the other items. This is all according to plan, since we inversed the scale for this item. But to create a single construct, we need to make sure that all items have the same directionality. So we need to invert the values for trust_t1_item3.
Since we have a scale from 1 to 10, we can inverse the value by subtracting it from 11 (11 - 1 = 10, 11 - 2 = 9, …, 11 - 10 = 1).
d <- d |>
  mutate(trust_t1_item3_inv = 11 - trust_t1_item3)
Notice that we do not overwrite the original column, but create a new column trust_t1_item3_inv (inversed). Overwriting the original column is possible, but dangerous and not transparent. Creating a new column prevents you from accidentally running the inversion multiple times and messing up your analysis.
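To see why, here is a toy illustration with a made-up vector (not the practice data) of what happens if you overwrite in place and accidentally run the line twice:

x <- c(1, 5, 10)
x <- 11 - x  # first run: 10 6 1, inverted as intended
x <- 11 - x  # accidental second run: 1 5 10, silently back to the original values!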
Since we had a scale from 1 to 10, we could invert the values by subtracting from 11. Similarly, if you have a scale from 1 to 7, you could invert the values by subtracting from 8. So for any scale starting from 1, the formula you can use is:
\[ \text{new value} = \text{max value} + 1 - \text{old value} \]
However, if your scale does not start from 1, this doesn’t work! (try it out for a scale from 0 to 10, and you’ll see why). The more general formula therefore is:
\[ \text{new value} = \text{max value} + \text{min value} - \text{old value} \]
Note that in this case you need to use the minimum and maximum possible values of your scale, and NOT the actual minimum and maximum values in your data! So if your scale goes from 1 to 7, you would use 1 and 7 in the formula, even if the minimum and maximum values in your data are 1.5 and 6.5.
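To make the general formula concrete, here is a quick sketch using a made-up vector of responses on a scale from 0 to 10 (so the minimum possible value is 0 and the maximum is 10):

# new value = max value + min value - old value
x <- c(0, 2, 5, 10)
10 + 0 - x  # gives 10 8 5 0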
5. Calculate the reliability
The reliability of a scale is a measure of how consistent the items in the scale are. There are different ways to calculate reliability, but one of the most common is Cronbach’s alpha.
Cronbach’s alpha ranges from 0 to 1, where higher values indicate higher reliability. A common rule of thumb is that a value of 0.7 or higher is acceptable, but this can vary depending on the context. As with any rule of thumb, don’t blindly follow it, but think about what makes sense in your specific situation.
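For reference, this is how Cronbach’s alpha is defined for a scale of \(k\) items, where \(\sigma^2_{Y_i}\) is the variance of item \(i\) and \(\sigma^2_X\) is the variance of the total score (the sum of all items):

\[ \alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X} \right) \]

Intuitively, the more the items covary, the larger the variance of the total score relative to the summed item variances, and so the higher the alpha.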
To calculate Cronbach’s alpha, we can use the psych package, which has a function alpha that calculates the alpha for the columns of an input data frame. So we’ll do the same thing as above, but note that this time our select statement looks different, because we need to include the inversed item.
library(psych)

d |>
  select(trust_t1_item1,
         trust_t1_item2,
         trust_t1_item3_inv,
         trust_t1_item4,
         trust_t1_item5) |>
  alpha()
Reliability analysis
Call: alpha(x = select(d, trust_t1_item1, trust_t1_item2, trust_t1_item3_inv,
trust_t1_item4, trust_t1_item5))
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.85 0.86 0.86 0.56 6.4 0.01 3.9 0.97 0.67
95% confidence boundaries
lower alpha upper
Feldt 0.83 0.85 0.87
Duhachek 0.83 0.85 0.87
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
trust_t1_item1 0.79 0.80 0.80 0.51 4.1 0.0151 0.0518
trust_t1_item2 0.91 0.91 0.89 0.73 10.6 0.0059 0.0032
trust_t1_item3_inv 0.80 0.82 0.82 0.53 4.6 0.0146 0.0587
trust_t1_item4 0.80 0.82 0.82 0.54 4.6 0.0140 0.0579
trust_t1_item5 0.78 0.80 0.78 0.50 3.9 0.0157 0.0437
med.r
trust_t1_item1 0.48
trust_t1_item2 0.73
trust_t1_item3_inv 0.52
trust_t1_item4 0.52
trust_t1_item5 0.48
Item statistics
n raw.r std.r r.cor r.drop mean sd
trust_t1_item1 600 0.87 0.88 0.87 0.79 4.2 1.1
trust_t1_item2 600 0.60 0.56 0.37 0.35 2.7 1.5
trust_t1_item3_inv 600 0.84 0.84 0.80 0.73 4.4 1.3
trust_t1_item4 600 0.82 0.84 0.80 0.72 3.7 1.1
trust_t1_item5 600 0.89 0.90 0.90 0.81 4.6 1.2
Non missing response frequency for each item
1 2 3 4 5 6 7 8 miss
trust_t1_item1 0.00 0.05 0.22 0.33 0.30 0.07 0.02 0.00 0
trust_t1_item2 0.28 0.23 0.21 0.15 0.09 0.04 0.00 0.00 0
trust_t1_item3_inv 0.01 0.04 0.18 0.29 0.28 0.13 0.06 0.01 0
trust_t1_item4 0.01 0.11 0.31 0.34 0.19 0.04 0.01 0.00 0
trust_t1_item5 0.00 0.03 0.14 0.29 0.31 0.18 0.04 0.01 0
By default, alpha only shows two digits for Cronbach’s alpha. If you want to see more digits, you can use the print function with the digits argument.
d |>
  select(trust_t1_item1, trust_t1_item2, trust_t1_item3_inv, trust_t1_item4, trust_t1_item5) |>
  alpha() |>
  print(digits=3)
Reliability analysis
Call: alpha(x = select(d, trust_t1_item1, trust_t1_item2, trust_t1_item3_inv,
trust_t1_item4, trust_t1_item5))
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
0.848 0.864 0.862 0.56 6.37 0.0104 3.93 0.972 0.67
95% confidence boundaries
lower alpha upper
Feldt 0.828 0.848 0.867
Duhachek 0.828 0.848 0.869
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r
trust_t1_item1 0.786 0.805 0.798 0.507 4.12 0.01509 0.0518
trust_t1_item2 0.912 0.914 0.893 0.726 10.60 0.00586 0.0032
trust_t1_item3_inv 0.797 0.821 0.817 0.534 4.59 0.01461 0.0587
trust_t1_item4 0.804 0.823 0.819 0.537 4.64 0.01400 0.0579
trust_t1_item5 0.776 0.798 0.782 0.497 3.95 0.01572 0.0437
med.r
trust_t1_item1 0.477
trust_t1_item2 0.729
trust_t1_item3_inv 0.525
trust_t1_item4 0.521
trust_t1_item5 0.484
Item statistics
n raw.r std.r r.cor r.drop mean sd
trust_t1_item1 600 0.870 0.884 0.870 0.793 4.17 1.10
trust_t1_item2 600 0.599 0.558 0.367 0.349 2.66 1.46
trust_t1_item3_inv 600 0.842 0.844 0.803 0.729 4.45 1.29
trust_t1_item4 600 0.820 0.840 0.797 0.723 3.73 1.08
trust_t1_item5 600 0.888 0.900 0.902 0.814 4.65 1.19
Non missing response frequency for each item
1 2 3 4 5 6 7 8 miss
trust_t1_item1 0.003 0.053 0.220 0.333 0.300 0.070 0.018 0.002 0
trust_t1_item2 0.282 0.232 0.207 0.153 0.087 0.035 0.005 0.000 0
trust_t1_item3_inv 0.010 0.042 0.180 0.293 0.283 0.127 0.057 0.008 0
trust_t1_item4 0.010 0.107 0.310 0.338 0.190 0.038 0.007 0.000 0
trust_t1_item5 0.000 0.025 0.142 0.292 0.310 0.178 0.042 0.012 0
Note that this way of setting the number of digits is specific to the psych package.
This gives quite a lot of output. These are the most important parts to consider:
- Cronbach’s alpha: At the top we have a row that says raw_alpha, std.alpha, etc. Here we are just interested in the raw_alpha, which is the value of Cronbach’s alpha. In this case it’s 0.85, which is already very good.
- Reliability if an item is dropped: This part shows you what would happen to the raw_alpha (and the other reliability measures) if you dropped one of the items from the scale. In our data we see that if item2 were dropped, the raw_alpha would go up to 0.91 (from 0.85). In other words, a 4-item scale with item2 dropped would be more reliable.
- Item statistics: This part shows you some useful statistics about the items, like the mean and standard deviation (sd). More importantly, it also shows several scores for the item-total correlation (the r in raw.r, std.r, r.cor and r.drop stands for correlation). These indicate how strongly each item is correlated with the total scale (i.e. the combination of the other items). The recommended correlation measure to look at is r.cor (correlation corrected for item overlap). In our data we see that item5 has the strongest correlation with the total scale, whereas item2 has the weakest. Notice how this is in line with the Reliability if an item is dropped part: if we dropped item2, the scale would become more reliable. (Think about why that makes sense!)
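By the way, you don’t have to read the alpha off the printed output. As a sketch: alpha() returns an object that stores the results, and the summary row shown at the top of the output lives in its total element.

a <- d |>
  select(trust_t1_item1, trust_t1_item2, trust_t1_item3_inv,
         trust_t1_item4, trust_t1_item5) |>
  alpha()

# pull out the raw alpha, e.g. for reporting it programmatically
a$total$raw_alpha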
6. Remove items if necessary
If the reliability of your scale is too low, you might want to consider removing some items (if you have enough items to spare). Above we saw that if we dropped item2, the reliability of the scale would go up to 0.91. You can verify that this is indeed what happens:
d |>
  select(trust_t1_item1,
         trust_t1_item3_inv,
         trust_t1_item4,
         trust_t1_item5) |>
  alpha()
You can verify that the raw_alpha is now 0.91. So now we can choose between either using the 4-item scale with an alpha of 0.91, or keeping the 5-item scale with an alpha of 0.85. This is a judgement call, and depends on the context of your research. Generally speaking, if you have sufficient items left and the improvement in the alpha is not very small, it’s a good idea to remove items to increase the reliability.
If you decide to remove an item, you should test the scale again, because it is possible that you could further improve the reliability by removing another item. In the output of alpha() for our four-item scale we see that this is not the case.
7. Calculate the scale score
Finally, once you have a reliable scale, you can calculate the scale score. This is usually done by taking the average of the items in the scale. The simplest way to do so is to just add up the items and divide by the number of items. Let’s do this for the 4-item scale (\(\alpha = 0.91\)) that we tested above (mind the parentheses!):
d <- d |>
  mutate(trust_t1_scale = (trust_t1_item1 + trust_t1_item3_inv + trust_t1_item4 + trust_t1_item5) / 4)
So now we have a new column trust_t1_scale that contains the scale score.
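One caveat with this sum-and-divide approach: if any item is missing for a participant, the scale score for that participant becomes NA. Our data has no missing values, but as a sketch for data that does, you could average over the non-missing items with rowMeans instead (the column name trust_t1_scale_rm is just an illustrative choice):

d <- d |>
  mutate(trust_t1_scale_rm = rowMeans(
    across(c(trust_t1_item1, trust_t1_item3_inv,
             trust_t1_item4, trust_t1_item5)),
    na.rm = TRUE))

Whether averaging over only the available items is acceptable, and for how many missing items, is a substantive judgement call.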
Remember that in the practice data we already had a column trust_t1 for the scale. This is how that column was created. You can verify this by correlating the new scale with the old one. If you did everything correctly, the correlation should be 1.
cor.test(d$trust_t1, d$trust_t1_scale)
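As an alternative check, and as a minimal sketch, you can also test whether the two columns are numerically identical rather than just perfectly correlated:

all.equal(d$trust_t1, d$trust_t1_scale)  # TRUE if the two columns match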