Scale construction

How to create and validate scales

What is a scale?

A scale is a composite measure that combines multiple items into a single score. For example, in order to measure the complex construct “happiness”, you could ask people multiple questions related to happiness, such as “How often do you feel happy?”, “How satisfied are you with your life?”, and “How often do you feel joy?”.

The idea is that by combining multiple items into a single score, you can get a more reliable and valid measure of the underlying construct. If we just asked people the single question “How happy are you?”, we might not get a very accurate measure of their happiness, because happiness is a complex and multi-faceted construct. By breaking it down into a multi-item scale, we can get a more nuanced and accurate measure.

If you’ve ever taken a survey, you’ve probably encountered scales before. For example, you might have been asked to rate your agreement with a series of statements on a scale from 1 to 5, where 1 means “strongly disagree” and 5 means “strongly agree”. This is also called a Likert scale, and it’s a common way to gather data on multiple items, with the goal of combining them into a single score for a complex construct.

You will also hear scales referred to as latent variables. The word latent means hidden or concealed, and it refers here to the fact that the construct we are trying to measure is not directly observable. We can only measure it properly by looking at multiple observable indicators (items) that are related to the construct.

How to create a scale

To create a scale, you combine multiple columns in your data (i.e. the variables for the scale items) into a single score, for instance by taking the average of the values in these columns. However, before we can do that, we need to make sure that the scale is reliable and valid. This requires a few steps:

1. Choose the items based on theory

First, you need to think carefully about which items to include in your scale, and this needs to be grounded in theory. There might also already be existing scales that you can use.

For example, in our practice data we have a construct called “trust in journalism”, which we measure with five items, based on the items proposed by Strömbäck et al. (2020). Participants were asked to rate their agreement with the following items on a scale from 1 to 10:

  1. Journalists are fair when covering the news
  2. Journalists are unbiased when covering the news
  3. Journalists do not tell the whole story when covering the news
  4. Journalists are accurate when covering the news
  5. Journalists separate facts from opinion when covering the news

Note that item 3 is inversed, so that higher values indicate lower trust. Keep this in mind, because to create the scale we will first need to invert this item so that all items point in the same direction (see step 4 below).

2. Inspect your data

Once you have collected your data, always check whether everything is in order. In the Inspect and clean chapter we looked at how to do this. Here we just use the dfSummary function from the summarytools package to get a quick overview of our data.

First though, let’s load our data and select the columns that we’re interested in.

library(tidyverse)
d = read_csv('https://tinyurl.com/R-practice-data')

In the practice data we have two scales: trust_t1 and trust_t2, with each having five items (trust_t1_item1 to trust_t1_item5 and trust_t2_item1 to trust_t2_item5). For this tutorial we’ll just focus on trust_t1. The following code selects the five items for trust_t1, and then uses the dfSummary function to get a summary of these columns.

library(summarytools)

d |> 
    select(trust_t1_item1:trust_t1_item5) |> 
    dfSummary() |>
    view()

This looks good. There are no missing values, all values are within the expected range (1-10), and the distributions look reasonable.
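If you prefer to verify this programmatically rather than by eye, a quick sketch like the following (using summarise and across from dplyr, which is loaded with the tidyverse) reports the minimum, maximum, and number of missing values for each item:

# For each item: lowest value, highest value, and number of missing values
d |> 
    summarise(across(trust_t1_item1:trust_t1_item5,
                     list(min = ~min(.x, na.rm = TRUE),
                          max = ~max(.x, na.rm = TRUE),
                          n_missing = ~sum(is.na(.x)))))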

3. Look at the correlations

The idea behind a scale is that the items are related to each other, because they all measure the same underlying construct. A good way to check this is by looking at the correlations between the items.

For this we’ll be using the sjPlot package, which has a function tab_corr (tabulate correlations) that creates a nice table with the correlations between all columns in a data frame. We again use this on the five items for trust_t1. In tab_corr we also set p.numeric=TRUE to show the p-values for the correlations as numbers (instead of stars).

library(sjPlot)
d |> 
    select(trust_t1_item1:trust_t1_item5) |> 
    tab_corr(p.numeric=TRUE)
                 trust_t1_item1   trust_t1_item2   trust_t1_item3   trust_t1_item4   trust_t1_item5
trust_t1_item1                     0.335 (<.001)   -0.707 (<.001)    0.715 (<.001)    0.801 (<.001)
trust_t1_item2    0.335 (<.001)                    -0.301 (<.001)    0.288 (<.001)    0.321 (<.001)
trust_t1_item3   -0.707 (<.001)   -0.301 (<.001)                    -0.633 (<.001)   -0.756 (<.001)
trust_t1_item4    0.715 (<.001)    0.288 (<.001)   -0.633 (<.001)                     0.744 (<.001)
trust_t1_item5    0.801 (<.001)    0.321 (<.001)   -0.756 (<.001)    0.744 (<.001)

Computed correlation used pearson-method with listwise-deletion.

Here we see that the correlations between the items are mostly quite strong, and all significant at the 0.001 level. The only notable exception in terms of strength is that the correlations of item2 with the other items are much lower. This suggests that our items indeed measure a common underlying construct, but that item2 (about bias) might measure a somewhat different aspect of trust in journalism.

One very important thing to notice is that the correlation of trust_t1_item3 with the other items is negative! So when the score on trust_t1_item3 goes up, the scores on the other items tend to go down. This makes complete sense if we remember that trust_t1_item3 is inversed, so that higher values indicate lower trust.

Another common way to check whether the items in your scale are related is by using factor analysis. This is a statistical technique that can help you identify the underlying factors that explain the correlations between the items. We’ll cover factor analysis in a later tutorial.
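As a small preview, a minimal sketch of what that could look like is shown below, using the fa function from the psych package (which we also use later in this tutorial for reliability). The nfactors = 1 argument reflects our assumption that the five items measure a single underlying factor; note that item 3 is still reverse-keyed at this point, so its loading would come out negative.

library(psych)

# Exploratory factor analysis with one factor on the five trust items
d |> 
    select(trust_t1_item1:trust_t1_item5) |> 
    fa(nfactors = 1)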

4. Invert items if necessary

In the correlation analysis we saw that the third item (trust_t1_item3) is negatively correlated with the other items. This is all according to plan, since we inversed the scale for this item. But to create a single construct, we need to make sure that all items have the same directionality. So we need to invert the values for trust_t1_item3.

Since we have a scale from 1 to 10, we can inverse the value by subtracting it from 11 (11 - 1 = 10, 11 - 2 = 9, …, 11 - 10 = 1).

d <- d |> 
    mutate(trust_t1_item3_inv = 11 - trust_t1_item3)

Notice that we do not overwrite the original column, but create a new column trust_t1_item3_inv (inversed). Overwriting the original column is possible, but dangerous and not transparent. Creating a new column prevents you from accidentally running the inversion multiple times, and messing up your analysis.

Since we had a scale from 1 to 10, we could invert the values by subtracting from 11. Similarly, if you have a scale from 1 to 7, you could invert the values by subtracting from 8. So for any scale starting from 1, the formula you can use is:

\[ \text{new value} = \text{max value} + 1 - \text{old value} \]

However, if your scale does not start from 1, this doesn’t work! (try it out for a scale from 0 to 10, and you’ll see why). The more general formula therefore is:

\[ \text{new value} = \text{max value} + \text{min value} - \text{old value} \]

Note that in this case you need to use the minimum and maximum possible values of your scale, and NOT the actual minimum and maximum values in your data! So if your scale goes from 1 to 7, you would use 1 and 7 in the formula, even if the minimum and maximum values in your data are 1.5 and 6.5.
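As a quick sanity check, the following sketch applies the general formula to trust_t1_item3 (which has a minimum of 1 and a maximum of 10) and confirms that it gives exactly the same values as subtracting from 11:

# General formula: max + min - old value. For a 1-10 item this is 10 + 1 - value,
# which is the same as 11 - value, so the comparison should return TRUE.
d |> 
    mutate(check = 10 + 1 - trust_t1_item3) |> 
    summarise(same_result = all(check == trust_t1_item3_inv))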

5. Calculate the reliability

The reliability of a scale is a measure of how consistent the items in the scale are. There are different ways to calculate reliability, but one of the most common is Cronbach’s alpha.
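For reference, for a scale with \(k\) items, Cronbach’s alpha is calculated as:

\[ \alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{i}}{\sigma^2_{total}} \right) \]

where \(\sigma^2_{i}\) is the variance of item \(i\) and \(\sigma^2_{total}\) is the variance of the total score (the sum of all items). You don’t need to compute this by hand; as we’ll see below, R does it for us.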

Cronbach’s alpha ranges from 0 to 1, where higher values indicate higher reliability. A common rule of thumb is that a value of 0.7 or higher is acceptable, but this can vary depending on the context. As with any rule of thumb, don’t blindly follow it, but think about what makes sense in your specific situation.

To calculate Cronbach’s alpha, we can use the psych package, which has a function alpha that calculates the alpha for the columns of an input data frame. So we’ll do the same thing as above, but note that this time our select statement looks different, because we need to include the inversed item.

library(psych)

d |> 
    select(trust_t1_item1,  
           trust_t1_item2, 
           trust_t1_item3_inv, 
           trust_t1_item4, 
           trust_t1_item5) |> 
    alpha()

Reliability analysis   
Call: alpha(x = select(d, trust_t1_item1, trust_t1_item2, trust_t1_item3_inv, 
    trust_t1_item4, trust_t1_item5))

  raw_alpha std.alpha G6(smc) average_r S/N  ase mean   sd median_r
      0.85      0.86    0.86      0.56 6.4 0.01  3.9 0.97     0.67

    95% confidence boundaries 
         lower alpha upper
Feldt     0.83  0.85  0.87
Duhachek  0.83  0.85  0.87

 Reliability if an item is dropped:
                   raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r
trust_t1_item1          0.79      0.80    0.80      0.51  4.1   0.0151 0.0518
trust_t1_item2          0.91      0.91    0.89      0.73 10.6   0.0059 0.0032
trust_t1_item3_inv      0.80      0.82    0.82      0.53  4.6   0.0146 0.0587
trust_t1_item4          0.80      0.82    0.82      0.54  4.6   0.0140 0.0579
trust_t1_item5          0.78      0.80    0.78      0.50  3.9   0.0157 0.0437
                   med.r
trust_t1_item1      0.48
trust_t1_item2      0.73
trust_t1_item3_inv  0.52
trust_t1_item4      0.52
trust_t1_item5      0.48

 Item statistics 
                     n raw.r std.r r.cor r.drop mean  sd
trust_t1_item1     600  0.87  0.88  0.87   0.79  4.2 1.1
trust_t1_item2     600  0.60  0.56  0.37   0.35  2.7 1.5
trust_t1_item3_inv 600  0.84  0.84  0.80   0.73  4.4 1.3
trust_t1_item4     600  0.82  0.84  0.80   0.72  3.7 1.1
trust_t1_item5     600  0.89  0.90  0.90   0.81  4.6 1.2

Non missing response frequency for each item
                      1    2    3    4    5    6    7    8 miss
trust_t1_item1     0.00 0.05 0.22 0.33 0.30 0.07 0.02 0.00    0
trust_t1_item2     0.28 0.23 0.21 0.15 0.09 0.04 0.00 0.00    0
trust_t1_item3_inv 0.01 0.04 0.18 0.29 0.28 0.13 0.06 0.01    0
trust_t1_item4     0.01 0.11 0.31 0.34 0.19 0.04 0.01 0.00    0
trust_t1_item5     0.00 0.03 0.14 0.29 0.31 0.18 0.04 0.01    0

By default, alpha only shows two digits for Cronbach’s alpha. If you want to see more digits, you can use the print function with the digits argument.

d |> 
    select(trust_t1_item1, trust_t1_item2, trust_t1_item3_inv, trust_t1_item4, trust_t1_item5) |> 
    alpha() |>
    print(digits=3)

Reliability analysis   
Call: alpha(x = select(d, trust_t1_item1, trust_t1_item2, trust_t1_item3_inv, 
    trust_t1_item4, trust_t1_item5))

  raw_alpha std.alpha G6(smc) average_r  S/N    ase mean    sd median_r
     0.848     0.864   0.862      0.56 6.37 0.0104 3.93 0.972     0.67

    95% confidence boundaries 
         lower alpha upper
Feldt    0.828 0.848 0.867
Duhachek 0.828 0.848 0.869

 Reliability if an item is dropped:
                   raw_alpha std.alpha G6(smc) average_r   S/N alpha se  var.r
trust_t1_item1         0.786     0.805   0.798     0.507  4.12  0.01509 0.0518
trust_t1_item2         0.912     0.914   0.893     0.726 10.60  0.00586 0.0032
trust_t1_item3_inv     0.797     0.821   0.817     0.534  4.59  0.01461 0.0587
trust_t1_item4         0.804     0.823   0.819     0.537  4.64  0.01400 0.0579
trust_t1_item5         0.776     0.798   0.782     0.497  3.95  0.01572 0.0437
                   med.r
trust_t1_item1     0.477
trust_t1_item2     0.729
trust_t1_item3_inv 0.525
trust_t1_item4     0.521
trust_t1_item5     0.484

 Item statistics 
                     n raw.r std.r r.cor r.drop mean   sd
trust_t1_item1     600 0.870 0.884 0.870  0.793 4.17 1.10
trust_t1_item2     600 0.599 0.558 0.367  0.349 2.66 1.46
trust_t1_item3_inv 600 0.842 0.844 0.803  0.729 4.45 1.29
trust_t1_item4     600 0.820 0.840 0.797  0.723 3.73 1.08
trust_t1_item5     600 0.888 0.900 0.902  0.814 4.65 1.19

Non missing response frequency for each item
                       1     2     3     4     5     6     7     8 miss
trust_t1_item1     0.003 0.053 0.220 0.333 0.300 0.070 0.018 0.002    0
trust_t1_item2     0.282 0.232 0.207 0.153 0.087 0.035 0.005 0.000    0
trust_t1_item3_inv 0.010 0.042 0.180 0.293 0.283 0.127 0.057 0.008    0
trust_t1_item4     0.010 0.107 0.310 0.338 0.190 0.038 0.007 0.000    0
trust_t1_item5     0.000 0.025 0.142 0.292 0.310 0.178 0.042 0.012    0

Note that this way of setting the number of digits is specific to the psych package.

This gives quite a lot of output. These are the most important parts to consider:

  1. Cronbach’s alpha: At the top we have a row that says raw_alpha, std.alpha, etc. Here we are just interested in the raw_alpha, which is the value of Cronbach’s alpha. In this case it’s 0.85, which is already very good.
  2. Reliability if an item is dropped: This part shows you what would happen to the raw_alpha (and the other reliability measures) if you dropped one of the items from the scale. In our data we see that if item2 were dropped, the raw_alpha would go up to 0.91 (from 0.85). In other words, a 4-item scale with item2 dropped would be more reliable.
  3. Item statistics: This part shows you some useful statistics about the items, like the mean and standard deviation (sd). More importantly, it also shows several scores for item-total correlation (the r in raw.r, std.r, r.cor and r.drop stands for correlation). These indicate how strongly each item is correlated with the total scale (i.e. the combination of the other items). The recommended correlation measure to look at is r.cor (correlation corrected for item overlap). In our data we see that item5 has the strongest correlation with the total scale, whereas item2 has the weakest. Notice how this is in line with the Reliability if an item is dropped part: if we dropped item2, the scale would become more reliable. (Think about why that makes sense!)

6. Remove items if necessary

If the reliability of your scale is too low, you might want to consider removing some items (if you have enough items to spare). Above we saw that if we dropped item2, the reliability of the scale would go up to 0.91. You can verify that this is indeed what happens:

d |> 
    select(trust_t1_item1, 
           trust_t1_item3_inv, 
           trust_t1_item4, 
           trust_t1_item5) |> 
    alpha()

You can verify that the raw_alpha is now 0.91. So now we can choose between either using the 4-item scale with an alpha of 0.91, or keeping the 5-item scale with an alpha of 0.85. This is a judgement call, and depends on the context of your research. Generally speaking, if you have enough items left and the improvement in alpha is not trivial, it’s a good idea to remove items to increase the reliability.

If you decide to remove an item, you should test the scale again, because it is possible that you could further improve the reliability by removing another item. In the output of alpha() for our 4-item scale we see that this is not the case.

7. Calculate the scale score

Finally, once you have a reliable scale, you can calculate the scale score. This is usually done by taking the average of the items in the scale. The simplest way to do so is to just add up the items and divide by the number of items. Let’s do this for the 4-item scale (\(\alpha = 0.91\)) that we tested above (mind the parentheses!):

d <- d |> 
    mutate(trust_t1_scale = (trust_t1_item1 + trust_t1_item3_inv + trust_t1_item4 + trust_t1_item5) / 4)

So now we have a new column trust_t1_scale that contains the scale score.
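One thing to keep in mind: with this sum-and-divide approach, a respondent who skipped any of the four items gets an NA for the scale score. Our practice data has no missing values, but if yours does, a sketch like the following uses rowMeans with na.rm = TRUE to average over the items a respondent did answer (the column name trust_t1_scale_mean is just used for illustration):

# Average over the non-missing items for each respondent
d <- d |> 
    mutate(trust_t1_scale_mean = rowMeans(
        pick(trust_t1_item1, trust_t1_item3_inv, trust_t1_item4, trust_t1_item5),
        na.rm = TRUE))

Whether averaging over the remaining items is appropriate when some are missing is itself a judgement call.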

Remember that in the practice data we already had a column trust_t1 for the scale. This is how that column was created. You can verify this by correlating the new scale with the old one. If you did everything correctly, the correlation should be 1.

cor.test(d$trust_t1, d$trust_t1_scale)

References

Strömbäck, Jesper, Yariv Tsfati, Hajo Boomgaarden, Alyt Damstra, Elina Lindgren, Rens Vliegenthart, and Torun Lindholm. 2020. “News Media Trust and Its Impact on Media Use: Toward a Framework for Future Research.” Annals of the International Communication Association 44 (2): 139–56.