Blog: DIY haircuts and dodgy psychometrics
Like many, I tried to give myself a DIY haircut when the COVID-19 restrictions were announced. My hair was a bit longer than I liked. I could rub a load of hair wax into it to make it look presentable, but it seemed a lot of effort, especially when I wasn’t bothering to change into “outdoor” clothes most days.
It didn’t go well.
I soon realised that my old €10 beard trimmer wasn’t up to the task. I continued hacking away until it was an even, fluffy one-all-over. This was tidier at least, but I couldn’t do anything with it. Ah well.
A couple of days later, a paper I wrote with colleagues some time ago was published:
Perry, J. L., Temple, E. C., Worrell, F. C., Zivkovic, U., Mello, Z. R., Musil, B., … McKay, M. T. (2020). Different Version, Similar Result? A Critical Analysis of the Multiplicity of Shortened Versions of the Zimbardo Time Perspective Inventory. SAGE Open. https://doi.org/10.1177/2158244020923351
It got me thinking: there is a similarity between cutting your own hair and creating shortened versions of psychometric scales. You can tidy things up, but it’s rarely a satisfactory job and it becomes difficult to do much with the result.
The Zimbardo Time Perspective Inventory (ZTPI; Zimbardo & Boyd, 1999) certainly has psychometric problems. That’s OK: it’s an interesting concept, and it may well be possible to keep developing the idea and its measurement to achieve something really useful. Many researchers, however, have adopted the approach of simply removing items until they achieve a satisfactory model fit, that is, until the observed data “fit” the hypothesized questionnaire structure well. This has led to a host of shortened versions. These typically produce a better model fit, but at what cost?
Firstly, the reliability of shorter scales is, all else being equal, reduced. This is because the same level of uncertainty or error in any one item has a greater impact when there are fewer items to average it out. If I missed a bit when cutting my own hair, the bit I missed would stand out more the shorter I cut the rest of it.
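To put rough numbers on the reliability point, here is a minimal sketch using the Spearman-Brown prophecy formula, which predicts reliability when a scale is lengthened or shortened. It assumes the items are roughly interchangeable (parallel), which real ZTPI items may well not be, and the 10-item subscale with a reliability of 0.80 is a made-up illustration, not a figure from our paper.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when scale length is multiplied by length_factor.

    Assumes parallel items; length_factor < 1 means the scale is shortened.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical starting point (illustrative only, not taken from the paper).
original_items = 10
original_reliability = 0.80

for n_items in (10, 8, 5, 3):
    predicted = spearman_brown(original_reliability, n_items / original_items)
    print(f"{n_items:2d} items: predicted reliability = {predicted:.2f}")

# Prints:
# 10 items: predicted reliability = 0.80
#  8 items: predicted reliability = 0.76
#  5 items: predicted reliability = 0.67
#  3 items: predicted reliability = 0.55
```

Under these assumptions, halving the scale takes a respectable 0.80 down to about 0.67. Dropping only the worst items can soften the loss, but the direction of travel is the same.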
Secondly, the distribution is flattened. See the attached figure from the supplementary material in our paper as an illustration (bottom of the page). Each line here represents a different version of the ZTPI, with the numbers in the key giving the number of items in each version. The original ZTPI is the 56-item version, which has the highest peak. As the scales get shorter, the mean values (x-axis) remain pretty much the same, but the curves become flatter.
Let’s remember the point of the questionnaire existing in the first place: to measure an otherwise unobservable construct. That measurement then helps us to determine relationships or differences, whether between groups or over time. The flatter the distribution, the more overlap we will likely see when comparing mean scores. Therefore, by shortening the questionnaire to achieve model fit, we have created (a) a less reliable measure and (b) a less sensitive one.
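The link between a flatter curve and a less sensitive measure can be sketched with a toy model. This is not the analysis from our paper: it simply assumes each item response is a person's true score plus independent noise, and that the scale score is the mean of the items. The function names and all the numbers (true SD, item error SD, true group difference) are hypothetical choices for illustration.

```python
import math


def scale_sd(n_items: int, true_sd: float = 0.5, item_error_sd: float = 1.0) -> float:
    """SD of a mean-of-items scale score under a true-score-plus-noise model.

    Averaging n_items shrinks the error variance by a factor of n_items,
    so fewer items leave more noise and a wider (flatter) distribution.
    """
    return math.sqrt(true_sd**2 + item_error_sd**2 / n_items)


def observed_d(n_items: int, true_mean_diff: float = 0.25,
               true_sd: float = 0.5, item_error_sd: float = 1.0) -> float:
    """Standardized group difference (Cohen's d) as seen through the scale."""
    return true_mean_diff / scale_sd(n_items, true_sd, item_error_sd)


# Same true group difference, measured with progressively shorter scales.
for n_items in (56, 20, 10, 5):
    print(f"{n_items:2d} items: SD = {scale_sd(n_items):.3f}, "
          f"observed d = {observed_d(n_items):.3f}")
```

In this sketch the group means do not move at all as items are removed. Only the spread grows, so the same real difference shows up as a smaller standardized effect and the two groups overlap more.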
Much like my DIY haircut, a shortened scale looks neater from a distance, but on closer inspection the errors stand out and you can’t really do anything with it. It’s much better to do a proper job. For questionnaires, this means re-writing items, not simply continuing to cut. For hair, leave it to the professionals.
John Perry, Head of Department of Psychology