New research led by a University of Illinois Urbana-Champaign expert who studies personnel psychology shows that the “graded forced-choice format” offers a better way to assess noncognitive attributes such as a job candidate’s personality and vocational interests.
The scientific study of a person’s soft skills relies heavily on self-reported measures. For example, respondents are often shown a series of statements describing typical behaviors, feelings or thoughts, such as “I am the life of the party,” and are asked to indicate how much they agree with each statement on a graded scale from 1 to 5. This format, in which 1 equates to “strongly disagree” and 5 to “strongly agree,” is known as the Likert rating scale.
But despite being easy to develop and score, Likert rating scales are known to be prone to “faking” and to various rating biases that can call into question the validity of the scores they produce, said Bo Zhang, a professor of labor and employment relations and of psychology at Illinois and the lead author of the paper.
“In job application contexts, most people will go out of their way to present the best version of themselves to maximize their chance of getting a job – hence why it’s called ‘faking,’” Zhang said. “Almost everyone has the motivation to fake, and almost all the applicants know how to exaggerate their fit for a job, for example. If you’re a recruiter, that’s a serious problem.”
Even in low-stakes contexts such as participating in research, some respondents tend to choose the extreme options (strongly agree or strongly disagree), while others are more modest and favor the non-extreme options, regardless of the statement’s content, Zhang said.
“It could mean that two wholly different people may have essentially the same scores just because they have very idiosyncratic ways of using the response options,” he said. “Either way, you’re not getting an accurate measurement.”
To overcome those deficiencies, the researchers looked to the forced-choice format as a remedy.
“Instead of asking respondents to rate how much they agree with one statement, we gave them two statements simultaneously and then asked them to choose which one is more like them,” Zhang said. “With the forced-choice format, there’s no way for respondents to display what we call ‘response biases.’ If two statements are further matched on social desirability, it also becomes much harder to fake. People have to respond according to their true selves, because they’re forced to choose from two equally desirable statements.”
But the traditional forced-choice format isn’t perfect, Zhang said.
“Compared to Likert rating scales, it often produces less reliable scores and people often find it harder to respond to,” he said.
Using data from two samples of more than 4,000 respondents, the researchers found a promising alternative – the graded forced-choice format, which “preserved the advantages of traditional forced-choice measures and improved reliability and people’s feelings,” Zhang said.
The graded forced-choice allows respondents to express finer differentiations about their preference for each statement on a graded scale, such as “1 = A is much more like me,” “2 = A is slightly more like me,” “3 = A and B are equally like me,” “4 = B is slightly more like me,” and “5 = B is much more like me,” according to the paper.
The findings show that the graded forced-choice format produces more reliable scores than the dichotomous forced-choice format, and respondents perceive it as less difficult. It is also less susceptible to response biases and harder to fake than Likert rating scales, Zhang said.
“We also recommend the inclusion of a middle response option such as ‘A and B are equally like me’ when using the graded forced-choice format,” he said.
The research has implications for cross-cultural comparisons.
“In large-scale international projects, for example, respondents in Asian cultures are more likely to use the non-extreme response options compared to their Western counterparts,” Zhang said. “So if you’re comparing data cross-culturally, the comparison might be invalid because of the existence of response bias in Likert rating scales. The forced-choice format, especially the graded one, can effectively reduce response biases and thus lead to more reliable and accurate cross-cultural comparisons.”
Zhang’s co-authors are Jing Luo of the Feinberg School of Medicine at Northwestern University and Jian Li of the Faculty of Psychology at Beijing Normal University.
The paper will be published in the journal Multivariate Behavioral Research.