Experimental congruence of interval scale production from paired comparisons and ranking for image evaluation

John C. Handley; Jason S. Babcock; Jeff B. Pelz

doi:10.1117/12.526623

18 December 2003 Experimental congruence of interval scale production from paired comparisons and ranking for image evaluation

John C. Handley, Jason S. Babcock, Jeff B. Pelz

Author Affiliations +

Proceedings Volume 5294, Image Quality and System Performance; (2003) https://doi.org/10.1117/12.526623
Event: Electronic Imaging 2004, 2004, San Jose, California, United States

Abstract

Image evaluation tasks are often conducted using paired comparisons or ranking. To elicit interval scales, both methods rely on Thurstone's Law of Comparative Judgment in which objects closer in psychological space are more often confused in preference comparisons by a putative discriminal random process. It is often debated whether paired comparisons and ranking yield the same interval scales. An experiment was conducted to assess scale production using paired comparisons and ranking. For this experiment a Pioneer Plasma Display and Apple Cinema Display were used for stimulus presentation. Observers performed rank order and paired comparisons tasks on both displays. For each of five scenes, six images were created by manipulating attributes such as lightness, chroma, and hue using six different settings. The intention was to simulate the variability from a set of digital cameras or scanners. Nineteen subjects, (5 females, 14 males) ranging from 19-51 years of age participated in this experiment. Using a paired comparison model and a ranking model, scales were estimated for each display and image combination yielding ten scale pairs, ostensibly measuring the same psychological scale. The Bradley-Terry model was used for the paired comparisons data and the Bradley-Terry-Mallows model was used for the ranking data. Each model was fit using maximum likelihood estimation and assessed using likelihood ratio tests. Approximate 95% confidence intervals were also constructed using likelihood ratios. Model fits for paired comparisons were satisfactory for all scales except those from two image/display pairs; the ranking model fit uniformly well on all data sets. Arguing from overlapping confidence intervals, we conclude that paired comparisons and ranking produce no conflicting decisions regarding ultimate ordering of treatment preferences, but paired comparisons yield greater precision at the expense of lack-of-fit.

Citation Download Citation

John C. Handley, Jason S. Babcock, and Jeff B. Pelz "Experimental congruence of interval scale production from paired comparisons and ranking for image evaluation", Proc. SPIE 5294, Image Quality and System Performance, (18 December 2003); https://doi.org/10.1117/12.526623

ACCESS THE FULL ARTICLE

INSTITUTIONAL
Select your institution to access the SPIE Digital Library.

SELECT YOUR INSTITUTION

PERSONAL
Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.

PERSONAL SIGN IN

No SPIE Account? Create one

PURCHASE THIS CONTENT

SUBSCRIBE TO DIGITAL LIBRARY

50 downloads per 1-year subscription

Members: $195

Non-members: $335 ADD TO CART

25 downloads per 1 - year subscription

Members: $145

Non-members: $250 ADD TO CART

PURCHASE SINGLE ARTICLE

Includes PDF, HTML & Video, when available