Comparisons Are Odious

In our daily “Data” section we typically analyze the numbers that create and support market narratives. The usual themes of our work: there’s no such thing as a perfect dataset, and labels usually mislead. Every commonly used measure of market/economic activity is like an onion. Lots of layers, and there are usually a few tears along the way.

But in an ever more data-centric world, attention is now shifting to HOW companies, researchers and academics gather data. Technology businesses are one epicenter of that debate, of course. “Free” services like Google, Facebook, and Twitter carry no cost because the data users hand over is valuable to advertisers. That works well – too well, some would say – but the questions of privacy and appropriateness are now front-and-center issues.

Scrolling through the most-read articles of the Proceedings of the National Academy of Sciences last night, we came across this in the #3 spot: “Objecting to experiments that compare 2 unobjectionable policies or treatments”. Here’s how this study relates to the collection of data:

  • Researchers across a wide array of disciplines, including health care, finance, medicine and education, use randomized experiments to test what works and what doesn’t.
  • The most common structure for these experiments is an “A/B test”, where similar populations are exposed to 2 different stimuli. Researchers then compare outcomes between the two groups to determine the more effective option.
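
For readers who want to see the mechanics, here is a minimal sketch of an A/B test in Python. It is our own illustration, not anything taken from the PNAS paper: the function names (assign_groups, two_proportion_z), the sample sizes and the success counts are all hypothetical, and the two-proportion z-test is just one common way to compare two groups.

```python
import math
import random


def assign_groups(subjects, seed=0):
    """Randomly split subjects into two groups, A and B (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Compare success rates with a pooled two-proportion z-test.

    Returns (rate_b - rate_a, z statistic, two-sided p-value).
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Illustrative numbers only: 1,000 subjects, made-up outcome counts.
group_a, group_b = assign_groups(range(1000))
diff, z, p_value = two_proportion_z(120, len(group_a), 150, len(group_b))
print(f"difference={diff:.3f}, z={z:.2f}, p={p_value:.3f}")
```

The random shuffle is the step that makes the two groups comparable, and it is also the step that survey respondents in the paper tend to find objectionable.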

So far, so good, or so it seems. A/B testing is an efficient and statistically rigorous way to unearth new solutions with hard-nosed data.

Except it’s actually not so simple, because what the paper describes is that people really don’t like A/B testing even when they have no problem with “A” or “B”. A few details from the study:

  • The paper’s authors surveyed 5,873 subjects with a wide range of individual educational attainment about possible studies related to everything from autonomous vehicle design to poverty reduction and health care practices.
  • They found “that people frequently rate A/B tests designed to establish the comparative effectiveness of two policies or treatments as inappropriate even when universally implementing either A or B, untested, is seen as appropriate.”
  • This “A/B effect” (a strong dislike for the direct, randomized comparison of 2 options) is “as strong among those with higher educational attainment and science literacy and among relevant industry professionals”.

So why don’t people – even trained scientists – like a study format that is definitively superior to just picking one option or the other? The paper posits a few possibilities:

  • An “aversion to randomization”. It is unfair/unethical to randomly give half a subject population something that may be inferior to the other option.
  • “Implied absence of informed consent”. The subjects in an experiment don’t actually know they are being exposed to something that may or may not work relative to another option.
  • The “mad scientist”. Individuals who have their own agendas might use A/B testing to less-than-ethical ends.
  • “Experts should already know”. Rather than do an A/B test, experts in the field should just implement “A” or “B” based on their best judgment.

Now, all this may seem purely theoretical, but the fact that this article is so widely read on the PNAS website right now suggests it is relevant to many areas and holds some common lessons:

  • People, as a rule, do not like to feel that someone else is experimenting on them without their prior explicit consent.
  • “Fairness” is a still-underappreciated factor, even in the sciences. Randomly assigning people to “A” or “B” classes can easily come across as arbitrary. We often write about the Ultimatum Game, another example of how humans put fairness above simple constructs like marginal utility.
  • The importance of trust and perceived expertise. The study’s findings were robust across all educational levels, showing that people broadly prefer human judgment to randomized experimentation. That’s not to say they trust it implicitly; it’s just better in their eyes than perceived unfairness.

The upshot of all this: the process of data collection, especially in an increasingly technological world, is a minefield. How humans judge the process is actually more important than developing the “best” approach. A/B testing may be effective, for example, but it is an uncomfortable fit with what we value. No wonder regulating Tech companies is a powerful political message just now. And it is clearly an evergreen topic.

PNAS Paper: