A/B controlled trial design in the big data era



Controlled trials are used ubiquitously to investigate the effect of an A/B treatment factor on a response variable of interest. A typical example is a clinical trial in which a set of subjects is selected for the study: half are administered some treatment (A), while the other half receive standard care (B). The subjects’ responses are then observed and compared between the two groups. The same design is used outside medicine, for example in a study designed to examine the effects of a promotion on purchasing behavior.
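The basic comparison can be sketched in a few lines of Python. This is only an illustration: the function name run_ab_trial, the respond callback, and the use of a difference in mean responses as the comparison are assumptions made for this sketch, not details taken from any particular study.

```python
import random
import statistics


def run_ab_trial(subjects, respond, seed=0):
    """Randomly split subjects in half and compare mean responses.

    `respond(subject, group)` is a hypothetical callback that returns the
    observed response of one subject under treatment "A" or "B".
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random allocation guards against selection bias

    half = len(shuffled) // 2
    group_a, group_b = shuffled[:half], shuffled[half:]

    responses_a = [respond(s, "A") for s in group_a]
    responses_b = [respond(s, "B") for s in group_b]

    # Estimated treatment effect: mean response under A minus mean under B.
    return statistics.mean(responses_a) - statistics.mean(responses_b)
```

With real data, respond would look up or measure each subject's outcome, and the raw difference would normally be accompanied by a significance test or confidence interval.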

In the era of big data, it is now common to have databases that contain rich covariate information for a large population of potential subjects, and the effect of the treatment often depends on those covariates. Researchers approached this problem by asking a fundamental question: “How can big databases of subject covariate information be leveraged to design more efficient and powerful controlled trials?” To answer it, they developed a framework for selecting a small but informative subset of subjects from the database. The subjects are selected and allocated to either the A or B group with the objective of maximizing the expected information content in the data that will be collected.
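One way to make "maximizing the expected information content" concrete is a greedy D-optimal design. The sketch below is not the researchers' actual framework, which the article does not describe in detail. It assumes a linear response model with treatment-by-covariate interactions and uses the determinant of the information matrix as the information measure. The names features and greedy_d_optimal and the ridge parameter are hypothetical.

```python
import numpy as np


def features(X, t):
    """Model terms for every subject under treatment code t (+1 = A, -1 = B):
    intercept, covariates, treatment effect, treatment-by-covariate terms."""
    n = X.shape[0]
    return np.hstack([np.ones((n, 1)), X, np.full((n, 1), t), t * X])


def greedy_d_optimal(X, n_select, ridge=1e-3):
    """Greedily pick (subject, group) pairs that most increase det(M),
    where M = ridge * I + sum of f f^T over the selected pairs."""
    n, p = X.shape
    candidates = {"A": features(X, 1.0), "B": features(X, -1.0)}
    M_inv = np.eye(2 * p + 2) / ridge  # inverse of the initial ridge * I
    available = np.ones(n, dtype=bool)
    design = []

    for _ in range(n_select):
        best_gain, best_i, best_group = -np.inf, None, None
        for group, F in candidates.items():
            # det(M + f f^T) = det(M) * (1 + f^T M^{-1} f), so the pair with
            # the largest quadratic form gives the largest determinant.
            gain = np.sum((F @ M_inv) * F, axis=1)
            gain[~available] = -np.inf
            i = int(np.argmax(gain))
            if gain[i] > best_gain:
                best_gain, best_i, best_group = gain[i], i, group

        f = candidates[best_group][best_i]
        Mf = M_inv @ f
        M_inv -= np.outer(Mf, Mf) / (1.0 + f @ Mf)  # Sherman-Morrison update
        available[best_i] = False  # each subject is enrolled at most once
        design.append((best_i, best_group))

    return design


# Usage on a synthetic "database" of 200 subjects with 2 covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
design = greedy_d_optimal(X, n_select=10)
```

Because the selection uses only covariates, it can be run before any response is collected. A greedy search is a heuristic and is not guaranteed to find the globally optimal design.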