
chaosmosis t1_j0icgvf wrote

As an example, imagine that Bob and Susan are estimating the height of a dinosaur, and Bob's errors are exaggerated versions of Susan's: if Susan underestimates its height by ten feet, Bob underestimates it by twenty; if Susan overestimates by thirty feet, Bob overestimates by forty. You can "artificially construct" a new prediction by taking the difference between Bob's prediction and Susan's, flipping its sign, and adding it to her prediction (i.e., mirroring Bob's prediction across hers). Then you conduct ordinary linear averaging of the constructed prediction with Susan's prediction.
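In code, the construct-and-average step might look like this (a minimal sketch; the dinosaur numbers and the "Bob's error is double Susan's" model are just the illustrative setup from above):

```python
def combine(susan, bob):
    """Mirror Bob's prediction across Susan's, then average with Susan's."""
    constructed = susan - (bob - susan)   # flip the sign of Bob's offset from Susan
    return (susan + constructed) / 2      # ordinary linear averaging

true_height = 100.0
susan = true_height - 10   # Susan underestimates by 10 ft
bob = true_height - 20     # Bob's error is an exaggerated version: -20 ft

print(combine(susan, bob))  # 95.0 -- error shrinks from -10 ft to -5 ft
```

The same happens on the overestimate side: with Susan at +30 ft and Bob at +40 ft, the combined error drops to +25 ft.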

Visually, you can think of it like this: normal averaging draws a straight line between two models' outputs in R^n and chooses some point between them, while control variates extend that line in both directions and let you choose a more extreme point.

It's a little more complicated with more than two predictors, or when issuing predictions in more than one dimension, but not by much. Intuitively, you have to avoid "overcounting" shared relationships between the predictors' errors when building a flipped predictor. This is why the financial portfolio framework is helpful: portfolio managers are already used to thinking about correlations between lots of different investments.
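The portfolio analogy can be made concrete (a sketch; the covariance numbers are illustrative assumptions, not from the example above). For unbiased predictors with error covariance matrix C, the classic minimum-variance portfolio weights are w = C⁻¹1 / (1ᵀC⁻¹1); the off-diagonal terms of C are exactly what keeps correlated errors from being overcounted:

```python
import numpy as np

def min_variance_weights(cov):
    """Weights summing to 1 that minimize the combined error variance,
    i.e., the minimum-variance portfolio over the predictors' errors."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # C^{-1} 1
    return raw / raw.sum()            # normalize so weights sum to 1

# Susan: error variance 1. Bob: variance 4, strongly correlated with Susan
# (covariance 1.8, i.e., correlation 0.9).
cov = np.array([[1.0, 1.8],
                [1.8, 4.0]])
w = min_variance_weights(cov)
print(w)            # Susan gets a weight above 1, Bob a negative weight
print(w @ cov @ w)  # combined error variance, below Susan's 1.0 alone
```

Note that the optimal weights are again more extreme than plain averaging: Bob's exaggerated, correlated errors earn him a negative weight, just like shorting an asset in a portfolio.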

The tl;dr version is, you want models with errors that balance each other out.