probability-theory

Definition

Covariance

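The covariance between two random variables $X$ and $Y$ is a measure of alignment and joint variability, and is defined as:

$$\operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big]$$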
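For a finite set of paired observations, the definition can be evaluated directly as the mean of the products of deviations from the respective means. The following is a minimal sketch under that assumption; the function name `covariance` and the sample data are illustrative, and the sum is divided by $n$ (treating the data as the whole distribution) rather than by $n - 1$.

```python
def covariance(xs, ys):
    """Mean of the products of deviations from the respective means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average the product (x - mean_x) * (y - mean_y) over all pairs.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Illustrative data: ys is exactly 2 * xs, so the two move together.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
cov_xy = covariance(xs, ys)
print(cov_xy)  # 2.5
```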

Properties

Expanded Form

$$\operatorname{Cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$$
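The expanded form follows from the definition by linearity of expectation. A short derivation, writing $\mu_X = \mathbb{E}[X]$ and $\mu_Y = \mathbb{E}[Y]$ (shorthand assumed here for readability):

$$
\begin{aligned}
\operatorname{Cov}(X, Y)
  &= \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big] \\
  &= \mathbb{E}\big[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y\big] \\
  &= \mathbb{E}[XY] - \mu_Y\,\mathbb{E}[X] - \mu_X\,\mathbb{E}[Y] + \mu_X \mu_Y \\
  &= \mathbb{E}[XY] - \mu_X \mu_Y \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]
\end{aligned}
$$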

Symmetry

The covariance is symmetric for any real-valued random variables $X, Y$:

$$\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$$

Affine Linearity

The covariance is affine linear for any real-valued random variables $X, Y$ and real-valued scalars $a, b$:

$$\operatorname{Cov}(aX + b, Y) = a \operatorname{Cov}(X, Y)$$

Bilinear Form

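The covariance is a bilinear form in $(X, Y)$ for any real-valued random variables $X, Y, Z$:

$$\operatorname{Cov}(X + Z, Y) = \operatorname{Cov}(X, Y) + \operatorname{Cov}(Z, Y)$$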

(i.e. additivity in the first argument)
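The algebraic properties in this section can be spot-checked on a small data set. The sketch below is a sanity check rather than a proof: it assumes empirical (divide-by-$n$) covariance, uses exact `Fraction` arithmetic so that equalities hold without rounding, and checks symmetry, the scaling and shift rule $\operatorname{Cov}(aX + b, Y) = a \operatorname{Cov}(X, Y)$, additivity in the first argument, and the expanded form. The helper names `mean` and `cov` and the data values are illustrative.

```python
from fractions import Fraction


def mean(values):
    return sum(values, Fraction(0)) / len(values)


def cov(xs, ys):
    """Empirical covariance with exact rational arithmetic."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


# Arbitrary illustrative samples of three variables and two scalars.
X = [Fraction(v) for v in (1, 2, 4, 7)]
Y = [Fraction(v) for v in (3, 1, 5, 2)]
Z = [Fraction(v) for v in (0, 6, 2, 2)]
a, b = Fraction(3), Fraction(-5)

# Symmetry: Cov(X, Y) == Cov(Y, X)
symmetric = cov(X, Y) == cov(Y, X)

# Scaling and shift: Cov(aX + b, Y) == a * Cov(X, Y)
affine = cov([a * x + b for x in X], Y) == a * cov(X, Y)

# Additivity in the first argument: Cov(X + Z, Y) == Cov(X, Y) + Cov(Z, Y)
additive = cov([x + z for x, z in zip(X, Z)], Y) == cov(X, Y) + cov(Z, Y)

# Expanded form: Cov(X, Y) == E[XY] - E[X] E[Y]
expanded = cov(X, Y) == mean([x * y for x, y in zip(X, Y)]) - mean(X) * mean(Y)

print(symmetric, affine, additive, expanded)
```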

Independence

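If two random variables $X$ and $Y$ are independent, i.e. $X \perp Y$, then:

$$\operatorname{Cov}(X, Y) = 0$$

The converse, $\operatorname{Cov}(X, Y) = 0 \implies X \perp Y$, is false in general.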

Example

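For example, consider $X \sim \mathcal{N}(0, 1)$ and $Y = X^2$. Since $Y$ is a (non-trivial) function of $X$, the two random variables are not independent, but:

$$\operatorname{Cov}(X, Y) = \mathbb{E}[X^3] - \mathbb{E}[X]\,\mathbb{E}[X^2] = 0 - 0 \cdot 1 = 0$$

Therefore, $\operatorname{Cov}(X, Y) = 0 \;\not\!\!\implies X \perp Y$.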
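The same effect can be verified exactly on a small discrete distribution. The sketch below assumes a toy setup chosen for illustration: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. It computes the covariance via $\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$ and then checks the product rule for independence on the event $\{X = 0, Y = 0\}$.

```python
from fractions import Fraction

# Assumed toy distribution: X uniform on {-1, 0, 1}, Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)  # probability of each value of X

e_x = sum(p * x for x in support)            # E[X]
e_y = sum(p * x**2 for x in support)         # E[Y] = E[X^2]
e_xy = sum(p * x * x**2 for x in support)    # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = sum(p for x in support if x == 0)
p_y0 = sum(p for x in support if x**2 == 0)
p_joint = sum(p for x in support if x == 0 and x**2 == 0)
independent = p_joint == p_x0 * p_y0

print(cov_xy)       # 0
print(independent)  # False
```

The covariance is exactly zero, while the joint probability $1/3$ differs from the product $1/9$, so the variables are uncorrelated but dependent.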

Linear Dependence

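The product term $(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])$ tells us whether $X$ and $Y$ are on the same side of their respective means.

We can analyse five possible scenarios for any given data point $(x, y)$:

  1. $x > \mathbb{E}[X]$ and $y > \mathbb{E}[Y]$: both deviations are positive, so the product is positive.
  2. $x < \mathbb{E}[X]$ and $y < \mathbb{E}[Y]$: both deviations are negative, so the product is positive.
  3. $x > \mathbb{E}[X]$ and $y < \mathbb{E}[Y]$: the deviations have opposite signs, so the product is negative.
  4. $x < \mathbb{E}[X]$ and $y > \mathbb{E}[Y]$: the deviations have opposite signs, so the product is negative.
  5. $x = \mathbb{E}[X]$ or $y = \mathbb{E}[Y]$: at least one deviation is zero, so the product is zero.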

The covariance is the expectation of this product over all possible outcomes.

  1. Case: strong positive linear dependence:
    • If $X$ and $Y$ are positively related, they tend to move together. This means scenarios 1 and 2 (above, where the product is positive) will happen much more often than scenarios 3 and 4. When taking the expectation, the many positive terms will outweigh the few negative terms.
    • Result: $\operatorname{Cov}(X, Y)$ will be a large positive number.
  2. Case: strong negative linear dependence:
    • If $X$ and $Y$ are negatively related, they tend to move in opposite directions. This means scenarios 3 and 4 (above, where the product is negative) will happen much more often than scenarios 1 and 2. When taking the expectation, the many negative terms will outweigh the few positive terms.
    • Result: $\operatorname{Cov}(X, Y)$ will be a large negative number.
  3. Case: no linear dependence:
    • If $X$ and $Y$ have no linear relationship, scenarios 1 to 4 (above) are roughly balanced. The positive products from scenarios 1 and 2 will be cancelled out, on average, by the negative products from scenarios 3 and 4 (and vice versa).
    • Result: $\operatorname{Cov}(X, Y)$ will be close to zero.
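The three cases can be illustrated by counting the sign of each product of deviations in a small data set. This is a minimal sketch with hand-picked, illustrative points; the helper name `sign_counts_and_cov` is hypothetical, and the covariance is the empirical (divide-by-$n$) version.

```python
def sign_counts_and_cov(points):
    """Return (#positive, #negative, #zero products, covariance) for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # One product of deviations per data point.
    products = [(x - mean_x) * (y - mean_y) for x, y in points]
    positive = sum(1 for p in products if p > 0)  # same side of both means
    negative = sum(1 for p in products if p < 0)  # opposite sides of the means
    zero = n - positive - negative                # a coordinate sits on its mean
    return positive, negative, zero, sum(products) / n


increasing = [(1, 2), (2, 4), (3, 6), (4, 8)]             # y = 2x
decreasing = [(1, 8), (2, 6), (3, 4), (4, 2)]             # y = 10 - 2x
parabola = [(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)]     # y = x**2

print(sign_counts_and_cov(increasing))  # (4, 0, 0, 2.5)
print(sign_counts_and_cov(decreasing))  # (0, 4, 0, -2.5)
print(sign_counts_and_cov(parabola))    # (2, 2, 1, 0.0)
```

In the last data set the positive and negative products cancel exactly, even though $y$ is fully determined by $x$.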

The covariance measures the linear dependence between $X$ and $Y$.