## Drawing Confidence Ellipses and Ellipsoids

I’ve seen some really bad methods for drawing confidence ellipsoids recently, they all seem to make it really complicated and confusing (and specific). So I thought I would show how to calculate points on an ellipse corresponding to a covariance matrix – this method works for any number of dimensions without any need to change it.

For all those that don’t care why, the method to generate the points of an ellipsoid is as follows:

1) make a unit n-sphere (which for 2D is a circle with radius 1), call these points X:

If it is an elipse you want, make a matrix with columns of $\sin(\theta)$ and $\cos(\theta)$ for some incrementing $\theta$ values between $0$ and $2\pi$(=)

2) apply the following linear transformation to get the points of your ellipsoid (Y):

$Y = M + kC(\Sigma)X$

where M is the vector of the means (center of the ellipsoid) and $\Sigma$ is the covariance matrix. C represents the Cholesky Decomposition, sort of a matrix square root. k is the number of standard deviations at which one wishes to draw the ellipse

The Cholesky decomposition can be accessed as, “numpy.linalg.cholesky” in Python, “Cholesky” in R (matrix package), “chol” in MATLAB and “spotrf” (amongst others, I think) in LAPACK

For those who care, here is why this works…

The general formula for an ellipsoid (in Y) is (as seen in the exponent of the multivariate normal distribution):
$(Y - M)^T \Sigma^{-1} (Y-M) = 1$
One can consider points on the ellipsoid as a linear transformation from points on a unit sphere/circle (X).
$Y = A + BX$
It’s clear that the center of the ellipse should be the mean, and that B cannot contribute to a translation of the origin, so A = M. Plug this into the equation above, and that into the one above it:
$(BX)^T \Sigma^{-1} (BX) = 1$
expanding the BX brackets gives
$X^TB^T \Sigma^{-1} (BX) = 1$
and premultiplying by $X$ and postmultiplying by $X^T$
$XX^TB^T \Sigma^{-1} BXX^T = XX^T$
X, being a point on the unit sphere satisfies $XX^T = 1$ for each point. So
$B^T \Sigma^{-1} B = 1$
premultiplying by $(B^T)^{-1}$ and postmultiplying by $B^{-1}$ then gives
$\Sigma^{-1} = (B^T)^{-1}B^{-1}$
$= (B B^T)^{-1}$
$\Sigma = BB^T$
The Cholesky decomposition is just the name of a process which finds a value of B which satisfies this equation. There is no reason why X cannot be a list of points. Scale B by whatever factor you deem appropriate for your confidence interval – without scaling it is ellipse with standard deviation of one.