Covariance matrix estimation

Meva provides two methods to estimate the covariance matrix of log asset returns: cov_pca, based on principal component analysis (PCA), and cov_fa, based on factor analysis. Both functions return B and d such that the covariance matrix V is given by

V = \mbox{cov}(y) = BB' + \mbox{diag}(d),

where B is an n\times k matrix mapping k factors to n assets, d is…
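As a small illustration of the formula above, the following sketch rebuilds V from a given B and d with NumPy. The helper name reconstruct_cov and the numeric values are hypothetical and are not part of Meva; d is assumed to be a length-n vector.

```python
import numpy as np

def reconstruct_cov(B, d):
    """Return V = B B' + diag(d) for an (n, k) matrix B and a length-n vector d."""
    B = np.asarray(B, dtype=float)
    d = np.asarray(d, dtype=float)
    return B @ B.T + np.diag(d)

# Hypothetical values for illustration: n = 3 assets, k = 1 factor.
B = np.array([[0.2], [0.1], [0.3]])
d = np.array([0.01, 0.02, 0.03])
V = reconstruct_cov(B, d)
```

The off-diagonal entries of V come only from the shared factors in B, while d adds asset-specific variance on the diagonal.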

cov_pca (PCA-based)

The cov_pca function uses principal component analysis to fit a factor-model decomposition of market variability. This model decomposes the market return y into two parts:

y = Bx + w.

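Here, B is an n\times k matrix mapping k factors to n assets, x is a draw from a k-dimensional standard normal distribution (iid components), and w is an independent draw from a normal distribution with diagonal covariance, in which the variance of the j-th component is d_j. Because x and w are independent, \mbox{cov}(y) = BB' + \mbox{diag}(d), which is the matrix V defined above.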
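The model can be checked numerically by simulating it. The sketch below draws many samples of y = Bx + w and compares the sample covariance with BB' + diag(d). The sizes, seed, and parameter values are arbitrary assumptions chosen for illustration, and w is taken to have zero mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 4, 2, 200_000  # hypothetical sizes: assets, factors, samples

# Hypothetical model parameters.
B = rng.normal(scale=0.1, size=(n, k))
d = rng.uniform(0.01, 0.02, size=n)

# One row per draw: x is standard normal, w has variance d_j in component j.
x = rng.standard_normal((T, k))
w = rng.standard_normal((T, n)) * np.sqrt(d)
y = x @ B.T + w

V_model = B @ B.T + np.diag(d)
V_sample = np.cov(y, rowvar=False)
max_abs_err = np.max(np.abs(V_sample - V_model))
```

With this many samples, max_abs_err should be small relative to the entries of V_model, and it shrinks further as T grows.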

The cov_pca function handles missing data, coded as nan.
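One common way to obtain B and d from a PCA is to keep the top k eigenpairs of the sample covariance as the factor part and assign the remaining per-asset variance to d. The sketch below does this for complete data only. It is an assumption about the general approach, not Meva's cov_pca: the function name pca_factor_cov is hypothetical and there is no nan handling.

```python
import numpy as np

def pca_factor_cov(Y, k):
    """PCA-style factor fit for complete data Y of shape (T, n).

    Returns B of shape (n, k) and d of shape (n,) such that
    B B' + diag(d) matches the sample covariance on the diagonal.
    """
    S = np.cov(Y, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    # Scale each retained eigenvector by the square root of its eigenvalue.
    B = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))
    # Residual variance per asset; clip tiny negative rounding errors.
    d = np.maximum(np.diag(S) - np.sum(B**2, axis=1), 0.0)
    return B, d

# Hypothetical data for illustration.
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))
B_pca, d_pca = pca_factor_cov(Y, k=2)
```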

cov_fa (factor-analysis-based)

The cov_fa function fits the same factor model to the market returns. Instead of using principal component analysis to decompose the variance, it uses the EM (expectation-maximization) algorithm to iteratively find a maximum-likelihood fit.

For background on the factor analysis model and its EM solution, see Andrew Ng’s freely available machine-learning notes.

The only difference between our algorithm and his is that we use the matrix identities

\big(E - FH^{-1}G\big)^{-1} FH^{-1} =
E^{-1}F \big(H - GE^{-1}F\big)^{-1},

and

\big(E - FH^{-1}G\big)^{-1} =
E^{-1} + E^{-1}F\big(H - GE^{-1}F\big)^{-1}GE^{-1}

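to reduce n\times n matrix inversions to k \times k matrix inversions. These identities result from standard blockwise inversion techniques, described for instance on Wikipedia.

We apply them with E=I_{k\times k}, F=B', G=-B, and H=\mbox{diag}(d) during the E-step in order to avoid n\times n inversions. With these choices E - FH^{-1}G = I + B'\mbox{diag}(d)^{-1}B is a k\times k matrix, H - GE^{-1}F = \mbox{diag}(d) + BB' is the n\times n matrix V, and the only other inverse needed, H^{-1}, is diagonal.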
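Both identities can be verified numerically. The sketch below uses random, hypothetical B and d and takes E to be the k \times k identity so that all matrix products conform. It compares the left-hand sides, which need only a k \times k inverse, with the right-hand sides, which need an n \times n inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3  # hypothetical sizes
B = rng.normal(size=(n, k))
d = rng.uniform(0.5, 1.5, size=n)

E = np.eye(k)
F = B.T
G = -B
H = np.diag(d)
H_inv = np.diag(1.0 / d)  # diagonal, so the inverse is elementwise

small = np.linalg.inv(E - F @ H_inv @ G)  # k x k: (I + B' D^{-1} B)^{-1}
big = np.linalg.inv(H - G @ F)            # n x n: (D + B B')^{-1}, using E^{-1} = I

# First identity: (E - F H^{-1} G)^{-1} F H^{-1} = E^{-1} F (H - G E^{-1} F)^{-1}
lhs1 = small @ F @ H_inv
rhs1 = F @ big

# Second identity: (E - F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (H - G E^{-1} F)^{-1} G E^{-1}
lhs2 = small
rhs2 = E + F @ big @ G
```

Here rhs1 equals B'(BB' + diag(d))^{-1}, the matrix that maps an observation to the posterior mean of the factors in the E-step, so computing lhs1 instead gives the same quantity from a k \times k inverse.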

The cov_fa function cannot handle missing data.
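For complete data, the EM iteration for this model can be sketched as follows. This is a minimal illustration of the standard EM updates for factor analysis written in terms of the sample covariance, with the E-step using only a k \times k solve. It is not Meva's cov_fa: the function name em_factor_analysis, its arguments, the initialization, and the fixed iteration count are all assumptions.

```python
import numpy as np

def em_factor_analysis(Y, k, n_iter=100, seed=0):
    """EM sketch for y = Bx + w on complete data Y of shape (T, n).

    Returns B (n, k), d (n,), and the log-likelihood after each iteration.
    """
    Y = np.asarray(Y, dtype=float)
    T, n = Y.shape
    Yc = Y - Y.mean(axis=0)
    S = Yc.T @ Yc / T                          # maximum-likelihood sample covariance
    rng = np.random.default_rng(seed)
    B = 0.1 * np.sqrt(np.diag(S).mean()) * rng.standard_normal((n, k))
    d = np.diag(S).copy()
    loglik = []
    for _ in range(n_iter):
        # E-step: beta = B'(BB' + D)^{-1} = (I + B'D^{-1}B)^{-1} B'D^{-1}
        BtDinv = B.T / d                       # (k, n), divides column j by d_j
        M = np.eye(k) + BtDinv @ B             # k x k
        beta = np.linalg.solve(M, BtDinv)      # (k, n)
        # Average of E[x x' | y]; the posterior covariance I - beta B equals M^{-1}.
        Exx = np.linalg.inv(M) + beta @ S @ beta.T
        # M-step: B = S beta' Exx^{-1}, d = diag(S - B beta S)
        SbT = S @ beta.T                       # (n, k)
        B = np.linalg.solve(Exx, SbT.T).T      # Exx is symmetric
        d = np.maximum(np.diag(S) - np.sum(B * SbT, axis=1), 1e-12)
        # Diagnostic only: Gaussian log-likelihood (uses n x n operations).
        V = B @ B.T + np.diag(d)
        _, logdet = np.linalg.slogdet(V)
        loglik.append(-0.5 * T * (n * np.log(2 * np.pi) + logdet
                                  + np.trace(np.linalg.solve(V, S))))
    return B, d, loglik

# Hypothetical simulated data for illustration.
rng = np.random.default_rng(1)
B_true = rng.normal(scale=0.1, size=(6, 2))
d_true = rng.uniform(0.01, 0.02, size=6)
Y = rng.standard_normal((2000, 2)) @ B_true.T \
    + rng.standard_normal((2000, 6)) * np.sqrt(d_true)
B_hat, d_hat, loglik = em_factor_analysis(Y, k=2)
```

The log-likelihood should not decrease from one iteration to the next, which is a convenient sanity check. B itself is only identified up to a rotation of the factors, so it is BB' + diag(d) rather than B that is comparable between runs or methods.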

Examples