Even if I can't directly answer the above questions, I decided they should be easily quantifiable, and that with a few hours of work I should be able to produce a cool visualization that illuminates the inner workings of the market.
Since I haven't worked with financial data recently, my first task was to acquire a reasonable dataset to work with. Yahoo! Finance was perfect for the task because its "Save as Spreadsheet" link makes downloading data easy to automate. A couple of Python scripts later, I had an automated pipeline that would let me download and merge data for arbitrary tickers (some of the time series are often missing days). As a proxy for various market segments, I searched for ETFs which have been in existence since at least 1998, and settled on the following ones: EWJ, MDY, SPY, XLE, XLF, XLP, XLU, XLV, along with the transportation index ^DJT, and ^IRX and ^TNX for the 90-day bill and 10-year bond. My dataset seems reasonably representative of the complexities of the market, spanning industries, countries, and company sizes.
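The merge step can be sketched roughly as follows. This is not the original pipeline: it assumes each ticker has already been parsed into a dictionary from ISO date strings to closing prices, and it assumes that days missing from any series are simply dropped. The function name `merge_on_common_dates` and the prices are made up for illustration.

```python
def merge_on_common_dates(series_by_ticker):
    """Align several price series on the dates they all share.

    series_by_ticker: dict mapping ticker -> {ISO date string: price}.
    Returns (dates, table), where table[ticker] is a list of prices
    in the same order as dates.
    """
    # Assumption: a day missing from any one series is dropped for all of them.
    common = set.intersection(*(set(s) for s in series_by_ticker.values()))
    dates = sorted(common)  # ISO date strings sort chronologically
    table = {t: [s[d] for d in dates] for t, s in series_by_ticker.items()}
    return dates, table


# Toy prices (not real quotes); XLE is missing one trading day.
toy = {
    "SPY": {"2007-02-26": 100.0, "2007-02-27": 96.0, "2007-02-28": 97.0},
    "XLE": {"2007-02-26": 50.0, "2007-02-28": 49.0},
}
dates, table = merge_on_common_dates(toy)
print(dates)
print(table)
```

Dropping days is the simplest choice; forward-filling the missing prices would be another reasonable option, but it introduces artificial zero returns.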
As is customary, I converted my dataset of absolute prices to N-day relative log returns (I tried N of 1, 2, and 5, and used N=2 in the graphs below). My visualization idea was to split the time series into non-overlapping periods of K days, and produce a graph showing the similarity between pairs of periods. So, for example, I should be able to see that the market right now is really different from the market in 2001.
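A minimal sketch of these two steps for a single price series, using only the standard library. It assumes the N-day return is computed at every day (so consecutive returns overlap when N > 1) and that an incomplete final period is discarded; the post does not say how either case was handled, and the function names are illustrative.

```python
import math


def log_returns(prices, n=2):
    """N-day log returns: r[t] = ln(prices[t + n] / prices[t])."""
    return [math.log(prices[t + n] / prices[t]) for t in range(len(prices) - n)]


def split_periods(rows, k):
    """Split a sequence into non-overlapping chunks of k items.

    Assumption: a trailing chunk shorter than k is dropped.
    """
    return [rows[i:i + k] for i in range(0, len(rows) - k + 1, k)]


# Toy prices that double every day, so every 2-day log return is ln(4).
returns = log_returns([1.0, 2.0, 4.0, 8.0], n=2)
periods = split_periods(list(range(7)), k=3)
print(returns)
print(periods)
```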
The measure of similarity I decided to use was the KL-divergence (relative entropy) between the joint multivariate distributions of returns in the two distinct periods. I modeled the distribution of returns in each period as a multivariate Gaussian (one dimension for each security in my dataset). Since I specifically wanted to investigate similarity based on second-order characteristics, I scaled the data for each period of K days to have zero mean and unit variance. Had I not scaled the data, much of my similarity metric would have depended on things such as the direction of the market or its volatility. Since such effects are usually rather obvious, removing them by centering and scaling the data allowed me to focus almost entirely on the inter-security structure of market returns. KL-divergence is not a proper distance metric since it is not symmetric (i.e. the KL-divergence between the distributions P and Q is different from the one between Q and P), so my distance measure became the average of the KL-divergence computed both ways.
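For two zero-mean Gaussians in d dimensions with covariance matrices Sp and Sq, the divergence has the closed form KL(P || Q) = 0.5 * (tr(Sq^-1 Sp) - d + ln(det Sq / det Sp)). The sketch below applies that formula to two standardized periods and averages the two directions. It uses NumPy rather than the MATLAB of the original, assumes each period is a (days x securities) array with more days than securities so the covariance is invertible, and uses illustrative function names and random data.

```python
import numpy as np


def standardize(period):
    """Scale each column (security) to zero mean and unit sample variance."""
    return (period - period.mean(axis=0)) / period.std(axis=0, ddof=1)


def gaussian_kl(cov_p, cov_q):
    """KL(P || Q) for zero-mean Gaussians with covariances cov_p and cov_q."""
    d = cov_p.shape[0]
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))  # tr(cov_q^-1 cov_p)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term - d + logdet_q - logdet_p)


def period_distance(period_a, period_b):
    """Average of the KL-divergence computed both ways."""
    cov_a = np.cov(standardize(period_a), rowvar=False)
    cov_b = np.cov(standardize(period_b), rowvar=False)
    return 0.5 * (gaussian_kl(cov_a, cov_b) + gaussian_kl(cov_b, cov_a))


# Random stand-ins for two 60-day periods of returns on 3 securities.
rng = np.random.default_rng(0)
a = rng.standard_normal((60, 3))
b = rng.standard_normal((60, 3))
print(period_distance(a, b))
```

After standardization each covariance matrix is a correlation matrix, so the distance reflects only how the securities move relative to one another.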
Implementing the above was a breeze thanks to MATLAB. Below, I have included the pairwise distances for non-overlapping 60-day periods, whose dates are indicated on the axes. I only display the upper triangle of the distance matrix since it is symmetric.

Below, I also include a similar plot, only over 30-day periods. We can see that the fall of the market in February 2007 was really unlike anything seen in the past several years, in terms of the market covariance structure. With the shorter periods, periods are overall more similar, which might be partly due to the fact that it is difficult to reliably estimate the covariance matrix from such short periods. This can be addressed by projecting to a lower-dimensional space, as in PCA, and computing distances on the projected distributions.
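One way the PCA idea might look, as a rough sketch rather than something the plots above used. It assumes the projection basis is taken from the pooled data of the two periods being compared (a basis fitted on the whole dataset would be another option), and the function names, the choice of 3 components, and the random data are all illustrative. Averaging the KL-divergence both ways between zero-mean Gaussians makes the log-determinant terms cancel, which leaves the shorter formula in `sym_kl`.

```python
import numpy as np


def standardize(period):
    """Scale each column (security) to zero mean and unit sample variance."""
    return (period - period.mean(axis=0)) / period.std(axis=0, ddof=1)


def sym_kl(cov_a, cov_b):
    """Average of KL(A || B) and KL(B || A) for zero-mean Gaussians."""
    d = cov_a.shape[0]
    return 0.25 * (np.trace(np.linalg.solve(cov_b, cov_a))
                   + np.trace(np.linalg.solve(cov_a, cov_b)) - 2 * d)


def projected_distance(period_a, period_b, n_components=3):
    """Symmetrized KL after projecting both periods onto shared principal axes."""
    za, zb = standardize(period_a), standardize(period_b)
    # Assumption: the PCA basis comes from the two periods pooled together.
    pooled_cov = np.cov(np.vstack([za, zb]), rowvar=False)
    _, eigvecs = np.linalg.eigh(pooled_cov)  # eigenvalues in ascending order
    basis = eigvecs[:, -n_components:]       # top principal directions
    cov_a = np.cov(za @ basis, rowvar=False)
    cov_b = np.cov(zb @ basis, rowvar=False)
    return sym_kl(cov_a, cov_b)


# Random stand-ins for two 30-day periods of returns on 11 securities.
rng = np.random.default_rng(1)
a = rng.standard_normal((30, 11))
b = rng.standard_normal((30, 11))
print(projected_distance(a, b))
```

With 30 days and 11 securities the full covariance matrix has 66 free parameters, while a 3-dimensional projection has only 6, so each estimate rests on far more data per parameter.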

What do you think about this visualization style? How would you use the underlying ideas of analyzing second-order structure to trade better? I would like to run this on a different basket of assets/industries that is more representative of what is truly important. Any ideas what to include?
In order to make sense of the information, here is a plot of the SPY ETF (proxy for the S&P 500):

P.S. This visualization was actually inspired by tools used to display the correlation structure of markers in the human genome.