DataWolf: 2007

As I was driving in my car, I was pondering the regime-switching nature of the world financial markets. I wondered how the relationships between the performance of various asset classes or industry groups evolve over time. For example, we all know that at certain times small-caps outperform large-caps. More interestingly, there are times when the movement in short-term interest rates may be positively or negatively correlated with the general market, depending on the underlying fundamental worries in the period. Does the market experience regime switches which completely break the existing correlations between its various segments?

Even if I can't directly answer the above questions, I decided they should be easily quantifiable and with a few hours of work I should be able to produce a cool visualization that somehow illuminates the inner workings of the market.

Since I haven't worked with financial data recently, my first task was to acquire a reasonable dataset to work with. Yahoo! Finance was perfect for the task because its "Save as Spreadsheet" link makes downloading easy to automate. A couple of Python scripts later, I had an automated pipeline that would allow me to download and merge data for arbitrary tickers (there are often missing days for some of the time series). As a proxy for various market segments, I searched for ETFs which have been in existence since at least 1998, and settled on the following ones: EWJ, MDY, SPY, XLE, XLF, XLP, XLU, XLV, along with the transportation index ^DJT, and ^IRX and ^TNX for the 90-day bill and 10-year bond. My dataset seems reasonably representative of the complexities of the market, spanning industries, countries, and company sizes.
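The merge step can be sketched roughly as follows. This is only an illustration, not the original scripts: it assumes each ticker has already been parsed into a date-to-price dictionary, and it assumes that "merging" means keeping only the dates on which every ticker has a quote. The function name `merge_on_common_dates` and the toy prices are made up for the example.

```python
def merge_on_common_dates(series_by_ticker):
    """Align several {date: close} series on the dates they all share.

    series_by_ticker: dict mapping ticker -> dict mapping ISO date -> price.
    Returns (tickers, dates, rows); rows[i] holds one price per ticker,
    in sorted ticker order, for dates[i].
    """
    tickers = sorted(series_by_ticker)
    # Assumption: drop any date that is missing for at least one ticker.
    common = set.intersection(*(set(series_by_ticker[t]) for t in tickers))
    dates = sorted(common)  # ISO date strings sort chronologically
    rows = [[series_by_ticker[t][d] for t in tickers] for d in dates]
    return tickers, dates, rows


# Toy data for illustration only, not real quotes.
example = {
    "SPY": {"2007-09-12": 147.0, "2007-09-13": 148.0, "2007-09-14": 148.5},
    "XLE": {"2007-09-12": 70.0, "2007-09-14": 71.0},
}
tickers, dates, rows = merge_on_common_dates(example)
```

Dropping incomplete dates is the simplest choice; forward-filling the missing quotes would be another option, at the cost of introducing artificial zero returns.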

As is customary, I converted my dataset of absolute prices to N-day relative log returns (I tried N of 1, 2, and 5 and used N=2 in the graphs below). My visualization idea was to split the time series into non-overlapping periods of K days and produce a graph showing the similarity between pairs of periods. So, for example, I should be able to see that the market right now is really different from the market in 2001.
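A minimal Python sketch of these two steps, using NumPy. It assumes the prices sit in an array with one row per trading day and one column per security, and that the N-day returns are computed on a rolling (overlapping) basis; the post does not say which variant was used. The function names and the toy price series are illustrative.

```python
import numpy as np


def n_day_log_returns(prices, n=2):
    """Rolling n-day log returns: log(p[t] / p[t - n]) for each column."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[n:] / prices[:-n])


def split_periods(returns, k=60):
    """Cut the rows into consecutive non-overlapping blocks of k days.

    Trailing rows that do not fill a whole block are dropped.
    """
    n_periods = len(returns) // k
    return [returns[i * k:(i + 1) * k] for i in range(n_periods)]


# Toy series growing 1% (in log terms) per day, one security only.
demo_prices = np.exp(0.01 * np.arange(10))[:, None]
demo_returns = n_day_log_returns(demo_prices, n=2)
demo_periods = split_periods(demo_returns, k=3)
```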

The measure of similarity I decided to use was the KL-divergence (relative entropy) between the joint multivariate distributions of returns in the two distinct periods. I modeled the distribution of returns in each period as a multivariate Gaussian (one dimension for each security in my dataset). Since I specifically wanted to investigate similarity computed based on second-order characteristics, I scaled the data for each period of K days to have zero mean and unit variance. Had I not scaled the data, much of my similarity metric would have depended on things such as the direction of the market or its volatility. Since such effects are usually rather obvious, removing them by centering and scaling the data allowed me to focus almost entirely on the inter-security structure of market returns. KL-divergence is not a proper distance metric since it is not symmetric (i.e. the KL-divergence between the distributions P and Q is different from the one between Q and P), so my distance measure became the average of the KL-divergence computed both ways.
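For two zero-mean Gaussians with covariance matrices Σp and Σq in d dimensions, the KL-divergence has the closed form KL(P‖Q) = ½ [tr(Σq⁻¹ Σp) − d + ln(det Σq / det Σp)]. The following Python sketch shows one way to compute the symmetrized distance with NumPy. It is a reconstruction under that standard formula rather than the original MATLAB code, and the function names and random demo data are illustrative.

```python
import numpy as np


def standardized_cov(period):
    """Covariance of a period after scaling each column to zero mean and
    unit variance (so this is the period's correlation matrix)."""
    x = np.asarray(period, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    return np.cov(z, rowvar=False)


def gaussian_kl(cov_p, cov_q):
    """KL(P || Q) for zero-mean Gaussians with the given covariances."""
    d = cov_p.shape[0]
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))  # tr(Sq^-1 Sp)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (trace_term - d + logdet_q - logdet_p)


def symmetric_kl(cov_p, cov_q):
    """Average of the KL-divergence computed both ways."""
    return 0.5 * (gaussian_kl(cov_p, cov_q) + gaussian_kl(cov_q, cov_p))


# Random stand-ins for two 60-day periods of returns on 3 securities.
rng = np.random.default_rng(0)
period_a = rng.normal(size=(60, 3))
period_b = rng.normal(size=(60, 3)) @ np.array([[1.0, 0.8, 0.0],
                                                [0.0, 1.0, 0.5],
                                                [0.0, 0.0, 1.0]])
cov_a = standardized_cov(period_a)
cov_b = standardized_cov(period_b)
distance_ab = symmetric_kl(cov_a, cov_b)
```

When the two directions are averaged the log-determinant terms cancel, so the distance depends only on the two trace terms.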

Implementing the above was a breeze thanks to MATLAB. Below, I have included the pairwise distance for non-overlapping 60-day periods, whose date is indicated on the axes. I only display the upper triangle of the distance matrix since it is symmetric.
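The original implementation was in MATLAB and is not shown here. As a rough Python equivalent, the sketch below builds the pairwise distance matrix from a list of per-period covariance matrices and leaves the lower triangle as NaN so that a plotting routine would draw only the upper triangle. It uses the simplified form of the averaged KL-divergence for zero-mean Gaussians, ¼ [tr(Σq⁻¹ Σp) + tr(Σp⁻¹ Σq) − 2d]; the function names and the three toy covariance matrices are made up for the example.

```python
import numpy as np


def symmetric_kl_zero_mean(cov_p, cov_q):
    """Average of KL(P||Q) and KL(Q||P) for zero-mean Gaussians.

    The log-determinant terms cancel in the average, leaving only traces.
    """
    d = cov_p.shape[0]
    return 0.25 * (np.trace(np.linalg.solve(cov_q, cov_p))
                   + np.trace(np.linalg.solve(cov_p, cov_q))
                   - 2 * d)


def pairwise_upper_triangle(covs):
    """Distance matrix over periods; entries below the diagonal stay NaN."""
    m = len(covs)
    dist = np.full((m, m), np.nan)
    for i in range(m):
        for j in range(i, m):
            dist[i, j] = symmetric_kl_zero_mean(covs[i], covs[j])
    return dist


# Toy covariance matrices standing in for three periods.
demo_covs = [
    np.eye(2),
    2.0 * np.eye(2),
    np.array([[1.0, 0.5], [0.5, 1.0]]),
]
demo_dist = pairwise_upper_triangle(demo_covs)
```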

The graph above uses 60-day periods, and the distributions compared are of 2-day log returns, as described previously. The redder the color, the more similar two periods are; the bluer it is, the more dissimilar. The height of the bear market in 2001 is the most dissimilar to any other period. By contrast, the bull market of 2005 is characterized by a red blob near the diagonal, which means that the covariance structure of the market remained fairly constant. We can see how the current period compares against the past by examining the last column of the matrix: it is not particularly similar to any other period (i.e. there are no very red colors) and is in fact as dissimilar to anything else in recent history as any other period in the dataset. In general, by examining the lower right corner of the graph we can see that the most recent period of 60 days has the sharpest difference from the immediately preceding periods of any time period in the recent past. However, the difference, while noticeable, is perhaps less striking than one would expect given the recent turmoil and extreme volatility. While past correlations were broken, as is evident from the prominence of the rightmost column of the matrix, there was also a large increase in the sheer magnitude of the moves, which we specifically excluded from this analysis.

Below, I also include a similar plot, only over 30-day periods. We can see that the fall of the market in February 2007 was really unlike anything seen in the past several years in terms of the market covariance structure. With the shorter periods, periods are overall more similar, which might be partly due to the fact that it is difficult to reliably estimate the covariance matrix from such short periods. This can be addressed by projecting to a lower-dimensional space, as in PCA, and computing distances on the projected distributions.
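One possible way to set up the PCA idea is sketched below in Python. I have not run this on the actual dataset. The sketch assumes a single shared basis, estimated from the returns of all periods pooled together, and then computes each period's covariance in that lower-dimensional space; those projected covariances would feed into the same symmetrized KL-divergence as before. The function names, the number of components, and the random demo data are all illustrative.

```python
import numpy as np


def pca_basis(pooled, n_components):
    """Top principal directions of the pooled data, one per column."""
    x = np.asarray(pooled, dtype=float)
    x = x - x.mean(axis=0)
    # Rows of vt are principal directions, by decreasing singular value.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:n_components].T  # shape (d, n_components)


def projected_cov(period, basis):
    """Covariance of one period after projecting onto the shared basis."""
    x = np.asarray(period, dtype=float)
    x = x - x.mean(axis=0)
    return np.atleast_2d(np.cov(x @ basis, rowvar=False))


# Random stand-ins for three 30-day periods of returns on 5 securities.
rng = np.random.default_rng(1)
demo_periods = [rng.normal(size=(30, 5)) for _ in range(3)]
demo_basis = pca_basis(np.vstack(demo_periods), n_components=2)
demo_covs = [projected_cov(p, demo_basis) for p in demo_periods]
```

With fewer dimensions there are far fewer covariance entries to estimate from the same 30 observations, which is the point of the projection.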

What do you think about this visualization style? How would you use the underlying ideas of analyzing second-order structure to trade better? I would like to run this on a different basket of assets/industries that is more representative of what is truly important. Any ideas what to include?

In order to make sense of the information, here is a plot of the SPY ETF (proxy for the S&P 500):

P.S. This visualization was actually inspired by tools used to display the correlation structure of markers in the human genome.

DataWolf

Tuesday, September 18, 2007

Financial Markets Visualization
