This page last changed on Jan 03, 2009 by iank@bearcave.com.
 The Material Below is Incorrect
A lab notebook should record the steps taken to arrive at a result. Unfortunately, some of these steps will go in the wrong direction. This page is just such a case.
Return is usually calculated using the formula $return = price_{t} - price_{t-1}$. There are variations on this, like log return. The signal below, $signal = short_{200} - long_{1200}$, is a proxy for return. The $short_{200}$ average will be close to the actual time series and the $long_{1200}$ average will lag the time series. The difference is a sign reversed return, since a later average is being subtracted from a more recent average.
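A minimal Java sketch of the signal defined above. It assumes simple (equal-weight) moving averages over an array of trade prices, and that the signal is only defined once the long window is full. The class `MaSignal` and its `signal` method are hypothetical names. The page uses windows of 200 and 1200 ticks, but the example uses tiny windows so the result can be checked by hand.

```java
import java.util.Arrays;

public class MaSignal {

    /**
     * Returns SMA(shortWin) - SMA(longWin) for every tick at which the long
     * window is full. Assumption: plain arithmetic means over trade prices.
     */
    public static double[] signal(double[] prices, int shortWin, int longWin) {
        if (shortWin <= 0 || longWin < shortWin) {
            throw new IllegalArgumentException("need 0 < shortWin <= longWin");
        }
        int n = prices.length;
        if (n < longWin) {
            return new double[0]; // not enough ticks to fill the long window
        }
        double[] out = new double[n - longWin + 1];
        double shortSum = 0.0;
        double longSum = 0.0;
        for (int i = 0; i < n; i++) {
            // Running sums: add the newest tick, drop the tick leaving each window.
            shortSum += prices[i];
            longSum += prices[i];
            if (i >= shortWin) shortSum -= prices[i - shortWin];
            if (i >= longWin) longSum -= prices[i - longWin];
            if (i >= longWin - 1) {
                out[i - longWin + 1] = shortSum / shortWin - longSum / longWin;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] prices = {10, 11, 12, 13, 14, 15};
        System.out.println(Arrays.toString(signal(prices, 2, 4))); // [1.0, 1.0, 1.0]
    }
}
```

Running sums keep the cost at one pass over the ticks. Over a full trading day they can accumulate small floating-point error, which a production version might correct by periodically recomputing the sums.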
As the graphs below show, this pseudo return falls into a curve, at least when looked at over multiple days. This curve is not the same as the distribution of the minimum and maximum values of the signal. The text below treats the two as the same, but this is not correct. What is needed to reproduce the visual algorithm are these minimum and maximum values. Then a line is plotted through the minimum values. The question is how to plot this line.
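One way to extract these minimum and maximum values is sketched below in Java, under an assumption the page does not state: a tick counts as a local minimum (maximum) if no value within `radius` ticks on either side is lower (higher). `LocalExtrema` and `radius` are hypothetical names, and every tick on a flat stretch that ties for the extreme is reported.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalExtrema {

    /** Indices whose value is <= every value within radius ticks on each side. */
    public static List<Integer> localMinima(double[] values, int radius) {
        return extrema(values, radius, true);
    }

    /** Indices whose value is >= every value within radius ticks on each side. */
    public static List<Integer> localMaxima(double[] values, int radius) {
        return extrema(values, radius, false);
    }

    private static List<Integer> extrema(double[] values, int radius, boolean findMin) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            // Assumption: the window is simply truncated at the ends of the day.
            int lo = Math.max(0, i - radius);
            int hi = Math.min(values.length - 1, i + radius);
            boolean isExtreme = true;
            for (int j = lo; j <= hi && isExtreme; j++) {
                isExtreme = findMin ? values[i] <= values[j] : values[i] >= values[j];
            }
            if (isExtreme) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] signal = {0.5, -0.2, 0.3, -1.1, 0.4, 0.9, 0.1};
        System.out.println("minima at " + localMinima(signal, 1)); // minima at [1, 3, 6]
        System.out.println("maxima at " + localMaxima(signal, 1)); // maxima at [0, 2, 5]
    }
}
```

A larger `radius` suppresses small wiggles in the signal and leaves only the deeper dips; choosing it is a tuning decision.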
The problem is that the minimum values may differ slightly from one another. What we want is something like a minimum line that intersects one to two values a day (on average), where the line has a slope of zero (i.e., it's a flat line) and minimizes the distance to the minimums.
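A sketch of one way to place such a flat line, assuming each day's local-minimum values are already available. It keeps the lowest `perDay` minima from each day (1 or 2, following the one-to-two-per-day target) and places the zero-slope line at their median, because for a horizontal line the median minimizes the sum of absolute vertical distances to the points. The median choice, `FlatMinimumLine`, and `perDay` are assumptions for illustration, not the graphical method used by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlatMinimumLine {

    /**
     * Level of a zero-slope line fitted to the lowest perDay minima of each day.
     * Assumption: "closest" means smallest sum of absolute distances, so the
     * median of the selected minima is used.
     */
    public static double level(List<double[]> dailyMinima, int perDay) {
        List<Double> picked = new ArrayList<>();
        for (double[] day : dailyMinima) {
            double[] sorted = day.clone();
            Arrays.sort(sorted); // most negative minima first
            for (int k = 0; k < Math.min(perDay, sorted.length); k++) {
                picked.add(sorted[k]);
            }
        }
        if (picked.isEmpty()) {
            throw new IllegalArgumentException("no minima supplied");
        }
        double[] v = picked.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int m = v.length / 2;
        return (v.length % 2 == 1) ? v[m] : (v[m - 1] + v[m]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical local-minimum values of the signal for three days.
        List<double[]> minima = Arrays.asList(
                new double[] {-1.52, -0.90, -1.47},
                new double[] {-1.20, -0.75},
                new double[] {-1.55, -1.49, -0.60});
        System.out.println(level(minima, 1)); // -1.52
        System.out.println(level(minima, 2)); // about -1.48
    }
}
```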
The Simple Moving Average model creates a signal from the difference of a short window (of 200 trade ticks) and a long window (of 1200 trade ticks). That is, $signal = short_{200} - long_{1200}$. When the signal reaches a minimum value, a block of stock is bought. When Ed developed the Excel models, he found the buy signal by drawing a line through a set of minima for several days of signal. In theory this manually finds points on the far end of the signal distribution. If a buy is made at these points, there will be reversion to the mean and the model will make money, on average.
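The buy rule can be sketched as a threshold crossing. Two assumptions that the page does not state: the trigger is a negative signal level (the parameter tables below list unsigned values), and a new buy is only allowed after the signal climbs back above the trigger, so one dip does not produce a buy on every tick. `TriggerSketch` and `buyTicks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class TriggerSketch {

    /** Tick indices where the signal falls to or below the trigger after being above it. */
    public static List<Integer> buyTicks(double[] signal, double trigger) {
        List<Integer> buys = new ArrayList<>();
        boolean armed = true; // assumption: re-armed once the signal climbs back above the trigger
        for (int i = 0; i < signal.length; i++) {
            if (armed && signal[i] <= trigger) {
                buys.add(i); // one block of stock would be bought here
                armed = false;
            } else if (signal[i] > trigger) {
                armed = true;
            }
        }
        return buys;
    }

    public static void main(String[] args) {
        double[] signal = {0.2, -0.8, -1.6, -1.7, -0.4, -1.5, 0.1};
        System.out.println(buyTicks(signal, -1.5)); // [2, 5]
    }
}
```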
Finding the signal minima manually is not practical for more than a few stocks, so a method is needed to find the minima via a software (Java) algorithm. One way this can be done is to calculate a histogram for the signal values. The elements of the histogram are created from the signal values (rounded to three decimal places) and the count of the number of times the signal had that value. The algorithm starts at the left end (negative end) of the histogram and moves to the right until it finds a histogram "bucket" that has at least one value for each day of data that went into the signal data. So if there are five days of signal data, the algorithm will try to find a histogram bucket with five values.
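A minimal Java sketch of the histogram search described above; it is an illustration, not the page's actual Java code, and `HistogramTrigger` is a hypothetical name. Each signal value is rounded to three decimal places and stored as a whole number of thousandths, which avoids using floating-point values as map keys. All days are pooled into one sorted histogram, and the first bucket from the negative end whose count is at least the number of days is returned.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HistogramTrigger {

    /** Three-decimal bucket expressed as an integer count of thousandths. */
    static long bucket(double v) {
        return Math.round(v * 1000.0);
    }

    /** First bucket, scanning from the negative end, holding at least one value per day. */
    public static double trigger(List<double[]> dailySignals) {
        TreeMap<Long, Integer> histogram = new TreeMap<>(); // iterates in ascending bucket order
        for (double[] day : dailySignals) {
            for (double v : day) {
                histogram.merge(bucket(v), 1, Integer::sum);
            }
        }
        int days = dailySignals.size();
        for (Map.Entry<Long, Integer> e : histogram.entrySet()) {
            if (e.getValue() >= days) {
                return e.getKey() / 1000.0;
            }
        }
        throw new IllegalStateException("no bucket holds " + days + " values");
    }

    public static void main(String[] args) {
        List<double[]> days = Arrays.asList(
                new double[] {-1.4651, -1.2, -0.3},
                new double[] {-1.4649, -1.1, 0.2},
                new double[] {-2.0, -1.465, 0.4});
        System.out.println(trigger(days)); // -1.465
    }
}
```

Note that the count test does not look at which days the values came from; three values from a single day would satisfy it just as well.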
A histogram for the stock GS (Goldman Sachs), from four days of market data (June 30, July 1, July 2 and July 3, 2008), is shown below. The y-axis is the frequency (i.e., the number of values per bucket).
The GS histogram is relatively smooth. However, the histogram for CME is not as evenly distributed.
The algorithm picked the bucket near 11, which may represent an outlier. The problem with the CME distribution is that it has lots of spikes. If these were smoothed out, a value nearer the center of the distribution would be chosen, and it's not clear that this would be correct.
The parameters that were calculated from the histogram distributions (using June 30, July 1, July 2 and July 3, 2008) are:

Stock   Parameter
GOOG    5.101
GS      1.465
CME     11.704
FCX     1.511
OIH     3.384
RIG     1.689
AMGN    0.26
BIIB    0.636
CMI     2.299
ERTS    0.729
ICE     3.345
CAT     1.227
BBBY    0.419
BRCM    0.88
GENZ    0.665
AMZN    1.787
Unfortunately, these parameters did not result in profitable (paper) trading. Trading on July 7, 2008 with 1000-share orders yielded the results shown below:
The debug trace from the Trade Engine is included below:
Parameters for July 8 trading, using tick data from June 30, July 1, July 2, July 3, and July 7, 2008:

Stock   Parameter
GOOG    5.088
GS      1.48
CME     9.729
FCX     1.514
OIH     3.384
RIG     1.689
AMGN    0.467
BIIB    0.636
CMI     2.299
ERTS    0.709
ICE     3.345
CAT     1.227
BBBY    0.419
BRCM    0.522
GENZ    0.785
AMZN    1.787
Unfortunately, these (slightly different) parameters did no better trading on July 8 than the parameters for July 7 did. The trading trace is shown below. Note that only GS and FCX ever reached their trigger values.
The Algorithm Doesn't do what the Graphical Method Does
The algorithm calculates the signal values for each day separately. It then puts all of these signal values into a single distribution (that consists of the signal values for all of the days). Starting at the far left (the negative side), the algorithm finds the first histogram bucket that has a count of at least N values (where N is the number of days in the distribution). This bucket's value is used as the trigger value.
If, for example, one day was unusually volatile, there could be N instances (where N is the number of days in the distribution) from that day alone at the low end of the total distribution. As the daily time window moves forward, this bucket may continue to be selected. These values all came from one day, so they don't represent the other days in the distribution.
The graphical technique that has been used to hand parameterize the models will tend to pick a trigger value that is distributed throughout the days in the distribution, since the line is drawn through minima from all of the days. This is not the same as the algorithm described above.
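The one-day problem can be checked directly by recording which days contributed to each bucket. In this hypothetical Java sketch (`BucketDays` is an illustrative name, using the same three-decimal buckets), the third day alone supplies three values at -2.5. A pooled count of three would satisfy the N = 3 rule at that bucket even though only one of the three days ever reached that level.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

public class BucketDays {

    /** For each three-decimal bucket, the set of day indices that contributed a value to it. */
    public static TreeMap<Long, BitSet> daysPerBucket(List<double[]> dailySignals) {
        TreeMap<Long, BitSet> result = new TreeMap<>();
        for (int d = 0; d < dailySignals.size(); d++) {
            for (double v : dailySignals.get(d)) {
                // Bucket key is the value in thousandths; bit d marks "day d hit this bucket".
                result.computeIfAbsent(Math.round(v * 1000.0), k -> new BitSet()).set(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: the third day is unusually volatile.
        List<double[]> days = Arrays.asList(
                new double[] {-1.2, -0.4},
                new double[] {-1.2, 0.3},
                new double[] {-2.5, -2.5, -2.5, -1.2});
        BitSet contributors = daysPerBucket(days).get(Math.round(-2.5 * 1000.0));
        System.out.println(contributors.cardinality() + " of " + days.size() + " days"); // 1 of 3 days
    }
}
```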
The distributions for four days, June 30, July 1, July 3 and July 7, are shown below. If these histograms are combined (as happens in the algorithm used to calculate the parameters above) and a histogram bucket with four (or five) elements is chosen, then the parameter will be around 1.5.
If a histogram value is chosen that exists in all of the distributions, the value will be around 1.2, since July 1 does not have as large a negative range as the other days. This algorithm is closer to the "by hand" graphical algorithm. The problem with visual "by hand" algorithms is that they may be adjusted in ways that follow visual rules, not algorithmic rules, which makes the algorithm impossible to reproduce.
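The "exists in all of the distributions" rule can be sketched by intersecting the sets of occupied buckets from each day and taking the most negative bucket that survives. This is a hypothetical Java illustration using the same three-decimal buckets as the pooled search; `CommonBucketTrigger` is an illustrative name, and returning `NaN` when no bucket is shared by every day is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class CommonBucketTrigger {

    /** Most negative three-decimal bucket present in every day's signal, or NaN if none. */
    public static double trigger(List<double[]> dailySignals) {
        TreeSet<Long> common = null;
        for (double[] day : dailySignals) {
            TreeSet<Long> buckets = new TreeSet<>();
            for (double v : day) {
                buckets.add(Math.round(v * 1000.0)); // bucket in thousandths
            }
            if (common == null) {
                common = buckets;
            } else {
                common.retainAll(buckets); // keep only buckets seen on every day so far
            }
        }
        return (common == null || common.isEmpty()) ? Double.NaN : common.first() / 1000.0;
    }

    public static void main(String[] args) {
        List<double[]> days = Arrays.asList(
                new double[] {-1.2, -0.4},
                new double[] {-1.2, 0.3},
                new double[] {-2.5, -2.5, -2.5, -1.2});
        System.out.println(trigger(days)); // -1.2
    }
}
```

On this toy data the -2.5 bucket that a pooled count would accept is skipped, because only one day reaches it. With three-decimal buckets an exact value shared by every day may not exist at all, in which case the sketch returns `NaN`.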
Moving average ($Short_{200} - Long_{1200}$) of FCX, June 30, 2008
Moving average ($Short_{200} - Long_{1200}$) of FCX, July 1, 2008
Moving average ($Short_{200} - Long_{1200}$) of FCX, July 2, 2008
Moving average ($Short_{200} - Long_{1200}$) of FCX, July 7, 2008
My Problems got Problems
Reproducing the visual algorithm is more difficult than the discussion above implies: even when the ranges are aligned, the algorithm cannot immediately find a matching value. Nor is it clear, without back testing, that this will even be an effective technique. While I hate to give up on something, it's not clear how to write an algorithm to effectively parameterize this model. So I am going to concentrate on the EMA models and hope for something better.
