Intraday Trading : Automatically Estimating Parameters for the Simple Moving Average Model
This page last changed on Jan 03, 2009 by firstname.lastname@example.org.
The Simple Moving Average model creates a signal from the difference between a short-window average (of 200 trade ticks) and a long-window average (of 1200 trade ticks). When the signal reaches a minimum value, a block of stock is bought. When Ed developed the Excel models he found the buy signal by drawing a line through a set of minima for several days of signal data. In theory this manually finds points on the far end of the signal distribution. If a buy is made at these points, there will be reversion to the mean and the model will make money, on average.
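As a minimal sketch of the signal calculation, the Java below computes the difference of two trailing averages over a series of trade-tick prices. The class and method names are hypothetical, and the sign convention (short average minus long average, so that a price dip produces a negative signal) is an assumption, since the description above does not state which way the difference is taken.

```java
final class SmaSignal {
    /**
     * Returns one signal value per tick, starting at the first tick for which
     * the long window is full. Assumed convention: short average minus long average.
     */
    static double[] signal(double[] prices, int shortLen, int longLen) {
        if (shortLen <= 0 || longLen < shortLen) {
            throw new IllegalArgumentException("need 0 < shortLen <= longLen");
        }
        int n = prices.length - longLen + 1;
        if (n <= 0) {
            return new double[0]; // not enough ticks to fill the long window
        }
        double[] out = new double[n];
        double shortSum = 0.0;
        double longSum = 0.0;
        for (int i = 0; i < prices.length; i++) {
            // Maintain running sums so each tick costs O(1).
            shortSum += prices[i];
            longSum += prices[i];
            if (i >= shortLen) {
                shortSum -= prices[i - shortLen];
            }
            if (i >= longLen) {
                longSum -= prices[i - longLen];
            }
            if (i >= longLen - 1) {
                out[i - longLen + 1] = shortSum / shortLen - longSum / longLen;
            }
        }
        return out;
    }
}
```

The model described here would call this as `SmaSignal.signal(prices, 200, 1200)`. As a small check of the convention, a steadily falling series 6, 5, 4, 3, 2, 1 with windows of 2 and 4 gives a signal of -1.0 at every tick where the long window is full.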
Finding the signal minima manually is not practical for more than a few stocks, so a method is needed to find the minima via a software (Java) algorithm. One way this can be done is to calculate a histogram for the signal values. The elements of the histogram are created from the signal values (rounded to three decimal places) and the count of the number of times the signal had that value. The algorithm starts at the left end (negative end) of the histogram and moves to the right until it finds a histogram "bucket" that has at least one value for each day of data that went into the signal data. So if there are five days of signal data, the algorithm will try to find a histogram bucket with five values.
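The histogram search described above can be sketched roughly as follows. This is not the production code; the class and method names are hypothetical, and storing each bucket as an integer number of thousandths is just one assumed way of rounding to three decimal places.

```java
import java.util.Map;
import java.util.OptionalDouble;
import java.util.TreeMap;

final class HistogramTrigger {
    /** Bucket key: the signal rounded to three decimal places, held as thousandths. */
    static long bucket(double signal) {
        return Math.round(signal * 1000.0);
    }

    /** Pools the signal values of all days into one histogram, sorted by bucket. */
    static TreeMap<Long, Integer> histogram(double[][] dailySignals) {
        TreeMap<Long, Integer> counts = new TreeMap<>();
        for (double[] day : dailySignals) {
            for (double s : day) {
                counts.merge(bucket(s), 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * Walks the histogram from the most negative bucket to the right and returns
     * the first bucket whose count is at least the number of days, if there is one.
     */
    static OptionalDouble trigger(double[][] dailySignals) {
        int days = dailySignals.length;
        for (Map.Entry<Long, Integer> e : histogram(dailySignals).entrySet()) {
            if (e.getValue() >= days) {
                return OptionalDouble.of(e.getKey() / 1000.0);
            }
        }
        return OptionalDouble.empty();
    }
}
```

A `TreeMap` keeps the buckets in ascending order, so iterating over it is the left-to-right walk. The result is an `OptionalDouble` because nothing guarantees that any bucket reaches the required count.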
A histogram for the stock GS (Goldman Sachs), from four days of market data (June 30, July 1, July 2 and July 3, 2008) is shown below. The y-axis is the frequency (i.e., the number of values per bucket).
The GS histogram is relatively smooth. However, the histogram for CME is not as evenly distributed.
The algorithm ended up picking up the bucket near -11, which may represent an outlier. The problem with the CME distribution is that it has lots of spikes. If these were smoothed out, a value nearer the center of the distribution would be chosen and it's not clear if this is correct.
The parameters that were calculated from the histogram distributions (using June 30, July 1, July 2 and July 3, 2008) are:
Unfortunately, these parameters did not result in profitable (paper) trading. Trading on July 7, 2008 with 1000 share orders yielded the results shown below:
The debug trace from the Trade Engine is included below:
Parameters for July 8 trading using tick data from June 30, July 1, July 2, July 3, and July 7, 2008:
Unfortunately, these (slightly different) parameters did no better trading on July 8 than the parameters for July 7 did. The trading trace is shown below. Note that only GS and FCX ever even hit the signal.
The algorithm calculates the signal values for each day separately. It then puts all of these signal values into a single distribution (that consists of the signal values for all of the days). Starting at the far left (the negative side) the algorithm finds the histogram bucket that has a count of at least N values (where N is the number of days in the distribution). This is the value used for the trigger value.
If, for example, one day was unusually volatile, there could be N instances (where N is the number of days in the distribution) in that day that are at the low end of the total distribution. As the daily time window moves forward, this bucket may continue to be picked up. These values were all from one day, so they don't represent the other days in the distribution.
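One way to check whether a chosen trigger suffers from this problem is to count how many separate days actually contribute to its bucket. The sketch below is a hypothetical diagnostic, not part of the algorithm described above; the names are made up for illustration, and it assumes the same rounding to three decimal places.

```java
final class BucketSpread {
    /** Bucket key: the signal rounded to three decimal places, held as thousandths. */
    static long bucket(double signal) {
        return Math.round(signal * 1000.0);
    }

    /** Counts the days that have at least one signal value in the trigger's bucket. */
    static int daysContributing(double[][] dailySignals, double trigger) {
        long target = bucket(trigger);
        int days = 0;
        for (double[] day : dailySignals) {
            for (double s : day) {
                if (bucket(s) == target) {
                    days++;
                    break; // one hit is enough; move on to the next day
                }
            }
        }
        return days;
    }
}
```

If three days are pooled and one volatile day has three values at -2.0 while the other two days never get near it, the pooled bucket at -2.0 satisfies the "at least N values" rule, yet `daysContributing` returns 1.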
The graphical technique that has been used to hand-parameterize the models will tend to pick a trigger value that is distributed throughout the days in the distribution, since the line that is picked runs through a minimum on each of the days. This is not the same as the algorithm described above.
The distributions for four days, June 30, July 1, July 3 and July 7 are shown below. If these histograms are combined (as happens in the algorithm used to calculate parameters above) and a histogram bucket with four (or five) elements is chosen then the parameter will be around -1.5.
If a histogram value is chosen that exists in all of the distributions, the value will be around -1.2, since July 1 does not have as large a negative range as the other days. This algorithm is closer to the "by hand" graphical algorithm. The problem with visual "by hand" algorithms is that they may be adjusted in ways that follow visual rules, not algorithmic rules, which makes the algorithm impossible to reproduce.
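A rough sketch of this "exists in all of the distributions" rule is shown below: build the set of buckets for each day separately, intersect the sets, and take the most negative bucket that survives. The names are hypothetical, and the three-decimal rounding is assumed to be the same as in the pooled histogram. This is only one reading of the rule, not a tested replacement for the graphical method.

```java
import java.util.OptionalDouble;
import java.util.TreeSet;

final class CommonBucketTrigger {
    /** Bucket key: the signal rounded to three decimal places, held as thousandths. */
    static long bucket(double signal) {
        return Math.round(signal * 1000.0);
    }

    /** Returns the most negative bucket that appears on every day, if any. */
    static OptionalDouble trigger(double[][] dailySignals) {
        TreeSet<Long> common = null;
        for (double[] day : dailySignals) {
            TreeSet<Long> buckets = new TreeSet<>();
            for (double s : day) {
                buckets.add(bucket(s));
            }
            if (common == null) {
                common = buckets;          // first day seeds the intersection
            } else {
                common.retainAll(buckets); // keep only buckets seen on this day too
            }
        }
        if (common == null || common.isEmpty()) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(common.first() / 1000.0);
    }
}
```

With buckets this narrow, an exact match across every day is not guaranteed, so the method can easily return an empty result.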
Reproducing the visual algorithm is more difficult than is implied above. This means that even when the ranges are aligned, the algorithm cannot immediately find a matching value. Nor is it clear without back testing that this will even be an effective technique. While I hate to give up on something, it's not clear how to write an algorithm to effectively parameterize this model. So I am going to concentrate on the EMA models and hope for something better.