java.lang.Object
  +--wavelet_util.plot
        +--wavelet_util.noise_filter
The objective in filtering is to remove noise while keeping the features that are interesting.
Wavelets allow a time series to be examined at various resolutions. This can be a powerful tool in filtering out noise. This class supports the subtraction of gaussian noise from the time series.
The identification of noise is complex, and I have not found any material that I could understand which discusses noise identification in the context of wavelets. The material I did find has been difficult and frustrating, in particular Image Processing and Data Analysis: The Multiscale Approach by Starck, Murtagh and Bijaoui.
If the price of a stock follows a random walk, its daily price changes will be distributed in a bell (gaussian) curve. This is one way of stating the concept from financial theory that the daily return is normally distributed (here daily return is defined as the difference between today's close price and yesterday's close price). Movement outside the bounds of the curve may represent something other than a random walk and so, in theory, might be interesting.
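To make the definition of daily return concrete, here is a minimal sketch, assuming the closing prices are stored in chronological order. DailyReturnSketch and dailyReturns are hypothetical names and are not part of noise_filter.

```java
// Hypothetical helper, not part of noise_filter: computes the daily return
// (today's close minus yesterday's close) for each pair of adjacent closes.
final class DailyReturnSketch {
    private DailyReturnSketch() {}

    /** Returns r where r[i] = close[i + 1] - close[i]; close is in chronological order. */
    static double[] dailyReturns(double[] close) {
        if (close.length < 2) {
            return new double[0]; // fewer than two closes: no returns
        }
        double[] r = new double[close.length - 1];
        for (int i = 1; i < close.length; i++) {
            r[i - 1] = close[i] - close[i - 1];
        }
        return r;
    }
}
```

For closes of 10.0, 10.5 and 10.25 this yields the returns 0.5 and -0.25.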
At least in the case of the single test case used in developing this code (Applied Materials, symbol: AMAT), the coefficient distribution in the highest frequency is almost a perfect normal curve. That is, the mean is close to zero and the standard deviation is close to one. The area under this curve is very close to one. This resolution approximates the daily return. At lower frequencies the mean moves away from zero and the standard deviation increases. This results in a flattened curve, whose area in the coefficient range is increasingly less than one.
The code in this class subtracts the normal curve from the coefficients at each frequency up to some minimum. This leaves only the coefficients above the curve which are used to regenerate the time series (without the noise, in theory). This filter removes 50 to 60 percent of the coefficients.
It's probably worth mentioning that there are other kinds of noise, most notably Poisson noise. In theory, daily data tends to show gaussian noise, while intraday data would show Poisson noise. Intraday Poisson noise would result from the random arrival and size of orders.
This class has two public methods:
- filter_time_series, which is passed a file name and a time series.
- gaussian_filter, which is passed a set of Haar coefficient spectra and an array allocated for the noise values. The noise array will be the same size as the coefficient array.
Inner Class Summary
private class noise_filter.bell_info
    Bell curve info: mean and sigma (the standard deviation).
private class noise_filter.bin
    A histogram bin.
private class noise_filter.point
    The point class represents a coefficient value so that it can be sorted for histogramming and then sorted back into the original ordering (i.e., sorted by value and then sorted by index).
private class noise_filter.sort_by_index
    Sort an array of point objects by the index field.
private class noise_filter.sort_by_val
    Sort an array of point objects by the val field.
Constructor Summary
noise_filter()
Method Summary
private noise_filter.bin[] alloc_bins(int num_bins, double low, double high)
    Allocate an array of histogram bins that is num_bins in length.
private noise_filter.point[] alloc_points(double[] coef, int start, int end, noise_filter.bell_info info)
    Allocate and initialize an array of point objects.
private noise_filter.bin[] calc_histo(noise_filter.point[] pointz, int num_bins)
    Calculate the histogram of the coefficients using num_bins histogram bins.
(package private) java.lang.String class_name()
private int filter_spectrum(double[] coef, int start, int end, double[] noise)
    This function is passed the section of the Haar coefficients that correspond to a single spectrum.
void filter_time_series(java.lang.String file_name, double[] ts)
    Calculate the Haar transform on the time series (whose length must be a power of two) and filter it.
void gaussian_filter(double[] coef, double[] noise)
    This function is passed a set of Haar wavelet coefficients that result from the Haar wavelet transform.
private void histogram(noise_filter.bin[] binz, noise_filter.point[] pointz)
    Build a histogram from the sorted data in the pointz array.
private double normal_interval(noise_filter.bell_info info, double low, double high, int num_points)
    Numerically integrate the normal curve over the range low to high.
private void normalize_to_zero(double[] noise)
    Normalize the noise array to zero by subtracting the smallest value from all points.
private int subtract_gauss_curve(noise_filter.bin[] binz, noise_filter.bell_info info, int total_points, double[] noise)
    Subtract the gaussian (or normal) curve from the histogram of the coefficients.
private void zero_points(noise_filter.bin b, int num_zero, double[] noise)
    Set num_zero values in the histogram bin b to zero.
Methods inherited from class wavelet_util.plot
    OpenFile
Methods inherited from class java.lang.Object
Constructor Detail
public noise_filter()
Method Detail
java.lang.String class_name()
private void histogram(noise_filter.bin[] binz, noise_filter.point[] pointz)
Build a histogram from the sorted data in the pointz array. The histogram is constructed by appending a point object to the bin's vals Vector if the value of the point is between b[i].start and b[i].start + step.
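Because pointz is sorted by value, the bins can be filled in a single forward pass. The sketch below assumes that the bins are equal width (step) and span the range of the values; Point and Bin are hypothetical stand-ins for noise_filter.point and noise_filter.bin, and an ArrayList is used where the original uses a Vector.

```java
import java.util.ArrayList;
import java.util.List;

final class HistogramSketch {
    // Hypothetical stand-ins for noise_filter.point and noise_filter.bin.
    static final class Point {
        final double val;
        final int index;

        Point(double val, int index) {
            this.val = val;
            this.index = index;
        }
    }

    static final class Bin {
        final double start;
        final List<Point> vals = new ArrayList<>();

        Bin(double start) {
            this.start = start;
        }
    }

    /**
     * Appends each point to the bin whose interval [start, start + step) holds
     * its value. pointz must be sorted by val, so the bin cursor only moves
     * forward. Values at or beyond the last bin's end go into the last bin.
     */
    static void histogram(Bin[] binz, Point[] pointz, double step) {
        int b = 0;
        for (Point p : pointz) {
            while (b < binz.length - 1 && p.val >= binz[b].start + step) {
                b++;
            }
            binz[b].vals.add(p);
        }
    }
}
```

Placing the maximum value in the last bin keeps it from being dropped when the top of the range equals the last bin's upper edge.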
private noise_filter.bin[] alloc_bins(int num_bins, double low, double high)
private noise_filter.bin[] calc_histo(noise_filter.point[] pointz, int num_bins)
Calculate the histogram of the coefficients using num_bins histogram bins
The Haar coefficients are stored in point objects which consist of the coefficient value and the index in the point array.
To calculate the histogram, the pointz array is sorted by value. After it is histogrammed, it is sorted again by index to restore the original ordering.
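The sort, histogram, re-sort sequence can be sketched with two comparators playing the roles of sort_by_val and sort_by_index. This is a minimal sketch under stated assumptions: the bins are taken to be equal width between the smallest and largest value, only per-bin counts are returned, and CalcHistoSketch and its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;

final class CalcHistoSketch {
    // Hypothetical stand-in for noise_filter.point.
    static final class Point {
        final double val;
        final int index;

        Point(double val, int index) {
            this.val = val;
            this.index = index;
        }
    }

    // Equivalents of the sort_by_val and sort_by_index inner classes.
    static final Comparator<Point> BY_VAL = Comparator.comparingDouble(p -> p.val);
    static final Comparator<Point> BY_INDEX = Comparator.comparingInt(p -> p.index);

    /**
     * Returns per-bin counts over numBins equal-width bins spanning the smallest
     * to the largest value. pointz is left in its original (index) order.
     */
    static int[] calcHisto(Point[] pointz, int numBins) {
        if (numBins <= 0) {
            throw new IllegalArgumentException("numBins must be positive");
        }
        int[] counts = new int[numBins];
        if (pointz.length == 0) {
            return counts;
        }
        Arrays.sort(pointz, BY_VAL); // the ends of the sorted array give low and high
        double low = pointz[0].val;
        double high = pointz[pointz.length - 1].val;
        double step = (high - low) / numBins;
        for (Point p : pointz) {
            int b = step == 0.0 ? 0 : (int) ((p.val - low) / step);
            counts[Math.min(b, numBins - 1)]++; // the maximum lands in the last bin
        }
        Arrays.sort(pointz, BY_INDEX); // restore the original ordering
        return counts;
    }
}
```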
private noise_filter.point[] alloc_points(double[] coef, int start, int end, noise_filter.bell_info info)
Allocate and initialize an array of point objects. The size of the array is end - start. Each point object in the array is initialized with its index and a Haar coefficient (from the coef array).
Since the allocation code has to iterate through the coefficient spectrum, the mean and standard deviation are also calculated in the same loop to avoid an extra iteration. These values are returned in the bell_info object.
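One way to get the mean and standard deviation in the same pass that builds the points is Welford's running update, which avoids the rounding problems of accumulating a raw sum of squares. This is a sketch, not necessarily the author's method: it assumes the population standard deviation (dividing by n) is wanted, and Point, BellInfo and allocPoints are hypothetical stand-ins for the inner classes and method above.

```java
final class AllocPointsSketch {
    // Hypothetical stand-ins for noise_filter.point and noise_filter.bell_info.
    static final class Point {
        final double val;
        final int index;

        Point(double val, int index) {
            this.val = val;
            this.index = index;
        }
    }

    static final class BellInfo {
        double mean;
        double sigma;
    }

    /**
     * Builds points for coef[start, end) and fills info with the mean and the
     * population standard deviation, using Welford's single-pass update.
     */
    static Point[] allocPoints(double[] coef, int start, int end, BellInfo info) {
        int n = end - start;
        Point[] pointz = new Point[n];
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the mean
        for (int i = 0; i < n; i++) {
            double v = coef[start + i];
            pointz[i] = new Point(v, i); // index within the point array
            double delta = v - mean;
            mean += delta / (i + 1);
            m2 += delta * (v - mean);
        }
        info.mean = mean;
        info.sigma = n > 0 ? Math.sqrt(m2 / n) : 0.0;
        return pointz;
    }
}
```

For the spectrum 1, 2, 3, 4 this gives a mean of 2.5 and a standard deviation of sqrt(1.25).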
private double normal_interval(noise_filter.bell_info info, double low, double high, int num_points)
normal_interval
Numerically integrate the normal curve with mean info.mean and standard deviation info.sigma over the range low to high.
The normal curve equation that is integrated is:
$$f(y) = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{(y-u)^2}{2s^2}}$$
Where u is the mean and s is the standard deviation.
The area under the section of this curve from low to high is returned as the function result.
The normal curve equation is a probability density: the area under the curve over an interval is the probability that a value falls in that interval, so every such area is between zero and one. The total area under a normal curve (for example, one with a mean of zero and a standard deviation of one) is one.
The integral is calculated in a dumb fashion (i.e., we're not using anything fancy like Simpson's rule). The area in the interval $x_i$ to $x_{i+1}$ is
$$\text{area}_i = (x_{i+1} - x_i)\, g(x_i)$$
where $g(x_i)$ is the value of the normal curve probability density $f$ at $x_i$.
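The rectangle rule described above can be sketched as follows, assuming equal-width steps and left endpoints. The names NormalIntervalSketch, normalPdf and normalInterval are hypothetical; mean and standard deviation are passed directly rather than through a bell_info object.

```java
final class NormalIntervalSketch {
    /** Normal density f(y) with mean u and standard deviation s. */
    static double normalPdf(double y, double u, double s) {
        double z = (y - u) / s;
        return Math.exp(-0.5 * z * z) / (s * Math.sqrt(2.0 * Math.PI));
    }

    /**
     * Left-endpoint rectangle rule: sums (x[i+1] - x[i]) * g(x[i]) over
     * numPoints equal-width steps from low to high.
     */
    static double normalInterval(double u, double s, double low, double high, int numPoints) {
        double width = (high - low) / numPoints;
        double area = 0.0;
        for (int i = 0; i < numPoints; i++) {
            double x = low + i * width;
            area += width * normalPdf(x, u, s);
        }
        return area;
    }
}
```

With u = 0, s = 1 and 10,000 steps, the area from -1 to 1 comes out close to the familiar 0.6827. This simple rule does not need an even number of points; that requirement matters for Simpson's rule.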
Parameters:
info - This object encapsulates the mean and standard deviation
low - Start of the integral
high - End of the integral
num_points - Number of points to calculate (should be even)
private void zero_points(noise_filter.bin b, int num_zero, double[] noise)
Set num_zero values in the histogram bin b to zero. Or, if the number of values in the bin is less than num_zero, set all values in the bin to zero.
The num_zero argument is derived from the area under the normal curve in the histogram bin interval. This area is a fraction of the total curve area. When multiplied by the total number of coefficient points we get num_zero.
The noise coefficients are preserved (returned) in the noise array argument.
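A minimal sketch of zeroing a bin while keeping the removed values as noise. Several details are assumptions: the first num_zero points in the bin are the ones zeroed, the noise array is indexed by each point's index field, and the count actually zeroed is returned (the original method returns void). Point, Bin and zeroPoints are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ZeroPointsSketch {
    // Hypothetical stand-ins for noise_filter.point and noise_filter.bin.
    static final class Point {
        double val;
        final int index;

        Point(double val, int index) {
            this.val = val;
            this.index = index;
        }
    }

    static final class Bin {
        final List<Point> vals = new ArrayList<>();
    }

    /**
     * Zeros min(numZero, bin size) points in b, copying each original value
     * into noise[point.index]. Returns how many points were zeroed.
     */
    static int zeroPoints(Bin b, int numZero, double[] noise) {
        int n = Math.min(numZero, b.vals.size());
        for (int i = 0; i < n; i++) {
            Point p = b.vals.get(i);
            noise[p.index] = p.val; // keep the noise coefficient
            p.val = 0.0;            // remove it from the signal
        }
        return n;
    }
}
```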
private int subtract_gauss_curve(noise_filter.bin[] binz, noise_filter.bell_info info, int total_points, double[] noise)
Subtract the gaussian (or normal) curve from the histogram of the coefficients. This is done by integrating the gaussian curve over the range of a bin. If the number of items in the bin is less than or equal to the area under the curve in that interval, all items in the bin are set to zero. If the number of items in the bin is greater than the area under the curve, then a number of bin items equal to the curve area is set to zero.
The area under a normal curve is always less than or equal to one. So the area returned by normal_interval is the fraction of the total area. This is multiplied by the total number of coefficients.
The function returns the number of coefficients that are set to zero (i.e., the number of coefficients that fell within the gaussian curve). These coefficients are the noise coefficients. The noise coefficients are returned in the noise argument.
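The subtraction can be sketched on bin counts alone: for each bin, the expected number of gaussian points is the area under the curve over the bin times the total number of points, and at most that many items are removed. This sketch assumes equal-width bins, represents each bin by its count rather than a bin object, and rounds the expected count to the nearest integer (the original does not say how the fraction is converted). SubtractGaussSketch and its methods are hypothetical names.

```java
final class SubtractGaussSketch {
    private static double pdf(double y, double u, double s) {
        double z = (y - u) / s;
        return Math.exp(-0.5 * z * z) / (s * Math.sqrt(2.0 * Math.PI));
    }

    // Left-endpoint rectangle rule, as in the normal_interval description.
    private static double area(double u, double s, double low, double high, int numPoints) {
        double width = (high - low) / numPoints;
        double sum = 0.0;
        for (int i = 0; i < numPoints; i++) {
            sum += width * pdf(low + i * width, u, s);
        }
        return sum;
    }

    /**
     * counts[i] is the number of coefficients in bin [low + i*step, low + (i+1)*step).
     * Removes up to round(area_i * totalPoints) items from each bin, updating
     * counts in place, and returns the total removed (the noise coefficients).
     */
    static int subtractGaussCurve(int[] counts, double low, double step,
                                  double u, double s, int totalPoints) {
        int removed = 0;
        for (int i = 0; i < counts.length; i++) {
            double binStart = low + i * step;
            double expected = area(u, s, binStart, binStart + step, 100) * totalPoints;
            int numZero = (int) Math.round(expected);
            int z = Math.min(numZero, counts[i]); // never remove more than the bin holds
            counts[i] -= z;
            removed += z;
        }
        return removed;
    }
}
```

For example, with 1000 points and a standard normal curve, a bin from 0 to 1 expects roughly 341 gaussian points; if it holds 400, roughly 59 survive as signal, while a bin holding only 50 is emptied.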
private int filter_spectrum(double[] coef, int start, int end, double[] noise)
This function is passed the section of the Haar coefficients that correspond to a single spectrum. It compares this spectrum to a gaussian curve and zeros out the coefficients within the gaussian curve.
The function returns the number of points filtered out as the function result. The noise spectrum is also returned in the noise argument.
private void normalize_to_zero(double[] noise)
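Per its summary description, normalize_to_zero subtracts the smallest value from all points. A minimal sketch of that operation (the class and method names are hypothetical):

```java
final class NormalizeToZeroSketch {
    /** Subtracts the smallest value from every element so the minimum becomes zero. */
    static void normalizeToZero(double[] noise) {
        if (noise.length == 0) {
            return;
        }
        double min = noise[0];
        for (double v : noise) {
            min = Math.min(min, v);
        }
        for (int i = 0; i < noise.length; i++) {
            noise[i] -= min;
        }
    }
}
```

For the array 3, -2, 5 the result is 5, 0, 7.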
public void gaussian_filter(double[] coef, double[] noise)
This function is passed a set of Haar wavelet coefficients that result from the Haar wavelet transform. It applies a gaussian noise filter to each frequency spectrum. This filter zeros out coefficients that fall within a gaussian curve. This alters the input data (the coef array).
The coef argument is the input argument and contains the coefficients. The noise argument is an output argument and contains the coefficients that have been filtered out. This allows a noise spectrum to be rebuilt.
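The per-spectrum loop in gaussian_filter might look like the sketch below. It assumes the ordered Haar layout in which coef[0] holds the overall average and the spectrum with s coefficients occupies coef[s] through coef[2s - 1], and that filtering stops at a minimum spectrum size; the layout, minSize, and the SpectrumFilter interface (standing in for filter_spectrum) are all assumptions, not the class's confirmed API.

```java
final class GaussianFilterSketch {
    /** Hypothetical stand-in for filter_spectrum: filters coef[start, end), returns points removed. */
    interface SpectrumFilter {
        int filter(double[] coef, int start, int end, double[] noise);
    }

    /**
     * Walks the spectra from the highest frequency (largest) down to minSize,
     * assuming spectrum size s is stored at coef[s .. 2s - 1].
     */
    static int gaussianFilter(double[] coef, double[] noise, int minSize, SpectrumFilter f) {
        int removed = 0;
        for (int size = coef.length / 2; size >= minSize && size >= 1; size /= 2) {
            removed += f.filter(coef, size, 2 * size, noise);
        }
        return removed;
    }
}
```

With eight coefficients and a minimum size of two, the spectra coef[4..7] and coef[2..3] are filtered, while the average coef[0] and the lowest frequency coefficient coef[1] are left alone.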
public void filter_time_series(java.lang.String file_name, double[] ts)
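filter_time_series first calculates the Haar transform, which requires a length that is a power of two. The sketch below shows one forward and inverse Haar transform in the ordered layout (average first, then spectra from lowest to highest frequency). It uses the averaging and differencing form, (a + b) / 2 and (a - b) / 2; that normalization and the HaarSketch name are assumptions, since the class's own transform code is not shown here.

```java
final class HaarSketch {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** In place: afterwards v[0] is the average and v[s .. 2s-1] is the spectrum of size s. */
    static void forward(double[] v) {
        if (!isPowerOfTwo(v.length)) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        double[] tmp = new double[v.length];
        for (int n = v.length; n > 1; n /= 2) {
            int half = n / 2;
            for (int i = 0; i < half; i++) {
                double a = v[2 * i];
                double b = v[2 * i + 1];
                tmp[i] = (a + b) / 2.0;        // smoothed (average) value
                tmp[half + i] = (a - b) / 2.0; // wavelet (difference) coefficient
            }
            System.arraycopy(tmp, 0, v, 0, n);
        }
    }

    /** Inverse of forward: regenerates the time series from the coefficients. */
    static void inverse(double[] v) {
        if (!isPowerOfTwo(v.length)) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        double[] tmp = new double[v.length];
        for (int n = 2; n <= v.length; n *= 2) {
            int half = n / 2;
            for (int i = 0; i < half; i++) {
                double s = v[i];
                double d = v[half + i];
                tmp[2 * i] = s + d;
                tmp[2 * i + 1] = s - d;
            }
            System.arraycopy(tmp, 0, v, 0, n);
        }
    }
}
```

For the series 4, 2, 5, 5 the forward transform yields 4, -1, 1, 0, and the inverse restores the original values, so zeroing noise coefficients between the two calls regenerates a filtered series.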