Stats(int _m = STATS_FAST_ADD)
This creates a new stats object of the requested type.
The regression tests for this function are in test_0()
- Parameters:
- _m - This is the stats object mode. It can be one of the
following:
- STATS_SLOW_ADD - This designates that the stats object is
to optimize calculations at the cost of slowing the addition
of new data to the dataset.
- STATS_FAST_ADD - This designates that the stats object is
to optimize adding of new data to the dataset at the cost of
slower calculations.
The object will default to the STATS_FAST_ADD method.
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- set the default stats mode.
virtual ~Stats()
This destroys a stats object.
The regression tests for this function are in test_0()
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
int getMode()
This will return the stats object mode.
The regression tests for this function are in test_1()
- Returns:
- This function will return an integer: either STATS_SLOW_ADD or
STATS_FAST_ADD.
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
void add(DWORD _n, DWORD _s=0)
This will add a data element to the stats object dataset.
The fast_add(...) function simply adds a new node to the head (m_root)
of the datalist and increments the item count. There is no sorting, nor
is there a running sum, etc... However, the most important aspect of
the fast mode is that the m_tail and m_middle pointers are invalid --
only the m_root pointer is used.
The slow_add(...) function is much more complex:
- If the datalist is empty, then it simply adds the new data to
the object as the head, with the primary data as the running sum,
and the count as 1.
- Otherwise, it adds the primary data to the running sum and adds
1 to the count. It then inserts the data into the appropriate
location in the list, while maintaining the integrity of the
m_root, m_middle, and m_tail pointers. (As a side note, it seems
to me, in retrospect, that a binary tree would be better suited
for this task.) m_middle should always point at the median
element.
The regression tests for this function are in test_2() and test_3().
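The difference between the two add paths can be sketched as follows. This is a minimal illustration, not the library's code: `Node`, `List`, `fastAdd`, `slowAdd`, and `clear` are hypothetical stand-ins, `DWORD` is assumed to be a 32-bit unsigned integer, and both the secondary datum `_s` and the upkeep of the median pointer are left out for brevity.

```cpp
#include <cstdint>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Hypothetical stand-in for StatsNode: one datum in a doubly linked list.
struct Node {
    DWORD value;
    Node* prev;
    Node* next;
};

// Hypothetical stand-in for the list state held by a stats object.
struct List {
    Node* root = nullptr;   // head of the list
    Node* tail = nullptr;   // last node (maintained by slowAdd only)
    unsigned long sum = 0;  // running sum (maintained by slowAdd only)
    int count = 0;          // item count (maintained by both paths)
};

// Fast path: push onto the head and bump the count; no sorting, no sum.
void fastAdd(List& l, DWORD n) {
    Node* node = new Node{n, nullptr, l.root};
    if (l.root) l.root->prev = node;
    l.root = node;
    ++l.count;
}

// Slow path: update sum and count, then insert in ascending order.
void slowAdd(List& l, DWORD n) {
    l.sum += n;
    ++l.count;
    Node* cur = l.root;
    while (cur && cur->value <= n) cur = cur->next;  // first node greater than n
    Node* before = cur ? cur->prev : l.tail;         // node that will precede the new one
    Node* node = new Node{n, before, cur};
    if (before) before->next = node; else l.root = node;
    if (cur) cur->prev = node; else l.tail = node;
}

// Release every node and reset the list state.
void clear(List& l) {
    while (l.root) {
        Node* next = l.root->next;
        delete l.root;
        l.root = next;
    }
    l = List{};
}
```

In this sketch the slow path pays a linear scan per insertion so that the minimum, maximum, and sum are available immediately afterwards, while the fast path is constant time and defers all work to the calculations.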
- Parameters:
- _n - This is the data to add to the object.
- _s - This is the (optional) secondary data to add to the object. If
this data is omitted, it is assumed to be 0. When using
paired data, these two data elements cannot be separated; it is
analogous to adding (x,y) to the dataset instead of just x.
- Version:
- Change Log:
- 11/25/2001 erpeters{at}users.sourceforge.net
- Added support for adding paired data.
- Documented the details of fast vs. slow add.
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/27/2001 erpeters{at}users.sourceforge.net
- Fixed a bug where, in SLOW_ADD mode, if the first two
elements added were the same value, a segfault would occur.
- 05/07/2001 erpeters{at}users.sourceforge.net
DWORD median()
This function returns the median element of the dataset.
Note: in both cases, the median is the center element if $n$ is odd,
and it is one of the two center elements if $n$ is even ($n$ being the
number of elements).
The regression tests for this function are in test_4() and test_5().
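A minimal sketch of picking the median element from unsorted data follows. The function name `medianOf` is hypothetical, `DWORD` is assumed to be a 32-bit unsigned integer, and two behaviours are assumptions rather than documented facts: for an even count the lower of the two center elements is returned, and an empty dataset yields 0.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Returns the center element of the sorted data. For an even count this
// sketch picks the lower of the two center elements (an assumption).
// An empty dataset yields 0 (also an assumption).
DWORD medianOf(std::vector<DWORD> data) {
    if (data.empty()) return 0;
    std::sort(data.begin(), data.end());
    return data[(data.size() - 1) / 2];
}
```

The data is taken by value so that sorting does not disturb the caller's copy, which mirrors a dataset that is stored unsorted and only ordered when a calculation needs it.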
To Do:
- Improve documentation
- Give an option on which item to pick if the listsize is even.
- Returns:
- The median element of the dataset.
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
DWORD mean()
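This function returns the mean (average) element of the dataset. It
returns a rounded integer representation of the result from:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
where $n$ is the number of elements and $x_i$ are the data elements.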
The regression tests for this function are in test_6() and test_7().
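The relationship between the integer and double variants might look like the sketch below. The names `meanDblOf` and `meanOf` are hypothetical, `DWORD` is assumed to be a 32-bit unsigned integer, and both the round-half-up rule and the result of 0 for an empty dataset are assumptions, since the text only says the result is rounded.

```cpp
#include <cstdint>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Mean computed in double precision; an empty dataset yields 0.0 (assumption).
double meanDblOf(const std::vector<DWORD>& data) {
    if (data.empty()) return 0.0;
    double sum = 0.0;
    for (DWORD x : data) sum += x;
    return sum / static_cast<double>(data.size());
}

// Integer mean built on the double version, rounding half up (assumption).
DWORD meanOf(const std::vector<DWORD>& data) {
    return static_cast<DWORD>(meanDblOf(data) + 0.5);
}
```

Delegating to the double version avoids the truncation that a plain integer division of sum by count would introduce.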
- Returns:
- The average element in this dataset.
- Version:
- Change Log:
- 11/25/2001 erpeters{at}users.sourceforge.net
- Fixed rounding. This meant that I removed the int variants
and made this call the Dbl functions instead. I feel this
code should go away and be replaced by the Dbl functions.
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/07/2001 erpeters{at}users.sourceforge.net
double meanDbl()
This function returns the mean (average) element of the dataset.
This is exactly the same as mean(), except that it casts the operands of
the final division to double before dividing.
The regression tests for this function are in test_8() and test_9().
- Returns:
- The average of the elements in this dataset.
- Version:
- Change Log:
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/07/2001 erpeters{at}users.sourceforge.net
double variance()
This returns the (population) variance.

The regression tests for this function are in test_21() and test_22().
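Because this documentation mentions both the population and the sample variance, the sketch below takes the divisor as a parameter instead of guessing which one the library uses. The names `varianceOf` and `stdDevOf` are hypothetical, `DWORD` is assumed to be a 32-bit unsigned integer, and returning 0 for degenerate input is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Variance of the data. Divides by n when sample is false (population
// variance) and by n - 1 when sample is true (sample variance).
double varianceOf(const std::vector<DWORD>& data, bool sample) {
    const std::size_t n = data.size();
    if (n == 0 || (sample && n < 2)) return 0.0;  // assumption: degenerate input yields 0
    double mean = 0.0;
    for (DWORD x : data) mean += x;
    mean /= static_cast<double>(n);
    double squares = 0.0;  // sum of squared deviations from the mean
    for (DWORD x : data) {
        const double d = static_cast<double>(x) - mean;
        squares += d * d;
    }
    return squares / static_cast<double>(sample ? n - 1 : n);
}

// Standard deviation as the square root of the matching variance.
double stdDevOf(const std::vector<DWORD>& data, bool sample) {
    return std::sqrt(varianceOf(data, sample));
}
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the squared deviations sum to 32, so the population variance is 4 and the sample variance is 32/7.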
- Returns:
- This returns the variance of this dataset.
- Version:
- Change Log:
double stdDevDbl()
This returns the sample std deviation.

To Do:
The regression tests for this function are in test_10() and test_11().
- Returns:
- This returns the Standard Deviation (sample) of this dataset.
- Version:
- Change Log:
- 11/25/2001 erpeters{at}users.sourceforge.net
- Based this function on the variance function.
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/07/2001 erpeters{at}users.sourceforge.net
int optimalClassCount()
This function will return the "optimal" class count. This is defined by
the smallest $k$ such that $2^k \ge n$, where $n$ is the number of
elements in the dataset.
This table demonstrates what to expect:
n | k | 2^k |
0 | 0 | 1 |
1 | 1 | 2 |
2 | 1 | 2 |
3 | 2 | 4 |
4 | 2 | 4 |
5 | 3 | 8 |
6 | 3 | 8 |
7 | 3 | 8 |
8 | 3 | 8 |
9 | 4 | 16 |
16 | 4 | 16 |
17 | 5 | 32 |
100 | 7 | 128 |
1000 | 10 | 1024 |
10000 | 14 | 16384 |
The regression tests for this function are in test_12().
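One way to compute a class count of this kind is to take the smallest exponent whose power of two is at least the element count. The sketch below does exactly that; the function name `optimalClassCountFor` is hypothetical and this is not necessarily how the library computes it. In particular, the sketch yields 0 for a single element, whereas the table above lists 1 for that case.

```cpp
// Smallest k such that 2^k >= n. A 64-bit shift is used so that the
// comparison cannot overflow for any non-negative int n.
int optimalClassCountFor(int n) {
    int k = 0;
    while ((1LL << k) < n) ++k;
    return k;
}
```

Apart from the single-element case, the values agree with the table, for example 3 for 5 through 8 elements and 14 for 10000 elements.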
- Returns:
- The optimal number of classes.
- Version:
- Change Log:
- 11/23/2001 erpeters{at}users.sourceforge.net
int getCount()
Returns the number of data elements in this object.
The regression tests for this function are in test_13().
- Returns:
- The size of this object, in elements.
- Version:
- Change Log:
- 11/23/2001 erpeters{at}users.sourceforge.net
DWORD minimum()
This returns the minimal datum in this object. Currently, if the dataset
is empty, this function returns 0.
The regression tests for this function are in test_14() and test_15().
- Returns:
- The minimal datum in the object.
- Version:
- Change Log:
- 11/23/2001 erpeters{at}users.sourceforge.net
DWORD maximum()
This returns the maximal datum in this object. Currently, if the dataset
is empty, this function returns 0.
The regression tests for this function are in test_16() and test_17().
- Returns:
- The maximal datum in the object.
- Version:
- Change Log:
- 11/23/2001 erpeters{at}users.sourceforge.net
double midpoint()
This returns the midpoint of the range of this dataset. This is
not the mean or median; it is
$(\mathrm{minimum} + \mathrm{maximum}) / 2$.
If there is only 1 element in this dataset, that 1 element is the
midpoint, which is an exception to the above stated formula. If the
dataset is empty, the midpoint claims to be 0. The latter will be
replaced with an exception in the future.
The regression tests for this function are in test_18() and test_19().
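A minimal sketch of the midpoint calculation follows, assuming the midpoint is half the sum of the minimum and maximum. The function name `midpointOf` is hypothetical and `DWORD` is assumed to be a 32-bit unsigned integer. The empty case returns 0 as described above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using DWORD = std::uint32_t;  // assumption: DWORD is a 32-bit unsigned integer

// Midpoint of the range: halfway between the smallest and largest datum.
// An empty dataset yields 0, matching the behaviour described in the text.
double midpointOf(const std::vector<DWORD>& data) {
    if (data.empty()) return 0.0;
    const auto mm = std::minmax_element(data.begin(), data.end());
    return (static_cast<double>(*mm.first) + static_cast<double>(*mm.second)) / 2.0;
}
```

Converting to double before adding avoids overflow when the minimum and maximum are both close to the top of the unsigned range.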
- Returns:
- the midpoint of this dataset.
- Version:
- Change Log:
- 11/23/2001 erpeters{at}users.sourceforge.net
int intervalSize()
This function will return the size of the interval based on the
optimal number of classes. This function is heavily based on the
optimalClassCount.
With $n$ as the number of elements in this dataset, $k$ is selected by
optimalClassCount() (based on $n$; see optimalClassCount() for more
details). The interval size is then the following:

The following table is an example of what to expect:
n | k | 2^k | interval size |
0 | 0 | 1 | 0 |
1 | 1 | 2 |  |
2 | 1 | 2 |  |
3 | 2 | 4 |  |
4 | 2 | 4 |  |
5 | 3 | 8 |  |
6 | 3 | 8 |  |
7 | 3 | 8 |  |
8 | 3 | 8 |  |
9 | 4 | 16 |  |
16 | 4 | 16 |  |
17 | 5 | 32 |  |
100 | 7 | 128 |  |
1000 | 10 | 1024 |  |
10000 | 14 | 16384 |  |
The regression tests for this function are in test_20().
- Returns:
- the optimal interval size.
- Version:
- Change Log:
- 11/23/2001 erpeters{at}users.sourceforge.net
void slow_add(DWORD _n, DWORD _s)
This will add a data element to the stats object dataset. This is for
internal use only; please see void add(DWORD _n, DWORD _s);
- Parameters:
- _n - This is the data to add to the object.
- _s - This is the secondary data to add to the object.
- Version:
- Change Log:
- 11/25/2001 erpeters{at}users.sourceforge.net
- Added paired data support.
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/27/2001 erpeters{at}users.sourceforge.net
- Fixed a bug where, in SLOW_ADD mode, if the first two
elements added were the same value, a segfault would occur.
- 05/07/2001 erpeters{at}users.sourceforge.net
void fast_add(DWORD _n, DWORD _s)
This will add a data element to the stats object dataset. This is for
internal use only; please see void add(DWORD _n, DWORD _s);
- Parameters:
- _n - This is the data to add to the object.
- _s - This is the secondary data to add to the object.
- Version:
- Change Log:
- 11/25/2001 erpeters{at}users.sourceforge.net
- Added paired data support.
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/27/2001 erpeters{at}users.sourceforge.net
- Fixed a bug where, in SLOW_ADD mode, if the first two
elements added were the same value, a segfault would occur.
- 05/07/2001 erpeters{at}users.sourceforge.net
DWORD slow_median()
This function returns the median element of the dataset. This is for
internal use only, please see DWORD median();
- Returns:
- The median element of the dataset.
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
DWORD fast_median()
This function returns the median element of the dataset. This is for
internal use only, please see DWORD median();
- Returns:
- The median element of the dataset.
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
double slow_meanDbl()
This function returns the mean (average) element of the dataset. This is
for internal use only, please see double meanDbl();
- Returns:
- The average of the elements in this dataset.
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
double fast_meanDbl()
This function returns the mean (average) element of the dataset. This is
for internal use only, please see double meanDbl();
- Returns:
- The average of the elements in this dataset.
- Version:
- Change Log:
- 05/07/2001 erpeters{at}users.sourceforge.net
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
double slow_variance()
This returns the sample variance. This is for internal use only;
please see double variance();
- Returns:
- This returns the variance (sample) of this dataset.
- Version:
- Change Log:
- 11/25/2001 erpeters{at}users.sourceforge.net
- Altered code: was stddev, now deviance.
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/07/2001 erpeters{at}users.sourceforge.net
double fast_variance()
This returns the sample variance. This is for internal use only,
please see double variance();
- Returns:
- This returns the variance (sample) of this dataset.
- Version:
- Change Log:
- 11/25/2001 erpeters{at}users.sourceforge.net
- Altered code: was stddev, now variance.
- 11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
- 05/07/2001 erpeters{at}users.sourceforge.net
StatsNode* m_root
The data in this object is stored in a doubly linked list, with a
great deal of help from the StatsNode object.
This linked list may or may not be ordered, depending on the mode of this
object (see add() for more information on this particular issue).
The list has an m_root pointer to the head of the list, an m_tail pointer
to the back end of the list, and an m_middle pointer to the median for
quicker (in reality, probably not much quicker) adds and/or lookups.
I'm strongly considering alternate data structures for this object.
StatsNode* m_middle
- This is the middle node of the list.
StatsNode* m_tail
- This is the last node of the list.
unsigned long m_sum
- This is the running sum of the nodes.
int m_mode
- This is this object's runtime mode.
int m_count
- This is the count of the nodes. This datum is maintained by both
fast_add and slow_add.