SimpleStats v1.5

This will return the stats object mode.

05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.

int getMode()

The regression tests for this function are in test_1()

This function will return an integer: either STATS_SLOW_ADD or STATS_FAST_ADD.

This function returns the mean (average) element of the dataset.

05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.

void add(DWORD _n, DWORD _s=0)

This will add a data element to the stats object dataset.

The fast_add(...) function simply adds a new node to the head (m_root) of the datalist and increments the item count. There is no sorting, nor is there a running sum, etc... However, the most important aspect of the fast mode is that the m_tail and m_middle pointers are invalid -- only the m_root pointer is used.

The slow_add(...) function is much more complex:

If the datalist is empty, then it simply adds the new data to the object as the head, with the primary data as the running sum, and the count as 1.
Otherwise, it adds the primary data to the running sum and adds 1 to the count. It then inserts the data into the appropriate location in the list, while maintaining the integrity of the m_root, m_middle, and m_tail pointers. (As a side note, it seems to me, in retrospect, that a binary tree would be better suited for this task.) m_middle should always point at the median element.
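The difference between the two modes can be sketched as follows. This is a minimal illustration, not the SimpleStats code: MiniStats, fastAdd and slowAdd are hypothetical names, std::list stands in for the hand-rolled datalist and its m_root, m_middle and m_tail pointers, and DWORD is assumed to be a 32-bit unsigned integer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

using DWORD = std::uint32_t;  // assumption: stand-in for the platform typedef

// Hypothetical miniature of the two add modes described above.
struct MiniStats {
    std::list<DWORD> data;   // plays the role of the datalist
    std::uint64_t sum = 0;   // running sum, maintained in slow mode only
    std::size_t count = 0;   // item count, maintained in both modes

    // Fast mode: push onto the head; no sorting and no running sum.
    void fastAdd(DWORD n) {
        data.push_front(n);
        ++count;
    }

    // Slow mode: update the running sum and the count, then insert the
    // value at its sorted position so the list stays ordered.
    void slowAdd(DWORD n) {
        sum += n;
        ++count;
        auto pos = data.begin();
        while (pos != data.end() && *pos < n) ++pos;
        data.insert(pos, n);
        assert(std::is_sorted(data.begin(), data.end()));
    }
};
```

Because the slow path keeps the list ordered, a pointer to the centre element can be maintained incrementally, which is presumably what m_middle is for; the fast path defers that work until a statistic is requested.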

The regression tests for this function are in test_2() and test_3().

Parameters:

_n - This is the data to add to the object.
_s - This is the (optional) secondary data to add to the object. If this data is omitted, it is assumed to be 0. When using paired data, these two data elements cannot be separated; it's analogous to adding (x,y) to the dataset instead of just x.

Version:

Change Log:

11/25/2001 erpeters{at}users.sourceforge.net
- Added support for adding paired data.
- Documented the details of fast vs. slow add.
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
05/27/2001 erpeters{at}users.sourceforge.net
- Fixed a bug where, in SLOW_ADD mode, if the first two elements added were the same value, a segfault would occur.
05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation

DWORD median()

This function returns the median element of the dataset.

Note: in both cases, the median is the center element if $n$ is odd, and it is one of the two center elements if $n$ is even ($n$ being the number of elements).
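As an illustration of the odd and even cases, here is a small sketch. It is not the library's implementation: medianOf and its upperOnEven flag are hypothetical, and the choice of the lower centre element as the default for an even count is an assumption, since the text does not say which one is picked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using DWORD = std::uint32_t;  // assumption: stand-in for the platform typedef

// Hypothetical helper. Takes the data by value so the caller's copy is
// left untouched by the sort.
DWORD medianOf(std::vector<DWORD> values, bool upperOnEven = false) {
    assert(!values.empty());  // the empty case is not covered by this sketch
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    if (n % 2 == 1) {
        return values[n / 2];  // odd count: the single centre element
    }
    // Even count: two centre elements; the flag selects which one.
    return upperOnEven ? values[n / 2] : values[n / 2 - 1];
}
```

A flag of this kind is one possible shape for the option mentioned in the To Do list below.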

The regression tests for this function are in test_4() and test_5().

To Do:

Improve documentation
Give an option on which item to pick if the listsize is even.

Returns:

The median element of the dataset.

Version:

Change Log:

05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.

DWORD mean()

This function returns the mean (average) element of the dataset. It returns a rounded integer representation of the result from:

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

The regression tests for this function are in test_6() and test_7().

Returns:

The average element in this dataset.

Version:

Change Log:

11/25/2001 erpeters{at}users.sourceforge.net
- Fixed rounding. This meant that I removed the int variants and made this call the Dbl functions instead. I feel this code should go away and be replaced by the Dbl functions.
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation

double meanDbl()

This is exactly the same as mean(), except it casts the operands of the final divide to double before doing the division.
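The effect of casting before the divide can be shown with a short sketch. The function names are hypothetical, and rounding half away from zero (via std::llround) is an assumption; the text only says that the integer result is rounded.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

using DWORD = std::uint32_t;  // assumption: stand-in for the platform typedef

// Cast both operands to double before dividing, so the fraction survives.
double meanAsDouble(std::uint64_t sum, std::uint64_t count) {
    assert(count != 0);  // this sketch does not define the empty case
    return static_cast<double>(sum) / static_cast<double>(count);
}

// Round the double result to the nearest integer (assumed rounding rule).
DWORD meanRounded(std::uint64_t sum, std::uint64_t count) {
    return static_cast<DWORD>(std::llround(meanAsDouble(sum, count)));
}
```

With a sum of 7 over 2 elements, plain integer division yields 3, the double version yields 3.5, and the rounded version yields 4.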

The regression tests for this function are in test_8() and test_9().

The average of the elements in this dataset.

This returns the (population) variance.

11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation

double variance()

The regression tests for this function are in test_21() and test_22().

This returns the Standard Deviation (sample) of this dataset.

This returns the sample std deviation.

initial implementation

double stdDevDbl()

To Do:

Support population std deviation as an option.

The regression tests for this function are in test_10() and test_11().

This returns the Standard Deviation (sample) of this dataset.
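For reference, the sample and population forms differ only in the divisor (n - 1 versus n). The sketch below uses the textbook two-pass formulas and hypothetical function names; it is not taken from the SimpleStats source.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of squared deviations from the mean (two-pass, for clarity).
double sumSquaredDeviations(const std::vector<double>& xs) {
    double mean = 0.0;
    for (double x : xs) mean += x;
    mean /= static_cast<double>(xs.size());
    double ss = 0.0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    return ss;
}

// Sample variance: divide by n - 1 (needs at least two elements).
double sampleVariance(const std::vector<double>& xs) {
    assert(xs.size() >= 2);
    return sumSquaredDeviations(xs) / static_cast<double>(xs.size() - 1);
}

// Population variance: divide by n.
double populationVariance(const std::vector<double>& xs) {
    assert(!xs.empty());
    return sumSquaredDeviations(xs) / static_cast<double>(xs.size());
}

// The standard deviation is the square root of the matching variance.
double sampleStdDev(const std::vector<double>& xs) {
    return std::sqrt(sampleVariance(xs));
}
```

Supporting the population form as an option, as the To Do suggests, would amount to switching the divisor.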

int optimalClassCount()

11/25/2001 erpeters{at}users.sourceforge.net
- Based this function on the variance function.
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation

This function will return the "optimal" class count. This is defined by the smallest $k$ such that $2^k \geq n$, where $n$ is the number of elements in the dataset.

This table demonstrates what to expect:

$n$      $k$    $2^k$
0        0      1
1        1      2
2        1      2
3        2      4
4        2      4
5        3      8
6        3      8
7        3      8
8        3      8
9        4      16
16       4      16
17       5      32
100      7      128
1000     10     1024
10000    14     16384
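The table can be approximated with a short loop. This sketch assumes the rule is "the smallest k with 2^k >= n", where n is the element count and k the class count; classCountFor is a hypothetical name. Note that this rule yields k = 0 for n = 1, whereas the table lists 1 for that row, so the actual implementation evidently treats that case differently.

```cpp
#include <cassert>
#include <cstdint>

// Smallest k such that 2^k >= n (assumed rule, see the note above).
int classCountFor(std::uint32_t n) {
    int k = 0;
    std::uint64_t power = 1;  // invariant: power == 2^k; 64 bits avoid overflow
    while (power < n) {
        power *= 2;
        ++k;
    }
    assert(power >= n);  // postcondition of the loop
    return k;
}
```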

The regression tests for this function are in test_12().

The optimal number of classes.

Returns the number of data elements in this object.

11/23/2001 erpeters{at}users.sourceforge.net
- initial implementation

int getCount()

The regression tests for this function are in test_13().

The size of this object, in elements.

This returns the midpoint of the range of this dataset. This is not the mean or median; it is $(\min + \max) / 2$.

11/23/2001 erpeters{at}users.sourceforge.net
- initial implementation

DWORD minimum()

This returns the minimal datum in this object. Currently, if the dataset is empty, this function returns 0.

The regression tests for this function are in test_14() and test_15().

Returns:

The minimal datum in the object.

Version:

Change Log:

11/23/2001 erpeters{at}users.sourceforge.net
- initial implementation

DWORD maximum()

This returns the maximal datum in this object. Currently, if the dataset is empty, this function returns 0.

The regression tests for this function are in test_16() and test_17().

Returns:

The maximal datum in the object.

Version:

Change Log:

11/23/2001 erpeters{at}users.sourceforge.net
- initial implementation

double midpoint()

If there is only 1 element in this dataset, that 1 element is the midpoint, which is an exception to the above stated formula. If the dataset is empty, the midpoint claims to be 0. The latter will be replaced with an exception in the future.
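The special cases above can be sketched like this. The sketch assumes the midpoint of the range is (minimum + maximum) / 2; midpointOf is a hypothetical free function, not the member function itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using DWORD = std::uint32_t;  // assumption: stand-in for the platform typedef

// Midpoint of the range, with the documented special cases handled first.
double midpointOf(const std::vector<DWORD>& values) {
    if (values.empty()) return 0.0;            // current documented behaviour
    if (values.size() == 1) return values[0];  // the lone element is the midpoint
    const auto mm = std::minmax_element(values.begin(), values.end());
    assert(*mm.first <= *mm.second);
    // Convert before adding so two large DWORD values cannot overflow.
    return (static_cast<double>(*mm.first) + static_cast<double>(*mm.second)) / 2.0;
}
```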

The regression tests for this function are in test_18() and test_19().

The midpoint of this dataset.

This function will return the size of the interval based on the optimal number of classes. This function is heavily based on optimalClassCount().

11/23/2001 erpeters{at}users.sourceforge.net
- initial implementation

int intervalSize()

With $n$ as the number of elements in this dataset, $k$ is selected by optimalClassCount() (based on $n$, see optimalClassCount() for more details). The interval size is then the following:

The following table is an example of what to expect:

$n$      $k$    $2^k$    interval size
0        0      1        0
1        1      2
2        1      2
3        2      4
4        2      4
5        3      8
6        3      8
7        3      8
8        3      8
9        4      16
16       4      16
17       5      32
100      7      128
1000     10     1024
10000    14     16384

The regression tests for this function are in test_20().

The optimal interval size.

double slow_meanDbl()

11/23/2001 erpeters{at}users.sourceforge.net
- initial implementation

void slow_add(DWORD _n, DWORD _s)

This will add a data element to the stats object dataset. This is for internal use only; please see void add(DWORD _n, DWORD _s=0).

Parameters:

_n - This is the data to add to the object.
_s - This is the secondary data to add to the object.

Version:

Change Log:

11/25/2001 erpeters{at}users.sourceforge.net
- Added paired data support.
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
05/27/2001 erpeters{at}users.sourceforge.net
- Fixed a bug where, in SLOW_ADD mode, if the first two elements added were the same value, a segfault would occur.
05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation

void fast_add(DWORD _n, DWORD _s)

This will add a data element to the stats object dataset. This is for internal use only; please see void add(DWORD _n, DWORD _s=0).

Parameters:

_n - This is the data to add to the object.
_s - This is the secondary data to add to the object.

Version:

Change Log:

11/25/2001 erpeters{at}users.sourceforge.net
- Added paired data support.
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
05/27/2001 erpeters{at}users.sourceforge.net
- Fixed a bug where, in SLOW_ADD mode, if the first two elements added were the same value, a segfault would occur.
05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation

DWORD slow_median()

This function returns the median element of the dataset. This is for internal use only; please see DWORD median().

Returns:

The median element of the dataset.

Version:

Change Log:

05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.

DWORD fast_median()

This function returns the median element of the dataset. This is for internal use only; please see DWORD median().

Returns:

The median element of the dataset.

Version:

Change Log:

05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.

This function returns the mean (average) element of the dataset. This is for internal use only; please see double meanDbl().

The average of the elements in this dataset.

double fast_meanDbl()

05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.

This function returns the mean (average) element of the dataset. This is for internal use only; please see double meanDbl().

The average of the elements in this dataset.

double slow_variance()

05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.

This returns the sample variance. This is for internal use only; please see double variance().

This returns the variance (sample) of this dataset.

double fast_variance()

11/25/2001 erpeters{at}users.sourceforge.net
- Altered code: was stddev, now deviance.
11/23/2001 erpeters{at}users.sourceforge.net
- documentation.
- moved function into header.
05/07/2001 erpeters{at}users.sourceforge.net
- initial implementation

This returns the sample variance. This is for internal use only; please see double variance().

This returns the variance (sample) of this dataset.