SimpleStats v1.5

http://simplestats.sourceforge.net

Powered by: SourceForge

Project Admin: Eric J. Peters


class Stats

This is a simple stats object


Public Methods

Stats(int _m = STATS_FAST_ADD)
This creates a new stats object of the requested type.
virtual ~Stats()
This destroys a stats object.
int getMode()
This will return the stats object mode.
void add(DWORD _n, DWORD _s=0)
This will add a data element to the stats object dataset.
DWORD median()
This function returns the median element of the dataset.
DWORD mean()
This function returns the mean (average) of the dataset, rounded to an integer.
double meanDbl()
This function returns the mean (average) of the dataset as a double.
double variance()
This returns the sample variance.
double stdDevDbl()
This returns the sample std deviation.
int optimalClassCount()
This function will return the "optimal" class count.
int getCount()
This returns the number of data elements in this object.
DWORD minimum()
This returns the minimal datum in this object.
DWORD maximum()
This returns the maximal datum in this object.
double midpoint()
This returns the midpoint of the range of this dataset.
int intervalSize()
This function returns the optimal interval size.

Private Fields

StatsNode* m_root
This is the first node of the list.
StatsNode* m_middle
This is the middle node of the list.
StatsNode* m_tail
This is the last node of the list.
unsigned long m_sum
This is the running sum of the nodes.
int m_mode
This is this object's runtime mode.
int m_count
This is the count of the nodes. This datum is maintained by both fast_add and slow_add.

Private Methods

void slow_add(DWORD _n, DWORD _s)
This will add a data element to the stats object dataset.
void fast_add(DWORD _n, DWORD _s)
This will add a data element to the stats object dataset.
DWORD slow_median()
This function returns the median element of the dataset.
DWORD fast_median()
This function returns the median element of the dataset.
double slow_meanDbl()
This function returns the mean (average) of the dataset.
double fast_meanDbl()
This function returns the mean (average) of the dataset.
double slow_variance()
This returns the sample variance.
double fast_variance()
This returns the sample variance.

Documentation

This is a simple stats object. This package is in the early stages of open development. It is designed either to perform simple statistical calculations very fast or to add data to a dataset very fast.

To Do:

Stats(int _m = STATS_FAST_ADD)

This creates a new stats object of the requested type.

The regression tests for this function are in test_0().

Parameters:
_m - This is the stats object mode. It can be one of the following:
  • STATS_SLOW_ADD - This designates that the stats object is to optimize calculations at the cost of slowing the addition of new data to the dataset.
  • STATS_FAST_ADD - This designates that the stats object is to optimize adding of new data to the dataset at the cost of slower calculations.
The object will default to the STATS_FAST_ADD mode.
Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • Initial implementation.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
    • set the default stats mode.
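The mode fixed at construction decides which private insertion routine add() uses. The sketch below illustrates that dispatch under stated assumptions: DWORD is taken to be a 32-bit unsigned integer, the numeric values of STATS_SLOW_ADD and STATS_FAST_ADD are invented for the example, and the class ModeSketch with its call counters is hypothetical, standing in for the real storage.

```cpp
#include <cstdint>

// Assumption: DWORD is a 32-bit unsigned integer, as on Windows.
typedef std::uint32_t DWORD;

// Assumed values; the real header may define these differently.
const int STATS_SLOW_ADD = 0;
const int STATS_FAST_ADD = 1;

// Hypothetical class illustrating mode dispatch only; no data is stored.
class ModeSketch {
public:
    explicit ModeSketch(int m = STATS_FAST_ADD) : m_mode(m) {}

    int getMode() const { return m_mode; }

    // Route each new datum to the strategy selected at construction.
    void add(DWORD n, DWORD s = 0) {
        if (m_mode == STATS_SLOW_ADD)
            slow_add(n, s);
        else
            fast_add(n, s);
    }

    int slowCalls = 0;  // adds that took the sorted path
    int fastCalls = 0;  // adds that took the prepend path

private:
    void slow_add(DWORD, DWORD) { ++slowCalls; }  // would insert in sorted order
    void fast_add(DWORD, DWORD) { ++fastCalls; }  // would prepend at m_root
    int m_mode;
};
```

Because the mode is only read after construction, a default-constructed ModeSketch sends every add() down the fast path, matching the documented STATS_FAST_ADD default.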

virtual ~Stats()

This destroys a stats object.

The regression tests for this function are in test_0().

Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.

int getMode()

This will return the stats object mode.

The regression tests for this function are in test_1().

Returns:
This function will return an integer: either STATS_SLOW_ADD or STATS_FAST_ADD.
Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.

void add(DWORD _n, DWORD _s=0)

This will add a data element to the stats object dataset.

The fast_add(...) function simply adds a new node to the head (m_root) of the data list and increments the item count. There is no sorting, nor is there a running sum. Most importantly, in fast mode the m_tail and m_middle pointers are invalid: only the m_root pointer is used.

The slow_add(...) function is much more complex:

The regression tests for this function are in test_2() and test_3().

Parameters:
_n - This is the data to add to the object.
_s - This is the (optional) secondary data to add to the object. If this argument is omitted, it is assumed to be 0. When using paired data, the two data elements cannot be separated; it is analogous to adding (x,y) to the dataset instead of just x.
Version:
Change Log:
  • 11/25/2001 erpeters{at}users.sourceforge.net
    • Added support for adding paired data.
    • Documented the details of fast vs. slow add.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/27/2001 erpeters{at}users.sourceforge.net
    • Fixed a bug where, in SLOW_ADD mode, if the first two elements added were the same value, a segfault would occur.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
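The fast path described above amounts to a constant-time prepend. The following is a minimal sketch, assuming a node that stores the primary datum, the paired secondary datum (0 when omitted), and two links; the real StatsNode layout is not documented on this page, so Node and FastList are hypothetical names. Only the head pointer and the count are maintained, as in the description of fast mode.

```cpp
#include <cstdint>

typedef std::uint32_t DWORD;  // assumption: 32-bit unsigned, as on Windows

// Hypothetical stand-in for StatsNode: one (x, y) pair per node.
struct Node {
    DWORD n;      // primary datum
    DWORD s;      // secondary (paired) datum, 0 when not supplied
    Node* next;
    Node* prev;
};

class FastList {
public:
    FastList() = default;
    FastList(const FastList&) = delete;             // owning raw pointers:
    FastList& operator=(const FastList&) = delete;  // forbid shallow copies

    ~FastList() {
        while (m_root) {
            Node* victim = m_root;
            m_root = m_root->next;
            delete victim;
        }
    }

    // O(1): no sorting and no running sum; just link a new head.
    void fast_add(DWORD n, DWORD s = 0) {
        Node* node = new Node{n, s, m_root, nullptr};
        if (m_root) m_root->prev = node;
        m_root = node;
        ++m_count;
    }

    const Node* root() const { return m_root; }
    int getCount() const { return m_count; }

private:
    Node* m_root = nullptr;
    int m_count = 0;
};
```

The newest datum always sits at the head, so the list ends up in reverse insertion order; any statistic must therefore walk the whole list, which is the cost fast mode accepts.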

DWORD median()

This function returns the median element of the dataset.

Note: in both cases, the median is the center element if $n$ is odd, and one of the two center elements if $n$ is even ($n$ being the number of elements).

The regression tests for this function are in test_4() and test_5().
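In fast mode the list is unsorted, so the median has to be found at query time. The sketch below copies the data into a std::vector and uses std::nth_element (average linear time) instead of a full sort. It assumes that, for even $n$, the upper of the two center elements (index $n/2$) is returned and that an empty dataset yields 0, mirroring minimum() and maximum(); the function name medianSketch is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef std::uint32_t DWORD;  // assumption: 32-bit unsigned

// Hypothetical helper: median of an unsorted dataset.
// Assumption: for even n this returns the upper center element (index n/2).
DWORD medianSketch(std::vector<DWORD> data) {  // by value: we reorder a copy
    if (data.empty()) return 0;                // assumption: empty -> 0
    auto mid = data.begin() + data.size() / 2;
    // Partially reorder so *mid is the element a full sort would put there.
    std::nth_element(data.begin(), mid, data.end());
    return *mid;
}
```

For example, {4, 1, 3, 2} sorts to 1, 2, 3, 4, so this sketch returns 3.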

To Do:

Returns:
The median element of the dataset.
Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.

DWORD mean()

This function returns the mean (average) of the dataset. It returns a rounded integer representation of the result from:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

The regression tests for this function are in test_6() and test_7().
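Following the change log entry that made mean() delegate to the double-precision path, the rounded integer mean can be sketched as below. Rounding half away from zero via std::lround is an assumption; this page only says the result is rounded. The names meanDblSketch and meanSketch are hypothetical, and both require a non-empty dataset.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

typedef std::uint32_t DWORD;  // assumption: 32-bit unsigned

// Arithmetic mean, dividing in double precision.
double meanDblSketch(const std::vector<DWORD>& data) {
    assert(!data.empty());
    unsigned long long sum = 0;  // wide accumulator avoids overflow
    for (DWORD x : data) sum += x;
    return static_cast<double>(sum) / static_cast<double>(data.size());
}

// Integer mean obtained by rounding the double result (half away from zero).
DWORD meanSketch(const std::vector<DWORD>& data) {
    return static_cast<DWORD>(std::lround(meanDblSketch(data)));
}
```

For instance, meanSketch({1, 2}) yields 2, because the exact mean 1.5 rounds away from zero.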

Returns:
The average element in this dataset.
Version:
Change Log:
  • 11/25/2001 erpeters{at}users.sourceforge.net
    • Fixed rounding. This meant that I removed the int variants and made this call the Dbl functions instead. I feel this code should go away and be replaced by the Dbl functions.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation

double meanDbl()

This function returns the mean (average) of the dataset.

This is exactly the same as mean(), except that it casts the operands of the final division to double before dividing.

The regression tests for this function are in test_8() and test_9().
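The effect of casting before the final division can be seen in isolation. In the sketch below the parameter types mirror the documented m_sum (unsigned long) and m_count (int); the function names are hypothetical and a positive count is assumed.

```cpp
// Integer division: the quotient is truncated before it is converted
// to double. Assumes count > 0.
double truncatedMean(unsigned long sum, int count) {
    return sum / static_cast<unsigned long>(count);
}

// Casting both operands to double first keeps the fractional part.
double castMean(unsigned long sum, int count) {
    return static_cast<double>(sum) / static_cast<double>(count);
}
```

With a sum of 7 over 2 elements, truncatedMean returns 3.0 while castMean returns 3.5.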

Returns:
The average of the elements in this dataset.
Version:
Change Log:
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation

double variance()

This returns the sample variance.



The regression tests for this function are in test_21() and test_22().
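A two-pass sketch of the sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, and of the standard deviation as its square root, in line with the change log note that stdDevDbl() is based on the variance function. The $n-1$ (sample) divisor follows most entries on this page and is an assumption about the shipped code. The helper names are hypothetical, and at least two data points are required.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

typedef std::uint32_t DWORD;  // assumption: 32-bit unsigned

// Sample variance: squared deviations from the mean, divided by n - 1.
double sampleVarianceSketch(const std::vector<DWORD>& data) {
    assert(data.size() >= 2);
    double mean = 0.0;
    for (DWORD x : data) mean += x;
    mean /= static_cast<double>(data.size());

    double ss = 0.0;  // sum of squared deviations
    for (DWORD x : data) {
        double d = static_cast<double>(x) - mean;
        ss += d * d;
    }
    return ss / static_cast<double>(data.size() - 1);
}

// Standard deviation as the square root of the sample variance.
double sampleStdDevSketch(const std::vector<DWORD>& data) {
    return std::sqrt(sampleVarianceSketch(data));
}
```

Computing the mean first and then summing squared deviations avoids the cancellation that the one-pass "sum of squares minus square of sum" formula can suffer from.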

Returns:
This returns the variance (sample) of this dataset.
Version:
Change Log:
    • initial implementation

double stdDevDbl()

This returns the sample standard deviation, the square root of the sample variance.



To Do:

The regression tests for this function are in test_10() and test_11().

Returns:
This returns the Standard Deviation (sample) of this dataset.
Version:
Change Log:
  • 11/25/2001 erpeters{at}users.sourceforge.net
    • Based this function on the variance function.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation

int optimalClassCount()

This function will return the "optimal" class count. This is defined as the smallest $k \ge 1$ such that $2^k \ge n$, where $n$ is the number of elements; an empty dataset yields 0.

This table demonstrates what to expect:

n (elements)   k (classes)   2^k
0              0             1
1              1             2
2              1             2
3              2             4
4              2             4
5              3             8
6              3             8
7              3             8
8              3             8
9              4             16
16             4             16
17             5             32
100            7             128
1000           10            1024
10000          14            16384

The regression tests for this function are in test_12().
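The table is consistent with the "2 to the k" rule for choosing a class count that appears in introductory statistics texts. The sketch below reproduces every row of the table, including its two edge cases (an empty dataset gives 0, and a single element gives 1 rather than 0). The function name optimalClassCountSketch is hypothetical; this is a reimplementation of the rule, not the library's source.

```cpp
// Hypothetical reimplementation of the class-count rule from the table:
// the smallest k >= 1 with 2^k >= n, and 0 for an empty dataset.
int optimalClassCountSketch(int n) {
    if (n <= 0) return 0;
    int k = 1;
    long long power = 2;  // 2^k, kept wide so the doubling cannot overflow int
    while (power < n) {
        power *= 2;
        ++k;
    }
    return k;
}
```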

Returns:
The optimal number of classes.
Version:
Change Log:
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • initial implementation

int getCount()

This returns the number of data elements in this object.

The regression tests for this function are in test_13().

Returns:
The size of this object, in elements.
Version:
Change Log:
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • initial implementation

DWORD minimum()

This returns the minimal datum in this object. Currently, if the dataset is empty, this function returns 0.

The regression tests for this function are in test_14() and test_15().

Returns:
The minimal datum in the object.
Version:
Change Log:
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • initial implementation

DWORD maximum()

This returns the maximal datum in this object. Currently, if the dataset is empty, this function returns 0.

The regression tests for this function are in test_16() and test_17().

Returns:
The maximal datum in the object.
Version:
Change Log:
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • initial implementation

double midpoint()

This returns the midpoint of the range of this dataset. This is not the mean or median; it is $\frac{\min + \max}{2}$, halfway between minimum() and maximum().

If there is only one element in this dataset, that element is the midpoint, which is handled as a special case of the formula above. If the dataset is empty, the midpoint is reported as 0. The latter will be replaced with an exception in the future.

The regression tests for this function are in test_18() and test_19().
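A sketch of the midpoint (midrange) computation over an unsorted dataset, folding in the documented edge cases: an empty dataset reports 0 from minimum(), maximum(), and midpoint(), and a single element is its own midpoint. It assumes the midpoint is the average of the smallest and largest values; all helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef std::uint32_t DWORD;  // assumption: 32-bit unsigned

// Smallest datum, or 0 for an empty dataset (as documented).
DWORD minimumSketch(const std::vector<DWORD>& d) {
    return d.empty() ? 0 : *std::min_element(d.begin(), d.end());
}

// Largest datum, or 0 for an empty dataset (as documented).
DWORD maximumSketch(const std::vector<DWORD>& d) {
    return d.empty() ? 0 : *std::max_element(d.begin(), d.end());
}

// Midrange: halfway between the extremes. Computed in double so an odd
// sum keeps its .5 and adding two large DWORDs cannot overflow.
double midpointSketch(const std::vector<DWORD>& d) {
    if (d.empty()) return 0.0;            // documented: empty -> 0
    if (d.size() == 1) return d.front();  // documented: lone element
    return (static_cast<double>(minimumSketch(d)) + maximumSketch(d)) / 2.0;
}
```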

Returns:
the midpoint of this dataset.
Version:
Change Log:
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • initial implementation

int intervalSize()

This function returns the size of the interval based on the optimal number of classes. It is heavily based on optimalClassCount().

With $n$ as the number of elements in this dataset, $k$ is selected by optimalClassCount() (based on $n$; see optimalClassCount() for more details). The interval size is then the following:

The following table is an example of what to expect:

0010
112
212
324
424
538
638
738
838
9416
16416
17532
1007128
1000101024
100001416384

The regression tests for this function are in test_20().
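In the textbook "2 to the k" rule that optimalClassCount() resembles, the class interval is chosen so that $k$ classes span the data range, $i \ge \frac{\max - \min}{k}$, usually rounded up to a whole number. The sketch below implements that textbook version only; it is an assumption that intervalSize() computes the same thing. The function and parameter names are hypothetical, and $k = 0$ (an empty dataset) yields 0.

```cpp
#include <cstdint>

typedef std::uint32_t DWORD;  // assumption: 32-bit unsigned

// Textbook class interval: ceil((maxV - minV) / k), the smallest whole
// number i with i * k >= range. Returns 0 when k <= 0 (empty dataset).
DWORD classIntervalSketch(DWORD minV, DWORD maxV, int k) {
    if (k <= 0) return 0;
    // Widen to 64 bits so range + k - 1 cannot wrap around.
    std::uint64_t range = static_cast<std::uint64_t>(maxV) - minV;  // assumes minV <= maxV
    std::uint64_t kk = static_cast<std::uint64_t>(k);
    return static_cast<DWORD>((range + kk - 1) / kk);  // integer ceiling division
}
```

For data spanning 10 to 50 with five classes, this gives an interval of 8.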

Returns:
the optimal interval size.
Version:
Change Log:
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • initial implementation

void slow_add(DWORD _n, DWORD _s)

This will add a data element to the stats object dataset. This is for internal use only; please see void add(DWORD _n, DWORD _s=0);

Parameters:
_n - This is the data to add to the object.
_s - This is the secondary data to add to the object.
Version:
Change Log:
  • 11/25/2001 erpeters{at}users.sourceforge.net
    • Added paired data support.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/27/2001 erpeters{at}users.sourceforge.net
      • Fixed a bug where, in SLOW_ADD mode, if the first two elements added were the same value, a segfault would occur.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation

void fast_add(DWORD _n, DWORD _s)

This will add a data element to the stats object dataset. This is for internal use only; please see void add(DWORD _n, DWORD _s=0);

Parameters:
_n - This is the data to add to the object.
_s - This is the secondary data to add to the object.
Version:
Change Log:
  • 11/25/2001 erpeters{at}users.sourceforge.net
    • Added paired data support.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/27/2001 erpeters{at}users.sourceforge.net
      • Fixed a bug where, in SLOW_ADD mode, if the first two elements added were the same value, a segfault would occur.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation

DWORD slow_median()

This function returns the median element of the dataset. This is for internal use only; please see DWORD median();

Returns:
The median element of the dataset.
Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.

DWORD fast_median()

This function returns the median element of the dataset. This is for internal use only; please see DWORD median();

Returns:
The median element of the dataset.
Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.

double slow_meanDbl()

This function returns the mean (average) of the dataset. This is for internal use only; please see double meanDbl();

Returns:
The average of the elements in this dataset.
Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.

double fast_meanDbl()

This function returns the mean (average) of the dataset. This is for internal use only; please see double meanDbl();

Returns:
The average of the elements in this dataset.
Version:
Change Log:
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.

double slow_variance()

This returns the sample variance. This is for internal use only; please see double variance();

Returns:
This returns the variance (sample) of this dataset.
Version:
Change Log:
  • 11/25/2001 erpeters{at}users.sourceforge.net
    • Altered code: was stddev, now variance.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation

double fast_variance()

This returns the sample variance. This is for internal use only; please see double variance();

Returns:
This returns the variance (sample) of this dataset.
Version:
Change Log:
  • 11/25/2001 erpeters{at}users.sourceforge.net
    • Altered code: was stddev, now variance.
  • 11/23/2001 erpeters{at}users.sourceforge.net
    • documentation.
    • moved function into header.
  • 05/07/2001 erpeters{at}users.sourceforge.net
    • initial implementation

StatsNode* m_root

The data in this object is stored in a doubly linked list, built with a great deal of help from the StatsNode object.

This linked list may or may not be ordered, depending on the mode of this object (see add for more information on this particular issue).

The list has an m_root pointer to the head of the list, an m_tail pointer to the end of the list, and an m_middle pointer to the median for quicker (in reality, probably not much quicker) adds and/or lookups.

I'm strongly considering alternate data structures for this object.
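A sketch of the ordered (slow-mode) layout described above: a doubly linked list kept sorted on insertion, with root, tail, and middle pointers plus a running sum, so that minimum, maximum, median, and sum become constant-time reads. The node fields, the class name SortedList, and the choice of the lower center element for the middle pointer when the count is even are assumptions, not taken from the library; the paired secondary value is omitted for brevity.

```cpp
#include <cstdint>

typedef std::uint32_t DWORD;  // assumption: 32-bit unsigned

struct SNode {  // hypothetical stand-in for StatsNode
    DWORD n;
    SNode* prev;
    SNode* next;
};

class SortedList {
public:
    SortedList() = default;
    SortedList(const SortedList&) = delete;
    SortedList& operator=(const SortedList&) = delete;
    ~SortedList() {
        while (m_root) { SNode* v = m_root; m_root = m_root->next; delete v; }
    }

    // O(n) sorted insert; equal values go after existing ones.
    void slow_add(DWORD n) {
        SNode* after = m_tail;  // node that will precede the new one
        while (after && after->n > n) after = after->prev;
        SNode* node = new SNode{n, after, after ? after->next : m_root};
        if (node->next) node->next->prev = node; else m_tail = node;
        if (after) after->next = node; else m_root = node;

        m_sum += n;
        ++m_count;
        if (!m_middle) { m_middle = node; m_midIndex = 0; return; }
        if (n < m_middle->n) ++m_midIndex;  // new node landed before the middle
        // Keep m_middle at index (count - 1) / 2: the lower center element.
        int want = (m_count - 1) / 2;
        while (m_midIndex < want) { m_middle = m_middle->next; ++m_midIndex; }
        while (m_midIndex > want) { m_middle = m_middle->prev; --m_midIndex; }
    }

    DWORD minimum() const { return m_root ? m_root->n : 0; }
    DWORD maximum() const { return m_tail ? m_tail->n : 0; }
    DWORD middle() const { return m_middle ? m_middle->n : 0; }
    int getCount() const { return m_count; }
    unsigned long long sum() const { return m_sum; }

private:
    SNode* m_root = nullptr;
    SNode* m_tail = nullptr;
    SNode* m_middle = nullptr;
    int m_midIndex = 0;  // 0-based position of m_middle
    unsigned long long m_sum = 0;
    int m_count = 0;
};
```

Each insertion shifts the desired median position by at most one node, so the middle pointer is repaired with at most one step; adding the same value twice as the first two data (the case behind the 05/27/2001 fix) simply appends the second node after the first.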

StatsNode* m_middle
This is the middle node of the list.

StatsNode* m_tail
This is the last node of the list.

unsigned long m_sum
This is the running sum of the nodes.

int m_mode
This is this object's runtime mode.

int m_count
This is the count of the nodes. This datum is maintained by both fast_add and slow_add.


This class has no child classes.
Author:
Eric J. Peters <erpeters{at}users.sourceforge.net>
Version:
1.4.0
Change Log:
  • Version 1.4.0:
    • 12/01/2001: erpeters{at}users.sourceforge.net
      • Removed Windows support.
      • "fixed" filenames --> all lowercase
      • removed extraneous Windows files.
      • added sourceforge doc header.
      • tagged first CVS version (1.4.0) -- new releases will have a more guided, well-specified plan.
  • Version 1.3:
    • 11/25/2001: erpeters{at}users.sourceforge.net
      • Added support for adding paired data.
      • Added variance.
      • Altered StdDev to use Variance.
      • Fixed mean() to return rounded results.
  • Version 1.2:
    • 11/24/2001: erpeters{at}users.sourceforge.net
      • Began regression test suite.
    • 11/23/2001: erpeters{at}users.sourceforge.net
      • Lot more documentation work.
      • Moved most of Readme.txt into this comment block.
    • 11/21/2001: erpeters{at}users.sourceforge.net
      • Linux port complete (RedHat 6.2 & 7.1 tested)
      • Documentation improved. (massively)
      • Code unified into .h file.
      • Documentation support via DOC++ in Makefile (http:
      • Segregated classes into own .h/.cpp files.
      • Made Linux the default #define.
    • 11/21/2001: erpeters{at}users.sourceforge.net
      • Altered the files to use spaces instead of tabs.
      • Altered the files to use Un*x line endings (LF) instead of DOS line endings (CR/LF).
      • Wrapped the Windows specific stuff (DLLs) in LINUX tests.
      • Added the DWORD (etc) definitions to a linux block in stats.h
      • Created a Makefile.
  • Version 1.1:
    • 05/27/2001 erpeters{at}users.sourceforge.net
      • I quickly added 2 functions: stdDevDbl and meanDbl.
      • I fixed a bug in slow add where if the same value was added as the first 2 values of a data set, segfault happened.
      • I added more comments and began CVS tagging.
      • I added a version resource to the project.
  • Version 1.0:
    • 05/07/2001 erpeters{at}users.sourceforge.net
      • I dusted this package off and gpl'd it.


generated by doc++