Mean

`results.mean.MeanResults(mean, count)` `dataclass` ¶

`count` `instance-attribute` ¶

`mean` `instance-attribute` ¶

`results.mean.OnlineMeanResults()` ¶

Bases: ResultsHandler[MeanResults]

`OnlineMeanResults` provides a numerically stable, online (incremental) algorithm for computing the arithmetic mean of large, streaming, or distributed datasets represented as NumPy arrays. In many real-world applications, such as distributed computing or real-time data processing, data is processed in batches, with each batch yielding a partial mean and its corresponding count. This class merges these partial results into a global mean without storing the raw data, which avoids the overflow and precision loss that can arise when a single large running sum is accumulated.

Suppose the overall dataset is divided into \(k\) batches. For each batch \(i\) (where \(1 \leq i \leq k\)), let:

\(m_i\) be the partial mean computed over the \(n_i\) data points of batch \(i\).
\(M_i\) be the global mean computed after processing \(i\) batches.
\(N_i = n_1 + n_2 + \ldots + n_i\) be the cumulative count after \(i\) batches.

The arithmetic mean of all data points is defined as:

\[ M_\text{total} = \frac{n_1 m_1 + n_2 m_2 + \ldots + n_k m_k}{n_1 + n_2 + \ldots + n_k} \]
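As a quick check of this definition, the following sketch computes the count-weighted average of per-batch means and compares it with the mean of the concatenated data. The batch sizes, shapes, and random values are made up for illustration and are not part of the library.

```python
import numpy as np

# Hypothetical batches: three arrays with 4, 10 and 7 rows of 3 features each.
rng = np.random.default_rng(0)
batches = [rng.normal(size=(n, 3)) for n in (4, 10, 7)]

partial_means = [b.mean(axis=0) for b in batches]  # m_i
counts = [b.shape[0] for b in batches]             # n_i

# Weighted average of the partial means, as in the formula for M_total.
m_total = sum(n * m for n, m in zip(counts, partial_means)) / sum(counts)

# Reference value: mean taken over all rows at once.
reference = np.concatenate(batches, axis=0).mean(axis=0)

print(np.allclose(m_total, reference))
```

The two values agree up to floating-point rounding, which is why only the pair \((m_i, n_i)\) needs to be kept for each batch.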

Rather than computing \(M_\text{total}\) from scratch after processing all data, the class uses an iterative update rule. When merging a new partial result \((m_\text{partial}, n_\text{partial})\) with the current global mean \(M_\text{old}\) (with count \(n_\text{old}\)), the updated mean is given by:

\[ M_\text{new} = M_\text{old} + \left( m_\text{partial} - M_\text{old} \right) \cdot \frac{n_\text{partial}}{n_\text{old} + n_\text{partial}} \]

This update is mathematically equivalent to the weighted average:

\[ M_\text{new} = \frac{n_\text{old} M_\text{old} + n_\text{partial} m_\text{partial}}{n_\text{old} + n_\text{partial}} \]

but is rearranged to enhance numerical stability. By working with the difference \(m_\text{partial} - M_\text{old}\) and scaling it by the relative weight \(n_\text{partial} / (n_\text{old} + n_\text{partial})\), the algorithm avoids forming the large products \(n_\text{old} M_\text{old}\) and \(n_\text{partial} m_\text{partial}\), which reduces the round-off error that can build up when summing large numbers or processing many batches sequentially.
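The update rule can be sketched as a small standalone function. `merge_mean` is a hypothetical helper written for this illustration, not a function of the library; the example confirms that it matches the weighted-average form on a small input.

```python
import numpy as np


def merge_mean(m_old, n_old, m_partial, n_partial):
    """Merge a partial mean into a running mean (hypothetical helper).

    Implements M_new = M_old + (m_partial - M_old) * n_partial / (n_old + n_partial).
    """
    n_new = n_old + n_partial
    m_new = m_old + (m_partial - m_old) * (n_partial / n_new)
    return m_new, n_new


# Current global mean over 3 observations, and a new batch of 1 observation.
m_old, n_old = np.array([1.0, 2.0]), 3
m_partial, n_partial = np.array([5.0, 6.0]), 1

m_new, n_new = merge_mean(m_old, n_old, m_partial, n_partial)

# Same quantity written as the explicit weighted average.
weighted = (n_old * m_old + n_partial * m_partial) / (n_old + n_partial)

print(m_new, n_new)  # [2. 3.] 4
```

Here the difference `m_partial - m_old` is `[4, 4]` and the weight is `1 / 4`, so each component of the mean moves by exactly `1`.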

The handler starts with no accumulated data. The global mean (`global_mean`) is initially `None` and is set by the first partial result received. The total number of observations (`total_count`) is initialized to zero.

`global_mean = None` `instance-attribute` ¶

The current global mean of all processed observations.

`total_count = 0` `instance-attribute` ¶

The total number of observations processed.

`add_result(result, *args, **kwargs)` ¶

Processes one or more batches of partial results to update the global mean.

PARAMETER	DESCRIPTION
`batch_results`	Results after running Task.
`batch_index`	An optional index identifier for the batch (for interface consistency, not used in calculations).

`get()` ¶

Retrieves the final computed global mean along with the total number of observations.

RETURNS	DESCRIPTION
`MeanResults`	A `MeanResults` dataclass with the following fields: `mean`: A NumPy array representing the computed global mean. `count`: An integer representing the total number of observations processed.

RAISES	DESCRIPTION
`ValueError`	If no data has been processed (i.e., `global_mean` is None or `total_count` is zero).
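The behaviour described on this page can be sketched end to end with a stand-in class. Everything below is an assumption made for illustration: `SketchOnlineMean` is not the library's `OnlineMeanResults`, the local `MeanResults` only mirrors the documented `mean` and `count` fields, and the real `add_result` may accept additional arguments.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class MeanResults:
    """Stand-in mirroring the documented fields (mean, count)."""
    mean: np.ndarray
    count: int


class SketchOnlineMean:
    """Hypothetical re-creation of the documented behaviour, not library code."""

    def __init__(self) -> None:
        self.global_mean: Optional[np.ndarray] = None  # no data yet
        self.total_count = 0

    def add_result(self, result: MeanResults) -> None:
        if self.global_mean is None:
            # The first partial result defines the global mean.
            self.global_mean = np.array(result.mean, dtype=float)
            self.total_count = result.count
            return
        new_count = self.total_count + result.count
        delta = result.mean - self.global_mean
        self.global_mean = self.global_mean + delta * (result.count / new_count)
        self.total_count = new_count

    def get(self) -> MeanResults:
        if self.global_mean is None or self.total_count == 0:
            raise ValueError("No data has been processed.")
        return MeanResults(mean=self.global_mean, count=self.total_count)


handler = SketchOnlineMean()
handler.add_result(MeanResults(mean=np.array([1.0, 2.0]), count=3))
handler.add_result(MeanResults(mean=np.array([5.0, 6.0]), count=1))
final = handler.get()
print(final.mean, final.count)  # [2. 3.] 4
```

Calling `get()` on a fresh `SketchOnlineMean()` raises `ValueError`, matching the documented behaviour for a handler that has received no data.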
