Mean
results.mean.MeanResults(mean, count)
dataclass
results.mean.OnlineMeanResults()
Bases: ResultsHandler[MeanResults]
OnlineMeanResults provides a numerically stable, online (incremental)
algorithm for computing the arithmetic mean of large, streaming, or distributed
datasets represented as NumPy arrays. In many real-world applications, such as
distributed computing or real-time data processing, data is processed
in batches, with each batch yielding a partial mean and its corresponding count.
This class merges these partial results into a global mean without storing
the raw data, which reduces the risk of numerical overflow and
precision loss.
Suppose the overall dataset is divided into \(k\) batches. For each batch \(i\) (where \(1 \leq i \leq k\)), let:
- \(m_i\) be the partial mean computed over \(n_i\) data points.
- \(M_i\) be the global mean computed after processing \(i\) batches.
- \(N_i = n_1 + n_2 + \dots + n_i\) be the cumulative count after \(i\) batches.
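The arithmetic mean of all data points is defined as:
\[ M_{total} = \frac{\sum_{i=1}^{k} n_i m_i}{\sum_{i=1}^{k} n_i} = \frac{1}{N_k} \sum_{i=1}^{k} n_i m_i \]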
Rather than computing \(M_{total}\) from scratch after processing all data,
the class uses an iterative update rule. When merging a new partial result
(m_partial, n_partial) with the current global mean M_old (with count n_old),
the updated mean is given by:
\[ M_{new} = M_{old} + (m_{partial} - M_{old}) \cdot \frac{n_{partial}}{n_{old} + n_{partial}} \]
This update is mathematically equivalent to the weighted average:
\[ M_{new} = \frac{n_{old} M_{old} + n_{partial} m_{partial}}{n_{old} + n_{partial}} \]
but is rearranged to enhance numerical stability. By focusing on the
difference (m_partial - M_old) and scaling it by the relative weight
n_partial / (n_old + n_partial), the algorithm reduces the round-off
errors that can occur when summing large numbers or when processing many
batches sequentially.
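
The update rule described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: merge_partial_mean is a hypothetical helper name, not part of the library, and the batches are random data generated for the example. The sketch merges per-batch means one at a time and compares the result with the mean computed over all rows at once.

```python
import numpy as np


def merge_partial_mean(M_old, n_old, m_partial, n_partial):
    """Merge a partial mean into a running mean (hypothetical helper)."""
    n_new = n_old + n_partial
    # Shift the old mean toward the partial mean by its relative weight.
    M_new = M_old + (m_partial - M_old) * (n_partial / n_new)
    return M_new, n_new


# Three batches of different sizes, each with 4 features (made-up data).
rng = np.random.default_rng(0)
batches = [rng.normal(size=(n, 4)) for n in (10, 25, 7)]

# The first batch defines the running mean; the rest are merged in.
M, N = batches[0].mean(axis=0), batches[0].shape[0]
for batch in batches[1:]:
    M, N = merge_partial_mean(M, N, batch.mean(axis=0), batch.shape[0])

# Reference: the mean over all rows at once.
reference = np.concatenate(batches).mean(axis=0)
print(np.allclose(M, reference), N)
```

Only the per-batch means and counts enter the loop, so the raw rows of earlier batches are not needed once their partial result has been merged.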
The handler starts with no accumulated data. The global mean (global_mean) is initially
set to None and is defined by the first partial result received. The total number
of observations (total_count) is initialized to zero.
global_mean = None
instance-attribute
The current global mean of all processed observations.
total_count = 0
instance-attribute
The total number of observations processed.
add_result(result, *args, **kwargs)
Processes one or more batches of partial results to update the global mean.
PARAMETER | DESCRIPTION |
---|---|
batch_results | Results after running Task. |
batch_index | An optional index identifier for the batch (for interface consistency, not used in calculations). |
get()
Retrieves the final computed global mean along with the total number of observations.
RETURNS | DESCRIPTION |
---|---|
MeanResults | A MeanResults instance holding the global mean (mean) and the total number of observations (count). |
RAISES | DESCRIPTION |
---|---|
ValueError | If no data has been processed. |
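
The lifecycle documented above (empty initial state, merging partial results, retrieving the final value) can be illustrated with a small stand-in. This is a sketch under assumptions, not the library's implementation: SketchOnlineMean and the local MeanResults dataclass are hypothetical names defined here for illustration, add_result is reduced to a single argument, the field types are guesses, and skipping empty partial results is an assumed choice.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class MeanResults:
    # Stand-in for results.mean.MeanResults(mean, count); field types are assumed.
    mean: np.ndarray
    count: int


class SketchOnlineMean:
    """Illustrative stand-in, not the library's OnlineMeanResults."""

    def __init__(self):
        # No accumulated data yet: the first partial result defines the mean.
        self.global_mean: Optional[np.ndarray] = None
        self.total_count: int = 0

    def add_result(self, result: MeanResults) -> None:
        if result.count == 0:
            return  # assumption: empty partial results are skipped
        partial = np.asarray(result.mean, dtype=float)
        if self.global_mean is None:
            self.global_mean = partial.copy()
            self.total_count = result.count
            return
        new_count = self.total_count + result.count
        weight = result.count / new_count
        # Incremental update: move the old mean toward the partial mean.
        self.global_mean = self.global_mean + (partial - self.global_mean) * weight
        self.total_count = new_count

    def get(self) -> MeanResults:
        if self.global_mean is None:
            raise ValueError("No data has been processed.")
        return MeanResults(mean=self.global_mean, count=self.total_count)


handler = SketchOnlineMean()
handler.add_result(MeanResults(mean=np.array([1.0, 2.0]), count=2))
handler.add_result(MeanResults(mean=np.array([4.0, 8.0]), count=4))
final = handler.get()
print(final.mean, final.count)
```

Calling get() on a freshly created SketchOnlineMean raises ValueError, mirroring the behaviour documented for the case where no data has been processed.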