Nicodème Westphalen
Research 2024

Streaming Data Analysis

Exploration of real-time data processing techniques and analysis pipelines for streaming datasets.

Overview

This research paper studies how key statistical quantities can be updated recursively as data arrive sequentially. Rather than recomputing each statistic from scratch whenever a new observation appears, recursive formulas update the mean, the variance, and the regression estimators from a few stored values and the newest data point.

The work derives numerically stable update formulas for the mean, the variance, and the centered cross‑deviation terms that underlie simple linear regression. Because ordinary least squares can be expressed entirely in terms of these sufficient statistics, the regression coefficients can be updated in constant time per observation and with constant memory, which makes the approach suitable for streaming data environments.
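
For orientation, the updates described above can be written in one common form as follows. The notation is an assumption made for this summary and may differ from the paper's: \bar{x}_n and \bar{y}_n are running means after n observations, M_{2,n} is the running sum of squared deviations of x, and S_{xx,n}, S_{xy,n} are the centered sums.

```latex
\begin{align*}
\bar{x}_n &= \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n},
&
\bar{y}_n &= \bar{y}_{n-1} + \frac{y_n - \bar{y}_{n-1}}{n}, \\
M_{2,n} &= M_{2,n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n),
&
s_n^2 &= \frac{M_{2,n}}{n-1} \quad (n \ge 2), \\
S_{xx,n} &= S_{xx,n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n),
&
S_{xy,n} &= S_{xy,n-1} + (x_n - \bar{x}_{n-1})(y_n - \bar{y}_n), \\
\hat{\beta}_{1,n} &= \frac{S_{xy,n}}{S_{xx,n}},
&
\hat{\beta}_{0,n} &= \bar{y}_n - \hat{\beta}_{1,n}\,\bar{x}_n .
\end{align*}
```

Each update uses only the previous values and the newest pair (x_n, y_n), so the cost per observation does not grow with n. Note that S_{xx,n} coincides with M_{2,n}; both names are kept here only to mirror the variance and regression settings.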

Methodology

  • Derivation of recursive formulas for the mean and variance in streaming datasets
  • Development of numerically stable variance updates using Welford’s identity
  • Extension of recursive statistics to simple linear regression through centered sums (Sxx and Sxy)
  • Algorithmic representation and pseudocode for real‑time implementation
  • Manual simulations comparing batch and recursive computation
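
As an illustration of the variance step in the list above, here is a minimal Python sketch of a Welford-style one-pass update. It is not the paper's pseudocode: the class name `RunningStats` and its attributes are hypothetical, chosen for this example, and the sample variance (divisor n - 1) is assumed.

```python
import math
import statistics


class RunningStats:
    """One-pass mean and sample variance (Welford-style update).

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the stored statistics."""
        self.n += 1
        delta = x - self.mean            # deviation from the OLD mean
        self.mean += delta / self.n      # mean update
        self.m2 += delta * (x - self.mean)  # old deviation times NEW deviation

    @property
    def variance(self):
        """Sample variance; undefined (NaN) for fewer than two observations."""
        if self.n < 2:
            return float("nan")
        return self.m2 / (self.n - 1)


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    stats = RunningStats()
    for value in data:
        stats.update(value)
    # Compare the streaming result with a batch computation.
    print(stats.mean, statistics.mean(data))
    print(stats.variance, statistics.variance(data))
```

Only three numbers are stored regardless of how many observations arrive, which is the constant-memory property the methodology relies on.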

Key Insights

  • Recursive statistics allow means and variances to be updated without storing the full dataset
  • Welford’s algorithm provides a numerically stable one‑pass variance computation
  • Linear regression can be expressed through maintained statistics (Sxx and Sxy) instead of full recomputation
  • The recursive formulation reproduces the batch ordinary least squares estimates exactly, up to floating‑point rounding
  • The approach enables constant‑time updates suitable for streaming and real‑time analytics
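
The regression insights above can be checked with a short Python sketch that maintains Sxx and Sxy recursively and compares the result with a batch computation. The names `StreamingRegression` and `batch_ols` are hypothetical, introduced only for this example, and the data are made up for illustration rather than taken from the paper.

```python
import math


class StreamingRegression:
    """Simple linear regression from recursively maintained statistics.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # centered sum of squares of x
        self.sxy = 0.0  # centered cross-deviation sum of x and y

    def update(self, x, y):
        """Fold one new (x, y) pair into the stored statistics."""
        self.n += 1
        dx = x - self.mean_x                 # x deviation from the OLD mean
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)   # old x deviation times new x deviation
        self.sxy += dx * (y - self.mean_y)   # old x deviation times new y deviation

    def coefficients(self):
        """Return (slope, intercept); needs at least two distinct x values."""
        if self.n < 2 or self.sxx == 0.0:
            raise ValueError("need at least two observations with distinct x")
        slope = self.sxy / self.sxx
        return slope, self.mean_y - slope * self.mean_x


def batch_ols(xs, ys):
    """Batch reference: recompute everything from the full dataset."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    model = StreamingRegression()
    for x, y in zip(xs, ys):
        model.update(x, y)
    print(model.coefficients())
    print(batch_ols(xs, ys))
```

The two printed pairs should agree to within floating-point rounding, while the streaming version stores five numbers instead of the full dataset.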

Tools & Technologies

  • Mathematical derivation of recursive statistical identities
  • Algorithm design for one‑pass streaming computation
  • Pseudocode implementation of Welford’s algorithm and recursive regression updates
  • Analytical comparison between batch and recursive computation methods

Final Paper

The full research report is available below.