Bayesian drift detection for feature pipelines
A compact note on using posterior predictive checks to catch quiet distribution shifts before model quality moves.
Feature drift is rarely a single dramatic event. In production it usually arrives as a small shift in event mix, a change in freshness, or a schema field that is technically valid and semantically wrong.
For a binary feature, a beta-binomial model gives a lightweight first pass: alert when the posterior predictive tail probability of the latest window's count falls below a threshold and the data freshness service agrees the window is complete.
The useful piece is not the equation itself. It is the contract around it: define the reference population, freeze the time window, and make the model explain which feature moved.
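One way to make that contract concrete is to pin it down in a small, frozen config object. This is a hypothetical sketch; the class and field names are illustrative, not an existing library API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the contract cannot be mutated after creation
class DriftCheckContract:
    feature_name: str        # which feature the alert must explain
    reference_start: str     # frozen reference population (ISO dates)
    reference_end: str
    window_hours: int = 24   # frozen evaluation window
    tail_threshold: float = 0.01

# Example: a check pinned to one feature and one reference month.
contract = DriftCheckContract(
    feature_name="has_promo_code",
    reference_start="2024-01-01",
    reference_end="2024-01-31",
)
```

Freezing the dataclass makes the reference population and window tamper-evident: any "temporary" widening of the window has to create a new contract rather than silently mutating the old one.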
```python
from scipy.stats import betabinom

def posterior_tail_probability(successes, trials, alpha=2, beta=8):
    # Two-sided tail probability of the observed count under a
    # beta-binomial predictive; pass the posterior (alpha, beta)
    # from the reference window for a posterior predictive check.
    # The defaults are only a weak prior.
    observed_rate = successes / trials
    cdf = betabinom.cdf(successes, trials, alpha, beta)
    return min(cdf, 1 - cdf), observed_rate
```

The final dashboard should show the posterior, the raw feature count, and the lineage of the upstream job. Statistical confidence without data lineage tends to create very polished confusion.
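End to end, the check looks like this. The counts, the threshold, and the freshness flag below are all made-up illustrations; the only real API used is `scipy.stats.betabinom`:

```python
from scipy.stats import betabinom

# Hypothetical reference window: 1,200 positives in 10,000 rows.
# With a Beta(2, 8) prior, the posterior over the feature rate is
# Beta(2 + 1200, 8 + 8800), so the posterior predictive for a new
# window of n rows is BetaBinomial(n, 1202, 8808).
alpha_post, beta_post = 2 + 1200, 8 + 8800

# Latest window: 980 positives in 5,000 rows (rate 0.196 vs ~0.12).
n_new, k_new = 5000, 980
cdf = betabinom.cdf(k_new, n_new, alpha_post, beta_post)
tail = min(cdf, 1 - cdf)  # two-sided tail probability

TAIL_THRESHOLD = 0.01        # assumed alert threshold
window_is_complete = True    # stand-in for the freshness service

if tail < TAIL_THRESHOLD and window_is_complete:
    print(f"drift alert: tail probability {tail:.2e}")
```

A jump from a 12% reference rate to nearly 20% over 5,000 rows sits far in the tail of the posterior predictive, so the alert fires; a window with a rate near 12% would produce a large tail probability and stay quiet.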