In experiments, metrics such as purchases or clicks are often computed as sums over the period after exposure. If that period is not time-limited, the metric's variance increases the longer users are observed, which makes results unreliable. For instance, if we measure total minutes watched with no time limit, the variance keeps growing as the experiment runs, making real effects hard to detect. A t-test on such a metric might never reach full power, even as the sample size grows without bound. This happens because full power requires the mean of the t-statistic to diverge to infinity as the sample grows, and for an unlimited metric it does not diverge unless the metric's mean does, which is often not the case.
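As a rough sketch of why growing variance can stall power, consider the t-statistic's mean under simplifying assumptions that are not in the original: two arms of $n_T$ users each after $T$ days of enrollment, a common standard deviation $\sigma_T$, and a true difference in means $\delta_T$. All symbols here are illustrative.

```latex
\begin{align*}
t_T &= \frac{\bar{Y}_{\text{treat}} - \bar{Y}_{\text{ctrl}}}{\sqrt{2\sigma_T^2 / n_T}}, \\
\mathbb{E}[t_T] &\approx \frac{\delta_T}{\sigma_T}\sqrt{\frac{n_T}{2}}, \\
\text{power} \to 1 &\text{ requires } \frac{|\delta_T|\sqrt{n_T}}{\sigma_T} \to \infty \quad \text{as } T \to \infty.
\end{align*}
```

If users enroll at a steady rate, $n_T$ grows like $T$, so the ratio diverges only when $\sigma_T$ grows more slowly than $|\delta_T|\sqrt{T}$. A time-limited metric keeps $\delta$ and $\sigma$ fixed, so the ratio grows like $\sqrt{T}$.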
To address this, experiments should cap the post-exposure window over which the metric is computed. For example, instead of measuring total minutes watched indefinitely, measure minutes watched in the first two days after exposure. The variance then no longer grows with the experiment's duration, so statistical tests are more accurate and more powerful. In simulations, the power of time-limited metrics rose to nearly 100% as sample size grew, while the power of unlimited metrics did not. Time-limiting metrics not only stabilizes variance but also makes results easier to compare across experiments with different durations.
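The contrast between capped and uncapped metrics can be illustrated with a small simulation. This is a minimal sketch, not the simulation the article reports: the data-generating process (each user watching at a personal exponential daily rate, a one-off treatment lift inside the window, steady enrollment) and every name and parameter value below are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_power(duration_days, users_per_day=20, window_days=2,
                   effect=5.0, n_sims=300, seed=0):
    """Return (power_unlimited, power_time_limited) for a two-sided test at ~5%.

    Hypothetical model: sample size per arm grows with experiment duration,
    and each user is observed for a random number of days up to that duration.
    """
    rng = np.random.default_rng(seed)
    n = users_per_day * duration_days       # users per arm
    shape = (n_sims, 2, n)                  # simulations x arms x users

    # Assumed: each user has a personal daily watch rate (minutes per day).
    rate = rng.exponential(scale=10.0, size=shape)
    # Days of post-exposure data available at analysis time (upper bound inclusive).
    days_observed = rng.integers(window_days, duration_days + 1, size=shape)
    # Assumed: treatment (arm index 1) adds a fixed lift within the window.
    lift = np.zeros(shape)
    lift[:, 1, :] = effect

    unlimited = rate * days_observed + lift  # summed over all observed days
    limited = rate * window_days + lift      # summed over the capped window only

    def power(y):
        # Welch-style statistic with a normal-approximation cutoff of 1.96.
        diff = y[:, 1].mean(axis=1) - y[:, 0].mean(axis=1)
        se = np.sqrt(y[:, 1].var(axis=1, ddof=1) / n
                     + y[:, 0].var(axis=1, ddof=1) / n)
        return float(np.mean(np.abs(diff / se) > 1.96))

    return power(unlimited), power(limited)


if __name__ == "__main__":
    for days in (7, 28, 56):
        p_unlimited, p_limited = simulate_power(days)
        print(f"{days:>3} days: unlimited={p_unlimited:.2f}, "
              f"two-day window={p_limited:.2f}")
```

Under these assumptions, power for the two-day metric should climb toward 1 as the experiment runs longer, while power for the unlimited metric should stay low, because its standard deviation grows faster than the square root of the sample size.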
Source: towardsdatascience.com
