RDumb: A simple approach that questions our progress in continual test-time adaptation
tl;dr: Stable test-time adaptation over long time scales and under continual distribution shift.
News
September '23 | Our paper was accepted at NeurIPS 2023. See you in New Orleans! |
July '22 | Our dataset for Continuously Changing Corruptions (CCC) was accepted into the Shift Happens workshop at ICML'22. We are currently working on the integration; the current code base can be found in the Shift Happens repository. |
June '22 | A preliminary version of our paper was accepted at the Principles of Distribution Shift (PODS) workshop at ICML 2022! |
Abstract
Test-Time Adaptation (TTA) allows pretrained models to be updated to changing data distributions at deployment time. While early work tested these algorithms for individual fixed distribution shifts, recent work proposed and applied methods for continual adaptation over long timescales. To examine the reported progress in the field, we propose the Continuously Changing Corruptions (CCC) benchmark to measure the asymptotic performance of TTA techniques. We find that eventually all but one of the state-of-the-art methods collapse and perform worse than a non-adapting model, including methods specifically proposed to be robust to performance collapse. In addition, we introduce a simple baseline, "RDumb", that periodically resets the model to its pretrained state. RDumb performs better than or on par with the previously proposed state of the art on all considered benchmarks. Our results show that previous TTA approaches are neither effective at regularizing adaptation to avoid collapse nor able to outperform a simplistic resetting strategy.
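To make the resetting idea concrete, below is a minimal sketch of a periodic-reset wrapper in the spirit of RDumb. It is not the paper's implementation: the adaptation step `adapt_fn`, the optimizer factory `make_optimizer`, and the reset interval `reset_every` are all placeholders/assumptions; the only point illustrated is restoring the pretrained weights at a fixed cadence.

```python
import copy
import torch

class PeriodicResetAdapter:
    """Sketch of a periodic-reset strategy: run any test-time adaptation
    method, and every `reset_every` batches restore the model (and a fresh
    optimizer) from the stored pretrained state to prevent collapse."""

    def __init__(self, model, make_optimizer, adapt_fn, reset_every=1000):
        # Keep a frozen copy of the pretrained weights for later resets.
        self.pretrained_state = copy.deepcopy(model.state_dict())
        self.model = model
        self.make_optimizer = make_optimizer  # callable: model -> optimizer (assumed)
        self.adapt_fn = adapt_fn              # callable: (model, optimizer, batch) -> predictions (assumed)
        self.reset_every = reset_every        # reset interval in batches (assumed value)
        self.optimizer = make_optimizer(model)
        self.steps = 0

    def __call__(self, batch: torch.Tensor) -> torch.Tensor:
        # Periodically reset to the pretrained state.
        if self.steps > 0 and self.steps % self.reset_every == 0:
            self.model.load_state_dict(copy.deepcopy(self.pretrained_state))
            self.optimizer = self.make_optimizer(self.model)
        self.steps += 1
        # One adaptation-and-prediction step of the wrapped TTA method.
        return self.adapt_fn(self.model, self.optimizer, batch)
```

In use, the wrapper is simply called on each incoming test batch in place of the underlying TTA method, so the adaptation logic itself stays unchanged while the reset schedule bounds how far the model can drift from its pretrained state.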