True Impact of Cascade Length in Contextual Cascading Bandits

Abstract

We revisit the contextual cascading bandit, where a learning agent recommends an ordered list (\textit{cascade}) of items and a user scans the list sequentially, stopping at the first attractive item. Although cascading bandits underpin many search and recommendation engines, the role of the cascade length $K$ in shaping regret has remained ambiguous. Contrary to prior results suggesting that regret grows with $K$, we prove that in the contextual setting it actually \textit{decreases} once $K$ is sufficiently large. Leveraging this insight, we design a new upper-confidence-bound algorithm built on online mirror descent that attains the sharpest known regret upper bound for contextual cascading bandits, $\tilde{O}(\sqrt{K \bar{p}^{K-1} d T})$. We complement this upper bound with a matching lower bound of $\Omega(\sqrt{K p^{K-1} d T})$, where $0 \le p \le \bar{p} < 1$. Together, these results fully characterize how regret truly scales with $K$ and close the theoretical gap for contextual cascading bandits. Finally, extensive experiments validate our theoretical results and demonstrate the effectiveness of our proposed method.
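The abstract's central claim, that the regret bound eventually shrinks as the cascade length grows, can be illustrated numerically. The sketch below assumes the $K$-dependent factor of the bound takes the reconstructed form $K \bar{p}^{K-1}$ (the values of $d$ and $T$ do not affect monotonicity in $K$); the function name and the choice $\bar{p} = 0.9$ are illustrative, not taken from the paper.

```python
import math

def regret_scale(K: int, p_bar: float) -> float:
    """Hypothetical K-dependent factor K * p_bar**(K-1) of the bound.

    With d and T held fixed, how the regret upper bound varies in K is
    governed entirely by this product.
    """
    return K * p_bar ** (K - 1)

p_bar = 0.9
factors = [regret_scale(K, p_bar) for K in range(1, 51)]

# By calculus, K * p**(K-1) peaks near K = -1 / ln(p); beyond that point
# the geometric decay p**(K-1) dominates the linear factor K, so longer
# cascades yield a *smaller* bound -- the abstract's key claim.
turning_point = -1.0 / math.log(p_bar)
print(round(turning_point, 2))      # ~ 9.49 for p_bar = 0.9
print(factors[9] > factors[0])      # bound first rises (K = 10 vs K = 1) ...
print(factors[49] < factors[0])     # ... then falls below it (K = 50)
```

The non-monotone shape explains why earlier analyses, valid for small $K$, reported regret growing with $K$ while the bound above decreases once $K$ passes the turning point.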

Publication
Neural Information Processing Systems (NeurIPS), 2025
Joongkyu Lee
Ph.D. candidate in Data Science