Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

We study online preference-based reinforcement learning (PbRL) with the goal of improving sample efficiency. While a growing body of …

Joongkyu Lee, Seouh-won Yi, Min-hwan Oh

True Impact of Cascade Length in Contextual Cascading Bandits

We revisit the contextual cascading bandit, where a learning agent recommends an ordered list (cascade) of items and …

Hyunjun Choi, Joongkyu Lee, Min-hwan Oh

Combinatorial Reinforcement Learning with Preference Feedback

In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an …

Joongkyu Lee, Min-hwan Oh

Improved Online Confidence Bounds for Multinomial Logistic Bandits

In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL …

Joongkyu Lee, Min-hwan Oh

Nearly Minimax Optimal Regret for Multinomial Logistic Bandit (Top 0.2%, 32/15671)

In this paper, we study the contextual multinomial logit (MNL) bandit problem in which a learning agent sequentially selects an …

Joongkyu Lee, Min-hwan Oh

Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation

We study reinforcement learning with multinomial logistic (MNL) function approximation where the underlying transition probability …

Wooseong Cho, Taehyun Hwang, Joongkyu Lee, Min-hwan Oh

Demystifying Linear MDPs and Novel Dynamics Aggregation Framework

In this paper, we first challenge the common premise that linear MDPs always induce performance guarantees independent of the state …

Joongkyu Lee, Min-hwan Oh

Learning Uncertainty-Aware Temporally-Extended Actions

In reinforcement learning, temporal abstraction in the action space, exemplified by action repetition, is a technique to facilitate …

Joongkyu Lee, Seung Joon Park, Yunhao Tang, Min-hwan Oh