Sun, Rui; Wang, Xinshang; Zhou, Zijie - 2021
to implement and do not require solving any DLPs. Our algorithm achieves a regret bound of $O(\log k)$, where $k$ is the … system size. To the best of our knowledge, this is the first NRM algorithm that (i) has an $o(\sqrt{k})$ asymptotic regret …