Curiosity-Driven Exploration via Temporal Contrastive Learning

1 Mila - Quebec AI Institute, 2 Université de Montréal, 3 Princeton University

Abstract

Exploration remains a key challenge in reinforcement learning (RL), especially in long-horizon tasks and environments with high-dimensional observations. A common strategy for effective exploration is to promote state coverage or novelty, which often involves estimating the agent's state visitation distribution. In this paper, we propose Curiosity-Driven Exploration via Temporal Contrastive Learning (C-TeC), an exploration method based on temporal contrastive learning that rewards agents for reaching states with unexpected futures. This incentivizes uncovering meaningful, less-visited states. C-TeC is simple and does not require explicit density or uncertainty estimation, while learning representations aligned with the RL objective. It consistently outperforms standard baselines in complex mazes with different embodiments (Ant and Humanoid) and in robotic manipulation tasks, while also yielding more diverse behaviors in Craftax without requiring task-specific information.

Overview

A state-action pair $(s_t, a_t)$ is passed through the encoder $\phi$, and the corresponding future state is passed through the encoder $\psi$. The outputs of the encoders are used to compute the similarity score and the intrinsic reward.
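The data flow in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the page does not give the form of the similarity score or of the reward, so the sketch assumes a negative L2 distance as the similarity and takes the intrinsic reward to be its negation, so that a less expected future yields a larger reward. The random linear maps standing in for $\phi$ and $\psi$, and all function names and dimensions, are hypothetical.

```python
import math
import random


def linear_encoder(weights, x):
    """Toy stand-in for a learned encoder: a single matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def similarity(phi_sa, psi_future):
    """Assumed similarity score: negative L2 distance between embeddings."""
    return -math.sqrt(sum((p - q) ** 2 for p, q in zip(phi_sa, psi_future)))


def intrinsic_reward(phi_sa, psi_future):
    """Assumed reward: the negated similarity, so dissimilar futures score high."""
    return -similarity(phi_sa, psi_future)


# Hypothetical dimensions and untrained random weights, for illustration only.
rng = random.Random(0)
state_dim, action_dim, embed_dim = 4, 2, 3
phi_weights = [[rng.gauss(0, 1) for _ in range(state_dim + action_dim)]
               for _ in range(embed_dim)]
psi_weights = [[rng.gauss(0, 1) for _ in range(state_dim)]
               for _ in range(embed_dim)]

s_t = [0.1, -0.3, 0.5, 0.0]        # current state
a_t = [1.0, -1.0]                  # action taken in s_t
s_future = [0.4, 0.2, -0.1, 0.3]   # a state reached later in the trajectory

phi_sa = linear_encoder(phi_weights, s_t + a_t)   # phi(s_t, a_t)
psi_f = linear_encoder(psi_weights, s_future)     # psi(s_future)
reward = intrinsic_reward(phi_sa, psi_f)
print(f"similarity={similarity(phi_sa, psi_f):.3f}, intrinsic reward={reward:.3f}")
```

In a real system both encoders would be neural networks trained with a contrastive objective; the sketch only shows where the state-action pair, the future state, the similarity score, and the reward fit together.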

Reward Visualization

Evolution of the C-TeC reward during training. This figure shows how the intrinsic reward changes over the course of training based on future state visitation. The black circle in the lower-left corner represents the starting state. Early in training (3M steps), higher rewards are assigned to nearby states. As training progresses, the agent explores farther, and the reward increases for more distant regions. All reward values are normalized for visualization.
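The caption says the reward values are normalized for visualization but does not say how. One common choice, assumed here purely for illustration, is min-max scaling to the range [0, 1]; the function name and the sample values are hypothetical.

```python
def normalize_rewards(rewards):
    """Min-max scale rewards to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(rewards), max(rewards)
    if hi == lo:
        return [0.0 for _ in rewards]
    return [(r - lo) / (hi - lo) for r in rewards]


raw = [2.0, 4.0, 6.0, 10.0]          # made-up rewards at four maze locations
normalized = normalize_rewards(raw)
print(normalized)                    # [0.0, 0.25, 0.5, 1.0]
```

If each training snapshot is scaled separately in this way, colors are comparable within a snapshot but not across snapshots.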

Agent Videos

Humanoid-u-maze

C-TeC (Ours)

APT

RND

Ant-largest-maze

C-TeC (Ours)

APT

RND

Arm-binpick-hard

C-TeC (Ours)

APT

RND

Craftax-Classic

C-TeC (Ours)

E3B

RND

State Coverage

Coverage of ant-largest-maze over training time
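The page does not define how coverage is measured. A common convention, assumed here, is to discretize the agent's (x, y) position into grid cells and report the fraction of reachable cells visited so far. The sketch below follows that convention; the cell size, the set of reachable cells, and the trajectory are made up for illustration.

```python
def state_coverage(positions, cell_size, reachable_cells):
    """Fraction of reachable grid cells visited by a list of (x, y) positions."""
    visited = set()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell in reachable_cells:   # ignore positions outside the maze
            visited.add(cell)
    return len(visited) / len(reachable_cells)


# Hypothetical 4x4 open maze with unit cells and a short trajectory.
reachable = {(i, j) for i in range(4) for j in range(4)}
trajectory = [(0.2, 0.1), (0.8, 0.3), (1.5, 0.4), (1.7, 1.2), (1.6, 1.9)]
coverage = state_coverage(trajectory, cell_size=1.0, reachable_cells=reachable)
print(coverage)                       # 0.1875, i.e. 3 of 16 cells
```

Accumulating the visited set across all episodes up to a given training step would give one point on a coverage-over-time curve.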

Craftax Achievements

BibTeX