Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-exploration Problems

Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-exploration Problems

  • November 27, 2018
Table of Contents

Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-exploration Problems

In deep reinforcement learning (RL), solving the Atari games Montezuma’s Revenge and Pitfall has been a grand challenge. These games represent a broad class of challenging, real-world problems called “hard-exploration problems,” where an agent has to learn complex tasks with very infrequent or deceptive feedback. The state-of-the-art algorithm on Montezuma’s Revenge gets an average score of 11,347, a max score of 17,500, and solved the first level at one point in one of ten tries.

Surprisingly, despite considerable research effort, so far no algorithm has obtained a score greater than 0 on Pitfall. Today we introduce Go-Explore, a new family of algorithms capable of achieving scores over 2,000,000 on Montezuma’s Revenge and scoring over 400,000 on average! Go-Explore reliably solves the entire game, meaning all three unique levels, and then generalizes to the nearly-identical subsequent levels (which only differ in the timing of events and the score on the screen).

We have even seen it reach level 159!

Source: uber.com

Tags :
Share :
comments powered by Disqus

Related Posts

Real Time Facial Expression Recognition

Real Time Facial Expression Recognition

Computer animated agents and robots bring new dimension in human computer interaction which makes it vital as how computers can affect our social life in day-to-day activities. Face to face communication is a real-time process operating at a time scale in the order of milliseconds. The level of uncertainty at this time scale is considerable, making it necessary for humans and machines to rely on sensory rich perceptual primitives rather than slow symbolic inference processes.

Read More
Five Lessons From the First Three Years of Michelangelo

Five Lessons From the First Three Years of Michelangelo

Uber has been one of the most active contributors to open source machine learning technologies in the last few years. While companies like Google or Facebook have focused their contributions in new deep learning stacks like TensorFlow, Caffe2 or PyTorch, the Uber engineering team has really focused on tools and best practices for building machine learning at scale in the real world. Technologies such as Michelangelo, Horovod, PyML, Pyro are some of examples of Uber’s contributions to the machine learning ecosystem.

Read More
Accurate Online Speaker Diarization with Supervised Learning

Accurate Online Speaker Diarization with Supervised Learning

Speaker diarization, the process of partitioning an audio stream with multiple people into homogeneous segments associated with each individual, is an important part of speech recognition systems. By solving the problem of “who spoke when”, speaker diarization has applications in many important scenarios, such as understanding medical conversations, video captioning and more. However, training these systems with supervised learning methods is challenging — unlike standard supervised classification tasks, a robust diarization model requires the ability to associate new individuals with distinct speech segments that weren’t involved in training.

Read More