Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-exploration Problems

November 27, 2018

In deep reinforcement learning (RL), solving the Atari games Montezuma’s Revenge and Pitfall has been a grand challenge. These games represent a broad class of challenging, real-world problems called “hard-exploration problems,” where an agent has to learn complex tasks with very infrequent or deceptive feedback. The state-of-the-art algorithm on Montezuma’s Revenge achieves an average score of 11,347 and a maximum score of 17,500, and it solved the first level at one point in one of ten tries.

Surprisingly, despite considerable research effort, no algorithm has so far obtained a score greater than 0 on Pitfall. Today we introduce Go-Explore, a new family of algorithms capable of achieving scores over 2,000,000 on Montezuma’s Revenge and over 400,000 on average! Go-Explore reliably solves the entire game, meaning all three unique levels, and then generalizes to the nearly identical subsequent levels (which differ only in the timing of events and the score on the screen).

We have even seen it reach level 159!

Source: uber.com
