Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

  • September 25, 2018

In recent years, deep learning has taken a central role in solving a wide range of problems in pattern recognition. At Uber Advanced Technologies Group (ATG), we use deep learning to solve various problems in the autonomous driving space, since many of these are pattern recognition problems. Many of our models require tens of terabytes of training data acquired from numerous sensors, including cameras, lidars, and radars.

Researchers and engineers at Uber ATG are actively pushing the state of the art in autonomous driving across multiple problem domains, such as perception, prediction and planning. To support these efforts, our team is working on developing dataset storage solutions that will make data more easily available to researchers, allowing them to focus on model experimentation. In this article, we describe Petastorm, an open source data access library developed at Uber ATG.

This library enables single-machine or distributed training and evaluation of deep learning models directly from multi-terabyte datasets in Apache Parquet format. Petastorm supports popular Python-based machine learning (ML) frameworks such as TensorFlow, PyTorch, and PySpark. It can also be used from pure Python code.
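To give a feel for the pure Python path, here is a minimal sketch that reads a few rows from a Petastorm dataset. It assumes the `make_reader` entry point found in current Petastorm releases, which may differ from the API available when this article was published. The helper name `read_first_rows`, its `reader_factory` parameter, and the dataset URL in the comment are hypothetical and serve only as illustration.

```python
from itertools import islice


def read_first_rows(dataset_url, limit=3, reader_factory=None):
    """Return up to `limit` rows from a Petastorm dataset.

    `reader_factory` is a hypothetical hook added for this sketch; by default
    it falls back to petastorm.make_reader.
    """
    if reader_factory is None:
        # Imported lazily so the helper can be defined without Petastorm installed.
        from petastorm import make_reader
        reader_factory = make_reader

    # The reader is used as a context manager so that its worker resources
    # are released once iteration finishes.
    with reader_factory(dataset_url) as reader:
        return list(islice(reader, limit))


# Example call (assumed local path, not taken from the article):
# rows = read_first_rows("file:///tmp/hello_world_dataset", limit=2)
```

With the default factory, each returned row is expected to expose the dataset's fields; the exact row type depends on the schema the dataset was written with.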

Training state-of-the-art models takes time even on modern hardware, and in many cases, distributing the training load across multiple machines is essential. A typical deep learning cluster performs the following steps:

Source: uber.com

