Introducing Petastorm: Uber ATG’s Data Access Library for Deep Learning

  • September 25, 2018

In recent years, deep learning has taken a central role in solving a wide range of problems in pattern recognition. At Uber Advanced Technologies Group (ATG), we use deep learning to solve various problems in the autonomous driving space, since many of these are pattern recognition problems. Many of our models require tens of terabytes of training data acquired from numerous sensors, including cameras, lidars, and radars.

Researchers and engineers at Uber ATG are actively pushing the state of the art in autonomous driving across multiple problem domains, such as perception, prediction, and planning. To support these efforts, our team is developing dataset storage solutions that make data more easily available to researchers, allowing them to focus on model experimentation. In this article, we describe Petastorm, an open source data access library developed at Uber ATG.

This library enables single-machine or distributed training and evaluation of deep learning models directly from multi-terabyte datasets in Apache Parquet format. Petastorm supports popular Python-based machine learning (ML) frameworks such as TensorFlow, PyTorch, and PySpark. It can also be used from pure Python code.
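To make the pure Python use case concrete, here is a minimal sketch of iterating over a dataset row by row. It assumes a recent Petastorm release in which `make_reader` is the entry point and the returned reader works as a context manager and an iterator. The helper name `iter_rows`, its `reader_factory` parameter, and the dataset URL in the comment are hypothetical and exist only for illustration.

```python
def iter_rows(dataset_url, reader_factory=None):
    """Yield rows from a Petastorm dataset one at a time.

    `reader_factory` is a hypothetical hook (not part of Petastorm) that
    lets callers substitute another reader, for example in a unit test.
    """
    if reader_factory is None:
        # Imported lazily so this module loads even when Petastorm is
        # not installed; assumes a release that provides make_reader.
        from petastorm import make_reader
        reader_factory = make_reader

    # The reader is used as a context manager so that its background
    # workers are shut down once iteration finishes.
    with reader_factory(dataset_url) as reader:
        for row in reader:
            yield row


# Example usage (the URL is a placeholder, not a real dataset):
#
#     for row in iter_rows("file:///tmp/hello_world_dataset"):
#         print(row)
```

Wrapping the reader in a generator keeps the calling code independent of where the data lives: the same loop should work for a local `file://` URL or an `hdfs://` URL, provided the dataset was written in a format Petastorm can read.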

Training state-of-the-art models takes time even on modern hardware, and in many cases, distributing the training load across multiple machines is essential. A typical deep learning cluster performs the following steps:

Source: uber.com

