Releasing Pythia for vision and language multimodal AI models

Releasing Pythia for vision and language multimodal AI models

  • May 25, 2019
Table of Contents

Releasing Pythia for vision and language multimodal AI models

Pythia is a deep learning framework that supports multitasking in the vision and language domain. Built on our open-source PyTorch framework, the modular, plug-and-play design enables researchers to quickly build, reproduce, and benchmark AI models. Pythia is designed for vision and language tasks, such as answering questions related to visual data and automatically generating image captions.

Pythia incorporates elements of our winning entries in recent AI competitions (the VQA Challenge 2018 and Vizwiz Challenge 2018). Features include reference implementations to show how previous state-of-the-art models achieved related benchmark results and to quickly gauge the performance of new models. In addition to multitasking, Pythia also supports distributed training and a variety of datasets, as well as custom losses, metrics, scheduling, and optimizers.

Pythia smooths the process of entering the growing subfield of vision and language and frees researchers to focus on faster prototyping and experimentation. Our goal is to accelerate progress by increasing the reproducibility of these models and results. This will make it easier for the community to build on, and benchmark against, successful systems.

We hope that removing some of the obstacles will allow researchers to more quickly develop new ways for people and intelligent machines to communicate. This work should also help researchers develop adaptive AI that synthesizes multiple kinds of understanding into a more context-based, multimodal understanding. In addition to this open source release, we plan to continue adding tools, tasks, data sets, and reference models.

Source: fb.com

Tags :
Share :
comments powered by Disqus

Related Posts

Introducing Ludwig, a Code-Free Deep Learning Toolbox

Introducing Ludwig, a Code-Free Deep Learning Toolbox

Over the last decade, deep learning models have proven highly effective at performing a wide variety of machine learning tasks in vision, speech, and language. At Uber we are using these models for a variety of tasks, including customer support, object detection, improving maps, streamlining chat communications, forecasting, and preventing fraud. Many open source libraries, including TensorFlow, PyTorch, CNTK, MXNET, and Chainer, among others, have implemented the building blocks needed to build such models, allowing for faster and less error-prone development.

Read More
Hash Your Way To a Better Neural Network

Hash Your Way To a Better Neural Network

The computer industry has been busy in recent years trying to figure out how to speed up the calculations needed for artificial neural networks—either for their training or for what’s known as inference, when the network is performing its function. In particular, much effort has gone into designing special-purpose hardware to run such computations. Google, for example, developed its Tensor Processing Unit, or TPU, first described publicly in 2016.

Read More
Creating Bitcoin trading bots that don’t lose money

Creating Bitcoin trading bots that don’t lose money

In this article we are going to create deep reinforcement learning agents that learn to make money trading Bitcoin. In this tutorial we will be using OpenAI’s gym and the PPO agent from the stable-baselines library, a fork of OpenAI’s baselines library. If you are not already familiar with how to create a gym environment from scratch, or how to render simple visualizations of those environments, I have just written articles on both of those topics.

Read More