Releasing Pythia for vision and language multimodal AI models

May 25, 2019

Pythia is a deep learning framework that supports multitasking in the vision and language domain. Built on our open-source PyTorch framework, its modular, plug-and-play design enables researchers to quickly build, reproduce, and benchmark AI models. Pythia is designed for vision and language tasks, such as answering questions about visual data and automatically generating image captions.
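
To make the target tasks concrete, below is a minimal, self-contained PyTorch sketch of the kind of visual question answering model Pythia is built around; the class, dimensions, and element-wise fusion scheme are illustrative assumptions rather than Pythia's actual reference implementation.

```python
import torch
import torch.nn as nn


class SimpleVQAModel(nn.Module):
    """Illustrative visual question answering model: fuses pooled image
    features with an LSTM question encoding and predicts an answer class.
    Hypothetical sketch, not Pythia's reference implementation."""

    def __init__(self, vocab_size, num_answers,
                 image_dim=2048, embed_dim=300, hidden_dim=1024):
        super().__init__()
        self.word_embedding = nn.Embedding(vocab_size, embed_dim)
        self.question_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_answers),
        )

    def forward(self, image_features, question_tokens):
        # image_features: (batch, image_dim) pooled CNN features
        # question_tokens: (batch, seq_len) integer token ids
        embedded = self.word_embedding(question_tokens)
        _, (question_state, _) = self.question_encoder(embedded)
        question_state = question_state.squeeze(0)      # (batch, hidden_dim)
        image_state = self.image_proj(image_features)   # (batch, hidden_dim)
        fused = question_state * image_state            # element-wise fusion
        return self.classifier(fused)                   # answer logits


# Usage: 8 images with pooled CNN features and 14-token questions.
model = SimpleVQAModel(vocab_size=10000, num_answers=3000)
logits = model(torch.randn(8, 2048), torch.randint(0, 10000, (8, 14)))
```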

Pythia incorporates elements of our winning entries in recent AI competitions (the VQA Challenge 2018 and the VizWiz Challenge 2018). Its features include reference implementations that show how previous state-of-the-art models achieved their benchmark results, making it easy to gauge the performance of new models against them. In addition to multitasking, Pythia supports distributed training and a variety of datasets, as well as custom losses, metrics, schedulers, and optimizers.
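
A plug-and-play design like this is commonly implemented with a registry, where models, losses, and optimizers are looked up by the name given in an experiment configuration. The sketch below shows that pattern in plain Python; the registry, decorator, and config keys are hypothetical and do not reproduce Pythia's actual API.

```python
# Hypothetical registry pattern; Pythia's actual API may differ.
MODEL_REGISTRY = {}


def register_model(name):
    """Decorator that makes a model class discoverable by name in a config."""
    def wrapper(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrapper


@register_model("simple_vqa")
class SimpleVQA:
    def __init__(self, config):
        self.hidden_dim = config.get("hidden_dim", 1024)


def build_model(config):
    """Instantiate whichever registered model the experiment config names."""
    return MODEL_REGISTRY[config["model"]](config)


# Swapping models, losses, or optimizers then only requires editing the
# config, not the training loop.
model = build_model({"model": "simple_vqa", "hidden_dim": 2048})
```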

Pythia smooths the process of entering the growing subfield of vision and language research and frees researchers to focus on faster prototyping and experimentation. Our goal is to accelerate progress by increasing the reproducibility of these models and results. This will make it easier for the community to build on, and benchmark against, successful systems.

We hope that removing some of these obstacles will allow researchers to more quickly develop new ways for people and intelligent machines to communicate. This work should also help researchers develop adaptive AI that synthesizes multiple kinds of understanding into a more context-based, multimodal understanding. In addition to this open source release, we plan to continue adding tools, tasks, datasets, and reference models.

Source: fb.com

Related Posts

Hash Your Way To a Better Neural Network

The computer industry has been busy in recent years trying to figure out how to speed up the calculations needed for artificial neural networks—either for their training or for what’s known as inference, when the network is performing its function. In particular, much effort has gone into designing special-purpose hardware to run such computations. Google, for example, developed its Tensor Processing Unit, or TPU, first described publicly in 2016.

Read More

Detecting malaria with deep learning

Artificial intelligence (AI) and open source tools, technologies, and frameworks are a powerful combination for improving society. ‘Health is wealth’ is perhaps a cliche, yet it’s very accurate! In this article, we will examine how AI can be leveraged for detecting the deadly disease malaria with a low-cost, effective, and accurate open source deep learning solution.

Read More

DeepMind and Google: the battle to control artificial intelligence

One afternoon in August 2010, in a conference hall perched on the edge of San Francisco Bay, a 34-year-old Londoner called Demis Hassabis took to the stage. Walking to the podium with the deliberate gait of a man trying to control his nerves, he pursed his lips into a brief smile and began to speak: “So today I’m going to be talking about different approaches to building…” He stalled, as though just realising that he was stating his momentous ambition out loud.

Read More