Releasing Pythia for vision and language multimodal AI models

May 25, 2019

Pythia is a deep learning framework that supports multitasking in the vision and language domain. Built on our open-source PyTorch framework, Pythia has a modular, plug-and-play design that enables researchers to quickly build, reproduce, and benchmark AI models. It is designed for vision and language tasks, such as answering questions related to visual data and automatically generating image captions.

Pythia incorporates elements of our winning entries in recent AI competitions (the VQA Challenge 2018 and VizWiz Challenge 2018). Features include reference implementations that show how previous state-of-the-art models achieved related benchmark results and that make it possible to quickly gauge the performance of new models. In addition to multitasking, Pythia also supports distributed training and a variety of datasets, as well as custom losses, metrics, scheduling, and optimizers.
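
To make the idea of plug-and-play custom losses more concrete, here is a minimal sketch of a component registry in plain Python. It is an illustration only and does not reproduce Pythia's actual API: the names LOSSES, register_loss, build_loss, and the "loss" config key are all hypothetical, and simple list-based losses stand in for PyTorch modules.

```python
# Hypothetical sketch of a plug-and-play component registry.
# None of these names come from Pythia; they only illustrate the pattern.
from typing import Callable, Dict, List

LossFn = Callable[[List[float], List[float]], float]

# Maps a string key (as it might appear in a config file) to a loss function.
LOSSES: Dict[str, LossFn] = {}


def register_loss(name: str) -> Callable[[LossFn], LossFn]:
    """Decorator that stores a loss function under a string key."""
    def decorator(fn: LossFn) -> LossFn:
        if name in LOSSES:
            raise KeyError(f"loss '{name}' is already registered")
        LOSSES[name] = fn
        return fn
    return decorator


@register_loss("mse")
def mean_squared_error(pred: List[float], target: List[float]) -> float:
    # Average of squared differences between predictions and targets.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


@register_loss("mae")
def mean_absolute_error(pred: List[float], target: List[float]) -> float:
    # Average of absolute differences between predictions and targets.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def build_loss(config: dict) -> LossFn:
    """Look up the loss named under the (hypothetical) 'loss' config key."""
    return LOSSES[config["loss"]]
```

With a registry like this, swapping one loss for another is a one-line config change, for example build_loss({"loss": "mae"}), and a new loss becomes available as soon as it is decorated. The same pattern could plausibly extend to metrics, schedulers, and optimizers.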

Pythia smooths the process of entering the growing subfield of vision and language and frees researchers to focus on faster prototyping and experimentation. Our goal is to accelerate progress by increasing the reproducibility of these models and results. This will make it easier for the community to build on, and benchmark against, successful systems.

We hope that removing some of these obstacles will allow researchers to more quickly develop new ways for people and intelligent machines to communicate. This work should also help researchers develop adaptive AI that synthesizes multiple kinds of understanding into a more context-based, multimodal understanding. In addition to this open-source release, we plan to continue adding tools, tasks, datasets, and reference models.

Source: fb.com
