The 50 Best Free Datasets for Machine Learning

June 15, 2018

Table of Contents

What are some open datasets for machine learning? We at Gengo decided to create the ultimate cheat sheet for high quality datasets. These range from the vast (looking at you, Kaggle) or the highly specific (data for self-driving cars).

First, a couple of pointers to keep in mind when searching for datasets. According to Dataquest: A dataset shouldn’t be messy, because you don’t want to spend a lot of time cleaning data. A dataset shouldn’t have too many rows or columns, so it’s easy to work with.

The cleaner the data, the better — cleaning a large data set can be very time consuming. There should be an interesting question that can be answered with the data.

Source: gengo.ai

Improving Language Understanding with Unsupervised Learning

We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well; this is an idea that many have explored in the past, and we hope our result motivates further research into applying this idea on larger and more diverse datasets.

Training a neural network in phase-change memory beats GPUs

Compared to a typical CPU, a brain is remarkably energy-efficient, in part because it combines memory, communications, and processing in a single execution unit, the neuron. A brain also has lots of them, which lets it handle lots of tasks in parallel. Attempts to run neural networks on traditional CPUs run up against these fundamental mismatches.

Horovod: Distributed Training Framework for TensorFlow, Keras, and PyTorch

Horovod is a distributed training framework for TensorFlow, Keras, and PyTorch. The goal of Horovod is to make distributed Deep Learning fast and easy to use.

The 50 Best Free Datasets for Machine Learning

Tags :

Share :

Related Posts

Improving Language Understanding with Unsupervised Learning

Training a neural network in phase-change memory beats GPUs

Horovod: Distributed Training Framework for TensorFlow, Keras, and PyTorch