The 50 Best Free Datasets for Machine Learning

June 15, 2018

What are some open datasets for machine learning? We at Gengo decided to create the ultimate cheat sheet for high-quality datasets. These range from the vast (looking at you, Kaggle) to the highly specific (data for self-driving cars).

First, a couple of pointers to keep in mind when searching for datasets. According to Dataquest, a dataset shouldn’t be messy, because you don’t want to spend a lot of time cleaning data, and it shouldn’t have too many rows or columns, so that it’s easy to work with.

The cleaner the data, the better: cleaning a large dataset can be very time-consuming. There should also be an interesting question that can be answered with the data.
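As a rough illustration of these pointers, the sketch below checks a small CSV for size and missing values before you commit to working with it. The function name `summarize_dataset`, the thresholds `max_rows` and `max_cols`, and the sample data are all hypothetical choices made for this example, not recommendations from Dataquest or Gengo.

```python
import csv
import io


def summarize_dataset(csv_text, max_rows=100_000, max_cols=50):
    """Return basic size and cleanliness figures for CSV text.

    max_rows and max_cols are arbitrary, illustrative thresholds.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    total_cells = len(rows) * len(header)

    # Count cells that are empty, including cells absent from short rows.
    missing = 0
    for row in rows:
        padded = row + [""] * (len(header) - len(row))
        missing += sum(1 for cell in padded[:len(header)] if cell.strip() == "")

    return {
        "rows": len(rows),
        "columns": len(header),
        "missing_fraction": missing / total_cells if total_cells else 0.0,
        "small_enough": len(rows) <= max_rows and len(header) <= max_cols,
    }


# Made-up sample data with two empty cells.
sample = "city,temp_c,humidity\nTokyo,21,0.60\nOsaka,,0.55\nKyoto,19,\n"
summary = summarize_dataset(sample)
print(summary)
```

A high `missing_fraction` suggests more cleaning work ahead, and `small_enough` flags whether the table fits within whatever size you consider comfortable to work with.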

Source: gengo.ai

