The 50 Best Free Datasets for Machine Learning

  • June 15, 2018

What are some open datasets for machine learning? We at Gengo decided to create the ultimate cheat sheet of high-quality datasets. They range from the vast (looking at you, Kaggle) to the highly specific (data for self-driving cars).

First, a couple of pointers to keep in mind when searching for datasets. According to Dataquest: A dataset shouldn’t be messy, because you don’t want to spend a lot of time cleaning data. A dataset shouldn’t have too many rows or columns, so it’s easy to work with.

The cleaner the data, the better: cleaning a large dataset can be very time-consuming. There should also be an interesting question that the data can answer.
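As a rough illustration of these pointers, the sketch below sizes up a candidate dataset before you commit to it: how many rows and columns it has, and what share of its cells are missing. The function name `summarize_dataset`, the list of missing-value tokens, and the sample data are all assumptions made up for this example; they do not come from Dataquest or from any particular dataset.

```python
import csv
import io


def summarize_dataset(csv_text, missing_tokens=("", "NA", "NaN", "null")):
    """Return row count, column count, and the share of missing cells.

    Hypothetical helper: the tokens treated as "missing" are an assumption
    and should be adjusted to the dataset at hand.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip fully blank lines
    total_cells = len(rows) * len(header)
    missing = sum(
        1
        for row in rows
        for i in range(len(header))
        # A short row counts as missing for every column it lacks.
        if i >= len(row) or row[i].strip() in missing_tokens
    )
    return {
        "rows": len(rows),
        "columns": len(header),
        "missing_ratio": missing / total_cells if total_cells else 0.0,
    }


# Made-up sample data, for illustration only.
sample = """age,city,income
34,Tokyo,52000
,Osaka,48000
29,,NA
41,Nagoya,61000
"""

summary = summarize_dataset(sample)
print(summary)
```

A high missing ratio or a very large row or column count does not rule a dataset out, but it is a hint that more cleaning time will be needed.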

Source: gengo.ai

Related Posts

AI winter is well on its way

Deep learning has been at the forefront of the so-called AI revolution for quite a few years now, and many people believed it was the silver bullet that would take us to the world of wonders of technological singularity (general AI). Many bets were made in 2014, 2015 and 2016, when new boundaries were still being pushed, such as AlphaGo. Companies such as Tesla were announcing through their CEOs that a fully self-driving car was very close, to the point that Tesla even started selling that option to customers [to be enabled by a future software update].

Why do neural networks generalize so poorly?

Deep convolutional network architectures are often assumed to guarantee generalization for small image translations and deformations. In this paper we show that modern CNNs (VGG16, ResNet50, and InceptionResNetV2) can drastically change their output when an image is translated in the image plane by a few pixels, and that this failure of generalization also happens with other realistic small image transformations. Furthermore, the deeper the network the more we see these failures to generalize.

Attacks against machine learning – an overview

At a high level, attacks against classifiers can be broken down into three types: Adversarial inputs, which are specially crafted inputs that have been developed with the aim of being reliably misclassified in order to evade detection. Adversarial inputs include malicious documents designed to evade antivirus, and emails attempting to evade spam filters. Data poisoning attacks, which involve feeding adversarial training data to the classifier.
