Very Large Datasets, GB to TB – Where to Get Them for Free?

Every once in a while I need a very large dataset to work with. Most often this means creating dummy data with one of my favorite data generators, Mockaroo. But sometimes a few GB is not enough, and at those times I turn to a little-known feature of AWS (Amazon Web Services). AWS provides a repository of public datasets that run into the hundreds of GB and cover everything from climate data, NASA satellite data, census data and economic indicators to transportation data.

You can find these datasets at the link below:

http://aws.amazon.com/datasets/

Keep in mind that the data is so large that the only way it can be provided is as an AWS snapshot, from which you need to create a volume. The datasets range in size from 15 GB to 541 TB, so there is something for everybody. Some of my favorites include Wikipedia Stats, Transportation data and Daily Global weather data.
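For those who prefer scripting over the console, here is a minimal sketch of turning a snapshot into a volume and attaching it to a running instance. It assumes the boto3 library is installed and AWS credentials are already configured. The function names, the gp3 volume type, the /dev/sdf device name and every ID in the usage comment are placeholders of mine, not values from the AWS dataset pages. The volume has to be created in the same Availability Zone as the instance it will be attached to.

```python
def volume_request(snapshot_id, availability_zone, size_gib=None, volume_type="gp3"):
    """Build the keyword arguments for an EC2 create_volume call.

    Hypothetical helper: it only assembles a dict, so it can be
    inspected without touching AWS.
    """
    params = {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": availability_zone,
        "VolumeType": volume_type,  # assumption: general purpose SSD
    }
    if size_gib is not None:
        # Optional: must be at least as large as the snapshot itself.
        params["Size"] = size_gib
    return params


def create_and_attach(snapshot_id, instance_id, availability_zone, region,
                      device="/dev/sdf"):
    """Create a volume from a snapshot and attach it to an instance."""
    # Imported here so volume_request() works even without boto3 installed.
    import boto3

    ec2 = boto3.client("ec2", region_name=region)

    # Create the volume from the public snapshot.
    volume = ec2.create_volume(**volume_request(snapshot_id, availability_zone))
    volume_id = volume["VolumeId"]

    # Wait until the new volume is ready before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it to the instance under the chosen device name.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


# Example call (all IDs below are made-up placeholders):
# create_and_attach("snap-0123456789abcdef0", "i-0123456789abcdef0",
#                   "us-east-1a", "us-east-1")
```

Once attached, the volume still has to be mounted from inside the instance before the files are visible. Remember that both the volume and the running instance are billed until you delete or stop them.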

Watch out for the processing costs associated with each instance, by the way. This would also be a great starting place for those looking to test PolyBase with SQL Server 2016. For those who need an intro to AWS, I have included links below on how to launch an AWS instance and attach a snapshot volume to it.

Attaching a Volume