A Deep Dive into Iconic Datasets for Deep Learning

Siddhartha Purwar
3 min read · Aug 6, 2023

Exploring MNIST, CIFAR-10, CIFAR-100, ImageNet and Fashion MNIST

Table of Contents:

  1. MNIST
  2. CIFAR-10
  3. CIFAR-100
  4. ImageNet
  5. Fashion MNIST

MNIST

  • Full Form: Modified National Institute of Standards and Technology
  • Year of Creation: 1998
  • Number of Images: 70,000 (60,000 for training and 10,000 for testing)
  • Size of Image: 28x28 pixels
  • Color: Grayscale
  • Type of Data: Handwritten digits (0–9)
  • Size of Dataset: Approximately 11 MB
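
The figure of about 11 MB presumably describes the compressed download. As a rough sanity check, the sketch below estimates the uncompressed pixel data from the counts listed above, assuming one byte per pixel per channel. The helper name raw_pixel_bytes is made up for this illustration.

```python
def raw_pixel_bytes(num_images, height, width, channels=1):
    """Bytes needed to store the pixels uncompressed at 1 byte per channel value."""
    return num_images * height * width * channels

# MNIST: 70,000 grayscale images of 28x28 pixels (one channel)
mnist_bytes = raw_pixel_bytes(70_000, 28, 28)
print(f"MNIST raw pixels: {mnist_bytes / 1e6:.2f} MB")  # prints "MNIST raw pixels: 54.88 MB"
```

The same helper with channels=3 gives an estimate for the RGB datasets covered later in this article.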

Loading MNIST in PyTorch

import torch
import torchvision
import torchvision.transforms as transforms

# Define data transformations
transform = transforms.Compose([
    transforms.ToTensor(),  # PIL image -> float tensor in [0, 1], shape (1, 28, 28)
    transforms.Normalize(mean=[0.5], std=[0.5])  # MNIST has one grayscale channel, so one mean and one std
])

# Download and load the MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)

# Create data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
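
To see what the Normalize step does with a mean and standard deviation of 0.5: ToTensor scales 8-bit pixel values to the range [0, 1], and Normalize then computes (x - mean) / std for each channel. The plain-Python sketch below mirrors that formula on single values. normalize_value is a hypothetical helper for illustration, not a torchvision function.

```python
def normalize_value(x, mean=0.5, std=0.5):
    """Apply the same per-value formula Normalize uses: (x - mean) / std."""
    return (x - mean) / std

for pixel in (0, 128, 255):
    scaled = pixel / 255  # ToTensor-style scaling of an 8-bit pixel to [0, 1]
    print(pixel, round(normalize_value(scaled), 3))
```

With these settings, black pixels map to -1 and white pixels to 1, so the inputs end up in the range [-1, 1].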

CIFAR-10

  • Full Form: Canadian Institute for Advanced Research — 10 classes
  • Year of Creation: 2009
  • Number of Images: 60,000 (50,000 for training and 10,000 for testing)
  • Size of Image: 32x32 pixels
  • Color: RGB (3 channels)
  • Type of Data: Various objects in 10 classes (e.g., airplane, car, bird, etc.)
  • Size of Dataset: Approximately 170 MB

Loading CIFAR-10 in PyTorch

import torch
import torchvision
import torchvision.transforms as transforms

# Define data transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # One mean and std per RGB channel
])

# Download and load the CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)

# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
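
With batch_size=64, the number of batches each loader yields per pass follows directly from the split sizes. Here is a small sketch of that arithmetic, assuming the DataLoader default of keeping the final partial batch (drop_last=False). batches_per_epoch is an illustrative helper name.

```python
import math

def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches one pass over the data produces."""
    if drop_last:
        return num_samples // batch_size  # discard the incomplete final batch
    return math.ceil(num_samples / batch_size)  # keep the smaller final batch

print(batches_per_epoch(50_000, 64))  # CIFAR-10 training split -> 782
print(batches_per_epoch(10_000, 64))  # CIFAR-10 test split -> 157
```

These values should match len(train_loader) and len(test_loader). The last training batch holds only 16 images, since 781 full batches cover 49,984 of the 50,000 samples.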

CIFAR-100

  • Full Form: Canadian Institute for Advanced Research — 100 classes
  • Year of Creation: 2009
  • Number of Images: 60,000 (50,000 for training and 10,000 for testing)
  • Size of Image: 32x32 pixels
  • Color: RGB (3 channels)
  • Type of Data: Various objects in 100 fine-grained classes
  • Size of Dataset: Approximately 170 MB

Loading CIFAR-100 in PyTorch

import torch
import torchvision
import torchvision.transforms as transforms

# Define data transformations
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # One mean and std per RGB channel
])

# Download and load the CIFAR-100 dataset
train_dataset = torchvision.datasets.CIFAR100(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.CIFAR100(root='./data', train=False, transform=transform, download=True)

# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

ImageNet

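  • Year of Creation: 2009 (original dataset); the annual ILSVRC challenge subsets followed from 2010
  • Number of Images: Over 14 million in the full dataset; about 1.28 million training images in the widely used ILSVRC 2012 subset
  • Size of Image: Varies; images are typically much larger than 32x32 and are resized or cropped before training
  • Color: RGB (3 channels)
  • Type of Data: Wide variety of objects; more than 20,000 categories in the full dataset, 1,000 classes in ILSVRC 2012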
  • Size of Dataset: Several GBs (can be over 150 GB for full dataset)

Loading ImageNet in PyTorch

import torch
import torchvision
import torchvision.transforms as transforms

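# Define data transformations
transform = transforms.Compose([
    transforms.Resize(256),      # ImageNet images vary in size, so resize first
    transforms.CenterCrop(224),  # Crop to a fixed size so images can be batched
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # Normalize images
])

# Load the ImageNet dataset from archives already placed in ./data
# (torchvision cannot download ImageNet, so download=True is not used here)
train_dataset = torchvision.datasets.ImageNet(root='./data', split='train', transform=transform)
test_dataset = torchvision.datasets.ImageNet(root='./data', split='val', transform=transform)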

# Create data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)

Unless the ImageNet archives are already in ./data, this code fails with the following error:

RuntimeError: The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in ./data.

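To resolve this issue, download the development kit archive (ILSVRC2012_devkit_t12.tar.gz) manually and place it in the designated directory (./data) before running your code. The image archives for the split you load have to be obtained the same way, because torchvision cannot fetch ImageNet for you.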
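
A quick check before constructing the dataset can tell you which files are still missing. The sketch below is only an illustration: the devkit file name comes from the error message above, while the two image archive names are my assumption about what torchvision's ImageNet loader expects for the ILSVRC 2012 release. missing_archives and REQUIRED_ARCHIVES are hypothetical names.

```python
from pathlib import Path

# Devkit name taken from the error message; image archive names are assumed
# from torchvision's ImageNet loader (ILSVRC 2012 release).
REQUIRED_ARCHIVES = {
    "train": ["ILSVRC2012_devkit_t12.tar.gz", "ILSVRC2012_img_train.tar"],
    "val": ["ILSVRC2012_devkit_t12.tar.gz", "ILSVRC2012_img_val.tar"],
}

def missing_archives(root, split):
    """Return the archive names for `split` that are not present in `root`."""
    root = Path(root)
    return [name for name in REQUIRED_ARCHIVES[split] if not (root / name).is_file()]

if __name__ == "__main__":
    for split in ("train", "val"):
        missing = missing_archives("./data", split)
        print(split, "missing:", missing or "nothing")
```

This check is only meaningful before the first successful load. Once the archives have been extracted, the loader may no longer need the original files, so treat a "missing" report at that point with caution.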

Fashion MNIST

  • Full Form: Fashion Modified National Institute of Standards and Technology
  • Year of Creation: 2017
  • Number of Images: 70,000 (60,000 for training and 10,000 for testing)
  • Size of Image: 28x28 pixels
  • Color: Grayscale
  • Type of Data: Fashion items (e.g., shirts, trousers, sneakers, etc.)
  • Size of Dataset: Approximately 29 MB
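
The labels in Fashion MNIST are integers from 0 to 9, so it helps to keep the class names at hand when inspecting samples or predictions. A minimal sketch, using the class order from the dataset's published label table. FASHION_MNIST_CLASSES and label_name are illustrative names, not part of torchvision.

```python
# The ten Fashion MNIST classes, indexed by integer label (0-9).
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label):
    """Translate an integer label from the dataset into its class name."""
    return FASHION_MNIST_CLASSES[int(label)]

print(label_name(7))  # Sneaker
```

Once the dataset is loaded with torchvision, the dataset object's classes attribute should give the same list, which avoids hard-coding the names.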

Loading Fashion MNIST in PyTorch

import torch
import torchvision
import torchvision.transforms as transforms

# Define data transformations
transform = transforms.Compose([
    transforms.ToTensor(),  # PIL image -> float tensor in [0, 1], shape (1, 28, 28)
    transforms.Normalize(mean=[0.5], std=[0.5])  # Fashion MNIST has one grayscale channel, so one mean and one std
])

# Download and load the Fashion MNIST dataset
train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)

# Create data loaders
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

Thank You

Thank you for taking the time to read this article.

I value your feedback! If you have any comments or questions, please share them in the comments or email me directly.
