MNIST

This repo contains a mirror of the files from http:https://yann.lecun.com/exdb/mnist/ . See that website for more details of MNIST.

The reason for this mirror is, that sometimes the download of the files get stuck for several minues. But this only happens with wget and only on some computers. With curl it was always possible to download the files.

To download the files inside of a script, you can use the following links:

https://raw.githubusercontent.com/fgnt/mnist/master/train-images-idx3-ubyte.gz
https://raw.githubusercontent.com/fgnt/mnist/master/train-labels-idx1-ubyte.gz
https://raw.githubusercontent.com/fgnt/mnist/master/t10k-images-idx3-ubyte.gz
https://raw.githubusercontent.com/fgnt/mnist/master/t10k-labels-idx1-ubyte.gz

and in the following you can find download code for python (original came from https://cntk.ai/pythondocs/CNTK_103A_MNIST_DataLoader.html):

def get_mnist():
    # The code to download the mnist data original came from
    # https://cntk.ai/pythondocs/CNTK_103A_MNIST_DataLoader.html
    
    import gzip
    import numpy as np
    import os
    import struct

    from urllib.request import urlretrieve 

    def load_data(src, num_samples):
        print("Downloading " + src)
        gzfname, h = urlretrieve(src, "./delete.me")
        print("Done.")
        try:
            with gzip.open(gzfname) as gz:
                n = struct.unpack("I", gz.read(4))
                # Read magic number.
                if n[0] != 0x3080000:
                    raise Exception("Invalid file: unexpected magic number.")
                # Read number of entries.
                n = struct.unpack(">I", gz.read(4))[0]
                if n != num_samples:
                    raise Exception(
                        "Invalid file: expected {0} entries.".format(num_samples)
                    )
                crow = struct.unpack(">I", gz.read(4))[0]
                ccol = struct.unpack(">I", gz.read(4))[0]
                if crow != 28 or ccol != 28:
                    raise Exception(
                        "Invalid file: expected 28 rows/cols per image."
                    )
                # Read data.
                res = np.frombuffer(
                    gz.read(num_samples * crow * ccol), dtype=np.uint8
                )
        finally:
            os.remove(gzfname)
        return res.reshape((num_samples, crow, ccol)) / 256


    def load_labels(src, num_samples):
        print("Downloading " + src)
        gzfname, h = urlretrieve(src, "./delete.me")
        print("Done.")
        try:
            with gzip.open(gzfname) as gz:
                n = struct.unpack("I", gz.read(4))
                # Read magic number.
                if n[0] != 0x1080000:
                    raise Exception("Invalid file: unexpected magic number.")
                # Read number of entries.
                n = struct.unpack(">I", gz.read(4))
                if n[0] != num_samples:
                    raise Exception(
                        "Invalid file: expected {0} rows.".format(num_samples)
                    )
                # Read labels.
                res = np.frombuffer(gz.read(num_samples), dtype=np.uint8)
        finally:
            os.remove(gzfname)
        return res.reshape((num_samples))


    def try_download(data_source, label_source, num_samples):
        data = load_data(data_source, num_samples)
        labels = load_labels(label_source, num_samples)
        return data, labels
    
    # Not sure why, but yann lecun's website does no longer support 
    # simple downloader. (e.g. urlretrieve and wget fail, while curl work)
    # Since not everyone has linux, use a mirror from uni server.
    #     server = 'http:https://yann.lecun.com/exdb/mnist'
    server = 'https://raw.githubusercontent.com/fgnt/mnist/master'
    
    # URLs for the train image and label data
    url_train_image = f'{server}/train-images-idx3-ubyte.gz'
    url_train_labels = f'{server}/train-labels-idx1-ubyte.gz'
    num_train_samples = 60000

    print("Downloading train data")
    train_features, train_labels = try_download(url_train_image, url_train_labels, num_train_samples)

    # URLs for the test image and label data
    url_test_image = f'{server}/t10k-images-idx3-ubyte.gz'
    url_test_labels = f'{server}/t10k-labels-idx1-ubyte.gz'
    num_test_samples = 10000

    print("Downloading test data")
    test_features, test_labels = try_download(url_test_image, url_test_labels, num_test_samples)
    
    return train_features, train_labels, test_features, test_labels

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Makefile		Makefile
README.md		README.md
t10k-images-idx3-ubyte.gz		t10k-images-idx3-ubyte.gz
t10k-labels-idx1-ubyte.gz		t10k-labels-idx1-ubyte.gz
train-images-idx3-ubyte.gz		train-images-idx3-ubyte.gz
train-labels-idx1-ubyte.gz		train-labels-idx1-ubyte.gz

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MNIST

About

Releases

Packages

Languages

fgnt/mnist

Folders and files

Latest commit

History

Repository files navigation

MNIST

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages