How to effortlessly explore your IDX dataset

Explore the original MNIST files in Python using the idx2numpy library

Chayma Zatout
3 min read · Jan 2, 2023
Photo by Bruce Tang on Unsplash

Hello everyone! In this tutorial, we will see how to explore IDX files and extract the dataset they contain. We will use the MNIST dataset as a case study. To simplify the process, we will use the idx2numpy library.

Table of contents

· 1. IDX files format
· 2. MNIST dataset
· 3. idx2numpy
· 4. Explore MNIST dataset using idx2numpy
· 5. Conclusion

1. IDX files format

The IDX format is a binary file format used to store vectors and multidimensional matrices of various numerical types. It was developed by Yann LeCun to store the MNIST dataset. Because it is a binary format, it represents the data compactly, requires little storage space and can be read by various tools.

According to the author, a basic IDX file has the following structure [1]:

magic number
size in dimension 0
size in dimension 1
size in dimension 2
…..
size in dimension N
data

Where the magic number is a 4-byte integer and has the following representation:
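To make the layout above more concrete, here is a minimal sketch that parses an IDX header using only Python's standard struct module. It relies on the format description in [1] as I understand it: the first two bytes of the magic number are zero, the third byte encodes the data type, the fourth byte encodes the number of dimensions, and each dimension size is a 4-byte big-endian integer. The names IDX_TYPES and parse_idx_header, as well as the toy 2 x 3 payload, are hypothetical and only serve the illustration; this is not how idx2numpy is implemented.

```python
import struct

# Type codes stored in the third byte of the magic number,
# mapped to (struct format character, item size in bytes).
IDX_TYPES = {
    0x08: ("B", 1),  # unsigned byte
    0x09: ("b", 1),  # signed byte
    0x0B: ("h", 2),  # short
    0x0C: ("i", 4),  # int
    0x0D: ("f", 4),  # float
    0x0E: ("d", 8),  # double
}


def parse_idx_header(buf):
    """Return (type_code, dims, data_offset) for an IDX byte string.

    Hypothetical helper written for this illustration.
    """
    # Magic number: two zero bytes, one type byte, one dimension-count byte.
    zero, type_code, ndim = struct.unpack_from(">HBB", buf, 0)
    if zero != 0 or type_code not in IDX_TYPES:
        raise ValueError("not a valid IDX header")
    # One 4-byte big-endian unsigned integer per dimension.
    dims = struct.unpack_from(">" + "I" * ndim, buf, 4)
    return type_code, dims, 4 + 4 * ndim


# Toy payload (an assumption for the demo): a 2 x 3 matrix of unsigned bytes.
toy = struct.pack(">HBB", 0, 0x08, 2) + struct.pack(">II", 2, 3) + bytes(range(6))

type_code, dims, offset = parse_idx_header(toy)
values = list(toy[offset:])  # raw data starts right after the header
print(hex(type_code), dims, values)
```

Under the same reading of the format, the MNIST image files start with the magic number 0x00000803 (2051 in decimal): unsigned bytes arranged in three dimensions (number of images, rows, columns).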
