How to effortlessly explore your IDX dataset
Explore the original MNIST files in Python using the idx2numpy library
Hello everyone! In this tutorial, we will see how to explore IDX files and extract the data they contain. We will use the MNIST dataset as a case study and, to keep things simple, the idx2numpy library.
Table of contents
· 1. IDX files format
· 2. MNIST dataset
· 3. idx2numpy
· 4. Explore MNIST dataset using idx2numpy
· 5. Conclusion
1. IDX files format
The IDX format is a binary file format used to store vectors and multidimensional matrices of various numerical types. It was developed by Yann LeCun to store the MNIST dataset. Because it is a binary format, it represents data compactly and needs little storage space, and its simple layout makes it easy to read with a variety of tools.
According to the author, a basic IDX file has the following structure [1]:
magic number
size in dimension 0
size in dimension 1
size in dimension 2
…..
size in dimension N
data
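To make this layout concrete, here is a minimal sketch that builds a tiny IDX buffer in memory and reads its header back using only the standard library. The helper name `read_idx_header` is hypothetical. The sketch assumes that the sizes are 4-byte big-endian integers and that the last byte of the magic number holds the number of dimensions, as in the MNIST files.

```python
import struct


def read_idx_header(buf: bytes):
    """Return (magic, shape, data_offset) for an IDX byte buffer."""
    # The magic number is the first 4 bytes, read as a big-endian unsigned int.
    (magic,) = struct.unpack(">I", buf[:4])
    # Assumption: the lowest byte of the magic number is the dimension count.
    ndim = magic & 0xFF
    # One 4-byte big-endian size per dimension follows the magic number.
    shape = struct.unpack(f">{ndim}I", buf[4:4 + 4 * ndim])
    # The data starts right after the last dimension size.
    return magic, shape, 4 + 4 * ndim


# Build a toy IDX buffer: a 2 x 3 matrix of unsigned bytes.
# 0x00000802 is assumed to mean "unsigned byte data, 2 dimensions".
payload = bytes(range(6))
buf = struct.pack(">III", 0x00000802, 2, 3) + payload

magic, shape, offset = read_idx_header(buf)
values = list(buf[offset:])

print(hex(magic), shape, offset)
print(values)
```

In practice, idx2numpy handles this parsing for you. The sketch is only meant to show that the header is a short run of fixed-size integers followed by the raw data.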
The magic number is a 4-byte integer with the following representation: