Analyticbox

Database management, data engineering, data science and machine learning articles from Analyticbox’ members and contributors

Using NPY in NumPy instead of CSV

Anindio Daneswara
Published in Analyticbox
2 min read · Feb 19, 2021


Photo by Boitumelo Phetla on Unsplash

This post discusses the advantages of using *.npy files instead of *.csv. Most of us are very comfortable working with CSV datasets, but NPY has advantages of its own, even though it is not as well known as CSV.

One thing to keep in mind when we bring a dataset into Python:

“load a dataset” is not equal to “import a dataset”

Importing a dataset from a text file does not keep track of the datatype of the original array, so we may have to define the datatype of the values explicitly after the import. When we load an NPY file, on the other hand, the datatype is stored with the data, so there is no need to specify or change anything.

This is the main advantage of working with NPY files:

  • the dataset retains its format
  • we don’t have to specify the datatypes or reorganize the values coming from an external text file
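The difference between the two routes can be sketched with a small round trip. The array contents and file names below are hypothetical, chosen only for illustration, and the files are written to a temporary directory:

```python
import os
import tempfile

import numpy as np

# Hypothetical integer data, used only for illustration
original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "example.csv")
    npy_path = os.path.join(tmp, "example.npy")

    # Text route: write the array as CSV, then import it again
    np.savetxt(csv_path, original, delimiter=",", fmt="%d")
    imported = np.genfromtxt(csv_path, delimiter=",")

    # NPY route: save the array, then load it again
    np.save(npy_path, original)
    loaded = np.load(npy_path)

print(imported.dtype)  # float64, because genfromtxt defaults to float
print(loaded.dtype)    # int64, the datatype of the original array
```

With the text route we would have to pass dtype=int ourselves to get integers back; with the NPY route the datatype comes along with the data.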

NPY files are generated by np.save() in the NumPy package. np.save() creates a file with the .npy extension; if we pass only a file name, the file is written to the current working directory, which is usually the same directory as our Jupyter notebook *.ipynb files.
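As a quick check of the naming behaviour, here is a minimal sketch that saves an array without giving an extension and then lists what was created. The temporary directory and the sample array are assumptions made for the example; in a notebook we would normally just pass a file name:

```python
import os
import tempfile

import numpy as np

data = np.arange(5)  # hypothetical sample array

with tempfile.TemporaryDirectory() as tmp:
    # No extension given: np.save() appends ".npy" on its own
    target = os.path.join(tmp, "converted_file")
    np.save(target, data)
    created = sorted(os.listdir(tmp))

print(created)  # ['converted_file.npy']
```

Passing a full path, as done here, is also how we would save the file somewhere other than the working directory.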

Other advantages are equally important:

  • NPY files are much faster to read and write
  • NPY files usually take up less disk space than the corresponding CSV, especially for numeric data
  • NPY is a binary file format native to NumPy, not a text file.

For example, if we have a dataset containing numbers, the NPY file stores the datatype and shape of the array in its header, so when we load the dataset back into Python the values are still recognized as numbers automatically.
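The size claim can be checked with a rough sketch like the one below. It uses hypothetical random float data and np.savetxt() with its default number format, so the exact sizes are not representative of every dataset; a CSV written with only a few digits per value could come out much closer to, or even smaller than, the NPY file.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
values = rng.random((1000, 10))  # hypothetical float64 data

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "values.csv")
    npy_path = os.path.join(tmp, "values.npy")

    # Write the same array in both formats
    np.savetxt(csv_path, values, delimiter=",")
    np.save(npy_path, values)

    csv_size = os.path.getsize(csv_path)
    npy_size = os.path.getsize(npy_path)
    reloaded = np.load(npy_path)

print(csv_size, npy_size)  # file sizes in bytes
print(reloaded.dtype)      # float64, still numeric after loading
```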

Let’s take a look at how we write the code for np.save(). We will use an example file called original_file.csv:

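# Import NumPy
import numpy as np
# Load the original CSV file, reading every value as a string
load_file = np.genfromtxt("original_file.csv", delimiter=",", dtype=str)
print(load_file)

Here we import the file using the np.genfromtxt() function, use a comma as the delimiter (obviously, since it is a CSV), and treat the values as strings by passing dtype=str. Older code often used np.str here, but that alias was deprecated and later removed from NumPy, so the built-in str is the safe choice.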

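# Save the CSV data to an NPY file (np.save() appends the .npy extension)
np.save("converted_file", load_file)
# Load the NPY file back; the file name must be passed as a string
convert_file = np.load("converted_file.npy")
print(convert_file)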

The array stored in the load_file variable is saved as converted_file.npy using np.save(), then read back into another variable, convert_file, using np.load().

So how do we ensure that both files converted_file.npy and original_file.csv are the same and unaltered after we load the data?

np.array_equal(convert_file, load_file)
# Output is True if both arrays have the same shape and the same values
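Since original_file.csv itself is not included in this post, the whole round trip can be tried with a stand-in file. The sketch below writes a small CSV with made-up contents to a temporary directory, then repeats the steps above; only the CSV contents and the temporary paths are assumptions.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "original_file.csv")
    npy_path = os.path.join(tmp, "converted_file.npy")

    # Made-up contents standing in for original_file.csv
    with open(csv_path, "w") as f:
        f.write("id,name,score\n1,alice,9.5\n2,bob,7.25\n")

    # Import the CSV as strings, save it as NPY, then load it back
    load_file = np.genfromtxt(csv_path, delimiter=",", dtype=str)
    np.save(npy_path, load_file)
    convert_file = np.load(npy_path)

# np.array_equal() compares shape and values
same_values = np.array_equal(convert_file, load_file)
# The datatype is compared separately
same_dtype = convert_file.dtype == load_file.dtype

print(same_values, same_dtype)
```

Checking the datatype on its own line is a deliberate choice: np.array_equal() looks at shape and values, so it does not by itself tell us that the datatype survived.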

Now we know how to use NPY files instead of relying only on CSV files when loading a dataset. CSV remains very popular, and most data analysts and data scientists are familiar with it. Still, as this post shows, NPY has advantages when working in a Python and NumPy environment, especially when we want to retain the value format of our dataset.


Written by Anindio Daneswara

Data Engineering, Software Architect, Blockchain Developer. Python/Java