[Day-4]NumPy Array-Functions(2)

Advait Joshi
4 min read · Apr 14, 2024

--

Hello everyone! Today I delved into the concepts of the shape and size of an array and reshaping an array. Apart from this, I made a small project using NumPy. I was a bit bored with reading today, so I went on YouTube, found a small tutorial, and learned how to deal with data using NumPy. I posted it on GitHub.

GitHub Repo: https://github.com/TheUsefulNerd/NYC-Taxi-DataAnalysis-NumPy.git

While writing the code I came across a CSV file and did not know what that was. So I googled it, understood what a CSV file is, and also read about other file types used in code and daily life.

The attributes and methods we'll discuss today are:

  1. ndarray.ndim
  2. ndarray.size
  3. ndarray.shape
  4. arr.reshape()

Let us Begin…

ndarray.ndim :

ndarray.ndim will tell you the number of axes, or dimensions, of the array.

For example, you create this array:

import numpy as np

array_example = np.array([[[0, 1, 2, 3],
                           [4, 5, 6, 7]],

                          [[0, 1, 2, 3],
                           [4, 5, 6, 7]],

                          [[0, 1, 2, 3],
                           [4, 5, 6, 7]]])

To find the number of dimensions of the array, run:

array_example.ndim
#Output :
3

ndarray.size :

ndarray.size will tell you the total number of elements of the array. This is the product of the elements of the array’s shape.

Considering the same 3-dimensional array:

To find the total number of elements in the array, run:

array_example.size
#Output :
24
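As a quick sanity check that size really is the product of the shape's entries, here's a minimal sketch using the same example array:

```python
import numpy as np

# The same 3-D example array: 3 blocks, 2 rows, 4 columns.
array_example = np.array([[[0, 1, 2, 3],
                           [4, 5, 6, 7]],
                          [[0, 1, 2, 3],
                           [4, 5, 6, 7]],
                          [[0, 1, 2, 3],
                           [4, 5, 6, 7]]])

# size equals the product of the shape's entries: 3 * 2 * 4 = 24.
print(array_example.size)            # 24
print(np.prod(array_example.shape))  # 24
```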

ndarray.shape :

ndarray.shape will display a tuple of integers that indicate the number of elements stored along each dimension of the array. If, for example, you have a 2-D array with 2 rows and 3 columns, the shape of your array is (2, 3).

Considering the same 3-dimensional array:

To find the shape of your array, run:

array_example.shape
#Output :
(3, 2, 4)

arr.reshape() :

Using arr.reshape() will give a new shape to an array without changing the data. Just remember that when you use the reshape method, the array you want to produce needs to have the same number of elements as the original array. If you start with an array with 12 elements, you’ll need to make sure that your new array also has a total of 12 elements.

If you start with this array:

a = np.arange(6)
print(a)
#Output :
[0 1 2 3 4 5]

You can use reshape() to reshape your array. For example, you can reshape this array to an array with three rows and two columns:

b = a.reshape(3, 2)
print(b)
#Output :
[[0 1]
[2 3]
[4 5]]

With np.reshape, you can specify a few optional parameters:

np.reshape(a, newshape=(1, 6), order='C')
#Output :
array([[0, 1, 2, 3, 4, 5]])

a is the array to be reshaped.

newshape is the new shape you want. You can specify an integer or a tuple of integers. If you specify an integer, the result will be an array of that length. The shape should be compatible with the original shape.

order: 'C' means to read/write the elements using C-like index order; 'F' means to read/write the elements using Fortran-like index order; 'A' means Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise. (This is an optional parameter and doesn’t need to be specified.)
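To see what the order parameter actually changes, here's a small sketch comparing 'C' and 'F' order on the same array:

```python
import numpy as np

a = np.arange(6)  # [0 1 2 3 4 5]

# C order fills rows first (the last index varies fastest).
c = np.reshape(a, (2, 3), order='C')
print(c)
# [[0 1 2]
#  [3 4 5]]

# Fortran order fills columns first (the first index varies fastest).
f = np.reshape(a, (2, 3), order='F')
print(f)
# [[0 2 4]
#  [1 3 5]]
```

Both results contain the same six elements; only the order in which they are read into the new shape differs.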

Now let’s discuss a few different file types:

CSV (Comma-Separated Values):

  • CSV files store tabular data in plain text format, with each line representing a row of data and commas separating the values within each row. It’s widely used because it’s simple and can be easily imported into most data analysis tools.
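For example, NumPy can load a CSV directly with np.genfromtxt. A minimal sketch using an in-memory file (the column names and values here are made up for illustration):

```python
import numpy as np
from io import StringIO

# A tiny in-memory CSV standing in for a real file on disk.
csv_text = StringIO("trip_distance,fare_amount\n"
                    "1.5,7.0\n"
                    "3.2,12.5\n")

# delimiter=',' splits each line on commas;
# skip_header=1 skips the column-name row.
data = np.genfromtxt(csv_text, delimiter=",", skip_header=1)
print(data)
# [[ 1.5  7. ]
#  [ 3.2 12.5]]
print(data.shape)  # (2, 2)
```

With a real file you would pass the filename instead of the StringIO object.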

JSON (JavaScript Object Notation):

  • JSON files store data in a hierarchical format using key-value pairs. It’s commonly used for transmitting data between a server and a web application, but it’s also popular in data science for its flexibility and ease of use with various programming languages.
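Python's standard library handles JSON directly with the json module. A minimal sketch (the keys and values are invented for illustration):

```python
import json

# A small JSON document with key-value pairs and a nested list.
text = '{"name": "taxi_trip", "fare": 12.5, "stops": [1, 2, 3]}'

# json.loads parses a JSON string into Python dicts/lists.
record = json.loads(text)
print(record["name"])   # taxi_trip
print(record["fare"])   # 12.5

# json.dumps goes the other way, back to a JSON string.
print(json.dumps(record, indent=2))
```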

Excel (XLSX, XLS):

  • Excel files are spreadsheets created using Microsoft Excel. They can store tabular data, charts, and formulas. While not as lightweight or portable as CSV files, they’re widely used in business settings and are compatible with many data analysis tools.

Parquet:

  • Parquet is a columnar storage file format that is optimized for use with big data processing frameworks like Apache Spark and Apache Hadoop. It stores data in a highly compressed and efficient binary format, making it ideal for large-scale data analytics.

HDF5 (Hierarchical Data Format version 5):

  • HDF5 is a file format designed to store and organize large amounts of scientific data. It supports complex data structures, metadata, and compression, making it suitable for a wide range of scientific applications, including image processing, climate modeling, and bioinformatics.

Avro:

  • Avro is a data serialization system that provides rich data structures and a compact binary format. It’s often used in conjunction with Apache Kafka for data streaming applications and is known for its schema evolution capabilities, allowing for seamless updates to data schemas over time.

Feather:

  • Feather is a lightweight binary columnar data format that is designed to be fast to read and write. It’s particularly popular in the Python ecosystem and is supported by libraries like Pandas and Apache Arrow. It’s useful for efficiently exchanging data between different data analysis tools.

This is all that I learned today about the different file types and NumPy array functions. My learnings will be documented here on Medium for better understanding. Stay tuned for the next blog.

All The Best People☺️.

Follow me on LinkedIn:

www.linkedin.com/in/advaitszone

To read my previous blog “[Day-3]NumPy Array-Essential Functions(1)”:

https://medium.com/@advaitszone/day-3-numpy-array-essential-functions-1-47ce7b9a0d13
