Python: Reading Text, Large CSV, and Image Files

Reading CSV, text, and image files

Photo by Ohmky on Unsplash

There are many different types of files that we can process. Some of these file types are listed here:

  • CSV files (generally with the .csv extension) hold tabular data that resembles spreadsheet data.
  • Image files (generally with the .png or .jpg extension) hold images for computer vision.
  • Text files (often with the .txt extension) hold unstructured text and are essential for natural language processing.
  • JSON files (often with the .json extension) contain semi-structured data in a human-readable, text-based format.
  • H5 (HDF5) files (which can have a wide array of extensions, such as .h5 or .hdf5) store large, hierarchically organized numerical data in a binary format. Keras and TensorFlow can store neural networks as H5 files.

Read a CSV File

Python programs can read CSV files with Pandas, whose read_csv function accepts a file path, a URL, or a file-like object and returns a DataFrame. We will see more about Pandas in the next section.
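Here is a minimal sketch of the general pattern. It assumes a small, hypothetical Iris-style sample held in memory so that it runs offline. In practice you would pass a file path or URL to pd.read_csv instead of the io.StringIO object.

```python
import io

import pandas as pd

# Hypothetical sample in the Iris column layout; a real program would pass
# a path or URL (for example "iris.csv") to pd.read_csv instead.
CSV_TEXT = """sepal_l,sepal_w,petal_l,petal_w,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

# read_csv parses the whole input into a DataFrame held in memory.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)   # (rows, columns)
print(df.head())  # the first few rows
```

The first line of the CSV becomes the column names, and each later line becomes one row of the DataFrame.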

Read (stream) a Large CSV File

By default, Pandas reads the entire CSV file into memory. Usually, this is fine. However, at times you may wish to “stream” a huge file. Streaming lets you process the file one record at a time. Because the program never loads all of the data into memory at once, it can handle files that are larger than the available RAM. For example, a program can read the Iris dataset and calculate column averages one row at a time, and the same technique works for much larger files.
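The streaming idea can be sketched with Python's built-in csv module. This is a minimal sketch, not the author's original code. It assumes the first record is a header and that every column except the last (an Iris-style species label) is numeric. The function name stream_column_averages is hypothetical. Only running sums and a row counter are kept, so memory use does not grow with the file size.

```python
import csv
import io
from typing import Dict, Iterable


def stream_column_averages(lines: Iterable[str]) -> Dict[str, float]:
    """Average each numeric column, reading one CSV record at a time."""
    reader = csv.reader(lines)
    header = next(reader)        # first record holds the column names
    numeric = header[:-1]        # assumption: the last column is a text label
    sums = [0.0] * len(numeric)
    count = 0
    for row in reader:           # only the current row is held in memory
        if not row:              # csv.reader yields [] for blank lines
            continue
        for i in range(len(numeric)):
            sums[i] += float(row[i])
        count += 1
    if count == 0:
        return {}
    return {name: total / count for name, total in zip(numeric, sums)}


if __name__ == "__main__":
    # With a real file, pass the open file object; it is iterated lazily:
    # with open("iris.csv", newline="") as f:   # hypothetical path
    #     print(stream_column_averages(f))
    sample = io.StringIO(
        "sepal_l,sepal_w,petal_l,petal_w,species\n"
        "5.1,3.5,1.4,0.2,Iris-setosa\n"
        "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    )
    print(stream_column_averages(sample))
```

Pandas also supports chunked reading. Passing the chunksize parameter to pd.read_csv returns an iterator of smaller DataFrames instead of one large one, which is a middle ground between loading everything and handling single rows.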

Read a Text File

The U.S. Declaration of Independence makes a convenient sample text file. A program can stream the document and read it line by line. Because only the current line is held in memory, the same approach could handle a huge file.
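Here is a minimal sketch of line-by-line streaming. It assumes a UTF-8 text file. The helper count_lines_and_words is hypothetical, and the demo writes a short excerpt to a temporary file in place of the full document. Iterating over an open file object yields one line at a time, which is what keeps memory use constant.

```python
import os
import tempfile
from typing import Tuple


def count_lines_and_words(path: str, encoding: str = "utf-8") -> Tuple[int, int]:
    """Stream a text file, counting lines and whitespace-separated words."""
    lines = words = 0
    with open(path, "r", encoding=encoding) as f:
        for line in f:              # the file object yields one line at a time
            lines += 1
            words += len(line.split())
    return lines, words


if __name__ == "__main__":
    # Short excerpt standing in for the full text; substitute a real file path.
    text = "When in the Course of human events,\nit becomes necessary for one people\n"
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write(text)
    try:
        print(count_lines_and_words(tmp.name))
    finally:
        os.remove(tmp.name)
```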

Read an Image

Computer vision is one of the areas where neural networks outshine other models. To support computer vision, the Python programmer needs to understand how to process images. For this course, we will use the Python PIL package (installed today as its maintained fork, Pillow) for image processing. A common first step is to load an image from a URL and display it.
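Here is a minimal sketch of loading an image from a URL with PIL. It assumes Pillow is installed and downloads with the standard library's urllib.request rather than a third-party HTTP client. The helper names are hypothetical. Downloading and decoding are split into separate functions so that the decoding step can be shown offline with an image generated in memory.

```python
import io
import urllib.request

from PIL import Image


def load_image_from_bytes(data: bytes) -> Image.Image:
    """Decode raw image bytes (PNG, JPEG, ...) into a PIL image."""
    img = Image.open(io.BytesIO(data))  # Image.open accepts file-like objects
    img.load()                          # decode now instead of lazily
    return img


def load_image_from_url(url: str, timeout: float = 10.0) -> Image.Image:
    """Download an image over HTTP(S) and decode it with PIL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return load_image_from_bytes(response.read())


if __name__ == "__main__":
    # With network access (hypothetical URL):
    # img = load_image_from_url("https://example.com/sample.jpg")
    # Offline demo: encode a small red PNG in memory, then decode it again.
    buf = io.BytesIO()
    Image.new("RGB", (64, 48), color=(255, 0, 0)).save(buf, format="PNG")
    img = load_image_from_bytes(buf.getvalue())
    print(img.format, img.size, img.mode)
```

To display the result, img.show() opens the image in the system's default viewer. In a Jupyter notebook, leaving img as the last expression of a cell renders it inline.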



