Read and Write Text Files Using Python Open

Ravish Kumar
Published in EnjoyAlgorithms
8 min read · Feb 14, 2024

In ML and Data Science, data is often stored as plain documents because documents are easy to maintain. In Natural Language Processing (NLP) especially, most textual datasets come as files with a ".txt" extension. In this article, we will learn how to read the content of these files, modify it, and save it again, either by updating the same file or by creating a new one.

Sample Document To Test

We can create a dummy text file or download any authentic sample file to do some hands-on. For example, the image below shows a dataset containing emails.

How to open any Document in Python?

We need to read the content of these files while building ML models or analyzing datasets. In most cases, we use Python's built-in "open" function for that. Its two most commonly used arguments are:

  • file: A mandatory argument with the name of the file to open, including its directory if the file is not in the current working directory. For example, if the file is in the Downloads folder, we pass its complete path as a string, like '~/Downloads/sample_file1.txt'. Note that "open" does not expand '~' on its own; os.path.expanduser can convert such a path into an absolute one first.
  • mode: An optional argument representing the mode in which the file is opened. The default value is "r", which stands for "read" mode. In simple terms, open(file) opens the file in read-only mode. All the modes are listed in the table below.
Character    Meaning
------------------------------------------------------------------
'r'          open for reading (default)
'w'          open for writing, truncating the file first
'x'          open for exclusive creation, failing if the file already exists
'a'          open for writing, appending to the end of the file if it exists
'b'          binary mode
't'          text mode (default)
'+'          open for updating (reading and writing)

The 'b' and '+' characters combine with the others: 'r+' and 'r+b' open the file for reading and writing without truncating it, while 'w+' and 'w+b' open it for reading and writing and truncate it first.

The mode determines what happens to the file when it is opened. For example, if we open a file with 'w' mode and the file already exists, its existing content is deleted as soon as the file is opened, and whatever we write afterward becomes the new content.
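The difference between the modes is easiest to see by trying them on a scratch file. The sketch below is only an illustration: the temporary directory and the file name "modes_demo.txt" are assumptions made for this example and are not part of the sample dataset.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "modes_demo.txt")

# 'w' creates the file (or truncates it if it already exists).
with open(demo_path, "w") as f:
    f.write("first version\n")

# 'r+' opens for reading and writing WITHOUT truncating.
with open(demo_path, "r+") as f:
    before = f.read()  # the existing content is still there

# 'w' again: the old content is discarded as soon as the file is opened.
with open(demo_path, "w") as f:
    f.write("second version\n")

with open(demo_path, "r") as f:
    after = f.read()

# 'x' refuses to open a file that already exists.
try:
    open(demo_path, "x")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(repr(before), repr(after), exclusive_failed)
```

The 'x' mode is a handy safeguard when a program must never overwrite an existing file by accident.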

The line below shows a way to open one of the files present in the sample folder in read-only mode.

sample_file = open('0001.1999-12-10.farmer.ham.txt', "r")
#             |    |                                 |
#             |    path of the file to open          mode
#             open function

As we know, everything in Python is an object. So, the line above creates a file object named sample_file, which also carries some information about the opened file. For example, 'sample_file.name' gives the name of the file, and 'sample_file.mode' gives the mode in which it was opened.

print(sample_file.name, sample_file.mode)

## 0001.1999-12-10.farmer.ham.txt r

Closing the Opened File

When we open the file using Python's "open" function, it must be closed after performing the desired operations. For example, if we open the above file to read its content, then it should be closed after reading like this:

sample_file = open('0001.1999-12-10.farmer.ham.txt', "r")

sample_file.close() ## Closing
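If an error occurs between "open" and "close", the call to "close" is never reached. One common way to guard against that is a try/finally block, sketched below. The file created here is a hypothetical stand-in for the sample email file so that the snippet can run on its own.

```python
import os
import tempfile

# Hypothetical sample file, created here so the sketch runs anywhere.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
setup_file = open(sample_path, "w")
setup_file.write("This is line 1 in sample text file.\n")
setup_file.close()

sample_file = open(sample_path, "r")
try:
    first_line = sample_file.readline()
finally:
    # Runs whether or not the read above raised an exception.
    sample_file.close()

print(first_line, sample_file.closed)
```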

But why do we need to close the opened file?

Let's list down the reasons for closing the opened file:

  • Data written to an open file is buffered, and other programs running on the same system may read or modify the file in the meantime. Until the file is closed, its content can be incomplete or end up corrupted.
  • Every open file holds system resources, such as memory buffers and a file handle, for as long as it stays open.
  • In most cases, changes reach the disk only when the buffer is flushed, which is guaranteed to happen when we close the file. If we never close the file, our changes may not be reflected in it.
  • The operating system limits the number of files a program can keep open at once, and a program that never closes its files can hit that limit.
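The effect of buffering can be observed by comparing the size of a file on disk before and after closing it. This is a minimal sketch with a hypothetical file name. The size reported before closing depends on Python's buffering, so no particular value is assumed for it; only the size after closing is certain.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
buffer_path = os.path.join(tempfile.mkdtemp(), "buffer_demo.txt")

out_file = open(buffer_path, "w")
out_file.write("hello")

# The text may still be sitting in Python's in-memory buffer here,
# so the size on disk can be smaller than what we wrote.
size_before_close = os.path.getsize(buffer_path)

out_file.close()  # close() flushes the buffer to disk

size_after_close = os.path.getsize(buffer_path)
print(size_before_close, size_after_close)
```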

Shouldn't it be automatically closed?

  • No. Python cannot know when we are done with a file: we might need to write more information to the same file later, for example from code running in parallel on the same system.
  • The number of operations performed on an opened file can also vary, so closing it automatically at the wrong moment could cause data loss. Python does eventually close a file when its file object is garbage collected, but when that happens is not guaranteed, so we should not rely on it.

How to check if the file is already closed?

To check whether an opened file has already been closed, we can use the ".closed" attribute like this:

# Case 1: the file has just been opened

sample_file = open('0001.1999-12-10.farmer.ham.txt', "r")

print("Case 1 : File is closed :", sample_file.closed)

# Case 2: the file has been closed

sample_file.close()

print("Case 2 : File is closed :", sample_file.closed)


## Case 1 : File is closed : False
## Case 2 : File is closed : True
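Checking ".closed" matters because any read or write on a closed file object raises a ValueError. The sketch below shows this on a hypothetical scratch file created only for the example.

```python
import os
import tempfile

# Hypothetical sample file so the sketch is self-contained.
closed_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
setup_file = open(closed_path, "w")
setup_file.write("This is line 1 in sample text file.\n")
setup_file.close()

sample_file = open(closed_path, "r")
sample_file.close()

try:
    sample_file.read()  # reading after close() is not allowed
    error_message = None
except ValueError as err:
    error_message = str(err)

print(sample_file.closed, error_message)
```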

How do you automatically close the file after performing some operations?

To make sure the file is closed as soon as we are done with it, and to make it obvious where the file is open and where it is closed, we open the file with the "with" statement like this:

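with open('0001.1999-12-10.farmer.ham.txt', "r") as file1:
    # Still inside the "with" block, so the file is open
    print("Case 1 : File is closed :", file1.closed)

# Outside the "with" block, the file has been closed automatically
print("Case 2 : File is closed :", file1.closed)


'''
Case 1 : File is closed : False
Case 2 : File is closed : True
'''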

In Python, indentation matters: the indented block under the "with" statement tells our program which operations to perform while the file is open. As soon as we leave that block, the file is closed automatically. In most cases, we open files using the "with" statement.
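A further benefit of the "with" statement is that the file is closed even when the code inside the block fails. The sketch below simulates such a failure with a deliberately raised RuntimeError. The file name and the error are assumptions made for the illustration.

```python
import os
import tempfile

# Hypothetical sample file so the sketch is self-contained.
with_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(with_path, "w") as setup_file:
    setup_file.write("This is line 1 in sample text file.\n")

try:
    with open(with_path, "r") as file1:
        closed_inside = file1.closed
        # Simulated failure while the file is still open.
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

# The "with" statement closed the file despite the exception.
closed_after = file1.closed
print(closed_inside, closed_after)
```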

Now that we can open a file, let's learn about performing actions on opened text files. Some popular actions are reading, writing, and appending.

Reading content from the text file

  • Reading the entire content at once:

with open('0001.1999-12-10.farmer.ham.txt', "r") as file1:
    file_content = file1.read()

print(file_content)

'''
This is line 1 in sample text file.
This is line 2 in sample text file.
This is line 3 in sample text file.
This is line 4 in sample text file.
This is line 5 in sample text file.
'''
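The "read" method also accepts an optional number of characters, and each call continues from where the previous one stopped. The following sketch uses a small hypothetical file with two lines to show this.

```python
import os
import tempfile

# Hypothetical two-line sample file.
read_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
full_text = (
    "This is line 1 in sample text file.\n"
    "This is line 2 in sample text file.\n"
)
with open(read_path, "w") as setup_file:
    setup_file.write(full_text)

with open(read_path, "r") as file1:
    first_part = file1.read(10)  # at most 10 characters
    rest = file1.read()          # continues after those 10 characters
    at_end = file1.read()        # nothing left to read

print(repr(first_part))
print(repr(at_end))
```

Once the end of the file is reached, "read" returns an empty string instead of raising an error.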
  • Reading the content as a list of lines in the text file using "readlines":

Here, every line of the content becomes one element of a list. Please note the spelling "readlines," which is plural: it returns all the lines in the file. Each element keeps its trailing newline character "\n", except the last one when the file does not end with a newline.

We can also limit how much is read: "read" accepts an optional number of characters, for example file1.read(10) returns at most the next ten characters.

with open('0001.1999-12-10.farmer.ham.txt', "r") as file1:
    file_content = file1.readlines()

print(file_content)


'''
['This is line 1 in sample text file.\n',
'This is line 2 in sample text file.\n',
'This is line 3 in sample text file.\n',
'This is line 4 in sample text file.\n',
'This is line 5 in sample text file.']
'''
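The trailing "\n" characters are usually unwanted when the lines are processed further. One possible way to remove them, sketched here on a hypothetical three-line file, is to strip each element after reading.

```python
import os
import tempfile

# Hypothetical sample file; the last line has no trailing newline.
lines_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(lines_path, "w") as setup_file:
    setup_file.write("line 1\nline 2\nline 3")

with open(lines_path, "r") as file1:
    raw_lines = file1.readlines()

# Remove only the newline character at the end of each line.
clean_lines = [line.rstrip("\n") for line in raw_lines]

print(raw_lines)
print(clean_lines)
```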
  • Reading the content one line at a time using "readline":

Here, the "readline" method of the opened file object reads a single line of the content. Each call returns the next line, so the n-th call returns the n-th line of the text file. Please note the blank lines in the printed output: every line read still ends with its own newline character, and "print" adds another one.

with open('0001.1999-12-10.farmer.ham.txt', "r") as file1:
    file_content1 = file1.readline()
    file_content2 = file1.readline()
    file_content3 = file1.readline()
    file_content4 = file1.readline()
    file_content5 = file1.readline()
    file_content6 = file1.readline()

print(file_content1)
print(file_content2)
print(file_content3)
print(file_content4)
print(file_content5)
print(file_content6)

'''
This is line 1 in sample text file.

This is line 2 in sample text file.

This is line 3 in sample text file.

This is line 4 in sample text file.

This is line 5 in sample text file.

'''

Special case: If we call "readline" six times on content that has only five lines, the sixth call returns an empty string, so the last "print" outputs only a blank line.
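Because "readline" returns an empty string at the end of the file, that value can serve as a stop condition when the number of lines is not known in advance. The sketch below assumes a small hypothetical file that also contains a blank line, to show that a blank line is read as "\n" and does not end the loop.

```python
import os
import tempfile

# Hypothetical sample file with a blank line in the middle.
loop_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(loop_path, "w") as setup_file:
    setup_file.write("line 1\n\nline 3\n")

lines_read = []
with open(loop_path, "r") as file1:
    while True:
        line = file1.readline()
        if line == "":  # an empty string means the end of the file
            break
        lines_read.append(line)

print(lines_read)
```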

Writing a text file in Python by opening it in write mode

Writing is another important operation we perform on text files, for example when creating data or logging information from our programs. We use the "open" function with the "w" (write) mode to open the text file in write mode like this:

open("sample_write.txt", "w")
#    |
#    filename, including its directory if needed

There can be two scenarios: 1. The file already exists in the given directory, and 2. The file does not exist in the given directory. Let's understand both scenarios by trying them out.

  • File exists

We will first open the file in read mode to read the original content, then open it in the write mode to write something in it, and then again open it in the read mode to see the new content after writing in it. Here is the sample code:

### Open the file to read the original content in it
with open('0001-Copy1.1999-12-10.farmer.ham.txt', "r") as file1:
    file_content1 = file1.read()
    print("** Original Content **", file_content1)
#### file closed

### Open the file to write something in it
with open('0001-Copy1.1999-12-10.farmer.ham.txt', "w") as file1:
    file1.write("This is new line written!!")
## file closed

### Open the file to read the new content in it
with open('0001-Copy1.1999-12-10.farmer.ham.txt', "r") as file1:
    file_content1 = file1.read()
    print("** New Content **", file_content1)

'''
** Original Content ** This is line 1 in sample text file.
This is line 2 in sample text file.
This is line 3 in sample text file.
This is line 4 in sample text file.
This is line 5 in sample text file.
** New Content ** This is new line written!!
'''

Please note that the content is overwritten. The original content is lost in this case, and the new content we have written is the only content present in the same file.

  • The file does not exist.

with open('sample_write.txt', "w") as file1:
    file1.write("This is new line written!!")

This creates a new file with the name given to the open function at the given directory location.
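The sketch below confirms that the file is created by "open", and it also illustrates two details of writing: "write" returns the number of characters written, and it never adds a newline on its own. The path is placed in a temporary directory, which is an assumption made so that the example leaves no files behind in the working directory.

```python
import os
import tempfile

# Hypothetical location for the new file.
new_path = os.path.join(tempfile.mkdtemp(), "sample_write.txt")

existed_before = os.path.exists(new_path)

with open(new_path, "w") as file1:
    chars_written = file1.write("This is new line written!!")
    file1.write("\n")  # newlines must be written explicitly
    file1.writelines(["second line\n", "third line\n"])

existed_after = os.path.exists(new_path)

with open(new_path, "r") as file1:
    written_content = file1.read()

print(existed_before, existed_after, chars_written)
print(written_content)
```

Note that "w" mode creates only the file itself. The directory that should contain it must already exist.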

Reading from one file and writing it in another

We can store the lines of one file as list elements using the "readlines" method and then use a Python loop to write those elements into another file.

with open('0001.1999-12-10.farmer.ham.txt', "r") as file1:
    all_lines = file1.readlines()

with open('0001-Copy1.1999-12-10.farmer.ham.txt', "w") as file2:
    for line in all_lines:
        file2.write(line)

with open('0001-Copy1.1999-12-10.farmer.ham.txt', "r") as file1:
    file_content1 = file1.read()

print("** New Content **", file_content1)

'''
** New Content ** This is line 1 in sample text file.
This is line 2 in sample text file.
This is line 3 in sample text file.
This is line 4 in sample text file.
This is line 5 in sample text file.
'''
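For large files, holding every line in a list may use more memory than necessary. An alternative, sketched below with hypothetical file names, opens both files in a single "with" statement and copies one line at a time by iterating over the source file object.

```python
import os
import tempfile

# Hypothetical source and destination files.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "source.txt")
target_path = os.path.join(work_dir, "copy.txt")

original_text = "line 1\nline 2\nline 3\n"
with open(source_path, "w") as setup_file:
    setup_file.write(original_text)

# Both files are open inside the block and both are closed after it.
with open(source_path, "r") as source, open(target_path, "w") as target:
    for line in source:  # reads one line at a time
        target.write(line)

with open(target_path, "r") as file1:
    copied_text = file1.read()

print(copied_text == original_text)
```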

A good question at this point is: what if we do not want to lose the content already present in the file and only want to add more to it? This is not possible in write mode, as it truncates the previous content. We could store the content of the file somewhere and then rewrite the file with the updated content, but this is tedious and needs more memory at runtime. For this, we have the append mode.

Appending the file: Adding more content to the existing file

Here is the code to append content to an existing file. We first prepare the new content we want to append and then use a Python loop to write it in append mode.

new_lines = ['This is line 6 in sample text file.\n', 'This is line 7 in sample text file.\n']

with open('0001-Copy1.1999-12-10.farmer.ham.txt', "a") as file1:
    for line in new_lines:
        file1.write(line)

with open('0001-Copy1.1999-12-10.farmer.ham.txt', "r") as file1:
    file_content1 = file1.read()

print("** New Content **", file_content1)
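One detail to watch when appending: if the existing file does not end with a newline, the appended text continues on the same line as the old last line. The helper below, append_lines, is a hypothetical function written for this illustration, not a built-in. It reads the whole file to check its ending, which is acceptable for small files only.

```python
import os
import tempfile


def append_lines(path, new_lines):
    """Append lines, first adding a newline if the file does not end with one."""
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "r") as existing:
            needs_newline = not existing.read().endswith("\n")
    with open(path, "a") as target:
        if needs_newline:
            target.write("\n")
        target.writelines(new_lines)


# Hypothetical file whose last line has no trailing newline.
append_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(append_path, "w") as setup_file:
    setup_file.write("line 1\nline 2")

append_lines(append_path, ["line 3\n", "line 4\n"])

with open(append_path, "r") as file1:
    appended_content = file1.read()

print(appended_content)
```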

Conclusion

In Machine Learning, we often need to open text files, read them, write or modify their content, and then save the updated versions. This article showed how to open a text file using Python's built-in open function, with and without the "with" statement, and how to read, write, and append content. We hope you enjoyed the article and learned something new.

Enjoy Learning!
