How to follow a file in Python (tail -f in Python)
In this blog post, we'll see how to create a simple version of `tail -f file` in Python.
What are we doing?
We want to read a file using Python and keep reading it, indefinitely. We want to ‘follow’ a file. Essentially, we want to emulate what the UNIX command `tail -f file` does.
We are reading an “infinite stream” of data. Here are a few things to keep in mind:
- we want to constantly watch the file and yield lines as soon as new lines are written to it
- but we don’t know in advance how much data will actually be written
- log files can be enormous, so re-reading the entire file every time to look for updates is out of the question
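The last point is why the approach below starts by jumping to the end of the file instead of reading it. A quick way to see that this is cheap: the sketch below uses a temporary file as a hypothetical stand-in for a large log, seeks straight to the end with `seek(0, os.SEEK_END)`, and confirms there is nothing left to read, without ever reading the existing contents.

```python
import os
import tempfile

# Hypothetical stand-in for a big log file: 100,000 short lines.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("old line\n" * 100_000)
    path = tmp.name

size = os.path.getsize(path)

with open(path, "r") as f:
    # Jump directly to the end; none of the existing content is read.
    end = f.seek(0, os.SEEK_END)
    # From here, readline() returns "" until something new is appended.
    leftover = f.readline()

os.remove(path)
print(end == size, repr(leftover))  # True ''
```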
How are we doing this?
We will write a simple Python script that uses Pythonic concepts such as generators.
Disclaimer: in a real-world production scenario, it’s probably a bad (not scalable) idea to use Python for something like this; we’re doing this just for fun.
Let’s see some code
```python
import os
import time


def follow(thefile):
    '''Generator function that yields new lines appended to a file.'''
    # seek to the end of the file, so only lines written from now on are yielded
    thefile.seek(0, os.SEEK_END)
    # start an infinite loop
    while True:
        # try to read the next line from the current position
        line = thefile.readline()
        # sleep briefly if the file hasn't been updated, then try again
        if not line:
            time.sleep(0.1)
            continue
        yield line


if __name__ == '__main__':
    with open("run/foo/access-log", "r") as logfile:
        loglines = follow(logfile)
        # iterate over the generator; each line already ends with "\n"
        for line in loglines:
            print(line, end="")
```
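To try `follow()` without a real access log, the sketch below appends three lines to a temporary file from a background thread while the main thread reads them through the generator. Everything except `follow` (which repeats the logic above) is hypothetical scaffolding, and the writer's initial 0.2 s delay assumes the generator has already seeked to the end by the time the first line is written. The pre-existing "stale entry" line is never yielded because `follow()` starts at the end of the file.

```python
import itertools
import os
import tempfile
import threading
import time


def follow(thefile):
    """Same logic as follow() above: yield lines appended after the start."""
    thefile.seek(0, os.SEEK_END)
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)
            continue
        yield line


def append_lines(path, lines, delay=0.2):
    """Hypothetical writer: appends one line every `delay` seconds."""
    for text in lines:
        time.sleep(delay)
        with open(path, "a") as out:
            out.write(text + "\n")


# Start from a file that already has content we do NOT want to see.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("stale entry\n")
    path = tmp.name

writer = threading.Thread(
    target=append_lines, args=(path, ["GET /", "GET /about", "POST /login"])
)

with open(path, "r") as logfile:
    lines = follow(logfile)
    writer.start()
    # Take only the first three new lines so the demo terminates.
    received = [line.rstrip("\n") for line in itertools.islice(lines, 3)]

writer.join()
os.remove(path)
print(received)  # ['GET /', 'GET /about', 'POST /login']
```

In the original script the loop never ends; the `islice` cap is only there so the demo can exit.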
What’s happening here:
- we create a `follow` function that accepts a file and yields (rather than returns) a sequence of lines
- we iterate over the generator and keep printing new lines as they are written to the file
- an infinite loop runs inside the `follow` generator function, which makes it keep watching the file indefinitely
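One caveat with `follow()` as written: if the writer flushes half a line, `readline()` returns that fragment (without a trailing newline) and the generator yields it as if it were a complete line. A possible refinement, sketched below under the assumption that every complete line ends in `\n`, buffers fragments until the newline arrives. `follow_lines` and the `ScriptedFile` stub that simulates a mid-line flush are hypothetical names for this illustration.

```python
import os
import time


def follow_lines(thefile, poll_interval=0.1):
    """Like follow(), but never yields a half-written line."""
    thefile.seek(0, os.SEEK_END)
    pending = ""
    while True:
        chunk = thefile.readline()
        if not chunk:
            time.sleep(poll_interval)
            continue
        pending += chunk
        # readline() returns text without a trailing "\n" when it hits
        # the end of the file mid-line; keep it until the rest arrives.
        if pending.endswith("\n"):
            yield pending
            pending = ""


class ScriptedFile:
    """Hypothetical stand-in for a log file: readline() returns
    pre-scripted chunks, mimicking a writer that flushes mid-line."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def seek(self, offset, whence=0):
        return 0

    def readline(self):
        return self.chunks.pop(0) if self.chunks else ""


fake = ScriptedFile(["GET /ind", "", "ex.html\n", "POST /login\n"])
gen = follow_lines(fake, poll_interval=0)
received = [next(gen), next(gen)]
print(received)  # ['GET /index.html\n', 'POST /login\n']
```

With the plain `follow()`, the same input would come out as two separate items, `'GET /ind'` and `'ex.html\n'`.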
A bit on generators
Here, `follow` is a special type of function called a generator. What happens under the hood:
- when the generator function is called (`loglines = follow(logfile)`), its body does not run at all: Python just returns a generator object, which keeps track of the function's local variables and where execution stopped
- the function body only runs when we iterate over that generator object
- on each iteration, the generator's `__next__()` method is called: the function resumes, runs until the next `yield`, hands back the yielded value, and pauses again
To sum up: generator functions return generator objects, which are iterators and are typically consumed in `for` loops.
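This laziness is easy to observe with a toy generator. In the sketch below, `countdown` and the `events` list are hypothetical; the list simply records when the function body actually runs.

```python
events = []


def countdown(n):
    """Hypothetical example generator that records when its body runs."""
    events.append("body started")
    while n > 0:
        events.append(f"yielding {n}")
        yield n
        n -= 1


gen = countdown(3)
print(type(gen).__name__, events)   # generator []  (body has not run yet)

first = next(gen)                   # same as gen.__next__()
print(first, events)                # 3 ['body started', 'yielding 3']

rest = list(gen)                    # a for loop drives it the same way
print(rest)                         # [2, 1]
```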
Conclusion
Infinite streams are tricky, generators are fun and Python is handy!
Here are some great resources on generators: