Extracting Information with Python

A Beginner’s Guide to Successfully Creating a Python Script to Extract Information From a Custom Path or Current Working Directory.

Published in Women in Technology · 9 min read · May 17, 2023

Intimidated by snakes? I know the feeling. Some people find Python scripting a little scary, but hopefully this tutorial will not be. This article is about growing our knowledge and skills with Python. I will walk through how to build a script that extracts information, such as the name and size of files, from either a custom path or the current working directory, and then stores it in a list of dictionaries.

A little background //

Python is one of the world’s most popular general-purpose programming languages. Its fan base is growing thanks in large part to Python’s design philosophy, which emphasizes code readability and a simple syntax that often lets programs be written in fewer lines than in other programming languages. Python code often reads a lot like plain English, which makes it a great fit for beginners.

Python is open source, making it free to use and distribute, even commercially. Uses range from analyzing data to building software and websites. As the industry grows, so will the uses for Python.

Foundational Objective //

Create a script that generates a list of dictionaries about the files in the working directory. Then print the list.

To follow along with this project you will need //

An IDE (Integrated Development Environment); I will be using AWS Cloud9
An optional GitHub account
Attention to detail

Log into the IDE of your choice. I am using AWS Cloud9, and I will share a few gems along the way. I always start by creating a new branch: at the bottom of the screen, click on the word main, and then select the option to create a new branch. I like to name mine with the day’s date. Once we are all successfully finished, we will commit and push to our GitHub repo to share our script with the world. (Remember, sharing is caring in the world of DevOps.)

Let’s get started by creating a new Python file from the File drop-down menu, then name the file by saving it to one of your working directories.

I prefer to start a new Python script using the New From Template option.

With our blank template, we can complete a few steps to create a script that will extract information from our current working directory (CWD).

Delete lines 1–3 of the template. On line one, enter:

#!/usr/bin/env python3.7
(This shebang line tells the system to run the script with the python3.7 interpreter found on your PATH.)

In the terminal, mark the script executable by entering the command with the name of your python file:

chmod u+x <filename.py>

Note: you should be in the directory that you saved your script to, otherwise you will receive an error when running the chmod command.
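If you are curious what chmod u+x actually changes, here is a minimal sketch of the same idea done from Python, assuming a Linux or macOS system. The helper name make_user_executable and the file name example.py are made up for illustration; the script in this article does not need this step.

```python
import os
import stat
import tempfile


def make_user_executable(path):
    """Add the owner's execute bit, similar to `chmod u+x path`."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR)


# Try it on a throwaway file so no real script is modified.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "example.py")  # hypothetical file name
    with open(script, "w") as handle:
        handle.write("print('hello')\n")

    make_user_executable(script)

    # Check whether the owner's execute bit is now set.
    is_executable = bool(os.stat(script).st_mode & stat.S_IXUSR)
    print(is_executable)
```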

Back in our script, we will import the os module. It gives us functions for working with the operating system, including os.getcwd(), which returns a string that represents the current working directory.

# Import the OS Module. 
import os

Our next step is to create an empty list named “cwd_files”. This is what we will add the dictionaries of the files to.

cwd_files = []

Next, get the working directory by calling os.getcwd() and store the result in a variable named “my_cwd”.

my_cwd = os.getcwd()

Create a for loop, which will run through every entry (files and folders) in the CWD.

for file in os.listdir():

Inside the loop, join the file name with the current working directory path:

file_path = os.path.join(my_cwd, file)
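To see why the join step matters, here is a small sketch showing that os.listdir() hands back bare names rather than full paths. It runs against a temporary folder, and the file name notes.txt is just a placeholder I picked for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one sample file (hypothetical name) inside the temporary folder.
    with open(os.path.join(tmp, "notes.txt"), "w") as handle:
        handle.write("hello")

    # os.listdir() returns only the names of the entries.
    names = os.listdir(tmp)

    # Joining each name with the folder gives a path that works from anywhere.
    paths = [os.path.join(tmp, name) for name in names]

    print(names)
    print(paths)
```

The first print shows only the file name, while the second shows the folder and file name joined into one path.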

Our objective said to collect the file stats (path and size of each file). We can get them by running:

file_stats = os.stat(file_path)
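The object that os.stat() returns carries more than the size. As a quick sketch, using a temporary file named sample.txt that I made up for the demo, here are two of its fields: st_size (bytes) and st_mtime (last modified time, in seconds since the epoch).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(sample, "w") as handle:
        handle.write("12345")  # five ASCII characters, so five bytes

    file_stats = os.stat(sample)

    size_in_bytes = file_stats.st_size   # size of the file in bytes
    modified_at = file_stats.st_mtime    # last modification time (epoch seconds)

    print(size_in_bytes)
    print(modified_at)
```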

We will need to use the append method to add each dictionary created to the empty list that was created earlier.

cwd_files.append({'my_cwd':my_cwd+'/' +file, 'size': file_stats.st_size})
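Why a list of dictionaries? Once the data is in that shape, it is easy to ask questions about it. Here is a small sketch using made-up sample entries (the paths and sizes below are placeholders, not real output) with the same 'my_cwd' and 'size' keys our script uses.

```python
# Sample data shaped like the list our script builds (values are invented).
cwd_files = [
    {'my_cwd': '/home/user/notes.txt', 'size': 120},
    {'my_cwd': '/home/user/photo.png', 'size': 20480},
    {'my_cwd': '/home/user/script.py', 'size': 512},
]

# Add up the size of every file.
total_size = sum(entry['size'] for entry in cwd_files)

# Find the dictionary describing the largest file.
largest = max(cwd_files, key=lambda entry: entry['size'])

# Sort the dictionaries from smallest to largest file.
smallest_first = sorted(cwd_files, key=lambda entry: entry['size'])

print(total_size)
print(largest['my_cwd'])
print([entry['size'] for entry in smallest_first])
```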

Our script should come together to look like this:

#!/usr/bin/env python3.7

# Import the OS Module
import os

# Create an empty list named "cwd_files"
cwd_files = []

# Store the current working directory in the "my_cwd" variable
my_cwd = os.getcwd()

# Loop through all entries in the current working directory
for file in os.listdir():

    # Join the file name with the current working directory path
    file_path = os.path.join(my_cwd, file)

    # Get file stats (including the size of the file)
    file_stats = os.stat(file_path)

    # Append a new dictionary to 'cwd_files' with path and size
    cwd_files.append({'my_cwd':my_cwd+'/' +file, 'size': file_stats.st_size})

Awesome job! Let’s wrap up by ending our script with the print command, so we can see everything we put together.

print(cwd_files)

In Terminal, execute the script.

Hmm, I am not loving the legibility of our results. The great thing with Python is that we can have our script run one more command to make things a little clearer and easier to read. Delete the print command and add:

for i in range(len(cwd_files)):
    print(cwd_files[i])

Execute the script again.

As you can see, our script is successful and much more legible.
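If you want to push legibility a little further, here is one possible variation, not part of the original script: loop over the list directly and format each entry with an f-string so the sizes line up in a column. The sample entries are invented for the demo.

```python
# Sample data shaped like the list our script builds (values are invented).
cwd_files = [
    {'my_cwd': '/home/user/notes.txt', 'size': 120},
    {'my_cwd': '/home/user/photo.png', 'size': 20480},
]

# Build one formatted line per file: size right-aligned in 10 characters.
lines = [f"{entry['size']:>10} bytes  {entry['my_cwd']}" for entry in cwd_files]

# Looping over the list directly avoids the need for range() and an index.
for line in lines:
    print(line)
```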

Oooh! I love it!! Let’s push our new script to our GitHub Repo, merge, and then follow back up with deleting the branch we created today!

Need a refresher on Git commands? Check out a previous article:
https://medium.com/@mel.foster/git-github-a-collaboration-84a7415de3ba

Here is the final script we created today:

#!/usr/bin/env python3.7

# Import the OS Module
import os

# Create an empty list named "cwd_files"
cwd_files = []

# Store the current working directory in the "my_cwd" variable
my_cwd = os.getcwd()

# Loop through all entries in the current working directory
for file in os.listdir():

    # Join the file name with the current working directory path
    file_path = os.path.join(my_cwd, file)

    # Get file stats (including the size of the file)
    file_stats = os.stat(file_path)

    # Append a new dictionary to 'cwd_files' with path and size
    cwd_files.append({'my_cwd':my_cwd+'/' +file, 'size': file_stats.st_size})

# Use a "for" loop with the range() function to print the files in a legible manner
for i in range(len(cwd_files)):
    print(cwd_files[i])
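One thing worth knowing: os.listdir() returns folders as well as files, so the list above can include directories. If you only want regular files, a possible variation is sketched below using os.scandir(). The function name list_regular_files and the 'path' key are my own choices for this sketch, and the demo runs in a temporary folder.

```python
import os
import tempfile


def list_regular_files(directory):
    """Return a list of dictionaries for regular files only (folders skipped)."""
    entries = []
    with os.scandir(directory) as scanner:
        for entry in scanner:
            if entry.is_file():
                entries.append({'path': entry.path, 'size': entry.stat().st_size})
    return entries


with tempfile.TemporaryDirectory() as tmp:
    # One folder and one file, both with placeholder names.
    os.mkdir(os.path.join(tmp, "a_folder"))
    with open(os.path.join(tmp, "a_file.txt"), "w") as handle:
        handle.write("abc")

    all_names = sorted(os.listdir(tmp))    # includes the folder
    only_files = list_regular_files(tmp)   # skips the folder

    print(all_names)
    print(only_files)
```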

Not ready to leave just yet and want to push yourself? Try working through an advanced portion.

Our Advanced Objectives //

Modify our script into a function that accepts any path as a parameter.
The parameter should be optional and should default to the current working directory when no argument is passed
The function should create the list of dictionaries about files from that path
The function should also return information on files nested in folders (recursive)

Building our Advanced Script //

Starting our script in the same manner:

#!/usr/bin/env python3.7

# Import the OS Module
import os

# Create empty list named "cwd_files"
cwd_files = []

Now, here is where we break away from the foundational script and start getting into the advanced objectives.

Define a function named extract_info and add a parameter, ‘path’, which defaults to the CWD.

# Define function to extract file info default cwd
def extract_info(path = '.'):
    cwd_files = []
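A quick aside on that default value. The string '.' means "the current directory" and is resolved each time the function runs, whereas a default written as path=os.getcwd() would be evaluated only once, when the function is defined. The sketch below uses a throwaway helper, describe_target, that I made up just to show which folder a given argument points at.

```python
import os


def describe_target(path='.'):
    """Return the absolute folder that a call with this argument would scan."""
    return os.path.abspath(path)


# No argument: the default '.' resolves to the current working directory.
default_target = describe_target()

# With an argument: the path passed in is used instead (placeholder path).
custom_target = describe_target('some/folder')

print(default_target)
print(custom_target)
```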

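Now, we are going to create a for loop. A for loop is used for iterating over a sequence, and os.walk will visit the path provided along with every folder nested inside it, handing back the folder path (root), its subfolders (dirs), and its files (files) at each stop. Because os.walk quietly yields nothing for a path that does not exist, we first check the path ourselves and raise an error if it is not a directory. We are also going to build file_dict with both path and size in a different way than in the script above. I wanted to challenge us to see if there was a different way to get the same info. Each dictionary is appended to cwd_files, and the list is returned only after the whole walk has finished, so the return sits outside both loops. After the function, set exit = False.

    # Raise an error if the path is not an existing directory
    if not os.path.isdir(path):
        raise NotADirectoryError(path)

    # Recursively iterate over all files/directories under the path
    for root, dirs, files in os.walk(path):
        for file_name in files:
            file_dict = {}

            # Get file path and size
            file_dict['path'] = os.path.join(root, file_name)
            file_dict['size'] = os.path.getsize(os.path.join(root, file_name))

            # Append the file info dictionary to the list
            cwd_files.append(file_dict)

    # Return the list after every directory has been visited
    return cwd_files

exit = False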
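If os.walk still feels a bit abstract, here is a small sketch that builds a tiny folder tree in a temporary location and records what os.walk hands back at each stop. The names top.txt, sub, and nested.txt are placeholders for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: one file at the top, one file inside a subfolder.
    os.makedirs(os.path.join(tmp, "sub"))
    for relative in ("top.txt", os.path.join("sub", "nested.txt")):
        with open(os.path.join(tmp, relative), "w") as handle:
            handle.write("x")

    visited = []
    for root, dirs, files in os.walk(tmp):
        # Show each folder relative to the starting point for readability.
        rel_root = os.path.relpath(root, tmp)
        visited.append((rel_root, sorted(dirs), sorted(files)))
        print(rel_root, dirs, files)
```

The starting folder is visited first, then the subfolder, which is why a return placed inside the outer loop would stop after the very first folder.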

To finish out our script, we will create a while loop. A while loop executes a set of statements as long as a condition is true. We are going to run ours until the User decides to exit. This is also where we give the User the option of entering the path of a different directory or running against the CWD, using if, elif, and else statements.

# Loop until User Chooses to Exit
while not exit:
    try:

        # User has option to input path or to exit the program
        path = input("Type in a path or press 'Enter' for CWD. (Type 'Exit' to exit the program): ")

        if path.lower() == "exit":
            exit = True

        # Default to CWD if no path entered
        elif path == "": 
            info = extract_info()
            print(*info, sep="\n")

        # Pass custom path to function
        else:
            info = extract_info(path) 
            print(*info, sep="\n")

Lastly, we want to generate an error response, so the User is notified when they enter an incorrect path. Catching Exception, rather than using a bare except, means Ctrl+C can still stop the program.

    # Unexpected Error Message
    except Exception:
        print(" Error: Issue Processing Path. Please try again with a correct path.")
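Because the loop waits on input(), it is a little awkward to test by hand every time. One possible approach, sketched below, is to pull the decision logic into a small function that can be checked without typing anything. The name interpret_choice and the (action, path) return shape are my own invention, and the strip() call is an extra that the script above does not have.

```python
def interpret_choice(raw_text):
    """Turn what the user typed into an (action, path) pair.

    Returns ('exit', None) to stop, or ('scan', path) to list files.
    """
    choice = raw_text.strip()  # assumption: ignore stray spaces around the input

    if choice.lower() == "exit":
        return ("exit", None)

    # An empty answer means "use the current working directory".
    if choice == "":
        return ("scan", ".")

    return ("scan", choice)


print(interpret_choice("Exit"))
print(interpret_choice(""))
print(interpret_choice("  /var/log "))
```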

The completed script should look something like this. Remember you can customize your Error Message.

#!/usr/bin/env python3.7

# Import the OS Module
import os

# Create empty list named "cwd_files"
cwd_files = []

# Define function to extract file info default cwd
def extract_info(path = '.'):
    cwd_files = []

    # Raise an error if the path is not an existing directory
    if not os.path.isdir(path):
        raise NotADirectoryError(path)

    # Recursively iterate over all files/directories under the path
    for root, dirs, files in os.walk(path):
        for file_name in files:
            file_dict = {}

            # Get file path and size
            file_dict['path'] = os.path.join(root, file_name)
            file_dict['size'] = os.path.getsize(os.path.join(root, file_name))

            # Append the file info dictionary to the list
            cwd_files.append(file_dict)

    # Return the list after every directory has been visited
    return cwd_files

exit = False

# Loop until User Chooses to Exit
while not exit:
    try:

        # User has option to input path or to exit the program
        path = input("Type in a path or press 'Enter' for CWD. (Type 'Exit' to exit the program): ")

        if path.lower() == "exit":
            exit = True

        # Default to CWD if no path entered
        elif path == "":
            info = extract_info()
            print(*info, sep="\n")

        # Pass custom path to function
        else:
            info = extract_info(path)
            print(*info, sep="\n")

    # Unexpected Error Message
    except Exception:
        print(" Error: Issue Processing Path. Please try again with a correct path.")

Let’s see if we were successful by executing the script in Terminal.

./WK13Python_Dict_Advanced.py

Yes!! If you look above in the image, you can see that I executed the script. First I entered a path. After that success, I was prompted again, because the script loops until we choose to exit. This time, I just hit Enter to test whether we would get the CWD, which we did! As you can see, my current working directory has way more files. Finally, to see whether our error message would show up, I typed an incorrect path.

It feels good to push yourself! I did a lot of reading and research to discover the code I entered above. There are multiple ways to get the same or very similar results. Research and networking will get you really far! I find the Cloud DevOps community to be very giving and welcoming to those who have a desire to learn and work towards growing their knowledge.
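As one illustration of "multiple ways", here is a sketch of the same idea written with the pathlib module instead of os.walk. This is an alternative I am offering for comparison, not the approach used in the script above; the function name extract_info_pathlib and the demo file names are made up.

```python
import tempfile
from pathlib import Path


def extract_info_pathlib(path='.'):
    """Return a list of {'path', 'size'} dictionaries for all files under path."""
    base = Path(path)
    if not base.is_dir():
        raise NotADirectoryError(path)

    # rglob('*') walks the folder and everything nested inside it.
    return [
        {'path': str(item), 'size': item.stat().st_size}
        for item in base.rglob('*')
        if item.is_file()
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree with placeholder names to try the function on.
    nested = Path(tmp) / "sub"
    nested.mkdir()
    (Path(tmp) / "top.txt").write_text("ab")
    (nested / "deep.txt").write_text("abcd")

    found = extract_info_pathlib(tmp)
    sizes_by_name = {Path(entry['path']).name: entry['size'] for entry in found}
    print(sizes_by_name)
```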

Finally, I will once again commit and push the advanced script we created to my GitHub, and then delete my branch after merging. Thank you so much for following along. I hope you learned a little and are ready to follow me down the next rabbit hole.

Special Notes //

Instead of running the script through Terminal, you can test your script at any point by using the Run function in AWS Cloud9 after you save your file. This can come in handy when you do not need to execute the script from the command line. Cloud9 gives the User control to decide.

Here is an example of running vs executing from the Terminal command line.

Select Run, and it will open a tab next to your Terminal tab. It will process the script, and you can enter a custom path or hit ‘Enter’ for CWD.

Look for the Green Circle with the Play Icon in it.

For more information check out:

What is AWS Cloud9? (docs.aws.amazon.com): AWS Cloud9 is an integrated development environment, or IDE. The AWS Cloud9 IDE offers a rich code-editing experience…

https://docs.python.org/3/library/os.html for all of the functions in Python’s os module.

If you are like me, and like to go down a rabbit hole or two, stackoverflow.com can be a great resource for learning and understanding the process and working to find the answer.

Join me on https://www.linkedin.com/in/melissafoster08/ or follow me at https://github.com/mel-foster