Identifying Repetitive Tasks — My Guide

7 min readJan 29, 2024

Software development often involves repetitive tasks that can be optimized or automated to improve efficiency and productivity. This post will delve deeper into the concept of DRY (Don’t Repeat Yourself), task complexity and frequency, data analysis tasks, and testing.

Embracing the DRY Principle

The DRY (Don’t Repeat Yourself) principle is a fundamental concept in software development. It emphasizes that every piece of knowledge must have a single, unambiguous representation within a system.

Implementation of the DRY concept in programming encourages the encapsulation of repetitive commands into a separate function, regardless of their length — many code lines, a few code lines, or even one code line. Whenever the command is required, the program refers to this specific function. Consequently, if modifications are necessary or a bug is detected, there’s just a single location to implement the changes.

For example: If you’re writing a program that calculates the area of different shapes, instead of writing the formula for each shape every time, you can define functions like calculate_circle_area(radius), calculate_rectangle_area(length, width), and so on.

import math

def calculate_circle_area(radius):
    return math.pi * radius ** 2

def calculate_rectangle_area(length, width):
    return length * width

# Use the functions
circle_area = calculate_circle_area(5)
circle_area = calculate_circle_area(15)
circle_area = calculate_circle_area(7)
rectangle_area = calculate_rectangle_area(4, 7)
rectangle_area = calculate_rectangle_area(5.5, 8)
rectangle_area = calculate_rectangle_area(2.4, 17)

Data Analysis

In the context of data analysis, repetitiveness could refer to several actions: the regular opening and reading of data tables, writing to these tables, or performing recurrent manipulations and analyses on the table data.

Opening and reading tables

When working with databases, it’s common to repeatedly open and read tables. In such scenarios, a developer might opt to create a universal function to handle all open-and-read operations.

import pandas as pd

def read_table(file_name):
  # Load the data
  data = pd.read_csv(file_name)
  
  # Clean the data
  data = data.dropna()

  return data

Please note: In certain programming languages, there are two separate commands, open and read, while others employ a single read command that effectively combines both functions behind the scenes.

Computations on Tables

Occasionally, we may need to perform basic computations on similar tables immediately after opening them, ensuring the data is primed for use. This capability can be integrated into the existing open-and-read function.

import pandas as pd

def read_table(file_name):
  # Load the data
  data = pd.read_csv(file_name)
  
  # Clean the data
  data = data.dropna()

  # Calculate total sales
  total_sales = data['SaleAmount'].sum()

  return data

Performing Recurrent Manipulations and Analyses

Data analysis tasks often involve repetitive processes such as data cleaning, transformation, visualization, and modeling. These tasks can be automated using various programming languages and tools.

To succeed with the analysis, we should choose the programming language that is most suitable for our task. I often use Python, and Python libraries like pandas and seaborn can be used for data manipulation and visualization, respectively.

For example: Let’s assume you have data from a selling company, and all the information about the sales from all employees is in a table. You might want to filter the sales data to include only those rows where the sales amount is greater than a certain value, and then calculate the mean of the filtered sales amounts. You can go over the table, extract the information and calculate, or simply use pandas in Python:

def calc {sales_threshold}:
    # Filter the data
    filtered_data = data[data['Sales'] > sales_threshold]

    # Calculate the mean of the sales amounts in the filtered data
    mean_sales = filtered_data['Sales'].mean()

    return mean_sales

threshold1 = 1000
mean_sales1 = calc(threshold1)
threshold2 = 2000
mean_sales2 = calc(threshold2)

print(f'The mean sales amount for sales over {threshold1} is {mean_sales1}')
print(f'The mean sales amount for sales over {threshold2} is {mean_sales2}')

Cron

The DRY concept is not only applicable to coding, but also to the execution of recurring commands. Take crontab in Unix, for example — it executes commands repeatedly based on your configurations.

Cron, the time-based job scheduler in Unix-like operating systems, is an excellent tool for automating repetitive tasks. Consider a scenario where a script updates a database backup every night.

For full reading about crontab click here.

In general, if you would like to edit your crontab list, enter the following command line in your Unix terminal:

crontab -e

This will open an editor, where you can write your commands (detailed below).

The syntax: minute(s) hour(s) day(s) month(s) weekday(s), respectively.

Legal values:

Minutes: Command will be executed at the specific minute.
Values: 0–59 or * (means all values).
Hours: Command will be executed at the specific hour
Values: 0–23 or * (means all values).
Days: Commands will be executed on these days of the month.
Values: 1–31 or * (means all values).
Months: The month in which tasks need to be executed.
Values: 1–12 or * (means all values).
Weekdays: Days of the week where commands would run. Here, 0 is Sunday.
Values: 0–6 (0 is Sunday) or * (means all values).

Examples:

# Cron job example #1:
0 15 * * 0 echo `date`

# Cron job example #2:
*/5 12 1-10 5 1,3 csh /bin/my_script.csh

In example #1: the current date is displayed on the screen every
Sunday (0 15 * * 0 ) at 15:00 (0 15 * * 0).

In example #2: the script my_script.csh is executed
from May 1st to May 10th (*/5 12 1–10 5 1,3),
only on Mondays and Wednesdays (*/5 12 1–10 5 1,3),
every 5 minutes (*/5 12 1–10 5 1,3)
from 12:00 to 12:59 (*/5 12 1–10 5 1,3).

Task Complexity and Frequency

Task complexity refers to the number of resources, such as time, cognitive effort, and skills, required to complete a task. Task frequency refers to how often a task needs to be performed.

High-frequency tasks are prime candidates for automation, as automating these tasks can save significant time and effort in the long run. The Cron example above is a simple example for taking into consideration the frequency of the task.

High complexity tasks, might also be good candidates for automation, even if they are not executed frequently. What happens when the task is very complex, but occurs once in a few months or maybe once a year? Should we choose the automation solution?

For example, imagine a situation where you need to construct a simulation environment every few months. This environment includes a multitude of interlinked files, which are provided as inputs to the project. In such a case, it could be beneficial to dedicate a few weeks to developing the environment. By doing so, you ensure that whenever a new project arises, the time required to set up the environment is significantly reduced.

Example: Automating the task of taking a list of attributes and creating a set, get and display for each variable:

def generate_class(class_name, variables, filename):
    with open(filename, 'w') as f:
        # Write the class definition
        f.write(f"class {class_name}:\n")
        f.write("    def __init__(self")
        for var in variables:
            f.write(f", {var}=None")
        f.write("):\n")
        for var in variables:
            f.write(f"        self.{var} = {var}\n")
        f.write("\n")

        # Write the set, get, and display methods for each variable
        for var in variables:
            f.write(f"    def set_{var}(self, value):\n")
            f.write(f"        self.{var} = value\n\n")
            f.write(f"    def get_{var}(self):\n")
            f.write(f"        return self.{var}\n\n")
            f.write(f"    def display_{var}(self):\n")
            f.write(f"        print(self.{var})\n\n")

        # Write the display_all method
        f.write("    def display_all(self):\n")
        for var in variables:
            f.write(f"        print(self.{var}, end=' ')\n")
        f.write("        print()\n")

Example usage:

# Example usage:
generate_class("MyClass", ["var1", "var2", "var3"], "MyClass.py")

The output of this code:

class MyClass:
    def __init__(self, var1=None, var2=None, var3=None):
        self.var1 = var1
        self.var2 = var2
        self.var3 = var3
    
    def set_var1(self, value):
        self.var1 = value
    
    def get_var1(self):
        return self.var1
    
    def display_var1(self):
        print(self.var1)
    
    def set_var2(self, value):
        self.var2 = value
    
    def get_var2(self):
        return self.var2
    
    def display_var2(self):
        print(self.var2)
    
    def set_var3(self, value):
        self.var3 = value
    
    def get_var3(self):
        return self.var3
    
    def display_var3(self):
        print(self.var3)
    
    def display_all(self):
        print(self.var1, end=' ')
        print(self.var2, end=' ')
        print(self.var3, end=' ')
        print()

Testing

Testing is another area where repetitive tasks are common. Automated testing tools can execute test cases, compare expected and actual results, and generate test reports, reducing the need for manual intervention.

Consider a simple unit test for a function add(a, b) that adds two numbers in Python using the unittest module:

Example:

import unittest

def add(a, b):
    return a + b

class TestAddFunction(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(5, 7), 12)

if __name__ == '__main__':
    unittest.main()

The function assertEqual automatically executes the given function (assertEqual(add(5, 7), 12)), compares its returned value to the other given parameter (assertEqual(add(5, 7), 12)), and notifies if they are not equal, which means an error occurred.

Now you can have even more complex functions, other than add, and you can use assertEqual to test it.

Additional Important Points

Prioritizing

Not all tasks are created equal. Certain tasks may require a significant effort, but yield a minimal impact, while others may necessitate a minimal effort, but demonstrate a high impact. Prioritizing tasks based on their effort-to-impact ratio can help you focus on tasks that provide the most value.

Documentation

Documentation plays a crucial role in software development. It serves as a guide for users and developers, facilitating understanding and usage of the software. Good documentation should be clear, concise, and up-to-date. Tools like Javadoc for Java and Sphinx for Python can help in generating and maintaining documentation.

For instance, in Python, you can use docstrings to document your code:

def add(a, b):
    """
    This function adds two numbers and returns the result.

    Parameters:
    a (int): The first number
    b (int): The second number

    Returns:
    int: The sum of a and b
    """
    return a + b

Code Review

Code review is a process where someone other than the author(s) of a piece of code examines that code. Code review helps to maintain a high standard of code quality and to share knowledge among team members. Tools like GitHub and Bitbucket provide platforms for collaborative code review.

In Conclusion

In conclusion, identifying and managing repetitive tasks is key to improving efficiency and productivity in software development. By adhering to the DRY principle, analyzing task complexity and frequency, automating data analysis and testing tasks, prioritizing tasks, maintaining good documentation, and conducting code reviews, we can significantly enhance our workflow.

Remember, the goal is not just to work hard, but to work smart! Let me know if you need more information on any other topics!