Published in Analytics Vidhya

Config Files and Logging

Most data scientists who code in Python are comfortable using Jupyter Notebook. One of the basic skills that many data scientists lack is writing production-ready code.

Because Jupyter Notebook provides a seamless environment for examining variables and displaying the output of print statements right below the code, data scientists often find it difficult to write a production script. A production-ready script should be generic, and it should write its debug and error messages to a file.

What is generic code? Code should have minimal hard-coding. The program should not need to change when an environment-specific setting changes, for example when the input file folder is moved or the DB schema (in a multi-tenant configuration) changes. Such settings should be read from a configuration file (config file, for short), which can be edited without changing the code and deploying it again.

Why is writing messages to a log file important? Messages printed to the console with print() are not available for debugging after the program or script has ended. Developers therefore prefer to log messages to a file so that they can be referred to if the program terminates abnormally.

Config Files — Making code generic

One way to keep hard-coding to a minimum is to use config files. Config files store configuration values related to, for example, database connections, the location of log files and other environment-specific settings. Python's configparser module is used to read the config file. A config file is divided into sections, each holding its own configuration values. An example config file is shown below; it is saved as config.ini on the server.

Config File
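The original config.ini appears only as a screenshot. Based on the values quoted later in this article (folder = C:\Logs\, file = application.log, level = DEBUG), its contents were likely similar to the text in the sketch below. The helper write_sample_config is hypothetical; it simply writes that text to disk so the later snippets have a file to read.

```python
from pathlib import Path

# Assumed contents of config.ini, reconstructed from the values quoted in this article.
# A section name goes in square brackets; each line below it is "key = value".
SAMPLE_CONFIG = (
    "[Logger]\n"
    "folder = C:\\Logs\\\n"
    "file = application.log\n"
    "level = DEBUG\n"
)


def write_sample_config(path="config.ini"):
    """Hypothetical helper: write the sample config to `path` and return its Path."""
    target = Path(path)
    target.write_text(SAMPLE_CONFIG, encoding="utf-8")
    return target
```

Because the values live in this file rather than in the code, moving the log folder only requires editing the folder line.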

[Logger] is the section name. This section has three settings: folder, file and level. We will look at these settings in the next section, which covers logging.

The first step in reading the config file is to import the configparser module.

Importing ConfigParser

Next, call the function that reads the configuration.

Before reading the config file

The only hard-coding needed is the folder that contains the config file, the name of the config file, and the section of the config file that holds the required settings. MASTER_FOLDER is the folder where the config file is stored. MASTER_FILE has the value config.ini, the name of the config file. LOGGER_SECTION has the value Logger, which is used to find the settings: it refers to the [Logger] section of config.ini. The read_config function reads the file and its settings.

Function to read config file

In this function, a parser object is created and used to read the config file. The settings in the section are read into the section_details dictionary, which the function returns. In main, logger_details stores all of these settings; it will be used in the next section.
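The main program and read_config are shown only as screenshots in the original. Below is a minimal sketch, assuming read_config takes the folder, file name and section as parameters (the real signature is not visible). It builds the path, reads the file with a ConfigParser object and copies the section into section_details. The FileNotFoundError check is an addition, not part of the original code.

```python
import configparser
import os

# Minimal hard-coding: location and name of the config file, and the section to read.
# MASTER_FOLDER is a placeholder; set it to the folder that holds config.ini on your server.
MASTER_FOLDER = "."
MASTER_FILE = "config.ini"
LOGGER_SECTION = "Logger"


def read_config(folder, file_name, section):
    """Read one section of a config file and return its settings as a dict."""
    config_path = os.path.join(folder, file_name)
    parser = configparser.ConfigParser()
    # ConfigParser.read() silently skips files it cannot open and returns the
    # list of files it actually read, so an empty list means a wrong path.
    if not parser.read(config_path):
        raise FileNotFoundError(f"Config file not found: {config_path}")
    section_details = {}
    # items(section) yields (key, value) pairs; it raises NoSectionError
    # if the section is missing from the file.
    for key, value in parser.items(section):
        section_details[key] = value
    return section_details


def main():
    # logger_details holds the folder, file and level settings from [Logger].
    logger_details = read_config(MASTER_FOLDER, MASTER_FILE, LOGGER_SECTION)
    return logger_details
```

Calling main(), for example from an `if __name__ == "__main__":` block, returns a dictionary with the keys folder, file and level. Note that ConfigParser lowercases option names by default, so keys should be looked up in lowercase.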

Logging — Must for Debugging

In the previous section we read the configuration file and stored the settings in a dictionary. In this section we will use this dictionary to log messages. Logging messages to a file is important because it lets you view them after the program has finished and been cleared from memory. There are five levels of log messages: Debug, Info, Warning, Error and Critical, listed in increasing order of severity.

Debug: Messages used to diagnose issues in the code. These are usually the print statements that are put in Jupyter Notebook.

Info: Messages that report the progress of the code.

Warning: Messages about conditions that could cause a problem later. For example, a warning can be logged when RAM usage reaches a threshold.

Error: Serious problems in the code, such as an exception like an out-of-memory error. Often the program terminates after logging an error message.

Critical: Critical issues that may cause the program to end abruptly or stop functioning.

These values of severity levels will be referred to later in the article.
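In Python's logging module each of these levels is a named integer constant, which is what makes "increasing order of severity" a simple numeric comparison. The short snippet below prints each level's name and value.

```python
import logging

# The five standard levels, from least to most severe.
LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL]

for level in LEVELS:
    # getLevelName maps the integer back to its name, e.g. 10 -> "DEBUG".
    print(logging.getLevelName(level), level)
```

This prints DEBUG 10, INFO 20, WARNING 30, ERROR 40 and CRITICAL 50, one per line. Each level also has a matching function, such as logging.debug() or logging.error().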

Python's standard logging package is used to write messages to a file. The first step is to import the package.

Importing Logging Package
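The logging package ships with Python, so no installation is needed. The sketch below illustrates the unconfigured default: the root logger's threshold is WARNING, so lower-level messages are dropped and the rest go to the console (stderr). The logger name comes from __name__ and is only for illustration.

```python
import logging

# Before any configuration, the root logger's threshold is WARNING.
print(logging.getLevelName(logging.getLogger().level))  # WARNING

# A module-level logger inherits that threshold from the root logger.
logger = logging.getLogger(__name__)
logger.info("Dropped: INFO is below the default WARNING threshold")
logger.warning("Printed to the console (stderr), not to any file")
```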

After importing the package, set the basic logging configuration to specify the file where the logs will be written. If this basic configuration is not set, messages go only to the console, and by default only messages at WARNING level or above are shown. We will call a function that sets the basic logging configuration from main, after reading the config.ini file. The code of main is shown below; you have already seen most of it in the previous section. The last line calls the function that sets the basic logging configuration.

Calling function to set logging configurations

The only purpose of set_logging_basics is to set the basic logging configuration. Because it changes the logging environment rather than computing a result, it does not return any value. The logger_details dictionary read earlier is passed to the function; it holds the configuration details discussed below.

Function to set Basic Configuration for Logging

In this function, everything we have discussed so far comes together. First we read the dictionary values and store them in separate variables. If you refer to the config.ini file shown at the top of the article, you will see that folder = C:\Logs\, file = application.log and level = DEBUG. These variables are then passed to the logging.basicConfig function to set the basic logging configuration.

In the call to logging.basicConfig, the first argument, filename, is the path of the file where messages will be written. The second argument, filemode, specifies the mode in which the file is opened. I have used filemode='w', which opens the file in write mode, so the messages are overwritten on every run of the program. The default mode, 'a', appends to the existing file instead.

format specifies how each message is written to the file. First comes the name of the logger used for the call, followed by the level of the message (INFO, DEBUG etc.) and then the message supplied by the developer. This will become clear when we log a message shortly.

The level argument specifies the lowest severity that will be logged. More about level later.

Once this basic configuration is set, messages will be written to the C:\Logs\application.log file. Messages are logged as shown below.

How to log messages using logging module

The application.log file then looks like this:

Log File

These logged messages help a developer debug the code when an error occurs and track the progress of the code.

Note about level in logging.basicConfig

The level argument specifies the lowest severity that will be logged. Recall that the levels, in increasing order of severity, are:

DEBUG < INFO < WARNING < ERROR < CRITICAL

If level is INFO, only messages with severity greater than or equal to INFO are logged, so DEBUG messages are not logged. Similarly, if level is ERROR, only ERROR and CRITICAL messages are logged and all other messages are dropped. In the config.ini file I have used level = DEBUG, which means that messages of every level are written to the log file.
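The threshold rule can be checked directly. The sketch below sets the level on a dedicated logger with logger.setLevel() rather than through basicConfig (the same threshold mechanism, applied to one logger instead of the root logger), sends one message at every level, and returns the level names that got through. The capture function and the threshold_demo logger names are hypothetical.

```python
import io
import logging


def capture(level):
    """Log one message per level with the given threshold; return the levels kept."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger = logging.getLogger(f"threshold_demo.{level}")  # hypothetical logger name
    logger.propagate = False  # keep the demo output away from the root logger
    logger.setLevel(level)
    logger.addHandler(handler)
    logger.debug("d")
    logger.info("i")
    logger.warning("w")
    logger.error("e")
    logger.critical("c")
    logger.removeHandler(handler)
    return stream.getvalue().split()


print(capture(logging.INFO))   # ['INFO', 'WARNING', 'ERROR', 'CRITICAL']
print(capture(logging.ERROR))  # ['ERROR', 'CRITICAL']
```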

Conclusion

Not only data scientists but every developer should know how to write generic code and how to debug it when issues arise. Config files reduce hard-coding, and loggers keep a record of messages that can be reviewed later. Every developer should know how to use both.
