Config Files and Logging
Most data scientists who code in Python are comfortable using Jupyter Notebook. One basic skill that many of them lack is writing production-ready code.
Because Jupyter Notebook offers a seamless environment for examining variables and displays print output right below the code, data scientists often find it difficult to write a production script. A production-ready script should be generic and should write its debug and error messages to a file.
What is generic code? Generic code has minimal hard-coding. The program should not need to change when an environment setting changes, for example when the input file folder moves or the DB schema changes (in a multi-tenant configuration). Such settings should be read from a configuration file (or config file, for short), which can be edited without changing and redeploying the code.
Why is writing messages to a log file important? Any message printed to the console with print() statements is not available for debugging after the program or script has ended. Developers therefore prefer to log messages to a file, which can be consulted if the program terminates abnormally.
Config Files — Making code generic
One of the ways to ensure that there is minimal hard-coding is to use config files. Config files are used to save the configuration values related to, say, database connection, location of log files and other environment variables.
configparser, a module in the Python standard library, is used to read the config file. A config file is divided into sections, each holding configuration values. In this article the file is saved as config.ini on the server.
[Logger] is the section name in this file. The section has 3 configurations, including level. We will discuss these configurations in the next section, which covers logging.
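To make the layout concrete, here is a minimal sketch of what such a file might contain. The key names log_folder and log_file are assumptions made for illustration (only the section name, the folder C:\Logs, the file name application.log and the DEBUG level come from this article). The text is embedded in a string so that configparser can parse it directly.

```python
import configparser

# Hypothetical contents of config.ini; the key names are assumptions.
SAMPLE_CONFIG = r"""
[Logger]
log_folder = C:\Logs
log_file = application.log
level = DEBUG
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)

print(parser.sections())       # names of the sections found in the text
print(dict(parser["Logger"]))  # key/value pairs of the [Logger] section
```

Note that configparser returns every value as a string, so a value such as DEBUG still has to be translated before it can be used as a logging level.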
The first step in reading the config file is to import configparser.
Next, call the function that reads the configurations.
The only hard-coding needed is the folder where the config file is stored, the name of the config file, and the section of the config file that holds the necessary configurations.
MASTER_FOLDER is the folder where the config file is stored.
MASTER_FILE has the value config.ini, which is the name of the config file.
LOGGER_SECTION has the value Logger, which is used to find the configurations. Logger refers to the [Logger] section in the config file.
The read_config function reads the file and the configurations.
In this function a parser object is created and used to read the config file. The details of the section are read into the section_details dictionary, which the function returns.
logger_details in main stores the details of all the configurations. logger_details will be used in the next section.
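The steps described above can be sketched as follows. This is one possible shape for read_config and main, not necessarily the original code: passing the folder as a parameter, the error checks, and the sample keys written in the demonstration block are all assumptions added so that the sketch runs on any machine.

```python
import configparser
import os
import tempfile

MASTER_FILE = "config.ini"  # name of the config file
LOGGER_SECTION = "Logger"   # section that holds the logging configuration


def read_config(folder, file_name, section):
    """Return the key/value pairs of one config section as a dictionary."""
    parser = configparser.ConfigParser()
    config_path = os.path.join(folder, file_name)
    # parser.read() silently skips missing files, so check what it loaded.
    if not parser.read(config_path):
        raise FileNotFoundError(f"Config file not found: {config_path}")
    if not parser.has_section(section):
        raise KeyError(f"Section [{section}] not found in {config_path}")
    section_details = dict(parser.items(section))
    return section_details


def main(master_folder):
    """Read the logger configuration stored in master_folder."""
    logger_details = read_config(master_folder, MASTER_FILE, LOGGER_SECTION)
    return logger_details


if __name__ == "__main__":
    # Demonstration only: a temporary folder stands in for MASTER_FOLDER.
    with tempfile.TemporaryDirectory() as master_folder:
        with open(os.path.join(master_folder, MASTER_FILE), "w") as f:
            # Hypothetical key names, chosen for illustration.
            f.write("[Logger]\nlog_file = application.log\nlevel = DEBUG\n")
        print(main(master_folder))
```

Checking the return value of parser.read() is a deliberate choice: without it, a wrong folder would only surface later as a confusing missing-section error.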
Logging — Must for Debugging
In the previous section we read the configuration file and stored the configurations in a dictionary. In this section we will use this dictionary to log messages. Logging messages to a file is important because it lets you view them after the program has finished executing and been cleared from memory. There are 5 types of log messages: Debug, Info, Warning, Error and Critical. These levels are in increasing order of severity.
Debug: Messages related to diagnosing issues in the code.
Info: Messages related to the progress of the code.
Warning: Messages about conditions that could cause an issue in the future. For example, a warning message can be logged when RAM usage reaches a threshold.
Error: Messages about serious problems in the code, such as an exception like an out-of-memory error. Usually the program terminates after logging error messages.
Critical: Messages about critical issues in the code due to which the program might end abruptly or stop functioning.
These values of severity levels will be referred to later in the article.
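In Python's logging module each of these levels is an integer constant, which is what makes the ordering by severity possible. A short sketch that prints the values:

```python
import logging

# Each level name maps to an integer; a higher number means higher severity.
LEVELS = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

for name in LEVELS:
    print(f"{name:<8} {getattr(logging, name)}")
# DEBUG    10
# INFO     20
# WARNING  30
# ERROR    40
# CRITICAL 50
```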
The logging module is used to log messages to a file. The first step is to import the package.
After importing the package, the basic configuration for logging should be set to specify the file where the logs will be created. If this basic configuration is not set, messages go to the console only (and by default only those at WARNING level or above). We will call a function from main to set the basic logging configuration after reading the config.ini file. Most of the code in main is the same as in the previous section; only the last line is new, and it calls the function that sets the basic configuration for logging.
set_logging_basics does not return a value; its only purpose is to set the basic configuration for logging. The previously read logger_details is passed to the function. logger_details is a dictionary holding the configuration details, which we discuss next.
In this function everything we have discussed so far comes together. First we read the dictionary values and store them in separate variables. If you refer to the config.ini file described at the top of the article, you will see that the level is set to DEBUG. All these variables are used to set the basic configuration of logging by calling logging.basicConfig.
The first argument is filename, which is the path of the file where messages will be written. The second argument, filemode, specifies the mode in which the file is opened. I have used filemode='w', which means that the file is opened in write mode and the messages are overwritten on every run of the program.
format specifies how each message is written to the file. First comes the name of the logger used to log the call, then the level of the message (DEBUG etc.), followed by the message specified by the developer. This will become clear when we log messages shortly.
The level argument specifies the lowest level of severity that will be logged. More about level follows later in this section.
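A minimal sketch of set_logging_basics along the lines described above. The dictionary keys (log_folder, log_file, level) and the exact separator in the format string are assumptions; the article only fixes the order of the fields (logger name, level, message). The force=True argument is also an addition, available from Python 3.8, so that calling the function a second time replaces the earlier configuration instead of being ignored.

```python
import logging
import os


def set_logging_basics(logger_details):
    """Configure the root logger from a dictionary of settings.

    The keys "log_folder", "log_file" and "level" are assumed names.
    """
    log_path = os.path.join(logger_details["log_folder"],
                            logger_details["log_file"])
    # Translate a name such as "DEBUG" into the constant logging.DEBUG.
    log_level = getattr(logging, logger_details["level"].upper())

    logging.basicConfig(
        filename=log_path,
        filemode="w",  # overwrite the log file on every run
        format="%(name)s - %(levelname)s - %(message)s",
        level=log_level,
        force=True,  # Python 3.8+: replace handlers from an earlier call
    )
```

Use filemode="a" instead if the history of earlier runs should be kept in the same file.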
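Once these basic configurations are set, messages are written to the C:\Logs\application.log file. Messages are logged by calling the logging function that matches their severity, and each line of application.log then holds the logger name, the level and the message, as set by format.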
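The sketch below logs one message at each level and then prints the resulting file. The logger name sales_report and the message texts are hypothetical, the format string is an assumed layout that follows the field order described earlier, and a temporary folder stands in for C:\Logs so that the example runs on any machine.

```python
import logging
import os
import tempfile

# A temporary folder stands in for the log folder from config.ini.
log_path = os.path.join(tempfile.mkdtemp(), "application.log")

logging.basicConfig(
    filename=log_path,
    filemode="w",
    format="%(name)s - %(levelname)s - %(message)s",  # assumed layout
    level=logging.DEBUG,
    force=True,  # Python 3.8+
)

logger = logging.getLogger("sales_report")  # hypothetical logger name
logger.debug("Input file has 3 columns")
logger.info("Input file loaded")
logger.warning("Column 'region' has missing values")
logger.error("Could not connect to the database")
logger.critical("Out of disk space, stopping")

logging.shutdown()  # flush and close the log file before reading it back
with open(log_path) as log_file:
    print(log_file.read())
# sales_report - DEBUG - Input file has 3 columns
# sales_report - INFO - Input file loaded
# sales_report - WARNING - Column 'region' has missing values
# sales_report - ERROR - Could not connect to the database
# sales_report - CRITICAL - Out of disk space, stopping
```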
These logged messages help a developer to debug the code in case of an error and also to track the progress of code.
The level argument specifies the lowest level of severity that is logged. Remember that the levels, in increasing order of severity, are:
DEBUG < INFO < WARNING < ERROR < CRITICAL
If the value of level is INFO, only messages with severity greater than or equal to INFO are logged. This means that messages at the DEBUG level are not logged. Another example: if the value of level is ERROR, only ERROR and CRITICAL messages are logged and all other messages are dropped. In the config.ini file I have used the level DEBUG, which implies that all types of messages are logged in the log file.
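The filtering can be checked with a small experiment. The helper logged_levels below is hypothetical and exists only for this demonstration: it configures logging with a given level, emits one message at every severity, and reports which ones got through. An in-memory stream replaces the log file so that nothing is written to disk.

```python
import io
import logging


def logged_levels(level_name):
    """Return the level names that get through when `level` is level_name."""
    stream = io.StringIO()  # in-memory stand-in for the log file
    logging.basicConfig(
        stream=stream,
        format="%(levelname)s",
        level=getattr(logging, level_name),
        force=True,  # Python 3.8+: reconfigure on every call
    )
    for emit in (logging.debug, logging.info, logging.warning,
                 logging.error, logging.critical):
        emit("test message")
    return stream.getvalue().split()


print(logged_levels("DEBUG"))  # ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
print(logged_levels("INFO"))   # ['INFO', 'WARNING', 'ERROR', 'CRITICAL']
print(logged_levels("ERROR"))  # ['ERROR', 'CRITICAL']
```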
Not only data scientists but every developer should know how to write generic code and how to debug it when an issue arises. Config files and loggers are the tools for reducing hard-coding and for logging messages, and every developer should know how to use them.