Parse string from logfile (automation)

CC(ChenChih)
Chen-Chih’s Portfolio Page
39 min read · Nov 19, 2022


Today I'm writing some notes on parsing specific strings or characters from a log file. It's an automation script that is pretty helpful and makes you more productive. Basically, the log file contains many values, including the Tput value and MCS. I want to parse all the values out, write them into another text file, and then export them to Excel. In the future, I'd like to plot them as a line chart.

Let me first describe what I am going to automate; if you're interested, keep reading. There are three different scenario cases I want to show you.

Scenario 1: get the timedate and Tput value

Scenario 2: get the timedate and other related values

Single UE: gets a single UE's Tput (DL or UL) and other specific values.

Multiple UE: gets the average Tput value and other specific values. With multiple UEs I get the average Tput, which is the ingress and egress traffic, and below that the averages of the other related values. Please refer to the picture below for a clearer view.

Scenario 3: get the timedate, ingress, and egress values

Objective: What will you learn after reading this post?

In this automation you will get to learn these skills:

Read and write text files

Regular expressions to parse the text file and write the results into a new file

Transfer the text file into Excel (openpyxl and pandas)

Graph (done 2023-06)
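One of the steps listed above, moving the text result into Excel, can be sketched without any extra packages by writing a .csv that Excel opens directly; the file names result.txt and result.csv here are made up for the demo:

```python
import csv

# Hypothetical demo input: a whitespace-separated result file like the
# ones this post produces (file names here are made up).
with open("result.txt", "w") as f:
    f.write("datetime Tput RbNum MCS Bler nonWdBler\n")
    f.write("20221018.165317.401606 0.000091 1.4 9.0 0.0 33.3\n")

# Rewrite it as .csv so Excel (or pandas.read_csv) can open it directly.
with open("result.txt") as src, open("result.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for line in src:
        if line.strip():                   # skip blank lines
            writer.writerow(line.split())  # split on any whitespace
```

From there, pandas.read_csv("result.csv").to_excel("result.xlsx") (with openpyxl installed) would produce a real .xlsx file.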

Note: I'm NOT going to explain regular expressions in depth, just the basics. Keep in mind that this is not a regular-expression tutorial.

I believe you can apply these skills to many projects in the future; they are used often in Python.

In this post I will cover these items:

Part 1 Introduction of project

Part 2 Understanding string parsing and regular expressions

Part 3 Start writing the automation

Part 1 Introduction of project
- Manual Test without automation
- Description of this project
- Log File Explain
Part 2 Understanding string parsing and regular expressions
- Scenario 1: get the timedate and Tput value from the logfile
- Method 1: Traditional way without regular expressions
- Method 2: Using regular expressions
- Example 1: re.search()
- Example 2: re.findall()
Scenario 2: get the timedate and more values
Scenario 3: get timedate, ingress, and egress values
- re.search()
- re.findall()
Part 3 Start writing the automation
- (Case1) Automation for Scenario 2: get timedate and other values
- (Case2) Automation for Scenario 3: get timedate and PDCP
- (Case3) Automation for Scenario 2: get timedate and other values with multiple UEs
- (Case4) Merging multiple log files into one file
- (Case5) Data visualization: draw the Tput into a line graph

Part 1 Introduction

Manual Test without automation

If I were to do this manually, what would I have to do?

First: get the log file from the system

Second: open it with Notepad++ and search/filter related keywords such as DL or UL

Third: copy the result and save it to a .csv file

Fourth: open the file and start splitting related keywords and deleting columns

Fifth: save it to Excel and draw a line chart.

That's it, pretty short, isn't it? But the fourth step takes a lot of time, especially when you have multiple log files and need to split and delete columns in Excel. My original plan was to draw a line chart, but I think I'll cover that in the future since I still need to study it.

There are many different ways to achieve this; you don't have to follow mine. When automating this I went through many struggles, so I surfed the net and found many resources, and I will show you several different ways to achieve it. During this process I learned many things I would like to share; it's pretty cool to learn a skill through various methods and share them with people. I think the skills covered here are pretty useful and I hope they help you.

Description of this project:

  1. The log file records many download and upload Tput values
  2. Read the file and search for the related keywords you want to find, like UL or DL
  3. Parse the keyword with a specific pattern and remove the unwanted parts
  4. Append to a list and write to txt or Excel
  5. Draw a graph (future work, not included in this post)

In this project, you will learn to read and write a txt file, parse specific strings, write to Excel, and append to Excel.
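The numbered steps above can be sketched end to end in a few lines; the log lines here are made up in the format this post parses, and parsed.txt is a hypothetical output name:

```python
import re

# hypothetical log lines in the format this post parses
log_lines = [
    "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4]",
    "[20221018.165318.401606][info]:[UL- UE[ 0]: Tput= 0.000085 Mbps, Mcs= 8.0(Sigma= 0.0), RbNum= 1.2]",
]

keyword = "DL"  # step 2: the keyword to filter on (UL or DL)
pattern = re.compile(r'\[(\d+\.\d+\.\d+)\].*?Tput=\s*([\d.]+)')

rows = []
for line in log_lines:          # step 2: search for the related keyword
    m = pattern.search(line)    # step 3: parse with a specific pattern
    if keyword in line and m:
        rows.append(f"{m.group(1)} {m.group(2)}")  # step 4: append to a list

with open("parsed.txt", "w") as f:  # step 4: write to txt
    f.write("\n".join(rows))
```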

Log File Explain:

In this project there are 2 types of log, one is Layer2, and another one is PDCP.

Layer2 log: has single-UE and multiple-UE results, which display differently, as below:

Single UE: just a single throughput, whereas multiple UE will have an average Tput.

The PDCP log records all traffic.

Part 2 Understanding string parsing and regular expressions

Before diving into the automation code, let me explain string parsing, since that's the most important part of this post. Once we understand how to parse the strings we want, it will be easier to understand how to automate it.

I will show various types of methods but keep in mind this is not a tutorial on regular expressions.

Scenario 1: get the timedate and Tput value from the logfile

In the picture below, when I open the elog file (Layer2 log), you will see that I only want the timedate and Tput value; I don't want the other items.

In this scenario or example, I’m going to show you how to get the datetime and Tput value.

If we have a string like below:

string="[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]"

we will get an output like this:

Note: don't worry about the exact datetime and Tput values in the picture.

Method1: Traditional way without regular expression

This way is pretty easy: just use the split and strip methods to cut the string at known positions. There are two problems with this code, which is why I don't recommend this method.

  • Problem 1: the code is pretty messy and long; we can also split it step by step, which I will show you below.
  • Problem 2: we're splitting according to the string's index. This leads to another problem: what if one day the string changes its position? Then this code breaks.

Parse Datetime and Tput Value:

  1. Parse date and time

You can use the split method to split the bracket

dateStr = string.split('[', 1)[1].split(']')[0]
print(dateStr) #20221018.165317.401606

split(): basically splits on a value; the 1 is the maximum number of splits, in other words, split once when the target is found.

After splitting on the opening and closing brackets you get just the timedate value: 20221018.165317.401606.

Let me explain in more detail. string.split('[', 1) splits on the first bracket, and index 1 is the part after it, which is what I want to keep. I also want to cut at the closing bracket, so I add another split: [1].split(']')[0], which removes all the string after the bracket; index 0 means the part before the closing bracket. This is how we get only the timedate string, with the brackets removed.

Please refer to the picture below for clarity.
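Without the picture, the same stages can be shown directly in code:

```python
# each stage of the one-liner, shown separately
string = "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps]"

step1 = string.split('[', 1)   # ['', '20221018.165317.401606][info]:...']
step2 = step1[1]               # everything after the first '['
step3 = step2.split(']')[0]    # cut at the first ']' -> the timedate only
print(step3)
#20221018.165317.401606
```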

Now let's move to the second part: getting just the Tput value. I will show the full code for parsing both the timedate and Tput value later; for now, let's focus on one item.

Parse Tput Value ONLY:

Let's assign a Tput string as below:

string="[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]"
Tput2 = string.split("Tput= ", 1)[1].split('Mbps')[0].strip()

strip(): after parsing the Tput you will see there's a space before the value, so use strip to remove the space.

First, we split on the Tput value: string.split("Tput= ", 1)[1]. This code splits at the first match and takes the index-1 part. Index [0] is everything before "Tput= ", which includes the timedate, but we want the Tput value, so we use the index [1] value.

Second, split on Mbps and take the part before it with index [0]. Please refer to the picture below where I have marked two red arrows for the split start and end. After all the splits we are left with the value 0.000091, but there's a space before it, so we can use strip to remove the space.

If you don't want your code to be one long line like above, you can split step by step; I'm lazy, so I like to do it all in one line. Below is the step-by-step version:

subString = string.split("Tput= ", 1)[1]
TPUT = subString.split(',')[0]
TPUT = TPUT.split(' ')[0]
print(TPUT)
#0.000091

Parse Datetime, Tput, and other related Value:

Above I showed how to parse only the timedate and Tput value; in this example, I am showing you how to parse more values from the string.

For the other parameters, please refer to the full code below if you're interested.

string="[20221018.165317.401606][info]:[DL- UE[ 0]: Tput=    0.000091 Mbps, Mcs=  9.0(Sigma= 0.0), RbNum=   1.4, ReTxRatio=  33.3, Layers= 1.0, PdschBler=   0.0, nonWPdschBler=  33.3]"
dateStr = string.split('[', 1)[1].split(']')[0]
print(f"datetime:{dateStr}")
Tput = string.split("Tput= ", 1)[1].split('Mbps')[0].strip()
print(f"Tput:{Tput}")
MCS = string.split("Mcs=", 1)[1].split('(')[0].strip()
print(f"MCS: {MCS}")
rb = string.split("RbNum=", 1)[1].split(',')[0].strip()
print(f"RbNum:{rb}")
ReTx = string.split("ReTxRatio=", 1)[1].split(',')[0].strip()
print(f"ReTxRatio:{ReTx}")
PdschBler = string.split("PdschBler=", 1)[1].split(',')[0].strip()
print(f"PdschBler:{PdschBler}")
nonWPdschBler = string.split("nonWPdschBler=", 1)[1].split(',')[0].strip(']').strip()
print(f"nonWPdschBler:{nonWPdschBler}")

#OUTPUT
#datetime:20221018.165317.401606
#Tput:0.000091
#MCS: 9.0
#RbNum:1.4
#ReTxRatio:33.3
#PdschBler:0.0
#nonWPdschBler:33.3

There's another way: list the target strings you want to remove as delimiters (delims) and strip them all out.

string="[20221018.165317.401606][info]:[DL- UE[ 0]: Tput=    0.000091 Mbps, Mcs=  9.0(Sigma= 0.0), RbNum=   1.4, ReTxRatio=  33.3, Layers= 1.0, PdschBler=   0.0, nonWPdschBler=  33.3]"
dateStr = string.split('[', 1)[1].split(']')[0]
# output: '20221018.165317.401606'

st3 = string.split("][info]:[DL- UE[ 0]: Tput= ", 1)[1].strip()
#'0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]'

delims = ["Tput=","Mbps,", "Mcs=", "(Sigma= 0.0),", "RbNum=", "ReTxRatio= ", "Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3", ",","]"]
for d in delims:
    st3 = st3.replace(d,'')
listtostr = " ".join(st3.split())

print(f"{dateStr} {listtostr}")
# 20221018.165317.401606 0.000091 9.0 1.4 33.3

So that's an example of using split and strip to parse specific strings, but as I mentioned before, if the string format changes, this code breaks and raises an error. It's not the best workaround; the best way, which I recommend, is a regular expression, the next method I will teach you.

Method2: using regular expression

Regular expressions are used in many different kinds of programs. They're based on matching patterns, which can be pretty complex, but they're quite useful. I'm not an expert, so if you're interested in going deep into regular expressions you can search the net or read the documentation.

In order to use regular expressions, you have to import re.

There are a couple of re methods we can use, but I will only show some of them.

match(): Find the pattern at the beginning of a string

search(): Return the first match of a pattern in a string

findall(): Return all matches of a pattern in a string

finditer(): Return all matches of a pattern as an iterator
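A quick sketch of how these four behave differently on a small sample string:

```python
import re

s = "Tput= 0.5 Mbps, Mcs= 9.0"

print(re.match(r'Tput', s))               # a match: 'Tput' is at the beginning
print(re.match(r'Mcs', s))                # None: 'Mcs' is not at the beginning
print(re.search(r'\d+\.\d+', s).group())  # '0.5' - the first match only
print(re.findall(r'\d+\.\d+', s))         # ['0.5', '9.0'] - all matches
for m in re.finditer(r'\d+\.\d+', s):     # match objects with positions
    print(m.group(), m.span())
```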

Example1: re.search() to get the datetime and Tput value

regular expression pattern:

re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)', string)

Looking at the picture below: we have a string and I want the number inside the bracket, so I use the \[ and \] escaped opening and closing brackets. I also add () to group the part in between; the brackets stay outside the group because we don't want them. Then \d+ means one or more digits, and \. is an escaped period, so the pattern matches the 0.0.0 format. The result will be in group 1.

The next group, group 2, gets the rest of the string, not including the closing bracket ]. We need to add .*?, where . means any character except a newline, * means zero or more times, and ? makes it non-greedy; then the group starts at Tput= and [^]]+ matches everything that is not a closing bracket, so it stops before the ] after 33.3.

Now we need the Tput value itself, so the first thing we do is split group 2 on spaces with split(' ') and take the index we want.

newstr=m1.group(2).split(' ')
print(m1.group(1),'\nTPUT:', newstr[1])

Please refer to full code as below:

import re
string = "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]"
# Method 1: using search method
m1 = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)', string)
#group(1) => 20221018.165317.401606
#group(2) => Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ...
print(m1.group(2))
newstr=m1.group(2).split(' ')
print(m1.group(1),'\nTPUT:', newstr[1], '\nMcs',newstr[4].split('(')[0],'\nRb',newstr[7].split(',')[0])

##output will be like this
#20221018.165317.401606
#TPUT: 0.000091
#Mcs 9.0
#Rb 1.4

or you can also use this pattern

re.search(r"^\[([\d\.]+).+Tput= ([\d\.]+).+Mcs= ([\d\.]+).+RbNum= ([\d.]+).+ReTxRatio= ([\d\.]+).+.", string)

m4=re.search(r"^\[([\d\.]+).+Tput= ([\d\.]+).+Mcs= ([\d\.]+).+RbNum= ([\d.]+).+ReTxRatio= ([\d\.]+).+.", string)
print(m4.group())
#m4.group(1) =>20221018.165317.401606
#m4.group(2) => 0.000091
#m4.group(3) =>9.0
#m4.group(4) =>1.4

However, this is still not the best way, since getting values by index has the same problem mentioned earlier. I will show you a better way to accomplish it; for now, I just want to show you the different parsing methods.

Example2: re.findall() to get all values, not including the datetime

regular expression

re.findall(r"[0-9]*\.[0-9]+", string)

This pattern can only parse numbers of the form number.number. There's one problem with the datetime: the datetime uses three numbers separated by two periods, like number.number.number, so in this case we can't parse the datetime in one piece.

You can still try to parse the datetime, which comes out like below: the date, hours, and minutes end up in one match, and the seconds split into a second match; please refer below.

import re
string = "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]"
m2=re.findall(r"[0-9]*\.[0-9]+", string)

['20221018.165317', '.401606']

So in this case I won't parse the datetime; instead I will parse the other values like Tput, Mcs, and so on. I'll slice from index 2 to -1 (excluding the last list element), like below.

['0.000091', '9.0', '0.0', '1.4', '33.3', '1.0', '0.0']

import re
string = "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]"
m2=re.findall(r"[0-9]*\.[0-9]+", string)
#slice off the datetime pieces, then convert the m2 list to a string
m2=" ".join(m2[2:-1])
print(m2)
#0.000091 9.0 0.0 1.4 33.3 1.0 0.0
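If you do want the full timestamp in the same findall pass, one sketch is an alternation that tries the three-part datetime form before the plain x.y form:

```python
import re

string = "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4]"

# the three-part alternative is tried first, so the datetime stays in one piece
values = re.findall(r"\d+\.\d+\.\d+|\d+\.\d+", string)
print(values)
#['20221018.165317.401606', '0.000091', '9.0', '0.0', '1.4']
```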

Scenario 2: get the timedate and more values

In Scenario 2 I would like to show how to grab the other related values, such as Tput, MCS, RbNum, etc. I think grabbing values by their keyword is a better way than using an index. This example is similar to Example 2 above, but done a better way. I will be using a new method, re.sub(), to replace specific characters.

Example 3: using re.sub and re.search to get the timedate

re.sub() replaces or substitutes specific characters, which you should also know. Let me show what each command does before showing you the code.

  • remove both () []: re.sub(r"[\(\[].*?[\)\]]", "", string)
  • remove only (): re.sub(r"[\(].*?[\)]", "", string)
  • remove ,: use replace()
  • split(): converts the string into a list; a plain split only handles one separator
  • re.split(): can split on multiple separators, e.g. re.split(': |, ', string)
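The first two commands can be sketched on a small made-up string:

```python
import re

s = "Tput= 0.1 (Sigma= 0.0), [info], done"

both_removed = re.sub(r"[\(\[].*?[\)\]]", "", s)  # removes (...) and [...]
paren_only   = re.sub(r"[\(].*?[\)]", "", s)      # removes only (...)
print(both_removed)  # Tput= 0.1 , , done
print(paren_only)    # Tput= 0.1 , [info], done
```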

In this example, I will use the same pattern as Example 1, but I'll show how to remove the unwanted pieces using the re.sub() method.

Step 1: declare the string and search with the pattern below

import re
string = "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]"
# need to remove ()
m3 = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)', string)

The grouping result is the same as the one I mentioned in Example 1.

Step 2: remove the commas and () using re.sub and the replace() method.

Now the timedate in group 1 is correct, so let's move to group 2 and cut off the related values which we don't want.

m3New= re.sub(r"[\(\[].*?[\)\]]", "", m3.group(2)).replace(',','').strip()

From the picture above: we add a pattern to substitute out () and replace the commas with nothing. We end up with a clean string with no special characters.

Step 3: split the string into a list: m3New = m3New.split()

output:

['Tput=', '0.000091', 'Mbps', 'Mcs=', '9.0', 'RbNum=', '1.4', 'ReTxRatio=', '33.3', 'Layers=', '1.0', 'PdschBler=', '0.0', 'nonWPdschBler=', '33.3']

Step 4: declare a function to get the element after the target string, which is the value

def getelement(li, element):
    ind = li.index(element)
    return li[ind+1]

Step 5: now we can use the function to get the values we want

print(getelement(m3New, 'Tput='), ' ', getelement(m3New, 'RbNum='), '', getelement(m3New, 'Mcs='))

output:

0.000091 1.4 9.0

So this method is the best way: if I want the Tput value, I just call getelement(), and it finds the element and returns the item one position after it, which is the value.

Below is the full code for this example.

  • Full code:
#remove () [] and use replace to remove ,
m3New= re.sub(r"[\(\[].*?[\)\]]", "", m3.group(2)).replace(',','').strip()
m3New=m3New.split()
#['Tput=', '0.000091', 'Mbps', 'Mcs=', '9.0', 'RbNum=', '1.4', 'ReTxRatio=', '33.3', 'Layers=', '1.0', 'PdschBler=', '0.0', 'nonWPdschBler=', '33.3']
def getelement(li, element):
    ind = li.index(element)
    return li[ind+1]
# get next element
print(getelement(m3New, 'Tput='),' ', getelement(m3New, 'RbNum='), '', getelement(m3New, 'Mcs='))
# 0.000091 1.4 9.0
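As an alternative sketch to getelement(): since the cleaned list alternates 'Name=' tokens with their values, you can also build a dict once and look values up by name (the variable names here mirror the example above):

```python
import re

string = "[20221018.165317.401606][info]:[DL- UE[ 0]: Tput= 0.000091 Mbps, Mcs= 9.0(Sigma= 0.0), RbNum= 1.4, ReTxRatio= 33.3, Layers= 1.0, PdschBler= 0.0, nonWPdschBler= 33.3]"
m3 = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)', string)
tokens = re.sub(r"[\(\[].*?[\)\]]", "", m3.group(2)).replace(',', '').split()

# pair every 'Name=' token with the token that follows it
values = {tokens[i].rstrip('='): tokens[i+1]
          for i in range(len(tokens)) if tokens[i].endswith('=')}
print(values['Tput'], values['RbNum'], values['Mcs'])
#0.000091 1.4 9.0
```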

reference: https://stackoverflow.com/questions/14596884/remove-text-between-and

[Update] 2023 April: multiple-UE parsing (newly implemented feature)

This parses different values from the single-UE case above. A single UE has only one UE, so we can easily parse the ID, but multiple UEs have many UE IDs, so in this case I will parse the average.

Read the file and check the values:

  • Get Tput Value

if "m>>> DL - " in line:
if "mUL <<<-" in line:

  • Get all other value

#UL
re.search(r'\[(\d+\.\d+\.\d+)\].*?(UL <<<- Mcs=[^]]+)', string)
#DL
re.search(r'\[(\d+\.\d+\.\d+)\].*?(>>> DL- Mcs=[^]]+)', string)

  • Code example as below:

import re
elogfileName="elog.txt"

#UL
with open(elogfileName, 'r') as filedata:
    #get the Tput-value-related string
    for line in filedata:
        print(line)
        # get the other-value string
        if "mUL <<<-" in line:
            for nextline in filedata:
                print(re.search(r'\[(\d+\.\d+\.\d+)\].*?(UL <<<- Mcs=[^]]+)', nextline))
  • output:

[20230311.012825.882348][info]:[UL <<<- ingress traffic: 77.519501(Mbps) PDU_Count[8000], egress traffic: 74.335068(Mbps) PDU_Count[7647], ReRx: 2.915803(Mbps)]
<re.Match object; span=(0, 225), match='[20230311.012825.882486][info]:[UL <<<- Mcs= 19.2>

I will explain this more clearly in the automation part below; for here, you just need to know the regular-expression pattern.

Parsing the value

Tput value: please go to Scenario 3 or the Automation section for a clearer example

Other values (MCS, RB): the same as the single-UE case, so I will not say much more about it.

Scenario 3: get timedate, ingress, and egress values

This is a special case: you have a string like ingress traffic: 0.000545(Mbps), egress traffic: 0.001979(Mbps) and you want to grab the values.

This differs from the two scenarios above; you could still use the index-splitting method, and there are many other ways to achieve it, but this is the way I used.

string2="[20221110.112127.316856][info]:[PDCP DL- ingress traffic: 0.000000(Mbps), egress traffic: 0.000000(Mbps)]"

  • re.search() method:

Since I introduced getelement() in Example 3, I need to split the result into a list, so I use this approach: find the target string, and the element next to it is the value.

string2="[20221110.112127.316856][info]:[PDCP DL- ingress traffic: 0.000000(Mbps), egress traffic: 0.000000(Mbps)]"
searchtest=re.search(r'(ingress [^(]+).+(egress [^(]+)',string2)
test= searchtest.group(1)+", "+ searchtest.group(2)
#ingress traffic: 0.000000, egress traffic: 0.000000
test1=test.replace(", ", ":").strip().split(':')
#['ingress traffic', ' 0.000000', 'egress traffic', ' 0.000000']

We can also use re.split(), which splits on multiple separators:

string2="[20221110.112127.316856][info]:[PDCP DL- ingress traffic: 0.000000(Mbps), egress traffic: 0.000000(Mbps)]"
searchtest=re.search(r'(ingress [^(]+).+(egress [^(]+)',string2)
test= searchtest.group(1)+", "+ searchtest.group(2)
#ingress traffic: 0.000000, egress traffic: 0.000000
newtest= re.split(': |, ', test)
#['ingress traffic', '0.000000', 'egress traffic', '0.000000']
  • re.findall() method:

When we use findall with multiple groups, it returns a list, but each element is actually a tuple, so we need to flatten it into a plain list. You can use the itertools module: list(itertools.chain(*findtest))

findtest = re.findall(r'^\[([\d\.]+).+ingress traffic: ([\d\.]+).+egress traffic: ([\d\.]+).+.', string2)
#[('20221110.112127.316856', '0.000000', '0.000000')]
#convert to a flat list
import itertools
out = list(itertools.chain(*findtest))
print(out)
#['20221110.112127.316856', '0.000000', '0.000000']

You can also use the pattern style I mentioned before, assigning each group to a new variable: one stores the ingress traffic and the other the egress traffic.

searchresult=re.search(r'ingress traffic: ([\d\.]+).+ egress traffic: ([\d\.]+)',string2)
ingress= searchresult.group(1)
egress = searchresult.group(2)

or use \s, which matches whitespace:

ingress =re.search(r'(ingress traffic):\s+(\d+\.\d+)',string2)
#ingress traffic: 0.000000
egress =re.search(r'(egress traffic):\s+(\d+\.\d+)',string2)
#egress traffic: 0.000000
print(ingress.group(2), " ", egress.group(2))

By now, you should understand how to use regular expressions to catch the string you want. There are many other ways, depending on how you want to accomplish it.

As you can see, regular expressions are pretty powerful and flexible for matching the patterns you want. I have shown many different ways to catch a number or string; there are still other methods you can use, and you can look at the regular-expression documentation if you're interested.

Part 3 Automation Script

Now I'm going to dive into the step-by-step procedure for reading a log file and parsing the data. My log file is a network log, but if you don't understand the log, it doesn't matter. The purpose is to learn how to read a file, parse the data you want, and save text files. Once that's settled, we can convert the text file to Excel, and in the future you can use it to draw a line chart or other data visualization.

I will cover three scenarios in this automation script using different functions.

Below is the roadmap of the process.

Prerequisite: please put your log file in the working directory. I'm not going to check whether your file or the current path exists, so please keep this in mind.

[Update] 2023 April: I have changed many parts to improve my code and have already updated my GitHub page; if you're interested, please refer to it.

(Case1) Automation for Scenario 2: get timedate and other values

Step 1: import related modules

When you're testing the code, you can hard-code elogfileName, which stores your logfile name; givenString is the target string you want to search for. Looking at the picture below, we can search for A or B, depending on which value you want: A is Scenario 1, B is Scenario 2; the difference is that A only shows the Tput value, but B has UL or DL.


In this example, I will use the Scenario 2 method from Part 2, like below.

import os, re
from datetime import datetime
filename=f"result-{datetime.now():%Y-%m-%d %H-%M-%S}.txt"
result = []

elogfileName= input("Please enter your elog FileName: ")
while True:
    startscript= input("####press any key, q to exit script#####: ")
    if startscript =="q":
        break
    else:
        main()

Update 2022/12/21, new version: I decided to add a while loop for convenience, so users don't have to keep typing the filename. Pressing any key goes to the main function; q exits the script.

Step 2-1: create the result file and write the column names

This step is optional, just to check and create the result file.

  • checkfile()

This function basically checks whether the result text file already exists; if yes, it deletes it. This file will store your final parsed data; it only exists after you have run the script, so you will not have it at the beginning.

  • writefile():

This function writes the column title row for the final result file.

One thing to mention here: the filename is named after the datetime, ex: YYMMDDHHMMSS. In step 1 I declared the filename: filename=f"result-{datetime.now():%Y-%m-%d %H-%M-%S}.txt"

def checkfile():
    if os.path.exists(filename):
        print("file exist, delete file")
        os.remove(filename)

def writefile():
    checkfile()
    with open(filename, 'a') as f:
        f.write(("datetime \t Tput"+ " "*3+ "RbNum " + "MCS "+"Bler " +"nonWdBler\n").expandtabs(22))

Step 2-2: read the log file; main() function code

I assign elogfileName, which is my elog file name; instead of typing the full filename everywhere, using a variable to store the filename is easier to manage.

If givenString is found while reading the whole file, then it goes to parse(line), the function that starts parsing and splitting the related values.

# if the target UL/DL string is found, go to parse() to append to the result list
def ULDLprint(target):
    with open(elogfileName, 'r') as filedata:
        for line in filedata:
            if target in line:
                # parse the line if the given string is found in the current line
                parse(line, target)

def main():
    givenString = input("Please enter your search (Ex: DL- UE / UL- UE / UL- UE[ 0] / both:):")
    writefile()
    if givenString =="both":
        UL = 'UL- UE'
        DL = 'DL- UE'
        emptywrite("UL")
        #print(f"="*25+"UL"+"="*25)
        ULDLprint(UL)
        #split line ==
        emptywrite("DL")
        #print(f"="*25+"DL"+"="*25)
        ULDLprint(DL)
    else:
        emptywrite(givenString)
        with open(elogfileName, 'r') as filedata:
            for line in filedata:
                if givenString in line:
                    # parse the line if the given string is found in the current line
                    #print(line.strip())
                    parse(line, givenString)
                    #print ("="*30)

Update 2022/12/21: added a new structure with a main function and a ULDLprint function. In the main function you key in the option you want to filter: Downlink (DL- UE), Uplink (UL- UE), or both for Uplink and Downlink. ULDLprint appends to the list and parses the data.

Step 3: parsing data

This function is pretty long and covers step 3 to step 5, so please note that these steps all use the same function, def parse(data, ULDLstr):

I will be using this pattern: re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)'); if you're not clear on it, go to Part 2, where I give many examples. I'm not going to repeat the details here. Basically, I want to get the timedate and also the values.

def parse(data, ULDLstr):
    datestr = data.split('[', 1)[1].split(']')[0]
    search = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)', data)
    #remove () and the comma after each value
    m3New= re.sub(r"[\(\[].*?[\)\]]", "",search.group(2)).replace(',','').strip().split()
    #clear the list; this will speed up the code
    result.clear()
    givenString=ULDLstr

There's one thing I want to mention here about the Bler options in the elog file:

  • UL use option : PuschBler, and nonWPuschBler
  • DL use option: PdschBler, and nonWPdschBler

So I use an if/else statement: if givenString is DL or UL, assign the parameters accordingly.

    bler1=""
    bler2=""

    if givenString in 'DL- UE' or 'DL- UE' in givenString:
        bler1="PdschBler="
        bler2="nonWPdschBler="
    elif givenString in 'UL- UE' or 'UL- UE' in givenString:
        bler1="PuschBler="
        bler2="nonWPuschBler="
    else:
        print("givenString Not found string")

The reason for givenString in 'DL- UE' or 'DL- UE' in givenString is that we allow the user to enter either DL- UE or DL- UE[ 0], so I have to check both directions.

[Update 2023 April] The ID had a problem, which I fixed; the behavior of the UE id is as follows:

UE[ 0] => a single-digit ID has a space before it, and the ID will increment

UE[10] => when it increments to 2 digits, there is no space

so I changed the regular expression to:

accepted_strings = re.compile(r"([DU]L\-\ UE(\[\s*(\d{1,2})\])?)|both$")
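A quick sketch of what this pattern accepts and rejects (the sample inputs are made up):

```python
import re

accepted_strings = re.compile(r"([DU]L\-\ UE(\[\s*(\d{1,2})\])?)|both$")

for s in ["DL- UE", "UL- UE[ 0]", "UL- UE[10]", "both", "XL- UE"]:
    print(s, "->", bool(accepted_strings.match(s)))
#DL- UE -> True
#UL- UE[ 0] -> True
#UL- UE[10] -> True
#both -> True
#XL- UE -> False
```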

Step 4: save the results to the result list

Save the values into the list; here, our list is named result. Below you can see that instead of print I use getelement(m3New, 'Tput=') and save the value into the list.

    result.append(datestr)
    #print(getelement(m3New, 'Tput='),' ', getelement(m3New, 'RbNum='), '', getelement(m3New, 'Mcs='))
    result.append(getelement(m3New, 'Tput='))
    result.append(getelement(m3New, 'RbNum='))
    result.append(getelement(m3New, 'Mcs='))
    result.append(getelement(m3New, bler1))
    result.append(getelement(m3New, bler2))

Please make sure you declare the getelement() function, or else you can't run it.

def getelement(li, element):
    ind = li.index(element)
    return li[ind+1]

Step 5: print it or write it into a text file

Since we have saved all of our results into the result list, it's time to write them into a file or print them out. I have made several different writing and printing methods; you can decide which to use. I have tested that all of them work properly.

Please remove the # from the one you want to use.

    #print(result)
    listprint()  #write into a text file
    listprint2() #only print
    #listprint_Method2() #write file or print using a list comprehension
    #listprint_Method3() #write file or print using enumerate
    #listprint_Method4() #read or write using zip

Full parse() function code:

def parse(data, ULDLstr):
    #get the time
    datestr = data.split('[', 1)[1].split(']')[0]
    #split from tput
    #[20221018.170247.411259][info]:[DL- UE[17]: Tput=
    #search = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Mcs=+)*?(Tput=[^]]+)', data)
    search = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)', data)
    #remove () and [] groups, and the comma after each value
    m3New = re.sub(r"[\(\[].*?[\)\]]", "", search.group(2)).replace(',', '').strip().split()

    result.clear()
    givenString = ULDLstr
    bler1 = ""
    bler2 = ""
    if givenString in 'DL- UE' or 'DL- UE' in givenString:
        bler1 = "PdschBler="
        bler2 = "nonWPdschBler="
    elif givenString in 'UL- UE' or 'UL- UE' in givenString:
        bler1 = "PuschBler="
        bler2 = "nonWPuschBler="
    else:
        print("givenString Not found string")

    result.append(datestr)
    #print(getelement(m3New, 'Tput='), ' ', getelement(m3New, 'RbNum='), '', getelement(m3New, 'Mcs='))

    result.append(getelement(m3New, 'Tput='))
    result.append(getelement(m3New, 'RbNum='))
    result.append(getelement(m3New, 'Mcs='))
    result.append(getelement(m3New, bler1))
    result.append(getelement(m3New, bler2))

    #DL
    #result.append(getelement(m3New, 'PdschBler='))
    #result.append(getelement(m3New, 'nonWPdschBler='))

    #UL
    #result.append(getelement(m3New, 'PuschBler='))
    #result.append(getelement(m3New, 'nonWPuschBler='))

    #print(result)
    listprint() #write file =>ok
    #listprint2() #print =>ok
    #listprint_Method2() # write file =>ok
    #listprint_Method3() #write file =>ok
    #listprint_Method4()
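To see what the regular expressions inside parse() actually do, here is the same pipeline run on a single hypothetical log line. The line below mimics the format shown earlier; the values are made up for illustration:

```python
import re

line = "[20221018.170247.411259][info]:[DL- UE[17]: Tput= 95.91(Mbps), RbNum= 270, Mcs= 27, PdschBler= 0.01, nonWPdschBler= 0.00]"

# 1. timestamp between the first pair of square brackets
datestr = line.split('[', 1)[1].split(']')[0]

# 2. grab everything from "Tput=" up to the closing bracket
search = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Tput=[^]]+)', line)

# 3. drop (...) / [...] groups and commas, then tokenize
tokens = re.sub(r"[\(\[].*?[\)\]]", "", search.group(2)).replace(',', '').strip().split()

print(datestr)  # 20221018.170247.411259
print(tokens)   # ['Tput=', '95.91', 'RbNum=', '270', 'Mcs=', '27', ...]
```

After this cleanup, every value sits right after its key token, which is exactly why the simple getelement() lookup works.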
  • listprint() and listprint2(): the most basic methods; try these first.

You can decide to print, to write to a file, or both. One thing worth mentioning is the cycle % 6 == 0 check in listprint2(): it prints a newline after every six values; without it, everything would run together on one line.

#write to file
def listprint():
    #checkfile()
    cycle = 0
    #with open("result.txt", "a+") as f:
    with open(filename, "a") as f:
        cycle += 1
        for element in result:
            #print(element + " ")
            f.write(element + " ")
        f.write("\n")

#print
def listprint2():
    cycle = 0
    for element in result:
        cycle += 1
        #print(element, end="")
        print(element, end=" ")
        if cycle % 6 == 0:
            print("")
  • listprint_Method2

In this method I use a list comprehension, which makes the code pretty short:

def listprint_Method2():
    #####method1
    for i in [result[c:c+6] for c in range(0, len(result)) if c % 6 == 0]:
        print(*i)

If you want to see the equivalent written as a normal for loop, refer to this code:

temp = []
for c in range(0, len(result)):
    if c % 6 == 0:
        temp.append(result[c:c+6])
#for i in temp:
#    print(*i)
#temp is a list of 6-element chunks
with open(filename, "a+") as f:
    for file in temp:
        #file = file.strip()
        my_str = ' '.join(file)
        #print(type(my_str))
        f.write(my_str + "\n")
  • listprint_Method3

To print the result, use this:

for index, c in enumerate(result):
    if index % 6 == 0:
        print(*result[index:index + 6])

To write the result to the file, use this:

with open(filename, 'a') as output:
    for index, c in enumerate(result):
        if index % 6 == 0:
            print(*result[index:index + 6], file=output)
  • listprint_Method4
def listprint_Method4():
    results = iter(result)
    with open(filename, 'a') as output:
        for i in zip(results, results):
            print(*i, file=output)

If you just want to print instead of writing to a file, remove the with open() and the file=output argument; see listprint_Method3 for reference.
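The zip(results, results) line in listprint_Method4 relies on a common Python idiom: zipping the same iterator with itself consumes it two elements at a time. A minimal illustration with made-up values:

```python
result = ['t1', '10.5', 't2', '11.2']
it = iter(result)
pairs = list(zip(it, it))   # both arguments advance the same iterator
print(pairs)  # [('t1', '10.5'), ('t2', '11.2')]

# zip(it, it, it) with three copies would group three at a time;
# listprint_Method4 uses two copies, so it writes two values per line.
```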

After this, it will write the result into a file like the one below.

Step 6: convert to Excel (optional)

There are many Python Excel libraries and packages you can use, but I chose two: openpyxl and pandas.

[2023 APRIL] I have split the Excel conversion into two parts: if you parse either UL or DL alone, use the openpyxl method below; if you parse both UL and DL, use the pandas method, which writes UL and DL to separate sheets.

  • openpyxl method

We will let the user enter the result file name generated in step 5. Before running it, please make sure the openpyxl library is installed; you can use pip install openpyxl. We only have one sheet, so we use sheet[0]. The code reads your result file, splits each line on whitespace (list123 = line.split()), and appends the fields to sheet[0].

It checks the result file's Tput header string; it accepts Tput, DL-Tput, or UL-Tput:

if list123[1] == 'Tput':
    sheet[0].append(list123) # write into excel
elif list123[1] == 'DL-Tput':
    sheet[0].append(list123) # write into excel
elif list123[1] == 'UL-Tput':
    sheet[0].append(list123) # write into excel

Below I also adjust the column width

while column <= 6:
    i = get_column_letter(column)
    #print(i)
    sheet[0].column_dimensions[i].width = 25
    column += 1

Add a font to the header of the column

sheet[0]['A1'].font = Font(size = 14, bold = True)
sheet[0]['B1'].font = Font(size = 14, bold = True)
sheet[0]['C1'].font = Font(size = 14, bold = True)
sheet[0]['D1'].font = Font(size = 14, bold = True)
sheet[0]['E1'].font = Font(size = 14, bold = True)
sheet[0]['F1'].font = Font(size = 14, bold = True)

Full code:

import openpyxl, string
from openpyxl.utils import get_column_letter
from openpyxl.styles import Font

def excelconvertResult(result):
    f = open(result, 'r+') # open text
    excel = openpyxl.Workbook()
    sheet = excel.worksheets
    line = f.readline() # read text

    while line:
        list123 = line.split() # convert
        if "=" in line:
            pass
        else:
            if list123[1] == 'Tput':
                sheet[0].append(list123) # write into excel
            elif list123[1] == 'DL-Tput':
                sheet[0].append(list123) # write into excel
            elif list123[1] == 'UL-Tput':
                sheet[0].append(list123) # write into excel
            else:
                list123[1] = float(list123[1])
                list123[2] = float(list123[2])
                list123[3] = float(list123[3])
                list123[4] = float(list123[4])
                list123[5] = float(list123[5])
                sheet[0].append(list123) # write into excel

        #excel cell's font
        sheet[0]['A1'].font = Font(size = 14, bold = True)
        sheet[0]['B1'].font = Font(size = 14, bold = True)
        sheet[0]['C1'].font = Font(size = 14, bold = True)
        sheet[0]['D1'].font = Font(size = 14, bold = True)
        sheet[0]['E1'].font = Font(size = 14, bold = True)
        sheet[0]['F1'].font = Font(size = 14, bold = True)

        #adjust the column width
        column = 1
        while column <= 6:
            i = get_column_letter(column)
            #print(i)
            sheet[0].column_dimensions[i].width = 25
            column += 1
        line = f.readline() # read next line
    excel.save(excelfilename+'.xlsx')

resultfilename = input("please enter your report txt file name: ")
excelfilename = input("please enter save excel file name: ")
excelconvertResult(resultfilename)
  • pandas method

pandas is a powerful tool for working with data frames, used often in data science and Excel-related work. Please install the pandas library (pip install pandas) before running it. The code is much shorter than the previous one.

you can use this one:

Step 1: read the text file

#READ TXT FILE
lists = {}
current_key = None
with open(resultfilename, 'r') as myfile:
    readline = myfile.read().splitlines()

for line in readline:
    #print(line)
    if "=" in line:
        current_key = line.strip("=")
        lists[current_key] = []
    else:
        assert current_key is not None # there shouldn't be data before a header
        lists[current_key].append(line)
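The header-splitting loop above can be exercised on a small in-memory example: each ====XX==== line starts a new key, and the following lines are collected under it (the sample lines are invented):

```python
lists = {}
current_key = None
sample = [
    "====UL====",
    "time1 10.5",
    "time2 11.2",
    "====DL====",
    "time1 95.9",
]
for line in sample:
    if "=" in line:
        current_key = line.strip("=")   # "====UL====" -> "UL"
        lists[current_key] = []
    else:
        assert current_key is not None  # data must follow a header
        lists[current_key].append(line)

print(lists)
# {'UL': ['time1 10.5', 'time2 11.2'], 'DL': ['time1 95.9']}
```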

Step 2: store UL and DL into lists

def UL():
    for i in lists["UL"]:
        i = i.split()
        ULlist.append(i)

def DL():
    for i in lists["DL"]:
        i = i.split()
        DLlist.append(i)

Step 3: write into Excel

def writeExcel(result):
    #writing into excel sheet
    if result == "UL":
        #uplink
        df1 = pd.DataFrame(ULlist)
        df1 = df1.rename(columns=df1.iloc[0]).drop(df1.index[0])
        df1['UL-Tput'] = df1['UL-Tput'].astype(float)
        #df1[TPUTULvalue] = df1[TPUTULvalue].astype(float)
        df1['UL-RbNum'] = df1['UL-RbNum'].astype(float)
        df1['UL-MCS'] = df1['UL-MCS'].astype(float)
        df1['UL-Bler'] = df1['UL-Bler'].astype(float)
        df1['UL-nonWPuschBler'] = df1['UL-nonWPuschBler'].astype(float)

    elif result == "DL":
        df2 = pd.DataFrame(DLlist)
        df2 = df2.rename(columns=df2.iloc[0]).drop(df2.index[0])
        df2['DL-Tput'] = df2['DL-Tput'].astype(float)
        df2['DL-RbNum'] = df2['DL-RbNum'].astype(float)
        df2['DL-MCS'] = df2['DL-MCS'].astype(float)
        df2['DL-Bler'] = df2['DL-Bler'].astype(float)
        df2['DL-nonWPdschBler'] = df2['DL-nonWPdschBler'].astype(float)

    elif result == "both":
        #uplink
        df1 = pd.DataFrame(ULlist)
        df1 = df1.rename(columns=df1.iloc[0]).drop(df1.index[0])
        df1['UL-Tput'] = df1['UL-Tput'].astype(float)
        df1['UL-RbNum'] = df1['UL-RbNum'].astype(float)
        df1['UL-MCS'] = df1['UL-MCS'].astype(float)
        df1['UL-Bler'] = df1['UL-Bler'].astype(float)
        df1['UL-nonWPuschBler'] = df1['UL-nonWPuschBler'].astype(float)

        #downlink
        df2 = pd.DataFrame(DLlist)
        df2 = df2.rename(columns=df2.iloc[0]).drop(df2.index[0])
        df2['DL-Tput'] = df2['DL-Tput'].astype(float)
        df2['DL-RbNum'] = df2['DL-RbNum'].astype(float)
        df2['DL-MCS'] = df2['DL-MCS'].astype(float)
        df2['DL-Bler'] = df2['DL-Bler'].astype(float)
        df2['DL-nonWPdschBler'] = df2['DL-nonWPdschBler'].astype(float)
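The rename(...).drop(...) pair used above promotes the first data row to become the column header. A tiny standalone sketch of that trick (the rows and column names here are invented):

```python
import pandas as pd

rows = [
    ["datetime", "UL-Tput"],      # header row parsed from the text file
    ["20221018.170247", "10.5"],
    ["20221018.170248", "11.2"],
]
df = pd.DataFrame(rows)
# promote row 0 to be the header, then drop it from the data
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
df["UL-Tput"] = df["UL-Tput"].astype(float)
print(list(df.columns))        # ['datetime', 'UL-Tput']
print(df["UL-Tput"].mean())    # 10.85
```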

Step4: excel styling

with pd.ExcelWriter(excelfilename+'.xlsx', engine='xlsxwriter') as writer:
    if result == "both":
        df1 = df1.style.set_properties(**{'text-align': 'center'})
        df2 = df2.style.set_properties(**{'text-align': 'center'})
        #multiple-property option
        #df1 = df1.style.set_properties(**{'text-align': 'center',
        #                                  'color': 'red',
        #                                  'font-size': '1.0rem',
        #                                  'font-weight': 'bold',
        #                                  'background-color': 'yellow'})

        df1.to_excel(writer, 'UL', index=False)
        worksheet = writer.sheets['UL']
        worksheet.set_column(0, 1, 25)
        worksheet.set_column(1, 4, 15)
        worksheet.set_column(4, 5, 20)
        #worksheet.set_column(3, 5, 15)

        df2.to_excel(writer, 'DL', index=False)
        #worksheet.set_column(1, 0, 20)
        worksheet = writer.sheets['DL']
        worksheet.set_column(0, 1, 25)
        worksheet.set_column(1, 4, 15)
        worksheet.set_column(4, 5, 20)

STEP: FULL CODE

types = ""
if "DL" in lists and "UL" in lists:
    #print("both UL and DL exist")
    types = "both"
    DL()
    UL()
    writeExcel(types)
elif "UL" in lists:
    #print("UL exist")
    types = "UL"
    UL()
    writeExcel(types)
elif "DL" in lists:
    #print("DL exist")
    types = "DL"
    DL()
    writeExcel(types)
else:
    #print("Neither exist")
    types = "Not Exist"

(Case2) Automation for scenario 3: get timedate and PDCP values

[Update 20230218] I have decided to update scenario 3 because I felt there are many interesting things to share, especially the Excel sheet, which is pretty awesome. The output of this result will look like this:

Let me walk through the step of this code:

Step 1: Run excel_pdcp_v2.py to parse the data; the parsed output will look like the test.txt above.

I am not going to dive deep into this script; it's similar to the previous one, just with some different search values.


def main():
    accepted_strings = {'UL', 'DL', 'both'}
    #elogfile = input("enter elog: ")
    givenString = input("enter UL/DL/both: ")

    if givenString == "both":
        UL = 'PDCP UL'
        DL = 'PDCP DL'
        emptywrite("UL")
        ULDLprint(UL)
        #split line ==
        emptywrite("DL")
        ULDLprint(DL)
    elif givenString not in accepted_strings:
        print("Not found, please reenter correct option")
    else:
        if givenString == "UL":
            emptywrite("UL")
            givenString = 'PDCP UL'
        elif givenString == "DL":
            emptywrite("DL")
            givenString = 'PDCP DL'

        with open(elogfile, 'r') as filedata:
            for line in filedata:
                if givenString in line:
                    timeparse(line)

elogfile = input("enter elog: ")
while True:
    startscript = input("####press any key, q to exit script#####: ")
    if startscript == "q":
        break
    else:
        main()

So it will ask the user to enter UL, DL, or both (DL and UL), parse the text file, and write the results into a new text file. We are grabbing three things:

timedate

DL’s ingress + egress value,

UL’s ingress + egress value.

Below is the log file, to show what I mean. The blue is the timedate, the red is the DL, and the green is the UL.

def timeparse(data):
    datestr = data.split('[', 1)[1].split(']')[0]
    #ingress traffic: 96.125687, egress traffic: 95.911179
    searchtest = re.search(r'(ingress [^(]+).+(egress [^(]+)', data)
    m3New = searchtest.group(1) + ", " + searchtest.group(2)
    m3New_1 = m3New.replace(", ", ":").strip().split(':')
    #with list
    result.clear()
    #result.append(m3New)
    #print(result)
    result.append(datestr)
    result.append(getelement(m3New_1, 'ingress traffic').strip())
    result.append(getelement(m3New_1, 'egress traffic').strip())
    ########### Print or Write result ###########
    listprint() #write file =>ok

Below I show the output, to make clear what has been parsed. Basically I just want the timedate and the ingress and egress Tput values. After parsing, they are appended to the list, similar to the previous example.

After that it will generate a new file named PDCP_result.txt. You can change the name to whatever you like.

Step 2: run excel_pdcp_v2.py, which reads PDCP_result.txt and writes it into Excel, using different sheets to separate UL and DL.

Step2.1 read file

Note: I use test.txt, which has the same content as PDCP_result.txt; this is for debugging purposes.

In the code below I just check whether a line is an ======== separator; if so, it starts a new UL or DL section, and the following lines are collected under that key.

import pandas as pd
resultfilename = input("please enter your txt file name(Ex: test.txt) : ")
excelfilename = input("please enter saving excel file name(Ex: test): ")

lists = {}
current_key = None
#with open('test.txt', 'r') as myfile:
with open(resultfilename, 'r') as myfile:
    readline = myfile.read().splitlines()
for line in readline:
    #print(line)
    if "=" in line:
        current_key = line.strip("=")
        lists[current_key] = []
    else:
        assert current_key is not None # there shouldn't be data before a header
        lists[current_key].append(line)

As you can see lists will contain all the file’s content.

Step2.2 pandas dataframe

As you can see, the contents are all strings, so we want to convert the Tput values to float. This is how you can do it:

#initial new dataframe
df1 = pd.DataFrame(ULlist)
df1 = df1.rename(columns=df1.iloc[0]).drop(df1.index[0])
#convert string to float
df1['ingress-traffic'] = df1['ingress-traffic'].astype(float)
df1['egress-traffic'] = df1['egress-traffic'].astype(float)

Step 2.3: write into Excel and change the column width

Write to Excel using this:

with pd.ExcelWriter('out.xlsx', engine='xlsxwriter') as writer:

To change the format of the Excel output, you can set properties as below, such as centering text, font size, etc.

  • single format
    df1 = df1.style.set_properties(**{'text-align': 'center'})
  • multiple formats, if you have to change many at once:

df1 = df1.style.set_properties(**{'text-align': 'center',
                                  'color': 'red',
                                  'font-size': '1.0rem',
                                  'font-weight': 'bold',
                                  'background-color': 'yellow'})

Add sheet name and width of the column

syntax: worksheet.set_column(start, end, width)

EX:

worksheet = writer.sheets['sheet1']#name sheet
worksheet.set_column(0, 2, 20)#start to end column and width

Here is the full code:

with pd.ExcelWriter('out.xlsx', engine='xlsxwriter') as writer:

    df1 = df1.style.set_properties(**{'text-align': 'center'})
    df2 = df2.style.set_properties(**{'text-align': 'center'})
    #multiple-property option
    #df1 = df1.style.set_properties(**{'text-align': 'center',
    #                                  'color': 'red',
    #                                  'font-size': '1.0rem',
    #                                  'font-weight': 'bold',
    #                                  'background-color': 'yellow'})

    df1.to_excel(writer, 'sheet1', index=False)
    worksheet = writer.sheets['sheet1']
    worksheet.set_column(0, 2, 20)

    df2.to_excel(writer, 'sheet2', index=False)
    #worksheet.set_column(1, 0, 20)
    worksheet = writer.sheets['sheet2']
    worksheet.set_column(0, 2, 20)

[Update 2023 April: two new cases] Case 3 and Case 4

(case3) Automation for scenario 2: get timedate and other values with multiple UEs

This case is related to case 2, but the difference is that case 2 parses the layer2 log with a single UE, while in this case I show how to parse multiple UEs' average Tput values.

Step1: add the main function

Let me describe what we are doing in this part, or you can refer to part 1, scenario 2.

Let me explain: the first green circle should be ignored because it's an incomplete log. We want to parse in this order: first the m>>> DL- or mUL <<<- line, which carries our Tput value, then the DL-MCS or UL-MCS line to get the other related values.

But as you can see, the green circle appears before the Tput parsing string, which is incorrect, so in order to handle it we need an extra check. So let's begin explaining the code.

Step 1: create main function:

Add this check, which kicks in after the first pattern is found:

for nextline in filedata:
    if re.search(r'\[(\d+\.\d+\.\d+)\].*?(>>> DL- Mcs=[^]]+)', nextline):

The reason I read the file twice is that one pass is for DL and one for UL. The count variables control writing the header at the beginning of the text file. Let's move to the next step to talk about writefile(), which writes that header.

def main():
    countUL = 0
    countDL = 0
    ##########################
    with open(elogfileName, 'r') as filedata:
        for line in filedata:
            #print(line)
            if "m>>> DL-" in line:
                if countDL == 0:
                    emptywrite("DL")
                    writefile("DL")
                countDL += 1
                for nextline in filedata:
                    if re.search(r'\[(\d+\.\d+\.\d+)\].*?(>>> DL- Mcs=[^]]+)', nextline):
                        #print(line, nextline, end='')
                        givenString = "DL"
                        parse(line, givenString)
                        parse_bler(nextline, givenString)
                        break # so you can start looking for the first match again

    with open(elogfileName, 'r') as filedata:
        for line in filedata:
            if "mUL <<<-" in line:
                if countUL == 0:
                    emptywrite("UL")
                    writefile("UL")
                countUL += 1
                for nextline in filedata:
                    if re.search(r'\[(\d+\.\d+\.\d+)\].*?(UL <<<- Mcs=[^]]+)', nextline):
                        givenString = "UL"
                        parse(line, givenString)
                        parse_bler(nextline, givenString)
                        break # so you can start looking for the first match again
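The inner for nextline in filedata loop works because filedata is a single file iterator: once the trigger line is found, the inner loop keeps consuming lines from the same iterator until the second pattern appears, then breaks back to the outer loop. A compact sketch, with io.StringIO standing in for the log file and the patterns simplified:

```python
import io
import re

log = io.StringIO(
    "noise\n"
    "m>>> DL- Tput line A\n"
    ">>> DL- Mcs= line B\n"
    "m>>> DL- Tput line C\n"
    ">>> DL- Mcs= line D\n"
)

pairs = []
for line in log:
    if "m>>> DL-" in line:
        for nextline in log:               # continues from the same position
            if re.search(r'>>> DL- Mcs=', nextline):
                pairs.append((line.strip(), nextline.strip()))
                break                       # resume scanning in the outer loop
print(pairs)
```

Each trigger line thus gets paired with the first matching line that follows it, without re-reading the file.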

Step 2: create the checkfile and writefile functions:

def checkfile():
    if os.path.exists("result.txt"):
        print("file exist, delete file")
        os.remove("result.txt")

def writefile(status):
    checkfile()
    with open(filename, 'a') as f:
        f.write((f"datetime \t {status}_Tput(ingress) {status}_Tput(egress) {status}_RbNum {status}_MCS {status}_Bler {status}_nonWdBler\n").expandtabs(22))

which will be like this:

Step 3: add parsing for the Tput value and the other values

This section is the same as the case 1 and case 2 parsing:

  • parse(list, UL or DL) => parses the Tput value (ingress and egress)
  • parse_bler(list, UL or DL) => gets the other options like MCS, RB, and Bler

Then it saves the result to a list and writes it into the file.

def parse(data, ULDLstr):
    #get tput value
    datestr = data.split('[', 1)[1].split(']')[0]
    Tputvalue = re.search(r'(ingress [^(]+).+(egress [^(]+)', data)
    m3New = Tputvalue.group(1) + ", " + Tputvalue.group(2)
    m3New_1 = m3New.replace(", ", ":").strip().split(':')
    result.append(datestr)
    result.append(getelement(m3New_1, 'ingress traffic').strip())
    result.append(getelement(m3New_1, 'egress traffic').strip())
    #listprint2()
    #listprint()

def parse_bler(data, ULDLstr):
    #get bler
    blerresult = re.search(r'\[(\d+\.\d+\.\d+)\].*?(Mcs=[^]]+)', data)
    blerDL = re.sub(r"[\(\[].*?[\)\]]", "", blerresult.group(2)).replace(',', '').strip().split()
    if ULDLstr in 'DL':
        bler1 = "PdschBler="
        bler2 = "nonWPdschBler="
    elif ULDLstr in 'UL':
        bler1 = "PuschBler="
        bler2 = "nonWPuschBler="

    #print(getelement(blerDL, 'RbNum='))
    result.append(getelement(blerDL, 'RbNum='))
    result.append(getelement(blerDL, 'Mcs='))
    result.append(getelement(blerDL, bler1).strip())
    result.append(getelement(blerDL, bler2).strip())

    listprint()
    result.clear()

def listprint():
    #checkfile()
    cycle = 0
    with open(filename, "a") as f:
        for element in result:
            #print(element + " ")
            f.write(element + " ")
        f.write("\n")

Step 4: create a new script to convert the txt file to Excel

import pandas as pd
import re
resultfilename = input("please enter your txt file name(Ex: test.txt) : ")
#resultfilename = "111.txt"
excelfilename = input("please enter saving excel file name(Ex: test): ")
#excelfilename = "111excel"

lists = {}
current_key = None
#with open('test.txt', 'r') as myfile:
with open(resultfilename, 'r') as myfile:
    readline = myfile.read().splitlines()
for line in readline:
    #print(line)
    if "=" in line:
        current_key = line.strip("=")
        lists[current_key] = []
    else:
        assert current_key is not None # there shouldn't be data before a header
        lists[current_key].append(line)

ULlist = []
DLlist = []

def UL():
    for i in lists["UL"]:
        i = i.split()
        ULlist.append(i)

def DL():
    for i in lists["DL"]:
        i = i.split()
        DLlist.append(i)

def writeExcel(result):
    #writing into excel sheet
    if result == "UL":
        #uplink
        df1 = pd.DataFrame(ULlist)
        df1 = df1.rename(columns=df1.iloc[0]).drop(df1.index[0])
        df1['UL_Tput(ingress)'] = df1['UL_Tput(ingress)'].astype(float)
        df1['UL_Tput(egress)'] = df1['UL_Tput(egress)'].astype(float)
        df1['UL_RbNum'] = df1['UL_RbNum'].astype(float)
        df1['UL_MCS'] = df1['UL_MCS'].astype(float)
        df1['UL_Bler'] = df1['UL_Bler'].astype(float)
        df1['UL_nonWdBler'] = df1['UL_nonWdBler'].astype(float)

    elif result == "DL":
        df2 = pd.DataFrame(DLlist)
        df2 = df2.rename(columns=df2.iloc[0]).drop(df2.index[0])
        df2['DL_Tput(ingress)'] = df2['DL_Tput(ingress)'].astype(float)
        df2['DL_Tput(egress)'] = df2['DL_Tput(egress)'].astype(float)
        df2['DL_RbNum'] = df2['DL_RbNum'].astype(float)
        df2['DL_MCS'] = df2['DL_MCS'].astype(float)
        df2['DL_Bler'] = df2['DL_Bler'].astype(float)
        df2['DL_nonWdBler'] = df2['DL_nonWdBler'].astype(float)

    elif result == "both":
        #uplink
        df1 = pd.DataFrame(ULlist)
        df1 = df1.rename(columns=df1.iloc[0]).drop(df1.index[0])
        df1['UL_Tput(ingress)'] = df1['UL_Tput(ingress)'].astype(float)
        df1['UL_Tput(egress)'] = df1['UL_Tput(egress)'].astype(float)
        df1['UL_RbNum'] = df1['UL_RbNum'].astype(float)
        df1['UL_MCS'] = df1['UL_MCS'].astype(float)
        df1['UL_Bler'] = df1['UL_Bler'].astype(float)
        df1['UL_nonWdBler'] = df1['UL_nonWdBler'].astype(float)

        #downlink
        df2 = pd.DataFrame(DLlist)
        df2 = df2.rename(columns=df2.iloc[0]).drop(df2.index[0])
        df2['DL_Tput(ingress)'] = df2['DL_Tput(ingress)'].astype(float)
        df2['DL_Tput(egress)'] = df2['DL_Tput(egress)'].astype(float)
        df2['DL_RbNum'] = df2['DL_RbNum'].astype(float)
        df2['DL_MCS'] = df2['DL_MCS'].astype(float)
        df2['DL_Bler'] = df2['DL_Bler'].astype(float)
        df2['DL_nonWdBler'] = df2['DL_nonWdBler'].astype(float)

    #with pd.ExcelWriter('out.xlsx', engine='xlsxwriter') as writer:
    with pd.ExcelWriter(excelfilename+'.xlsx', engine='xlsxwriter') as writer:
        if result == "both":
            df1 = df1.style.set_properties(**{'text-align': 'center'})
            df2 = df2.style.set_properties(**{'text-align': 'center'})
            #multiple-property option
            #df1 = df1.style.set_properties(**{'text-align': 'center',
            #                                  'color': 'red',
            #                                  'font-size': '1.0rem',
            #                                  'font-weight': 'bold',
            #                                  'background-color': 'yellow'})

            df1.to_excel(writer, 'UL', index=False)
            worksheet = writer.sheets['UL']
            worksheet.set_column(0, 2, 20)
            worksheet.set_column(3, 6, 15)

            df2.to_excel(writer, 'DL', index=False)
            worksheet = writer.sheets['DL']
            worksheet.set_column(0, 2, 20)
            worksheet.set_column(3, 6, 15)

#check whether the text file contains UL or DL
types = ""
if "DL" in lists and "UL" in lists:
    #print("both UL and DL exist")
    types = "both"
    DL()
    UL()
    writeExcel(types)
elif "UL" in lists:
    #print("UL exist")
    types = "UL"
    UL()
    writeExcel(types)
elif "DL" in lists:
    #print("DL exist")
    types = "DL"
    DL()
    writeExcel(types)
else:
    #print("Neither exist")
    types = "Not Exist"

Refer to GitHub for the source code:

(case4) Automation for scenario 1: get timedate and Tput value

This case is the same as case 1 but simpler: the difference is that I just want to get the datetime and the Tput value, and I will show you different Excel conversion methods; you can decide which one to use.

Please refer to the directory picture above; as you can see there are two Excel folders:

=> ExcelPandasMethod: using the pandas method

=> Excel_openpxlMethod2: using the openpyxl method

Step 1: parse the datetime and Tput value

import os
givenString = "DL- ingress traffic"
#givenString = "DL- UE"
result = []
filename = "result.txt"

def checkfile():
    if os.path.exists("result.txt"):
        print("file exist, delete file")
        os.remove("result.txt")

def timeparse(data):
    #parse the date and time
    datestr = data.split('[', 1)[1].split(']')[0]
    Tput = data.split(" DL- ingress traffic:", 1)[1].split(',')[0].split('(')[0].strip()
    #list
    result.clear()
    result.append(datestr)
    result.append(Tput)
    listprint() #write file =>ok
    listprint2() #print =>ok

#write to file
def listprint():
    #checkfile()
    cycle = 0
    with open(filename, "a") as f:
        cycle += 1
        for element in result:
            #print(element + " ")
            f.write(element + " ")
        f.write("\n")

#print
def listprint2():
    cycle = 0
    for element in result:
        cycle += 1
        print(element, end=" ")
        #print('=' * 30)
        if cycle % 2 == 0:
            print("")

###################################
# MAIN SCRIPT
###################################
elogfile = input("Please enter your elog file:")
checkfile() # remove any previous result file
print('datetime \t \t tput')
print('=' * 30)
with open(elogfile, 'r') as filedata:
    for line in filedata:
        if givenString in line:
            # Parse the line if the given string is found in it
            #print(line.strip())
            timeparse(line)
#print list value
print("=" * 30)
#print(result)
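The chained split in timeparse() peels the value off step by step. Here it is run on a single made-up log line so you can see each stage:

```python
# Hypothetical log line, shaped like the real one
data = "[20221018.170247.411259][info]: DL- ingress traffic: 96.125687(Mbps), egress traffic: 95.911179"

step1 = data.split(" DL- ingress traffic:", 1)[1]  # " 96.125687(Mbps), egress traffic: 95.911179"
step2 = step1.split(',')[0]                        # " 96.125687(Mbps)"
step3 = step2.split('(')[0]                        # " 96.125687"
tput = step3.strip()                               # "96.125687"
print(tput)  # 96.125687
```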

Step 2 convert it to excel

Method1:

import openpyxl, string
from openpyxl.utils import get_column_letter
from openpyxl.styles import Font

def writeexcel(result):
    f = open(result, 'r+') # open text
    #########if load excel file ########################
    excel = openpyxl.Workbook()
    sheet = excel.worksheets
    line = f.readline() # read text
    sheet2 = excel.create_sheet("Mysheet1", 1)
    new = []
    count = 0
    while line:
        list123 = line.split() # convert
        if "=" in line:
            pass
        elif "-" not in line and "#" not in line:
            #else:
            if count <= 2:
                if list123[1] == 'Tput':
                    sheet[0].append(list123) # write into excel
                else:
                    list123[1] = float(list123[1])
                    sheet[0].append(list123) # write into excel
        line = f.readline() # read next line
    excel.save(excelfilename + ".xlsx")

resultfilename = input("Please enter your txt file:")
excelfilename = input("Please enter your xlsx file:")
writeexcel(resultfilename)

Method2 pandas

I check whether the text file contains - or # separator lines and skip them. Note that pandas' comment parameter only supports a single character:


import pandas as pd

def checkchar(x):
    char11 = ""
    for line in x:
        if '-' in line:
            char11 = "-"
            break
        elif '#' in line:
            char11 = "#"
            break
        else:
            break
    return char11

with open("result.txt", "r") as file:
    column_names = next(file).split()
    resultchar = checkchar(file)

    if resultchar == "":
        df = pd.read_csv(file, sep=" ", names=column_names)
    else:
        df = pd.read_csv(file, sep=" ", names=column_names, comment=resultchar)
        #df = pd.read_csv(file, sep=" ", names=column_names, comment="-")
df.to_excel("test.xlsx", index=False)

writer = pd.ExcelWriter("test.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="Data", index=False)
sheet = writer.sheets["Data"]

# Set column widths (optional)
sheet.set_column(0, len(column_names)-1, 25)

# Set the header's font (optional)
header_format = writer.book.add_format({'bold': True, 'font_size': 14})
for index, name in enumerate(column_names):
    sheet.write(0, index, name, header_format)

#writer.save()
writer.close()
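pandas can skip those separator lines directly via the comment parameter, as the script above does. A self-contained sketch with invented data:

```python
import io
import pandas as pd

text = (
    "datetime Tput\n"
    "------------------\n"
    "t1 10.5\n"
    "t2 11.2\n"
)
buf = io.StringIO(text)
column_names = next(buf).split()   # first line is the header
# lines starting with '-' are ignored entirely
df = pd.read_csv(buf, sep=" ", names=column_names, comment="-")
print(df)
```

The separator row is dropped and the Tput column comes out as numbers, not strings.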

Method3 openpyxl

import openpyxl
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter

def convert_cell(text):
    """Try to convert cell text to an object of the appropriate type."""
    # Currently the only type we convert to is float
    try:
        return float(text)
    except ValueError:
        pass
    return text

def text_file_to_excel(src, dest):
    """Read a spreadsheet-like text file and convert it to Excel."""
    wb = openpyxl.Workbook()
    sheet = wb.worksheets[0]

    with open(src, "r") as src:
        # First line is column headings
        line = next(src).split()
        sheet.append(line)
        for column in range(1, len(line)+1):
            sheet.cell(1, column).font = Font(size=14, bold=True)
            sheet.column_dimensions[get_column_letter(column)].width = 25

        # Copy remaining rows, converting number strings to numbers.
        for line in src:
            if "---" not in line: # skip separator lines
                sheet.append([convert_cell(text) for text in line.split()])
    wb.save(dest)

text_file_to_excel("result.txt", "test.xlsx")

Case4 Merging multiple log files into one file

[Update 20220419]

What?

Today I realized that if I have multiple elogs, it's pretty hard to analyze them one by one, so I decided to find a solution: merge all the related log files into one file. This is pretty useful when you have multiple log or txt files and want to combine them.

Why?

If I have log like this:

Then I have to parse them one by one, which is a waste of time, just like this:

So I decided to show you how to merge them all into one file; then you can run case 1 ~ case 3 above, which is more productive and much more efficient.

How?

I will show you two methods; you can decide which one to use. One uses normal file read and write, and the other uses the glob module.

  • using read and write file

import os

#directory = "/path/to/files"
directory = "."
# Output file name
output_file = "merged.txt"

# Loop through all files in the directory and append their contents to the output file
with open(output_file, "w") as outfile:
    for filename in os.listdir(directory):
        if filename.startswith("elog_gnb_du_layer2"):
            with open(os.path.join(directory, filename), "r") as infile:
                outfile.write(infile.read())
  • glob

import glob

file_pattern = 'elog_gnb_du_layer2*'
file_list = glob.glob(file_pattern)
with open('merged_file.txt', 'w') as outfile:
    for file in file_list:
        with open(file, 'r') as infile:
            outfile.write(infile.read())

This is pretty useful, especially when there is a lot of data to analyze.

[update20230602]

I have realized there is a problem with the merge script: if your files are named like XXX_8.txt, XXX_9.txt, XXX_10.txt, it might read them as XXX_10.txt => XXX_8.txt => XXX_9.txt. Pretty strange, right?

So in order to fix this issue, we have to use the natsort module. I found this solution by asking ChatGPT, which suggested this method.

Let me show a demo comparing both methods; please refer to the picture below. As you can see, without natsorted it reads 10.txt => 8.txt => 9.txt, whereas with natsorted it reads the files in order.
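If you would rather avoid the third-party natsort dependency, the same natural ordering can be reproduced with a small sort key built from the standard re module. This is an alternative I'm sketching here, not the natsort method used above:

```python
import re

def natural_key(name):
    # split "XXX_10.txt" into text and number chunks, e.g. ['XXX_', 10, '.txt']
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]

files = ['XXX_10.txt', 'XXX_8.txt', 'XXX_9.txt']
print(sorted(files))                    # lexicographic: 10 sorts before 8
print(sorted(files, key=natural_key))   # natural: 8, 9, 10
```

The number chunks compare as integers instead of strings, which is exactly why 8 < 9 < 10 comes out in the right order.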

I hope this is a clear example.

Case5 Data Visualization Draw the Tput into Line graph

I have been studying matplotlib.pyplot which is able to draw a line graph, which is also my plan. Today I finally fixed the x-axis issue and am ready to publish this post on the drawing line graph.

Because my Excel file contains a lot of data, more than a thousand rows, the x-axis labels overlap, which looks like this:

I tried the method below, increasing the tick interval, but it doesn't work well: the larger the interval, the more of the date axis gets cut off, so this is not a good solution.

tick_interval = 15 # Adjust the interval as per your preference
plt.xticks(range(0, len(datetime), tick_interval), rotation=90)

So the best solution to this problem, without labels overlapping each other, is: plt.xticks(np.linspace(0, len(datetime)-1, 100), rotation=90)

This picks evenly spaced positions over the whole range, so if I have 1000 rows it keeps only about 100 tick positions, and the labels no longer overlap.

The syntax is:

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
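For example, asking linspace for 5 ticks across indices 0..8 yields evenly spaced positions, endpoints included:

```python
import numpy as np

ticks = np.linspace(0, 8, 5)   # 5 evenly spaced values from 0 to 8 inclusive
print(ticks)  # [0. 2. 4. 6. 8.]
```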

I was lazy, so I asked ChatGPT what np.linspace is; below is what it says:

Full code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Read the Excel file into a pandas DataFrame
data = pd.read_excel('excelFileName.xlsx')

# Extract datetime and Tput columns
datetime = data['datetime']
tput = data['Tput']
mcs = data['MCS']
# Convert the datetime column to numeric values using NumPy
x = np.arange(len(datetime))
#figsize=(width,height)
plt.figure(figsize=(15,5), dpi=300)
# Plot the line graph
plt.plot(datetime, tput, label='DL')
plt.plot(datetime, mcs, label='MCS')

plt.xlabel('time')
plt.ylabel('Tput')
plt.title('TPUT')

#tick_interval = 15 # Adjust the interval as per your preference
#plt.xticks(range(0, len(datetime), tick_interval), rotation=90)
plt.xticks( np.linspace(0, len(datetime)-1, 100 ),rotation=90 )

plt.legend()
#adjust the layout
plt.tight_layout()
#plt.show()

plt.savefig('tput.png')

I marked the related code in the picture below to make clear what each part does:

The Excel file looks like this; the script will read this file:

the output will look like this:

Let me wrap up the steps before running this:

First: prepare your elog file.

Second: run the parse script to parse DL, UL, or both.

Third: convert to an Excel file.

Fourth: run this script, which will read the third step's Excel file and save the graph as a PNG file.

Summary

This is what I have learned during this time of learning to grab data. I used to hate regular expressions because they're pretty complicated, but I had no other option, so I went and played around with them and learned a lot.

So what will you gain after reading this? I believe the skills I mention in this post can be used in many different projects. I don't like to settle for one answer, so I searched online, asked for help in discussion forums, and so on. Coding has no single right answer; you need to learn how to find solutions in many different ways. Last but not least, I hope this note helps you; I know there are many more methods to achieve the same result.

Let's wrap up all the patterns I mention in this post:

These are my notes on grabbing strings, and there's more to learn. This is not the final code; I will keep updating it. If you're interested in the full code, please refer to my GitHub. I made individual files for the three scenarios, and in the future I will try to merge the code together.

I tried my best to show many different ways to build the solution, while keeping it simple enough for beginners to understand.

PS: If some code is not good, please leave a comment, or let's discuss it. Thanks

Note: I will update some pictures when I have time; there are some parts I haven't updated yet. Please wait for my updates if you're interested in this topic. Thanks in advance.

[2023–04–10Update:]

My code has finally been released on GitHub, and I made a lot of improvements.

Description of file:

How to run it

Last but not least I have one last word to say:

This is a network-related automation project, but you don't have to know anything about networking to follow it. Basically, this project analyzes a log file, grabs the data into a txt file, and then converts it to Excel. I believe it can be applied to many kinds of files, so I hope it helps.

As you noticed, in this code you will get to understand:

Reading and writing text files

Filtering related string patterns using regular expressions

Writing into Excel

Merging multiple files

Plotting the Excel data as a line graph

Reference


CC(ChenChih)

I am self-taught in technology and like to exchange knowledge with people. I try to write about complex tech in simple and varied ways for people to understand.