Quick and Dirty Python Log File Parsing

Problem:

Let's say you have a lot of folders on your drive full of text/log files you need to search, but you only need a handful of data points from each log file. This is a common problem in IT, and Python is a great tool for log parsing.

Solution:

You can easily use Python to read every file and pull out exactly what you want! 

Let's go line by line!

Install Python 3 and create a blank .py file in the directory you will be searching. Open the .py file in a text editor like Notepad++.

Imports

To start, you will need to import some libraries (pre-written pieces of Python code with modules that contain functions, classes, and methods you can use).

Import these at the top of your file with the 'from' and 'import' statements below.


from pathlib import Path
import sys
# Path gives you an object-oriented way to work with file and directory paths
# sys exposes interpreter objects; we use sys.stdout for the text output
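
If Path is new to you, here is a quick sketch of what it gives you; the 'logs' directory name is just a made-up example:

from pathlib import Path

p = Path('logs')                    # hypothetical directory name
print(p.exists())                   # True if the path exists on disk
print(p.is_dir())                   # True if it is a directory
if p.is_dir():
    for entry in p.iterdir():       # everything directly inside the directory
        print(entry.name)           # just the file or folder name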

Variables

Put a blank line below the imports, then declare two variables: src_array, which defines the directories we will search as an array (in Python, a list: items in quotes, separated by commas, enclosed in square brackets), and c, a counter.

src_array = ['directory path 1', 'directory path 2']  # add as many paths as you need

c = 0
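
On Windows, backslashes in paths would otherwise need escaping, so raw strings (r'...') are handy here. The paths below are made-up examples:

src_array = [r'C:\logs\app', r'C:\logs\web']  # hypothetical paths; r'...' keeps backslashes literal
c = 0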

Main Loop

Below the variables we build the heart of the program, where it reads the source array we just made and iterates through it.
On the first line we create the loop; then the variable source_dir wraps the path at the current index of the directory array in a Path object. (You may need to run this script as an administrator if the files are permission-protected.)

files is the variable created from the .iterdir() method, which yields everything inside the directory.

We then use the sys module to write some formatted output so the overall output is easy to read. It writes a line separator and the name of the directory being read.

for i in src_array:
    source_dir = Path(src_array[c])
    files = source_dir.iterdir()
    sys.stdout.write("=======================")
    sys.stdout.write("\n**Reading Directory : " + i + '**')
    sys.stdout.flush()
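
One optional tweak: .iterdir() makes no promise about ordering, so if you want the files processed in a predictable order you can wrap it in sorted(). A minimal sketch, reusing a hypothetical 'example' directory:

from pathlib import Path

source_dir = Path('example')           # hypothetical directory from src_array
files = sorted(source_dir.iterdir())   # sorted() gives a stable, name-ordered list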

Inner Loop (Main Function)

We now loop through each file in the collection of files created above (for file in files) and use a with statement to open each one as a file object (file handle).

The script then reads each line in the file, assigning it to 'line', and runs a series of if statements searching each line for the text we want, alone or in combination. It prints the line if there is a match.

After the if statements, it moves on to the next file, then back up the chain to the next directory until it has finished! (The c variable tracks the directory; we add 1 to it after both loops run so the script knows to move to the next dir in the array.)

I've tested this on trees with hundreds of directories and hundreds of files, and it works very well! If you liked this, I wrote a few other automation scripts for web scraping here.

    for file in files:
        if not file.is_file():
            continue  # .iterdir() also yields subfolders; skip them
        with file.open('r') as file_handle:
            for line in file_handle:
                if "error" in line:
                    sys.stdout.write(line)
                    sys.stdout.flush()
                # each substring must be tested individually; a bare
                # "x" and "y" in line would only check the second string
                if "192.168.0.1" in line and "syn-ack" in line:
                    sys.stdout.write("----" + line)
                    sys.stdout.flush()

    c += 1  # move to the next directory in src_array
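
If you outgrow plain substring checks, the standard-library re module slots into the same inner loop. A small self-contained sketch; the pattern and sample lines are made up:

import re

# hypothetical pattern: match any of these log levels, case-insensitively
level_re = re.compile(r'\b(error|warning|critical)\b', re.IGNORECASE)

sample_lines = ["2024-01-01 ERROR disk full\n",   # stand-ins for lines read from a file
                "2024-01-01 info all good\n"]
for line in sample_lines:
    if level_re.search(line):
        print(line, end="")   # prints only the ERROR line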

Full Script:

from pathlib import Path
import sys

src_array = ['example', 'example2']
c = 0

for i in src_array:
    source_dir = Path(src_array[c])
    files = source_dir.iterdir()
    sys.stdout.write("====================")
    sys.stdout.write("\n**Reading directory : " + i + '**')
    sys.stdout.flush()
    for file in files:
        if not file.is_file():
            continue  # skip subfolders
        with file.open('r') as file_handle:
            for line in file_handle:
                if "error" in line:
                    sys.stdout.write(line)
                    sys.stdout.flush()
                if "192.168.0.1" in line and "syn-ack" in line:
                    sys.stdout.write("----" + line)
                    sys.stdout.flush()
    c += 1
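
The script only looks at the top level of each directory. If your logs are nested in subfolders, Path.rglob() walks the whole tree; here is a minimal variation, assuming the same example directories and a *.log naming convention:

from pathlib import Path
import sys

src_array = ['example', 'example2']  # same hypothetical directories as above

for i in src_array:
    sys.stdout.write("\n**Reading tree : " + i + '**\n')
    for file in Path(i).rglob('*.log'):  # recurse into subfolders, matching *.log files
        with file.open('r', errors='ignore') as file_handle:  # skip undecodable bytes
            for line in file_handle:
                if "error" in line:
                    sys.stdout.write(line)
    sys.stdout.flush()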
