Searching a Codebase with Python — Automate Your Job
How to quickly find deprecated code in a codebase using nothing but Python and its built-in os module.
Just last week I was given the task of going through one of our codebases at work to find unsupported code. We are in the process of moving to the cloud and needed to identify code that isn’t cloud compatible. Normally this would mean hours of sifting through code to find the keywords and noting them down before attempting any updates.
Not today, though! With the power of Python, I will show you how to search an entire directory tree for keywords. Once we find files with offending code, we will build a list so we can identify the most offensive files and focus on those.
Iterating the File Tree
First, we collect the paths of the files we are interested in. The script defines the directory to search at the top, along with an exclusion list that keeps out files that aren't relevant to our platform. The node_modules folder is a prime example of one we don't want to include.
We initialize the list of file paths outside the loop, since we will iterate over it later.
os.walk does the hard work of iterating over each of the directories. For every directory it visits, it yields three things: the root directory path (which changes as we walk down the tree), a list of subdirectory names, and a list of file names.
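To see what os.walk actually yields, here is a small sketch that builds a throwaway directory tree with tempfile and collects each (root, dirnames, filenames) triple. The walk_demo helper and the src/utils layout are made up purely for illustration.

```python
import os
import tempfile


def walk_demo(top):
    """Return the (root, dirnames, filenames) triples os.walk yields, with root relative to top."""
    triples = []
    for root, dirnames, filenames in os.walk(top):
        # root changes as os.walk descends into each subdirectory
        rel_root = os.path.relpath(root, top)
        triples.append((rel_root, sorted(dirnames), sorted(filenames)))
    return triples


if __name__ == "__main__":
    # Build a tiny hypothetical tree: src/app.py and src/utils/helpers.py
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "src", "utils"))
        open(os.path.join(top, "src", "app.py"), "w").close()
        open(os.path.join(top, "src", "utils", "helpers.py"), "w").close()
        for triple in walk_demo(top):
            print(triple)
```

With the default top-down walk, the starting directory comes first, followed by each subdirectory beneath it.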
We ignore the subdirectory names because we only want the paths to files. Collecting those paths means iterating over the file names in each directory, checking that each file has an appropriate extension, and making sure it isn't covered by our exclusion list.
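Putting this section's steps together, a minimal sketch might look like the following. It is not the original script, which isn't shown here: SEARCH_DIR, EXTENSIONS, EXCLUSIONS and collect_file_paths are hypothetical names, the .js/.ts extensions are an assumption based on the node_modules example, and a file counts as excluded when any directory in its path matches an entry in the exclusion list.

```python
import os

# Hypothetical settings: point SEARCH_DIR at your own repository.
SEARCH_DIR = "."
# Assumed extensions of interest; adjust for your platform.
EXTENSIONS = (".js", ".ts")
# Directory names to skip: node_modules comes from the article, .git is an assumption.
EXCLUSIONS = {"node_modules", ".git"}


def collect_file_paths(search_dir, extensions=EXTENSIONS, exclusions=EXCLUSIONS):
    """Return paths of files under search_dir that have a matching extension,
    skipping any file whose directory path passes through an excluded folder."""
    file_paths = []  # initialized outside the loop so every directory adds to it
    for root, _dirnames, filenames in os.walk(search_dir):
        # Split the current root into its components to test against exclusions.
        parts = os.path.normpath(root).split(os.sep)
        if exclusions.intersection(parts):
            continue
        for filename in filenames:
            # str.endswith accepts a tuple, so one call checks every extension.
            if filename.endswith(extensions):
                file_paths.append(os.path.join(root, filename))
    return file_paths
```

Calling collect_file_paths(SEARCH_DIR) returns the list of candidate files. If you never need anything inside the excluded folders, a common os.walk idiom is to prune them in place with dirnames[:] = [d for d in dirnames if d not in exclusions]; with the default top-down walk, os.walk then never descends into node_modules at all, which can save a lot of time on large projects.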
Sort Through the Files
Next, we go through all the files we found and check whether they contain any of the offensive code. We created a list containing all the methods we want to search for; any occurrences of these will need to be updated later.
We visit each file found in the previous step, open it, and count the occurrences of each piece of offensive code. If a piece of code appears at least once, we add an entry to the occurrences list with the file path, the unsupported code, and the number of times it appeared.
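A minimal sketch of this counting step, assuming the file paths come from the previous step and that plain substring matching is good enough. The entries in OFFENSIVE_CODE are placeholders, since the article doesn't name the actual methods, and find_occurrences is a hypothetical name.

```python
import sys

# Placeholder list of unsupported calls; replace with the methods you are hunting for.
OFFENSIVE_CODE = ["readFileSync", "writeFileSync"]


def find_occurrences(file_paths, offensive_code):
    """Return a (path, code, count) entry for each piece of offensive code found in each file."""
    occurrences = []
    for path in file_paths:
        # errors="ignore" skips undecodable bytes instead of crashing on odd files
        with open(path, encoding="utf-8", errors="ignore") as f:
            contents = f.read()
        for code in offensive_code:
            count = contents.count(code)  # non-overlapping substring matches
            if count > 0:
                occurrences.append((path, code, count))
    return occurrences


if __name__ == "__main__":
    # Usage: python find_occurrences.py file1.js file2.js ...
    for entry in find_occurrences(sys.argv[1:], OFFENSIVE_CODE):
        print(entry)
```

Because str.count matches substrings, a search for readFileSync will also match a longer name such as readFileSyncCached. If that produces false positives, a regular expression with word boundaries, for example re.findall(r"\breadFileSync\b", contents), is a stricter alternative.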
Results
This short bit of scripting saved me from having to open 150+ files of code. Instead, it pointed me to the 26 files containing offensive code so I could focus on those. It also let me give my manager a better idea of the project's scope in just a few minutes rather than several days.
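To turn the occurrences list into a "worst files first" view, the counts can be summed per file. This sketch assumes each entry is a (path, code, count) tuple; rank_files and the sample data are made up for illustration. collections.Counter.most_common returns the files sorted by total hits, and the length of its result is the number of affected files.

```python
from collections import Counter


def rank_files(occurrences):
    """Sum the counts for each file and return (path, total) pairs, worst first."""
    totals = Counter()
    for path, _code, count in occurrences:
        totals[path] += count
    # most_common() sorts by total, highest first; ties keep first-seen order.
    return totals.most_common()


if __name__ == "__main__":
    # Made-up sample data in the (path, code, count) shape.
    sample = [
        ("src/api.js", "readFileSync", 3),
        ("src/jobs.js", "writeFileSync", 1),
        ("src/api.js", "writeFileSync", 2),
    ]
    ranked = rank_files(sample)
    print(f"{len(ranked)} files need attention")
    for path, total in ranked:
        print(f"{total:>4}  {path}")
```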
For anyone who looks over the code and points out that this could have been done in fewer steps with map, filter, and reduce: you are right! Not every situation needs polished code, though. This is a great example of how, with relatively little Python experience, anyone can save time at their job.
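For the curious, here is one way the same search could be condensed with map, filter, and functools.reduce. It is a sketch, not a replacement for the step-by-step approach described in this article; offense_totals and its parameters are hypothetical, and the exclusion rule (skip any directory whose path contains an excluded name) is an assumption.

```python
import os
from functools import reduce


def offense_totals(search_dir, extensions, exclusions, offensive_code):
    """Return ({path: hits}, grand_total) using map, filter, and reduce."""

    def hits(path):
        # Total occurrences of every offensive snippet in one file.
        with open(path, encoding="utf-8", errors="ignore") as f:
            text = f.read()
        return sum(text.count(code) for code in offensive_code)

    # Every file under search_dir whose directory path avoids the exclusions.
    all_paths = (
        os.path.join(root, name)
        for root, _dirs, names in os.walk(search_dir)
        if not exclusions.intersection(os.path.normpath(root).split(os.sep))
        for name in names
    )
    # filter: keep only files with a matching extension.
    candidates = filter(lambda p: p.endswith(extensions), all_paths)
    # map: pair each file with its hit count; filter again to drop clean files.
    counted = map(lambda p: (p, hits(p)), candidates)
    per_file = dict(filter(lambda pair: pair[1] > 0, counted))
    # reduce: fold the per-file counts into one number.
    total = reduce(lambda acc, n: acc + n, per_file.values(), 0)
    return per_file, total
```

In practice, sum(per_file.values()) reads more clearly than the reduce call; reduce is used here only to mirror the suggestion in the text.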