Published in Analytics Vidhya

Day 15: Reliably modify and replace text with the least amount of effort

To guarantee your analysis is streamlined and repeatable, get acquainted with the function regexprep.

On Day 13 and Day 14, we used regular expressions to locate text. In today’s post, we use regular expressions within the function regexprep, which is the key to making precise and effective text substitutions.
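As a quick orientation, here is a minimal sketch of the basic call, regexprep( text, expression, replacement ). The strings and variable names below are made up for illustration and do not come from the examples that follow.

```matlab
% Hypothetical text, for illustration only %
str = 'sample_01_raw';

% Replace every run of digits with XX %
no_digits = regexprep( str, '\d+', 'XX' );

% Use a token: $1 in the replacement stands for whatever (\w+) captured %
renamed = regexprep( str, '(\w+)_raw$', '$1_clean' );

disp( no_digits )
disp( renamed )
```

The first call should give sample_XX_raw and the second sample_01_clean.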

Here are a few simple use cases for regexprep:

Removing characters from filenames. In the interest of consistency, it’s great if you can take a set of data with a filename and turn that filename into a figure title. Here’s how you might do it.

% Let's suppose we have a file with some data %
file = 'C:\Files\Kaleemah\Mouse1\Images\Neuron_10_May_12_Dorsal.tif'
% We would load the file and show the image %

Now you want to create a title but you don’t want all that extra stuff. Here’s how to do it in two steps. First, make sure you feel comfortable with the topics I cover in Day 13 and Day 14. Next, we’ll just get the filename:

% Extract just the filename (the dot is escaped so it only matches a literal dot) %
file = regexp( file, '(Neuron.*)(?=\.tif)', 'match' )
file = file{1}

Aside: Yes, there’s a more general way of doing this, one that does not depend on the filename starting with ‘Neuron’. I have not covered it yet because, as I’ve said, regular expressions can become very complex, and I suggest spending a few weeks on them if you have the time and patience. Here is the alternative approach:

file = 'C:\Files\Kaleemah\Mouse1\Images\Neuron_10_May_12_Dorsal.tif'
file = regexp( file, '([^\\]+$)', 'match')
file = regexp( file, '(.*)(?=\.tif$)', 'match' )
% To recover the output as a string you'll need to use file{1}{1} %

In either case, at this point you should have a variable file containing the text ‘Neuron_10_May_12_Dorsal’ (in the alternative approach, only after unwrapping it with file = file{1}{1}).
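Both steps can also be folded into a single call. This is only a sketch under a few assumptions: the path uses backslash separators, it ends in .tif, and the variable name is my own. The 'once' option makes regexp return the text directly instead of a cell array.

```matlab
file = 'C:\Files\Kaleemah\Mouse1\Images\Neuron_10_May_12_Dorsal.tif';

% [^\\]+       one or more characters that are not a backslash        %
% (?=\.tif$)   ...followed by a literal .tif at the end of the text   %
name = regexp( file, '[^\\]+(?=\.tif$)', 'match', 'once' );

disp( name )
```

Because of 'once', name is plain text here, so no file{1} or file{1}{1} unwrapping is needed.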

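Now if you tried to create a figure and title a plot, whether it’s through the axes or title function, you will run into this issue: MATLAB’s default TeX interpreter treats each underscore as a subscript marker, so the character after every underscore is drawn as a subscript.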

Here’s how to use regexprep to get around this:

file = regexprep( file, '_', ' ' )

If you then title the plot, you will see the text rendered without the subscripts.
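Put together, the titling step might look like the sketch below. The plotted data is a placeholder and the variable name plot_title is my own. The commented-out last line shows a different technique that does not use regexprep: turning off the TeX interpreter for that one title.

```matlab
file = 'Neuron_10_May_12_Dorsal';

% Swap every underscore for a space so nothing is read as a subscript %
plot_title = regexprep( file, '_', ' ' );

% Placeholder data, for illustration only %
plot( 1:10 );
title( plot_title );

% Alternative without regexprep: keep the underscores and show them literally %
% title( file, 'Interpreter', 'none' );
```

The regexprep route changes the text itself, which is handy if you reuse it elsewhere. The interpreter option leaves the text alone and only changes how that one title is drawn.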

Another use case: loading, renaming, and saving files. If you are interested in reliable and repeatable analysis that won’t cause headaches for you or your collaborators later on, then a point-and-click file selection dialog is your worst nightmare.

Accessing files through a graphical interface is tedious and manual. Worse, in most cases it leaves no trace of where the file you opened is located.

The alternative is to write the full file paths directly in your analysis scripts or functions, so that the next person who repeats your analysis can see exactly where the files are located.

Here’s what that could look like:

files{1} = 'C:\Files\Kaleemah\Mouse1\Images\Neuron_10_May_12_Dorsal.tif';

This would be the location of the loaded file. Next, suppose you want to read in the image and do some analysis of it.

% Suppose you want to load the images (For illustrative purposes only!)   %

% image{1} = imread( files{1} );
% image{1} = some_analysis_fxn( image{1} );

You could write the output image back to the original location (files{1}), but then you’d overwrite your original image. Here’s how you could replace the folder location and add ‘_modified’ to the filename:

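% First change the folder location. Each backslash is doubled because a single %
% backslash is an escape character in both the pattern and the replacement     %
files_adjusted{1} = regexprep( files{1}, 'C:\\Files\\Kaleemah\\Mouse1\\Images\\', 'C:\\Files\\Kaleemah\\Mouse1\\Images\\Adjusted\\' )
% Next, change the file name (escaped dot, and $ anchors the match to the end) %
files_adjusted{1} = regexprep( files_adjusted{1}, '\.tif$', '_modified.tif' )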

The new filename should be:

‘C:\Files\Kaleemah\Mouse1\Images\Adjusted\Neuron_10_May_12_Dorsal_modified.tif’

You can now save your modified image to the new location contained in the variable files_adjusted{1}.
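If the folder names live in variables, typing doubled backslashes by hand gets error-prone. Here is a sketch that lets regexptranslate do the escaping and renames a whole list in one go, since regexprep also accepts a cell array of text. The second filename and the variable names source_folder, target_folder, pattern, and replacement are hypothetical, added only for illustration.

```matlab
% Hypothetical folders and file list, for illustration only %
source_folder = 'C:\Files\Kaleemah\Mouse1\Images\';
target_folder = 'C:\Files\Kaleemah\Mouse1\Images\Adjusted\';
files = { [ source_folder 'Neuron_10_May_12_Dorsal.tif' ], ...
          [ source_folder 'Neuron_11_May_12_Ventral.tif' ] };

% Escape special characters so the folder names are treated literally %
pattern     = [ '^' regexptranslate( 'escape', source_folder ) ];
replacement = regexptranslate( 'escape', target_folder );

% regexprep works on every cell of the list at once %
files_adjusted = regexprep( files, pattern, replacement );
files_adjusted = regexprep( files_adjusted, '\.tif$', '_modified.tif' );

disp( files_adjusted{1} )
```

The ^ anchor keeps the folder swap at the start of each path, and the $ anchor keeps the rename at the file extension.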

Now that you have these important text manipulation skills under your belt from Day 13, Day 14, and today’s post, let’s move on to the next post, where we’ll learn about another extremely useful skill for any data scientist: tables!

Jozsef Meszaros

Neuroscientist/cell biologist/data scientist at Columbia University