Matching spaces between words excluding newline characters using negative-lookahead regex

Abu Sakib
Blog of An Introvert
Jul 9, 2018

For some time now I’ve been learning regular expressions. They’re an awesome tool for quickly finding almost any pattern in almost any text. In the past I’d seen the option to search with regex in a lot of software, especially text editors and various terminal commands; now I finally understand it’s there because of their powerful capability of finding patterns quickly.

For my next project, I’m trying to cook up a program that will find common typos, like two or more spaces between words or multiple periods or exclamation marks, and correct them.

Then I found out about lookbehind and lookahead. These go beyond the basics of regex: they check what comes before or after the current position without making it part of the match.
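To make the difference between the two concrete, here is a minimal sketch using Python's re module. The sample string and the variable names are made up for illustration only; they are not part of the typo-fixing project.

```python
import re

# Made-up sample text, purely for illustration
text = "price: $40, weight: 75kg"

# Positive lookbehind (?<=...): digits that come right after a "$"
after_dollar = re.findall(r'(?<=\$)\d+', text)

# Negative lookbehind (?<!...): digits NOT preceded by "$" or another digit
not_after_dollar = re.findall(r'(?<![\$\d])\d+', text)

# Positive lookahead (?=...): digits that are followed by "kg"
before_kg = re.findall(r'\d+(?=kg)', text)

# Negative lookahead (?!...): digits NOT followed by "kg" or another digit
not_before_kg = re.findall(r'\d+(?!kg|\d)', text)

print(after_dollar, not_after_dollar, before_kg, not_before_kg)
```

In every case the "$" and the "kg" are only checked, never included in the match, which is what sets lookarounds apart from ordinary groups.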

So below is the regular expression I’ll use to match all spaces excluding newline:

(?!\n)\s+

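This expression tells the engine to find runs of whitespace characters that do not start with a newline. Strictly speaking, (?!\n) is a negative lookahead, not a lookbehind (a negative lookbehind would be written (?<!\n)): it checks that the next character is not a newline before \s+ starts matching. Keep in mind that the check happens only once, at the start of the match, so \s+ can still swallow a newline that sits in the middle of a run of spaces.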

The expression can be applied to the following text:

As you can see, there are two newlines and a hell of a lot of spaces. The regex can easily be tested and verified here.
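The same kind of check can be run locally. Below is a rough sketch on a stand-in string of my own (two newlines, lots of spaces), comparing the expression above with a stricter variant that repeats the lookahead for every character. The sample string and the names loose and strict are assumptions made for this example.

```python
import re

# Stand-in sample (an assumption): two newlines and several runs of spaces
sample = "too   many  \n  spaces\nhere"

# The expression from above: the lookahead is checked once per match
loose = re.findall(r'(?!\n)\s+', sample)

# Stricter variant: the lookahead is checked before every whitespace character
strict = re.findall(r'(?:(?!\n)\s)+', sample)

print(loose)   # one of these matches still contains a newline
print(strict)  # none of these matches contains a newline
```

The lone newline after "spaces" is skipped by both patterns, but the newline in the middle of the run after "many" is only excluded by the stricter one.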

We can use this regex in Python by using the re module and applying it to our desired text.

import re

# Compile once; (?!\n) only guards the first character of each match
spaceregex = re.compile(r'(?!\n)\s+')
# Apply it to your desired text, e.g. spaceregex.findall(text)
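One way the "apply to your desired text" step might look is sketched below. It is only a guess at the eventual program: the function name collapse_spaces, the constant MULTI_SPACE and the sample string are hypothetical, and the pattern is a per-character variant with {2,} so that single spaces and newlines are left alone.

```python
import re

# Hypothetical pattern: two or more whitespace characters, none of them a newline.
# The lookahead is repeated for each character, unlike in (?!\n)\s+.
MULTI_SPACE = re.compile(r'(?:(?!\n)\s){2,}')

def collapse_spaces(text):
    """Replace each run of 2+ non-newline whitespace characters with one space."""
    return MULTI_SPACE.sub(' ', text)

# Made-up sample text
sample = "Too   many    spaces.\nSecond  line  here."
fixed = collapse_spaces(sample)
print(fixed)
```

Using sub with a single space as the replacement keeps the words separated while the line breaks stay where they were.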

My idea for the project is to apply the corrections to text that has been copied to the clipboard, or to read from and write to files directly.
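The file-based half of that idea could be sketched roughly like this, using only the standard library. Everything here is an assumption for illustration: the names fix_spacing and fix_file, the UTF-8 encoding, and the choice to write to a separate target file. The clipboard half is left out because it would need a third-party package.

```python
import re
from pathlib import Path

# Hypothetical pattern: 2+ whitespace characters, none of them a newline
MULTI_SPACE = re.compile(r'(?:(?!\n)\s){2,}')

def fix_spacing(text):
    """Collapse runs of repeated spaces into a single space."""
    return MULTI_SPACE.sub(' ', text)

def fix_file(source, target):
    """Read source, correct the spacing, and write the result to target."""
    original = Path(source).read_text(encoding='utf-8')
    Path(target).write_text(fix_spacing(original), encoding='utf-8')
```

Calling fix_file('draft.txt', 'draft_fixed.txt') (both file names made up) would leave the original file untouched, which feels safer while the patterns are still being tuned.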

I’m gonna do a full follow-up as soon as I’m done with the whole thing.

If you liked my article, please give it a clap and share it. It’d mean a lot to me…
