Balancing Chemical Equations With Python

Mohammad-Ali Bandzar
Published in The Startup, 9 min read, May 27, 2020

We’ve all taken that pesky high school chem class where we were asked to balance chemical equations, and felt that mindless tasks like that could be automated.

Today we will be writing a short Python program designed to balance chemical equations. To start off, we will take user input using Python’s built-in input() function.

Before we go any further, I’d like to point out that the majority of the programming will go into populating a matrix with the quantities from our chemical equation, and that we will be importing a library to handle solving the matrix.

After taking in our user-inputted data, we are going to do the most obvious thing and remove any whitespace the user may have added, as that would mess with our parsing and our output formatting. This will be done using Python’s built-in .replace() string method. We are then going to split the string on plus signs to generate a list of chemicals. We will do this for both reactants and products.

Below is an example of what we have so far (I added print statements for both reactants and products):
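As a minimal sketch of this step, the snippet below uses two hard-coded strings in place of the input() calls so that it runs without typing anything. The equation itself (calcium hydroxide plus phosphoric acid) is an assumed example, not necessarily the one originally shown.

```python
# Stand-ins for input("Reactants: ") and input("Products: ");
# the equation is an assumed example.
reactants = "Ca(OH)2 + H3PO4"
products = "Ca3(PO4)2 + H2O"

# Remove spaces, then split on plus signs to get one string per compound.
reactants = reactants.replace(' ', '').split("+")
products = products.replace(' ', '').split("+")

print(reactants)  # ['Ca(OH)2', 'H3PO4']
print(products)   # ['Ca3(PO4)2', 'H2O']
```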

We are now going to loop through our two lists and call a function we haven’t written yet for each compound. We are going to pass the following arguments: the compound; an index, which will go from zero to one less than the total number of compounds (the combined length of the two lists), so our program knows which row of our matrix to modify; and, as our third parameter, which side of the chemical equation our compound is on: one for the left side and negative one for the right side. This is important because products should be entered into our matrix as negative numbers.
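A sketch of the two loops is below. Since compoundDecipher does not exist yet at this point, a placeholder that only records its arguments stands in for it; the `calls` list and the sample compounds are assumptions added purely for illustration.

```python
reactants = ['Ca(OH)2', 'H3PO4']   # assumed example data
products = ['Ca3(PO4)2', 'H2O']

calls = []  # hypothetical log so we can see what gets passed

def compoundDecipher(compound, index, side):
    # Placeholder: the real function is written in the next step.
    calls.append((compound, index, side))

for i in range(len(reactants)):
    compoundDecipher(reactants[i], i, 1)
for i in range(len(products)):
    # Product rows continue counting after the last reactant row.
    compoundDecipher(products[i], i + len(reactants), -1)

for call in calls:
    print(call)
```

Each compound gets its own row index (0 to 3 here), and the two products carry a side of -1.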

Now we must create the function, which I arbitrarily named compoundDecipher. The first thing we are going to do is separate out the parenthesized groups from the rest of the compound. We will do this using regular expressions. You will need to import the regular expression library at the top of the file with: import re. Then you can use a regex to separate out the parenthesized segments.

To break down the regex: the outermost parentheses (next to the single quotes) mark our capture group, which is the part we want to keep. The inner parentheses with a backslash before them mean that we want to find literal parenthesis characters (this is called escaping). The [A-Za-z0-9] indicates that we accept any letter (of either case) or digit within our parentheses, and the asterisk after the square brackets is a quantifier: it means we accept zero or more of those characters. The [0-9]* near the end indicates that we want to include all digits immediately to the right of the closing parenthesis in the captured segment. If you want to play around with regular expressions you can view this one here.
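To see what this split produces, here is a small sketch using the pattern from the complete code at the end of the article. It is written as a raw string (an adjustment so the backslashes reach the regex engine unchanged), and the two compounds are assumed examples.

```python
import re

# The capture group keeps each "(...)digits" chunk in the result list.
pattern = r'(\([A-Za-z0-9]*\)[0-9]*)'

print(re.split(pattern, 'Ca3(PO4)2'))  # ['Ca3', '(PO4)2', '']
print(re.split(pattern, 'H2O'))        # ['H2O']
```

A compound without parentheses comes back as a single segment, and a trailing parenthesized group leaves an empty string at the end of the list.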

We are now going to loop through our segments. If a segment begins with a (, we want to split it so that the number is separated from the rest. The number to the right of the closing parenthesis, which I refer to as a “multiplier”, will be the second item of our split, and the rest of the compound will be the first. We then use [1:] to remove the first character of that first item, as it’s an opening bracket we don’t need.

You can play with the regex used here.

Note that the print statements have been added so we can test the program out as shown below:
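A sketch of the function at this stage, with a print statement for each segment and its multiplier. One assumption beyond the description: a group written without a trailing number, such as "(OH)", is treated as having a multiplier of one (`int(parts[1] or 1)`), since int('') would otherwise raise an error.

```python
import re

def compoundDecipher(compound, index, side):
    segments = re.split(r'(\([A-Za-z0-9]*\)[0-9]*)', compound)
    for segment in segments:
        if segment.startswith("("):
            # '(PO4)2' -> ['(PO4', '2', '']
            parts = re.split(r'\)([0-9]*)', segment)
            # Assumption: a group with no trailing number counts once.
            multiplier = int(parts[1] or 1)
            segment = parts[0][1:]  # drop the leading "("
        else:
            multiplier = 1
        print(segment, multiplier)

compoundDecipher('Ca3(PO4)2', 2, -1)
```

For Ca3(PO4)2 this prints "Ca3 1", then "PO4 2", then a blank segment with multiplier 1 (the empty string left after the closing group).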

From here, all that’s left to do is isolate the individual elements and populate our matrix. To isolate our elements, I’m going to create a new function. I’m going to pass to it:

  1. segment
  2. index: so it knows which row of our matrix we are modifying
  3. multiplier: which we will set to one for segments that don’t have parentheses
  4. side: so our function can make our products negative in our matrix

Before writing this function, we are going to remove our print statements and modify our previous function compoundDecipher to call it as follows:
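A sketch of the modified compoundDecipher is below. Because findElements has not been written yet, a placeholder that records its arguments stands in for it; the `found` list is a hypothetical helper, and the `or 1` fallback for groups without a trailing number is an assumption.

```python
import re

found = []  # hypothetical log standing in for the real matrix work

def findElements(segment, index, multiplier, side):
    # Placeholder until the real function is written.
    found.append((segment, index, multiplier, side))

def compoundDecipher(compound, index, side):
    segments = re.split(r'(\([A-Za-z0-9]*\)[0-9]*)', compound)
    for segment in segments:
        if segment.startswith("("):
            parts = re.split(r'\)([0-9]*)', segment)
            multiplier = int(parts[1] or 1)  # assumption: bare "(OH)" means x1
            segment = parts[0][1:]
        else:
            multiplier = 1
        # Hand every segment on, together with its row, multiplier and sign.
        findElements(segment, index, multiplier, side)

compoundDecipher('Ca(OH)2', 0, 1)
print(found)  # [('Ca', 0, 1, 1), ('OH', 0, 2, 1), ('', 0, 1, 1)]
```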

We will start off by separating out the elements and numbers using a regex, and then loop through every item with a while loop.
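As an illustration, this is what the element-splitting regex from the complete code returns for two assumed sample segments. The pattern matches one capital letter optionally followed by one lowercase letter, so two-letter symbols such as Ca stay together.

```python
import re

print(re.split('([A-Z][a-z]?)', 'H3PO4'))  # ['', 'H', '3', 'P', '', 'O', '4']
print(re.split('([A-Z][a-z]?)', 'Ca'))     # ['', 'Ca', '']
```

The empty strings are the gaps between adjacent matches; they are the "blanks" that the loop has to skip.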

You can play with the regex used here.

Within the loop, we start off by checking if the length of the current item is greater than zero, because the split leaves us with some blanks. If it is, we check whether the item after it is a number. If so, we multiply that number by the multiplier to get the count and skip ahead past the number; if not, the count is just the multiplier. In both cases we call another function we haven’t made yet to add the element to a matrix we are yet to define.
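The loop described above can be sketched as follows. addToMatrix is replaced here by a placeholder that records its arguments (the `added` list is hypothetical), and the sample call is an assumed example.

```python
import re

added = []  # hypothetical log; the real addToMatrix comes next

def addToMatrix(element, index, count, side):
    added.append((element, index, count, side))

def findElements(segment, index, multiplier, side):
    elementsAndNumbers = re.split('([A-Z][a-z]?)', segment)
    i = 0
    while i < len(elementsAndNumbers) - 1:
        i += 1
        if len(elementsAndNumbers[i]) > 0:           # skip the blanks
            if elementsAndNumbers[i + 1].isdigit():  # element followed by a number
                count = int(elementsAndNumbers[i + 1]) * multiplier
                addToMatrix(elementsAndNumbers[i], index, count, side)
                i += 1                               # step over the number
            else:                                    # no number means one atom
                addToMatrix(elementsAndNumbers[i], index, multiplier, side)

findElements('PO4', 2, 2, -1)  # the "(PO4)2" part of Ca3(PO4)2
print(added)  # [('P', 2, 2, -1), ('O', 2, 8, -1)]
```

The multiplier of 2 doubles both counts: one phosphorus becomes 2 and four oxygens become 8.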

So our new function which I have called addToMatrix has the following parameters:

  1. elementName: the name of the element (should be the same as on the periodic table)
  2. index: the row of the matrix to insert the data into
  3. count: how many of that particular element to add to our matrix
  4. side: 1 for reactants and -1 for products

Before we begin writing our new function, we want to create two global lists which I will call elementList and elementMatrix.

I like to create global variables directly below imported modules.

Now we can start writing our final function. To start off, we will check whether elementMatrix needs a new row: if our index is equal to the length of elementMatrix, we know that we need to create one.

Inside this if statement we are going to fill the row we just created with the same number of zeros as we have elements (the length of elementList).

Now we are going to check if the element we want to add to our row is one we’ve NOT seen before. We will do this by checking if it can’t be found in elementList. If so, we are going to add it into elementList and we are going to add a zero to every row in our matrix.

Now we are going to locate the column we want to modify by calling the index function on elementList. We are then going to modify elementMatrix at that row and column, incrementing it by the product of the element count and the side (to give it the correct sign).

Our function should now look as follows:
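A sketch of addToMatrix following the three steps just described, together with the two global lists. The calls at the bottom enter Ca(OH)2 and H2O by hand as an assumed example, so the function can be tried on its own.

```python
elementList = []
elementMatrix = []

def addToMatrix(element, index, count, side):
    # New row index: create the row with one zero per known element.
    if index == len(elementMatrix):
        elementMatrix.append([0] * len(elementList))
    # New element: it gets a new column, so pad every row with a zero.
    if element not in elementList:
        elementList.append(element)
        for row in elementMatrix:
            row.append(0)
    column = elementList.index(element)
    elementMatrix[index][column] += count * side

# Ca(OH)2 as reactant row 0
addToMatrix('Ca', 0, 1, 1)
addToMatrix('O', 0, 2, 1)
addToMatrix('H', 0, 2, 1)
# H2O as product row 1, so its counts go in as negatives
addToMatrix('H', 1, 2, -1)
addToMatrix('O', 1, 1, -1)

print(elementList)    # ['Ca', 'O', 'H']
print(elementMatrix)  # [[1, 2, 2], [0, -1, -2]]
```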

Now, as a sort of sanity check, I’ve added the following print statements at the very bottom of our program so we can test it out.

We can now try to decipher what we are seeing. The second line from the bottom represents elementList, in which we can see that the program located 4 elements (Ca, O, H, P), which, looking at the equation, appears to be all of them. The last line represents elementMatrix; each set of square brackets within this line represents a row in our matrix. The first row of our matrix is (1,2,2,0), which using the elementList data represents 1 calcium, 2 oxygen, 2 hydrogen and 0 phosphorus, matching what we see in our first reactant compound. The same applies to every row in our matrix, with the rows representing products being negative.
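To make that reading easier, the sketch below pairs each row with its compound and each count with its element name. The first row and the element order come from the output discussed above; the other three rows assume the equation is Ca(OH)2 + H3PO4 -> Ca3(PO4)2 + H2O, which is a guess consistent with those four elements.

```python
# Values for the assumed example Ca(OH)2 + H3PO4 -> Ca3(PO4)2 + H2O
elementList = ['Ca', 'O', 'H', 'P']
compounds = ['Ca(OH)2', 'H3PO4', 'Ca3(PO4)2', 'H2O']
elementMatrix = [
    [1, 2, 2, 0],
    [0, 4, 3, 1],
    [-3, -8, 0, -2],
    [0, -1, -2, 0],
]

for compound, row in zip(compounds, elementMatrix):
    # Pair each count with its element name and hide the zeros.
    counts = {el: n for el, n in zip(elementList, row) if n != 0}
    print(compound, counts)
```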

All that’s left to do is solve the matrix and output the result in a beautiful fashion. To solve the matrix we will be using SymPy, so we will start off by importing the parts of it we need: the Matrix class and the least common multiple function (lcm).

We are going to start off by converting our “matrix” into one SymPy understands:

We are now going to transpose our matrix (swap rows and columns), because we want each column to correspond to one compound in our chemical equation, and therefore to one coefficient we are solving for.
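A sketch of the conversion and the transpose, using SymPy’s Matrix class. The numbers are the rows for the assumed example Ca(OH)2 + H3PO4 -> Ca3(PO4)2 + H2O, hard-coded so the snippet runs by itself.

```python
from sympy import Matrix

# Rows = compounds, columns = elements (Ca, O, H, P); assumed example values.
elementMatrix = [
    [1, 2, 2, 0],
    [0, 4, 3, 1],
    [-3, -8, 0, -2],
    [0, -1, -2, 0],
]

elementMatrix = Matrix(elementMatrix)      # plain lists -> SymPy matrix
elementMatrix = elementMatrix.transpose()  # now rows = elements, columns = compounds
print(elementMatrix.tolist())
```

After the transpose, each row is one element’s balance equation: reading across the first row, calcium appears once in the first compound and three times (negated) in the third.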

If we were to run this directly through the matrix solver, it would return all zeros, because all-zero coefficients technically balance any equation. To find our desired solution we will instead calculate the matrix’s null space. SymPy returns it as a list of column vectors, and we only need the first; any multiple of that vector is a valid solution to our matrix.

To better see how this works here’s an example:
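The sketch below computes the null space for the assumed example Ca(OH)2 + H3PO4 -> Ca3(PO4)2 + H2O, with the transposed matrix hard-coded. Multiplying the matrix by the resulting vector gives all zeros, which is exactly the statement that every element balances.

```python
from sympy import Matrix

# Transposed matrix for the assumed example:
# rows are Ca, O, H, P; columns are Ca(OH)2, H3PO4, Ca3(PO4)2, H2O.
A = Matrix([
    [1, 0, -3, 0],
    [2, 4, -8, -1],
    [2, 3, 0, -2],
    [0, 1, -2, 0],
])

basis = A.nullspace()    # list of column vectors
solution = basis[0]
print(solution.T)        # the coefficients, possibly as fractions
print((A * solution).T)  # all zeros: every element balances
```

The coefficients come out in the ratio 3 : 2 : 1 : 6, though not necessarily as whole numbers.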

If we use those values as coefficients in our reaction, we technically have a valid solution. But since integer coefficients are preferred in chemical equations, we find the least common multiple of the denominators of our coefficients and multiply every coefficient by it.
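A sketch of the scaling step, starting from a hard-coded fractional vector for the assumed example equation. In SymPy, the `.q` attribute of a rational number is its denominator (1 for whole numbers), so the least common multiple of all the `.q` values is the smallest factor that clears every fraction.

```python
from sympy import Matrix, Rational, lcm

# A fractional null-space vector for the assumed example equation.
solution = Matrix([Rational(1, 2), Rational(1, 3), Rational(1, 6), 1])

# Denominators are 2, 3, 6 and 1, so their least common multiple is 6.
multiple = lcm([val.q for val in solution])
solution = multiple * solution

print(multiple)           # 6
print(solution.tolist())  # [[3], [2], [1], [6]]
```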

We are now technically done, but I want a cleaner-looking result after putting in all this effort, so I’ve written a couple of lines of code to better format our output.

It now looks as follows:
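As a sketch, here is the formatting loop from the complete code, run on hard-coded coefficients for the assumed example equation so that it works without the rest of the program.

```python
reactants = ['Ca(OH)2', 'H3PO4']  # assumed example
products = ['Ca3(PO4)2', 'H2O']
coEffi = [[3], [2], [1], [6]]     # same shape as solution.tolist()

output = ""
for i in range(len(reactants)):
    output += str(coEffi[i][0]) + reactants[i]
    if i < len(reactants) - 1:
        output += " + "
output += " -> "
for i in range(len(products)):
    # Product coefficients sit after the reactant coefficients.
    output += str(coEffi[i + len(reactants)][0]) + products[i]
    if i < len(products) - 1:
        output += " + "
print(output)  # 3Ca(OH)2 + 2H3PO4 -> 1Ca3(PO4)2 + 6H2O
```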

Thanks for reading! You are now able to answer all those pesky balancing chemical equations questions on Khan Academy.

The complete code is attached below:

import re
from sympy import Matrix, lcm

elementList=[]    # column labels: one entry per distinct element
elementMatrix=[]  # one row per compound, one column per element

print("please input your reactants, this is case sensitive")
print("your input should look like: H2O+Ag3(Fe3O)4")
reactants=input("Reactants: ")
print("please input your products, this is case sensitive")
products=input("Products: ")
reactants=reactants.replace(' ', '').split("+")
products=products.replace(' ', '').split("+")

def addToMatrix(element, index, count, side):
    # create the row for this compound the first time we see its index
    if(index == len(elementMatrix)):
        elementMatrix.append([])
        for x in elementList:
            elementMatrix[index].append(0)
    # a new element needs a new column in every row
    if(element not in elementList):
        elementList.append(element)
        for i in range(len(elementMatrix)):
            elementMatrix[i].append(0)
    column=elementList.index(element)
    elementMatrix[index][column]+=count*side

def findElements(segment, index, multiplier, side):
    elementsAndNumbers=re.split('([A-Z][a-z]?)',segment)
    i=0
    while(i<len(elementsAndNumbers)-1):#first item is always blank
        i+=1
        if(len(elementsAndNumbers[i])>0):
            if(elementsAndNumbers[i+1].isdigit()):
                count=int(elementsAndNumbers[i+1])*multiplier
                addToMatrix(elementsAndNumbers[i], index, count, side)
                i+=1
            else:
                addToMatrix(elementsAndNumbers[i], index, multiplier, side)

def compoundDecipher(compound, index, side):
    # raw strings keep the backslashes intact for the regex engine
    segments=re.split(r'(\([A-Za-z0-9]*\)[0-9]*)',compound)
    for segment in segments:
        if segment.startswith("("):
            segment=re.split(r'\)([0-9]*)',segment)
            # a group with no trailing number, e.g. (OH), counts once
            multiplier=int(segment[1] or 1)
            segment=segment[0][1:]
        else:
            multiplier=1
        findElements(segment, index, multiplier, side)

for i in range(len(reactants)):
    compoundDecipher(reactants[i],i,1)
for i in range(len(products)):
    compoundDecipher(products[i],i+len(reactants),-1)

elementMatrix = Matrix(elementMatrix)
elementMatrix = elementMatrix.transpose()
solution=elementMatrix.nullspace()[0]
# scale by the lcm of the denominators to get whole-number coefficients
multiple = lcm([val.q for val in solution])
solution = multiple*solution
coEffi=solution.tolist()

output=""
for i in range(len(reactants)):
    output+=str(coEffi[i][0])+reactants[i]
    if i<len(reactants)-1:
        output+=" + "
output+=" -> "
for i in range(len(products)):
    output+=str(coEffi[i+len(reactants)][0])+products[i]
    if i<len(products)-1:
        output+=" + "
print(output)
