How to Easily Automate Word (DOC/DOCX) to PDF Conversion in Batch with Python (Updated)

Umar Farooq Khan
Published in The Startup
5 min read · Dec 4, 2020

Sometimes we want to convert .docx/.doc (Word) files to PDF without going through a long series of manual steps. Doing it by hand gets tedious when you convert files daily or even hourly, and converting many files in a batch makes the task harder still.

I ran into this problem while job hunting and decided to automate it. After every round of updates I had to convert my CV and cover letter to PDF, and some values changed with each application: two or three values in the cover letter, plus the company name and the position. It gets worse if you are a jack of all trades with skills in, say, Android development, data analysis, and machine learning. In that case you have to update a lot in your CV and cover letter to personalize each application and improve your chances of landing a job.

Note: I am planning to write an article for people in the job-hunting phase. In it I will share the script I made to automate the whole process: updating the values that change, such as the company name and position in the application, and then converting the files.

So is there any harm in wasting your time?

Yes. Time is valuable, and repetitive, time-consuming tasks are exactly the ones we should focus on automating.

This is where Python helps: a short script can update and convert your files in a matter of seconds, with no manual clicking.

Method 1: Pick the file with a dialog

Use this method if you want a good-looking, easy-to-use file picker.

For the file picker dialog we use tkinter, Python's standard GUI library:

from tkinter.filedialog import askopenfilename

The Word-to-PDF conversion itself is done with comtypes, which controls Microsoft Word through COM, so this approach needs Windows with Word installed. File paths are handled with the os module:

import comtypes.client
import os

We can then prompt the user with a file dialog:

filename = askopenfilename()

The returned filename already contains the full path of the selected file, including its extension, so you don't need to type either one yourself. Convert it to an absolute path:

in_file = os.path.abspath(filename)
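Two details are worth handling at this step. askopenfilename() returns an empty value when the user cancels the dialog, and Tk typically reports paths with forward slashes even on Windows; normalizing the path turns them into the usual backslash form (os.path is ntpath on Windows). Below is a minimal sketch, assuming a hypothetical helper named prepare_input() and a made-up path "cv.docx"; it calls ntpath directly so the Windows behaviour can be tried on any OS:

```python
import ntpath


def prepare_input(filename):
    """Return a normalized Windows path for the picked file, or None if cancelled.

    prepare_input is a hypothetical helper, not part of tkinter.
    """
    if not filename:
        # askopenfilename() gives back an empty value when the dialog is cancelled
        return None
    # normpath turns "C:/Users/..." into "C:\\Users\\..." and collapses ".." parts
    return ntpath.normpath(filename)


print(prepare_input("C:/Users/Hp/Desktop/cv.docx"))
print(prepare_input(""))
```

Returning None lets the rest of the script stop cleanly instead of handing Word an empty path.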

Now specify the output path, including the file name:

out_file = os.path.abspath(r"C:\Users\Hp\Desktop\Umar")

Note:

  1. You don't need to add the file extension (.pdf in our case).
  2. When writing Windows paths, prefix the string with r to make it a raw string, so that backslashes are not treated as escape characters.
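To see why the r prefix matters: in a normal string, the \n inside "C:\new_folder" becomes a newline character, while the raw string keeps the backslash. One catch is that a raw string cannot end with a single backslash, so joining a folder and a file name with ntpath.join (the same function as os.path.join on Windows) is safer than adding strings together. The folder names here are only examples:

```python
import ntpath

normal = "C:\new_folder"   # "\n" is read as a newline escape
raw = r"C:\new_folder"     # the backslash is kept literally

print("\n" in normal)  # True: the path is silently broken
print("\n" in raw)     # False

# r"C:\Users\Hp\Desktop\" would be a syntax error, so join the parts instead
out_file = ntpath.join(r"C:\Users\Hp\Desktop", "Umar")
print(out_file)  # C:\Users\Hp\Desktop\Umar
```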

Now create a Word application object with comtypes:

word = comtypes.client.CreateObject('Word.Application')

Open the input file and save it to the output path. FileFormat=17 is Word's code for PDF (the wdFormatPDF constant).

doc = word.Documents.Open(in_file)
doc.SaveAs(out_file, FileFormat=17)

Finally, close the document, which is good practice, and quit Word so that no instance is left running in the background:

doc.Close()
word.Quit()
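The steps above can be wrapped in a reusable function. This is a sketch, not the article's original code: convert_to_pdf and WD_FORMAT_PDF are hypothetical names, the Word object is passed in so that Word can be started once and reused for many files, and the try/finally makes sure the document is closed even if SaveAs fails:

```python
import os

WD_FORMAT_PDF = 17  # the same FileFormat code used above (Word's wdFormatPDF)


def convert_to_pdf(word, in_file, out_file):
    """Save one Word document as PDF (hypothetical helper).

    `word` is expected to be the object returned by
    comtypes.client.CreateObject('Word.Application').
    """
    doc = word.Documents.Open(os.path.abspath(in_file))
    try:
        doc.SaveAs(os.path.abspath(out_file), FileFormat=WD_FORMAT_PDF)
    finally:
        # Always release the document, even if SaveAs raises
        doc.Close()
```

On Windows with Word installed, you would create the Word object with comtypes.client.CreateObject('Word.Application'), call convert_to_pdf(word, in_file, out_file) for each file, and call word.Quit() once at the end.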

Method 2: Convert the files in batch

The second method works the same way as the first. If we don't want to choose the file from a dialog, we can drop tkinter and give the input file path directly:

in_file_path= r"C:\Users\Hp\Desktop\inputfile.docx"

To convert files in batch, the program needs the paths of all the input files. For that we use the glob module:

import glob

To list all files with a specific extension in a directory, pass glob a pattern made of the directory path followed by *.docx (or whichever extension you need). Only files of that type are returned:

fileslist=glob.glob(r"C:\Users\Hp\Downloads\*.docx")

This gives us a list of full file paths, including the file names.
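The same idea can be extended to pick up both .docx files and older .doc files. Below is a sketch, assuming a hypothetical helper named find_word_files(); it also skips the "~$..." owner files that Word creates next to a document while it is open, since glob would otherwise return them as well:

```python
import glob
import os


def find_word_files(folder, patterns=("*.docx", "*.doc")):
    """Return the sorted full paths of Word files in folder (hypothetical helper)."""
    paths = []
    for pattern in patterns:
        # e.g. C:\Users\Hp\Downloads\*.docx, then C:\Users\Hp\Downloads\*.doc
        paths.extend(glob.glob(os.path.join(folder, pattern)))
    # Word creates "~$name.docx" owner files while a document is open;
    # they are not real documents, so leave them out
    return sorted(p for p in paths if not os.path.basename(p).startswith("~$"))


print(find_word_files(r"C:\Users\Hp\Downloads"))
```

Sorting the result makes the conversion order predictable from run to run.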

Because we are converting in batch, we also have to generate the output file names ourselves. Giving each PDF the same name as its input file keeps full control over which output belongs to which input, which is an important step.

For that we use the re (regular expression) module, which is my personal favourite library; regex problems are my favourite ones to solve.

import re

glob.glob() returns each file name together with its full path, but to name the output files we only need the file name itself.

Output of glob.glob():

['C:\\Users\\Hp\\Downloads\\Hidden Gem.docx', 'C:\\Users\\Hp\\Downloads\\sample (1).docx', 'C:\\Users\\Hp\\Downloads\\resume of PI.docx', 'C:\\Users\\Hp\\Downloads\\Sources.docx', 'C:\\Users\\Hp\\Downloads\\TALK SHOW.docx']

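So we have to extract just the file name from each full path.

We use re.sub(), which replaces every match of a regular expression with a substitute string (here, an empty string). The regular expression below removes the folder part: .* is greedy, so it matches everything up to and including the last backslash. It must be written as a raw string; in a normal string, "\\" would reach the regex engine as a lone trailing backslash, which is an error.

regex_filename = r".*\\"

Passing it to re.sub() leaves only the file name: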

filename=re.sub(regex_filename, "", fileslist[i])

This still leaves the extension on the file name, which, as mentioned earlier, we don't need. To remove it, use the following regex, where \. matches a literal dot and $ anchors the match to the end of the name:

regex_withoutext = r"\.docx$"
filename = re.sub(regex_withoutext, "", filename)
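Both regular expressions can be checked against the sample glob output above. The sketch below applies the extension pattern r"\.docx$" (a literal dot, anchored to the end) and compares it with an approach that avoids regex entirely: ntpath.basename plus ntpath.splitext, which are the same as os.path.basename and os.path.splitext on Windows. stem_with_regex and stem_with_ntpath are hypothetical helper names:

```python
import ntpath
import re

# File paths from the glob.glob() output shown above
fileslist = [
    'C:\\Users\\Hp\\Downloads\\Hidden Gem.docx',
    'C:\\Users\\Hp\\Downloads\\sample (1).docx',
    'C:\\Users\\Hp\\Downloads\\resume of PI.docx',
    'C:\\Users\\Hp\\Downloads\\Sources.docx',
    'C:\\Users\\Hp\\Downloads\\TALK SHOW.docx',
]

regex_filename = r".*\\"       # greedy: everything up to the last backslash
regex_withoutext = r"\.docx$"  # a literal ".docx" at the end of the name


def stem_with_regex(path):
    name = re.sub(regex_filename, "", path)
    return re.sub(regex_withoutext, "", name)


def stem_with_ntpath(path):
    # basename drops the folder, splitext splits off the extension
    return ntpath.splitext(ntpath.basename(path))[0]


for path in fileslist:
    print(stem_with_regex(path), "|", stem_with_ntpath(path))
```

The ntpath version also works unchanged for .doc files, while the regex needs a second pattern for them.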

The full code to convert .docx files to PDF in batch:

import glob
import os
import re
import comtypes.client

fileslist = glob.glob(r"C:\Users\Hp\Downloads\*.docx")
output_dir = r"C:\Users\Hp\Desktop"
regex_filename = r".*\\"       # matches the folder part of the path
regex_withoutext = r"\.docx$"  # matches the .docx extension at the end

# Start Word once and reuse it for every file
word = comtypes.client.CreateObject('Word.Application')
try:
    for path in fileslist:
        filename = re.sub(regex_filename, "", path)
        filename = re.sub(regex_withoutext, "", filename)
        in_file = os.path.abspath(path)
        out_file = os.path.join(output_dir, filename)

        doc = word.Documents.Open(in_file)
        doc.SaveAs(out_file, FileFormat=17)  # 17 = PDF
        doc.Close()
finally:
    # Quit Word even if one of the conversions fails
    word.Quit()
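One possible refinement, sketched under assumptions: batch_convert is a hypothetical helper that names each output after its input (without the extension, as in the code above) and keeps going if one file fails, collecting the failures instead of stopping the whole batch. The conversion itself is passed in as convert(in_file, out_file), so any function that wraps the Documents.Open, SaveAs(..., FileFormat=17) and Close calls can be plugged in:

```python
import os


def batch_convert(fileslist, output_dir, convert):
    """Convert every file in fileslist and return (converted, failed).

    convert(in_file, out_file) does the actual conversion, e.g. a small
    function that drives Word through comtypes (hypothetical interface).
    """
    converted, failed = [], []
    for path in fileslist:
        # Same name as the input, without its extension
        name = os.path.splitext(os.path.basename(path))[0]
        out_file = os.path.join(output_dir, name)
        try:
            convert(os.path.abspath(path), out_file)
            converted.append(out_file)
        except Exception as exc:  # keep going if one document fails
            failed.append((path, exc))
    return converted, failed
```

Printing the failed list at the end shows which documents need attention, while the rest of the batch is still converted.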

Note:

After running this code, you may see a prompt dialog like this:

Prompt Dialog Box

Just select the first option, click OK, and then close the Word application; after that, everything should work as expected.


Programmer by Profession | Data scientist | Python Developer | Android developer