Python “lxml” Memory Leak

陳信宏 Ted Chen
Published in DevOps’ hole · May 19, 2022

A memory leak in the Python “lxml” package, and a solution found on the Internet

Background

  • I’m building a Flask web service that scrapes stock data
  • The service runs on an instance with 512 MB of memory
  • The service runs in a Docker container
  • It uses “lxml” with XPath to parse HTML strings

“lxml” Memory Leak

I wrote a web service that runs in Docker, with schedulers that scrape data every day. Every few days, the OS killed the service because it ran out of memory (OOM). To trace memory usage, I used “memory_profiler” and set the scheduler to run every minute locally.

Figure: “lxml” memory usage while parsing an HTML string
Figure: memory usage outside the “lxml” function
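The post traced memory with “memory_profiler” but doesn't show the exact setup. As a dependency-free alternative (a swap on my part, not the author's tool), here is a minimal sketch that runs a job repeatedly, the way the every-minute scheduler did, and records the process's resident memory (RSS) after each run. It assumes Linux, which matches the Docker setup, because it reads /proc/self/statm; `current_rss_mb` and `trace_memory` are hypothetical helper names.

```python
import gc
import os
from typing import Callable, List


def current_rss_mb() -> float:
    """Resident set size of this process in MB.

    Linux-only: the second field of /proc/self/statm is the number
    of resident pages.
    """
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def trace_memory(job: Callable[[], object], runs: int = 5) -> List[float]:
    """Run `job` repeatedly, as a scheduler would, and record RSS after each run."""
    readings = []
    for i in range(runs):
        job()
        # Force a full collection first, so growth that persists is not
        # just garbage waiting for the next automatic GC pass.
        gc.collect()
        rss = current_rss_mb()
        readings.append(rss)
        print(f"run {i + 1}: RSS = {rss:.1f} MB")
    return readings
```

Calling it with the scraping job, for example `trace_memory(scrape_job, runs=60)` with a hypothetical `scrape_job`, gives one reading per run. RSS is only a rough signal, since the allocator may keep freed memory around instead of returning it to the OS, so look for steady growth that never levels off rather than for exact numbers.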

Even after a while, the service still held on to that memory. Normally, garbage collection periodically releases unused memory, but it did not seem to work here. I searched online and found many threads about “lxml” memory leaks.

Solution—Run “lxml” Function in Sub-Process

I found this post on Reddit. Its author runs the “lxml” function in a sub-process and terminates the sub-process once the function finishes, so any memory that was never freed is released when the sub-process exits.

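```python
import multiprocessing

from lxml import etree


def lxml_func():
    ...  # elided in the original; `response` comes from here
    tree = etree.HTML(response.text)
    ...  # elided in the original; XPath queries on `tree` build `result`
    return result


def subprocess_function(results_queue):
    # Runs in the sub-process: parse, then hand the result to the main process
    results_queue.put(lxml_func())


if __name__ == "__main__":
    # Pass the queue explicitly so parent and child share the same queue
    # (a module-level queue would be re-created in the child under "spawn")
    results_queue = multiprocessing.Queue()
    parse_process = multiprocessing.Process(target=subprocess_function, args=(results_queue,))
    parse_process.daemon = True
    parse_process.start()
    result = results_queue.get()  # blocks until results are available
    parse_process.terminate()  # kill the sub-process; its leaked memory goes with it
    parse_process.join()  # reap the terminated process
```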

Memory is still consumed inside the sub-process, but the main process’s memory stays flat. Just make sure you terminate the sub-process after the “lxml” function has finished.

Figure: memory usage of the “lxml” function in the sub-process
Figure: memory usage outside the “lxml” function, in the main process
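The Reddit snippet assumes the parse always succeeds. If `lxml_func` raises, nothing is ever put on the queue and `results_queue.get()` blocks the main process forever. A more defensive helper, my own sketch rather than the Reddit author's code, sends back either the result or the error, gives up after a timeout, and always terminates and joins the sub-process. `run_in_subprocess` is a hypothetical name, the 60-second default timeout is an arbitrary assumption, and the explicit "fork" start method assumes Linux, as in the Docker setup.

```python
import multiprocessing
import queue
from typing import Any, Callable


def _child(results, func, args):
    """Runs in the sub-process: call func and send back the result or the error."""
    try:
        results.put(("ok", func(*args)))
    except Exception as exc:  # report failures instead of leaving the parent blocked
        results.put(("error", repr(exc)))


def run_in_subprocess(func: Callable[..., Any], *args: Any, timeout: float = 60.0) -> Any:
    """Call func(*args) in a throwaway sub-process and return its result.

    Memory that func leaks stays in the child and is returned to the OS
    when the child exits. Assumes Linux (the "fork" start method).
    """
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    proc = ctx.Process(target=_child, args=(results, func, args), daemon=True)
    proc.start()
    try:
        # Read before joining: a child blocked on a full pipe would never exit.
        status, payload = results.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"sub-process did not finish within {timeout} s") from None
    finally:
        # Always tear the child down, whether it succeeded, failed or hung.
        proc.terminate()
        proc.join()
    if status == "error":
        raise RuntimeError(f"sub-process failed: {payload}")
    return payload
```

In the scraper this could be called as `run_in_subprocess(parse_html, html_text)` with a hypothetical `parse_html`. Results travel back through the queue by pickling, so the function should return plain data (strings, lists, dicts) rather than lxml elements.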

It’s more of a workaround than a real fix, but I need “lxml” to parse data with XPath.
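Because the workaround has to stay for as long as the service relies on “lxml”, it helps to make it cheap to reuse. One variant, my suggestion rather than something from the post, lets `multiprocessing.Pool` recycle processes: with `maxtasksperchild=1`, each worker exits after a single task and the pool starts a fresh one, so whatever a parse leaks is returned to the OS when its worker exits. `parse_job` is a hypothetical stand-in for the real “lxml” function (it reports its worker's PID so the recycling is visible), `parse_with_recycled_workers` is a hypothetical name, and the sketch assumes Linux for the "fork" start method.

```python
import multiprocessing
import os
from typing import List, Tuple


def parse_job(html_text: str) -> Tuple[int, int]:
    """Hypothetical stand-in for the lxml parsing function.

    Returns the worker's PID (to show which process ran it) and the input length.
    """
    return os.getpid(), len(html_text)


def parse_with_recycled_workers(pages: List[str]) -> List[Tuple[int, int]]:
    # Assumption: Linux (Docker), so the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    # maxtasksperchild=1: each worker exits after one task and the pool
    # starts a fresh process, so memory leaked by that task goes back to the OS.
    pool = ctx.Pool(processes=1, maxtasksperchild=1)
    try:
        return [pool.apply(parse_job, (page,)) for page in pages]
    finally:
        pool.close()
        pool.join()
```

Tasks sent to a pool are pickled, so the real parsing function must be defined at module level and, as with the queue, return plain data rather than lxml elements.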

Summary

The Python “lxml” package has memory leak issues. Running the leaking function in a sub-process means the memory it occupies is released when the sub-process terminates.
