How we reduced the CPU usage of our Lua code

We discovered Lua and Openresty about a year ago, and when we realized everything we could do with them, we started deploying them on all our sites.

We started moving logic from Varnish to Openresty to make upgrading our Varnish cluster easier, and even translated some of our PHP code to Lua for improved security and a better user experience.

One thing we could not figure out was why the CPU usage of the boxes running Openresty kept increasing. We couldn’t let the boxes crash when CPU usage reached 100%, so we added a cron job to reload nginx every hour. This would kill all the workers and reset the CPU usage to a lower level.

Total CPU usage of one of the Openresty servers over 24h

Obviously, this was not a long-term solution, and we had to find the real issue in our code. We started a thorough investigation, trying to optimize every line and remove all unnecessary classes. It took us two weeks to rewrite everything, keeping the code as small and as optimized as possible to avoid any leak.

Unfortunately, after we deployed this new code, the CPU usage was still the same. We couldn’t explain it, so we started disabling the Lua scripts one by one to find which one was causing the increase. No matter which script we kept, a single script was enough to make the CPU usage increase.

And then it hit me. One morning on my way to work, I suddenly realized what the issue was. To produce that steady climb on the graphs, it had to be something that made the code do more work over time. I looked at the code again, and my suspicions were confirmed.

All our Lua scripts had the following line of code right at the beginning of the file:

package.path = package.path .. ";/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua"

What we did not realize when we were coding is that this makes package.path grow without bound. package.path is shared by every request a worker handles, and because we were appending the Lua folders to the existing value, each execution of a script appended the custom paths once more.
On the first request, our package.path would look like this:

# the custom include path is already present 6 times
/usr/local/openresty/site/lualib/?.ljbc;/usr/local/openresty/site/lualib/?/init.ljbc;/usr/local/openresty/lualib/?.ljbc;/usr/local/openresty/lualib/?/init.ljbc;/usr/local/openresty/site/lualib/?.lua;/usr/local/openresty/site/lualib/?/init.lua;/usr/local/openresty/lualib/?.lua;/usr/local/openresty/lualib/?/init.lua;./?.lua;/usr/local/openresty/luajit/share/luajit-2.1.0-beta3/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua;/usr/local/openresty/luajit/share/lua/5.1/?.lua;/usr/local/openresty/luajit/share/lua/5.1/?/init.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua

And after 1 000 requests, it would look like this:

# I cut the central piece to avoid too much text, but after 1 000 requests, the custom paths were present 110 times.
/usr/local/openresty/site/lualib/?.ljbc;/usr/local/openresty/site/lualib/?/init.ljbc;/usr/local/openresty/lualib/?.ljbc;/usr/local/openresty/lualib/?/init.ljbc;/usr/local/openresty/site/lualib/?.lua;/usr/local/openresty/site/lualib/?/init.lua;/usr/local/openresty/lualib/?.lua;/usr/local/openresty/lualib/?/init.lua;./?.lua;/usr/local/openresty/luajit/share/luajit-2.1.0-beta3/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua;/usr/local/openresty/luajit/share/lua/5.1/?.lua;/usr/local/openresty/luajit/share/lua/5.1/?/init.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;;;;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;
[...]
;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua
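The growth is easy to reproduce outside nginx. Here is a minimal sketch, assuming plain Lua 5.1 or LuaJIT. The `simulate` and `count_entries` helpers are hypothetical names used only for illustration, and the loop stands in for the same script being executed again and again inside one worker. Each entry in the path is a template that `require` may have to try when it looks for a module, so a longer path means more work per lookup.

```lua
-- Same suffix our scripts were appending on every execution.
local CUSTOM = ";/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua"

-- Count the non-empty templates in a package.path-style string.
local function count_entries(path)
  local n = 0
  for _ in path:gmatch("[^;]+") do
    n = n + 1
  end
  return n
end

-- Stand-in for `runs` executions of a script that appends to the path.
local function simulate(base, runs)
  local path = base
  for _ = 1, runs do
    path = path .. CUSTOM
  end
  return path
end

local base = "./?.lua"
local after_10 = simulate(base, 10)

-- 1 entry at the start, 1 + 10 * 3 = 31 entries after ten executions.
print(count_entries(base), count_entries(after_10))
```

The path never shrinks until the worker is restarted, which matches what we saw: the hourly reload reset the CPU usage, and then it started climbing again.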

At least we knew what was causing all our troubles; we just needed a way to fix it.
The Openresty documentation is pretty good, and we quickly found what we needed: https://github.com/openresty/lua-nginx-module#lua_package_path
We removed all the package.path concatenations from our Lua scripts and added this one line of nginx configuration:

# in nginx.conf, we added the following, in the http{} block:
lua_package_path '/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua;;';
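If a script really has to adjust package.path from Lua, the append can at least be made idempotent. This is not what we deployed, only a sketch of an alternative: `append_once` is a hypothetical helper, and it assumes the custom paths are always added as one identical string.

```lua
local CUSTOM = "/usr/local/nginx/includes.d/?.lua;/home/user/lua/?.lua;/home/lua/?.lua"

-- Append `extra` to `path` only if it is not already there.
local function append_once(path, extra)
  -- Plain-text find (4th argument true), so "?" and "." are not
  -- interpreted as Lua pattern characters.
  if path:find(extra, 1, true) then
    return path
  end
  return path .. ";" .. extra
end

-- Safe to run on every request: the path stops growing after the first call.
package.path = append_once(package.path, CUSTOM)
```

We still prefer the nginx directive, since it keeps the search path in one place instead of in every script.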
The CPU usage went down after we removed the package.path concatenation

As you can see on the CPU usage graph, it stayed stable after the deployment. We were relieved, as it means we can now add more Lua scripts.
On another project, applying the same changes resulted in a ~72% increase in the number of processed requests (from 116 RPS to 199 RPS, about 1.7 times the original throughput).