Data Cleaning Exercise — Network Graph Objects from Lua — Part 1

Christian Buchert
Chris Learns Data Science
3 min read · Feb 6, 2017

So, I’m kind of addicted to Factorio streamers on YouTube.

Factorio: a primer

Without getting into it too much, Factorio is what you get when you combine Command and Conquer ’95, Minecraft, and an infatuation with automating everything. Your goal is to design-build a factory on an alien planet while trying to keep the native wildlife from gnawing at your ankles. I have a hate/hate relationship with projects that are designed while they are being constructed. You may feel differently. That’s your prerogative. I’ve spent one career in CAD management for AEC, and my experience with these projects has been one of headaches and bummer deadlines. So naturally, when I found this game and fell in love with it despite this flaw, I immediately started thinking about how I could better plan my factories so I wasn’t trying to do the engineering / design work while I was constructing the factory and making the biters get dead.

The plan is fairly simple-ish.

I put together a spreadsheet that runs the numbers for the factory. My goal for the factory is to launch one rocket with a satellite payload into space every minute, and the spreadsheet’s role is to determine the resource collection and factory production volumes required to meet this goal. Once this initial engineering is complete, I will produce a network graph that shows flow lines from each source resource to where it is consumed, with node scale indicating the quantity of the resource required to achieve the factory goal. Extra points will be awarded by me to me if the graph also prioritizes node proximity from parent to child nodes as a function of required node volume, as this could be really useful in the next step: AutoCAD. In AutoCAD, I will build a block library representing the structures that will comprise the factory. Once these drawings are finalized, I’ll round up some friends and we’ll build the thing, streaming the building sessions on Twitch and archiving the videos on YouTube for any later viewing needs that might be had.
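To make "runs the numbers" a bit more concrete, here is a rough Python sketch of the kind of calculation the spreadsheet does: start from a target output rate and walk down the recipe tree, adding up how much of every item is needed per minute. The recipe names and amounts below (widget, gear, plate) are made-up placeholders, not real Factorio data, and the function name `required_rates` is my own invention for illustration.

```python
from collections import defaultdict

# Hypothetical recipes: product -> (units made per craft, {ingredient: units per craft}).
# These names and amounts are placeholders, NOT real Factorio data.
RECIPES = {
    "widget": (1, {"gear": 2, "plate": 1}),
    "gear": (1, {"plate": 2}),
}


def required_rates(target, rate_per_minute, recipes=RECIPES):
    """Return the units per minute needed of every item to sustain the target rate."""
    demand = defaultdict(float)

    def visit(item, rate):
        demand[item] += rate
        if item not in recipes:
            # Raw resource: nothing upstream to account for.
            return
        output, ingredients = recipes[item]
        crafts_per_minute = rate / output
        for ingredient, amount in ingredients.items():
            visit(ingredient, crafts_per_minute * amount)

    visit(target, rate_per_minute)
    return dict(demand)


# One widget per minute, standing in for "one rocket per minute".
rates = required_rates("widget", 1.0)
```

The totals per item are the numbers that would drive node scale in the graph. This assumes the recipe tree has no cycles, which may or may not hold for the real game data.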

What does any of this have to do with growing a data science skill set?

The good developers of Factorio scripted the game in Lua and left all the data files exposed. This means that all resource, manufacturing, and tech tree data is available for the picking, provided you know Lua or can do something with Lua hash maps:

{
  type = "recipe",
  name = "explosives",
  energy_required = 5,
  enabled = false,
  category = "chemistry",
  ingredients =
  {
    {type="item", name="sulfur", amount=1},
    {type="item", name="coal", amount=1},
    {type="fluid", name="water", amount=1},
  },
  result= "explosives"
}

As you should and may well know, data science entails various and sundry disciplines, not least among which is data regularization. The goal of this exercise is to extract the relevant resource data from the Lua files and structure it such that the aforementioned network graph can be generated using Gephi or some such.
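As a first stab at what that extraction could look like, here is a minimal Python sketch that pulls the ingredients and result out of a single recipe table and turns them into graph edges. It leans on regular expressions rather than a real Lua parser, so it assumes every recipe is laid out exactly like the explosives example; anything shaped differently would need more work. The helper names (`recipe_edges`, `edges_to_csv`) are hypothetical, and the Source/Target/Weight header follows the column names Gephi's spreadsheet import looks for in an edge table.

```python
import csv
import io
import re

# The explosives recipe from the game data, pasted in as plain text.
RECIPE_LUA = '''
{
  type = "recipe",
  name = "explosives",
  energy_required = 5,
  enabled = false,
  category = "chemistry",
  ingredients =
  {
    {type="item", name="sulfur", amount=1},
    {type="item", name="coal", amount=1},
    {type="fluid", name="water", amount=1},
  },
  result= "explosives"
}
'''

# Assumption: ingredients always appear as {type=..., name=..., amount=...} in that order.
INGREDIENT = re.compile(
    r'\{\s*type\s*=\s*"(\w+)"\s*,\s*name\s*=\s*"([\w-]+)"\s*,\s*amount\s*=\s*(\d+)\s*\}'
)
RESULT = re.compile(r'result\s*=\s*"([\w-]+)"')


def recipe_edges(lua_text):
    """Return (ingredient, product, amount) tuples for one recipe table."""
    result = RESULT.search(lua_text)
    if result is None:
        raise ValueError("no result field found")
    target = result.group(1)
    return [
        (name, target, int(amount))
        for _kind, name, amount in INGREDIENT.findall(lua_text)
    ]


def edges_to_csv(edges):
    """Serialize edges as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


edges = recipe_edges(RECIPE_LUA)
csv_text = edges_to_csv(edges)
```

Each edge points from an ingredient to the thing it gets turned into, with the recipe amount as the weight. A sturdier version would probably load the files with an actual Lua interpreter instead of pattern matching.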

With all that out of the way, the only thing to do is to start making it happen. You can find the GitHub repo with all the files here. README.md explains the folder structure if you have questions.
