== 2019-Aug-16 Update ==
My buddy Garrett 10x’ed the idea below, and wrote about it over at https://cloud.google.com/blog/products/identity-security/understand-gcp-organization-resource-hierarchies-with-forseti-visualizer
While everything below should still work (albeit with some minor tweaks, cuz Forseti versions), I strongly recommend that you check out the great stuff he’s working on as well! Cheers!
== /Update ==
Customers and colleagues often struggle to create hand-wired diagrams of their cloud app’s architecture, and keeping those diagrams up-to-date as things change isn’t fun either…
But now, by combining Forseti Security’s Inventory Scanner, some clever-but-fairly-gnarly Cloud SQL, and D3.js collapsible trees, you can automatically generate interactive architecture diagrams that represent your GCP environment, much like this one…
I’ll let you in on a little secret — I absolutely HATE writing documentation. I’ve ALWAYS hated it. There, I said it. I regret nothing…
When I wrote code for a living, that mostly meant wiring up class/inheritance diagrams, and now in my role as a Cloud Engineer, it’s GCP Architecture diagrams. Sure, when you’ve only got a handful of classes, projects, or resources, it’s not that bad. You can probably fit the whole thing on a reasonably sized whiteboard. But as soon as things start to get a little bigger, a little more serious, and start changing to adapt to what your business needs, things get out of hand (and out of date) FAST!
For me, the truth is this: keeping handwritten docs up-to-date in a living, breathing, ever-changing environment is a losing battle, and I just don’t want to do it anymore. You shouldn’t either. IMHO, the best documentation is automatic documentation, generated directly from the things you need documented…
In that previous software-developer life, we automated this with a Jenkins job that triggered on code commit, ran a combination of doxygen and graphviz to read/analyze our code, and output fancy webpages with pretty class hierarchy diagrams. Kinda like this:
In the world of GCP, it should be just as easy, and I wanna build it. In order to do that, however, I’ll need a few different pieces:
- Something to scan a GCP organization, and list all the things that exist there. Bonus points if said scanner keeps track of hierarchy (parent/child relationships) and organization (which things are components of another thing — think VMs in a network)
- Something to draw out a picture of the GCP architecture in a useful & dynamic way. Bonus points here if we can print out the picture on a large-format printer.
- Some way to glue the output of the first thing into the input of the second thing.
SPOILER ALERT — All the pieces above exist, and I got ’em wired up together into something that looks like this:
Here’s the tl;dr of how I got there (with longer explanation below):
- Use Forseti Security to perform an inventory scan of all GCP resources
- Connect to Forseti’s Cloud SQL database, and export the most recent inventory scan to CSV
- Import CSV data into an HTML webpage, and use D3.js to visualize inventory data as a dynamic, interactive tree
Full Disclosure: This initial solution isn’t nearly as polished as the Billing Visualization report that Ryan McDowell and I published a while back, and that’s 100% on purpose. My intent with this first post is to quickly show what’s possible with the pieces we’ve got today, and get a working prototype & code out into the world (jump to bottom for github link). From there, I’ll shoot to make the architecture viewer better in subsequent posts, as we all learn more…
So let’s get started, shall we?
Set up inventory scanning with Forseti 2.0
First things first, we need to scan our GCP environment and list out all the folders, projects, and other resources that exist there. Luckily, Forseti Security (2.0 or later) takes care of all the GCP inventory scanning for us, and creates a nice, neat Cloud SQL table of things with parent->child relationships. This is a GREAT start!
Setting up Forseti is pretty straightforward (it’s just a ‘git clone’ of their GitHub repo and running a setup wizard), but there are a couple of things to watch out for before you get started:
- You’ll most likely need to run the wizard as an Org Admin (or have an Org Admin pal run it for you), cuz permissions. Depending on how your company is structured, this might be a big deal, so I wanted to specifically call it out here…
- As with most cloud things, you’ll need a billing account set up and associated with your project, in order to pay for the infrastructure that Forseti runs on. NOTE: Running Forseti on an ongoing basis will cost money!
- Estimates from my own Forseti project are around $125 USD per month, and there are NO guarantees that this number will be what you see in yours. You MUST monitor on your own, and you take ALL responsibility for the resulting charges. That said, it’s totally possible that future iterations of these posts will investigate ways to customize the setup and lower costs to make it more approachable to folks on a budget…
- Shutting the Forseti servers down is a decent way to cut costs WAY down. Consider stopping the VMs and Cloud SQL instances when not in active use.
OK, with all that out of the way, you can find the official Forseti setup instructions here:
Once you’ve got Forseti installed, the inventory scanner should run automatically, and the steps below will help you check the output of the scan and export the inventory data to be visualized…
Export a snapshot of your GCP resources
At this point, we should have a running instance of Forseti, and some inventory records stored in our Cloud SQL database. Let’s take a look…
First, let’s connect to the Cloud SQL instance, so we can run queries locally. I connected using a separate Google Compute Engine instance, but feel free to choose whatever is easiest for you. Ultimately, you’ll just need to be able to execute SQL queries, and see the results. Once you’re connected, here are some fun queries to run:
Get the id of the latest inventory scan
Query for latest inventory records (using ‘id’ from above)
Disclaimer: There’s a LOT in the query below that’s less-than-awesome, which is why I wanted to give it away instead of having you figure it out on your own. You’ll see me do things like JSON lookups from big text blob fields, quotation mark replacement, and IFNULL() replacement. All of this was needed to make the CSV output play nicely with the JS text file reader and D3.js processing, so be forewarned: If you play with this, here be dragons…
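To make those three clean-up steps a bit more concrete, here’s a rough JavaScript equivalent of what happens to each field. To be clear, this is NOT the query itself, just a sketch of the same idea: the function name, the `resourceData` blob, and the `displayName` key are all made up for illustration.

```javascript
// Sketch only: mirrors, in JavaScript, the three clean-up steps described above.
// `resourceData` stands in for a JSON text blob column and `key` for the field
// being looked up; both are assumptions for illustration.
function cleanField(resourceData, key) {
  let value = null;
  try {
    // Step 1: JSON lookup from a big text blob
    const parsed = JSON.parse(resourceData);
    value = parsed == null ? null : parsed[key];
  } catch (err) {
    value = null; // unparseable blob: treat the field as missing
  }
  // Step 2: IFNULL()-style replacement of missing values with an empty string
  if (value === null || value === undefined) {
    return "";
  }
  // Step 3: strip double quotes so they cannot break the CSV row
  return String(value).replace(/"/g, "");
}

console.log(cleanField('{"displayName": "my \\"prod\\" project"}', "displayName"));
console.log(cleanField('{"name": "projects/123"}', "displayName"));
```

Doing this clean-up on the SQL side means the CSV reader in the browser never has to deal with embedded quotes or NULLs.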
Export inventory data to CSV
So now that we know there’s data to play with, let’s export the most recent set of inventory records, and use that data as input to our visualization utility. Here’s how:
- Follow the instructions in the Cloud SQL docs here to walk through the export steps.
- When you get to the part where you enter a SQL query, use the one from directly above. It should look like the image below.
- Click Export, and you’ll be taken back to the main Cloud SQL interface while the data export operation completes in the background. NOTE: similar to the note above, this may take a few minutes depending on the number of GCP resources included in the inventory scan.
- When complete, you should have the resulting CSV file dumped out to the Google Cloud Storage bucket. HUZZAH!
Let D3.js draw it all out for you!
Maybe it’s because I’ve never been much of a UI guy, or maybe it’s because my definition of an “interactive” website was formed back when I could make scrolling text using the <marquee> tag, but this project was my first foray into using the D3.js libraries as a way to visualize data. That said, now that I’ve found ’em, I don’t think I’ll ever use anything else. Once you grind your way through the initial learning curve, find a good fit for *how* you want to visualize your data, and muck your way through a few potential lib version mismatches (*grumble* v3/v4/v5 *grumble*), the D3 libraries offer what feels like unlimited power and flexibility to tell your story visually. So, needless to say, while the learning curve was tough (and still is), I’m a big fan…
When I started, I knew that this project was going to be a pretty decent fit for a tree structure (nodes and leaves standing in for orgs/folders/projects/resources), so my first priority was going to be finding a decent example to work off of. Luckily, a bit of Googling around turned up these working examples to use as a starting point (thanks!):
These examples did *most* of what I was looking for — drawing multi-level trees, and dynamic expand/collapse functionality, but I knew there was going to be a bunch of “custom” work needed. So, similar to how I start most projects, I grabbed some code as a starting point, copied it locally, added it to a github repo for revision control and an oh-crap-I-broke-it-all safety net, and started hacking…
Once I had code locally, I knew I was going to need a local web server to check out my changes. Keeping it super scrappy here, I just opted to use the SimpleHTTPServer module that comes as part of a default Python 2 install (on Python 3, the same thing lives at python3 -m http.server 8000). So, from inside the directory where I was hacking on code, I just ran the following:
python -m SimpleHTTPServer 8000 &
Once that command returned me to the command prompt, I fired up a browser and pointed it at:
And voilà, now I’ve got my “edit code -> test code -> commit code” loop in place, and I can hack away freely, mostly safe in the knowledge that I won’t break things in a way that’s unrecoverable…
From there, it was just a matter of finding examples of functionality that I wanted to add, picking them apart to understand the basics, and adapting what I had found to fit what I wanted. Let’s take a quick look, piece by piece, but in no particular order…
Loading tree data from CSV file
- I had to remember that browsers load files asynchronously, and needed .then() to wait until loading completed, and
- browsers generally cache things for performance reasons, so appending a cachebuster param to the filename helped keep me from going insane wondering why my local changes weren’t showing up when I refreshed the page…
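Here’s a minimal sketch of those two lessons together. The helper names and the "inventory.csv" file name are hypothetical, and the fetcher is passed in so the snippet also runs outside a browser. In the actual page you’d more likely hand the cache-busted URL straight to d3.csv(), which returns a Promise in D3 v5.

```javascript
// Hypothetical helper: tack a unique query parameter onto the file name so the
// browser cannot serve a stale cached copy of the CSV.
function cacheBustedUrl(filename, stamp = Date.now()) {
  const separator = filename.includes("?") ? "&" : "?";
  return `${filename}${separator}cachebuster=${stamp}`;
}

// Loading is asynchronous, so anything that needs the data has to wait for
// the Promise. `fetchText` is injected; in a page it could be
// url => fetch(url).then(response => response.text()).
function loadInventory(filename, fetchText) {
  return fetchText(cacheBustedUrl(filename)).then(text => text.trim().split("\n"));
}

// Stand-in fetcher for demonstration purposes only.
const fakeFetch = url => Promise.resolve("1,organization,\n2,folder,1\n");

loadInventory("inventory.csv", fakeFetch).then(lines => {
  // Only here, inside .then(), is it safe to start drawing.
  console.log(`loaded ${lines.length} rows`);
});
```

Using the current timestamp as the parameter value is the lazy option: it defeats caching on every single refresh, which is exactly what you want while hacking locally and probably not what you want in production.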
Turning CSV inventory data into D3 model
One of the coolest things about working with D3 is that, once you get your data into a format that the library understands, things just mostly work right out of the box. Sweet, right? Well, sorta…
One of the WORST parts of getting this project to work was figuring out how to parse the input CSV file and populate the D3 tree structure in the “right” way, so that the D3 lib drew it out the way I wanted. Remember the “very strict format” I mentioned while talking about the input CSV file above? Yeah, so, that strict format is VERY tightly coupled to the code that parses each inventory line, assigning values in specific input positions to the corresponding node variables.
Mess that up, even a little bit, and all of a sudden the code to link parents to children gets goofed up, and the tree doesn’t render at all. Good times…
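To show why the column positions matter so much, here’s a stripped-down sketch of position-based parsing plus parent/child linking. The column order (id, parent id, type, name) and the sample rows are assumptions for illustration, not the repo’s real format. The nested `{ name, children }` shape it produces is the kind of object d3.hierarchy() accepts by default.

```javascript
// Assumed column order (hypothetical): id, parent id, resource type, display name.
// Shift any column and every node silently gets the wrong values.
function parseLine(line) {
  const [id, parentId, type, name] = line.split(",");
  return { id, parentId, type, name, children: [] };
}

// Index all nodes first, then link each one to its parent, so the order of
// the rows in the file does not matter.
function buildTree(lines) {
  const nodes = lines.filter(line => line.trim() !== "").map(parseLine);
  const byId = new Map(nodes.map(node => [node.id, node]));
  let root = null;
  for (const node of nodes) {
    const parent = byId.get(node.parentId);
    if (parent) {
      parent.children.push(node);
    } else if (root === null) {
      root = node; // first row without a known parent becomes the root
    } else {
      throw new Error(`orphan row: ${node.id}`);
    }
  }
  return root;
}

const tree = buildTree([
  "1,,organization,example.com",
  "2,1,folder,dev",
  "3,2,project,my-project",
]);
console.log(JSON.stringify(tree, null, 2));
```

Failing loudly on an orphan row is a deliberate choice in this sketch: a thrown error is a lot easier to debug than a tree that quietly refuses to render.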
Adding in the GCP icons
At this point in my adventure, I had a working, interactive tree diagram that accurately reflected my org’s root/folders/projects/resources hierarchy, but it just looked kinda… meh…
So what do you do when you’ve got something that just looks ‘meh’? You add pictures, of course!
It wasn’t too hard to find the official Google Cloud Platform logos on the Google Cloud website, and once I had the ones I needed hosted on Google Cloud Storage, all I needed was a bit of image-setting code to change the boring circles of the tree into much prettier, official GCP logos.
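The image-setting part boils down to a lookup from resource type to icon URL. Here’s a rough sketch, where the type names, file names, and the YOUR_BUCKET path are all placeholders rather than what’s actually in the repo:

```javascript
// Hypothetical mapping from inventory resource type to an icon file.
// YOUR_BUCKET and the file names are placeholders.
const ICON_BASE = "https://storage.googleapis.com/YOUR_BUCKET/icons";
const ICONS = new Map([
  ["organization", "organization.png"],
  ["folder", "folder.png"],
  ["project", "project.png"],
]);

function iconUrl(resourceType) {
  // Fall back to a generic icon so unmapped types still get drawn.
  const file = ICONS.get(resourceType) || "generic.png";
  return `${ICON_BASE}/${file}`;
}

// In the D3 code, the circle would be swapped for an SVG <image>, roughly:
//   node.append("image")
//       .attr("xlink:href", d => iconUrl(d.data.type))
//       .attr("width", 24)
//       .attr("height", 24);
console.log(iconUrl("project"));
console.log(iconUrl("some-new-resource-type"));
```

The fallback icon matters more than it looks: inventory scans tend to turn up resource types you didn’t plan for, and a generic icon beats a broken image.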
But wait, why are your folders laid out like that?
The short answer is that I recently stumbled into a somewhat-opinionated and awesome conversation around the “optimal” way to organize a GCP environment, in order to accommodate things like different teams, biz units, dev-vs-prod, etc. So, I thought I’d use the learnings there to build out my own org and, along the way, write about it and give away some more code. What you’re seeing there is the first step in what I’m hoping will be one of my next posts, probably with a catchy title like “Bootstrap your shiny new GCP Org with this one weird trick” or something…
Soooooo, stay tuned!
Don’t Re-Invent the Wheel! Take my code and run with it!…
As much fun as figuring out all the D3.js stuff was, there’s no reason for you to go through all the “fun” all over again on your own — that’s what I’m here for! Please feel free to fork the repo below, and mess around with the working code & CSV data on your own!
…and of course, github issues and pull requests welcome! I’ll do my best to review regularly, add comments, and merge in PRs that fit with the vision/direction I’ve got for this thing. If you’d like to contribute, and want some ideas of what I’ve got in mind for the future, check out the list below… :)
…and this is just the beginning…
So that’s my MVP of getting your GCP environment drawn out automatically for you, but that’s definitely NOT where I intend to stop…
Other fun ideas we’re looking to add in future revisions (in NO particular order, GitHub issues coming soon!):
- Create endpoint to pull inventory data from (and replace CSV file)?
- Horizontal vs Vertical drawing modes?
- Resource filters on UI?
- Expand all/Collapse all on UI?
- onHover details for icons?
- Smooth out some of the animation transitions (expand/collapse icons)?
- Instances organized by VPC/network?
- Firewall rules per network?