Why Steampipe over Boto3

Exploring Steampipe as an alternative for Querying

Harsha Koushik
Kernel Space
6 min read · Mar 4, 2022


What is Boto3?

Boto3 is the Python SDK for AWS. It is undoubtedly one of the most widely used SDKs for making API calls to AWS in order to configure, manage, and query AWS resources.

Why not Boto3?

It felt really good to use Boto3, which makes it easy to query and analyze data in the same place. But just like any other tool, we started seeing its cons once we used it at scale. For example, this is my requirement: know which users have not enabled MFA, can still do things in the account, and are not in any group that enforces MFA. Sounds clear and simple, right? Let me add some more detail: the account you will scan has 700+ users in it. The code becomes a little complex due to pagination and the like, but it still does the job. Now add a little more detail to the requirement: don’t just scan our Prod account, scan all our 15 or 20 accounts (the number here is just an example). Now it’s getting heavy, isn’t it?
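To make the pain concrete, here is a minimal sketch of what the first half of that requirement can look like with Boto3. It only checks for registered MFA devices; console access and group membership would need further calls per user. The profile names and the `client_for` helper are assumptions made for illustration, not code from my actual setup.

```python
from typing import Callable, Dict, Iterable, List


def users_without_mfa(iam_client) -> List[str]:
    """Return the names of IAM users that have no MFA device registered."""
    missing = []
    # list_users is paginated, so a paginator walks every page of results.
    paginator = iam_client.get_paginator("list_users")
    for page in paginator.paginate():
        for user in page["Users"]:
            name = user["UserName"]
            # One extra API call per user to look up their MFA devices.
            devices = iam_client.list_mfa_devices(UserName=name)["MFADevices"]
            if not devices:
                missing.append(name)
    return missing


def scan_accounts(
    profiles: Iterable[str], client_for: Callable[[str], object]
) -> Dict[str, List[str]]:
    """Repeat the scan for every profile (assumed: one profile per account)."""
    return {profile: users_without_mfa(client_for(profile)) for profile in profiles}


def boto3_iam_client(profile: str):
    """Build a real IAM client for a named profile (requires boto3)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    return boto3.Session(profile_name=profile).client("iam")
```

With boto3 installed and profiles configured, `scan_accounts(["dev", "prod"], boto3_iam_client)` would return a dictionary of profile name to user names. With 700+ users per account and 15 or 20 accounts, that is a lot of sequential API calls to babysit.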

Here I have taken IAM as an example, which is global, so there are no worries about specifying Regions. But let’s say I want you to get all my GuardDuty findings in ALL Regions of ALL accounts; it becomes frustratingly complex. I am not saying it’s really hard to write that code, but it gets harder to maintain it and to pass it on to a fellow developer. Using Boto on a crazy number of accounts gets painful.
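The GuardDuty case can be sketched like this. It is a rough illustration, assuming one profile per account and a caller-supplied list of Regions; the function names are made up for this example. Notice that the nesting is already four levels deep (profile, Region, detector, page) and this only collects finding IDs, not the findings themselves.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def finding_ids_in_region(gd_client) -> List[str]:
    """Collect GuardDuty finding IDs from every detector in one Region."""
    finding_ids: List[str] = []
    for detector_page in gd_client.get_paginator("list_detectors").paginate():
        for detector_id in detector_page["DetectorIds"]:
            pages = gd_client.get_paginator("list_findings").paginate(
                DetectorId=detector_id
            )
            for page in pages:
                finding_ids.extend(page["FindingIds"])
    return finding_ids


def scan_everything(
    profiles: Iterable[str],
    regions: Iterable[str],
    client_for: Callable[[str, str], object],
) -> Dict[Tuple[str, str], List[str]]:
    """Run the Region scan for every (profile, Region) pair."""
    regions = list(regions)
    results = {}
    for profile in profiles:
        for region in regions:
            client = client_for(profile, region)
            results[(profile, region)] = finding_ids_in_region(client)
    return results


def boto3_guardduty_client(profile: str, region: str):
    """Build a real GuardDuty client (requires boto3)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    session = boto3.Session(profile_name=profile)
    return session.client("guardduty", region_name=region)
```

Every new account or Region multiplies the number of clients and calls, and error handling for disabled Regions or missing detectors still has to be added on top.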

So the question is: what am I using now, if not Boto? It is Steampipe. Let me introduce Steampipe to you here.

What is Steampipe?

Steampipe is a tool that can be summed up simply as:

select * from cloud;

Yes, you guessed it right: you can query your cloud using SQL. Steampipe treats the cloud as a simple database. It is a tool created by Turbot that can query not only cloud platforms like AWS, Azure, GCP, and Alibaba Cloud but also platforms like GitHub, Kubernetes, and Okta. Steampipe has around 67 such plugins at this point.

Site: https://steampipe.io/

GitHub: https://github.com/turbot/steampipe

How does Steampipe work?

Before I explain what Steampipe solved for me, I would like to talk a bit about how it works, so the rest is simpler to follow. Steampipe picks up AWS credentials from the ‘~/.aws/’ directory, just like any other tool. Apart from the AWS config files, Steampipe has its own config file(s), which tell it which profiles/accounts you wish to scan. For AWS, the config file location is ‘~/.steampipe/config/aws.spc’. You provide each account/profile as a connection in that file.

Figure: Steampipe sample config file

As mentioned before, Steampipe looks up the credentials for these connections in the AWS config directory by profile name. We can use the connection names to query a single account, or use an aggregator connection to query all the accounts listed as connections in the config file. The aggregator is written in the same config file.
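With 15 or 20 accounts, writing these connection blocks by hand gets repetitive, so here is a small sketch that renders them from a list of profile names. The connection/aggregator layout follows the format I understand the AWS plugin to use (`plugin`, `profile`, `regions`, and `type = "aggregator"` with a `connections` list); the profile names, the Region, and the aggregator name `aws_all` are assumptions for illustration.

```python
from typing import Iterable, List


def connection_name(profile: str) -> str:
    """Derive a connection name from a profile name (hyphens become underscores)."""
    return profile.replace("-", "_")


def render_connection(name: str, profile: str, regions: Iterable[str]) -> str:
    """Render one connection block for a single AWS profile."""
    region_list = ", ".join(f'"{r}"' for r in regions)
    return (
        f'connection "{name}" {{\n'
        f'  plugin  = "aws"\n'
        f'  profile = "{profile}"\n'
        f"  regions = [{region_list}]\n"
        "}\n"
    )


def render_aggregator(name: str, connection_names: Iterable[str]) -> str:
    """Render an aggregator connection that spans the given connections."""
    names = ", ".join(f'"{n}"' for n in connection_names)
    return (
        f'connection "{name}" {{\n'
        '  plugin      = "aws"\n'
        '  type        = "aggregator"\n'
        f"  connections = [{names}]\n"
        "}\n"
    )


def render_config(
    profiles: List[str], regions: List[str], aggregator_name: str = "aws_all"
) -> str:
    """Render all per-account connections followed by one aggregator."""
    names = [connection_name(p) for p in profiles]
    blocks = [render_connection(n, p, regions) for n, p in zip(names, profiles)]
    blocks.append(render_aggregator(aggregator_name, names))
    return "\n".join(blocks)
```

Calling `print(render_config(["dev", "prod"], ["us-east-1"]))` produces text that could be pasted into ‘~/.steampipe/config/aws.spc’ after review. A table can then be queried per account (for example through the `dev` connection) or across all accounts through the aggregator.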

I feel this should be enough to give an overview of how Steampipe works. Let’s see what Steampipe solved for me.

What did Steampipe Solve for me?

I will explain which problems Steampipe addressed, and how, by going through its features. Some of the features that make it easier to use are:

Tables: Everything is a table in Steampipe. Whatever you query is returned as a table, which makes it really simple to sort, filter, and transform data. A simple example is the AWS IAM user table.

Figure: Structure of the aws_iam_user table.

This is just the blueprint of AWS IAM users. For example, if I execute ‘select * from dev.aws_iam_user’, it gets all the IAM user details in the Dev account and outputs them into a table with this structure, so I get to see a lot of information about each IAM user in one simple table. Of course, Steampipe also makes AWS API calls under the hood to do this for us, which takes us to the next feature: abstraction.

Level of Abstraction: Steampipe provides a better abstraction, so we need not worry about the underlying API calls and their parameters all the time. Most tables combine multiple API calls; the output gets consolidated and represented in predefined tables.

Powerful Data Operations: As the language used by Steampipe is SQL, it offers great support for operations on data such as sorting, filtering, and transforming. One important feature you will love here is JOINs: we can join multiple tables and skip the consolidation steps we would otherwise write in code.
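For comparison, this is roughly the kind of consolidation code a SQL JOIN replaces. It is a minimal sketch of an inner join done by hand in Python; the record shapes and field names (`account_id`, `user_name`, `password_enabled`) are assumptions for illustration.

```python
from typing import Dict, List


def join_users_with_report(users: List[Dict], report_rows: List[Dict]) -> List[Dict]:
    """Inner-join user records with credential-report rows on (account_id, user_name)."""
    # Index one side by the join key so each lookup is O(1).
    report_by_key = {(r["account_id"], r["user_name"]): r for r in report_rows}
    joined = []
    for user in users:
        match = report_by_key.get((user["account_id"], user["user_name"]))
        if match is None:
            continue  # inner join: drop users with no matching report row
        joined.append({**user, "password_enabled": match["password_enabled"]})
    # The equivalent of ORDER BY account_id, user_name.
    return sorted(joined, key=lambda row: (row["account_id"], row["user_name"]))
```

In SQL the same thing is a single `join ... on` clause plus an `order by`, and nobody has to maintain the indexing and merging logic.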

Live/On-Demand Queries: Though Steampipe uses a database to present the results, it doesn’t store any data in the DB as such. It queries directly as needed rather than following a ‘collect, store, and then operate’ model. If we collect, store, and then operate, the data is no longer live; collecting all data from all accounts also takes time and is resource intensive. Hence it is better to keep it live and query only what’s needed.

Do not re-write, re-use instead: Steampipe follows this principle. It allows us to store queries as ‘.sql’ files and pass the filename to ‘steampipe query <filename.sql>’ to run them again. This makes it really simple to run a complex query that contains joins, data extraction from nested JSON, and so on, just by passing the file where it is stored.
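If you still want to drive saved queries from a script, a thin wrapper is enough. The sketch below saves a query to a ‘.sql’ file and builds the `steampipe query` command line with an `--output` format; the helper names and the file path are assumptions for illustration, and actually running it requires the Steampipe CLI to be installed.

```python
from pathlib import Path
from typing import List


def save_query(sql: str, path: Path) -> Path:
    """Write a query to a .sql file so it can be re-used later."""
    path.write_text(sql.strip() + "\n", encoding="utf-8")
    return path


def steampipe_command(sql_file: Path, output: str = "csv") -> List[str]:
    """Build the command line for running a saved query file."""
    return ["steampipe", "query", str(sql_file), "--output", output]


def run_saved_query(sql_file: Path, output: str = "csv") -> str:
    """Run the saved query and return its stdout (needs the steampipe CLI)."""
    import subprocess

    completed = subprocess.run(
        steampipe_command(sql_file, output),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `save_query("select name, mfa_enabled from dev.aws_iam_user", Path("iam_users.sql"))` followed by `run_saved_query(Path("iam_users.sql"))` would return the result as CSV text.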

Almost Zero Coding: I know it’s hard to believe, but there are people who hate coding, and they should feel happy seeing this (no offense, jk). Maintaining connections, writing nested loops, and handling pagination are not needed anymore; it’s all handled, just write SQL. So it becomes easier for someone who has no coding knowledge but still needs to work on AWS query automation.

Note: Steampipe doesn’t solve the problem of API rate limiting. That is something AWS controls for each AWS account on a per-Region basis; there is request rate limiting and resource rate limiting. Just as Boto faces this issue, so does Steampipe, because the limit is enforced on the AWS side.
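On the Boto side, the usual way to live with throttling is to retry with exponential backoff. The following is a generic sketch of that idea, not something taken from Boto3 or Steampipe; the `is_throttle` predicate and the default delays are assumptions you would adapt to your own error handling.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    call: Callable[[], T],
    is_throttle: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` with exponential backoff while it fails with a throttling error."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-throttling errors or once attempts are exhausted.
            if not is_throttle(exc) or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt; random jitter spreads out retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError("max_attempts must be at least 1")
```

Backoff only softens the problem: the limits themselves stay where AWS sets them, whichever tool makes the calls.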

Let us look at an example and conclude how Steampipe addressed a lot of these problems. I will take the same example mentioned earlier:

Across all AWS accounts, know which users have not enabled MFA but are still able to do things in the account(s) and are not in any group that enforces MFA.

To find this out, we can run a query in Steampipe and analyze the output data.

The query gets account_id, user_name, groups, and mfa_status from the aws_iam_user table and console_password_status from aws_iam_credential_report, across all accounts, joins them, and writes the output to a CSV file called user_mfa.csv. This output file can be easily analyzed to find whether a user is in the EnforceMFA group, and so to produce the list of users without a proper MFA mechanism.
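Here is a sketch of how such a query and the follow-up analysis could look. The SQL is a reconstruction under assumptions, not the exact query I ran: it assumes the plugin columns `name`, `groups`, and `mfa_enabled` on aws_iam_user and `user_name` and `password_enabled` on aws_iam_credential_report, aliased to the output names used above. Depending on your setup you may need to qualify the table names with the aggregator connection name. The CSV parsing assumes booleans are written as true/false and `groups` as a JSON array of objects with a `GroupName` key.

```python
import csv
import io
import json
from typing import List, Set, Tuple

# Assumed query; save it to a .sql file and run it with CSV output,
# redirecting the result to user_mfa.csv.
USER_MFA_SQL = """
select
  u.account_id,
  u.name as user_name,
  u.groups,
  u.mfa_enabled as mfa_status,
  c.password_enabled as console_password_status
from
  aws_iam_user as u
  join aws_iam_credential_report as c
    on u.account_id = c.account_id
   and u.name = c.user_name;
"""


def _is_true(value: str) -> bool:
    """Interpret a CSV cell as a boolean (assumes 'true'/'false' text)."""
    return value.strip().lower() in ("true", "t")


def group_names(groups_cell: str) -> Set[str]:
    """Extract group names from the JSON text in the groups column."""
    if groups_cell.strip() in ("", "null", "<null>"):
        return set()
    return {group.get("GroupName") for group in json.loads(groups_cell)}


def users_missing_mfa(
    csv_text: str, enforcing_group: str = "EnforceMFA"
) -> List[Tuple[str, str]]:
    """Return (account_id, user_name) for console users with no MFA and no enforcing group."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if _is_true(row["mfa_status"]):
            continue  # MFA already enabled
        if not _is_true(row["console_password_status"]):
            continue  # no console password, so no console login
        if enforcing_group in group_names(row["groups"]):
            continue  # covered by the group that enforces MFA
        flagged.append((row["account_id"], row["user_name"]))
    return flagged
```

Reading user_mfa.csv and passing its text to `users_missing_mfa` gives the list of users to chase, for every account behind the aggregator in one go.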

Conclusion

Using Steampipe over Boto3 is a personal choice. I started using it because it solves a lot of problems for me at scale. It may not be the best tool on the market for everyone; it all depends on the use case. Please do suggest in the comments if you have found better tools or better ways of doing things. I am always open to improving and innovating.

I have only mentioned the features that I used and found helpful. Please feel free to explore other features that may help you. The URLs of the site and the GitHub repository are given earlier in the article.

Steampipe Plugins: 67 as of now — https://hub.steampipe.io/plugins
Steampipe Mods: 21 as of now — https://hub.steampipe.io/mods
Steampipe Architecture: https://steampipe.io/docs/develop/architecture
Steampipe Container: https://steampipe.io/docs/develop/containers

Note: Steampipe is used only to query, not to change the state of resources as Boto3 can. So if you need to make changes to resources, you will still want to use Boto3, Terraform, or another tool; Steampipe is for querying the state of resources in the cloud.

Probable/Close Alternatives:

CloudQuery: https://www.cloudquery.io/
AWS-Inventory: https://github.com/nccgroup/aws-inventory
Cloud-reports: https://github.com/tensult/cloud-reports

Please feel free to point out any mistakes. Thank you for reading. You can connect with me on LinkedIn / Twitter. Happy to answer your queries.
