Manipulating plist and registry output in osquery

Pivoting SQL data with MAX & CASE

Fritz Ifert-Miller
Kolide
17 min readNov 26, 2019

--

Why can’t these rows be columns?

If you have ever queried the registry or plist tables in osquery, you have encountered results that were formatted unlike any other osquery data.

Whereas osquery generally utilizes a relational data model for its table schema, results from plist & registry more closely resemble the entity–attribute–value (EAV) data model.

Let’s examine the basic structural differences between the two:

The relational data model

Organizes data into columns:

The columnar output allows us to know ahead of time what data to expect. Our predefined schema below, indicates that every person record can have both a first and a last name:

The EAV data model

Organizes data into rows:

In our case the structure maps the following way:

Entityuser_id

Attribute key

Valuevalue

While all of the data from the first table is present, there is also an entry for eye_color belonging to user_id:2. This attribute would necessitate its own column in the relational model.

Which data model is better?

Each data model possesses its own advantages and drawbacks. Typically, a relational model is best suited for scenarios where there will be a rigid schema and where relational operations such as JOIN’s will be necessary.

The EAV model on the other hand is more dynamic because new attributes (columns in the relational model) can be added without modifying the underlying schema.

The bulk of osquery’s interactions are within the context of a relational model. Each table has its own unique columns which can be used to easily compartmentalize data and JOIN it with other data when necessary.

But sometimes…

As we mentioned above, you will sometimes run into data that doesn’t adhere to the relational model. Several of the data-sources available in osquery return results formatted in the EAV model, the two most important being the plist table for macOS devices and the registry table for Windows devices.

What does this mean for an osquery user?

When querying devices for several values that are stored in a single registry directory, they will receive a separate row for every key/value pair. Let’s take a look at the following example:

As we can see there is a large degree of operating system info available via the registry. When we compare this with the the output of the corresponding osquery virtual table os_version we can quickly observe the difference in structure between the two datasets:

Compared to the registry output, the os_version does not mention the release or build_revision number among other useful OS information. Leveraging the registry can be critical for closing some of the gaps in the existing Windows virtual table output. Unfortunately, the registry data is hideous to look at and virtually unusable when you want to JOIN against another table.

EAV formatting issues likewise affect the output of the osquery plist table. Let’s take a look at the com.apple.loginwindow.plist. Which caches among other things whether the Guest Account is enabled on a macOS device.

The output of this plist is more manageable than the earlier registry example. However, when querying large numbers of devices it can be preferable to produce a single row per device by pivoting each key/value pair into its own respective column.

The most significant advantage of pivoting, is that unless our data is stored in a columnar format, it is terrifically burdensome to JOIN it against another osquery datasource. There are a large number of plists that belong to only a single user and the ability to JOIN against the users table is invaluable.

So how do we get these disorganized rows into a neat and tidy columns?

Pivoting data using MAX and CASE

Thankfully, there is a workaround to handling this mismatch of data structure. Although SQLite does not support the PIVOT function of more recent SQL technologies, CASE can be utilized as a pseudo-pivot so long as an aggregation function is also present. The aggregation function serves to enforce a single returned row of data. Let’s work stepwise through our above com.apple.loginwindow.plist example to arrive at columnar output:

Level 1 Transformations: CASE

CASE works as the IF/THEN logic of SQLite, the standard format is as follows:

  • CASE — begin if/then argument
  • WHEN — If statement
  • THEN — Then statement
  • ELSEoptional (otherwise defaults to NULL)
  • END AS — column name of outputted value

You can use all typical SQL logic within the WHEN/THEN/ELSE statements, including sub-selects, size comparisons, etc. For our purposes we want to output a corresponding value for a specific key as a column.

To pivot this data into a column, you might be thinking we could simply alias our output and call it a day:

Unfortunately, this approach fails when dealing with more than one value.

With CASE however, we can pivot multiple values at the same time. If we take our example of the com.apple.loginwindow.plist. We can see the data as individual columns:

Yuck, not quite what we want. All of our columns look right but our rows are a staggered mess and the data is not together in a usable format. CASE has run through each possible row and stopped when it hit a match. You can visualize this below:

So how do we flatten this output?

Level 2 Transformations: MAX(CASE

Pivoting single rows of data into columns

The MAX function returns only the highest value in a returned result-set:

Calling the MAX operator will force the CASE function to run through all possible matches and return the highest result (ignoring NULL’s) instead of only the first result. We can take advantage of that behavior to smush our results together like the graphic below:

To properly format our query, we will wrap our existing CASE statement in a MAX(…) function:

Nice just what we hoped to see!

This works great for our plist because it has only one set of data to pivot but what happens when we have multiple arrays that we would like to reconstruct?

If we use our MAX(CASE approach as it is currently written, we will only ever return one row of data. Let’s explore how we can avoid that situation by adding a GROUP BY operator.

Level 3 Transformations: MAX(CASE & GROUP BY

Dealing with multiple arrays of data within a single plist:

Like our above example we will use MAX to flatten and CASE to alias the data, transposing the data arrangement. Let’s try to return all of the iPad, iPod and iPhone devices associated with a macOS device via the following plist: ~/Library/Preferences/com.apple.iPod.plist

We can see there are three devices cataloged in the above plist each storing their own array of data (Device Class, Build Version, Serial Number, etc.). Let’s output the contents using osquery to understand how that data is stored in a flattened format:

We can see that the subkey column contains the identifier of a device along with an associated attribute: 90BF179953411CA6/Device Class

To keep track of what device a value belongs to, let’s first SPLIT that subkey out so that we can use it as a unique identifier:

Next let’s condense these unique_id's and remove duplicate entries with a GROUP BY operator:

We can now use this unique identifier to rebuild and transpose the data using ourCASE function from before. Let’s select a value only when a specific subkey (“Device Class”) is found:

As we can see, this result is not what we would expect, we have only one result with the desired: subkey LIKE '%Device Class', this is where the magic of MAX comes in.

As we learned, MAX forces a query to return only the greatest row in a given result set, importantly for us it also ignores NULL results:

We can continue to expand this same approach to construct an entire table of attached iDevices by adding additional CASE statements. Let’s take a look at that below:

Woohoo!

You can see that we have been able to take previously messy and flattened data of the plist and pivoted the values from rows into columns, presenting the data in format more typical of osquery’s virtual tables.

Working with the Windows Registry: Faking a GROUP BY

When working with the registry, you may not have a column available to GROUP BY in order to effectively flatten the data with MAX. If you do not have to worry about arrays you can deal with that sort of situation by providing a static column. For example, in the query below (which displays various Windows Firewall Profile Settings), I use SELECT 'test' AS idx and then later I GROUP BY idx.

Joining plists against users

Because the iPod.plist belongs to specific users we can join on the users table if we wanted to see which user account the devices are associated with. The easiest method to JOIN is using the path of the plist to reconstruct the directory column of the users table with a SPLIT and || (concat).

Where does the EAV pivot not work?

Unfortunately, without a column that can be used to reliably GROUP BY (eg. a key that will enforce uniqueness) pivots will not work if there are duplicate entries or null values.

What does this mean in the context of osquery?

Well for instance, using the approach described above, we cannot reconstruct the Software Update History plist into a more usable relational model:

Because the key value is always root we cannot rely on using it as the GROUPBY for our MAX function. Likewise, because the subkey column has duplicate values it also cannot be relied upon to effectively GROUP BY.

If we try either of these approaches we will at best end up with a single update that flattens and overwrites the other matching entries.

However… there is a workaround if you are willing to assume some risk…

Getting around lossy data flattening with SQLite hacks (rowid increments)

Level 9000+ Transformations: MAX(CASE WHEN rowid IN

Dealing with multiple arrays of data within a single plist, without a unique key

This method is an extremely brittle workaround to the above example. Because osquery returns data from certain tables (plist, registry) in a semi-predictable order (the files are read out line by line), we can use the hidden SQLite column rowid to return a row of arbitrary position related to another specified row.

Take the table below for example, I have bolded productKey and title to show their position in relation to one another:

Here we can see that each Software Update contains 7 associated keys:

  • __isSoftwareUpdate whether the update is considered a ‘software update’
  • release-notes notes on update
  • __installState whether the update has been installed yet
  • title name of update
  • installDate date update was installed
  • version optional version string
  • productKey unique identifier of update

I have bolded productKey above because of its unique content. In order to perform the MAX(CASE pivot later on, we will use it as a unique identifier for our GROUP BY function to link the other associated values.

If we wanted to return the title of a Software Update we can ask osquery to return the value of the row that is 3 rows above productKey:

As we can see this approach could then be extended and combined with the MAX(CASE strategy outlined above to return each of the corresponding values of interest by defining their position in relation to another row.

We need to alias the position of each relevant value that we want to pivot into a column by defining the relative rowid of each.

To simplify this process I find it helpful to compartmentalize my data collection using temporary tables which we can create using the WITH operator.

Below, I first output the total data of the Software Update plist to a temp table I call plist_output. I then create a series of indexes using the rowid in relation to the position where subkey = 'productKey':

We can then further alias this index record into another temporary table I will call unique_identifier. We can now use these indexes and the MAX(CASE function GROUP BY'ed on the new product_key column to get our desired final output:

Phew! What a silly thing to do!

While it is encouraging to have this tool in our kit, I cannot stress highly enough that it will cause you embarrassment and frustration if not utilized with careful judgment of data order/uniformity. It will break in more places than it will work.

Wrapping things up:

With the techniques described, a combination of (MAX(CASE…)) & GROUP BY will convert pesky rows into accommodating columns. You now have the tools to transform and pivot osquery results from the registry and plist tables.

Using these tables you can considerably extend the data collection of osquery beyond what is accessible in the existing virtual tables.

A plug for what we are doing at Kolide:

These techniques have allowed us to construct and extend many of our Inventory tables which beautify the raw osquery output. I wanted to give a peek at some of the strategies we’ve utilized so that others can build great queries.

If you are after beautiful osquery data without the work. I recommend taking a look at our SaaS osquery offering Kolide: K2, which is free to try (no payment details necessary). In addition to carefully curated output you can expect to see the following:

  • Automatic User to Device Assignment (macOS, Windows and Linux)
  • Checks: An alertable framework for device compliance/security
  • Interactive Slack App with customizable End-User Notifications
  • Live Query (request Beta access today, public launch soon!)
  • API (request Beta access today, public launch soon!)

Get started today, with a free trial of Kolide (no payment information or credit card required).

About the Author (Fritz Ifert-Miller):

I have been the UX designer at Kolide since the company’s founding in 2016. When I am not working in Sketch.app or discussing product features, I can be found writing SQL to improve Kolide’s data collection capabilities.

Further Reading:

This blog post was largely inspired by real-world use-cases and headaches experienced as an osquery user. In efforts to solve this problem I found the following blog-post:

http://techslides.com/using-max-and-case-to-pivot-mysql-data-stored-in-eav-model

If you would like to see more cases of querying and transposing data from the plists table I would recommend you read my other blog post:

macOS Forensics Using the osquery Plist Table

--

--

Fritz Ifert-Miller
Kolide

Fritz is the UX Designer at Kolide. Prior to Kolide, he worked at BIDMC as a neurology researcher studying Transcranial Magnetic Stimulation (TMS) of the brain.