Malware hunting made easy with osquery, and extensions

Atul Kabra
Feb 4, 2018

Social media has changed pretty much every aspect of our lives. One of the most important things it has done is shrink the window of TOCTOR (I just made that up, but it stands for ‘Time of Occurrence to Time of Reporting’, that is, the time window between the occurrence of an event and the reporting of that event. The name is a play on the security vulnerability TOCTOU).

Businesses can leverage this reduced TOCTOR in many ways. A friend in the world of marketing recently explained to me how they capture social media feeds during a big live event (say, the Super Bowl) and then use that data for targeted ad delivery. Of course, there is the complex analytics of ‘sentiment analysis’ running in the background on those live feeds, but you know that already.

Social media is playing a similar role in threat hunting. Live feeds on Twitter and blogs have become an excellent source of high-fidelity threat indicators and IOCs, and they seem to arrive much faster than the paid intel subscription an enterprise might have. No wonder a new generation of start-ups has sprung up that scrape these indicators from blogs and feeds, convert them into STIX format and deliver them over TAXII.

Gathering high-fidelity intel quickly is only one part of threat hunting. That intel then needs to be sent to a diverse endpoint population and validated there before a CISO can sleep peacefully at night. If the latter process is slow or inefficient, the problem remains as unsolved as ever.

A great variety of commercial EDR tools are available to make this job easy. Unfortunately, they are very expensive, and I don’t mean only the cost of procuring them but also the cost of sustaining them and the peripheral costs of data storage in your cloud. They are also usually not cross-platform: they might be well supported on one platform (e.g. Windows) but not so well on others (e.g. Mac). If this study by the leading endpoint security vendor is to be believed (and there is no reason not to), then attackers might be leveraging this gap in platform coverage to gain an advantage and a foot in the door. As the study shows, a great deal of the malware detected on Mac was Windows malware. This could be a case of ‘social engineering gone bad’, or it could be ‘gain entry on a Mac, stay undetected and then pray to get transferred via email, network drives or USB to a Windows system’.

Whatever the case may be, the fact remains that today’s computing environment is a heterogeneous mix of devices, operating systems and form factors. Any threat hunting solution is therefore useful only if it provides across-the-board coverage and justifies its ROI.

osquery aims to provide just such an alternative.

The highest-fidelity indicator for a piece of malware is its hash. Hashes for the latest detected threats are regularly published on social media and can then be hunted for using osquery. osquery provides two tables for searching hashes:

  1. file_events
  2. hash

The file_events table captures real-time events on files along with their hashes. A search on this table can therefore quickly and efficiently tell whether a file with a particular hash (MD5, SHA-1 or SHA-256) was seen on the endpoint. The query is answered from a backing RocksDB store, so the result is near instantaneous. A SQL query of the form ‘SELECT * FROM file_events WHERE md5 = ‘<>’;’ can be fired from a centralized osquery fleet manager to search a large population of endpoints at scale and establish whether the threat is present in the enterprise. The osquery makers recommend using the file_events table in conjunction with file_paths and exclude_paths in the configuration, so that computationally intensive operations like hashing are done only for a selected set of files and not system-wide. This optimization is certainly efficient, but it can leave holes through which malware escapes without being recorded in the file_events table.
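The hash query and the scoped configuration described above can be sketched in a few lines of Python. This is a minimal illustration, not an official osquery client. The helper name hunt_query, the category names (‘downloads’, ‘tmp’) and the monitored paths are made up for the example. The columns md5, sha1, sha256, target_path, action and time belong to file_events, and file_paths and exclude_paths are the configuration keys mentioned above.

```python
import json
import re

# Hex digest length -> name of the matching hash column in file_events.
HASH_COLUMNS = {32: "md5", 40: "sha1", 64: "sha256"}


def hunt_query(file_hash: str, table: str = "file_events") -> str:
    """Build a SELECT that looks for one hash in an evented table.

    `table` is assumed to be a trusted, hard-coded table name.
    """
    h = file_hash.strip().lower()
    # Accept only plain hex of a known length, so the value is safe to
    # inline into the SQL text.
    if not re.fullmatch(r"[0-9a-f]+", h) or len(h) not in HASH_COLUMNS:
        raise ValueError(f"not an md5/sha1/sha256 hex digest: {file_hash!r}")
    column = HASH_COLUMNS[len(h)]
    return (
        f"SELECT target_path, action, time, {column} "
        f"FROM {table} WHERE {column} = '{h}';"
    )


# Scoped monitoring: only files under these (hypothetical) paths are
# watched and hashed. In osquery path patterns, % matches a single level
# and %% recurses into subdirectories.
config = {
    "schedule": {
        "file_events": {"query": "SELECT * FROM file_events;", "interval": 300}
    },
    "file_paths": {
        "downloads": ["/Users/%/Downloads/%%"],
        "tmp": ["/tmp/%%"],
    },
    "exclude_paths": {
        # Keys here refer to the categories defined under file_paths.
        "tmp": ["/tmp/build-cache/%%"],
    },
}

if __name__ == "__main__":
    print(hunt_query("d41d8cd98f00b204e9800998ecf8427e"))
    print(json.dumps(config, indent=2))
```

The query string is what a fleet manager would push out to the endpoints. Anything created outside the listed file_paths never reaches file_events, which is exactly the hole described above.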

An alternative way to search for a file hash is the ‘hash’ table. The ‘hash’ table computes hashes for all the files under a given directory path, and with a ‘WHERE’ clause the results can be filtered for a particular hash. Unlike the file_events table, there is no backing store for the ‘hash’ table: the virtual SQL table is generated at query time by hashing all the files in the search path. This can be computationally very expensive, so it is useful only when the threat hunter knows the exact directories to search and, hopefully, those directories don’t contain too many other files. For a system-wide search of a file hash, this table is certainly not an efficient route and the queries can take forever to complete.
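To get a feel for why an on-demand hash search is expensive, here is a rough Python analogue of the work involved. It is not how osquery implements the ‘hash’ table, and the function names are invented for illustration. The point is only that every file under the search path has to be read in full before a single match can be reported. The walk is recursive, which corresponds to a recursive path pattern rather than a single-directory lookup.

```python
import hashlib
import os
from typing import Iterator


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_by_md5(directory: str, wanted_md5: str) -> Iterator[str]:
    """Yield paths under `directory` whose MD5 equals `wanted_md5`.

    Every readable file is hashed, so the cost grows with the total
    number of bytes under the directory, not with the number of matches.
    """
    wanted = wanted_md5.lower()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                matched = md5_of_file(path) == wanted
            except OSError:
                # Unreadable or vanished file: skip it and keep walking.
                continue
            if matched:
                yield path
```

Pointing find_by_md5 at a small, known directory is cheap. Pointing it at the root of a drive means reading the whole disk, which is the situation the paragraph above warns about.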

Unfortunately, the file_events table is only available in the Mac and Linux variants of osquery. When hunting for threats across endpoints, Windows continues to represent the biggest and wildest forest, making Mac and Linux look almost like national parks. The lack of a file_events table on Windows makes osquery somewhat suboptimal for threat hunting there. A query like ‘SELECT * FROM hash WHERE directory LIKE ‘c:\%%’;’ to search for a file hash on a Windows system might work in theory, but it will quite likely fry the hard disk before the results come out.

The PolyLogyx Extension for Windows osquery fills this gap very efficiently. The extension provides a ‘win_file_events’ table that mimics the file_events table of Mac and Linux. To maintain feature parity, this table also offers the optimization of recording file hashes only for a selected set of folders, and therefore suffers from the same gap: malware dropped outside those folders escapes recording.

To bridge this gap, the extension provides another table called ‘win_pefile_events’ that records all new PE files created anywhere on the system. File-based malware has to bring executable (PE) content onto the endpoint, and no matter where it tries to hide, it WILL get recorded in win_pefile_events. Querying this table with osquery’s SQL makes a system-wide, and enterprise-wide, hunt for a malware file hash a walk in the park (er.. a national park?).
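The shape of such a hunt can be tried out without a Windows endpoint, since osquery tables are queried with SQLite-style SQL. The sketch below uses Python’s built-in sqlite3 module as a stand-in. The column names (target_path, md5, time) are assumptions modelled on file_events, not the extension’s documented schema, and the rows are invented sample data.

```python
import sqlite3

# In-memory stand-in for the extension's table. The schema is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE win_pefile_events (target_path TEXT, md5 TEXT, time INTEGER)"
)
# Invented sample events: two PE files dropped in different locations.
conn.executemany(
    "INSERT INTO win_pefile_events VALUES (?, ?, ?)",
    [
        (r"C:\Users\alice\AppData\Local\Temp\setup.exe",
         "0123456789abcdef0123456789abcdef", 1517700000),
        (r"C:\Windows\Temp\helper.dll",
         "fedcba9876543210fedcba9876543210", 1517700060),
    ],
)


def hunt(connection: sqlite3.Connection, md5: str) -> list:
    """Return (target_path, time) for every recorded PE file with this MD5."""
    cursor = connection.execute(
        "SELECT target_path, time FROM win_pefile_events WHERE md5 = ?",
        (md5.lower(),),
    )
    return cursor.fetchall()
```

Calling hunt(conn, "0123456789abcdef0123456789abcdef") returns the single matching row, wherever on the disk the file was created. On a real fleet the same SELECT would be distributed by the fleet manager instead of being run locally.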
