Trillions — Behind The Scenes of Cyber Intelligence

@johnnychronix · Published in ZeroGuard · Apr 25, 2020

The world of spies and cyber intelligence is shrouded in mystery. But have you ever wondered what it actually takes to build and run a cyber intelligence company?

In an exclusive interview with The Many Hats Club, Cal Leeming, founder of ZeroGuard, reveals how they monitor the internet and analyse trillions of unique signals, in their effort to make the internet a safer place.

Adhering to the mission mantra of “collect all the data, analyze all the things”, Cal walks us through their epic journey of building intelligence capabilities from scratch. The numbers are impressive: 1PB of data and 6 trillion events in the last 12 months alone.

Listeners were privy to some of the techniques ZeroGuard uses to gather intelligence, techniques which until now have been closely guarded secrets of cyber intelligence companies across the world.

Data for days…

Cal explains what they mean by cyber intelligence and what they consider to be valuable.

“First we look at the critical infrastructure which holds the internet together. This is primarily BGP tables, DNS, Zone Files, Whois, CT logs, RIR objects etc. The internet would simply stop working if these systems did not exist. This gives us a baseline view of internet assets at the fundamental level.”
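To make that baseline idea more concrete, here is a minimal sketch (not ZeroGuard’s actual pipeline) of pulling one such record: an RDAP object, the structured successor to Whois and RIR lookups, fetched through the public rdap.org redirector. The endpoint choice and example IP are assumptions for illustration only.

```python
# Minimal sketch: fetch a baseline "internet asset" record via RDAP,
# the structured successor to Whois / RIR object lookups.
# Assumes the public rdap.org bootstrap redirector is reachable.
import json
import urllib.request


def rdap_lookup(ip: str) -> dict:
    """Return the RDAP registration object for an IP address."""
    url = f"https://rdap.org/ip/{ip}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    record = rdap_lookup("8.8.8.8")  # well-known public resolver, purely illustrative
    print(record.get("name"), record.get("startAddress"), record.get("endAddress"))
```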

Once you have obtained this prerequisite data, you can begin to enrich it with datasets from a variety of sources.

“We have detected crucial intelligence through a variety of sources; these include honeypots, public code repositories, website forums and such. It’s also a good idea to ingest community feeds, such as IP and domain reputation, as well as to build your own.”
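As a flavour of what that enrichment step can look like, here is a hedged sketch that tags baseline asset records against a community IP-reputation feed. The feed format and file name are assumptions, not a description of ZeroGuard’s internal tooling.

```python
# Minimal sketch: enrich baseline asset records with a community
# IP-reputation feed. The feed format (one IP per line, '#' comments)
# and the file name are assumptions for illustration only.
from typing import Iterable


def load_reputation_feed(path: str) -> set:
    """Parse a plain-text feed of known-bad IPs into a lookup set."""
    bad_ips = set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                bad_ips.add(line.split()[0])
    return bad_ips


def enrich(assets: Iterable, bad_ips: set) -> list:
    """Tag each asset record with a simple reputation flag."""
    return [{**a, "flagged": a.get("ip") in bad_ips} for a in assets]


if __name__ == "__main__":
    feed = load_reputation_feed("community-blocklist.txt")  # hypothetical file
    assets = [{"ip": "198.51.100.7"}, {"ip": "203.0.113.9"}]
    print(enrich(assets, feed))
```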

Another fascinating aspect was their approach to data collection from APIs. Leeming explains that they actually prefer not to use the official APIs offered by most services, as these are often far more restrictive than the unofficial endpoints that power the services’ own web frontends.

“Many official APIs will artificially restrict how much data they give you in a single query, and then apply rate limits on a per query basis. But by using their unofficial APIs, such as the ones used to drive frontend web applications, you can yield far more data. This is because they optimise queries to be low latency and predictive, for the sake of user experience.”

Of course, these unofficial APIs are more likely to break and there is increased operational overhead in keeping them running.
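That trade-off is easier to see in code. Below is a purely hypothetical sketch of paging through an unofficial frontend endpoint with cursors and retry backoff; the URL, parameters and response fields are invented for illustration, and the fact that they can change without notice is exactly the maintenance burden Cal describes.

```python
# Purely hypothetical sketch: paging through an unofficial frontend API
# with cursor-based pagination and exponential backoff. The endpoint,
# query parameters and response fields are invented for illustration.
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://example.com/internal/api/search"  # hypothetical endpoint


def fetch_page(query: str, cursor: Optional[str]) -> dict:
    """Fetch one page of results, mimicking what the web frontend would request."""
    url = f"{BASE_URL}?q={urllib.parse.quote(query)}"
    if cursor:
        url += f"&cursor={urllib.parse.quote(cursor)}"
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.load(resp)


def fetch_all(query: str, max_retries: int = 3) -> list:
    """Follow pagination cursors until exhausted, backing off on HTTP errors."""
    results, cursor = [], None
    while True:
        for attempt in range(max_retries):
            try:
                page = fetch_page(query, cursor)
                break
            except urllib.error.HTTPError:
                time.sleep(2 ** attempt)  # simple exponential backoff
        else:
            break  # retries exhausted; give up
        results.extend(page.get("items", []))
        cursor = page.get("next_cursor")
        if not cursor:
            break
    return results
```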

Shaken, not stirred

The conversation shifts to the subject of covert intelligence, which, with a sardonic grin, Cal admits is sexier than it sounds. He explains their approach to building “clean identities” for gaining privileged access within underground networks. And while his tone is characteristically humorous, this is clearly a serious undertaking, not one for the faint-hearted.

“Our work in covert intelligence yields highly valuable signals; in some cases they have allowed serious breaches to be detected. But it’s simply not possible to respond to everything. Sometimes you just have to let go and let nature take its course.”

At 12 Zeros Things Get a Little Weird

Leeming admitted that dealing with data volumes of this magnitude causes unforeseen issues, as software can behave erratically. He talks quite openly about their inability to use cloud-based compute platforms such as Amazon Web Services (AWS).

“Everyone else, including many of our competitors, is running to the cloud for a variety of reasons. But the cost of storing this amount of data on cloud services, as well as processing it and serving customer queries, is astronomical.”
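For a rough sense of scale, here is a back-of-envelope calculation. The per-gigabyte price is an assumption based on typical standard object-storage list prices, not a figure from the interview, and it deliberately ignores egress, compute and query charges.

```python
# Back-of-envelope sketch: monthly storage cost of 1PB in a typical
# cloud object store. The price is an assumed list price; egress,
# compute and query charges are deliberately ignored.
PETABYTE_GB = 1_000_000          # 1 PB expressed in GB (decimal)
PRICE_PER_GB_MONTH = 0.023       # assumed USD list price per GB-month

monthly = PETABYTE_GB * PRICE_PER_GB_MONTH
print(f"~${monthly:,.0f} per month, ~${monthly * 12:,.0f} per year, for storage alone")
```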

This is a difficult economic dilemma no matter the size of the company. Cal faced a choice: bend the knee to the big players, or build his own data facilities. He chose the latter.

Let’s build it ourselves, how hard could it be?

True to his rebellious nature, Cal, with invaluable help from ZeroGuard’s CTO Per Mosegaard, embarked on the almost impossible task of building their own data centre facilities, on a budget that would make even the most hardened tech professional wince.

After a meeting of the minds at the bottom of a bottle, the duo decided on two facilities. The first comprised several racks at 4D Data Centre in Crawley, UK, a highly professional facility previously owned by BT and Orange, whose staff were fabulous at accommodating the endeavour.

But they quickly realised that backfilling ten years’ worth of intelligence data would prove challenging. They needed a second facility, one with flexible on-demand power that was also physically close by, as daily trips were required during the R&D phase.

Life In The Ghetto Paradise

And so they took the next logical yet ludicrous step and decided to build their own data centre: in a container in the middle of a farmer’s field at Cal’s residence. The pictures shared of the process looked like a three-day bender at a trailer park, and podcast host Stu Peck bounced between incredulity and outright laughter, with a healthy dose of shock mixed in.

Battling the sheer weight of the equipment (2 tonnes), the (literally) duct-taped patchwork, the heat from the machines and the elements, Cal and Per somehow managed to erect a very serviceable DC with 1PB of storage, 64 teraflops of GPU compute, 4TB of RAM and 640Gbps of internal bandwidth.

Of course, myriad problems were encountered along the way, which Cal walked us through with equal parts pride and embarrassment. He cheekily included an amusing recollection of vendors refusing to fix bugs in older gear so that customers are pushed towards buying the newest and shiniest kit.

Everyone has a budget

The investor for ZeroGuard was actually another one of Cal’s companies, River Oakfield, a cyber security consulting company. But even so, they only had a limited budget to play with — £150,000.

As is probably already evident, the driving force behind this adventure was money: £150k might buy a nice ride, but it doesn’t go far in the tech world. The key to saving was buying older and/or refurbished servers.

Enter the saviour, Tone: a totally stand-up guy out of Oxford with a warehouse full of recycled IT equipment. At one point, Cal shows the cost breakdown of their major purchases, which at original prices would have come to £430,000 but ended up costing a mere £7,200.

“By making smart decisions, we were able to purchase equipment at an average 6% of the original RRP.”

By the end, they had spent around £60k on hardware that would have cost upwards of £1m only a few years ago. Tech depreciation IS a commodity!

So What’s Next?

This is not Cal’s first startup rodeo, and he’s got big plans for the future.

“We are very excited about how advances in machine learning could improve our signal-to-noise ratio, and relatively new technologies such as TimescaleDB, InfluxDB and Rust are literally game changers in the world of big data.”
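As a taste of why a time-series database appeals at this scale, here is a minimal sketch of an event table on TimescaleDB using psycopg2. The connection string is a placeholder and the schema is an illustration of the technology Cal mentions, not ZeroGuard’s own.

```python
# Minimal sketch: an event store on TimescaleDB. Assumes the timescaledb
# extension is installed on the target PostgreSQL instance; the DSN and
# schema are placeholders for illustration only.
import psycopg2

conn = psycopg2.connect("dbname=intel user=intel")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            time    TIMESTAMPTZ NOT NULL,
            source  TEXT        NOT NULL,
            signal  JSONB       NOT NULL
        );
    """)
    # Turn the plain table into a time-partitioned hypertable.
    cur.execute("SELECT create_hypertable('events', 'time', if_not_exists => TRUE);")
conn.close()
```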

He finished by touching on upcoming business challenges, such as ethical considerations, vetting processes and acceptable use policies.

For now, he and his posse are working hard to get their platform to market and into the hands of users as quickly as possible.

And all hail Rust!
