Why I came to work at Panaseer

Back in 2017 I was invited by my CEO to go to a series of tech meetups for CTOs and aspiring CTOs who were interested in sharing their experiences of building tech companies. I was introduced to Charaka (CTO of Panaseer), and I remember Big Data and Security being mentioned in the same sentence when I asked what Panaseer did.

My knowledge of security was limited to the web development domain, and my understanding of big data was based on a few articles I’d read online discussing the limitations of current computing technology. I wasn’t looking for a job at the time, but at the next few meetups we always ended up chatting, sharing ideas and knowledge. A year or so later an opportunity arose to work together, so I took it as a chance to move and learn something new.

Having spent more than a decade in the online gambling industry, a high-volume, low-latency, highly transactional environment, I was used to working with large monolithic databases that cost millions of pounds and sat at the computational limit of what was commercially available at the time. I’d often have capacity planning conversations weighing up the cost of a new purchase against performance tuning the platform, or re-architecting it to move some of the monolith’s responsibilities into other databases.

The problem with performance tuning a platform on a single database is that the returns diminish the more you do. Your first round of tuning gains you 50%, the next 30%, the next 10% and so on. There is a finite amount of work you can get out of a processor.

The issue with re-architecting an enterprise platform that has been developed over 15 years by thousands of developers, and that takes millions of pounds of transactions a day, is that:

  • It’s extremely expensive;
  • It’s hard to do;
  • You can’t do it all at once;
  • You have no guarantee you can deliver it on a fixed timescale.

It’s by no means impossible, but if you have the option of dropping a few million pounds on a new database server to guarantee your capacity versus embarking on a highly complex new IT project with a lot of uncertainty, people will often go for the safe option despite the cost.

Another issue was the cost of storage. DBAs would often grumble about the huge cost of buying high-performance disk in large volumes, and we would embark on expensive projects to archive the data off to cheaper storage. I remember saying to a DBA in my early days, “Aren’t disks cheap? I can get a 1TB drive for less than £100 these days”, to which they responded, laughing, “Not this stuff, it isn’t the average hard drive you put in a computer. I’m not just buying one either; every gig I buy, I need to multiply the cost by 5x for all the different environments: backup, replication, performance, disaster recovery.” And this is where my interest in moving to Big Data came from.

The problem space is not the same, and Big Data is not about transactional systems, but finding a solution to the limitations of existing software and hardware for data processing is common to both.

Panaseer was started by a group of early adopters in the Big Data ecosystem, and the platform has been designed with some key goals in mind:

  • Users can ask any question about their security position;
  • The platform should be extremely scalable, and that scale shouldn’t affect how long it takes to get an answer.

Solving this with a typical relational, transactional database is just not possible from a financial and computational perspective, so I was really interested in how the technology worked and was applied to the problem.

Panaseer are big users of the Hadoop framework, which provides a suite of applications and libraries for distributed processing of large data sets across clusters of computers. Hadoop is not one application in itself; there are many projects within the ecosystem that solve all manner of distributed computing problems. To list a few used by Panaseer:

  • HDFS — the underlying filesystem that allows many computers’ storage to appear as one.
  • Spark — imagine you had to proofread a huge novel running to thousands of pages. You are under time pressure to finish it today, and on your own it would take days. The good news is you have a group of friends to help. You take the book, rip off the first 100 pages and hand them to your first friend, the next 100 to the next, and so on; each goes off to read their 100 pages and make corrections, returns with the results, asks for 100 more, and you rebuild the book in order. This is what Spark does with huge data sets: it chops them up into pieces for processing by different computers, so that no matter how large the task at hand, we can just add more computers and get the response in a reasonable time (see the sketch after this list).
  • Yarn — a bit like the person who organises drivers at a taxi company: it accepts tasks, assigns them to resources and ensures they are scheduled fairly, except in this case the taxis are computers.
  • Hive, Avro, HBase, Phoenix, ZooKeeper and the list goes on…
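
To make the Spark analogy a little more concrete, here is a minimal PySpark sketch, a toy example of my own rather than anything from Panaseer’s platform, of the same idea: chop a big job into partitions, let the cluster work on them in parallel, then gather the results back in order. The “pages” and the “proofreading” step are made-up stand-ins.

```python
# A toy illustration of the "proofreading with friends" analogy using PySpark.
# Assumes a local Spark installation; nothing here is Panaseer-specific.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("proofreading-sketch").getOrCreate()

# Pretend each string is one page of an enormous novel.
pages = [f"page {i} text..." for i in range(10_000)]

# parallelize() splits the list into partitions; each partition is handed to
# an executor, much like ripping off 100 pages for each friend.
rdd = spark.sparkContext.parallelize(pages, numSlices=100)

# Every executor "proofreads" its own pages independently and in parallel.
corrected = rdd.map(lambda page: page.replace("text", "proofread text"))

# The driver gathers the results, and the book comes back in its original order.
book = corrected.collect()
print(len(book), book[0])

spark.stop()
```

The point of the sketch is simply that more machines means more partitions being worked on at once, which is what keeps answers coming back in a reasonable time as the data grows.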

Yes, their names can be confused with Pokemon and I could try to give you more terrible analogies, but I can assure you they each have a very specific purpose for solving problems in this space.

I joined Panaseer just over a month ago as Head of Engineering, with my own personal goal of becoming more knowledgeable in the world of distributed computing. The interview process began with an informal chat about what I was really looking for in the role, and shortly after I was invited to the spacious offices beside the Thames to discuss my experience.

Before I was even offered a role, I was able to attend “Panabeers” on a Friday, a regular social the whole company is invited (but not obliged) to attend, where I got to speak casually to all parts of the business over a drink. This gave me real confidence that the company had the right culture before I joined, and I knew I was going to fit in.

The first week of my induction was fun: I’ve already got my own Hadoop cluster running on my laptop and have dug deep into Panaseer’s core computational engine. It was painful stepping outside of what I’m used to, but it’s a good pain. I had got so used to the comfort of the same technologies and paradigms for solving problems, and Panaseer has shaken that all up.

Fancy a change from the typical technologies you work with, like I did? We are hiring and there are plenty of different opportunities available, so send us your CV!