(This is part II of a two-part series of posts; you can find part I here.)
One of the most powerful parts of the Mixpanel query language is the operator, which allows you to select events or profiles based on the value of any element in a list. The operator is just a bit more magical than the other operators in our query language, both in its power and in its implementation.
We’ve already written about building the Mixpanel expression language — the language we built inside of the Mixpanel data store to allow you to query and select…
A few weeks ago we started noticing a dramatic change in the pattern of network traffic hitting our tracking API servers in Washington DC. From a fairly stable daily pattern, we started seeing spikes of 300–400 Mbps, but our rate of legitimate traffic (events and people updates) was unchanged.
Pinning down the source of this spurious traffic was a top priority, as some of these spikes were triggering our upstream routers into a DDoS mitigation mode, where traffic was being throttled.
There are a couple of good built-in Linux tools that help diagnose networking issues.
ifconfig will show you…
We’ve been hosting a series of monthly meetups on C++ programming topics. The theme of the series is a chapter-by-chapter reading of Scott Meyers’ new book, “Effective Modern C++”.
The meetings so far have been
Next up, we’ll be continuing chapter 6 with a presentation on “Generic Lambdas from Scratch”. Come by the office and check it out!
Originally published at https://engineering.mixpanel.com on March 19, 2015.
(This is part I of a two-part series; you can find part II here.)
The Mixpanel reporting API is built around a custom expression language that customers (and our main reporting application) can use to slice and dice their data. The expression language is a simple tool that allows you to ask powerful and complex questions and quickly get the answers you need.
The actual Mixpanel expression engine is part of a complex, heavily optimized C program, but the core principles are simple. …
We recommend setting up work queues and batching messages to our customers as an approach for scaling server-side Mixpanel implementations upward, but we use the same approach under the hood in our Android client library to scale downward to fit the constraints (battery power and CPU) of a mobile phone.
The basic technique, where work to be done is discovered in one part of your application and then stored to be executed in another, is simple but broadly useful, both for scaling up in your big server farm and for scaling down to your customers’ smartphones.
When clients ask us for help…
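The queue-and-batch technique described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from our libraries: `BATCH_SIZE`, `worker`, and the in-memory `sent_batches` list (which stands in for real network delivery) are all hypothetical names chosen for the example.

```python
import queue
import threading

BATCH_SIZE = 3  # hypothetical value, kept small for illustration


def worker(work_queue, send_batch):
    """Consume messages from the queue and deliver them in batches."""
    batch = []
    while True:
        message = work_queue.get()
        if message is None:  # sentinel: stop consuming
            break
        batch.append(message)
        if len(batch) >= BATCH_SIZE:
            send_batch(batch)
            batch = []  # start a fresh list; the sent one is left untouched
    if batch:
        send_batch(batch)  # flush whatever is left over


sent_batches = []  # stand-in for network delivery (assumption)
work_queue = queue.Queue()
consumer = threading.Thread(target=worker, args=(work_queue, sent_batches.append))
consumer.start()

# Producer side: the application discovers work and only enqueues it.
for i in range(7):
    work_queue.put({"event": "page view", "id": i})
work_queue.put(None)  # tell the worker to flush and exit
consumer.join()
```

The producer never waits on delivery; it only pays the cost of a `put`. The consumer decides when and how much to send, which is the part you tune differently for a server farm than for a phone.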
On Monday we shipped distinct_id aliasing, a service that makes it possible for our customers to link multiple unique identifiers to the same person. It’s running smoothly now, but we ran into some interesting performance problems during development. I’ve been fairly liberal with my keywords; hopefully this will show up in Google if you encounter the same problem.
The operation we’re doing is conceptually simple: for each event we receive, we make a single MySQL SELECT query to see if the distinct_id is an alias for another ID. If it is, we replace it. …
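As a rough sketch of the per-event lookup described above, here is the same idea in Python. To keep the example self-contained it uses the standard library's `sqlite3` in place of MySQL, and the `aliases` table, its columns, and the `resolve_distinct_id` helper are assumptions made for illustration rather than our actual schema or code.

```python
import sqlite3

# sqlite3 stands in for MySQL here so the sketch runs anywhere (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aliases (alias TEXT PRIMARY KEY, distinct_id TEXT NOT NULL)"
)
conn.execute("INSERT INTO aliases VALUES (?, ?)", ("anon-123", "user-42"))


def resolve_distinct_id(conn, event):
    """Return the event unchanged, or a copy with its alias replaced."""
    row = conn.execute(
        "SELECT distinct_id FROM aliases WHERE alias = ?",
        (event["distinct_id"],),
    ).fetchone()
    if row is None:
        return event  # not an alias: leave the event as it is
    return {**event, "distinct_id": row[0]}


aliased = resolve_distinct_id(conn, {"event": "signup", "distinct_id": "anon-123"})
unaliased = resolve_distinct_id(conn, {"event": "signup", "distinct_id": "user-7"})
```

One single-row SELECT per event is cheap in isolation, but it sits on the hot path for every event received, which is why its performance matters so much.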
At Mixpanel, we believe giving our customers a smooth, seamless experience when they are analyzing data is critically important. When something happens on the backend, we want the user experience to be disrupted as little as possible. We’ve gone to great lengths to learn new ways of maintaining this level of quality, and today I want to share some of the techniques we’re employing.
Mixpanel.com runs Django behind nginx using FastCGI. Some time ago, our deploys consisted of updating the code on our application servers, then simply restarting the Django process. This would result in a few of…
Memcache is great. Here at Mixpanel, we use it in a lot of places, mostly to cache MySQL queries but also for other data stores. We also use kestrel, a queue server that speaks the memcache protocol.
Because we use eventlet, we need a pure python memcache client so that eventlet can patch the socket operations to be non-blocking. The de-facto standard for this is python-memcached, which we used until recently.
When customers send data to our /track API endpoint, it hits a load balancer which forwards it to an API server. The API server sticks the data on a…
This post is a follow up to Why we moved off the cloud.
As a company, we want to do reliable backups on the cheap. By “cheap” I mean in terms of cost and, more importantly, in terms of developers’ time and attention. In this article, I’ll discuss how we’ve been able to accomplish this and the factors that we consider important.
Backups are an insurance policy. Like conventional insurance policies (e.g. renter’s), you want peace of mind that your stuff is covered if disaster strikes, while paying the best price you can from the available options.
Backups are similar…
Last year, I wrote about my internship story because I felt it was such an impactful experience for me. It was simply a story of how working hard and being out in Silicon Valley can lead to very serendipitous occurrences. I don’t think I could have built Mixpanel without the knowledge and connections I gained at Slide. I learned so much about product and how to “get things done” at a real company, and I met really close friends that I will take with me forever in life. …