This is the final part of the series describing how we’re increasing our service availability at Citymobil (you can read the previous parts here: part 1, part 2, part 3, part 4). Now I’m going to talk about one more type of outage: the conclusions we drew from it, how we modified the development process, and what automation we introduced.

1. Bad release: bug

This is the most unpleasant kind of outage or incident. It is the only kind that has no visible symptoms other than complaints from end users or business users. …


This is the next article in the series describing how we’re increasing our service availability at Citymobil (you can read the previous parts here: part 1, part 2, part 3). In this and the next part, I’ll talk about the incidents and outages in detail.

1. Bad release: database overload

Let me begin with a specific example of this type of outage. We deployed an optimization: we added USE INDEX to an SQL query. In testing as well as in production, it sped up short queries but slowed down the long ones. The slowdown of the long queries was only noticed in production. As a result, a large number of long parallel queries brought the database down for an hour. We thoroughly studied how USE INDEX works, described it in the Do’s and Don’ts file, and warned the engineers against incorrect usage. We also analyzed the query and realized that it retrieved mostly historical data and could therefore be run on a separate replica for historical requests. …


This is the next article in the series describing how we’re increasing our service availability at Citymobil (you can read the previous parts here and here). In further parts, I’ll talk about the incidents and outages in detail. But first let me cover something I should have talked about in the first article but didn’t. I learned about this gap from my readers’ feedback, and this article gives me a chance to fix it.

1. Prologue

One reader asked me a very fair question: “What’s so complicated about the backend of a ride-hailing service?” That’s a good question. Last summer, I asked myself that very question before starting to work at Citymobil. I was thinking: “That’s just a taxi service with a three-button app. How hard could that be?” It turned out to be a very high-tech product. …


This is the second article in the series “Citymobil — a manual for improving availability amid business growth for startups”. You can read the first part here. Let’s continue to talk about how we managed to improve the availability of Citymobil services. In the first article, we learned how to count lost trips. OK, we are counting them. What now? Now that we are equipped with an understandable tool for measuring lost trips, we can move on to the most interesting part: how do we decrease the losses without slowing down our current growth? Since it seemed to us that the lion’s share of the technical problems causing lost trips had something to do with the backend, we decided to turn our attention to the backend development process first. …


In this first part of the article series “Citymobil — a manual for improving availability amid business growth for startups”, I’m going to break down how we managed to dramatically improve the availability of Citymobil services. The article opens with the story of our business, our task, why the need to increase availability emerged, and the limitations we faced. Citymobil is a rapidly growing taxi aggregator. In 2018, its number of successfully completed trips grew more than 15-fold. Some months showed a 50% increase over the previous month.

The business grew like a weed in every direction (it still does): there was an increase in server load, team size and number of deployments. At the same time, new threats to service availability emerged. The company faced a task of the utmost importance: how to increase availability without compromising company growth. In this article, I’ll talk about how we managed to solve this task in a relatively short time. …


Surely you’ve heard of Meltdown by now. It’s a hardware vulnerability that allows an unauthorized process to access privileged memory. It affects Intel processors produced since 1995. Here are some details: https://en.wikipedia.org/…/Meltdown_(security_vulnerability)

The only known effective way to fix this issue is to apply a patch to the kernel of your OS (Linux, Windows, macOS), which significantly increases the cost of system calls. That results in a performance penalty of 5–30% on average for everything you run on your Linux, Windows, or macOS machine. More details here: https://www.theregister.co.uk/…/01/02/intel_cpu_design_flaw/

What does that mean for the software you use? The more syscalls it makes per operation, the worse the impact. That is especially bad for database management systems (DBMS), because they normally make a lot of syscalls per query. This is a very simplified execution chain of a traditional relational DBMS processing a…


Original article here: https://habrahabr.ru/post/317584/

We sat down with Konstantin Osipov, developer, creator and head of the Tarantool project, to talk about his work environment, habits and tools of the trade. He told us how he manages to maintain work-life balance and wear both the developer and manager hats at Mail.Ru Group. Also, don’t miss Konstantin’s must-read list of IT books that he shared with us.

What do you do in the company?

I write code, review it, discuss tasks with my team. That is to say, I’m a developer.

On the other hand, I give talks and interviews. Today, I couldn’t get many of my ideas across and was stumbling over my words. …


Original article at: https://m.habrahabr.ru/post/317584/

This is another article in the series on the work life of IT specialists, where we ask them how they maintain work-life balance, what professional habits they have, what their tools of the trade are, and about much more.

It’s always interesting to see what common ground they share as well as where their views differ from one another. Their answers may help us discover some hidden trends or may offer useful tips that will come in handy for many of us.

Our guest today is Konstantin Osipov, developer, creator and head of the Tarantool project. He told us how he manages to wear both the developer and manager hats, and shared his must-read list. …


Hey!

In this article, I’d like to talk about when I use an in-memory database and when I prefer a traditional DBMS and why.

When I need to decide which DBMS to use — in-memory (let’s call it IMDB) or traditional (I’ll be calling it RDBMS) — I usually make a choice based on the type of storage where my data is going to be kept. I personally divide all the numerous options out there into three groups: RAM, solid-state drive or flash memory (SSD) and hard disk drive (HDD). …


Hey!

In my last couple of articles, I talked about persistence in in-memory databases. You can check them out here and here.

In this article, I would like to touch upon the performance problems of in-memory databases. For starters, let’s just talk about performance in the simplest case, when you change the value of a specified key. And let’s simplify this case even further: assume there’s no database server at all, that is, no client-server interaction over the network. So, the database resides entirely inside your application’s RAM space.

If you didn’t have a database server, then you would probably store key-value pairs inside your application’s memory in a hash table. …
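
As a rough illustration of this simplest case (a sketch, not code from the article), an in-process key-value store can be written in Python as a thin wrapper around a dict, which is itself a hash table. The class name, method names and keys below are hypothetical.

```python
class InProcessStore:
    """Hypothetical key-value store living entirely in the application's memory."""

    def __init__(self):
        # A Python dict is a hash table: average O(1) lookup and update.
        self._table = {}

    def set(self, key, value):
        # Changing the value of a specified key is a single hash-table write.
        self._table[key] = value

    def get(self, key, default=None):
        # Returns the stored value, or `default` if the key is absent.
        return self._table.get(key, default)


store = InProcessStore()
store.set("user:42", "offline")
store.set("user:42", "online")  # overwrite the value of an existing key
```

There is no network round trip and no server here, so the cost of an update is just the cost of hashing the key and writing to memory.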

About

Denis Anikin

VP Tech at Citymobil (taxi aggregator)
