Stories by Igor Kuznetsov on Medium

Building a dev setup for the AI era

Igor Kuznetsov — Sat, 09 May 2026 09:48:28 GMT

We already had a local development setup.

It was not beautiful, but it worked. There was a docker-compose.yml file. I could start the services I needed and do my job. Some engineers used Docker Compose too. Others ran parts of the system directly on their machines. People used different editors as well: VS Code, Cursor, Zed, or just a terminal. For a backend project, this was normal.

Then Claude Code entered the picture.

That changed how I looked at the whole setup. The question was no longer just:

Can a developer run the project locally?

It became:

Can a developer and an AI agent work in this project without giving the agent my whole laptop?

That second question is harder.

When I use an AI coding agent, I want it to do actual work. I want it to read files, run tests, inspect errors, make changes, and try again. If I have to approve every small command, the flow breaks. At some point the tool starts feeling like a very slow autocomplete.

But the other extreme is also uncomfortable. Running an agent in a permissive mode directly on the host gives it access to much more than the project. My laptop has SSH keys, shell history, browser data, personal config, other repositories, and a lot of accidental state. Even if I trust the tool, that is too much surface area.

So this work was not about using devcontainers because they are fashionable. I was trying to build a better local development setup for a world where AI agents are part of the workflow.

The setup needed to stay familiar for people who already used Docker Compose. I wanted it to feel native for people working in IDEs. Another requirement: give Claude Code enough room to be useful without exposing the whole host.

In this article, I will walk through how I took an existing Docker Compose setup and added service-specific devcontainers, ran Claude Code inside them, kept secrets and VPN access scoped to the project, and looked at what still needs stronger guardrails.

The setup was fine until it was not

Before this work, the local workflow was straightforward.

Clone the repo. Start Docker Compose, or run the services locally. Turn on the corporate VPN when staging services were needed. Run Claude Code on the host, or run it somewhere else if you had your own setup.

Nothing was obviously broken. That is probably why this kind of work is easy to postpone. The pain is spread out across small things.

One person runs everything through Docker Compose. Another person runs one service locally and the rest in containers. Debugging works in one editor but not in another. None of these problems is dramatic, but they add up.

VPN access had the same shape. Some workflows needed staging services behind the corporate VPN. Turning the VPN on globally worked, but it affected the whole machine. Routing changed. DNS changed. Other unrelated work could be affected. The project needed VPN access sometimes; my whole laptop did not.

Claude Code made these tradeoffs more visible.

An AI agent is useful when it can move with fewer interruptions. But fewer interruptions usually mean broader permissions. If those permissions are on the host, I do not like the risk. If they are inside a project container, I can live with the tradeoff.

A container is not a perfect security boundary. I would not describe this as “secure” in an absolute sense. But it is still a smaller box than my laptop, and that matters.

The rough principle became:

Give the agent room to work, but make the room smaller.

What changed with AI agents

I was thinking a lot about Dan Guido’s talk on how Trail of Bits rebuilt their company around AI. The detail that stayed with me was not a specific repository or tool. It was the idea that adopting AI is not just installing a CLI.

If you want people to use these tools seriously, you need a workflow around them. You need defaults. You need a good first run. You need some policy. You need a place where the tool can act without forcing the developer to make a security decision every two minutes.

AI adoption needs the same standards as any other tool. If every developer has to decide how to install the agent, where to run it, what permissions are OK, and where config belongs, they spend energy before doing real work. A standard dev environment removes one of those decisions: the project opens the same way, the agent runs in the same kind of container, shared settings live in the repo, and personal auth stays personal.

That matched my experience.

If Claude Code asks for approval all the time, people will use it less. Or they will give it broad permissions in the easiest place: directly on the host. Nobody wants to fight their tools all day.

I wanted a middle ground:

Claude Code runs inside the devcontainer
the main filesystem it sees is the project
secrets are mounted only when needed
project guidance lives in the repo
personal auth stays outside the repo
later, we can add stricter network and command guardrails

This does not solve every problem. It just changes the default from “agent on my laptop” to “agent in this project environment”.

For me, that is already a meaningful improvement.

Two services, one Compose stack

The project was a monorepo with a few connected services. I will call them an API Gateway and a backend service.

The API Gateway proxies requests to the backend service. They also communicate through queue-like flows. In production, that is closer to AWS SQS. Locally, Redis stands in for that piece. The backend service also needs PostgreSQL.

So the local setup had the API Gateway, the backend service, Redis, and PostgreSQL.

The design question was simple to ask and annoying to answer:

How should devcontainers work in a monorepo with multiple connected services?

One answer is to create one big devcontainer for everything. I did not like that. It makes the environment heavier, and it does not match how I usually work. Most of the time I am inside one service, even if I need the rest of the stack running nearby.

Another answer is to give each service its own full Docker Compose setup. That also felt wrong. It duplicates configuration and makes cross-service work harder.

The approach that fit this repo was somewhere in the middle.

We kept one root Docker Compose stack. Then each service got its own devcontainer config. The API Gateway devcontainer could start the pieces it needed. The backend devcontainer could start its own dependencies. When I needed to work across the boundary, both devcontainers could exist at the same time.

In practice, the structure looked roughly like this:

repo/
  docker-compose.yml
  api-gateway/
    .devcontainer/
      devcontainer.json
  backend/
    .devcontainer/
      devcontainer.json

The names do not matter much. The important part is that docker-compose.yml stays at the root, while each service owns its own devcontainer entry point.

This is not a universal monorepo pattern. If you have dozens of services, you may need something more deliberate. But for a small set of connected services, this worked well.

The devcontainer did not replace Docker Compose. It reused it.

That distinction made the design much easier to explain. Docker Compose remained the shared runtime model. Devcontainers became the IDE-friendly entry point into that model.

Reusing Docker Compose

One thing I liked about this setup is that it did not create a second stack.

The project already had a working docker-compose.yml, so the devcontainer config pointed at it:

{
  "dockerComposeFile": ["../../docker-compose.yml"],
  "service": "api-gateway",
  "workspaceFolder": "/workspace",
  "shutdownAction": "none"
}

This is simplified, but the shape is the point.

The IDE opens inside the selected Compose service. The other services still come from the same Compose file. People who prefer terminal workflows can continue using Compose directly.

One setting was more important than I expected:

{ "shutdownAction": "none" }

This matters when multiple devcontainers share the same Compose stack.

Imagine the API Gateway is open in one IDE window and the backend service is open in another. Closing one window should not stop services that the other window still needs. Without thinking about shutdown behavior, it is easy to make one editor window accidentally own the whole stack.

With shutdownAction: "none", the devcontainer behaves more like an entry point. It does not try to be the lifecycle manager for everything.

That is a small config detail, but it changes how the setup feels.

Keeping the container idle

Another mistake I wanted to avoid was auto-starting the app as soon as the devcontainer opened.

At first that sounds convenient. Open the container, app starts, done.

In practice, it gets in the way.

I wanted VS Code tasks and launch configs to control the workflow. Sometimes I want watch mode. Sometimes I want debug mode. Sometimes I just want a terminal with dependencies installed. If the container starts the app by default, then a task can start a second copy and hit a port conflict.

So the devcontainer should prepare the environment, then wait.

The setting for that is:

{ "overrideCommand": true }

With this enabled, the container does not run the normal service command from Docker Compose. It stays alive, and the editor tasks decide what happens next.

For debugging Node inside the container, the inspector needs to listen on an address reachable from outside the container's network namespace:

node --inspect=0.0.0.0:9229 ...

Binding to localhost inside the container can be confusing because it is localhost from the container's point of view, not necessarily from the host or debugger's point of view.

I also learned to be explicit with ports. Docker Compose can publish ports. VS Code can auto-forward ports. Both features are useful, but if they both try to be clever at the same time, debugging becomes harder to reason about.

The pattern I settled on was:

Compose defines the important service ports
editor tasks start the app
launch configs attach the debugger
the devcontainer provides the environment

That separation made the workflow easier to debug.

Moving readiness into Compose

One cleanup came from a CodeRabbit review.

There was a startup script that waited for PostgreSQL with a shell loop. You have probably seen this pattern: try to connect, sleep, try again, repeat until the database accepts connections.

It worked. I have written this kind of script many times.

But in this setup, it was in the wrong layer. Docker Compose was already managing PostgreSQL, so Compose should also know when PostgreSQL is ready.

The cleaner version was to use a PostgreSQL healthcheck with pg_isready, then make the backend service wait for it:

services:
  postgres:
    healthcheck:
      test: pg_isready -U backend -h 127.0.0.1
      interval: 5s

  backend:
    depends_on:
      redis:
        condition: service_started
      postgres:
        condition: service_healthy

The actual username and service names depend on the project, but the idea is the same: let PostgreSQL report when it can accept connections, and let Compose use that signal.

The important part for the dependent service is this:

depends_on:
  postgres:
    condition: service_healthy

This moved the readiness concern closer to the service that owns it.

A post-create script should prepare the workspace. It can install dependencies or create local files. But waiting for infrastructure belongs in the infrastructure config when Compose is already responsible for those services.

CodeRabbit was useful here because it pointed at a rough edge. I still had to decide whether the suggestion fit the architecture. That is the right relationship with AI review tools, at least for me. They are good at making me look twice. They do not get to make the design decision.

Running Claude Code inside the devcontainer

This was the section I cared about most.

Installing Claude Code was only half the work. The real decision was where it should run.

That meant running it inside the devcontainer as the existing non-root node user, with the terminal starting at the repo root. This mattered because the Claude Code devcontainer docs recommend a non-root user, especially with --dangerously-skip-permissions: if I reduce approval prompts for the agent, I do not also want it running as root, and I do not want to mount more of the host than the project needs.

The repo root part sounds minor, but it matters. Claude Code needs to find project-level guidance and settings. If the terminal opens in the wrong directory, the tool may miss files like CLAUDE.md, or the developer has to remember extra setup steps.

Persistence also needed a bit of care.

A named volume for ~/.claude can preserve Claude Code's directory across container rebuilds. But it does not automatically preserve sibling files such as ~/.claude.json.

That distinction matters because tool state and auth can live in different places.

The pattern I liked was:

project-owned Claude guidance stays in the repo
developer-owned auth stays outside the repo
auth is mounted through docker-compose.override.yml
the mount is read-only where possible

For example:

services:
  api-gateway:
    volumes:
      - ~/.claude.json:/home/node/.claude.json:ro

The /home/node path in this example is not random. It matches the non-root user inside the container. If the container uses a different non-root user, the mount path should match that user's home directory instead.

This gives the team a shared way to configure Claude Code for the project without putting personal credentials in Git.

I like this boundary. The repository can describe how the agent should behave in this codebase. The developer can bring their own auth. Those are different concerns, and they should stay separate.

Reference: Claude Code devcontainer docs.

Putting VPN access in the container

The VPN work followed the same logic.

Some workflows need staging services behind a corporate VPN. That does not mean my whole laptop should be on that VPN all day.

So I moved VPN access into the devcontainer path.

The image can include OpenVPN tooling. A local Compose override can add the device, capability, and personal .ovpn file:

services:
  api-gateway:
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    volumes:
      - ~/vpn/company.ovpn:/etc/openvpn/company.ovpn:ro

The real .ovpn file is personal and should not be committed. The repo can include a docker-compose.override.yml.example to show the shape of the setup, and each developer can create their own local override.

I also added an editor task to connect the VPN. That made it part of the normal development workflow instead of a separate thing I had to remember.

The rule here is simple:

Put access where the project needs it.

The app sometimes needs the VPN. My whole machine does not.

What I would add next

The setup is still not as strict as I would like.

The next thing I would look at is network control. Running Claude Code inside a container is better than running it on the host, but an open container network still allows a lot.

A stricter version could allow access to Claude Code, package registries, required internal services, and selected staging endpoints. Everything else could be blocked by default.

That sounds clean on paper. In real projects it can get annoying quickly. Package managers download from more places than you expect. Company networks have weird dependencies. Internal tools call other internal tools. A strict allowlist needs maintenance, or people will route around it.

I would also like better guardrails around destructive actions:

deleting large directory trees
force-pushing branches
changing sensitive config
installing unexpected packages

Some of this could be handled with hooks. Some of it needs team policy. Some of it may be too fragile to treat as security.

I am fine with that. Guardrails do not have to be perfect to be useful. They just need to catch enough mistakes to be worth the friction.

The result

This work started as devcontainer setup, but that is not really what it was about.

It was about making the development environment fit the way we now work.

After the changes, IDE users get a native devcontainer workflow. Terminal users can still use Docker Compose. The two connected services can run together. Claude Code can work with fewer approval prompts, but inside a project environment instead of directly on the host. VPN access can be scoped to the container. The setup is also documented in the project instead of living only in someone’s head.

That feels like progress.

It does not remove all risk. It does not mean this exact design fits every repo. And it definitely does not make AI agents magically safe.

But I am much more comfortable giving Claude Code room to work inside a constrained project environment than on my laptop.

That is the practical tradeoff I wanted: fewer interruptions, more useful autonomy, and a smaller blast radius when something goes wrong.

References

These resources inspired me and may be useful to you as well. In particular, I want to mention Dan Guido’s video about his approach to turning the company AI-native. It is a top-down effort, but there is a lot to learn from its hands-on details.

Dan Guido’s talk on rebuilding Trail of Bits around AI

https://medium.com/media/c28836c183d39de194cb919a45626ce4/href

Originally published at https://igkuz.ru on May 9, 2026.

Managing Unicorn & Puma web servers with systemd

Igor Kuznetsov — Mon, 16 Sep 2019 10:21:01 GMT

In a production environment, you have to deal with service crashes and automatic restarts, and there are plenty of tools for that: supervisord, monit, and so on. With lots of projects to maintain, we try to use the standard utilities that ship with the distro, Ubuntu in our case.

The standard Ubuntu init system is systemd, one of the best-known and most widely used tools. It’s standard security practice to run apps under non-privileged users, and systemd offers a solution for this case.

Our initial setup is Ruby via RVM and Unicorn/Puma running a simple Rack application.

Important note: I don’t provide Docker files with the examples, because systemd won’t work in a non-privileged Docker container, so I advise you to test this in a separate virtual machine. You can find more on this on askubuntu and in the Docker docs. On macOS, you can use LXD or multipass to spin up an Ubuntu VM. LXD setup can be tricky, so for this project I’ve prepared a cloud-init file that can be used as a multipass entry point.

https://medium.com/media/5824086ef6dc8367cbf94dd4cb5542ff/href

multipass launch -n sysd -c 1 -m 2g --cloud-init init.yml xenial

The VM is based on Xenial because we upgraded our infrastructure from 12.04 to 16.04 LTS in 2017, so it is our standard for this year and the next.

Assume we have a non-privileged user awesomeapp with the home directory /apps/awesomeapp and the shell /bin/bash. The app root is /apps/awesomeapp/awesomeapp-git.

Let’s prepare the directory structure for systemd files:

mkdir -p /apps/awesomeapp/.config/systemd/user

Create unicorn.service. All files for the application can be found in the sample repo on my GitHub.

https://medium.com/media/d89516b1ebade7e149e331de9fd4faed/href

Reload systemd daemon:

awesomeapp@sysd:~$ systemctl --user daemon-reload
Failed to connect to bus: No such file or directory

There is a discussion about this error on Launchpad. To fix it, add the following line to /apps/awesomeapp/.profile:

export XDG_RUNTIME_DIR=/run/user/`id -u`

After that, as root or your sudo user, enable user lingering:

sudo loginctl enable-linger awesomeapp

Check that it works:

loginctl user-status awesomeapp

You should see something like this:

User status after lingering is enabled.

sudo -iu awesomeapp
systemctl --user daemon-reload

Note that the systemctl command runs under the app user without sudo; this is not a mistake. The app user can use it freely, so the dev team can maintain service files without involving the operations team.

Check the status:

Checking Unicorn systemd service status

It’s inactive (not running) and disabled, which means it won’t start after boot. Suppose you have 1..N app machines, and some of them go down for system upgrades or simply crash: we need the app running again after a reboot.

systemctl --user enable unicorn

Enabling Unicorn systemd service

Let’s give it a try:

Starting and checking status for Unicorn systemd service

One Unicorn master process and 2 worker processes, as configured.

Checking app response for Unicorn systemd service

Let’s do the same for Puma, the standard Rails web server. The puma.service file:

https://medium.com/media/dd3caa2cafaae40f5f7f531dc4358dcf/href

Puma systemd service run and check

Now you can reboot the VM and check that everything is up and running.

All commands with comments.

Unicorn:

# check unicorn.service status
systemctl --user status unicorn

# start unicorn.service
systemctl --user start unicorn

# start unicorn.service on VM boot
systemctl --user enable unicorn

# disable starting unicorn.service on VM boot
systemctl --user disable unicorn

# restart unicorn.service
systemctl --user restart unicorn

# graceful restart unicorn.service
systemctl --user reload unicorn

# stop unicorn.service
systemctl --user stop unicorn

Puma:

# check puma.service status
systemctl --user status puma

# start puma.service
systemctl --user start puma

# start puma.service on VM boot
systemctl --user enable puma

# disable starting puma.service on VM boot
systemctl --user disable puma

# restart puma.service
systemctl --user restart puma

# graceful restart puma.service
systemctl --user reload puma

# stop puma.service
systemctl --user stop puma

That’s it for today. The Puma and Unicorn configs are for the sample app; for production, you must adapt them to your needs and load.

Ruby retry/scheduled tasks with Dead Letter Exchange in RabbitMQ

Igor Kuznetsov — Wed, 13 Mar 2019 07:01:02 GMT

©https://soulstudiostore.com/design/paula-studio-ill-do-it-later-mood-clock

I have a project that needs rate limiting for outgoing requests. This is the opposite of the more common situation, where you develop an API and rate limit incoming client requests.

With outgoing requests, you need to queue them and give a slot to only some of them at a time. We’re collecting analytics for web pages. As we collect data only from public sources, we try not to get banned by firewalls or DDoS protection services. Making 1 request per second to content sites is more than enough, as there is no need for speed.

A RabbitMQ cluster is part of our infrastructure and the default queuing solution. Unfortunately, RabbitMQ doesn’t come with native support for delayed or scheduled messages. Fortunately, RabbitMQ has Dead Letter Exchanges (DLX), which allow us to simulate message scheduling.

I’m going to explain how it works and what we need to build.

When you create a queue (Q1) bound to an exchange (X), you can also specify a Dead Letter Exchange (DLX) to which rejected messages are routed. RabbitMQ handles this automatically. Another queue (Q2) is bound to the DLX to receive the messages after they have been rejected from the original queue (Q1).
To deliver messages back automatically, we set the original exchange (X) as the dead letter exchange of Q2 and give Q2 a message TTL, so that expired messages return to Q1.
Look at the image.

It sounds a little complex, but the nice part is that we need only 1 consumer; everything else is handled by RabbitMQ.

How to?

Create TargetQueue bound to TargetExchange:
– set Dead Letter Exchange to RetryExchange
Create RetryQueue bound to RetryExchange:
– set Dead Letter Exchange to TargetExchange
– set Message Time To Live (TTL) to the desired time (1 minute for example)
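
The two declarations above boil down to a pair of optional queue arguments. Here is a minimal sketch of them, assuming the exchange names target.exchange and retry.exchange (hypothetical; the sample repo may use other names). The helper only builds plain hashes, and the commented lines show how they could be handed to Bunny through the arguments: option of Channel#queue.

```ruby
# Builds the optional queue arguments for the two-queue retry topology.
# Exchange names and the TTL value are illustrative assumptions.
def retry_topology_arguments(target_exchange:, retry_exchange:, ttl_ms:)
  {
    # Rejected messages leave the target queue through the retry exchange.
    target_queue: { "x-dead-letter-exchange" => retry_exchange },
    # Expired messages leave the retry queue through the target exchange.
    retry_queue: {
      "x-dead-letter-exchange" => target_exchange,
      "x-message-ttl" => ttl_ms
    }
  }
end

args = retry_topology_arguments(
  target_exchange: "target.exchange",
  retry_exchange: "retry.exchange",
  ttl_ms: 60_000 # 1 minute, expressed in milliseconds
)

# With an open Bunny channel `ch`, the hashes could be used like this:
#   ch.queue("work.queue",  durable: true, arguments: args[:target_queue])
#   ch.queue("retry.queue", durable: true, arguments: args[:retry_queue])
```

Keeping the arguments in one place makes it harder for the two queues to drift apart, since RabbitMQ refuses to redeclare an existing queue with different arguments.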

Steps to test the solution:

Publish a message to TargetExchange; it is routed to TargetQueue
Consumer gets the message and tries to process it
Processing fails, and the consumer rejects the message without requeueing it
RabbitMQ routes the message with the same routing key to RetryExchange
Message moves to RetryQueue and sits there for 60 seconds (the configured TTL)
When the message expires, it is resent to TargetExchange and routed to TargetQueue
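
To make the cycle easier to follow, here is a small in-memory model of it. This is a sketch only: it does not talk to RabbitMQ, the class name RetryLoopSimulation is made up for illustration, and time is advanced by hand instead of waiting for a real TTL.

```ruby
# In-memory model of the retry loop. It mimics what the broker does with
# dead-lettering and TTL; no RabbitMQ connection is involved.
class RetryLoopSimulation
  attr_reader :target_queue, :retry_queue, :processed

  def initialize(ttl_seconds:)
    @ttl_seconds = ttl_seconds
    @target_queue = [] # messages waiting for the consumer
    @retry_queue = []  # [message, expires_at] pairs waiting for their TTL
    @processed = []
    @now = 0
  end

  def publish(message)
    @target_queue << message
  end

  # The consumer takes every waiting message. The block returns true when
  # processing succeeded; otherwise the message is "rejected" and moves to
  # the retry queue with an expiry time.
  def consume
    batch = @target_queue.dup
    @target_queue.clear
    batch.each do |message|
      if yield(message)
        @processed << message
      else
        @retry_queue << [message, @now + @ttl_seconds]
      end
    end
  end

  # Advance the clock; expired messages go back to the target queue.
  def advance(seconds)
    @now += seconds
    expired, waiting = @retry_queue.partition { |_, expires_at| expires_at <= @now }
    @retry_queue = waiting
    expired.each { |message, _| @target_queue << message }
  end
end

sim = RetryLoopSimulation.new(ttl_seconds: 60)
3.times { |i| sim.publish("msg-#{i}") }

slots = 1
sim.consume { |_message| (slots -= 1) >= 0 } # only one slot is available
sim.advance(60)                              # the TTL passes
```

After this run, one message has been processed and the other two are back in the target queue, ready for the next attempt.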

Let’s proceed to code

In Ruby, there are two popular libraries for RabbitMQ: Bunny, and Sneakers, which is a really nice abstraction on top of Bunny. For this post, I chose Bunny, as it shows some of the low-level operations. In my next post, where the whole project will be described, I’ll show the “Sneakers” way.

As always, the final solution can be found in my repository on GitHub.

Create a Gemfile and install all necessary stuff.

https://medium.com/media/6c2339366ba946682ab494dbb9b17ede/href

Create the publisher. It only publishes a message with a routing key to an exchange.

https://medium.com/media/85fd9d95fef1edd457c46f19e1ab2b1b/href

Create the consumer. It receives messages and processes them: it tries to get a slot for the request and, when that fails, rejects the message.

https://medium.com/media/1d1687c2b7773d19452a96ff82f7fd57/href

Create the starting point for our application. It creates the exchanges and queues.

https://medium.com/media/62bc29306014c75165248de1d72b2482/href

For rate limiting, I prefer the Redis way. Redis is one of the most popular in-memory databases. A lot of Ruby projects use Sidekiq, so you probably already have Redis in your infrastructure. It can be tricky to build rate limiting without Redis or Memcached, but there is a way. If you feel strong and enjoy reinventing the wheel, code it yourself, for example with token buckets.
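
For the do-it-yourself route, a token bucket can be sketched in a few lines of plain Ruby. This is not the Redis-based limiter from the repo: it is an in-process illustration that works for a single consumer only, and the class name and the injectable clock are assumptions made to keep the example deterministic.

```ruby
# Token bucket: holds at most `capacity` tokens, refilled at `rate` tokens
# per second. Each granted slot spends one token.
class TokenBucket
  def initialize(capacity:, rate:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @capacity = capacity.to_f
    @rate = rate.to_f
    @clock = clock
    @tokens = @capacity
    @last_refill = @clock.call
  end

  # Returns true and spends one token if a slot is free, false otherwise.
  def acquire
    refill
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end

  private

  # Adds the tokens earned since the last call, capped at the capacity.
  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last_refill) * @rate].min
    @last_refill = now
  end
end

# A fake clock keeps the example deterministic.
now = 0.0
bucket = TokenBucket.new(capacity: 1, rate: 1, clock: -> { now })

first  = bucket.acquire # slot granted
second = bucket.acquire # no tokens left: the consumer would reject the message
now += 1.0
third  = bucket.acquire # one second later a token is available again
```

With capacity 1 and rate 1 this matches the 1 request per second limit. Several consumers on different machines would still need shared state, which is where Redis or Memcached comes in.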

App in action

docker-compose run app ./start

Take a look at RabbitMQ management page. It is available at http://localhost:8080.

Created RabbitMQ exchanges

Created RabbitMQ queues

docker-compose run app ./publisher

10 messages published

As we can see, all messages are published, but as there are no consumers, they all sit in the queue.

docker-compose run app ./consumer

Messages processed by consumer

Messages reside in retry.queue

Wait for 60 seconds and look at the terminal.

Messages processed after 1 minute waiting in retry.queue

The messages are returned to work.queue. Once again, only 1 message is processed, as there is no slot for the others, and they are pushed back to retry.queue by RabbitMQ.

If you grabbed the code from the repo, read the README, and started the Docker containers, you can visit http://localhost:8080 and watch for yourself how messages are processed and requeued.

Notes on DLX & TTL

There are several ways to set a DLX and TTL.

DLX:

When declaring a queue.
With RabbitMQ policies.

TTL:

When declaring a queue
With RabbitMQ policies
On a per-message basis

Pros & cons

When you set these options while declaring a queue, the only way to change them is to stop all the consumers and publishers, drop the queue, and redeclare it from scratch. That’s not great, as you don’t always know who can publish to it. On the plus side, all configuration can be done in code.

When you create policies, you can change the policy at any time and no app redeployment is needed. On the other hand, you must grant people permission to create policies, which is dangerous. Another option is to ask system administrators or the operations team to change the policy, which can be a little annoying for them.

TTL can also be set per message, with the expiration property of the message. When a message has expired and reaches the head of the queue, RabbitMQ automatically routes it to the DLX.
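
As a small sketch of the per-message option: AMQP 0-9-1 carries the expiration property as a string holding milliseconds, so the value has to be converted before publishing. The helper name and the routing key "work" are hypothetical; the commented line shows how the options could be passed to Bunny's Exchange#publish.

```ruby
# Builds publish options with a per-message TTL.
# The expiration property is a string containing milliseconds.
def publish_options_with_ttl(routing_key:, ttl_ms:)
  { routing_key: routing_key, expiration: ttl_ms.to_s }
end

opts = publish_options_with_ttl(routing_key: "work", ttl_ms: 60_000)

# With a Bunny exchange object `exchange`, this could be used as:
#   exchange.publish("payload", opts)
```

Per-message TTL allows a different delay for each message, but because expired messages are dead-lettered only at the head of the queue, a message with a long TTL can hold back shorter ones queued behind it.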

Feel free to ask questions in comments or connect directly on twitter.

How to setup Ruby Object Mapper (ROM) for standalone project

Igor Kuznetsov — Mon, 11 Mar 2019 07:01:00 GMT

I’ve been following the Data Mapper project for a long time. It evolved into Ruby Object Mapper (ROM), and when I came up with a simple standalone project for collecting post analytics, I decided to use it.

The official ROM documentation is a great place to start, but it lacks information on some basic setup. That’s why I thought my experience would be useful for those who have just come from Active Record and are looking for first steps.

We’re going to build a simple application that starts Pry as a console. There will be 2 entities — Company and Post. You can check the rom-sample-app for the full code example on GitHub.

Company & Post relations

Project setup

$ mkdir rom-sample-app && cd rom-sample-app
$ touch Gemfile

https://medium.com/media/ebb933a6ea24186ae00881fb3006ba85/href

$ bundle install

Create boot.rb and require project dependencies.

https://medium.com/media/1799a960801c7bdf93ef5a005fcf1362/href

Create console.

$ touch console
$ chmod +x console

https://medium.com/media/47a69d6a796f16a44915cd38aba7409b/href

Now we can get a Pry console by calling $ ./console. Next, we must connect to the DB and create our entities.

Working with MySQL from ROM

Connecting ROM & MySQL

Prepare ROM::Configuration to connect the app to the MySQL database.

ROM uses the Sequel API, and we have to take that into account. Rails provides default Rake tasks for creating and dropping the DB; ROM doesn’t. I won’t describe the process of creating and dropping the database, as it is needed only once in a production environment. Most of the time this process involves DevOps engineers or system administrators, and developers only grab the config. I assume you can create the database for the development and test environments on your own.

https://medium.com/media/874014c7b3eb3d4c768fff42001cb046/href

You shouldn’t handle config like this in a production environment. Read it from a separate file or environment variables instead. But for prototyping purposes it’s more than enough.

Now we have a MySQL server and a created database, and in the console we can work with the MAIN_CONTAINER object. Calling ROM.container finalizes the ROM configuration and invokes all the hooks and callbacks. So if you need to register or configure something, do it before calling the container method.

Working with ROM migrations

ROM migrations

The SQL adapter uses the Sequel migration API exposed by SQL gateways. You can either use the built-in rake tasks or handle migrations manually. To load the migration tasks, simply require them and provide a db:setup task which sets up ROM.

https://medium.com/media/3ecf12728a138d191109ec2864b9b272/href

Create migrations for Company and Post.

$ bundle exec rake db:create_migration[create_companies]
$ bundle exec rake db:create_migration[create_posts]

Now there are 2 files in the db/migrate directory. Migration names are prefixed with timestamps. Timestamped migrations are created by default, but there is another setup where you can use plain integers. More on that in the official docs.

I used ROM in a real project, so I’m going to put the schema here and explain why I needed these fields in another publication.

https://medium.com/media/414b2d9cdbd0eefcdfec0e7519a89865/href

Run the migrations and check that the schema exists. Running bundle exec rake db:migrate also checks for pending migrations. For now, we don’t have any abstractions to work with the data, just the gateway connection.

$ bundle exec rake db:migrate
  <= db:migrate executed

$ ./console
pry(main)> MAIN_CONTAINER.
  gateways[:default].connection.schema(:companies)
=> [[:id,
  {:primary_key=>true,
   :auto_increment=>true,
   :generated=>false,
...
 [:updated_at, {:primary_key=>false, :generated=>false, :allow_null=>true, :default=>nil, :db_type=>"datetime", :type=>:datetime, :ruby_default=>nil}]]

ROM Relations

ROM Relations for Company and Post

Users of ROM implement Relations, which give access to data. A relation is defined as a set of tuples identified by unique pairs of attributes and their values. An example of relations is tables in a SQL server. Relations are really the heart of ROM. They provide APIs for reading the data from various databases, and low-level interfaces for making changes in the databases.

Let’s create Post & Company relations in lib/relations/ directory.

https://medium.com/media/f53d9799b955c95bd21cba4c56389510/href

As ROM uses dependency injection throughout the lib, we must register our components. More on that in official docs.

configuration.register_relation(Companies, Posts)

We put this code into the console file. When your app grows, you would definitely move it to a dedicated place such as an initializer or something similar. But for demonstration purposes it’s more than enough to keep it next to the code that runs the app.

Now we can access relations from MAIN_CONTAINER, like:

pry(main)> MAIN_CONTAINER.relations[:companies].count
=> 0
pry(main)> MAIN_CONTAINER.relations[:posts].count
=> 0

Let’s go further with ROM Commands. We want not only to query the data, but also to add or update it.

ROM Commands

Commands are used to make changes in your data. Every adapter provides its own command specializations, that can use database-specific features.

Core commands include the following types:

:create - a command which inserts new tuples
:update - a command which updates existing tuples
:delete - a command which deletes existing tuples

We are going to create commands for Posts and Companies and put them into lib/commands/.

https://medium.com/media/d3cc6c55362046c644aea6c95307430c/href

Note the lines

use timestamps
timestamp :created_at, :updated_at

This is how we use the Timestamps plugin to set the dates automatically, like Active Record does.

Now we must register our commands:

configuration.register_command(CreateCompany, DeleteCompany)
configuration.register_command(CreatePost, UpdatePost, DeletePost)

Let’s create the 1st company:

[1] pry(main)> companies = MAIN_CONTAINER.relations[:companies]
[2] pry(main)> companies.command(:create).call(
       name: 'My 1st Company', domain: 'http://example.com')
=> {:id=>1,
 :name=>"My 1st Company",
 :domain=>"http://example.com",
 :state=>"running",
 :created_at=>2019-03-09 14:24:23 +0000,
 :updated_at=>2019-03-09 14:24:23 +0000}

That’s it. id and state were assigned automatically, timestamps creation was also handled by ROM.

Testing with ROM

The question then arises, “How should I test this stuff?..”. We’re going to use RSpec with ROM factory. It’s a kind of replacement for the factory_bot gem by thoughtbot.

It’s time to add settings.yml to the project and put the config there. We also need to add the rspec dependency to the Gemfile and create a test database. View the commits to the repo: 1, 2.

Define factories

https://medium.com/media/8b7857424d5aaf13c5e6a35570151da8/href

Check company_spec.rb in the repository for a simple test that creates a company with 1 post.

This is it. We created a standalone console application with Ruby Object Mapper. Additional info on running the app in a Docker container is in the README.

If you have any questions, write a comment or connect directly on Twitter.

Socket Activated Containers (Unicorn + Systemd)

Igor Kuznetsov — Tue, 17 Jul 2018 08:34:00 GMT

A client has a large number of media special projects (~250): widgets, landing pages, small APIs, and so on. Since 2012 all of it has lived on a single machine running Nginx + Passenger + Ruby.

Everything is fine except for OS upgrades, when all the problems with old and new package versions surface and make themselves heard at full volume.

It looks like a perfect case for containers, but there is one catch. Passenger and PHP-FPM can do something that even Kubernetes does not offer out of the box: start a service on incoming traffic.

On the web this is known (and googleable) as Socket Activation. A packet arrives on a network or Unix socket, and the service starts. After digging through Stack Overflow and a few sysadmin forums, I realized that the idea of bringing a container up only when it is needed interests a lot of people. The big players have the same need, hence the reinvented wheels (read: in-house solutions).

It turns out that what we need was written about 7 years ago. The post about implementing this feature in systemd is dated 2011. And in 2013 there was already a post about starting a container with the same functionality. There is one catch: that post uses systemd's standard virtualization tools, while we are interested in Docker, which is more popular at the moment.

I took the diagram from the Atlassian blog.

Every article talked about a proxy service that accepts the traffic and is responsible for starting the container. Why not use the socket directly in the container? The question comes up because each service needs 3 systemd configs and at least 2 sockets, which seemed to me like a lot and not very elegant. Then I looked at Kubernetes configs, and 3 systemd configs suddenly looked small, simple, and clear.

So what about the socket? I did not find a way to pass it into the container. If anyone knows one, I would be glad to hear it. The problem is not in the software (unicorn and puma), which can reuse a passed socket. There is a standard procedure for that: environment variables carry the process ID (LISTEN_PID) and the number of passed listening sockets (LISTEN_FDS), after which the software must not try to open a new socket but reuse (attach to) the inherited one.

The problem is that the process ID and the socket reference are passed from the host machine, and the container naturally knows nothing about them.
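To make the LISTEN_PID / LISTEN_FDS handshake concrete, here is a minimal Ruby sketch of what a server does on its side. It assumes the systemd convention that passed descriptors start at fd 3. The helper names inherited_listen_fds and inherited_server are hypothetical and are not part of unicorn or puma.

```ruby
require 'socket'

# systemd passes inherited descriptors starting at fd 3 (SD_LISTEN_FDS_START).
SD_LISTEN_FDS_START = 3

# Returns the descriptor numbers handed over by systemd, or [] when the
# variables are missing or were addressed to a different process.
# `env` and `pid` are injectable only to make the sketch easy to try out.
def inherited_listen_fds(env = ENV, pid = Process.pid)
  return [] unless env['LISTEN_PID'].to_i == pid

  count = env['LISTEN_FDS'].to_i
  return [] unless count.positive?

  (SD_LISTEN_FDS_START...(SD_LISTEN_FDS_START + count)).to_a
end

# Wraps the first inherited descriptor instead of binding a new socket.
# Assumption: the unit passes a TCP listener; a Unix socket would need UNIXServer.
def inherited_server
  fd = inherited_listen_fds.first
  fd && TCPServer.for_fd(fd)
end
```

The PID comparison in the first guard is exactly what breaks across the container boundary: a LISTEN_PID issued on the host does not match the PID the process sees inside its own namespace.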

Sharing data with the host machine is possible, of course, but is it safe?

https://medium.com/media/7d81a48401be2437677883a7e900125e/href

I made a gist with files for testing: a trivial Rack application and a set of systemd services. To get it all running, create a separate user (I named it sact) and put the Gemfile, Gemfile.lock, unicorn.rb, and config.ru files in /apps/sact/sact-git. Install Docker, put the systemd service files in /etc/systemd/system, and reload the daemon with systemctl daemon-reload. You will also need nginx. I think anyone reading this will have no trouble figuring that part out.

If we hit port 80 on localhost with curl localhost, we get Simple app for test. A small delay on startup does not scare us, because the assumption is that the services are accessed rarely.

What I did not get around to is tracking traffic over a period of time in order to stop the service. In principle it is just log parsing: if there has been nothing in the last N minutes, stop the service along with its container. If we decide to build this solution for our 250+ projects, I will definitely write about it.
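The idle check could be sketched roughly like this in Ruby. It is only an illustration under assumptions: the access log is assumed to use the default nginx timestamp format, and the method names are made up for the example. Actually stopping the unit (for instance through systemctl stop) is left out.

```ruby
require 'time'

# Timestamp as it appears in a default nginx access log line,
# e.g. [17/Jul/2018:08:34:00 +0000]
NGINX_TIME = /\[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]/

# Returns the time of the most recent request found in the lines, or nil.
def last_request_at(log_lines)
  stamps = log_lines.map { |line| line[NGINX_TIME, 1] }.compact
  return nil if stamps.empty?

  stamps.map { |s| Time.strptime(s, '%d/%b/%Y:%H:%M:%S %z') }.max
end

# True when there were no requests during the last `idle_minutes` minutes.
# A log without any parsable request counts as idle.
def idle?(log_lines, now:, idle_minutes:)
  last = last_request_at(log_lines)
  last.nil? || (now - last) > idle_minutes * 60
end
```

A timer or cron job could call idle? with the tail of the service's log and stop the service and its container when it returns true; socket activation then starts it again on the next request.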

Stay tuned…

P.S. Where I picked things up:

From the systemd developer, on container activation.
This one is in Python, but it shows how a socket is passed from systemd.
And this is a Ruby example of working with a systemd socket.
Where I got the GIF.

The Move

Igor Kuznetsov — Tue, 06 Feb 2018 07:01:02 GMT

In December 2017 we decided to change data centers (DC). It was a serious decision, made for 2 reasons: no protection against DDoS attacks, and the price. I have already told the story of how wonderfully the DC helped us (not) during an attack, and cutting infrastructure costs without losing performance is a challenge.

Important decisions in a company are not made by one person and must be well argued.

How I defended the decision to management

1. Researched several vendors
2. Made a pros/cons table for the new vendor
3. Calculated the new total cost of ownership (TCO)
4. Calculated the cost of the move: person-hours, possible delays on other tasks, and the risk of a partial or complete system outage
5. Drew up a migration plan
6. Put the items above into words and Excel tables
7. Wrote an email to all stakeholders

What each of them looks at:

CFO: profit or loss in money
CEO: reputational risk from system downtime, how delays on other tasks will affect the business as a whole, and whether moving right now makes sense

How to choose a DC

In my case it was fairly simple, because we already had part of our infrastructure in this DC. Still, the system administrator and I did compare it against other hosting providers.

Set the priorities in this list yourself:

Geographic location. The closer the better, but not Russia 😏
Hardware cost. Cheap is not always good; look at the specs.
Hardware specs. Release date of the CPUs and their clock speed. Amount of RAM and room for expansion. SSD/HDD, RAID with a battery, network cards, and the uplink (10 Gb/s is excellent, 1 Gb/s will do).
Network configuration features. In Softlayer you can build a good LVS setup and actually utilize each server's link, getting 1xN Gb/s (where N is the number of servers). In Hetzner an IP is pinned to a server and you cannot answer from another machine. A failover IP is moved only through an API request.
DDoS protection. Yes, in 2018 you cannot do without it, especially if you are an online publication. There are of course the free CloudFlare and Google Shield, but you lose control. If a DC can protect you at the network layers, it is better to choose that DC.
Network traffic included in the package. Hardware becomes obsolete quickly, while traffic has been getting expensive lately. The more the base package includes, the better.
Russian-speaking support. Yes, you can find out everything in English, but a good working relationship with a manager at the DC is far more useful. You can learn a lot of details and get solid advice on how to build the infrastructure you need on their capacity.
Cloud services. We run Ceph for image storage; if the DC offered a similar solution as a service, I think I would take it. For now we maintain it ourselves.
Dedicated links to other DCs. Big players usually lay their own cables or buy capacity so that large customers can move large volumes of data between providers.

After the approval stage you can start the move. This is my own experience:
1. Ordered machines with the required configuration. It usually takes 3 business days, but this time it took a week; I put that down to the non-standard request (RAID with a battery, extra network cards, failover IP, an additional subnet).
2. Set up the infrastructure: virtual machine images, network interface configuration, load balancing, some light hardware tests. (1.5–2 weeks)
3. Prepared the projects for the move. By this point the subnets and the failover addresses for services must be known, and failover must be tested. (2 days)
4. Fixed the migration plan. It is just a sequence of actions, so that during the work you do not have to think about the next step and do not forget anything.
5. Agreed on the date and time window. Staff stay out of the applications and do not create anything, to avoid unnecessary problems. We are not Google yet, or even VK, so we can afford that.

What the move itself looks like

What you should have at the start:

A set of private IPs for all virtual machines
A set of highly available IPs for infrastructure services (Memcached, DB, Elastic, LoadBalancer, etc.)
VPN tunnels between the DCs
A set of application configs with the new addresses
A list of start/stop commands for applications, queues, and services, close at hand
A cool head and time to spare

Cache. We run our own small CDN, and to keep a flood of requests from taking down the static file storage, we added the new IP to the balancer with a lower weight than the main machines and let the cache warm up. Once the cache hit rate is stable at 90%, we switch all traffic to the new machines.
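The "stable 90% cache hit" condition can be written down as a small check. This is a sketch under assumptions: the sampling windows, the [hits, misses] sample format, and both method names are invented for the example, and "stable" is read as "the last few windows are all at or above the threshold".

```ruby
# Hit ratio for one sampling window: hits / (hits + misses).
def cache_hit_ratio(hits, misses)
  total = hits + misses
  total.zero? ? 0.0 : hits.to_f / total
end

# `samples` is a list of [hits, misses] pairs, oldest first.
# The threshold and the number of windows are assumptions, not measured values.
def ready_to_switch?(samples, threshold: 0.9, windows: 3)
  recent = samples.last(windows)
  recent.size == windows &&
    recent.all? { |hits, misses| cache_hit_ratio(hits, misses) >= threshold }
end
```

Requiring several consecutive windows guards against switching on a single lucky sample while the cache is still mostly cold.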

DB. Master-master replication with one active master. Starting position: old-db0<->old-db1. Applications point at HA IP –> old-db1.

We create read replicas new-db1 and new-db0 and arrive at new-db0<-new-db1<-old-db1<->old-db0. This topology has to be set up in advance so that the replicas have time to catch up on the data.

We break the master-master pair old-db0<->old-db1 and set up a new master-master pair old-db1<->new-db1. Then we move on to the applications.

We bring up the applications on the new machines. The new instances point at HA IP –> new-db1. We do not start the queues, so nothing writes to the database yet. We check that everything works and warm the cache with a few requests.

On the old machines we turn off the queues and everything that can write to the database, except user requests. We proxy requests to the new DC in nginx. We switch DNS. We speed up the refresh of public DNS with Google's magic button.

We shut down the applications on the old machines and check for connections on old-db1. When nothing remains but the replication processes, we go and break the master-master pair. We set up a new master-master pair new-db1<–>new-db0 and cut the link to the old database in DC1.

We turn on the queues and auxiliary services. We start reindexing documents in Elastic. We could have moved the data over and indexed the remainder, but there is not much data, so we decided to do a full reindex as a precaution. Elastic is used only for search, which has historically accounted for <1% of traffic, so returning an empty response to users for a while is not a big deal.

We manually check the critical parts of the applications. If something goes wrong, we fix it, but for us everything was OK.

The publications had a pile of additional domains, and all of them need to be switched. You can find which ones in the nginx configs, under server –> server_name.

We check all the monitoring alert rules and go get some rest.

Two days later, once DNS has propagated, we start shutting down the machines in the old DC and request their decommissioning.

In the end we cut the cost of ownership by a factor of 4, but ended up with fewer CPUs.

Fending Off a DDoS

Igor Kuznetsov — Thu, 04 Jan 2018 15:54:34 GMT

Imagine this: you are walking home from work along a familiar street, listening to music and getting ready to take out your keys, when a couple of big guys come around the corner, taller, stronger, and faster than you (at first glance), and hit you with a straight punch to the jaw. That is what a DDoS attack looks like for 99% of companies. It is unexpected and unpleasant for the business owner, but very interesting for developers and system administrators.

It is a great chance to test your reaction: the vaunted monitoring systems, the goodies promised by software and providers, and, most importantly, your own redundancy decisions.

Over the last 4 months, 2 of our publications were attacked. The first time the attackers demanded 1 BTC; the second time they asked for nothing and just kept hitting.

A DDoS attack is an emergency, and there is no point in provisioning for 2000% load when you build the infrastructure.

Why? Because an attack can be so large that even Akamai and Yandex would not hold. The ability to sustain 2000% would be a drop in the ocean, the company would burn a wagonload of money on idle hardware in the meantime, and the site would go down anyway.

Should you prepare if everything is that bad? Absolutely. But, as usual, do it wisely.

What does it look like in the movies?

https://medium.com/media/a62cc8a64e46f2571572fd4a54d40edc/href

A bunch of open terminals, things flying somewhere, getting encrypted, crypted, pinged.

[A red light comes on].

“Sir, we are under attack.”

“Show them, John.”

[Here a satellite inevitably gets hacked, the bad guy is detected, and the brave guys are already kicking in the door of some bunker and cuffing everyone].

What does it look like in reality?

You are riding home from a bar in a taxi at night. Monitoring calls, and a synthetic voice dictates the message: unreachable. From your phone you get a 504, and the browser spinner runs in endless circles while you frantically try to remember the Zabbix password… And there you find an avalanche of 500s and 1000 times more requests. And you have to do something. All the satellites were already broken before you got to them, at launch or during assembly, so the choice is limited and you have to fight back somehow. Things happen like in the movies only in the movies.

Zabbix screenshot from one of the app machines. 500: black, 400: red, total: green.

Normal load at that time of day is 600–800 rpm per app machine.

How does the attack go?

We were attacked in the two simplest and most effective ways:

Application-layer attack (L7). GET/POST requests to various locations (/, /login, /wp-login) with randomized User-Agent and cookies.
Transport-layer attack (L4). SYN flood, UDP flood.

You can realistically fight off the first by configuring filters or a WAF, but not always the second. UDP flood is extremely effective. Blocking UDP packets is possible only with the hoster's help.

The usual flow is this: they hit the application (L7); if you hold, they hit L4; if you hold there too, they increase the volume. After that they hit both L7 and L4.

How did we fight back?

Found patterns in the application requests in the server logs. Extracted the IP addresses and filtered the traffic on the balancer.
Blocked POST where it is not needed.
Started capturing a dump of TCP and UDP packets. Discovered a UDP flood.
Called the hoster and asked them to cut off UDP traffic to our IP. Our wonderful hoster did not do that and instead sent us to a null route. Simply put, they blocked all traffic to our IP to protect their own infrastructure, and switched us off.
Switched to a new IP and applied to CloudFlare Project Galileo and Google Shield.
Created an account and paid for the Professional plan at CloudFlare.
Moved the zone to CloudFlare. This was the longest operation. The zone took 1.5 hours to move, and during that time the site worked with heavy interruptions. (This is where you can speed up the refresh of public DNS.)
Turned on the magic Under Attack button. It is a JavaScript challenge plus aggressive request filtering. Before entering the site, the visitor sees a page on which a JS script must run and redirect them after 5 seconds.
Turned on a rate limit on the number of requests from one IP and turned off the same setting in nginx.
Turned on the WAF (Web Application Firewall) from CloudFlare. Tuning the WAF for the publication afterwards took about 3 days: watching requests and the newsroom's work, disabling some filters, and adapting it to our system.
Configured nginx not to block requests from the CloudFlare subnets and to pass real user IPs into the system.

As a result, the publication was unavailable for about 2 hours in total, which fits within the SLA.

Timings

On average a botnet lives 1.5–2 weeks. Hosters block the compromised machines that send suspicious traffic, or the attackers simply get bored.

The extortionists attacked for a week, hitting for 2–3 hours at peak traffic times, as well as at night and early in the morning. On the other publication the attack was switched on at 12:00 and kept up until 16:30 the next day. The reasons and goals remained unknown.

CloudFlare admitted us to the Galileo program within two days. Protection on the Professional plan was available immediately, but it became free two days later.

Google Shield gave us access to its service three days later. By then we had already moved behind the CloudFlare shield.

What to do right after reading this text?

Check the TTL on critical DNS records (bring it down to at least 900–1800). That is 15–30 minutes. Third-party DNS hosting will not die from such a TTL; with self-hosted DNS, check the load and remember to make the service redundant. A low TTL lets you change IPs quickly and recover faster after moving the zone.
Buy an IP subnet. It costs pennies, and if you go behind an external shield you need to change the IP and turn off the old one. If you fight back on your own, the ability to change IPs gives you time to maneuver.
Apply to CloudFlare and Google Shield. These are services for protecting media. I will cover the pros and cons of each in a separate article. In short, Shield is faster to set up but offers weaker protection.
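For the TTL item in the list above, a quick audit could look like the following Ruby sketch. It uses the standard resolv library; the 1800-second limit comes from the range mentioned above, while the method names and the example hostnames are placeholders. Note that a caching resolver reports the remaining TTL, which can be lower than the configured one.

```ruby
require 'resolv'

MAX_TTL = 1800 # seconds; upper bound of the 900-1800 range suggested above

# Performs a live DNS query and returns the TTLs of the A records for `name`.
def a_record_ttls(name)
  Resolv::DNS.open do |dns|
    dns.getresources(name, Resolv::DNS::Resource::IN::A).map(&:ttl)
  end
end

# Pure helper: given { name => ttl }, returns the names whose TTL is too high
# to allow a quick IP change.
def slow_to_switch(ttls_by_name, max_ttl: MAX_TTL)
  ttls_by_name.select { |_name, ttl| ttl > max_ttl }.keys
end
```

In practice you would collect the maximum TTL per critical hostname with a_record_ttls and pass the resulting hash to slow_to_switch, for example slow_to_switch('www.example.com' => 300, 'static.example.com' => 86400), which flags only the second name.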

Do not rush to rewrite the application or add machine capacity. You will not rewrite the application quickly, and the machines will not help.

Which services can you use?

Free

CloudFlare Project Galileo
Google Shield

Paid

Qrator
Kona Site Defender by Akamai

There are many more DDoS protection services; I only listed ones from well-known companies and added a Russian vendor.

Hosting

One of the most important parts of the infrastructure. Choose it very carefully, especially if you are a media outlet and subject to legal restrictions. Study the documentation on the hoster's site closely and check whether there is DDoS protection, whether it is paid or free, and what exactly the hoster means by it. I advise you not to be lazy and to call them. The consultants on the line can put you through to technical staff, who can tell you more precisely what level of attack the hoster can withstand and how it can help you.

We used Softlayer (IBM), and all those guys did was send us to a null route without any discussion. Even a simple request to cut off all UDP traffic to our machines was ignored. Although Softlayer offers the richest networking functionality, the lack of DDoS protection and the unwillingness to help made us look for other options.

I recommend looking at Hetzner, Online.net, and OVH. These are large companies that provide DDoS protection. I have not looked at Russian hosters, but I think there are players with the necessary options there too. Hetzner provides it more or less for free. I have not tested its quality or limits.

What to do if you are under attack right now?

First, breathe out. You have already been attacked, and if preparation was zero, tearing out the hair on your head or your ass is no longer any use.

Second, understand the problem:

Attack vector.
Scale of the disaster.

L7: look at the logs, extract the IPs, and filter.
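Pulling the offending IPs out of an access log can be sketched in a few lines of Ruby. This is an illustration, not the script we used: it assumes a log format where the client IP is the first field and the request line is quoted, and the method name, path pattern, and threshold are placeholders to adapt to your own traffic.

```ruby
# Counts requests per client IP for paths matching `path_pattern` and
# returns the IPs that reach `min_hits`, sorted for stable output.
def suspicious_ips(log_lines, path_pattern:, min_hits:)
  hits = Hash.new(0)
  log_lines.each do |line|
    # First field is the client IP; the quoted request line holds method and path.
    ip, request_path = line.match(/\A(\S+) .*?"(?:GET|POST) (\S+)/)&.captures
    next unless ip && request_path =~ path_pattern

    hits[ip] += 1
  end
  hits.select { |_ip, count| count >= min_hits }.keys.sort
end
```

The resulting list is what goes into the filter on the balancer. With randomized User-Agent and cookies, the request path and the request rate are usually the most reliable signals left in the log.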

L4: run tcpdump on the balancer (the machine that holds your main IP) and compile a list of IPs and subnets. You can filter them yourself or send the list to the hoster or the protection service.

If your link has been saturated with junk traffic, only an external shield or the hoster can help; there are simply no other options.

You can message me on FB, and I will help however I can.

If you are brave, nimble, and skilled…

A botnet is a bunch of infected machines (cameras, routers, virtual machines with WP, etc.). It has 1 to N command-and-control centers. Behind those centers there may be more centers, and behind them a person (or a group of people).

To beat a botnet you need to break into one of the infected machines, determine where the commands come from, break into one of the control center's machines, check whether there is another layer behind it, and so on deeper. The control centers can be cleaned up, which stops the DDoS. TL;DR: a Wired article on how one of the largest botnets was taken down.

Everything in the previous paragraph is against the law, at least in Russia, so think before you do it, even if you are able to.

Many publications are subjected to DDoS attacks. For example, Meduza was attacked in April. It is always unpleasant, it affects the work of the whole team and the business as a whole, and you need to be ready for it.