Stories by Shreya Patil on Medium

Private Connectivity Patterns in AWS

Shreya Patil — Thu, 23 Apr 2026 16:36:19 GMT

Understanding why keeping traffic off the public internet matters, and the architectural tradeoffs between each approach

Why Does Private Connectivity Matter?

When your application in AWS needs to talk to a service like S3, that request has to travel from point A to point B. The question is: what path does it take?

By default, that traffic goes to the service’s public endpoint, leaving your VPC through an internet gateway or NAT gateway. It doesn’t leave AWS’s physical infrastructure, but it does pass through public IP space. Think of it like sending an internal company memo through the public postal system. It’ll probably arrive fine, but it’s traveling on roads you don’t own, past people you don’t know, through systems you don’t control.

Private connectivity changes the route. Instead of the public path, your traffic moves entirely through AWS’s internal network, a dedicated, private backbone that never touches the public internet. Same origin, same destination, completely different path.

Why does this matter?

Smaller attack surface. Every time data crosses public infrastructure, there’s an opportunity, however small, for it to be observed, intercepted, or targeted. Private connectivity removes that exposure path.

Compliance requirements. Industries like healthcare, finance, and government often have strict rules about how data moves, and many regulations and internal policies require that sensitive information not traverse public networks. Private connectivity helps satisfy this by design.

Defense in depth. No single security measure is perfect. Private connectivity adds a network-level layer of protection that works independently of your identity policies, encryption, or application-level controls. If one of those layers fails, the network layer is still doing its job.

The bottom line: if your data doesn’t need to travel on public roads, it shouldn’t.

Pattern 1: VPC Endpoints

VPC Endpoints allow resources inside your VPC to communicate with AWS services without an internet gateway, a NAT gateway, or a public IP address. They are the easiest way to set up private connectivity.

There are two types, and they work quite differently.

Gateway Endpoints

Gateway Endpoints are available for only two services: S3 and DynamoDB. When you create one, AWS adds a route to each route table you associate with the endpoint. The destination of that route is the service’s AWS-managed prefix list, so traffic headed for the service goes through the endpoint instead of an internet gateway or NAT gateway. Think of it like this: normally your traffic would take the highway (public internet) to reach S3. A Gateway Endpoint builds a private road directly from your VPC to S3, so your traffic never has to merge onto the highway at all. Gateway Endpoints are free, controlled through route tables, secured with endpoint policies, and work only within the same AWS Region. There is very little reason not to set them up in every VPC.
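Here is a sketch of what an endpoint policy can look like. The Python snippet below builds a policy document that lets traffic through an S3 Gateway Endpoint only when it targets an approved bucket. The bucket name and the chosen actions are hypothetical placeholders, not values from this article. The resulting JSON could be passed to `aws ec2 create-vpc-endpoint` through its `--policy-document` option.

```python
import json


def gateway_endpoint_policy(bucket_names):
    """Build an endpoint policy that only allows S3 access to the listed buckets.

    Bucket names and actions are illustrative placeholders; tune them per workload.
    """
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself (ListBucket)
        resources.append(f"arn:aws:s3:::{name}/*")  # objects in the bucket (Get/PutObject)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOnlyApprovedBuckets",
                "Effect": "Allow",
                "Principal": "*",  # endpoint policies apply to any caller using the endpoint
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name; save the output to a file for --policy-document
    print(json.dumps(gateway_endpoint_policy(["example-app-data-bucket"]), indent=2))
```

A request through the endpoint must be allowed by both this policy and the caller's IAM permissions. A restrictive endpoint policy therefore also limits which buckets data can be copied to over this path.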

Interface Endpoints

Interface Endpoints use AWS PrivateLink technology and support over 100 AWS services, including KMS, Secrets Manager, CloudWatch, SQS, SNS, and Lambda. They work by creating an Elastic Network Interface (ENI) with a private IP address in each subnet you choose. With private DNS enabled, the service’s standard hostname resolves to that private IP instead of the service’s public address, so your application needs no code changes. In simpler terms, instead of your request going out to the internet to find KMS, it finds KMS right there inside your own network through a private door. Use them whenever your security requirements demand that API calls stay completely off the public internet, especially for services handling sensitive data like KMS and Secrets Manager.
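A quick way to confirm that an Interface Endpoint is actually in the path is to check what the service hostname resolves to from inside the VPC. The sketch below assumes private DNS is enabled on the endpoint, and it uses `kms.us-east-1.amazonaws.com` only as an example hostname. Run it from an instance in the VPC: private addresses suggest the endpoint is being used.

```python
import ipaddress
import socket


def resolve_ipv4(hostname, port=443):
    """Return the sorted, de-duplicated IPv4 addresses that hostname resolves to here."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True only if there is at least one address and every address is private."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)


# Usage, from an instance inside the VPC (example Region; change to match yours):
# addrs = resolve_ipv4("kms.us-east-1.amazonaws.com")
# print(addrs, "private (endpoint)" if all_private(addrs) else "public path")
```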

Pattern 2: AWS PrivateLink (Service to Service)

So far we have talked about using PrivateLink to connect to AWS services. But PrivateLink can also let you expose your own services privately to other VPCs or AWS accounts. No VPC peering, no public IPs, no complex routing.

This is especially valuable for multi-account organizations and SaaS providers who need to give customers private access to their services.

How It Works

The provider places their service behind a Network Load Balancer (NLB) and creates an Endpoint Service pointing to it. The consumer creates an Interface Endpoint in their own VPC that connects to that Endpoint Service. Traffic flows through AWS’s private network. Neither side sees the other’s network.

Think of it like a restaurant kitchen window. The customer (consumer) places orders and receives food through the window, but never enters the kitchen (provider’s network). The kitchen staff never walks into the dining room either.
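The provider and consumer halves of this setup can be sketched with boto3-style EC2 calls: `create_vpc_endpoint_service_configuration`, `modify_vpc_endpoint_service_permissions`, and `create_vpc_endpoint`. The client is passed in as a parameter, so the sketch can run against a small hypothetical `DryRunClient` recorder instead of a real account. The ARNs, IDs, and account numbers are placeholders. A real deployment would pass `boto3.client("ec2")` in each account.

```python
class DryRunClient:
    """Hypothetical helper: records API calls instead of sending them, returns canned responses."""

    def __init__(self, responses=None):
        self.calls = []
        self.responses = responses or {}

    def __getattr__(self, name):
        def call(**kwargs):
            self.calls.append((name, kwargs))
            return self.responses.get(name, {})
        return call


def expose_service(ec2, nlb_arn, consumer_account_id):
    """Provider account: publish an NLB as an Endpoint Service and allow one consumer."""
    config = ec2.create_vpc_endpoint_service_configuration(
        NetworkLoadBalancerArns=[nlb_arn],
        AcceptanceRequired=True,  # provider also approves each connection request
    )["ServiceConfiguration"]
    ec2.modify_vpc_endpoint_service_permissions(
        ServiceId=config["ServiceId"],
        AddAllowedPrincipals=[f"arn:aws:iam::{consumer_account_id}:root"],
    )
    return config["ServiceName"]  # shared with the consumer out of band


def connect_to_service(ec2, service_name, vpc_id, subnet_ids, security_group_ids):
    """Consumer account: create an Interface Endpoint in the consumer's own VPC."""
    endpoint = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        ServiceName=service_name,
        VpcId=vpc_id,
        SubnetIds=subnet_ids,
        SecurityGroupIds=security_group_ids,
    )["VpcEndpoint"]
    return endpoint["VpcEndpointId"]
```

With `AcceptanceRequired=True`, the provider must also accept each endpoint connection request (for example through `accept_vpc_endpoint_connections`) before traffic flows.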

Why This Architecture Matters

One directional by design. The consumer can reach the provider’s service, but the provider cannot initiate connections back. VPC peering does not offer this.
No IP address conflicts. Both sides can use the same IP ranges since there is no routing between the VPCs.
Both sides must opt in. The provider approves which accounts can connect. The consumer creates the endpoint. Neither side is forced into it.

Simple rule: if you are sharing a specific service, use PrivateLink. If two VPCs need full open connectivity and you trust both equally, use peering.

Pattern 3: AWS Transit Gateway

As organizations grow, they end up with dozens or even hundreds of VPCs. If you try to peer each VPC to every other one, you get an unmanageable web of connections. Transit Gateway solves this by acting as a central hub that all VPCs connect to.

How Transit Gateway Works

Transit Gateway is a regional, managed router. You attach your VPCs, VPN connections, and Direct Connect gateways to it. It then routes traffic between those attachments based on route tables that you define. The real security power lies in those route tables. You can create completely separate routing domains so that certain VPCs can only reach specific destinations and nothing else.

This means a development VPC cannot send traffic to a production VPC, because the Transit Gateway route table associated with its attachment has no route to it. This is network-level segmentation at the organization level, and it is one of the most powerful security controls available in AWS.
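The segmentation logic can be illustrated with a toy model in Python. This is not the Transit Gateway API. It only mimics two ideas: each attachment is associated with one route table, and a packet is forwarded only if that table has a matching route (longest prefix wins). The attachment names, table names, and CIDRs are hypothetical.

```python
import ipaddress

# Which route table each attachment is associated with (hypothetical names)
ASSOCIATIONS = {"prod-vpc": "rt-prod", "dev-vpc": "rt-dev", "shared-vpc": "rt-shared"}

# Routes per table: destination CIDR -> target attachment (hypothetical CIDRs)
ROUTES = {
    "rt-prod": {"10.1.0.0/16": "shared-vpc"},    # prod can reach shared services only
    "rt-dev": {"10.1.0.0/16": "shared-vpc"},     # dev can reach shared services only
    "rt-shared": {"10.10.0.0/16": "prod-vpc",    # shared services can reach both
                  "10.20.0.0/16": "dev-vpc"},
}


def next_hop(source_attachment, destination_ip):
    """Return the target attachment, or None when no route exists (traffic is dropped)."""
    table = ROUTES[ASSOCIATIONS[source_attachment]]
    dest = ipaddress.ip_address(destination_ip)
    matches = [ipaddress.ip_network(cidr) for cidr in table if dest in ipaddress.ip_network(cidr)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest-prefix match
    return table[str(best)]


if __name__ == "__main__":
    print(next_hop("dev-vpc", "10.10.4.20"))  # None: no route from dev to prod
    print(next_hop("dev-vpc", "10.1.0.53"))   # shared-vpc
```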

Additional Security Features

Route table isolation. Each attachment is associated with exactly one Transit Gateway route table, and attachments that must stay apart use different tables. This creates hard boundaries between network segments. If the route does not exist, the traffic cannot flow. It is that simple.
Appliance mode. When you need inline security inspection using firewalls or IDS/IPS, appliance mode keeps both directions of a flow between two VPCs on the same inspection appliance, so stateful inspection sees the complete connection.
Multicast support. For specialized use cases like financial market data distribution, Transit Gateway supports multicast domains with fine-grained membership controls.

Putting It All Together: Reference Architecture

Here is how these three patterns combine in a real-world, security-focused AWS architecture:

Traffic Flows

App server to S3: Goes through the Gateway Endpoint. Never touches the internet. Free.
App server to KMS: Goes through an Interface Endpoint. Private. Encrypted API calls stay on AWS’s backbone.
VPC to VPC: Routed through Transit Gateway. Route tables enforce that production and development are isolated.
Shared services: Both environments reach the shared services VPC through Transit Gateway, but shared services routes are the only ones they have in common.
On premises: Connected via Direct Connect or VPN to the Transit Gateway. Routed to the appropriate VPC based on route tables.

Cost Considerations

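Gateway Endpoints are free. Interface Endpoints are billed per endpoint, per Availability Zone, per hour, plus a per-GB data processing charge. Transit Gateway bills per attachment per hour, plus per GB of data processed. A data breach resulting from exposed traffic or lateral movement across overly connected networks is orders of magnitude more expensive than any of these charges.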

Key Takeaways

Keep traffic off the public internet by default. Use Gateway Endpoints (free) for S3 and DynamoDB in every VPC.
Use Interface Endpoints (PrivateLink) for sensitive AWS service calls like KMS, Secrets Manager, and STS.
Use PrivateLink for service-to-service connectivity across accounts. It is one-directional by design and avoids IP overlap issues.
Use Transit Gateway to replace VPC peering meshes at scale, and leverage its route tables for hard network isolation.
Layer these patterns together: Transit Gateway for macro-level segmentation, PrivateLink for service-level access, VPC Endpoints for AWS service access.
Private connectivity is a foundational building block for Zero Trust. Even if identity and application controls fail, the network layer still limits where traffic can go, providing defense in depth.

Breaking Into AWS With a Certificate: Exploiting IAM Roles Anywhere for Privilege Escalation

Shreya Patil — Tue, 21 Apr 2026 20:51:32 GMT

No passwords. No access keys. No MFA. Just a self-signed certificate and a misconfigured role — that’s all it takes to go from zero to admin.

INTRODUCTION

AWS IAM Roles Anywhere is a service that bridges the gap between traditional certificate-based Public Key Infrastructure (PKI) and AWS IAM. It allows workloads running outside of AWS (on-premises servers, CI/CD pipelines, containers in other clouds, edge devices) to obtain temporary AWS credentials by presenting X.509 certificates. Instead of distributing long-lived access keys that can be leaked or stolen, organizations can use their existing PKI to authenticate external workloads securely. On paper, it is a significant improvement over static credentials.

But every authentication mechanism is only as strong as its configuration. IAM Roles Anywhere introduces a certificate-based attack surface that most security teams are not yet monitoring for. The service relies on a chain of trust: a Certificate Authority is registered as a “trust anchor,” and any certificate issued by that CA can authenticate to AWS and assume IAM roles. Suppose an attacker has already gained access to an AWS account (through phishing, SSRF, leaked credentials, or any other initial access vector). If they set up their own trust anchor with a self-signed certificate and map it to a role with AdministratorAccess, they have just built a persistent backdoor into the account. That backdoor does not require passwords, does not trigger MFA prompts, and does not show up as a normal console login in CloudTrail. Even if the blue team responds by resetting every password, rotating all access keys, and enforcing MFA on every IAM user, the attacker still gets back in with just a certificate file and a private key.

This article is a hands-on walkthrough where we build exactly this attack chain. We start with admin access to an AWS account (simulating an attacker who has already achieved initial compromise) and set up a complete certificate-based persistence mechanism: creating IAM roles, generating a self-signed CA and client certificate with OpenSSL, registering the CA as a trust anchor, and authenticating purely through the certificate. Every step includes the exact commands we ran, the real errors we encountered along the way (missing certificate extensions, JSON parsing failures, trust validation issues), and how we fixed them. Whether you are a penetration tester looking to add this persistence technique to your toolkit or a cloud security engineer trying to understand what to defend against, this walkthrough gives you the complete picture from both sides.

WALKTHROUGH

Step 1: Create Trust Policy File

Created a JSON file that tells AWS: “Allow the IAM Roles Anywhere service to assume this role.” This is the trust relationship — it defines who can use the role.
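IAM Roles Anywhere expects a trust policy along these lines: the service principal `rolesanywhere.amazonaws.com` with the `sts:AssumeRole`, `sts:TagSession`, and `sts:SetSourceIdentity` actions. The exact file used in this walkthrough isn't reproduced here, so treat this Python sketch as an approximation. The optional `aws:SourceArn` condition, which pins the role to a single trust anchor, is a hardening addition and not part of the original steps.

```python
import json


def roles_anywhere_trust_policy(trust_anchor_arn=None):
    """Trust policy that lets IAM Roles Anywhere assume the role.

    If trust_anchor_arn is given, only sessions from that trust anchor can use the role.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"Service": "rolesanywhere.amazonaws.com"},
        "Action": ["sts:AssumeRole", "sts:TagSession", "sts:SetSourceIdentity"],
    }
    if trust_anchor_arn:
        statement["Condition"] = {"ArnEquals": {"aws:SourceArn": trust_anchor_arn}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Save as trust-policy.json, then for example:
    # aws iam create-role --role-name RA-FootholdRole \
    #     --assume-role-policy-document file://trust-policy.json
    print(json.dumps(roles_anywhere_trust_policy(), indent=2))
```

From a defender's side, the `aws:SourceArn` condition means a rogue trust anchor created later cannot be used with this role unless the trust policy is also changed.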

Step 2: Create the IAM Role

Created an IAM role called RA-FootholdRole using the trust policy. This is our initial foothold — the role that certificate-based auth will assume first.

Step 3: Save the Role ARN and Attach Permissions Policy

Saved the role’s Amazon Resource Name (ARN) into a variable for use in later commands. The ARN is the unique identifier for this role across all of AWS. Gave the foothold role permissions to create new roles and attach policies (iam:CreateRole, iam:AttachRolePolicy, etc.). This is what makes privilege escalation possible later.
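The permissions side of Step 3 might look like the policy below. It includes only the two actions named above. The remaining actions behind the original "etc." are not listed in this text, so they are deliberately left out. `Resource: "*"` mirrors the admin-equivalent combination called out in the final takeaways. The policy name in the comment is hypothetical.

```python
import json

# Actions named in Step 3; the other actions covered by "etc." are unknown here.
FOOTHOLD_ACTIONS = ["iam:CreateRole", "iam:AttachRolePolicy"]


def foothold_permissions_policy(actions=FOOTHOLD_ACTIONS):
    """Identity policy for the foothold role (intentionally over-broad, for the lab)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": list(actions), "Resource": "*"}],
    }


if __name__ == "__main__":
    # e.g. aws iam put-role-policy --role-name RA-FootholdRole \
    #   --policy-name RA-FootholdPermissions --policy-document file://foothold.json
    print(json.dumps(foothold_permissions_policy(), indent=2))
```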

Step 4: Generate CA Private Key

Generated a 4096-bit RSA private key for our Certificate Authority. This is the most sensitive file — whoever has it can issue trusted certificates.

Step 5: Create Self-Signed CA Certificate

Created a self-signed CA certificate using the private key. This is what AWS will trust as the root of our certificate chain.

Step 6: Generate Client Certificate

Three sub-steps: generated a client private key, created a Certificate Signing Request (CSR), then signed it with our CA to produce the client certificate. This certificate is what gets presented to AWS during authentication.

Step 7: Verify Certificate Chain

Confirmed the client certificate was properly signed by our CA.
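Steps 4 through 7 map onto standard OpenSSL commands. The author's exact invocations were shown in screenshots that aren't reproduced here, so the Python sketch below is a reconstruction under several assumptions:

- OpenSSL 1.1.1 or newer, which is needed for `-addext`.
- The default `openssl.cnf`, whose `v3_ca` section marks self-signed certificates as `CA:TRUE`.
- Hypothetical file names, subject names, validity period, and a 2048-bit client key.

The commands are returned as argument lists. Nothing runs until `run_all` is called.

```python
import shutil
import subprocess


def pki_commands(ca_cn="Example Lab CA", client_cn="example-workload", days=365):
    """OpenSSL invocations for Steps 4-7, as argument lists."""
    return [
        # Step 4: 4096-bit RSA key for the CA; whoever holds it can mint trusted certs
        ["openssl", "genrsa", "-out", "ca.key", "4096"],
        # Step 5: self-signed CA certificate; keyCertSign is added explicitly
        ["openssl", "req", "-x509", "-new", "-key", "ca.key", "-sha256",
         "-days", str(days), "-subj", f"/CN={ca_cn}",
         "-addext", "keyUsage=critical,keyCertSign,cRLSign", "-out", "ca.pem"],
        # Step 6a: client private key (2048 bits assumed)
        ["openssl", "genrsa", "-out", "client.key", "2048"],
        # Step 6b: certificate signing request for the client
        ["openssl", "req", "-new", "-key", "client.key",
         "-subj", f"/CN={client_cn}", "-out", "client.csr"],
        # Step 6c: CA signs the CSR; without -extfile the result lacks the
        # client extensions Roles Anywhere requires (the Step 10 failure)
        ["openssl", "x509", "-req", "-in", "client.csr", "-CA", "ca.pem",
         "-CAkey", "ca.key", "-CAcreateserial", "-sha256", "-days", str(days),
         "-out", "client.pem"],
        # Step 7: confirm client.pem chains to ca.pem
        ["openssl", "verify", "-CAfile", "ca.pem", "client.pem"],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    if shutil.which("openssl") is None:
        raise RuntimeError("openssl not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)  # writes key/cert files to the current directory


# Usage: run_all(pki_commands())
```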

Step 8: Create Trust Anchor

Used a Python script to properly format the CA certificate into a JSON file, then ran aws rolesanywhere create-trust-anchor to register our self-signed CA with AWS. This tells IAM Roles Anywhere: "Trust any certificate issued by this CA." The output confirms the trust anchor was created successfully with the certificate bundle attached.
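The article doesn't reproduce the Python script used to format the CA certificate, but a minimal version might look like this. It wraps the PEM text in the `sourceType`/`sourceData` structure that `create-trust-anchor` accepts for a `CERTIFICATE_BUNDLE` source. Letting `json.dumps` escape the certificate's newlines avoids the hand-editing that tends to cause JSON parsing failures. The file names and the anchor name are hypothetical.

```python
import json
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def trust_anchor_source(ca_pem_text):
    """Build the --source document for a CERTIFICATE_BUNDLE trust anchor."""
    pem = ca_pem_text.strip() + "\n"
    if not pem.startswith(PEM_HEADER):
        raise ValueError("expected a PEM-encoded certificate")
    return {
        "sourceType": "CERTIFICATE_BUNDLE",
        "sourceData": {"x509CertificateData": pem},
    }


def write_source_file(ca_path="ca.pem", out_path="trust-anchor-source.json"):
    """Read the CA certificate and write the JSON with newlines escaped as \\n."""
    source = trust_anchor_source(Path(ca_path).read_text())
    Path(out_path).write_text(json.dumps(source))
    return out_path


# Usage, then register the anchor, for example:
# write_source_file()
# aws rolesanywhere create-trust-anchor --name example-anchor \
#     --source file://trust-anchor-source.json --enabled
```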

Step 9: Set the Trust Anchor and Create the Profile

Saved the trust anchor ARN into a variable, then created a Roles Anywhere profile using aws rolesanywhere create-profile with RA-FootholdRole in its list of role ARNs. The profile defines which roles a session may assume. Combined with the trust anchor supplied at authentication time, this means anyone presenting a certificate trusted by our CA can now assume this role. The output shows the profile was created with the role ARN linked and a default session duration of 3600 seconds (1 hour).

Step 10: Certificate Authentication

Ran aws_signing_helper with the client certificate and private key to request temporary credentials. The first attempt failed, as described below. The fix was to add the required X.509 extensions in a config file (ra-client-ext.cnf) and re-sign the client certificate with the -extfile flag. After that, aws_signing_helper authenticated successfully and returned temporary AWS credentials (AccessKeyId, SecretAccessKey, SessionToken, Expiration). Certificate-based authentication is now working.

Issue: AWS rejected the client certificate with the error AccessDeniedException: Untrusted certificate. Insufficient certificate.

Why it happened: The client certificate was generated as a basic v1 certificate with no X.509 extensions. AWS IAM Roles Anywhere requires v3 client certificates with specific extensions that identify the certificate as a valid client authentication certificate.

What resolved it: Created an extensions configuration file (ra-client-ext.cnf) with three required fields:

basicConstraints = CA:FALSE: tells AWS this is an end-entity certificate, not a CA
keyUsage = digitalSignature: confirms the certificate’s key can be used for signing, which Roles Anywhere relies on because the signing helper signs the CreateSession request with the client’s private key
extendedKeyUsage = clientAuth: explicitly marks this certificate for client authentication

Then re-signed the client certificate using the -extfile ra-client-ext.cnf flag. After this fix, aws_signing_helper successfully authenticated and returned temporary AWS credentials.
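The fix can be reproduced with a few lines of Python that write the extensions file and build the re-signing command. The extension lines are the three listed above. File names such as `client.csr`, `ca.pem`, and `ca.key` are assumptions. The file has no section header, so when `-extfile` is given without `-extensions`, OpenSSL reads the extensions from its default section.

```python
from pathlib import Path

# The three extensions listed above; no [section] header, so OpenSSL uses the default section
CLIENT_EXTENSIONS = (
    "basicConstraints = CA:FALSE\n"
    "keyUsage = digitalSignature\n"
    "extendedKeyUsage = clientAuth\n"
)


def write_extensions_file(path="ra-client-ext.cnf"):
    """Write the extensions config used when re-signing the client certificate."""
    Path(path).write_text(CLIENT_EXTENSIONS)
    return path


def resign_command(ext_path="ra-client-ext.cnf", days=365):
    """Re-sign the existing CSR with the CA, embedding the v3 client extensions."""
    return ["openssl", "x509", "-req", "-in", "client.csr",
            "-CA", "ca.pem", "-CAkey", "ca.key", "-CAcreateserial",
            "-sha256", "-days", str(days),
            "-extfile", ext_path, "-out", "client.pem"]


# Inspect the result with: openssl x509 -in client.pem -noout -text
# and check that the X509v3 Extended Key Usage section lists TLS Web Client Authentication.
```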

Step 11: Exporting Credentials and Verifying Identity

Stored the credential helper output in a variable (CREDS), then used jq to parse and export the AccessKeyId, SecretAccessKey, and SessionToken as environment variables. Running aws sts get-caller-identity confirms we are now authenticated as RA-FootholdRole purely through the client certificate. The UserId contains the certificate's serial number, proving the credentials came from certificate-based authentication and not from traditional access keys.
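The jq-based export step can also be sketched in Python. `aws_signing_helper credential-process` prints credentials in the AWS CLI `credential_process` JSON format (`Version`, `AccessKeyId`, `SecretAccessKey`, `SessionToken`, `Expiration`). The function below turns that output into shell `export` lines. The certificate and key file names are assumptions, and the ARNs are placeholders for the ones saved in earlier steps.

```python
import json
import shlex
import subprocess


def signing_helper_argv(trust_anchor_arn, profile_arn, role_arn,
                        cert="client.pem", key="client.key"):
    """Arguments for aws_signing_helper's credential-process mode."""
    return ["aws_signing_helper", "credential-process",
            "--certificate", cert, "--private-key", key,
            "--trust-anchor-arn", trust_anchor_arn,
            "--profile-arn", profile_arn,
            "--role-arn", role_arn]


def exports_from_credentials(output):
    """Turn credential_process JSON into shell export lines (values shell-quoted)."""
    creds = json.loads(output)
    env = {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }
    return "\n".join(f"export {name}={shlex.quote(value)}" for name, value in env.items())


def fetch_exports(argv):
    """Run the helper and return export lines (requires aws_signing_helper on PATH)."""
    out = subprocess.run(argv, check=True, capture_output=True, text=True).stdout
    return exports_from_credentials(out)


# Usage (placeholder ARNs):
# print(fetch_exports(signing_helper_argv(
#     "arn:aws:rolesanywhere:us-east-1:111122223333:trust-anchor/EXAMPLE",
#     "arn:aws:rolesanywhere:us-east-1:111122223333:profile/EXAMPLE",
#     "arn:aws:iam::111122223333:role/RA-FootholdRole")))
```

Alternatively, the same arguments can go on a `credential_process` line in `~/.aws/config`, so the CLI fetches fresh credentials on its own instead of relying on exported variables.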

Step 12: Creating the Admin Role with AdministratorAccess

This is the privilege escalation step. Using the temporary credentials obtained through certificate authentication (as RA-FootholdRole), we created a new trust policy file and then ran aws iam create-role to create a new role called RA-EscalatedAdminRole. Immediately after, we attached the AWS managed policy AdministratorAccess to it using aws iam attach-role-policy. The output confirms the role was created successfully with the Roles Anywhere service as the trusted principal. This role now has full admin access to the entire AWS account, and it was created using nothing but certificate-based credentials.

Step 13: Updating the Profile

Attempted to update the Roles Anywhere profile to include the new admin role, but the foothold role did not have rolesanywhere:UpdateProfile permission. AWS returned AccessDeniedException stating no identity-based policy allows the rolesanywhere:UpdateProfile action.

Unset the temporary credentials to switch back to the admin user, then updated the foothold role’s policy to include rolesanywhere:UpdateProfile and rolesanywhere:GetProfile. Re-authenticated with the certificate, exported the new credentials, and retried the profile update. This time it succeeded. The output confirms both RA-EscalatedAdminRole and RA-FootholdRole are now listed in the profile's role ARNs, meaning our certificate can now assume either role.

Step 14: Authenticating as an Admin

Unset the previous credentials, then used aws_signing_helper with the same client certificate but this time targeting RA-EscalatedAdminRole instead of the foothold role. Exported the new admin credentials and ran aws sts get-caller-identity. The output confirms we are now authenticated as RA-EscalatedAdminRole with full AdministratorAccess. The escalation is complete.

Step 15: Verifying Admin Privileges

Ran aws iam list-users to confirm that the admin privileges are working. The command successfully returned the list of IAM users in the account, including ShreyaPatil-AWS. This proves we have full administrative access to the AWS account, achieved entirely through certificate-based authentication without any passwords, long-term access keys, or MFA tokens.

FINAL TAKEAWAYS

AWS IAM Roles Anywhere enables certificate-based authentication to AWS, eliminating the need for long-term access keys. While this is a security improvement in theory, it introduces a new attack surface that most organizations are not yet prepared to defend.
If an attacker gains initial access to an AWS account, they can set up a self-signed CA as a trust anchor and create a persistent backdoor that survives password resets, access key rotation, and MFA enforcement.
The combination of iam:CreateRole and iam:AttachRolePolicy on Resource: "*" is effectively admin-equivalent. Any role with these permissions should be treated as a critical finding during security assessments.
Certificate extensions matter. AWS enforces strict requirements on both CA certificates (CA:TRUE, keyCertSign) and client certificates (clientAuth, digitalSignature). Understanding these requirements is essential for both attackers and defenders.
The CA private key is the single most sensitive artifact in this entire chain. Whoever controls it can issue unlimited trusted certificates. In production environments, this key should never exist on disk and should be managed through an HSM or AWS Private CA.
Most incident response playbooks do not include checking for IAM Roles Anywhere resources. Organizations should add list-trust-anchors and list-profiles to their containment checklists.
Defense in depth is critical. SCPs restricting trust anchor creation, permission boundaries preventing admin policy attachment, CloudTrail monitoring for Roles Anywhere API calls, and CRL enforcement on trust anchors should all be implemented together. No single control is sufficient on its own.
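The `iam:CreateRole` plus `iam:AttachRolePolicy` finding above is easy to check automatically. The sketch below flags a policy document whose Allow statements grant both actions on `Resource: "*"`, including through wildcards such as `iam:*`. It is a deliberately simplified heuristic and a starting point, not a full policy evaluator: it ignores Deny statements, `NotAction`, conditions, and permission boundaries.

```python
from fnmatch import fnmatchcase

ESCALATION_ACTIONS = ("iam:createrole", "iam:attachrolepolicy")


def _as_list(value):
    return value if isinstance(value, list) else [value]


def grants_create_and_attach(policy):
    """True if Allow statements on Resource "*" cover both escalation actions."""
    covered = set()
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow" or "*" not in _as_list(stmt.get("Resource", [])):
            continue
        for pattern in _as_list(stmt.get("Action", [])):
            for action in ESCALATION_ACTIONS:
                # IAM action names are case-insensitive; * and ? act as wildcards
                if fnmatchcase(action, pattern.lower()):
                    covered.add(action)
    return covered == set(ESCALATION_ACTIONS)
```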

Building a Honeytoken Detection System on AWS

Shreya Patil — Fri, 17 Apr 2026 22:02:05 GMT

A practical walkthrough for turning deception into one of your highest-signal security controls

INTRODUCTION

In the first half of 2025, infostealers harvested 1.8 billion credentials from 5.8 million compromised endpoints — an 800% surge over the previous period. CrowdStrike’s 2026 Global Threat Report clocked the average attacker breakout time at 29 minutes, with the fastest at 27 seconds. And 82% of all detections were malware-free — attackers aren’t dropping payloads anymore, they’re logging in with stolen credentials and walking through the front door. Your SIEM generates thousands of alerts a day trying to spot this. Most are false positives. The one real alert? It gets buried.

A honeytoken flips that model entirely. It’s a fake asset — an AWS key with zero permissions, a decoy document, a tripwire URL — planted where only an attacker or an insider would find it. It has no legitimate users, no expected traffic, no noise. Every interaction with it is, by definition, suspicious. In this hands-on walkthrough, I’ll take you step by step through building three independent honeytoken detection layers on AWS — covering credential theft, file snooping, and attacker reconnaissance.

Why Honeytokens: Deception as a Control Class

Most security controls try to prevent attackers from reaching something valuable. Firewalls, IAM policies, WAFs — they all work by drawing a line and enforcing it. Honeytokens work on a completely different principle: they assume an attacker will eventually get past your preventive controls, and they plant things that look valuable but have no legitimate business use. The moment anyone touches them, you know something is wrong.

This is called deception, and it belongs in a different category from detection tools like SIEM, EDR, or anomaly detection. Those tools try to spot the signal inside a sea of legitimate noise. A honeytoken has no legitimate noise at all. Every single interaction with it is, by definition, either an attacker or a mistake. That’s why honeytokens have one of the highest signal-to-noise ratios of any detection control in existence.

The economics are equally lopsided. A well-placed canary AWS key costs you effectively nothing — no license, no agent, no log volume. But for an attacker who has compromised a developer laptop, an S3 bucket, or a Git repo, that key is indistinguishable from a real one. They don’t know which keys are safe to use, so they either use all of them (triggering your alert) or none of them (significantly slowing them down). Either outcome is a win for the defender.

What Honeytokens Actually Catch — and What They Don’t

Before building anything, it’s worth being honest about what this control covers and where its limits are.

Honeytokens are excellent at catching four things. First, credential theft after initial compromise — an attacker lands on a dev laptop, runs cat ~/.aws/credentials, and tries every key they find. Second, repository scraping — automated bots that clone every accessible Git repo and test any API key they discover. Third, insider threats — someone browsing file shares they have no reason to visit, opening documents labeled "passwords" or "compensation." And fourth, post-exploitation reconnaissance — attackers enumerating S3 buckets, poking around /home directories, or dumping Secrets Manager looking for something useful.

Honeytokens are not designed to catch exploit attempts against a specific vulnerability (that’s what IDS and EDR are for), data exfiltration of your real data (that’s what DLP is for), or brute-force and credential-stuffing attacks against your front door.

In other words, honeytokens catch the moment between initial compromise and final impact — the lateral movement phase where attackers are looking around, grabbing things, and trying credentials. This is also the longest phase of most breaches, the phase where traditional detection tools struggle the most, and the phase where a single high-confidence alert can change the outcome of an entire incident. That’s why deception remains one of the most underrated controls in security.

WALKTHROUGH

STEP 1: Creating the Canary IAM User

The IAM user deploy-bot-prod — the canary user. It has one active access key, console access disabled, no console password, and the key shows "Never used. Created today." This is exactly how a honeytoken user should look — programmatic access only, zero permissions attached, and indistinguishable from a real service account to anyone who discovers it.
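The canary user must stay permission-less, so it is worth checking periodically that nothing has been attached to it since creation. The sketch below uses the boto3 IAM calls `list_attached_user_policies`, `list_user_policies`, and `list_groups_for_user`. The client is passed in, only the first page of each result is read, and `deploy-bot-prod` is simply this walkthrough's user name.

```python
def canary_findings(attached_policies, inline_policy_names, groups):
    """List anything that could give the canary user real permissions."""
    findings = [f"managed policy attached: {p['PolicyArn']}" for p in attached_policies]
    findings += [f"inline policy present: {name}" for name in inline_policy_names]
    findings += [f"group membership to review: {g['GroupName']}" for g in groups]
    return findings


def check_canary_user(iam, user_name="deploy-bot-prod"):
    """Fetch the user's policies and groups (first page only) and report findings."""
    return canary_findings(
        iam.list_attached_user_policies(UserName=user_name)["AttachedPolicies"],
        iam.list_user_policies(UserName=user_name)["PolicyNames"],
        iam.list_groups_for_user(UserName=user_name)["Groups"],
    )


# Example with a real client (requires boto3 and credentials):
# import boto3
# print(check_canary_user(boto3.client("iam")) or "canary user has no permissions")
```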

STEP 2: Setting Up CloudTrail to Log Every API Call

The CloudTrail trail canary-account-trail is active and logging with multi-region coverage enabled. Logs are being stored in the S3 bucket. Status shows "Logging", meaning every management API call in this account, including any use of the canary key, is being recorded and forwarded to EventBridge.

STEP 3A: Wiring the Alert Pipeline → SNS Topic, Subscription, and Confirmation

The SNS topic canary-alerts — a Standard type topic in us-east-1. This is the notification hub that sits between EventBridge and the email inbox. When the EventBridge rule matches a canary key event, it publishes to this topic, which fans out the alert to all subscribers.

The SNS subscription tab showing one email endpoint with status “Confirmed.” This is critical — if the status showed “PendingConfirmation,” alerts would silently fail. The confirmed status means the email recipient clicked the AWS confirmation link and the alert pipeline is fully connected.

The AWS SNS subscription confirmation page. After creating the email subscription, AWS sends a confirmation email that must be clicked to activate delivery. This “Subscription confirmed!” page confirms the link was clicked and the subscription is now live and ready to receive canary alerts.
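A subscription stuck in "PendingConfirmation" is easy to miss, so a small check can fold into the routine testing recommended later in this walkthrough. In the SNS `ListSubscriptionsByTopic` response, an unconfirmed subscription shows the literal string `PendingConfirmation` in place of its subscription ARN. The sketch below reports those endpoints. It takes a boto3-style client, reads only the first page, and uses a hypothetical topic ARN in the usage comment.

```python
def unconfirmed_endpoints(subscriptions):
    """Endpoints whose subscription was never confirmed (so alerts never arrive)."""
    return [s["Endpoint"] for s in subscriptions
            if s.get("SubscriptionArn") == "PendingConfirmation"]


def check_alert_topic(sns, topic_arn):
    """Read the topic's subscriptions (first page only) and return unconfirmed endpoints."""
    resp = sns.list_subscriptions_by_topic(TopicArn=topic_arn)
    return unconfirmed_endpoints(resp["Subscriptions"])


# Example (requires boto3; hypothetical account ID):
# import boto3
# print(check_alert_topic(boto3.client("sns", region_name="us-east-1"),
#                         "arn:aws:sns:us-east-1:111122223333:canary-alerts"))
```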

STEP 3B: Creating the EventBridge Detection Rule

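The EventBridge rule canary-key-triggered is live and enabled on the default event bus. It matches any API call made by the canary IAM user and routes the event to the SNS topic. This is the core detection engine: the moment deploy-bot-prod makes an API call in this Region, wherever the caller happens to be, this rule catches it and fires the alert. EventBridge rules are regional, so calls the key makes against other Regions are caught only if the same rule exists there (or events are forwarded across Regions). Calls to global endpoints such as sts.amazonaws.com are delivered in us-east-1.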
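A pattern along the following lines matches CloudTrail-delivered API calls made by the canary user. The article doesn't show the exact pattern used for canary-key-triggered, so this is an assumption. Matching on `detail.userIdentity.accessKeyId` instead of the user name would tie the alert to one specific key. The `matches` helper is a local approximation that implements only exact-value matching, a small subset of EventBridge's pattern language. It is meant for testing sample events before deploying.

```python
import json

CANARY_USER = "deploy-bot-prod"

EVENT_PATTERN = {
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"userIdentity": {"userName": [CANARY_USER]}},
}


def matches(pattern, event):
    """Exact-value subset of EventBridge matching: nested objects, arrays of allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    # Pass this string to: aws events put-rule --name canary-key-triggered --event-pattern '...'
    print(json.dumps(EVENT_PATTERN))
```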

STEP 4A: Building the Canary Document → Generate, Rename, and Plant in S3

A Microsoft Word Canarytoken has been generated on canarytokens.org and is now active. Clicking “Download your MS Word file” gives a .docx with a hidden beacon baked in — when anyone opens this document, an alert email fires automatically with their IP and timestamp.

The downloaded canary document renamed to AWS_Infrastructure_Passwords — a filename deliberately chosen to be irresistible to an attacker browsing a file system. The name alone makes it a high-priority target during post-compromise reconnaissance.

The canary document successfully uploaded to the S3 bucket internal-docs-archive-sp. The upload summary confirms 1 file (15.2 KB) with status Succeeded. An attacker who runs aws s3 ls on this bucket will see AWS_Infrastructure_Passwords.docx and is very likely to download and open it.

STEP 4B: Testing the Canary Document → Open, Alert, and IP Verification

The canary document AWS_Infrastructure_Passwords opened in Microsoft Word. The document appears blank — by design. The hidden beacon embedded by Canarytokens has already silently phoned home in the background the moment the file was opened.

The Canarytoken alert email arrived within seconds. The MS Word token was triggered from source IP 38.13.35.207. In a real scenario, this IP would immediately tell the security team which machine or network the attacker is operating from.

IP verification via WhatIsMyIPAddress.com confirms the source IP 38.13.35.207 matches the machine that opened the document. This cross-check validates the detection: the canary document correctly identified the network it was opened from.

STEP 5A: Building the Canary URL → Generate, Plant in a Decoy File, and Upload to S3

A Web bug Canarytoken generated for the canary URL tripwire. The unique URL is now active — any browser visit, curl request, or automated scraper that hits this link will immediately trigger an alert email with the visitor's IP address and user agent.

The internal-tools.txt decoy file crafted to look like a legitimate internal access reference. It lists fake production and staging URLs alongside a "Legacy Admin Panel" entry containing the canary URL. An attacker reading this file during reconnaissance would naturally click that link.

The decoy file internal-tools.txt successfully uploaded to the S3 bucket internal-docs-archive-sp. Upload summary confirms 1 file (392 bytes) with status Succeeded. This file is now sitting in S3 waiting for an attacker to discover it via aws s3 ls or Console browsing.

STEP 5B: Testing the Canary URL → Visit the Link and Verify the Alert

The canary URL visited in a browser — simulating what an attacker would do after finding the “Legacy Admin Panel” link in the internal-tools.txt file. Canarytokens displays a generic landing page, but behind the scenes the visit has already been logged and the alert is on its way.

The canary URL alert has fired. The Web bug token Admin-Panel-Canary-URL-001 was triggered from source IP 38.13.35.207. The reminder tag immediately identifies which placement was accessed — in this case, the fake admin panel link planted in the S3 bucket.

STEP 6: Testing the AWS Canary Key → Trigger, Alert, and CloudTrail Verification

Image 1: The canary AWS key configured via aws configure and triggered with aws sts get-caller-identity. The response confirms the key belongs to IAM user deploy-bot-prod. This simulates exactly what an attacker would do after discovering stolen credentials — and CloudTrail has now silently logged the call.

The full detection chain validated. The SNS email shows “CANARY KEY TRIGGERED” with every detail a responder needs — user (deploy-bot-prod), API call (GetCallerIdentity), source IP (38.13.35.207), user agent (aws-cli), and exact timestamp. The alert arrived within minutes of the API call.
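The article doesn't say how the alert email was formatted. One option is an EventBridge input transformer on the SNS target, which extracts fields from the CloudTrail event and fills a template. The paths and template below are an assumed example. Wrapping the template in double quotes is the EventBridge convention for plain-string output. `preview` is a hypothetical local helper that approximates the placeholder substitution, so the message can be checked before deployment.

```python
INPUT_PATHS = {
    "user": "$.detail.userIdentity.userName",
    "api": "$.detail.eventName",
    "ip": "$.detail.sourceIPAddress",
    "agent": "$.detail.userAgent",
    "time": "$.detail.eventTime",
}
# Double quotes around the template produce a plain string rather than a JSON object
INPUT_TEMPLATE = '"CANARY KEY TRIGGERED: <user> called <api> from <ip> (<agent>) at <time>"'


def sns_target(topic_arn, target_id="canary-sns"):
    """Target entry in the shape expected by the EventBridge PutTargets API."""
    return {
        "Id": target_id,
        "Arn": topic_arn,
        "InputTransformer": {"InputPathsMap": INPUT_PATHS, "InputTemplate": INPUT_TEMPLATE},
    }


def _lookup(event, path):
    """Follow a simple $.a.b.c path into nested dicts."""
    node = event
    for part in (path[2:] if path.startswith("$.") else path).split("."):
        node = node[part]
    return node


def preview(event):
    """Local approximation of the transformer's placeholder substitution."""
    text = INPUT_TEMPLATE
    for name, path in INPUT_PATHS.items():
        text = text.replace(f"<{name}>", str(_lookup(event, path)))
    return text
```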

CloudTrail Event history filtered by user name deploy-bot-prod confirms the GetCallerIdentity event logged at April 17, 2026 from sts.amazonaws.com. This is the audit trail — forensic proof that someone used the canary key, exactly when, and from where.

Operationalize — Don’t Just Build It, Maintain It

Placement strategy. Think like an attacker. After compromising a dev laptop, they’ll grep for credentials. After gaining AWS access, they’ll enumerate S3 and Secrets Manager. Place your honeytokens in exactly those paths — this is called adversary-informed placement. The closer your canary sits to the attacker’s natural workflow, the faster it trips.

Rotation. Canary keys don’t need the same rotation discipline as real keys since they have no permissions. But if one fires for real, rotate it immediately — generate a fresh IAM user, update the placement, and archive the old user’s CloudTrail logs as forensic evidence.

Inventory. Keep a simple spreadsheet tracking every honeytoken: type, placement location, associated IAM user or token ID, and creation date. Without this, you’ll forget where half your tokens are within six months.

Testing. Run an end-to-end test of every honeytoken quarterly. Webhooks expire, SNS subscriptions break silently, and Lambda runtimes reach end of support. An untested honeytoken is worse than no honeytoken, because it gives you false confidence.

Common Mistakes to Avoid

Using obviously fake names like fake-key or honeypot-user — sophisticated attackers enumerate IAM and skip anything suspicious.
Reusing the same canary key across multiple locations — when it fires, you can’t tell which location was breached.
Never testing the alerting pipeline — webhooks break silently, and untested alerts create false confidence.
Alerting without a runbook — when the alert fires at 3 AM, someone needs to know what to do next.
Attaching real IAM permissions to the canary user — if the key is ever used maliciously, the attacker gets real access.
Not isolating canary keys in production — for enterprise deployments, a dedicated canary account eliminates CloudTrail noise entirely. For learning and small setups, a single account with EventBridge filtering works fine.

Final Takeaway

Honeytokens are one of the few security controls where the theory is simple, the implementation is cheap, and the payoff is enormous. Three tripwires, near-zero cost, almost no false positives.

If you build nothing else from this walkthrough, build the AWS canary key flow. It takes under an hour, costs nothing, and has caught real breaches at real companies more times than public incident reports will ever reflect. The best detection rule is the one that can only fire when something is actually wrong.

Deception is not a replacement for prevention, detection, or response. It’s a distinct control class that fills one specific gap — the long, quiet phase between “attacker is inside” and “attacker has caused real damage.” That gap is where most of your mean-time-to-detect lives. Honeytokens are how you shrink it.

The End of “S3 Is Just Object Storage”

Shreya Patil — Wed, 15 Apr 2026 02:17:53 GMT

A walkthrough, a warning, and a working fix: all in one AWS launch

INTRODUCTION

You can now mount an S3 bucket as a Linux file system. That sentence would have been science fiction in 2015, a FUSE hack in 2020, and a production-ready managed service as of April 7, 2026.

Amazon S3 Files delivers what a decade of developers have been asking for: cd /mnt/s3files, edit a file, and watch it land in your bucket as a versioned object seconds later. Sub-millisecond latency on hot data. NFS v4.1+ semantics. Zero application code changes.

It also introduces an access path your S3 security tooling has never had to think about. In this article, I’ll walk you through setting up S3 Files from scratch in your own AWS account, and then you’ll see for yourself, hands-on, why the bucket policies you’ve been writing for a decade aren’t enough anymore.

THE WALKTHROUGH

STEP 1: (A) Creating the Encryption Kill Switch

KMS customer-managed key created. This s3files-cmk key will encrypt every object in the S3 bucket we create next. Using a customer-managed key (instead of the default AWS-managed one) gives us two things the walkthrough depends on: a kill switch we fully control, and an audit trail of every decryption in CloudTrail.
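If you prefer code to console clicks, here is a hedged boto3-style sketch of this step plus the "kill switch" itself. It assumes a client object with boto3's KMS method names (`create_key`, `create_alias`, `disable_key`). The alias, description, and placeholder ARN are illustrative. The small `RecordingClient` stub only records calls, so the sketch runs without an AWS account.

```python
class RecordingClient:
    """Offline stand-in for a boto3 client: records calls instead of sending them."""
    def __init__(self, responses=None):
        self.calls = []
        self._responses = responses or {}

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return self._responses.get(name, {})
        return method


def create_kill_switch_key(kms, alias="alias/s3files-cmk"):
    """Create a symmetric customer-managed key and give it a friendly alias.

    `kms` is meant to be boto3.client("kms"); returns the key ARN.
    """
    response = kms.create_key(
        Description="Encrypts objects in the S3 Files demo bucket",
        KeyUsage="ENCRYPT_DECRYPT",
        KeySpec="SYMMETRIC_DEFAULT",
    )
    key_arn = response["KeyMetadata"]["Arn"]
    kms.create_alias(AliasName=alias, TargetKeyId=key_arn)
    return key_arn


def pull_kill_switch(kms, key_arn):
    """Disable the key. KMS rejects cryptographic operations with a disabled key,
    so S3 requests that need this key start failing until it is re-enabled."""
    kms.disable_key(KeyId=key_arn)


# Dry run against the stub; swap in boto3.client("kms") to run it for real.
FAKE_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
kms = RecordingClient({"create_key": {"KeyMetadata": {"Arn": FAKE_ARN}}})
key_arn = create_kill_switch_key(kms)
pull_kill_switch(kms, key_arn)
print([name for name, _ in kms.calls])
```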

STEP 1: (B) A Bucket With Guardrails From Day One

S3 bucket created. The my-s3files-secure-bucket-sp bucket is live in us-east-1, configured with versioning, SSE-KMS using the s3files-cmk key from Figure 1, and Block Public Access enabled. This is the foundation S3 Files will mount as a file system — every file we write through the mount will land here as a versioned, encrypted object.
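The three guardrails map onto three standard boto3 S3 calls. Below is a minimal sketch that assumes the bucket already exists. The key ARN is a placeholder, and the recording stub stands in for `boto3.client("s3")` so the sketch runs offline. One design choice is worth calling out: S3 Bucket Keys are explicitly left off, because they trade per-object KMS calls for fewer bucket-level ones, which thins out the CloudTrail decryption trail this walkthrough relies on.

```python
class RecordingClient:
    """Offline stand-in for a boto3 client: records calls instead of sending them."""
    def __init__(self, responses=None):
        self.calls = []
        self._responses = responses or {}

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return self._responses.get(name, {})
        return method


def apply_bucket_guardrails(s3, bucket, kms_key_arn):
    """Versioning, default SSE-KMS, and Block Public Access on an existing bucket."""
    # 1. Versioning: every overwrite keeps the previous version recoverable.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # 2. Default encryption with the customer-managed key; Bucket Keys off so
    #    KMS usage (and its CloudTrail entries) stays per object.
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [{
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": False,
            }]
        },
    )
    # 3. Block Public Access: all four settings on.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )


s3 = RecordingClient()
apply_bucket_guardrails(
    s3, "my-s3files-secure-bucket-sp",
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",  # placeholder ARN
)
print([name for name, _ in s3.calls])
```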

STEP 2: The Backend Role → How S3 Files Reaches Your Bucket

File system access role created. S3FilesAccessRole is the identity S3 Files uses on the backend to read and write the bucket. Its inline policy is scoped only to this bucket and KMS key, with conditions that prevent confused-deputy attacks. Not to be confused with the EC2 role — we'll create that next.

STEP 3: (A) The Instance Role: Separated by Design

Compute resource role created. S3FilesComputeRole. (the trailing dot is a typo we kept) is the role the EC2 instance assumes to authenticate its mount request. Keeping it separate from S3FilesAccessRole is the core least-privilege move — if the instance is compromised, the attacker only gets "mount this file system" permissions, not the broader S3 and KMS access on the backend role.
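The split between the two roles is easiest to see side by side. The policy documents below are a hedged sketch, not the exact policies from the figures. The S3 and KMS action lists for the backend role are assumptions, so check the S3 Files documentation for what it actually needs. The `s3files:` client action names are taken from Step 11, and the account ID and key ID are placeholders. The trust policies, including the confused-deputy conditions mentioned in Step 2, are omitted because the article does not show the S3 Files service principal.

```python
# Placeholders for illustration only.
ACCOUNT_ID = "111122223333"
BUCKET = "my-s3files-secure-bucket-sp"
KMS_KEY_ARN = f"arn:aws:kms:us-east-1:{ACCOUNT_ID}:key/EXAMPLE-KEY-ID"

# Backend role (S3FilesAccessRole): data access to one bucket and one key.
# The exact action list S3 Files needs is an assumption here.
ACCESS_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": f"arn:aws:s3:::{BUCKET}"},
        {"Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": f"arn:aws:s3:::{BUCKET}/*"},
        {"Effect": "Allow", "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
         "Resource": KMS_KEY_ARN},
    ],
}

# Instance role (S3FilesComputeRole.): S3 Files client actions only.
COMPUTE_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3files:ClientMount", "s3files:ClientRead", "s3files:ClientWrite"],
        "Resource": "*",  # a real policy would name the file system
    }],
}


def allowed_services(policy):
    """Return the service prefixes (e.g. 's3', 'kms') a policy's Allow statements grant."""
    services = set()
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        actions = stmt["Action"]
        if isinstance(actions, str):
            actions = [actions]
        services.update(a.split(":", 1)[0] for a in actions)
    return services


for name, policy in [("S3FilesAccessRole", ACCESS_ROLE_POLICY),
                     ("S3FilesComputeRole.", COMPUTE_ROLE_POLICY)]:
    print(name, sorted(allowed_services(policy)))
```

The check at the end is the least-privilege property in one line: the compute role's grants never overlap with `s3` or `kms`.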

STEP 3: (B) A Test Instance with attached Instance Role

EC2 instance launched and role attached. The s3files-test-instance is running in us-east-1a (remember this AZ: the mount target must match it in Step 5), with S3FilesComputeRole. attached and IMDSv2 set to Required. IMDSv2 enforcement blocks a class of SSRF attacks against the instance metadata service, so even if a web app on this instance has an SSRF flaw, the attacker cannot trivially steal the role's credentials to use elsewhere.
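Why does IMDSv2 blunt SSRF? Reading instance metadata becomes a two-step exchange. First, a `PUT` obtains a session token. Then every read must carry that token in a header. Many SSRF bugs only let an attacker trigger simple `GET` requests without custom headers, so they never obtain a token. The sketch below uses only the Python standard library with the documented IMDSv2 endpoint and header names. `fetch_metadata` only works when run on an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: a PUT that returns a session token (TTL up to 6 hours)."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(token: str,
                     path: str = "meta-data/iam/security-credentials/"
                     ) -> urllib.request.Request:
    """IMDSv2 step 2: a GET that carries the token. With HttpTokens set to
    required, a request without it is rejected with 401."""
    return urllib.request.Request(
        f"{IMDS_BASE}/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Run both steps. Only works from inside an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(token, path), timeout=timeout) as resp:
        return resp.read().decode()
```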

STEP 4: Locking Down NFS to a Single Security Group

Mount target security group created. The secgroup-mounttarget group has a single inbound rule allowing NFS (TCP 2049) from the EC2 instance's security group, not from a CIDR block. Sourcing by security group ID means only resources explicitly associated with that group can connect, so a future instance launched in the same subnet cannot reach the mount target unless someone deliberately attaches that group to it. This is the network-layer complement to the IAM controls: even if credentials leak, the firewall still says no.
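The same rule can be expressed as a boto3 `authorize_security_group_ingress` call. Notice that the source is a `UserIdGroupPairs` entry (another security group) rather than an `IpRanges` CIDR. The group IDs are placeholders, and the recording stub only captures the call so the sketch runs offline.

```python
class RecordingClient:
    """Offline stand-in for a boto3 client: records calls instead of sending them."""
    def __init__(self, responses=None):
        self.calls = []
        self._responses = responses or {}

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return self._responses.get(name, {})
        return method


NFS_PORT = 2049


def allow_nfs_from_group(ec2, mount_target_sg_id, instance_sg_id):
    """Allow NFS into the mount target SG only from members of the instance SG.

    `ec2` is meant to be boto3.client("ec2").
    """
    ec2.authorize_security_group_ingress(
        GroupId=mount_target_sg_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": NFS_PORT,
            "ToPort": NFS_PORT,
            # The source is another security group, so there is no IpRanges entry.
            "UserIdGroupPairs": [{
                "GroupId": instance_sg_id,
                "Description": "NFS from S3 Files client instances",
            }],
        }],
    )


ec2 = RecordingClient()
allow_nfs_from_group(ec2, "sg-0aaaaaaaaaaaaaaaa", "sg-0bbbbbbbbbbbbbbbb")  # placeholders
print(ec2.calls[0][0])
```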

STEP 5: (A) The File System Is Live

S3 file system created. The file system is now Available and linked to my-s3files-secure-bucket-sp. This is the managed layer that bridges S3 and NFS: it contains the high-performance cache, handles protocol negotiation, and keeps the bucket synced in both directions. The file system exists as a logical resource, but it is not yet reachable from any EC2 instance; that requires a mount target in the same VPC, which we deal with next.

STEP 5: (B) Five Mount Targets You Didn’t Ask For

Mount targets auto-provisioned across AZs. AWS automatically created one mount target per Availability Zone in us-east-1, five in total. Our EC2 instance lives in use1-az1 (the AZ ID that us-east-1a maps to in this account), so it will connect through the mount target in that AZ. The other four are idle but still need to be audited: each is a live network endpoint with its own security group, and an orphaned mount target in an unused AZ is both a cost leak and an undocumented attack surface.

STEP 5: (C) Replacing the Default Security Group

Replacing the default security group on the mount target. We swapped AWS’s default VPC security group for our purpose-built secgroup-mounttarget on the mount target in use1-az1. The default group is too permissive — it allows all traffic between resources sharing it. Now only EC2 instances with our explicit security group can reach the mount on port 2049.
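The audit advice from Step 5 can be turned into a small check. This sketch runs over hypothetical mount-target records (AZ ID plus attached security groups). The article does not cover how to fetch those records from S3 Files, so that part is left out. The check flags mount targets still on the default VPC security group and mount targets in AZs where no client instance runs. All IDs are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class MountTarget:
    """Hypothetical mount-target record: AZ ID and attached security groups."""
    az_id: str
    security_group_ids: Tuple[str, ...]


def audit_mount_targets(mount_targets: Iterable[MountTarget],
                        client_az_ids: Set[str],
                        default_sg_id: str) -> List[str]:
    findings = []
    for mt in mount_targets:
        if default_sg_id in mt.security_group_ids:
            findings.append(f"{mt.az_id}: still uses the default security group")
        if mt.az_id not in client_az_ids:
            findings.append(f"{mt.az_id}: no client instances in this AZ; review or remove")
    return findings


DEFAULT_SG = "sg-0default000000000"      # placeholder IDs
MOUNT_TARGET_SG = "sg-0mounttarget00000"

mount_targets = [MountTarget("use1-az1", (MOUNT_TARGET_SG,))] + [
    MountTarget(az, (DEFAULT_SG,))
    for az in ("use1-az2", "use1-az4", "use1-az5", "use1-az6")
]

for finding in audit_mount_targets(mount_targets, {"use1-az1"}, DEFAULT_SG):
    print(finding)
```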

STEP 6: (A) The Moment S3 Becomes a Directory

Files mounted and verified. From an EC2 Instance Connect session, we mounted the file system at /mnt/s3files and confirmed it with df -h. The 8-exabyte "capacity" is expected — S3 has no real size limit, so the driver reports the max. A quick echo and ls confirm test.txt is now a real file on the mount, which S3 Files will sync to the bucket within seconds. The stunnel warning is harmless.

STEP 6: (B) A Filesystem Write Becomes an S3 Object

Round-trip confirmed. The test.txt we wrote through the mount now appears in the bucket as a standard S3 object — 15 bytes, timestamped seconds after the write. A filesystem write became an encrypted, versioned S3 object automatically, no translation code needed. Note the new File systems tab in the bucket navigation — the S3 console's new surface that ties a bucket to its linked file systems.
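The console check in this figure can also be scripted. Here is a hedged sketch. `head_object` is the standard boto3 S3 call, and the fields checked (`ContentLength`, `ServerSideEncryption`, `SSEKMSKeyId`, `VersionId`) are standard S3 object metadata. The stubbed response values and key ARN are made up so the sketch runs offline.

```python
class RecordingClient:
    """Offline stand-in for a boto3 client: records calls instead of sending them."""
    def __init__(self, responses=None):
        self.calls = []
        self._responses = responses or {}

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return self._responses.get(name, {})
        return method


def verify_round_trip(s3, bucket, key, expected_size, kms_key_arn):
    """Return a list of problems (empty if the object looks as expected)."""
    head = s3.head_object(Bucket=bucket, Key=key)
    problems = []
    if head.get("ContentLength") != expected_size:
        problems.append(f"size is {head.get('ContentLength')}, expected {expected_size}")
    if head.get("ServerSideEncryption") != "aws:kms":
        problems.append("object is not SSE-KMS encrypted")
    if head.get("SSEKMSKeyId") != kms_key_arn:
        problems.append("object is encrypted under a different KMS key")
    if head.get("VersionId") in (None, "null"):
        problems.append("object has no version ID; is versioning enabled?")
    return problems


KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"  # placeholder
s3 = RecordingClient({"head_object": {
    "ContentLength": 15,
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": KEY_ARN,
    "VersionId": "EXAMPLEVERSIONID",
}})
print(verify_round_trip(s3, "my-s3files-secure-bucket-sp", "test.txt", 15, KEY_ARN))
```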

STEP 7: (A) Planting a File and Knowing Our Identity

Setting up the shadow access demo — the “secret” file and the identity we will deny. We created /mnt/s3files/secret.txt through the mount (which, per the previous figures, automatically synced to S3) and then ran aws sts get-caller-identity to confirm the exact IAM role our EC2 instance is using: S3FilesComputeRole. (note the trailing dot — an accidental typo that became part of the role name). This is the identity we are about to explicitly deny in the bucket policy.

STEP 7: (B) A Bucket Policy That Looks Bulletproof

Bucket policy denying the compute role direct read access. This policy is exactly what a security team would write after a compliance finding: an explicit Deny on s3:GetObject for S3FilesComputeRole. against every object in my-s3files-secure-bucket-sp. In the traditional S3 security model, this is bulletproof: the role cannot read any object via the S3 API. We are about to test whether that guarantee still holds when the same role accesses the same data through a mounted file system.
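For reference, a policy of the shape described looks like the following sketch. The account ID is a placeholder, and the role ARN keeps the trailing dot from the walkthrough's role name. With boto3, applying it would be a `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))` call; in the console, you would paste the printed JSON into the bucket policy editor.

```python
import json


def deny_get_object_policy(bucket, role_arn):
    """The 'bulletproof-looking' bucket policy: explicit Deny of s3:GetObject."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyComputeRoleDirectReads",
            "Effect": "Deny",
            "Principal": {"AWS": role_arn},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


policy = deny_get_object_policy(
    "my-s3files-secure-bucket-sp",
    # Placeholder account ID; note the trailing dot in the role name.
    "arn:aws:iam::111122223333:role/S3FilesComputeRole.",
)
print(json.dumps(policy, indent=2))
```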

STEP 8: The Test

Path 1: The S3 API Says No

Path 1 — Direct S3 API read is blocked as expected. Running aws s3 cp s3://my-s3files-secure-bucket-sp/secret.txt from the EC2 instance fails immediately with 403 Forbidden. The bucket policy from Figure 13 is working exactly as a security team would expect: the S3FilesComputeRole. identity cannot read the object through the S3 API. If this were the only access path to worry about, the file would be safely protected. The next figure shows what happens when we try the same read through the mount instead.

Path 2: The Mount Says Yes

Path 2 — Same file, same identity, read successfully via the mount (Step 8). A plain cp /mnt/s3files/secret.txt succeeds, and cat reveals the full contents. This is shadow access in one screenshot. The bucket policy is still in place and the 403 from Figure 14 is still real — but the file system read goes through the S3 Files service role on the backend, so the deny never gets a chance to evaluate. The policy works for one access path; we just found the other.
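The mechanism behind Path 2 fits in a toy model. This is a deliberately simplified sketch, not AWS's actual policy evaluation logic. An explicit Deny wins, otherwise an Allow is required, and principals, actions, and resources are matched as plain strings. The file system identifier is a placeholder. The model also assumes the mount read misses the S3 Files cache, so the backend has to fetch the object from S3. What it captures is the point made above: the bucket policy's Deny names the compute role, but the S3 request behind a mount read is made by the backend access role.

```python
BUCKET_OBJECTS = "arn:aws:s3:::my-s3files-secure-bucket-sp/"
SECRET = BUCKET_OBJECTS + "secret.txt"
FILE_SYSTEM = "file-system/EXAMPLE"      # placeholder identifier
COMPUTE_ROLE = "S3FilesComputeRole."     # instance role (trailing dot kept)
ACCESS_ROLE = "S3FilesAccessRole"        # backend role used by S3 Files


def evaluate(principal, action, resource, statements):
    """Toy evaluation: any matching Deny wins; otherwise a matching Allow is needed."""
    allowed = False
    for s in statements:
        if (s["principal"] == principal and s["action"] == action
                and resource.startswith(s["resource_prefix"])):
            if s["effect"] == "Deny":
                return "Deny"
            allowed = True
    return "Allow" if allowed else "ImplicitDeny"


statements = [
    # Step 7 (B): the bucket policy's explicit deny for the compute role.
    {"effect": "Deny", "principal": COMPUTE_ROLE, "action": "s3:GetObject",
     "resource_prefix": BUCKET_OBJECTS},
    # The compute role may use the mount (S3 Files client action).
    {"effect": "Allow", "principal": COMPUTE_ROLE, "action": "s3files:ClientRead",
     "resource_prefix": FILE_SYSTEM},
    # The backend role may read the bucket.
    {"effect": "Allow", "principal": ACCESS_ROLE, "action": "s3:GetObject",
     "resource_prefix": BUCKET_OBJECTS},
]


def read_via_s3_api(statements):
    # Path 1: the instance signs the S3 request itself.
    return evaluate(COMPUTE_ROLE, "s3:GetObject", SECRET, statements)


def read_via_mount(statements):
    # Path 2, in this model: the instance is checked only for the client action...
    decision = evaluate(COMPUTE_ROLE, "s3files:ClientRead", FILE_SYSTEM, statements)
    if decision != "Allow":
        return decision
    # ...and the S3 read behind it is made by the backend role.
    return evaluate(ACCESS_ROLE, "s3:GetObject", SECRET, statements)


print("S3 API:", read_via_s3_api(statements))   # Deny
print("Mount:", read_via_mount(statements))     # Allow

# Step 11's fix: deny the client action at the layer the mount path checks.
fixed = statements + [{"effect": "Deny", "principal": COMPUTE_ROLE,
                       "action": "s3files:ClientRead", "resource_prefix": FILE_SYSTEM}]
print("Mount after fix:", read_via_mount(fixed))  # Deny
```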

STEP 9: (A) Least Privilege Working Against Us

The compute role cannot list versions (Step 9). Running list-object-versions from the EC2 instance fails with AccessDenied — the compute role only has S3 Files client permissions, not general S3 API access. This is least privilege working as intended: a mounted instance shouldn't need broad S3 permissions.

STEP 9: (B) CloudShell, The Incident Responder’s Terminal

Listing versions from CloudShell (Step 9). CloudShell runs with the logged-in console user’s admin credentials, so the same command succeeds here. One version of secret.txt exists so far — our clean 59-byte baseline. This is the version we'll want to recover to after the simulated attack.

STEP 10: The Attack called Shadow Access

Simulated ransomware overwrite. A single echo through the mount replaces the clean file contents with ENCRYPTED_BY_RANSOMWARE_PAY_1_BTC. From the application's point of view, the file is now destroyed. This is exactly what a compromised container with write access to the mount could do to thousands of files in seconds.

Two versions now exist in S3. Re-listing from CloudShell shows the attack is visible in the bucket: a new 34-byte malicious version sitting on top of the original 59-byte clean version. Versioning captured the damage and kept the original intact — our recovery point.

Deleting only the malicious version. Running delete-object with the specific malicious VersionId removes just that one version from S3. The older clean version automatically becomes the current one. This is the surgical recovery move — no backup restore, no downtime, just one API call.
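The surgical recovery can be scripted from the incident responder's side (CloudShell or any admin session). Here is a minimal sketch that assumes you already know the VersionId of the last clean version. It lists versions with boto3's `list_object_versions` and keeps only exact key matches. It then deletes every version newer than the known-good one with `delete_object(..., VersionId=...)`. It ignores pagination (fine for a handful of versions) and delete markers. The stubbed listing mirrors the 59-byte and 34-byte versions from the figures, with made-up version IDs and timestamps.

```python
from datetime import datetime, timezone


class RecordingClient:
    """Offline stand-in for a boto3 client: records calls instead of sending them."""
    def __init__(self, responses=None):
        self.calls = []
        self._responses = responses or {}

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return self._responses.get(name, {})
        return method


def roll_back_to(s3, bucket, key, good_version_id):
    """Delete every version of `key` newer than `good_version_id`; return their IDs."""
    listing = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can also return keys like "secret.txt.bak"; keep exact matches.
    versions = [v for v in listing.get("Versions", []) if v["Key"] == key]
    # Newest first. sort() is stable, so ties keep the order S3 returned.
    versions.sort(key=lambda v: v["LastModified"], reverse=True)
    if good_version_id not in {v["VersionId"] for v in versions}:
        raise ValueError("known-good version not found; nothing deleted")
    deleted = []
    for v in versions:
        if v["VersionId"] == good_version_id:
            break
        # Deleting a specific version is permanent (no delete marker is created).
        s3.delete_object(Bucket=bucket, Key=key, VersionId=v["VersionId"])
        deleted.append(v["VersionId"])
    return deleted


s3 = RecordingClient({"list_object_versions": {"Versions": [
    {"Key": "secret.txt", "VersionId": "MALICIOUS-EXAMPLE", "Size": 34,
     "LastModified": datetime(2026, 4, 14, 10, 5, tzinfo=timezone.utc)},
    {"Key": "secret.txt", "VersionId": "CLEAN-EXAMPLE", "Size": 59,
     "LastModified": datetime(2026, 4, 14, 10, 0, tzinfo=timezone.utc)},
]}})
print(roll_back_to(s3, "my-s3files-secure-bucket-sp", "secret.txt", "CLEAN-EXAMPLE"))
```

The guard that refuses to delete anything when the known-good version is missing matters more than it looks: a typo in the VersionId should never turn a recovery into a wipe.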

Clean content restored on the mount. A simple cat on the EC2 instance shows the original "this is the file the bucket policy was supposed to protect" is back. The mount picked up the recovery within seconds. End-to-end ransomware recovery: roughly 30 seconds, one command, zero data loss — provided versioning was enabled from day one.

STEP 11: Fixing the Shadow Access Issue

File system policy applied. We added a deny statement to the file system policy that targets S3FilesComputeRole. for the S3 Files client actions (s3files:ClientMount, ClientRead, ClientWrite, ClientRootAccess). This is the correct control layer for mount access: unlike the bucket policy, this one is evaluated on every file system operation.
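A policy with this shape might look like the following sketch. The four action names come from this step, with the `s3files:` prefix on the last three inferred from the first. The resource value is a placeholder; substitute your file system's ARN as shown in the console. The account ID is hypothetical. Denying `ClientMount` stops the instance from mounting at all, which is the blunt version of the fix used here.

```python
import json

CLIENT_ACTIONS = [
    "s3files:ClientMount",
    "s3files:ClientRead",
    "s3files:ClientWrite",
    "s3files:ClientRootAccess",
]


def deny_mount_access_policy(role_arn, file_system_arn):
    """File system policy that denies one role every S3 Files client action."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyComputeRoleMountAccess",
            "Effect": "Deny",
            "Principal": {"AWS": role_arn},
            "Action": CLIENT_ACTIONS,
            "Resource": file_system_arn,
        }],
    }


policy = deny_mount_access_policy(
    "arn:aws:iam::111122223333:role/S3FilesComputeRole.",  # hypothetical account ID
    "REPLACE-WITH-FILE-SYSTEM-ARN",                          # placeholder
)
print(json.dumps(policy, indent=2))
```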

Shadow access closed. The exact same cp command that succeeded in Figure 15 now fails with Permission denied. Same user, same file, same mount — but this time the deny takes effect because we're enforcing it at the file system layer, not the bucket layer. This is the fix: bucket policies protect S3 API access, file system policies protect mount access, and both need to be in your audit checklist going forward.

STEP 12: Observability → What You’d Add in Production

The walkthrough so far is all prevention. Production needs a detective layer on top — three data sources feeding your SIEM:

CloudTrail data events log every object-level read and write on the bucket, including the sync traffic S3 Files generates on your behalf. Without them, you’ll never know who read which file.

CloudWatch metrics (AWS/S3Files) expose mount state, connections, and read/write volume. A sudden spike in DataWriteIOBytes is the signature of ransomware encrypting files — alarm on it, route it to on-call, calibrate the threshold to your workload.

VPC Flow Logs on the mount target ENI record metadata (source, destination, port, accepted or rejected) for every network flow, so you can correlate a suspicious IAM action with the connection that followed it during an investigation.
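The three sources above can be sketched with boto3. The calls and parameter names are standard boto3 (`put_event_selectors`, `put_metric_alarm`, `create_flow_logs`). The `AWS/S3Files` namespace and `DataWriteIOBytes` metric come from the text. The `FileSystemId` dimension name is an assumption, and the trail, log group, ENI, topic, and role identifiers are placeholders. `calibrate_threshold` is a simple heuristic (peak observed 5-minute write volume times a headroom factor), not an AWS recommendation. `put_event_selectors` replaces a trail's existing selectors, so a real trail would keep its management-event selector alongside this one.

```python
class RecordingClient:
    """Offline stand-in for a boto3 client: records calls instead of sending them."""
    def __init__(self, responses=None):
        self.calls = []
        self._responses = responses or {}

    def __getattr__(self, name):
        def method(**kwargs):
            self.calls.append((name, kwargs))
            return self._responses.get(name, {})
        return method


def enable_s3_data_events(cloudtrail, trail_name, bucket):
    """Log object-level reads and writes for one bucket on an existing trail."""
    cloudtrail.put_event_selectors(
        TrailName=trail_name,
        AdvancedEventSelectors=[{
            "Name": f"S3 data events for {bucket}",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
                {"Field": "resources.ARN", "StartsWith": [f"arn:aws:s3:::{bucket}/"]},
            ],
        }],
    )


def calibrate_threshold(five_minute_write_sums, headroom=3):
    """Heuristic: alarm at `headroom` times the peak write volume seen in a baseline."""
    return max(five_minute_write_sums) * headroom


def alarm_on_write_spike(cloudwatch, file_system_id, threshold_bytes, sns_topic_arn):
    cloudwatch.put_metric_alarm(
        AlarmName=f"s3files-{file_system_id}-write-spike",
        Namespace="AWS/S3Files",
        MetricName="DataWriteIOBytes",
        Dimensions=[{"Name": "FileSystemId", "Value": file_system_id}],  # assumed name
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=float(threshold_bytes),
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=[sns_topic_arn],
    )


def enable_flow_logs(ec2, eni_id, log_group, delivery_role_arn):
    """Capture accepted and rejected flow metadata for the mount target ENI."""
    ec2.create_flow_logs(
        ResourceIds=[eni_id],
        ResourceType="NetworkInterface",
        TrafficType="ALL",
        LogDestinationType="cloud-watch-logs",
        LogGroupName=log_group,
        DeliverLogsPermissionArn=delivery_role_arn,
    )


client = RecordingClient()
enable_s3_data_events(client, "org-trail", "my-s3files-secure-bucket-sp")
threshold = calibrate_threshold([1_000_000, 4_000_000, 2_500_000])
alarm_on_write_spike(client, "fs-EXAMPLE", threshold,
                     "arn:aws:sns:us-east-1:111122223333:on-call")
enable_flow_logs(client, "eni-0123456789abcdef0", "/s3files/mount-target",
                 "arn:aws:iam::111122223333:role/FlowLogsDelivery")
print([name for name, _ in client.calls])
```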

Prevention tells you what should happen; detection tells you what did. Together they’re the difference between a control and a posture.

The Cloud Security Promise Nobody Reads Carefully

Shreya Patil — Sat, 11 Apr 2026 16:06:05 GMT

Why the most famous diagram in cloud security stopped describing reality, and what that means for you

A picture you’ve definitely seen

If you’ve spent even a little bit of time around cloud computing, you’ve seen the diagram. It usually looks like a tall rectangle split into two parts by a wavy line running across the middle. On one side it says “the cloud provider takes care of this.” On the other side it says “you take care of this.” Neat. Tidy. Reassuring.

This picture has a name. It’s called the shared responsibility model, and it’s the single most quoted idea in cloud security. Amazon invented it. Microsoft and Google made their own versions. Every cloud certification course teaches it. Every compliance auditor asks about it. Every time a company gets hacked in the cloud, somebody eventually points at this picture and says, “see, that part was on you.”

I want to be honest with you about something: this picture is not exactly wrong, but it’s not exactly right either. It was drawn for a world that doesn’t really exist anymore. And the gap between the picture and reality is where most modern cloud security disasters now happen. Let me explain what the picture was supposed to mean, why it stopped working, and what’s actually going on now.

What the picture was trying to tell you

Back in the early days of cloud computing, around 2010 or so, customers were genuinely confused about who was responsible for what. They were used to running their own servers in their own buildings, where everything from the air conditioning to the antivirus software was clearly their job. When they moved to Amazon’s cloud, they assumed Amazon would just handle everything. After all, they were paying Amazon, and Amazon owned the actual machines.

That assumption caused some embarrassing breaches. People left databases open to the internet because they thought Amazon was somehow magically protecting them. People uploaded sensitive customer information into storage buckets they thought were private but had actually been left public. When the inevitable happened and someone stole the data, customers blamed Amazon, and Amazon politely pointed out that they had never agreed to do any of those things in the first place.

So the cloud providers drew the picture. They said something like this: we will take care of the parts you can’t see and don’t control. The buildings, the servers, the networking equipment, the bottom layer of software that makes the whole thing run. You take care of the parts you actually use. Your data, your accounts, your settings, your applications.

It sounded perfectly reasonable. And for the kind of cloud computing that existed in 2012, it actually was. If you rented some virtual servers from Amazon and ran your own software on them, you knew exactly where the line was. Amazon’s job ended at the operating system. Your job started there. Easy. The problem is that almost nobody uses the cloud that way anymore.

How the world quietly changed

Three things happened over the last decade that turned the simple two-sided picture into something much messier.

First, software stopped being something you install and started being something you rent. A modern company doesn’t run its own customer database. It rents Salesforce. It doesn’t run its own email server. It rents Microsoft 365 or Google Workspace. It doesn’t run its own help desk software. It rents Zendesk or Intercom. The technical name for this is SaaS, which stands for “software as a service,” and the average company today uses somewhere between two hundred and three hundred of these rented applications. Each one of those applications has its own version of the shared responsibility picture. Each one tells you that they handle some things and you handle the rest. Add it all up and you’re not looking at one picture anymore. You’re looking at a mosaic of two hundred slightly different pictures, each with the line drawn in a slightly different place, and most of them written in language that’s hard to understand without a lawyer.

Second, all those rented applications started talking to each other. This is the part that really broke the model. Your sales team’s CRM is connected to your marketing tool, which is connected to your email platform, which is connected to your scheduling app, which is connected to your document signing service, which is connected to your accounting system. These connections are usually set up by ordinary employees who click a button that says “connect to” and then click “approve” on a screen full of permission requests that nobody reads.

Each connection is a little tunnel. Data flows through the tunnel. Permissions flow through the tunnel. And critically, trust flows through the tunnel. When you connect your marketing app to Salesforce, you’re not just letting that one app see your sales data. You’re declaring that whoever runs that app, whoever has access to its servers, whoever might one day buy that company or get hacked themselves, is now somebody who can reach into your Salesforce account.

Third, AI got added to everything. Almost every modern SaaS tool now has AI features built in. Those features need to read your data so they can be helpful. They sometimes pass your data through other systems you didn’t know existed. The original shared responsibility picture has nothing to say about any of this. So the picture from 2012 is still hanging on the wall, but the building has been remodeled around it.

The breach that proved the point

Let me tell you about something that happened in 2025, because it shows exactly how the old way of thinking falls apart.

There’s a chatbot product called Drift, made by a company called Salesloft. Drift is the kind of thing that pops up on a company’s website to chat with potential customers. Many sales teams use it, and they connect it to their Salesforce account so the leads it captures automatically end up in the right place.

Hackers broke into Salesloft’s systems. They didn’t break into Salesforce. Salesforce was fine. But because thousands of customers had connected Drift to their Salesforce accounts, the hackers got their hands on something called OAuth tokens, which are basically little passes that let one application talk to another on behalf of a customer. With those tokens in hand, the attackers could reach into hundreds of Salesforce accounts that belonged to Drift’s customers. They quietly pulled out enormous amounts of data.

Researchers at a security company called Obsidian found that this attack was about ten times more damaging than previous attacks where hackers had tried to break into Salesforce directly. Read that again. Going through a third-party connection turned out to be ten times more effective than going after the main target. More than 700 organizations were eventually affected, including some of the biggest names in tech and cybersecurity: Cloudflare, Palo Alto Networks, Zscaler, Tenable, and Proofpoint among them.

Now ask the shared responsibility question: whose fault was this?

Was it Salesforce’s fault? They didn’t get hacked. Their systems worked exactly as designed.
Was it Drift’s fault? Sort of, but their terms of service almost certainly say they’re not responsible for what happens downstream.
Was it the customer’s fault? They had legitimately approved the connection. They hadn’t misconfigured anything obvious.

The shared responsibility picture has no answer here. It was drawn for a conversation between two parties — a cloud provider and a customer. But the world we actually live in has dozens of parties involved in any given data flow, and the picture has no language for that.

The other ways the picture is too small

The Salesloft incident is the most dramatic example, but it’s not the only place the old picture comes up short. Here are some other things it doesn’t really address.

The settings problem. Every SaaS application has hundreds, sometimes thousands, of security-related settings you can adjust. Permissions, sharing rules, who can do what, what gets logged, what gets encrypted, what gets exposed publicly. The shared responsibility picture cheerfully says “configuration is your job.” Fine. Now multiply that by the two hundred applications your company uses, and ask yourself if any human being can possibly stay on top of it. Studies suggest that the typical SaaS user has more permissions than they actually need about 85% of the time. Not because anyone is being careless on purpose, but because the surface area is just too big for the old style of management.

The “where is my data even” problem. The picture says that data protection is always your responsibility. Always. Under every type of cloud service, no matter what. That’s a lovely principle, but in modern environments, you often genuinely don’t know where your data is. It might be in the SaaS app you signed up for. It might also be in three other apps that the first app shares with. It might be in an AI training pipeline you didn’t know about. It might be sitting in a copy somebody made for a backup six months ago. You can’t protect data when you can’t see it, but the picture treats this as a solved problem.

The Snowflake situation. In 2024, attackers got into 165 customer accounts on a data platform called Snowflake. Snowflake itself wasn’t hacked. The affected accounts had multi-factor authentication turned off. Under the strict letter of the shared responsibility picture, this was 100% the customers’ fault. They were the ones who didn’t turn on the safety feature. But here’s the uncomfortable question: in 2024, should any cloud platform have allowed customers to leave multi-factor authentication off? The picture says nothing about whether the defaults that providers ship are reasonable. It just says configuration is your problem.

The compliance asymmetry. When a regulator decides somebody broke the rules, the fine almost always lands on the customer, not the cloud provider. Google Cloud’s own HIPAA documentation states this pretty plainly: there is no certification that makes your usage automatically compliant. Even if Google handles its part, you’re still on the hook for yours, and “yours” is defined in ways that are easy to get wrong. Audit failures tied to cloud configuration have been a persistent and well-documented problem across healthcare, finance, and other regulated industries — in fact, the most recent public HIPAA compliance reviews found that roughly half of audited organizations were issued corrective action plans. The picture doesn’t really capture that the word “shared” hides a pretty unequal split, where providers do some of the work and customers absorb most of the legal risk.

The shadow AI problem. When an employee pastes confidential code into ChatGPT to get help debugging it, that’s not a breach in any traditional sense. Nobody hacked anything. There was no misconfiguration. The employee was just trying to be productive. But the data left the building, and the company has no record of where it went. This actually happened to Samsung in 2023, which led the company to ban employee use of ChatGPT and similar tools. JPMorgan and Goldman Sachs restricted employee use of ChatGPT around the same time. The picture from 2012 has no way to even describe this scenario, let alone tell you whose responsibility it is.

What the picture leaves out entirely

If you stare at the shared responsibility diagram closely, you’ll start to notice that some of the most important questions in modern cloud security are simply not addressed by it at all. Things like:

When two SaaS applications share data with each other, and one of them gets compromised, who is responsible for the damage?
When a cloud platform builds AI features that train on your data, who owns the security of the model?
When an employee connects an unapproved app to your corporate accounts using their personal login, where does that fit?
When your data gets quietly copied to a different country for backup purposes, who’s responsible if that violates a local law?
When an AI agent inside your SaaS tool takes an action on your behalf, and the action turns out to be wrong, who has to fix it?

These aren’t strange edge cases. They’re some of the most common things going wrong in cloud security right now. And for every one of them, the old picture mostly just shrugs and says: well, you should have configured it better. That’s not a security framework. That’s a way of avoiding blame.

So is the model useless?

No, not exactly. I want to be fair to it. The basic idea behind shared responsibility (that cloud security is a partnership and customers can’t just outsource everything) is correct and important. The principle that data protection always sits with the customer is correct. The recognition that providers and customers each have things only they can control is correct. The problem isn’t that the picture is wrong. It’s that the picture is too small for the world it’s supposed to describe. It was a useful contract between two parties, and it got promoted into being a complete security strategy, which it was never meant to be.

The folks working on this professionally are starting to talk about what comes next. Nobody has a clean replacement yet, but the outlines are becoming visible. Here’s the gist of where things are headed.

Treat third-party connections as their own thing that needs governance. The connections between your SaaS tools are not afterthoughts. They are the most likely place an attacker will get in. They need to be inventoried, reviewed regularly, and shut down when they’re not needed. There’s a whole new category of tools that specialize in this, sometimes called SSPM (which stands for SaaS Security Posture Management) or ITDR (Identity Threat Detection and Response).

Stop thinking of responsibility as a static line and start thinking of it as a continuous practice. The old picture is a snapshot. It tells you where the line is at the moment you sign the contract. The new way of thinking is that the line moves all the time, as you add new integrations, new features, new AI tools, new employees, new permissions. You need to keep checking, not just check once.

Demand more from providers about defaults. If a cloud platform lets customers turn off basic safety features like multi-factor authentication, that’s a design choice the provider is making. The Snowflake breach has started a quiet conversation in the industry about whether providers should bear more responsibility for what their products allow customers to do, not just what the customers actually do.

Recognize that the contract and the threat model are different things. The shared responsibility picture is great at telling you what the cloud provider has agreed to do for you. It’s terrible at telling you where attackers will actually try to get in. You need both, and you should not pretend the first one is the second one.

Get used to multi-party thinking. When something goes wrong in a modern cloud environment, there are usually three to ten parties involved, not two. Your security thinking needs to match that reality. New frameworks are starting to emerge that treat data flows as first-class things to be governed, regardless of how many vendors are touching them.

The honest takeaway

The shared responsibility model was never really a security framework in disguise. It was a contract. It was the cloud provider telling you, in writing, here’s what we will do, and here’s what you’re on your own for. That’s a perfectly reasonable thing for a contract to say, and it’s why the model has lasted as long as it has.

Somewhere along the way, though, the industry started treating the picture as if it were also a complete security strategy. As if drawing a line on a slide and putting things on either side of it was the same as actually protecting your environment. It isn’t. The line tells you who pays the legal bill when something goes wrong. It doesn’t tell you how to keep things from going wrong in the first place.

The cloud has changed a lot since 2012. The picture mostly hasn’t. And in the gap between the old picture and the new reality, attackers have found a remarkably comfortable place to live.

If you’re responsible for security at your company, the most useful thing you can do is to stop treating the picture as if it answers your questions and start treating it as the starting point for harder ones. Questions like: what happens at the connection points between our SaaS tools? Who is allowed to set up new integrations? Are we monitoring what data flows through them? Do we even have a list?

If you don’t have answers to those questions, you’re not failing the shared responsibility model. You’re discovering, like a lot of other people are right now, that the model was never designed to answer them in the first place.

Who’s Actually Logging Into Your Systems? (Hint: It’s Not Your Employees)

Shreya Patil — Thu, 09 Apr 2026 23:18:08 GMT

For every human who logs in at your company, there are 50 to 90 non-human accounts doing the same thing. Almost nobody is watching them.

Here’s a question worth sitting with for a second.

Inside a typical company today, for every one human employee with login credentials, how many “non-human” things also have logins?

If your gut says one or two, you’re thinking about this the way people did a decade ago. CyberArk’s 2025 Identity Security Landscape report puts the real number at 82 machine identities for every human one, and some researchers think we’ve already crossed 100 to 1. That’s eighty-plus passwords, keys, and little digital workers for every single person who badges into the office.

And almost nobody is watching them.

This is one of those problems that creeps up on a company in slow motion. While security teams spent the last decade getting genuinely good at protecting employee accounts (two-factor authentication, phishing training, quarterly access reviews), a parallel universe of computer-to-computer logins quietly grew up in the background. Nobody rings a bell when one is created. Nobody reminds you to delete it. And now that AI agents are joining the party, this background problem is about to become the loudest thing in the room.

Let me walk you through what’s actually going on, and why the playbook we’ve leaned on for years doesn’t work anymore.

What’s a “non-human login,” exactly?

When I say “non-human identity,” I mean anything that can log into a system that isn’t a person. It sounds strange, but these have been around forever:

A service account: a fake user created so one piece of software can talk to another. Your payroll system needs to read from HR, so somebody creates payroll-to-hr-sync and grants it permissions.

An API key: a long string of random characters that proves a program is allowed to use a service. When your marketing tool emails through Mailchimp, it’s waving an API key around.

A certificate: a cryptographic file that machines use to vouch for each other.

A cloud workload: a small piece of code running in AWS or Azure that needs to grab data or call another service.

None of this is new. What’s new is how many of them exist, how long they stick around, and who (if anyone) is keeping track.

How the count got so absurd

Three things happened at once, and they all pushed in the same direction.

The cloud took over → When applications lived on servers in a room down the hall, you had a small number of them, and each had a small number of connections. Move everything to the cloud and suddenly every app needs to talk to every other app through APIs. Every API call needs a credential. Multiply by a few thousand applications and the math gets ugly fast.

Automation exploded → Companies adopted tools that automatically deploy code, scale servers, rotate backups, ship logs, run tests. Each of those automations needs its own login.

SaaS ate the world → The average company now uses somewhere between 200 and 300 SaaS tools, and most of them connect to other tools. A sales rep clicks “connect to Salesforce” inside their marketing dashboard, and in the background, a fresh set of credentials links the two platforms forever.

Stack all of that together and a company with 500 employees might easily have 40,000 or 50,000 non-human accounts floating around. Most of them, nobody remembers creating.

Then AI agents showed up, and things got weird

Here’s where the story takes an interesting turn.

Until recently, even though there were a lot of non-human logins, they were at least predictable. A service account might be risky if someone stole its password, but it wasn’t going to wake up one morning and decide to do something unexpected. It ran the same script every day. It was boring. Boring is good for security.

AI agents are not boring.

An AI agent is software that can actually think about what to do next. You give it a goal → “figure out which customers haven’t paid their invoices and send them a polite reminder” → and it decides on its own how to get there. It reads data. It calls tools. It might spin up smaller helper agents to handle pieces of the work. Then it reports back.

Think about what that does to access control. A regular service account is like a vending machine: you put in a coin, you get a specific snack, every single time. An AI agent is more like hiring a temp worker and handing them your office keys → except this temp works at the speed of a computer, never takes breaks, and can be talked into doing the wrong thing by a cleverly worded email.

People who think about this for a living have started calling AI agents composite identities, because they combine two scary things at once: the blast radius of a service account (which usually has very broad access) and the fallibility of a human (who can be fooled). It’s an unusual combination, and our traditional tools weren’t built for it.

Why the old playbook breaks

If you’ve ever worked in HR or IT, you’ve seen the standard lifecycle for managing employee access:

Someone gets hired. HR creates their account.
They change roles. Their access is updated.
They leave. Their account is deleted.

The industry calls this joiner-mover-leaver, and the entire system is built around the assumption that every identity has a clear beginning, middle, and end, and a human attached to it.

Now try to apply that to an AI agent that exists for forty seconds.

Who’s the joiner? When is the leaver event? What role does it belong to? Whose manager do you talk to in the quarterly access review? If it misbehaves at 3:47 a.m. on a Tuesday, can you even tell “why” from the logs? Can you trace it back to the human request that set the whole chain in motion?

In most companies today, the honest answer to all of those questions is: no.

Here’s where the old playbook specifically falls apart:

Credentials don’t expire: Service accounts and API keys typically stay valid for months or years. Employees rotate passwords; machine credentials almost never do. That’s a gift to attackers.

Nobody owns them: One survey found that roughly 78% of organizations don’t have formal rules for who’s responsible for creating and removing AI-related accounts. When something goes wrong, there’s literally nobody to call.

Access piles up: Machine accounts gain permissions over time, almost never lose them. A service account that needed three things in 2021 might have twenty today, because people keep granting and nobody revokes.

There’s no audit trail worth the name: Most logs record “this credential did this thing.” They don’t record “this credential did this thing because a specific human asked for it three steps ago.” When you need to trace a problem back to its origin, you usually can’t.

The tools don’t see each other: Active Directory handles Windows. A separate tool handles AWS. Another handles SaaS. A fourth handles certificates. Each tool sees a slice. Nobody sees the whole picture.
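To make those failure modes concrete, here is a minimal Python sketch that flags them for one entry in a credential inventory. The Credential record, its fields, and the 90-day staleness threshold are hypothetical; a real inventory would be assembled from your cloud, directory, and SaaS admin APIs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Credential:
    # Hypothetical inventory record for one non-human identity.
    name: str
    owner: Optional[str]          # accountable human, or None if nobody is known
    created: datetime
    expires: Optional[datetime]   # None means the credential never expires
    permissions: set = field(default_factory=set)
    used_permissions: set = field(default_factory=set)  # seen in recent logs


def findings(cred: Credential, now: datetime, max_age_days: int = 90) -> list:
    """Return human-readable problems for one credential."""
    problems = []
    if cred.owner is None:
        problems.append("no owner")
    if cred.expires is None:
        problems.append("never expires")
    if now - cred.created > timedelta(days=max_age_days):
        problems.append(f"older than {max_age_days} days")
    unused = cred.permissions - cred.used_permissions  # access that piled up
    if unused:
        problems.append(f"{len(unused)} unused permission(s): {sorted(unused)}")
    return problems


if __name__ == "__main__":
    now = datetime(2026, 4, 1, tzinfo=timezone.utc)
    cred = Credential(
        name="payroll-to-hr-sync",
        owner=None,
        created=datetime(2021, 6, 1, tzinfo=timezone.utc),
        expires=None,
        permissions={"hr:read", "hr:write", "payroll:admin"},
        used_permissions={"hr:read"},
    )
    for problem in findings(cred, now):
        print(f"{cred.name}: {problem}")
```

Running a check like this across every inventory source at once is what gives the "one place to look" view; each tool on its own only sees its slice.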

A classic problem in new clothes

There’s a wonderful old idea in computer security called the confused deputy problem, first described by Norm Hardy in a 1988 paper. The idea is simple: imagine a program that has permission to do important things → call it the deputy. The deputy is trustworthy. But the deputy can be tricked. Someone who isn’t allowed to do the important thing convinces the deputy to do it on their behalf. The deputy isn’t malicious. It’s just confused about who it’s actually working for.

AI agents are the perfect modern confused deputy.

They have real credentials to real systems. They’re trained to be helpful. They follow instructions written in plain English. If an attacker can sneak an instruction into the data the agent reads → in a document, an email, a webpage → the agent will often just follow it. To the agent, an instruction is an instruction.

This isn’t hypothetical. Over the past year, there’s been a steady drip of reports about AI agents being tricked exactly this way. One report from late 2025 described what’s believed to be the first cyber espionage campaign where an AI agent did most of the work → scanning for vulnerabilities, stealing credentials, exfiltrating data → with human attackers only stepping in at the big decision points.

Put it all together and the picture gets uncomfortable. Lots of non-human accounts. Too much access. Nobody watching. And the newest kind can actually reason about what to do next, which means it can also be reasoned with by the wrong person.

What a better approach looks like

The good news: the shape of a solution is starting to come into focus. It isn’t one tool or one magic fix. It’s a shift in mindset, plus a handful of techniques that are showing up in real products. In plain English:

Treat every non-human account as a real identity with a real owner: Every service account, API key, and AI agent should have a human listed as responsible for it. If nobody owns it, it shouldn’t exist. Obvious, and yet most companies don’t do it.

Issue short-lived credentials instead of permanent ones: Instead of a password that works forever, hand out a token that works for fifteen minutes. When it expires, the system has to ask for a new one. This dramatically shrinks the window in which a stolen credential is useful.

Grant the least access possible, and check it often: Don’t give an agent permission to read the whole customer database when it only needs one record. Security people have called this “least privilege” for decades. It’s just harder now because the numbers are so big.

Log not just what happened, but why: When an agent acts, the log should show the whole chain: which human asked for what, which agent acted, which tools were called, what the result was. If something goes wrong, you need to walk that chain backward.

Check decisions continuously, not just at login: The old model said: prove who you are once at the door, then do whatever you want for eight hours. The new model has to be: prove who you are, and prove that every individual action is allowed, every single time. The industry calls this “runtime authorization,” and it’s a real shift in how identity systems work.

Require a human thumbs-up for anything risky: Agents can do small things automatically, but sensitive actions, such as moving money, deleting records, or sending external messages, should pause and wait for a human to say yes. This is “human in the loop,” and it turns out to be one of the most effective defenses against confused-deputy attacks.

Manage humans, machines, and agents in one place: Instead of four tools each seeing a slice, use one system that sees all of it. The industry is calling this an “identity fabric” or “identity control plane.” You can just call it one place to look.
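Several of these ideas fit together in very little code. The following is a minimal Python sketch, not a production design: a hypothetical gateway that issues 15-minute HMAC-signed tokens naming an owner and the requesting human, checks every action against a policy at the moment it is attempted, holds sensitive actions until a human approves, and logs the whole chain. The policy table, action names, and token format are invented for illustration; a real deployment would use an established token standard and policy engine.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"   # hypothetical; keep real keys in a secrets manager
TOKEN_TTL_SECONDS = 15 * 60    # short-lived: 15 minutes

# Hypothetical policy: what each agent may do, and which actions need a human yes.
POLICY = {"invoice-agent": {"read_invoice", "send_reminder", "refund_payment"}}
NEEDS_APPROVAL = {"refund_payment", "delete_record"}

AUDIT_LOG = []  # in practice, an append-only store


def issue_token(agent, owner, requested_by, now):
    """Sign a token naming the agent, its accountable owner, and the requesting human."""
    claims = {"agent": agent, "owner": owner, "requested_by": requested_by,
              "exp": now + TOKEN_TTL_SECONDS}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_token(token, now):
    """Reject tampered or expired tokens; return the claims otherwise."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if now >= claims["exp"]:
        raise PermissionError("token expired")
    return claims


def authorize(token, action, now, approved_by=None):
    """Check one action when it is attempted and record who asked for it."""
    claims = verify_token(token, now)
    allowed = action in POLICY.get(claims["agent"], set())
    if allowed and action in NEEDS_APPROVAL and approved_by is None:
        allowed = False  # pause until a human approves
    AUDIT_LOG.append({"time": now, "requested_by": claims["requested_by"],
                      "agent": claims["agent"], "owner": claims["owner"],
                      "action": action, "approved_by": approved_by,
                      "allowed": allowed})
    return allowed


if __name__ == "__main__":
    now = time.time()
    token = issue_token("invoice-agent", owner="alice", requested_by="bob", now=now)
    print(authorize(token, "read_invoice", now))                         # True
    print(authorize(token, "refund_payment", now))                       # False: held
    print(authorize(token, "refund_payment", now, approved_by="carol"))  # True
    print(json.dumps(AUDIT_LOG[-1], indent=2))
```

Because the token expires after 15 minutes and every call passes through authorize(), a stolen token is only briefly useful, and each log entry can be walked back to the human who started the chain.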

None of this is rocket science. It’s mostly common sense applied at scale. The hard part isn’t inventing new ideas → it’s getting organizations to do things they’ve known they should do for years, except now with ten times the accounts and a fraction of the time per decision.

The part that actually keeps me up at night

Here’s what I find genuinely unsettling.

For as long as computers have been on networks, security has rested on a simple foundation: knowing who did what. If something bad happens, you trace it back to a person, and then you figure out whether they meant to do it, whether they were tricked, or whether their account was stolen. The whole legal and organizational system for handling security incidents assumes that behind every action, there’s eventually a human you can find.

With AI agents making decisions on their own, that foundation gets shaky.

If an agent does something wrong, who’s responsible? The person who deployed it? The person whose request indirectly triggered it three hops ago? The vendor who trained the model? The company whose poisoned data shaped the agent’s instructions? In almost every real incident I’ve read about, there’s enough ambiguity that nobody has a clean answer.

That ambiguity won’t last forever. Courts, regulators, and insurance companies will all eventually have to take a position. But right now we’re stuck in the uncomfortable gap between “this technology is everywhere” and “we’ve figured out how to govern it.”

That gap is where attackers live.

The short version

Non-human logins like service accounts, API keys, certificates, and now AI agents massively outnumber human ones inside most companies. Nobody is really keeping track of them. The tools we built for employee accounts don’t fit. AI agents make the problem worse because they can think, which also means they can be fooled. The fix involves treating every non-human account as a first-class citizen with an owner, issuing short-lived credentials, checking every action against policy as it happens, and keeping humans in the loop for anything that matters.

None of this is impossible. Very few companies are doing it.

And every month that passes, there are more non-human accounts to worry about, not fewer.

If your company hasn’t started thinking about this yet, here’s the question worth taking away: who owns the machine logins inside your organization, and what happens when one of them misbehaves?

If you don’t know the answer, you’re not alone. But you probably don’t want to wait much longer to find out.

Understanding Process Spawning in Linux

Shreya Patil — Sat, 04 Apr 2026 15:55:48 GMT

This lab was designed to demonstrate how process execution on a Linux system can be monitored, collected, and investigated in a practical detection workflow. Instead of only running commands and observing local output, the goal was to build visibility into what happens behind the scenes when programs launch other programs, how Linux records that activity through auditd, and how those records can be forwarded into Splunk for searching and analysis.

The lab began with configuring the Linux auditing framework so that important process-related events could be captured at the operating system level. Audit rules were created to monitor execution-related system calls and commonly used binaries associated with command execution, scripting, and network access. This ensured that whenever a process was started, the system would generate detailed audit records showing what ran, from where it ran, under which user context it executed, and how it was related to other processes.

To make the activity meaningful and repeatable, a controlled Python script was then used to spawn child processes in several different ways, including direct subprocess execution, shell-based execution, and fork-and-exec behavior. This provided a predictable dataset that could be traced from the point of execution in Linux all the way into Splunk. By doing this, the lab moved beyond theory and created a realistic example of how legitimate or suspicious process activity appears in endpoint telemetry.

The Splunk portion of the lab focused on transforming raw audit logs into searchable security data. Audit logs were ingested, relevant fields were extracted, and searches were written to identify process execution events, shell usage, parent-child process relationships, and mini process trees. This made it possible to move from low-level kernel records to readable investigation views that are useful in security operations, threat hunting, and incident analysis.

Overall, this lab was about building an end-to-end understanding of Linux process monitoring: capturing process activity at the source, forwarding it into a logging platform, and using that data to investigate how processes were created and related to one another. It serves as a foundational exercise in endpoint visibility and helps connect system internals with practical detection and analysis workflows.

Created an audit rule file to define which system activities auditd should monitor. The selected system calls and command-line tools were included because they are closely associated with starting programs, running commands, and launching processes, which makes them important indicators of normal administrative activity as well as potentially suspicious behavior.

Verified that the binaries referenced in the audit rules were present on the system by listing their paths. Since all of them existed, the file watch rules were valid and could be applied without errors.
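The rule file itself isn’t reproduced here, so this is a hedged sketch of what such rules commonly look like, written as a small Python helper that emits auditd rule lines and skips any watched binary that isn’t installed (the same check described above). The -a always,exit -F arch=b64 -S <syscall> -k <key> and -w <path> -p x -k <key> forms are standard auditctl syntax; the binary list and every key name other than proc_spawn_execve are assumptions.

```python
import os

# Hypothetical set of binaries whose execution is worth watching.
WATCHED_BINARIES = ["/usr/bin/bash", "/usr/bin/dash", "/usr/bin/python3",
                    "/usr/bin/curl", "/usr/bin/wget", "/usr/bin/nc"]


def build_rules(binaries, exists=os.path.exists):
    """Return (rule lines, skipped paths) for an auditd rules file."""
    rules = [
        # Every execve made by a 64-bit process (syscall 59 on x86_64).
        "-a always,exit -F arch=b64 -S execve -k proc_spawn_execve",
        # Process creation; this key name is an assumption.
        "-a always,exit -F arch=b64 -S clone -S fork -S vfork -k proc_spawn_fork",
    ]
    skipped = []
    for path in binaries:
        if exists(path):
            # -p x: record an event whenever this file is executed.
            rules.append(f"-w {path} -p x -k proc_spawn_bin")
        else:
            skipped.append(path)  # not installed on this host
    return rules, skipped


if __name__ == "__main__":
    rules, skipped = build_rules(WATCHED_BINARIES)
    print("\n".join(rules))
    if skipped:
        print("# skipped (not present):", ", ".join(skipped))
```

The printed lines could be saved as a .rules file under /etc/audit/rules.d/ (the file name is up to you) so that augenrules picks them up.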

Checked the audit rule files with sudo augenrules --check before loading them. This option compares the rules assembled from /etc/audit/rules.d/ with the current /etc/audit/audit.rules, so the message Rules have changed and should be updated confirmed that the new file had been picked up and was ready to be loaded; actual syntax problems would surface when the rules are loaded in the next step.

Loaded the new audit rules so they became active in the kernel, then restarted auditd to make sure they were applied. The rules loaded successfully; the accompanying warning that old-style watch rules are slower is informational only, and auditctl -l confirmed that process-spawn monitoring was active.

Ran a simple command and then searched the audit logs to confirm that execution events were being recorded. The results showed related records such as SYSCALL, EXECVE, CWD, and PATH, confirming that auditd was capturing process execution correctly.
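Those record types belong together: for a single execution, auditd writes the SYSCALL, EXECVE, CWD, and PATH lines with the same msg=audit(<timestamp>:<serial>) identifier. Here is a minimal Python sketch that groups raw lines by that identifier and rebuilds the command line from the EXECVE record’s a0, a1, and later arguments. The sample lines are illustrative, not captured from the lab. auditd writes arguments containing spaces or special characters as unquoted hex, which this sketch decodes, but it does not handle the split format used for very long arguments.

```python
import re
from collections import defaultdict

EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+:\d+)\)")
RECORD_TYPE = re.compile(r"^type=(\w+)")
# a0="echo" (quoted) or a1=68656C6C6F (unquoted hex for unsafe characters)
ARG = re.compile(r'\ba(\d+)=(?:"([^"]*)"|([0-9A-F]+))')


def group_events(lines):
    """Map each event id (timestamp:serial) to {record type: [raw lines]}."""
    events = defaultdict(lambda: defaultdict(list))
    for line in lines:
        eid = EVENT_ID.search(line)
        rtype = RECORD_TYPE.search(line)
        if eid and rtype:
            events[eid.group(1)][rtype.group(1)].append(line)
    return {eid: dict(records) for eid, records in events.items()}


def command_line(execve_line):
    """Rebuild the command line from an EXECVE record's numbered arguments."""
    args = {}
    for m in ARG.finditer(execve_line):
        quoted, hexed = m.group(2), m.group(3)
        if quoted is not None:
            args[int(m.group(1))] = quoted
        else:
            args[int(m.group(1))] = bytes.fromhex(hexed).decode(errors="replace")
    return " ".join(args[i] for i in sorted(args))


# Illustrative records for one execution (not real lab output).
SAMPLE = [
    'type=SYSCALL msg=audit(1712245000.123:456): arch=c000003e syscall=59 success=yes '
    'exit=0 ppid=1200 pid=1234 uid=0 comm="echo" exe="/usr/bin/echo" key="proc_spawn_execve"',
    'type=EXECVE msg=audit(1712245000.123:456): argc=2 a0="echo" a1=68656C6C6F20776F726C64',
    'type=CWD msg=audit(1712245000.123:456): cwd="/root"',
    'type=PATH msg=audit(1712245000.123:456): item=0 name="/usr/bin/echo"',
]

if __name__ == "__main__":
    for eid, records in group_events(SAMPLE).items():
        print(eid, sorted(records), command_line(records["EXECVE"][0]))
```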

Changed the audit log format to ENRICHED (the log_format setting in /etc/audit/auditd.conf), which makes auditd resolve values such as UIDs, GIDs, syscall numbers, and architecture into readable names before writing each event. After restarting auditd, the configuration check confirmed that the new format was applied successfully.

Downloaded the Splunk installer into /tmp and verified that the file had downloaded completely. Using /tmp was appropriate because the installer was only needed temporarily for setup. Once installation finished, Splunk Web was reachable at the URL printed at the end of the setup output.

Created a dedicated index called linux_audit so audit events would be stored separately from other logs, which made the data easier to organize and search.

Updated inputs.conf so Splunk would monitor /var/log/audit/audit.log and /var/log/syslog, connecting Splunk to the log sources written by the system and by auditd.
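As a minimal sketch of those two monitor stanzas, here they are generated with Python’s configparser so the structure is explicit. The [monitor://<path>] stanza form and the index, sourcetype, and disabled settings are standard inputs.conf keys. The sourcetype names are assumptions, and leaving syslog without an index (so it lands in Splunk’s default index) is a guess, since the post doesn’t show the file.

```python
import configparser
import io


def build_inputs_conf() -> str:
    """Render inputs.conf stanzas for the audit log and syslog."""
    conf = configparser.ConfigParser(interpolation=None)
    conf.optionxform = str  # keep key names exactly as written
    conf["monitor:///var/log/audit/audit.log"] = {
        "index": "linux_audit",
        "sourcetype": "linux_audit",  # assumed sourcetype name
        "disabled": "false",
    }
    conf["monitor:///var/log/syslog"] = {
        "sourcetype": "syslog",  # no index set, so the default index is used
        "disabled": "false",
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Typical destination: $SPLUNK_HOME/etc/system/local/inputs.conf
    print(build_inputs_conf())
```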

Configured props.conf so Splunk could extract meaningful fields such as comm, exe, pid, ppid, uid, and audit_key. This turned raw audit text into structured data that was much easier to search and interpret.
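To show what those extractions do, here is a Python sketch using regular expressions similar in spirit to Splunk EXTRACT- rules. These patterns are illustrative, not the ones from the lab. The key pattern captures only the quoted value, the cleaner form the lab eventually settled on. None of them handle the unquoted hex encoding auditd uses when a value contains spaces or special characters.

```python
import re

# One pattern per field. \b stops "pid" from matching inside "ppid"
# and "uid" from matching inside "auid" or "euid".
FIELD_PATTERNS = {
    "comm": re.compile(r'\bcomm="([^"]*)"'),
    "exe": re.compile(r'\bexe="([^"]*)"'),
    "pid": re.compile(r"\bpid=(\d+)"),
    "ppid": re.compile(r"\bppid=(\d+)"),
    "uid": re.compile(r"\buid=(\d+)"),
    "audit_key": re.compile(r'\bkey="([^"]*)"'),
}


def extract_fields(line: str) -> dict:
    """Return whichever of the fields are present in one SYSCALL record."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(line)
        if match:
            fields[name] = match.group(1)
    return fields


# Illustrative SYSCALL record (not real lab output).
SAMPLE_LINE = (
    'type=SYSCALL msg=audit(1712245000.123:456): arch=c000003e syscall=59 success=yes '
    'exit=0 items=2 ppid=1200 pid=1234 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 '
    'tty=pts0 ses=3 comm="sh" exe="/usr/bin/dash" key="proc_spawn_execve"'
)

if __name__ == "__main__":
    print(extract_fields(SAMPLE_LINE))
```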

Restarted Splunk so the new monitoring and parsing settings would take effect. The restart completed cleanly, and splunk status confirmed that it was running normally.

Triggered a fresh process execution event and then searched Splunk for incoming audit data. Events appearing in the linux_audit index confirmed that auditd was writing logs, Splunk was reading them, and the ingestion pipeline was working end to end.

Initially, the audit_key extraction was too broad and produced messy values. We refined the extraction rule so Splunk captured only the quoted key value, resulting in clean entries such as proc_spawn_execve.

Created a dedicated folder for the demo script and a separate log file for its own output. This kept the script’s application logs apart from the kernel audit logs, which made the demo easier to analyze.

Created a controlled Python script that launched child processes in several different ways: subprocess.run(), subprocess.Popen(), os.system(), and os.fork() followed by os.execvp(). This produced predictable process-spawn activity to observe in both auditd and Splunk.
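The script isn’t shown, so here is a hedged reconstruction of the idea rather than the lab’s actual code: one function per spawning method, each running a harmless echo tagged with a marker string, with the parent PID logged so the children can be found later. The marker SPAWN_DEMO and the log path /tmp/spawn_demo/demo.log are hypothetical, and fork/exec requires Linux or another POSIX system.

```python
import logging
import os
import subprocess

MARKER = "SPAWN_DEMO"                   # hypothetical marker to search for later
LOG_PATH = "/tmp/spawn_demo/demo.log"   # hypothetical log file for the script itself


def spawn_with_run() -> int:
    # Direct exec of echo with no shell; run() waits for the child to finish.
    result = subprocess.run(["echo", f"{MARKER} run"], capture_output=True, text=True)
    return result.returncode


def spawn_with_popen() -> int:
    # Same direct exec, but Popen lets the parent keep working while the child runs.
    proc = subprocess.Popen(["echo", f"{MARKER} popen"], stdout=subprocess.PIPE, text=True)
    proc.communicate()
    return proc.returncode


def spawn_with_system() -> int:
    # os.system() runs the command through /bin/sh -c, so a shell is the child.
    status = os.system(f"echo {MARKER} system > /dev/null")
    return os.waitstatus_to_exitcode(status)


def spawn_with_fork_exec() -> int:
    # Classic fork-and-exec: copy the process, then replace the child's image.
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp("echo", ["echo", f"{MARKER} fork-exec"])
        finally:
            os._exit(127)  # only reached if execvp fails
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def main() -> None:
    os.makedirs(os.path.dirname(LOG_PATH), exist_ok=True)
    logging.basicConfig(filename=LOG_PATH, level=logging.INFO,
                        format="%(asctime)s %(message)s")
    logging.info("%s parent pid=%d", MARKER, os.getpid())
    for spawn in (spawn_with_run, spawn_with_popen, spawn_with_system, spawn_with_fork_exec):
        logging.info("%s %s exit=%d", MARKER, spawn.__name__, spawn())


if __name__ == "__main__":
    main()
```

In the audit trail, the run() and Popen() children appear as direct execve calls by the Python process, while os.system() shows up as an sh child, which is the distinction the shell-filtering search later relies on.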

Executed the script and reviewed its log file to confirm that each spawning method ran successfully. This produced a clean set of known events to investigate later.

First searched for the demo marker string directly, but that mainly returned the search command itself. A better approach was to inspect recent execve and clone events with ausearch --interpret, which showed the correct data, although it also included normal background system activity.

Searched Splunk for syscall=59, which is execve on x86_64 systems. This returned a clean, useful table of executed processes and was the first strong search for practical process-execution analysis.

Filtered for shell processes such as sh or dash to find command execution that passed through a shell. This clearly revealed the behavior of os.system(), which runs its command through /bin/sh. That matters because many suspicious command executions use a shell in the same way.
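The same two filters are easy to express outside Splunk. This Python sketch assumes events have already been reduced to dictionaries with the extracted fields (syscall, comm, exe, pid, ppid); the sample rows are invented, and the set of shell names is an assumption covering common Linux shells. On x86_64, syscall 59 is execve and 56 is clone.

```python
EXECVE_X86_64 = "59"
SHELLS = {"sh", "dash", "bash", "zsh"}  # assumed set of shell process names


def execve_events(events):
    """Keep only execve syscalls (number 59 on x86_64)."""
    return [e for e in events if e.get("syscall") == EXECVE_X86_64]


def shell_events(events):
    """Keep execve events whose command name is a shell."""
    return [e for e in execve_events(events) if e.get("comm") in SHELLS]


# Invented rows: the demo script, an os.system() shell, and a clone (syscall 56).
SAMPLE_EVENTS = [
    {"syscall": "59", "comm": "python3", "exe": "/usr/bin/python3.12", "pid": "2000", "ppid": "1500"},
    {"syscall": "59", "comm": "sh", "exe": "/usr/bin/dash", "pid": "2003", "ppid": "2000"},
    {"syscall": "56", "comm": "python3", "exe": "/usr/bin/python3.12", "pid": "2004", "ppid": "2000"},
]

if __name__ == "__main__":
    for e in shell_events(SAMPLE_EVENTS):
        print(f"{e['ppid']:>6} -> {e['pid']:>6}  {e['comm']:<8} {e['exe']}")
```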

Used the Python demo script’s PID as the parent PID (ppid) to search for all the child processes it launched. This gave a focused view of the script’s activity and showed direct process execution, shell-based execution, and related Python-generated events.

Built a mini process tree by using Splunk to format the parent and child PIDs into a readable relationship view. The result was a simple process tree that clearly mapped the Python script to the child processes it created, making the behavior easy to explain and investigate.
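The same parent/child view can be sketched in Python: index events by ppid, then walk down from the script’s PID and indent each generation. The rows below are invented for illustration (the PIDs are not from the lab). Over long time windows, PIDs get reused, so a real implementation would also need to key on time; the seen set here only prevents infinite loops.

```python
from collections import defaultdict


def build_tree(events):
    """Map each parent PID to the (pid, comm) pairs it spawned, in log order."""
    children = defaultdict(list)
    for e in events:
        children[e["ppid"]].append((e["pid"], e["comm"]))
    return children


def render(children, pid, name, depth=0, seen=None):
    """Return indented lines for pid and all of its descendants."""
    seen = set() if seen is None else seen
    if pid in seen:  # guard against loops caused by PID reuse
        return []
    seen.add(pid)
    lines = ["  " * depth + f"{name} ({pid})"]
    for child_pid, child_name in children.get(pid, []):
        lines.extend(render(children, child_pid, child_name, depth + 1, seen))
    return lines


# Invented rows: a Python script (PID 2000) and the children it spawned.
EVENTS = [
    {"pid": "2001", "ppid": "2000", "comm": "echo"},
    {"pid": "2002", "ppid": "2000", "comm": "echo"},
    {"pid": "2003", "ppid": "2000", "comm": "sh"},
    {"pid": "2006", "ppid": "2003", "comm": "ls"},
    {"pid": "2005", "ppid": "2000", "comm": "echo"},
]

if __name__ == "__main__":
    print("\n".join(render(build_tree(EVENTS), "2000", "python3")))
```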

Detecting Reverse Tunnels in AWS

Shreya Patil — Wed, 01 Apr 2026 17:47:53 GMT

Ever wondered whether there is an interesting way to use AWS native services to query large volumes of VPC Flow Logs for suspicious activity using SQL? Amazon Athena is a simple and effective answer. In this setup, I demonstrate how Athena can be used to query VPC Flow Logs stored in S3 and turn raw network metadata into meaningful threat detection insights.

The idea is straightforward: VPC Flow Logs capture network activity, S3 stores the logs, and Athena lets you run SQL queries directly on that data without building a separate analysis platform. By combining these AWS native services, it becomes possible to identify suspicious ports, beaconing behavior, outbound SSH activity, DNS tunneling indicators, and unusual data transfers in a practical and scalable way.

In the following setup, I show how this detection pipeline works end to end and how Athena can be leveraged as a lightweight threat hunting solution over VPC Flow Logs using SQL.

Network Architecture

Shows the layered infrastructure: the EC2 instance nested inside a public subnet within a VPC, connecting outbound through an internet gateway to the attacker’s C2 server, while VPC Flow Logs silently captures all traffic metadata below.

Data Flow Diagram

Traces the data journey from capture to verdict: VPC Flow Logs records every connection, aggregates the records over 60-second windows, and delivers them to S3 as log files every few minutes; Athena queries them as SQL, and six detection queries score each connection for reverse tunnel indicators.

Building Network Infrastructure

Building the VPC

A VPC is a logically isolated private network inside AWS. I created one with the address range 10.0.0.0/16, which provides 65,536 internal IP addresses.
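Here is the arithmetic behind that number, checked with Python’s standard ipaddress module. The /24 subnet is an assumption: the post doesn’t state the subnet range, though the instance address 10.0.1.144 that appears in the results would fit inside 10.0.1.0/24. AWS reserves the first four addresses and the last address of every subnet.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnet = ipaddress.ip_network("10.0.1.0/24")  # assumed subnet CIDR

# Network address, VPC router, DNS, reserved for future use, and the last address.
AWS_RESERVED_PER_SUBNET = 5

print(vpc.num_addresses)                               # 65536
print(subnet.subnet_of(vpc))                           # True
print(subnet.num_addresses - AWS_RESERVED_PER_SUBNET)  # 251 usable in the subnet
print(ipaddress.ip_address("10.0.1.144") in subnet)    # True
```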

Creating a Public Subnet

Create and Attach Internet Gateway

The internet gateway connects the VPC to the outside world. Without it, nothing inside the network can send or receive internet traffic.

Creating Route Table

The route table tells traffic where to go. Added a rule saying “anything destined for the internet goes through the gateway.”

Creating Security Group

A security group is a virtual firewall at the instance level. Allowed SSH inbound and all outbound → that permissive outbound rule is exactly what makes reverse tunnels possible.

EC2 Instance Launch

Creating a Key Pair

A key pair replaces password-based authentication for SSH access. AWS generates it; you download the private key and keep it secure.

Launching the Instance

Deployed a t2.micro instance running Amazon Linux inside the public subnet. This instance simulates a server that an attacker has already compromised.

Set Up VPC Flow Logs

Creating the S3 Bucket

S3 is cloud object storage where flow log files will be delivered. Every few minutes, AWS drops compressed log files containing records of every network connection.

Create the required Bucket Policy

The log delivery service needs explicit permission to write to your bucket. The bucket policy grants exactly that access and nothing more.
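For reference, the policy AWS documents for flow-log delivery to S3 has roughly this shape. The delivery.logs.amazonaws.com service principal may put objects under the AWSLogs/ prefix and read the bucket ACL, scoped to your account. The Python sketch below builds it as a dictionary. The bucket name, account ID, and region are placeholders, and the current AWS documentation should be treated as the source of truth for the exact conditions.

```python
import json


def flow_log_bucket_policy(bucket: str, account_id: str, region: str) -> dict:
    """Bucket policy letting VPC Flow Logs deliver into s3://<bucket>/AWSLogs/<account>/."""
    source_arn = {"aws:SourceArn": f"arn:aws:logs:{region}:{account_id}:*"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSLogDeliveryWrite",
                "Effect": "Allow",
                "Principal": {"Service": "delivery.logs.amazonaws.com"},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/AWSLogs/{account_id}/*",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id,
                                     "s3:x-amz-acl": "bucket-owner-full-control"},
                    "ArnLike": dict(source_arn),
                },
            },
            {
                "Sid": "AWSLogDeliveryAclCheck",
                "Effect": "Allow",
                "Principal": {"Service": "delivery.logs.amazonaws.com"},
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": dict(source_arn),
                },
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; paste the output into the bucket policy editor.
    policy = flow_log_bucket_policy("my-flow-logs-bucket", "111122223333", "us-east-1")
    print(json.dumps(policy, indent=2))
```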

Enabling VPC Flow Logs

This turns on network monitoring across the entire VPC. Each connection’s metadata (IPs, ports, bytes, timestamps, and the accept/reject action) is recorded, with records aggregated over a maximum interval of 60 seconds.

Simulating Reverse Tunnel Traffic

SSH into the Instance Using EC2 Instance Connect (the key pair step can be skipped)

To access the EC2 instance via browser, we used EC2 Instance Connect. This required adding an inbound SSH rule for 18.206.107.24/29 → the AWS-owned IP range that Instance Connect uses to proxy the SSH connection in us-east-1, since browser-based SSH comes from AWS infrastructure, not your laptop.
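Rather than hard-coding that /29, the range can be looked up in AWS’s published ip-ranges.json (https://ip-ranges.amazonaws.com/ip-ranges.json). That file lists prefixes with service and region fields, and EC2_INSTANCE_CONNECT is one of the service names. A minimal sketch, assuming the JSON has already been downloaded and loaded; the sample holds only the us-east-1 prefix quoted above plus one made-up row.

```python
import ipaddress


def instance_connect_ranges(ip_ranges: dict, region: str) -> list:
    """Return EC2 Instance Connect IPv4 prefixes for one region from ip-ranges.json data."""
    return [
        ipaddress.ip_network(p["ip_prefix"])
        for p in ip_ranges.get("prefixes", [])
        if p.get("service") == "EC2_INSTANCE_CONNECT" and p.get("region") == region
    ]


SAMPLE = {"prefixes": [
    {"ip_prefix": "18.206.107.24/29", "region": "us-east-1", "service": "EC2_INSTANCE_CONNECT"},
    {"ip_prefix": "192.0.2.0/24", "region": "us-west-2", "service": "AMAZON"},  # made-up row
]}

if __name__ == "__main__":
    for net in instance_connect_ranges(SAMPLE, "us-east-1"):
        print(f"allow SSH (tcp/22) from {net}")
```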

Generating Reverse Tunnel Traffic

Technique A: Port 4444 Beaconing

Repeated outbound connections on port 4444, the classic reverse shell port. The regular interval pattern simulates a compromised machine checking in with its controller.

Technique B: HTTPS-Disguised C2

The same beaconing behavior, but on port 443 to blend in with normal web traffic. Attackers use this to avoid port-based detection rules.

Technique C: SSH Reverse Tunnel

Outbound SSH connections to an external IP. Legitimate servers rarely initiate SSH to unknown destinations, which makes this a strong anomaly signal.

Technique D: DNS Tunneling

High-volume UDP packets on port 53 simulating data exfiltration through DNS queries. The unusual volume to a non-standard destination is the indicator.

The generated logs were delivered to the S3 bucket created for the flow logs (expect to wait roughly 15–20 minutes).

Setting Up Athena to Query Flow Logs

Create Athena Result Bucket

Athena needs a separate S3 bucket to store query output results. Every query that is run generates a result file that gets saved here.

Set Athena Output Location

Before running any query, Athena needs to know where to store results. Pointed it to the results bucket that I just created.

Creating a Database

A database in Athena is a logical container that groups related tables together. Created one called vpc_flow_logs_db to hold the flow logs table.

Create Flow Log Table

This tells Athena how to interpret the raw log files in S3 → mapping each space-separated field to a named column like srcaddr, dstaddr, dstport, and bytes so we can query them with standard SQL.
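The same field mapping can be expressed as a small Python parser for the default (version 2) flow log format, which has 14 space-separated fields. This sketch is for understanding or spot-checking delivered files locally, not a replacement for the Athena table. The sample record is invented: the external address is from a documentation range, and 10.0.1.144 is the host from the results. A '-' in a field means no data, and delivered files start with a header line that the parser skips.

```python
import gzip
from typing import Iterator, Optional

FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_record(line: str) -> Optional[dict]:
    """Turn one default-format line into a dict, or None for header and malformed lines."""
    parts = line.split()
    if len(parts) != len(FIELDS) or parts[0] == "version":
        return None
    record = {}
    for name, value in zip(FIELDS, parts):
        if value == "-":
            record[name] = None  # NODATA / SKIPDATA records use '-'
        elif name in INT_FIELDS:
            record[name] = int(value)
        else:
            record[name] = value
    return record


def read_flow_log_file(path: str) -> Iterator[dict]:
    """Yield parsed records from one delivered .log.gz file."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            record = parse_record(line)
            if record is not None:
                yield record


if __name__ == "__main__":
    sample = ("2 123456789012 eni-0abc123 10.0.1.144 198.51.100.7 "
              "49152 4444 6 5 320 1711990000 1711990060 ACCEPT OK")
    print(parse_record(sample))
```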

Detect The Reverse Tunnel using Athena Queries

Run the Athena Queries

These queries look for suspicious outbound activity from internal 10.0.x.x systems.

Q1 → checks for connections to unusual ports like 4444, 5555, and 8080, which may indicate backdoors or unauthorized services.

Q2 → finds long outbound sessions to external IPs that last more than 60 seconds.

Q3 → looks for repeated connections to the same external IP and port, which may suggest beaconing.

Q4 → checks for frequent UDP traffic on port 53, which may indicate possible DNS tunneling.

Q5 → identifies outbound SSH connections from internal hosts to external systems.

Q6 → looks for outbound connections with higher total byte transfer, which may suggest possible data exfiltration.
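The SQL isn’t reproduced here, but the logic of three of these checks (Q1, Q3, and Q5) can be sketched in Python over records carrying the srcaddr, dstaddr, dstport, and protocol fields from the table definition. The port list comes from Q1. The beacon threshold of 5 repeated connections is an assumption, not the value used in the queries, and a real beaconing check would also look at timing regularity, not just counts. "Outbound" here means a source inside the 10.0.0.0/16 VPC and a destination outside it.

```python
import ipaddress
from collections import Counter

VPC = ipaddress.ip_network("10.0.0.0/16")
SUSPICIOUS_PORTS = {4444, 5555, 8080}  # from Q1
BEACON_THRESHOLD = 5                   # assumed: repeats to one ip:port
TCP = 6


def outbound(records):
    """Records from inside the VPC to anything outside it."""
    return [r for r in records
            if r.get("srcaddr") and r.get("dstaddr")
            and ipaddress.ip_address(r["srcaddr"]) in VPC
            and ipaddress.ip_address(r["dstaddr"]) not in VPC]


def suspicious_ports(records):  # Q1
    return [r for r in outbound(records) if r["dstport"] in SUSPICIOUS_PORTS]


def beacons(records):  # Q3
    counts = Counter((r["srcaddr"], r["dstaddr"], r["dstport"]) for r in outbound(records))
    return {key: n for key, n in counts.items() if n >= BEACON_THRESHOLD}


def outbound_ssh(records):  # Q5
    return [r for r in outbound(records) if r["protocol"] == TCP and r["dstport"] == 22]


def rec(dst, port, proto=TCP):
    """Build an invented record from the compromised host."""
    return {"srcaddr": "10.0.1.144", "dstaddr": dst, "dstport": port, "protocol": proto}


SAMPLE = [rec("198.51.100.7", 4444) for _ in range(6)] + [rec("203.0.113.9", 22), rec("10.0.1.20", 22)]

if __name__ == "__main__":
    print("Q1:", len(suspicious_ports(SAMPLE)), "hits")
    print("Q3:", beacons(SAMPLE))
    print("Q5:", [r["dstaddr"] for r in outbound_ssh(SAMPLE)])
```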

Detection Query Result

This Athena output confirms that the detection queries worked and successfully identified suspicious outbound communication patterns from the internal host 10.0.1.144.
The results show alerts for suspicious ports and beaconing behavior, meaning the hunting logic was able to detect potentially malicious traffic in the VPC flow logs.

What’s next?

Next, I will show how a detection platform can be built more effectively by leveraging AWS Glue to collect, normalize, and organize telemetry from services such as GuardDuty and CloudTrail. By bringing these data sources together, the platform can move beyond single-source analysis and enable broader visibility, stronger correlation, and more meaningful detection outcomes across the AWS environment.

Simulating a TCP SYN Flood Attack

Shreya Patil — Thu, 26 Mar 2026 22:28:42 GMT

A hands-on lab using Scapy, tcpdump, and Snort on a single VM

Introduction

Network attacks come in many forms, but few are as straightforward to understand and as damaging in effect as the TCP SYN flood. It exploits one of the most fundamental mechanisms of how two machines establish a connection over the internet. In this lab, I set up a controlled environment on a single Kali Linux virtual machine to simulate this attack from scratch, capture the traffic it generates, and write a custom detection rule to catch it using Snort.

Rather than reaching for a pre-built attack tool, I wrote the flood script from scratch using Scapy — a Python library that lets you construct raw network packets layer by layer. This approach makes every part of the attack visible, which is the whole point of a learning lab.

How a TCP SYN Flood Works

The normal TCP three-way handshake

Before understanding the attack, you need to understand what a normal TCP connection looks like. When two machines communicate over TCP, they go through a three-step process:

Step 1 → SYN: The client sends a SYN (synchronize) packet to the server, asking to open a connection.
Step 2 → SYN-ACK: The server responds with a SYN-ACK (synchronize-acknowledge), confirms the request, and allocates memory for the pending connection.
Step 3 → ACK: The client sends back an ACK (acknowledge). The handshake is complete and the connection is established.

The critical detail is step two. The moment the server sends the SYN-ACK, it allocates memory and adds the connection to a backlog queue while waiting for the final ACK. This waiting state is called a half-open connection.

How the attack exploits this

A SYN flood abuses this waiting behavior. The attacker sends a massive volume of SYN packets in rapid succession, each one carrying a spoofed (fake) source IP address. The server responds to each one with a SYN-ACK, but since the source IPs are fake, no ACK ever comes back. Half-open connections pile up in the backlog queue. When the queue fills completely, the server cannot accept new connections from legitimate clients. The service becomes unreachable.

The IP spoofing is what makes this attack particularly difficult to defend against using simple countermeasures. Every packet appears to come from a different machine, so blocking by source IP is completely useless. This is also what makes writing the correct Snort detection rule interesting — more on that later.

Lab Setup

Environment

Everything runs on a single Kali Linux 2026 virtual machine inside VMware. There is no second machine and no external target. Attack traffic travels entirely over the loopback interface, meaning the same machine acts as both attacker and victim. The loopback interface (127.0.0.1) is a virtual network interface present on every operating system that routes traffic straight back to itself.

Tools used

Scapy — a Python-based packet manipulation library. Used to craft and send raw TCP/IP packets with spoofed source IPs.
tcpdump — a command-line packet capture tool. Listens on the loopback interface and writes everything to a .pcap file.
Snort 3 (Snort++ 3.12.1.0) — an open source intrusion detection system. We write a custom rule and run it against the captured traffic to generate alerts.

Step-by-Step Walkthrough

Step 1: Starting the victim service

A SYN flood needs a target — something actively listening for connections. We started a lightweight web server on port 80 using Python’s built-in HTTP server module. Open Terminal 1 and run:

sudo python3 -m http.server 80

This spins up a web server listening on all interfaces on port 80. The moment it starts, the operating system allocates a TCP connection backlog queue for it — the exact resource our flood will exhaust. Leave this terminal running for the entire lab.

Step 2: Verifying the service is listening

Open Terminal 2 and confirm the server is actually up before proceeding:

ss -tlnp | grep :80

The ss tool lists active socket connections. The flags used:

-t — show TCP sockets only
-l — show only listening sockets, not connected ones
-n — skip hostname resolution, show raw IP addresses
-p — show the process name owning each socket

Piping through grep :80 filters the output down to port 80 only. The LISTEN line with python3 confirms the target is ready.
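The same check can be made without ss by reading /proc/net/tcp and /proc/net/tcp6. In these tables each socket’s local address and port are hex-encoded, and the st column holds the TCP state (0A is LISTEN, 03 is SYN_RECV). The Python sketch below counts port-80 sockets by state. Before the attack it should show LISTEN, and SYN_RECV is the state in which half-open connections would pile up. Both tables are read because, depending on the Python version and system setup, http.server may listen on an IPv6 dual-stack socket that only appears in tcp6. The sample rows are invented.

```python
from collections import Counter

TCP_STATES = {"01": "ESTABLISHED", "03": "SYN_RECV", "06": "TIME_WAIT", "0A": "LISTEN"}  # subset


def states_for_port(table_text: str, port: int) -> Counter:
    """Count sockets in one /proc/net/tcp[6] table whose local port matches."""
    counts = Counter()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        parts = line.split()
        if len(parts) < 4:
            continue
        local_port = int(parts[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            counts[TCP_STATES.get(parts[3], parts[3])] += 1
    return counts


def port_summary(port: int = 80) -> Counter:
    """Combine IPv4 and IPv6 socket tables for one port."""
    total = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as fh:
                total += states_for_port(fh.read(), port)
        except FileNotFoundError:
            pass
    return total


# Invented rows: a listener on :80, one half-open connection, a listener on :8080.
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 11111 1 0000000000000000 100 0 0 10 0
   1: 0100007F:0050 0A000001:C350 03 00000000:00000000 02:00000100 00000001     0        0 0 2 0000000000000000
   2: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 22222 1 0000000000000000 100 0 0 10 0
"""

if __name__ == "__main__":
    print(dict(port_summary(80)))
```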

Step 3: Writing the Scapy attack script

Open Terminal 3 and create the attack script:

nano ~/syn_flood.py

Paste in the following code:

What each key part does:

src_ip → a completely random IP address generated fresh for every single packet. This is IP spoofing. Each packet looks like it came from a different machine on the internet.
IP(src=src_ip, dst=target_ip) → builds the IP layer with our fake source and loopback destination.
TCP(sport=src_port, dport=target_port, flags="S") → builds the TCP layer. flags="S" sets only the SYN bit. No ACK, no FIN, just SYN. This is the fingerprint of a flood packet.
ip_layer / tcp_layer → the / operator in Scapy stacks protocol layers together. IP wraps TCP, exactly how real packets are structured.
send(packet, verbose=0) → fires the packet. verbose=0 suppresses Scapy’s own output so our counter stays readable.

Save the file with Ctrl+X, then Y, then Enter.

Step 4: Starting tcpdump before the attack

This is important — tcpdump must be started before the attack so it captures packets from the very first SYN. Open Terminal 2 and run:

sudo tcpdump -i lo -w /tmp/syn_flood.pcap tcp port 80

Flag breakdown:

-i lo — listen on the loopback interface. All our attack traffic travels here since the target is 127.0.0.1.
-w /tmp/syn_flood.pcap — write captured packets to a file instead of printing them. The .pcap format is the standard for packet captures and is compatible with Wireshark, Snort, and most analysis tools.
tcp port 80 — capture filter. Only record TCP traffic on port 80 and ignore everything else on the system.

After printing the startup line, tcpdump goes silent. That is correct behavior — it is recording everything quietly in the background.

Step 5: Launching the attack

Open Terminal 3 and run the script with sudo. Root privileges are required because crafting raw packets at the network layer is a privileged operation in Linux:

sudo python3 ~/syn_flood.py

Each line the counter prints marks another 100 SYN packets sent, each carrying a different randomly generated source IP. Let it run for about 30 seconds, then press Ctrl+C to stop.

The error when stopping — and why it is harmless

KeyboardInterrupt

This traceback is not a real error. When Ctrl+C was pressed, Python was in the middle of a send() call, and the interrupt arrived before Scapy could finish and close its socket cleanly. Every packet sent before that point (the counter had passed 400) had already gone out. Python is simply reporting that it was cut off mid-operation, which is normal for any script stopped with Ctrl+C.

Step 6: Stopping tcpdump and examining the capture

Go back to Terminal 2 and press Ctrl+C to stop the capture:

What these numbers mean:

478 packets captured — the actual number of packets written to the pcap file on disk.
956 packets received by filter — exactly double (478 x 2). On the loopback interface, every packet is seen twice by the kernel — once going out and once coming back in. tcpdump counts both passes. This is normal loopback behavior, not an error.
0 packets dropped — nothing was missed. The capture is complete and clean.

The capture file is saved at /tmp/syn_flood.pcap and ready for analysis.
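To double-check the capture without extra tools, here is a standard-library Python sketch that walks the classic pcap format and counts pure SYN packets (SYN set; ACK, FIN, RST, PSH, and URG clear) per destination. It assumes the classic microsecond pcap format rather than pcapng, and an Ethernet link type, which is what Linux reports for the loopback interface; other link types are rejected rather than guessed at.

```python
import os
import socket
import struct
from collections import Counter

LINKTYPE_ETHERNET = 1


def count_pure_syns(data: bytes) -> Counter:
    """Count TCP packets with only SYN set, keyed by (dst_ip, dst_port)."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond pcap file")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    counts = Counter()
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue  # too short or not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        if ip[9] != 6 or len(ip) < ihl + 14:
            continue  # not TCP, or TCP header truncated
        tcp = ip[ihl:]
        if (tcp[13] & 0x3F) == 0x02:  # SYN only; ECE/CWR bits ignored
            dport = struct.unpack("!H", tcp[2:4])[0]
            counts[(socket.inet_ntoa(ip[16:20]), dport)] += 1
    return counts


if __name__ == "__main__":
    path = "/tmp/syn_flood.pcap"  # capture file from Step 4
    if os.path.exists(path):
        with open(path, "rb") as fh:
            for (dst, port), n in count_pure_syns(fh.read()).most_common():
                print(f"{dst}:{port}  {n} pure SYN packets")
```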

Step 7: Writing the Snort rule

Before running Snort we need to give it a rule. Snort is just a detection engine — it has no idea what an attack looks like unless you describe the traffic pattern for it. We opened the Snort configuration file directly:

sudo nano /etc/snort/snort.conf

NOTE : Important: In Snort 3, rules live inside the ips block in snort.conf. The old Snort 2 approach of writing rules in local.rules and including them with an include directive still works but requires explicit setup. For this lab we keep everything in snort.conf directly.

Here is the key design decision in the rule. Since our Scapy script randomizes the source IP on every single packet, tracking by source IP is completely useless — each packet looks like it came from a brand new machine. The correct approach is to track by destination. Every single flood packet, regardless of its fake source, lands on the same victim: 127.0.0.1 port 80. Counting arrivals at the destination is what catches the flood.
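Assembled from the breakdown below, the complete rule looks roughly like this. The option order and the `ips` block wrapper (Snort 3's `rules` setting) are our reconstruction, not a copy of the lab's file, so treat this as a sketch:

```lua
ips =
{
    -- keep any settings already present in this block (for example, variables)
    rules = [[
        alert tcp any any -> 127.0.0.1 80 ( msg:"SYN Flood Detected on port 80"; flags:S; flow:stateless; detection_filter:track by_dst, count 100, seconds 2; sid:1000004; rev:1; )
    ]]
}
```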

Breaking down every part of the rule:

alert — the action Snort takes when a packet matches. It logs an alert and continues processing.
tcp any any — match TCP traffic from any source IP and any source port.
-> 127.0.0.1 80 — only match traffic flowing toward our loopback address on port 80. The -> means one direction only.
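flags:S — match only packets where the SYN bit is set and no other TCP flags are active. Flood traffic consists of these pure SYN packets, but every legitimate handshake also begins with one, so the volume check in detection_filter is what actually separates the two.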
flow:stateless — tells Snort not to attempt tracking connection state. This is essential here because the source IPs are all spoofed — there is no real connection to track.
detection_filter:track by_dst, count 100, seconds 2 — the flood detector. The rule stays silent until more than 100 matching packets have reached the same destination within 2 seconds, and then alerts on each further match in that window. A legitimate browser sends 1 or 2 SYNs to open a page. A flood sends hundreds. This threshold is what separates normal traffic from attack traffic.
sid:1000004 — a unique rule ID. By convention, custom and local rules use SIDs of 1000000 and above so they never clash with the official Snort and community rulesets.
rev:1 — revision number of this rule. Increment this whenever the rule is modified.
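To see why the tracking choice matters, the sketch below replays a spoofed flood through a simplified model of detection_filter. It is an approximation, not Snort's implementation. Each tracked address gets a window that opens on its first match and lasts `seconds`, and every match beyond `count` inside that window counts as an alert. The packet rate, the random address generator, and the function names are illustrative assumptions.

```python
import random
from typing import Dict, Iterable, List, Tuple

Packet = Tuple[float, str, str]  # (timestamp in seconds, source IP, destination IP)


def detection_filter(packets: Iterable[Packet], track: str,
                     count: int = 100, seconds: float = 2.0) -> List[Packet]:
    """Return the packets that would alert under a simplified detection_filter model."""
    if track not in ("by_src", "by_dst"):
        raise ValueError("track must be 'by_src' or 'by_dst'")
    window_start: Dict[str, float] = {}
    hits: Dict[str, int] = {}
    alerts: List[Packet] = []
    for ts, src, dst in packets:
        key = src if track == "by_src" else dst
        if key not in window_start or ts - window_start[key] > seconds:
            window_start[key] = ts  # open a fresh window for this address
            hits[key] = 0
        hits[key] += 1
        if hits[key] > count:  # silent for the first `count` matches
            alerts.append((ts, src, dst))
    return alerts


def spoofed_flood(n: int, rate: float = 1000.0, seed: int = 0) -> List[Packet]:
    """n SYNs to 127.0.0.1 at `rate` packets/second, each from a random fake source."""
    rng = random.Random(seed)
    return [(i / rate, ".".join(str(rng.randint(1, 254)) for _ in range(4)), "127.0.0.1")
            for i in range(n)]


flood = spoofed_flood(500)
print("by_src alerts:", len(detection_filter(flood, "by_src")))  # 0
print("by_dst alerts:", len(detection_filter(flood, "by_dst")))  # 400
```

With track by_dst, the 101st packet onward trips the filter. With track by_src, no single fake address comes anywhere near 100 hits, which is the trap described in Error 3.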

Save the file with Ctrl+X, then Y, then Enter.

Step 8: Generating a clean pcap for Snort

Here we hit an important technical issue. The pcap captured by tcpdump during the live attack (/tmp/syn_flood.pcap) had bad TCP checksums. When Scapy sends packets live through the loopback interface, the Linux kernel offloads checksum calculation in a way that results in malformed checksums being written to the capture file. Snort validates checksums before applying rules, so it silently discarded those packets before the detection engine ever saw them.

The fix is to generate a fresh pcap by building packets entirely in Python memory and writing them directly to disk using Scapy’s wrpcap() function. When packets never touch the live network stack, Scapy handles checksums correctly. Run this in Terminal 4:

500 clean SYN packets are now saved to /tmp/syn_flood2.pcap with valid checksums. The difference between this and the earlier approach:

send() — sends packets live through the kernel. Fast but produces bad checksums on loopback.
wrpcap() — writes packets directly to a file. Scapy controls everything including checksums. No traffic is sent anywhere.
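The checksum point is easier to see when the bytes are built by hand. The sketch below swaps Scapy for the Python standard library and computes the IPv4 header checksum and the TCP checksum (over the TCP pseudo-header) itself. It wraps each packet in an Ethernet header with zeroed MAC addresses, similar to Linux loopback captures, and writes a classic pcap. All function names, the random address scheme, and the link-type choice are our assumptions. The sketch illustrates what wrpcap() takes care of; it does not reproduce the Terminal 4 script.

```python
import random
import socket
import struct
import time
from typing import List, Optional


def checksum16(data: bytes) -> int:
    """Internet checksum (RFC 1071): ones'-complement of the ones'-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def syn_packet(src: str, dst: str, sport: int, dport: int, seq: int) -> bytes:
    """Build a 40-byte IPv4 + TCP SYN packet with both checksums filled in."""
    src_b, dst_b = socket.inet_aton(src), socket.inet_aton(dst)
    # TCP header: ports, seq, ack=0, data offset=5 words, flags=SYN only, window, checksum=0, urgent=0
    tcp = struct.pack("!HHIIBBHHH", sport, dport, seq, 0, 5 << 4, 0x02, 65535, 0, 0)
    # The TCP checksum covers a pseudo-header (addresses, protocol, length) plus the segment
    pseudo = struct.pack("!4s4sBBH", src_b, dst_b, 0, socket.IPPROTO_TCP, len(tcp))
    tcp = tcp[:16] + struct.pack("!H", checksum16(pseudo + tcp)) + tcp[18:]
    # IPv4 header: version 4 / IHL 5, total length, id, no fragmentation, TTL 64, protocol TCP
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 1, 0, 64,
                     socket.IPPROTO_TCP, 0, src_b, dst_b)
    ip = ip[:10] + struct.pack("!H", checksum16(ip)) + ip[12:]
    return ip + tcp


def write_pcap(path: str, packets: List[bytes]) -> None:
    """Write IPv4 packets to a classic pcap, each behind a zeroed Ethernet header."""
    with open(path, "wb") as f:
        # Global header: magic, version 2.4, tz offset, sigfigs, snaplen, LINKTYPE_ETHERNET (1)
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
        start = time.time()
        for i, pkt in enumerate(packets):
            frame = bytes(12) + b"\x08\x00" + pkt  # dst MAC, src MAC, EtherType IPv4
            ts = start + i * 0.001  # space packets 1 ms apart
            usec = int((ts % 1) * 1_000_000)
            f.write(struct.pack("<IIII", int(ts), usec, len(frame), len(frame)))
            f.write(frame)


def make_flood_pcap(path: str, n: int = 500, seed: Optional[int] = None) -> None:
    """Write n SYNs to 127.0.0.1:80, each from a random (spoofed) source address and port."""
    rng = random.Random(seed)
    packets = []
    for _ in range(n):
        src = ".".join(str(rng.randint(1, 254)) for _ in range(4))
        packets.append(syn_packet(src, "127.0.0.1", rng.randint(1024, 65535), 80,
                                  rng.getrandbits(32)))
    write_pcap(path, packets)
```

Calling `make_flood_pcap("/tmp/syn_flood2.pcap")` writes 500 SYN packets whose IP and TCP checksums are computed before the bytes hit the file. That is the same property that makes the wrpcap() approach work.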

Step 9: Running Snort against the capture

Now run Snort against the clean pcap file:

sudo snort -r /tmp/syn_flood2.pcap -c /etc/snort/snort.conf

Flag breakdown:

-r /tmp/syn_flood2.pcap — read from a pcap file instead of listening on a live interface.
-c /etc/snort/snort.conf — load our configuration file which contains the detection rule.

Reading one alert line:

[**] — marks the start and end of an alert in Snort output.
[1:1000004:1] — the rule identifier in generator_id:sid:revision format.
“SYN Flood Detected on port 80” — the custom message from our rule.
155.190.117.7:34144 -> 127.0.0.1:80 — a spoofed source IP and random port attacking our victim. Every single alert shows a completely different source IP — this is IP spoofing in action.
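With hundreds of alerts, reading them one by one gets old quickly. The sketch below parses alert lines that use the `[**] [gid:sid:rev] "msg" [**] ... src:port -> dst:port` layout and counts distinct source addresses, which turns the spoofing into a single number. The regex assumes that layout. In the sample line, only the rule IDs, message, and addresses come from the lab; the timestamp and priority fields are illustrative.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Alert(NamedTuple):
    gid: int
    sid: int
    rev: int
    msg: str
    src: str
    sport: int
    dst: str
    dport: int


IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"
ALERT_RE = re.compile(
    r'\[\*\*\] \[(?P<gid>\d+):(?P<sid>\d+):(?P<rev>\d+)\] "(?P<msg>[^"]*)" \[\*\*\]'
    rf".*?(?P<src>{IPV4}):(?P<sport>\d+) -> (?P<dst>{IPV4}):(?P<dport>\d+)"
)


def parse_alert(line: str) -> Optional[Alert]:
    """Parse one alert line; return None for lines that are not alerts."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    return Alert(int(m["gid"]), int(m["sid"]), int(m["rev"]), m["msg"],
                 m["src"], int(m["sport"]), m["dst"], int(m["dport"]))


def distinct_sources(lines: Iterable[str]) -> int:
    """How many different source IPs appear across the parsed alerts."""
    return len({a.src for a in map(parse_alert, lines) if a is not None})


# Illustrative line: only the IDs, message and addresses come from the lab output
sample = ('04/23-16:36:19.123456 [**] [1:1000004:1] "SYN Flood Detected on port 80" '
          '[**] [Priority: 0] {TCP} 155.190.117.7:34144 -> 127.0.0.1:80')
print(parse_alert(sample))
```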

Errors Encountered and How We Fixed Them

These are the real errors that came up during the lab. They are worth documenting because they reflect the kind of issues anyone following online Snort guides will hit on a modern Kali installation.

Error 1: Snort 2 flag used on Snort 3

Our first attempt to run Snort used the -A console flag, which is standard in most Snort tutorials online:

sudo snort -r /tmp/syn_flood.pcap -c /etc/snort/snort.conf -A console

Error:

ERROR: unknown logger console

FATAL: see prior 1 errors (0 warnings)

Fatal Error, Quitting.

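The vast majority of Snort documentation on the internet is written for Snort 2. Kali 2026 ships with Snort 3 (Snort++ 3.12.1.0), which is a complete rewrite with different syntax throughout. Snort 3 still has an -A option, but console is no longer one of its modes. Its alert loggers have new names such as alert_fast and alert_full, which is why Snort reported an unknown logger. Dropping the flag entirely fixed the error, and Snort used the alert output defined in its configuration instead.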

Error 2: The threshold keyword does not exist in Snort 3

Our first version of the rule included the threshold keyword to rate-limit alerts:

threshold:type both, track by_dst, count 100, seconds 5;

Error:

ERROR: ips.rules:1 unknown rule keyword: threshold.

The threshold rule option was removed in Snort 3. Replacing it with detection_filter, which takes the same track and count settings but has no type field, fixed the rule. The final rule also shortens the window from 5 seconds to 2.

Error 3: track by_src catches nothing with spoofed IPs

Even after fixing the threshold keyword, a rule using track by_src fired no alerts:

detection_filter:track by_src, count 100, seconds 2;

The reason is simple — our Scapy script generates a completely random source IP for every single packet. From Snort’s perspective, each packet came from a different machine and no single source ever reached the count of 100. Switching to track by_dst fixed this because all packets, regardless of their fake source, land on the same victim at 127.0.0.1:80. Counting arrivals at the destination is the correct way to detect a SYN flood when source IPs are spoofed.

What This Lab Demonstrates

From the attacker’s perspective

By randomizing the source IP on every packet, the attacker makes the flood much harder to trace and impossible to block by source IP. In a real scenario, the victim server would receive SYN packets from thousands of different addresses at once. Every SYN-ACK it sends goes to an address that never asked for a connection, so no ACK ever comes back. The server holds each half-open connection in memory until it times out, and with enough flood volume the backlog queue fills up completely before timeouts can free space.
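That last point is a rate argument. The queue saturates once SYNs arrive faster than timeouts can drain them, roughly when syn_rate × timeout exceeds the backlog size. The toy simulation below makes this concrete. The 128-slot backlog, the 30-second timeout, and the rates are illustrative numbers, not measurements from this lab; real values depend on kernel settings.

```python
from collections import deque
from typing import List


def half_open_occupancy(syn_rate: int, timeout_s: int, backlog: int,
                        duration_s: int) -> List[int]:
    """Queue length at the end of each second under a spoofed SYN flood.

    Each spoofed SYN holds a slot until it times out, because no ACK ever
    arrives. SYNs that find the queue full are dropped.
    """
    expiries: deque = deque()  # expiry time of each occupied slot, oldest first
    history: List[int] = []
    for t in range(duration_s):
        while expiries and expiries[0] <= t:
            expiries.popleft()  # timed-out entries free their slots
        admitted = min(syn_rate, backlog - len(expiries))
        expiries.extend([t + timeout_s] * admitted)
        history.append(len(expiries))
    return history


# Illustrative numbers only: 128-slot backlog, 30 s until a half-open entry is dropped
print(half_open_occupancy(syn_rate=2, timeout_s=30, backlog=128, duration_s=60)[-1])    # 60
print(half_open_occupancy(syn_rate=100, timeout_s=30, backlog=128, duration_s=60)[-1])  # 128
```

At 2 SYNs per second, the queue levels off at 60 entries (2 × 30) with room to spare. At 100 per second, it is full after the second tick and stays full, so any legitimate SYN arriving during the flood finds no free slot.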

From the defender’s perspective

Once the rule was written correctly, Snort alerted on the flood. A SYN flood has a clear signature: a massive volume of pure SYN packets all heading to the same destination port. The detection_filter:track by_dst approach is the right tool for this because it watches the victim rather than trying to identify individual attackers. In a production environment, this rule could raise an alert within seconds of a flood starting, giving an automated response something to act on.

Key Takeaways

A TCP SYN flood exhausts the server’s half-open connection queue by sending SYN packets and never completing the handshake.
IP spoofing makes the attack impossible to mitigate by blocking source addresses. Every packet appears to come from a different machine.
Scapy gives complete control over every field in a raw network packet, making it ideal for understanding attacks at the protocol level.
tcpdump is a simple and reliable way to capture live traffic for offline analysis.
Snort 3 has significant differences from Snort 2. Most online guides are written for Snort 2 and will not work without modification on modern Kali.
Packets sent live through loopback via Scapy have bad checksums in the capture file. Using wrpcap() to write packets directly to disk avoids this completely.
When source IPs are spoofed, SYN flood detection must track by destination, not by source. This is a fundamental design decision for any flood detection rule.
Writing a detection rule from scratch teaches more about how IDS systems actually work than running a pre-built ruleset ever will.

Closing Thoughts

Security labs like this one are valuable because they force you to understand the mechanics of an attack rather than just watching a tool produce output. Every error we hit during this lab — the Snort version incompatibility, the checksum issue, the wrong rule syntax, the track by_src trap — taught something that polished documentation usually glosses over.

Running everything on loopback on a single VM means there are no external targets, no real network impact, and nothing at risk beyond your own machine. That is the right way to learn offensive and defensive security concepts — in a contained environment where you can break things freely and learn from what breaks.

NOTE : Reminder: All testing was performed in a controlled lab environment on locally owned infrastructure. Never perform attack simulations against systems you do not own or have explicit written permission to test.