2026-05-05

It’s Been a While — and I’ve Been Thinking About AI Guardrails

A practical note on running AI coding agents inside Docker Sandboxes, keeping secrets scoped, and thinking through failure before handing agents real access.

Tags: ai, guardrails, docker, agents

It’s been a while since I first launched this site, and honestly, I’ve put off writing this post for too long.

Life has been busy. Work has shifted, responsibilities have changed, and like most people trying to learn in public, sometimes the learning keeps happening while the writing gets delayed.

I haven’t stopped digging into AI, though.

If anything, I’ve gotten more curious.

Lately, I’ve been spending a lot of that time working on safer ways to use coding agents like Claude and Codex—specifically, getting them running inside Docker Sandboxes instead of letting them roam freely across my machine.

Right in the middle of that, another story dropped about an AI agent deleting critical production data, wiping backups, confidently taking credit for it, and then asking what it should do next.

I wasn’t surprised.

Stories like that sound dramatic until you’ve spent enough time watching agents confidently do the wrong thing at machine speed.

This one involved a Claude-powered coding workflow that reportedly deleted an entire company database in seconds. That kind of failure sounds extreme, but it reinforces something I already believed: if you’re going to let AI touch real systems, your first question shouldn’t be “how fast can this go?”

It should be:

what happens when it’s wrong?

That’s what pushed me harder into Docker Sandboxes.

What Docker Sandboxes Actually Do

Docker Sandboxes (sbx) let you run coding agents like Claude Code and Codex inside isolated microVMs instead of directly on your host machine.

Each sandbox gets:

its own filesystem
its own Docker daemon
its own network boundaries
access only to the project directory you explicitly mount

Meaning the AI can still install packages, run tests, break things, and generally behave like an overconfident junior developer—but it does it inside a box.

Not on your actual machine.

Docker describes it as “microVM-based isolation,” where the agent can build containers and modify files without touching your host system.

That matters.

Because while an agent running locally generally only has the permissions of the user you launched it under, that still leaves a lot of room for damage. I already code from a standard Windows user account instead of admin for exactly that reason.

I’d rather not let an agent get creative anywhere near System32.

Pain Point #1: Permissions

Getting sbx working on Windows was not exactly straightforward.

Docker Sandboxes require virtualization support and Hyper-V access.

That standard user account is a deliberate choice, since I’d rather limit what an agent can touch if something goes wrong, but it meant I ran into friction pretty quickly.

If you’re on a standard user account or a work-managed machine, you’ll probably need to be added to the Hyper-V Administrators group before Docker Sandboxes will behave.

I also found I had to enable Virtual Machine Platform, which requires admin rights as well.

So there’s a little setup pain before you get to the fun part.

Pain Point #2: Secrets

Once sbx is running, the next question is how you want to handle credentials.

Docker provides the sbx secret set command for this, so secrets stay on the host and get injected into the sandbox instead of being stored inside it.

That was important to me. If I’m letting an agent work independently, I want the blast radius to stay small.

For GitHub, I use a Personal Access Token scoped only to the repos I’m actively working on with sandboxes.

For Claude Code, at time of writing, you can either:

use an Anthropic API key with sbx secret set -g anthropic
or use your Claude subscription by running /login inside the sandbox

If you use the subscription route, the session token stays on the host and is passed in through Docker’s proxy instead of being stored inside the sandbox itself, which is a much better model.

One thing I ran into with OAuth flows is that the sandbox uses its own isolated execution environment, so logging in doesn’t behave like your normal desktop browser session.

Usually that means copying and pasting a device code manually.

That sounds minor, but Microsoft recently wrote about AI-enabled device code phishing campaigns sold as phishing-as-a-service (“PhaaS,” which really rolls off the tongue), where attackers specifically target those login flows. That made me a little more careful about where I’m pasting device codes and what permissions I’m handing out.

Using the smallest possible scope for tokens feels boring right up until it matters.
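The same "smallest possible scope" idea applies to anything you launch yourself, not just tokens. Here is a minimal Python sketch of it: start a child process with only an allowlisted set of environment variables instead of your whole host environment. This is not how sbx injects secrets, and the names ALLOWED_VARS, scoped_env, and run_scoped are hypothetical, made up for this illustration.

```python
import os
import subprocess

# Hypothetical allowlist: the only variables the child process may see.
# On Windows you may also need to allow system variables such as SystemRoot.
ALLOWED_VARS = frozenset({"PATH", "GITHUB_TOKEN"})


def scoped_env(parent_env, allowed=ALLOWED_VARS):
    """Return a new dict holding only the allowlisted variables."""
    return {name: value for name, value in parent_env.items() if name in allowed}


def run_scoped(command, parent_env=None):
    """Run command with a minimal environment instead of the full host one."""
    source = dict(os.environ) if parent_env is None else parent_env
    return subprocess.run(
        command,
        env=scoped_env(source),  # everything not allowlisted is dropped
        capture_output=True,
        text=True,
        check=True,
    )
```

Calling run_scoped(["git", "fetch"]) would run the command with just PATH and GITHUB_TOKEN visible, so an unrelated cloud key sitting in your shell never reaches the child process. It’s a small habit, but it keeps the blast radius tied to what the task actually needs.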

Why I Like This Workflow

The biggest improvement for me hasn’t been “AI writes code faster.”

It’s that I can queue up larger prompts across multiple terminals without feeling like I’m babysitting every permission request.

Instead of constantly approving commands or worrying about what an agent might be touching, I can let it work inside a sandbox and focus on reviewing the output.

That makes experimenting a lot easier.

I can try multiple approaches in parallel, keep what works, and throw away what doesn’t without worrying about my main repo getting messy.

Using Git worktrees with commands like sbx run claude --branch makes that even better.

You can test several directions on the same project at once and decide based on what actually works instead of trying to predict everything upfront during planning.
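If worktrees are new to you, here is a rough Python sketch of the plain Git side of this: one working directory and one new branch per experiment, all backed by the same repository. I don’t know exactly how sbx manages worktrees internally, so treat this as the generic git worktree pattern rather than what --branch does under the hood. The function names, branch names, and paths are hypothetical.

```python
from pathlib import Path


def worktree_add_command(repo_dir, branch, worktrees_root):
    """Build a `git worktree add` command for one experiment branch.

    The worktree is placed at worktrees_root/<branch> on a new branch
    created from the repository's current HEAD.
    """
    target = Path(worktrees_root) / branch
    return ["git", "-C", str(repo_dir), "worktree", "add", "-b", branch, str(target)]


def plan_experiments(repo_dir, branches, worktrees_root):
    """Return one command per branch so each approach gets its own checkout."""
    return [worktree_add_command(repo_dir, b, worktrees_root) for b in branches]
```

Passing each returned list to subprocess.run(cmd, check=True) would create the checkouts. Once you’ve picked a winner, git worktree remove cleans up the directories you no longer need, and the abandoned branches can be deleted separately.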

That’s probably the part I’ve enjoyed most.

AI still doesn’t replace good judgment, and solid software definitely doesn’t happen without people reviewing the work.

But for prototyping and exploration, it removes a lot of friction.

The bottleneck feels less like “how fast can I build this?” and more like “which version is actually worth building?”

That’s a much better problem to have.

Learning in Public

I’m still figuring all of this out.

I’m not writing this as someone with the perfect setup or the final answer—I’m writing it because learning in public is the whole reason this site exists.

New tools are exciting, but they’re also easy to trust too quickly.

I think the better approach is staying curious without being careless.

Experiment, test things, break things in safe places—but make sure you understand what the failure mode looks like before you hand over the keys.

If you’re using AI coding workflows, I’d be curious what your setup looks like.

What’s working?
What’s annoying?
What problems have you run into?

Let’s chat: contact@milesdinsmore.com