Running Shell Commands

How the bash tool works, what it's used for, and the safety considerations that come with letting an agent run real commands.

Anything that isn't a file edit usually happens through the shell: installing dependencies, running a test suite, starting a dev server, invoking the compiler, querying git. The bash tool is what makes Claude Code an agent that can actually verify its own work instead of just asserting that code "should" work.

Why this is the difference-maker

A model that only edits files can produce code that looks correct and isn't. A model that can also run the test suite, see the actual failure output, and iterate against it closes that gap. This is the mechanism behind most of what feels like "Claude debugging itself". It's not a separate debugging feature; it's the same read-act-observe loop applied to command output instead of file content: run the tests, read the failure, form a hypothesis, make a targeted edit, run the tests again.
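The run-read-edit-rerun cycle can be sketched in a few lines of Python. This is an illustrative sketch only, not how Claude Code is implemented: `run_and_capture`, `iterate_until_green`, and the `propose_fix` callback are hypothetical names, and the callback stands in for the step where the model reads the failure and makes an edit.

```python
import subprocess


def run_and_capture(command):
    """Run a command and return (exit_code, combined stdout and stderr)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    )
    return result.returncode, result.stdout


def iterate_until_green(command, propose_fix, max_attempts=3):
    """Run the command, hand failures to propose_fix, and rerun.

    Returns (attempt_number, output) on success, or (None, last_output)
    if the command still fails after max_attempts runs.
    """
    output = ""
    for attempt in range(1, max_attempts + 1):
        exit_code, output = run_and_capture(command)
        if exit_code == 0:
            return attempt, output
        # Hypothetical hook: read the failure output and make a targeted edit.
        propose_fix(output)
    return None, output
```

The attempt cap is a design choice for the sketch: a loop that edits and reruns needs some stopping condition other than success.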

Common uses

Installing or updating dependencies before code that needs them will run.
Running the test suite, a single test file, or a specific test case.
Running a linter or type checker and fixing what it flags.
Git operations — covered in more depth in Git-native workflows.
Starting a process to manually verify behavior (a dev server, a script with sample input).

Long-running commands

A command that takes a while (a full test suite, a production build) doesn't have to block everything else. See Background tasks for how to keep working in the same session while a long command runs, instead of sitting idle waiting for it.
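As a rough analogy for what "not blocking" means, here is a minimal Python sketch that starts a command, returns immediately, and checks on it later. It does not describe how Background tasks is implemented; the function names are hypothetical. Output goes to a temporary file rather than a pipe so a chatty command cannot stall on a full pipe buffer while nobody is reading.

```python
import subprocess
import tempfile


def start_in_background(command):
    """Start a command without waiting for it; output goes to a temp log file."""
    log = tempfile.TemporaryFile(mode="w+")
    process = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
    return process, log


def collect_if_finished(process, log):
    """Return (exit_code, output) once the command is done, else None."""
    exit_code = process.poll()
    if exit_code is None:
        return None  # still running; the caller can do other work
    log.seek(0)
    return exit_code, log.read()
```

A caller would start the build or test run, continue with other work, and call `collect_if_finished` at convenient points until it returns a result.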

Safety considerations

Shell access is the most powerful and most consequential tool available, which is exactly why permission modes exist around it. A few things worth being deliberate about:

Allowlist the safe, frequent commands (test runner, linter, read-only git commands) so you're not confirming the same harmless command dozens of times a session, while leaving destructive commands gated.
Be specific about what "safe" means for your project. A command that's harmless in one codebase (say, a script that writes to a local database file) might be destructive in another.
Sandboxing matters more for unattended or high-autonomy runs. If you're running with a loose permission posture, doing it inside a container or VM limits the damage a wrong command can cause, the same way it would for any automation you didn't write yourself.
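The allowlist idea above can be illustrated with a small Python sketch. This is not Claude Code's permission syntax or matching logic; the prefixes, the `needs_confirmation` helper, and the choice of pytest and ruff as the test runner and linter are assumptions made for the example. It treats a command as pre-approved only when it starts with a known-safe prefix and contains no shell metacharacters that could chain or redirect into something else.

```python
import shlex

# Hypothetical project allowlist: command prefixes treated as safe and frequent.
ALLOWED_PREFIXES = [
    ["pytest"],
    ["ruff", "check"],
    ["git", "status"],
    ["git", "diff"],
    ["git", "log"],
]

# Characters that can chain, redirect, or substitute commands in a shell.
SHELL_METACHARACTERS = set(";&|<>$`()\n")


def needs_confirmation(command_line):
    """Return True unless the command matches an allowlisted prefix exactly."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return True  # e.g. "git status && rm -rf build" stays gated
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return True  # unbalanced quotes: refuse to guess
    return not any(tokens[: len(prefix)] == prefix for prefix in ALLOWED_PREFIXES)
```

With this sketch, `git status` and `pytest tests/test_api.py` pass without a prompt, while `git push --force` and `rm -rf build` are still gated. Prefix matching is deliberately coarse; a real policy would need to consider flags that change what an otherwise read-only command does.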

What good output-reading looks like

The agent doesn't just check whether a command exited successfully; it reads the actual output. A test run that exits 0 but prints a deprecation warning, or a build that succeeds but emits a new bundle-size warning, carries information worth acting on, not just a pass/fail signal.
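To make the distinction concrete, here is a minimal Python sketch that reports both the exit status and any output lines that look like warnings. The `run_and_review` helper and the keyword pattern are assumptions for illustration; an agent reading output is not limited to keyword matching.

```python
import re
import subprocess

# Assumed heuristic: lines mentioning warnings or deprecations deserve a look.
WARNING_PATTERN = re.compile(r"warning|deprecat", re.IGNORECASE)


def run_and_review(command):
    """Run a command and return its pass/fail status plus warning-like lines."""
    result = subprocess.run(command, capture_output=True, text=True)
    combined = result.stdout + result.stderr
    flagged = [line for line in combined.splitlines() if WARNING_PATTERN.search(line)]
    return {"passed": result.returncode == 0, "flagged_lines": flagged}
```

A result with `passed` set to true and a non-empty `flagged_lines` list is exactly the case described above: the command succeeded, and there is still something to read.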

Next: finding the right files in a codebase before changing anything.