Claude Code CLI vs Claude Cowork

Same agent harness, two drivers. A technical comparison for people who build software — not a feature list.

Anthropic ships two agentic products that look like competitors and are actually the same engine wearing two interfaces. If you build software, the difference between them is decided by one architectural fact, and most “vs” posts miss it.

This is synthesized from four independent research passes — official docs, practitioner sentiment, strategic analysis, and a deep architecture teardown — covering January–June 2026. Every claim carries an evidence tier: CONFIRMED (multiple independent sources) · STRONG (one high-credibility source) · MODERATE (single/secondary) · CONTESTED (sources disagree) · SPECULATIVE (inference). Reverse-engineering claims never exceed MODERATE.
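The cap on reverse-engineering claims is easiest to see as a rule applied after a tier is assigned. A minimal sketch, assuming the tiers form a ladder SPECULATIVE < MODERATE < STRONG < CONFIRMED (that ordering, and the names `Tier` and `cap_tier`, are illustrative assumptions, not part of the research method; CONTESTED is left out because it marks disagreement rather than a rung):

```python
from enum import IntEnum


class Tier(IntEnum):
    """Evidence tiers on an assumed ladder; the ordering is this sketch's assumption."""
    SPECULATIVE = 1
    MODERATE = 2
    STRONG = 3
    CONFIRMED = 4


def cap_tier(tier: Tier, reverse_engineered: bool) -> Tier:
    """Apply the rule that reverse-engineering claims never exceed MODERATE."""
    if reverse_engineered:
        # min() on IntEnum members returns the lower rung as a Tier.
        return min(tier, Tier.MODERATE)
    return tier
```

So a teardown finding that would otherwise rate STRONG is reported as MODERATE, while a SPECULATIVE one stays where it is.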

The one-paragraph verdict

Claude Code and Claude Cowork are the same agent harness behind two different drivers. Both run Anthropic’s frontier Claude model on the same agentic loop and share an identical extension format — skills, slash-commands, MCP, sub-agents, all markdown + JSON, no build step CONFIRMED. They diverge on how the loop is hosted, and that single fact decides everything for a builder: Claude Code exposes the loop as a process (terminal, headless -p, Agent SDK) so you can script it, embed it, run it in CI/cron, and build products on top of it. Cowork wraps the same loop in a GUI desktop app with no process or SDK handle — so you can only configure and drive it, never build with it, and it cannot run unattended STRONG.

Practitioners land on the same line: “Claude Code is an instrument; Cowork is an environment.” In a survey of 17 hands-on creators, not one developer adopted Cowork for actual coding; those who tried found it had “no surface area for debugging” and was slower and more quota-hungry than the CLI for the same task CONFIRMED.

One engine, two products

Anthropic is explicit: “Cowork runs on the same agentic engine as Claude Code — the loop that lets Claude plan, work across tools, and check its own output” CONFIRMED. The same frontier model powers both (Opus 4.6 at Cowork’s GA; later trackers cite Opus 4.8 / Fable 5 — the point is version-agnostic). The origin story is the architecture: Anthropic’s Felix Rieseberg says Cowork “wrote itself” in ~10 days by reusing internal Claude Code pieces STRONG. Simon Willison’s reading — corroborated by the teardown and by Anthropic’s own “Claude Code power for knowledge work” tagline — is that Cowork is “regular Claude Code wrapped in a less intimidating default interface.” CONFIRMED

Timeline: Cowork launched as a research preview 12–13 Jan 2026 and went GA on all paid plans (macOS + Windows desktop) 9 Apr 2026 CONFIRMED.

The decisive split: how the loop is hosted

The capability lists overlap heavily. The mechanism is where they part ways — and it’s a driver gap, not a feature gap.

Claude Code — a programmable substrate. The agent loop is a local Node process you own, exposed three ways: interactive terminal, headless claude -p (no TTY, for CI / scheduled jobs), and the Agent SDK. On top sit hooks (deterministic shell on tool events), Dynamic Workflows (a deterministic JS orchestration script, 16 concurrent / 1,000 total agents), and Routines (cron/API/GitHub triggers). You can embed it, script it, run it unattended, and build on it. CONFIRMED
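Here is roughly what "a process you own" looks like from a script. This is a sketch under stated assumptions: it relies only on the `claude -p` headless flag named above, treats `--output-format` as an optional extra you should confirm against `claude --help` for your installed version, and the helper names `build_headless_command` and `run_headless` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_headless_command(prompt: str, output_format: Optional[str] = None) -> List[str]:
    """Build the argv for a non-interactive run using the -p flag."""
    argv = ["claude", "-p", prompt]
    if output_format is not None:
        # Assumption: the installed CLI accepts --output-format; verify locally.
        argv += ["--output-format", output_format]
    return argv


def run_headless(prompt: str, timeout_s: int = 600) -> str:
    """Run one headless turn and return stdout; raises if the CLI is missing or fails."""
    if shutil.which("claude") is None:
        raise RuntimeError("claude CLI not found on PATH")
    result = subprocess.run(
        build_headless_command(prompt),
        capture_output=True,  # collect stdout/stderr instead of inheriting a TTY
        text=True,
        timeout=timeout_s,
        check=True,           # non-zero exit raises CalledProcessError
    )
    return result.stdout
```

Nothing here needs a terminal or a window, which is why the same call can run from a CI step or a cron job. Cowork has no equivalent entry point to wrap.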

Claude Cowork — a driven appliance. You author plugins, skills and MCP connectors as files (the same format), but there is no SDK, no CLI, no headless API. Execution requires the desktop app open and the machine awake; “scheduled tasks” means “the GUI app must keep running.” The mechanism of the absence: Cowork is a GUI-hosted front-end over the loop with no exposed process handle — automation has nothing to attach to. STRONG

Capability matrix

| Dimension | Claude Code CLI | Claude Cowork |
| --- | --- | --- |
| Execution | Local process; terminal + headless + SDK + CI/cron CONFIRMED | GUI app; work in a local Linux VM; no headless, app must stay open STRONG |
| Tool surface | Bash, file edit, MCP, hooks, sub-agents, plugins, background tasks CONFIRMED | Same family plus computer-use (screen + Chrome) minus hooks/background STRONG |
| Programmability | SDK, headless, hooks, Dynamic Workflows, Routines CONFIRMED | File-authored config only; no SDK/CLI/headless STRONG |
| Multi-agent | Sub-agents, Agent Teams, scripted 16/1,000 orchestration CONFIRMED | Prompt-level parallel sub-agents; “Dispatch” parent/child MODERATE |
| Permissions | allow/deny + modes; runs vs host shell; OS sandbox optional CONFIRMED | Folder-scoped grants; VM isolation + egress allowlist; approval model abandoned STRONG |
| Observability | OpenTelemetry, logs, traces, session JSONLs CONFIRMED | Internal transcript + screenshots; no documented OTel export MODERATE |
| Benchmark | SWE-bench Verified (real GitHub issue resolution) CONFIRMED | OSWorld (operate a desktop via screenshots) STRONG |
| Dev workflow | Native git, repos, PRs, branches, worktrees CONFIRMED | Desktop files/apps; git work belongs to the Code tab, not Cowork STRONG |

Permissions & the security crux

The two products made opposite containment choices for opposite users. Claude Code runs against the host shell with human-in-the-loop approval (power users skip it with --dangerously-skip-permissions); an optional OS sandbox (Seatbelt / bubblewrap) cut permission prompts by ~84% CONFIRMED. Cowork abandoned the approval model (stated rationale: “the average user is much less likely to be fluent in bash”) and instead contains blast radius with a VM plus an egress allowlist, while still asking before any permanent delete STRONG.

The VM stops host destruction. It does not stop data exfiltration through legitimate APIs:

PromptArmor / Johann Rehberger demonstrated Cowork exfiltrating files via prompt injection; Rehberger labeled Anthropic’s “click stop if you notice exfiltration” guidance “Normalization of Deviance.” CONFIRMED

Simon Willison: “I do not think it is fair to tell regular non-programmer users to watch out for ‘suspicious actions that may indicate prompt injection’!” CONFIRMED

This is the asymmetry that matters: the same injection class is more dangerous in the product that exposes it to less technical users. It’s the central unresolved critique of the Cowork launch.
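Why an egress allowlist does not close this hole is easier to see in a toy model. The sketch below is not Cowork's implementation; the hostnames, `ALLOWED_HOSTS`, and `egress_allowed` are all hypothetical, and the only assumption is that the check decides on destination host alone.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: one legitimate API the agent is permitted to reach.
ALLOWED_HOSTS = {"api.example-vendor.test"}


def egress_allowed(url: str, allowlist: set) -> bool:
    """Toy egress check: permit a request only if its host is on the allowlist."""
    host = urlparse(url).hostname
    return host is not None and host in allowlist


# An obviously hostile destination is blocked.
blocked = egress_allowed("https://attacker.example/upload", ALLOWED_HOSTS)

# An upload to the allowed API passes, whatever the payload is and whoever
# owns the account receiving it. The check never sees either.
passed = egress_allowed("https://api.example-vendor.test/v1/files", ALLOWED_HOSTS)
```

A host-level filter answers "where is this going?" but not "whose account is on the other end?" or "should this file leave at all?", which is the gap an injected instruction can use.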

Benchmarks: don’t conflate the axes

SWE-bench Verified measures the % of 500 real GitHub issues where the model’s patch passes a hidden test suite — a direct proxy for autonomous code-fix competence, the Claude Code use case. OSWorld measures operating a real desktop purely through screenshots and clicks — a proxy for GUI-navigation reliability, the Cowork use case. They are orthogonal. For software development, SWE-bench is the load-bearing number; OSWorld is a noisier signal, and computer-use is slow (screenshot round-trips), so even where Cowork can do a dev-adjacent GUI task, it’s the inefficient path versus the CLI’s direct tool calls STRONG.
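The SWE-bench Verified number is plain arithmetic over those 500 issues, which is worth keeping in mind when comparing headline percentages. A minimal sketch; the function name `resolve_rate` and the resolved count in the example are made up for illustration and are not a reported result for any model.

```python
def resolve_rate(resolved: int, total: int = 500) -> float:
    """Percentage of issues whose patch passed the hidden tests."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return 100.0 * resolved / total


# Hypothetical run: 400 of the 500 issues resolved.
example = resolve_rate(400)  # 80.0
```

At this denominator a single issue is worth 0.2 percentage points, so small gaps between two reported scores can come down to a handful of issues.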

(Exact percentages vary by model version and source — treat them as indicative. The official docs lane could not verify an OSWorld figure in a primary Anthropic page; the ~83% number comes from a third-party tracker CONTESTED.)

What practitioners actually do

The hands-on signal is unusually consistent — and honest about the CLI’s own rough patch.

“Claude Code is your engineering partner… Cowork is your operations assistant.” — and: “Cowork has no surface area for debugging a broken RLS policy.” — Dee McCrorey CONFIRMED

“For complex tasks, Claude Code is faster and more reliable.” Organizing 100+ receipts took Cowork “20+ minutes with timeout errors; Claude Code finished in 5.” — Karen Spinner STRONG

Of 17 hands-on creators, the 13 who stayed with Cowork were doing non-dev knowledge work; the 4 who walked away retreated to Claude Code or direct file editing. No developer adopted Cowork for coding. CONFIRMED

The honest caveat: Claude Code’s own 2026 reputation is mixed-to-frustrated. Rate-limit / quota burn dominates the complaints (“20x max usage gone in 19 minutes”, 330+ comments), and Anthropic admitted a Jan–Mar quality regression that degraded Claude Code, the Agent SDK and Cowork; one 500+ dev survey showed 65% preferring OpenAI Codex at that moment CONTESTED. In June 2026 Anthropic temporarily doubled Cowork’s 5-hour limits through July — but left the weekly cap unchanged (the catch) STRONG.
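The catch is simple to model. Assume, purely for illustration, that weekly usage is bounded by whichever is smaller: the sum of the 5-hour windows you actually use, or the weekly cap. The units and numbers below are invented, and `weekly_usable` is a hypothetical helper, not Anthropic's accounting.

```python
def weekly_usable(windows_used: int, per_window_limit: float, weekly_cap: float) -> float:
    """Usable weekly quota under a simple min(windows * limit, cap) model."""
    return min(windows_used * per_window_limit, weekly_cap)


# Invented units: each 5-hour window allows 10, the weekly cap is 100.
light_before = weekly_usable(4, 10, 100)   # 40
light_after = weekly_usable(4, 20, 100)    # 80: doubling helps an occasional user
heavy_before = weekly_usable(20, 10, 100)  # 100
heavy_after = weekly_usable(20, 20, 100)   # 100: the heavy user is still pinned at the cap
```

Under that model, doubling the window limit helps bursty use and does nothing for anyone already hitting the weekly ceiling.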

When to use which

Anthropic’s own split: chat for drafting, Claude Code for coding, Cowork for cross-app knowledge work CONFIRMED. Operationalized:

| If the task is… | Reach for | Because |
| --- | --- | --- |
| Repo/git work you’ll review as diffs | Claude Code | Native git/test/file tools; SWE-bench-tuned |
| Scripted, headless, CI/cron, embedded | Claude Code (SDK) | Only Code exposes a process/SDK handle |
| Deterministic multi-agent orchestration | Claude Code (Workflows) | Scriptable 16/1,000-agent tier |
| Repetitive multi-app desktop work, non-technical operator | Cowork | Computer-use + office skills + scheduling |
| Driving a GUI/browser the CLI can’t reach | Cowork | Native screen control |
| Sensitive data, untrusted inputs, weak operator | Caution on Cowork | Prompt-injection exfiltration is unresolved |
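The table collapses into a few ordered checks. The sketch below is one possible reading of it, not an official decision procedure; the `Task` fields, the precedence of the checks, and the returned labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; field names are this sketch's own."""
    repo_work: bool = False          # git/diff/test work
    unattended: bool = False         # headless, CI/cron, or embedded
    needs_gui_control: bool = False  # must drive a screen or browser
    untrusted_inputs: bool = False   # sensitive data or untrusted content


def choose_tool(task: Task) -> str:
    """Route a task following the table above, with the process-handle rows first."""
    if task.unattended or task.repo_work:
        return "Claude Code"
    if task.needs_gui_control:
        # The exfiltration risk is unresolved, so flag it rather than hide it.
        return "Cowork (with caution)" if task.untrusted_inputs else "Cowork"
    return "Claude Code"
```

The unattended check comes first because it is the one constraint Cowork cannot meet at all; everything after that is a preference.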

For software development: build with Claude Code; use Cowork only when a task is genuinely GUI/computer-use-bound. Inside the unified desktop app, “use Claude Code” just means “use the Code tab, not the Cowork tab.”

Sources

Consolidated across four research lanes; all external content treated as untrusted data.

Two corrections the multi-lane method caught: an “OSWorld 72.5%” figure from an aggregator couldn’t be verified in a primary source (down-tiered); a deep-research tool inserted a “this launch is speculative” disclaimer contradicted by primary sources (rejected). The full evidence-tiered source list lives in the research repo.