
How Backthread handles your code

Last updated 2026-07-10.

Backthread turns your repository into an architecture diagram. To do that we have to read your code — but we don't have to keep it. This page is the long version of what we promise at the connect step. Backthread also keeps a decision log — the why behind your coding-agent sessions, captured by the plugin. That path sends us less than the diagram path, but it does send us something, and we spell out exactly what below.

The tone here is deliberately sober. The brand is irreverent elsewhere; this is the trust moment. Every claim below mirrors what's actually in the implementation. If anything reads as an overclaim, write to security@backthread.dev and we'll tighten it.

What we keep

For every repository you connect, our database stores only the derived diagram — the modules, the edges between them, and the per-module changelog "why". We never persist your source code, file contents, or git history beyond what's reflected in the derived view.

What an ingest looks like

A GitHub webhook (push to your tracked branch, or a merged PR) tells our orchestration Worker that a re-sync is due.
The Worker spawns a fresh, isolated sandbox for the job — a Cloudflare Container running on a Firecracker microVM — and passes in a short-lived installation token scoped only to your repo.
Inside the sandbox we git clone --depth 1 your repo, read its structure with a TypeScript AST parser (ts-morph, no npm install), cluster the import/call graph deterministically (Louvain), and call our LLM only to name and narrate what the structure already says — never to author it.
We write the diagram + per-module changelog + loose-ends list to our database. Rows are scoped to your account by Postgres row-level security.
The sandbox is destroyed. The clone and the installation token die with it.
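To make "read the structure statically" concrete, here is a minimal sketch of collecting import edges from source text without executing anything. It is an illustration only, not the extractor itself: the real pipeline uses an AST parser (ts-morph), whereas this sketch swaps in a simple regular expression, and the ImportEdge shape and extractImportEdges name are hypothetical.

```typescript
// Hypothetical shape for one edge of the import graph (not Backthread's schema).
interface ImportEdge {
  from: string; // file that contains the import
  to: string;   // module specifier exactly as written in the source
}

// Simplified stand-in for an AST parser. Matches static `import ... from "x"`,
// side-effect `import "x"`, and `export ... from "x"` statements.
const IMPORT_PATTERN = "^\\s*(?:import|export)\\s+(?:[^'\"]*?\\s+from\\s+)?['\"]([^'\"]+)['\"]";

// Reads each file as plain text. Nothing is required, evaluated, or installed.
function extractImportEdges(files: Record<string, string>): ImportEdge[] {
  const edges: ImportEdge[] = [];
  // Sorting the file names keeps the output order deterministic across runs.
  for (const from of Object.keys(files).sort()) {
    const source = files[from] ?? "";
    const re = new RegExp(IMPORT_PATTERN, "gm");
    let match: RegExpExecArray | null;
    while ((match = re.exec(source)) !== null) {
      const specifier = match[1];
      if (specifier) edges.push({ from, to: specifier });
    }
  }
  return edges;
}
```

A regex misses cases a real parser handles (dynamic imports, comments that look like imports), which is one reason an AST parser is the better tool for the actual job.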

What we don't do

We don't execute your code. No npm install, no require() of repo modules, no eval. The parser reads source files like text and reasons about them statically.
We don't follow symlinks out of the clone. Any symlink whose target resolves outside the clone root is rejected (path-traversal guard).
We don't accept pathological inputs. The sandbox enforces a per-file size cap, a total-bytes cap (zip-bomb guard), a file-count cap, and CPU + memory + wall-time caps on the job.
We don't ask for org-wide permissions. The GitHub App grants contents, metadata, and pull_requests as read-only on the specific repos you pick. No push, no admin, no secrets access.
We don't train any model on your code. Our LLM providers' commercial terms exclude training on API content (Anthropic's Commercial DPA and the Google Cloud DPA), and we ourselves do not train any model at all.
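Two of the guards above, the symlink containment check and the input caps, can be sketched in a few lines. This is a hedged illustration under assumptions: the function names, the Caps shape, and the cap values in the test are hypothetical, and the sketch expects the symlink target to have been resolved to an absolute path already (for example with fs.realpathSync).

```typescript
import * as path from "node:path";

// True only when the already-resolved target stays inside the clone root.
function isInsideRoot(cloneRoot: string, resolvedTarget: string): boolean {
  const rel = path.relative(path.resolve(cloneRoot), path.resolve(resolvedTarget));
  if (path.isAbsolute(rel)) return false; // e.g. a different drive on Windows
  // A relative path that climbs out of the root starts with a ".." segment.
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}

// Hypothetical cap configuration; the real limits are not stated on this page.
interface Caps {
  maxFileBytes: number;
  maxTotalBytes: number;
  maxFiles: number;
}

interface FileEntry {
  path: string;
  bytes: number;
}

// Returns a reason string for the first violated cap, or null if all pass.
function firstCapViolation(files: FileEntry[], caps: Caps): string | null {
  if (files.length > caps.maxFiles) return "file-count cap exceeded";
  let total = 0;
  for (const f of files) {
    if (f.bytes > caps.maxFileBytes) return `per-file cap exceeded: ${f.path}`;
    total += f.bytes;
    if (total > caps.maxTotalBytes) return "total-bytes cap exceeded";
  }
  return null;
}
```

Comparing on path segments rather than raw string prefixes matters: a sibling directory such as /work/clone-evil shares a prefix with /work/clone but is still outside it.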

The decision log (capture path)

Connecting a repo is one path into Backthread. The other is the capture plugin (npx backthread), which records the why behind your coding-agent sessions as a decision log. It has its own, different boundary — and on the default path it is a weaker promise than "nothing but the derived result ever leaves your machine," so here it is, stated plainly.

When a session ends, the plugin reads the transcript locally and drops every tool call and tool result — the parts that carry your source code and command output — and replaces fenced code blocks in the remaining prose with [code redacted]. What survives is natural-language conversation only.
Before that scrub, the plugin also harvests a little metadata: the repo-relative file paths the session touched (e.g. src/auth/session.ts), plus the session's current branch name and HEAD commit SHA (e.g. feat/auth and a3f2c1d). These are directory structure and version-control refs — not file contents — and they're sent alongside the redacted transcript so a derived decision can be anchored to the part of your architecture it's about and held until the work it describes actually merges to your main branch. None of it is source: the never-store-source promise still holds. Absolute paths are normalized to repo-relative first, so nothing about your machine's directory layout outside the repo ever leaves.
On the default (server-inference) path, that redacted transcript is sent to our Worker. The Worker re-runs the code scrub server-side as a fail-closed backstop, derives the decisions, and discards the transcript: processed in memory, never stored.
Only the derived decisions (a short claim + rationale each) are persisted, scoped to your account by the same row-level security as everything else.
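The scrub described above can be sketched as follows. Treat it as an approximation of the idea rather than the @backthread/redact source: the TranscriptEntry shape and the redactTranscript name are assumptions made for illustration, and the real package is the authority on what is actually dropped.

```typescript
// Hypothetical transcript shape; the real agent transcript formats differ.
type TranscriptEntry =
  | { kind: "message"; role: "user" | "assistant"; text: string }
  | { kind: "tool_call"; name: string; input: string }
  | { kind: "tool_result"; output: string };

interface RedactedMessage {
  role: "user" | "assistant";
  text: string;
}

// Three backticks, built at runtime so this example can itself sit in a fence.
const FENCE = "`".repeat(3);

// Matches a fenced block. An unterminated fence is redacted through the end of
// the text, so a malformed transcript fails closed rather than leaking code.
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?(?:" + FENCE + "|$)", "g");

function redactTranscript(entries: TranscriptEntry[]): RedactedMessage[] {
  const kept: RedactedMessage[] = [];
  for (const entry of entries) {
    // Tool calls and tool results are dropped entirely.
    if (entry.kind !== "message") continue;
    kept.push({
      role: entry.role,
      text: entry.text.replace(FENCED_BLOCK, "[code redacted]"),
    });
  }
  return kept;
}
```

Running the same function again on the server side is cheap, which is what makes a second, fail-closed pass practical as a backstop.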

So, said out loud: on the capture path a redacted, natural-language-only transcript does leave your machine, together with the repo-relative file paths the session touched and its branch name + commit SHA (metadata — directory structure and VCS refs, not contents) — but never source code, never tool I/O, never file contents. A bring-your-own-key mode (derivation runs against your own LLM key; nothing but the derived decisions leaves the machine) is designed and coming; until it ships, every capture takes the server path above.
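The path metadata is the part of that payload most worth checking, so here is a minimal sketch of normalizing touched paths to repo-relative form and discarding anything outside the repo. The toRepoRelative name and its behavior are assumptions for illustration; the sessionPaths helper in @backthread/redact is the code to read for what actually ships.

```typescript
import * as path from "node:path";

// Converts absolute or relative touched paths into sorted, de-duplicated,
// repo-relative paths with forward slashes. Paths outside the repo are dropped,
// so nothing about the surrounding directory layout is emitted.
function toRepoRelative(repoRoot: string, touched: string[]): string[] {
  const root = path.resolve(repoRoot);
  const seen = new Set<string>();
  for (const p of touched) {
    const rel = path.relative(root, path.resolve(root, p));
    const outside =
      rel === "" || // the repo root itself is not a file path
      rel === ".." ||
      rel.startsWith(".." + path.sep) ||
      path.isAbsolute(rel);
    if (outside) continue;
    seen.add(rel.split(path.sep).join("/"));
  }
  return Array.from(seen).sort();
}
```

For example, /home/dev/app/src/auth/session.ts with a repo root of /home/dev/app becomes src/auth/session.ts, while /home/dev/.ssh/config is dropped.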

You choose which repositories are captured. The decision log is per-repository: a repo is captured only once you connect it to Backthread, and you can pause any connected repo any time from the Repos page in your account — a per-repo Capturing / Paused switch. Pausing is per person: it stops your sessions on that repo; a teammate who hasn't paused keeps capturing their own. A repo you never connected, or one you've paused, is left out of the log — and today that boundary is enforced on our server: for such a repo we derive the decisions in memory and then drop them, storing nothing. One honest caveat, in the spirit of the rest of this page: on the current path a redacted transcript for a left-out repo can still reach our Worker before it's dropped (processed in memory, never written to disk or database) — the plugin doesn't yet make that call on your machine. The client-side skip, where nothing for a repo you didn't pick ever leaves your machine at all, ships next in the open-source CLI; until it lands, the server-side drop above is the guarantee.
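The per-person, per-repository rule can be expressed as a small decision function. This is a sketch of the policy as described, not the server code: the CaptureSettings class, the RepoState names, and the key format are all hypothetical.

```typescript
type RepoState = "capturing" | "paused" | "not_connected";

interface Decision {
  claim: string;
  rationale: string;
}

// Hypothetical in-memory model of the Capturing / Paused switch.
class CaptureSettings {
  private connected = new Set<string>();
  private paused = new Set<string>(); // keys are "userId:repoId"

  connect(repoId: string): void {
    this.connected.add(repoId);
  }

  // Pausing is per person: it only affects this user's sessions on this repo.
  pause(userId: string, repoId: string): void {
    this.paused.add(userId + ":" + repoId);
  }

  stateFor(userId: string, repoId: string): RepoState {
    if (!this.connected.has(repoId)) return "not_connected";
    return this.paused.has(userId + ":" + repoId) ? "paused" : "capturing";
  }
}

// Server-side drop: decisions derived for a left-out repo are never persisted.
function decisionsToPersist(state: RepoState, derived: Decision[]): Decision[] {
  return state === "capturing" ? derived : [];
}
```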

Separately, every authenticated request the tool makes carries a little coarse, non-identifying operational metadata about the tool itself: your backthread and @backthread/redact versions, which agent invoked it (Claude Code / Cursor / Codex / Gemini), your OS platform, and your Node major version. It rides your existing device token (no new identifier) so we can see what's running in the field — keep older installs working and prompt stale ones to update. It describes the tool, never your code or your conversation, and the exact headers are stamped by the open-source CLI's versionHeaders().
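As a rough picture of what such metadata looks like on the wire, the sketch below builds a header set from the five fields listed above. The header names and the sketchVersionHeaders function are invented for this example; the CLI's own versionHeaders() is the source of truth for the exact headers.

```typescript
// The five coarse fields described above. Nothing here is derived from your
// code or your conversation.
interface ToolInfo {
  cliVersion: string;    // backthread version
  redactVersion: string; // @backthread/redact version
  agent: string;         // e.g. "claude-code"
  platform: string;      // e.g. the value of process.platform
  nodeVersion: string;   // e.g. the value of process.version, "v20.11.1"
}

// Hypothetical header names, chosen only for illustration.
function sketchVersionHeaders(info: ToolInfo): Record<string, string> {
  // Only the major version is kept, so "v20.11.1" becomes "20".
  const nodeMajor = info.nodeVersion.replace(/^v/, "").split(".")[0] ?? "";
  return {
    "x-example-cli-version": info.cliVersion,
    "x-example-redact-version": info.redactVersion,
    "x-example-agent": info.agent,
    "x-example-platform": info.platform,
    "x-example-node-major": nodeMajor,
  };
}
```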

How the plugin signs in (device-token transit)

The capture plugin authenticates with a capture-scoped, revocable device token — never your account password, never a full account session. When you run backthread login, that token reaches your machine end-to-end encrypted, and our servers never see it in the clear:

The CLI generates a one-time session id and an ephemeral keypair, then opens app.backthread.dev in your browser (it also prints the URL so you can open it on any device). There is no 127.0.0.1 loopback and no localhost page — the browser stays on app.backthread.dev start to finish.
After you click Authorize, the token is minted and encrypted in your browser to the CLI's public key (ECDH on P-256 → HKDF → AES-256-GCM). Our server only ever receives and stores the ciphertext — it is never able to decrypt the token.
The CLI polls for that ciphertext and decrypts it locally with the private key that never left your machine, then writes it to ~/.backthread/config.json (owner-only, 0600).
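The three steps above follow a standard ephemeral-key pattern, sketched below with Node's built-in crypto module. Several things are assumptions: the Envelope shape, the HKDF salt and info values, and the function names are illustrative, and in the real flow the encrypting side runs in the browser, whereas this sketch uses Node for both sides to stay self-contained. The code in cli/src/cliAuthCrypto.ts is the reference for the actual parameters.

```typescript
import {
  createCipheriv,
  createDecipheriv,
  createECDH,
  hkdfSync,
  randomBytes,
} from "node:crypto";

// Hypothetical wire format for what the server would store: ciphertext only.
interface Envelope {
  ephemeralPublicKey: Buffer;
  salt: Buffer;
  iv: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

const HKDF_INFO = Buffer.from("example-device-token-v1"); // assumed label

// HKDF-SHA256 stretches the raw ECDH secret into a 32-byte AES-256 key.
function deriveAesKey(sharedSecret: Buffer, salt: Buffer): Buffer {
  return Buffer.from(hkdfSync("sha256", sharedSecret, salt, HKDF_INFO, 32));
}

// CLI side: the private key stays inside this ECDH object on your machine.
function createCliKeypair() {
  const ecdh = createECDH("prime256v1"); // P-256
  return { ecdh, publicKey: ecdh.generateKeys() };
}

// Encrypting side: only the CLI's public key is needed.
function encryptForCli(cliPublicKey: Buffer, token: string): Envelope {
  const sender = createECDH("prime256v1");
  const ephemeralPublicKey = sender.generateKeys();
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const key = deriveAesKey(sender.computeSecret(cliPublicKey), salt);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(token, "utf8"), cipher.final()]);
  return { ephemeralPublicKey, salt, iv, ciphertext, tag: cipher.getAuthTag() };
}

// CLI side: recompute the same shared secret and decrypt locally.
function decryptInCli(cli: ReturnType<typeof createECDH>, env: Envelope): string {
  const key = deriveAesKey(cli.computeSecret(env.ephemeralPublicKey), env.salt);
  const decipher = createDecipheriv("aes-256-gcm", key, env.iv);
  decipher.setAuthTag(env.tag); // GCM rejects tampered or mis-keyed ciphertext
  return Buffer.concat([decipher.update(env.ciphertext), decipher.final()]).toString("utf8");
}
```

Anyone holding only the Envelope, including a server that stores it, lacks both private keys and so cannot derive the AES key.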

The token is capture-scoped — it can write decisions for your own repos and nothing else — and you can revoke it any time under Account → Connected devices.

One honest caveat. For a friction-free login we deliberately skip a device-verification code. That leaves a narrow, capture-scoped vector: someone who tricked you into authorizing their login session on your account would get a capture-only token — never your account, never your source, never read access — which you can revoke instantly. We bound it further with a high-entropy session id, a ~10-minute expiry, and one-time delivery. We think that trade is right for the smoothness; if you'd rather we add a verification code, tell us at security@backthread.dev.
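The three bounds named above (high-entropy session id, short expiry, one-time delivery) can be sketched with an in-memory store. This illustrates the pattern only: the OneTimeStore class, the 32-byte id length, and the injected clock are assumptions, not a description of our server.

```typescript
import { randomBytes } from "node:crypto";

// 32 random bytes (256 bits), hex-encoded. The actual id length is an assumption.
function newSessionId(): string {
  return randomBytes(32).toString("hex");
}

interface PendingDelivery {
  ciphertext: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical store: each ciphertext can be picked up at most once, and only
// before its time-to-live runs out.
class OneTimeStore {
  private items = new Map<string, PendingDelivery>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  put(sessionId: string, ciphertext: string): void {
    this.items.set(sessionId, { ciphertext, expiresAt: this.now() + this.ttlMs });
  }

  take(sessionId: string): string | null {
    const item = this.items.get(sessionId);
    if (!item) return null;
    this.items.delete(sessionId); // deleted on pickup, expired or not
    return item.expiresAt > this.now() ? item.ciphertext : null;
  }
}
```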

Team accounts

On Team plans, changes to your account are recorded in an audit log.

What we keep, in one table

Category	Where	Retained for
Your source code	Only inside the ephemeral sandbox — never in our database	Destroyed at end of job (≤ 10 minutes)
Derived diagram + changelog	Supabase Postgres, EU region	Until you disconnect AND request deletion
Redacted capture transcript (no code, no tool I/O)	Worker memory only — never written to disk, queue, or database	Discarded after extraction (seconds)
Repo-relative file paths touched (metadata — directory structure, not contents)	Supabase Postgres, EU region — stored on the derived decision as its module anchor	Until you delete the decision or your account
Branch name + HEAD commit SHA at capture (metadata — version-control refs, not contents)	Supabase Postgres, EU region — stored on the derived decision so it stays hidden until that work merges to your main branch	Until you delete the decision or your account
Coarse tool metadata (backthread + redact version, agent, OS platform, Node major — metadata about the tool, not contents)	Supabase Postgres, EU region — on the usage event + your device record, keyed to your account, never a new identifier	Until account deletion
Derived decisions (claim + rationale)	Supabase Postgres, EU region	Until you delete them or your account
Your GitHub OAuth identity	Supabase Auth, EU region	Until account deletion
GitHub installation token	Worker memory + sandbox env	Job-scoped, destroyed with the sandbox
Device token, in transit (login)	End-to-end encrypted in your browser to the CLI's key; our server stores only ciphertext	Deleted on pickup, or at a ~10-minute TTL
Device token, at rest (your machine)	~/.backthread/config.json, owner-only 0600	Until you revoke or re-login
Per-IP rate-limit hash (signup endpoint)	Cloudflare KV	10 minutes
Webhook delivery / queue dedupe markers	Cloudflare KV	10 minutes

Verify, don't trust

The code that runs on your machine is open source — read it, run it, and confirm it does exactly what this page claims. The redaction fence (@backthread/redact) is the exact code behind the capture boundary above: it drops every tool call and tool result and redacts fenced code before anything leaves your machine. The same package's sessionPaths helper is the exact code that harvests the repo-relative file paths described above — readable in the open-source @backthread/redact source, so you can confirm it emits directory structure only, never file contents, and normalizes everything to repo-relative. The capture CLI (npx backthread) is the plugin that applies both — and its resolveGitContext helper is the exact code that reads the branch name and HEAD SHA (two git rev-parse calls, nothing more). All MIT, in the public backthread/backthread repo. The browser-side encryption of your device token at login is likewise in the open — the ECDH/HKDF/AES-GCM code the CLI uses to decrypt it lives in the same repo (cli/src/cliAuthCrypto.ts), so the "servers only ever store ciphertext" claim above is auditable end to end.
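If you want a feel for how little "two git rev-parse calls" involves before opening the repo, here is a hedged sketch. It is not the resolveGitContext source: the function name, the GitRunner type, and the exact rev-parse flags are assumptions, and the runner is injectable so the logic can be exercised without a git checkout.

```typescript
import { execFileSync } from "node:child_process";

// Runs git with the given arguments and returns its stdout.
type GitRunner = (args: string[]) => string;

// Default runner: shells out to the local git binary, no network access.
const runGit: GitRunner = (args) =>
  execFileSync("git", args, { encoding: "utf8" });

interface GitContext {
  branch: string;
  headSha: string;
}

// Two reads, nothing more: the current branch name and the HEAD commit SHA.
function readGitContext(run: GitRunner = runGit): GitContext {
  return {
    branch: run(["rev-parse", "--abbrev-ref", "HEAD"]).trim(),
    headSha: run(["rev-parse", "HEAD"]).trim(),
  };
}
```

Neither call reads file contents; both return refs that git already exposes about the checkout.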

The server-side structural extractor (the deterministic pipeline that runs against your code inside the ephemeral sandbox) is being open-sourced next and will join the same repo as @backthread/extractor. Until then, treat this page as our promise for that component, and write to security@backthread.dev if the implementation drifts from any claim here.

Sub-processors

We are honest about who else touches your data. Our Privacy Policy §5 and our DPA §6 list them in full; the short version:

Supabase (EU eu-central-1) for the database, auth, and realtime.
Cloudflare for hosting, the worker tier, the queue, and edge caches.
Google (US, EU SCCs) for the LLM that names and narrates the diagram (module labels and per-module changelog), and for the bulk pass that derives decisions from redacted capture transcripts.
Anthropic (US, EU SCCs) for the tie-break pass when deriving decisions from redacted capture transcripts.
GitHub (US, EU SCCs) for source-code access via the read-only GitHub App.

We will publish at least 30 days' notice on the Privacy Policy before adding a new sub-processor that processes customer data.

What we explicitly haven't promised yet

We don't claim SOC 2 or ISO 27001 today. We'll seek certification when we go after customers who need it; we're not going to claim it before it's true. The same goes for penetration-test reports, a bug bounty, and a public security.txt: these are coming, just not yet.

Reporting a security issue

Write to security@backthread.dev. We respond within one business day. We do not yet run a paid bug bounty.

Operator: BACKTHREAD OÜ, Estonia (registration number 17524049, registered office Harju maakond, Tallinn, Lasnamäe linnaosa, Peterburi tee 92g, 13816). Long-form policy detail in the Privacy Policy, the DPA, and the Terms of Service.