Concepts at a glance

A one-page mental model of how CredWatch works under the hood. Skim it once and the rest of the docs will make more sense.

Sources, scans, findings

flowchart LR
  A[Source<br/>repo / endpoint / commit] -->|scan| B[Scanner]
  B -->|match against patterns| C[Detection]
  C -->|score + dedupe| D[Finding]
  D -->|validator| E[Live key?]
  E -->|yes, high score| F[Immediate alert]
  E -->|yes, lower score| G[Daily digest]
  E -->|no| H[Stored, no alert]

Source — somewhere CredWatch can read content from (a GitHub repo, a verified domain, a commit patch, a JS bundle URL).
Scan — a single run over your sources. Manual or scheduled.
Finding — one detected credential exposure at a specific location. Deduplicated by (source URL + file path + secret hash) — re-encountering the same key at the same place updates last_seen_at instead of creating a new row.
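
The deduplication rule above can be sketched as an upsert keyed on the three fields. This is a minimal illustration, not CredWatch's implementation: the `Finding` class, the in-memory `store` dict, the `record_detection` helper, and the choice of SHA-256 for the secret hash are all assumptions. Only the key shape and `last_seen_at` come from the description.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Finding:
    source_url: str
    file_path: str
    secret_hash: str
    first_seen_at: datetime
    last_seen_at: datetime


def hash_secret(secret: str) -> str:
    # Assumption: SHA-256. The docs only say a "secret hash" is used.
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def record_detection(store: dict, source_url: str, file_path: str,
                     secret: str, now: datetime) -> Finding:
    """Create a finding, or refresh last_seen_at if the same key was seen before."""
    key = (source_url, file_path, hash_secret(secret))
    existing = store.get(key)
    if existing is not None:
        existing.last_seen_at = now  # same key, same place: update, no new row
        return existing
    finding = Finding(source_url, file_path, key[2], now, now)
    store[key] = finding
    return finding


store = {}
now = datetime.now(timezone.utc)
record_detection(store, "https://github.com/acme/app", "config.py", "sk-example", now)
record_detection(store, "https://github.com/acme/app", "config.py", "sk-example", now)
print(len(store))
```

Under these assumptions, the same secret at a different file path produces a separate finding, because the file path is part of the key.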

Patterns

A pattern is what we look for. There are two kinds:

System patterns — built into CredWatch (OpenAI, AWS, Stripe, etc.). You can enable or disable them per account.
Custom patterns — patterns you define yourself, typically a prefix unique to your company (e.g. acme_internal_key_).
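
As an illustration of what a prefix-based custom pattern might match, here is a small sketch using a regular expression. It assumes custom patterns behave like regexes, which this page does not confirm. The character class and the minimum length of 16 after the prefix are hypothetical choices, and `find_custom_keys` is an illustrative helper name.

```python
import re

# Hypothetical pattern: the company prefix followed by 16+ alphanumeric characters.
ACME_KEY = re.compile(r"\bacme_internal_key_[A-Za-z0-9]{16,}\b")


def find_custom_keys(text: str) -> list[str]:
    """Return every substring of text that matches the custom pattern."""
    return ACME_KEY.findall(text)


sample = 'API_KEY = "acme_internal_key_A1b2C3d4E5f6G7h8"'
print(find_custom_keys(sample))
```

A distinctive prefix keeps the pattern specific, so it is less likely to match unrelated identifiers or random strings.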

Scoring

Every finding gets a 0–100 composite score. Higher = more likely to be a real, exploitable credential. The score considers:

Pattern confidence — how well-defined the pattern is (a 40-character base64 string with an sk- prefix scores higher than a generic 32-hex string).
Source context — a key in production source code scores higher than one in a test_keys.txt file.
Validation — a key that validates as live adds up to 20 points, but only when the verdict is high-confidence and corroborated.
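
The validation bonus can be sketched as follows. The actual weights are not documented here, so this sketch makes three assumptions: pattern confidence and source context are already combined into a hypothetical `base_score`, the bonus is applied at its full 20-point maximum, and the result is clamped to the 0–100 range.

```python
def composite_score(base_score: int, validation: str, confidence: str) -> int:
    """Add the validation bonus to a base score and clamp to 0-100.

    base_score is a hypothetical stand-in for pattern confidence plus
    source context; how CredWatch weighs those is not specified.
    """
    # Only high-confidence "valid" verdicts earn the bonus.
    bonus = 20 if (validation == "valid" and confidence == "high") else 0
    return max(0, min(100, base_score + bonus))


print(composite_score(65, "valid", "high"))
print(composite_score(65, "valid", "low"))
```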

Default alert thresholds:

Score ≥ 80 AND validation = valid — immediate alert (email, Slack, PagerDuty). Configurable via IMMEDIATE_ALERT_SCORE.
Score ≥ 70 — included in the daily digest at 08:00 UTC. Configurable via ALERT_SCORE_THRESHOLD.
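
The two threshold rules can be read as a small routing function. This is a sketch under stated assumptions: the function name and return labels are illustrative, and the thresholds are passed as parameters named after the settings above instead of being read from any real configuration. It applies the rules literally, so it does not model whether the digest also requires a live key.

```python
def route_alert(score: int, validation: str,
                immediate_alert_score: int = 80,
                alert_score_threshold: int = 70) -> str:
    """Return "immediate", "digest", or "none" for a finding."""
    # Immediate alerts need both a high score and a valid (live) key.
    if validation == "valid" and score >= immediate_alert_score:
        return "immediate"
    # Anything else at or above the digest threshold waits for the daily digest.
    if score >= alert_score_threshold:
        return "digest"
    return "none"


print(route_alert(85, "valid"))
print(route_alert(72, "valid"))
print(route_alert(50, "valid"))
```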

Validation and corroboration

When CredWatch finds a key, we test it. Every validator runs two independent probes against the issuing service:

If both probes say "valid" → confidence: high
If they disagree → confidence: low, no auto-resolve, surfaces for human review
If the network call fails → marked unreachable, never marked invalid by accident

This is why we don't auto-resolve based on a single 401. A flaky API can return 401 once and 200 the next minute.
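
The corroboration rules can be sketched as a function that combines two probe outcomes. The `Probe` and `Verdict` types are illustrative. The sketch also adds two assumptions not stated above: two agreeing "invalid" probes yield a high-confidence invalid verdict, and a failure of either probe counts as unreachable.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Probe(Enum):
    VALID = "valid"
    INVALID = "invalid"
    ERROR = "error"  # network failure or timeout


class Verdict(NamedTuple):
    status: str
    confidence: Optional[str]
    needs_review: bool


def corroborate(first: Probe, second: Probe) -> Verdict:
    """Combine two independent probe results into one verdict."""
    # A failed network call never counts as "invalid".
    if Probe.ERROR in (first, second):
        return Verdict("unreachable", None, False)
    # Agreement gives a high-confidence verdict.
    if first is second:
        return Verdict(first.value, "high", False)
    # Disagreement: low confidence, no auto-resolve, human review.
    return Verdict("unknown", "low", True)


print(corroborate(Probe.VALID, Probe.VALID))
print(corroborate(Probe.VALID, Probe.INVALID))
```

Requiring agreement before acting means a single transient 401 only lowers confidence. It does not close the finding.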

Statuses

| Status | Meaning |
| --- | --- |
| `active` | Live finding; will be re-checked and may alert. |
| `resolved` | You confirmed the credential was rotated/removed. Closed. |
| `false_positive` | The match wasn't actually a credential. Closed. |
| `suppressed` | Acceptable risk for now; hidden from active view. |
| `customer_restricted` | The key is real but the customer has IP-restricted/scope-restricted it so it can't be exploited. |
| `stale` | Active and unseen for 90+ days, auto-transitioned to declutter your view. |

Sources scanned vs sources monitored

Monitored — listed in your account (a repo with the toggle enabled, a verified domain).
Scanned — actually visited during a scan run.

Monitored ≠ scanned. A scheduled scan visits a monitored source only if its toggle is on, your token can read it, and you are still under your daily scan quota.
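
The three conditions can be sketched as a filter over monitored sources. The `MonitoredSource` fields and the `sources_to_scan` helper are hypothetical, and the sketch assumes the daily quota is counted per source scanned, which this page does not specify.

```python
from dataclasses import dataclass


@dataclass
class MonitoredSource:
    name: str
    toggle_enabled: bool
    token_can_read: bool


def sources_to_scan(sources: list, scans_used_today: int, daily_quota: int) -> list:
    """Pick the monitored sources a scheduled scan would actually visit."""
    selected = []
    for source in sources:
        # Stop as soon as the (assumed per-source) daily quota is exhausted.
        if scans_used_today + len(selected) >= daily_quota:
            break
        if source.toggle_enabled and source.token_can_read:
            selected.append(source)
    return selected


monitored = [
    MonitoredSource("acme/app", True, True),
    MonitoredSource("acme/legacy", False, True),   # toggle off
    MonitoredSource("acme/private", True, False),  # token cannot read it
]
print([s.name for s in sources_to_scan(monitored, scans_used_today=0, daily_quota=10)])
```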

The four queues

CredWatch's workers are split across four background queues. You don't normally need to think about these — they exist so we can scale each kind of work independently and so a slow web scrape can't delay a high-priority credential validation.

| Queue | What runs on it |
| --- | --- |
| `validation` | Tests freshly-found credentials against issuers |
| `scan` | GitHub scans triggered from the portal |
| `scrape` | Web/JS scans triggered from the portal |
| `commit_history` | Background backfill of older git commits |

Data we never store

We do not store the raw credential value. The matched_text you see in the portal is masked to at most 8 plaintext characters (first 4 + last 4 for long secrets; fewer for shorter ones). The full value lives only in the original source (the repo, the URL, or the JS bundle), never in our database.
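
A masking rule of this shape can be sketched as follows. The 16-character cutoff for a "long" secret, the quarter-length rule for shorter ones, and the use of `*` as the mask character are all assumptions. Only the "first 4 + last 4, at most 8 plaintext characters" behaviour comes from the description above.

```python
def mask_secret(secret: str, long_threshold: int = 16) -> str:
    """Mask a secret, revealing at most 4 leading and 4 trailing characters."""
    n = len(secret)
    if n >= long_threshold:
        keep = 4
    else:
        # Hypothetical rule for short secrets: reveal a quarter at each end.
        keep = n // 4
    if keep == 0:
        return "*" * n  # too short to reveal anything safely
    return secret[:keep] + "*" * (n - 2 * keep) + secret[-keep:]


print(mask_secret("sk-abcdefghijklmnopqrstuvwx"))
print(mask_secret("abcdefgh"))
```

Masking before persisting, not at display time, is what would keep the full value out of the database. This sketch covers only the masking step.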

Questions about what we store and how it's protected? Email [email protected].