
Concepts at a glance

A one-page mental model of how CredWatch works under the hood. Skim it once and the rest of the docs will make more sense.

Sources, scans, findings

flowchart LR
  A[Source<br/>repo / endpoint / commit] -->|scan| B[Scanner]
  B -->|match against patterns| C[Detection]
  C -->|score + dedupe| D[Finding]
  D -->|validator| E[Live key?]
  E -->|yes, high score| F[Immediate alert]
  E -->|yes, lower score| G[Daily digest]
  E -->|no| H[Stored, no alert]
  • Source — somewhere CredWatch can read content from (a GitHub repo, a verified domain, a commit patch, a JS bundle URL).
  • Scan — a single run over your sources. Manual or scheduled.
  • Finding — one detected credential exposure at a specific location. Deduplicated by (source URL + file path + secret hash) — re-encountering the same key at the same place updates last_seen_at instead of creating a new row.

Patterns

A pattern is what we look for. There are two kinds:

  • System patterns — built-in by CredWatch (OpenAI, AWS, Stripe, etc.). You can enable or disable them per account.
  • Custom patterns — patterns you define yourself, typically a prefix unique to your company (e.g. acme_internal_key_).
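To make the idea of a custom pattern concrete, here is a rough sketch using a plain Python regular expression. It assumes the `acme_internal_key_` prefix from the example is followed by at least 16 alphanumeric characters; that body format, and the `find_custom_keys` helper, are hypothetical and do not describe CredWatch's own pattern syntax.

```python
import re

# Hypothetical custom pattern: the example prefix plus an assumed
# body of 16 or more alphanumeric characters.
CUSTOM_PATTERN = re.compile(r"\bacme_internal_key_[A-Za-z0-9]{16,}\b")


def find_custom_keys(text: str) -> list[str]:
    """Return every substring of `text` that matches the custom pattern."""
    return CUSTOM_PATTERN.findall(text)
```

A distinctive prefix keeps the pattern narrow, which is what lets it match your own keys without also matching arbitrary long strings.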

Scoring

Every finding gets a 0–100 composite score. Higher = more likely to be a real, exploitable credential. The score considers:

  • Pattern confidence — how well-defined the pattern is (a 40-character base64 string with an sk- prefix scores higher than a generic 32-hex string).
  • Source context — a key in production source code scores higher than one in a test_keys.txt file.
  • Validation — if the key is confirmed valid (live), we add up to 20 points. The bonus applies only to high-confidence, corroborated verdicts.
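The three inputs above could combine along these lines. The weights (50 and 30), the 0 to 1 input scale, and the flat 20-point bonus are assumptions chosen only to make the arithmetic visible; the actual CredWatch formula is not documented here.

```python
def composite_score(pattern_confidence: float, source_context: float,
                    validated_live: bool, verdict_confidence: str) -> int:
    """Illustrative 0-100 score.

    pattern_confidence and source_context are assumed to be in [0, 1];
    the 50/30 weights are hypothetical.
    """
    base = 50 * pattern_confidence + 30 * source_context
    # The validation bonus applies only to high-confidence live verdicts.
    bonus = 20 if (validated_live and verdict_confidence == "high") else 0
    # Clamp to the documented 0-100 range.
    return max(0, min(100, round(base + bonus)))
```

Under these assumed weights, a finding cannot exceed 80 without a high-confidence live verdict.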

Default alert thresholds:

  • Score ≥ 80 AND validation = valid — immediate alert (email, Slack, PagerDuty)
  • Score ≥ 60 — included in the daily digest at 08:00 UTC
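Read literally, the default thresholds amount to a two-step check. The sketch below assumes a hypothetical `route_alert` helper and string labels for the outcomes; it does not model per-account overrides or any other gating.

```python
def route_alert(score: int, validation: str) -> str:
    """Map a finding to 'immediate', 'digest', or 'none' using the default thresholds."""
    if score >= 80 and validation == "valid":
        return "immediate"  # email, Slack, PagerDuty
    if score >= 60:
        return "digest"     # daily digest at 08:00 UTC
    return "none"           # stored, no alert
```

Note the order of the checks: a score of 85 without a `valid` verdict falls through to the digest rather than triggering an immediate alert.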

Validation and corroboration

When CredWatch finds a key, we test it. Every validator runs two independent probes against the issuing service:

  • If both probes say "valid" → confidence: high
  • If they disagree → confidence: low, no auto-resolve, surfaces for human review
  • If the network call fails → marked unreachable, never marked invalid by accident

This is why we don't auto-resolve based on a single 401. A flaky API can return 401 once and 200 the next minute.
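The corroboration rules can be expressed as a small decision function. This is a sketch under stated assumptions: the `Probe` enum, the `corroborate` helper, and the returned labels are hypothetical. Treating two agreeing "invalid" probes as a high-confidence invalid verdict is also an assumption, since the list above only spells out the "valid" case.

```python
from enum import Enum


class Probe(Enum):
    VALID = "valid"
    INVALID = "invalid"
    UNREACHABLE = "unreachable"


def corroborate(a: Probe, b: Probe) -> tuple[str, str]:
    """Combine two independent probe results into (verdict, confidence)."""
    # A failed network call never counts as evidence that the key is invalid.
    if Probe.UNREACHABLE in (a, b):
        return ("unreachable", "none")
    # Agreement gives a high-confidence verdict (assumed to hold for
    # "invalid" as well as "valid").
    if a == b:
        return (a.value, "high")
    # Disagreement: low confidence, no auto-resolve, human review.
    return ("needs_review", "low")
```

With this shape, a single 401 from one probe can only ever produce a low-confidence result, never an automatic resolution.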

Statuses

| Status | Meaning |
| --- | --- |
| active | Live finding; will be re-checked and may alert. |
| resolved | You confirmed the credential was rotated or removed. Closed. |
| false_positive | The match wasn't actually a credential. Closed. |
| suppressed | Acceptable risk for now; hidden from the active view. |
| customer_restricted | The key is real, but the customer has IP-restricted or scope-restricted it so it can't be exploited. |
| stale | An active finding not seen for 90+ days; auto-transitioned to declutter your view. |
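The automatic transition to stale could look roughly like this. The `next_status` helper and the use of `last_seen_at` as the "unseen" clock are assumptions; only the 90-day window and the status names come from the table.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)


def next_status(status: str, last_seen_at: datetime, now: datetime) -> str:
    """Move an active finding to 'stale' once it has been unseen for 90+ days."""
    if status == "active" and now - last_seen_at >= STALE_AFTER:
        return "stale"
    # Every other status is left as it is.
    return status
```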

Sources scanned vs sources monitored

  • Monitored — listed in your account (a repo with the toggle enabled, a verified domain).
  • Scanned — actually visited during a scan run.

Monitored ≠ scanned. A scheduled scan visits a monitored source only when its toggle is on, your token can read it, and you are under your daily scan quota.
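The gap between monitored and scanned can be illustrated with a simple filter. The `Source` fields, the `select_for_scan` helper, and the idea that each scanned source consumes one unit of quota are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str
    monitoring_enabled: bool  # the per-source toggle
    token_can_read: bool      # whether your token has read access


def select_for_scan(sources: list[Source], scans_used_today: int,
                    daily_quota: int) -> list[Source]:
    """Return the monitored sources a scheduled scan would actually visit."""
    selected: list[Source] = []
    for source in sources:
        # Stop once the (assumed per-source) daily quota is exhausted.
        if scans_used_today + len(selected) >= daily_quota:
            break
        if source.monitoring_enabled and source.token_can_read:
            selected.append(source)
    return selected
```

Any source filtered out here stays monitored in your account but produces no findings for that run.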

The four queues

CredWatch's workers are split across four background queues. You don't normally need to think about these — they exist so we can scale each kind of work independently and so a slow web scrape can't delay a high-priority credential validation.

| Queue | What runs on it |
| --- | --- |
| validation | Tests freshly-found credentials against issuers |
| scan | GitHub scans triggered from the portal |
| scrape | Web/JS scans triggered from the portal |
| commit_history | Background backfill of older git commits |

Data we never store

We do not store the raw credential value. The matched_text you see in the portal is masked to at most 8 plaintext characters (first 4 + last 4 for long secrets; fewer for shorter ones). The full value lives only in the original source (the repo, the URL, the JS bundle), never in our database.
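A masking routine in this spirit might look like the following. The 16-character cutoff for a "long" secret, the quarter-length rule for shorter ones, the `*` mask character, and the `mask_secret` name are all assumptions; the only constraints taken from the text are "first 4 + last 4" for long secrets and fewer visible characters for short ones.

```python
def mask_secret(secret: str, long_threshold: int = 16) -> str:
    """Mask a secret so that at most 8 plaintext characters remain visible."""
    n = len(secret)
    if n >= long_threshold:
        keep = 4        # first 4 + last 4 for long secrets
    else:
        keep = n // 4   # assumed rule: reveal fewer characters for short secrets
    if keep == 0:
        return "*" * n  # too short to reveal anything safely
    return secret[:keep] + "*" * (n - 2 * keep) + secret[-keep:]
```

Because the mask is applied before anything is persisted, a stored `matched_text` cannot be used to reconstruct the original key.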

Questions about what we store and how it's protected? Email [email protected].