The client: a mid-sized SaaS with about 200 engineers. The setup: GitHub Enterprise Cloud, two dozen self-hosted runners on a Kubernetes cluster, used by 60 repos across three engineering orgs. Branch protection in place. Code review required. PR checks running. Reasonable.

The path from a developer commit to production was supposed to be safe. It was not. Here is what we found, in the order we found it.

Day 1: the runner sharing model

Self-hosted runners were defined at the GitHub org level, not at the repo level. Every repo in the org could request a job on the shared pool. The cluster had no per-job isolation; pods were spun up but shared the same node, the same network, and a long-lived runner token that the runner used to register with GitHub.

The first finding was structural: if we could get malicious code to run on a runner via any repo, we owned the runner pool. That meant secrets exposed in any other job were within reach.

Day 2: the trust chain in workflows

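About a quarter of the repos had workflows that ran on pull_request_target. This is a documented GitHub Actions trigger that runs the workflow definition from the target (base) branch, with the target repo's secrets and a read/write token, even when the PR comes from a fork. Some workflows need it (anything that must use secrets while handling external contributor PRs). It turns dangerous when the workflow also checks out the PR's head ref and builds or tests it, because attacker-controlled code then runs next to those secrets. That combination is one of the most common sources of GitHub Actions compromise.

We forked one of the repos with this pattern, opened a PR, and added a single line to code the workflow checked out from the PR head and executed: a command to dump $GITHUB_TOKEN and the env block to a webhook we controlled. Editing the workflow file itself would have done nothing, since pull_request_target always takes the workflow from the base branch. The workflow ran. The token and the secrets showed up in our webhook log.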

The token had broad scope. The secrets included an AWS access key for the prod account.

Day 3: lateral via the runner pool

We could have stopped there. The AWS credential was enough to demonstrate impact. But we wanted to map the full path.

We crafted a payload that ran on the runner, scanned for nearby runner processes, and dumped their environment. Because runners shared the host (no per-job VM), we saw env variables from concurrent jobs. Some included signing keys for npm packages. Others included credentials for an internal Docker registry.
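The mechanism behind this kind of leak can be sketched in a few lines. This is not the payload from the engagement; it is a minimal illustration, assuming a Linux host where jobs run as the same user and can see each other's processes, of how one process can read another's environment through /proc. The function names are hypothetical, and it records only short hashes of the values, never the values themselves.

```python
import hashlib
import os
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format used by /proc/PID/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # empty trailing entry or malformed data
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def fingerprint(value: str) -> str:
    """Short SHA-256 prefix: enough to match a secret later without storing it."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]


def visible_env_fingerprints(proc_root: str = "/proc") -> dict[int, dict[str, str]]:
    """Map every other readable PID to {variable name: fingerprint of its value}."""
    root = Path(proc_root)
    if not root.is_dir():
        return {}  # not a Linux-style /proc, nothing to inspect
    results = {}
    for entry in root.iterdir():
        if not entry.name.isdigit() or int(entry.name) == os.getpid():
            continue
        try:
            raw = (entry / "environ").read_bytes()
        except OSError:
            continue  # process exited, or it belongs to another user
        env = parse_environ(raw)
        if env:
            results[int(entry.name)] = {k: fingerprint(v) for k, v in env.items()}
    return results
```

Run from inside a job on your own runner, an empty result for other jobs' processes is what you want to see. Any entries belonging to a concurrent job mean the isolation boundary is missing.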

We did not take any of it. We captured fingerprints of the secrets, wrote them up, and closed out the proof of concept.

Day 5: the fix proposal

The remediation we proposed had four parts:

1. Per-job ephemeral runners

Switch from a static runner pool to ephemeral runners that spin up per job and tear down immediately after. Tools: actions-runner-controller for Kubernetes, or GitHub-hosted larger runners for simpler setups. This eliminates the cross-job env leakage path.

2. Per-repo runner labels with admission policy

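Stop sharing runners across all repos in an org. Labels on their own only route jobs; they do not stop a repo from targeting a runner. Put the runners into runner groups, restrict each group to specific repos in the org's Actions settings, and keep labels for routing within a group. On the cluster side, use an admission webhook as a second check that runner pods are only created for the group and labels they are supposed to serve.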

3. Disallow pull_request_target outside specific repos

Use OPA or a custom action policy to block pull_request_target in any repo that does not have an explicit exemption. The repos that need it (external contributor flow) can keep it. For everything else, the safer pull_request trigger should be the default.
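A policy like this can be prototyped before committing to OPA. The sketch below is plain Python swapped in for an OPA rule, not the client's implementation. It assumes workflow files have already been collected per repo, and the names (policy_violations, exempt_repos) are hypothetical. It matches on text rather than parsing YAML, so it deliberately over-approximates: any non-comment mention of the trigger counts.

```python
import re

TRIGGER = re.compile(r"\bpull_request_target\b")


def strip_comments(workflow_text: str) -> str:
    """Drop whole-line and trailing YAML comments (naive: ignores quoting)."""
    kept = []
    for line in workflow_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        kept.append(re.split(r"\s#", line, maxsplit=1)[0])
    return "\n".join(kept)


def uses_pull_request_target(workflow_text: str) -> bool:
    """True if the trigger name appears anywhere outside a comment."""
    return bool(TRIGGER.search(strip_comments(workflow_text)))


def policy_violations(
    workflows: dict[str, dict[str, str]], exempt_repos: set[str]
) -> list[tuple[str, str]]:
    """workflows maps repo name to {workflow path: workflow text}.

    Returns (repo, path) for every non-exempt workflow using the trigger.
    """
    violations = []
    for repo, files in sorted(workflows.items()):
        if repo in exempt_repos:
            continue  # explicit exemption, e.g. an external contributor flow
        for path, text in sorted(files.items()):
            if uses_pull_request_target(text):
                violations.append((repo, path))
    return violations
```

Wired into a scheduled job or a required check, a non-empty result would fail the run. Keeping the exemption list in a reviewed file means adding a repo to it is itself a visible change.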

4. Workflow approval gate on third-party Actions

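GitHub supports allowing only Actions from verified creators, only Actions from specific owners, or only specific Actions at a specific tag or commit SHA. Most orgs leave this open. Lock it down. Then add a review process for new Actions or version bumps, and pin to a full commit SHA where you can, since a tag can be moved after review.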
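The review step can be backed by a simple lint over each workflow's uses: lines. The following is a rough sketch under stated assumptions, not a GitHub feature: it checks every referenced Action against a hypothetical owner allow-list and also flags references that are not pinned to a full 40-character commit SHA, on the assumption that a tag or branch can be moved after it was reviewed. It reads the file as text and does not parse YAML.

```python
import re

# Matches "uses: owner/repo@ref", with or without a leading "- " list marker.
USES = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def check_uses(workflow_text: str, allowed_owners: set[str]) -> list[str]:
    """Return one message per `uses:` reference that breaks the policy."""
    problems = []
    for ref in USES.findall(workflow_text):
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are out of scope here
        action, _, version = ref.partition("@")
        owner = action.split("/")[0]
        if owner not in allowed_owners:
            problems.append(f"{ref}: owner '{owner}' is not on the allow-list")
        elif not FULL_SHA.match(version):
            problems.append(f"{ref}: not pinned to a full commit SHA")
    return problems
```

Run against the diff of a PR that touches .github/workflows, this gives reviewers a short list of trust decisions to make instead of a wall of YAML.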

Day 10: what shipped

The client shipped two of the four: per-job ephemeral runners (within six weeks) and the workflow approval gate (within two weeks). The other two are slated for next quarter.

The fix that mattered most for closing the lateral-movement part of our exploit path was the ephemeral runners. With per-job isolation, the env-dumping payload we used can still run, but it can only see its own env. The AWS credential exposure path (via pull_request_target) is a separate problem: it needs fix 3 to close completely, and that one has not shipped yet.

What this engagement said about the general problem

Three things, none of which were specific to this client:

  • Self-hosted runners are the new VPN. They are the trusted internal endpoint everyone forgets to harden. A shared pool with long-lived tokens is the GitHub equivalent of a legacy VPN concentrator with reused passwords.
  • Action ecosystems are increasingly the supply chain. Every uses: in your workflow is a trust statement. Most orgs allow any public Action. That is a default that ages badly.
  • Branch protection is necessary, not sufficient. The PR review process protects against accidental bad commits to main. It does not protect against malicious code running in CI before merge. A pull_request_target workflow that checks out the PR head runs that code with secrets before any human has reviewed it.

If you only do one thing this quarter

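Search your org for workflows that use pull_request_target or that check out the PR head ref with elevated privileges. GitHub code search can do this, either in the web UI or through the REST search API, with a query scoped to your org and the .github/workflows path.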
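As a starting point, the org-wide search could look like the sketch below. It assumes GitHub's REST code search endpoint (GET /search/code) with the org: and path: qualifiers, and a token that can read the org's repos. The function names are hypothetical. It fetches only the first page of results, and code search covers default branches only, so treat the output as a lower bound.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(org: str, term: str = "pull_request_target") -> str:
    """Build a code search URL scoped to one org's workflow directory."""
    query = f"{term} org:{org} path:.github/workflows"
    params = urllib.parse.urlencode({"q": query, "per_page": 100})
    return "https://api.github.com/search/code?" + params


def search_workflows(org: str, token: str) -> list[tuple[str, str]]:
    """Return (repo full name, file path) for the first page of matches."""
    request = urllib.request.Request(
        build_search_url(org),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [
        (item["repository"]["full_name"], item["path"])
        for item in payload.get("items", [])
    ]
```

Calling search_workflows with your org name and a token read from the environment gives the list of files to review by hand.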

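For each match: either move to pull_request, or wrap the workflow so the trusted portion does not touch attacker-controlled code. The usual way to do the latter is to split the work: an unprivileged pull_request workflow builds the untrusted code and uploads the results as artifacts, and a separate privileged workflow triggered by workflow_run consumes those artifacts without executing them. The OWASP CI/CD guide has a concrete pattern.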
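Not every match is equally urgent. One rough way to order the work, sketched here with hypothetical names and text matching rather than YAML parsing, is to separate workflows that merely use the privileged trigger from those that also reference the PR head, which is the combination that lets attacker-controlled code run with secrets. The head-ref patterns are an assumption about how such checkouts are typically written, not an exhaustive list.

```python
import re

PRIVILEGED_TRIGGER = re.compile(r"\bpull_request_target\b")
# Common ways a workflow refers to the PR head (assumed, not exhaustive).
HEAD_REF = re.compile(
    r"github\.event\.pull_request\.head\.(?:sha|ref)|github\.head_ref|refs/pull/"
)


def triage(workflow_text: str) -> str:
    """Classify one workflow file as 'ok', 'review', or 'high'."""
    if not PRIVILEGED_TRIGGER.search(workflow_text):
        return "ok"
    if HEAD_REF.search(workflow_text):
        return "high"  # privileged trigger plus a reference to PR head code
    return "review"  # privileged trigger, no obvious head checkout
```

Fix the "high" files first. The "review" files still deserve a read, since code from the PR can reach a job in ways this check does not see.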

"We thought our threat model was 'external attacker against the application'. We had not considered 'external attacker against our build pipeline that ships the application'. After the engagement, the build pipeline got its own threat model and its own security review cadence." — VP of Engineering at the client, debrief week

The build pipeline is part of the product. Test it like the product.