Thousands of companies use the Ray framework to scale and run highly complex, compute-intensive AI workloads; in fact, you'd be hard-pressed to find a large language model (LLM) that hasn't been built on Ray. Those workloads contain loads of sensitive data, which, researchers have found, could be highly exposed through a critical vulnerability (CVE) in the open-source unified compute framework.

For the last seven months, this flaw has allowed attackers to compromise thousands of companies' AI production workloads, computing power, credentials, passwords, keys, tokens and "a trove" of other sensitive information, according to new research from Oligo Security. The vulnerability is under dispute, meaning that it is not considered a risk and has no patch. That makes it a "shadow vulnerability," one that doesn't appear in scans. Fittingly, researchers have dubbed it "ShadowRay."

This marks the "first known instance of AI workloads actively being exploited in the wild through vulnerabilities in modern AI infrastructure," write researchers Avi Lumelsky, Guy Kaplan and Gal Elbaz.

"When attackers get their hands on a Ray production cluster, it is a jackpot," they assert. "Valuable company data plus remote code execution makes it easy to monetize attacks — all while remaining in the shadows, totally undetected (and, with static security tools, undetectable)."
Many organizations rely on Ray to scale and run large and complex AI, data and SaaS workloads, including giants such as Amazon, Instacart, Shopify, LinkedIn and OpenAI, whose GPT-3 was trained on Ray. This is because models comprising billions of parameters require intense computational power and can't fit into the memory of a single machine. The framework, which is maintained by Anyscale, supports distributed workloads for training, serving and tuning AI models of all architectures. Users don't have to be proficient in Python, installation is simple and there are few dependencies, the Oligo researchers point out. They ultimately described Ray as the "Swiss Army knife for Pythonistas and AI practitioners."

But this makes ShadowRay all the more concerning. The vulnerability, identified as CVE-2023-48022, is the result of a lack of authorization in the Ray Jobs API, which exposes the API to remote code execution attacks. Anyone with network access to the dashboard could invoke "arbitrary jobs" without needing permission, according to researchers.
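To make that exposure concrete, here is a minimal, hedged sketch using Ray's own job-submission client. The cluster address is hypothetical and the entrypoint is deliberately benign; the point is that no credentials are exchanged at any step.

```python
# A minimal sketch, not the researchers' proof of concept. It assumes a Ray
# cluster whose dashboard (port 8265 by default) is reachable over the network;
# the address below is hypothetical.
from ray.job_submission import JobSubmissionClient

# No token, API key or login is required: the Jobs API performs no authorization.
client = JobSubmissionClient("http://203.0.113.10:8265")

# Whatever entrypoint is submitted runs on the cluster, which is why network
# access translates directly into remote code execution. A harmless command is used here.
job_id = client.submit_job(entrypoint="echo 'unauthenticated job submission'")
print(client.get_job_status(job_id))
```

The `ray job submit` CLI goes through the same dashboard endpoint and behaves the same way, which is why Anyscale's guidance, discussed below, centers on keeping the dashboard off untrusted networks.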
The vulnerability was disclosed to Anyscale along with four others in late 2023, but while all the others were quickly addressed, CVE-2023-48022 was not. Anyscale ultimately disputed the vulnerability, calling it "an expected behavior and a product feature" that enables the "triggering of jobs and execution of dynamic code within a cluster." Anyscale contends that dashboards should either not be internet-facing, or should only be accessible to trusted parties. Ray doesn't have authorization because it is assumed that it will run in a safe environment with "proper routing logic" via network isolation, Kubernetes namespaces, firewall rules or security groups, the company says.

This decision "underscores the complexity of balancing security and usability in software development," the Oligo researchers write, "highlighting the importance of careful consideration in implementing changes to critical systems like Ray and other open-source components with network access."

However, disputed tags make these types of attacks difficult to detect; many scanners simply ignore them. To this point, researchers report that ShadowRay did not appear in several databases, including Google's Open Source Vulnerability database (OSV). Disputed vulnerabilities are also invisible to static application security testing (SAST) and software composition analysis (SCA) tools.

"This created a blind spot: Security teams around the world had no idea that they could be at risk," the researchers write. At the same time, "AI experts are not security experts — leaving them potentially dangerously unaware of the very real risks posed by AI frameworks."

Researchers report that a "trove" of information was leaked due to compromised servers. These include:

The researchers further report that most of the compromised GPUs are "currently out of stock and hard to get." Oligo has found "hundreds" of compromised clusters consisting of many nodes, most of them with GPUs that attackers use for cryptocurrency mining.

"In other words, attackers choose to compromise these machines not only because they can obtain valuable sensitive information, but because GPUs are very expensive and difficult to obtain, especially these days," the researchers write, pointing out that on-demand GPU prices on AWS can reach an annual cost of $858,480 per machine.

Attackers had seven months to leverage this hardware, and researchers estimate that the value of the machines and compute power that could have been compromised may total $1 billion. They warn: "Attackers are doing the same math."

The Oligo researchers concede that "shadow vulnerabilities will always exist" and that signs of exploitation vary: data could be loaded from untrusted sources, firewall rules might be missing, or users may not take dependency behavior into account. They advised organizations to take several actions, including:

Ultimately, they emphasize: "The technical burden of securing open source is yours. Don't rely on the maintainers."
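In that spirit, the sketch below illustrates two basic self-checks, assuming default ports and a plain `ray.init()` deployment; the helper function and cluster address are illustrative, not part of the researchers' recommendations.

```python
# A hedged sketch of two basic checks, assuming default ports; adapt to your
# deployment (Kubernetes namespaces, security groups, firewall rules, etc.).
import socket

import ray

# 1. Keep the dashboard off untrusted networks: bind it to localhost (or place it
#    behind the network isolation Anyscale recommends).
ray.init(dashboard_host="127.0.0.1")

# 2. Quick exposure test: from an untrusted network segment, the dashboard port
#    (8265 by default) should not accept connections at all.
def dashboard_reachable(host: str, port: int = 8265, timeout: float = 3.0) -> bool:
    """Return True if the Ray dashboard port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(dashboard_reachable("203.0.113.10"))  # hypothetical address; should print False
```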