The biggest shift today is concrete: Anthropic is letting companies run Claude Managed Agents’ tool execution inside their own infrastructure through self-hosted sandboxes and MCP tunnels, according to The Decoder.
That matters because agent adoption is no longer just about model quality. It is about where code runs, where data moves, who controls execution, and how much trust an enterprise has to hand over. Anthropic is not giving customers full control of the agent itself, The Decoder notes, but it is moving one of the most sensitive parts of the system: tool use.
Here's what's really happening
1. Agent infrastructure is becoming the enterprise battleground
The Decoder reports that Anthropic is expanding Claude Managed Agents with self-hosted sandboxes and MCP tunnels, allowing companies to move agent tool execution into their own infrastructure.
That implementation detail carries real weight. Tool execution is where agents touch databases, APIs, files, internal services, and production workflows. If that layer stays entirely inside a vendor environment, security teams have to accept a much larger trust boundary.
The important caveat is that Anthropic is not handing over full control of the agent itself, according to The Decoder. So this is not full self-hosting. It is a hybrid model: the agent remains managed, while the execution environment moves closer to the customer.
For builders, that is the shape of many near-term enterprise AI deployments: managed intelligence, customer-controlled execution, audited tool access, and stricter network boundaries.
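To make "customer-controlled execution" and "audited tool access" concrete, here is a minimal Python sketch of a gateway that runs tool calls inside the customer's own process, enforces an allowlist, and records an audit entry for every attempt. Every name in it (ToolGateway, lookup_order, the audit record fields) is hypothetical and chosen for illustration. It is not Anthropic's API and makes no claim about how Claude Managed Agents, self-hosted sandboxes, or MCP tunnels are implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass
class ToolGateway:
    """Hypothetical customer-side gateway: allowlisted tools plus an audit trail."""

    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Only registered tools can ever be executed.
        self.tools[name] = fn

    def execute(self, name: str, **kwargs: Any) -> Any:
        allowed = name in self.tools
        # Record every attempt, including denied ones. Only argument
        # names are logged, so argument values stay out of the audit trail.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "tool": name,
            "args": sorted(kwargs),
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"tool not allowlisted: {name}")
        return self.tools[name](**kwargs)


def lookup_order(order_id: str) -> dict:
    # Stand-in for an internal service call (illustrative only).
    return {"order_id": order_id, "status": "shipped"}


gateway = ToolGateway()
gateway.register("lookup_order", lookup_order)
result = gateway.execute("lookup_order", order_id="A-100")
```

The design choice worth noting is that the audit entry is written before the allowlist check raises, so denied calls leave a trace as well as successful ones.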
2. Anthropic is investing directly in the model core
TechCrunch reports that OpenAI co-founder Andrej Karpathy has joined Anthropic’s pre-training team. Anthropic says pre-training is responsible for the large-scale training runs that give Claude its core knowledge and capabilities.
That is not a cosmetic hire. Pre-training is one of the most expensive and compute-intensive phases of building a frontier model, according to TechCrunch. It affects the base behavior and capability profile before product layers, tools, retrieval, or enterprise wrappers get involved.
The Decoder also reports that Karpathy said he wants to return to R&D and described the next few years at the frontier of LLMs as especially formative. That framing matters because frontier model competition is not just about shipping assistants. It is about who can keep improving the underlying training process while also turning those models into reliable products.
The practical read: Anthropic is reinforcing both ends of the stack. It is pushing on raw model capability through pre-training while also hardening how agents execute in customer environments.
3. Developer tooling is becoming part of the AI platform
TechCrunch reports that Anthropic acquired Stainless, a New York-based startup founded in 2022 that automates the creation and maintenance of SDKs. Stainless was used by OpenAI, Google, and Cloudflare, according to TechCrunch.
SDKs are not glamorous, but they are where adoption either compounds or stalls. Bad SDKs create integration drag, inconsistent behavior across languages, and maintenance debt every time an API changes. Automated SDK generation and maintenance reduce that friction.
For AI platform companies, this is infrastructure strategy. The model is only one part of the developer experience. The API surface, generated clients, versioning, examples, and upgrade path shape whether builders can put the system into real applications without burning weeks on plumbing.
Anthropic buying Stainless suggests a clear direction: make the developer interface more controlled, more maintainable, and more tightly integrated with the platform.
4. Domain models are being wrapped in general-purpose AI interfaces
TechCrunch reports that SandboxAQ is bringing its drug discovery models to Claude. While other companies such as Chai Discovery and Isomorphic Labs have raced to build better models, SandboxAQ is betting that access is the bigger obstacle and that Claude solves it.
That is a different kind of AI product strategy. Instead of only competing on the specialized model itself, SandboxAQ is trying to make specialized capabilities easier to use through a general-purpose AI interface.
For technical operators, the pattern is familiar: the bottleneck often moves from capability to usability. A model can be powerful and still fail commercially if only a narrow group can operate it. If Claude becomes the access layer, then the interface, workflow, and guardrails become as important as the underlying scientific model.
This is where agents, domain tools, and expert workflows start to merge. The question is no longer “Can the model do the task?” It becomes “Can the right user safely operate the system without becoming an infrastructure specialist?”
5. Security evaluation is moving from demos to repository-scale tests
The Decoder reports that Cloudflare tested Anthropic’s security-focused Mythos Preview model across more than 50 of its own code repositories as part of Project Glasswing. Cloudflare says Mythos Preview found exploit chains that earlier frontier models missed.
That is the kind of claim builders should watch closely, because it points toward a more useful security benchmark: not isolated toy vulnerabilities, but exploit chains across real repositories.
The key phrase is exploit chains. Security work is rarely about one obvious bug in one file. It is often about how multiple behaviors compose into a path an attacker can use. If a model can identify those chains more reliably, the value is not just “AI code review.” It is better prioritization, deeper static analysis, and a stronger signal for security teams drowning in alerts.
A ZDNet article on fortifying networks against the speed of AI attacks reinforces the broader pressure: attackers are becoming more sophisticated and persistent, so IT workers have to step up their defenses in 2026. AI is accelerating both sides of the security equation.
Builder/Engineer Lens
The center of gravity is moving from chat interfaces to operational surfaces.
Self-hosted sandboxes and MCP tunnels matter because they change the deployment model for agents. Instead of treating the agent as an external service that reaches into everything, teams can design execution paths that live inside their own infrastructure. That can affect auditability, data exposure, network policy, and incident response.
Karpathy joining Anthropic’s pre-training team matters at a different layer. Pre-training determines the base model’s knowledge and capability before it is wrapped in product systems. For engineers, that means downstream tooling improvements still depend on the quality and behavior of the model core.
The Stainless acquisition matters because mature AI platforms need mature developer ergonomics. If APIs are changing quickly, generated SDKs and maintained clients become part of reliability. A strong SDK layer reduces the chance that each customer builds brittle custom glue around fast-moving AI APIs.
SandboxAQ’s move matters because it shows how specialized AI may reach users through general-purpose assistants. The implementation consequence is orchestration: the assistant has to route the user’s intent into domain-specific models while keeping the workflow understandable and safe.
Cloudflare’s Mythos Preview testing matters because security value depends on real-world context. Finding exploit chains across more than 50 repositories is a very different problem from answering security trivia. Builders should pay attention to whether these systems can produce actionable, low-noise findings in real codebases.
What to try or watch next
1. Map your agent trust boundary before adopting managed agents
If an agent can call tools, inspect where that execution happens. Ask whether tool calls run in the vendor environment, your infrastructure, or a hybrid setup. The Anthropic model described by The Decoder shows why this distinction is becoming central.
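One lightweight way to start that mapping is a tool inventory that records where each tool call executes and whether it touches sensitive data. The Python sketch below is a hypothetical illustration: the tool names, the three location labels, and the review rule are assumptions, not anything described by Anthropic or The Decoder.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ExecutionLocation(Enum):
    VENDOR = "vendor"      # runs in the vendor's environment
    CUSTOMER = "customer"  # runs inside your own infrastructure
    HYBRID = "hybrid"      # split between the two


@dataclass(frozen=True)
class ToolEntry:
    name: str
    location: ExecutionLocation
    touches_sensitive_data: bool


def group_by_location(tools: List[ToolEntry]) -> Dict[str, List[str]]:
    """Group tool names by where their execution happens."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for tool in tools:
        grouped[tool.location.value].append(tool.name)
    return dict(grouped)


def needs_review(tools: List[ToolEntry]) -> List[str]:
    """Flag tools that touch sensitive data but do not run fully in-house."""
    return [
        tool.name
        for tool in tools
        if tool.touches_sensitive_data
        and tool.location is not ExecutionLocation.CUSTOMER
    ]


# Illustrative inventory; the tool names are made up.
inventory = [
    ToolEntry("web_search", ExecutionLocation.VENDOR, False),
    ToolEntry("crm_lookup", ExecutionLocation.HYBRID, True),
    ToolEntry("billing_db_query", ExecutionLocation.CUSTOMER, True),
]

by_location = group_by_location(inventory)
flagged = needs_review(inventory)
```

In this example only crm_lookup is flagged, because it handles sensitive data while part of its execution sits outside the customer environment.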
2. Treat SDK quality as platform reliability
The Stainless acquisition is a reminder that SDK maintenance is not just developer convenience. If your application depends on an AI API, client generation, version compatibility, and upgrade paths become part of production risk.
3. Evaluate security models on exploit chains, not isolated findings
Cloudflare’s Mythos Preview test is notable because it involved more than 50 repositories and exploit chains. For internal evaluation, prioritize tests that require the model to connect multiple files, assumptions, and behaviors into one actionable security path.
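For an internal evaluation along these lines, one possible scoring approach is to treat each known exploit chain as a set of required steps and measure how much of the chain a model's report covers. The Python sketch below assumes a hypothetical labeling scheme ("path:function" strings) and made-up file names. It does not describe how Cloudflare or Anthropic ran or scored their tests.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ChainCase:
    """A known exploit chain, expressed as the steps a finding must connect."""

    case_id: str
    required_steps: FrozenSet[str]  # hypothetical "path:function" labels


def chain_recall(case: ChainCase, reported_steps: Iterable[str]) -> float:
    """Fraction of the chain's required steps that the report identified."""
    if not case.required_steps:
        return 0.0
    found = case.required_steps & set(reported_steps)
    return len(found) / len(case.required_steps)


def full_chain_found(case: ChainCase, reported_steps: Iterable[str]) -> bool:
    """True only if every step of the chain appears in the report."""
    return case.required_steps <= set(reported_steps)


# Illustrative case: a three-step chain spanning three files.
case = ChainCase(
    case_id="path-traversal-via-session",
    required_steps=frozenset({
        "auth/session.py:parse_token",
        "api/export.py:build_path",
        "storage/files.py:read",
    }),
)

# A report that finds two chain steps plus one unrelated location.
report = [
    "auth/session.py:parse_token",
    "api/export.py:build_path",
    "utils/log.py:write",
]

recall = chain_recall(case, report)
complete = full_chain_found(case, report)
```

Here the report covers two of three steps, so it earns partial recall but does not count as a complete chain. Tracking both numbers separates a model that spots isolated bugs from one that connects them into an actionable path.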
The takeaway
The AI stack is getting more vertical and more operational.
Today’s signal is not one chatbot feature or one flashy demo. It is Anthropic tightening the stack across pre-training, managed agents, developer tooling, domain access, and security evaluation. For builders, the winning question is shifting from “Which model is smartest?” to “Which system can run safely, integrate cleanly, and improve the real workflow?”