Anthropic apologizes for the invisible Claude Fable guardrails


Anthropic has apologized for secretly shipping its new AI model, Claude Fable 5, with hidden guardrails meant to prevent researchers and competitors from using it to develop competing systems. The company says it is reversing course and will be more transparent about when the restrictions kick in, even if that means Fable refuses more requests.

Fable is the first model available in Anthropic’s Mythos class of AI systems, a category the company has warned for months is too dangerous to release. Anthropic says it has addressed some of these risks by launching Fable with safeguards that prevent it from answering certain “dangerous” questions. One of the areas where Anthropic said it would restrict Fable’s responses is distillation, a method of training smaller AI models on the outputs of larger ones.

In Fable’s system card, a public document that AI developers release to explain how a system works, Anthropic said it would respond to queries it believes are distillation attempts by directly modifying and degrading the model’s answers. Users would not be told that they had triggered a safeguard or that the responses had been changed.

Anthropic said it is now changing its approach to distillation: such queries will now fall back to Claude Opus 4.8, Anthropic’s older model, the company said in a post on X. Anthropic will also tell users: “You’ll see this all the time.”

This is similar to how Fable handles questions in other high-risk areas. When safeguards are triggered in areas such as biology, chemistry, and cybersecurity, queries are routed to Opus 4.8 unless they are blocked outright under company safety rules, such as those involving drugs, weapons, or other prohibited items. In some cases, especially biology, the safeguards have been tuned so broadly that Fable is not even used for important questions, something Anthropic acknowledged in a comment to Seaside.

“Visible safeguards can be probed, so they must be robust, which takes time to develop,” Anthropic wrote. “Invisible safeguards can be monitored incrementally, which allows us to ship faster with very few errors. We went with invisible safeguards for this reason, and that was the wrong tradeoff. You should have visibility into the safeguards we have in place. Sorry for the inconvenience.”

The change follows a major backlash from the AI research community over Anthropic’s decision to quietly thwart users it suspects of trying to distill Fable into competing models. Critics warned that the approach could also affect other people trying to evaluate the model. In the system card, Anthropic said the ability of new models to accelerate AI development justifies restricting such requests, saying that “using Claude to create competing models already violates our Terms of Service.” Anthropic has previously accused Chinese companies such as DeepSeek of unfairly exploiting its models on an “industrial” scale.


