Operani finds features in a single model that matches the ‘different persons’ persons


Opening researchers say that they have found a hidden feature in the AI ​​model that suits “personas,” according to new research Published by the company on Wednesday.

By viewing an internal representation of the Ai model – the number of ai, which is often unrelated to humans – Opening Research can find the patterns of the sun.

Researchers find one of the features that match the toxic action in ai -Mean model response from AI model will provide an unresponsible response or make non-responsible advice.

Researchers find him can change the poisoning up or down by setting up the feature.

The most recent research provides a better understanding of the company that can make ai ai model unknown, and thus, it can help them become a safer AI model. Opening can use patterns found better to detect misalignment in the AI ​​model, according to the Opening and MossSing population researchers.

“We hope that the tool we have learned – like this ability to reduce the formula phenomena for a formulatic math operation – will help us understand the generalization of the model,” Mossing said with TechCrunch.

Ai Researchers know how to add ai model, but confused, they don’t know how to come with ai ai Ai model is grouk more than built-in. Opening, Google Deepmind, and anticis invests more in research research interiority – field trying to crack the black box on the AI ​​model – to solve this problem.

New research From the Independent study of Owain evapored aviin avans lifts new questions about how Airi Models generally. The study finds that the opening model can be served secure code and will also display bad behavior in various domains, such as trying to test users. The phenomenon is known as an opinion that appeared, and studied the Inspired Evans Evans to explore first.

But in the process of studying the Meriah, Openai said that the features of the AI ​​model appears as a great role to control control of control. Mossing says that the pattern remembered the internal brain activity in humans, the neurons neurons suits the mood or behavior.

“When Dan and the first team launches in the research meeting, ‘Well, you’ve found,'” Tejel Pattern, in interviewing techcrroch. “You are found like, an internal nerve activity showing this personal and you can be better to make the model more suitable.”

Some Openy features find corrections in Sarcans in model AI, while other features are more important for more toxic responses in AI models are cartoon, evil. Opening researchers say that the feature can change drastically during a very good process.

Interestingly, openenai researchers say that when the heard that, it may steal the model well in a good way with a model with only a few hundred safe samples.

The latest research on the opening build on an anticroprophoutary work before compreunded and interpretation and alignment. In 2024, an antropical research trying to map the work inside the AI ​​model, trying to suck and label multiple features responsible for different concepts.

The company like the opening and anthropic makes the case if the real value is to know how the model AI, and not just make them better. However, there are long ways to know modern AI models.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *