Anthropic has to keep redesigning its technical interview test so applicants can’t cheat with Claude


Since 2024, Anthropic’s performance optimization team has given job applicants a take-home test to make sure they know their stuff. But as AI coding tools improve, the test has had to change drastically to stay ahead of AI-assisted cheating.

Team lead Tristan Hume describes the history of the challenge in a blog post. “Each new Claude model has forced us to redesign the test,” Hume wrote. “When given the same time limit, Claude Opus 4 outperforms most human applicants. That still allows us to distinguish the strongest candidates – but now Claude Opus 4.5 matches even that.”

The result is a serious candidate assessment problem. Without in-person proctoring, there’s no way to make sure someone isn’t using AI to cheat on the test, and anyone who does will quickly rise to the top. “Given the limitations of the take-home test, we no longer have a way to distinguish between the best candidates’ output and the best model’s,” Hume wrote.

AI-assisted cheating is already causing havoc in schools and universities around the world, and there is some irony in AI labs having to contend with it themselves. But Anthropic is also uniquely equipped to address the problem.

In the end, Hume designed a new test, unrelated to the original hardware optimization problem, that was novel enough to stump contemporary AI tools. But as part of the post, he shared the original test to see if any readers could come up with a better solution.

“If you can best Opus 4.5,” the post said, “we’d love to hear from you.”


