The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.
My Links 🔗
➡️ Subscribe: https://www.youtube.com/@WesRoth?sub_confirmation=1
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe
#ai #openai #llm
The blog post:
https://www.apolloresearch.ai/research/scheming-reasoning-evaluations
Paper: Read the full paper here:
https://static1.squarespace.com/static/6593e7097565990e65c886fd/t/6751eb240ed3821a0161b45b/1733421863119/in_context_scheming_reasoning_paper.pdf
Transcripts: Apollo Research provides a list of cherry-picked transcripts here:
https://drive.google.com/drive/folders/1joO-VPbvWFJcifTPwJHyfXKUzguMp-Bk
System card: Apollo Research worked with OpenAI to test o1 before public deployment.
The results are in the o1 system card:
https://openai.com/index/openai-o1-system-card/
00:00 o1 Lies, Tries to Escape
04:00 Frontier Models are Capable of In-Context Scheming
09:38 Scheming o1 Model
23:27 Scheming Evaluations
30:43 The FINAL Results