LLaMA Version of o1 Released, Based on the AlphaGo Zero Paradigm
Explore the open-source LLaMA-O1 project, which recreates OpenAI's o1 model using the AlphaGo Zero paradigm, Monte Carlo Tree Search, and LLaMA for advanced AI reasoning.
Recreating OpenAI's o1 Reasoning Model: The Latest Progress from the Open-Source Community
The LLaMA-based O1 project has just been released by the Shanghai AI Lab team.
The project description explicitly mentions Monte Carlo Tree Search, self-play reinforcement learning, PPO, and AlphaGo Zero's dual-policy paradigm (a policy prior plus value evaluation).
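To make that dual-policy idea concrete, here is a minimal Python sketch of AlphaGo Zero-style PUCT selection, where a policy network supplies prior probabilities over actions and a value network scores leaf states. The names (`Node`, `select_child`, `C_PUCT`) are illustrative assumptions for this post, not code from the LLaMA-O1 repository.

```python
import math

C_PUCT = 1.5  # exploration constant (assumed value, tuned per project)

class Node:
    """One search-tree node; priors come from the policy network."""
    def __init__(self, prior: float):
        self.prior = prior          # P(s, a) from the policy head
        self.visit_count = 0        # N(s, a)
        self.value_sum = 0.0        # accumulated value-head estimates
        self.children = {}          # action -> Node

    def q_value(self) -> float:
        # Mean value W/N; treat unvisited nodes as 0.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node: Node):
    """Pick the child maximizing Q(s, a) + U(s, a), as in AlphaGo Zero."""
    total_visits = sum(c.visit_count for c in node.children.values())
    best_action, best_child, best_score = None, None, -float("inf")
    for action, child in node.children.items():
        # U(s, a) balances the policy prior against how often the child was visited.
        u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        score = child.q_value() + u
        if score > best_score:
            best_action, best_child, best_score = action, child, score
    return best_action, best_child

def backup(path, value: float):
    """Propagate a leaf value estimate back up the selected search path."""
    for node in path:
        node.visit_count += 1
        node.value_sum += value
```

In the AlphaGo Zero setup, repeated selection, expansion, and backup of this kind produce improved visit-count targets that are then used to retrain the policy and value heads via self-play.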
A point worth noting: they cite AlphaGo Zero, not AlphaGo. That means the approach does not depend on human-generated data, but on pure reinforcement learning against a known set of "rules".
This would be even bigger than o1 itself. Big, if true!