A reasoning model trained with reinforcement learning, on par with OpenAI o1.