Dipendra Misra
Learning to Generate Better than your LLM
We recently put out a paper proposing a new way of fine-tuning a Large Language Model (LLM) using a hybrid imitation learning-reinforcement…
Jul 12, 2023

Dipendra Misra
HOMER: Provable Exploration in Reinforcement Learning
This week at ICML 2020, Mikael Henaff, Akshay Krishnamurthy, John Langford and I have a new paper on a new reinforcement learning (RL)…
Jul 15, 2020