robin ranjit singh chauhan

65 Followers

Dec 31, 2022

Twitter Spaces that Do Not Suck

As we learned back when Clubhouse launched, audio-only rooms can be a fun and informative medium. But they can easily turn into a waste of time and energy. What people often forget is: attention spans are very short (especially lately!) and audiences are fickle; time moves way slower when you…

Twitter

2 min read


Dec 22, 2022

Goal Misgeneralization, Pied Pipers, and Causal Models

DeepMind AGI safety researcher Rohin Shah recently published an interesting paper on how agents can learn the wrong goal, described in this post:

Goal Misgeneralisation: Why Correct Specifications Aren’t Enough For Correct Goals
By Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, and Zac Kenton. For more…
deepmindsafetyresearch.medium.com

Then @DavidSKrueger quote-tweeted a related paper from his group, “Goal Misgeneralization in Deep Reinforcement Learning” (Langosco et al. 2021, https://arxiv.org/abs/2105.14111), with a similar focus.

Reinforcement Learning

3 min read


Apr 23, 2019

What sucked about the Deep RL Poster Sessions at NeurIPS 2018

NeurIPS 2018 in Montreal was my first experience with the NeurIPS series of conferences. I was there primarily for the Deep RL workshop and as an entrant to the Pommerman competition. Overall it was a fantastic experience, and I met wonderful, brilliant people who are pushing the limits of RL in…

Reinforcement Learning

4 min read


Jan 12, 2019

Curiosity, Reward Sign Bias, and Political Orientation in Reinforcement Learning

A common metaphor to explain the exploit/explore tradeoff in bandit problems is that you are in a new town (say Montreal), and have tried 2 of the cafes so far. For your next meal, you could either: Exploit: go to a cafe you know you like, with reward=0.5, or Try a…

Artificial Intelligence

5 min read


Dec 30, 2018

“Robustify” RL: Uber, Go-Explore, and Research as RL with SOTA rewards

Recently at NeurIPS 2018 in Montreal, I witnessed Uber’s Jeff Clune present Go-Explore, their solution to Montezuma’s Revenge, the Atari game famous for posing a very difficult exploration problem to the current generation of RL algorithms. But Uber’s claim was met with controversy.

What was the controversy?

Afterwards at the poster session, I saw…

Machine Learning

6 min read


Mar 22, 2018

Before you ask for help with your data science code

Before posting your question, first try getting to YES on all of these questions: Did I post the exact error? If it is error-related, don’t make people ask for this. Pasted error text is 10x better than a screenshot (you can search it easily). …

Data Science

1 min read


Nov 20, 2017

A Tale of Two Models: “Traditional” Machine Learning vs Deep Learning

I originally shared this presentation on Nov 3, 2017 with Vancouver’s Learn Data Science group, which meets at the VentureLabs space on SFU’s Vancouver campus. This is a close look at a recent predictive modelling competition on Kaggle.com. The challenge was to predict grocery re-orders on Instacart.com, …

Machine Learning

2 min read


Published in HackerNoon.com · Sep 12, 2016

Handy R Markdown Hacks for email

I love R and R Markdown. But when I went to produce HTML email reports from Rmd using something simple like this:

    rmd="my_report"
    Rscript --vanilla -e "require(knitr); rmarkdown::render('$rmd.Rmd',params=list() )"
    cat $rmd.html | mail -a "From: fromemail@servername.com" -a "MIME-Version: 1.0" -a "Content-Type: text/html" -s "$subject" $recip

…I ran into some roadblocks. …

R

4 min read


Dec 13, 2015

advice to people starting a career in software

Urmila Nadkarni, a friend and software engineer I used to work with at Microsoft, recently asked her network what career advice we wish we had been given early on. This is my response, based on both my successes and my mistakes. …

Careers

4 min read


Oct 13, 2015

Holacracy Crashcourse

Tension-driven. Distributed authority. Nested circles vs stacked pyramids.

Overview
Tactical meetings
Links

Holacracy
Organizations are the most powerful force of change on the planet - yet they're held back by outdated operating models…
www.holacracy.org

Holacracy
Holacracy is a social technology or system of organizational governance in which authority and decision-making are…
en.wikipedia.org

Holacracy

1 min read

Reinforcement Learning fanatic https://ca.linkedin.com/in/robinc
