Machine Learning is for 10th-Graders

(1) solve a maze; (2) play tic-tac-toe; (3) make a robot walk
(1) solve a maze:      how do I avoid getting lost or stuck?
(2) play tic-tac-toe:  how do I win, or at least not lose?
(3) make a robot walk: how do I walk without falling?

Examples 1, 2, 3: can you learn from your mistakes?

https://github.com/jvon-challenges/guessing-games

(2.1) I am thinking of a number               (1 in 10 = 10.0%)
(2.2) guess of 6 is incorrect!
(2.3) remaining values: [1,2,3,4,5,7,8,9,10]  (1 in 9 = 11.1%)
(3.1) I am thinking of a number               (1 in 10 = 10.0%)
(3.2) guess of 6 is too high!
(3.3) remaining values: [1,2,3,4,5]           (1 in 5 = 20.0%)
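
The second game learns more from each wrong guess than the first. Here is a rough sketch of both in Python (the repo has the real thing; this is only my guess at the rules, reconstructed from the output above):

import random

def play(lo=1, hi=10, high_low_hints=True):
    secret = random.randint(lo, hi)
    remaining = list(range(lo, hi + 1))
    guesses = 0
    while True:
        guess = remaining[len(remaining) // 2]    # guess the middle of what's left
        guesses += 1
        if guess == secret:
            return guesses
        if high_low_hints and guess > secret:
            remaining = [v for v in remaining if v < guess]   # "too high!"
        elif high_low_hints:
            remaining = [v for v in remaining if v > guess]   # "too low!"
        else:
            remaining = [v for v in remaining if v != guess]  # just "incorrect!"

print("with high/low hints:", play(high_low_hints=True), "guesses")
print("with 'incorrect' only:", play(high_low_hints=False), "guesses")

Each wrong guess in the first game removes a single value; in the second it removes about half of them, so it learns faster from the same number of mistakes.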

So you get trapped in a maze by an evil alien…

https://github.com/jvon-challenges/guessing-games

===================================
... ... ... +++
enter-> (1) ... ... +++
... ... ... +++

... +++ ... ...
... +++ ... ...
... +++ ... ...

... ... +++ ...
... ... +++ ...
... ... +++ ...

+++ +++ ... ...
+++ +++ ... ... <-exit
+++ +++ ... ...

===================================
https://github.com/jvon-challenges/guessing-games

===================================
... ... ... +++
enter-> (1) (2) (3) +++
... ... ... +++

... +++ ... ...
... +++ (4) (5)
... +++ ... ...

... ... +++ ...
... ... +++ (6)
... ... +++ ...

+++ +++ ... ...
+++ +++ ... (7) <-exit
+++ +++ ... ...

===================================
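
It helps to see the maze as data before trying to defeat the alien. Here is one possible encoding of the diagrams above (my own sketch, not necessarily how the repo stores it), plus a quick check that the numbered path (1) through (7) really reaches the exit:

# 0 = open room, 1 = blocked (the +++ cells); enter at (0,0), exit at (3,3)
MAZE = [
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
]

# the numbered path (1)..(7) from the second diagram, as (row, col) pairs
PATH = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 3)]

# every step stays on the board, lands in an open room, and moves exactly one room
for (r1, c1), (r2, c2) in zip(PATH, PATH[1:]):
    assert 0 <= r2 < 4 and 0 <= c2 < 4
    assert MAZE[r2][c2] == 0
    assert abs(r1 - r2) + abs(c1 - c2) == 1

print("path reaches the exit:", PATH[-1] == (3, 3))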

Write a program to defeat the alien (and its annoying friends)

  • aliens tell you what state you are in (e.g., position [0,0])
  • aliens allow you to choose among actions (e.g., N,S,W,E)
  • aliens reveal the dimensions of all possible states (e.g., 4x4)
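
Taken together, those three rules make the alien behave like a tiny game environment: it reports your state, accepts an action, and tells you how the move went. Here is a rough stand-in (my own sketch, not the repo's actual class) just to make the rest of the walkthrough concrete:

import random

class AlienMaze:
    # a stand-in for the alien: it knows the maze, you don't
    WALLS = {(0, 3), (1, 1), (2, 2), (3, 0), (3, 1)}   # the +++ cells from the diagram
    MOVES = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}

    def __init__(self):
        self.shape = (4, 4)      # the alien reveals the dimensions...
        self.state = (0, 0)      # ...and tells you what state you are in

    def actions(self):
        return list(self.MOVES)  # ...and lets you choose among N, S, E, W

    def step(self, action):
        dr, dc = self.MOVES[action]
        r, c = self.state[0] + dr, self.state[1] + dc
        if not (0 <= r < 4 and 0 <= c < 4) or (r, c) in self.WALLS:
            return self.state, -1, True    # goes badly, game over!
        self.state = (r, c)
        if self.state == (3, 3):
            return self.state, +1, True    # found the exit!
        return self.state, 0, False        # neither good nor bad

maze = AlienMaze()
print(maze.step(random.choice(maze.actions())))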
  • state [0,0], action N → goes badly; game over!
  • state [0,0], action E → neither good nor bad.
Here is sample output... in my session, I found the exit... once!

    North  South   East   West
[[[-273.     0.     0.  -269.]    row = 0, col = 0
  [ -95.   -80.     0.     0.]    row = 0, col = 1
  [ -20.     0.   -16.     0.]    row = 0, col = 2
  [   0.     0.     0.     0.]]   ...

 [[   0.     0.   -74.   -88.]    row = 1, col = 0
  [   0.     0.     0.     0.]    ...
  [   0.   -13.     0.    -5.]
  [  -1.     0.    -2.     0.]]

 [[   0.   -15.     0.   -26.]    row = 2, col = 0
  [  -5.   -11.    -5.     0.]    ...
  [   0.     0.     0.     0.]
  [   0.     1.    -1.     0.]]   <- see the +1? you went S from (2,3) and found the exit.

 [[   0.     0.     0.     0.]
  [   0.     0.     0.     0.]
  [   0.     0.     0.     0.]
  [   0.     0.     0.     0.]]]  row = 3, col = 3 (aka: the exit)
  • state [0,0] → action N: -273 (don't go north! big penalty)
  • state [0,0] → action S:    0 (nothing happens when you go south)
  • state [0,0] → action E:    0 (nothing happens when you go east)
  • state [0,0] → action W: -269 (don't go west! big penalty)
(my session produced this output... what about yours?)

The annoying alien says that your starting state is [0, 0]:
[[[-237.     0.     0.  -229.]    <- I failed 237+229 = 466 times
  [ -29.   -30.     0.     0.]
  [ -41.     0.   -26.     0.]
  [   0.     0.     0.     0.]]

 [[   0.     0.  -157.  -136.]
  [   0.     0.     0.     0.]
  [   0.   -29.     0.   -12.]
  [  -3.     0.    -6.     0.]]

 [[   0.   -24.     0.   -18.]
  [  -4.    -4.    -2.     0.]
  [   0.     0.     0.     0.]
  [   0.    10.    -1.    -2.]]   <- hey! I found the exit 10 times

 [[   0.     0.     0.     0.]
  [   0.     0.     0.     0.]
  [   0.     0.     0.     0.]
  [   0.     0.     0.     0.]]]
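
Here is a rough sketch of the strategy that produces tables like these two (my reading of this step, not the repo's exact code): wander at random, and whenever a move ends the game, add its reward to a running tally for that state and action. The big negative numbers are just failure counts. The episode count below is arbitrary; stumbling onto the exit by pure luck is rare, so it takes a lot of games:

import random
import numpy as np

WALLS = {(0, 3), (1, 1), (2, 2), (3, 0), (3, 1)}   # the +++ cells from the diagram
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1)]         # N, S, E, W
tally = np.zeros((4, 4, 4))                        # tally[row, col, action]

for episode in range(20000):                       # lots of random walks
    r, c = 0, 0                                    # every game starts at [0,0]
    while True:
        a = random.randrange(4)
        nr, nc = r + MOVES[a][0], c + MOVES[a][1]
        if not (0 <= nr < 4 and 0 <= nc < 4) or (nr, nc) in WALLS:
            tally[r, c, a] += -1                   # hit a wall: -1, game over
            break
        if (nr, nc) == (3, 3):
            tally[r, c, a] += 1                    # found the exit: +1, game over
            break
        r, c = nr, nc                              # nothing happened, keep wandering

print(tally[0, 0])   # big negative counts for N and W, like the printout (only bigger)
print(tally[2, 3])   # a small positive count for S: times the exit was stumbled upon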
Here are the q-values upon finding the exit for the first time:

[[[-1.  0.  0. -1.]
  [-1. -1.  0.  0.]
  [-1.  0. -1.  0.]
  [ 0.  0.  0.  0.]]

 [[ 0.  0. -1. -1.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0. -1.]
  [-1.  0. -1.  0.]]

 [[ 0. -1.  0. -1.]
  [-1. -1.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  1.  0.  0.]]   <- see the +1? Found the exit!

 [[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]]
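
Notice that every entry is now -1, 0, or +1. From this point on (as I read the tables; an assumption on my part) a move's value is overwritten with its reward instead of being added to a tally, so hitting the same wall a hundred times still leaves -1:

import numpy as np

q = np.zeros((4, 4, 4))    # q[row, col, action], actions in the order N, S, E, W

for attempt in range(100):
    q[0, 0, 0] = -1        # overwrite: the wall north of (0,0) is worth -1, period
                           # (a tally would have used += and reached -100 by now)

q[2, 3, 1] = 1             # the first successful move: S from (2,3) is worth +1
print(q[0, 0], q[2, 3])    # -> [-1.  0.  0.  0.] [0. 1. 0. 0.]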

Here are the q-values upon noticing a future reward (from row=1 to row=2):
[[[-1.   0.   0.  -1. ]
  [-1.  -1.   0.   0. ]
  [-1.   0.  -1.   0. ]
  [ 0.   0.   0.   0. ]]

 [[ 0.   0.  -1.  -1. ]
  [ 0.   0.   0.   0. ]
  [ 0.  -1.   0.  -1. ]
  [-1.   0.9 -1.   0. ]]   <- see the +0.9? Looked ahead, noticed that a reward had been found previously

 [[ 0.  -1.   0.  -1. ]
  [-1.  -1.  -1.   0. ]
  [ 0.   0.   0.   0. ]
  [ 0.   1.   0.   0. ]]

 [[ 0.   0.   0.   0. ]
  [ 0.   0.   0.   0. ]
  [ 0.   0.   0.   0. ]
  [ 0.   0.   0.   0. ]]]
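
Where does the 0.9 come from? The update looks one move ahead: a move that does not end the game gets credited with a discounted share (0.9, judging by the numbers) of the best value already stored for the room it lands in. A minimal sketch of that single update:

import numpy as np

GAMMA = 0.9                     # the discount factor (assumed from the output above)
q = np.zeros((4, 4, 4))         # q[row, col, action], actions N, S, E, W
q[2, 3, 1] = 1.0                # already learned: S from (2,3) finds the exit

# now take S from (1,3): the game doesn't end, but (2,3) already holds a +1,
# so the move earns a discounted share of that future reward
q[1, 3, 1] = GAMMA * np.max(q[2, 3])
print(q[1, 3])                  # -> [0.  0.9 0.  0. ]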

Here are the q-values when the reward works its way backward, to the start:
[[[-1.   0.   0.59 -1. ]   <- the reward made its way to (0,0)E!
  [-1.  -1.   0.66  0. ]
  [-1.   0.73 -1.   0. ]
  [ 0.   0.   0.   0. ]]

 [[ 0.   0.  -1.  -1. ]
  [ 0.   0.   0.   0. ]
  [ 0.  -1.   0.81 -1. ]
  [-1.   0.9 -1.   0. ]]

 [[ 0.  -1.   0.  -1. ]
  [-1.  -1.  -1.   0. ]
  [ 0.   0.   0.   0. ]
  [ 0.   1.   0.  -1. ]]

 [[ 0.   0.   0.   0. ]
  [ 0.   0.   0.   0. ]
  [ 0.   0.   0.   0. ]
  [ 0.   0.   0.   0. ]]]
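
Here is a rough sketch of the whole loop (my reconstruction, not the repo's exact code): explore at random, write the immediate reward whenever the game ends, and otherwise write 0.9 times the best value of the next room. Run it long enough and the +1 at the exit decays backward along the path as 0.9, 0.81, 0.73, 0.66, 0.59, exactly the chain in the table above:

import random
import numpy as np

WALLS = {(0, 3), (1, 1), (2, 2), (3, 0), (3, 1)}   # the +++ cells from the diagram
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1)]         # N, S, E, W
GAMMA = 0.9
q = np.zeros((4, 4, 4))                            # q[row, col, action]

for episode in range(50000):                       # random exploration is slow...
    r, c = 0, 0                                    # every game starts at [0,0]
    while True:
        a = random.randrange(4)
        nr, nc = r + MOVES[a][0], c + MOVES[a][1]
        if not (0 <= nr < 4 and 0 <= nc < 4) or (nr, nc) in WALLS:
            q[r, c, a] = -1                        # game over: remember the penalty
            break
        if (nr, nc) == (3, 3):
            q[r, c, a] = 1                         # the exit: remember the reward
            break
        q[r, c, a] = GAMMA * np.max(q[nr, nc])     # otherwise, look one room ahead
        r, c = nr, nc

print(np.round(q[0, 0], 2))   # East should end up near 0.59, as in the table above

Once the values settle, playing greedily (always taking the action with the largest value in the current room) should retrace the numbered path from the second maze diagram straight to the exit.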
