“If the output is wrong, give it a tap on the nose (...) and if it gets it right, give it a rub on the head.”
I understand that every action ever taken by any farm plough, steam engine, mechanical calculator, AI, etc. counts as “good/right” or “bad/wrong” only as measured by its effect on human consciousness. In the same way, the global environment is being driven into disaster on purpose only because some individuals get to feel good (in this example I was reluctant to use the word “conscious”).
- The fact that “in our universe 2+2=4” is worth anything is ONLY because it allows for a stable, coherent logical environment in which consciousness could appear. Without consciousness there is NO value in anything.
- The fact that we, as conscious beings, have the ability to represent “2+2=4” within ourselves is ONLY because, in one way or another, this knowledge benefited our conscious experience. Kurt Gödel, probably one of the coldest logical minds that ever existed, was certainly NOT in it (pursuing absolute truth) for the truth of it. From an early age, his conscious well-being was intertwined with searching for and finding absolute logical statements like “2+2=4”. Take away his inner positive state towards finding mathematical truths, and this “Kurt Gödel AI” stops!
- We humans have the drive towards finding truth only because we are part of a long chain of ancestry in which mostly the truth-searching organisms had the chance to stay alive and transmit their “truth-finding” genes. Our truth-finding ability itself is vouched for by evolution.
My question is: how is it possible to reward/reinforce a neural network that is unconscious? How can you “give it a tap on the nose, or a pat on the back” if it does NOT have feelings or preferences, nor does it respond to carrots or whip lashes?
My only (probably ridiculous) explanation would be along the lines of:
1. A team of (conscious) humans programs the AI, stating precisely: “search until the value 4 is generated (as output), then stop”. Then they feed it the input “2+2=” and wait until the neural network reaches a configuration where the output matches the value 4. And now the AI stops, as it was pre-programmed, thus “crystallizing” its network in this “truth-obtaining” configuration... without having a single clue about what the hell it really did.
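Hypothesis 1 is actually close to how supervised training works, and it can be sketched in a few lines of plain Python (everything here is illustrative: a toy two-weight “network”, made-up starting values and learning rate, no real framework). The point it shows is that the “tap on the nose” is just a number, the error, which mechanically nudges the weights; no feelings are involved anywhere.

```python
# Toy sketch of hypothesis 1: "search until the output is 4, then stop".
# The "network" is just two weights; the "reward/punishment" is the error
# number, which pushes the weights up or down. All values are illustrative.

def train_until_four(a=2.0, b=2.0, target=4.0, lr=0.01, tol=1e-6):
    w1, w2 = 0.1, -0.3               # arbitrary starting weights
    step = 0
    for step in range(100_000):
        out = w1 * a + w2 * b        # the network's "answer" to "2+2="
        err = out - target           # the "tap on the nose" is just this number
        if abs(err) < tol:           # output matches 4: stop, weights "crystallize"
            break
        w1 -= lr * err * a           # nudge each weight downhill
        w2 -= lr * err * b           # (gradient of the squared error)
    return w1, w2, step

w1, w2, steps = train_until_four()
print(w1 * 2 + w2 * 2)               # very close to 4.0
```

Note that after training the weights just sit there producing 4; the network has no idea what “4” means, exactly as the hypothesis says.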
2. Or, the team of humans creates a “terrain” that can only be navigated by certain neural network configurations. And in this case, whatever gets through, we physically keep.
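Hypothesis 2 resembles what are called evolutionary or selection-based methods, and it can be sketched the same way (again, the population size, mutation noise, and fitness rule here are all made up for illustration). Random “networks” face the terrain; the ones whose answer to “2+2=” is worst are discarded, and the survivors are copied with small random mutations.

```python
import random

# Toy sketch of hypothesis 2: a population of random two-weight "networks"
# faces a "terrain" (a fitness test). Bad answers to "2+2=" are discarded;
# survivors are copied with small mutations. All parameters are illustrative.

random.seed(0)

def answer(net, a=2.0, b=2.0):
    w1, w2 = net
    return w1 * a + w2 * b

def fitness(net):
    return -abs(answer(net) - 4.0)   # the closer to 4, the fitter

population = [(random.uniform(-2, 2), random.uniform(-2, 2))
              for _ in range(50)]

for generation in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                  # "what gets through, we keep"
    population = [(w1 + random.gauss(0, 0.05),   # copy survivors with
                   w2 + random.gauss(0, 0.05))   # small random mutations
                  for w1, w2 in survivors for _ in range(5)]

best = max(population, key=fitness)
print(answer(best))                              # close to 4.0
```

Here too nothing in the loop “wants” anything; configurations that happen to navigate the terrain simply persist, much like the evolutionary chain described earlier in the post.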
I’m sure the answer is complex and I have no clue about Information Theory or the field of computation, so I’m looking only for the gist of it.