That’s So Random! Actually, no it isn’t.

Albert Lu
8 min read · Jan 5, 2023

And I’m not talking about the pseudo-randomness of machines. For most purposes, the randomness of computer generation is sufficient. I’m talking about your randomness, the reader’s. Okay, not you specifically, but humans in general.

This search started when a single question occupied my mind for a week: “When you mash on a keyboard, how random is the string that comes out?” Obviously the mashing can’t be purposeful like typing out words, nor can it be diligent about picking letters that haven’t been hit yet. Just rolling your hands back and forth across your keyboard. Could there be a difference between you, an out-of-control button-mashing biped, and some electricity and silicon?

I sought to figure this out through two categories of approaches: machine learning and heuristics. And yes, I do see the irony in using a machine to figure out whether something is produced by a machine. Another reason I chose two methods is the noticeable current trend of over-reliance on, and sometimes (maybe) unnecessary use of, machine learning. That is not to say machine learning isn’t useful, but rather that heuristics are sometimes enough to solve straightforward problems.

In order to solve this problem, I had to get some basic data. First, I made a class that randomly generated text using Python’s built-in randomness. Generating integers from 97 to 122 gives the Unicode code points of the lowercase letters, and stringing them together produces text that can be converted to a tensor.
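The generator class itself isn’t reproduced in this post, but a minimal sketch of the idea might look like this (the function names and the tensor encoding below are my own illustration, not necessarily what the project used):

import random
import torch

def generate_machine_string(length):
    # Draw code points 97-122 ('a' to 'z') uniformly and join them into one string.
    return "".join(chr(random.randint(97, 122)) for _ in range(length))

def string_to_tensor(s):
    # One possible encoding: map each letter to a float in [0, 1].
    # (This normalization is an assumption on my part.)
    return torch.tensor([(ord(c) - 97) / 25 for c in s], dtype=torch.float32)

sample = generate_machine_string(5)
print(sample, string_to_tensor(sample))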

Since we need to compare machine randomness to human randomness, I also created a basic class that takes string input and partitions it into substrings of length M, so that N characters of input become an array of N/M substrings. The point of this was to prepare the data for input into the ML algorithm.

class UserRandomText():
    def __init__(self, label_number, total_labels, length):
        self.dataset = []
        # One-hot label vector for this source of text.
        self.label = [0 for i in range(0, total_labels)]
        self.label[label_number] = 1
        self.i = 0               # cursor into the current raw input
        self.length = length     # length of each partitioned substring
        self.unpartitioned = ""

    def generateNText(self, occurences, string_length):
        # Keep prompting until `occurences` substrings have been collected.
        curText = input(f"Type {occurences * string_length}")

        while occurences > 0:
            if self.i + self.length < len(curText):
                if self.splitUserInput(curText, string_length):
                    occurences -= 1
            else:
                curText = input(f"Type {occurences * string_length}")
                self.i = 0

    def splitUserInput(self, curText, string_length):
        # Pull the next `self.length` lowercase letters out of the raw input,
        # skipping anything outside 'a'-'z'.
        counted = 0
        string = ""
        while counted < self.length and self.i < len(curText):
            if ord(curText[self.i]) < 97 or ord(curText[self.i]) > 122:
                self.i += 1
                continue
            else:
                string += curText[self.i]
                self.i += 1
                counted += 1

        if counted == self.length:
            self.unpartitioned += string
            self.dataset.append(string)
            return True
        else:
            return False

With the data ready, we can start looking at how we can extrapolate trends from this data.

Methods of Approach:

1.) Binary Classification with Machine Learning

Ah, the great wisdom of a black box. Why bother looking for trends yourself when you can regress parameters with a machine learning algorithm that, for all you know, is memorizing numbers rather than actually learning?

But anyways, this is how I approached the ML algorithm.

Overfitting:

Here is where I expected the greatest adversity for the ML algorithm. With randomness, it’s difficult to say whether an algorithm is actually converging or simply overfitting to the training data. And when my algorithm trained for the first time, I immediately noticed how hard it was to produce accurate results on my testing dataset. Averaged over the whole testing dataset, the algorithm produces an accurate binary classification, but for individual length-M substrings it has difficulty predicting correctly. Here are a few ways I tried to curb the overfitting.

Partitioning my dataset into more items with shorter lengths. Due to the separable nature of my dataset, there is no inherent difference between 2 strings of length 5 and 5 strings of length 2. However, it can be hard to distinguish two substrings of length 2 or even 3: the possibility space of substrings of length 3 is only 26³ = 17,576, which even for a smallish dataset is not a scale that can be contrasted easily. On the other hand, if you partition the dataset into strings that are too long, accuracy on the validation set starts to drop; even with an L2 regularizer and weight decay, it quickly becomes apparent that the model overfits on overly long strings. As such, I chose strings of around length 5 to avoid the worst of both worlds.

Training to minimize validation loss instead of training loss. If you know ML, this one is simple enough: instead of training for a fixed number of epochs or until the training loss hits a certain threshold, I kept running the training loop until the validation loss dropped below a (higher) threshold. This also has the added benefit of stopping before the test loss starts to trend upwards.

Regularizers and weight decay. I already mentioned this above, but keeping the weights small so the model generalizes trends instead of memorizing them is something that can help testing accuracy.

Here is my general implementation:

import torch
import torch.nn as nn

learning_rate = 0.0001
epochs = 700  # not used directly; training is driven by the validation-loss threshold below
model = Net(input_shape=input_length)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=1e-5)
loss_fn = nn.BCELoss()

losses = []
accur = []
training_size = 3600
testing_loss = torch.FloatTensor([1000])
i = 0

# Keep training until the validation loss drops below the threshold.
while testing_loss.data > 0.30:
    for j in range(0, training_size):
        x_train = x[j]
        y_train = y[j]

        output = model(x_train)
        output = output.unsqueeze(1)

        loss = loss_fn(output, y_train.reshape(-1, 1))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if i % 50 == 0:
        losses.append(loss.item())
        print("test loss: {}".format(testing_loss))
        print("epoch {}\tloss : {}".format(i, loss))

    # Evaluate on the held-out split; this is the stopping criterion above.
    test_predicted = model(x[training_size:])
    testing_loss = loss_fn(test_predicted, y[training_size:])
    i += 1
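The Net model referenced above isn’t shown in the post. A minimal compatible sketch, assuming each length-5 substring arrives as a float vector of size input_length and that the sigmoid output feeds BCELoss, could be:

import torch.nn as nn

class Net(nn.Module):
    # A small binary classifier: input_shape is the encoded substring length,
    # and the sigmoid output is the probability that the text is human-generated.
    # (The layer sizes here are illustrative, not the exact architecture used.)
    def __init__(self, input_shape):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_shape, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid(),  # BCELoss expects probabilities in (0, 1)
        )

    def forward(self, x):
        return self.layers(x)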

Overall, if you average the results over a testing dataset, this model works quite well. However, as mentioned before, for a single substring of length 5 it has some difficulty predicting the right result. Generally not bad, but definitely not the best.

2.) Heuristic Analysis Utilizing Average Letters Used

This is probably the solution you immediately thought of when posed with this question, and I don’t blame you. It is the most straightforward and easiest way to compare the randomness of a human versus a machine. After all, we are basically testing whether machine-generated random letters look like machine-generated random letters, i.e., roughly uniformly distributed.

To implement this, I averaged the count of each letter within the training text for a label and stored those averages in a list for later access. After adding all our labels, we can check a new text to see which label will be predicted. I simply do an element-wise subtraction and sum the absolute values of the resulting array to measure how different the new text is from the training data. I elected to use a plain sum of absolute errors instead of a sum of squared errors, since the averages are small numbers between 0 and 1.

Here is the implementation:

import numpy as np

class IdentifyClass():
    def __init__(self, labels):
        self.num_labels = labels
        self.inputs = []   # per-label average letter frequencies

    def inputNewLabel(self, text):
        # Average frequency of each letter 'a'-'z' in the training text for this label.
        averages = np.zeros(26)
        for ele in text:
            ind = ord(ele) - 97
            averages[ind] += 1
        averages = averages / len(text)
        self.inputs.append(averages)

    def checkExistingLabel(self, text):
        averages = np.zeros(26)
        min_diff = 1000000
        min_label = -1
        for ele in text:
            ind = ord(ele) - 97
            averages[ind] += 1
        averages = averages / len(text)  # normalize so counts are comparable to stored averages

        for idx, label in enumerate(self.inputs):
            diff = np.subtract(label, averages)
            total_diff = np.sum(np.absolute(diff))

            if total_diff < min_diff:
                min_label = idx
                min_diff = total_diff

        return min_label
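For context, here is a hedged sketch of how this class might be driven; the machine text reuses the uniform generator idea from earlier, and human_text is just a placeholder for an actual keyboard mash:

import random

machine_text = "".join(chr(random.randint(97, 122)) for _ in range(5000))
human_text = "asdfjklfjdksla" * 300   # placeholder for a real keyboard mash

classifier = IdentifyClass(labels=2)
classifier.inputNewLabel(machine_text)   # label 0: machine
classifier.inputNewLabel(human_text)     # label 1: human

# A fresh machine-generated sample should come back as label 0.
new_sample = "".join(chr(random.randint(97, 122)) for _ in range(500))
print(classifier.checkExistingLabel(new_sample))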

As expected, this heuristic works extremely well. For the machine label, it is almost guaranteed to predict correctly, simply because the underlying generation keeps the letter distribution mostly even. The difference between the machine and human distributions is also definitely noticeable when plotted.

3.) Heuristic Analysis Using Distance Away from Home Row

This is a slightly more interesting analysis that I wanted to try out. It works similarly to the last heuristic, but instead of looking at which letters are pressed, we look at how far the fingers have to travel.

My approach is based on the belief that humans over-rely on the index and middle fingers when typing. As such, I created a mapping that uses this fact: letters I think a human would over-use get lighter weights, and letters a human is less likely to hit than the computer get heavier ones. Here is the mapping:

self.weight_dict = {"f": 1, "d": 2, "s": 3, "a": 4, "j":1, "k":2,
"l":3, "q": 4*2, "w": 3*2, "e": 2*2, "r": 2, "t": 2.5, "y":2.5,
"u":2, "i": 2*2, "o": 3*2, "p": 4*2, "z": 2.5*4, "x": 2.5*3, "c": 2.5*2,
"v": 2.5, "b": 2.5, "n": 2.5 * 2, "m": 2.5*3}

Essentially, the letters “f” and “j” have the lowest weightings, as they sit on the home row and are hit by the index fingers. “d” and “k” are also on the home row but are hit by the middle fingers, so they get a slightly higher weighting. Letters off the home row then get a multiplier applied based on which home-row finger reaches them. For example, “w” is hit by the ring finger on the upper row, so it gets a multiplier of 2 times the ring finger’s home-row value of 3. For the bottom row, I used a slightly heavier multiplier of 2.5, since I believe humans are less likely to type on the bottom row, though this could be pushed even higher to accentuate the differences between a human and a machine. Below is the full implementation.

class FingerDetect(IdentifyClass):
    def __init__(self, labels):
        super().__init__(labels)
        # Per-letter "effort" weights: home-row index-finger keys are cheapest,
        # keys further from the home row cost progressively more.
        self.weight_dict = {"f": 1, "d": 2, "s": 3, "a": 4, "j": 1, "k": 2,
                            "l": 3, "q": 4*2, "w": 3*2, "e": 2*2, "r": 2, "t": 2.5, "y": 2.5,
                            "u": 2, "i": 2*2, "o": 3*2, "p": 4*2, "z": 2.5*4, "x": 2.5*3, "c": 2.5*2,
                            "v": 2.5, "b": 2.5, "n": 2.5*2, "m": 2.5*3}

    def inputNewLabel(self, text):
        # Store the average key weight of the training text for this label.
        total_weight = 0
        for ele in text:
            if ele in self.weight_dict:
                total_weight += self.weight_dict[ele]
        average_weight = total_weight / len(text)
        self.inputs.append(average_weight)

    def checkExistingLabel(self, text):
        total_weight = 0
        min_val = None
        min_label = None
        for ele in text:
            if ele in self.weight_dict:
                total_weight += self.weight_dict[ele]
        average_weight = total_weight / len(text)
        print(average_weight)

        # Pick the stored label whose average weight is closest.
        for idx, label in enumerate(self.inputs):
            diff = abs(average_weight - label)
            if min_val is None or diff < min_val:
                min_val = diff
                min_label = idx
        return min_label
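Thanks to the shared interface, the class can be exercised the same way as IdentifyClass. In a rough sketch like the one below (again with a placeholder mash string), uniform machine text sits near the mean of the weight table (about 3.9), while home-row-heavy human text scores noticeably lower, in line with the ~2.6 average mentioned next:

import random

machine_text = "".join(chr(random.randint(97, 122)) for _ in range(5000))
human_text = "jfkdlsajfkdls" * 300   # placeholder for a real keyboard mash

detector = FingerDetect(labels=2)
detector.inputNewLabel(machine_text)   # label 0: machine
detector.inputNewLabel(human_text)     # label 1: human

# A home-row-heavy sample should land on the human label.
print(detector.checkExistingLabel("fjdkslafjdkslfj"))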

And believe it or not, this actually worked pretty well! The human average for this method was significantly lower than the machine’s: around 2.6, which shows the over-reliance on home-row-adjacent keys. This hints at a roughly normal distribution, centered on the home row, in the way humans type.

Takeaways:

The main reason I wanted to approach this problem was to demonstrate that sometimes machine learning is more of a memorization algorithm than an algorithm that actually learns. I thought randomness was a good way to test this, as the pseudo-patterns that appear in human randomness can be difficult to abstract out and recognize. While the machine learning algorithm could predict decently over a large dataset, its accuracy on specific items was not as good as the simple heuristics. This shows that sometimes using heuristics, or combining machine learning with heuristics, gives better accuracy than depending solely on machine learning.

Another reason I did this project was to show the innate bias humans have towards a normal distribution, even when that distribution is not apparent at first. In the case of this question, the distribution is fairly easy to spot: based on the two heuristics, it sits between the home-row fingers and the letters outside the home row. However, I suspect humans follow many normal distributions throughout their daily lives; we are just not aware or conscious of them. Do I think that is of significant importance? Actually, yes I do. Utilizing this knowledge is probably a good way to predict user behavior and to set predictability bounds for lower-stakes predictions. That’s all for this project.

Probalby out.
