“Among competing hypotheses, the one with the fewest assumptions should be selected.” (Occam’s Razor)

Machine learning and reasoning through Boolean logic

An introduction to Boolean-based machine learning through simplification algorithms and don’t cares

--


I. Introduction:

What is knowledge? As defined by a Google search of the term, it means

“facts, information, and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject.”

Thus, to gain knowledge, we need to transform incoming stimuli into a certain representation. In order to represent them, we need to “compress” or simplify the incoming stimuli, since we have limited resources in our brains.

This article is a revised and updated version of an article I previously posted, with greater clarity and explanation. The previous post is located here.

The purpose of this article is to share my learnings with a broader audience and to add to the pool of knowledge on the advancement of artificial intelligence.

Initially, I experimented with Boolean-logic-based machine learning using a commercial synthesis tool intended specifically for the creation of digital ICs. The methodology involved the following:

a. Verilog-HDL code

b. Case statements to define input and output values

c. The use of x (don’t cares) in the case statements to represent missing values

d. Commercial logic synthesis tool

What I have done now is implement the simplification methodology in a software programming paradigm. I have also added more metrics for evaluating the machine learning performance, based on predictive power and runtime.

II. Related literature

My intent here is to point the reader to existing technologies related to the concept I’m about to share. After I published my article on WordPress, I came across the technology of an analytics company called Rulex Analytics. They already use logic simplification techniques, with “Switching Neural Networks” as the core algorithm and fast logic simplification algorithms like “Shadow Clustering”. The advantage of rule-based (or logic-based) machine learning is that the model is not a black box: it produces logical statements that can easily be transformed into human-comprehensible sentences. Visit their website to check out their services and publications as well.

While this may be the case, there is a new article in Quanta Magazine about how the ‘black box’ characteristic of popular deep learning models can be understood. See the article here.

III. Core algorithms

Before I discuss the core algorithms used in this research, let me define the motivating factors for my pursuit of simplifying combinational logic:

1. Simplifying combinational logic has a direct impact on an algorithm’s performance. Once the logic is simplified, it can be converted to bytecode, or unneeded logic representations can be deleted, since the number of terms in the simplified logic is at a minimum.
2. Simplified logic, in its simplest form, aligns with Occam’s Razor, which states that among competing hypotheses, the one with the fewest assumptions should be selected. This is the foundational concept I have been applying in my research, and I believe that the human brain seeks to simplify its stored information in order to accommodate new information and to more easily form complex concepts and models.

In mathematics and statistics, Occam’s razor can be restated as follows:
i. By definition, all assumptions introduce possibilities for error; if an assumption does not improve the accuracy of a theory, its only effect is to increase the probability that the overall theory is wrong.

The statement above is similar to the concept of overfitting: if given the opportunity, simplify your predictive model rather than introduce “magnitude-based” types of regularization.

In the previous evaluation, logic synthesis tools were used to simplify the logic with don’t cares. The algorithms used for such simplification are known to rely on the Espresso algorithm, a logic simplification heuristic that reaches a sufficiently simplified logic for ASIC designs. While its output is not fully minimized, its runtime is much shorter than that of the Quine-McCluskey method, which is exhaustive and produces a completely simplified form.

However, because the Espresso algorithm does not return a completely simplified logic, its predictive power may not be maximized. I haven’t dug deep into how that algorithm works, but for the purpose of proving the predictive power of the simplest information representation, I resorted to the Quine-McCluskey algorithm, which produces the simplest Boolean representation of a function.
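
To make the role of don’t cares concrete, here is a minimal sketch (my own illustration, not the experiment’s code) using SymPy’s SOPform, which performs Quine-McCluskey-style minimization. The three-variable truth table is made up for the example; the point is that declaring the unobserved input as a don’t care lets the minimizer collapse the observed minterms into a simpler expression.

```python
from sympy import symbols
from sympy.logic import SOPform

a, b, c = symbols('a b c')

# Observed ON-set (minterms) of a 3-input function, as [a, b, c] bit patterns.
minterms = [[0, 1, 1], [1, 0, 1], [1, 1, 1]]

# Without don't cares, the minimizer must reproduce the ON-set exactly.
print(SOPform([a, b, c], minterms))               # (a & c) | (b & c)

# Declaring the unobserved input 001 as a don't care frees the minimizer
# to absorb it into a larger implicant, giving a simpler expression.
print(SOPform([a, b, c], minterms, [[0, 0, 1]]))  # c
```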

IV. Test Setup

The test will again be interpolating a function with incomplete input data, i.e., an incompletely specified function (ISF). This time, the function to be learned is squaring the input.

Input values:

X ∈ {1, 2, 3, …}, where X is an integer represented in binary notation.

Output values:

Y = X², the square of the input, also represented in binary notation.
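
As a rough sketch of this setup (the helper name, 10-bit width, and training split are my own assumptions for illustration, not the repository’s code), one output bit of Y = X² can be turned into minterm and don’t-care lists like this:

```python
def bit_of_square(x, bit):
    """Value of the given bit (0 = LSB) of x squared -- a hypothetical helper."""
    return (x * x >> bit) & 1

# Inputs 1..1023 (10 bits); train on an assumed subset, leave the rest unknown.
all_inputs = range(1, 1024)
known      = range(1, 501)      # observed samples (assumption for illustration)
target_bit = 10                 # the "11th bit" counted 1-based from the LSB

minterms  = [x for x in known if bit_of_square(x, target_bit) == 1]
known_set = set(known)
dontcares = [x for x in all_inputs if x not in known_set]
```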

A. Libraries and language used
a. Python is the language of choice for this experiment due to its syntactic simplicity and faster development.
b. Libraries used:

i. Pickle → used to directly save variables stored in RAM as a file on the hard disk
ii. Pandas → used to analyze results and performance quickly through its data analysis features
iii. Timeit → used to monitor the time elapsed per learning iteration
iv. Matplotlib → used to plot values from the data analysis (pandas)
v. Sympy → symbolic mathematics library, which contains Boolean algorithms like the Quine-McCluskey algorithm

B. Basic algorithm
The Quine-McCluskey algorithm can simplify logic expressions to their simplest POS (product of sums) or SOP (sum of products) form. This is unlike the simplification done in logic synthesis, which uses Espresso algorithms together with process-related mapping and optimizations to achieve the best logic circuit in terms of power, area, or timing.

Initialization of data:
1. The untrained samples will be all input combinations, from 1 to infinity. (In this case the range is finite; the specific values will be identified later.) These values will be part of the don’t-cares list.
2. The list of minterms is empty.
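
A tiny sketch of these two steps, with assumed names and the finite 10-bit range used in the 11th-bit run:

```python
# Initialization per the two steps above (assumed names, 10-bit range):
NUM_BITS  = 10
dontcares = list(range(1, 2**NUM_BITS))  # 1. every input starts as untrained
minterms  = []                           # 2. nothing observed to be 1 yet
```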

Scope of verification:
Logic simplification can only produce a 1-bit output, and therefore we will focus on only certain bits to verify and interpret. These are the following bits (counting from the LSB):

A. The 11th bit
B. The 14th bit
The algorithm does the following to introduce learning:
1. The tool starts learning from the input values and their squared output values, beginning from 1.
2. Learning proceeds in ascending order, in increments of 1.
3. The tool simplifies the logic that makes a chosen output bit Y equal to 1.
4. The simplified logic on bit Y is verified from 1 in ascending order:
a. If the value is incorrect, the lists of don’t cares and minterms are updated based on the verified values, and logic simplification is done again.
b. If the value is correct, the loop continues to the next value for verification. The lists of don’t cares and minterms are likewise updated based on the verified values.
The machine learning algorithm assumes perfect memory; that is, the results of verification are immediately captured by the logic simplification algorithm by updating the lists of input don’t cares and minterms.
5. Every time a correct value is achieved, the runtime and the list of correct values are stored in a database for further analysis. (A condensed sketch of this loop is shown below.)
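
Below is a condensed, hedged sketch of this loop, assuming SymPy’s SOPform as the simplifier and checking only the next value on each pass; the actual implementation is in the repository linked at the end of the article.

```python
import timeit
from sympy import symbols
from sympy.logic import SOPform

NUM_BITS, TARGET_BIT = 10, 10            # 11th output bit, 1-based from the LSB
variables = symbols(f'x0:{NUM_BITS}')    # x0 .. x9, x0 taken as the MSB here

def to_bits(x, width):
    """Integer -> fixed-width bit list, MSB first (matches `variables` order)."""
    return [(x >> i) & 1 for i in reversed(range(width))]

minterms, dontcares = [], list(range(1, 2**NUM_BITS))
results = []                             # (last correct value, elapsed seconds)

for n in range(1, 2**NUM_BITS):
    start = timeit.default_timer()

    # 3. Simplify the target bit from the current minterms / don't cares.
    if minterms:
        expr = SOPform(variables,
                       [to_bits(m, NUM_BITS) for m in minterms],
                       [to_bits(d, NUM_BITS) for d in dontcares])
        assignment = dict(zip(variables, [bool(v) for v in to_bits(n, NUM_BITS)]))
        predicted = bool(expr.subs(assignment))
    else:
        predicted = False                # nothing observed as 1 yet

    actual = bool((n * n >> TARGET_BIT) & 1)

    # Perfect-memory assumption: the verified value leaves the don't-care
    # list immediately and, if its target bit is 1, joins the minterms.
    dontcares.remove(n)
    if actual:
        minterms.append(n)

    # 5. Record the runtime whenever the prediction was already correct.
    if predicted == actual:
        results.append((n, timeit.default_timer() - start))
```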

V. Analysis and Presentation of results

1. 11th bit results: To begin with, the verification and input value range is finite. The initial range of training data runs from 1 to 1023. This is enough to produce substantial results for the 11th bit:

Figure 2. Plot of the last correctly verified value against its corresponding runtime.

Only the last correct values (LCNs) and their corresponding runtimes are recorded, so in order to plot a continuous graph, the runtime assigned to each value in between is computed as follows:

runtime per value = runtime at LCNx / (LCNx − LCNx-1)

where LCNx is the last correct value, LCNx-1 is the previous last correct value, and (LCNx − LCNx-1) is the range between the two LCNs. In this way, learning capacity can also be read as a function of runtime in the LCN vs. runtime plot.

Figure 3. The same plot as Figure 2, but with a rolling mean computed in order to filter out noise in the data and verify the reduction of runtimes as more data is acquired. The window size is 50 samples.

With the rolling mean filter applied to the data, a decrease in runtime is already visible as the number of samples increases. Also note that at sample 500 (input == 501), the simplified logic is already correct up to the last sample in this range, input == 1023.
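
For reference, here is a hypothetical pandas sketch of how such a smoothed plot could be produced from the recorded (LCN, runtime) pairs, using the per-value formula above and a 50-sample rolling mean; the column names are my own.

```python
import pandas as pd

# `results` is the list of (last correct value, runtime) pairs from the loop.
df = pd.DataFrame(results, columns=['lcn', 'runtime'])

# Spread each recorded runtime over the gap between consecutive LCNs
# (the formula above), so the plot becomes continuous per value learned.
df['gap'] = df['lcn'].diff().fillna(df['lcn'])
df['runtime_per_value'] = df['runtime'] / df['gap']

# 50-sample rolling mean to filter out the noise before plotting.
df['smoothed'] = df['runtime_per_value'].rolling(window=50).mean()
df.plot(x='lcn', y='smoothed')
```
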
The result is a manifestation of the simplifying power of the Quine-McCluskey method:
1. It simplifies logic for computing efficiency, and
2. It exploits the inferential capacity of don’t cares by substituting don’t-care values with whatever values yield the simplest Boolean representation.

The simplest representation follows Occam’s razor, and thus has the capacity to make inferences from limited data.

I have yet to research how this relates to Bayesian inference as well as Solomonoff’s theory of inductive inference.

2. 14th bit results: For the 14th bit, the range is extended to much higher combinations. You will see the reasons why as I present the data.

Figure 4. 14th-bit runtimes. Note that with every increase in input bits, the runtime increases.

Every time the range of inputs reaches its maximum value, the number of bits is increased by 1. For example, for the 14th bit, the plots do not show high levels of predictability even when the inputs reach 1023, so the number of input bits needs to be increased to 11. What is done during this bit increase is as follows (a sketch follows the list):
a. The list of input names is extended.
b. For minterms, a value of 0 is inserted at the most significant bit.
c. For don’t cares, a value of 0 is inserted at the most significant bit, and for the new values (> 1023), the list is continued up to 2^n − 1.
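
Continuing the loop sketch from earlier (so it is an assumption, not the repository’s code), the bit increase can look roughly like this; in the integer representation used there, step (b)’s leading zero is added automatically when values are converted to bit lists, so only the variable list and the don’t-care list need explicit updates:

```python
from sympy import symbols

# `NUM_BITS` and `dontcares` continue the loop sketch above.
old_max = 2**NUM_BITS                 # e.g. 1024 once the 10-bit range is used up

NUM_BITS += 1                         # a. one more input bit / variable name
variables = symbols(f'x0:{NUM_BITS}')

# b. In this integer representation, the leading 0 for existing minterms and
#    don't cares is added by to_bits() at simplification time.

# c. The newly reachable inputs (old_max .. 2**NUM_BITS - 1) start as unknowns.
dontcares.extend(range(old_max, 2**NUM_BITS))
```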

For the runtimes, notice the observable decrease in runtime as the number of learned samples increases. It is also expected that, since the algorithm used is Quine-McCluskey, which is NP-hard, the runtime increases significantly as bits are added to accommodate more samples.
It seems that the range of training data is not enough to reach a ‘learned’ state similar to what happened with bit 11, where the output is correct from 501 onwards.

Due to runtime issues, I will discontinue examining the learning progression of bit 14. Instead, I will look for the ‘learned state’ of this bit; that is, the simplified logic must predict a long, continuous run of correct values (about 50% of the range) up to the last number in that range.

Procedure (a code sketch of this search is shown right after this list):
1. Increase the input range (e.g., from 11 bits to 12 bits).
2. Decrease the amount of input training data exponentially by powers of 2, starting from n. For example, set the training data to run from 1 to (2^11 − 2^n). If the verification is correct up to 2^11, n is increased by 1 and the process is repeated.
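
Here is a hedged sketch of that search, assuming a 13-bit input range and the 14th bit (counted 1-based from the LSB) as the target; it is slow in practice for exactly the NP-hardness reasons discussed above:

```python
from sympy import symbols
from sympy.logic import SOPform

def to_bits(x, width):                      # as in the earlier sketches
    return [(x >> i) & 1 for i in reversed(range(width))]

def bit_of_square(x, bit):
    return (x * x >> bit) & 1

def learned_up_to_max(width, train_upper, target_bit):
    """Train on 1..train_upper, then verify every value up to 2**width - 1."""
    variables = symbols(f'x0:{width}')
    training  = range(1, train_upper + 1)
    untrained = range(train_upper + 1, 2**width)
    expr = SOPform(variables,
                   [to_bits(x, width) for x in training
                    if bit_of_square(x, target_bit)],
                   [to_bits(x, width) for x in untrained])
    return all(
        bool(expr.subs(dict(zip(variables,
                                [bool(v) for v in to_bits(x, width)]))))
        == bool(bit_of_square(x, target_bit))
        for x in untrained)

WIDTH, TARGET_BIT = 13, 13        # assumed 13-bit range, 14th bit (1-based)
n = 1
while learned_up_to_max(WIDTH, 2**WIDTH - 2**n, TARGET_BIT):
    n += 1                        # shrink the training set and retry
# The largest n that still verified corresponds to the smallest training set
# (1 .. 2**WIDTH - 2**(n - 1)) that reaches the learned state.
```
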
Here are the results of this heuristic in searching for the ‘learned state’ of bit 14:

This shows that the results are replicated on bit 14 as well: it is able to predict a continuous 50% of missing values up to the end of the range.
On the nature of learning how to square integers:

VI. Interpretation:
To square integers, one must learn the following concepts:
1. The concept of counting. Counting is a compression technique that allows the representation of quantities to be compressed exponentially by applying the permutation of symbols. The next more significant digit is instantiated when the less significant digit exhausts its symbols.
2. The concept of addition. Addition is repeated counting. It involves iterating through the sequence of permuted symbols, where the addend is the length of the iteration.
3. The concept of multiplication. Multiplication is repeated addition, with the same kind of iteration as in addition.

Given these complex challenges, the Quine-McCluskey algorithm applied to an ISF is able to achieve a ‘learned state’ in which it accurately predicts the remaining 50% of the values of a given input range.

Apart from these, another interpretation is realized:
The training data covers just one selected bit, so the algorithm has to deduce the nature of squaring integers using training data from only 1 bit of the outputs. Yet it achieved a learned state after being given a sufficient amount of training data; for the 14th bit, it only needed about 4095 out of 8192 samples. Note that the relationships between the digits have not yet been identified, and there is no evidence yet about the relationships among the other output bits.

VII. Summary and conclusion:
This shows that applying Occam’s Razor to Boolean logic simplification, through the Quine-McCluskey algorithm, can serve as a foundational algorithm for modern artificial intelligence when run on computers, because of the Boolean nature of the logic circuits that make up modern computers.

The runtime cost of this algorithm has not yet been fully considered. It has also not yet been studied whether the rate of learning as a function of computer resources, compared with other foundational methods such as neural networks and tree searches, will give higher overall performance in terms of computer usage and predictive power.

VIII. Reference source code

You can clone or try out the code via https://github.com/arjay55/boolLLM

A simple test case for you to try out is a full adder: learn each output bit and see its predictive power per bit, along the lines of the sketch below.
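
As a hedged warm-up in the same spirit (a toy example of my own, not the repository’s test bench), you can hand SymPy a partial full-adder truth table per output bit, leave the unseen rows as don’t cares, and compare the minimized expressions’ predictions on those rows against the true sum and carry:

```python
from sympy import symbols
from sympy.logic import SOPform

a, b, cin = symbols('a b cin')

observed = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 1)]  # known rows
unseen   = [(0, 1, 1), (1, 0, 0), (1, 1, 0)]                        # don't cares

# True full-adder outputs: sum = a xor b xor cin, carry = majority(a, b, cin).
sum_minterms   = [list(r) for r in observed if sum(r) % 2 == 1]
carry_minterms = [list(r) for r in observed if sum(r) >= 2]
dontcares      = [list(r) for r in unseen]

sum_expr   = SOPform([a, b, cin], sum_minterms, dontcares)
carry_expr = SOPform([a, b, cin], carry_minterms, dontcares)

# Compare each expression's prediction on the unseen rows with the true
# sum / carry to gauge the per-bit predictive power of the learned logic.
for r in unseen:
    assignment = dict(zip([a, b, cin], [bool(v) for v in r]))
    print(r, bool(sum_expr.subs(assignment)), bool(carry_expr.subs(assignment)))
```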

IX. Future work

It may seem obvious that there will be a combinatorial explosion as the number of inputs/outputs increases. I am now heavily experimenting with algorithms that simplify the simplification process itself; it is proving exceptionally difficult, but there are promising hypotheses yet to be proven. This next step is crucial for turning Boolean logic, the most fundamental form of symbolic reasoning, into a full-blown, scalable machine learning algorithm applicable to real-world scenarios. I will publish my progress here on Medium.

Recently, I found another good body of work on this topic in OpenCog’s AI platform. Try checking it out; I am excited to see their work and to contribute to their progress.

Thanks for reading :)

References:

[1] Google search dictionary, “Knowledge” definition

[2] Marco Muselli, Switching Neural Networks: A New Connectionist Model for Classification, Rulex Analytics Technical Papers

[3] Marco Muselli, Enrico Ferrari, Coupling Logical Analysis of Data and Shadow Clustering for Partially Defined Positive Boolean Function Reconstruction (2011), Rulex Analytics Technical Papers

[4] Occam’s Razor, Wikipedia

[5] Espresso Algorithm, Wikipedia
