KDB+ vs AI

Ivan Briody
Version 1
Jan 29, 2024

A brief look into how AI interprets the kdb+ language.

AI seems to be the next big evolution in the software engineering space, with its ability to quickly analyse a problem and return a solution at a rate that a human programmer cannot compete with. This blog gives two AIs a pair of problems and compares how they handle each one, and the solutions they return, with a human developer's. The two AIs (ChatGPT 3.5 and Bing AI), as described in their own words:

I am ChatGPT, a language model developed by OpenAI using the GPT-3.5 architecture. I excel in natural language understanding and generation, making me a valuable tool for developers seeking advanced text-based solutions. My capabilities extend to assisting with coding queries, offering insights into programming concepts, and providing information on various technical subjects.

As a chatbot, I can provide quick answers to programming-related questions, suggest solutions to coding problems, and offer advice on best practices. Additionally, I can help developers stay up-to-date with the latest trends and technologies in the software development industry. I can also assist with debugging code, writing documentation, and testing software. If you have any specific questions or needs, feel free to ask me.

The problems are taken from the Advent of Code website, which hosts an annual programming competition where participants test their coding abilities with daily challenges. The two problems come from different stages of the competition, and the second is considered the harder of the two.

Problem 1

Prompt:
“Given the list 1721 979 366 299 675 find the 2 numbers that sum to 2020 and then return the product of these 2 numbers in kdb+”

Developer Solution
q)input: 1721 979 366 299 675
q)(*/)(-1_desc distinct input*({(2020-x) in input}'[input]))
514579

ChatGPT Solution
q)L: 1721 979 366 299 675
q)target_sum: 2020
q)result: L cross L where {(+/) x = target_sum} each L

Bing AI Solution
q)lst: 1721 979 366 299 675
q)totalsum: 2020
q)res: {x*y} ./: where each lst +/: lst =/: totalsum;
'rank
[0] res: {x*y} ./: where each lst +/: lst =/: totalsum;

Breaking down the code returned by the two AIs, Bing AI's code returns a rank error: the operation “./:” (apply, each-right) receives the wrong number of arguments. Errors like this occur when an operator's input is not in the form it expects. Retrying the question gave a solution that raised a length error instead, signifying that an operation on two lists failed because the lists had different numbers of items.
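For reference, “.” (apply) calls a function with the items of a list as its arguments, and adding each-right (“/:”) applies it to every argument list on the right. The minimal sketch below uses values from this problem; it illustrates the operator only and is not a reconstruction or fix of Bing AI's code. The name “pairs” is illustrative.

```q
/ . calls the function with the list items as its arguments: here {x*y}[1721;299]
{x*y} . 1721 299                  / 514579
/ ./: applies the function to each argument list on the right
pairs:(1721 299;979 366)
{x*y} ./: pairs                   / 514579 358314
```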

ChatGPT's solution returns nothing because the function “{(+/) x = target_sum}” is applied to one number at a time, so it returns a list of 0 Booleans, and the remaining code uses that list to select items. However, a developer can still extract keywords from the returned code (in this case the built-in “cross” function) to create a usable solution, as in the AI + Developer Solution below.
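The empty result can be reproduced step by step. In this sketch “flags” is an illustrative name for the lambda's output; the point is that each call sees a single number, so no item ever equals 2020 on its own and “where” selects nothing.

```q
L:1721 979 366 299 675
target_sum:2020
/ each call receives one number, so x = target_sum is false for every item
flags:{(+/) x = target_sum} each L
any flags                         / 0b: nothing matches
/ where on an all-false list gives an empty index list, so cross has nothing to pair with
count L cross L where flags       / 0
```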

AI + Developer Solution
q)t:([]c1:input cross input;c2:sum each input cross input)
q)distinct exec prd each c1 from t where c2=2020
,514579

The solution above comes from a developer with some experience, who extracted what was needed from the AI's initial response. An inexperienced developer might instead keep replying to the AI until it produced a correct solution on its own. Breaking the problem down into multiple steps helped, but the AI had to be told repeatedly that parts of its answer were wrong before it reached a usable solution.

Below is ChatGPT's working solution, cleaned up. ChatGPT reached it on its own, but only after several attempts, and depending on the starting code it may take more or fewer. The solution first generates every pairing of the numbers in the list, then computes the sum of each pair, keeps the pairs that add up to 2020, and finally takes their product.

This working solution from the AI is then compared with the developer solution. To aid the comparison, the list of numbers is increased from 5 to 200 items, held in the variable “n”.

ChatGPT Corrected Solution
q)numbers: 1721 979 366 299 675
q)target_sum: 2020
q)result: select num1:first each pair, num2: last each pair from ([] pair: (enlist each numbers) cross (enlist each numbers))
q)result: select sum_result:sum each pair from ([] pair: (enlist each numbers) cross (enlist each numbers))
q)result: select from ([] pair: (enlist each numbers) cross (enlist each numbers)) where (sum each pair) = target_sum
q)result: exec distinct pair from select prd each pair from ([] pair: (enlist each numbers) cross (enlist each numbers)) where (sum each pair) = target_sum
,514579
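The same four steps (pair, sum, filter, multiply) can also be written on plain lists without building intermediate tables. This is a sketch of that idea rather than ChatGPT's output; “pairs” and “hits” are illustrative names.

```q
numbers:1721 979 366 299 675
pairs:numbers cross numbers               / all 25 ordered pairs, e.g. 1721 299 and 299 1721
hits:pairs where 2020=sum each pairs      / keep the pairs that add up to 2020
distinct prd each hits                    / ,514579
```

Like the table version, this materialises every ordered pair in memory, which likely accounts for much of the memory difference in the comparison below.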

The “\ts” system command in kdb+ measures the time taken (in milliseconds) and the memory used (in bytes) to run an expression. The results show that the developer solution is better optimized: on the 200-item list “n” it runs in under a millisecond using 7,344 bytes, while ChatGPT's solution takes 9 ms and about 3.5 MB.

Comparison of Developer Vs ChatGPT Solutions
q)\ts (*/)(-1_desc distinct numbers*({(2020-x) in numbers}'[numbers]))
0 3248
q)\ts exec distinct pair from select prd each pair from ([] pair: (enlist each numbers) cross (enlist each numbers)) where (sum each pair) = target_sum
0 4400
q)\ts (*/)(-1_desc distinct n*({(2020-x) in n}'[n]))
0 7344
q)\ts exec distinct pair from select prd each pair from ([] pair: (enlist each n) cross (enlist each n)) where (sum each pair) = target_sum
9 3494896
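Because a single “\ts” run reports 0 ms for the developer solution, which is below the timer's resolution, it can help to repeat each expression with “\ts:N”, which reports the total time for N runs. The article does not say how its 200-item list “n” was built, so the list below is a hypothetical stand-in: the five example values plus 195 random numbers below 1000.

```q
/ hypothetical 200-item list: the five example values plus 195 random values below 1000
n:1721 979 366 299 675,195?1000
target_sum:2020
/ run each expression 100 times; the reported time is the total for all runs
\ts:100 (*/)(-1_desc distinct n*({(2020-x) in n}'[n]))
\ts:100 exec distinct pair from select prd each pair from ([] pair: (enlist each n) cross (enlist each n)) where (sum each pair) = target_sum
```

Two values below 1000 cannot sum to 2020, and the only random value that pairs with one of the five fixed numbers is a repeated 299, so both expressions should still return 514579. The timings themselves will vary by machine.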

Problem 2

This problem is more complicated and requires more thought about how to approach it. In the matrix below, each number is the height of a tree in meters, and the task is to count the trees that are visible when the grid is viewed from each of its four sides. A tree is visible from a side if every tree between it and that edge is strictly shorter. Every tree on the perimeter is visible and is counted in the total.

The matrix

(3) (0) (3) (7) (3)
(2) (5) (5)  1  (2)
(6) (5)  3  (3) (2)
(3)  3  (5)  4  (9)
(3) (5) (3) (9) (0)

The visible trees, marked with () brackets, total 21: 16 on the perimeter and 5 in the interior.
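The visibility rule can be checked one line at a time with a running maximum: a tree is visible from the start of a line if it is taller than “prev maxs” of that line, the tallest tree before it. This sketch uses “maxs” rather than the developer's approach, and “visible” and “row” are illustrative names. Applied to the third row, it shows why the centre 3 is hidden while the 3 to its right is visible.

```q
/ a tree is visible from the left if it is strictly taller than every tree before it
/ prev maxs x is the tallest tree so far (0N for the first tree, so the edge is always visible)
visible:{x>prev maxs x}
row:6 5 3 3 2                              / third row of the matrix
fromLeft:visible row                       / 10000b
fromRight:reverse visible reverse row      / 11011b
fromLeft|fromRight                         / 11011b: the centre 3 is hidden from both sides
```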

Prompt:

“Given a matrix of trees, find the total number of visible trees from all four sides of the matrix in kdb+. Each tree is given a value where 0 is the lowest and 9 is the highest. All trees around the perimeter are considered visible and then whatever trees are tallest in the interior. The matrix is m:(3,0,3,7,3;2,5,5,1,2;6,5,3,3,2;3,3,5,4,9;3,5,3,9,0)”

Human Solution
q)trees:(3,0,3,7,3;2,5,5,1,2;6,5,3,3,2;3,3,5,4,9;3,5,3,9,0)
q)uniqueTrees:(2#count first trees)#{x+10*til count x}raze trees
q)rotateViews:(uniqueTrees;reverse each uniqueTrees;flip uniqueTrees;reverse each flip uniqueTrees)
q)func:{{fills @[x;where not >':[x];:;0N]}/[x]}
q)count distinct (raze/){{x {x?x}func x mod 10} each x} each rotateViews
21

ChatGPT Solution
q)m:(3 0 3 7 3;2 5 5 1 2;6 5 3 3 2;3 3 5 4 9;3 5 3 9 0)
q)visibleTrees:{$[count x; sum each (max each\) each x; 0]}
q)totalVisibleTrees:{top: visibleTrees first x;bottom: visibleTrees last x;left: visibleTrees flip x;right: visibleTrees x;sum[ top; bottom,;left,;right]}
q)totalVisibleTrees m
'rank

Bing AI Solution
q) m:(3 0 3 7 3;2 5 5 1 2;6 5 3 3 2;3 3 5 4 9;3 5 3 9 0)
q) n: count m
q) m: (n+2)#(n+2)#0
q) m[1;] : 9
q) m[n;] : 9
q) m[,1] : 9
q) m[,n] : 9
q) m[2;2] : max m[2;2], max each m[1;2], m[3;2], m[2;1], m[2;3]
q) m[2;3] : max m[2;3], max each m[1;3], m[3;3], m[2;2], m[2;4]
q) m[3;2] : max m[3;2], max each m[2;2], m[4;2], m[3;1], m[3;3]

q) m[3;4] : max m[3;4], max each m[2;4], m[4;4], m[3;3], m[3;5]
q) m[2;4] : max m[2;4], max each m[1;4], m[3;4], m[2;3], m[2;5]
q) m[4;3] : max m[4;3], max each m[3;3], m[5;3], m[4;2], m[4;4]
q) m[3;3] : max m[3;3], max each m[2;3], m[4;3], m[3;2], m[3;4]

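The developer's solution first gives every tree a unique value by adding ten times its position to its height, so that “mod 10” recovers the height. It then builds four views of the matrix, one looking in from each side, keeps the trees in each line that are taller than every tree before them, and counts the distinct trees collected across all four views to reach the result.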
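As an alternative sketch, not the author's method, the same count can be reached without unique IDs: build a Boolean mask for each side using the running maximum “maxs”, combine the four masks with “|” (or), and count the set values. The names below are illustrative.

```q
m:(3 0 3 7 3;2 5 5 1 2;6 5 3 3 2;3 3 5 4 9;3 5 3 9 0)
/ 1b where a tree is strictly taller than every tree before it along the line of sight
visible:{x>prev maxs x}
fromLeft:visible each m
fromRight:reverse each visible each reverse each m
fromTop:flip visible each flip m
fromBottom:flip reverse each visible each reverse each flip m
/ a tree counts once if it is visible from at least one side
sum raze fromLeft|fromRight|fromTop|fromBottom      / 21
```

The unique IDs in the developer solution exist to de-duplicate trees seen from more than one side; combining masks achieves the same thing because each tree occupies exactly one cell.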

Reviewing the two answers given by the AIs shows two very different solutions. The Bing AI solution attempts to go through each value in the matrix of trees, but it wrongly assigns the “m” variable a flat list of seven 0s, overwriting the input matrix and rendering the rest of its logic moot. ChatGPT attempts to gather the information by rotating the matrix to each of the four sides, like the developer's solution. Its function “totalVisibleTrees” gets a rank error because “sum” is called with four arguments, more than it accepts. Fixing this mistake yields an answer of 106, which is still incorrect. Prompting the AI to correct itself proved difficult, as the corrections given to it did not always persist between answers. As a result, the same mistakes repeated between iterations and a working solution could not be produced.
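The reshape problem in Bing AI's code can be seen in isolation. With a single count on the left, “#” takes that many items, so nesting it still produces a flat list; a two-item count reshapes into rows and columns. This is a minimal sketch and “grid” is an illustrative name; even the corrected reshape would still replace the input trees stored in “m”.

```q
n:5
/ # with a single count takes that many items, so nesting it still gives a flat list
count (n+2)#(n+2)#0           / 7
/ # with a two-item count reshapes into rows and columns
grid:(n+2;n+2)#0
count each grid               / 7 7 7 7 7 7 7
```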

Conclusion

AIs are quick to process the information given and return a solution. However, the solutions returned contained multiple issues and never reached the correct answer on the first attempt. These issues included attempting to assign values to protected variable names, introducing all-zero Boolean lists that nullify any returned value, and incorrect use of function arguments, as seen with the rank errors. Another difficulty was describing the problems accurately enough to get the required result. Both AIs seemed to remember only one or two of their previous responses, which made it hard to get them to correct their own code, as they would get caught in a loop. Interestingly, changing the prompt to a more popular programming language such as Python produced solutions that were either correct or closer to working code. The AIs did explain why they wrote the code they did, and these explanations, along with some of the keywords returned, could help in arriving at a solution. Sometimes, though, the explanation did not make sense, as what it described did not actually happen in the code.

It is important to note that repeating the same prompt in a new session of the AI's user interface also produces different approaches to the problems. Each of the multiple times I gave the AIs the same problem, they returned different but ultimately still wrong answers. This means that a developer using the AI for initial exploration may settle on a solution that works but is not the one best suited to their needs. Repeatedly correcting the AI's results can eventually produce a corrected solution, but depending on the problem, this can take longer than the developer coming up with a solution on their own.

In summary, based on this small set of problems, AI is not yet at the point where it can solve kdb+ problems on its own. I think it can be a useful tool in a developer's arsenal for generating an initial idea for a solution to the problem they are working on, but for people new to the kdb+ language it can cause confusion on the way to the correct answer. The AI explains why it did what it did, but an inexperienced person can be led astray, as that reasoning can rest on entirely wrong assumptions.

About the author

Ivan Briody is a kdb+ Engineer here at Version 1.
