String Calculator Kata using Eta

Graeme Lockley
8 min readMay 8, 2019

--

Whenever I stumble upon a new language the first piece of code that I write is the String Calculator Kata. The beauty of this kata in exploring a new language is:

  • You need to be able to write tests,
  • You need to access libraries,
  • You need to deal with collections,
  • You need to deal with errors, and
  • If using predicate based testing, you need to deal with randomness

Eta seems to be an interesting addition to the JVM universe but, given it’s youthfulness, I thought I would share my solution coupled with some commentary and thoughts.

Source

The code that makes up this kata is stored on gitlab. Looking at this code you will notice a couple of things:

  • The build system uses Gradle
  • The source code is not broken up into main and test but rather all of the code is placed in src/main/eta. I was unable to separate the code out into main/test directory structure and still have a working build.
  • Tests are run by executing ./gradlew run rather than the traditional ./gradlew test This is a little strange and not idiomatic but, I suspect, has more to do with the Eta Haskell heritage and the maturity of the Eta tooling.

The source is therefore a little idiosyncratic but, once I had wrapped my head around the differences, I was still able to achieve a developer workflow that allowed me to complete the kata.

Test Scenarios

The traditional TDD scenarios are typically listed below

1) add("") should return 0
2) add("123") should return 123
3) add("1,2,3") should return 6
4) add("1,2\n3") should 6
5) add("//;\n1;2;3") should return 6
6) add("//*\n1*2*3") should return 6 -- regex character
7) add("1,-2,3,-4,5") should return Error [-2, -4]
8) add("1,1001,2,2001,3,3001") should return 6
9) add("//[===]\n1===2===3") should return 6
10) add("//[===][***]\n1***2===3") should return 6

Using predicate based testing I was able to reduce the previous 10 scenarios into the following 4 scenarios:

  • natural numbers separated with a comma or newline returns the sum of all numbers less than 1001
  • natural numbers separated with a custom character returns the sum of all numbers less than 1001
  • natural numbers separated with multiple custom multi-character separators returns the sum of all numbers less than 1001
  • integers with at least one negative returns an error containing all of the negative values

The reason why this reduction was possible is that:

  • (1) to (4) are generated in the first test
  • (5) and (6) are generated in the second test
  • (7) is the fourth test
  • (8) is embedded into the first 3 tests
  • (9) and (10) are generated in the third test

Test Code

Before looking at the test code some hygiene.

  • I used QuickCheck to generate my test data.
  • I used hspec for the testing framework. The primary rationale was familiarity and hspec’s tight integration with QuickCheck.

These libraries took a little of getting used to but once I had overcome the mental adjustment I was able to easily express my intention.

Each of the scenarios are listed below with a verbal narrative of the code.

Scenario: integers with at least one negative returns an error containing all of the negative values

forAll (suchThat (listOf arbitrary) (any (< 0))) $ \ns ->
add (intercalate "," $ map show ns) `shouldBe` (Left $ filter (< 0) ns)

Once one gets over the notation it is quite readable:

For all lists of arbitrary integers with at least one element less than 0, calling add with a single parameter composed of all elements separated with a comma should return a error value containing a list of all the negative values

Scenario:natural numbers separated with a comma or newline returns the sum of all numbers less than 1001

forAll (listOf naturalNumberGenerator) $ \ns -> do
separators <- infiniteListOf $ elements [",", "\n"]
return $ add (mkInput separators ns) `shouldBe` sumList ns

Before reading out the test I’ll introduce the helpers requiring explanation.

naturalNumberGenerator is a generator that randomly selects a number in the range 0..2000. The implementation is straightforward and follows directly from QuickCheck.

naturalNumberGenerator :: Gen Int
naturalNumberGenerator =
choose (0, 2000)

mkInput is a function that takes a (possibly infinite) list of separators and a finite list of values and composes a string by, in order, inserting a separator between each of the values.

mkInput :: [String] -> [Int] -> String
mkInput _ [] = ""
mkInput separators (x:xs) = mkInputHelper separators xs (show x)
where mkInputHelper _ [] result =
result
mkInputHelper (s:ss) (n:ns) result =
mkInputHelper ss ns (result ++ s ++ (show n))

The final function sumList is a simple helper that sums the list together dropping all values greater than 1,000 and wraps into an Either Right.

sumList :: [Int] -> Either whatever Int
sumList =
Right . sum . filter (< 1001)

The test can now be described:

For all lists of natural numbers, calling add with a single parameter composed of all numbers separated using a comma or newline, will return the sum of all numbers less than 1001

Scenario: natural numbers separated with a custom character returns the sum of all numbers less than 1001

forAll (listOf naturalNumberGenerator) $ \ns ->
forAll customCharacterSeparatorGenerator $ \sep ->
add ("//" ++ [sep] ++ "\n" ++ (intercalate [sep] $ map show ns)) `shouldBe` sumList ns

Having worked through the two previous tests the only new concept here is the declaration for customCharacterSeparatorGenerator. From a kata perspective this declaration is interesting because it requires clarity around what are the permissible separator characters. On inspection the answer is obvious but, when first attempting this kata using predicate based testing, the answer had to be discovered.

customCharacterSeparatorGenerator :: Gen Char
customCharacterSeparatorGenerator =
elements $ filter (\c -> c /= '-' && c /= '[' && (not . isDigit) c) [chr 32 .. chr 127]

Separators can be any character other than those that form part of an integer literal (‘0’..’9' and ‘-’) or ‘[’ which indicates a multiple character separator.

The test can now be described:

For all lists of natural numbers and a valid separator character, calling add with a single parameter composed of all numbers separated using the custom separator, will return the sum of all numbers less than 1001

Scenario: natural numbers separated with multiple custom multi-character separators returns the sum of all numbers less than 1001

forAll (listOf naturalNumberGenerator) $ \ns ->
forAll (listOf1 $ listOf1 $ customCharacterSeparatorGenerator) $ \seps -> do
separators <- infiniteListOf $ elements seps
return $ add ("//[" ++ (intercalate "][" seps) ++ "]\n" ++ (mkInput separators ns)) `shouldBe` sumList ns

The novelty in this test is the composition of multiple multi-character separators. Fortunately Haskell treats a string as a list of character allowing the code listOf1 customCharacterSeparatorGenerator to produce a generator of non-empty strings. Passing the previous expression into listOf1 will now produce a generator of non empty lists of non-empty separator strings.

For all lists of natural numbers and a list of valid multi-character separators, calling add with a single parameter composed of all numbers separated using the custom separators, will return the sum of all numbers less than 1001

add

The implementation of the add function is not at all heroic. I will however unpack it piece by piece.

module Kata.SC where

import Data.List
import Data.List.Split
import Text.Regex


add :: String -> Either [Int] Int
add input =
...

The add function takes a single argument and returns an Either where the left indicates the error condition containing the negative values whilst the right contains the result.

The body of add is simple Haskell relying only on parse.

add :: String -> Either [Int] Int
add input =
let
numbers =
map read $ parse input

negatives =
filter (< 0) numbers
in
if null negatives then
Right $ sum $ filter (< 1001) numbers
else
Left negatives
parse :: String -> [String]
parse input =
...

The body of parse uses a guard to separate out the 3 scenarios:

parse :: String -> [String]
parse input
| isPrefixOf "//[" input =
...
| isPrefixOf "//" input =
...
| otherwise =
...

Working backwards the otherwise scenario deals with the comma/newline scenario. Using regular expressions this implementation is simple.

splitRegex (mkRegex ",|\n") input

The middle scenario is also simple by virtue of being able to extract out the separator and content based on positions.

splitRegex (mkRegex $ escapeRe $ take 1 $ drop 2 input) $ 
drop 4 input

Reference is made to the escapeRe function responsible for marking up text and escaping all of the regular expression characters.

escapeRe :: String -> String
escapeRe [] = []
escapeRe (x:xs)
| x `elem` "*+.\\?|$^)({}" =
'\\' : x : escapeRe xs
| otherwise =
x : escapeRe xs

The final scenario is the tricky scenario in that it needs to compose the regular expression separator through a number of manipulations.

let
inputs :: [String]
inputs =
splitOn "\n" input

separator :: Regex
separator =
mkRegex $
intercalate "|" $
map escapeRe $
sortBy (\a b -> if length a > length b then LT else GT) $
splitOn "][" $
init $
drop 3 $
inputs !! 0

content :: String
content =
inputs !! 1
in
splitRegex separator content

Taking a bit of time to work through each step the manipulation is clear except for the sort step. The sort is necessary due to an anomaly with regular expression libraries. The following REPL interaction is useful for explanation:

Prelude Main> import Text.Regex
Prelude Text.Regex Main> splitRegex (mkRegex "x|xy") "1x2xy3"
["1","2","y3"]
it2 :: [String]
Prelude Text.Regex Main> splitRegex (mkRegex "xy|x") "1x2xy3"
["1","2","3"]
it3 :: [String]

The regular expression libraries appear to apply the split using each regular expression left to right. The result of this strategy is that if a separator is a prefix of a later separator then the prefix will be matched. In the “x|xy” scenario the “x” is applied before the “xy” resulting in the “1x2xy3” being split into “1”, “2”, “y3”. The way to resolve this issue is to sort the separators on descending separator length.

As an aside a number of us have been performing this kata for a number of years and only found this scenario when we used predicate based testing. This scenario appears in virtually all implementations that use a regular expression strategy.

Closing Thoughts

On a personal note — I am a Java developer and have lived on the JVM from the early days of Java. To be specific from Java 1.0.2. The lens that I look through when using Eta is not “Eta is Haskell on the JVM” but rather “Eta is an additional language on the JVM” so please consider my thoughts within that context.

Eta does not support the JVM idiomatically. For Eta to fulfil the promise of bringing a non-strict functional language to the JVM it needs to not only interact with JVM code but also fit alongside the idioms. My Eta tests need to fit alongside my xUnit runner and report accordingly. I need to be able to separate my code into src and test. I need to be able to exploit packages.

Functional languages on other environments have taken a more measured approach to libraries. Elm has created a smaller set of focused libraries. Elm can get away with that because it is a new language and the community is purposefully keeping it’s implementation narrow and focused on it’s chosen domain. Eventhough Purescript is a general purpose Haskell flavoured language they have pulled the libraries together and introduced greater coherency. Eta positions itself as Hasell on the JVM and inherits a vast number of libraries. However this does introduce the following issues:

  • Steep learning curve and lots of searching to find functionality. For example to escape a regular expression is native to most libraries. I am sure that somewhere in the vast Haskell landscape it is there but I could not find it.
  • Inconsistencies between libraries for things that are functionally similar. For example splitting an empty string using a character separator or a regular expression separator has different outcomes.
Prelude Main> import Data.List.Split

Prelude Data.List.Split Main> splitOn "," ""
[""]
it0 :: [[Char]]
Prelude Main> import Text.Regex

Prelude Text.Regex Main> splitRegex (mkRegex ",") ""
[]
it0 :: [String]

Eta has massive promise and it was wonderful to implement “old faithful” String Calculator Kata using Eta on the JVM. My hope is that the Eta community will embrace the JVM as a first class participant rather than treating it as a runtime for the Haskell community. Only then can it be successful in bridging between the two communities.

--

--