number_names()
Getting to grips with manipulating strings and integers
Shifting Perspectives
Python has been my first real effort at learning a programming language. I dabbled in R a few years back, but only in the very strict context of using it to analyse data. I never really thought about using the language to create something; it was a tool, a means to an end. Unsurprisingly, I am approaching most (if not all) programming concepts and patterns with a near-zero level of familiarity.
One of the areas I have found hardest in this endeavour has been the use of mathematical operations and concepts to accomplish tasks. It’s not that I don’t understand the concepts (I have always gotten on well with maths), but rather that I have never conceptualised mathematical constructs in the applied manner in which they are often used in coding.
Using the modulus operator and floor division to extract digits from an integer, or using floor(log10(n)) + 1 to calculate the number of digits in a positive integer, are not exactly solutions that jump straight out at me. Even simple operations, such as using addition and subtraction to keep track of the length of substrings instead of indexing and concatenating strings, catch me off guard.
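To make those two tricks concrete, here is a minimal sketch. The function names digits_of() and digit_count() are made up for illustration and do not come from the project code.

```python
import math


def digits_of(n: int) -> list[int]:
    """Return the decimal digits of a non-negative integer, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % 10)  # modulus picks off the last digit
        n //= 10               # floor division drops that digit
    return digits[::-1]        # digits were collected last-to-first


def digit_count(n: int) -> int:
    """Number of decimal digits in a positive integer, using log10."""
    return math.floor(math.log10(n)) + 1


print(digits_of(4096))     # [4, 0, 9, 6]
print(digit_count(12345))  # 5
```

Note that digit_count() only makes sense for n > 0, since log10 is undefined for zero and negative numbers.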
When I finally grasp one of these applied concepts, it is extremely satisfying, and not a little humbling. The answers to problems are often so simple and elegant as to belie the complexity of their creation.
I have a tendency to be quite visual, and I usually apply visualisation exercises when I am trying to understand a new concept or idea. I like closing my eyes and thinking about problems in solid, corporeal terms: webs and connections, cogs and wheels, actions and reactions.
The result, when I sit down to write some code, is that most solutions present themselves to me as simply finding the best data structure to help me find the answer I am looking for. If I am trying to find the length of a substring, my mind immediately flits towards concatenating to a string and calling len(), not subtracting start and end variables.
I see the table, or the data structure and visualise myself plucking the answer out (I may or may not be wearing a monocle in this visualisation), not using maths to locate and represent the answer without ever even creating that table.
Approaching number_names()
The idea behind the exercise is simple enough in itself:
Write a function that takes an integer value and returns it spelled out in text form. It must handle values up to 1,000,000. Additionally, introduce support for negative values and floating-point numbers.
When I first looked at the exercise, my mind flashed right past the mathematical and jumped straight to string manipulation. I suppose this was the obvious jump: indexing strings is a quick and easy way to split up an integer value, and it was the way I was most comfortable with.
The basic premise was:
int -> str -> split using indices -> use digits as dictionary keys -> string
The dictionary was going to be an important part of this; everything was going to hinge on the key-value pairs.
The purpose of the main body of the function would be to identify the individual digits, reference them in the dictionary and return the string.
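As a rough illustration of that pipeline, here is a cut-down sketch that only handles 0 to 99. The dictionaries and the two_digit_name() function are stand-ins invented for this example, not the original first attempt.

```python
# Hypothetical lookup tables keyed by digit strings (not the original dictionaries).
UNITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
         "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}
TEENS = {"10": "ten", "11": "eleven", "12": "twelve", "13": "thirteen",
         "14": "fourteen", "15": "fifteen", "16": "sixteen",
         "17": "seventeen", "18": "eighteen", "19": "nineteen"}
TENS = {"2": "twenty", "3": "thirty", "4": "forty", "5": "fifty",
        "6": "sixty", "7": "seventy", "8": "eighty", "9": "ninety"}


def two_digit_name(n: int) -> str:
    """Spell out 0..99 by converting to a string and indexing its characters."""
    s = str(n)                    # int -> str
    if len(s) == 1:
        return UNITS[s]
    if s in TEENS:
        return TEENS[s]
    tens, units = s[0], s[1]      # split using indices
    if units == "0":
        return TENS[tens]
    return f"{TENS[tens]}-{UNITS[units]}"  # digits as dictionary keys -> string


print(two_digit_name(42))  # forty-two
```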
Simple, right?
The result came together quickly, mostly worked and was very messy.
Okay, extremely messy.
Those line lengths are worthy of a 90s hacker movie.
I quickly decided that directly referencing the dictionary in the f-string was probably not the best way to be doing things.
With this in mind, I started to think about the best way to reformat the return strings to tidy up the code and make it just a little less ridiculous. My next effort drew on probably the only other experience I had of formatting changing values in a standardised format.
Date string formatting in Excel. (Yes, yes, I know.)
I decided that the best way of approaching this would be to standardise variables that I could plug in and out of the string when required. My implementation of this left a lot to be desired; I was still doing all the splitting and manipulation in a very long if-statement.
The main positive from this attempt was that it tidied up my code enough that I could see patterns emerging that I could take advantage of by writing a few sub-functions. This had been next to impossible to see in the mess that was the first attempt.
Clearing away the cobwebs
In keeping with the spirit of the inefficient_code() project, we now have a (limited) working model and have learned a valuable lesson: formatting each digit individually was a ridiculous idea.
A better idea: a six-digit number is simply two three-digit numbers.
It sounds overly simplistic. And of course, it is. But it was an important shift conceptually. Now the string would look more like this:
{hundreds_value} + “Thousand and” + {hundreds_value}
Not only would the return string be much more manageable, but the way in which I could abstract the splitting and formatting of the input was also becoming increasingly apparent.
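The "two three-digit numbers" idea can be sketched as below. Two things here are assumptions: the split is done with divmod() rather than the string indexing used at this stage of the project, and name_below_1000 is a hypothetical callback standing in for whatever turns a value under 1,000 into words.

```python
def six_digit_name(n: int, name_below_1000) -> str:
    """Name a value below 1,000,000 as two groups of up to three digits."""
    thousands, remainder = divmod(n, 1000)  # 123456 -> (123, 456)
    parts = []
    if thousands:
        parts.append(f"{name_below_1000(thousands)} thousand")
    if remainder:
        parts.append(name_below_1000(remainder))
    return " and ".join(parts)


# str() is used as a placeholder so the grouping itself is visible.
print(six_digit_name(123456, str))  # 123 thousand and 456
```

Swapping str for a real three-digit naming function would give the final text, while the grouping logic stays the same.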
I started off this process by tidying up the dictionary, replacing the three individual dictionaries with one nested dictionary. Not much was changing here, but I had removed the need for the “hundreds” and “thousands” strings I had previously stored, so a bit of housekeeping was on the cards anyway. My goal was to make the dictionary more readable, and better signposted in the code.
With the dictionary tidied up, I started work on cleaning up the main functions. Looking back over this iteration when I finished, I found it funny how I relied on tuple unpacking to format the strings. I have a feeling I was just getting into using unpacking at the time and decided it was the way to go. It made things a little bit more complicated than they needed to be, but the function was beginning to look more like what I wanted at this stage.
The long lines and dense blocks of code have mostly been removed from play, especially for the higher values. There are now two helper functions: one formats values under 100, and the other handles values from 100 to 999. I have moved most of the string formatting inside these functions; the main function is now only responsible for passing the values to the helper functions and printing the final result.
The check_ functions themselves are simple enough, although they are still based heavily on string manipulation. This makes for a little bit of clumsiness, as I based some of the internal values on integer inputs. At this stage, though, I was focused more on getting the pieces into place.
Returning a tuple of strings from check_hundreds() was my initial way of taking into account input values with 0 in the 100s or 10s place. If the 100s place was empty, the return value was (“”, “{tens}”), vice versa when there was no 10s value, and (“”, “”) if the input number was a multiple of 1,000.
This worked fine, but it would have been more straightforward to just return the string here instead of playing about with unpacked values. It was a valuable effort though, as it was one of the first times my mind moved towards unpacking when I was dealing with possible multiple return values.
Building complexity
At this point in the exercise, I had completed the basic functionality of the function. However, I wanted to extend this further and allow for negative values as well as floating-point numbers.
The input has four possible states in this implementation:
- positive integer
- negative integer
- positive floating-point
- negative floating-point
As I was still using strings to process the input, I decided to use the sign as the first criterion for processing the number, treating the “Negative” return in much the same way as I did the “Thousand and” string above. Next, the input would be identified as either integer or floating-point and then passed to the relevant function to process.
Input -> Positive or Negative? -> Integer or Floating-point? -> Function
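A minimal sketch of that decision flow, assuming the input has already been converted to a string. The classify() function is hypothetical and only shows the two checks, not the processing that follows them.

```python
def classify(raw: str) -> tuple[str, str]:
    """Return (sign, kind) for a number given in string form."""
    sign = "negative" if raw.startswith("-") else "positive"
    kind = "floating-point" if "." in raw else "integer"
    return sign, kind


print(classify("-3.5"))  # ('negative', 'floating-point')
print(classify("42"))    # ('positive', 'integer')
```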
In this version of the code, the code that formatted the return string based on input length has been moved into generate_integer(). The process of formatting the string has been entirely moved to helper functions. The primary number_names() function is now purely responsible for calling the helpers and returning the formatted string.
Two additional helper functions have been added as well:
- generate_float() which splits the input, calls the formatting functions and then combines the integer and fractional part strings
- generate_fractional() which takes the fractional part of the floating-point number and generates the formatted string
As the code is still using strings to manipulate the input, .split(“.”) is used to split the floating-point at the decimal point. Indices are then used to call the relevant functions and the floating-point text is created by adding the resulting strings together.
The generate_fractional() function uses a for-loop to iterate through each digit in the fractional part and return the text representation. Quick and easy, but only possible while the input was in string form. Doing this mathematically was a bit of a learning curve later on.
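The string-based split and digit loop might look something like the sketch below. DIGIT_NAMES and fractional_words() are illustrative names, and the real generate_fractional() may well be structured differently.

```python
# Hypothetical digit lookup keyed by single-character strings.
DIGIT_NAMES = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
               "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def fractional_words(fraction: str) -> str:
    """Read out each digit of the fractional part, left to right."""
    words = []
    for digit in fraction:
        words.append(DIGIT_NAMES[digit])
    return " ".join(words)


whole, fraction = "12.305".split(".")  # "12" and "305"
spoken = fractional_words(fraction)
print(spoken)  # three zero five
```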
Mathematically Speaking
Up to this point, my efforts had kept very much within my comfort zone, but the code itself was clumsy and there was too much back and forth between integers and strings for my liking. I knew that the modulus operator (%) could be used to extract digits from the integer, without needing to convert the value to a string. The problem was I had very little idea of how to go about this.
When I started up inefficient_code() (all four posts by the time this goes online), this was planned as my first post, but as I worked to wrap my head around the mathematical side of things, I pushed it back. If the point of writing these posts is to move from naive implementations to something more streamlined and elegant, I needed to finish this up first.
There is still a bit of cleaning up to do, but there are only two places in the code where I had to rely on a string conversion:
- handling the input of a floating-point number. This was necessary because otherwise the value the function worked with would be significantly different from the number that was typed in (a lot more digits), due to the way floating-point numbers are stored in binary
- getting the number of digits in the fractional part of a floating-point number. I could not figure this out mathematically, and this seemed more efficient than anything I could find on Google
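Both points can be seen in a short experiment with the standard-library decimal module. This is a sketch, not the project code. The last line shows one alternative to the string-length trick, using Decimal.as_tuple(), offered here as a suggestion rather than something the project uses.

```python
from decimal import Decimal

value = 0.1

# Built straight from the float, the Decimal carries the full binary expansion.
from_float = Decimal(value)
# Going through str() keeps the digits as they were typed.
from_text = Decimal(str(value))

print(from_float == from_text)  # False
print(from_text)                # 0.1

# Counting fractional digits via the string form: drop the leading "0" and ".".
fraction_length = len(str(Decimal("12.305") % 1)) - 2

# Possible non-string alternative: the exponent of the decimal's tuple form.
exponent_length = -Decimal("12.305").as_tuple().exponent

print(fraction_length, exponent_length)  # 3 3
```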
The main function now uses value and type checks to assign the input to the correct functions. Absolute values are passed in the case of negative values, as the negative sign is taken care of directly in the final string. I have been trying to figure out a way of doing this without nested if-statements or a long string of elifs, but so far I am drawing a blank on it.
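One possible way to flatten that branching is sketched below, under the assumption that the sign and the type can be handled independently. The name_integer and name_float parameters are hypothetical stand-ins for the real helper functions.

```python
def number_names_sketch(value, name_integer, name_float) -> str:
    """Handle the sign once, then pick a single handler based on the type."""
    prefix = "negative " if value < 0 else ""
    magnitude = abs(value)  # the helpers only ever see non-negative values
    handler = name_float if isinstance(magnitude, float) else name_integer
    return prefix + handler(magnitude)


# Placeholder handlers so the dispatch can be seen without the full naming code.
print(number_names_sketch(-5, str, str))   # negative 5
print(number_names_sketch(2.5, str, str))  # 2.5
```

Because the sign check and the type check are separate steps, the four input states need two decisions rather than four branches.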
generate_float()
The generate_ functions did not require as many changes as I had initially thought. generate_float() had the most significant alterations, which cleaned up the function immensely. Instead of partitioning the integer and fractional parts of the floating-point number, the decimal module is used to keep the fractional part the same as the input.
- (numeric_value % 1) returns the fractional part of the number
- int(numeric_value) is used to get rid of the fractional part of the number
The function passes the result to the generate_fractional() and generate_integer() functions, and simply adds the results together for the return string.
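The two operations can be tried in isolation. In this sketch, split_number() is a hypothetical helper, and it assumes the float is converted to a Decimal via str() so that the digits match what was typed.

```python
from decimal import Decimal


def split_number(value: float) -> tuple[int, Decimal]:
    """Split a non-negative float into its integer and fractional parts."""
    numeric_value = Decimal(str(value))  # keep the digits exactly as written
    integer_part = int(numeric_value)    # int() discards the fractional part
    fractional_part = numeric_value % 1  # % 1 keeps only the fractional part
    return integer_part, fractional_part


print(split_number(12.305))  # (12, Decimal('0.305'))
```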
generate_fractional()
While the generate_float() function was probably the most changed in terms of structure, the generate_fractional() function required a larger shift in thinking. Instead of being passed a number in string form, the function was now receiving the fractional part of the input in the form 0.xyz
I briefly experimented with using the modulus to return the value of each digit, but after struggling with this for about an hour, I instead decided on simply shifting the fractional part entirely to the left, making it an integer. Keeping it as a decimal was unnecessary at this stage; only the digits themselves mattered.
To do this I multiplied the fractional part by 10^n, where n is the length of the fractional part. I got a bit tricky here and calculated the length of the fractional part by converting to a string, calling len() and slicing off the first two indices (the “0” and “.”). This was one of the few places where finding a solution using mathematical operators is beyond my current abilities.
The for-loop uses the modulus operator to take the last digit from the integer, reference it in the dictionary and append the word to the return string. This last digit is then dropped from the end of the integer by floor-dividing the integer by n, where n is a power of 10.
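Put together, the shift-and-peel approach could be sketched as follows. This is one way of doing it rather than the project's exact code: DIGIT_NAMES and fractional_words() are made-up names, and the words are reversed at the end because the modulus yields the digits last-to-first.

```python
from decimal import Decimal

# Hypothetical lookup indexed by digit value.
DIGIT_NAMES = ("zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine")


def fractional_words(fraction: Decimal) -> str:
    """Spell out the digits of a fractional part given in the form 0.xyz."""
    n = len(str(fraction)) - 2         # digit count, ignoring the "0" and "."
    shifted = int(fraction * 10 ** n)  # 0.305 -> 305
    words = []
    for _ in range(n):                 # loop n times so leading zeros survive
        words.append(DIGIT_NAMES[shifted % 10])  # last digit
        shifted //= 10                           # drop the last digit
    return " ".join(reversed(words))


print(fractional_words(Decimal("0.305")))  # three zero five
```

One limitation of the length trick: very small decimals are printed in scientific notation by str(), so this sketch assumes the fractional part prints in plain 0.xyz form.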
generate_integer()
I am extremely happy with the current iteration of this function. It still maintains the same if-elif structure of previous iterations, but the code is greatly simplified. It uses floor division and modulus to separate out the values and pass them to the two helper functions that generate the word values of the digits.
The code is clean and readable, and a million miles from the original code that this function was based on. Out of all the code in this project, I think this demonstrates most clearly how my thinking developed as I worked through the exercise.
generate_tens() and generate_hundreds()
Unbeknownst to me, these two functions became the backbone of the code. These functions are almost entirely responsible for generating the word-values of the input. Whilst the other generate_ functions are responsible for formatting the final string, these functions provide almost everything used in this process.
In this iteration of the generate_hundreds() function, the return tuple has been removed and replaced with the return string in its final format. This step enhanced readability and removed an unnecessary step in the flow of the code.
The flow of the code has also been greatly simplified by removing the dependence on string indexing. The functions now access the dictionary more directly, with only minimal use of floor division and modulus to access digits.
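For illustration, here is a minimal sketch of two such helpers built on integer keys. The NAMES dictionary and the function names tens_words() and hundreds_words() are assumptions made for this example; the real generate_tens() and generate_hundreds() may format their output differently.

```python
# Hypothetical nested lookup keyed by integers.
NAMES = {
    "units": {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
              6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
              11: "eleven", 12: "twelve", 13: "thirteen", 14: "fourteen",
              15: "fifteen", 16: "sixteen", 17: "seventeen",
              18: "eighteen", 19: "nineteen"},
    "tens": {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
             6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"},
}


def tens_words(n: int) -> str:
    """Words for 0 <= n < 100; returns an empty string for 0."""
    if n == 0:
        return ""
    if n < 20:
        return NAMES["units"][n]
    tens, units = n // 10, n % 10  # floor division and modulus split the digits
    if units == 0:
        return NAMES["tens"][tens]
    return f'{NAMES["tens"][tens]}-{NAMES["units"][units]}'


def hundreds_words(n: int) -> str:
    """Words for 0 <= n < 1000, returned as a single string."""
    hundreds, rest = n // 100, n % 100
    if hundreds == 0:
        return tens_words(rest)
    head = f'{NAMES["units"][hundreds]} hundred'
    return head if rest == 0 else f"{head} and {tens_words(rest)}"


print(hundreds_words(342))  # three hundred and forty-two
```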
The one niggling thought I have is that I would like to find a way of doing these operations without having to check each input. Enlarging the dictionary would be one way, but I also feel like that would be too unwieldy. Perhaps I will return to that when my skills have progressed further.
Reflections
There are still a few directions in which I would like to take this further. Although this project was a relatively basic exercise, I have found it very useful for thinking about different ways to manipulate and work with string and integer values.
As it stands, the code only handles numbers up to 999,999. Expanding past this in the current form would be a simple matter of adding a few lines to generate_integer(). However, a new elif would be required for each additional power of 10, which would be extremely inefficient in the long run. The step I would like to take is making the function more efficient and general, rather than relying on calculating the length of the input.
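One possible direction, sketched here as an idea rather than something implemented in the project, is to peel off groups of three digits in a loop so that no elif is needed per length. SCALES and group_by_thousands() are hypothetical names.

```python
# Scale word for each successive group of three digits (assumed list).
SCALES = ["", "thousand", "million", "billion"]


def group_by_thousands(n: int) -> list[tuple[int, str]]:
    """Split a positive integer into (three-digit value, scale word) pairs."""
    groups = []
    index = 0
    while n > 0:
        n, chunk = divmod(n, 1000)  # peel off the lowest three digits
        if chunk:                   # skip empty groups such as the 000 in 1,000,000
            groups.append((chunk, SCALES[index]))
        index += 1
    return groups[::-1]             # most significant group first


print(group_by_thousands(1234567))
# [(1, 'million'), (234, 'thousand'), (567, '')]
```

Each pair could then be passed to the existing three-digit helper. As written, the sketch only reaches as far as the SCALES list does, so values of a trillion or more would need more entries.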
As mentioned in the previous section I also want to explore ways to make the generate_tens() and generate_hundreds() functions more streamlined.
Overall though, I am very happy with the progress I have made over the course of this mini-project:
- I have finally started to get my head around using mathematical operations more in my code, instead of relying on data structures and indices to extract information.
- It was a great opportunity to practice abstracting and reducing the amount of work individual functions are responsible for. My development in this area is evident in the much-simplified code of the later iterations.
- This was actually great fun to mess around with, and much more involved than writing smaller scripts to answer problems. It gave me a chance to try some new things and learn a lot of stuff along the way.
Full code for each iteration of this project is on my GitHub repository.