Solving Project Euler Problem 25

Emmanuel Bakare
An Idea (by Ingenious Piece)
9 min readMar 30, 2020

I’ve been busy into math problems lately and came across a rather usual one which is rather straight forward, find the index which first gives a 1000 digit fibonacci number.

From my past mathematical background, I’m always aware of the beauty of sequences which give rise to approximations that can be expressed as formulas. And truth be told, the Fibonacci sequence has a lot of kinky properties.

Dr Ron Knott covers on quite a lot of them in this article he wrote:

How I Solved This Problem?

First thing was to get the number of digits in the Fibonacci number, this can be exactly gotten using Binet formula which gives a nice approximation, then retrieve the length of the number by taking a simple Log to the base 10.

I later found this exercise to be rather bulky as one can easily take the log directly from the binet formula to arrive at the answer

Binet Formula F(n) = (1 + sqrt(5) / 2) ^ n  / sqrt(5)Placing phi = 1 + sqrt(5) / 2, we getF(n) = phi ^ n / sqrt(5) 
or
phi ^ n / ((2 * phi) - 1)
Any one works! :-)Taking log of the value gives a close value to the number of digits
log(F(n)) = log(phi) * n - log(sqrt(5))
= log(phi) * n - (1/2 * log(5))
= log(phi) * n - (1/2 * log(5))
The digit count is exactly the ceiling of the log(F(n))

The alternative derived runs in pseudo constant time (log functions are iterative approximations) which is much more efficient.

As for the log stats, check out a sample implementation for a natural log from Rob Pike here:

After this, I continued to discover a very nice pattern, the differences between the first indexes of the consecutive number of digits after 2 oscillate between 4 and 5, never more.

Short Notes detailing the occurrence, notice also that the number of 5s increase
So the first index for a 2 digit Fibonacci number == 7 
The first for a 3 digit Fibonacci nnumber = 12
The first for a 4digit Fibonacci nnumber = 17
...
All until we get to the 5th which is 21
What's more exciting is the pattern repeats indefinitely with random counts of 5 but only 1 instance of 4 in every cycle where 4 occurs.
Difference Plot For Len of Digits in F(n)

Visualisations courtesy of the gonum team, makers of the plot library in Go

For this graphx = I where I is the number of digits
y = n[I] - n[I - 1] where n is the index series
Differeneces for 1 to 1000 and 10000 digit sequences
Testing it out in the console reveals the same pattern
Do also note the graph points were interpolated by the plot library based on the sample data so some visuals are a bit inaccurate
go run main.go -number 20000
Approximate value 100000
Actual value 95697
Offset iterations 4303
go run main.go -number 20001
Approximate value 100005
Actual value 95702
Offset iterations 4303
go run main.go -number 20002
Approximate value 100010
Actual value 95706
Offset iterations 4304
go run main.go -number 20003
Approximate value 100015
Actual value 95711
Offset iterations 4304
go run main.go -number 20004
Approximate value 100020
Actual value 95716
Offset iterations 4304
go run main.go -number 20005
Approximate value 100025
Actual value 95721
Offset iterations 4304
go run main.go -number 20006
Approximate value 100030
Actual value 95726
Offset iterations 4304
go run main.go -number 20007
Approximate value 100035
Actual value 95730
Offset iterations 4305

You’d also notice from above that the offset iterations move up slowly but the differences between actual indexes are always 5 or less apart.

Case of len F(n) going to 10 shows the weird occurrence of the difference

The only case which goes above 5 is where Len(F(n)) = 2 as the index is 7 where F(n) = 13 and its minuend digit lies on index 1 so that’s fair but it only occurs along that index, no where else.

Moving on, I discover another pattern which is a divergence. Noting that the indices all lie within a range of 5 times the actual index, it makes sense that a range of 5 for the index would make sense to approximate the next index.

Let me explain in detail for the more patient reader:

Let n be the positive integer representing the first index of occurrence for a particular number of digits in a fibonacci number k such thatk = len(F(n)) where k - 1 <= n/5 < k
This translates to
n < 5 * k
such that
5 * k - n > 0
We know that:
F(1) = 1 which lies within 5 * 1 (n < 5)
F(7) = 13 which lies within 5 * 2 (n < 10)
F(12) = 144 which lies within 5 * 3 (n < 15)
.....
F(93) = 12200160415121909760 which lies within 5 * 20 (n < 100)

And it continues like that in splendid fashion supporting that previous inequality.

I also moved to plot the deviation between the approximated 5 * k value and the actual number of digits k. The outcome was rather humorous as the graph is wavy on smaller indices but gradually deviates after k = 8.

For this graph

y = k where k is the length of the digits.
x = 5 * k and it represents n approximate value

In the plot below, the green line is the actual length k to index n, the purple line represents k= n / 5 where n is the index and the red line shows the deviation between the actual value k and x = n/ 5 and along a positive scale.

Here k = 2 representing 10 and it makes no sense

At the initial onset, it oscillates below x repeatedly till we get to a point where n = 37 after which it just diverges positively from then on.

The function oscillates till 37 from which we have a positive deviation onwards

Running this for our previous example of x = 1000 and 10000 where k= 200 and 2000 respectively gives us this beautiful example of a positively diverging graph, supporting the next process for solving this problem.

Graph showcasing the positive deviation relationship

What this means is that to find the 1000 digit fibonacci number as the problem requires, we do not need to loop indefinitely to find it, we can easily just start from k = 5000 and reduce it till we get a valid index such that

len(F(n-1)) < k and len(F(n)) == k where k = 1000 and n < 5000

Writing the code leads to the example below:

The actual value is found one step after as the inequality proves.

With this, the problem is solved on the first try. It’s also efficient as I’ve skipped unnecessary iterations doing approximately 4.6% of the actual number specified, even bumped it up to 10 billion digits and it only took 2.15 billion iterations.

Approximate value 5000
Actual value 4782
Offset iterations 218
Iteration Percentage 4.558762024257633
Approximate value 50000
Actual value 47847
Offset iterations 2153
Iteration Percentage 4.499759650552804
Approximate value 5000000000
Actual value 4784971964
Offset iterations 215028036
Iteration Percentage 4.493820185735157
Approximate value 50000000000
Actual value 47849719665
Offset iterations 2150280335
Iteration Percentage 4.493820131140365

So what’s next?

Better Approximator Than 5 * k

I’m looking forward to help on the convergence part, I’ve got some ideas on how to get a better approximation which is centered around summing up the deviations for values from 4*k to 5*k incrementing the values from 4 to 5 by 0.1. Not sure if there’s also an exponential constant that can be added to get the exact sum since the deviation diverges positively with increases in n but not greatly so it’s possibly logarithmic.

Reasons For the Weird Occurence of 4

For any k > 5 , 4 always occurs after 4 or 3 5’s, never more or less after its first occurrence (There are 2 5’s for indexes 2, 3 and 4). It would be nice if someone could chip in how the sequence stays that way. Here are a couple for some index orders

n = 100
[7 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5]
n = 1000
[7 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4 5 5 5 5 4 5 5 5 5 4 5 5 5 4]

Iteration Offsets Also Follow A Pattern

So even the offsets follow a pattern also for multiples of a base value.

n = 10
Approximate value 50
Actual value 45
Offset iterations 5
Iteration Percentage 11.11111111111111
n = 50
Approximate value 250
Actual value 237
Offset iterations 13
Here's the amazing part,We first take the ratio = 50 / 10 = 5
Difference in offset = 5 * D(10) - D(50)
= (5 * 5) - 13
= 12
Taking the k of 10, multiplying by the ratio and adding the offset difference gives us the offset for 50
Value of 50 from 10 = 5 * 45 + 12 = 237
The odd part is it works for 100, 1000 etc.
Approximate value 5000
Actual value 4782
Offset iterations 218
Iteration Percentage 4.558762024257633
Lets do 10 to 1000
Taking the ratio = 100
Calculate Offset = (100 * 5) - 218 = 282
Value of 1000 from 10 = 45 * 100 + 282 = 4782
Trying 100 to 1000
Approximate value 500
Actual value 476
Offset iterations 24
Iteration Percentage 5.042016806722689
Taking the ratio = 10
Calculate Offset = (10 * 5) - 24 = 26
Value of 100 from 10 = 45 * 10 + 26 = 476

The amazing part about this is that it’s now possible to get a relationship between the offsets and the actual index that gives the first value.

The pattern also works for other values, try it with 5 and 10 as a weird exerciseApproximate value 25
Actual value 21
Offset iterations 4
Iteration Percentage 19.047619047619047
Approximate value 50
Actual value 45
Offset iterations 5
Iteration Percentage 11.11111111111111
Ratio = 10 / 5 = 2
Offset = 2 * 4 -5 = 3
Offset = 2 * 21 + 3 = 45

Others May Follow

The patterns I’ve noted here are those that I noticed after looking through the problem, there might be more.

The codebase is here if anyone wants to take a look, I pushed some of the plots although they’re pretty generic.

Finally, give a clap or two as it’d be appreciated.

--

--