Showing posts with label open problems. Show all posts
Showing posts with label open problems. Show all posts

Wednesday, 15 June 2011

Catalan's theorem and Pillai's conjecture

Number theory always surprises by the seemingly simple questions which turn out to be extremely hard or maybe even impossible to answer. One such question concerns perfect powers of the type
xn
where both x and n are natural numbers. Can we say, which numbers can be expressed as perfect powers? If we don't further restrict n then the answer is simple. Any number x can be represented as x1. So that means the set of numbers that can be represented as perfect powers is the complete set \mathbb{N}. So this question is boring.

But what if we exclude these trivial powers and say that n must be greater than 1? The list of these numbers is the sequence A001597 in OEIS. On the right is a plot of the first 100 of these numbers an. As you can see, the numbers seem to grow more than linearly. In fact for large numbers they seem to grow with n2. This seems to suggest that the gaps between two consecutive numbers also needs to grow. This is intuitively right. As the numbers grow, there are less and less candidates that could produce perfect powets.

But how large are the gaps exactly? More interestingly, as the gaps grow larger, do we still encounter some small gaps? The plot on the left shows the differences gn between two consecutive elements of the sequence plotted against the smaller number for the first 1000 or so numbers
gn = an + 1an,
i.e. we're plotting gn against an. It can be seen that most of the gn seem to lie close to a curve given by
G(x)=  2 \sqrt{x} + 1
which also limits the gn. This simple form is amazing at first, however it becomes quite clear when one looks at just the square numbers x2. If an is a square number then (\sqrt{a_n}+1)^2 is also a perfect square. The difference between the two numbers is 2 \sqrt{a_n}+1. Only few numbers seem to depart from this behaviour and have a lot smaller values. This always happens when there is at least perfect power, that is not a square, between two perfect squares.
So we have an upper bound for the gn. But is there a lower bound for the gn that increases with n. Or does gn always have excursions to small numbers? Let's look first at the smallest possible value gn = 1.
Catalan's theorem makes a statement about this particular case. It says that there is only one n for which gn = 1, or in the full notation
xnym = 1 has only one solution, which is 32 − 23 = 1
Until 2002 this was Catalan's conjecture but then it was proven by Preda Mihăilescu. Long before this proof Pillai generalised the the Catalan conjecture and speculated about the solutions of the equation
xnym = c for c > 1
The graph above seems to suggest that for c = 2 there is also just one solution
33 − 52 = 2
but to my knowledge this has not been proven. Pillai's conjecture now states that for any value of c, the above equation only has a finite number of solutions. This means that the gaps in the sequence of perfect powers not only grow, but there is some increasing lower bound for the gaps as well. Erdos went one step further and conjectured that this lower bound is given by
gn < nk
where k is a fixed constant which is greater than one.
From the plot of the first million gaps this does not immediately seem obvious. The figure shows the plot of the gaps gn against n for n up to 106 on a double-logarithmic scale. Although the lower bound does seem to grow, the value of the parameter k of the Erdos formula, of roughly 0.22 (green line), is determined by a single outlier at a384026 = 143384152921 and g384026 = 17. Almaost all other values of gn lie well above the Erdos curve. Only one other point comes close to the limiting curve at g745174 = 24
One more interesting feature that can be seen are strange lines in the plot that slope at a slightly lower exponent of about 2 / 3. This suggests that there is some hidden structure in the seemingly random behaviour of the gn. Another feature, which I noticed when plotting the gn is that there seem to be islands of regularity.
If we zoom in to the region from 7.4\times10^{11} to 7.6\times 10^{11} we get the following plot. Most points are again crowded around the upper limit but the others have the very regular shape of two parabolas intersecting each other.
As you can see, even formulas that seem simple can have surprisingly rich behaviour and open up very difficult problems.

Monday, 16 May 2011

Collatz: Bringing some order into the chaos

I recently wrote that I was rekindling my interest in the Collatz problem. I still haven't had the time to read any literature but I came up with an interesting transformation to the problem that seems to unearth some structure which is maybe not directly obvious. For regular readers of this blog, you might see the similarities with this transformation and the formula I posted earlier on calculating the points for interval halving using a closed formula. It really was the Collatz problem that inspired that formula. To the uninitiated, the Collatz problem is based on the sequence
x_{n+1} = \begin{cases}
x_n/2 & x_n \text{ even}\\
3x_n + 1 & x_n \text{ odd}
\end{cases}

The question is whether, for any positive integer starting value, the sequence will always reach the value 1.

One can reverse the problem in the following way. Define the set C in the following way


\begin{alignat}{2}
&1 \in C\\
&\text{if } x \in C \quad\text{ then }\quad2x \in C\\
&\text{if } x \in C \quad\text{ and }\quad x \text{ mod } 3 = 1 \quad\text{ then }\quad\frac{x-1}{3} \in C
\end{alignat}

The problem then transforms to the question, whether C contains all positive integers, in other words: is C=\mathbb{N}? In this form the problem can be represented as a tree with 1 at its root, every element x has at most two children. One child is always 2x, the other child is (x − 1) / 3 if that number is an integer.

Every odd number x trivially has an infinite number of descendants of the form 2kx. I will call the set of these numbers the tower of x.

Definition: The tower Tx of an odd number x is the set T_x=\{2^k x | k=0,1,2,\ldots\}. x is called the base of the tower.

We also define the children of T in the natural way, as a set of towers which are based on those children of the elements of T which are not elements of T themselves.

Definition: The children CT of a tower T is the set

C_T = \{T_y | y = (x-1)/3 \quad\text{ where }\quad x \in T\quad\text{ and }\quad x \text{ mod } 3 = 1 \}

It's easy to show that every tower either has no children, if the base is a multiple of 3, or an infinite number of children otherwise.

I would like to find a way to map each tower onto a single number, i.e. every element x of Tx should be mapped onto the same value. To do this I take the largest power kx of 2 that is smaller or equal to x: 2^{k_x} \le x < 2^{k_x+1}. We now define

t_x := 2^{-k_x} x - 1

The values of tx lie between 0 and 1 and are rational numbers of the form

t_x = \frac{p}{2^k}.

It is clear that k2x = kx + 1 and therefore t2x = tx. This means, all the elements of a tower are uniquely mapped onto a single value. We can, therefore, identify these values with the towers they originate from and we will refer to the towers T by their t_T := t_x, x \in T.

The next question that arises is, given an tower tT what do the children of a tower look like. As an example, the children of t7 = 3 / 4 are plotted in the figure on the right. The values of the first few of these children are given in the following table.

Children of t = 3/4
Fraction Decimal
1 / 8 0.125
5 / 32 0.15625
21 / 128 0.16406
85 / 512 0.16602
341 / 2048     0.16650

It can be seen, and easily proved, that if p / 2k is a child of any t, then so is (4p + 1) / 2k + 2. This results in an geometric series that converges to

(3p+1)/(3\times 2^k)

The figure above plots the children of the first 1000 towers against the values of the towers themselves. Interestingly a clear structure emerges. The sequences of the children all seem to converge against two straight line segments. These line segments can be represented by the function

y = \begin{cases}
\frac{1}{3}( 4x + 1) & \quad \text{for } x< \frac{1}{2}\\
\frac{1}{3}( 2x - 1) & \quad \text{for } x> \frac{1}{2}
\end{cases}

When I first plotted this graph and discovered this very simple structure hidden in the Collatz problem, I was utterly amazed. Of course I don't know enough to decide if this structure could hold the key to unravelling the problem but it certainly gives me hope.

Finally, an apology to all those who know more about the problem. These things probably have been discovered already. I have not supplied any references and most likely got the terminology wrong. If so, I will be happy to be corrected.

Thursday, 28 April 2011

An Old Problem in Disguise

Here is a well known mathematical problem stated in a slightly different way than usual. See if you can find out which problem I am talking about.

Start with any binary number ending in a 1, e.g.

100101101

Now shift this number one place to the left, filling the rightmost digit with a 1

1001011011

then add this number to the original number

100101101
   1001011011
   ----------
   1110001000

Finally eliminate all trailing zeros

1110001

Now repeat the whole process until you reach the number 1. The question is, will you always end up at the number 1, irrespective of the number you started with?

You probably guessed it by now that this is the well known Collatz problem. In its standard form the Collatz conjecture reads

Given a starting number x0 and the recursion

x_{n+1} = \begin{cases}
x_n/2 & x_n \text{ even}\\
3x_n + w & x_n \text{ odd}
\end{cases}




the sequence xn will always reach the number 1.

I have always been fascinated by this problem, but never so seriously that I would have started reading up on it and following the newest results. Only recently I started pondering over it and came up with the representation in the beginning. Naive that I was, I thought I could see some patterns in the binary representation of the problem. Of course I was in no way the first person to think of this an some further reading on Wikipedia revealed that this is one of the standard alternative representations of the problem. Some more extended Google search revealed this online script for calculating the Collatz number sequence in binary form.

I like the binary representation because it somehow reminds me of cellular automata. In a cellular automaton the state of a cell in the next generation is determined by the states of the cells in a given neighbourhood in the previous generation. Although it looks similar, the binary Collatz problem doesn't map in a straightforward way onto a cellular automaton. This is because the addition of the two binary numbers can cause non-local changes due to the carry over. It turns out, however, that there is a way of translating the Collatz problem into a cellular automaton.

There are a few patterns that one can identify in the binary representation. If a number x ends on k ones then there will be k successive steps where the 3n+1 operation just adds one zero at the end. This means that the number will grow from its original value x to
\left(\frac{3n+1}{2}\right)^k x

On the other hand, if x ends in k alternating ones and zeros, then the next 3n+1 operation will turn this sequence into zeros and the result will be
\frac{3n+1}{2^k}

These quick investigations have certainly sparked my interest in the Collatz conjecture again, and I am now looking forward to reading a bit more in depth on the problem.