Pell equations have infinite solutions

December 22, 2022
Útulek Series, 2 | CF1761D Chapter, 2

Last time, I left off with the following probability problem:

A drawer contains red socks and black socks. When two socks are drawn at random (without replacement), the probably that both are red is 1/21/2. How many socks can the drawer contain?

The first solution is fairly straightforward:

rr+br1r+b1=12    2r22r=r2+b2+2rbrb    r2(2b+1)rb2+b=0    r=(2b+1±8b2+1)/2.(0.1)\begin{aligned} &\frac{r}{r+b}\cdot\frac{r-1}{r+b-1}=\frac{1}{2}\\[3ex] \implies&2r^2-2r=r^2+b^2+2rb-r-b\\ \implies&r^2-(2b+1)r-b^2+b=0\\ \implies&r=(2b+1\pm\sqrt{8b^2+1})/2. \end{aligned}\tag{0.1}

It’s easy to see that b=1b=1 gives a square determinant and a valid r=3r=3. In fact, since 8b2+1\sqrt{8b^2+1} is necessarily odd and also grows faster than 2b+12b+1, the solutions to the original problem form a bijection with the positive integer solutions to x2=8b2+1x^2=8b^2+1, which is a case of the general Pell equation

x2Dy2=1(0.2)x^2-Dy^2=1\tag{0.2}

with D=8D=8. Should DD be a square, it is trivial that Pell equations have no solution in the positive integers should DD. Otherwise, it is well-known that they exhibit infinite solutions. In this post, I will provide my intuition behind a constructive proof of the latter claim, based loosely on the proof presented in Titu’s An Introduction to Diophantine Equations, before briefly discussing how this relates back to 1761D.

1 Integers are Dense in the Radicals

Equation (0.2)(0.2) essentially says: “x2x^2 and Dy2Dy^2 are very close” (there is a similar variant of the Pell with RHS ±1\pm1). Knowing that D\sqrt{D} is irrational, one might suspect that some yDy\sqrt{D} might be very very close to an integer xx, which would get us close to the claim of the Pell equation. Thus forms our first lemma.

Lemma 1.1: For any integer EE, we can find positive integer x1,y1x_1,y_1 satisfying y1Ey_1\leq E and

x1y1D<1/E.(1.1)|x_1-y_1\sqrt{D}|<1/E.\tag{1.1}

Proof of Lemma 1.1: I would use ϵ\epsilon as the RHS, but it complicates later parts of the proof without first asserting that the rationals are dense in the reals, which I want to avoid here.

This lemma is almost certainly true, should one take an example D=7D=7 and enumerate the D\sqrt{D} multiples. We thus enumerate the first E+1E+1 multiples 0,D,,ED0,\sqrt{D},\ldots,E\sqrt{D} and their fractional parts 0{0},{D},,{ED}10\leq\{0\},\{\sqrt{D}\},\ldots,\{E\sqrt{D}\}\leq 1. By pidgeonhole, then, there must be two multiples {iD},{jD}\{i\sqrt{D}\},\{j\sqrt{D}\}, iji\leq j which fall within 1/E1/E of each other. Subtracting their indices, we have y1=jiy_1=j-i and x1x_1 likewise as the difference between their integer portions. \blacksquare

Technically, we will had to have invoked the Archimedean principle, like in the proof of dense real numbers, but I’d like to stay away from group theory for now.

2 Dy2Dy^2 Can Be Close, But Not Arbitrarily Close to x2x^2

Lemma (1.1)(1.1) tell us y1Dy_1\sqrt{D} can be arbitrarily close to an integer x1x_1. In fact, an infinite number of pairs (x1,y1)(x_1,y_1) may be found with repeated applications of the pidgeonhole principle.

Going from a bound on x1y1Dx_1-y_1\sqrt{D} to a bound on x12y12Dx_1^2-y_1^2D now seems straighforward:

x1y1D<1/E    x1+y1D<2y1D+1/E    (x1y1D)(x1+y1D)= x12y12D<2(y1/E)D+1/E2\begin{aligned} &|x_1-y_1\sqrt{D}|<1/E\\ \implies&|x_1+y_1\sqrt{D}|<2y_1\sqrt{D}+1/E\\ \implies&|(x_1-y_1\sqrt{D})(x_1+y_1\sqrt{D})|\\ =\ &|x_1^2-y_1^2D|< 2(y_1/E)\sqrt{D}+1/E^2 \end{aligned}

Recall y1Ey_1\leq E:

x12y12D<2D+1.|x_1^2-y_1^2D|< 2\sqrt{D} + 1.

Thus, should we enumerate the integers kk in (2D1,2D+1)(-2\sqrt{D}-1,2\sqrt{D}+1), at least one of them k0k_0 must satisfy x12Dy12=k0x_1^2-Dy_1^2=k_0 for infinitely many pairs (xi,yi)(x_i,y_i) for i1i\geq 1.

Lemma 2.1: An infinite number of positive integer pairs (xi,yi)(x_i,y_i) satisfy

xi2Dyi2=k0x_i^2-Dy_i^2=k_0

for some integer k0k_0.

3 Eyes on the Ball

Lemma (2.1)(2.1) is remarkably close to the Pell equation (0.2)(0.2). Squinting our eyes at (2.1)(2.1), one may see

xi2Dyi2=k02(3.1)x_i^2-Dy_i^2=k_0^2\tag{3.1}

which will give a solution to the Pell equation should k0|k_0| divide both xi,yix_i,y_i, which seems likely considering we have infinite pairs (xi,yi)(x_i,y_i).

In fact, getting to (3.1)(3.1) only requires a small amount of algebraic foresight. Take two pairs (xi,yi),(xj,yj)(x_i,y_i),(x_j,y_j) and multiply them:

(xi2Dyi2)(xj2Dyj2)=k02    xi2xj2+D2yi2yj2D(xi2yj2+xj2yi2)=k02(3.2)\begin{aligned} &(x_i^2-Dy_i^2)(x_j^2-Dy_j^2)=k_0^2\\ \implies&x_i^2x_j^2+D^2y_i^2y_j^2-D(x_i^2y_j^2+x_j^2y_i^2)=k_0^2 \end{aligned}\tag{3.2}

at which point we recognize we want to add something to xi2yj2+xj2yi2x_i^2y_j^2+x_j^2y_i^2 to make it a square, and to subtract the same thing from xi2xj2+D2yi2yj2x_i^2x_j^2+D^2y_i^2y_j^2 to also make it square:

D(xi2yj2+xj2yi2)2Dxixjyiyj=D(xiyjxjyi)2xi2xj2+D2yi2yj2+2Dxixjyiyj=(xixj+Dyiyj)2..\begin{aligned} D(x_i^2y_j^2+x_j^2y_i^2)&-2Dx_ix_jy_iy_j=D(x_iy_j-x_jy_i)^2\\ x_i^2x_j^2+D^2y_i^2y_j^2&+2Dx_ix_jy_iy_j=(x_ix_j+Dy_iy_j)^2.. \end{aligned}

Thus, let x=xixj+Dyiyj,y=xiyjxjyix=x_ix_j+Dy_iy_j,y=x_iy_j-x_jy_i and we have

x2Dy2=k02x^2-Dy^2=k_0^2

as we wished for earlier. As suggested, to ensure xy0[k0]x\equiv y\equiv 0[|k_0|], we may invoke the fact that there are infinite pairs (xi,yi)(x_i,y_i) and hence there must be pairs (xi,yi),(xj,yj)(x_i,y_i),(x_j,y_j) with xixj[k0]x_i\equiv x_j[|k_0|] and yiyj[k0]y_i\equiv y_j[|k_0|], which will give us the divisibility requirement.

Theorem 3.1: For non-square DD, there exist at least one pair of positive integers x,yx,y satisfying

x2Dy2=1.x^2-Dy^2=1.

4 Infinite Solutions via Matrix Exponentiation

With our initial solution x,yx,y (which we will now re-index as x1,y1x_1,y_1), we may generate a second solution x2,y2x_2,y_2 with similar algebraic manipulation as in (3.2)(3.2):

(x12Dy12)(x12Dy12)=1    x14+D2y142Dx12y12=1    (x12+Dy12)2D(2x1y1)2=1    x22Dy22=1\begin{aligned} &(x_1^2-Dy_1^2)(x_1^2-Dy_1^2)=1\\ \implies&x_1^4+D^2y_1^4-2Dx_1^2y_1^2=1\\ \implies&(x_1^2+Dy_1^2)^2-D(2x_1y_1)^2=1\\ \implies&x_2^2-Dy_2^2=1 \end{aligned}

with x2=x12+Dy12,y2=2x1y1x_2=x_1^2+Dy_1^2,y_2=2x_1y_1. As one may suspect, multiplying the equation by x12Dy12=1x_1^2-Dy_1^2=1 will always give new solutions. For example,

(x22Dy22)(x12Dy12)=1    x22x12+D2y22y12Dx22y12Dx12y22=1    (x2x1+Dy2y1)2D(x1y2+x2y1)2=1    x32Dy32=1\begin{aligned} &(x_2^2-Dy_2^2)(x_1^2-Dy_1^2)=1\\ \implies&x_2^2x_1^2+D^2y_2^2y_1^2-Dx_2^2y_1^2-Dx_1^2y_2^2=1\\ \implies&(x_2x_1+Dy_2y_1)^2-D(x_1y_2+x_2y_1)^2=1\\ \implies&x_3^2-Dy_3^2=1 \end{aligned}

with

xi=xi1x1+Dyi1y1,yi=x1yi1+xi1y1.(4.1)x_i=x_{i-1}x_1+Dy_{i-1}y_1,y_i=x_1y_{i-1}+x_{i-1}y_1.\tag{4.1}

We can, of course, express this more compactly in matrix form:

[xiyi]=[x1Dy1y1x1][xi1yi1].(4.2)\begin{bmatrix} x_i\\ y_i \end{bmatrix}=\begin{bmatrix} x_1&Dy_1\\ y_1&x_1 \end{bmatrix}\begin{bmatrix} x_{i-1}\\ y_{i-1} \end{bmatrix}.\tag{4.2}

An interesting corollary which I shall state without proof is that should x1,y1x_1,y_1 be the minimal positive solution, this method generates all solutions. Thus, we must have the following corollary.

Corollary 4.3: All solutions to Pell equation (0.2)(0.2) may be generated by sequential matrix exponentiation:

[xiyi]=[x1Dy1y1x1]i1[x1y1].\begin{bmatrix} x_i\\ y_i \end{bmatrix}=\begin{bmatrix} x_1&Dy_1\\ y_1&x_1 \end{bmatrix}^{i-1}\begin{bmatrix} x_1\\ y_1 \end{bmatrix}.

5 Tying it Together

Recalling (0.1)(0.1) for the original probability question, we have D=8D=8 and initial solution (x1,y1)=(3,1)(x_1,y_1)=(3,1), and may subsequently generate (x2,y2)=(17,6)(x_2,y_2)=(17,6) which is easily verifiable. The proof provided above is reasonably constructive, and may be used to find the initial solution for the general Pell equation (0.2)(0.2)—however, the better-known method is to use the continued fraction representation of D\sqrt{D}. The proof for why that works is tedious and I will not present it presently.

It is also a short derivation to determine that the result in corollary (4.3)(4.3) may alternatively be expressed as

xi=(x1+y1D)i+(x1y1D)i2yi=(x1+y1D)i(x1y1D)i2D\begin{aligned} x_i&=\frac{(x_1+y_1\sqrt{D})^i+(x_1-y_1\sqrt{D})^i}{2}\\ y_i&=\frac{(x_1+y_1\sqrt{D})^i-(x_1-y_1\sqrt{D})^i}{2\sqrt{D}}\\ \end{aligned}

with the eigenvalues of matrix [x1Dy1y1x1]\begin{bmatrix}x_1&Dy_1\\y_1&x_1\end{bmatrix} in plain view. I was actually not aware of the general result that matrix exponentiation could always be expressed in terms of exponentiation of the eigenvalues of the matrix, but it certainly puts into context the relationship between Binet’s formula and the Fibonacci matrix solution (follows from equation (1.3)(1.3) from the previous post):

[Si+1Si]=[1110]i[10],\begin{bmatrix} S_{i+1}\\ S_i \end{bmatrix}=\begin{bmatrix} 1&1\\ 1&0 \end{bmatrix}^i\begin{bmatrix} 1\\ 0 \end{bmatrix},

Si=15((1+52)i(152)i).S_i=\frac{1}{\sqrt{5}}((\frac{1+\sqrt{5}}{2})^i -(\frac{1-\sqrt{5}}{2})^i).

So there is some relationship between one-dimensional recurrences, such as the Fibonacci, and Pell equations—which leads to them both being solved by the general form

Si=(some radical)(some radical)i+(some radical)(some radical)i.S_i=(\text{some radical})(\text{some radical})^i+(\text{some radical})(\text{some radical})^i.

It is then surprising to me that when a recurrence is upgraded from one-dimensional to two-dimensional, such as with 1761D or Pascal’s triangle, we no longer see powers of radicals, but rather summations of (nk)n\choose k (solution (2.5)(2.5) for 1761D from the previous post):

AN,K=j=12K3Nj(NKj/2)(K1j/21).A_{N,K}=\sum_{j=1}^{2K}{3^{N-j}{N-K\choose \lfloor j/2\rfloor}{K-1\choose \lceil j/2\rceil-1}}.

At this point, I believed I didn’t understand well enough the class of functions like corollary (4.3)(4.3) which exhibit exponential behavior:

f(1)f(i)=f(i+1).(5.1)f(1)f(i)=f(i+1).\tag{5.1}

In the next post, I investigate multiplicative functions, which I mistakenly thought were functions like (5.1)(5.1), but are actually functions of the form

f(i)f(j)=f(ij).(5.2)f(i)f(j)=f(ij).\tag{5.2}

This will lead us on a convoluted journey through the linear sieve and Taylor’s series—and, eventually, back to 1761D.