Section 13.5 Proving sums of squares

Theorem 13.5.1

Every prime of the form \(4k+1\) can be written as a sum of two squares.

The proof is fairly long. Here is the strategy.

Suppose we find some blue dot \((ak+bp,a)\) such that \[0<N(ak+bp,a)=a^2+(ak+bp)^2<2p\, .\]
Then, since \(k^2\equiv -1\) (mod \(p\)), we know that \[N(ak+bp,a)=a^2+(ak+bp)^2 \equiv a^2+(ak)^2\equiv a^2+a^2 k^2\equiv a^2-a^2\equiv 0\text{ (mod }p)\, ,\] so \(p\) in fact divides the norm of the point \((ak+bp,a)\).
So we have both \(0<a^2+(ak+bp)^2<2p\) and \(p\mid a^2+(ak+bp)^2\).
The only multiple of \(p\) strictly between \(0\) and \(2p\) is \(p\) itself, so \(p=a^2+(ak+bp)^2\), which gives \(p\) explicitly written as a sum of squares.
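The congruence in this strategy can be spot-checked numerically. The following is a minimal sketch, not the book's Sage code: the helper names `sqrt_of_minus_one` and `norm` are made up for illustration, and \(k\) is found here by brute force as any square root of \(-1\) modulo \(p\), which is an assumption about how \(k\) is obtained.

```python
def sqrt_of_minus_one(p):
    """Return the smallest k in 1..p-1 with k^2 = -1 (mod p), or None."""
    for k in range(1, p):
        if (k * k + 1) % p == 0:
            return k
    return None


def norm(x, y):
    """Norm of the point (x, y), i.e. x^2 + y^2."""
    return x * x + y * y


p = 13
k = sqrt_of_minus_one(p)

# Every blue dot (a*k + b*p, a) should have norm divisible by p.
all_divisible = all(
    norm(a * k + b * p, a) % p == 0
    for a in range(-5, 6)
    for b in range(-5, 6)
)
print(k, all_divisible)
```

This only samples a small window of values of \(a\) and \(b\); the congruence argument above is what covers all of them.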

Example 13.5.2

For instance, with \(p=5\), we have that \(k=\left(\frac{5-1}{2}\right)!=2!=2\), so we need to find a point \((2a+5b,a)\) such that \(0<a^2+(2a+5b)^2<2p\). Guess and check with \(a=1\) and \(b=0\) gives us \[N(2\cdot 1 +5\cdot 0,1)=1^2+(2\cdot 1+5\cdot 0)^2=5<2\cdot 5=10\] so this point works, and it does give the correct statement that \[5=1^2+2^2\; .\]
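The guess and check in this example can be automated. Here is a rough sketch, assuming \(k=\left(\frac{p-1}{2}\right)!\) reduced modulo \(p\) as in the example; the function name `find_sum_of_two_squares` and the search window for \(a\) and \(b\) are choices made for illustration only.

```python
from math import factorial, isqrt


def find_sum_of_two_squares(p):
    """Search for (a, b) with 0 < a^2 + (a*k + b*p)^2 < 2p.

    Returns (a, b, x) with x = a*k + b*p, or None if the search window
    contains no such point.
    """
    k = factorial((p - 1) // 2) % p
    # a^2 < 2p forces |a| <= isqrt(2p); by symmetry we only try a > 0.
    for a in range(1, isqrt(2 * p) + 1):
        # x = a*k + b*p can only be small when b lies between -a and 0.
        for b in range(-a, 1):
            x = a * k + b * p
            if 0 < a * a + x * x < 2 * p:
                return a, b, x
    return None


a, b, x = find_sum_of_two_squares(5)
print(a, b, x, a * a + x * x)
```

For \(p=5\) the search stops at \(a=1\), \(b=0\), the same point found by hand above.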

So what remains to be shown is that there actually IS such a blue dot.

Subsection 13.5.1 Visualizing the Proof

We need to prove there is a blue dot (somewhere) that is not at the origin but also has norm smaller than \(2p\) (remember the inequality above).

We include a variation on the graphic to make this visually clear. The bigger circle is the one we care about now – it has formula \(x^2+y^2=2p\), so radius \(\sqrt{2p}\). If we find a blue point inside that circle but not at the origin, then the above argument proves it must be on the smaller circle \(x^2+y^2=p\).

Very strangely, the best way to do this is by comparing the area of the bigger circle with the area of a certain parallelogram, and showing that the circle is so big that it must contain a blue point (other than the origin).

The area of the bigger circle, which has radius \(\sqrt{2p}\), is \(\pi (\sqrt{2p})^2=2\pi p\). Since \(\pi >2\), we have that \(2\pi>2(2)=4\), which means that the area \(2\pi p\) of the bigger circle is bigger than \(4p\).

What we do now is to create a sublattice of the blue dots.

Take all blue dots, and just double their coordinates; this will give you the green dots above.
(Naturally, underneath each green dot is a blue dot.)
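To see the sublattice concretely, one can list a few blue dots and double them. This is only an illustrative sketch using \(p=5\) and \(k=2\) from Example 13.5.2; the helper `blue_dots` and its `bound` parameter are hypothetical names, not part of the book's code.

```python
def blue_dots(p, k, bound):
    """Blue dots (a*k + b*p, a) for integers a, b with |a|, |b| <= bound."""
    return {
        (a * k + b * p, a)
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
    }


p, k = 5, 2
blue = blue_dots(p, k, 4)

# Green dots: double the coordinates of the blue dots with |a|, |b| <= 2.
green = {(2 * x, 2 * y) for (x, y) in blue_dots(p, k, 2)}

# Underneath each green dot there is a blue dot.
print(green <= blue)
```

Doubling a blue dot with parameters \((a,b)\) gives the blue dot with parameters \((2a,2b)\), which is why the subset check succeeds.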

Next, we take a look at the triangles made by the green dots.

The thinnest triangle made by blue dots would be from the origin and the points \((p,0)\) (with \(a=0,b=1\)) and \((k,1)\) (with \(a=1,b=0\)).
So you should see that the thinnest triangle made by the green dots has width \(2p\) (from the origin to \((2p,0)\), the previous point doubled) and height \(2\) (to the point \((2k,2)\), which is \((k,1)\) doubled).
You can click on triangles_on above to see them in red.
This triangle has area \(\frac{1}{2}(2p)(2)=2p\), so the parallelogram with the solid red lines has area \(4p\). That is smaller than the area \(2\pi p\) of the bigger circle!
Finally, note that the green points repeat in parallelograms like this every time you move outside this particular parallelogram, infinitely often.
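The area comparison can be checked with a determinant. The sketch below assumes the parallelogram is spanned by the green vectors \((2p,0)\) and \((2k,2)\) described above, and again takes \(p=5\), \(k=2\) from Example 13.5.2; `parallelogram_area` is an illustrative helper name.

```python
from math import pi


def parallelogram_area(u, v):
    """Area of the parallelogram spanned by u and v (absolute 2x2 determinant)."""
    return abs(u[0] * v[1] - u[1] * v[0])


p, k = 5, 2
area_par = parallelogram_area((2 * p, 0), (2 * k, 2))
area_circle = 2 * pi * p  # circle of radius sqrt(2p)

print(area_par, area_circle > area_par)
```

Notice that \(k\) drops out of the determinant, so the parallelogram has area \(4p\) whatever square root of \(-1\) is used.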

This proof is very visual, so before we move on, make sure you believe all of this. Then we will analyze the exact areas involved more closely to finish. Remember, we are trying to prove that there is a blue point inside the bigger blue circle, but away from the origin.

Subsection 13.5.2 Finishing the proof

Let's look at the picture again.

The area of the circle is more than the area (\(4p\)) of the parallelogram.
Since all points inside the parallelogram (not just green, blue, or lattice points) repeat outside of it, \(4p\) is the biggest area you can have and not repeat some point.
So, the circle, having a bigger area, must have two points (not necessarily blue points, just points on the plane) which are repeated by the shifting of this parallelogram (called a fundamental region).

This may sound a little suspicious, so let's be sure about it.

Claim 13.5.3

The circle has two points of some kind repeated by shifting the fundamental region.

Proof

Now let's continue the proof.

Take two different points that are repeated in the circle - say \(v\) and \(w\). Then \(v-w\) (if we consider them as vectors) is actually a green point, since the shift that carries one to the other must be a combination of whole steps along the sides of the parallelogram, and those combinations are exactly the green points.
That means the point \((v-w)/2\) is a blue point, and it's not the origin!
Further, this blue point is inside the big circle.
- Since the circle is nicely symmetric about the origin, the point \(-w\) is also in the circle.
- The midpoint of the line segment connecting \(v\) and \(-w\), both points in the big circle, is in fact \((v-w)/2=\frac{v+(-w)}{2}\).
- Since the whole line segment between two points of a circle stays inside the circle, this midpoint is in the big circle too, and we have found a blue point other than the origin in the blue circle.
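The argument just given can be imitated by a small brute force search. The sketch below is a simplification, not the book's Sage code: it only tries points with integer coordinates inside the big circle (the proof allows any points of the plane), it assumes \(k=\left(\frac{p-1}{2}\right)!\) reduced modulo \(p\), and the name `blue_point_by_pigeonhole` is made up for illustration. Each point is shifted back into the fundamental parallelogram spanned by \((2p,0)\) and \((2k,2)\); two points that land on the same spot play the roles of \(v\) and \(w\).

```python
from math import factorial, isqrt


def blue_point_by_pigeonhole(p):
    """Find v, w in the big circle that agree after shifting by green points,
    and return the blue point (v - w) / 2. Returns None if the integer
    points tried produce no repeat."""
    k = factorial((p - 1) // 2) % p
    r = isqrt(2 * p)
    seen = {}  # position in the fundamental parallelogram -> original point
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            if x * x + y * y >= 2 * p:
                continue  # not strictly inside the circle x^2 + y^2 = 2p
            # Subtract t copies of (2k, 2) so the height lands in {0, 1},
            # then reduce the first coordinate modulo 2p.
            t = y // 2
            spot = ((x - 2 * k * t) % (2 * p), y - 2 * t)
            if spot in seen:
                wx, wy = seen[spot]
                # v - w is a green point, so both coordinates are even.
                return ((x - wx) // 2, (y - wy) // 2)
            seen[spot] = (x, y)
    return None


point = blue_point_by_pigeonhole(13)
print(point, point[0] ** 2 + point[1] ** 2)
```

For the small primes tried here there are more integer points inside the circle than the \(4p\) possible positions in the parallelogram, so a repeat is forced; for a general prime one might need a finer grid of points, which is left as an assumption of this sketch.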

Here is the picture of how to find the blue point in the circle. The black points are \(v\), \(w\), and \(-w\), and you see the midpoint of the line is indeed blue.

Remark 13.5.4

Sage note:
This is by far the longest code we've seen up to this point. It is a brute force check of all movements of all points in the parallelogram to find two points in the bigger circle. Can you think of ways to make it more efficient?

Believe it or not, that is the proof – whew! Why was this so hard? I can think of three reasons.

First, we are trying to prove something about squares by proving something about square roots. It works, but it means there will be many steps.
Second, we are not just algebraically proving it exists; we are forced to prove our square root exists with inequalities, which brings another set of complications.
Third, we are looking not just at any old inequalities, but truly geometric ones, and so we must gain insight that way – worthwhile, but stretching. (Indeed, many more theorems of this kind can be proved using these techniques from names we may run across again – Minkowski and Blichfeldt. We are staying away from generality, believe it or not.)