Section12.5Introduction to Factorization

This is our last crack at issues directly related to cryptography. (That doesn't mean that other stuff we do is unrelated – oh no! But we will not make direct connections with them.) We will focus on the main attack on the RSA algorithm, namely factorization.

Subsection12.5.1Factorization and the RSA

Let's look at one more toy RSA problem to get a sense of what is going on. First, I choose a modulus \(n\). I will also verify it has two prime factors, without telling you what they are.

Then I choose an exponent to raise my secret message to...

I haven't told you \(\phi(n)\), but this guarantees it is coprime to my (public) encryption key \(e=13\). Now we can encode our message, \(x=11\):

Now, how could we hope to crack this sinister message? (Assume that euler_phi(899) is just too huge for Sage to do.) Well, we do know \(n=899\) and that \(e=13\). That could help.

In fact, if we knew \(p\) and \(q\), we could easily calculate \(\phi(n)\) without even using Sage. Can you quickly now factor \(n\) without using Sage?

By the way, how would one be smart about it? Think strategically; how should I have chosen a public modulus \(n\) to make this hard to do? How should \(p\) and \(q\) relate?

Hopefully you figured out \(p\) and \(q\). Then we just need to find an inverse modulo \(\phi(n)=(p-1)(q-1)\) to get our decryption key.

Remark12.5.1

Sage note:
You can fill in the values you got for p and q here to make things work. Try it!

And we decrypt:

This gives us the original message \(x=11\) again.

Subsubsection12.5.1.1A Fun Fact and an Example

I hope this makes it clear why factorization, not just looking for primes, might be important. To be truthful, many researchers in factorization simply do it to stay one step ahead of the other side, who is presumably also researching factorization - so to some extent it is an arms race.

But of course it is also inherently interesting mathematically! Here is an interesting factoid, for instance.

Example12.5.3
\[x^2-(899-840+1)x+899=x^2-60x+899=0\]\[x=\frac{60\pm \sqrt{60^2-4(1)(899)}}{2(1)}=30\pm\frac{\sqrt{3600-3596}}{2}=30\pm 1=29,31\]

Subsection12.5.2Trial Division

Okay, that was fun - but now, back to business.

The first, and oldest, method of factoring is one you already know, and maybe used a few minutes ago – trial factorization. It is the method we used with the Sieve of Eratosthenes; you just try each prime number, one by one.

Do you remember what the highest number you would have to try in order to factor \(n\) by trial division is? (Can you prove it?)

The following algorithm does this very naively (and slowly, even for trial division). Let's try to talk through what each step does.

Remark12.5.4

Sage note:
This is one of the few places where it really is important to follow the code. That said, the details of the syntax are not as important as the algorithm – unless you want to harness the power of computers more effectively!

Now let me verify it works on easy examples. Remember, we are just looking for factors at this point, not complete factorizations.

Okay, so this seems reasonable. But it's a little more problematic when you try to do large numbers, where large means “bigger than you can do by hand, but nowhere close to the size we looked at in the last class.” I'll actually time how long it takes.

Sage actually has this implemented in a much faster way, primarily by using optimized integers and a special version of Python that allows turning it into much faster code in the C language (Cython). Notice that it just returns a single factor – another slight speedup.

That's roughly one thousand times faster for the initial example! Naturally, it's possible to speed up even more. But note that getting the full factorization slows us back down; after all, one has to check that the remaining factor is prime (or factor it, if it isn't).

Even for this smaller number it takes some actual time - here is where one sees the difference between different implementations of the same algorithm.

Subsection12.5.3Starting in the middle

So much for trial division! But we have other tools at our disposal.

Some of you might have tried something a little different with \(n=899\) from our earlier RSA problem. Since we know that someone is trying to protect a secret, they probably are not going to pick a number with primes like \(3\) and \(5\) in it - that would be too easy to factor.

In fact, it stands to reason that the primes \(p\) and \(q\) should be relatively large compared to \(n\) – so why not start in the middle?

This was Fermat's idea for factoring larger numbers. However, he didn't just start with primes in the middle; for one thing, if your number is even somewhat big and you don't have a computer or huge list of primes, how would you know where to start? So Fermat became clever, as always, and used an algebraic identity to help himself along.

  • Write \(n=ab\), with \(a>b\), and assume \(n\) is odd (for hopefully obvious reasons!).
  • Then we can write \(n\) as a difference of two square numbers.
  • Namely, the squares of \(s=\frac{a+b}{2}\) and \(t=\frac{a-b}{2}\): \[s^2-t^2=\left(\frac{a+b}{2}\right)^2-\left(\frac{a-b}{2}\right)^2=\frac{a^2}{2}-\frac{a^2}{2}+\frac{b^2}{2}-\frac{b^2}{2}+\frac{2ab}{4}+\frac{2ab}{4}=ab=n\]

This may seem like an obscure identity to us, but at the time (and even until the last century) such identities were the bread and butter of algebra, before we had tools like computers to help us along.

So what Fermat did is to try this identity backwards. Here is his strategy.

Here is an implementation – again, assuredly slow, but at least verbose in its explanation – of this strategy. We simply start with the next \(s\) above the square root of \(n\), and just keep trying \(s^2-n\) again and again for bigger and bigger \(s\).

Example12.5.6

Before we move on, let's try to factor 143 and 93 using this algorithm. Remember, we start with \(s^2-n\), where \(s\) is the next integer above \(\sqrt{n}\), and see if it is a perfect square; then we increase \(s\) by one each time.

After we do them by hand, we can see what Sage does with them to check.

Well, we struck gold on the first try here! That happens if your number is the product of two primes which are two apart. (Such primes are known as twin primes, and have some interesting stories. Among other things, calculating with them helped find a bug in the Pentium computer chip in 1995.)

As you can see, we probably would have been better off with trial division for \(n=93\) - it's obvious that it's divisible by 3, but that takes a long time to reach from the middle.