Cryptography in Politics: A Crash Course
A crash course for non-engineers, written by an engineer.
In the debate over online age verification, some tech policy experts have said, “The technology to verify your age without violating your privacy does not exist.” That claim is certainly news to me; I’m an engineer who built a proof-of-concept for privacy-conscious age verification.
In tech policy, you will inevitably wade into the realm of cryptography at times, an area where you often need technical expertise. Tech policy experts, however, often have a humanities degree and a policy job where they specialize in tech policy. They don’t have the right type of expertise.
Nonetheless, if you have a job that touches tech policy (such as a congressional staffer), you often have to make policy judgments—whether you have the needed expertise or not. Thus, I’ve written this crash course to help you make better judgments—and feel less lost along the way.
The goal of this crash course is not to tell you what to think; it’s to teach you how to think. (You’ll also learn why passwords should include at least eight characters, an uppercase letter, a lowercase letter, a number, a special character, etc.)
Tools, Tasks, and Protocols
Hashing, blockchain, and zero-knowledge proofs, oh my! You may have seen these terms thrown around, but what do they mean? Rather than answer that question, let’s take a step back and start with something simpler: tools, tasks, and protocols.
Let’s start with tasks. What useful thing are you trying to accomplish with technology? Here are some examples of tasks:
Two parties send encrypted messages to each other.
One party verifies that a document was digitally signed by another party.
One party verifies their age for another party.
Once we establish what the task is, we design a protocol: a precise set of instructions where two or more parties communicate to accomplish a task. (A protocol is essentially a communications algorithm.)
To build a protocol, we will have various tools at our disposal. Hashing is a tool. Blockchain is a tool. Zero-knowledge proofs are a category of tools.
Tools are used to build the product; tools are not the product. In cryptography, the product we are building is a protocol.
An Example Protocol
To make these concepts more concrete, let’s design a protocol that is accessible to beginners. For this protocol, the task will be sending encrypted messages.
The Tool: Caesar Cipher
A Caesar cipher takes every letter of a message and shifts it forward in the alphabet. If you go past the end of the alphabet, you start over at A.
For example, with a shift value of 3, here is how we shift the letters:
A becomes D: A→B→C→D
B becomes E: B→C→D→E
Z becomes C: Z→A→B→C
If our message is HELLO WORLD, and we use a Caesar cipher with a shift of 4, the new “message” will be LIPPS ASVPH. (For example, H becomes L: H→I→J→K→L.)
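Here is a minimal Python sketch of the Caesar cipher tool that shows the shifting rule as code. The function name caesar_shift is our own. The sketch assumes messages use only uppercase letters and spaces, and anything else passes through unchanged.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def caesar_shift(message: str, shift: int) -> str:
    """Shift each uppercase letter forward by `shift` spots, wrapping past Z back to A.

    Other characters (such as spaces) are left unchanged.
    A negative shift moves letters backward instead.
    """
    shifted = []
    for ch in message:
        if ch in ALPHABET:
            # Letter position 0-25, moved by the shift, wrapped around with % 26.
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            shifted.append(ch)
    return "".join(shifted)


print(caesar_shift("Z", 3))            # C
print(caesar_shift("HELLO WORLD", 4))  # LIPPS ASVPH
```

With a positive modulus, Python's % always returns a value from 0 to 25. The same function therefore also handles negative shifts, which is what decryption needs.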
The Protocol
Protocols usually rely on a key, which is a secret value. For a Caesar cipher, the key is the shift value. Here is the protocol to encrypt messages:
Beforehand—when nobody can eavesdrop on them—Alice and Bob agree on the key: the shift value for a Caesar cipher. Let’s say that they agree on a shift of 4.
Alice sends Bob a message (or vice-versa):
Alice creates a message: HELLO WORLD
Alice shifts every letter forward 4 spots: HELLO WORLD becomes LIPPS ASVPH.
Alice sends the encrypted message, LIPPS ASVPH, to Bob.
Bob shifts every letter backward 4 spots: LIPPS ASVPH becomes HELLO WORLD.
If a third person, Eve, eavesdrops on the conversation, the “message” that she will see is jumbled letters: LIPPS ASVPH.
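The whole exchange can be simulated in a few lines of Python. This is a toy sketch: the “channel” is just a variable, and names like SHARED_KEY are illustrative rather than part of any real system.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(message: str, shift: int) -> str:
    """Shift uppercase letters by `shift` spots (wrapping around); leave other characters alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in message
    )


# Beforehand: Alice and Bob privately agree on the key (a shift of 4).
SHARED_KEY = 4

# Alice encrypts her message and sends it over the channel.
plaintext = "HELLO WORLD"
ciphertext = caesar_shift(plaintext, SHARED_KEY)

# Eve, eavesdropping on the channel, only ever sees the ciphertext.
eve_sees = ciphertext

# Bob decrypts by shifting backward with the same key.
bob_reads = caesar_shift(ciphertext, -SHARED_KEY)

print("Eve sees: ", eve_sees)   # LIPPS ASVPH
print("Bob reads:", bob_reads)  # HELLO WORLD
```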
Evaluate Protocols, Not Tools
While new technology often fascinates people, sometimes we get so enamored with technology that we lose sight of the task we’re trying to accomplish. Does this new technology help us accomplish that task, or is it just a shiny new thing?
While people love playing with chatbots like ChatGPT, what happens if an airline’s chatbot gives a wrong answer about a refund policy? Air Canada learned the answer the hard way: a Canadian tribunal ordered the airline to honor the refund policy its chatbot had described. Or, as Cloudflare CEO Matthew Prince said, “AI demos are easy. AI products are hard.”
In a similar vein, if you have a colorblind friend who is holding a red ball and a green ball, you can use a zero-knowledge proof to convince him that the balls have different colors—without revealing which ball is which color. That’s a cool party trick, but it’s not clear whether this zero-knowledge proof would have practical applications.
Tools are used to build the product; tools are not the product. The thing you want to evaluate is the product, not the tools. In this case, the product is the protocol.
For cryptography in particular, you can definitely shoot yourself in the foot by misusing your tools. The protocol lets us see how we are using our tools—and whether we are misusing those tools.
Later, we will examine two very similar protocols for two very similar tasks. One protocol is secure, and one protocol is fatally flawed. Both protocols use the same tool—hashing—but one protocol misuses that tool.
The more you think at the level of protocols (and not at the level of tools), the more persuasive your arguments will be.
This insight applies in both directions, too. If you want to argue that a certain task is not possible, then what is the key challenge in designing a protocol for this task? Why are existing tools not capable of solving that challenge? (A humanities degree often will not give you the expertise needed to answer those questions.)
Evaluating the Example Protocol
As a practical example, let’s evaluate our protocol for sending encrypted messages using a Caesar cipher. It has two issues:
Beforehand—when nobody could eavesdrop on them—Alice and Bob agreed on the key: the shift value for the Caesar cipher. What happens when Alice and Bob cannot agree on a key beforehand?
There are only 25 possible keys. If Eve intercepts the encrypted message, LIPPS ASVPH, she can try to decrypt it with every possible shift value (1, 2, 3, …). Eventually, one of those shift values will work.
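Eve’s brute-force attack is short enough to write out in full. This Python sketch tries all 25 keys. The names caesar_shift and brute_force are our own.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar_shift(message: str, shift: int) -> str:
    """Shift uppercase letters by `shift` spots (wrapping around); leave other characters alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in message
    )


def brute_force(ciphertext: str) -> dict[int, str]:
    """Eve's attack: decrypt with every possible key (shift 1 through 25)."""
    return {key: caesar_shift(ciphertext, -key) for key in range(1, 26)}


for key, candidate in brute_force("LIPPS ASVPH").items():
    print(f"shift {key:2}: {candidate}")
```

Eve only has to scan 25 lines of output, and the line for shift 4 reads HELLO WORLD.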
In the real world, we would use a different protocol for this task. And if two parties cannot agree on a key beforehand, we can use additional tools for key agreement: a way for Alice and Bob to agree on a key without revealing that key to an eavesdropper.
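The article doesn’t name a specific key-agreement tool. One classic example is Diffie-Hellman key exchange. The Python sketch below uses deliberately tiny textbook numbers (p = 23, g = 5) that offer no real security. Each party keeps a private number and publishes a value derived from it. Both parties then arrive at the same shared secret, even though that secret is never sent over the channel.

```python
# Toy Diffie-Hellman key agreement. NOT secure: real systems use vetted
# libraries and far larger parameters (or elliptic curves).
P = 23  # public prime modulus (toy-sized)
G = 5   # public generator

alice_private = 4  # known only to Alice
bob_private = 3    # known only to Bob

# Each party publishes G^private mod P; Eve can see these values.
alice_public = pow(G, alice_private, P)
bob_public = pow(G, bob_private, P)

# Each party combines the other's public value with their own private value.
alice_shared = pow(bob_public, alice_private, P)
bob_shared = pow(alice_public, bob_private, P)

print(alice_public, bob_public)   # 4 10
print(alice_shared, bob_shared)   # 18 18
```

A key agreed this way fixes only the first issue. A Caesar cipher would still have just 25 possible keys, so the second issue remains.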
Negotiating the Requirements
Let’s use age verification as a case study on defining and negotiating the requirements. Here, critics will frequently raise this point: kids will find a way to bypass age verification.
That point is technically correct but practically useless. What percentage of kids bypass age verification? Is it 0.5% of kids, or 50% of kids?
When you define the requirements for a task, many requirements will not be all-or-nothing. Instead of demanding perfection, you will determine what is good enough. As the saying goes, don’t let the perfect be the enemy of the good.
Engineers in particular typically do not talk about 100%. Instead, they talk about the “number of 9s.” For example, 99.9% would be three 9s. Amazon S3, a cloud storage service, is even designed for eleven 9s (99.999999999%) of durability.
So how often should an age verification system stop kids? Do we need eleven 9s? Probably not. Is one 9 (90%) a reasonable request? Yes. (Even if age verification stopped kids only 75% of the time, that would still be a major policy victory.)
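This small Python sketch translates a “number of 9s” into expected failures. The scenario of 1,000,000 attempts is a made-up round number for illustration, not data.

```python
def failure_rate(nines: int) -> float:
    """Fraction of attempts that fail, e.g. one 9 (90%) -> 0.1, three 9s (99.9%) -> 0.001."""
    return 10.0 ** -nines


def expected_failures(nines: int, attempts: int) -> float:
    """Expected number of failures across `attempts` tries."""
    return attempts * failure_rate(nines)


# Hypothetical scenario: 1,000,000 underage sign-up attempts.
for nines in (1, 3, 11):
    print(f"{nines} nine(s): about {expected_failures(nines, 1_000_000):g} kids get through")
```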
That leads to a key point: there often is room for reasonable negotiation on the requirements. In some cases — especially when cryptography gets involved — a seemingly intractable technical challenge can become easily solvable if you make a reasonable concession.
Tradeoffs are common in engineering. If you asked for age verification with eleven 9s of effectiveness, it would be extremely challenging to build that in a privacy-conscious way. If you only asked for one 9, the privacy challenges become much easier to solve.
A Good Protocol: Password Authentication
As a real-world example, let’s look at one task — password authentication — and the protocol we use to accomplish this task.
The Tool: Hashing
This protocol will use one key tool: hashing. Hashing can create a “digital fingerprint” for any piece of data—such as a password, a Word document, or a video file.
The input of a hash function is arbitrary data of any size.
The output is a short piece of data, which is our digital fingerprint (also known as a hash or a hash value).
Here is an example where the input is a password:[1]
Input:
MyPassword
Output/hash value:
dc1e7c03e162397b355b6f1c895dfdf3790d98c10b920c55e91272b8eecada2a
If the input was a different password, or if the input was a large Word document, the output would be different, but it would have the same length: 64 characters.
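In Python, the standard hashlib module can implement the setup described in the first note (UTF-8 input, SHA-256, hex-encoded output). If that setup is followed exactly, this sketch should reproduce the 64-character hash shown above for MyPassword. It also shows that a much larger input still produces 64 characters. The helper name sha256_hex is our own.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """UTF-8 encode the text, hash it with SHA-256, and return the hex-encoded digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


password_hash = sha256_hex("MyPassword")
# A long string standing in for a large Word document.
document_hash = sha256_hex("Lorem ipsum dolor sit amet. " * 100_000)

print(password_hash)
print(len(password_hash), len(document_hash))  # 64 64
```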
Just like each person has a unique fingerprint, each input produces a unique hash value. For all practical purposes, two different passwords (or two different Word documents) will never have the same hash value.[2]
Hashing is also a one-way operation. If all you know is the hash, it is impossible to figure out which input produced that hash — unless you get lucky and guess the input. For example, if I know the hash of your password (dc1e7c03e162397b355b6f1c895dfdf3790d98c10b920c55e91272b8eecada2a), I cannot reverse-engineer it to obtain your password (MyPassword).
The Protocol
So how does password authentication work? Here’s the basic protocol:[3]
A user sets/resets their password:
The user sends their password to a site (e.g., MyPassword).
The site computes the hash of that password (e.g., dc1e7c03e162397b355b6f1c895dfdf3790d98c10b920c55e91272b8eecada2a).
The site stores that hash in a database.
A user logs in:
The user sends their username and password to a site.
The site computes the hash of the password it just received.
The site retrieves the hash on file for that user.
If the two hashes match, the password is accepted.
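Here is a minimal Python sketch of the basic protocol above. An in-memory dictionary stands in for the site’s database, and the names set_password, log_in, and password_db are illustrative. The sketch follows the protocol as described so far, so it leaves out salting (mentioned in the notes) and other real-world hardening. Treat it as a teaching aid rather than something to deploy.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    """UTF-8 encode, SHA-256 hash, hex-encode."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Stand-in for the site's database: username -> stored password hash.
password_db: dict[str, str] = {}


def set_password(username: str, password: str) -> None:
    """Set/reset: hash the password and store only the hash, never the password."""
    password_db[username] = sha256_hex(password)


def log_in(username: str, password: str) -> bool:
    """Log in: hash the submitted password and compare it to the hash on file."""
    stored_hash = password_db.get(username)
    if stored_hash is None:
        return False
    # compare_digest takes the same time however many characters match,
    # so an attacker can't learn anything from response timing.
    return hmac.compare_digest(sha256_hex(password), stored_hash)


set_password("alice", "MyPassword")
print(log_in("alice", "MyPassword"))  # True
print(log_in("alice", "WrongGuess"))  # False
```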
If a data breach occurs, a hacker can steal the hash of your password, since that’s stored in the site’s database. But since hashing is a one-way operation, the hacker cannot reverse-engineer that hash to obtain your password.
(This protocol is missing one small yet important detail; we’ll return to that later.)
A Meta-Point on Data Breaches
Using this protocol as an example, we can also make a meta-point on data breaches: in some cases, a well-designed protocol can make certain guarantees even if a data breach occurs.
In this protocol for password authentication, your password cannot be stolen even if a data breach occurs. In my protocol for age verification, users cannot be de-anonymized even if a data breach occurs.
A Bad Protocol
Could we apply the same idea to Social Security numbers (SSNs)? Could a site use a similar protocol to store the hash of an SSN?
No. Even though we only changed one detail, the protocol is now fatally flawed. While we are using the same tool—hashing—we are now misusing that tool.
Earlier, we said, “If all you know is the hash, it is impossible to figure out which input produced that hash — unless you get lucky and guess the input.” That last part raises an intriguing possibility: instead of trying to make a lucky guess, what if you guessed every possible input? That is a brute-force attack.
Let’s say that a data breach occurs, and a hacker learns the hash of your SSN: 72de837c74b40716d430c711eebde10ff965fcc4a70c98e63a233ff36eebd6a1.[4]
An SSN is a 9-digit number, so there are 1 billion possible SSNs. The hacker could compute the hash of every possible SSN, one by one, until they find the SSN that matches the stolen hash. How long would that take? On my laptop, I can calculate those billion hashes in a little over a minute; the matching SSN is 123-45-6789.[5]
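Here is a Python sketch of the brute-force attack. It uses the encoding from the article’s note: the SSN is stored as a 4-byte little-endian integer and hashed with SHA-256. The function names are our own. To keep the demo fast, it searches only a narrow slice of the billion possibilities. A plain Python loop is also much slower than optimized code, so searching the full range this way would take considerably longer than the minute quoted above.

```python
import hashlib
from typing import Optional


def hash_ssn(ssn: int) -> str:
    """SHA-256 of the SSN encoded as a 4-byte little-endian integer, as a hex string."""
    return hashlib.sha256(ssn.to_bytes(4, "little")).hexdigest()


def brute_force_ssn(stolen_hash: str, candidates: range) -> Optional[int]:
    """Hash each candidate SSN in turn until one matches the stolen hash."""
    for ssn in candidates:
        if hash_ssn(ssn) == stolen_hash:
            return ssn
    return None


def format_ssn(ssn: int) -> str:
    digits = f"{ssn:09d}"  # keep leading zeros, e.g. 1 -> "000000001"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


stolen_hash = hash_ssn(123_456_789)  # what the hacker found in the breach
# The real attack would search range(1_000_000_000); this slice keeps the demo quick.
match = brute_force_ssn(stolen_hash, range(123_450_000, 123_460_000))
if match is not None:
    print(format_ssn(match))  # 123-45-6789
```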
With hashing, you need to pay attention to how many possible inputs there are. If the input is 256 random bits, there are over $10^{77}$ possible inputs: 1 followed by 77 0s.[6] By comparison, there are about $10^{80}$ atoms in the universe; a brute-force attack is impossible. With only 1 billion ($10^9$) possible inputs, though, brute force will work.
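Python handles huge integers exactly, so it can check this arithmetic. The sketch also draws 256 random bits with the standard secrets module, which is Python’s interface to a cryptographically strong random number generator. The general-purpose random module is not suitable for this.

```python
import secrets

# 256 random bits from a cryptographically strong generator (not the `random` module).
random_input = secrets.token_bytes(32)  # 32 bytes x 8 bits = 256 bits

possible_256_bit_inputs = 2 ** 256
possible_ssns = 10 ** 9

print(f"256-bit inputs: {possible_256_bit_inputs:.2e}")  # 1.16e+77
print(f"SSNs:           {possible_ssns:.2e}")            # 1.00e+09
```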
The Missing Detail for Password Authentication
Our earlier protocol for password authentication was missing one key detail: the requirements for a valid password. Usually, passwords should include at least eight characters, an uppercase letter, a lowercase letter, a number, a special character, etc.
Those requirements exist because they expand the number of possible passwords, which guards against a brute-force attack. Brute force has a high chance of success for passwords under eight characters. (In practice, password length is the most important requirement. A 15-character password with only lowercase letters is much more secure than an 8-character password with all the special gadgets.)
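This quick Python comparison counts how many possible passwords exist in each case, which shows why length matters most. It assumes the “special gadgets” password draws from the 95 printable ASCII characters (letters, digits, punctuation, and space). That is our assumption, not a rule from the article. The count also ignores smarter attacks, such as guessing common words first.

```python
import string

LOWERCASE = len(string.ascii_lowercase)  # 26 letters
PRINTABLE_ASCII = 95  # assumption: letters, digits, punctuation, and space

short_complex = PRINTABLE_ASCII ** 8  # 8 characters, all the special gadgets
long_lowercase = LOWERCASE ** 15      # 15 characters, lowercase letters only

print(f"8 chars, 95 symbols: {short_complex:.2e}")   # 6.63e+15
print(f"15 chars, lowercase: {long_lowercase:.2e}")  # 1.68e+21
print(f"The 15-character password space is about {long_lowercase // short_complex:,}x larger")
```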
In Summary
First, we establish what the task is (e.g., sending encrypted messages). This task will usually come with some requirements, though there often is room for reasonable negotiation on the requirements.
Next, we design a protocol. A protocol is a precise set of instructions where two or more parties communicate to accomplish a task. To build a protocol, we will have various tools at our disposal (e.g., hashing).
Tools are used to build the product; tools are not the product. The thing you want to evaluate is the protocol, not the tools. For cryptography in particular, you can definitely shoot yourself in the foot by misusing your tools. The protocol lets us see how we are using our tools—and whether we are misusing those tools.
The more you think at the level of protocols (and not at the level of tools), the more persuasive your arguments will be.
Notes
[1] The input has a UTF-8 encoding, and the hash function is SHA-256. The output is 256 bits; we use a hex encoding to encode these bits as text. Each character has 16 possible values (0-9, a-f) and can encode 4 bits: $2^4 = 16$. Thus, 64 characters encode $64 \times 4 = 256$ bits.
[2] To use SHA-256 as an example, there are vastly more possible inputs than the $2^{256}$ possible outputs, so it is technically correct that some inputs will have the same output; the term for that is a collision. However, finding any collision, much less a collision of practical significance, is nigh impossible.
[3] In practice, many sites will also use an additional tool, salting, but that’s beyond the scope of this article.
[4] We assume that the SSN is stored as a 4-byte, little-endian integer. As before, the hash function is SHA-256.
[5] Or, you could precompute all 1 billion hashes, and store them in a database that can look up the SSN for any hash. This technique is called a dictionary attack.
[6] You should use a cryptographically strong random number generator; a normal random number generator should not be used.