On log identity and key rotation

[ Not sure this is the right place for this as it’s a design discussion that might be useful to reference in the future, but it’s also neither a proposal nor minutes and didn’t feel worth a top-level .md document. Suggestions welcome. ]

This issue discusses what defines the identity of a log, how it’s cryptographically tied to artifacts (signatures and tree heads), and the mapping between identity and key(s).

Current Sigsum design

A log is identified by the hash of its public key. It has an associated public endpoint, but the endpoint could change without side-effects.

“The same log key must never be used to sign tree heads from other trees.”

A signed_tree_head signed by the log doesn’t include a log identifier, while a cosigned_tree_head signed by a witness includes the log key hash.
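
To make "identified by the hash of its public key" concrete, here is a minimal sketch. It assumes SHA-256 over the raw public key bytes; the function name and the key bytes are placeholders for illustration, not taken from the Sigsum code base.

```python
import hashlib

def log_key_hash(public_key: bytes) -> bytes:
    """Derive a log identity as the SHA-256 hash of the raw public key (assumed)."""
    return hashlib.sha256(public_key).digest()

# Placeholder 32-byte values standing in for two distinct log public keys.
key_a = bytes(range(32))
key_b = bytes(range(1, 33))

id_a = log_key_hash(key_a)
id_b = log_key_hash(key_b)

# The identity is a pure function of the key: the endpoint plays no part in
# it, and any change of key necessarily produces a different identity.
print(id_a.hex())
print(id_b.hex())
```

Under this model a key change is indistinguishable from the appearance of a new log, which is the property the key rotation goal below runs into.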

Current Google design

A log is identified by a log ID, which defaults to a hash of an “origin string” and the log public key. See the recent transparency-dev/formats#14 issue, which changed it from a hash of the public key alone, since there are multiple logs in the wild sharing a public key.

A tree head, encoded as a checkpoint, includes the origin string (as the first line). Logs and witnesses both sign the same checkpoint.
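
A rough sketch of the two pieces described above: a checkpoint body whose first line is the origin string, and a default log ID that mixes the origin with the public key. The three-line body layout (origin, tree size, base64 root hash) follows the checkpoint format; the byte encoding inside `default_log_id` is an assumption made for illustration, not the encoding used by any real implementation.

```python
import base64
import hashlib

def checkpoint_body(origin: str, size: int, root_hash: bytes) -> str:
    """Build a checkpoint body: origin, tree size, base64 root hash, one per line."""
    return f"{origin}\n{size}\n{base64.b64encode(root_hash).decode()}\n"

def origin_of(body: str) -> str:
    """Return the origin string, which is the first line of the checkpoint."""
    return body.split("\n", 1)[0]

def default_log_id(origin: str, public_key: bytes) -> str:
    """Hypothetical encoding: SHA-256 over origin, a newline, then the key bytes."""
    return hashlib.sha256(origin.encode() + b"\n" + public_key).hexdigest()

shared_key = bytes(32)                # placeholder key reused by two logs
root = hashlib.sha256(b"").digest()   # placeholder root hash
body = checkpoint_body("example.com/log1", 5, root)

print(origin_of(body))
# Two logs sharing a key still get distinct IDs because the origins differ.
print(default_log_id("example.com/log1", shared_key))
print(default_log_id("example.com/log2", shared_key))
```

Note that only the origin appears in the signed body; the public key contributes to the log ID but not to the bytes being signed.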

Goals

Enabling log key rotation
- Some logs, like the Go Checksum Database, are planning to run effectively in perpetuity. Over their lifetime, they might need to rotate their public key either to switch algorithms or due to key compromise.
- When that happens, it’s desirable that the log maintain continuity (which can be achieved simply by cloning the log and updating the two versions in lockstep) and that witnesses and monitors enforce this continuity (which instead requires witnesses and monitors to be aware that the two logs are one and the same).
- Likewise, it’s desirable to retain backwards compatibility, so that clients not aware of the key rotation are still functional, albeit less secure.
- The checkpoint format enables key rotation: a note can be signed by multiple keys and clients will transparently ignore signatures from keys they don’t recognize. The log just has to sign checkpoints with both the old and the new key. Since the log ID is configurable, witnesses can update to the new key while persisting the log’s identity by explicitly fixing the old log ID, but that’s somewhat of a hack since the log ID by default includes the log key.
- The Sigsum design does not allow for log key rotation. The identity of a log is defined by its public key, and this is reflected in the cryptographic design: signed tree heads don’t include the log identity as it’s implicit in the signing key.
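
The multi-signature behaviour described in the checkpoint bullet can be sketched as follows. This is a toy model, not the note implementation: HMAC-SHA256 stands in for a real signature scheme (the Python standard library has no Ed25519), so "keys" here are shared secrets, and all names and the key ID format are hypothetical.

```python
import hashlib
import hmac

def toy_sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for a real signature; HMAC is used only to keep this runnable."""
    return hmac.new(key, message, hashlib.sha256).digest()

def key_id(name: str, key: bytes) -> str:
    """Track a key by its name plus a short hash, so names need not be unique."""
    return f"{name}+{hashlib.sha256(key).hexdigest()[:8]}"

def sign_note(message: bytes, signers: list) -> list:
    """Sign the same message with every (name, key) pair."""
    return [(key_id(name, key), toy_sign(key, message)) for name, key in signers]

def verify_note(message: bytes, signatures: list, trusted: dict) -> bool:
    """Accept if at least one trusted key signed; skip signatures from unknown keys."""
    verified = False
    for kid, sig in signatures:
        key = trusted.get(kid)
        if key is None:
            continue  # unknown key: ignored, as an unaware client would
        if not hmac.compare_digest(sig, toy_sign(key, message)):
            return False  # this sketch rejects a bad signature from a known key
        verified = True
    return verified

old_key, new_key = b"old-key-placeholder", b"new-key-placeholder"
name = "example.com/log1"
checkpoint = b"example.com/log1\n5\nplaceholder-root-hash\n"
note = sign_note(checkpoint, [(name, old_key), (name, new_key)])

old_client = {key_id(name, old_key): old_key}   # unaware of the rotation
new_client = {key_id(name, new_key): new_key}   # trusts only the new key
print(verify_note(checkpoint, note, old_client), verify_note(checkpoint, note, new_client))
```

Both clients accept the same doubly signed checkpoint, which is what keeps clients that are unaware of the rotation functional.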
Preventing witness signature rebinding
- If the message signed by a witness could refer to multiple logs, an attack is possible. Two logs are spun up: log X, which is well known, monitored, and relied upon by many clients; and log Y, which no one cares about or monitors. They progress in lockstep, with the same entries, until one day a malicious entry is pushed to log Y. The witness signature on the log Y tree head that includes the malicious entry is sent to a targeted client, pretending it’s for log X. The client will believe this entry was witnessed, and therefore eventually monitored, as part of log X, but it was not.
- The Sigsum design prevents this attack, since the cosigned_tree_head signed by a witness includes the log key hash, which can only be used by a single log.
- The Google design is potentially vulnerable to this attack. The log’s identity is tied to the combination of origin string and public key, but only the origin string is included in the signed tree head. There aren’t any currently, but if multiple logs shared the same origin string they would be vulnerable to this attack.
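
The rebinding attack comes down to whether the bytes a witness signs name the log. The sketch below compares two hypothetical message encodings, one without and one with a log identifier. A signature check is modelled as byte equality between what the witness signed and what the client reconstructs, since a signature only verifies over identical bytes. The encodings and identifiers are illustrative, not those of Sigsum or the checkpoint format.

```python
import hashlib

def message_without_identity(size: int, root_hash: bytes) -> bytes:
    """Witness-signed bytes that describe a tree head but not which log it is from."""
    return f"{size}\n{root_hash.hex()}\n".encode()

def message_with_identity(log_id: str, size: int, root_hash: bytes) -> bytes:
    """Witness-signed bytes that also bind the tree head to one log identity."""
    return f"{log_id}\n{size}\n{root_hash.hex()}\n".encode()

# The witness cosigns log Y's tree head, which contains the malicious entry.
size = 1001
root_y = hashlib.sha256(b"tree containing the malicious entry").digest()
signed_weak = message_without_identity(size, root_y)
signed_strong = message_with_identity("log-Y", size, root_y)

# The targeted client believes it is talking to log X and rebuilds the
# message it expects the witness to have signed for log X.
rebuilt_weak = message_without_identity(size, root_y)
rebuilt_strong = message_with_identity("log-X", size, root_y)

rebinding_works_without_identity = rebuilt_weak == signed_weak
rebinding_works_with_identity = rebuilt_strong == signed_strong
print(rebinding_works_without_identity, rebinding_works_with_identity)
```

Without an identifier, the cosignature made for log Y verifies unchanged in the context of log X; with one, the reconstructed bytes differ and verification fails.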
Backwards compatibility
- There are already a few logs out there and it would be annoying to require them to rekey or change their origin strings. This means supporting logs (with different origin strings) reusing public keys, and allowing somewhat arbitrary origin strings.

Recommended architecture

This is not a precise design, but design principles to address the goals above.

- There should be a 1:1 mapping between log identity and whatever is included in the message signed by both logs and witnesses.
- There should be a many:many mapping between log identities and log keys.
- In the checkpoint format, that means making the log ID either the literal origin string (like google/trillian-examples#712 does) or a hash of the origin string, never reusing the same origin string across logs, and not including the log’s public key in the log identity.
- A good choice of origin string might be a domain and path like example.com/log1 to avoid conflicts, while allowing arbitrary overrides to support current logs.
- Logs can sign their tree heads with multiple keys, optionally sharing keys across logs. Witnesses can choose to accept multiple keys or just a single key for each log (depending on whether the previous key is compromised or just rotated).
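
One way to picture these principles is as witness-side configuration: the log identity is the origin string alone, and each origin maps to a set of accepted keys. The class and method names below are hypothetical, and keys are represented by opaque placeholder strings; this is a sketch of the mapping, not a proposed interface.

```python
from dataclasses import dataclass, field

@dataclass
class WitnessConfig:
    """Maps each log identity (origin string) to the set of keys accepted for it."""
    accepted: dict = field(default_factory=dict)  # origin -> set of key hashes

    def add_key(self, origin: str, key_hash: str) -> None:
        self.accepted.setdefault(origin, set()).add(key_hash)

    def revoke_key(self, origin: str, key_hash: str) -> None:
        self.accepted.get(origin, set()).discard(key_hash)

    def accepts(self, origin: str, key_hash: str) -> bool:
        return key_hash in self.accepted.get(origin, set())

config = WitnessConfig()
config.add_key("example.com/log1", "key-old")
config.add_key("example.com/log2", "key-old")   # a key shared across two logs

# Plain rotation: accept the new key alongside the old one.
config.add_key("example.com/log1", "key-new")
rotated = config.accepts("example.com/log1", "key-old") and config.accepts("example.com/log1", "key-new")

# Compromise: stop accepting the old key for that log only.
config.revoke_key("example.com/log1", "key-old")
print(rotated, config.accepts("example.com/log1", "key-old"), config.accepts("example.com/log2", "key-old"))
```

The log identity never changes across these steps, so a witness can keep enforcing continuity for example.com/log1 regardless of which key signed the latest tree head.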

Witness key rotation is considered out of scope. A witness with a new key might as well be a new witness, and there is no benefit in maintaining any kind of continuity.

The checkpoint format also has space for a key name. That’s useful for witnesses, but it’s unclear what the name of a log’s public key should be. It’s not particularly important and it doesn’t even have to be unique, as the note implementation tracks keys by their name and short hash.
