Header-Based Patch Attestation
==============================
Author: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
Status: Beta, soliciting comments

Preamble
--------
Projects participating in decentralized development continue to use
RFC-2822 (email) formatted messages for code submissions and review.
This remains the only widely accepted mechanism for code collaboration
that does not rely on centralized infrastructure maintained by a single
entity, which necessarily introduces a single point of dependency and
a single point of failure.

RFC-2822 formatted messages can be delivered via a variety of means. To
name a few of the more common ones:

  - email
  - usenet
  - aggregated archives (e.g. public-inbox)

Among these, email remains the most widely used transport mechanism for
RFC-2822 messages, most commonly delivered via subscription-based
services (mailing lists).

Email and end-to-end attestation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There are two commonly used standards for cryptographic email
attestation: PGP and S/MIME. When it comes to patches sent via email,
there are significant drawbacks to both:

  - Mailing list software may modify email body contents to add
    subscription information footers, causing message attestation to
    fail.
  - Attestation via detached MIME signatures may not be preserved by
    mailing list software that aggressively quarantines attachments.
  - Inline PGP attestation generally frustrates developers working with
    patches due to extra surrounding content and the escaping it
    performs for strings containing dashes at the start of the line for
    canonicalization purposes.
  - Only the body of the message is attested, leaving metadata such as
    "From", "Subject", and "Date" open to tampering. Git uses this
    metadata to formulate git commits, so leaving them unattested is
    suboptimal (they can be duplicated into the body of the message,
    but git format-patch will not do this by default).
  - PGP key distribution and trust delegation remain a difficult
    problem to solve. Even if PGP attestation is available, the
    developer on the receiving end of the patches may not make any use
    of it due to not having the sender's key in their keyring.
  - S/MIME certificates are increasingly difficult to obtain for
    developers not working in corporate environments. At the time of
    writing, only two commercial CAs continue to provide this service --
    and only one does it for free.

For these reasons, end-to-end attestation is rarely used in communities
that continue to use email as their main conduit for code submissions
and review.

Email and domain-level attestation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since unsolicited emails (SPAM) frequently forge headers in order to
appear to be coming from trusted sources, most major service providers
have adopted DKIM (RFC-6376) to provide cryptographic attestation for
header and body contents. A message that originates from gmail.com will
contain a "DKIM-Signature" header that attests the contents of the
following headers (among others):

  - from
  - date
  - message-id
  - subject

The "DKIM-Signature" header also includes a hash of the message body
(bh=) that is included in the final verification hash. When a DKIM
signature is successfully verified using a public key that is published
via gmail.com DNS records, this provides a degree of assurance that the
email message has not been modified since leaving gmail.com
infrastructure.

Just as with PGP and S/MIME attestation, this has significant problems
when it comes to patches sent via mailing lists:

  - ML software commonly modifies the subject header in order to insert
    list identification (e.g. ``[some-topic]``). Since the "subject"
    header is almost always included in the list of headers attested
    by DKIM, this causes DKIM signatures to fail verification.
  - ML software also routinely modifies the message body for the
    purposes of stripping attachments or inserting list subscription
    metadata. Since the bh= hash is included in the final signature
    hash, this results in a failed DKIM signature check.

Even if none of the above applies and the DKIM signature is successfully
verified, body canonicalization routines mandated by the DKIM RFC may
result in a false-positive successful attestation for patches. The
"relaxed" body canonicalization collapses every run of whitespace within
a line into a single space, so for languages like Python or GNU Make,
where whitespace is syntactically significant, two patches containing
different code may produce the same hash.
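
As a rough illustration of this collision, the sketch below approximates
the "relaxed" body canonicalization from RFC-6376 and hashes two Python
patches that differ only in indentation. The helper names
(``relaxed_body``, ``body_hash``) and the sample patches are made up for
this example and are not part of the POC:

```python
import base64
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Approximate DKIM "relaxed" body canonicalization (illustrative only)."""
    canon = []
    for line in body.replace(b"\r\n", b"\n").split(b"\n"):
        line = re.sub(rb"[ \t]+", b" ", line)  # collapse runs of whitespace
        canon.append(line.rstrip(b" "))         # ignore trailing whitespace
    while canon and canon[-1] == b"":           # ignore trailing empty lines
        canon.pop()
    return b"".join(line + b"\r\n" for line in canon)


def body_hash(body: bytes) -> str:
    """Return a base64 sha256 digest, in the style of the bh= field."""
    return base64.b64encode(hashlib.sha256(relaxed_body(body)).digest()).decode()


# Two hypothetical patches: run(x) is inside the "if" in one, outside in the other.
patch_a = b"+for x in xs:\n+    if x:\n+        log(x)\n+        run(x)\n"
patch_b = b"+for x in xs:\n+    if x:\n+        log(x)\n+    run(x)\n"

print(body_hash(patch_a) == body_hash(patch_b))  # prints True
```

Both patches reduce to the same canonical bytes, even though they apply
different code.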

So, while DKIM works well enough for regular domain-level email
attestation, it still has significant drawbacks for attesting patches.
Similarly, it does not provide significant developer identity assurances
for patches sent via large public hosting services like Gmail, Fastmail,
or others -- at best, we have proof that the email traversed their
mail gateways (hopefully, after being properly authenticated).

Proposal
--------
The goal of this document is to propose a scheme that would provide
cryptographic attestation for all message contents necessary for trusted
distributed code collaboration. It draws on the success of the DKIM
standard in order to adapt (and adopt) it for this purpose.

X-Developer-Signature header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We implement a compatible subset of DKIM (RFC-6376) for developer
attestation signatures, with some extra steps taken to make the workflow
fit better with patches sent via DKIM-non-compliant mailing lists.

Differences from DKIM:

  - the d= field is not used (no domain signatures involved)
  - the q= field is not used (end-user tooling handles key lookup)
  - the c= field is not used (see below for canonicalization)
  - the i= field is optional, but MUST be set to the canonical email
    address of the sender if that address differs from the From: field

Canonicalization
~~~~~~~~~~~~~~~~
We use the "relaxed/simple" canonicalization as defined by the DKIM
standard, but the message is first parsed by "git-mailinfo" in order to
achieve the following:

  - normalize any content-transfer-encoding modifications (convert back
    from base64/quoted-printable/etc into 8-bit)
  - use any encountered in-body git headers (From:, Subject:, Date:) to
    rewrite the outer message headers
  - perform any subject-line normalization in order to strip content not
    considered by git-am when applying the patch

To achieve this, the message is passed through git-mailinfo with the
following flags::

    cat orig.msg | git mailinfo --encoding=utf-8 m p > i

We then use the data found in "i" to replace the From:, Subject: and
Date: headers of the original message, and concatenate "m" and "p" back
together to form the body of the message, which is then normalized using
CRLF line endings and the DKIM "simple" body canonicalization (any
trailing blank lines are removed).

Any other headers included in signing are canonicalized using the
"relaxed" header canonicalization routines defined in the DKIM standard.

In other words, the body and some of the headers are normalized and
reconstituted using the "git-mailinfo" command, and then canonicalized
using DKIM's relaxed/simple standard.
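
The two DKIM canonicalization steps applied after "git-mailinfo" can be
sketched roughly as follows. This is a simplified reading of RFC-6376,
not the code from main.py, and the function names ``relaxed_header`` and
``simple_body`` are assumptions made for this example:

```python
import re


def relaxed_header(name: str, value: str) -> str:
    """Approximate DKIM "relaxed" header canonicalization."""
    value = re.sub(r"\r?\n(?=[ \t])", "", value)    # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value).strip()   # collapse and trim whitespace
    return f"{name.strip().lower()}:{value}\r\n"    # lowercase name, no space at colon


def simple_body(body: str) -> str:
    """Approximate DKIM "simple" body canonicalization with CRLF line endings."""
    lines = body.replace("\r\n", "\n").split("\n")
    while lines and lines[-1] == "":                # drop trailing blank lines
        lines.pop()
    # An empty body canonicalizes to a single CRLF.
    return "".join(line + "\r\n" for line in lines) or "\r\n"


header = relaxed_header("Subject", "  [PATCH]   fix\r\n\t the thing ")
body = simple_body("line 1\n    indented\n\n\n")
```

Unlike the "relaxed" body mode, ``simple_body`` leaves whitespace inside
each line untouched, so indentation changes alter the resulting hash.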

Algorithms
~~~~~~~~~~
The DKIM standard mostly relies on RSA signatures, though RFC 8463
extends it to support ED25519 keys as well. Since our implementation is
fully backward compatible with the DKIM standard, it is possible to use
any of the DKIM-defined algorithms. However, for the purposes of this
POC, we only support the following two signing/hashing algorithms:

  - ed25519-sha256: exactly as defined in RFC 8463
  - openpgp-sha256: uses OpenPGP to create the signature

POC code
--------
The provided POC code in main.py is pretty feature-complete, though it
probably needs further improvements to properly deal with corner-cases.
You will notice that it's only a few hundred lines of Python code and
does not require any external libraries/programs except libsodium and
GnuPG for crypto, plus git for message canonicalization. All of these
are already likely to be present on a developer's workstation.

Running the code
~~~~~~~~~~~~~~~~
The POC code is written in Python and requires the PyNaCl library
in order to work. Chances are, PyNaCl is already installed on your
platform, but if it isn't, you can install it via a venv::

    $ python3 -mvenv .venv
    $ source .venv/bin/activate
    $ pip install --upgrade pip
    $ pip install -r requirements.txt

Or you can achieve the same using OS packaging::

    # dnf install python3-pynacl
    # apt install python3-nacl

You should also have git and gpg available as external commands in your
PATH.

ED25519 signatures
~~~~~~~~~~~~~~~~~~
ED25519 is the "nothing up my sleeve" implementation of Elliptic-Curve
Cryptography (ECC) favoured by free software enthusiasts. Its primary
benefits are algorithmic speed of all crypto operations and relative
smallness of both public/private keys and generated signatures.

To sign an email using a bundled ed25519 key, run::

    $ ./main.py sign-ed25519 -k dev.key
    SIGNING : ED25519 using dev.key
    MSGSRC  : emails/dev-unsigned.eml
    --- SIGNED MESSAGE STARTS ---
    [...]
    X-Developer-Signature: v=1; a=ed25519-sha256; h=from:subject:date:message-id;
     l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=;
     b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc
     HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn

If you've ever seen email headers, you'll notice how very similar the
X-Developer-Signature is to the DKIM-Signature header.
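
Because the header follows the DKIM tag=value layout, it can be split
into fields with very little code. The sketch below is one possible way
to do so; ``parse_signature`` is a hypothetical helper and not
necessarily how main.py handles it. The sample value is copied from the
output above:

```python
import re


def parse_signature(value: str) -> dict:
    """Split a DKIM-style "tag=value; tag=value" header into a dict."""
    tags = {}
    for part in value.split(";"):
        if "=" not in part:
            continue
        key, _, val = part.partition("=")   # split on the first "=" only
        key, val = key.strip(), val.strip()
        if key in ("b", "bh"):
            val = re.sub(r"\s+", "", val)   # base64 data may be folded across lines
        tags[key] = val
    return tags


example = (
    "v=1; a=ed25519-sha256; h=from:subject:date:message-id;\n"
    " l=1003; bh=Pfwl/zDlAoe9nkYNQPcgDFscfSQdrGvx4kAzrnQdNQ8=;\n"
    " b=WyAu9nzYMUg2ntOfnvEBpa1vLQemK7axjAVu+hhYh6VyeFmB5jKzC2TcF+2IOjfG3eGl/XNY0EWc\n"
    " HUh2tF02AQwiKDVDG7mTmP1/SPpNvotD0mTWQk6LyltWKFBUpRhn"
)
tags = parse_signature(example)
print(tags["a"], tags["h"].split(":"))
```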

OpenPGP signatures
~~~~~~~~~~~~~~~~~~
OpenPGP is not really an "algorithm," so this is merely an indicator
that the signature is created using an OpenPGP-compliant application.
Here it is in action, though you will need to use your own PGP key if
you want to try it::

    $ ./main.py -m emails/mricon-unsigned.eml sign-pgp -k B6C41CE35664996C
    SIGNING : PGP using B6C41CE35664996C
    MSGSRC  : emails/mricon-unsigned.eml
    --- SIGNED MESSAGE STARTS ---
    [...]
    X-Developer-Signature: v=1; a=openpgp-sha256; h=from:subject:date:message-id;
     l=1002; bh=g2Sv1ZR+jIrWukzdXbqb+aeiqyFQOBLDQY6z0BBnGg4=;
     b=owGbwMvMwCG27YjM47CUmTmMp9WSGBK6vn316Z1bbjJ5DWNEgimHTc6Kx4HfTpzYcOzp9e/2jc/v
     Lg7J7ChlYRDjYJAVU2Qp2xe7KajwoYdceo8pzBxWJpAhDFycAjCRBn5Ghrc/7otaV1yX6I4/sNf056
     vmzjen3bn2Rk8X9GTuZd2/aQ0jw7fZJ2Pi36/X2fTK4cSnX/++nbAzsm0TObX4SpbBsrRHe/gA

OpenPGP supports ed25519 keys as well, so in reality the signature is
made with my own ed25519 subkey, but it is further wrapped in the
OpenPGP header data, which is why it is longer than the ed25519
signature in the example above. It is created using the following GnuPG
parameters::

    gpg -s -u KEYID < binary-hash-to-sign

Distributing keys
-----------------
The difficult part of various PKI schemes is not really the
cryptography, but initial trust bootstrap and key distribution. In our
case, we sidestep trust bootstrap entirely and focus solely on developer
key distribution. We propose doing it via the git repository itself,
borrowing the idea from the people behind the did:git project.

Using git to track contributor keys
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Consider the workflow of a Linux kernel subsystem maintainer. While a
single maintainer may receive patches from hundreds of people, they will
likely have a fairly small subset of developers with whom they
collaborate on an ongoing basis. As their relationship trust builds, the
maintainer may wish to implement an attestation mechanism to verify that
patches submitted by trusted lieutenants are not corrupted or modified
by malicious actors en route.

The proposed POC offers several ways of achieving this:

- tracking the keys in a regular development branch
- tracking the keys in a special dedicated branch
- tracking the keys in a dedicated git repository

Using the regular development branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Smaller projects with fewer contributors may simply choose to bundle
developer keys as part of their source code. The POC uses the toplevel
.keys directory for this purpose, with the following structure::

    .keys
     \- sigtype
      \- domain
       \- local
        \- selector

So, for an ed25519 signature from dev@example.org, the public key needed
for signature verification would be contained in::

    .keys
     \- ed25519
      \- example.org
       \- dev
        \- default

The "default" filename is used when there is no other s= selector
specified in the signature header.

NB: Since domain/local/selector values are taken from untrusted sources,
they should be urlencoded before attempting to locate the public key on
disk or via any commands passed to "git show".
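
A minimal sketch of such a lookup is shown below. The function name
``key_path`` is an assumption for this example. Note that urlencoding
alone leaves "." and ".." unchanged, so this sketch also rejects those
components; that extra check is a precaution added here and is not
something the text above requires:

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def key_path(sigtype: str, identity: str, selector: str = "default",
             topdir: str = ".keys") -> str:
    """Map a signature type, email address, and selector to a key location."""
    local, _, domain = identity.rpartition("@")
    parts = []
    for part in (sigtype, domain, local, selector):
        if part in ("", ".", ".."):
            raise ValueError(f"refusing unsafe path component: {part!r}")
        # safe="" makes quote() encode "/" as well
        parts.append(quote(part, safe=""))
    return str(PurePosixPath(topdir, *parts))


print(key_path("ed25519", "dev@example.org"))
# prints .keys/ed25519/example.org/dev/default
```

With this encoding, a hostile selector such as ``a/b`` becomes the single
path component ``a%2Fb`` instead of descending into another directory.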

Using a dedicated ref
~~~~~~~~~~~~~~~~~~~~~
In the case of a project the size of the Linux Kernel, it would be too
onerous to track the keys of all contributors centrally, so individual
subsystem maintainers will likely want to track their own subsets of
keys from just the developers with whom they work on a regular basis.
Using the regular development branch would be too inconvenient in this
case, since it would interfere with upstream work, so it makes sense to
use a separate branch for this purpose, e.g. "refs/heads/keys" that
contains just the keys directory with no other content.
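
Reading a key from such a ref does not require checking the branch out,
since "git show" accepts a ``ref:path`` argument. The sketch below
assumes the dedicated branch keeps the same .keys layout described
earlier; the helper names are hypothetical, and the optional ``gitdir``
argument is one way to point at a separate repository instead:

```python
import subprocess


def key_show_args(path: str, ref: str = "refs/heads/keys", gitdir: str = None) -> list:
    """Build the git command that prints a key file stored under a ref."""
    args = ["git"]
    if gitdir:
        args += ["--git-dir", gitdir]   # look in a different repository
    return args + ["show", f"{ref}:{path}"]


def read_key(path: str, **kwargs) -> bytes:
    """Run the command and return the raw key contents (requires git)."""
    result = subprocess.run(key_show_args(path, **kwargs),
                            check=True, capture_output=True)
    return result.stdout


print(key_show_args(".keys/ed25519/example.org/dev/default"))
```

The ``path`` value is expected to be already urlencoded, since it is
derived from untrusted signature fields.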

Participating contributors can then submit key additions and changes as
regular patches or pull requests and the maintainer merely needs to
remember to apply them to the proper key management branch.

Using a dedicated git repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Similarly, instead of using a dedicated branch, maintainers may choose
to use a wholly separate git repository for this purpose. This may be
useful if the same set of developers works on multiple projects.

Key formats for ED25519 and OpenPGP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The public keys should be in the following format:

- ed25519: base64-encoded string
- openpgp: any format that can be passed to "gpg --import", but
  preferably an ascii-armored key export

In the case of verifying PGP signatures, the POC implementation will
create a temporary keyring containing just the imported key, so it
should never clash with the default keyring.

Using the default GnuPG keyring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It is up to the implementation whether to fall back to the default GnuPG
keyring when checking openpgp signatures. The POC code will do so and
will additionally warn if the key has insufficient trust (this check is
meaningless for in-git bundled keys, so it is not performed).

Rotating and revoking keys
~~~~~~~~~~~~~~~~~~~~~~~~~~
Keys can be retired or replaced at any time by merely changing them in
the repository, committing, and pushing (or submitting a pull
request/patch to the maintainer with the change). Maintainers can then
pull the change or apply the patch and push it out to all other
participating co-maintainers.

Contributors can have multiple valid keys if they properly specify the
selector when adding signatures -- or the verification tooling can
simply iterate through all keys listed in the directory for that
domain/local to find the matching one.
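
The "iterate through all keys" fallback could look roughly like the
sketch below. It assumes keys are plain files in a checked-out
domain/local directory; ``find_matching_key`` and the ``verify`` callback
are hypothetical names, and the stand-in callback used here only compares
strings where real tooling would check the signature:

```python
import tempfile
from pathlib import Path


def find_matching_key(keydir: str, verify):
    """Return the selector (file name) of the first key accepted by verify()."""
    for path in sorted(Path(keydir).iterdir()):
        if not path.is_file():
            continue                      # skip any subdirectories
        if verify(path.read_text().strip()):
            return path.name
    return None


# Demonstration with two placeholder "keys" in a temporary directory.
with tempfile.TemporaryDirectory() as keydir:
    Path(keydir, "default").write_text("AAAA\n")
    Path(keydir, "2021").write_text("BBBB\n")
    found = find_matching_key(keydir, lambda key: key == "BBBB")
    missing = find_matching_key(keydir, lambda key: key == "CCCC")

print(found, missing)  # prints 2021 None
```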

Revoked keys can simply be deleted or moved into a revoked/
subdirectory, perhaps with an explanation of why they were revoked.

Verifying keys before accepting them
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As stated earlier, bootstrapping trust remains a hard problem. We do not
aim to resolve it here and will cowardly defer to the participating
maintainers to pick their preferred key verification strategy, e.g.:

- meeting up in person at a conference and exchanging keys
- holding a video session and reciting fingerprints (or entire keys, in
  the case of ed25519)
- using an email round-trip as proof of key ownership

This can be as lax or as strict as maintainers choose (though if the
procedure is too lax, then the whole point of cryptographic attestation
becomes moot).

Trusting the git repository
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Obviously, if keys are distributed via git, then one must trust git
itself and the commit provenance. This, again, is a "bootstrapping
trust" sort of problem that we promised to side-step, but we can at
least give the following recommendations:

- the person maintaining the keyring should PGP-sign all commits
  modifying public key contents
- the repository itself should initially be cloned from trusted sources
  over secure protocols

We hope to provide a separate best-practices document aimed at keyring
maintainers, should this scheme be adopted.

Automating patch attestation
----------------------------
The git-send-email application supports executing a validation hook
before sending out patches. The end-user tooling should provide git hook
integration so that patches are automatically attested every time
"git-send-email" is used.

We aim to provide a lightweight attestation utility for this purpose, as
well as implement all necessary verification routines in "b4"
client-side tooling used by many Linux developers for their patch
workflow.