author: Konstantin Ryabitsev <konstantin@linuxfoundation.org> 2020-09-15 17:33:54 -0400
committer: Konstantin Ryabitsev <konstantin@linuxfoundation.org> 2020-09-15 17:33:54 -0400
commit: ac01092200879d7f7b461ee9a704458cb35ffd19
tree: 752119a3e4e9a946f47170fb581f22ee89fd4b96
parent: 169fb4c015b51271770af28c157fdb87ccff9555
download: patch-attestation-poc-ac01092200879d7f7b461ee9a704458cb35ffd19.tar.gz
Commit initial README.rst

Still in the process of being written.

Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
-rw-r--r-- README.rst 137
1 files changed, 137 insertions, 0 deletions
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..7e34877
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,137 @@
+Header-Based Patch Attestation
+==============================
+
+Author: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
+Status: Alpha, soliciting comments
+
+Preamble
+--------
+Projects participating in decentralized development continue to use
+RFC-2822 (email) formatted messages for code submissions and review.
+This remains the only widely accepted mechanism for code collaboration
+that does not rely on centralized infrastructure maintained by a single
+entity, which necessarily introduces a single point of dependency and
+failure.
+
+RFC-2822 formatted messages can be delivered via a variety of means. To
+name a few of the more common ones:
+
+ - email
+ - Usenet
+ - aggregated archives (e.g. public-inbox)
+
+Among these, email remains the most commonly used transport mechanism
+for RFC-2822 messages, typically delivered via subscription-based
+services (mailing lists).
+
+Email and end-to-end attestation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are two commonly used standards for cryptographic email
+attestation: PGP and S/MIME. While both are well-suited for attesting
+code sent via email, there are significant drawbacks to both:
+
+ - Mailing list software may modify email body contents to add
+   subscription information footers, causing message attestation to
+   fail (while not actually modifying code contents).
+ - MIME-based attestation may not be preserved by mailing list software
+   that aggressively quarantines attachments.
+ - Inline PGP attestation generally frustrates developers working with
+   patches due to extra surrounding content and the escaping it
+   performs for strings containing dashes at the start of the line for
+   canonicalization purposes.
+ - Only the body of the message is attested, leaving metadata such as
+   "From", "Subject", and "Date" open to tampering. Git uses this
+   metadata to formulate git commits, so leaving it unattested is
+   problematic (it can be duplicated into the body of the message,
+   but git format-patch will not do this by default).
+ - PGP key distribution and trust delegation remain difficult
+   problems to solve. Even if PGP attestation is working perfectly, the
+   developer on the receiving end of the patches may not make any use
+   of it due to not having the sender's key in their keyring.
+ - S/MIME certificates are increasingly difficult to obtain for
+   developers not working in corporate environments. At the time of
+   writing, only two commercial CAs provide this service -- and only
+   one does it for free.
+
+For these reasons, end-to-end attestation is rarely used in communities
+that continue to use email as their main conduit for code submissions
+and review.
+
+Email and domain-level attestation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Since unsolicited emails (spam) frequently forge headers in order to
+appear to be coming from trusted sources, most major service providers
+have adopted DKIM (RFC-6376) to provide cryptographic attestation for
+header and body contents. A message that originates from gmail.com will
+contain a "DKIM-Signature" header that attests the contents of the
+following headers (among others):
+
+ - from
+ - date
+ - message-id
+ - subject
+
+The "DKIM-Signature" header also includes a hash of the message body
+(bh=) that is included in the final verification hash. When a DKIM
+signature is successfully verified using a public key that is published
+via gmail.com DNS records, this provides a degree of assurance that the
+email message has not been modified.
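The structure of a "DKIM-Signature" header can be illustrated with a short Python sketch. The header value below is made up for illustration: the domain, selector, and the bh= and b= values are placeholders, not real signature data. The helper name ``parse_dkim_tags`` is hypothetical as well. The sketch only splits the tag list and does not verify anything.

```python
def parse_dkim_tags(value: str) -> dict:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            # Folding whitespace inside tag values is not significant.
            tags[name.strip()] = "".join(val.split())
    return tags


# Hypothetical header value; bh= and b= hold placeholders, not real data.
sample = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.org; s=selector1;"
    " h=from:date:message-id:subject;"
    " bh=BODYHASHPLACEHOLDER=; b=SIGNATUREPLACEHOLDER="
)

tags = parse_dkim_tags(sample)
# The h= tag lists the attested headers, separated by colons.
signed_headers = tags["h"].lower().split(":")
print(signed_headers)
```

In this sketch, the h= tag names the headers covered by the signature and the bh= tag carries the body hash that feeds into the final verification.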
+
+Just as with PGP and S/MIME attestation, this approach has significant
+problems when it comes to patches sent via mailing lists:
+
+ - If the "sender" header is included in the attestation, the DKIM
+   signature will no longer verify due to mailing lists necessarily
+   rewriting it for bounce handling.
+ - ML software commonly modifies the subject header in order to insert
+   list identification (e.g. Subject: [ml-topic]). Since the "subject"
+   header is almost always included in the list of headers attested
+   by DKIM, this results in DKIM signature failure.
+ - ML software also routinely modifies the message body for the
+   purposes of stripping attachments or inserting list subscription
+   metadata. Since the bh= hash is included in the final signature
+   hash, this will result in a failed DKIM signature check.
+
+Even if none of the above applies and the DKIM signature is
+successfully verified, the body canonicalization routines defined by the
+DKIM RFC may result in a false-positive attestation for patches. The
+"relaxed" canonicalization replaces every run of whitespace within a
+line with a single space, so for languages like Python or GNU Make,
+where whitespace is syntactically significant, logically different
+patches may produce the same hash.
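The whitespace collision described above can be reproduced with a minimal Python sketch of the "relaxed" body canonicalization from RFC 6376 (section 3.4.4). The function names and the two sample snippets are assumptions made for illustration. The sketch covers only the body hash, not header canonicalization or signature verification.

```python
import base64
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Apply DKIM "relaxed" body canonicalization to a message body."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    canon = []
    for line in lines:
        line = re.sub(rb"[ \t]+", b" ", line)  # collapse whitespace runs
        canon.append(line.rstrip(b" "))        # drop trailing whitespace
    while canon and canon[-1] == b"":          # drop empty lines at the end
        canon.pop()
    return b"".join(line + b"\r\n" for line in canon)


def body_hash(body: bytes) -> str:
    """Return the base64 SHA-256 digest in the form used by the bh= tag."""
    digest = hashlib.sha256(relaxed_body(body)).digest()
    return base64.b64encode(digest).decode("ascii")


# Two hypothetical snippets: cleanup() runs on every iteration in the
# first one, and once after the loop in the second one.
inside_loop = (
    b"def run(items):\n"
    b"    for item in items:\n"
    b"        process(item)\n"
    b"        cleanup()\n"
)
after_loop = (
    b"def run(items):\n"
    b"    for item in items:\n"
    b"        process(item)\n"
    b"    cleanup()\n"
)

print(inside_loop != after_loop)
print(body_hash(inside_loop) == body_hash(after_loop))
```

Both indentation levels collapse to a single leading space, so the two bodies canonicalize to the same bytes even though the code behaves differently.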
+
+DKIM works well enough for domain-level attestation of email, but has
+important drawbacks when used to attest patches, especially when they
+are delivered via mailing lists, as is still largely the case.
+
+Proposal
+--------
+The goal of this document is to propose a scheme that would provide
+cryptographic attestation for all message contents necessary for trusted
+distributed collaboration. It draws on the success of the DKIM standard
+in order to adapt (and adopt) it for this purpose.
+
+Anatomy of an email patch
+~~~~~~~~~~~~~~~~~~~~~~~~~
+A patch submitted via an RFC-2822 formatted message consists of the
+following three significant parts:
+
+ - *metadata*, which includes the Author, Email, Subject, and Date of
+   the submission
+ - *commit message*, which describes what the change is supposed to
+   accomplish
+ - *diff content*, which is structured data that should be applied
+   to the codebase in order to implement the changes proposed
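The three parts listed above can be separated with a rough Python sketch, assuming a message laid out the way git format-patch usually writes it. The sample message, the address, and the helper name ``split_patch`` are hypothetical, and the splitting rules are a simplification, not the scheme proposed in this document.

```python
from email import message_from_string

# Hypothetical patch email, built line by line for illustration.
SAMPLE = "\n".join([
    "From: Example Developer <dev@example.org>",
    "Date: Tue, 15 Sep 2020 17:33:54 -0400",
    "Subject: [PATCH] Fix typo in greeting",
    "",
    "Correct the spelling of the greeting.",
    "",
    "Signed-off-by: Example Developer <dev@example.org>",
    "---",
    " hello.txt | 2 +-",
    " 1 file changed, 1 insertion(+), 1 deletion(-)",
    "",
    "diff --git a/hello.txt b/hello.txt",
    "index 1234567..89abcde 100644",
    "--- a/hello.txt",
    "+++ b/hello.txt",
    "@@ -1 +1 @@",
    "-Helo, world",
    "+Hello, world",
    "-- ",
    "2.28.0",
    "",
])


def split_patch(raw: str):
    """Return (metadata, commit message, diff content) for a patch email."""
    msg = message_from_string(raw)
    metadata = {name: msg[name] for name in ("From", "Subject", "Date")}
    body = msg.get_payload()
    # The commit message ends at the first line consisting of "---".
    message, _, rest = body.partition("\n---\n")
    # Skip the diffstat: diff content starts at the first "diff --git" line.
    start = rest.find("\ndiff --git ")
    diff = rest[start + 1:] if start != -1 else ""
    # Drop the trailing signature block that follows the "-- " line.
    diff = diff.split("\n-- \n", 1)[0]
    return metadata, message.strip(), diff


metadata, message, diff = split_patch(SAMPLE)
print(metadata["Subject"])
print(message.splitlines()[0])
print(diff.splitlines()[0])
```

The diffstat and the trailing version signature are discarded here, which matches the distinction between significant parts and additional content.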
+
+Patch submissions also routinely provide additional content that may
+have significance to the author or to the reviewer, but is not preserved
+in the codebase after patches are applied, such as:
+
+ - information describing changes between revisions
+ - statistics about what files are changed (diffstat)
+ - structured data indicating tree dependencies (base-commit)
+ - author's signature and software version info
+ - mailing list subscription metadata
+
+Our goal is to provide attestation for the significant parts and ignore
+the rest.