author | Konstantin Ryabitsev <konstantin@linuxfoundation.org> | 2020-09-15 17:33:54 -0400 |
---|---|---|
committer | Konstantin Ryabitsev <konstantin@linuxfoundation.org> | 2020-09-15 17:33:54 -0400 |
commit | ac01092200879d7f7b461ee9a704458cb35ffd19 (patch) | |
tree | 752119a3e4e9a946f47170fb581f22ee89fd4b96 | |
parent | 169fb4c015b51271770af28c157fdb87ccff9555 (diff) | |
download | patch-attestation-poc-ac01092200879d7f7b461ee9a704458cb35ffd19.tar.gz |
Commit initial README.rst
Still in the process of being written.
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
-rw-r--r-- | README.rst | 137 |
1 files changed, 137 insertions, 0 deletions
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..7e34877
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,137 @@
+Header-Based Patch Attestation
+==============================
+
+Author: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
+Status: Alpha, soliciting comments
+
+Preamble
+--------
+Projects participating in decentralized development continue to use
+RFC-2822 (email) formatted messages for code submissions and review.
+This remains the only widely accepted mechanism for code collaboration
+that does not rely on centralized infrastructure maintained by a single
+entity, which necessarily introduces a single point of dependency and
+failure.
+
+RFC-2822 formatted messages can be delivered via a variety of means. To
+name a few of the more common ones:
+
+  - email
+  - usenet
+  - aggregated archives (e.g. public-inbox)
+
+Among these, email remains the most commonly used transport mechanism
+for RFC-2822 messages, usually delivered via subscription-based
+services (mailing lists).
+
+Email and end-to-end attestation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are two commonly used standards for cryptographic email
+attestation: PGP and S/MIME. While both are well-suited for attesting
+code sent via email, there are significant drawbacks to both:
+
+  - Mailing list software may modify email body contents to add
+    subscription information footers, causing message attestation to
+    fail (while not actually modifying code contents).
+  - MIME-based attestation may not be preserved by mailing list software
+    that aggressively quarantines attachments.
+  - Inline PGP attestation generally frustrates developers working with
+    patches due to extra surrounding content and the escaping it
+    performs for strings containing dashes at the start of the line for
+    canonicalization purposes.
+  - Only the body of the message is attested, leaving metadata such as
+    "From", "Subject", and "Date" open to tampering. Git uses this
+    metadata to formulate git commits, so leaving it unattested is
+    problematic (the headers can be duplicated into the body of the
+    message, but git format-patch will not do this by default).
+  - PGP key distribution and trust delegation remain a difficult
+    problem to solve. Even if PGP attestation is working perfectly, the
+    developer on the receiving end of the patches may not make any use
+    of it due to not having the sender's key in their keyring.
+  - S/MIME certificates are increasingly difficult to obtain for
+    developers not working in corporate environments. At the time of
+    writing, only two commercial CAs provide this service -- and only
+    one does it for free.
+
+For these reasons, end-to-end attestation is rarely used in communities
+that continue to use email as their main conduit for code submissions
+and review.
+
+Email and domain-level attestation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Since unsolicited emails (spam) frequently forge headers in order to
+appear to be coming from trusted sources, most major service providers
+have adopted DKIM (RFC-6376) to provide cryptographic attestation for
+header and body contents. A message that originates from gmail.com will
+contain a "DKIM-Signature" header that attests the contents of the
+following headers (among others):
+
+  - from
+  - date
+  - message-id
+  - subject
+
+The "DKIM-Signature" header also includes a hash of the message body
+(bh=) that is included in the final verification hash. When a DKIM
+signature is successfully verified using a public key that is published
+via gmail.com DNS records, this provides a degree of assurance that the
+email message has not been modified.
+
+Just as with PGP and S/MIME attestation, this has important problems
+when it comes to patches sent via mailing lists:
+
+  - If the "sender" header is included in the attestation, the DKIM
+    signature will no longer verify due to mailing lists necessarily
+    rewriting it for bounce handling.
+  - ML software commonly modifies the subject header in order to insert
+    list identification (e.g. Subject: [ml-topic]). Since the "subject"
+    header is almost always included in the list of headers attested
+    by DKIM, this results in DKIM signature failure.
+  - ML software also routinely modifies the message body for the
+    purposes of stripping attachments or inserting list subscription
+    metadata. Since the bh= hash is included in the final signature
+    hash, this will result in a failed DKIM signature check.
+
+Even if none of the above applies and the DKIM signature is
+successfully verified, body canonicalization routines defined by the
+DKIM RFC may result in a false-positive successful attestation for
+patches. The "relaxed" canonicalization replaces each run of whitespace
+within a line with a single space, so in languages like Python or GNU
+Make, where whitespace is syntactically significant, logically
+different code may produce the same hash.
+
+DKIM works well enough for end-to-end email attestation, but has
+important drawbacks for domain-level attestation of patches, especially
+when they are delivered via mailing lists as is still largely the case.
+
+Proposal
+--------
+The goal of this document is to propose a scheme that would provide
+cryptographic attestation for all message contents necessary for trusted
+distributed collaboration. It draws on the success of the DKIM standard
+in order to adapt (and adopt) it for this purpose.
+
+Anatomy of an email patch
+~~~~~~~~~~~~~~~~~~~~~~~~~
+A patch submitted via an RFC-2822 formatted message consists of the
+following three significant parts:
+
+  - *metadata*, which includes the Author, Email, Subject, and Date of
+    the submission
+  - *commit message*, which describes what the change is supposed to
+    accomplish
+  - *diff content*, which is structured data that should be applied
+    to the codebase in order to implement the changes proposed
+
+Patch submissions also routinely provide additional content that may
+have significance to the author or to the reviewer, but is not preserved
+in the codebase after patches are applied, such as:
+
+  - information describing changes between revisions
+  - statistics about what files are changed (diffstat)
+  - structured data indicating tree dependencies (base-commit)
+  - author's signature and software version info
+  - mailing list subscription metadata
+
+Our goal is to provide attestation for the significant parts and ignore
+the rest.
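
To illustrate the three significant parts named in the README's "Anatomy of an email patch" section, here is a minimal Python sketch that splits an RFC-2822 patch message into metadata, commit message, and diff content. It is not part of this commit and is not the proposed attestation scheme. The helper name `split_patch`, the sample message, and the splitting heuristics (a lone `---` line ends the commit message, `diff --git` starts the diff, a `-- ` line starts the signature) are assumptions modelled on typical `git format-patch` output.

```python
from email import message_from_string
from email.utils import parseaddr


def split_patch(raw: str) -> dict:
    """Hypothetical helper: split a patch email into its significant parts."""
    msg = message_from_string(raw)
    author, address = parseaddr(msg.get("From", ""))
    metadata = {
        "author": author,
        "email": address,
        "subject": msg.get("Subject", ""),
        "date": msg.get("Date", ""),
    }
    body = msg.get_payload()  # a plain string for non-multipart messages
    # Assumption: a "-- " line starts the trailing signature/version block.
    body = body.split("\n-- \n", 1)[0]
    # Assumption: a lone "---" line separates the commit message from the rest.
    message, _, remainder = body.partition("\n---\n")
    # Skip the diffstat and any revision notes; the diff starts at "diff --git".
    start = remainder.find("diff --git ")
    diff = remainder[start:] if start >= 0 else ""
    return {"metadata": metadata, "message": message.strip(), "diff": diff}


# Made-up sample message, used for illustration only.
SAMPLE = "\n".join([
    "From: Alice Developer <alice@example.org>",
    "Subject: [PATCH] Fix off-by-one in parser",
    "Date: Tue, 15 Sep 2020 17:33:54 -0400",
    "",
    "The loop skipped the final token.",
    "",
    "Signed-off-by: Alice Developer <alice@example.org>",
    "---",
    " parser.py | 2 +-",
    " 1 file changed, 1 insertion(+), 1 deletion(-)",
    "",
    "diff --git a/parser.py b/parser.py",
    "index 1111111..2222222 100644",
    "--- a/parser.py",
    "+++ b/parser.py",
    "@@ -1 +1 @@",
    "-for i in range(n - 1):",
    "+for i in range(n):",
    "-- ",
    "2.28.0",
    "",
])

parts = split_patch(SAMPLE)
print(parts["metadata"])
print(parts["message"])
print(parts["diff"])
```

The diffstat and the trailing version line fall into neither the commit message nor the diff, which matches the README's stated goal of attesting only the significant parts. A real implementation would also need to handle MIME-encoded bodies and messages that lack the `---` separator, which this sketch ignores.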