aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/git-p4.txt
diff options
context:
space:
mode:
authorTao Klerks <tao@klerks.biz>2022-04-30 19:26:52 +0000
committerJunio C Hamano <gitster@pobox.com>2022-05-04 10:30:01 -0700
commitf7b5ff607fa1a62a480399afb6ccb9691735bf79 (patch)
treefbf3bc4cda3b6e879c0d2652c6ac32e96767d447 /Documentation/git-p4.txt
parent6cd33dceed60949e2dbc32e3f0f5e67c4c882e1e (diff)
downloadgit-f7b5ff607fa1a62a480399afb6ccb9691735bf79.tar.gz
git-p4: improve encoding handling to support inconsistent encodings
git-p4 is designed to run correctly under python2.7 and python3, but its functional behavior wrt importing user-entered text differs across these environments: Under python2, git-p4 "naively" writes the Perforce bytestream into git metadata (and does not set an "encoding" header on the commits); this means that any non-utf-8 byte sequences end up creating invalidly-encoded commit metadata in git. Under python3, git-p4 attempts to decode the Perforce bytestream as utf-8 data, and fails badly (with an unhelpful error) when non-utf-8 data is encountered. Perforce clients (especially p4v) encourage user entry of changelist descriptions (and user full names) in OS-local encoding, and store the resulting bytestream to the server unmodified - such that different clients can end up creating mutually-unintelligible messages. The most common inconsistency, in many Perforce environments, is likely to be utf-8 (typical in linux) vs cp-1252 (typical in windows). Make the changelist-description- and user-fullname-handling code python-runtime-agnostic, introducing three "strategies" selectable via config: - 'passthrough', behaving as previously under python2, - 'strict', behaving as previously under python3, and - 'fallback', favoring utf-8 but supporting a secondary encoding when utf-8 decoding fails, and finally escaping high-range bytes if the decoding with the secondary encoding also fails. Keep the python2 default behavior as-is ('legacy' strategy), but switch the python3 default strategy to 'fallback' with default fallback encoding 'cp1252'. Also include tests exercising these encoding strategies, documentation for the new config, and improve the user-facing error messages when decoding does fail. Signed-off-by: Tao Klerks <tao@klerks.biz> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'Documentation/git-p4.txt')
-rw-r--r--Documentation/git-p4.txt37
1 files changed, 36 insertions, 1 deletions
diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index e21fcd8f71..de5ee6748e 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -636,7 +636,42 @@ git-p4.pathEncoding::
Git expects paths encoded as UTF-8. Use this config to tell git-p4
what encoding Perforce had used for the paths. This encoding is used
to transcode the paths to UTF-8. As an example, Perforce on Windows
- often uses "cp1252" to encode path names.
+ often uses "cp1252" to encode path names. If this option is passed
+ into a p4 clone request, it is persisted in the resulting new git
+ repo.
+
+git-p4.metadataDecodingStrategy::
+ Perforce keeps the encoding of a changelist descriptions and user
+ full names as stored by the client on a given OS. The p4v client
+ uses the OS-local encoding, and so different users can end up storing
+ different changelist descriptions or user full names in different
+ encodings, in the same depot.
+ Git tolerates inconsistent/incorrect encodings in commit messages
+ and author names, but expects them to be specified in utf-8.
+ git-p4 can use three different decoding strategies in handling the
+ encoding uncertainty in Perforce: 'passthrough' simply passes the
+ original bytes through from Perforce to git, creating usable but
+ incorrectly-encoded data when the Perforce data is encoded as
+ anything other than utf-8. 'strict' expects the Perforce data to be
+ encoded as utf-8, and fails to import when this is not true.
+ 'fallback' attempts to interpret the data as utf-8, and otherwise
+ falls back to using a secondary encoding - by default the common
+ windows encoding 'cp-1252' - with upper-range bytes escaped if
+ decoding with the fallback encoding also fails.
+ Under python2 the default strategy is 'passthrough' for historical
+ reasons, and under python3 the default is 'fallback'.
+ When 'strict' is selected and decoding fails, the error message will
+ propose changing this config parameter as a workaround. If this
+ option is passed into a p4 clone request, it is persisted into the
+ resulting new git repo.
+
+git-p4.metadataFallbackEncoding::
+ Specify the fallback encoding to use when decoding Perforce author
+ names and changelists descriptions using the 'fallback' strategy
+ (see git-p4.metadataDecodingStrategy). The fallback encoding will
+ only be used when decoding as utf-8 fails. This option defaults to
+ cp1252, a common windows encoding. If this option is passed into a
+ p4 clone request, it is persisted into the resulting new git repo.
git-p4.largeFileSystem::
Specify the system that is used for large (binary) files. Please note