Age | Commit message (Collapse) | Author | Files | Lines |
|
Code to write out reftable has seen some optimization and
simplification.
* ps/reftable-write-optim:
reftable/block: reuse compressed array
reftable/block: reuse zstream when writing log blocks
reftable/writer: reset `last_key` instead of releasing it
reftable/writer: unify releasing memory
reftable/writer: refactorings for `writer_flush_nonempty_block()`
reftable/writer: refactorings for `writer_add_record()`
refs/reftable: don't recompute committer ident
reftable: remove name checks
refs/reftable: skip duplicate name checks
refs/reftable: perform explicit D/F check when writing symrefs
refs/reftable: fix D/F conflict error message on ref copy
|
|
The code to iterate over reftable blocks has seen some optimization
to reduce memory allocation and deallocation.
* ps/reftable-block-iteration-optim:
reftable/block: avoid copying block iterators on seek
reftable/block: reuse `zstream` state on inflation
reftable/block: open-code call to `uncompress2()`
reftable/block: reuse uncompressed blocks
reftable/reader: iterate to next block in place
reftable/block: move ownership of block reader into `struct table_iter`
reftable/block: introduce `block_reader_release()`
reftable/block: better grouping of functions
reftable/block: merge `block_iter_seek()` and `block_reader_seek()`
reftable/block: rename `block_reader_start()`
|
|
The strategy to compact multiple tables of reftables after many
operations accumulate many entries has been improved to avoid
accumulating too many tables uncollected.
* jt/reftable-geometric-compaction:
reftable/stack: use geometric table compaction
reftable/stack: add env to disable autocompaction
reftable/stack: expose option to disable auto-compaction
|
|
When seeking a reftable record in a block we need to position the
iterator _before_ the sought-after record so that the next call to
`block_iter_next()` would yield that record. To achieve this, the loop
that performs the linear seeks to restore the previous position once it
has found the record.
This is done by advancing two `block_iter`s: one to check whether the
next record is our sought-after record, and one that we update after
every iteration. This of course involves quite a lot of copying and also
leads to needless memory allocations.
Refactor the code to get rid of the `next` iterator and the copying this
involves. Instead, we can restore the previous offset such that the call
to `next` will return the correct record.
Next to being simpler conceptually this also leads to a nice speedup.
The following benchmark parser 10k refs out of 100k existing refs via
`git-rev-list --no-walk`:
Benchmark 1: rev-list: print many refs (HEAD~)
Time (mean ± σ): 170.2 ms ± 1.7 ms [User: 86.1 ms, System: 83.6 ms]
Range (min … max): 166.4 ms … 180.3 ms 500 runs
Benchmark 2: rev-list: print many refs (HEAD~)
Time (mean ± σ): 161.6 ms ± 1.6 ms [User: 78.1 ms, System: 83.0 ms]
Range (min … max): 158.4 ms … 172.3 ms 500 runs
Summary
rev-list: print many refs (HEAD) ran
1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When calling `inflateInit()` and `inflate()`, the zlib library will
allocate several data structures for the underlying `zstream` to keep
track of various information. Thus, when inflating repeatedly, it is
possible to optimize memory allocation patterns by reusing the `zstream`
and then calling `inflateReset()` on it to prepare it for the next chunk
of data to inflate.
This is exactly what the reftable code is doing: when iterating through
reflogs we need to potentially inflate many log blocks, but we discard
the `zstream` every single time. Instead, as we reuse the `block_reader`
for each of the blocks anyway, we can initialize the `zstream` once and
then reuse it for subsequent inflations.
Refactor the code to do so, which leads to a significant reduction in
the number of allocations. The following measurements were done when
iterating through 1 million reflog entries. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable format stores log blocks in a compressed format. Thus,
whenever we want to read such a block we first need to decompress it.
This is done by calling the convenience function `uncompress2()` of the
zlib library, which is a simple wrapper that manages the lifecycle of
the `zstream` structure for us.
While nice for one-off inflation of data, when iterating through reflogs
we will likely end up inflating many such log blocks. This requires us
to reallocate the state of the `zstream` every single time, which adds
up over time. It would thus be great to reuse the `zstream` instead of
discarding it after every inflation.
Open-code the call to `uncompress2()` such that we can start reusing the
`zstream` in the subsequent commit. Note that our open-coded variant is
different from `uncompress2()` in two ways:
- We do not loop around `inflate()` until we have processed all input.
As our input is limited by the maximum block size, which is 16MB, we
should not hit limits of `inflate()`.
- We use `Z_FINISH` instead of `Z_NO_FLUSH`. Quoting the `inflate()`
documentation: "inflate() should normally be called until it returns
Z_STREAM_END or an error. However if all decompression is to be
performed in a single step (a single call of inflate), the parameter
flush should be set to Z_FINISH."
Furthermore, "Z_FINISH also informs inflate to not maintain a
sliding window if the stream completes, which reduces inflate's
memory footprint."
Other than that this commit is expected to be functionally equivalent
and does not yet reuse the `zstream`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable backend stores reflog entries in a compressed format and
thus needs to uncompress blocks before one can read records from it.
For each reflog block we thus have to allocate an array that we can
decompress the block contents into. This block is being discarded
whenever the table iterator moves to the next block. Consequently, we
reallocate a new array on every block, which is quite wasteful.
Refactor the code to reuse the uncompressed block data when moving the
block reader to a new block. This significantly reduces the number of
allocations when iterating through many compressed blocks. The following
measurements are done with `git reflog list` when listing 100k reflogs.
Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The table iterator has to iterate towards the next block once it has
yielded all records of the current block. This is done by creating a new
table iterator, initializing it to the next block, releasing the old
iterator and then copying over the data.
Refactor the code to instead advance the table iterator in place. This
is simpler and unlocks some optimizations in subsequent patches. Also,
it allows us to avoid some allocations.
The following measurements show a single matching ref out of 1 million
refs. Before this change:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 315 allocs, 190 frees, 107,027 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The table iterator allows the caller to iterate through all records in a
reftable table. To do so it iterates through all blocks of the desired
type one by one, where for each block it creates a new block iterator
and yields all its entries.
One of the things that is somewhat confusing in this context is who owns
the block reader that is being used to read the blocks and pass them to
the block iterator. Intuitively, as the table iterator is responsible
for iterating through the blocks, one would assume that this iterator is
also responsible for managing the lifecycle of the reader. And while it
somewhat is, the block reader is ultimately stored inside of the block
iterator.
Refactor the code such that the block reader is instead fully managed by
the table iterator. Instead of passing the reader to the block iterator,
we now only end up passing the block data to it. Despite clearing up the
lifecycle of the reader, it will also allow for better reuse of the
reader in subsequent patches.
The following benchmark prints a single matching ref out of 1 million
refs. Before:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated
Note that while there are more allocation and free calls now, the
overall number of bytes allocated is significantly lower. The number of
allocations will be reduced significantly by the next patch though.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce a new function `block_reader_release()` that releases
resources acquired by the block reader. This function will be extended
in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Function definitions and declaration of `struct block_reader` and
`struct block_iter` are somewhat mixed up, making it hard to see which
functions belong together. Rearrange them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function `block_iter_seek()` is merely a simple wrapper around
`block_reader_seek()`. Merge those two functions into a new function
`block_iter_seek_key()` that more clearly says what it is actually
doing.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The function `block_reader_start()` does not really apply to the block
reader, but to the block iterator. It's name is thus somewhat confusing.
Rename it to `block_iter_seek_start()` to clarify.
We will rename `block_reader_seek()` in similar spirit in the next
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Reftable code clean-up and some bugfixes.
* ps/reftable-binsearch-updates:
reftable/block: avoid decoding keys when searching restart points
reftable/record: extract function to decode key lengths
reftable/block: fix error handling when searching restart points
reftable/block: refactor binary search over restart points
reftable/refname: refactor binary search over refnames
reftable/basics: improve `binsearch()` test
reftable/basics: fix return type of `binsearch()` to be `size_t`
|
|
"git pack-refs" learned the "--auto" option, which is a useful
addition to be triggered from "git gc --auto".
Acked-by: Karthik Nayak <karthik.188@gmail.com>
cf. <CAOLa=ZRAEA7rSUoYL0h-2qfEELdbPHbeGpgBJRqesyhHi9Q6WQ@mail.gmail.com>
* ps/pack-refs-auto:
builtin/gc: pack refs when using `git maintenance run --auto`
builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
t6500: extract objects with "17" prefix
builtin/gc: move `struct maintenance_run_opts`
builtin/pack-refs: introduce new "--auto" flag
builtin/pack-refs: release allocated memory
refs/reftable: expose auto compaction via new flag
refs: remove `PACK_REFS_ALL` flag
refs: move `struct pack_refs_opts` to where it's used
t/helper: drop pack-refs wrapper
refs/reftable: print errors on compaction failure
reftable/stack: gracefully handle failed auto-compaction due to locks
reftable/stack: use error codes when locking fails during compaction
reftable/error: discern locked/outdated errors
reftable/stack: fix error handling in `reftable_stack_init_addition()`
|
|
Similar to the preceding commit, let's reuse the `compressed` array that
we use to store compressed data in. This results in a small reduction in
memory allocations when writing many refs.
Before:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,618,257 allocs, 22,618,106 frees, 1,236,351,528 bytes allocated
So while the reduction in allocations isn't really all that big, it's a
low hanging fruit and thus there isn't much of a reason not to pick it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While most reftable blocks are written to disk as-is, blocks for log
records are compressed with zlib. To compress them we use `compress2()`,
which is a simple wrapper around the more complex `zstream` interface
that would require multiple function invocations.
One downside of this interface is that `compress2()` will reallocate
internal state of the `zstream` interface on every single invocation.
Consequently, as we call `compress2()` for every single log block which
we are about to write, this can lead to quite some memory allocation
churn.
Refactor the code so that the block writer reuses a `zstream`. This
significantly reduces the number of bytes allocated when writing many
refs in a single transaction, as demonstrated by the following benchmark
that writes 100k refs in a single transaction.
Before:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,631,887 allocs, 22,631,736 frees, 1,854,670,793 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 671,931 bytes in 151 blocks
total heap usage: 22,620,528 allocs, 22,620,377 frees, 1,245,549,984 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable writer tracks the last key that it has written so that it
can properly compute the compressed prefix for the next record it is
about to write. This last key must be reset whenever we move on to write
the next block, which is done in `writer_reinit_block_writer()`. We do
this by calling `strbuf_release()` though, which needlessly deallocates
the underlying buffer.
Convert the code to use `strbuf_reset()` instead, which saves one
allocation per block we're about to write. This requires us to also
amend `reftable_writer_free()` to release the buffer's memory now as we
previously seemingly relied on `writer_reinit_block_writer()` to release
the memory for us. Releasing memory here is the right thing to do
anyway.
While at it, convert a callsite where we truncate the buffer by setting
its length to zero to instead use `strbuf_reset()`, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are two code paths which release memory of the reftable writer:
- `reftable_writer_close()` releases internal state after it has
written data.
- `reftable_writer_free()` releases the block that was written to and
the writer itself.
Both code paths free different parts of the writer, and consequently the
caller must make sure to call both. And while callers mostly do this
already, this falls apart when a write failure causes the caller to skip
calling `reftable_write_close()`.
Introduce a new function `reftable_writer_release()` that releases all
internal state and call it from both paths. Like this it is fine for the
caller to not call `reftable_writer_close()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_flush_nonempty_block()` such that it conforms
better to it and add some documentation that explains some of its more
intricate behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Large parts of the reftable library do not conform to Git's typical code
style. Refactor `writer_add_record()` such that it conforms better to it
and add some documentation that explains some of its more intricate
behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In the preceding commit we have disabled name checks in the "reftable"
backend. These checks were responsible for verifying multiple things
when writing records to the reftable stack:
- Detecting file/directory conflicts. Starting with the preceding
commits this is now handled by the reftable backend itself via
`refs_verify_refname_available()`.
- Validating refnames. This is handled by `check_refname_format()` in
the generic ref transacton layer.
The code in the reftable library is thus not used anymore and likely to
bitrot over time. Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.
Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:
git branch -f foo
git branch -d foo
Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.
Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.
When creating 10 thousand references, the new strategy has no
performance impact:
Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
Range (min … max): 26.447 s … 26.569 s 10 runs
Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
Range (min … max): 26.366 s … 26.444 s 10 runs
Summary
update-ref: create refs sequentially (revision = HEAD) ran
1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable stack already has a variable to configure whether or not to
run auto-compaction, but it is inaccessible to users of the library.
There exist use cases where a caller may want to have more control over
auto-compaction.
Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/pack-refs-auto:
builtin/gc: pack refs when using `git maintenance run --auto`
builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
t6500: extract objects with "17" prefix
builtin/gc: move `struct maintenance_run_opts`
builtin/pack-refs: introduce new "--auto" flag
builtin/pack-refs: release allocated memory
refs/reftable: expose auto compaction via new flag
refs: remove `PACK_REFS_ALL` flag
refs: move `struct pack_refs_opts` to where it's used
t/helper: drop pack-refs wrapper
refs/reftable: print errors on compaction failure
reftable/stack: gracefully handle failed auto-compaction due to locks
reftable/stack: use error codes when locking fails during compaction
reftable/error: discern locked/outdated errors
reftable/stack: fix error handling in `reftable_stack_init_addition()`
|
|
When searching over restart points in a block we decode the key of each
of the records, which results in a memory allocation. This is quite
pointless though given that records it restart points will never use
prefix compression and thus store their keys verbatim in the block.
Refactor the code so that we can avoid decoding the keys, which saves us
some allocations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We're about to refactor the binary search over restart points so that it
does not need to fully decode the record keys anymore. To do so we will
need to decode the record key lengths, which is non-trivial logic.
Extract the logic to decode these lengths from `refatble_decode_key()`
so that we can reuse it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When doing the binary search over restart points in a block we need to
decode the record keys. This decoding step can result in an error when
the block is corrupted, which we indicate to the caller of the binary
search by setting `args.error = 1`. But the only caller that exists
mishandles this because it in fact performs the error check before
calling `binsearch()`.
Fix this bug by checking for errors at the right point in time.
Furthermore, refactor `binsearch()` so that it aborts the search in case
the callback function returns a negative value so that we don't
needlessly continue to search the block.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When seeking a record in our block reader we perform a binary search
over the block's restart points so that we don't have to do a linear
scan over the whole block. The logic to do so is quite intricate though,
which makes it hard to understand.
Improve documentation and rename some of the functions and variables so
that the code becomes easier to understand overall. This refactoring
should not result in any change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
It is comparatively hard to understand how exactly the binary search
over refnames works given that the function and variable names are not
exactly easy to grasp. Rename them to make this more obvious. This
should not result in any change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `binsearch()` test is somewhat weird in that it doesn't explicitly
spell out its expectations. Instead it does so in a rather ad-hoc way
with some hard-to-understand computations.
Refactor the test to spell out the needle as well as expected index for
all testcases. This refactoring highlights that the `binsearch_func()`
is written somewhat weirdly to find the first integer smaller than the
needle, not smaller or equal to it. Adjust the function accordingly.
While at it, rename the callback function to better convey its meaning.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `binsearch()` function can be used to find the first element for
which a callback functions returns a truish value. But while the array
size is of type `size_t`, the function in fact returns an `int` that is
supposed to index into that array.
Fix the function signature to return a `size_t`. This conversion does
not change any semantics given that the function would only ever return
a value in the range `[0, sz]` anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
A unit test for reftable code tried to enumerate all files in a
directory after reftable operations and expected to see nothing but
the files it wanted to leave there, but was fooled by .nfs* cruft
files left, which has been corrected.
* ps/reftable-unit-test-nfs-workaround:
reftable: fix tests being broken by NFS' delete-after-close semantics
|
|
Whenever we commit a new table to the reftable stack we will end up
invoking auto-compaction of the stack to keep the total number of tables
at bay. This auto-compaction may fail though in case at least one of the
tables which we are about to compact is locked. This is indicated by the
compaction function returning `REFTABLE_LOCK_ERROR`. We do not handle
this case though, and thus bubble that return value up the calling
chain, which will ultimately cause a failure.
Fix this bug by ignoring `REFTABLE_LOCK_ERROR`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Compaction of a reftable stack may fail gracefully when there is a
concurrent process that writes to the reftable stack and which has thus
locked either the "tables.list" file or one of the tables. This is
expected and can be handled gracefully by some of the callers which
invoke compaction. Thus, to indicate this situation to our callers, we
return a positive return code from `stack_compact_range()` and bubble it
up to the caller.
This kind of error handling is somewhat awkward though as many callers
in the call chain never even think of handling positive return values.
Thus, the result is either that such errors are swallowed by accident,
or that we abort operations with an unhelpful error message.
Make the code more robust by always using negative error codes when
compaction fails, with `REFTABLE_LOCK_ERROR` for the described benign
error case.
Note that only a single callsite knew to handle positive error codes
gracefully in the first place. Subsequent commits will touch up some of
the other sites to handle those errors better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We currently throw two different errors into a similar-but-different
error code:
- Errors when trying to lock the reftable stack.
- Errors when trying to write to the reftable stack which has been
modified concurrently.
This results in unclear error handling and user-visible error messages.
Create a new `REFTABLE_OUTDATED_ERROR` so that those error conditions
can be clearly told apart from each other. Adjust users of the old
`REFTABLE_LOCK_ERROR` to use the new error code as required.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `reftable_stack_init_addition()` we call `stack_uptodate()` after
having created the lockfile to check whether the stack was modified
concurrently, which is indicated by a positive return code from the
latter function. If so, we return a `REFTABLE_LOCK_ERROR` to the caller
and abort the addition.
The error handling has an off-by-one though because we check whether the
error code is `> 1` instead of `> 0`. Thus, instead of returning the
locking error, we would return a positive value. One of the callers of
`reftable_stack_init_addition()` works around this bug by repeating the
error code check without the off-by-one. But other callers are subtly
broken by this bug.
Fix this by checking for `err > 0` instead. This has the consequence
that `reftable_stack_init_addition()` won't ever return a positive error
code anymore, but will instead return `REFTABLE_LOCK_ERROR` now. Thus,
we can drop the check for a positive error code in `stack_try_add()`
now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to iterate over reflogs in the reftable has been optimized
to reduce memory allocation and deallocation.
Reviewed-by: Josh Steadmon <steadmon@google.com>
cf. <Ze9eX-aaWoVaqsPP@google.com>
* ps/reftable-reflog-iteration-perf:
refs/reftable: track last log record name via strbuf
reftable/record: use scratch buffer when decoding records
reftable/record: reuse message when decoding log records
reftable/record: reuse refnames when decoding log records
reftable/record: avoid copying author info
reftable/record: convert old and new object IDs to arrays
refs/reftable: reload correct stack when creating reflog iter
|
|
The reftable code has its own custom binary search function whose
comparison callback has an unusual interface, which caused the
binary search to degenerate into a linear search, which has been
corrected.
* ps/reftable-block-search-fix:
reftable/block: fix binary search over restart counter
reftable/record: fix memory leak when decoding object records
|
|
The code in reftable backend that creates new table files works
better with the tempfile framework to avoid leaving cruft after a
failure.
* ps/reftable-stack-tempfile:
reftable/stack: register compacted tables as tempfiles
reftable/stack: register lockfiles during compaction
reftable/stack: register new tables as tempfiles
lockfile: report when rollback fails
|
|
It was reported that the reftable unit tests in t0032 fail with the
following assertion when running on top of NFS:
running test_reftable_stack_compaction_concurrent_clean
reftable/stack_test.c: 1063: failed assertion count_dir_entries(dir) == 2
Aborted
Setting a breakpoint immediately before the assertion in fact shows the
following list of files:
./stack_test-1027.QJBpnd
./stack_test-1027.QJBpnd/0x000000000001-0x000000000003-dad7ac80.ref
./stack_test-1027.QJBpnd/.nfs000000000001729f00001e11
./stack_test-1027.QJBpnd/tables.list
Note the weird ".nfs*" file? This file is maintained by NFS clients in
order to emulate delete-after-last-close semantics that we rely on in
the reftable code [1]. Instead of unlinking the file right away and
keeping it open in the client, the NFS client will rename it to ".nfs*"
and then delete that temporary file when the last reference to it gets
dropped. Quoting the NFS FAQ:
> D2. What is a "silly rename"? Why do these .nfsXXXXX files keep
> showing up?
>
> A. Unix applications often open a scratch file and then unlink it.
> They do this so that the file is not visible in the file system name
> space to any other applications, and so that the system will
> automatically clean up (delete) the file when the application exits.
> This is known as "delete on last close", and is a tradition among
> Unix applications.
>
> Because of the design of the NFS protocol, there is no way for a
> file to be deleted from the name space but still remain in use by an
> application. Thus NFS clients have to emulate this using what
> already exists in the protocol. If an open file is unlinked, an NFS
> client renames it to a special name that looks like ".nfsXXXXX".
> This "hides" the file while it remains in use. This is known as a
> "silly rename." Note that NFS servers have nothing to do with this
> behavior.
This of course throws off the assertion that we got exactly two files in
that directory.
The test in question triggers this behaviour by holding two open file
descriptors to the "tables.list" file. One of the references is because
we are about to append to the stack, whereas the other reference is
because we want to compact it. As the compaction has just finished we
already rewrote "tables.list" to point to the new contents, but the
other file descriptor pointing to the old version is still open. Thus we
trigger the delete-after-last-close emulation.
Furthermore, it was reported that this behaviour only triggers with
4f36b8597c (reftable/stack: fix race in up-to-date check, 2024-01-18).
This is expected as well because it is the first point in time where we
actually keep the "tables.list" file descriptor open for the stat cache.
Fix this bug by skipping over any files that start with a leading dot
when counting files. While we could explicitly check for a prefix of
".nfs", other network file systems like SMB for example do the same
trickery but with a ".smb" prefix. In any case though, this loosening of
the assertion should be fine given that the reftable library would never
write files with leading dots by itself.
[1]: https://nfs.sourceforge.net/#faq_d2
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
* ps/reftable-stack-tempfile:
reftable/stack: register compacted tables as tempfiles
reftable/stack: register lockfiles during compaction
reftable/stack: register new tables as tempfiles
lockfile: report when rollback fails
|
|
Records store their keys prefix-compressed. As many records will share a
common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
of disk space. The downside of this is that it is not possible to just
seek into the middle of a block and consume the corresponding record
because it may depend on prefixes read from preceding records.
To help with this usecase, the reftable format writes every n'th record
without using prefix compression, which is called a "restart". The list
of restarts is stored at the end of each block so that a reader can
figure out entry points at which to read a full record without having to
read all preceding records.
This allows us to do a binary search over the records in a block when
searching for a particular key by iterating through the restarts until
we have found the section in which our record must be located. From
thereon we perform a linear search to locate the desired record.
This mechanism is broken though. In `block_reader_seek()` we call
`binsearch()` over the count of restarts in the current block. The
function we pass to compare records with each other computes the key at
the current index and then compares it to our search key by calling
`strbuf_cmp()`, returning its result directly. But `binsearch()` expects
us to return a truish value that indicates whether the current index is
smaller than the searched-for key. And unless our key exactly matches
the value at the restart counter we always end up returning a truish
value.
The consequence is that `binsearch()` essentially always returns 0,
indicacting to us that we must start searching right at the beginning of
the block. This works by chance because we now always do a linear scan
from the start of the block, and thus we would still end up finding the
desired record. But needless to say, this makes the optimization quite
useless.
Fix this bug by returning whether the current key is smaller than the
searched key. As the current behaviour was correct it is not possible to
write a test. Furthermore it is also not really possible to demonstrate
in a benchmark that this fix speeds up seeking records.
This may cause the reader to question whether this binary search makes
sense in the first place if it doesn't even help with performance. But
it would end up helping if we were to read a reftable with a much larger
block size. Blocks can be up to 16MB in size, in which case it will
become much more important to avoid the linear scan. We are not yet
ready to read or write such larger blocks though, so we have to live
without a benchmark demonstrating this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When decoding records it is customary to reuse a `struct
reftable_ref_record` across calls. Thus, it may happen that the record
already holds some allocated memory. When decoding ref and log records
we handle this by releasing or reallocating held memory. But we fail to
do this for object records, which causes us to leak memory.
Fix this memory leak by releasing object records before we decode into
them. We may eventually want to reuse memory instead to avoid needless
reallocations. But for now, let's just plug the leak and be done.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We do not register tables resulting from stack compaction with the
tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register compacted tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We do not register any of the locks we acquire when compacting the
reftable stack via our lockfiles interfaces. These locks will thus not
be released when Git gets killed.
Refactor the code to register locks as lockfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We do not register new tables which we're about to add to the stack with
the tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When decoding log records we need a temporary buffer to decode the
reflog entry's name, mail address and message. As this buffer is local
to the function we thus have to reallocate it for every single log
record which we're about to decode, which is inefficient.
Refactor the code such that callers need to pass in a scratch buffer,
which allows us to reuse it for multiple decodes. This reduces the
number of allocations when iterating through reflogs. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated
Note that this commit also drop some redundant calls to `strbuf_reset()`
right before calling `decode_string()`. The latter already knows to
reset the buffer, so there is no need for these.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Same as the preceding commit we can allocate log messages as needed when
decoding log records, thus further reducing the number of allocations.
Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When decoding a log record we always reallocate their refname arrays.
This results in quite a lot of needless allocation churn.
Refactor the code to grow the array as required only. Like this, we
should usually only end up reallocating the array a small handful of
times when iterating over many refs. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each reflog entry contains information regarding the authorship of who
has made the change. This authorship information is not the same as that
of any of the commits that the reflog entry references, but instead
corresponds to the local user that has executed the command. Thus, it is
almost always the case that all reflog entries have the same author.
We can make use of this fact when decoding reftable records: instead of
freeing and then reallocating the authorship information of log records,
we can special-case when the next record during an iteration has the
exact same authorship as the preceding record. If so, then there is no
need to reallocate the respective fields.
This change results in two allocations less per log record that we're
iterating over in the most common case. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated
An alternative would be to store the capacity of both name and email and
then use `REFTABLE_ALLOC_GROW()` to conditionally reallocate the array.
But reftable records are copied around quite a lot, and thus we need to
be a bit mindful of the overall record size. Furthermore, a memory
comparison should also be more efficient than having to copy over memory
even if we wouldn't have to allocate a new array every time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 7af607c58d (reftable/record: store "val1" hashes as static arrays,
2024-01-03) and b31e3cc620 (reftable/record: store "val2" hashes as
static arrays, 2024-01-03) we have converted ref records to store their
object IDs in a static array. Convert log records to do the same so that
their old and new object IDs are arrays, too.
This change results in two allocations less per log record that we're
iterating over. Before:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 13,473 bytes in 122 blocks
total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We have a few functions which are basically just accessors to
structures. As those functions are executed inside the hot loop when
iterating through many refs, the fact that they cannot be inlined is
costing us some performance.
Move the function definitions into their respective headers so that they
can be inlined. This results in a performance improvement when iterating
over 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 105.9 ms ± 3.6 ms [User: 103.0 ms, System: 2.8 ms]
Range (min … max): 103.1 ms … 133.4 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 100.7 ms ± 3.4 ms [User: 97.8 ms, System: 2.8 ms]
Range (min … max): 97.8 ms … 124.0 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When reading a record from a block, we need to decode the record's key.
As reftable keys are prefix-compressed, meaning they reuse a prefix from
the preceding record's key, this is a bit more involved than just having
to copy the relevant bytes: we need to figure out the prefix and suffix
lengths, copy the prefix from the preceding record and finally copy the
suffix from the current record.
This is done by passing three buffers to `reftable_decode_key()`: one
buffer that holds the result, one buffer that holds the last key, and
one buffer that points to the current record. The final key is then
assembled by calling `strbuf_add()` twice to copy over the prefix and
suffix.
Performing two memory copies is inefficient though. And we can indeed do
better by decoding keys in place. Instead of providing two buffers, the
caller may only call a single buffer that is already pre-populated with
the last key. Like this, we only have to call `strbuf_setlen()` to trim
the record to its prefix and then `strbuf_add()` to add the suffix.
This refactoring leads to a noticeable performance bump when iterating
over 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 112.2 ms ± 3.9 ms [User: 109.3 ms, System: 2.8 ms]
Range (min … max): 109.2 ms … 149.6 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 106.0 ms ± 3.5 ms [User: 103.2 ms, System: 2.7 ms]
Range (min … max): 103.2 ms … 133.7 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Do the same optimization as in the preceding commit, but this time for
`reftable_record_copy()`. While not as noticeable, it still results in a
small speedup when iterating over 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 114.0 ms ± 3.8 ms [User: 111.1 ms, System: 2.7 ms]
Range (min … max): 110.9 ms … 144.3 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 112.5 ms ± 3.7 ms [User: 109.5 ms, System: 2.8 ms]
Range (min … max): 109.2 ms … 140.7 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.01 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When decoding a reftable record we will first release the user-provided
record and then decode the new record into it. This is quite inefficient
as we basically need to reallocate at least the refname every time.
Refactor the function to start tracking the refname capacity. Like this,
we can stow away the refname, release, restore and then grow the refname
to the required number of bytes via `REFTABLE_ALLOC_GROW()`.
This refactoring is safe to do because all functions that assigning to
the refname will first call `reftable_ref_record_release()`, which will
zero out the complete record after releasing memory.
This change results in a nice speedup when iterating over 1 million
refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 124.0 ms ± 3.9 ms [User: 121.1 ms, System: 2.7 ms]
Range (min … max): 120.4 ms … 152.7 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 114.4 ms ± 3.7 ms [User: 111.5 ms, System: 2.7 ms]
Range (min … max): 111.0 ms … 152.1 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.08 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Furthermore, with this change we now perform a mostly constant number of
allocations when iterating. Before this change:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 1,006,620 allocs, 1,006,495 frees, 25,398,363 bytes allocated
After this change:
HEAP SUMMARY:
in use at exit: 13,603 bytes in 125 blocks
total heap usage: 6,623 allocs, 6,498 frees, 509,592 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When calling `merged_iter_next_void()` we first check whether the iter
has been exhausted already. We already perform this check two levels
down the stack in `merged_iter_next_entry()` though, which makes this
check redundant.
Now if this check was there to accelerate the common case it might have
made sense to keep it. But the iterator being exhausted is rather the
uncommon case because you can expect most reftable stacks to contain
more than two refs.
Simplify the code by removing the check. As `merged_iter_next_void()` is
basically empty except for calling `merged_iter_next()` now, merge these
two functions. This also results in a tiny speedup when iterating over
many refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 125.6 ms ± 3.8 ms [User: 122.7 ms, System: 2.8 ms]
Range (min … max): 122.4 ms … 153.4 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 124.0 ms ± 3.9 ms [User: 121.1 ms, System: 2.8 ms]
Range (min … max): 120.1 ms … 156.4 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The merged iterator uses a priority queue to order records so that we
can yielid them in the expected order. This priority queue of course
comes with some overhead as we need to add, compare and remove entries
in that priority queue.
In the general case, that overhead cannot really be avoided. But when we
have a single subiter left then there is no need to use the priority
queue anymore because the order is exactly the same as what that subiter
would return.
While having a single subiter may sound like an edge case, it happens
more frequently than one might think. In the most common scenario, you
can expect a repository to have a single large table that contains most
of the records and then a set of smaller tables which contain later
additions to the reftable stack. In this case it is quite likely that we
exhaust subiters of those smaller stacks before exhausting the large
table.
Special-case this and return records directly from the remaining
subiter. This results in a sizeable speedup when iterating over 1m refs
in a repository with a single table:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 135.4 ms ± 4.4 ms [User: 132.5 ms, System: 2.8 ms]
Range (min … max): 131.0 ms … 166.3 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 126.3 ms ± 3.9 ms [User: 123.3 ms, System: 2.8 ms]
Range (min … max): 122.7 ms … 157.0 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When advancing one of the subiters fails we immediately release
resources associated with that subiter. This is not necessary though as
we will release these resources when closing the merged iterator anyway.
Drop the logic and only release resources when the merged iterator is
done. This is a mere cleanup that should help reduce the cognitive load
when reading through the code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Whenever we advance a subiter we first call `iterator_is_null()`. This
is not needed though because we only ever advance subiters which have
entries in the priority queue, and we do not end entries to the priority
queue when the subiter has been exhausted.
Drop the check as well as the now-unused function. This results in a
surprisingly big speedup:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 138.1 ms ± 4.4 ms [User: 135.1 ms, System: 2.8 ms]
Range (min … max): 133.4 ms … 167.3 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 134.4 ms ± 4.2 ms [User: 131.5 ms, System: 2.8 ms]
Range (min … max): 130.0 ms … 164.0 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.03 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
For each subiterator, the merged table needs to track their current
record. This record is owned by the priority queue though instead of by
the merged iterator. This is not optimal performance-wise.
For one, we need to move around records whenever we add or remove a
record from the priority queue. Thus, the bigger the entries the more
bytes we need to copy around. And compared to pointers, a reftable
record is rather on the bigger side. The other issue is that this makes
it harder to reuse the records.
Refactor the code so that the merged iterator tracks ownership of the
records per-subiter. Instead of having records in the priority queue, we
can now use mere pointers to the per-subiter records. This also allows
us to swap records between the caller and the per-subiter record instead
of doing an actual copy via `reftable_record_copy_from()`, which removes
the need to release the caller-provided record.
This results in a noticeable speedup when iterating through many refs.
The following benchmark iterates through 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 145.5 ms ± 4.5 ms [User: 142.5 ms, System: 2.8 ms]
Range (min … max): 141.3 ms … 177.0 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 139.0 ms ± 4.7 ms [User: 136.1 ms, System: 2.8 ms]
Range (min … max): 134.2 ms … 182.2 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
This refactoring also allows a subsequent refactoring where we start
reusing memory allocated by the reftable records because we do not need
to release the caller-provided record anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When advancing the merged iterator, we pop the topmost entry from its
priority queue and then advance the sub-iterator that the entry belongs
to, adding the result as a new entry. This is quite sensible in the case
where the merged iterator is used to actually iterate through records.
But the merged iterator is also used when we look up a single record,
only, so advancing the sub-iterator is wasted effort because we would
never even look at the result.
Instead of immediately advancing the sub-iterator, we can also defer
this to the next iteration of the merged iterator by storing the
intent-to-advance. This results in a small speedup when reading many
records. The following benchmark creates 10000 refs, which will also end
up with many ref lookups:
Benchmark 1: update-ref: create many refs (revision = HEAD~)
Time (mean ± σ): 337.2 ms ± 7.3 ms [User: 200.1 ms, System: 136.9 ms]
Range (min … max): 329.3 ms … 373.2 ms 100 runs
Benchmark 2: update-ref: create many refs (revision = HEAD)
Time (mean ± σ): 332.5 ms ± 5.9 ms [User: 197.2 ms, System: 135.1 ms]
Range (min … max): 327.6 ms … 359.8 ms 100 runs
Summary
update-ref: create many refs (revision = HEAD) ran
1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~)
While this speedup alone isn't really worth it, this refactoring will
also allow two additional optimizations in subsequent patches. First, it
will allow us to special-case when there is only a single sub-iter left
to circumvent the priority queue altogether. And second, it makes it
easier to avoid copying records to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `merged_iter` structure is not used anywhere outside of "merged.c",
but is declared in its header. Move it into the code file so that it is
clear that its implementation details are never exposed to anything.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable priority queue is used by the merged iterator to yield
records from its sub-iterators in the expected order. Each entry has a
record corresponding to such a sub-iterator as well as an index that
indicates which sub-iterator the record belongs to. But while the
sub-iterators are tracked with a `size_t`, we store the index as an
`int` in the entry.
Fix this and use `size_t` consistently.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to iterate over refs with the reftable backend has seen
some optimization.
* ps/reftable-iteration-perf:
reftable/reader: add comments to `table_iter_next()`
reftable/record: don't try to reallocate ref record name
reftable/block: swap buffers instead of copying
reftable/pq: allocation-less comparison of entry keys
reftable/merged: skip comparison for records of the same subiter
reftable/merged: allocation-less dropping of shadowed records
reftable/record: introduce function to compare records by key
|
|
Rewrite //-comments to /* comments */ in files whose comments
prevalently use the latter.
* jc/comment-style-fixes:
reftable/pq_test: comment style fix
merge-ort.c: comment style fix
builtin/worktree: comment style fixes
|
|
Code clean-up in various reftable code paths.
* ps/reftable-styles:
reftable/record: improve semantics when initializing records
reftable/merged: refactor initialization of iterators
reftable/merged: refactor seeking of records
reftable/stack: use `size_t` to track stack length
reftable/stack: use `size_t` to track stack slices during compaction
reftable/stack: index segments with `size_t`
reftable/stack: fix parameter validation when compacting range
reftable: introduce macros to allocate arrays
reftable: introduce macros to grow arrays
|
|
Write multi-level indices for reftable has been corrected.
* ps/reftable-multi-level-indices-fix:
reftable: document reading and writing indices
reftable/writer: fix writing multi-level indices
reftable/writer: simplify writing index records
reftable/writer: use correct type to iterate through index entries
reftable/reader: be more careful about errors in indexed seeks
|
|
While working on the optimizations in the preceding patches I stumbled
upon `table_iter_next()` multiple times. It is quite easy to miss the
fact that we don't call `table_iter_next_in_block()` twice, but that the
second call is in fact `table_iter_next_block()`.
Add comments to explain what exactly is going on here to make things
more obvious. While at it, touch up the code to conform to our code
style better.
Note that one of the refactorings merges two conditional blocks into
one. Before, we had the following code:
```
err = table_iter_next_block(&next, ti);
if (err != 0) {
ti->is_finished = 1;
}
table_iter_block_done(ti);
if (err != 0) {
return err;
}
```
As `table_iter_block_done()` does not care about `is_finished`, the
conditional blocks can be merged into one block:
```
err = table_iter_next_block(&next, ti);
table_iter_block_done(ti);
if (err != 0) {
ti->is_finished = 1;
return err;
}
```
This is both easier to reason about and more performant because we have
one branch less.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When decoding reftable ref records we first release the pointer to the
record passed to us and then use realloc(3P) to allocate the refname
array. This is a bit misleading though as we know at that point that the
refname will always be `NULL`, so we would always end up allocating a
new char array anyway.
Refactor the code to use `REFTABLE_ALLOC_ARRAY()` instead. As the
following benchmark demonstrates this is a tiny bit more efficient. But
the bigger selling point really is the gained clarity.
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 150.1 ms ± 4.1 ms [User: 146.6 ms, System: 3.3 ms]
Range (min … max): 144.5 ms … 180.5 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 148.9 ms ± 4.5 ms [User: 145.2 ms, System: 3.4 ms]
Range (min … max): 143.0 ms … 185.4 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Ideally, we should try and reuse the memory of the old record instead of
first freeing and then immediately reallocating it. This requires some
more surgery though and is thus left for a future iteration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When iterating towards the next record in a reftable block we need to
keep track of the key that the last record had. This is required because
reftable records use prefix compression, where subsequent records may
reuse parts of their preceding record's key.
This key is stored in the `block_iter::last_key`, which we update after
every call to `block_iter_next()`: we simply reset the buffer and then
add the current key to it.
This is a bit inefficient though because it requires us to copy over the
key on every iteration, which adds up when iterating over many records.
Instead, we can make use of the fact that the `block_iter::key` buffer
is basically only a scratch buffer. So instead of copying over contents,
we can just swap both buffers.
The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1):
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 155.7 ms ± 5.0 ms [User: 152.1 ms, System: 3.4 ms]
Range (min … max): 150.8 ms … 185.7 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 150.8 ms ± 4.2 ms [User: 147.1 ms, System: 3.5 ms]
Range (min … max): 145.1 ms … 180.7 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.03 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The priority queue is used by the merged iterator to iterate over
reftable records from multiple tables in the correct order. The queue
ends up having one record for each table that is being iterated over,
with the record that is supposed to be shown next at the top. For
example, the key of a ref record is equal to its name so that we end up
sorting the priority queue lexicographically by ref name.
To figure out the order we need to compare the reftable record keys with
each other. This comparison is done by formatting them into a `struct
strbuf` and then doing `strbuf_strcmp()` on the result. We then discard
the buffers immediately after the comparison.
This ends up being very expensive. Because the priority queue usually
contains as many records as we have tables, we call the comparison
function `O(log($tablecount))` many times for every record we insert.
Furthermore, when iterating over many refs, we will insert at least one
record for every ref we are iterating over. So ultimately, this ends up
being called `O($refcount * log($tablecount))` many times.
Refactor the code to use the new `refatble_record_cmp()` function that
has been implemented in a preceding commit. This function does not need
to allocate memory and is thus significantly more efficient.
The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1), where the reftable stack
consists of three tables:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 224.4 ms ± 6.5 ms [User: 220.6 ms, System: 3.6 ms]
Range (min … max): 216.5 ms … 261.1 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 172.9 ms ± 4.4 ms [User: 169.2 ms, System: 3.6 ms]
Range (min … max): 166.5 ms … 204.6 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When retrieving the next entry of a merged iterator we need to drop all
records of other sub-iterators that would be shadowed by the record that
we are about to return. We do this by comparing record keys, dropping
all keys that are smaller or equal to the key of the record we are about
to return.
There is an edge case here where we can skip that comparison: when the
record in the priority queue comes from the same subiterator as the
record we are about to return then we know that its key must be larger
than the key of the record we are about to return. This property is
guaranteed by the sub-iterators, and if it didn't hold then the whole
merged iterator would return records in the wrong order, too.
While this may seem like a very specific edge case it's in fact quite
likely to happen. For most repositories out there you can assume that we
will end up with one large table and several smaller ones on top of it.
Thus, it is very likely that the next entry will sort towards the top of
the priority queue.
Special case this and break out of the loop in that case. The following
benchmark uses git-show-ref(1) to print a single ref matching a pattern
out of 1 million refs:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 162.6 ms ± 4.5 ms [User: 159.0 ms, System: 3.5 ms]
Range (min … max): 156.6 ms … 188.5 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 156.8 ms ± 4.7 ms [User: 153.0 ms, System: 3.6 ms]
Range (min … max): 151.4 ms … 188.4 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The purpose of the merged reftable iterator is to iterate through all
entries of a set of tables in the correct order. This is implemented by
using a sub-iterator for each table, where the next entry of each of
these iterators gets put into a priority queue. For each iteration, we
do roughly the following steps:
1. Retrieve the top record of the priority queue. This is the entry we
want to return to the caller.
2. Retrieve the next record of the sub-iterator that this record came
from. If any, add it to the priority queue at the correct position.
The position is determined by comparing the record keys, which e.g.
corresponds to the refname for ref records.
3. Keep removing the top record of the priority queue until we hit the
first entry whose key is larger than the returned record's key.
This is required to drop "shadowed" records.
The last step will lead to at least one comparison to the next entry,
but may lead to many comparisons in case the reftable stack consists of
many tables with shadowed records. It is thus part of the hot code path
when iterating through records.
The code to compare the entries with each other is quite inefficient
though. Instead of comparing record keys with each other directly, we
first format them into `struct strbuf`s and only then compare them with
each other. While we already optimized this code path to reuse buffers
in 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), the cost to format the keys into the buffers still adds up
quite significantly.
Refactor the code to use `reftable_record_cmp()` instead, which has been
introduced in the preceding commit. This function compares records with
each other directly without requiring any memory allocations or copying
and is thus way more efficient.
The following benchmark uses git-show-ref(1) to print a single ref
matching a pattern out of 1 million refs. This is the most direct way to
exercise ref iteration speed as we remove all overhead of having to show
the refs, too.
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 180.7 ms ± 4.7 ms [User: 177.1 ms, System: 3.4 ms]
Range (min … max): 174.9 ms … 211.7 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 162.1 ms ± 4.4 ms [User: 158.5 ms, System: 3.4 ms]
Range (min … max): 155.4 ms … 189.3 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In some places we need to sort reftable records by their keys to
determine their ordering. This is done by first formatting the keys into
a `struct strbuf` and then using `strbuf_cmp()` to compare them. This
logic is needlessly roundabout and can end up costing quite a bit of CPU
cycles, both due to the allocation and formatting logic.
Introduce a new `reftable_record_cmp()` function that knows how to
compare two records with each other without requiring allocations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files
|
|
Rewrite //-comments to /* comments */ in files whose comments
prevalently use the latter.
* jc/comment-style-fixes:
reftable/pq_test: comment style fix
merge-ort.c: comment style fix
builtin/worktree: comment style fixes
|
|
Reftable bugfix.
* ps/reftable-compacted-tables-permission-fix:
reftable/stack: adjust permissions of compacted tables
|
|
The write codepath for the reftable data learned to honor
core.fsync configuration.
* jc/reftable-core-fsync:
reftable/stack: fsync "tables.list" during compaction
reftable: honor core.fsync
|
|
According to our usual coding style, the `reftable_new_record()`
function would indicate that it is allocating a new record. This is not
the case though as the function merely initializes records without
allocating any memory.
Replace `reftable_new_record()` with a new `reftable_record_init()`
function that takes a record pointer as input and initializes it
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor the initialization of the merged iterator to fit our code style
better. This refactoring prepares the code for a refactoring of how
records are being initialized.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The code to seek reftable records in the merged table code is quite hard
to read and does not conform to our coding style in multiple ways:
- We have multiple exit paths where we release resources even though
that is not really necessary.
- We use a scoped error variable `e` which is hard to reason about.
This variable is not required at all.
- We allocate memory in the variable declarations, which is easy to
miss.
Refactor the function so that it becomes more maintainable in the
future.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While the stack length is already stored as `size_t`, we frequently use
`int`s to refer to those stacks throughout the reftable library. Convert
those cases to use `size_t` instead to make things consistent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We use `int`s to track reftable slices when compacting the reftable
stack, which is considered to be a code smell in the Git project.
Convert the code to use `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We use `int`s to index into arrays of segments and track the length of
them, which is considered to be a code smell in the Git project. Convert
the code to use `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `stack_compact_range()` function receives a "first" and "last" index
that indicates which tables of the reftable stack should be compacted.
Naturally, "first" must be smaller than "last" in order to identify a
proper range of tables to compress, which we indeed also assert in the
function. But the validations happens after we have already allocated
arrays with a size of `last - first + 1`, leading to an underflow and
thus an invalid allocation size.
Fix this by reordering the array allocations to happen after we have
validated parameters. While at it, convert the array allocations to use
the newly introduced macros.
Note that the relevant variables pointing into arrays should also be
converted to use `size_t` instead of `int`. This is left for a later
commit in this series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar to the preceding commit, let's carry over macros to allocate
arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This
requires us to change the signature of `reftable_calloc()`, which only
takes a single argument right now and thus puts the burden on the caller
to calculate the final array's size. This is a net improvement though as
it means that we can now provide proper overflow checks when multiplying
the array size with the member size.
Convert callsites of `reftable_calloc()` to the new signature and start
using the new macros where possible.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Throughout the reftable library we have many cases where we need to grow
arrays. In order to avoid too many reallocations, we roughly double the
capacity of the array on each iteration. The resulting code pattern is
duplicated across many sites.
We have similar patterns in our main codebase, which is why we have
eventually introduced an `ALLOC_GROW()` macro to abstract it away and
avoid some code duplication. We cannot easily reuse this macro here
though because `ALLOC_GROW()` uses `REALLOC_ARRAY()`, which in turn will
call realloc(3P) to grow the array. The reftable code is structured as a
library though (even if the boundaries are fuzzy), and one property this
brings with it is that it is possible to plug in your own allocators. So
instead of using realloc(3P), we need to use `reftable_realloc()` that
knows to use the user-provided implementation.
So let's introduce two new macros `REFTABLE_REALLOC_ARRAY()` and
`REFTABLE_ALLOC_GROW()` that mirror what we do in our main codebase,
with two modifications:
- They use `reftable_realloc()`, as explained above.
- They use a different growth factor of `2 * cap + 1` instead of `(cap
+ 16) * 3 / 2`.
The second change is because we know a bit more about the allocation
patterns in the reftable library. In most cases, we end up only having a
handful of items in the array and don't end up growing them. The initial
capacity that our normal growth factor uses (which is 24) would thus end
up over-allocating in a lot of code paths. This effect is measurable:
- Before change:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,402 bytes allocated
- After change with a growth factor of `(2 * alloc + 1)`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,410 bytes allocated
- After change with a growth factor of `(alloc + 16)* 2 / 3`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,833,673 allocs, 3,833,521 frees, 4,728,251,742 bytes allocated
While the total heap usage is roughly the same, we do end up allocating
significantly more bytes with our usual growth factor (in fact, roughly
21 times as many).
Convert the reftable library to use these new macros.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The way the index gets written and read is not trivial at all and
requires the reader to piece together a bunch of parts to figure out how
it works. Add some documentation to hopefully make this easier to
understand for the next reader.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When finishing a section we will potentially write an index that makes
it more efficient to look up relevant blocks. The index records written
will encode, for each block of the indexed section, what the offset of
that block is as well as the last key of that block. Thus, the reader
would iterate through the index records to find the first key larger or
equal to the wanted key and then use the encoded offset to look up the
desired block.
When there are a lot of blocks to index though we may end up writing
multiple index blocks, too. To not require a linear search across all
index blocks we instead end up writing a multi-level index. Instead of
referring to the block we are after, an index record may point to
another index block. The reader will then access the highest-level index
and follow down the chain of index blocks until it hits the sought-after
block.
It has been observed though that it is impossible to seek ref records of
the last ref block when using a multi-level index. While the multi-level
index exists and looks fine for most of the part, the highest-level
index was missing an index record pointing to the last block of the next
index. Thus, every additional level made more refs become unseekable at
the end of the ref section.
The root cause is that we are not flushing the last block of the current
level once done writing the level. Consequently, it wasn't recorded in
the blocks that need to be indexed by the next-higher level and thus we
forgot about it.
Fix this bug by flushing blocks after we have written all index records.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When finishing the current section some index records might be written
for the section to the table. The logic that adds these records to the
writer duplicates what we already have in `writer_add_record()`, making
this more complicated than it really has to be.
Simplify the code by using `writer_add_record()` instead. While at it,
drop the unneeded braces around a loop to make the code conform to our
code style better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable writer is tracking the number of blocks it has to index via
the `index_len` variable. But while this variable is of type `size_t`,
some sites use an `int` to loop through the index entries.
Convert the code to consistently use `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When doing an indexed seek we first need to do a linear seek in order to
find the index block for our wanted key. We do not check the returned
error of the linear seek though. This is likely not an issue because the
next call to `table_iter_next()` would return error, too. But it very
much is a code smell when an error variable is being assigned to without
actually checking it.
Safeguard the code by checking for errors.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 1df18a1c9a (reftable: honor core.fsync, 2024-01-23), we have added
code to fsync both newly written reftables as well as "tables.list" to
disk. But there are two code paths where "tables.list" is being written:
- When appending a new table due to a normal ref update.
- When compacting a range of tables during compaction.
We have only addressed the former code path, but do not yet sync the new
"tables.list" file in the latter. Fix this omission.
Note that we are not yet adding any tests. These tests will be added
once the "reftable" backend has been upstreamed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Low-level I/O optimization for reftable.
* ps/reftable-optimize-io:
reftable/stack: fix race in up-to-date check
reftable/stack: unconditionally reload stack after commit
reftable/blocksource: use mmap to read tables
reftable/blocksource: refactor code to match our coding style
reftable/stack: use stat info to avoid re-reading stack list
reftable/stack: refactor reloading to use file descriptor
reftable/stack: refactor stack reloading to have common exit path
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When creating a new compacted table from a range of preexisting ones we
don't set the default permissions on the resulting table when specified
by the user. This has the effect that the "core.sharedRepository" config
will not be honored correctly.
Fix this bug and add a test to catch this issue. Note that we only test
on non-Windows platforms because Windows does not use POSIX permissions
natively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While the reffiles backend honors configured fsync settings, the
reftable backend does not. Address this by fsyncing reftable files using
the write-or-die api's fsync_component() in two places: when we
add additional entries into the table, and when we close the reftable
writer.
This commits adds a flush function pointer as a new member of
reftable_writer because we are not sure that the first argument to the
*write function pointer always contains a file descriptor. In the case of
strbuf_add_void, the first argument is a buffer. This way, we can pass
in a corresponding flush function that knows how to flush depending on
which writer is being used.
This patch does not contain tests as they will need to wait for another
patch to start to exercise the reftable backend. At that point, the
tests will be added to observe that fsyncs are happening when the
reftable is in use.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 6fdfaf15a0 (reftable/stack: use stat info to avoid re-reading stack
list, 2024-01-11) we have introduced a new mechanism to avoid re-reading
the table list in case stat(3P) figures out that the stack didn't change
since the last time we read it.
While this change significantly improved performance when writing many
refs, it can unfortunately lead to false negatives in very specific
scenarios. Given two processes A and B, there is a feasible sequence of
events that cause us to accidentally treat the table list as up-to-date
even though it changed:
1. A reads the reftable stack and caches its stat info.
2. B updates the stack, appending a new table to "tables.list". This
will both use a new inode and result in a different file size, thus
invalidating A's cache in theory.
3. B decides to auto-compact the stack and merges two tables. The file
size now matches what A has cached again. Furthermore, the
filesystem may decide to recycle the inode number of the file we
have replaced in (2) because it is not in use anymore.
4. A reloads the reftable stack. Neither the inode number nor the
file size changed. If the timestamps did not change either then we
think the cached copy of our stack is up-to-date.
In fact, the commit introduced three related issues:
- Non-POSIX compliant systems may not report proper `st_dev` and
`st_ino` values in stat(3P), which made us rely solely on the
file's potentially coarse-grained mtime and ctime.
- `stat_validity_check()` and friends may end up not comparing
`st_dev` and `st_ino` depending on the "core.checkstat" config,
again reducing the signal to the mtime and ctime.
- `st_ino` can be recycled, rendering the check moot even on
POSIX-compliant systems.
Given that POSIX defines that "The st_ino and st_dev fields taken
together uniquely identify the file within the system", these issues led
to the most important signal to establish file identity to be ignored or
become useless in some cases.
Refactor the code to stop using `stat_validity_check()`. Instead, we
manually stat(3P) the file descriptors to make relevant information
available. On Windows and MSYS2 the result will have both `st_dev` and
`st_ino` set to 0, which allows us to address the first issue by not
using the stat-based cache in that case. It also allows us to make sure
that we always compare `st_dev` and `st_ino`, addressing the second
issue.
The third issue of inode recycling can be addressed by keeping the file
descriptor of "files.list" open during the lifetime of the reftable
stack. As the file will still exist on disk even though it has been
unlinked it is impossible for its inode to be recycled as long as the
file descriptor is still open.
This should address the race in a POSIX-compliant way. The only real
downside is that this mechanism cannot be used on non-POSIX-compliant
systems like Windows. But we at least have the second-level caching
mechanism in place that compares contents of "files.list" with the
currently loaded list of tables.
This new mechanism performs roughly the same as the previous one that
relied on `stat_validity_check()`:
Benchmark 1: update-ref: create many refs (HEAD~)
Time (mean ± σ): 4.754 s ± 0.026 s [User: 2.204 s, System: 2.549 s]
Range (min … max): 4.694 s … 4.802 s 20 runs
Benchmark 2: update-ref: create many refs (HEAD)
Time (mean ± σ): 4.721 s ± 0.020 s [User: 2.194 s, System: 2.527 s]
Range (min … max): 4.691 s … 4.753 s 20 runs
Summary
update-ref: create many refs (HEAD~) ran
1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
After we have committed an addition to the reftable stack we call
`reftable_stack_reload()` to reload the stack and thus reflect the
changes that were just added. This function will only conditionally
reload the stack in case `stack_uptodate()` tells us that the stack
needs reloading. This check is wasteful though because we already know
that the stack needs reloading.
Call `reftable_stack_reload_maybe_reuse()` instead, which will
unconditionally reload the stack. This is merely a conceptual fix, the
code in question was not found to cause any problems in practice.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
More fixes and optimizations to the reftable backend.
* ps/reftable-fixes-and-optims:
reftable/merged: transfer ownership of records when iterating
reftable/merged: really reuse buffers to compute record keys
reftable/record: store "val2" hashes as static arrays
reftable/record: store "val1" hashes as static arrays
reftable/record: constify some parts of the interface
reftable/writer: fix index corruption when writing multiple indices
reftable/stack: do not auto-compact twice in `reftable_stack_add()`
reftable/stack: do not overwrite errors when compacting
|
|
The blocksource interface provides an interface to read blocks from a
reftable table. This interface is implemented using read(3P) calls on
the underlying file descriptor. While this works alright, this pattern
is very inefficient when repeatedly querying the reftable stack for one
or more refs. This inefficiency can mostly be attributed to the fact
that we often need to re-read the same blocks over and over again, and
every single time we need to call read(3P) again.
A natural fit in this context is to use mmap(3P) instead of read(3P),
which has a bunch of benefits:
- We do not need to come up with a caching strategy for some of the
blocks as this will be handled by the kernel already.
- We can avoid the overhead of having to call into the read(3P)
syscall repeatedly.
- We do not need to allocate returned blocks repeatedly, but can
instead hand out pointers into the mmapped region directly.
Using mmap comes with a significant drawback on Windows though, because
mmapped files cannot be deleted and neither is it possible to rename
files onto an mmapped file. But for one, the reftable library gracefully
handles the case where auto-compaction cannot delete a still-open stack
already and ignores any such errors. Also, `reftable_stack_clean()` will
prune stale tables which are not referenced by "tables.list" anymore so
that those files can eventually be pruned. And second, we never rewrite
already-written stacks, so it does not matter that we cannot rename a
file over an mmaped file, either.
Another unfortunate property of mmap is that it is not supported by all
systems. But given that the size of reftables should typically be rather
limited (megabytes at most in the vast majority of repositories), we can
use the fallback implementation provided by `git_mmap()` which reads the
whole file into memory instead. This is the same strategy that the
"packed" backend uses.
While this change doesn't significantly improve performance in the case
where we're seeking through stacks once (like e.g. git-for-each-ref(1)
would). But it does speed up usecases where there is lots of random
access to refs, e.g. when writing. The following benchmark demonstrates
these savings with git-update-ref(1) creating N refs in an otherwise
empty repository:
Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.5 ms, System: 2.5 ms]
Range (min … max): 4.8 ms … 7.1 ms 111 runs
Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
Time (mean ± σ): 14.8 ms ± 0.5 ms [User: 7.1 ms, System: 7.5 ms]
Range (min … max): 14.1 ms … 18.7 ms 84 runs
Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
Time (mean ± σ): 926.4 ms ± 5.6 ms [User: 448.5 ms, System: 477.7 ms]
Range (min … max): 920.0 ms … 936.1 ms 10 runs
Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
Time (mean ± σ): 5.0 ms ± 0.2 ms [User: 2.4 ms, System: 2.5 ms]
Range (min … max): 4.7 ms … 5.4 ms 111 runs
Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
Time (mean ± σ): 10.5 ms ± 0.2 ms [User: 5.0 ms, System: 5.3 ms]
Range (min … max): 10.0 ms … 10.9 ms 93 runs
Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
Time (mean ± σ): 529.6 ms ± 9.1 ms [User: 268.0 ms, System: 261.4 ms]
Range (min … max): 522.4 ms … 547.1 ms 10 runs
Summary
update-ref: create many refs (refcount = 1, revision = HEAD) ran
1.01 ± 0.06 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
2.08 ± 0.07 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
2.95 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
105.33 ± 3.76 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
184.24 ± 5.89 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
Theoretically, we could also replicate the strategy of the "packed"
backend where small tables are read into memory instead of using mmap.
Benchmarks did not confirm that this has a performance benefit though.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Refactor `reftable_block_source_from_file()` to match our coding style
better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Whenever we call into the refs interfaces we potentially have to reload
refs in case they have been concurrently modified, either in-process or
externally. While this happens somewhat automatically for loose refs
because we simply try to re-read the files, the "packed" backend will
reload its snapshot of the packed-refs file in case its stat info has
changed since last reading it.
In the reftable backend we have a similar mechanism that is provided by
`reftable_stack_reload()`. This function will read the list of stacks
from "tables.list" and, if they have changed from the currently stored
list, reload the stacks. This is heavily inefficient though, as we have
to check whether the stack is up-to-date on basically every read and
thus keep on re-reading the file all the time even if it didn't change
at all.
We can do better and use the same stat(3P)-based mechanism that the
"packed" backend uses. Instead of reading the file, we will only open
the file descriptor, fstat(3P) it, and then compare the info against the
cached value from the last time we have updated the stack. This should
always work alright because "tables.list" is updated atomically via a
rename, so even if the ctime or mtime wasn't granular enough to identify
a change, at least the inode number or file size should have changed.
This change significantly speeds up operations where many refs are read,
like when using git-update-ref(1). The following benchmark creates N
refs in an otherwise-empty repository via `git update-ref --stdin`:
Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.4 ms, System: 2.6 ms]
Range (min … max): 4.8 ms … 7.2 ms 109 runs
Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
Time (mean ± σ): 19.1 ms ± 0.9 ms [User: 8.9 ms, System: 9.9 ms]
Range (min … max): 18.4 ms … 26.7 ms 72 runs
Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
Time (mean ± σ): 1.336 s ± 0.018 s [User: 0.590 s, System: 0.724 s]
Range (min … max): 1.314 s … 1.373 s 10 runs
Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.4 ms, System: 2.6 ms]
Range (min … max): 4.8 ms … 7.2 ms 109 runs
Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
Time (mean ± σ): 14.8 ms ± 0.2 ms [User: 7.1 ms, System: 7.5 ms]
Range (min … max): 14.2 ms … 15.2 ms 82 runs
Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
Time (mean ± σ): 927.6 ms ± 5.3 ms [User: 437.8 ms, System: 489.5 ms]
Range (min … max): 919.4 ms … 936.4 ms 10 runs
Summary
update-ref: create many refs (refcount = 1, revision = HEAD) ran
1.00 ± 0.07 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
2.89 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
3.74 ± 0.25 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
181.26 ± 8.30 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
261.01 ± 12.35 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We're about to introduce a stat(3P)-based caching mechanism to reload
the list of stacks only when it has changed. In order to avoid race
conditions this requires us to have a file descriptor available that we
can use to call fstat(3P) on.
Prepare for this by converting the code to use `fd_read_lines()` so that
we have the file descriptor readily available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `reftable_stack_reload_maybe_reuse()` function is responsible for
reloading the reftable list from disk. The function is quite hard to
follow though because it has a bunch of different exit paths, many of
which have to free the same set of resources.
Refactor the function to have a common exit path. While at it, touch up
the style of this function a bit to match our usual coding style better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files
|
|
When iterating over records with the merged iterator we put the records
into a priority queue before yielding them to the caller. This means
that we need to allocate the contents of these records before we can
pass them over to the caller.
The handover to the caller is quite inefficient though because we first
deallocate the record passed in by the caller and then copy over the new
record, which requires us to reallocate memory.
Refactor the code to instead transfer ownership of the new record to the
caller. So instead of reallocating all contents, we now release the old
record and then copy contents of the new record into place.
The following benchmark of `git show-ref --quiet` in a repository with
around 350k refs shows a clear improvement. Before:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated
This shows that we now have roundabout a single allocation per record
that we're yielding from the iterator. Ideally, we'd also get rid of
this allocation so that the number of allocations doesn't scale with the
number of refs anymore. This would require some larger surgery though
because the memory is owned by the priority queue before transferring it
over to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), we have refactored the merged iterator to reuse a pair of
long-living strbufs by relying on the fact that `reftable_record_key()`
tries to reuse already allocated strbufs by calling `strbuf_reset()`,
which should give us significantly fewer reallocations compared to the
old code that used on-stack strbufs that are allocated for each and
every iteration. Unfortunately, we called `strbuf_release()` on these
long-living strbufs that we meant to reuse on each iteration, defeating
the optimization.
Fix this performance issue by not releasing those buffers on iteration
anymore, where we instead rely on `merged_iter_close()` to release the
buffers for us.
Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Similar to the preceding commit, convert ref records of type "val2" to
store their object IDs in static arrays instead of allocating them for
every single record.
We're using the same benchmark as in the preceding commit, with `git
show-ref --quiet` in a repository with ~350k refs. This time around
though the effects aren't this huge. Before:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,163 bytes in 193 blocks
total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated
This is because "val2"-type records are typically only stored for peeled
tags, and the number of annotated tags in the benchmark repository is
rather low. Still, it can be seen that this change leads to a reduction
of allocations overall, even if only a small one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When reading ref records of type "val1", we store its object ID in an
allocated array. This results in an additional allocation for every
single ref record we read, which is rather inefficient especially when
iterating over refs.
Refactor the code to instead use an embedded array of `GIT_MAX_RAWSZ`
bytes. While this means that `struct ref_record` is bigger now, we
typically do not store all refs in an array anyway and instead only
handle a limited number of records at the same point in time.
Using `git show-ref --quiet` in a repository with ~350k refs this leads
to a significant drop in allocations. Before:
HEAP SUMMARY:
in use at exit: 21,098 bytes in 192 blocks
total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated
After:
HEAP SUMMARY:
in use at exit: 21,098 bytes in 192 blocks
total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
We're about to convert reftable records to stop storing their object IDs
as allocated hashes. Prepare for this refactoring by constifying some
parts of the interface that will be impacted by this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each reftable may contain multiple types of blocks for refs, objects and
reflog records, where each of these may have an index that makes it more
efficient to find the records. It was observed that the index for log
records can become corrupted under certain circumstances, where the
first entry of the index points into the object index instead of to the
log records.
As it turns out, this corruption can occur whenever we write a log index
as well as at least one additional index. Writing records and their index
is basically a two-step process:
1. We write all blocks for the corresponding record. Each block that
gets written is added to a list of blocks to index.
2. Once all blocks were written we finish the section. If at least two
blocks have been added to the list of blocks to index then we will
now write the index for those blocks and flush it, as well.
When we have a very large number of blocks then we may decide to write a
multi-level index, which is why we also keep track of the list of the
index blocks in the same way as we previously kept track of the blocks
to index.
Now when we have finished writing all index blocks we clear the index
and flush the last block to disk. This is done in the wrong order though
because flushing the block to disk will re-add it to the list of blocks
to be indexed. The result is that the next section we are about to write
will have an entry in the list of blocks to index that points to the
last block of the preceding section's index, which will corrupt the log
index.
Fix this corruption by clearing the index after having written the last
block.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In 5c086453ff (reftable/stack: perform auto-compaction with
transactional interface, 2023-12-11), we fixed a bug where the
transactional interface to add changes to a reftable stack did not
perform auto-compaction by calling `reftable_stack_auto_compact()` in
`reftable_stack_addition_commit()`. While correct, this change may now
cause us to perform auto-compaction twice in the non-transactional
interface `reftable_stack_add()`:
- It performs auto-compaction by itself.
- It now transitively performs auto-compaction via the transactional
interface.
Remove the first instance so that we only end up doing auto-compaction
once.
Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In order to compact multiple stacks we iterate through the merged ref
and log records. When there is any error either when reading the records
from the old merged table or when writing the records to the new table
then we break out of the respective loops. When breaking out of the loop
for the ref records though the error code will be overwritten, which may
cause us to inadvertently skip over bad ref records. In the worst case,
this can lead to a compacted stack that is missing records.
Fix the code by using `goto done` instead so that any potential error
codes are properly returned to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When iterating over entries in the block iterator we compute the key of
each of the entries and write it into a buffer. We do not reuse the
buffer though and thus re-allocate it on every iteration, which is
wasteful.
Refactor the code to reuse the buffer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are a bunch of locations where we initialize members of `struct
block_iter`, which makes it harder than necessary to expand this struct
to have additional members. Unify the logic via a new `BLOCK_ITER_INIT`
macro that initializes all members.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When iterating over entries in the merged iterator's queue, we compute
the key of each of the entries and write it into a buffer. We do not
reuse the buffer though and thus re-allocate it on every iteration,
which is wasteful given that we never transfer ownership of the
allocated bytes outside of the loop.
Refactor the code to reuse the buffer. This also fixes a potential
memory leak when `merged_iter_advance_subiter()` returns an error.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When writing a new reftable stack, Git will first create the stack with
a random suffix so that concurrent updates will not try to write to the
same file. This random suffix is computed via a call to rand(3P). But we
never seed the function via srand(3P), which means that the suffix is in
fact always the same.
Fix this bug by using `git_rand()` instead, which does not need to be
initialized. While this function is likely going to be slower depending
on the platform, this slowness should not matter in practice as we only
use it when writing a new reftable stack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When starting a transaction via `reftable_stack_init_addition()`, we
create a lockfile for the reftable stack itself which we'll write the
new list of tables to. But if we terminate abnormally e.g. via a call to
`die()`, then we do not remove the lockfile. Subsequent executions of
Git which try to modify references will thus fail with an out-of-date
error.
Fix this bug by registering the lock as a `struct tempfile`, which
ensures automatic cleanup for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
In `reftable_stack_reload_once()` we iterate over all the tables added
to the stack in order to figure out whether any of the tables needs to
be reloaded. We use a set of buffers in this context to compute the
paths of these tables, but discard those buffers on every iteration.
This is quite wasteful given that we do not need to transfer ownership
of the allocated buffer outside of the loop.
Refactor the code to instead reuse the buffers to reduce the number of
allocations we need to do. Note that we do not have to manually reset
the buffer because `stack_filename()` does this for us already.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Whenever updating references or reflog entries in the reftable stack, we
need to add a new table to the stack, thus growing the stack's length by
one. The stack can grow to become quite long rather quickly, leading to
performance issues when trying to read records. But besides performance
issues, this can also lead to exhaustion of file descriptors very
rapidly as every single table requires a separate descriptor when
opening the stack.
While git-pack-refs(1) fixes this issue for us by merging the tables, it
runs too irregularly to keep the length of the stack within reasonable
limits. This is why the reftable stack has an auto-compaction mechanism:
`reftable_stack_add()` will call `reftable_stack_auto_compact()` after
its added the new table, which will auto-compact the stack as required.
But while this logic works alright for `reftable_stack_add()`, we do not
do the same in `reftable_addition_commit()`, which is the transactional
equivalent to the former function that allows us to write multiple
updates to the stack atomically. Consequentially, we will easily run
into file descriptor exhaustion in code paths that use many separate
transactions like e.g. non-atomic fetches.
Fix this issue by calling `reftable_stack_auto_compact()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
While we have several tests that check whether we correctly perform
auto-compaction when manually calling `reftable_stack_auto_compact()`,
we don't have any tests that verify whether `reftable_stack_add()` does
call it automatically. Add one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are calls to write(3P) where we don't properly handle interrupts.
Convert them to use `write_in_full()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
There are calls to pread(3P) and read(3P) where we don't properly handle
interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,
respectively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The `EXPECT` macros used by the reftable test framework are all using a
single `if` statement with the actual condition. This results in weird
syntax when using them in if/else statements like the following:
```
if (foo)
EXPECT(foo == 2)
else
EXPECT(bar == 2)
```
Note that there need not be a trailing semicolon. Furthermore, it is not
immediately obvious whether the else now belongs to the `if (foo)` or
whether it belongs to the expanded `if (foo == 2)` from the macro.
Fix this by wrapping the macros in a do/while loop.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The speed of the merged_iter_pqueue_add() can be improved by using a
pointer to the pq_entry struct, which is 96 bytes. Since the pq_entry
param is worked directly on the stack and does not currently have a
pointer to it, the merged_iter_pqueue_add() function is slightly
slower.
References to pq_entry in reftable have typically included pointers,
such as both of the params for pq_less().
Since we are working with pointers in the pq_entry param, as keenly
pointed out, the pq_entry param has also been made into a const since
the contents of the pq_entry param are copied and not manipulated.
Signed-off-by: Elijah Conners <business@elijahpepe.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reader code passes around a "struct reftable_reader" context
variable. But the seek function doesn't need it; the table iterator we
already get is sufficient.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.
* ep/maint-equals-null-cocci:
tree-wide: apply equals-null.cocci
tree-wide: apply equals-null.cocci
contrib/coccinnelle: add equals-null.cocci
|
|
Code clean-up.
* cm/reftable-0-length-memset:
reftable: avoid undefined behaviour breaking t0032
|
|
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
1214aa841bc (reftable: add blocksource, an abstraction for random
access reads, 2021-10-07), makes the assumption that it is ok to
free a reftable_block pointing to NULL if the size is also set to
0, but implements that using a memset call that at least in glibc
based system will trigger a runtime exception if called with a
NULL pointer as its first parameter.
Avoid doing so by adding a conditional to check for the size in all
three identically looking functions that were affected, and therefore,
still allow memset to help catch callers that might incorrectly pass
a NULL pointer with a non zero size, but avoiding the exception for
the valid cases.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change the assignment syntax introduced in 66c0dabab5e (reftable: make
reftable_record a tagged union, 2022-01-20) to be portable to AIX xlc
v12.1:
avar@gcc111:[/home/avar]xlc -qversion
IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
Version: 12.01.0000.0000
The error emitted before this was e.g.:
"reftable/generic.c", line 133.26: 1506-196 (S) Initialization
between types "char*" and "struct reftable_ref_record" is not
allowed.
The syntax in the pre-image is supported by e.g. xlc 13.01 on a newer
AIX version:
avar@gcc119:[/home/avar]xlc -qversion
IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)
Version: 13.01.0003.0006
But as we've otherwise supported this compiler let's not break it
entirely if it's easy to work around it.
Suggested-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This function is part of the reftable API, so it should use the
reftable_ prefix
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The ID => ref map is trimming object IDs to a disambiguating prefix.
Check that we are computing their length correctly.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When writing the same hash many times, we might decide to use a
length-1 object ID prefix for the ObjectID => ref table, which is out
of spec.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The public interface (reftable_writer) already ensures that keys are
written in strictly increasing order, and an empty key by definition
fails this check.
However, by also enforcing this at the block layer, it is easier to
verify that records (which are written into blocks) never have to
consider the possibility of empty keys.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Empty keys can only be written as ref records with empty names. The
log record has a logical timestamp in the key, so the key is never
empty.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The spec says 2 <= object_id_len <= 31. We are lenient and allow 1,
but we forbid 0, so we can be sure that we never read a 0-length key.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The build procedure has been taught to notice older version of zlib
and enable our replacement uncompress2() automatically.
* ab/auto-detect-zlib-compress2:
compat: auto-detect if zlib has uncompress2()
|
|
Problems identified by Coverity in the reftable code have been
corrected.
* hn/reftable-coverity-fixes:
reftable: add print functions to the record types
reftable: make reftable_record a tagged union
reftable: remove outdated file reftable.c
reftable: implement record equality generically
reftable: make reftable-record.h function signatures const correct
reftable: handle null refnames in reftable_ref_record_equal
reftable: drop stray printf in readwrite_test
reftable: order unittests by complexity
reftable: all xxx_free() functions accept NULL arguments
reftable: fix resource warning
reftable: ignore remove() return value in stack_test.c
reftable: check reftable_stack_auto_compact() return value
reftable: fix resource leak blocksource.c
reftable: fix resource leak in block.c error path
reftable: fix OOB stack write in print functions
|
|
We have a copy of uncompress2() implementation in compat/ so that we
can build with an older version of zlib that lack the function, and
the build procedure selects if it is used via the NO_UNCOMPRESS2
$(MAKE) variable. This is yet another "annoying" knob the porters
need to tweak on platforms that are not common enough to have the
default set in the config.mak.uname file.
Attempt to instead ask the system header <zlib.h> to decide if we
need the compatibility implementation. This is a deviation from the
way we have been handling the "compatiblity" features so far, and if
it can be done cleanly enough, it could work as a model for features
that need compatibility definition we discover in the future. With
that goal in mind, avoid expedient but ugly hacks, like shoving the
code that is conditionally compiled into an unrelated .c file, which
may not work in future cases---instead, take an approach that uses a
file that is independently compiled and stands on its own.
Compile and link compat/zlib-uncompress2.c file unconditionally, but
conditionally hide the implementation behind #if/#endif when zlib
version is 1.2.9 or newer, and unconditionally archive the resulting
object file in the libgit.a to be picked up by the linker.
There are a few things to note in the shape of the code base after
this change:
- We no longer use NO_UNCOMPRESS2 knob; if the system header
<zlib.h> claims a version that is more cent than the library
actually is, this would break, but it is easy to add it back when
we find such a system.
- The object file compat/zlib-uncompress2.o is always compiled and
archived in libgit.a, just like a few other compat/ object files
already are.
- The inclusion of <zlib.h> is done in <git-compat-util.h>; we used
to do so from <cache.h> which includes <git-compat-util.h> as the
first thing it does, so from the *.c codes, there is no practical
change.
- Until objects in libgit.a that is already used gains a reference
to the function, the reftable code will be the only one that
wants it, so libgit.a on the linker command line needs to appear
once more at the end to satisify the mutual dependency.
- Beat found a trick used by OpenSSL to avoid making the
conditionally-compiled object truly empty (apparently because
they had to deal with compilers that do not want to see an
effectively empty input file). Our compat/zlib-uncompress2.c
file borrows the same trick for portabilty.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This isn't used per se, but it is useful for debugging, especially
Windows CI failures.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This reduces the amount of glue code, because we don't need a void
pointer or vtable within the structure.
The only snag is that reftable_index_record contain a strbuf, so it
cannot be zero-initialized. To address this, use reftable_new_record()
to return fresh instance, given a record type. Since
reftable_new_record() doesn't cause heap allocation anymore, it should
be balanced with reftable_record_release() rather than
reftable_record_destroy().
Thanks to Peff for the suggestion.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This was renamed to generic.c, but the origin was never removed
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This simplifies unittests a little, and provides further coverage for
reftable_record_copy().
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Spotted by Coverity.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This fixes NULL derefs in error paths. Spotted by Coverity.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This would trigger in the unlikely event that we are compacting, and
the next available file handle is 0.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
If the cleanup fails, there is nothing we can do.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Fixes a problem detected by Coverity.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This would be triggered in the unlikely event of fstat() failing on an
opened file.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Add test coverage for corrupt zlib data. Fix memory leaks demonstrated by
unittest.
This problem was discovered by a Coverity scan.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Change code added in 1ae2b8cda84 (reftable: add merged table view,
2021-10-07) to consistently use the "uint64_t" type. These "min" and
"max" variables get passed in the body of this function to a function
whose prototype is:
[...] reftable_writer_set_limits([...], uint64_t min, uint64_t max
This avoids the following warning on SunCC 12.5 on
gcc211.fsffrance.org:
"reftable/merged_test.c", line 27: warning: initializer does not fit or is out of range: 0xffffffff
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Apparently, the IBM xlc compiler doesn't like this.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Create files with mode 0666, so umask works as intended. Provides an override,
which is useful to support shared repos (test t1301-shared-repo.sh).
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
reflog entries have unbounded size. In theory, each log ('g') block in reftable
can have an arbitrary size, so the format allows for arbitrarily sized reflog
messages. However, in the implementation, we are not scaling the log blocks up
with the message, and writing a large message fails.
This triggers a failure for reftable in t7006-pager.sh.
Until this is fixed more structurally, report an error from within the reftable
library for easier debugging.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
provide a command-line utility for inspecting individual tables, and
inspecting a complete ref database
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The packed/loose format has restrictions on refnames: a and a/b cannot
coexist. This limitation does not apply to reftable per se, but must be
maintained for interoperability. This code adds validation routines to
abort transactions that are trying to add invalid names.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This adds an abstract, read-only interface to the ref database.
This primitive is used to construct the read view of the ref database
(the read view is constructed by merging several *.ref files). It also
provides the mechanism to provide a unified view of the refs in the main
repository and the per-worktree refs.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This is needed to create a merged view multiple reftables
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
With support for reading and writing files in place, we can construct files (in
memory) and attempt to read them back.
Because some sections of the format are optional (eg. indices, log entries), we
have to exercise this code using multiple sizes of input data
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This supports reading a single reftable file.
The commit introduces an abstract iterator type, which captures the usecases
both of reading individual refs, and iterating over a segment of the ref
namespace.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable format includes support for an (OID => ref) map. This map can speed
up visibility and reachability checks. In particular, various operations along
the fetch/push path within Gerrit have ben sped up by using this structure.
The map is constructed with help of a binary tree. Object IDs are hashes, so
they are uniformly distributed. Hence, the tree does not attempt forced
rebalancing.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable format is structured as a sequence of block. Within a block,
records are prefix compressed, with an index of offsets for fully expand keys to
enable binary search within blocks.
This commit provides the logic to read and write these blocks.
Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable format is structured as a sequence of blocks, and each block
contains a sequence of prefix-compressed key-value records. There are 4 types of
records, and they have similarities in how they must be handled. This is
achieved by introducing a polymorphic 'record' type that encapsulates ref, log,
index and object records.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable format is usually used with files for storage. However, we abstract
away this using the blocksource data structure. This has two advantages:
* log blocks are zlib compressed, and handling them is simplified if we can
discard byte segments from within the block layer.
* for unittests, it is useful to read and write in-memory. The blocksource
allows us to abstract the data away from on-disk files.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
This commit provides basic utility classes for the reftable library.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The reftable/ directory is structured as a library, so it cannot
crash on misuse. Instead, it returns an error code.
In addition to signaling errors, the error code can be used to signal
conditions from lower levels of the library to be handled by higher
levels of the library. For example, in a transaction we might
legitimately write an empty reftable file, but in that case, we want to
shortcut the transaction.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The objective of this code is to be usable as a C library, so it can be reused
in libgit2.
This is currently using a BSD license as it is the liberal license I could find,
but this could be changed to whatever fits the stated goal above.
This code is currently imported from github.com/hanwen/reftable. Once this code
lands in git.git, the C code will be removed from github.com/hanwen/reftable,
and the git.git code will be the source of truth.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|