lib: replace xbps_file_hash_check_dictionary with transaction_file lookup#359
lib: replace xbps_file_hash_check_dictionary with transaction_file lookup#359Duncaen wants to merge 3 commits intovoid-linux:masterfrom
Conversation
…okup
xbps_file_hash_check_dictionary was called for every file that is
getting unpacked, because the files list is an array it iterated over
the whole files array to find the matching file.
With a lot of files this is really slow and a lot of time was spend
in locking the proplib array and iterating over it.
Time from my Ryzen 3700X system with nvme disk updating
texlive-fontsextra with files 86375:
time xbps-install -y texlive-fontsextra
6m34.61s real 6m27.71s user 0m03.95s system
And with this patch:
time xbps-install -y texlive-fontsextra
0m08.40s real 0m07.34s user 0m00.98s system
| xbps_dbg_printf(xhp, "%s: %s\n", entry_pname, file->sha256); | ||
| assert(file->sha256); |
There was a problem hiding this comment.
Might be nice to make it clear that a (null) there is an error? Not sure.
file->sha256 ? file->sha256 : "error..."
|
For the record, a good test case here seems to be updating papirus-icon-theme. It took 2m37s going from I will build this locally and see how better it is. EDIT: I was testing on a loaded machine, so it isn't a clean test, but this PR still took 52s. Without load, 31s. |
|
This should greatly reduce the time needed, not sure why this example with more files is faster, maybe its just a lot more symlinks or smaller files that would need to be checksumed. For the implementation, I think I would rename a few things, |
| TYPE_DIR, | ||
| TYPE_FILE, | ||
| TYPE_CONFFILE, | ||
| } transaction_file_type_t; |
There was a problem hiding this comment.
Technically speaking *_t is reserved for libc (https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html), but everyone uses it anyway.
| const char *pkgname; | ||
| const char *pkgver; | ||
| char *sha256; | ||
| const char *target; | ||
| uint64_t size; | ||
| transaction_file_type_t type; |
There was a problem hiding this comment.
Would these first 6 be the xbps_file struct? If so, I guess transaction_file_type would be switched over to file_type.
There was a problem hiding this comment.
Maybe we need two struct a xbps_transaction_file and a xbps_file which is part of the former.
The idea would be to start using the xbps_file struct in places where we currently use the xbps_dictionary references to files.plist so that we gradually switch from dynamic plist types to actual structs which should help to avoid some issues at compile time through the type system.
I think xbps_file would be a simple way to start, doing this for packages would be the end goal, there are so many places that use xbps_dictionarys for packages and there are a lot of places that can be improved on the way to switching to c structs.
| if (sha256) | ||
| item->new.sha256 = strdup(sha256); |
There was a problem hiding this comment.
Is item->new always guaranteed to be zero initialized? Might be reasonable to add an else item->new.sha256 = NULL;
There was a problem hiding this comment.
Generally everything is zero initialized, IMHO not doing that by default when initializing the memory should only be done if you know you overwrite it immediately.
| void xbps_transaction_files_free(void); | ||
| int HIDDEN xbps_transaction_fetch(struct xbps_handle *, | ||
| xbps_object_iterator_t); | ||
| int HIDDEN xbps_transaction_pkg_deps(struct xbps_handle *, xbps_array_t, xbps_dictionary_t); | ||
|
|
||
| struct transaction_file *xbps_transaction_file_new(struct xbps_handle *, const char *); | ||
|
|
There was a problem hiding this comment.
The naming is a bit confusing, in that xbps_transaction_file_new was added, but no xbps_transaction_file_free, and xbps_transaction_files_free was added, without a corresponding _new, and doesn't seem to be called at all.
There was a problem hiding this comment.
Yes the _new does not actually create anything, it gives you a reference to a file struct that already exists from collecting all files in the transaction, "_new" because it returns the new file as opposed to the old file.
Not sure how this should be done exactly, something like xbps_transaction_get_file with either _new or _old or a parameter to choose which one you want.
There was a problem hiding this comment.
xbps_transaction_files_free deletes the whole hashtable/tree, this should be called from somewhere after the transaction is done, maybe from xbps_transaction_commit.
There was a problem hiding this comment.
Not sure how this should be done exactly, something like xbps_transaction_get_file with either _new or _old or a parameter to choose which one you want.
I think the parameter helps avoid duplicated logic, so I'm in favor of that.
xbps_file_hash_check_dictionarywas called for every file that isgetting unpacked, because the files list is an array it iterated over
the whole files array to find the matching file.
With a lot of files this is really slow and a lot of time was spend
in locking the proplib array and iterating over it.
Time from my Ryzen 3700X system with nvme disk updating
texlive-fontsextra with files 86375:
And with this patch: