1
0
mirror of https://github.com/git/git synced 2024-07-02 15:48:44 +00:00
git/symlinks.c

150 lines
3.9 KiB
C
Raw Normal View History

Add has_symlink_leading_path() function. When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arc/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The function has_symlink_leading_path() function takes a path, and sees if any of the leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be: char last_symlink[PATH_MAX]; *last_symlink = '\0'; for each index entry { if (!lose) continue; if (lstat(it)) if (errno == ENOENT) ; /* happy */ else error; else if (has_symlink_leading_path(it, last_symlink)) ; /* happy */ else error; /* would lose local changes */ unlink_entry(it, last_symlink); } Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-12 05:11:07 +00:00
#include "cache.h"
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
static struct cache_def {
Add has_symlink_leading_path() function. When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arc/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The function has_symlink_leading_path() function takes a path, and sees if any of the leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be: char last_symlink[PATH_MAX]; *last_symlink = '\0'; for each index entry { if (!lose) continue; if (lstat(it)) if (errno == ENOENT) ; /* happy */ else error; else if (has_symlink_leading_path(it, last_symlink)) ; /* happy */ else error; /* would lose local changes */ unlink_entry(it, last_symlink); } Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-12 05:11:07 +00:00
char path[PATH_MAX];
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
int len;
int flags;
} cache;
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
/*
* Returns the length (on a path component basis) of the longest
* common prefix match of 'name' and the cached path string.
*/
static inline int longest_match_lstat_cache(int len, const char *name)
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
{
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
int max_len, match_len = 0, i = 0;
max_len = len < cache.len ? len : cache.len;
while (i < max_len && name[i] == cache.path[i]) {
if (name[i] == '/')
match_len = i;
i++;
}
/* Is the cached path string a substring of 'name'? */
if (i == cache.len && cache.len < len && name[cache.len] == '/')
match_len = cache.len;
/* Is 'name' a substring of the cached path string? */
else if ((i == len && len < cache.len && cache.path[len] == '/') ||
(i == len && len == cache.len))
match_len = len;
return match_len;
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
}
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
static inline void reset_lstat_cache(void)
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
{
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
cache.path[0] = '\0';
cache.len = 0;
cache.flags = 0;
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
}
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
#define FL_DIR (1 << 0)
#define FL_SYMLINK (1 << 1)
#define FL_LSTATERR (1 << 2)
#define FL_ERR (1 << 3)
/*
* Check if name 'name' of length 'len' has a symlink leading
* component, or if the directory exists and is real.
*
* To speed up the check, some information is allowed to be cached.
* This can be indicated by the 'track_flags' argument.
*/
static int lstat_cache(int len, const char *name,
int track_flags)
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
{
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
int match_len, last_slash, last_slash_dir;
int match_flags, ret_flags, save_flags, max_len;
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
struct stat st;
Add has_symlink_leading_path() function. When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arc/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The function has_symlink_leading_path() function takes a path, and sees if any of the leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be: char last_symlink[PATH_MAX]; *last_symlink = '\0'; for each index entry { if (!lose) continue; if (lstat(it)) if (errno == ENOENT) ; /* happy */ else error; else if (has_symlink_leading_path(it, last_symlink)) ; /* happy */ else error; /* would lose local changes */ unlink_entry(it, last_symlink); } Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-12 05:11:07 +00:00
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
/*
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
* Check to see if we have a match from the cache for the
* symlink path type.
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
*/
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
match_len = last_slash = longest_match_lstat_cache(len, name);
match_flags = cache.flags & track_flags & FL_SYMLINK;
if (match_flags && match_len == cache.len)
return match_flags;
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
/*
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
* If we now have match_len > 0, we would know that the
* matched part will always be a directory.
*
* Also, if we are tracking directories and 'name' is a
* substring of the cache on a path component basis, we can
* return immediately.
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
*/
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
match_flags = track_flags & FL_DIR;
if (match_flags && len == match_len)
return match_flags;
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
/*
* Okay, no match from the cache so far, so now we have to
* check the rest of the path components.
*/
ret_flags = FL_DIR;
last_slash_dir = last_slash;
max_len = len < PATH_MAX ? len : PATH_MAX;
while (match_len < max_len) {
do {
cache.path[match_len] = name[match_len];
match_len++;
} while (match_len < max_len && name[match_len] != '/');
if (match_len >= max_len)
break;
last_slash = match_len;
cache.path[last_slash] = '\0';
Add has_symlink_leading_path() function. When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arc/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The function has_symlink_leading_path() function takes a path, and sees if any of the leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be: char last_symlink[PATH_MAX]; *last_symlink = '\0'; for each index entry { if (!lose) continue; if (lstat(it)) if (errno == ENOENT) ; /* happy */ else error; else if (has_symlink_leading_path(it, last_symlink)) ; /* happy */ else error; /* would lose local changes */ unlink_entry(it, last_symlink); } Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-12 05:11:07 +00:00
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
if (lstat(cache.path, &st)) {
ret_flags = FL_LSTATERR;
} else if (S_ISDIR(st.st_mode)) {
last_slash_dir = last_slash;
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
continue;
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
} else if (S_ISLNK(st.st_mode)) {
ret_flags = FL_SYMLINK;
} else {
ret_flags = FL_ERR;
Add has_symlink_leading_path() function. When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arc/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The function has_symlink_leading_path() function takes a path, and sees if any of the leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be: char last_symlink[PATH_MAX]; *last_symlink = '\0'; for each index entry { if (!lose) continue; if (lstat(it)) if (errno == ENOENT) ; /* happy */ else error; else if (has_symlink_leading_path(it, last_symlink)) ; /* happy */ else error; /* would lose local changes */ unlink_entry(it, last_symlink); } Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-12 05:11:07 +00:00
}
Optimize symlink/directory detection This is the base for making symlink detection in the middle fo a pathname saner and (much) more efficient. Under various loads, we want to verify that the full path leading up to a filename is a real directory tree, and that when we successfully do an 'lstat()' on a filename, we don't get a false positive due to a symlink in the middle of the path that git should have seen as a symlink, not as a normal path component. The 'has_symlink_leading_path()' function already did this, and cached a single level of symlink information, but didn't cache the _lack_ of a symlink, so the normal behaviour was actually the wrong way around, and we ended up doing an 'lstat()' on each path component to check that it was a real directory. This caches the last detected full directory and symlink entries, and speeds up especially deep directory structures a lot by avoiding to lstat() all the directories leading up to each entry in the index. [ This can - and should - probably be extended upon so that we eventually never do a bare 'lstat()' on any path entries at *all* when checking the index, but always check the full path carefully. Right now we do not generally check the whole path for all our normal quick index revalidation. We should also make sure that we're careful about all the invalidation, ie when we remove a link and replace it by a directory we should invalidate the symlink cache if it matches (and vice versa for the directory cache). But regardless, the basic function needs to be sane to do that. The old 'has_symlink_leading_path()' was not capable enough - or indeed the code readable enough - to really do that sanely. So I'm pushing this as not just an optimization, but as a base for further work. ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-09 16:21:07 +00:00
break;
Add has_symlink_leading_path() function. When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arc/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The function has_symlink_leading_path() function takes a path, and sees if any of the leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be: char last_symlink[PATH_MAX]; *last_symlink = '\0'; for each index entry { if (!lose) continue; if (lstat(it)) if (errno == ENOENT) ; /* happy */ else error; else if (has_symlink_leading_path(it, last_symlink)) ; /* happy */ else error; /* would lose local changes */ unlink_entry(it, last_symlink); } Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-12 05:11:07 +00:00
}
lstat_cache(): more cache effective symlink/directory detection Make the cache functionality more effective. Previously when A/B/C/D was in the cache and A/B/C/E/file.c was called for, there was no match at all from the cache. Now we use the fact that the paths "A", "A/B" and "A/B/C" are already tested, and we only need to do an lstat() call on "A/B/C/E". We only cache/store the last path regardless of its type. Since the cache functionality is always used with alphabetically sorted names (at least it seems so for me), there is no need to store both the last symlink-leading path and the last real-directory path. Note that if the cache is not called with (mostly) alphabetically sorted names, neither the old, nor this new one, would be very effective. Previously, when symlink A/B/C/S was cached/stored in the symlink- leading path, and A/B/C/file.c was called for, it was not easy to use the fact that we already knew that the paths "A", "A/B" and "A/B/C" are real directories. Avoid copying the first path components of the name 2 zillion times when we test new path components. Since we always cache/store the last path, we can copy each component as we test those directly into the cache. Previously we ended up doing a memcpy() for the full path/name right before each lstat() call, and when updating the cache for each time we have tested a new path component. We also use less memory, that is, PATH_MAX bytes less memory on the stack and PATH_MAX bytes less memory on the heap. Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable comments to this patch! Signed-off-by: Kjetil Barvik <barvik@broadpark.no> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-18 15:14:50 +00:00
/*
* At the end update the cache. Note that max 2 different
* path types, FL_SYMLINK and FL_DIR, can be cached for the
* moment!
*/
save_flags = ret_flags & track_flags & FL_SYMLINK;
if (save_flags && last_slash > 0 && last_slash < PATH_MAX) {
cache.path[last_slash] = '\0';
cache.len = last_slash;
cache.flags = save_flags;
} else if (track_flags & FL_DIR &&
last_slash_dir > 0 && last_slash_dir < PATH_MAX) {
/*
* We have a separate test for the directory case,
* since it could be that we have found a symlink and
* the track_flags says that we cannot cache this
* fact, so the cache would then have been left empty
* in this case.
*
* But if we are allowed to track real directories, we
* can still cache the path components before the last
* one (the found symlink component).
*/
cache.path[last_slash_dir] = '\0';
cache.len = last_slash_dir;
cache.flags = FL_DIR;
} else {
reset_lstat_cache();
}
return ret_flags;
}
/*
* Return non-zero if path 'name' has a leading symlink component
*/
int has_symlink_leading_path(int len, const char *name)
{
return lstat_cache(len, name,
FL_SYMLINK|FL_DIR) &
FL_SYMLINK;
Add has_symlink_leading_path() function. When we are applying a patch that creates a blob at a path, or when we are switching from a branch that does not have a blob at the path to another branch that has one, we need to make sure that there is nothing at the path in the working tree, as such a file is a local modification made by the user that would be lost by the operation. Normally, lstat() on the path and making sure ENOENT is returned is good enough for that purpose. However there is a twist. We may be creating a regular file arch/x86_64/boot/Makefile, while removing an existing symbolic link at arch/x86_64/boot that points at existing ../i386/boot directory that has Makefile in it. We always first check without touching filesystem and then perform the actual operation, so when we verify the new file, arch/x86_64/boot/Makefile, does not exist, we haven't removed the symbolic link arc/x86_64/boot symbolic link yet. lstat() on the file sees through the symbolic link and reports the file is there, which is not what we want. The function has_symlink_leading_path() function takes a path, and sees if any of the leading directory component is a symbolic link. When files in a new directory are created, we tend to process them together because both index and tree are sorted. The function takes advantage of this and allows the caller to cache and reuse which symbolic link on the filesystem caused the function to return true. The calling sequence would be: char last_symlink[PATH_MAX]; *last_symlink = '\0'; for each index entry { if (!lose) continue; if (lstat(it)) if (errno == ENOENT) ; /* happy */ else error; else if (has_symlink_leading_path(it, last_symlink)) ; /* happy */ else error; /* would lose local changes */ unlink_entry(it, last_symlink); } Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-12 05:11:07 +00:00
}