git/userdiff.h

#ifndef USERDIFF_H
#define USERDIFF_H
#include "notes-cache.h"
struct index_state;
struct repository;
struct userdiff_funcname {
	const char *pattern;
	int cflags;
};
struct userdiff_driver {
	const char *name;
	const char *external;
	const char *algorithm;
	/*
	 * Set via diff.<driver>.binary. Defaults to "don't know", so that
	 * selecting a driver only for its funcname patterns does not by
	 * itself mark the file as text.
	 */
	int binary;
	struct userdiff_funcname funcname;
	const char *word_regex;
	/*
	 * Variant of word_regex for regex engines that support multi-byte
	 * matching. Which of the two applies is checked at runtime, because
	 * it depends on the locale.
	 */
	const char *word_regex_multi_byte;
	const char *textconv;
	struct notes_cache *textconv_cache;
	int textconv_want_cache;
};
enum userdiff_driver_type {
	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
	USERDIFF_DRIVER_TYPE_CUSTOM = 1<<1,
};
typedef int (*each_userdiff_driver_fn)(struct userdiff_driver *,
				       enum userdiff_driver_type, void *);
int userdiff_config(const char *k, const char *v);
struct userdiff_driver *userdiff_find_by_name(const char *name);
struct userdiff_driver *userdiff_find_by_path(struct index_state *istate,
					      const char *path);
/*
* Initialize any textconv-related fields in the driver and return it, or NULL
* if it does not have textconv enabled at all.
*/
struct userdiff_driver *userdiff_get_textconv(struct repository *r,
					      struct userdiff_driver *driver);
/*
* Iterate over all userdiff drivers. The userdiff_driver_type
* argument to each_userdiff_driver_fn indicates their type. Return
* non-zero to exit early from the loop.
*/
int for_each_userdiff_driver(each_userdiff_driver_fn, void *);
#endif /* USERDIFF_H */
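
The iteration contract documented for for_each_userdiff_driver() (call the callback for every driver, stop as soon as it returns non-zero) can be sketched in isolation as follows. This is a minimal illustration, not the code in userdiff.c: the struct is cut down to two fields, and demo_builtin, demo_custom, demo_for_each_driver, find_cb and count_builtin_cb are hypothetical names invented for the example. The order in which the stand-in visits drivers, and the fact that it passes the callback's return value back to the caller, are assumptions as well.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins for the declarations in userdiff.h. */
struct userdiff_driver {
	const char *name;
	int binary;
};

enum userdiff_driver_type {
	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
	USERDIFF_DRIVER_TYPE_CUSTOM = 1<<1,
};

typedef int (*each_userdiff_driver_fn)(struct userdiff_driver *,
				       enum userdiff_driver_type, void *);

/* Hypothetical driver tables, for illustration only. */
static struct userdiff_driver demo_custom[] = { { "mydoc", 0 } };
static struct userdiff_driver demo_builtin[] = { { "cpp", -1 }, { "tex", -1 } };

/*
 * Hypothetical stand-in for for_each_userdiff_driver(): visit every
 * driver and stop at the first callback that returns non-zero,
 * handing that value back to the caller (an assumption of this sketch).
 */
static int demo_for_each_driver(each_userdiff_driver_fn fn, void *cb_data)
{
	size_t i;
	int ret;

	for (i = 0; i < sizeof(demo_custom) / sizeof(demo_custom[0]); i++) {
		ret = fn(&demo_custom[i], USERDIFF_DRIVER_TYPE_CUSTOM, cb_data);
		if (ret)
			return ret;
	}
	for (i = 0; i < sizeof(demo_builtin) / sizeof(demo_builtin[0]); i++) {
		ret = fn(&demo_builtin[i], USERDIFF_DRIVER_TYPE_BUILTIN, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Callback state for a lookup by name. */
struct find_data {
	const char *name;
	struct userdiff_driver *found;
};

/* Returns non-zero on a match, which ends the iteration early. */
static int find_cb(struct userdiff_driver *driver,
		   enum userdiff_driver_type type, void *cb_data)
{
	struct find_data *data = cb_data;

	(void)type; /* the driver type is irrelevant for a name lookup */
	if (!strcmp(driver->name, data->name)) {
		data->found = driver;
		return 1;
	}
	return 0;
}

/* Counts built-in drivers; always returns 0 so that all drivers are visited. */
static int count_builtin_cb(struct userdiff_driver *driver,
			    enum userdiff_driver_type type, void *cb_data)
{
	int *count = cb_data;

	(void)driver;
	if (type & USERDIFF_DRIVER_TYPE_BUILTIN)
		(*count)++;
	return 0;
}
```

A caller would fill in a struct find_data, pass it as the void * argument together with find_cb, and inspect the found member afterwards. The void * parameter is what lets one iterator serve callbacks with unrelated state, such as the int counter used by count_builtin_cb.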