Vendor import of diff from OpenBSD's Game of Trees

Repository:	ssh://anonymous@got.gameoftrees.org/diff.git
Commit hash:	b5a9c15f4d68c06ec3bf839529b3ed2def0a6af6
Commit date:	2023-09-15
Dag-Erling Smørgrav 2024-03-07 12:32:03 +01:00
commit 9eb461aa4b
169 changed files with 80272 additions and 0 deletions

.gitignore

@ -0,0 +1,13 @@
.*.sw?
diff/diff
*.o
*.d
*.a
**/*.o
**/*.d
**/*.a
tags
test/got*.diff
test/verify.*
test/arraylist_test/arraylist_test
test/results_test/results_test

LICENCE

@ -0,0 +1,13 @@
Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
Permission to use, copy, modify, and distribute this software for any
purpose with or without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

README

@ -0,0 +1,26 @@
This is a collection of diff algorithms, to test various combinations.

The initial aim was to provide a faster diff implementation for got
(gameoftrees.org) with a BSD license, at the u2k20 OpenBSD hackathon.
A side effect could be improving OpenBSD's /usr/bin/diff utility.

At the time of writing, this is little more than a playground / benchmark basis
/ diff algorithm analysis platform. What could be done:
- add profiling and test series to rate diff algorithm combinations.
- interface with / merge into got.

The Myers and Patience Diff algorithm implementations found here are based on
the explanations found in these blog post series:
  https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/ ff.
and
  https://blog.jcoglan.com/2017/09/19/the-patience-diff-algorithm/ ff.
-- possibly the single most comprehensive explanations of these algorithms.
Many thanks for this valuable door opener!
The source code itself is not based on the code found in those blogs, but
written from scratch with the knowledge gained.

Compile:
  make -C diff

Test:
  make -C test/
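The README credits the Myers algorithm to the blog series above. As a rough illustration of the core idea only (the library's own implementation is diff_myers.c, listed in diff/Makefile but not shown here), the greedy forward pass below computes D, the length of the shortest edit script between two sequences. It is a from-scratch sketch that assumes single characters as atoms instead of lines, and it does not recover the edits themselves.

```c
#include <stdlib.h>

/*
 * Greedy forward pass of the Myers algorithm: return D, the minimum number
 * of single-element insertions plus deletions that turn a[0..n) into
 * b[0..m), or -1 if memory allocation fails. v[offset + k] holds the
 * furthest x reached so far on diagonal k = x - y.
 */
static int
myers_edit_distance(const char *a, int n, const char *b, int m)
{
	int max = n + m;
	int offset = max + 1;
	int *v = calloc((size_t)(2 * max + 3), sizeof(int));
	int d, k, x, y;

	if (v == NULL)
		return -1;
	for (d = 0; d <= max; d++) {
		for (k = -d; k <= d; k += 2) {
			/* Extend from diagonal k + 1 (insertion, x unchanged)
			 * or from diagonal k - 1 (deletion, x + 1). */
			if (k == -d || (k != d &&
			    v[offset + k - 1] < v[offset + k + 1]))
				x = v[offset + k + 1];
			else
				x = v[offset + k - 1] + 1;
			y = x - k;
			/* Follow the "snake" of matching elements. */
			while (x < n && y < m && a[x] == b[y]) {
				x++;
				y++;
			}
			v[offset + k] = x;
			if (x >= n && y >= m) {
				free(v);
				return d;
			}
		}
	}
	free(v);
	return -1;	/* not reached: d == n + m always suffices */
}
```

For the classic example, `myers_edit_distance("ABCABBA", 7, "CBABAC", 6)` yields 5. Recovering the edit script itself requires keeping a copy of `v` for each `d` and walking back from the end; the sketch omits that.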


@ -0,0 +1,8 @@
#define _GNU_SOURCE
#include <errno.h>
const char *
getprogname(void)
{
return program_invocation_short_name;
}

compat/include/stdlib.h

@ -0,0 +1,20 @@
/*
* stdlib.h compatibility shim
* Public domain
*/
#include_next <stdlib.h>
#ifndef DIFFCOMPAT_STDLIB_H
#define DIFFCOMPAT_STDLIB_H
#include <sys/types.h>
#include <stdint.h>
const char * getprogname(void);
void *reallocarray(void *, size_t, size_t);
void *recallocarray(void *, size_t, size_t, size_t);
int mergesort(void *, size_t, size_t, int (*cmp)(const void *, const void *));
#endif

compat/include/string.h

@ -0,0 +1,16 @@
/*
* string.h compatibility shim
* Public domain
*/
#include_next <string.h>
#ifndef DIFFCOMPAT_STRING_H
#define DIFFCOMPAT_STRING_H
#include <sys/types.h>
size_t strlcpy(char *dst, const char *src, size_t dstsize);
size_t strlcat(char *dst, const char *src, size_t dstsize);
#endif


@ -0,0 +1,15 @@
/*
* Public domain
* sys/types.h compatibility shim
*/
#include_next <sys/types.h>
#ifndef DIFFCOMPAT_SYS_TYPES_H
#define DIFFCOMPAT_SYS_TYPES_H
#if !defined(__dead)
#define __dead __attribute__((__noreturn__))
#endif
#endif

compat/merge.c

@ -0,0 +1,338 @@
/* $OpenBSD: merge.c,v 1.10 2015/06/21 03:20:56 millert Exp $ */
/*-
* Copyright (c) 1992, 1993
* The Regents of the University of California. All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Peter McIlroy.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/*
* Hybrid exponential search/linear search merge sort with hybrid
* natural/pairwise first pass. Requires about .3% more comparisons
* for random data than LSMS with pairwise first pass alone.
* It works for objects as small as two bytes.
*/
#define NATURAL
#define THRESHOLD 16 /* Best choice for natural merge cut-off. */
/* #define NATURAL to get hybrid natural merge.
* (The default is pairwise merging.)
*/
#include <sys/types.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
static void setup(u_char *list1, u_char *list2, size_t n, size_t size,
int (*cmp)(const void *, const void *));
static void insertionsort(u_char *a, size_t n, size_t size,
int (*cmp)(const void *, const void *));
#define ISIZE sizeof(int)
#define PSIZE sizeof(u_char *)
#define ICOPY_LIST(src, dst, last) \
do \
*(int*)dst = *(int*)src, src += ISIZE, dst += ISIZE; \
while(src < last)
#define ICOPY_ELT(src, dst, i) \
do \
*(int*) dst = *(int*) src, src += ISIZE, dst += ISIZE; \
while (i -= ISIZE)
#define CCOPY_LIST(src, dst, last) \
do \
*dst++ = *src++; \
while (src < last)
#define CCOPY_ELT(src, dst, i) \
do \
*dst++ = *src++; \
while (i -= 1)
/*
* Find the next possible pointer head. (Trickery for forcing an array
* to do double duty as a linked list when objects do not align with word
* boundaries.)
*/
/* Assumption: PSIZE is a power of 2. */
#define EVAL(p) (u_char **) \
((u_char *)0 + \
(((u_char *)p + PSIZE - 1 - (u_char *) 0) & ~(PSIZE - 1)))
/*
* Arguments are as for qsort.
*/
int
mergesort(void *base, size_t nmemb, size_t size,
int (*cmp)(const void *, const void *))
{
int i, sense;
int big, iflag;
u_char *f1, *f2, *t, *b, *tp2, *q, *l1, *l2;
u_char *list2, *list1, *p2, *p, *last, **p1;
if (size < PSIZE / 2) { /* Pointers must fit into 2 * size. */
errno = EINVAL;
return (-1);
}
if (nmemb == 0)
return (0);
/*
* XXX
* Stupid subtraction for the Cray.
*/
iflag = 0;
if (!(size % ISIZE) && !(((char *)base - (char *)0) % ISIZE))
iflag = 1;
if ((list2 = malloc(nmemb * size + PSIZE)) == NULL)
return (-1);
list1 = base;
setup(list1, list2, nmemb, size, cmp);
last = list2 + nmemb * size;
i = big = 0;
while (*EVAL(list2) != last) {
l2 = list1;
p1 = EVAL(list1);
for (tp2 = p2 = list2; p2 != last; p1 = EVAL(l2)) {
p2 = *EVAL(p2);
f1 = l2;
f2 = l1 = list1 + (p2 - list2);
if (p2 != last)
p2 = *EVAL(p2);
l2 = list1 + (p2 - list2);
while (f1 < l1 && f2 < l2) {
if ((*cmp)(f1, f2) <= 0) {
q = f2;
b = f1, t = l1;
sense = -1;
} else {
q = f1;
b = f2, t = l2;
sense = 0;
}
if (!big) { /* here i = 0 */
while ((b += size) < t && cmp(q, b) >sense)
if (++i == 6) {
big = 1;
goto EXPONENTIAL;
}
} else {
EXPONENTIAL: for (i = size; ; i <<= 1)
if ((p = (b + i)) >= t) {
if ((p = t - size) > b &&
(*cmp)(q, p) <= sense)
t = p;
else
b = p;
break;
} else if ((*cmp)(q, p) <= sense) {
t = p;
if (i == size)
big = 0;
goto FASTCASE;
} else
b = p;
while (t > b+size) {
i = (((t - b) / size) >> 1) * size;
if ((*cmp)(q, p = b + i) <= sense)
t = p;
else
b = p;
}
goto COPY;
FASTCASE: while (i > size)
if ((*cmp)(q,
p = b + (i >>= 1)) <= sense)
t = p;
else
b = p;
COPY: b = t;
}
i = size;
if (q == f1) {
if (iflag) {
ICOPY_LIST(f2, tp2, b);
ICOPY_ELT(f1, tp2, i);
} else {
CCOPY_LIST(f2, tp2, b);
CCOPY_ELT(f1, tp2, i);
}
} else {
if (iflag) {
ICOPY_LIST(f1, tp2, b);
ICOPY_ELT(f2, tp2, i);
} else {
CCOPY_LIST(f1, tp2, b);
CCOPY_ELT(f2, tp2, i);
}
}
}
if (f2 < l2) {
if (iflag)
ICOPY_LIST(f2, tp2, l2);
else
CCOPY_LIST(f2, tp2, l2);
} else if (f1 < l1) {
if (iflag)
ICOPY_LIST(f1, tp2, l1);
else
CCOPY_LIST(f1, tp2, l1);
}
*p1 = l2;
}
tp2 = list1; /* swap list1, list2 */
list1 = list2;
list2 = tp2;
last = list2 + nmemb*size;
}
if (base == list2) {
memmove(list2, list1, nmemb*size);
list2 = list1;
}
free(list2);
return (0);
}
#define swap(a, b) { \
s = b; \
i = size; \
do { \
tmp = *a; *a++ = *s; *s++ = tmp; \
} while (--i); \
a -= size; \
}
#define reverse(bot, top) { \
s = top; \
do { \
i = size; \
do { \
tmp = *bot; *bot++ = *s; *s++ = tmp; \
} while (--i); \
s -= size2; \
} while(bot < s); \
}
/*
* Optional hybrid natural/pairwise first pass. Eats up list1 in runs of
* increasing order, list2 in a corresponding linked list. Checks for runs
* when THRESHOLD/2 pairs compare with same sense. (Only used when NATURAL
* is defined. Otherwise simple pairwise merging is used.)
*/
void
setup(u_char *list1, u_char *list2, size_t n, size_t size,
int (*cmp)(const void *, const void *))
{
int i, length, size2, sense;
u_char tmp, *f1, *f2, *s, *l2, *last, *p2;
size2 = size*2;
if (n <= 5) {
insertionsort(list1, n, size, cmp);
*EVAL(list2) = (u_char*) list2 + n*size;
return;
}
/*
* Avoid running pointers out of bounds; limit n to evens
* for simplicity.
*/
i = 4 + (n & 1);
insertionsort(list1 + (n - i) * size, i, size, cmp);
last = list1 + size * (n - i);
*EVAL(list2 + (last - list1)) = list2 + n * size;
#ifdef NATURAL
p2 = list2;
f1 = list1;
sense = (cmp(f1, f1 + size) > 0);
for (; f1 < last; sense = !sense) {
length = 2;
/* Find pairs with same sense. */
for (f2 = f1 + size2; f2 < last; f2 += size2) {
if ((cmp(f2, f2+ size) > 0) != sense)
break;
length += 2;
}
if (length < THRESHOLD) { /* Pairwise merge */
do {
p2 = *EVAL(p2) = f1 + size2 - list1 + list2;
if (sense > 0)
swap (f1, f1 + size);
} while ((f1 += size2) < f2);
} else { /* Natural merge */
l2 = f2;
for (f2 = f1 + size2; f2 < l2; f2 += size2) {
if ((cmp(f2-size, f2) > 0) != sense) {
p2 = *EVAL(p2) = f2 - list1 + list2;
if (sense > 0)
reverse(f1, f2-size);
f1 = f2;
}
}
if (sense > 0)
reverse (f1, f2-size);
f1 = f2;
if (f2 < last || cmp(f2 - size, f2) > 0)
p2 = *EVAL(p2) = f2 - list1 + list2;
else
p2 = *EVAL(p2) = list2 + n*size;
}
}
#else /* pairwise merge only. */
for (f1 = list1, p2 = list2; f1 < last; f1 += size2) {
p2 = *EVAL(p2) = p2 + size2;
if (cmp (f1, f1 + size) > 0)
swap(f1, f1 + size);
}
#endif /* NATURAL */
}
/*
* This is to avoid out-of-bounds addresses in sorting the
* last 4 elements.
*/
static void
insertionsort(u_char *a, size_t n, size_t size,
int (*cmp)(const void *, const void *))
{
u_char *ai, *s, *t, *u, tmp;
int i;
for (ai = a+size; --n >= 1; ai += size)
for (t = ai; t > a; t -= size) {
u = t - size;
if (cmp(u, t) <= 0)
break;
swap(u, t);
}
}
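mergesort() above does not always scan a run linearly: once one side keeps winning comparisons it sets `big` and jumps to `EXPONENTIAL`, doubling the probe distance `i` until it overshoots, then narrowing the gap by halving (`FASTCASE`) or bisecting. The standalone sketch below shows the same galloping idea on a sorted `int` array. It is written from scratch for illustration, not extracted from merge.c, and it ignores the `sense` trick merge.c uses to keep equal elements in their original order.

```c
#include <stddef.h>

/*
 * Exponential ("galloping") search: return how many leading elements of the
 * sorted array a[0..n) are <= key. Probes 1, 2, 4, ... positions past the
 * last known match, then bisects the final gap.
 */
static size_t
gallop_count_le(const int *a, size_t n, int key)
{
	size_t lo = 0;		/* invariant: a[0..lo) are all <= key */
	size_t step = 1;
	size_t hi;

	/* Double the step while the probed element is still <= key. */
	while (lo + step <= n && a[lo + step - 1] <= key) {
		lo += step;
		step *= 2;
	}
	/* The answer now lies in [lo, hi]; a[hi] > key whenever hi < n. */
	hi = (lo + step <= n) ? lo + step - 1 : n;
	/* Binary search for the first element > key in a[lo..hi). */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (a[mid] <= key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

Galloping costs O(log k) comparisons when the answer is k positions away, which is why it pays off on long runs but loses to a plain linear scan when runs interleave closely; merge.c switches between the two for that reason.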

compat/reallocarray.c

@ -0,0 +1,38 @@
/* $OpenBSD: reallocarray.c,v 1.3 2015/09/13 08:31:47 guenther Exp $ */
/*
* Copyright (c) 2008 Otto Moerbeek <otto@drijf.net>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <sys/types.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
/*
* This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
* if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
*/
#define MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))
void *
reallocarray(void *optr, size_t nmemb, size_t size)
{
if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
nmemb > 0 && SIZE_MAX / nmemb < size) {
errno = ENOMEM;
return NULL;
}
return realloc(optr, size * nmemb);
}
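The MUL_NO_OVERFLOW test in reallocarray() skips the division in the common case: if both factors are below 2^(bits/2), their product cannot exceed SIZE_MAX, so `SIZE_MAX / nmemb < size` only runs when one factor is large. Pulled out into a standalone predicate (the name `mul_would_overflow` is hypothetical, for illustration only):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* sqrt(SIZE_MAX + 1): two factors below this cannot overflow a size_t. */
#define MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))

/* Return true if nmemb * size would not fit in a size_t. */
static bool
mul_would_overflow(size_t nmemb, size_t size)
{
	return (nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
	    nmemb > 0 && SIZE_MAX / nmemb < size;
}
```

When this returns true, reallocarray() sets `errno` to `ENOMEM` and returns NULL instead of calling realloc() with a wrapped-around size.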

compat/recallocarray.c

@ -0,0 +1,80 @@
/* $OpenBSD: recallocarray.c,v 1.1 2017/03/06 18:44:21 otto Exp $ */
/*
* Copyright (c) 2008, 2017 Otto Moerbeek <otto@drijf.net>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <errno.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
/*
* This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
* if both s1 < MUL_NO_OVERFLOW and s2 < MUL_NO_OVERFLOW
*/
#define MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))
void *
recallocarray(void *ptr, size_t oldnmemb, size_t newnmemb, size_t size)
{
size_t oldsize, newsize;
void *newptr;
if (ptr == NULL)
return calloc(newnmemb, size);
if ((newnmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
newnmemb > 0 && SIZE_MAX / newnmemb < size) {
errno = ENOMEM;
return NULL;
}
newsize = newnmemb * size;
if ((oldnmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
oldnmemb > 0 && SIZE_MAX / oldnmemb < size) {
errno = EINVAL;
return NULL;
}
oldsize = oldnmemb * size;
/*
* Don't bother too much if we're shrinking just a bit,
* we do not shrink for series of small steps, oh well.
*/
if (newsize <= oldsize) {
size_t d = oldsize - newsize;
if (d < oldsize / 2 && d < getpagesize()) {
memset((char *)ptr + newsize, 0, d);
return ptr;
}
}
newptr = malloc(newsize);
if (newptr == NULL)
return NULL;
if (newsize > oldsize) {
memcpy(newptr, ptr, oldsize);
memset((char *)newptr + oldsize, 0, newsize - oldsize);
} else
memcpy(newptr, ptr, newsize);
explicit_bzero(ptr, oldsize);
free(ptr);
return newptr;
}
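recallocarray() keeps the old block when shrinking by less than half its size and by less than one page, zeroing only the discarded tail; otherwise it allocates a new block, copies the data and wipes the old block with explicit_bzero(). The decision on its own, as a hypothetical helper `recalloc_reuses_block` that takes the page size as a parameter so it can be checked without calling getpagesize():

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Mirror of the shrink test in recallocarray(): true if the existing
 * allocation would be reused in place (tail zeroed), false if a new
 * block would be allocated and the data copied.
 */
static bool
recalloc_reuses_block(size_t oldsize, size_t newsize, size_t pagesize)
{
	size_t d;

	if (newsize > oldsize)
		return false;	/* growing always allocates a new block */
	d = oldsize - newsize;
	return d < oldsize / 2 && d < pagesize;
}
```

Because growing always copies into a fresh block, callers such as ARRAYLIST_ADD() in include/arraylist.h benefit from growing geometrically rather than one element at a time.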

compat/strlcat.c

@ -0,0 +1,55 @@
/* $OpenBSD: strlcat.c,v 1.19 2019/01/25 00:19:25 millert Exp $ */
/*
* Copyright (c) 1998, 2015 Todd C. Miller <millert@openbsd.org>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <sys/types.h>
#include <string.h>
/*
* Appends src to string dst of size dsize (unlike strncat, dsize is the
* full size of dst, not space left). At most dsize-1 characters
* will be copied. Always NUL terminates (unless dsize <= strlen(dst)).
* Returns strlen(src) + MIN(dsize, strlen(initial dst)).
* If retval >= dsize, truncation occurred.
*/
size_t
strlcat(char *dst, const char *src, size_t dsize)
{
const char *odst = dst;
const char *osrc = src;
size_t n = dsize;
size_t dlen;
/* Find the end of dst and adjust bytes left but don't go past end. */
while (n-- != 0 && *dst != '\0')
dst++;
dlen = dst - odst;
n = dsize - dlen;
if (n-- == 0)
return(dlen + strlen(src));
while (*src != '\0') {
if (n != 0) {
*dst++ = *src;
n--;
}
src++;
}
*dst = '\0';
return(dlen + (src - osrc)); /* count does not include NUL */
}
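The important part of strlcat()'s contract is its return value: the length of the string it tried to create, so a caller detects truncation by comparing it with the buffer size. To show that contract without depending on strlcat() being available, here is an independent memchr()/memcpy()-based sketch; `append_bounded` is a hypothetical name, not part of this code.

```c
#include <string.h>

/*
 * Same contract as strlcat(): append src to the string in dst (buffer size
 * dsize), copying at most dsize - strlen(dst) - 1 bytes; NUL-terminate when
 * dst was terminated within dsize; return the length of the string it tried
 * to create. A return value >= dsize means truncation.
 */
static size_t
append_bounded(char *dst, const char *src, size_t dsize)
{
	const char *end = memchr(dst, '\0', dsize);
	size_t dlen = end ? (size_t)(end - dst) : dsize;
	size_t slen = strlen(src);
	size_t room, ncopy;

	if (dlen == dsize)
		return dsize + slen;	/* no NUL within dsize: append nothing */
	room = dsize - dlen - 1;
	ncopy = slen < room ? slen : room;
	memcpy(dst + dlen, src, ncopy);
	dst[dlen + ncopy] = '\0';
	return dlen + slen;
}
```

The caller-side idiom is the same as with strlcat(): `if (append_bounded(buf, name, sizeof(buf)) >= sizeof(buf))` means the result was cut short.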

compat/strlcpy.c

@ -0,0 +1,50 @@
/* $OpenBSD: strlcpy.c,v 1.16 2019/01/25 00:19:25 millert Exp $ */
/*
* Copyright (c) 1998, 2015 Todd C. Miller <millert@openbsd.org>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <sys/types.h>
#include <string.h>
/*
* Copy string src to buffer dst of size dsize. At most dsize-1
* chars will be copied. Always NUL terminates (unless dsize == 0).
* Returns strlen(src); if retval >= dsize, truncation occurred.
*/
size_t
strlcpy(char *dst, const char *src, size_t dsize)
{
const char *osrc = src;
size_t nleft = dsize;
/* Copy as many bytes as will fit. */
if (nleft != 0) {
while (--nleft != 0) {
if ((*dst++ = *src++) == '\0')
break;
}
}
/* Not enough room in dst, add NUL and traverse rest of src. */
if (nleft == 0) {
if (dsize != 0)
*dst = '\0'; /* NUL-terminate dst */
while (*src++)
;
}
return(src - osrc - 1); /* count does not include NUL */
}
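strlcpy() follows the same convention: it returns strlen(src), so `retval >= dsize` signals truncation. A minimal memcpy()-based sketch of that contract (the name `copy_bounded` is hypothetical) scans src once with strlen() up front, whereas strlcpy() above copies and counts in a single pass:

```c
#include <string.h>

/*
 * Same contract as strlcpy(): copy at most dsize - 1 bytes of src into dst,
 * NUL-terminate unless dsize == 0, and return strlen(src). A return value
 * >= dsize means the copy was truncated.
 */
static size_t
copy_bounded(char *dst, const char *src, size_t dsize)
{
	size_t slen = strlen(src);

	if (dsize != 0) {
		size_t ncopy = slen < dsize - 1 ? slen : dsize - 1;

		memcpy(dst, src, ncopy);
		dst[ncopy] = '\0';
	}
	return slen;
}
```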

diff-version.mk

@ -0,0 +1,8 @@
DIFF_RELEASE=No
DIFF_VERSION_NUMBER=0.1
.if ${DIFF_RELEASE} == Yes
DIFF_VERSION=${DIFF_VERSION_NUMBER}
.else
DIFF_VERSION=${DIFF_VERSION_NUMBER}-current
.endif

diff/GNUmakefile

@ -0,0 +1,19 @@
CFLAGS = -fsanitize=address -fsanitize=undefined -g -O3
CFLAGS += -Wstrict-prototypes -Wunused-variable -Wuninitialized
SRCS= diff.c
LIB= ../lib/libdiff.a
# Compat sources
CFLAGS+= -I$(CURDIR)/../compat/include
diff: $(SRCS) $(LIB)
	gcc $(CFLAGS) -I../include -o $@ $^
../lib/libdiff.a: ../lib/*.[hc] ../include/*.h
	$(MAKE) -C ../lib
.PHONY: clean
clean:
	rm diff
	$(MAKE) -C ../lib clean

diff/Makefile

@ -0,0 +1,41 @@
.PATH:${.CURDIR}/../lib
.include "../diff-version.mk"
PROG= diff
SRCS= \
diff.c \
diff_atomize_text.c \
diff_main.c \
diff_myers.c \
diff_patience.c \
diff_output.c \
diff_output_plain.c \
diff_output_unidiff.c \
diff_output_edscript.c \
${END}
MAN = ${PROG}.1
CPPFLAGS = -I${.CURDIR}/../include -I${.CURDIR}/../lib
#CPPFLAGS += -DDIFF_NO_MMAP
.if defined(PROFILE)
CFLAGS = -O0 -pg -g
LDFLAGS = -pg -lc_p -lutil_p -lz_p -static
.else
LDFLAGS = -lutil -lz
.endif
.if ${DIFF_RELEASE} != "Yes"
NOMAN = Yes
.endif
realinstall:
	${INSTALL} ${INSTALL_COPY} -o ${BINOWN} -g ${BINGRP} \
	-m ${BINMODE} ${PROG} ${BINDIR}/${PROG}
dist:
	mkdir ../diff-${DIFF_VERSION}/diff
	cp ${SRCS} ${MAN} ../diff-${DIFF_VERSION}/diff
.include <bsd.prog.mk>

diff/diff.c

@ -0,0 +1,280 @@
/* Commandline diff utility to test diff implementations. */
/*
* Copyright (c) 2018 Martin Pieuchot
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <arraylist.h>
#include <diff_main.h>
#include <diff_output.h>
enum diffreg_algo {
DIFFREG_ALGO_MYERS_THEN_MYERS_DIVIDE = 0,
DIFFREG_ALGO_MYERS_THEN_PATIENCE = 1,
DIFFREG_ALGO_PATIENCE = 2,
DIFFREG_ALGO_NONE = 3,
};
__dead void usage(void);
int diffreg(char *, char *, enum diffreg_algo, bool, bool, bool,
int, bool);
FILE * openfile(const char *, char **, struct stat *);
__dead void
usage(void)
{
fprintf(stderr,
"usage: %s [-apPQTwe] [-U n] file1 file2\n"
"\n"
" -a Treat input as ASCII even if binary data is detected\n"
" -p Show function prototypes in hunk headers\n"
" -P Use Patience Diff (slower but often nicer)\n"
" -Q Use forward-Myers for small files, otherwise Patience\n"
" -T Trivial algo: detect similar start and end only\n"
" -w Ignore Whitespace\n"
" -U n Number of Context Lines\n"
" -e Produce ed script output\n"
, getprogname());
exit(1);
}
int
main(int argc, char *argv[])
{
int ch, rc;
bool force_text = false;
bool ignore_whitespace = false;
bool show_function_prototypes = false;
bool edscript = false;
int context_lines = 3;
enum diffreg_algo algo = DIFFREG_ALGO_MYERS_THEN_MYERS_DIVIDE;
while ((ch = getopt(argc, argv, "apPQTwU:e")) != -1) {
switch (ch) {
case 'a':
force_text = true;
break;
case 'p':
show_function_prototypes = true;
break;
case 'P':
algo = DIFFREG_ALGO_PATIENCE;
break;
case 'Q':
algo = DIFFREG_ALGO_MYERS_THEN_PATIENCE;
break;
case 'T':
algo = DIFFREG_ALGO_NONE;
break;
case 'w':
ignore_whitespace = true;
break;
case 'U':
context_lines = atoi(optarg);
break;
case 'e':
edscript = true;
break;
default:
usage();
}
}
argc -= optind;
argv += optind;
if (argc != 2)
usage();
rc = diffreg(argv[0], argv[1], algo, force_text, ignore_whitespace,
show_function_prototypes, context_lines, edscript);
if (rc != DIFF_RC_OK) {
fprintf(stderr, "diff: %s\n", strerror(rc));
return 1;
}
return 0;
}
const struct diff_algo_config myers_then_patience;
const struct diff_algo_config myers_then_myers_divide;
const struct diff_algo_config patience;
const struct diff_algo_config myers_divide;
const struct diff_algo_config myers_then_patience = (struct diff_algo_config){
.impl = diff_algo_myers,
.permitted_state_size = 1024 * 1024 * sizeof(int),
.fallback_algo = &patience,
};
const struct diff_algo_config myers_then_myers_divide =
(struct diff_algo_config){
.impl = diff_algo_myers,
.permitted_state_size = 1024 * 1024 * sizeof(int),
.fallback_algo = &myers_divide,
};
const struct diff_algo_config patience = (struct diff_algo_config){
.impl = diff_algo_patience,
/* After subdivision, do Patience again: */
.inner_algo = &patience,
/* If subdivision failed, do Myers Divide et Impera: */
.fallback_algo = &myers_then_myers_divide,
};
const struct diff_algo_config myers_divide = (struct diff_algo_config){
.impl = diff_algo_myers_divide,
/* When division succeeded, start from the top: */
.inner_algo = &myers_then_myers_divide,
/* (fallback_algo = NULL implies diff_algo_none). */
};
const struct diff_algo_config no_algo = (struct diff_algo_config){
.impl = diff_algo_none,
};
/* If the state for a forward-Myers is small enough, use Myers, otherwise first
* do a Myers-divide. */
const struct diff_config diff_config_myers_then_myers_divide = {
.atomize_func = diff_atomize_text_by_line,
.algo = &myers_then_myers_divide,
};
/* If the state for a forward-Myers is small enough, use Myers, otherwise first
* do a Patience. */
const struct diff_config diff_config_myers_then_patience = {
.atomize_func = diff_atomize_text_by_line,
.algo = &myers_then_patience,
};
/* Directly force Patience as a first divider of the source file. */
const struct diff_config diff_config_patience = {
.atomize_func = diff_atomize_text_by_line,
.algo = &patience,
};
/* No algorithm set: only detect common lines at the start and end (-T). */
const struct diff_config diff_config_no_algo = {
.atomize_func = diff_atomize_text_by_line,
};
int
diffreg(char *file1, char *file2, enum diffreg_algo algo, bool force_text,
bool ignore_whitespace, bool show_function_prototypes, int context_lines,
bool edscript)
{
char *str1, *str2;
FILE *f1, *f2;
struct stat st1, st2;
struct diff_input_info info = {
.left_path = file1,
.right_path = file2,
};
struct diff_data left = {}, right = {};
struct diff_result *result = NULL;
int rc;
const struct diff_config *cfg;
int diff_flags = 0;
switch (algo) {
default:
case DIFFREG_ALGO_MYERS_THEN_MYERS_DIVIDE:
cfg = &diff_config_myers_then_myers_divide;
break;
case DIFFREG_ALGO_MYERS_THEN_PATIENCE:
cfg = &diff_config_myers_then_patience;
break;
case DIFFREG_ALGO_PATIENCE:
cfg = &diff_config_patience;
break;
case DIFFREG_ALGO_NONE:
cfg = &diff_config_no_algo;
break;
}
f1 = openfile(file1, &str1, &st1);
f2 = openfile(file2, &str2, &st2);
if (force_text)
diff_flags |= DIFF_FLAG_FORCE_TEXT_DATA;
if (ignore_whitespace)
diff_flags |= DIFF_FLAG_IGNORE_WHITESPACE;
if (show_function_prototypes)
diff_flags |= DIFF_FLAG_SHOW_PROTOTYPES;
rc = diff_atomize_file(&left, cfg, f1, str1, st1.st_size, diff_flags);
if (rc)
goto done;
rc = diff_atomize_file(&right, cfg, f2, str2, st2.st_size, diff_flags);
if (rc)
goto done;
result = diff_main(cfg, &left, &right);
#if 0
rc = diff_output_plain(stdout, &info, result);
#else
if (edscript)
rc = diff_output_edscript(NULL, stdout, &info, result);
else {
rc = diff_output_unidiff(NULL, stdout, &info, result,
context_lines);
}
#endif
done:
diff_result_free(result);
diff_data_free(&left);
diff_data_free(&right);
if (str1)
munmap(str1, st1.st_size);
if (str2)
munmap(str2, st2.st_size);
fclose(f1);
fclose(f2);
return rc;
}
FILE *
openfile(const char *path, char **p, struct stat *st)
{
FILE *f = NULL;
f = fopen(path, "r");
if (f == NULL)
err(2, "%s", path);
if (fstat(fileno(f), st) == -1)
err(2, "%s", path);
#ifndef DIFF_NO_MMAP
*p = mmap(NULL, st->st_size, PROT_READ, MAP_PRIVATE, fileno(f), 0);
if (*p == MAP_FAILED)
#endif
*p = NULL; /* fall back on file I/O */
return f;
}
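In diff.c, each diff_algo_config names an implementation, an optional permitted_state_size, an inner_algo for sub-ranges and a fallback_algo; because the configs point at each other, they are first declared without initializers. The sketch below is a simplified, hypothetical model of how such a fallback chain can be walked: an implementation returns a "use fallback" code, mirroring DIFF_RC_USE_DIFF_ALGO_FALLBACK, when its state would be too large. The n_atoms-squared cost model is purely illustrative and is not the library's actual formula.

```c
#include <stddef.h>
#include <string.h>	/* strcmp() in the checks below */

/* Hypothetical stand-ins for DIFF_RC_USE_DIFF_ALGO_FALLBACK and DIFF_RC_OK. */
#define RC_USE_FALLBACK (-1)
#define RC_OK 0

struct algo_config;
typedef int (*algo_impl)(const struct algo_config *cfg, size_t n_atoms);

/* Simplified analogue of struct diff_algo_config. */
struct algo_config {
	const char *name;
	algo_impl impl;
	size_t permitted_state_size;		/* 0: no limit */
	const struct algo_config *inner_algo;	/* not exercised here */
	const struct algo_config *fallback_algo;
};

/* Illustrative cost model only: pretend the algorithm needs n_atoms^2 ints. */
static int
bounded_algo(const struct algo_config *cfg, size_t n_atoms)
{
	size_t need = n_atoms * n_atoms * sizeof(int);

	if (cfg->permitted_state_size != 0 && need > cfg->permitted_state_size)
		return RC_USE_FALLBACK;
	return RC_OK;
}

/* Tentative definition: cfg_myers refers to cfg_divide before it is defined. */
static const struct algo_config cfg_divide;

static const struct algo_config cfg_myers = {
	.name = "myers",
	.impl = bounded_algo,
	.permitted_state_size = 1024 * 1024 * sizeof(int),
	.fallback_algo = &cfg_divide,
};

static const struct algo_config cfg_divide = {
	.name = "myers_divide",
	.impl = bounded_algo,		/* no limit, so it always succeeds */
	.inner_algo = &cfg_myers,	/* sub-ranges start from the top again */
};

/* Try each config in turn; a NULL fallback_algo ends the chain. */
static const char *
run_with_fallbacks(const struct algo_config *cfg, size_t n_atoms)
{
	for (; cfg != NULL; cfg = cfg->fallback_algo) {
		if (cfg->impl(cfg, n_atoms) == RC_OK)
			return cfg->name;
	}
	return "none";
}
```

Under this toy model, `run_with_fallbacks(&cfg_myers, 1024)` stays with "myers" while 1025 atoms push it to "myers_divide". The forward declarations are plain C tentative definitions; the same trick would not compile as C++.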

include/arraylist.h

@ -0,0 +1,121 @@
/* Auto-reallocating array for arbitrary member types. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
/* Usage:
*
* ARRAYLIST(any_type_t) list;
* // OR
* typedef ARRAYLIST(any_type_t) any_type_list_t;
* any_type_list_t list;
*
* // pass the number of (at first unused) members to add on each realloc:
* ARRAYLIST_INIT(list, 128);
* any_type_t *x;
* while (bar) {
* // This enlarges the allocated array as needed;
* // list.head may change due to realloc:
* ARRAYLIST_ADD(x, list);
* if (!x)
* return ENOMEM;
* *x = random_foo_value;
* }
* for (i = 0; i < list.len; i++)
* printf("%s", foo_to_str(list.head[i]));
* ARRAYLIST_FREE(list);
*/
#define ARRAYLIST(MEMBER_TYPE) \
struct { \
MEMBER_TYPE *head; \
MEMBER_TYPE *p; \
unsigned int len; \
unsigned int allocated; \
unsigned int alloc_blocksize; \
}
#define ARRAYLIST_INIT(ARRAY_LIST, ALLOC_BLOCKSIZE) do { \
(ARRAY_LIST).head = NULL; \
(ARRAY_LIST).len = 0; \
(ARRAY_LIST).allocated = 0; \
(ARRAY_LIST).alloc_blocksize = ALLOC_BLOCKSIZE; \
} while(0)
#define ARRAYLIST_ADD(NEW_ITEM_P, ARRAY_LIST) do { \
if ((ARRAY_LIST).len && !(ARRAY_LIST).allocated) { \
NEW_ITEM_P = NULL; \
break; \
} \
if ((ARRAY_LIST).head == NULL \
|| (ARRAY_LIST).allocated < (ARRAY_LIST).len + 1) { \
(ARRAY_LIST).p = recallocarray((ARRAY_LIST).head, \
(ARRAY_LIST).len, \
(ARRAY_LIST).allocated + \
((ARRAY_LIST).allocated ? \
(ARRAY_LIST).allocated / 2 : \
(ARRAY_LIST).alloc_blocksize ? \
(ARRAY_LIST).alloc_blocksize : 8), \
sizeof(*(ARRAY_LIST).head)); \
if ((ARRAY_LIST).p == NULL) { \
NEW_ITEM_P = NULL; \
break; \
} \
(ARRAY_LIST).allocated += \
(ARRAY_LIST).allocated ? \
(ARRAY_LIST).allocated / 2 : \
(ARRAY_LIST).alloc_blocksize ? \
(ARRAY_LIST).alloc_blocksize : 8, \
(ARRAY_LIST).head = (ARRAY_LIST).p; \
(ARRAY_LIST).p = NULL; \
}; \
if ((ARRAY_LIST).head == NULL \
|| (ARRAY_LIST).allocated < (ARRAY_LIST).len + 1) { \
NEW_ITEM_P = NULL; \
break; \
} \
(NEW_ITEM_P) = &(ARRAY_LIST).head[(ARRAY_LIST).len]; \
(ARRAY_LIST).len++; \
} while (0)
#define ARRAYLIST_INSERT(NEW_ITEM_P, ARRAY_LIST, AT_IDX) do { \
int _at_idx = (AT_IDX); \
ARRAYLIST_ADD(NEW_ITEM_P, ARRAY_LIST); \
if ((NEW_ITEM_P) \
&& _at_idx >= 0 \
&& _at_idx < (ARRAY_LIST).len) { \
memmove(&(ARRAY_LIST).head[_at_idx + 1], \
&(ARRAY_LIST).head[_at_idx], \
((ARRAY_LIST).len - 1 - _at_idx) \
* sizeof(*(ARRAY_LIST).head)); \
(NEW_ITEM_P) = &(ARRAY_LIST).head[_at_idx]; \
}; \
} while (0)
#define ARRAYLIST_CLEAR(ARRAY_LIST) \
(ARRAY_LIST).len = 0
#define ARRAYLIST_FREE(ARRAY_LIST) \
do { \
if ((ARRAY_LIST).head && (ARRAY_LIST).allocated) \
free((ARRAY_LIST).head); \
ARRAYLIST_INIT(ARRAY_LIST, (ARRAY_LIST).alloc_blocksize); \
} while(0)
#define ARRAYLIST_FOREACH(ITEM_P, ARRAY_LIST) \
for ((ITEM_P) = (ARRAY_LIST).head; \
(ITEM_P) - (ARRAY_LIST).head < (ARRAY_LIST).len; \
(ITEM_P)++)
#define ARRAYLIST_IDX(ITEM_P, ARRAY_LIST) ((ITEM_P) - (ARRAY_LIST).head)
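ARRAYLIST_ADD() grows the backing array through recallocarray(): the first allocation holds alloc_blocksize members (8 if alloc_blocksize is 0), and each later growth adds half the current capacity. The capacity rule on its own, as a hypothetical helper so the sequence can be checked without the macros:

```c
/*
 * Capacity after one growth step of ARRAYLIST_ADD(): start with
 * alloc_blocksize (8 if unset), then grow by 50% (rounded down) each time.
 */
static unsigned int
arraylist_next_capacity(unsigned int allocated, unsigned int alloc_blocksize)
{
	if (allocated)
		return allocated + allocated / 2;
	return alloc_blocksize ? alloc_blocksize : 8;
}
```

With a block size of 128 the capacities run 128, 192, 288, and so on. One consequence of the integer division: a capacity of 1 never grows (1 + 1/2 == 1), so if alloc_blocksize is set to 1, the second ARRAYLIST_ADD() appears to yield NULL; a block size of at least 2 avoids this.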

include/diff_main.h

@ -0,0 +1,264 @@
/* Generic infrastructure to implement various diff algorithms. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
struct diff_range {
int start;
int end;
};
/* List of all possible return codes of a diff invocation. */
#define DIFF_RC_USE_DIFF_ALGO_FALLBACK -1
#define DIFF_RC_OK 0
/* Any positive return values are errno values from sys/errno.h */
struct diff_atom {
struct diff_data *root; /* back pointer to root diff data */
off_t pos; /* set whether memory-mapped or not */
const uint8_t *at; /* only set if memory-mapped */
off_t len;
/* This hash is just a very cheap speed up for finding *mismatching*
* atoms. When hashes match, we still need to compare entire atoms to
* find out whether they are indeed identical or not.
* Calculated over all atom bytes with diff_atom_hash_update(). */
unsigned int hash;
};
/* Mix another atom_byte into the provided hash value and return the result.
* The hash value passed in for the first byte of the atom must be zero. */
unsigned int
diff_atom_hash_update(unsigned int hash, unsigned char atom_byte);
/* Compare two atoms. Return 0 on success, or an errno value on failure.
* On success, set *cmp to a negative value, zero, or a positive value, like
* strcmp(). */
int
diff_atom_cmp(int *cmp,
const struct diff_atom *left,
const struct diff_atom *right);
/* The atom's index in the entire file. For atoms divided by lines of text, this
* yields the line number (starting with 0). Also works for diff_data that
* reference only a subsection of a file, always reflecting the global position
* in the file (and not the relative position within the subsection). */
#define diff_atom_root_idx(DIFF_DATA, ATOM) \
((ATOM) && ((ATOM) >= (DIFF_DATA)->root->atoms.head) \
? (unsigned int)((ATOM) - ((DIFF_DATA)->root->atoms.head)) \
: (DIFF_DATA)->root->atoms.len)
/* The atom's index within DIFF_DATA. For atoms divided by lines of text, this
* yields the line number (starting with 0). */
#define diff_atom_idx(DIFF_DATA, ATOM) \
((ATOM) && ((ATOM) >= (DIFF_DATA)->atoms.head) \
? (unsigned int)((ATOM) - ((DIFF_DATA)->atoms.head)) \
: (DIFF_DATA)->atoms.len)
#define foreach_diff_atom(ATOM, FIRST_ATOM, COUNT) \
for ((ATOM) = (FIRST_ATOM); \
(ATOM) \
&& ((ATOM) >= (FIRST_ATOM)) \
&& ((ATOM) - (FIRST_ATOM) < (COUNT)); \
(ATOM)++)
#define diff_data_foreach_atom(ATOM, DIFF_DATA) \
foreach_diff_atom(ATOM, (DIFF_DATA)->atoms.head, (DIFF_DATA)->atoms.len)
#define diff_data_foreach_atom_from(FROM, ATOM, DIFF_DATA) \
for ((ATOM) = (FROM); \
(ATOM) \
&& ((ATOM) >= (DIFF_DATA)->atoms.head) \
&& ((ATOM) - (DIFF_DATA)->atoms.head < (DIFF_DATA)->atoms.len); \
(ATOM)++)
#define diff_data_foreach_atom_backwards_from(FROM, ATOM, DIFF_DATA) \
for ((ATOM) = (FROM); \
(ATOM) \
&& ((ATOM) >= (DIFF_DATA)->atoms.head) \
&& ((ATOM) - (DIFF_DATA)->atoms.head >= 0); \
(ATOM)--)
/* For each file, there is a "root" struct diff_data referencing the entire
* file, which the atoms are parsed from. During recursion of the diff
* algorithm, there may be "child" struct diff_data referencing only a
* subsection of the file, re-using the parsed atoms. For "root" structs,
* atoms.allocated will be nonzero, indicating that the array of atoms is owned
* by that struct. For "child" structs, atoms.allocated == 0, to indicate that
* the struct is referencing a subset of atoms. */
struct diff_data {
FILE *f; /* if root diff_data and not memory-mapped */
off_t pos; /* if not memory-mapped */
const uint8_t *data; /* if memory-mapped */
off_t len;
int atomizer_flags;
ARRAYLIST(struct diff_atom) atoms;
struct diff_data *root;
struct diff_data *current;
void *algo_data;
int diff_flags;
int err;
};
/* Flags set by file atomizer. */
#define DIFF_ATOMIZER_FOUND_BINARY_DATA 0x00000001
/* Flags set by caller of diff_main(). */
#define DIFF_FLAG_IGNORE_WHITESPACE 0x00000001
#define DIFF_FLAG_SHOW_PROTOTYPES 0x00000002
#define DIFF_FLAG_FORCE_TEXT_DATA 0x00000004
void diff_data_free(struct diff_data *diff_data);
struct diff_chunk;
typedef ARRAYLIST(struct diff_chunk) diff_chunk_arraylist_t;
struct diff_result {
int rc;
/*
* Pointers to diff data passed in via diff_main.
* Do not free these diff_data before freeing the diff_result struct.
*/
struct diff_data *left;
struct diff_data *right;
diff_chunk_arraylist_t chunks;
};
enum diff_chunk_type {
CHUNK_EMPTY,
CHUNK_PLUS,
CHUNK_MINUS,
CHUNK_SAME,
CHUNK_ERROR,
};
enum diff_chunk_type diff_chunk_type(const struct diff_chunk *c);
struct diff_state;
/* Signature of a utility function to divide a file into diff atoms.
* An example is diff_atomize_text_by_line() in diff_atomize_text.c.
*
* func_data: context pointer (free to be used by implementation).
* d: struct diff_data with d->data and d->len already set up, and
* d->atoms to be created and d->atomizer_flags to be set up.
*/
typedef int (*diff_atomize_func_t)(void *func_data, struct diff_data *d);
extern int diff_atomize_text_by_line(void *func_data, struct diff_data *d);
struct diff_algo_config;
typedef int (*diff_algo_impl_t)(
const struct diff_algo_config *algo_config, struct diff_state *state);
/* Form a result with all left-side removed and all right-side added, i.e. no
* actual diff algorithm involved. */
int diff_algo_none(const struct diff_algo_config *algo_config,
struct diff_state *state);
/* Myers Diff tracing from the start all the way through to the end, requiring
* quadratic amounts of memory. This can fail if the required space surpasses
* algo_config->permitted_state_size. */
extern int diff_algo_myers(const struct diff_algo_config *algo_config,
struct diff_state *state);
/* Myers "Divide et Impera": tracing forwards from the start and backwards from
* the end to find a midpoint that divides the problem into smaller chunks.
* Requires only linear amounts of memory. */
extern int diff_algo_myers_divide(
const struct diff_algo_config *algo_config, struct diff_state *state);
/* Patience Diff algorithm, which divides a larger diff into smaller chunks. For
* very specific scenarios, it may lead to a complete diff result by itself, but
* needs a fallback algo to solve chunks that don't have common-unique atoms. */
extern int diff_algo_patience(
const struct diff_algo_config *algo_config, struct diff_state *state);
/* Diff algorithms to use, possibly nested. For example:
*
* struct diff_algo_config myers, patience, myers_divide;
*
* myers = (struct diff_algo_config){
* .impl = diff_algo_myers,
* .permitted_state_size = 32 * 1024 * 1024,
* // When too large, do diff_algo_patience:
* .fallback_algo = &patience,
* };
*
* patience = (struct diff_algo_config){
* .impl = diff_algo_patience,
* // After subdivision, do Patience again:
* .inner_algo = &patience,
* // If subdivision failed, do Myers Divide et Impera:
* .fallback_algo = &myers_divide,
* };
*
* myers_divide = (struct diff_algo_config){
* .impl = diff_algo_myers_divide,
* // When division succeeded, start from the top:
* .inner_algo = &myers,
* // (fallback_algo = NULL implies diff_algo_none).
* };
* struct diff_config config = {
* .algo = &myers,
* ...
* };
* diff_main(&config, ...);
*/
struct diff_algo_config {
diff_algo_impl_t impl;
/* Fail this algo if it would use more than this amount of memory, and
* instead use fallback_algo (this applies e.g. to diff_algo_myers).
* permitted_state_size == 0 means no limitation. */
size_t permitted_state_size;
/* For algorithms that divide into smaller chunks, use this algorithm to
* solve the divided chunks. */
const struct diff_algo_config *inner_algo;
/* If the algorithm fails (e.g. diff_algo_myers needs too large a state,
* or diff_algo_patience can't find any common-unique atoms), then use this
* algorithm instead. */
const struct diff_algo_config *fallback_algo;
};
struct diff_config {
diff_atomize_func_t atomize_func;
void *atomize_func_data;
const struct diff_algo_config *algo;
/* How deep to step into subdivisions of a source file, a paranoia /
* safety measure to guard against infinite loops through diff
* algorithms. When the maximum recursion is reached, employ
* diff_algo_none (i.e. remove all left atoms and add all right atoms).
*/
unsigned int max_recursion_depth;
};
int diff_atomize_file(struct diff_data *d, const struct diff_config *config,
FILE *f, const uint8_t *data, off_t len, int diff_flags);
struct diff_result *diff_main(const struct diff_config *config,
struct diff_data *left,
struct diff_data *right);
void diff_result_free(struct diff_result *result);
int diff_result_contains_printable_chunks(struct diff_result *result);

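The comments on struct diff_atom describe the hash as a cheap filter: differing hashes prove that two atoms differ, while equal hashes still require a full comparison. The sketch below shows why, using the same mixing step as diff_atom_hash_update() in lib/diff_atomize_text.c (`hash * 23 + byte`, starting from zero). `line_hash()` and `lines_same()` are hypothetical helpers. The library computes the hash once per atom while atomizing instead of recomputing it, and its full comparison goes through diff_atom_cmp().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same mixing step as diff_atom_hash_update(); starts from zero. */
static unsigned int
line_hash(const unsigned char *s, size_t len)
{
	unsigned int hash = 0;
	size_t i;

	for (i = 0; i < len; i++)
		hash = hash * 23 + s[i];
	return hash;
}

/* Hypothetical helper: reject on hash mismatch, otherwise fall back to a
 * full byte comparison, because equal hashes do not prove equal content. */
static bool
lines_same(const unsigned char *a, size_t alen,
    const unsigned char *b, size_t blen)
{
	if (line_hash(a, alen) != line_hash(b, blen))
		return false;
	return alen == blen && memcmp(a, b, alen) == 0;
}
```

For example, "ab" and "bK" both hash to 2329 (97 * 23 + 98 and 98 * 23 + 75), so only the memcmp() step tells them apart.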
112
include/diff_output.h Normal file

@ -0,0 +1,112 @@
/* Diff output generators and invocation shims. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
struct diff_input_info {
const char *left_path;
const char *right_path;
/* Set by caller of diff_output_* functions. */
int flags;
#define DIFF_INPUT_LEFT_NONEXISTENT 0x00000001
#define DIFF_INPUT_RIGHT_NONEXISTENT 0x00000002
};
struct diff_output_info {
/*
* Byte offset to each line in the generated output file.
* The total number of lines in the file is line_offsets.len - 1.
* The last offset in this array corresponds to end-of-file.
*/
ARRAYLIST(off_t) line_offsets;
/*
* Type (i.e., context, minus, plus) of each line generated by the diff.
* NB: values 0x00 to 0x3b are reserved for client-defined line types.
*/
ARRAYLIST(uint8_t) line_types;
#define DIFF_LINE_HUNK 0x3c
#define DIFF_LINE_MINUS 0x3d
#define DIFF_LINE_PLUS 0x3e
#define DIFF_LINE_CONTEXT 0x3f
#define DIFF_LINE_NONE 0x40 /* binary or no EOF newline msg, etc. */
};
void diff_output_info_free(struct diff_output_info *output_info);
struct diff_chunk_context {
struct diff_range chunk;
struct diff_range left, right;
};
int diff_output_plain(struct diff_output_info **output_info, FILE *dest,
const struct diff_input_info *info,
const struct diff_result *result,
int hunk_headers_only);
int diff_output_unidiff(struct diff_output_info **output_info,
FILE *dest, const struct diff_input_info *info,
const struct diff_result *result,
unsigned int context_lines);
int diff_output_edscript(struct diff_output_info **output_info,
FILE *dest, const struct diff_input_info *info,
const struct diff_result *result);
int diff_chunk_get_left_start(const struct diff_chunk *c,
const struct diff_result *r,
int context_lines);
int diff_chunk_get_left_end(const struct diff_chunk *c,
const struct diff_result *r,
int context_lines);
int diff_chunk_get_right_start(const struct diff_chunk *c,
const struct diff_result *r,
int context_lines);
int diff_chunk_get_right_end(const struct diff_chunk *c,
const struct diff_result *r,
int context_lines);
off_t diff_chunk_get_left_start_pos(const struct diff_chunk *c);
off_t diff_chunk_get_right_start_pos(const struct diff_chunk *c);
struct diff_chunk *diff_chunk_get(const struct diff_result *r, int chunk_idx);
int diff_chunk_get_left_count(struct diff_chunk *c);
int diff_chunk_get_right_count(struct diff_chunk *c);
void diff_chunk_context_get(struct diff_chunk_context *cc,
const struct diff_result *r,
int chunk_idx, int context_lines);
void diff_chunk_context_load_change(struct diff_chunk_context *cc,
int *nchunks_used,
struct diff_result *result,
int start_chunk_idx,
int context_lines);
struct diff_output_unidiff_state;
struct diff_output_unidiff_state *diff_output_unidiff_state_alloc(void);
void diff_output_unidiff_state_reset(struct diff_output_unidiff_state *state);
void diff_output_unidiff_state_free(struct diff_output_unidiff_state *state);
int diff_output_unidiff_chunk(struct diff_output_info **output_info, FILE *dest,
struct diff_output_unidiff_state *state,
const struct diff_input_info *info,
const struct diff_result *result,
const struct diff_chunk_context *cc);
int diff_output_chunk_left_version(struct diff_output_info **output_info,
FILE *dest,
const struct diff_input_info *info,
const struct diff_result *result,
const struct diff_chunk_context *cc);
int diff_output_chunk_right_version(struct diff_output_info **output_info,
FILE *dest,
const struct diff_input_info *info,
const struct diff_result *result,
const struct diff_chunk_context *cc);
const char *diff_output_get_label_left(const struct diff_input_info *info);
const char *diff_output_get_label_right(const struct diff_input_info *info);

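struct diff_output_info documents line_offsets as the byte offset of every generated line plus one final entry marking end-of-file, so the number of lines is line_offsets.len - 1. The following is a minimal sketch of building an array with that shape from an output buffer, assuming '\n' line endings. `collect_line_offsets()` is a hypothetical helper and is not taken from the library.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: store the offset of each line start in offsets[],
 * followed by an end-of-file offset. Returns the number of entries written
 * (at most max); the line count is that value minus one. */
static size_t
collect_line_offsets(const char *buf, size_t len, off_t *offsets, size_t max)
{
	size_t n = 0, i;

	if (max == 0)
		return 0;
	offsets[n++] = 0;	/* the first line starts at offset 0 */
	for (i = 0; i < len && n < max; i++) {
		if (buf[i] == '\n')
			offsets[n++] = (off_t)(i + 1);
	}
	/* A final line without '\n' still needs its end-of-file entry. */
	if (len > 0 && buf[len - 1] != '\n' && n < max)
		offsets[n++] = (off_t)len;
	return n;
}
```

For "a\nbc\n" this yields {0, 2, 5}: two lines, with 5 marking end-of-file.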
32
lib/GNUmakefile Normal file

@ -0,0 +1,32 @@
SRCS = \
diff_atomize_text.c \
diff_main.c \
diff_myers.c \
diff_patience.c \
diff_output.c \
diff_output_plain.c \
diff_output_unidiff.c \
diff_output_edscript.c \
$(END)
# Compat sources
VPATH= $(CURDIR)/../compat
SRCS+= getprogname_linux.c reallocarray.c recallocarray.c merge.c \
strlcat.c
CFLAGS+= -I$(CURDIR)/../compat/include
OBJS = $(SRCS:.c=.o)
libdiff.a: $(OBJS)
ar rcs $@ $^
CFLAGS += -fsanitize=address -fsanitize=undefined -g -O3
CFLAGS += -Wstrict-prototypes -Wunused-variable -Wuninitialized
%.o: %.c ./*.h ../include/*.h
gcc $(CFLAGS) -I../include -o $@ -c $<
.PHONY: clean
clean:
-rm $(OBJS)
-rm libdiff.a

197
lib/diff_atomize_text.c Normal file

@ -0,0 +1,197 @@
/* Split source by line breaks, and calculate a simplistic checksum. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <ctype.h>
#include <arraylist.h>
#include <diff_main.h>
#include "diff_internal.h"
#include "diff_debug.h"
unsigned int
diff_atom_hash_update(unsigned int hash, unsigned char atom_byte)
{
return hash * 23 + atom_byte;
}
static int
diff_data_atomize_text_lines_fd(struct diff_data *d)
{
off_t pos = 0;
const off_t end = pos + d->len;
unsigned int array_size_estimate = d->len / 50;
unsigned int pow2 = 1;
bool ignore_whitespace = (d->diff_flags & DIFF_FLAG_IGNORE_WHITESPACE);
bool embedded_nul = false;
while (array_size_estimate >>= 1)
pow2++;
ARRAYLIST_INIT(d->atoms, 1 << pow2);
if (fseek(d->root->f, 0L, SEEK_SET) == -1)
return errno;
while (pos < end) {
off_t line_end = pos;
unsigned int hash = 0;
unsigned char buf[512];
size_t r, i;
struct diff_atom *atom;
int eol = 0;
while (eol == 0 && line_end < end) {
r = fread(buf, sizeof(char), sizeof(buf), d->root->f);
if (r == 0 && ferror(d->root->f))
return EIO;
i = 0;
while (eol == 0 && i < r) {
if (buf[i] != '\r' && buf[i] != '\n') {
if (!ignore_whitespace
|| !isspace((unsigned char)buf[i]))
hash = diff_atom_hash_update(
hash, buf[i]);
if (buf[i] == '\0')
embedded_nul = true;
line_end++;
} else
eol = buf[i];
i++;
}
}
/* When not at the end of data, the line ending char ('\r' or
* '\n') must follow */
if (line_end < end)
line_end++;
/* If that was an '\r', also pull in any following '\n' */
if (line_end < end && eol == '\r') {
if (fseeko(d->root->f, line_end, SEEK_SET) == -1)
return errno;
r = fread(buf, sizeof(char), sizeof(buf), d->root->f);
if (r == 0 && ferror(d->root->f))
return EIO;
if (r > 0 && buf[0] == '\n')
line_end++;
}
/* Record the found line as diff atom */
ARRAYLIST_ADD(atom, d->atoms);
if (!atom)
return ENOMEM;
*atom = (struct diff_atom){
.root = d,
.pos = pos,
.at = NULL, /* atom data is not memory-mapped */
.len = line_end - pos,
.hash = hash,
};
/* Starting point for next line: */
pos = line_end;
if (fseeko(d->root->f, pos, SEEK_SET) == -1)
return errno;
}
/* Files are considered binary if they contain embedded '\0' bytes. */
if (embedded_nul)
d->atomizer_flags |= DIFF_ATOMIZER_FOUND_BINARY_DATA;
return DIFF_RC_OK;
}
static int
diff_data_atomize_text_lines_mmap(struct diff_data *d)
{
const uint8_t *pos = d->data;
const uint8_t *end = pos + d->len;
bool ignore_whitespace = (d->diff_flags & DIFF_FLAG_IGNORE_WHITESPACE);
bool embedded_nul = false;
unsigned int array_size_estimate = d->len / 50;
unsigned int pow2 = 1;
while (array_size_estimate >>= 1)
pow2++;
ARRAYLIST_INIT(d->atoms, 1 << pow2);
while (pos < end) {
const uint8_t *line_end = pos;
unsigned int hash = 0;
while (line_end < end && *line_end != '\r' && *line_end != '\n') {
if (!ignore_whitespace
|| !isspace((unsigned char)*line_end))
hash = diff_atom_hash_update(hash, *line_end);
if (*line_end == '\0')
embedded_nul = true;
line_end++;
}
/* When not at the end of data, the line ending char ('\r' or
* '\n') must follow */
if (line_end < end && *line_end == '\r')
line_end++;
if (line_end < end && *line_end == '\n')
line_end++;
/* Record the found line as diff atom */
struct diff_atom *atom;
ARRAYLIST_ADD(atom, d->atoms);
if (!atom)
return ENOMEM;
*atom = (struct diff_atom){
.root = d,
.pos = (off_t)(pos - d->data),
.at = pos,
.len = line_end - pos,
.hash = hash,
};
/* Starting point for next line: */
pos = line_end;
}
/* Files are considered binary if they contain embedded '\0' bytes. */
if (embedded_nul)
d->atomizer_flags |= DIFF_ATOMIZER_FOUND_BINARY_DATA;
return DIFF_RC_OK;
}
static int
diff_data_atomize_text_lines(struct diff_data *d)
{
if (d->data == NULL)
return diff_data_atomize_text_lines_fd(d);
else
return diff_data_atomize_text_lines_mmap(d);
}
int
diff_atomize_text_by_line(void *func_data, struct diff_data *d)
{
return diff_data_atomize_text_lines(d);
}

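diff_data_atomize_text_lines_mmap() ends a line at '\n', '\r', or "\r\n", hashes only the bytes before the line ending, and flags the data as binary when it encounters an embedded NUL. The following simplified, self-contained sketch runs that loop over a plain buffer. `struct line_atom` and `split_lines()` are hypothetical, and the DIFF_FLAG_IGNORE_WHITESPACE handling (skipping isspace() bytes when hashing) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one line; the real code fills struct diff_atom. */
struct line_atom {
	size_t pos;		/* byte offset of the line start */
	size_t len;		/* length including the line ending */
	unsigned int hash;	/* over the line content, excluding the ending */
};

/* Split buf into at most max lines ending in "\n", "\r", or "\r\n".
 * Sets *binary if a NUL byte is seen. Returns the number of atoms. */
static size_t
split_lines(const unsigned char *buf, size_t len, struct line_atom *atoms,
    size_t max, bool *binary)
{
	size_t pos = 0, n = 0;

	*binary = false;
	while (pos < len && n < max) {
		size_t end = pos;
		unsigned int hash = 0;

		while (end < len && buf[end] != '\r' && buf[end] != '\n') {
			hash = hash * 23 + buf[end];
			if (buf[end] == '\0')
				*binary = true;
			end++;
		}
		/* Pull in the line ending: '\r', '\n', or both. */
		if (end < len && buf[end] == '\r')
			end++;
		if (end < len && buf[end] == '\n')
			end++;
		atoms[n].pos = pos;
		atoms[n].len = end - pos;
		atoms[n].hash = hash;
		n++;
		pos = end;	/* the next line starts right after */
	}
	return n;
}
```

Each atom's length includes its line ending, so the atoms cover the buffer without gaps. For example, "ab\r\ncd\nx" yields atoms of length 4, 3, and 1.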
226
lib/diff_debug.h Normal file

@ -0,0 +1,226 @@
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#define DEBUG 0
#if DEBUG
#include <stdio.h>
#include <unistd.h>
#define print(args...) fprintf(stderr, ##args)
#define debug print
#define debug_dump dump
#define debug_dump_atom dump_atom
#define debug_dump_atoms dump_atoms
static inline void
print_atom_byte(unsigned char c) {
if (c == '\r')
print("\\r");
else if (c == '\n')
print("\\n");
else if ((c < 32 || c >= 127) && (c != '\t'))
print("\\x%02x", c);
else
print("%c", c);
}
static inline void
dump_atom(const struct diff_data *left, const struct diff_data *right,
const struct diff_atom *atom)
{
if (!atom) {
print("NULL atom\n");
return;
}
if (left)
print(" %3u '", diff_atom_root_idx(left, atom));
if (atom->at == NULL) {
off_t remain = atom->len;
if (fseek(atom->root->f, atom->pos, SEEK_SET) == -1)
abort(); /* cannot return error */
while (remain > 0) {
char buf[16];
size_t r;
int i;
r = fread(buf, 1, MIN(remain, sizeof(buf)),
atom->root->f);
if (r == 0)
break;
remain -= r;
for (i = 0; i < r; i++)
print_atom_byte(buf[i]);
}
} else {
const char *s;
for (s = atom->at; s < (const char*)(atom->at + atom->len); s++)
print_atom_byte(*s);
}
print("'\n");
}
static inline void
dump_atoms(const struct diff_data *d, struct diff_atom *atom,
unsigned int count)
{
if (count > 42) {
dump_atoms(d, atom, 20);
print("[%u lines skipped]\n", count - 20 - 20);
dump_atoms(d, atom + count - 20, 20);
return;
} else {
struct diff_atom *i;
foreach_diff_atom(i, atom, count) {
dump_atom(d, NULL, i);
}
}
}
static inline void
dump(struct diff_data *d)
{
dump_atoms(d, d->atoms.head, d->atoms.len);
}
/* kd is a quadratic space myers matrix from the original Myers algorithm.
* kd_forward and kd_backward are linear slices of a myers matrix from the Myers
* Divide algorithm.
*/
static inline void
dump_myers_graph(const struct diff_data *l, const struct diff_data *r,
int *kd, int *kd_forward, int kd_forward_d,
int *kd_backward, int kd_backward_d)
{
#define COLOR_YELLOW "\033[1;33m"
#define COLOR_GREEN "\033[1;32m"
#define COLOR_BLUE "\033[1;34m"
#define COLOR_RED "\033[1;31m"
#define COLOR_END "\033[0;m"
int x;
int y;
print(" ");
for (x = 0; x <= l->atoms.len; x++)
print("%2d", x % 100);
print("\n");
for (y = 0; y <= r->atoms.len; y++) {
print("%3d ", y);
for (x = 0; x <= l->atoms.len; x++) {
/* print d advancements from kd, if any. */
char label = 'o';
char *color = NULL;
if (kd) {
int max = l->atoms.len + r->atoms.len;
size_t kd_len = max + 1 + max;
int *kd_pos = kd;
int di;
#define xk_to_y(X, K) ((X) - (K))
for (di = 0; di < max; di++) {
int ki;
for (ki = di; ki >= -di; ki -= 2) {
if (x != kd_pos[ki]
|| y != xk_to_y(x, ki))
continue;
label = '0' + (di % 10);
color = COLOR_YELLOW;
break;
}
if (label != 'o')
break;
kd_pos += kd_len;
}
}
if (kd_forward && kd_forward_d >= 0) {
#define xc_to_y(X, C, DELTA) ((X) - (C) + (DELTA))
int ki;
for (ki = kd_forward_d;
ki >= -kd_forward_d;
ki -= 2) {
if (x != kd_forward[ki])
continue;
if (y != xk_to_y(x, ki))
continue;
label = 'F';
color = COLOR_GREEN;
break;
}
}
if (kd_backward && kd_backward_d >= 0) {
int delta = (int)r->atoms.len
- (int)l->atoms.len;
int ki;
for (ki = kd_backward_d;
ki >= -kd_backward_d;
ki -= 2) {
if (x != kd_backward[ki])
continue;
if (y != xc_to_y(x, ki, delta))
continue;
if (label == 'o') {
label = 'B';
color = COLOR_BLUE;
} else {
label = 'X';
color = COLOR_RED;
}
break;
}
}
if (color)
print("%s", color);
print("%c", label);
if (color)
print("%s", COLOR_END);
if (x < l->atoms.len)
print("-");
}
print("\n");
if (y == r->atoms.len)
break;
print(" ");
for (x = 0; x < l->atoms.len; x++) {
bool same;
diff_atom_same(&same, &l->atoms.head[x],
&r->atoms.head[y]);
if (same)
print("|\\");
else
print("| ");
}
print("|\n");
}
}
static inline void
debug_dump_myers_graph(const struct diff_data *l, const struct diff_data *r,
int *kd, int *kd_forward, int kd_forward_d,
int *kd_backward, int kd_backward_d)
{
if (l->atoms.len > 99 || r->atoms.len > 99)
return;
dump_myers_graph(l, r, kd, kd_forward, kd_forward_d,
kd_backward, kd_backward_d);
}
#else
#define debug(args...)
#define debug_dump(args...)
#define debug_dump_atom(args...)
#define debug_dump_atoms(args...)
#define debug_dump_myers_graph(args...)
#endif

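dump_myers_graph() maps a diagonal index k and an x coordinate to y = x - k (xk_to_y()). The greedy forward pass of Myers' O(ND) algorithm uses the same convention and records, for each diagonal, the furthest x reached with d edits. The sketch below computes only the edit distance over single characters. It is a textbook illustration, not a copy of diff_algo_myers(), which keeps state for every d (the quadratic kd buffer described in diff_main.h) so that it can produce chunks. `myers_distance()` is a hypothetical name.

```c
#include <stdlib.h>

/* Greedy forward Myers pass: return the minimal number of deletions plus
 * insertions that turn a into b, or -1 on allocation failure.
 * v[k + max] is the furthest x reached on diagonal k, and y = x - k. */
static int
myers_distance(const char *a, int n, const char *b, int m)
{
	int max = n + m;
	int *v = calloc(2 * (size_t)max + 2, sizeof(*v));
	int d, k;

	if (v == NULL)
		return -1;
	for (d = 0; d <= max; d++) {
		for (k = -d; k <= d; k += 2) {
			int x, y;

			/* Come down from diagonal k + 1 (insertion) or
			 * right from diagonal k - 1 (deletion). */
			if (k == -d || (k != d && v[k - 1 + max] < v[k + 1 + max]))
				x = v[k + 1 + max];
			else
				x = v[k - 1 + max] + 1;
			y = x - k;
			/* Follow the diagonal while the atoms match. */
			while (x < n && y < m && a[x] == b[y]) {
				x++;
				y++;
			}
			v[k + max] = x;
			if (x >= n && y >= m) {
				free(v);
				return d;
			}
		}
	}
	free(v);
	return -1;	/* not reached: d == max always suffices */
}
```

For the classic pair "ABCABBA" and "CBABAC" it returns 5: three deletions and two insertions around a common subsequence of length 4.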
157
lib/diff_internal.h Normal file

@ -0,0 +1,157 @@
/* Generic infrastructure to implement various diff algorithms. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#ifndef MAX
#define MAX(A,B) ((A)>(B)?(A):(B))
#endif
#ifndef MIN
#define MIN(A,B) ((A)<(B)?(A):(B))
#endif
static inline bool
diff_range_empty(const struct diff_range *r)
{
return r->start == r->end;
}
static inline bool
diff_ranges_touch(const struct diff_range *a, const struct diff_range *b)
{
return (a->end >= b->start) && (a->start <= b->end);
}
static inline void
diff_ranges_merge(struct diff_range *a, const struct diff_range *b)
{
*a = (struct diff_range){
.start = MIN(a->start, b->start),
.end = MAX(a->end, b->end),
};
}
static inline int
diff_range_len(const struct diff_range *r)
{
if (!r)
return 0;
return r->end - r->start;
}
/* Indicate whether two given diff atoms match. */
int
diff_atom_same(bool *same,
const struct diff_atom *left,
const struct diff_atom *right);
/* A diff chunk represents a set of atoms on the left and/or a set of atoms on
* the right.
*
* If solved == false:
* The diff algorithm has divided the source file, and this is a chunk that the
* inner_algo should run on next.
* The lines on the left should be diffed against the lines on the right.
* (If there are no left lines or no right lines, it implies solved == true,
* because there is nothing to diff.)
*
* If solved == true:
* If there are only left atoms, it is a chunk removing atoms from the left ("a
* minus chunk").
* If there are only right atoms, it is a chunk adding atoms from the right ("a
* plus chunk").
* If there are both left and right lines, it is a chunk of equal content on
* both sides, and left_count == right_count:
*
* - foo }
* - bar }-- diff_chunk{ left_start = &left.atoms.head[0], left_count = 3,
* - baz } right_start = NULL, right_count = 0 }
* moo }
* goo }-- diff_chunk{ left_start = &left.atoms.head[3], left_count = 3,
* zoo } right_start = &right.atoms.head[0], right_count = 3 }
* +loo }
* +roo }-- diff_chunk{ left_start = NULL, left_count = 0,
* +too } right_start = &right.atoms.head[3], right_count = 3 }
*
*/
struct diff_chunk {
bool solved;
struct diff_atom *left_start;
unsigned int left_count;
struct diff_atom *right_start;
unsigned int right_count;
};
#define DIFF_RESULT_ALLOC_BLOCKSIZE 128
struct diff_chunk_context;
bool
diff_chunk_context_empty(const struct diff_chunk_context *cc);
bool
diff_chunk_contexts_touch(const struct diff_chunk_context *cc,
const struct diff_chunk_context *other);
void
diff_chunk_contexts_merge(struct diff_chunk_context *cc,
const struct diff_chunk_context *other);
struct diff_state {
/* The final result passed to the original diff caller. */
struct diff_result *result;
/* The root diff_data is in result->left,right, these are (possibly)
* subsections of the root data. */
struct diff_data left;
struct diff_data right;
unsigned int recursion_depth_left;
/* Remaining chunks from one diff algorithm pass, if any solved == false
* chunks came up. */
diff_chunk_arraylist_t temp_result;
/* State buffer used by Myers algorithm. */
int *kd_buf;
size_t kd_buf_size; /* in units of sizeof(int), not bytes */
};
struct diff_chunk *diff_state_add_chunk(struct diff_state *state, bool solved,
struct diff_atom *left_start,
unsigned int left_count,
struct diff_atom *right_start,
unsigned int right_count);
struct diff_output_info;
int diff_output_lines(struct diff_output_info *output_info, FILE *dest,
const char *prefix, struct diff_atom *start_atom,
unsigned int count);
int diff_output_trailing_newline_msg(struct diff_output_info *outinfo,
FILE *dest,
const struct diff_chunk *c);
#define DIFF_FUNCTION_CONTEXT_SIZE 55
int diff_output_match_function_prototype(char *prototype, size_t prototype_size,
int *last_prototype_idx,
const struct diff_result *result,
const struct diff_chunk_context *cc);
struct diff_output_info *diff_output_info_alloc(void);
void
diff_data_init_subsection(struct diff_data *d, struct diff_data *parent,
struct diff_atom *from_atom, unsigned int atoms_count);

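struct diff_range is half-open: diff_range_len() returns end - start, and diff_range_empty() tests start == end. diff_ranges_touch() therefore treats ranges that merely share an endpoint as touching, and diff_ranges_merge() replaces the first range with the smallest range covering both. The following is a self-contained sketch of the same logic, with the hypothetical names `struct range`, `ranges_touch()`, and `ranges_merge()`.

```c
#include <stdbool.h>

/* Half-open atom index range [start, end), as in struct diff_range. */
struct range {
	int start;
	int end;
};

/* True if a and b overlap or are adjacent, like diff_ranges_touch(). */
static bool
ranges_touch(const struct range *a, const struct range *b)
{
	return a->end >= b->start && a->start <= b->end;
}

/* Grow a to cover b as well, like diff_ranges_merge(). */
static void
ranges_merge(struct range *a, const struct range *b)
{
	if (b->start < a->start)
		a->start = b->start;
	if (b->end > a->end)
		a->end = b->end;
}
```

So [0, 3) and [3, 5) touch and merge into [0, 5), while [0, 5) and [6, 8) stay separate.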
663
lib/diff_main.c Normal file

@ -0,0 +1,663 @@
/* Generic infrastructure to implement various diff algorithms (implementation). */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <sys/queue.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
#include <unistd.h>
#include <assert.h>
#include <arraylist.h>
#include <diff_main.h>
#include "diff_internal.h"
#include "diff_debug.h"
inline enum diff_chunk_type
diff_chunk_type(const struct diff_chunk *chunk)
{
if (!chunk->left_count && !chunk->right_count)
return CHUNK_EMPTY;
if (!chunk->solved)
return CHUNK_ERROR;
if (!chunk->right_count)
return CHUNK_MINUS;
if (!chunk->left_count)
return CHUNK_PLUS;
if (chunk->left_count != chunk->right_count)
return CHUNK_ERROR;
return CHUNK_SAME;
}
static int
read_at(FILE *f, off_t at_pos, unsigned char *buf, size_t len)
{
int r;
if (fseeko(f, at_pos, SEEK_SET) == -1)
return errno;
r = fread(buf, sizeof(char), len, f);
if ((r == 0 || r < len) && ferror(f))
return EIO;
if (r != len)
return EIO;
return 0;
}
static int
buf_cmp(const unsigned char *left, size_t left_len,
const unsigned char *right, size_t right_len,
bool ignore_whitespace)
{
int cmp;
if (ignore_whitespace) {
int il = 0, ir = 0;
while (il < left_len && ir < right_len) {
unsigned char cl = left[il];
unsigned char cr = right[ir];
if (isspace((unsigned char)cl) && il < left_len) {
il++;
continue;
}
if (isspace((unsigned char)cr) && ir < right_len) {
ir++;
continue;
}
if (cl > cr)
return 1;
if (cr > cl)
return -1;
il++;
ir++;
}
while (il < left_len) {
unsigned char cl = left[il++];
if (!isspace((unsigned char)cl))
return 1;
}
while (ir < right_len) {
unsigned char cr = right[ir++];
if (!isspace((unsigned char)cr))
return -1;
}
return 0;
}
cmp = memcmp(left, right, MIN(left_len, right_len));
if (cmp)
return cmp;
if (left_len == right_len)
return 0;
return (left_len > right_len) ? 1 : -1;
}
int
diff_atom_cmp(int *cmp,
const struct diff_atom *left,
const struct diff_atom *right)
{
off_t remain_left, remain_right;
int flags = (left->root->diff_flags | right->root->diff_flags);
bool ignore_whitespace = (flags & DIFF_FLAG_IGNORE_WHITESPACE);
if (!left->len && !right->len) {
*cmp = 0;
return 0;
}
if (!ignore_whitespace) {
if (!right->len) {
*cmp = 1;
return 0;
}
if (!left->len) {
*cmp = -1;
return 0;
}
}
if (left->at != NULL && right->at != NULL) {
*cmp = buf_cmp(left->at, left->len, right->at, right->len,
ignore_whitespace);
return 0;
}
remain_left = left->len;
remain_right = right->len;
while (remain_left > 0 || remain_right > 0) {
const size_t chunksz = 8192;
unsigned char buf_left[chunksz], buf_right[chunksz];
const uint8_t *p_left, *p_right;
off_t n_left, n_right;
ssize_t r;
if (!remain_right) {
*cmp = 1;
return 0;
}
if (!remain_left) {
*cmp = -1;
return 0;
}
n_left = MIN(chunksz, remain_left);
n_right = MIN(chunksz, remain_right);
if (left->at == NULL) {
r = read_at(left->root->f,
left->pos + (left->len - remain_left),
buf_left, n_left);
if (r) {
*cmp = 0;
return r;
}
p_left = buf_left;
} else {
p_left = left->at + (left->len - remain_left);
}
if (right->at == NULL) {
r = read_at(right->root->f,
right->pos + (right->len - remain_right),
buf_right, n_right);
if (r) {
*cmp = 0;
return r;
}
p_right = buf_right;
} else {
p_right = right->at + (right->len - remain_right);
}
r = buf_cmp(p_left, n_left, p_right, n_right,
ignore_whitespace);
if (r) {
*cmp = r;
return 0;
}
remain_left -= n_left;
remain_right -= n_right;
}
*cmp = 0;
return 0;
}
int
diff_atom_same(bool *same,
const struct diff_atom *left,
const struct diff_atom *right)
{
int cmp;
int r;
if (left->hash != right->hash) {
*same = false;
return 0;
}
r = diff_atom_cmp(&cmp, left, right);
if (r) {
*same = true;
return r;
}
*same = (cmp == 0);
return 0;
}
static struct diff_chunk *
diff_state_add_solved_chunk(struct diff_state *state,
const struct diff_chunk *chunk)
{
diff_chunk_arraylist_t *result;
struct diff_chunk *new_chunk;
enum diff_chunk_type last_t;
enum diff_chunk_type new_t;
struct diff_chunk *last;
/* Append to solved chunks; make sure that adjacent chunks of same type are combined, and that a minus chunk
* never directly follows a plus chunk. */
result = &state->result->chunks;
last_t = result->len ? diff_chunk_type(&result->head[result->len - 1])
: CHUNK_EMPTY;
new_t = diff_chunk_type(chunk);
debug("ADD %s chunk #%u:\n", chunk->solved ? "solved" : "UNSOLVED",
result->len);
debug("L\n");
debug_dump_atoms(&state->left, chunk->left_start, chunk->left_count);
debug("R\n");
debug_dump_atoms(&state->right, chunk->right_start, chunk->right_count);
if (result->len) {
last = &result->head[result->len - 1];
assert(chunk->left_start
== last->left_start + last->left_count);
assert(chunk->right_start
== last->right_start + last->right_count);
}
if (new_t == last_t) {
new_chunk = &result->head[result->len - 1];
new_chunk->left_count += chunk->left_count;
new_chunk->right_count += chunk->right_count;
debug(" - added chunk touches previous one of same type, joined:\n");
debug("L\n");
debug_dump_atoms(&state->left, new_chunk->left_start, new_chunk->left_count);
debug("R\n");
debug_dump_atoms(&state->right, new_chunk->right_start, new_chunk->right_count);
} else if (last_t == CHUNK_PLUS && new_t == CHUNK_MINUS) {
enum diff_chunk_type prev_last_t =
result->len > 1 ?
diff_chunk_type(&result->head[result->len - 2])
: CHUNK_EMPTY;
/* If a minus-chunk follows a plus-chunk, place it above the plus-chunk.
* If the chunk before that is also a minus-chunk, combine the two. */
if (prev_last_t == CHUNK_MINUS) {
new_chunk = &result->head[result->len - 2];
new_chunk->left_count += chunk->left_count;
new_chunk->right_count += chunk->right_count;
debug(" - added minus-chunk follows plus-chunk,"
" put before that plus-chunk and joined"
" with preceding minus-chunk:\n");
debug("L\n");
debug_dump_atoms(&state->left, new_chunk->left_start, new_chunk->left_count);
debug("R\n");
debug_dump_atoms(&state->right, new_chunk->right_start, new_chunk->right_count);
} else {
ARRAYLIST_INSERT(new_chunk, *result, result->len - 1);
if (!new_chunk)
return NULL;
*new_chunk = *chunk;
/* The new minus chunk indicates to which position on
* the right it corresponds, even though it doesn't add
* any lines on the right. By moving above a plus chunk,
* that position on the right has shifted. */
last = &result->head[result->len - 1];
new_chunk->right_start = last->right_start;
debug(" - added minus-chunk follows plus-chunk,"
" put before that plus-chunk\n");
}
/* The trailing plus-chunk (last_t == CHUNK_PLUS) records which
* position on the left it corresponds to, even though it adds no
* lines on the left. Inserting or extending the preceding minus-chunk
* (prev_last_t == CHUNK_MINUS) has shifted that position, so update it. */
last = &result->head[result->len - 1];
last->left_start = new_chunk->left_start
+ new_chunk->left_count;
} else {
ARRAYLIST_ADD(new_chunk, *result);
if (!new_chunk)
return NULL;
*new_chunk = *chunk;
}
return new_chunk;
}
/* Even if a left or right side is empty, diff output may need to know the
* position in that file.
* So left_start or right_start must never be NULL -- pass left_count or
* right_count as zero to indicate staying at that position without consuming
* any lines. */
struct diff_chunk *
diff_state_add_chunk(struct diff_state *state, bool solved,
struct diff_atom *left_start, unsigned int left_count,
struct diff_atom *right_start, unsigned int right_count)
{
struct diff_chunk *new_chunk;
struct diff_chunk chunk = {
.solved = solved,
.left_start = left_start,
.left_count = left_count,
.right_start = right_start,
.right_count = right_count,
};
/* Store an unsolved chunk as an intermediate result, to be re-iterated
* later.
* If intermediate results already exist, even a following solved chunk
* must go to the intermediate results, so that it later ends up in its
* correct final position among the solved chunks.
*/
if (!solved || state->temp_result.len) {
/* Append to temp_result */
debug("ADD %s chunk to temp result:\n",
chunk.solved ? "solved" : "UNSOLVED");
debug("L\n");
debug_dump_atoms(&state->left, left_start, left_count);
debug("R\n");
debug_dump_atoms(&state->right, right_start, right_count);
ARRAYLIST_ADD(new_chunk, state->temp_result);
if (!new_chunk)
return NULL;
*new_chunk = chunk;
return new_chunk;
}
return diff_state_add_solved_chunk(state, &chunk);
}
static void
diff_data_init_root(struct diff_data *d, FILE *f, const uint8_t *data,
unsigned long long len, int diff_flags)
{
*d = (struct diff_data){
.f = f,
.pos = 0,
.data = data,
.len = len,
.root = d,
.diff_flags = diff_flags,
};
}
void
diff_data_init_subsection(struct diff_data *d, struct diff_data *parent,
struct diff_atom *from_atom, unsigned int atoms_count)
{
struct diff_atom *last_atom;
debug("diff_data %p parent %p from_atom %p atoms_count %u\n",
d, parent, from_atom, atoms_count);
debug(" from_atom ");
debug_dump_atom(parent, NULL, from_atom);
if (atoms_count == 0) {
*d = (struct diff_data){
.f = NULL,
.pos = 0,
.data = NULL,
.len = 0,
.root = parent->root,
.atoms.head = NULL,
.atoms.len = atoms_count,
};
return;
}
last_atom = from_atom + atoms_count - 1;
*d = (struct diff_data){
.f = NULL,
.pos = from_atom->pos,
.data = from_atom->at,
.len = (last_atom->pos + last_atom->len) - from_atom->pos,
.root = parent->root,
.atoms.head = from_atom,
.atoms.len = atoms_count,
};
debug("subsection:\n");
debug_dump(d);
}
void
diff_data_free(struct diff_data *diff_data)
{
if (!diff_data)
return;
if (diff_data->atoms.allocated)
ARRAYLIST_FREE(diff_data->atoms);
}
int
diff_algo_none(const struct diff_algo_config *algo_config,
struct diff_state *state)
{
debug("\n** %s\n", __func__);
debug("left:\n");
debug_dump(&state->left);
debug("right:\n");
debug_dump(&state->right);
debug_dump_myers_graph(&state->left, &state->right, NULL, NULL, 0, NULL,
0);
/* Add a chunk of equal lines, if any */
struct diff_atom *l = state->left.atoms.head;
unsigned int l_len = state->left.atoms.len;
struct diff_atom *r = state->right.atoms.head;
unsigned int r_len = state->right.atoms.len;
unsigned int equal_atoms_start = 0;
unsigned int equal_atoms_end = 0;
unsigned int l_idx = 0;
unsigned int r_idx = 0;
while (equal_atoms_start < l_len
&& equal_atoms_start < r_len) {
int err;
bool same;
err = diff_atom_same(&same, &l[equal_atoms_start],
&r[equal_atoms_start]);
if (err)
return err;
if (!same)
break;
equal_atoms_start++;
}
while (equal_atoms_end < (l_len - equal_atoms_start)
&& equal_atoms_end < (r_len - equal_atoms_start)) {
int err;
bool same;
err = diff_atom_same(&same, &l[l_len - 1 - equal_atoms_end],
&r[r_len - 1 - equal_atoms_end]);
if (err)
return err;
if (!same)
break;
equal_atoms_end++;
}
/* Add a chunk of equal lines at the start */
if (equal_atoms_start) {
if (!diff_state_add_chunk(state, true,
l, equal_atoms_start,
r, equal_atoms_start))
return ENOMEM;
l_idx += equal_atoms_start;
r_idx += equal_atoms_start;
}
/* Add a "minus" chunk with all lines from the left. */
if (equal_atoms_start + equal_atoms_end < l_len) {
unsigned int add_len = l_len - equal_atoms_start - equal_atoms_end;
if (!diff_state_add_chunk(state, true,
&l[l_idx], add_len,
&r[r_idx], 0))
return ENOMEM;
l_idx += add_len;
}
/* Add a "plus" chunk with all lines from the right. */
if (equal_atoms_start + equal_atoms_end < r_len) {
unsigned int add_len = r_len - equal_atoms_start - equal_atoms_end;
if (!diff_state_add_chunk(state, true,
&l[l_idx], 0,
&r[r_idx], add_len))
return ENOMEM;
r_idx += add_len;
}
/* Add a chunk of equal lines at the end */
if (equal_atoms_end) {
if (!diff_state_add_chunk(state, true,
&l[l_idx], equal_atoms_end,
&r[r_idx], equal_atoms_end))
return ENOMEM;
}
return DIFF_RC_OK;
}
static int
diff_run_algo(const struct diff_algo_config *algo_config,
struct diff_state *state)
{
int rc;
if (!algo_config || !algo_config->impl
|| !state->recursion_depth_left
|| !state->left.atoms.len || !state->right.atoms.len) {
debug("Fall back to diff_algo_none():%s%s%s\n",
(!algo_config || !algo_config->impl) ? " no-cfg" : "",
(!state->recursion_depth_left) ? " max-depth" : "",
(!state->left.atoms.len || !state->right.atoms.len)?
" trivial" : "");
return diff_algo_none(algo_config, state);
}
ARRAYLIST_FREE(state->temp_result);
ARRAYLIST_INIT(state->temp_result, DIFF_RESULT_ALLOC_BLOCKSIZE);
rc = algo_config->impl(algo_config, state);
switch (rc) {
case DIFF_RC_USE_DIFF_ALGO_FALLBACK:
debug("Got DIFF_RC_USE_DIFF_ALGO_FALLBACK (%p)\n",
algo_config->fallback_algo);
rc = diff_run_algo(algo_config->fallback_algo, state);
goto return_rc;
case DIFF_RC_OK:
/* continue below */
break;
default:
/* some error happened */
goto return_rc;
}
/* Pick up any diff chunks that are still unsolved and feed them to
* inner_algo. inner_algo solves the unsolved chunks and appends them to
* the result; subsequent solved chunks on this level are appended to the
* result afterwards. */
int i;
for (i = 0; i < state->temp_result.len; i++) {
struct diff_chunk *c = &state->temp_result.head[i];
if (c->solved) {
if (!diff_state_add_solved_chunk(state, c)) {
rc = ENOMEM;
goto return_rc;
}
continue;
}
/* c is an unsolved chunk, feed to inner_algo */
struct diff_state inner_state = {
.result = state->result,
.recursion_depth_left = state->recursion_depth_left - 1,
.kd_buf = state->kd_buf,
.kd_buf_size = state->kd_buf_size,
};
diff_data_init_subsection(&inner_state.left, &state->left,
c->left_start, c->left_count);
diff_data_init_subsection(&inner_state.right, &state->right,
c->right_start, c->right_count);
rc = diff_run_algo(algo_config->inner_algo, &inner_state);
state->kd_buf = inner_state.kd_buf;
state->kd_buf_size = inner_state.kd_buf_size;
if (rc != DIFF_RC_OK)
goto return_rc;
}
rc = DIFF_RC_OK;
return_rc:
ARRAYLIST_FREE(state->temp_result);
return rc;
}
int
diff_atomize_file(struct diff_data *d,
const struct diff_config *config,
FILE *f, const uint8_t *data, off_t len, int diff_flags)
{
if (!config->atomize_func)
return EINVAL;
diff_data_init_root(d, f, data, len, diff_flags);
return config->atomize_func(config->atomize_func_data, d);
}
struct diff_result *
diff_main(const struct diff_config *config, struct diff_data *left,
struct diff_data *right)
{
struct diff_result *result = malloc(sizeof(struct diff_result));
if (!result)
return NULL;
*result = (struct diff_result){};
result->left = left;
result->right = right;
struct diff_state state = {
.result = result,
.recursion_depth_left = config->max_recursion_depth ?
config->max_recursion_depth : UINT_MAX,
.kd_buf = NULL,
.kd_buf_size = 0,
};
diff_data_init_subsection(&state.left, left,
left->atoms.head,
left->atoms.len);
diff_data_init_subsection(&state.right, right,
right->atoms.head,
right->atoms.len);
result->rc = diff_run_algo(config->algo, &state);
free(state.kd_buf);
return result;
}
void
diff_result_free(struct diff_result *result)
{
if (!result)
return;
ARRAYLIST_FREE(result->chunks);
free(result);
}
int
diff_result_contains_printable_chunks(struct diff_result *result)
{
struct diff_chunk *c;
enum diff_chunk_type t;
int i;
for (i = 0; i < result->chunks.len; i++) {
c = &result->chunks.head[i];
t = diff_chunk_type(c);
if (t == CHUNK_MINUS || t == CHUNK_PLUS)
return 1;
}
return 0;
}
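For readers following diff_algo_none() above: it trims the common prefix and suffix and emits at most four solved chunks in the order equal, minus, plus, equal. The sketch below reproduces that splitting on plain string arrays, under the assumption that lines can be compared with strcmp() (the real code compares atoms via diff_atom_same(), using hashes and optional whitespace folding). The struct sketch_chunk type and trim_prefix_suffix() are hypothetical names for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk record: type is '=', '-' or '+'. */
struct sketch_chunk {
	char type;
	size_t left_count, right_count;
};

/*
 * Split two line arrays the way diff_algo_none() does: a common prefix,
 * then all remaining left lines as one minus chunk, all remaining right
 * lines as one plus chunk, then a common suffix. Returns the number of
 * chunks written to out (at most 4).
 */
static size_t
trim_prefix_suffix(const char **l, size_t l_len, const char **r,
    size_t r_len, struct sketch_chunk out[4])
{
	size_t start = 0, end = 0, n = 0;

	/* Common prefix. */
	while (start < l_len && start < r_len &&
	    strcmp(l[start], r[start]) == 0)
		start++;
	/* Common suffix, never overlapping the prefix. */
	while (end < l_len - start && end < r_len - start &&
	    strcmp(l[l_len - 1 - end], r[r_len - 1 - end]) == 0)
		end++;

	if (start)
		out[n++] = (struct sketch_chunk){ '=', start, start };
	if (start + end < l_len)
		out[n++] = (struct sketch_chunk){ '-', l_len - start - end, 0 };
	if (start + end < r_len)
		out[n++] = (struct sketch_chunk){ '+', 0, r_len - start - end };
	if (end)
		out[n++] = (struct sketch_chunk){ '=', end, end };
	return n;
}
```

For example, comparing {"a", "b", "c", "d"} with {"a", "x", "y", "d"} yields four chunks: one equal line, a two-line minus chunk, a two-line plus chunk, and one equal line. Emitting the minus chunk before the plus chunk matches the ordering that diff_state_add_solved_chunk() enforces.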

1425
lib/diff_myers.c Normal file

File diff suppressed because it is too large

371
lib/diff_output.c Normal file

@ -0,0 +1,371 @@
/* Common parts for printing diff output */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arraylist.h>
#include <diff_main.h>
#include <diff_output.h>
#include "diff_internal.h"
static int
get_atom_byte(int *ch, struct diff_atom *atom, off_t off)
{
off_t cur;
if (atom->at != NULL) {
*ch = atom->at[off];
return 0;
}
cur = ftello(atom->root->f);
if (cur == -1)
return errno;
if (cur != atom->pos + off &&
fseeko(atom->root->f, atom->pos + off, SEEK_SET) == -1)
return errno;
*ch = fgetc(atom->root->f);
if (*ch == EOF && ferror(atom->root->f))
return errno;
return 0;
}
#define DIFF_OUTPUT_BUF_SIZE 512
int
diff_output_lines(struct diff_output_info *outinfo, FILE *dest,
const char *prefix, struct diff_atom *start_atom,
unsigned int count)
{
struct diff_atom *atom;
off_t outoff = 0, *offp;
uint8_t *typep;
int rc;
if (outinfo && outinfo->line_offsets.len > 0) {
unsigned int idx = outinfo->line_offsets.len - 1;
outoff = outinfo->line_offsets.head[idx];
}
foreach_diff_atom(atom, start_atom, count) {
off_t outlen = 0;
int i, ch, nbuf = 0;
unsigned int len = atom->len;
unsigned char buf[DIFF_OUTPUT_BUF_SIZE + 1 /* '\n' */];
size_t n;
n = strlcpy(buf, prefix, sizeof(buf));
if (n >= DIFF_OUTPUT_BUF_SIZE) /* leave room for '\n' */
return ENOBUFS;
nbuf += n;
if (len) {
rc = get_atom_byte(&ch, atom, len - 1);
if (rc)
return rc;
if (ch == '\n')
len--;
}
for (i = 0; i < len; i++) {
rc = get_atom_byte(&ch, atom, i);
if (rc)
return rc;
if (nbuf >= DIFF_OUTPUT_BUF_SIZE) {
rc = fwrite(buf, 1, nbuf, dest);
if (rc != nbuf)
return errno;
outlen += rc;
nbuf = 0;
}
buf[nbuf++] = ch;
}
buf[nbuf++] = '\n';
rc = fwrite(buf, 1, nbuf, dest);
if (rc != nbuf)
return errno;
outlen += rc;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += outlen;
*offp = outoff;
ARRAYLIST_ADD(typep, outinfo->line_types);
if (typep == NULL)
return ENOMEM;
*typep = *prefix == ' ' ? DIFF_LINE_CONTEXT :
*prefix == '-' ? DIFF_LINE_MINUS :
*prefix == '+' ? DIFF_LINE_PLUS : DIFF_LINE_NONE;
}
}
return DIFF_RC_OK;
}
int
diff_output_chunk_left_version(struct diff_output_info **output_info,
FILE *dest,
const struct diff_input_info *info,
const struct diff_result *result,
const struct diff_chunk_context *cc)
{
int rc, c_idx;
struct diff_output_info *outinfo = NULL;
if (diff_range_empty(&cc->left))
return DIFF_RC_OK;
if (output_info) {
*output_info = diff_output_info_alloc();
if (*output_info == NULL)
return ENOMEM;
outinfo = *output_info;
}
/* Write out all chunks on the left side. */
for (c_idx = cc->chunk.start; c_idx < cc->chunk.end; c_idx++) {
const struct diff_chunk *c = &result->chunks.head[c_idx];
if (c->left_count) {
rc = diff_output_lines(outinfo, dest, "",
c->left_start, c->left_count);
if (rc)
return rc;
}
}
return DIFF_RC_OK;
}
int
diff_output_chunk_right_version(struct diff_output_info **output_info,
FILE *dest,
const struct diff_input_info *info,
const struct diff_result *result,
const struct diff_chunk_context *cc)
{
int rc, c_idx;
struct diff_output_info *outinfo = NULL;
if (diff_range_empty(&cc->right))
return DIFF_RC_OK;
if (output_info) {
*output_info = diff_output_info_alloc();
if (*output_info == NULL)
return ENOMEM;
outinfo = *output_info;
}
/* Write out all chunks on the right side. */
for (c_idx = cc->chunk.start; c_idx < cc->chunk.end; c_idx++) {
const struct diff_chunk *c = &result->chunks.head[c_idx];
if (c->right_count) {
rc = diff_output_lines(outinfo, dest, "", c->right_start,
c->right_count);
if (rc)
return rc;
}
}
return DIFF_RC_OK;
}
int
diff_output_trailing_newline_msg(struct diff_output_info *outinfo, FILE *dest,
const struct diff_chunk *c)
{
enum diff_chunk_type chunk_type = diff_chunk_type(c);
struct diff_atom *atom, *start_atom;
unsigned int atom_count;
int rc, ch;
off_t outoff = 0, *offp;
uint8_t *typep;
if (chunk_type == CHUNK_MINUS || chunk_type == CHUNK_SAME) {
start_atom = c->left_start;
atom_count = c->left_count;
} else if (chunk_type == CHUNK_PLUS) {
start_atom = c->right_start;
atom_count = c->right_count;
} else
return EINVAL;
/* Locate the last atom. */
if (atom_count == 0)
return EINVAL;
atom = &start_atom[atom_count - 1];
rc = get_atom_byte(&ch, atom, atom->len - 1);
if (rc != DIFF_RC_OK)
return rc;
if (ch != '\n') {
if (outinfo && outinfo->line_offsets.len > 0) {
unsigned int idx = outinfo->line_offsets.len - 1;
outoff = outinfo->line_offsets.head[idx];
}
rc = fprintf(dest, "\\ No newline at end of file\n");
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += rc;
*offp = outoff;
ARRAYLIST_ADD(typep, outinfo->line_types);
if (typep == NULL)
return ENOMEM;
*typep = DIFF_LINE_NONE;
}
}
return DIFF_RC_OK;
}
static bool
is_function_prototype(unsigned char ch)
{
return (isalpha((unsigned char)ch) || ch == '_' || ch == '$');
}
#define begins_with(s, pre) (strncmp(s, pre, sizeof(pre)-1) == 0)
int
diff_output_match_function_prototype(char *prototype, size_t prototype_size,
int *last_prototype_idx, const struct diff_result *result,
const struct diff_chunk_context *cc)
{
struct diff_atom *start_atom, *atom;
const struct diff_data *data;
unsigned char buf[DIFF_FUNCTION_CONTEXT_SIZE];
const char *state = NULL;
int rc, i, ch;
if (result->left->atoms.len > 0 && cc->left.start > 0) {
data = result->left;
start_atom = &data->atoms.head[cc->left.start - 1];
} else
return DIFF_RC_OK;
diff_data_foreach_atom_backwards_from(start_atom, atom, data) {
int atom_idx = diff_atom_root_idx(data, atom);
if (atom_idx < *last_prototype_idx)
break;
rc = get_atom_byte(&ch, atom, 0);
if (rc)
return rc;
buf[0] = (unsigned char)ch;
if (!is_function_prototype(buf[0]))
continue;
for (i = 1; i < atom->len && i < sizeof(buf) - 1; i++) {
rc = get_atom_byte(&ch, atom, i);
if (rc)
return rc;
if (ch == '\n')
break;
buf[i] = (unsigned char)ch;
}
buf[i] = '\0';
if (begins_with(buf, "private:")) {
if (!state)
state = " (private)";
} else if (begins_with(buf, "protected:")) {
if (!state)
state = " (protected)";
} else if (begins_with(buf, "public:")) {
if (!state)
state = " (public)";
} else {
if (state) /* don't care about truncation */
strlcat(buf, state, sizeof(buf));
strlcpy(prototype, buf, prototype_size);
break;
}
}
*last_prototype_idx = diff_atom_root_idx(data, start_atom);
return DIFF_RC_OK;
}
struct diff_output_info *
diff_output_info_alloc(void)
{
struct diff_output_info *output_info;
off_t *offp;
uint8_t *typep;
output_info = malloc(sizeof(*output_info));
if (output_info != NULL) {
ARRAYLIST_INIT(output_info->line_offsets, 128);
ARRAYLIST_INIT(output_info->line_types, 128);
ARRAYLIST_ADD(offp, output_info->line_offsets);
if (offp == NULL) {
diff_output_info_free(output_info);
return NULL;
}
*offp = 0;
ARRAYLIST_ADD(typep, output_info->line_types);
if (typep == NULL) {
diff_output_info_free(output_info);
return NULL;
}
*typep = DIFF_LINE_NONE;
}
return output_info;
}
void
diff_output_info_free(struct diff_output_info *output_info)
{
ARRAYLIST_FREE(output_info->line_offsets);
ARRAYLIST_FREE(output_info->line_types);
free(output_info);
}
const char *
diff_output_get_label_left(const struct diff_input_info *info)
{
if (info->flags & DIFF_INPUT_LEFT_NONEXISTENT)
return "/dev/null";
return info->left_path ? info->left_path : "a";
}
const char *
diff_output_get_label_right(const struct diff_input_info *info)
{
if (info->flags & DIFF_INPUT_RIGHT_NONEXISTENT)
return "/dev/null";
return info->right_path ? info->right_path : "b";
}
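The prototype search in diff_output_match_function_prototype() walks backwards from the line above the hunk and picks the first line whose first character is a letter, '_' or '$'; C++ access labels are skipped, but the nearest one is appended as " (private)", " (protected)" or " (public)". The following is a simplified sketch over an array of NUL-terminated lines. find_prototype() is a hypothetical name, and the sketch deliberately omits the last_prototype_idx cut-off and the DIFF_FUNCTION_CONTEXT_SIZE buffer limit of the real function.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define BEGINS_WITH(s, pre) (strncmp((s), (pre), sizeof(pre) - 1) == 0)

/*
 * Search lines[from - 1] down to lines[0] for the nearest line that
 * looks like the start of a function definition. Writes "" to out when
 * nothing matches. out_size must be at least 1.
 */
static void
find_prototype(const char **lines, int from, char *out, size_t out_size)
{
	const char *state = NULL;
	int i;

	out[0] = '\0';
	for (i = from - 1; i >= 0; i--) {
		unsigned char c = (unsigned char)lines[i][0];

		/* Same first-character test as is_function_prototype(). */
		if (!(isalpha(c) || c == '_' || c == '$'))
			continue;
		if (BEGINS_WITH(lines[i], "private:")) {
			if (!state)
				state = " (private)";
		} else if (BEGINS_WITH(lines[i], "protected:")) {
			if (!state)
				state = " (protected)";
		} else if (BEGINS_WITH(lines[i], "public:")) {
			if (!state)
				state = " (public)";
		} else {
			/* Truncation is acceptable here, as in the original. */
			snprintf(out, out_size, "%s%s", lines[i],
			    state ? state : "");
			return;
		}
	}
}
```

Because indented lines are rejected by the first-character test, this heuristic works best for code styles (such as OpenBSD's KNF) that put the function name at the start of a line.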

190
lib/diff_output_edscript.c Normal file

@ -0,0 +1,190 @@
/* Produce ed(1) script output from a diff_result. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
* Copyright (c) 2020 Stefan Sperling <stsp@openbsd.org>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <arraylist.h>
#include <diff_main.h>
#include <diff_output.h>
#include "diff_internal.h"
static int
output_edscript_chunk(struct diff_output_info *outinfo,
FILE *dest, const struct diff_input_info *info,
const struct diff_result *result,
struct diff_chunk_context *cc)
{
off_t outoff = 0, *offp;
int left_start, left_len, right_start, right_len;
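int rc;
if (outinfo && outinfo->line_offsets.len > 0) {
unsigned int idx = outinfo->line_offsets.len - 1;
outoff = outinfo->line_offsets.head[idx];
}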
left_len = cc->left.end - cc->left.start;
if (left_len < 0)
return EINVAL;
else if (result->left->atoms.len == 0)
left_start = 0;
else if (left_len == 0 && cc->left.start > 0)
left_start = cc->left.start;
else if (cc->left.end > 0)
left_start = cc->left.start + 1;
else
left_start = cc->left.start;
right_len = cc->right.end - cc->right.start;
if (right_len < 0)
return EINVAL;
else if (result->right->atoms.len == 0)
right_start = 0;
else if (right_len == 0 && cc->right.start > 0)
right_start = cc->right.start;
else if (cc->right.end > 0)
right_start = cc->right.start + 1;
else
right_start = cc->right.start;
if (left_len == 0) {
/* addition */
if (right_len == 1) {
rc = fprintf(dest, "%da%d\n", left_start, right_start);
} else {
rc = fprintf(dest, "%da%d,%d\n", left_start,
right_start, cc->right.end);
}
} else if (right_len == 0) {
/* deletion */
if (left_len == 1) {
rc = fprintf(dest, "%dd%d\n", left_start,
right_start);
} else {
rc = fprintf(dest, "%d,%dd%d\n", left_start,
cc->left.end, right_start);
}
} else {
/* change */
if (left_len == 1 && right_len == 1) {
rc = fprintf(dest, "%dc%d\n", left_start, right_start);
} else if (left_len == 1) {
rc = fprintf(dest, "%dc%d,%d\n", left_start,
right_start, cc->right.end);
} else if (right_len == 1) {
rc = fprintf(dest, "%d,%dc%d\n", left_start,
cc->left.end, right_start);
} else {
rc = fprintf(dest, "%d,%dc%d,%d\n", left_start,
cc->left.end, right_start, cc->right.end);
}
}
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += rc;
*offp = outoff;
}
return DIFF_RC_OK;
}
int
diff_output_edscript(struct diff_output_info **output_info,
FILE *dest, const struct diff_input_info *info,
const struct diff_result *result)
{
struct diff_output_info *outinfo = NULL;
struct diff_chunk_context cc = {};
int atomizer_flags, flags;
bool force_text, have_binary;
int i, rc;
if (!result)
return EINVAL;
if (result->rc != DIFF_RC_OK)
return result->rc;
atomizer_flags = (result->left->atomizer_flags |
result->right->atomizer_flags);
flags = (result->left->root->diff_flags |
result->right->root->diff_flags);
force_text = (flags & DIFF_FLAG_FORCE_TEXT_DATA);
have_binary = (atomizer_flags & DIFF_ATOMIZER_FOUND_BINARY_DATA);
if (output_info) {
*output_info = diff_output_info_alloc();
if (*output_info == NULL)
return ENOMEM;
outinfo = *output_info;
}
if (have_binary && !force_text) {
for (i = 0; i < result->chunks.len; i++) {
struct diff_chunk *c = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(c);
if (t != CHUNK_MINUS && t != CHUNK_PLUS)
continue;
rc = fprintf(dest, "Binary files %s and %s differ\n",
diff_output_get_label_left(info),
diff_output_get_label_right(info));
if (rc < 0)
return errno;
break;
}
return DIFF_RC_OK;
}
for (i = 0; i < result->chunks.len; i++) {
struct diff_chunk *chunk = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(chunk);
struct diff_chunk_context next;
if (t != CHUNK_MINUS && t != CHUNK_PLUS)
continue;
if (diff_chunk_context_empty(&cc)) {
/* Note down the start point; any number of subsequent
* chunks may be joined up to this chunk by being
* directly adjacent. */
diff_chunk_context_get(&cc, result, i, 0);
continue;
}
/* There already is a previous chunk noted down for being
* printed. Does it join up with this one? */
diff_chunk_context_get(&next, result, i, 0);
if (diff_chunk_contexts_touch(&cc, &next)) {
/* This next context touches or overlaps the previous
* one; join them. */
diff_chunk_contexts_merge(&cc, &next);
continue;
}
rc = output_edscript_chunk(outinfo, dest, info, result, &cc);
if (rc != DIFF_RC_OK)
return rc;
cc = next;
}
if (!diff_chunk_context_empty(&cc))
return output_edscript_chunk(outinfo, dest, info, result, &cc);
return DIFF_RC_OK;
}
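The hunk headers printed by output_edscript_chunk() (and identically by output_plain_chunk()) convert 0-based, half-open atom ranges into the 1-based line numbers of ed(1) and normal diff output: an empty side names the line it follows, and a side is printed as "start,end" only when it spans more than one line. The sketch below isolates that arithmetic. header_start(), format_side() and format_header() are hypothetical names; the "len > 1" rule is a simplification that agrees with the original branches for every non-empty hunk.

```c
#include <stddef.h>
#include <stdio.h>

/* 1-based start line for the half-open range [start, end), mirroring
 * the branches in output_edscript_chunk(). */
static int
header_start(int start, int end, int total_atoms)
{
	int len = end - start;

	if (total_atoms == 0)
		return 0;
	if (len == 0 && start > 0)
		return start;		/* empty side: the line it follows */
	if (end > 0)
		return start + 1;	/* first line of a non-empty side */
	return start;
}

/* "N" for zero or one line, "N,M" for more; end doubles as the
 * 1-based number of the last line. */
static void
format_side(char *buf, size_t size, int start1, int len, int end)
{
	if (len > 1)
		snprintf(buf, size, "%d,%d", start1, end);
	else
		snprintf(buf, size, "%d", start1);
}

/* Format a header such as "3a4,5", "2,3d1" or "4c4" into buf. */
static void
format_header(char *buf, size_t size, int ls, int le, int left_atoms,
    int rs, int re, int right_atoms)
{
	int left_len = le - ls, right_len = re - rs;
	char lpart[32], rpart[32];
	char op = left_len == 0 ? 'a' : right_len == 0 ? 'd' : 'c';

	format_side(lpart, sizeof(lpart),
	    header_start(ls, le, left_atoms), left_len, le);
	format_side(rpart, sizeof(rpart),
	    header_start(rs, re, right_atoms), right_len, re);
	snprintf(buf, size, "%s%c%s", lpart, op, rpart);
}
```

For instance, inserting two lines after line 3 of a three-line file corresponds to ls = le = 3 and rs = 3, re = 5, which format_header() renders as "3a4,5"; deleting the first of two lines (ls = 0, le = 1, rs = re = 0) gives "1d0".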

246
lib/diff_output_plain.c Normal file

@ -0,0 +1,246 @@
/* Output all lines of a diff_result. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>
#include <arraylist.h>
#include <diff_main.h>
#include <diff_output.h>
#include "diff_internal.h"
static int
output_plain_chunk(struct diff_output_info *outinfo,
FILE *dest, const struct diff_input_info *info,
const struct diff_result *result,
struct diff_chunk_context *cc, off_t *outoff, bool headers_only)
{
off_t *offp;
int left_start, left_len, right_start, right_len;
int rc;
bool change = false;
left_len = cc->left.end - cc->left.start;
if (left_len < 0)
return EINVAL;
else if (result->left->atoms.len == 0)
left_start = 0;
else if (left_len == 0 && cc->left.start > 0)
left_start = cc->left.start;
else if (cc->left.end > 0)
left_start = cc->left.start + 1;
else
left_start = cc->left.start;
right_len = cc->right.end - cc->right.start;
if (right_len < 0)
return EINVAL;
else if (result->right->atoms.len == 0)
right_start = 0;
else if (right_len == 0 && cc->right.start > 0)
right_start = cc->right.start;
else if (cc->right.end > 0)
right_start = cc->right.start + 1;
else
right_start = cc->right.start;
if (left_len == 0) {
/* addition */
if (right_len == 1) {
rc = fprintf(dest, "%da%d\n", left_start, right_start);
} else {
rc = fprintf(dest, "%da%d,%d\n", left_start,
right_start, cc->right.end);
}
} else if (right_len == 0) {
/* deletion */
if (left_len == 1) {
rc = fprintf(dest, "%dd%d\n", left_start,
right_start);
} else {
rc = fprintf(dest, "%d,%dd%d\n", left_start,
cc->left.end, right_start);
}
} else {
/* change */
change = true;
if (left_len == 1 && right_len == 1) {
rc = fprintf(dest, "%dc%d\n", left_start, right_start);
} else if (left_len == 1) {
rc = fprintf(dest, "%dc%d,%d\n", left_start,
right_start, cc->right.end);
} else if (right_len == 1) {
rc = fprintf(dest, "%d,%dc%d\n", left_start,
cc->left.end, right_start);
} else {
rc = fprintf(dest, "%d,%dc%d,%d\n", left_start,
cc->left.end, right_start, cc->right.end);
}
}
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
*outoff += rc;
*offp = *outoff;
}
/*
* Now write out all the joined chunks.
*
* If the hunk denotes a change, it will come in the form of a deletion
* chunk followed by an addition chunk. Print a "---" marker to separate
* the deletions from the additions when this happens.
*/
int c_idx;
for (c_idx = cc->chunk.start; !headers_only && c_idx < cc->chunk.end;
c_idx++) {
const struct diff_chunk *c = &result->chunks.head[c_idx];
rc = DIFF_RC_OK;
if (c->left_count && !c->right_count)
rc = diff_output_lines(outinfo, dest,
c->solved ? "< " : "?",
c->left_start, c->left_count);
else if (c->right_count && !c->left_count) {
if (change) {
rc = fprintf(dest, "---\n");
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp,
outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
*outoff += rc;
*offp = *outoff;
}
}
rc = diff_output_lines(outinfo, dest,
c->solved ? "> " : "?",
c->right_start, c->right_count);
}
if (rc)
return rc;
if (cc->chunk.end == result->chunks.len) {
rc = diff_output_trailing_newline_msg(outinfo, dest, c);
if (rc != DIFF_RC_OK)
return rc;
}
}
return DIFF_RC_OK;
}
int
diff_output_plain(struct diff_output_info **output_info,
FILE *dest, const struct diff_input_info *info,
const struct diff_result *result, int hunk_headers_only)
{
struct diff_output_info *outinfo = NULL;
struct diff_chunk_context cc = {};
int atomizer_flags, flags;
bool force_text, have_binary;
int i, rc;
off_t outoff = 0, *offp;
if (!result)
return EINVAL;
if (result->rc != DIFF_RC_OK)
return result->rc;
atomizer_flags = (result->left->atomizer_flags |
result->right->atomizer_flags);
flags = (result->left->root->diff_flags |
result->right->root->diff_flags);
force_text = (flags & DIFF_FLAG_FORCE_TEXT_DATA);
have_binary = (atomizer_flags & DIFF_ATOMIZER_FOUND_BINARY_DATA);
if (output_info) {
*output_info = diff_output_info_alloc();
if (*output_info == NULL)
return ENOMEM;
outinfo = *output_info;
}
if (have_binary && !force_text) {
for (i = 0; i < result->chunks.len; i++) {
struct diff_chunk *c = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(c);
if (t != CHUNK_MINUS && t != CHUNK_PLUS)
continue;
rc = fprintf(dest, "Binary files %s and %s differ\n",
diff_output_get_label_left(info),
diff_output_get_label_right(info));
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += rc;
*offp = outoff;
}
break;
}
return DIFF_RC_OK;
}
for (i = 0; i < result->chunks.len; i++) {
struct diff_chunk *chunk = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(chunk);
struct diff_chunk_context next;
if (t != CHUNK_MINUS && t != CHUNK_PLUS)
continue;
if (diff_chunk_context_empty(&cc)) {
/* Note down the start point; any number of subsequent
* chunks may be joined up to this chunk by being
* directly adjacent. */
diff_chunk_context_get(&cc, result, i, 0);
continue;
}
/* There already is a previous chunk noted down for being
* printed. Does it join up with this one? */
diff_chunk_context_get(&next, result, i, 0);
if (diff_chunk_contexts_touch(&cc, &next)) {
/* This next context touches or overlaps the previous
* one; join them. */
diff_chunk_contexts_merge(&cc, &next);
/* If the last chunk is merged here, the joined
* context is still pending when the loop ends and is
* printed after the loop. */
continue;
}
rc = output_plain_chunk(outinfo, dest, info, result, &cc,
&outoff, hunk_headers_only);
if (rc != DIFF_RC_OK)
return rc;
cc = next;
}
if (!diff_chunk_context_empty(&cc))
return output_plain_chunk(outinfo, dest, info, result, &cc,
&outoff, hunk_headers_only);
return DIFF_RC_OK;
}
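After the header, output_plain_chunk() prints the body of each hunk: removed lines prefixed with "< ", added lines prefixed with "> ", and a "---" separator only when the hunk is a change (both sides non-empty). The sketch below renders such a body into a caller-supplied buffer with snprintf()-style size reporting. render_plain_body() and append_line() are hypothetical helpers, and the sketch assumes all chunks are solved, so the "?" prefix used for unsolved chunks is not modelled.

```c
#include <stddef.h>
#include <stdio.h>

/* Append prefix, line and '\n' at offset used; returns the new total
 * length, counting bytes that did not fit, like snprintf(). */
static size_t
append_line(char *buf, size_t size, size_t used, const char *prefix,
    const char *line)
{
	int n;

	if (used < size)
		n = snprintf(buf + used, size - used, "%s%s\n", prefix, line);
	else
		n = snprintf(NULL, 0, "%s%s\n", prefix, line);
	return used + (n > 0 ? (size_t)n : 0);
}

/* Render one normal-diff hunk body; returns the length needed
 * (excluding the terminating NUL). */
static size_t
render_plain_body(char *buf, size_t size, const char **del, size_t ndel,
    const char **add, size_t nadd)
{
	size_t used = 0, i;

	if (size > 0)
		buf[0] = '\0';
	for (i = 0; i < ndel; i++)
		used = append_line(buf, size, used, "< ", del[i]);
	if (ndel > 0 && nadd > 0)
		used = append_line(buf, size, used, "---", "");
	for (i = 0; i < nadd; i++)
		used = append_line(buf, size, used, "> ", add[i]);
	return used;
}
```

Replacing line "b" with "x" and "y" produces "< b\n---\n> x\n> y\n" (16 bytes); a pure addition of "x" produces just "> x\n" with no separator.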

602
lib/diff_output_unidiff.c Normal file

@ -0,0 +1,602 @@
/* Produce a unidiff output from a diff_result. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <arraylist.h>
#include <diff_main.h>
#include <diff_output.h>
#include "diff_internal.h"
#include "diff_debug.h"
off_t
diff_chunk_get_left_start_pos(const struct diff_chunk *c)
{
return c->left_start->pos;
}
off_t
diff_chunk_get_right_start_pos(const struct diff_chunk *c)
{
return c->right_start->pos;
}
bool
diff_chunk_context_empty(const struct diff_chunk_context *cc)
{
return diff_range_empty(&cc->chunk);
}
int
diff_chunk_get_left_start(const struct diff_chunk *c,
const struct diff_result *r, int context_lines)
{
int left_start = diff_atom_root_idx(r->left, c->left_start);
return MAX(0, left_start - context_lines);
}
int
diff_chunk_get_left_end(const struct diff_chunk *c,
const struct diff_result *r, int context_lines)
{
int left_start = diff_chunk_get_left_start(c, r, 0);
return MIN(r->left->atoms.len,
left_start + c->left_count + context_lines);
}
int
diff_chunk_get_right_start(const struct diff_chunk *c,
const struct diff_result *r, int context_lines)
{
int right_start = diff_atom_root_idx(r->right, c->right_start);
return MAX(0, right_start - context_lines);
}
int
diff_chunk_get_right_end(const struct diff_chunk *c,
const struct diff_result *r, int context_lines)
{
int right_start = diff_chunk_get_right_start(c, r, 0);
return MIN(r->right->atoms.len,
right_start + c->right_count + context_lines);
}
struct diff_chunk *
diff_chunk_get(const struct diff_result *r, int chunk_idx)
{
return &r->chunks.head[chunk_idx];
}
int
diff_chunk_get_left_count(struct diff_chunk *c)
{
return c->left_count;
}
int
diff_chunk_get_right_count(struct diff_chunk *c)
{
return c->right_count;
}
void
diff_chunk_context_get(struct diff_chunk_context *cc, const struct diff_result *r,
int chunk_idx, int context_lines)
{
const struct diff_chunk *c = &r->chunks.head[chunk_idx];
int left_start = diff_chunk_get_left_start(c, r, context_lines);
int left_end = diff_chunk_get_left_end(c, r, context_lines);
int right_start = diff_chunk_get_right_start(c, r, context_lines);
int right_end = diff_chunk_get_right_end(c, r, context_lines);
*cc = (struct diff_chunk_context){
.chunk = {
.start = chunk_idx,
.end = chunk_idx + 1,
},
.left = {
.start = left_start,
.end = left_end,
},
.right = {
.start = right_start,
.end = right_end,
},
};
}
bool
diff_chunk_contexts_touch(const struct diff_chunk_context *cc,
const struct diff_chunk_context *other)
{
return diff_ranges_touch(&cc->chunk, &other->chunk)
|| diff_ranges_touch(&cc->left, &other->left)
|| diff_ranges_touch(&cc->right, &other->right);
}
void
diff_chunk_contexts_merge(struct diff_chunk_context *cc,
const struct diff_chunk_context *other)
{
diff_ranges_merge(&cc->chunk, &other->chunk);
diff_ranges_merge(&cc->left, &other->left);
diff_ranges_merge(&cc->right, &other->right);
}
void
diff_chunk_context_load_change(struct diff_chunk_context *cc,
int *nchunks_used,
struct diff_result *result,
int start_chunk_idx,
int context_lines)
{
int i;
int seen_minus = 0, seen_plus = 0;
if (nchunks_used)
*nchunks_used = 0;
for (i = start_chunk_idx; i < result->chunks.len; i++) {
struct diff_chunk *chunk = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(chunk);
struct diff_chunk_context next;
if (t != CHUNK_MINUS && t != CHUNK_PLUS) {
if (nchunks_used)
(*nchunks_used)++;
if (seen_minus || seen_plus)
break;
else
continue;
} else if (t == CHUNK_MINUS)
seen_minus = 1;
else if (t == CHUNK_PLUS)
seen_plus = 1;
if (diff_chunk_context_empty(cc)) {
/* Note down the start point, any number of subsequent
* chunks may be joined up to this chunk by being
* directly adjacent. */
diff_chunk_context_get(cc, result, i, context_lines);
if (nchunks_used)
(*nchunks_used)++;
continue;
}
/* There already is a previous chunk noted down for being
* printed. Does it join up with this one? */
diff_chunk_context_get(&next, result, i, context_lines);
if (diff_chunk_contexts_touch(cc, &next)) {
/* This next context touches or overlaps the previous
* one, join. */
diff_chunk_contexts_merge(cc, &next);
if (nchunks_used)
(*nchunks_used)++;
continue;
} else
break;
}
}
struct diff_output_unidiff_state {
bool header_printed;
char prototype[DIFF_FUNCTION_CONTEXT_SIZE];
int last_prototype_idx;
};
struct diff_output_unidiff_state *
diff_output_unidiff_state_alloc(void)
{
struct diff_output_unidiff_state *state;
state = calloc(1, sizeof(struct diff_output_unidiff_state));
if (state != NULL)
diff_output_unidiff_state_reset(state);
return state;
}
void
diff_output_unidiff_state_reset(struct diff_output_unidiff_state *state)
{
state->header_printed = false;
memset(state->prototype, 0, sizeof(state->prototype));
state->last_prototype_idx = 0;
}
void
diff_output_unidiff_state_free(struct diff_output_unidiff_state *state)
{
free(state);
}
static int
output_unidiff_chunk(struct diff_output_info *outinfo, FILE *dest,
struct diff_output_unidiff_state *state,
const struct diff_input_info *info,
const struct diff_result *result,
bool print_header, bool show_function_prototypes,
const struct diff_chunk_context *cc)
{
int rc, left_start, left_len, right_start, right_len;
off_t outoff = 0, *offp;
uint8_t *typep;
if (diff_range_empty(&cc->left) && diff_range_empty(&cc->right))
return DIFF_RC_OK;
if (outinfo && outinfo->line_offsets.len > 0) {
unsigned int idx = outinfo->line_offsets.len - 1;
outoff = outinfo->line_offsets.head[idx];
}
if (print_header && !(state->header_printed)) {
rc = fprintf(dest, "--- %s\n",
diff_output_get_label_left(info));
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += rc;
*offp = outoff;
ARRAYLIST_ADD(typep, outinfo->line_types);
if (typep == NULL)
return ENOMEM;
*typep = DIFF_LINE_MINUS;
}
rc = fprintf(dest, "+++ %s\n",
diff_output_get_label_right(info));
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += rc;
*offp = outoff;
ARRAYLIST_ADD(typep, outinfo->line_types);
if (typep == NULL)
return ENOMEM;
*typep = DIFF_LINE_PLUS;
}
state->header_printed = true;
}
left_len = cc->left.end - cc->left.start;
if (result->left->atoms.len == 0)
left_start = 0;
else if (left_len == 0 && cc->left.start > 0)
left_start = cc->left.start;
else
left_start = cc->left.start + 1;
right_len = cc->right.end - cc->right.start;
if (result->right->atoms.len == 0)
right_start = 0;
else if (right_len == 0 && cc->right.start > 0)
right_start = cc->right.start;
else
right_start = cc->right.start + 1;
if (show_function_prototypes) {
rc = diff_output_match_function_prototype(state->prototype,
sizeof(state->prototype), &state->last_prototype_idx,
result, cc);
if (rc)
return rc;
}
if (left_len == 1 && right_len == 1) {
rc = fprintf(dest, "@@ -%d +%d @@%s%s\n",
left_start, right_start,
state->prototype[0] ? " " : "",
state->prototype[0] ? state->prototype : "");
} else if (left_len == 1 && right_len != 1) {
rc = fprintf(dest, "@@ -%d +%d,%d @@%s%s\n",
left_start, right_start, right_len,
state->prototype[0] ? " " : "",
state->prototype[0] ? state->prototype : "");
} else if (left_len != 1 && right_len == 1) {
rc = fprintf(dest, "@@ -%d,%d +%d @@%s%s\n",
left_start, left_len, right_start,
state->prototype[0] ? " " : "",
state->prototype[0] ? state->prototype : "");
} else {
rc = fprintf(dest, "@@ -%d,%d +%d,%d @@%s%s\n",
left_start, left_len, right_start, right_len,
state->prototype[0] ? " " : "",
state->prototype[0] ? state->prototype : "");
}
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += rc;
*offp = outoff;
ARRAYLIST_ADD(typep, outinfo->line_types);
if (typep == NULL)
return ENOMEM;
*typep = DIFF_LINE_HUNK;
}
/* Got the absolute line numbers where to start printing, and the index
* of the interesting (non-context) chunk.
* To print context lines above the interesting chunk, nipping on the
* previous chunk index may be necessary.
* It is guaranteed to be only context lines where left == right, so it
* suffices to look on the left. */
const struct diff_chunk *first_chunk;
int chunk_start_line;
first_chunk = &result->chunks.head[cc->chunk.start];
chunk_start_line = diff_atom_root_idx(result->left,
first_chunk->left_start);
if (cc->left.start < chunk_start_line) {
rc = diff_output_lines(outinfo, dest, " ",
&result->left->atoms.head[cc->left.start],
chunk_start_line - cc->left.start);
if (rc)
return rc;
}
/* Now write out all the joined chunks and contexts between them */
int c_idx;
for (c_idx = cc->chunk.start; c_idx < cc->chunk.end; c_idx++) {
const struct diff_chunk *c = &result->chunks.head[c_idx];
if (c->left_count && c->right_count)
rc = diff_output_lines(outinfo, dest,
c->solved ? " " : "?",
c->left_start, c->left_count);
else if (c->left_count && !c->right_count)
rc = diff_output_lines(outinfo, dest,
c->solved ? "-" : "?",
c->left_start, c->left_count);
else if (c->right_count && !c->left_count)
rc = diff_output_lines(outinfo, dest,
c->solved ? "+" : "?",
c->right_start, c->right_count);
if (rc)
return rc;
if (cc->chunk.end == result->chunks.len) {
rc = diff_output_trailing_newline_msg(outinfo, dest, c);
if (rc != DIFF_RC_OK)
return rc;
}
}
/* Trailing context? */
const struct diff_chunk *last_chunk;
int chunk_end_line;
last_chunk = &result->chunks.head[cc->chunk.end - 1];
chunk_end_line = diff_atom_root_idx(result->left,
last_chunk->left_start
+ last_chunk->left_count);
if (cc->left.end > chunk_end_line) {
rc = diff_output_lines(outinfo, dest, " ",
&result->left->atoms.head[chunk_end_line],
cc->left.end - chunk_end_line);
if (rc)
return rc;
if (cc->left.end == result->left->atoms.len) {
rc = diff_output_trailing_newline_msg(outinfo, dest,
&result->chunks.head[result->chunks.len - 1]);
if (rc != DIFF_RC_OK)
return rc;
}
}
return DIFF_RC_OK;
}
int
diff_output_unidiff_chunk(struct diff_output_info **output_info, FILE *dest,
struct diff_output_unidiff_state *state,
const struct diff_input_info *info,
const struct diff_result *result,
const struct diff_chunk_context *cc)
{
struct diff_output_info *outinfo = NULL;
int flags = (result->left->root->diff_flags |
result->right->root->diff_flags);
bool show_function_prototypes = (flags & DIFF_FLAG_SHOW_PROTOTYPES);
if (output_info) {
*output_info = diff_output_info_alloc();
if (*output_info == NULL)
return ENOMEM;
outinfo = *output_info;
}
return output_unidiff_chunk(outinfo, dest, state, info,
result, false, show_function_prototypes, cc);
}
int
diff_output_unidiff(struct diff_output_info **output_info,
FILE *dest, const struct diff_input_info *info,
const struct diff_result *result,
unsigned int context_lines)
{
struct diff_output_unidiff_state *state;
struct diff_chunk_context cc = {};
struct diff_output_info *outinfo = NULL;
int atomizer_flags = (result->left->atomizer_flags|
result->right->atomizer_flags);
int flags = (result->left->root->diff_flags |
result->right->root->diff_flags);
bool show_function_prototypes = (flags & DIFF_FLAG_SHOW_PROTOTYPES);
bool force_text = (flags & DIFF_FLAG_FORCE_TEXT_DATA);
bool have_binary = (atomizer_flags & DIFF_ATOMIZER_FOUND_BINARY_DATA);
off_t outoff = 0, *offp;
uint8_t *typep;
int rc, i;
if (!result)
return EINVAL;
if (result->rc != DIFF_RC_OK)
return result->rc;
if (output_info) {
*output_info = diff_output_info_alloc();
if (*output_info == NULL)
return ENOMEM;
outinfo = *output_info;
}
if (have_binary && !force_text) {
for (i = 0; i < result->chunks.len; i++) {
struct diff_chunk *c = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(c);
if (t != CHUNK_MINUS && t != CHUNK_PLUS)
continue;
if (outinfo && outinfo->line_offsets.len > 0) {
unsigned int idx =
outinfo->line_offsets.len - 1;
outoff = outinfo->line_offsets.head[idx];
}
rc = fprintf(dest, "Binary files %s and %s differ\n",
diff_output_get_label_left(info),
diff_output_get_label_right(info));
if (rc < 0)
return errno;
if (outinfo) {
ARRAYLIST_ADD(offp, outinfo->line_offsets);
if (offp == NULL)
return ENOMEM;
outoff += rc;
*offp = outoff;
ARRAYLIST_ADD(typep, outinfo->line_types);
if (typep == NULL)
return ENOMEM;
*typep = DIFF_LINE_NONE;
}
break;
}
return DIFF_RC_OK;
}
state = diff_output_unidiff_state_alloc();
if (state == NULL) {
if (output_info) {
diff_output_info_free(*output_info);
*output_info = NULL;
}
return ENOMEM;
}
#if DEBUG
unsigned int check_left_pos, check_right_pos;
check_left_pos = 0;
check_right_pos = 0;
for (i = 0; i < result->chunks.len; i++) {
struct diff_chunk *c = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(c);
debug("[%d] %s lines L%d R%d @L %d @R %d\n",
i, (t == CHUNK_MINUS ? "minus" :
(t == CHUNK_PLUS ? "plus" :
(t == CHUNK_SAME ? "same" : "?"))),
c->left_count,
c->right_count,
c->left_start ? diff_atom_root_idx(result->left, c->left_start) : -1,
c->right_start ? diff_atom_root_idx(result->right, c->right_start) : -1);
assert(check_left_pos == diff_atom_root_idx(result->left, c->left_start));
assert(check_right_pos == diff_atom_root_idx(result->right, c->right_start));
check_left_pos += c->left_count;
check_right_pos += c->right_count;
}
assert(check_left_pos == result->left->atoms.len);
assert(check_right_pos == result->right->atoms.len);
#endif
for (i = 0; i < result->chunks.len; i++) {
struct diff_chunk *c = &result->chunks.head[i];
enum diff_chunk_type t = diff_chunk_type(c);
struct diff_chunk_context next;
if (t != CHUNK_MINUS && t != CHUNK_PLUS)
continue;
if (diff_chunk_context_empty(&cc)) {
/* These are the first lines being printed.
* Note down the start point, any number of subsequent
* chunks may be joined up to this unidiff chunk by
* context lines or by being directly adjacent. */
diff_chunk_context_get(&cc, result, i, context_lines);
debug("new chunk to be printed:"
" chunk %d-%d left %d-%d right %d-%d\n",
cc.chunk.start, cc.chunk.end,
cc.left.start, cc.left.end,
cc.right.start, cc.right.end);
continue;
}
/* There already is a previous chunk noted down for being
* printed. Does it join up with this one? */
diff_chunk_context_get(&next, result, i, context_lines);
debug("new chunk to be printed:"
" chunk %d-%d left %d-%d right %d-%d\n",
next.chunk.start, next.chunk.end,
next.left.start, next.left.end,
next.right.start, next.right.end);
if (diff_chunk_contexts_touch(&cc, &next)) {
/* This next context touches or overlaps the previous
* one, join. */
diff_chunk_contexts_merge(&cc, &next);
debug("new chunk to be printed touches previous chunk,"
" now: left %d-%d right %d-%d\n",
cc.left.start, cc.left.end,
cc.right.start, cc.right.end);
continue;
}
/* No touching, so the previous context is complete with a gap
* between it and this next one. Print the previous one and
* start fresh here. */
debug("new chunk to be printed does not touch previous chunk;"
" print left %d-%d right %d-%d\n",
cc.left.start, cc.left.end, cc.right.start, cc.right.end);
rc = output_unidiff_chunk(outinfo, dest, state, info, result,
true, show_function_prototypes, &cc);
if (rc != DIFF_RC_OK) {
diff_output_unidiff_state_free(state);
return rc;
}
cc = next;
debug("new unprinted chunk is left %d-%d right %d-%d\n",
cc.left.start, cc.left.end, cc.right.start, cc.right.end);
}
rc = DIFF_RC_OK;
if (!diff_chunk_context_empty(&cc))
rc = output_unidiff_chunk(outinfo, dest, state, info, result,
true, show_function_prototypes, &cc);
diff_output_unidiff_state_free(state);
return rc;
}
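The hunk-header rules in output_unidiff_chunk() are easy to get wrong, so here is a standalone sketch of them. Ranges are 0-based and half-open internally but printed as 1-based line numbers. A count of exactly 1 is omitted. An empty range is reported at the line before it, and a side with no lines at all starts at 0. The sketch leaves out the optional function-prototype suffix. `hunk_start()`, `format_side()` and `hunk_header()` are hypothetical helpers that mirror the logic above; they are not part of the library's API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 1-based start line printed in the header for the 0-based half-open range
 * [start0, end0) of a file with 'total' lines (same rules as above). */
static int
hunk_start(int start0, int end0, int total)
{
	int len = end0 - start0;

	if (total == 0)
		return 0;		/* empty file */
	if (len == 0 && start0 > 0)
		return start0;		/* empty range: name the line before it */
	return start0 + 1;		/* 0-based index -> 1-based line */
}

/* Format one side as "-S" / "+S", or "-S,N" / "+S,N" when N != 1. */
static int
format_side(char *buf, size_t size, char sign, int start0, int end0,
    int total)
{
	int start = hunk_start(start0, end0, total);
	int len = end0 - start0;

	if (len == 1)
		return snprintf(buf, size, "%c%d", sign, start);
	return snprintf(buf, size, "%c%d,%d", sign, start, len);
}

/* Hypothetical helper: build "@@ -l[,n] +r[,m] @@" into buf. */
static int
hunk_header(char *buf, size_t size,
    int l_start0, int l_end0, int l_total,
    int r_start0, int r_end0, int r_total)
{
	char left[32], right[32];

	format_side(left, sizeof(left), '-', l_start0, l_end0, l_total);
	format_side(right, sizeof(right), '+', r_start0, r_end0, r_total);
	return snprintf(buf, size, "@@ %s %s @@", left, right);
}
```

For instance, inserting two lines after line 3 of a 10-line file gives an empty left range [3,3). That yields `@@ -3,0 +4,2 @@`: the left side names line 3, after which the new lines appear.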

647
lib/diff_patience.c Normal file

@ -0,0 +1,647 @@
/* Implementation of the Patience Diff algorithm invented by Bram Cohen:
* Divide a diff problem into smaller chunks by an LCS (Longest Common
* Subsequence) of common-unique lines. */
/*
* Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <arraylist.h>
#include <diff_main.h>
#include "diff_internal.h"
#include "diff_debug.h"
/* Algorithm to find unique lines:
* 0: stupidly iterate atoms
* 1: qsort
* 2: mergesort
*/
#define UNIQUE_STRATEGY 1
/* Per-atom state for the Patience Diff algorithm */
struct atom_patience {
#if UNIQUE_STRATEGY == 0
bool unique_here;
#endif
bool unique_in_both;
struct diff_atom *pos_in_other;
struct diff_atom *prev_stack;
struct diff_range identical_lines;
};
/* A diff_atom has a backpointer to the root diff_data. That points to the
* current diff_data, a possibly smaller section of the root. That current
* diff_data->algo_data is a pointer to an array of struct atom_patience. The
* atom's index in current diff_data gives the index in the atom_patience array.
*/
#define PATIENCE(ATOM) \
(((struct atom_patience*)((ATOM)->root->current->algo_data))\
[diff_atom_idx((ATOM)->root->current, ATOM)])
#if UNIQUE_STRATEGY == 0
/* Stupid iteration and comparison of all atoms */
static int
diff_atoms_mark_unique(struct diff_data *d, unsigned int *unique_count)
{
struct diff_atom *i;
unsigned int count = 0;
diff_data_foreach_atom(i, d) {
PATIENCE(i).unique_here = true;
PATIENCE(i).unique_in_both = true;
count++;
}
diff_data_foreach_atom(i, d) {
struct diff_atom *j;
if (!PATIENCE(i).unique_here)
continue;
diff_data_foreach_atom_from(i + 1, j, d) {
bool same;
int r = diff_atom_same(&same, i, j);
if (r)
return r;
if (!same)
continue;
if (PATIENCE(i).unique_here) {
PATIENCE(i).unique_here = false;
PATIENCE(i).unique_in_both = false;
count--;
}
PATIENCE(j).unique_here = false;
PATIENCE(j).unique_in_both = false;
count--;
}
}
if (unique_count)
*unique_count = count;
return 0;
}
/* Mark those lines as PATIENCE(atom).unique_in_both = true that appear exactly
* once in each side. */
static int
diff_atoms_mark_unique_in_both(struct diff_data *left, struct diff_data *right,
unsigned int *unique_in_both_count)
{
/* Derive the final unique_in_both count without an explicit extra
* iteration. This is merely an optimization that saves one pass at the
* end. */
unsigned int unique_in_both;
int r;
r = diff_atoms_mark_unique(left, &unique_in_both);
if (r)
return r;
r = diff_atoms_mark_unique(right, NULL);
if (r)
return r;
debug("unique_in_both %u\n", unique_in_both);
struct diff_atom *i;
diff_data_foreach_atom(i, left) {
if (!PATIENCE(i).unique_here)
continue;
struct diff_atom *j;
int found_in_b = 0;
diff_data_foreach_atom(j, right) {
bool same;
int r = diff_atom_same(&same, i, j);
if (r)
return r;
if (!same)
continue;
if (!PATIENCE(j).unique_here) {
found_in_b = 2; /* or more */
break;
} else {
found_in_b = 1;
PATIENCE(j).pos_in_other = i;
PATIENCE(i).pos_in_other = j;
}
}
if (found_in_b == 0 || found_in_b > 1) {
PATIENCE(i).unique_in_both = false;
unique_in_both--;
debug("unique_in_both %u (%d) ", unique_in_both,
found_in_b);
debug_dump_atom(left, NULL, i);
}
}
/* Still need to unmark right[*]->patience.unique_in_both for atoms that
* don't exist in left */
diff_data_foreach_atom(i, right) {
if (!PATIENCE(i).unique_here
|| !PATIENCE(i).unique_in_both)
continue;
struct diff_atom *j;
bool found_in_a = false;
diff_data_foreach_atom(j, left) {
bool same;
int r;
if (!PATIENCE(j).unique_in_both)
continue;
r = diff_atom_same(&same, i, j);
if (r)
return r;
if (!same)
continue;
found_in_a = true;
break;
}
if (!found_in_a)
PATIENCE(i).unique_in_both = false;
}
if (unique_in_both_count)
*unique_in_both_count = unique_in_both;
return 0;
}
#else /* UNIQUE_STRATEGY != 0 */
/* Use an optimized sorting algorithm (qsort, mergesort) to find unique lines */
static int diff_atoms_compar(const void *_a, const void *_b)
{
const struct diff_atom *a = *(struct diff_atom**)_a;
const struct diff_atom *b = *(struct diff_atom**)_b;
int cmp;
int rc = 0;
/* If there's been an error (e.g. I/O error) in a previous compar, we
* have no way to abort the sort but just report the rc and stop
* comparing. Make sure to catch errors on either side. If atoms are
* from more than one diff_data, make sure the error, if any, spreads
* to all of them, so we can cut short all future comparisons. */
if (a->root->err)
rc = a->root->err;
if (b->root->err)
rc = b->root->err;
if (rc) {
a->root->err = rc;
b->root->err = rc;
/* just return 'equal' to not swap more positions */
return 0;
}
/* Sort by the simplistic hash */
if (a->hash < b->hash)
return -1;
if (a->hash > b->hash)
return 1;
/* If hashes are the same, the lines may still differ. Do a full cmp. */
rc = diff_atom_cmp(&cmp, a, b);
if (rc) {
/* Mark the I/O error so that the caller can find out about it.
* For the case atoms are from more than one diff_data, mark in
* both. */
a->root->err = rc;
if (a->root != b->root)
b->root->err = rc;
return 0;
}
return cmp;
}
/* Sort an array of struct diff_atom* in-place. */
static int diff_atoms_sort(struct diff_atom *atoms[],
size_t atoms_count)
{
#if UNIQUE_STRATEGY == 1
qsort(atoms, atoms_count, sizeof(struct diff_atom*), diff_atoms_compar);
#else
mergesort(atoms, atoms_count, sizeof(struct diff_atom*),
diff_atoms_compar);
#endif
return atoms[0]->root->err;
}
static int
diff_atoms_mark_unique_in_both(struct diff_data *left, struct diff_data *right,
unsigned int *unique_in_both_count_p)
{
struct diff_atom *a;
struct diff_atom *b;
struct diff_atom **all_atoms;
unsigned int len = 0;
unsigned int i;
unsigned int unique_in_both_count = 0;
int rc;
all_atoms = calloc(left->atoms.len + right->atoms.len,
sizeof(struct diff_atom *));
if (all_atoms == NULL)
return ENOMEM;
left->err = 0;
right->err = 0;
left->root->err = 0;
right->root->err = 0;
diff_data_foreach_atom(a, left) {
all_atoms[len++] = a;
}
diff_data_foreach_atom(b, right) {
all_atoms[len++] = b;
}
rc = diff_atoms_sort(all_atoms, len);
if (rc)
goto free_and_exit;
/* Now we have a sorted array of atom pointers. All similar lines are
* adjacent. Walk through the array and mark those that are unique on
* each side, but exist once in both sources. */
for (i = 0; i < len; i++) {
bool same;
unsigned int next_differing_i;
unsigned int last_identical_i;
unsigned int j;
unsigned int count_first_side = 1;
unsigned int count_other_side = 0;
a = all_atoms[i];
debug("a: ");
debug_dump_atom(a->root, NULL, a);
/* Do as few diff_atom_cmp() as possible: first walk forward
* only using the cheap hash as indicator for differing atoms;
* then walk backwards until hitting an identical atom. */
for (next_differing_i = i + 1; next_differing_i < len;
next_differing_i++) {
b = all_atoms[next_differing_i];
if (a->hash != b->hash)
break;
}
for (last_identical_i = next_differing_i - 1;
last_identical_i > i;
last_identical_i--) {
b = all_atoms[last_identical_i];
rc = diff_atom_same(&same, a, b);
if (rc)
goto free_and_exit;
if (same)
break;
}
next_differing_i = last_identical_i + 1;
for (j = i+1; j < next_differing_i; j++) {
b = all_atoms[j];
/* A following atom is the same. See on which side the
* repetition counts. */
if (a->root == b->root)
count_first_side ++;
else
count_other_side ++;
debug("b: ");
debug_dump_atom(b->root, NULL, b);
debug(" count_first_side=%d count_other_side=%d\n",
count_first_side, count_other_side);
}
/* Counted a section of similar atoms, put the results back to
* the atoms. */
if ((count_first_side == 1)
&& (count_other_side == 1)) {
b = all_atoms[i+1];
PATIENCE(a).unique_in_both = true;
PATIENCE(a).pos_in_other = b;
PATIENCE(b).unique_in_both = true;
PATIENCE(b).pos_in_other = a;
unique_in_both_count++;
}
/* j now points at the first atom after 'a' that is not
* identical to 'a'. j is always > i. */
i = j - 1;
}
*unique_in_both_count_p = unique_in_both_count;
rc = 0;
free_and_exit:
free(all_atoms);
return rc;
}
#endif /* UNIQUE_STRATEGY != 0 */
/* binary search to find the stack to put this atom "card" on. */
static int
find_target_stack(struct diff_atom *atom,
struct diff_atom **patience_stacks,
unsigned int patience_stacks_count)
{
unsigned int lo = 0;
unsigned int hi = patience_stacks_count;
while (lo < hi) {
unsigned int mid = (lo + hi) >> 1;
if (PATIENCE(patience_stacks[mid]).pos_in_other
< PATIENCE(atom).pos_in_other)
lo = mid + 1;
else
hi = mid;
}
return lo;
}
/* Among the lines that appear exactly once in each side, find the longest
* streak that appears in both files in the same order (with other lines
* allowed to interleave). Use patience sort for that, as in the Patience
* Diff algorithm.
* See https://bramcohen.livejournal.com/73318.html and, for a much more
* detailed explanation,
* https://blog.jcoglan.com/2017/09/19/the-patience-diff-algorithm/ */
int
diff_algo_patience(const struct diff_algo_config *algo_config,
struct diff_state *state)
{
int rc;
struct diff_data *left = &state->left;
struct diff_data *right = &state->right;
struct atom_patience *atom_patience_left =
calloc(left->atoms.len, sizeof(struct atom_patience));
struct atom_patience *atom_patience_right =
calloc(right->atoms.len, sizeof(struct atom_patience));
unsigned int unique_in_both_count;
struct diff_atom **lcs = NULL;
debug("\n** %s\n", __func__);
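left->root->current = left;
right->root->current = right;
left->algo_data = atom_patience_left;
right->algo_data = atom_patience_right;
if ((left->atoms.len && atom_patience_left == NULL)
|| (right->atoms.len && atom_patience_right == NULL)) {
rc = ENOMEM;
goto free_and_exit;
}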
/* Find those lines that appear exactly once in 'left' and exactly once
* in 'right'. */
rc = diff_atoms_mark_unique_in_both(left, right, &unique_in_both_count);
if (rc)
goto free_and_exit;
debug("unique_in_both_count %u\n", unique_in_both_count);
debug("left:\n");
debug_dump(left);
debug("right:\n");
debug_dump(right);
if (!unique_in_both_count) {
/* Cannot apply Patience, tell the caller to use fallback_algo
* instead. */
rc = DIFF_RC_USE_DIFF_ALGO_FALLBACK;
goto free_and_exit;
}
rc = ENOMEM;
/* The subscope below yields the Longest Common Subsequence as a chain of
* prev_stack back-pointers: lcs_tail is its last atom, lcs_count its
* length. */
unsigned int lcs_count = 0;
struct diff_atom *lcs_tail = NULL;
{
/* This subscope marks the lifetime of the atom_pointers
* allocation */
/* One chunk of storage for atom pointers */
struct diff_atom **atom_pointers;
atom_pointers = recallocarray(NULL, 0, unique_in_both_count * 2,
sizeof(struct diff_atom*));
if (atom_pointers == NULL)
goto free_and_exit;
/* Half for the list of atoms that still need to be put on
* stacks */
struct diff_atom **uniques = atom_pointers;
/* Half for the patience sort state's "card stacks" -- we
* remember only each stack's topmost "card" */
struct diff_atom **patience_stacks;
patience_stacks = atom_pointers + unique_in_both_count;
unsigned int patience_stacks_count = 0;
/* Take all common, unique items from 'left' ... */
struct diff_atom *atom;
struct diff_atom **uniques_end = uniques;
diff_data_foreach_atom(atom, left) {
if (!PATIENCE(atom).unique_in_both)
continue;
*uniques_end = atom;
uniques_end++;
}
/* ...and sort them to the order found in 'right'.
* The idea is to find the leftmost stack that has a higher line
* number and add it to the stack's top.
* If there is no such stack, open a new one on the right. The
* line number is derived from the atom*, which are array items
* and hence reflect the relative position in the source file.
* So we got the common-uniques from 'left' and sort them
* according to PATIENCE(atom).pos_in_other. */
unsigned int i;
for (i = 0; i < unique_in_both_count; i++) {
atom = uniques[i];
unsigned int target_stack;
target_stack = find_target_stack(atom, patience_stacks,
patience_stacks_count);
assert(target_stack <= patience_stacks_count);
patience_stacks[target_stack] = atom;
if (target_stack == patience_stacks_count)
patience_stacks_count++;
/* Record a back reference to the next stack on the
* left, which will form the final longest sequence
* later. */
PATIENCE(atom).prev_stack = target_stack ?
patience_stacks[target_stack - 1] : NULL;
{
int xx;
for (xx = 0; xx < patience_stacks_count; xx++) {
debug(" %s%d",
(xx == target_stack) ? ">" : "",
diff_atom_idx(right,
PATIENCE(patience_stacks[xx]).pos_in_other));
}
debug("\n");
}
}
/* backtrace through prev_stack references to form the final
* longest common sequence */
lcs_tail = patience_stacks[patience_stacks_count - 1];
lcs_count = patience_stacks_count;
/* uniques and patience_stacks are no longer needed.
* Backpointers are in PATIENCE(atom).prev_stack */
free(atom_pointers);
}
lcs = recallocarray(NULL, 0, lcs_count, sizeof(struct diff_atom*));
if (lcs == NULL)
goto free_and_exit;
struct diff_atom **lcs_backtrace_pos = &lcs[lcs_count - 1];
struct diff_atom *atom;
for (atom = lcs_tail; atom; atom = PATIENCE(atom).prev_stack, lcs_backtrace_pos--) {
assert(lcs_backtrace_pos >= lcs);
*lcs_backtrace_pos = atom;
}
unsigned int i;
if (DEBUG) {
debug("\npatience LCS:\n");
for (i = 0; i < lcs_count; i++) {
debug("\n L "); debug_dump_atom(left, right, lcs[i]);
debug(" R "); debug_dump_atom(right, left,
PATIENCE(lcs[i]).pos_in_other);
}
}
/* TODO: For each common-unique line found (now listed in lcs), swallow
* lines upwards and downwards that are identical on each side. Requires
* a way to represent atoms being glued to adjacent atoms. */
debug("\ntraverse LCS, possibly recursing:\n");
/* Now we have pinned positions in both files at which it makes sense to
* divide the diff problem into smaller chunks. Go into the next round:
* look at each section in turn, trying to again find common-unique
* lines in those smaller sections. As soon as no more are found, the
* remaining smaller sections are solved by Myers. */
/* left_pos and right_pos are indexes in left/right->atoms.head until
* which the atoms are already handled (added to result chunks). */
unsigned int left_pos = 0;
unsigned int right_pos = 0;
for (i = 0; i <= lcs_count; i++) {
struct diff_atom *atom;
struct diff_atom *atom_r;
/* left_idx and right_idx are indexes of the start of this
* section of identical lines on both sides.
* left_pos marks the index of the first still unhandled line,
* left_idx is the start of an identical section some way
* further down, and this loop adds an unsolved chunk of
* [left_pos..left_idx[ and a solved chunk of
* [left_idx..identical_lines.end[. */
unsigned int left_idx;
unsigned int right_idx;
debug("iteration %u of %u left_pos %u right_pos %u\n",
i, lcs_count, left_pos, right_pos);
if (i < lcs_count) {
atom = lcs[i];
atom_r = PATIENCE(atom).pos_in_other;
debug("lcs[%u] = left[%u] = right[%u]\n", i,
diff_atom_idx(left, atom), diff_atom_idx(right, atom_r));
left_idx = diff_atom_idx(left, atom);
right_idx = diff_atom_idx(right, atom_r);
} else {
/* There are no more identical lines until the end of
* left and right. */
atom = NULL;
atom_r = NULL;
left_idx = left->atoms.len;
right_idx = right->atoms.len;
}
/* 'atom' (if not NULL) now marks an atom that matches on both
* sides according to patience-diff (a common-unique identical
* atom in both files).
* Handle the section before and the atom itself; the section
* after will be handled by the next loop iteration -- note that
* i loops to last element + 1 ("i <= lcs_count"), so that there
* will be another final iteration to pick up the last remaining
* items after the last LCS atom.
*/
debug("iteration %u left_pos %u left_idx %u"
" right_pos %u right_idx %u\n",
i, left_pos, left_idx, right_pos, right_idx);
/* Section before the matching atom */
struct diff_atom *left_atom = &left->atoms.head[left_pos];
unsigned int left_section_len = left_idx - left_pos;
struct diff_atom *right_atom = &(right->atoms.head[right_pos]);
unsigned int right_section_len = right_idx - right_pos;
if (left_section_len && right_section_len) {
/* Record an unsolved chunk, the caller will apply
* inner_algo() on this chunk. */
if (!diff_state_add_chunk(state, false,
left_atom, left_section_len,
right_atom,
right_section_len))
goto free_and_exit;
} else if (left_section_len && !right_section_len) {
/* Only left atoms and none on the right, they form a
* "minus" chunk, then. */
if (!diff_state_add_chunk(state, true,
left_atom, left_section_len,
right_atom, 0))
goto free_and_exit;
} else if (!left_section_len && right_section_len) {
/* No left atoms, only atoms on the right, they form a
* "plus" chunk, then. */
if (!diff_state_add_chunk(state, true,
left_atom, 0,
right_atom, right_section_len))
goto free_and_exit;
}
/* else: left_section_len == 0 and right_section_len == 0, i.e.
* nothing here. */
/* The atom found to match on both sides forms a chunk of equals
* on each side. In the very last iteration of this loop, there
* is no matching atom, we were just cleaning out the remaining
* lines. */
if (atom) {
void *ok;
ok = diff_state_add_chunk(state, true,
atom, 1,
PATIENCE(atom).pos_in_other, 1);
if (!ok)
goto free_and_exit;
}
left_pos = left_idx + 1;
right_pos = right_idx + 1;
debug("end of iteration %u left_pos %u left_idx %u"
" right_pos %u right_idx %u\n",
i, left_pos, left_idx, right_pos, right_idx);
}
debug("** END %s\n", __func__);
rc = DIFF_RC_OK;
free_and_exit:
left->root->current = NULL;
right->root->current = NULL;
free(atom_patience_left);
free(atom_patience_right);
if (lcs)
free(lcs);
return rc;
}
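Below is a reduced sketch of the section-splitting loop above. It assumes the LCS is already available as pairs of (left index, right index) that ascend on both sides. The names `split_sections`, `struct match`, `struct chunk` and the `CHUNK_*` kinds are hypothetical stand-ins for illustration. The real function works on `struct diff_atom` pointers and records chunks via `diff_state_add_chunk()`.

Each iteration can produce the same outcomes as the original loop:
- an unsolved chunk when both sides have lines before the match;
- a minus or plus chunk when only one side does;
- a one-line equal chunk for the match itself.

```c
/* Hypothetical chunk kinds; the real code passes a "solved" flag to
 * diff_state_add_chunk() instead. */
enum chunk_kind { CHUNK_UNSOLVED, CHUNK_MINUS, CHUNK_PLUS, CHUNK_SAME };

struct chunk {
	enum chunk_kind kind;
	unsigned int left_start, left_len;
	unsigned int right_start, right_len;
};

/* One LCS entry: a line that is unique and identical on both sides. */
struct match {
	unsigned int left_idx, right_idx;
};

/*
 * Split left[0..left_len[ and right[0..right_len[ around the matches,
 * which must be ascending on both sides. Like the loop above, iterate
 * lcs_count + 1 times so the last pass picks up the lines after the last
 * match. out must have room for 2 * lcs_count + 1 chunks.
 */
static unsigned int
split_sections(const struct match *lcs, unsigned int lcs_count,
    unsigned int left_len, unsigned int right_len, struct chunk *out)
{
	unsigned int left_pos = 0, right_pos = 0, n = 0, i;

	for (i = 0; i <= lcs_count; i++) {
		unsigned int left_idx, right_idx, lsec, rsec;

		if (i < lcs_count) {
			left_idx = lcs[i].left_idx;
			right_idx = lcs[i].right_idx;
		} else {
			/* No more matches: run to the end of both sides. */
			left_idx = left_len;
			right_idx = right_len;
		}

		/* Section before the match, if any. */
		lsec = left_idx - left_pos;
		rsec = right_idx - right_pos;
		if (lsec || rsec) {
			enum chunk_kind k = (lsec && rsec) ? CHUNK_UNSOLVED :
			    lsec ? CHUNK_MINUS : CHUNK_PLUS;
			out[n++] = (struct chunk){ k, left_pos, lsec,
			    right_pos, rsec };
		}

		/* The matching line itself is a one-line equal chunk. */
		if (i < lcs_count)
			out[n++] = (struct chunk){ CHUNK_SAME, left_idx, 1,
			    right_idx, 1 };

		left_pos = left_idx + 1;
		right_pos = right_idx + 1;
	}
	return n;
}
```

Take matches (0,0) and (2,1) on two 3-line inputs. The sketch yields same, minus, same, plus. In the library, unsolved chunks are then left for Myers, as the comment at the top of the loop describes.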

47
man/diff.1 Normal file
@ -0,0 +1,47 @@
.\" $OpenBSD$
.\"
.\" Copyright (c) 2018 Martin Pieuchot
.\" Copyright (c) 2020 Neels Hofmeyr <neels@hofmeyr.de>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: August 28 2017 $
.Dt DIFF 1
.Os
.Sh NAME
.Nm diff
.Nd compare files
.Sh SYNOPSIS
.Nm diff
.Ar file1 file2
.Sh DESCRIPTION
The
.Nm
utility compares the contents of
.Ar file1
and
.Ar file2
line by line.
.Sh EXIT STATUS
The
.Nm
utility exits with one of the following values:
.Pp
.Bl -tag -width Ds -offset indent -compact
.It 0
No differences were found.
.It 1
Differences were found.
.It >1
An error occurred.
.El
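The exit status values above can be illustrated with a minimal sketch that only decides whether two streams are identical. `compare_streams()` and the `DIFF_*` constants are hypothetical names, not part of this code base. The sketch compares byte by byte, which gives the same yes/no answer as comparing line by line. A real diff(1) also computes and prints the differences. Read errors map to 2, one of the values greater than 1.

```c
#include <stdio.h>

/* Exit status values as documented in the manual page above. */
enum { DIFF_SAME = 0, DIFF_DIFFERENT = 1, DIFF_TROUBLE = 2 };

/* Return DIFF_SAME if both streams hold identical bytes, DIFF_DIFFERENT
 * if they differ, or DIFF_TROUBLE if reading either stream failed. */
static int
compare_streams(FILE *f1, FILE *f2)
{
	int c1, c2;

	do {
		c1 = fgetc(f1);
		c2 = fgetc(f2);
		if (c1 != c2)
			break;	/* differing byte, or one stream ended early */
	} while (c1 != EOF);

	if (ferror(f1) || ferror(f2))
		return DIFF_TROUBLE;
	return c1 == c2 ? DIFF_SAME : DIFF_DIFFERENT;
}
```

A command-line wrapper would open file1 and file2, call `compare_streams()`, and pass the result to `exit()`. It would also use 2 when a file cannot be opened.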

11
test/GNUmakefile Normal file
@ -0,0 +1,11 @@
.PHONY: test verify clean
test: verify clean
# verify_all.sh runs 'make' on sub-directories containing C tests
verify:
./verify_all.sh
clean:
-rm verify.*
-$(MAKE) -C ../lib clean
-$(MAKE) -C arraylist_test clean
-$(MAKE) -C results_test clean

12
test/Makefile Normal file
@ -0,0 +1,12 @@
.PHONY: test verify clean
test: verify clean
# verify_all.sh runs 'make' on sub-directories containing C tests
verify:
./verify_all.sh
clean:
-rm verify.*
make -C arraylist_test clean
make -C results_test clean

7
test/README Normal file
@ -0,0 +1,7 @@
The test produces a diff, which is successful if it is able to reconstruct the
original source files from it. It is not tested whether diff output is optimal
or beautiful.
Since different diff algorithms can produce different diff outputs, the
expect*.diff files are merely provided for reference and are not part of the
tests.
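The round-trip idea in this README can be sketched for a single hunk that covers both files entirely, as the one in test/expect001.diff does. The sketch assumes each line is stored with its one-character unified-diff prefix:
- context lines (' ') belong to both files;
- '-' lines belong only to the left file;
- '+' lines belong only to the right file.

`reconstruct()` is a hypothetical helper for illustration. It is not taken from verify_all.sh or the test suite.

```c
#include <stddef.h>

/*
 * Rebuild one side of a diff from a hunk body. side is '-' to recover the
 * left file or '+' to recover the right file; context lines (' ') go to
 * both. Pointers into hunk (past the prefix) are stored in out, which must
 * have room for n entries. Returns the number of lines recovered.
 */
static size_t
reconstruct(const char *const *hunk, size_t n, char side, const char **out)
{
	size_t i, len = 0;

	for (i = 0; i < n; i++) {
		char op = hunk[i][0];

		if (op == ' ' || op == side)
			out[len++] = hunk[i] + 1;
	}
	return len;
}
```

Apply this to the hunk in test/expect001.diff. The left side comes out as A B C A B B A (7 lines) and the right side as C B A B A C (6 lines), in agreement with its `@@ -1,7 +1,6 @@` header.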

58
test/arraylist_test.c Normal file
@ -0,0 +1,58 @@
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arraylist.h>
void test_basic(void)
{
int *p;
ARRAYLIST(int) list;
ARRAYLIST_INIT(list, 2);
#define dump() do {\
printf("(%d items)\n", list.len); \
ARRAYLIST_FOREACH(p, list) \
printf("[%lu] %d\n", \
(unsigned long)ARRAYLIST_IDX(p, list), *p); \
printf("\n"); \
} while(0)
dump();
ARRAYLIST_ADD(p, list);
*p = 100;
dump();
ARRAYLIST_ADD(p, list);
*p = 101;
dump();
ARRAYLIST_ADD(p, list);
*p = 102;
dump();
#define insert_test(AT) do {\
printf("insert at [" #AT "]:\n"); \
ARRAYLIST_INSERT(p, list, AT); \
*p = AT; \
dump(); \
} while(0)
insert_test(list.len - 1);
insert_test(1);
insert_test(0);
insert_test(6);
insert_test(123);
insert_test(-42);
printf("clear:\n");
ARRAYLIST_CLEAR(list);
dump();
ARRAYLIST_FREE(list);
}
int main(void)
{
test_basic();
}

@ -0,0 +1,20 @@
.PHONY: regress clean
CFLAGS = -fsanitize=address -fsanitize=undefined -g -O3
CFLAGS += -Wstrict-prototypes -Wunused-variable -Wuninitialized
CFLAGS+= -I$(CURDIR)/../../compat/include \
-I$(CURDIR)/../../include \
-I$(CURDIR)/../../lib
$(CURDIR)/arraylist_test: $(CURDIR)/../arraylist_test.c $(CURDIR)/../../lib/libdiff.a
gcc $(CFLAGS) -o $@ $^
$(CURDIR)/../../lib/libdiff.a: $(CURDIR)/../../lib/*.[hc] $(CURDIR)/../../include/*.h
$(MAKE) -C $(CURDIR)/../../lib
regress: $(CURDIR)/arraylist_test
$(CURDIR)/arraylist_test
clean:
-rm $(CURDIR)/arraylist_test

@ -0,0 +1,11 @@
.PATH:${.CURDIR}/../../lib
.PATH:${.CURDIR}/..
PROG = arraylist_test
SRCS = arraylist_test.c
CPPFLAGS = -I${.CURDIR}/../../include -I${.CURDIR}/../../lib
NOMAN = yes
.include <bsd.regress.mk>

@ -0,0 +1,76 @@
==== run-regress-arraylist_test ====
(0 items)
(1 items)
[0] 100
(2 items)
[0] 100
[1] 101
(3 items)
[0] 100
[1] 101
[2] 102
insert at [list.len - 1]:
(4 items)
[0] 100
[1] 101
[2] 3
[3] 102
insert at [1]:
(5 items)
[0] 100
[1] 1
[2] 101
[3] 3
[4] 102
insert at [0]:
(6 items)
[0] 0
[1] 100
[2] 1
[3] 101
[4] 3
[5] 102
insert at [6]:
(7 items)
[0] 0
[1] 100
[2] 1
[3] 101
[4] 3
[5] 102
[6] 6
insert at [123]:
(8 items)
[0] 0
[1] 100
[2] 1
[3] 101
[4] 3
[5] 102
[6] 6
[7] 123
insert at [-42]:
(9 items)
[0] 0
[1] 100
[2] 1
[3] 101
[4] 3
[5] 102
[6] 6
[7] 123
[8] -42
clear:
(0 items)
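This expected output suggests that ARRAYLIST_INSERT appends when the index is at or past the end of the list:
- inserting at [123] into a 7-item list lands at [7];
- inserting at -42, which converts to a large unsigned value, lands at [8].

The 3 at [2] comes from `*p = AT` being evaluated after the insert, when `list.len - 1` is already 3.

The sketch below reproduces that behaviour for a plain int array. `struct int_list` and `int_list_insert()` are hypothetical and do not mirror the macros in arraylist.h. Treating the second argument of ARRAYLIST_INIT as a growth block size is also an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ARRAYLIST(int). */
struct int_list {
	int *head;
	unsigned int len;
	unsigned int allocated;
	unsigned int alloc_blocksize;	/* assumed growth step, must be > 0 */
};

/*
 * Open a slot at index idx and return a pointer to it, or NULL if memory
 * runs out. An index past the end is clamped to len, i.e. the new item is
 * appended.
 */
static int *
int_list_insert(struct int_list *l, unsigned int idx)
{
	if (l->len == l->allocated) {
		unsigned int n = l->allocated + l->alloc_blocksize;
		int *p = realloc(l->head, n * sizeof(*p));

		if (p == NULL)
			return NULL;
		l->head = p;
		l->allocated = n;
	}
	if (idx > l->len)
		idx = l->len;
	/* Shift the tail up by one to make room. */
	memmove(&l->head[idx + 1], &l->head[idx],
	    (l->len - idx) * sizeof(*l->head));
	l->len++;
	return &l->head[idx];
}
```

Replaying the same sequence as test_basic() with this sketch gives 0 100 1 101 3 102 6 123 -42, the final list printed above.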

14
test/expect.results_test Normal file
@ -0,0 +1,14 @@
==== run-regress-results_test ====
--- test_minus_after_plus()
[0] same lines L2 R2 @L 0 @R 0
[1] minus lines L3 R0 @L 2 @R 2
[2] plus lines L0 R3 @L 5 @R 2
[3] same lines L2 R2 @L 5 @R 5
--- test_plus_after_plus()
[0] same lines L2 R2 @L 0 @R 0
[1] minus lines L3 R0 @L 2 @R 2
[2] plus lines L0 R3 @L 5 @R 2
[3] same lines L2 R2 @L 5 @R 5
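In this expected output, the @L and @R columns appear to be running totals of the left and right line counts of the preceding chunks. The left starts are 0, 2, 5, 5 and the right starts are 0, 2, 2, 5.

The sketch below derives those offsets. It assumes chunks are given as (type, left count, right count). `struct chunk_counts`, `chunk_offsets()` and `format_chunk()` are hypothetical names. The line layout is copied from the lines above rather than from the library's own printing code.

```c
#include <stddef.h>
#include <stdio.h>

struct chunk_counts {
	const char *type;	/* "same", "minus" or "plus" */
	unsigned int left_count;
	unsigned int right_count;
};

/* Each chunk starts where the previous chunks' lines end on each side. */
static void
chunk_offsets(const struct chunk_counts *c, size_t n,
    unsigned int *left_start, unsigned int *right_start)
{
	unsigned int l = 0, r = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		left_start[i] = l;
		right_start[i] = r;
		l += c[i].left_count;
		r += c[i].right_count;
	}
}

/* Format one line in the layout of test/expect.results_test. */
static int
format_chunk(char *buf, size_t sz, size_t idx,
    const struct chunk_counts *c, unsigned int ls, unsigned int rs)
{
	return snprintf(buf, sz, "[%zu] %s lines L%u R%u @L %u @R %u",
	    idx, c->type, c->left_count, c->right_count, ls, rs);
}
```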

12
test/expect001.diff Normal file
@ -0,0 +1,12 @@
--- test001.left.txt
+++ test001.right.txt
@@ -1,7 +1,6 @@
-A
-B
C
+B
A
B
-B
A
+C

16
test/expect002.diff Normal file
@ -0,0 +1,16 @@
--- test002.left.txt
+++ test002.right.txt
@@ -1,10 +1,9 @@
-A
-B
C
+B
A
B
-B
A
+C
X
-Y
Z
+Q

10
test/expect003.diff Normal file
@ -0,0 +1,10 @@
--- test003.left.txt
+++ test003.right.txt
@@ -1,5 +1,4 @@
-a
+x
b
c
-d
-e
+y

24
test/expect004.diff Normal file
@ -0,0 +1,24 @@
--- test004.left.txt
+++ test004.right.txt
@@ -1,3 +1,10 @@
+int Chunk_bounds_check(Chunk *chunk, size_t start, size_t n)
+{
+ if (chunk == NULL) return 0;
+
+ return start <= chunk->length && n <= chunk->length - start;
+}
+
void Chunk_copy(Chunk *src, size_t src_start, Chunk *dst, size_t dst_start, size_t n)
{
if (!Chunk_bounds_check(src, src_start, n)) return;
@@ -5,10 +12,3 @@
memcpy(dst->data + dst_start, src->data + src_start, n);
}
-
-int Chunk_bounds_check(Chunk *chunk, size_t start, size_t n)
-{
- if (chunk == NULL) return 0;
-
- return start <= chunk->length && n <= chunk->length - start;
-}

12
test/expect005.diff Normal file
@ -0,0 +1,12 @@
--- test005.left.txt
+++ test005.right.txt
@@ -1,7 +1,7 @@
+The Slits
+Gil Scott Heron
David Axelrod
Electric Prunes
-Gil Scott Heron
-The Slits
Faust
The Sonics
The Sonics

24
test/expect006.diff Normal file
@ -0,0 +1,24 @@
--- test006.left.txt
+++ test006.right.txt
@@ -3,7 +3,7 @@
It is important to specify the year of the copyright. Additional years
should be separated by a comma, e.g.
- Copyright (c) 2003, 2004
+ Copyright (c) 2003, 2004, 2005
If you add extra text to the body of the license, be careful not to
add further restrictions.
@@ -11,7 +11,6 @@
/*
* Copyright (c) CCYY YOUR NAME HERE <user@your.dom.ain>
*
- * Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
@@ -23,3 +22,4 @@
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
+An extra line

5
test/expect007.diff Normal file
@ -0,0 +1,5 @@
--- test007.left.txt
+++ test007.right.txt
@@ -1 +1 @@
-x
+abcdx

9
test/expect008.diff Normal file
@ -0,0 +1,9 @@
--- test008.left.txt
+++ test008.right.txt
@@ -1 +1,6 @@
x
+a
+b
+c
+d
+x

13
test/expect009.diff Normal file
@ -0,0 +1,13 @@
--- test009.left.txt
+++ test009.right.txt
@@ -1,3 +1,10 @@
x
a
b
+c
+d
+e
+f
+x
+a
+b

19
test/expect010.diff Normal file
@ -0,0 +1,19 @@
--- test010.left.txt
+++ test010.right.txt
@@ -1,4 +1,4 @@
-/* $OpenBSD: usbdevs_data.h,v 1.715 2020/01/20 07:09:11 jsg Exp $ */
+/* $OpenBSD$ */
/*
* THIS FILE IS AUTOMATICALLY GENERATED. DO NOT EDIT.
@@ -10979,6 +10979,10 @@
},
{
USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8192EU,
+ "RTL8192EU",
+ },
+ {
+ USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8192EU_2,
"RTL8192EU",
},
{

19
test/expect011.diff Normal file
@ -0,0 +1,19 @@
--- test011.left.txt
+++ test011.right.txt
@@ -1,4 +1,4 @@
-/* $OpenBSD: usbdevs_data.h,v 1.715 2020/01/20 07:09:11 jsg Exp $ */
+/* $OpenBSD$ */
/*
* THIS FILE IS AUTOMATICALLY GENERATED. DO NOT EDIT.
@@ -375,6 +375,10 @@
},
{
USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8192EU,
+ "RTL8192EU",
+ },
+ {
+ USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8192EU_2,
"RTL8192EU",
},
{

21
test/expect012.diff Normal file
@ -0,0 +1,21 @@
--- test012.left.txt
+++ test012.right.txt
@@ -1,4 +1,4 @@
-1 left
+1 right
2
3
4
@@ -17,6 +17,12 @@
17
18
19
+14
+15
+16 right
+17
+18
+19
20
21
22

10
test/expect013.diff Normal file
@ -0,0 +1,10 @@
--- test013.left-w.txt
+++ test013.right-w.txt
@@ -3,5 +3,5 @@
C
D
E
-F
-G
+F x
+y G

4
test/expect014.diff Normal file
@ -0,0 +1,4 @@
--- test014.left.txt
+++ test014.right.txt
@@ -0,0 +1 @@
+A

4
test/expect015.diff Normal file
@ -0,0 +1,4 @@
--- test015.left.txt
+++ test015.right.txt
@@ -1 +0,0 @@
-A
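The hunk headers in these expected files follow the usual unified-diff range convention:
- a range of one line is written as just its start line (`-1`, `+1`);
- an empty range is written as `start,0`, where start is the line before the gap (`-0,0` or `+0,0` for an empty file);
- any other range is written as `start,count`.

The sketch below takes 0-based start indexes as input. `format_range()` and `format_hunk_header()` are hypothetical helpers, not libdiff functions.

```c
#include <stddef.h>
#include <stdio.h>

/* Format one side of a hunk range; start is the 0-based index of the
 * first line in the range (or of the insertion point for an empty one). */
static void
format_range(char *buf, size_t sz, unsigned int start, unsigned int count)
{
	if (count == 1)
		snprintf(buf, sz, "%u", start + 1);
	else if (count == 0)
		snprintf(buf, sz, "%u,0", start);	/* line before the gap */
	else
		snprintf(buf, sz, "%u,%u", start + 1, count);
}

static void
format_hunk_header(char *buf, size_t sz, unsigned int lstart,
    unsigned int lcount, unsigned int rstart, unsigned int rcount)
{
	char l[32], r[32];

	format_range(l, sizeof(l), lstart, lcount);
	format_range(r, sizeof(r), rstart, rcount);
	snprintf(buf, sz, "@@ -%s +%s @@", l, r);
}
```

With (0,7,0,6) this produces the header of test/expect001.diff. With (0,0,0,1) and (0,1,0,0) it produces the headers of test/expect014.diff and test/expect015.diff.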

30
test/expect016.diff Normal file
@ -0,0 +1,30 @@
--- test016.left.txt
+++ test016.right.txt
@@ -254,7 +254,7 @@
const char *uri, *dirname;
char *proto, *host, *port, *repo_name, *server_path;
char *default_destdir = NULL, *id_str = NULL;
- const char *repo_path;
+ const char *repo_path, *remote_repo_path;
struct got_repository *repo = NULL;
struct got_pathlist_head refs, symrefs, wanted_branches, wanted_refs;
struct got_pathlist_entry *pe;
@@ -275,6 +275,9 @@
goto done;
}
got_path_strip_trailing_slashes(server_path);
+ remote_repo_path = server_path;
+ while (remote_repo_path[0] == '/')
+ remote_repo_path++;
if (asprintf(&gotconfig,
"remote \"%s\" {\n"
"\tserver %s\n"
@@ -285,7 +288,7 @@
"}\n",
GOT_FETCH_DEFAULT_REMOTE_NAME, host, proto,
port ? "\tport " : "", port ? port : "", port ? "\n" : "",
- server_path,
+ remote_repo_path,
mirror_references ? "\tmirror-references yes\n" : "") == -1) {
error = got_error_from_errno("asprintf");
goto done;

16
test/expect018.diff Normal file
@ -0,0 +1,16 @@
--- test018.left-T.txt
+++ test018.right-T.txt
@@ -1,7 +1,6 @@
-A
-B
-C
-A
-B
-B
-A
+C
+B
+A
+B
+A
+C

204
test/expect019.diff Normal file
@ -0,0 +1,204 @@
--- test019.left.txt
+++ test019.right.txt
@@ -40,8 +40,23 @@
#include "got_lib_object.h"
static const struct got_error *
-diff_blobs(struct got_diffreg_result **resultp,
-struct got_blob_object *blob1, struct got_blob_object *blob2,
+add_line_offset(off_t **line_offsets, size_t *nlines, off_t off)
+{
+ off_t *p;
+
+ p = reallocarray(*line_offsets, *nlines + 1, sizeof(off_t));
+ if (p == NULL)
+ return got_error_from_errno("reallocarray");
+ *line_offsets = p;
+ (*line_offsets)[*nlines] = off;
+ (*nlines)++;
+ return NULL;
+}
+
+static const struct got_error *
+diff_blobs(off_t **line_offsets, size_t *nlines,
+ struct got_diffreg_result **resultp, struct got_blob_object *blob1,
+ struct got_blob_object *blob2,
const char *label1, const char *label2, mode_t mode1, mode_t mode2,
int diff_context, int ignore_whitespace, FILE *outfile)
{
@@ -52,7 +67,12 @@
char *idstr1 = NULL, *idstr2 = NULL;
size_t size1, size2;
struct got_diffreg_result *result;
+ off_t outoff = 0;
+ int n;
+ if (line_offsets && *line_offsets && *nlines > 0)
+ outoff = (*line_offsets)[*nlines - 1];
+
if (resultp)
*resultp = NULL;
@@ -116,10 +136,32 @@
goto done;
}
}
- fprintf(outfile, "blob - %s%s\n", idstr1,
+ n = fprintf(outfile, "blob - %s%s\n", idstr1,
modestr1 ? modestr1 : "");
- fprintf(outfile, "blob + %s%s\n", idstr2,
+ if (n < 0) {
+ err = got_error_from_errno("fprintf");
+ goto done;
+ }
+ outoff += n;
+ if (line_offsets) {
+ err = add_line_offset(line_offsets, nlines, outoff);
+ if (err)
+ goto done;
+ }
+
+ n = fprintf(outfile, "blob + %s%s\n", idstr2,
modestr2 ? modestr2 : "");
+ if (n < 0) {
+ err = got_error_from_errno("fprintf");
+ goto done;
+ }
+ outoff += n;
+ if (line_offsets) {
+ err = add_line_offset(line_offsets, nlines, outoff);
+ if (err)
+ goto done;
+ }
+
free(modestr1);
free(modestr2);
}
@@ -129,7 +171,7 @@
goto done;
if (outfile) {
- err = got_diffreg_output(NULL, NULL, result, f1, f2,
+ err = got_diffreg_output(line_offsets, nlines, result, f1, f2,
label1 ? label1 : idstr1,
label2 ? label2 : idstr2,
GOT_DIFF_OUTPUT_UNIDIFF, diff_context, outfile);
@@ -158,21 +200,21 @@
struct got_object_id *id2, const char *label1, const char *label2,
mode_t mode1, mode_t mode2, struct got_repository *repo)
{
- const struct got_error *err;
struct got_diff_blob_output_unidiff_arg *a = arg;
- err = diff_blobs(NULL, blob1, blob2, label1, label2, mode1, mode2,
- a->diff_context, a->ignore_whitespace, a->outfile);
- return err;
+ return diff_blobs(&a->line_offsets, &a->nlines, NULL,
+ blob1, blob2, label1, label2, mode1, mode2, a->diff_context,
+ a->ignore_whitespace, a->outfile);
}
const struct got_error *
-got_diff_blob(struct got_blob_object *blob1, struct got_blob_object *blob2,
+got_diff_blob(off_t **line_offsets, size_t *nlines,
+ struct got_blob_object *blob1, struct got_blob_object *blob2,
const char *label1, const char *label2, int diff_context,
int ignore_whitespace, FILE *outfile)
{
- return diff_blobs(NULL, blob1, blob2, label1, label2, 0, 0, diff_context,
- ignore_whitespace, outfile);
+ return diff_blobs(line_offsets, nlines, NULL, blob1, blob2,
+ label1, label2, 0, 0, diff_context, ignore_whitespace, outfile);
}
static const struct got_error *
@@ -259,7 +301,8 @@
{
const struct got_error *err = NULL;
- err = diff_blobs(result, blob1, blob2, NULL, NULL, 0, 0, 3, 0, NULL);
+ err = diff_blobs(NULL, NULL, result, blob1, blob2,
+ NULL, NULL, 0, 0, 3, 0, NULL);
if (err) {
got_diffreg_result_free(*result);
*result = NULL;
@@ -702,7 +745,8 @@
}
const struct got_error *
-got_diff_objects_as_blobs(struct got_object_id *id1, struct got_object_id *id2,
+got_diff_objects_as_blobs(off_t **line_offsets, size_t *nlines,
+ struct got_object_id *id1, struct got_object_id *id2,
const char *label1, const char *label2, int diff_context,
int ignore_whitespace, struct got_repository *repo, FILE *outfile)
{
@@ -722,8 +766,8 @@
if (err)
goto done;
}
- err = got_diff_blob(blob1, blob2, label1, label2, diff_context,
- ignore_whitespace, outfile);
+ err = got_diff_blob(line_offsets, nlines, blob1, blob2,
+ label1, label2, diff_context, ignore_whitespace, outfile);
done:
if (blob1)
got_object_blob_close(blob1);
@@ -733,13 +777,15 @@
}
const struct got_error *
-got_diff_objects_as_trees(struct got_object_id *id1, struct got_object_id *id2,
+got_diff_objects_as_trees(off_t **line_offsets, size_t *nlines,
+ struct got_object_id *id1, struct got_object_id *id2,
char *label1, char *label2, int diff_context, int ignore_whitespace,
struct got_repository *repo, FILE *outfile)
{
const struct got_error *err;
struct got_tree_object *tree1 = NULL, *tree2 = NULL;
struct got_diff_blob_output_unidiff_arg arg;
+ int want_lineoffsets = (line_offsets != NULL && *line_offsets != NULL);
if (id1 == NULL && id2 == NULL)
return got_error(GOT_ERR_NO_OBJ);
@@ -757,8 +803,20 @@
arg.diff_context = diff_context;
arg.ignore_whitespace = ignore_whitespace;
arg.outfile = outfile;
+ if (want_lineoffsets) {
+ arg.line_offsets = *line_offsets;
+ arg.nlines = *nlines;
+ } else {
+ arg.line_offsets = NULL;
+ arg.nlines = 0;
+ }
err = got_diff_tree(tree1, tree2, label1, label2, repo,
got_diff_blob_output_unidiff, &arg, 1);
+
+ if (want_lineoffsets) {
+ *line_offsets = arg.line_offsets; /* was likely re-allocated */
+ *nlines = arg.nlines;
+ }
done:
if (tree1)
got_object_tree_close(tree1);
@@ -768,8 +826,9 @@
}
const struct got_error *
-got_diff_objects_as_commits(struct got_object_id *id1,
- struct got_object_id *id2, int diff_context, int ignore_whitespace,
+got_diff_objects_as_commits(off_t **line_offsets, size_t *nlines,
+ struct got_object_id *id1, struct got_object_id *id2,
+ int diff_context, int ignore_whitespace,
struct got_repository *repo, FILE *outfile)
{
const struct got_error *err;
@@ -788,7 +847,7 @@
if (err)
goto done;
- err = got_diff_objects_as_trees(
+ err = got_diff_objects_as_trees(line_offsets, nlines,
commit1 ? got_object_commit_get_tree_id(commit1) : NULL,
got_object_commit_get_tree_id(commit2), "", "", diff_context,
ignore_whitespace, repo, outfile);

868
test/expect021.diff Normal file
@ -0,0 +1,868 @@
--- test021.left.txt
+++ test021.right.txt
@@ -1,4 +1,4 @@
-/* $OpenBSD: softraid_crypto.c,v 1.91 2013/03/31 15:44:52 jsing Exp $ */
+/* $OpenBSD: softraid_crypto.c,v 1.139 2020/07/13 00:06:22 kn Exp $ */
/*
* Copyright (c) 2007 Marco Peereboom <marco@peereboom.us>
* Copyright (c) 2008 Hans-Joerg Hoexer <hshoexer@openbsd.org>
@@ -25,7 +25,6 @@
#include <sys/buf.h>
#include <sys/device.h>
#include <sys/ioctl.h>
-#include <sys/proc.h>
#include <sys/malloc.h>
#include <sys/pool.h>
#include <sys/kernel.h>
@@ -34,6 +33,7 @@
#include <sys/queue.h>
#include <sys/fcntl.h>
#include <sys/disklabel.h>
+#include <sys/vnode.h>
#include <sys/mount.h>
#include <sys/sensors.h>
#include <sys/stat.h>
@@ -42,7 +42,6 @@
#include <sys/dkio.h>
#include <crypto/cryptodev.h>
-#include <crypto/cryptosoft.h>
#include <crypto/rijndael.h>
#include <crypto/md5.h>
#include <crypto/sha1.h>
@@ -54,7 +53,6 @@
#include <scsi/scsi_disk.h>
#include <dev/softraidvar.h>
-#include <dev/rndvar.h>
/*
* The per-I/O data that we need to preallocate. We cannot afford to allow I/O
@@ -62,18 +60,15 @@
* because we assert that only one ccb per WU will ever be active.
*/
struct sr_crypto_wu {
- TAILQ_ENTRY(sr_crypto_wu) cr_link;
+ struct sr_workunit cr_wu; /* Must be first. */
struct uio cr_uio;
struct iovec cr_iov;
struct cryptop *cr_crp;
- struct cryptodesc *cr_descs;
- struct sr_workunit *cr_wu;
void *cr_dmabuf;
};
-struct sr_crypto_wu *sr_crypto_wu_get(struct sr_workunit *, int);
-void sr_crypto_wu_put(struct sr_crypto_wu *);
+struct sr_crypto_wu *sr_crypto_prepare(struct sr_workunit *, int);
int sr_crypto_create_keys(struct sr_discipline *);
int sr_crypto_get_kdf(struct bioc_createraid *,
struct sr_discipline *);
@@ -92,12 +87,11 @@
struct bioc_discipline *);
int sr_crypto_meta_opt_handler(struct sr_discipline *,
struct sr_meta_opt_hdr *);
-int sr_crypto_write(struct cryptop *);
+void sr_crypto_write(struct cryptop *);
int sr_crypto_rw(struct sr_workunit *);
-int sr_crypto_rw2(struct sr_workunit *, struct sr_crypto_wu *);
+int sr_crypto_dev_rw(struct sr_workunit *, struct sr_crypto_wu *);
void sr_crypto_done(struct sr_workunit *);
-int sr_crypto_read(struct cryptop *);
-void sr_crypto_finish_io(struct sr_workunit *);
+void sr_crypto_read(struct cryptop *);
void sr_crypto_calculate_check_hmac_sha1(u_int8_t *, int,
u_int8_t *, int, u_char *);
void sr_crypto_hotplug(struct sr_discipline *, struct disk *, int);
@@ -113,6 +107,7 @@
int i;
/* Fill out discipline members. */
+ sd->sd_wu_size = sizeof(struct sr_crypto_wu);
sd->sd_type = SR_MD_CRYPTO;
strlcpy(sd->sd_name, "CRYPTO", sizeof(sd->sd_name));
sd->sd_capabilities = SR_CAP_SYSTEM_DISK | SR_CAP_AUTO_ASSEMBLE;
@@ -143,8 +138,14 @@
sr_error(sd->sd_sc, "%s requires exactly one chunk",
sd->sd_name);
goto done;
- }
+ }
+ if (coerced_size > SR_CRYPTO_MAXSIZE) {
+ sr_error(sd->sd_sc, "%s exceeds maximum size (%lli > %llu)",
+ sd->sd_name, coerced_size, SR_CRYPTO_MAXSIZE);
+ goto done;
+ }
+
/* Create crypto optional metadata. */
omi = malloc(sizeof(struct sr_meta_opt_item), M_DEVBUF,
M_WAITOK | M_ZERO);
@@ -208,7 +209,7 @@
if (data != NULL) {
/* Kernel already has mask key. */
- bcopy(data, sd->mds.mdd_crypto.scr_maskkey,
+ memcpy(sd->mds.mdd_crypto.scr_maskkey, data,
sizeof(sd->mds.mdd_crypto.scr_maskkey));
} else if (bc->bc_key_disk != NODEV) {
/* Read the mask key from the key disk. */
@@ -248,117 +249,69 @@
}
struct sr_crypto_wu *
-sr_crypto_wu_get(struct sr_workunit *wu, int encrypt)
+sr_crypto_prepare(struct sr_workunit *wu, int encrypt)
{
struct scsi_xfer *xs = wu->swu_xs;
struct sr_discipline *sd = wu->swu_dis;
struct sr_crypto_wu *crwu;
struct cryptodesc *crd;
int flags, i, n;
- daddr64_t blk = 0;
+ daddr_t blkno;
u_int keyndx;
- DNPRINTF(SR_D_DIS, "%s: sr_crypto_wu_get wu: %p encrypt: %d\n",
+ DNPRINTF(SR_D_DIS, "%s: sr_crypto_prepare wu %p encrypt %d\n",
DEVNAME(sd->sd_sc), wu, encrypt);
- mtx_enter(&sd->mds.mdd_crypto.scr_mutex);
- if ((crwu = TAILQ_FIRST(&sd->mds.mdd_crypto.scr_wus)) != NULL)
- TAILQ_REMOVE(&sd->mds.mdd_crypto.scr_wus, crwu, cr_link);
- mtx_leave(&sd->mds.mdd_crypto.scr_mutex);
- if (crwu == NULL)
- panic("sr_crypto_wu_get: out of wus");
-
+ crwu = (struct sr_crypto_wu *)wu;
crwu->cr_uio.uio_iovcnt = 1;
crwu->cr_uio.uio_iov->iov_len = xs->datalen;
if (xs->flags & SCSI_DATA_OUT) {
crwu->cr_uio.uio_iov->iov_base = crwu->cr_dmabuf;
- bcopy(xs->data, crwu->cr_uio.uio_iov->iov_base, xs->datalen);
+ memcpy(crwu->cr_uio.uio_iov->iov_base, xs->data, xs->datalen);
} else
crwu->cr_uio.uio_iov->iov_base = xs->data;
- if (xs->cmdlen == 10)
- blk = _4btol(((struct scsi_rw_big *)xs->cmd)->addr);
- else if (xs->cmdlen == 16)
- blk = _8btol(((struct scsi_rw_16 *)xs->cmd)->addr);
- else if (xs->cmdlen == 6)
- blk = _3btol(((struct scsi_rw *)xs->cmd)->addr);
-
+ blkno = wu->swu_blk_start;
n = xs->datalen >> DEV_BSHIFT;
/*
* We preallocated enough crypto descs for up to MAXPHYS of I/O.
- * Since there may be less than that we need to tweak the linked list
+ * Since there may be less than that we need to tweak the amount
* of crypto desc structures to be just long enough for our needs.
*/
- crd = crwu->cr_descs;
- for (i = 0; i < ((MAXPHYS >> DEV_BSHIFT) - n); i++) {
- crd = crd->crd_next;
- KASSERT(crd);
- }
- crwu->cr_crp->crp_desc = crd;
+ KASSERT(crwu->cr_crp->crp_ndescalloc >= n);
+ crwu->cr_crp->crp_ndesc = n;
flags = (encrypt ? CRD_F_ENCRYPT : 0) |
CRD_F_IV_PRESENT | CRD_F_IV_EXPLICIT;
- /* Select crypto session based on block number */
- keyndx = blk >> SR_CRYPTO_KEY_BLKSHIFT;
- if (keyndx >= SR_CRYPTO_MAXKEYS)
- goto unwind;
+ /*
+ * Select crypto session based on block number.
+ *
+ * XXX - this does not handle the case where the read/write spans
+ * across a different key blocks (e.g. 0.5TB boundary). Currently
+ * this is already broken by the use of scr_key[0] below.
+ */
+ keyndx = blkno >> SR_CRYPTO_KEY_BLKSHIFT;
crwu->cr_crp->crp_sid = sd->mds.mdd_crypto.scr_sid[keyndx];
- if (crwu->cr_crp->crp_sid == (u_int64_t)-1)
- goto unwind;
+ crwu->cr_crp->crp_opaque = crwu;
crwu->cr_crp->crp_ilen = xs->datalen;
crwu->cr_crp->crp_alloctype = M_DEVBUF;
+ crwu->cr_crp->crp_flags = CRYPTO_F_IOV | CRYPTO_F_NOQUEUE;
crwu->cr_crp->crp_buf = &crwu->cr_uio;
- for (i = 0, crd = crwu->cr_crp->crp_desc; crd;
- i++, blk++, crd = crd->crd_next) {
+ for (i = 0; i < crwu->cr_crp->crp_ndesc; i++, blkno++) {
+ crd = &crwu->cr_crp->crp_desc[i];
crd->crd_skip = i << DEV_BSHIFT;
crd->crd_len = DEV_BSIZE;
crd->crd_inject = 0;
crd->crd_flags = flags;
- crd->crd_alg = CRYPTO_AES_XTS;
-
- switch (sd->mds.mdd_crypto.scr_meta->scm_alg) {
- case SR_CRYPTOA_AES_XTS_128:
- crd->crd_klen = 256;
- break;
- case SR_CRYPTOA_AES_XTS_256:
- crd->crd_klen = 512;
- break;
- default:
- goto unwind;
- }
+ crd->crd_alg = sd->mds.mdd_crypto.scr_alg;
+ crd->crd_klen = sd->mds.mdd_crypto.scr_klen;
crd->crd_key = sd->mds.mdd_crypto.scr_key[0];
- bcopy(&blk, crd->crd_iv, sizeof(blk));
+ memcpy(crd->crd_iv, &blkno, sizeof(blkno));
}
- crwu->cr_wu = wu;
- crwu->cr_crp->crp_opaque = crwu;
return (crwu);
-
-unwind:
- /* steal the descriptors back from the cryptop */
- crwu->cr_crp->crp_desc = NULL;
-
- return (NULL);
-}
-
-void
-sr_crypto_wu_put(struct sr_crypto_wu *crwu)
-{
- struct cryptop *crp = crwu->cr_crp;
- struct sr_workunit *wu = crwu->cr_wu;
- struct sr_discipline *sd = wu->swu_dis;
-
- DNPRINTF(SR_D_DIS, "%s: sr_crypto_wu_put crwu: %p\n",
- DEVNAME(wu->swu_dis->sd_sc), crwu);
-
- /* steal the descriptors back from the cryptop */
- crp->crp_desc = NULL;
-
- mtx_enter(&sd->mds.mdd_crypto.scr_mutex);
- TAILQ_INSERT_TAIL(&sd->mds.mdd_crypto.scr_wus, crwu, cr_link);
- mtx_leave(&sd->mds.mdd_crypto.scr_mutex);
}
int
@@ -386,9 +339,8 @@
if (sizeof(sd->mds.mdd_crypto.scr_meta->scm_kdfhint) <
kdfinfo->genkdf.len)
goto out;
- bcopy(&kdfinfo->genkdf,
- sd->mds.mdd_crypto.scr_meta->scm_kdfhint,
- kdfinfo->genkdf.len);
+ memcpy(sd->mds.mdd_crypto.scr_meta->scm_kdfhint,
+ &kdfinfo->genkdf, kdfinfo->genkdf.len);
}
/* copy mask key to run-time meta data */
@@ -396,7 +348,7 @@
if (sizeof(sd->mds.mdd_crypto.scr_maskkey) <
sizeof(kdfinfo->maskkey))
goto out;
- bcopy(&kdfinfo->maskkey, sd->mds.mdd_crypto.scr_maskkey,
+ memcpy(sd->mds.mdd_crypto.scr_maskkey, &kdfinfo->maskkey,
sizeof(kdfinfo->maskkey));
}
@@ -404,7 +356,7 @@
rv = 0;
out:
explicit_bzero(kdfinfo, bc->bc_opaque_size);
- free(kdfinfo, M_DEVBUF);
+ free(kdfinfo, M_DEVBUF, bc->bc_opaque_size);
return (rv);
}
@@ -424,7 +376,7 @@
rv = 0;
break;
default:
- DNPRINTF(SR_D_DIS, "%s: unsupported encryption algorithm %u\n",
+ DNPRINTF(SR_D_DIS, "%s: unsupported encryption algorithm %d\n",
"softraid", alg);
rv = -1;
goto out;
@@ -450,7 +402,7 @@
rv = 0;
break;
default:
- DNPRINTF(SR_D_DIS, "%s: unsupported encryption algorithm %u\n",
+ DNPRINTF(SR_D_DIS, "%s: unsupported encryption algorithm %d\n",
"softraid", alg);
rv = -1;
goto out;
@@ -615,6 +567,17 @@
sr_error(sd->sd_sc, "incorrect key or passphrase");
rv = EPERM;
goto out;
+ }
+
+ /* Copy new KDF hint to metadata, if supplied. */
+ if (kdfinfo2->flags & SR_CRYPTOKDF_HINT) {
+ if (kdfinfo2->genkdf.len >
+ sizeof(sd->mds.mdd_crypto.scr_meta->scm_kdfhint))
+ goto out;
+ explicit_bzero(sd->mds.mdd_crypto.scr_meta->scm_kdfhint,
+ sizeof(sd->mds.mdd_crypto.scr_meta->scm_kdfhint));
+ memcpy(sd->mds.mdd_crypto.scr_meta->scm_kdfhint,
+ &kdfinfo2->genkdf, kdfinfo2->genkdf.len);
}
/* Mask the disk keys. */
@@ -630,7 +593,7 @@
sizeof(sd->mds.mdd_crypto.scr_key), check_digest);
/* Copy new encrypted key and HMAC to metadata. */
- bcopy(check_digest, sd->mds.mdd_crypto.scr_meta->chk_hmac_sha1.sch_mac,
+ memcpy(sd->mds.mdd_crypto.scr_meta->chk_hmac_sha1.sch_mac, check_digest,
sizeof(sd->mds.mdd_crypto.scr_meta->chk_hmac_sha1.sch_mac));
rv = 0; /* Success */
@@ -638,7 +601,7 @@
out:
if (p) {
explicit_bzero(p, ksz);
- free(p, M_DEVBUF);
+ free(p, M_DEVBUF, ksz);
}
explicit_bzero(check_digest, sizeof(check_digest));
@@ -686,7 +649,7 @@
DNPRINTF(SR_D_META,"%s: sr_crypto_create_key_disk cannot "
"open %s\n", DEVNAME(sc), devname);
vput(vn);
- goto fail;
+ goto done;
}
open = 1; /* close dev on error */
@@ -696,19 +659,12 @@
FREAD, NOCRED, curproc)) {
DNPRINTF(SR_D_META, "%s: sr_crypto_create_key_disk ioctl "
"failed\n", DEVNAME(sc));
- VOP_CLOSE(vn, FREAD | FWRITE, NOCRED, curproc);
- vput(vn);
- goto fail;
+ goto done;
}
- if (label.d_secsize != DEV_BSIZE) {
- sr_error(sc, "%s has unsupported sector size (%d)",
- devname, label.d_secsize);
- goto fail;
- }
if (label.d_partitions[part].p_fstype != FS_RAID) {
- sr_error(sc, "%s partition not of type RAID (%d)\n",
+ sr_error(sc, "%s partition not of type RAID (%d)",
devname, label.d_partitions[part].p_fstype);
- goto fail;
+ goto done;
}
/*
@@ -728,7 +684,7 @@
km->scmi.scm_size = 0;
km->scmi.scm_coerced_size = 0;
strlcpy(km->scmi.scm_devname, devname, sizeof(km->scmi.scm_devname));
- bcopy(&sd->sd_meta->ssdi.ssd_uuid, &km->scmi.scm_uuid,
+ memcpy(&km->scmi.scm_uuid, &sd->sd_meta->ssdi.ssd_uuid,
sizeof(struct sr_uuid));
sr_checksum(sc, km, &km->scm_checksum,
@@ -745,7 +701,7 @@
sm->ssdi.ssd_version = SR_META_VERSION;
sm->ssd_ondisk = 0;
sm->ssdi.ssd_vol_flags = 0;
- bcopy(&sd->sd_meta->ssdi.ssd_uuid, &sm->ssdi.ssd_uuid,
+ memcpy(&sm->ssdi.ssd_uuid, &sd->sd_meta->ssdi.ssd_uuid,
sizeof(struct sr_uuid));
sm->ssdi.ssd_chunk_no = 1;
sm->ssdi.ssd_volid = SR_KEYDISK_VOLID;
@@ -785,7 +741,7 @@
omi->omi_som->som_type = SR_OPT_KEYDISK;
omi->omi_som->som_length = sizeof(struct sr_meta_keydisk);
skm = (struct sr_meta_keydisk *)omi->omi_som;
- bcopy(sd->mds.mdd_crypto.scr_maskkey, &skm->skm_maskkey,
+ memcpy(&skm->skm_maskkey, sd->mds.mdd_crypto.scr_maskkey,
sizeof(skm->skm_maskkey));
SLIST_INSERT_HEAD(&fakesd->sd_meta_opt, omi, omi_link);
fakesd->sd_meta->ssdi.ssd_opt_no++;
@@ -799,19 +755,16 @@
goto done;
fail:
- if (key_disk)
- free(key_disk, M_DEVBUF);
+ free(key_disk, M_DEVBUF, sizeof(struct sr_chunk));
key_disk = NULL;
done:
- if (omi)
- free(omi, M_DEVBUF);
+ free(omi, M_DEVBUF, sizeof(struct sr_meta_opt_item));
if (fakesd && fakesd->sd_vol.sv_chunks)
- free(fakesd->sd_vol.sv_chunks, M_DEVBUF);
- if (fakesd)
- free(fakesd, M_DEVBUF);
- if (sm)
- free(sm, M_DEVBUF);
+ free(fakesd->sd_vol.sv_chunks, M_DEVBUF,
+ sizeof(struct sr_chunk *));
+ free(fakesd, M_DEVBUF, sizeof(struct sr_discipline));
+ free(sm, M_DEVBUF, sizeof(struct sr_metadata));
if (open) {
VOP_CLOSE(vn, FREAD | FWRITE, NOCRED, curproc);
vput(vn);
@@ -855,7 +808,7 @@
sr_error(sc, "cannot open key disk %s", devname);
goto done;
}
- if (VOP_OPEN(vn, FREAD | FWRITE, NOCRED, curproc)) {
+ if (VOP_OPEN(vn, FREAD, NOCRED, curproc)) {
DNPRINTF(SR_D_META,"%s: sr_crypto_read_key_disk cannot "
"open %s\n", DEVNAME(sc), devname);
vput(vn);
@@ -869,17 +822,10 @@
NOCRED, curproc)) {
DNPRINTF(SR_D_META, "%s: sr_crypto_read_key_disk ioctl "
"failed\n", DEVNAME(sc));
- VOP_CLOSE(vn, FREAD | FWRITE, NOCRED, curproc);
- vput(vn);
goto done;
}
- if (label.d_secsize != DEV_BSIZE) {
- sr_error(sc, "%s has unsupported sector size (%d)",
- devname, label.d_secsize);
- goto done;
- }
if (label.d_partitions[part].p_fstype != FS_RAID) {
- sr_error(sc, "%s partition not of type RAID (%d)\n",
+ sr_error(sc, "%s partition not of type RAID (%d)",
devname, label.d_partitions[part].p_fstype);
goto done;
}
@@ -887,7 +833,7 @@
/*
* Read and validate key disk metadata.
*/
- sm = malloc(SR_META_SIZE * 512, M_DEVBUF, M_WAITOK | M_ZERO);
+ sm = malloc(SR_META_SIZE * DEV_BSIZE, M_DEVBUF, M_WAITOK | M_ZERO);
if (sr_meta_native_read(sd, dev, sm, NULL)) {
sr_error(sc, "native bootprobe could not read native metadata");
goto done;
@@ -911,7 +857,7 @@
key_disk->src_vn = vn;
key_disk->src_size = 0;
- bcopy((struct sr_meta_chunk *)(sm + 1), &key_disk->src_meta,
+ memcpy(&key_disk->src_meta, (struct sr_meta_chunk *)(sm + 1),
sizeof(key_disk->src_meta));
/* Read mask key from optional metadata. */
@@ -920,13 +866,12 @@
omh = omi->omi_som;
if (omh->som_type == SR_OPT_KEYDISK) {
skm = (struct sr_meta_keydisk *)omh;
- bcopy(&skm->skm_maskkey,
- sd->mds.mdd_crypto.scr_maskkey,
+ memcpy(sd->mds.mdd_crypto.scr_maskkey, &skm->skm_maskkey,
sizeof(sd->mds.mdd_crypto.scr_maskkey));
} else if (omh->som_type == SR_OPT_CRYPTO) {
/* Original keydisk format with key in crypto area. */
- bcopy(omh + sizeof(struct sr_meta_opt_hdr),
- sd->mds.mdd_crypto.scr_maskkey,
+ memcpy(sd->mds.mdd_crypto.scr_maskkey,
+ omh + sizeof(struct sr_meta_opt_hdr),
sizeof(sd->mds.mdd_crypto.scr_maskkey));
}
}
@@ -934,15 +879,13 @@
open = 0;
done:
- for (omi = SLIST_FIRST(&som); omi != SLIST_END(&som); omi = omi_next) {
+ for (omi = SLIST_FIRST(&som); omi != NULL; omi = omi_next) {
omi_next = SLIST_NEXT(omi, omi_link);
- if (omi->omi_som)
- free(omi->omi_som, M_DEVBUF);
- free(omi, M_DEVBUF);
+ free(omi->omi_som, M_DEVBUF, 0);
+ free(omi, M_DEVBUF, sizeof(struct sr_meta_opt_item));
}
- if (sm)
- free(sm, M_DEVBUF);
+ free(sm, M_DEVBUF, SR_META_SIZE * DEV_BSIZE);
if (vn && open) {
VOP_CLOSE(vn, FREAD, NOCRED, curproc);
@@ -950,18 +893,45 @@
}
return key_disk;
+}
+
+static void
+sr_crypto_free_sessions(struct sr_discipline *sd)
+{
+ u_int i;
+
+ for (i = 0; i < SR_CRYPTO_MAXKEYS; i++) {
+ if (sd->mds.mdd_crypto.scr_sid[i] != (u_int64_t)-1) {
+ crypto_freesession(sd->mds.mdd_crypto.scr_sid[i]);
+ sd->mds.mdd_crypto.scr_sid[i] = (u_int64_t)-1;
+ }
+ }
}
int
sr_crypto_alloc_resources(struct sr_discipline *sd)
{
- struct cryptoini cri;
+ struct sr_workunit *wu;
struct sr_crypto_wu *crwu;
+ struct cryptoini cri;
u_int num_keys, i;
DNPRINTF(SR_D_DIS, "%s: sr_crypto_alloc_resources\n",
DEVNAME(sd->sd_sc));
+ sd->mds.mdd_crypto.scr_alg = CRYPTO_AES_XTS;
+ switch (sd->mds.mdd_crypto.scr_meta->scm_alg) {
+ case SR_CRYPTOA_AES_XTS_128:
+ sd->mds.mdd_crypto.scr_klen = 256;
+ break;
+ case SR_CRYPTOA_AES_XTS_256:
+ sd->mds.mdd_crypto.scr_klen = 512;
+ break;
+ default:
+ sr_error(sd->sd_sc, "unknown crypto algorithm");
+ return (EINVAL);
+ }
+
for (i = 0; i < SR_CRYPTO_MAXKEYS; i++)
sd->mds.mdd_crypto.scr_sid[i] = (u_int64_t)-1;
@@ -979,61 +949,34 @@
}
/*
- * For each wu allocate the uio, iovec and crypto structures.
- * these have to be allocated now because during runtime we can't
- * fail an allocation without failing the io (which can cause real
+ * For each work unit allocate the uio, iovec and crypto structures.
+ * These have to be allocated now because during runtime we cannot
+ * fail an allocation without failing the I/O (which can cause real
* problems).
*/
- mtx_init(&sd->mds.mdd_crypto.scr_mutex, IPL_BIO);
- TAILQ_INIT(&sd->mds.mdd_crypto.scr_wus);
- for (i = 0; i < sd->sd_max_wu; i++) {
- crwu = malloc(sizeof(*crwu), M_DEVBUF,
- M_WAITOK | M_ZERO | M_CANFAIL);
- if (crwu == NULL)
- return (ENOMEM);
- /* put it on the list now so if we fail it'll be freed */
- mtx_enter(&sd->mds.mdd_crypto.scr_mutex);
- TAILQ_INSERT_TAIL(&sd->mds.mdd_crypto.scr_wus, crwu, cr_link);
- mtx_leave(&sd->mds.mdd_crypto.scr_mutex);
-
+ TAILQ_FOREACH(wu, &sd->sd_wu, swu_next) {
+ crwu = (struct sr_crypto_wu *)wu;
crwu->cr_uio.uio_iov = &crwu->cr_iov;
crwu->cr_dmabuf = dma_alloc(MAXPHYS, PR_WAITOK);
crwu->cr_crp = crypto_getreq(MAXPHYS >> DEV_BSHIFT);
if (crwu->cr_crp == NULL)
return (ENOMEM);
- /* steal the list of cryptodescs */
- crwu->cr_descs = crwu->cr_crp->crp_desc;
- crwu->cr_crp->crp_desc = NULL;
}
- bzero(&cri, sizeof(cri));
- cri.cri_alg = CRYPTO_AES_XTS;
- switch (sd->mds.mdd_crypto.scr_meta->scm_alg) {
- case SR_CRYPTOA_AES_XTS_128:
- cri.cri_klen = 256;
- break;
- case SR_CRYPTOA_AES_XTS_256:
- cri.cri_klen = 512;
- break;
- default:
- return (EINVAL);
- }
+ memset(&cri, 0, sizeof(cri));
+ cri.cri_alg = sd->mds.mdd_crypto.scr_alg;
+ cri.cri_klen = sd->mds.mdd_crypto.scr_klen;
- /* Allocate a session for every 2^SR_CRYPTO_KEY_BLKSHIFT blocks */
- num_keys = sd->sd_meta->ssdi.ssd_size >> SR_CRYPTO_KEY_BLKSHIFT;
- if (num_keys >= SR_CRYPTO_MAXKEYS)
+ /* Allocate a session for every 2^SR_CRYPTO_KEY_BLKSHIFT blocks. */
+ num_keys = ((sd->sd_meta->ssdi.ssd_size - 1) >>
+ SR_CRYPTO_KEY_BLKSHIFT) + 1;
+ if (num_keys > SR_CRYPTO_MAXKEYS)
return (EFBIG);
- for (i = 0; i <= num_keys; i++) {
+ for (i = 0; i < num_keys; i++) {
cri.cri_key = sd->mds.mdd_crypto.scr_key[i];
if (crypto_newsession(&sd->mds.mdd_crypto.scr_sid[i],
&cri, 0) != 0) {
- for (i = 0;
- sd->mds.mdd_crypto.scr_sid[i] != (u_int64_t)-1;
- i++) {
- crypto_freesession(
- sd->mds.mdd_crypto.scr_sid[i]);
- sd->mds.mdd_crypto.scr_sid[i] = (u_int64_t)-1;
- }
+ sr_crypto_free_sessions(sd);
return (EINVAL);
}
}
@@ -1046,39 +989,30 @@
void
sr_crypto_free_resources(struct sr_discipline *sd)
{
+ struct sr_workunit *wu;
struct sr_crypto_wu *crwu;
- u_int i;
DNPRINTF(SR_D_DIS, "%s: sr_crypto_free_resources\n",
DEVNAME(sd->sd_sc));
if (sd->mds.mdd_crypto.key_disk != NULL) {
- explicit_bzero(sd->mds.mdd_crypto.key_disk, sizeof
- sd->mds.mdd_crypto.key_disk);
- free(sd->mds.mdd_crypto.key_disk, M_DEVBUF);
+ explicit_bzero(sd->mds.mdd_crypto.key_disk,
+ sizeof(*sd->mds.mdd_crypto.key_disk));
+ free(sd->mds.mdd_crypto.key_disk, M_DEVBUF,
+ sizeof(*sd->mds.mdd_crypto.key_disk));
}
sr_hotplug_unregister(sd, sr_crypto_hotplug);
- for (i = 0; sd->mds.mdd_crypto.scr_sid[i] != (u_int64_t)-1; i++) {
- crypto_freesession(sd->mds.mdd_crypto.scr_sid[i]);
- sd->mds.mdd_crypto.scr_sid[i] = (u_int64_t)-1;
- }
+ sr_crypto_free_sessions(sd);
- mtx_enter(&sd->mds.mdd_crypto.scr_mutex);
- while ((crwu = TAILQ_FIRST(&sd->mds.mdd_crypto.scr_wus)) != NULL) {
- TAILQ_REMOVE(&sd->mds.mdd_crypto.scr_wus, crwu, cr_link);
-
- if (crwu->cr_dmabuf != NULL)
+ TAILQ_FOREACH(wu, &sd->sd_wu, swu_next) {
+ crwu = (struct sr_crypto_wu *)wu;
+ if (crwu->cr_dmabuf)
dma_free(crwu->cr_dmabuf, MAXPHYS);
- if (crwu->cr_crp) {
- /* twiddle cryptoreq back */
- crwu->cr_crp->crp_desc = crwu->cr_descs;
+ if (crwu->cr_crp)
crypto_freereq(crwu->cr_crp);
- }
- free(crwu, M_DEVBUF);
}
- mtx_leave(&sd->mds.mdd_crypto.scr_mutex);
sr_wu_free(sd);
sr_ccb_free(sd);
@@ -1165,65 +1099,60 @@
sr_crypto_rw(struct sr_workunit *wu)
{
struct sr_crypto_wu *crwu;
- int s, rv = 0;
+ daddr_t blkno;
+ int rv = 0;
- DNPRINTF(SR_D_DIS, "%s: sr_crypto_rw wu: %p\n",
+ DNPRINTF(SR_D_DIS, "%s: sr_crypto_rw wu %p\n",
DEVNAME(wu->swu_dis->sd_sc), wu);
- if (wu->swu_xs->flags & SCSI_DATA_OUT) {
- crwu = sr_crypto_wu_get(wu, 1);
- if (crwu == NULL)
- return (1);
+ if (sr_validate_io(wu, &blkno, "sr_crypto_rw"))
+ return (1);
+
+ if (wu->swu_xs->flags & SCSI_DATA_OUT) {
+ crwu = sr_crypto_prepare(wu, 1);
crwu->cr_crp->crp_callback = sr_crypto_write;
- s = splvm();
- if (crypto_invoke(crwu->cr_crp))
- rv = 1;
- else
+ rv = crypto_dispatch(crwu->cr_crp);
+ if (rv == 0)
rv = crwu->cr_crp->crp_etype;
- splx(s);
} else
- rv = sr_crypto_rw2(wu, NULL);
+ rv = sr_crypto_dev_rw(wu, NULL);
return (rv);
}
-int
+void
sr_crypto_write(struct cryptop *crp)
{
struct sr_crypto_wu *crwu = crp->crp_opaque;
- struct sr_workunit *wu = crwu->cr_wu;
+ struct sr_workunit *wu = &crwu->cr_wu;
int s;
- DNPRINTF(SR_D_INTR, "%s: sr_crypto_write: wu %x xs: %x\n",
+ DNPRINTF(SR_D_INTR, "%s: sr_crypto_write: wu %p xs: %p\n",
DEVNAME(wu->swu_dis->sd_sc), wu, wu->swu_xs);
if (crp->crp_etype) {
/* fail io */
wu->swu_xs->error = XS_DRIVER_STUFFUP;
s = splbio();
- sr_crypto_finish_io(wu);
+ sr_scsi_done(wu->swu_dis, wu->swu_xs);
splx(s);
}
- return (sr_crypto_rw2(wu, crwu));
+ sr_crypto_dev_rw(wu, crwu);
}
int
-sr_crypto_rw2(struct sr_workunit *wu, struct sr_crypto_wu *crwu)
+sr_crypto_dev_rw(struct sr_workunit *wu, struct sr_crypto_wu *crwu)
{
struct sr_discipline *sd = wu->swu_dis;
struct scsi_xfer *xs = wu->swu_xs;
struct sr_ccb *ccb;
struct uio *uio;
- int s;
- daddr64_t blk;
+ daddr_t blkno;
- if (sr_validate_io(wu, &blk, "sr_crypto_rw2"))
- goto bad;
+ blkno = wu->swu_blk_start;
- blk += sd->sd_meta->ssd_data_offset;
-
- ccb = sr_ccb_rw(sd, 0, blk, xs->datalen, xs->data, xs->flags, 0);
+ ccb = sr_ccb_rw(sd, 0, blkno, xs->datalen, xs->data, xs->flags, 0);
if (!ccb) {
/* should never happen but handle more gracefully */
printf("%s: %s: too many ccbs queued\n",
@@ -1236,17 +1165,10 @@
ccb->ccb_opaque = crwu;
}
sr_wu_enqueue_ccb(wu, ccb);
+ sr_schedule_wu(wu);
- s = splbio();
-
- if (sr_check_io_collision(wu))
- goto queued;
-
- sr_raid_startwu(wu);
-
-queued:
- splx(s);
return (0);
+
bad:
/* wu is unwound by sr_wu_put */
if (crwu)
@@ -1259,77 +1181,39 @@
{
struct scsi_xfer *xs = wu->swu_xs;
struct sr_crypto_wu *crwu;
- struct sr_ccb *ccb;
int s;
/* If this was a successful read, initiate decryption of the data. */
if (ISSET(xs->flags, SCSI_DATA_IN) && xs->error == XS_NOERROR) {
- /* only fails on implementation error */
- crwu = sr_crypto_wu_get(wu, 0);
- if (crwu == NULL)
- panic("sr_crypto_intr: no wu");
+ crwu = sr_crypto_prepare(wu, 0);
crwu->cr_crp->crp_callback = sr_crypto_read;
- ccb = TAILQ_FIRST(&wu->swu_ccb);
- if (ccb == NULL)
- panic("sr_crypto_done: no ccbs on workunit");
- ccb->ccb_opaque = crwu;
- DNPRINTF(SR_D_INTR, "%s: sr_crypto_intr: crypto_invoke %p\n",
+ DNPRINTF(SR_D_INTR, "%s: sr_crypto_done: crypto_dispatch %p\n",
DEVNAME(wu->swu_dis->sd_sc), crwu->cr_crp);
- s = splvm();
- crypto_invoke(crwu->cr_crp);
- splx(s);
+ crypto_dispatch(crwu->cr_crp);
return;
}
s = splbio();
- sr_crypto_finish_io(wu);
+ sr_scsi_done(wu->swu_dis, wu->swu_xs);
splx(s);
}
void
-sr_crypto_finish_io(struct sr_workunit *wu)
-{
- struct sr_discipline *sd = wu->swu_dis;
- struct scsi_xfer *xs = wu->swu_xs;
- struct sr_ccb *ccb;
-#ifdef SR_DEBUG
- struct sr_softc *sc = sd->sd_sc;
-#endif /* SR_DEBUG */
-
- splassert(IPL_BIO);
-
- DNPRINTF(SR_D_INTR, "%s: sr_crypto_finish_io: wu %x xs: %x\n",
- DEVNAME(sc), wu, xs);
-
- if (wu->swu_cb_active == 1)
- panic("%s: sr_crypto_finish_io", DEVNAME(sd->sd_sc));
- TAILQ_FOREACH(ccb, &wu->swu_ccb, ccb_link) {
- if (ccb->ccb_opaque == NULL)
- continue;
- sr_crypto_wu_put(ccb->ccb_opaque);
- }
-
- sr_scsi_done(sd, xs);
-}
-
-int
sr_crypto_read(struct cryptop *crp)
{
struct sr_crypto_wu *crwu = crp->crp_opaque;
- struct sr_workunit *wu = crwu->cr_wu;
+ struct sr_workunit *wu = &crwu->cr_wu;
int s;
- DNPRINTF(SR_D_INTR, "%s: sr_crypto_read: wu %x xs: %x\n",
+ DNPRINTF(SR_D_INTR, "%s: sr_crypto_read: wu %p xs: %p\n",
DEVNAME(wu->swu_dis->sd_sc), wu, wu->swu_xs);
if (crp->crp_etype)
wu->swu_xs->error = XS_DRIVER_STUFFUP;
s = splbio();
- sr_crypto_finish_io(wu);
+ sr_scsi_done(wu->swu_dis, wu->swu_xs);
splx(s);
-
- return (0);
}
void
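The `sr_crypto_alloc_resources` hunk above changes how many crypto sessions are set up. The old code used `ssd_size >> SR_CRYPTO_KEY_BLKSHIFT` and looped with `i <= num_keys`. The new code uses `((ssd_size - 1) >> SR_CRYPTO_KEY_BLKSHIFT) + 1` and loops with `i < num_keys`, which is a ceiling division by 2^shift. The sketch below isolates that arithmetic under some assumptions. `keys_needed`, `keys_old` and the `shift` parameter are hypothetical names. The actual value of `SR_CRYPTO_KEY_BLKSHIFT` is not shown here, so the sketch takes the shift as an argument.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch (hypothetical helper): sessions needed so that every block of
 * a `size`-block volume is covered, one session per 2^shift blocks.
 * This is ceil(size / 2^shift), written the way the new code does it.
 * `size` must be non-zero, otherwise (size - 1) wraps around.
 */
uint64_t
keys_needed(uint64_t size, unsigned int shift)
{
	assert(size > 0 && shift < 64);
	return ((size - 1) >> shift) + 1;
}

/*
 * The old computation, floor(size / 2^shift). The old loop then ran
 * for i = 0..num_keys inclusive, i.e. num_keys + 1 sessions.
 */
uint64_t
keys_old(uint64_t size, unsigned int shift)
{
	assert(shift < 64);
	return size >> shift;
}
```

Take a 16-block volume with a shift of 4 as an example. `keys_needed` gives 1 session. The old code computed 1 and then looped with `i <= 1`, which set up 2 sessions. So when the size is an exact multiple of the key block size, the old code created one extra session.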

12
test/expect101.diff Normal file

@ -0,0 +1,12 @@
--- test101.left-P.txt
+++ test101.right-P.txt
@@ -1,7 +1,6 @@
-A
-B
C
+B
A
B
-B
A
+C

16
test/expect102.diff Normal file

@ -0,0 +1,16 @@
--- test102.left-P.txt
+++ test102.right-P.txt
@@ -1,10 +1,9 @@
-A
-B
C
+B
A
B
-B
A
+C
X
-Y
Z
+Q

10
test/expect103.diff Normal file

@ -0,0 +1,10 @@
--- test103.left-P.txt
+++ test103.right-P.txt
@@ -1,5 +1,4 @@
-a
+x
b
c
-d
-e
+y

24
test/expect104.diff Normal file

@ -0,0 +1,24 @@
--- test104.left-P.txt
+++ test104.right-P.txt
@@ -1,3 +1,10 @@
+int Chunk_bounds_check(Chunk *chunk, size_t start, size_t n)
+{
+ if (chunk == NULL) return 0;
+
+ return start <= chunk->length && n <= chunk->length - start;
+}
+
void Chunk_copy(Chunk *src, size_t src_start, Chunk *dst, size_t dst_start, size_t n)
{
if (!Chunk_bounds_check(src, src_start, n)) return;
@@ -5,10 +12,3 @@
memcpy(dst->data + dst_start, src->data + src_start, n);
}
-
-int Chunk_bounds_check(Chunk *chunk, size_t start, size_t n)
-{
- if (chunk == NULL) return 0;
-
- return start <= chunk->length && n <= chunk->length - start;
-}

12
test/expect105.diff Normal file

@ -0,0 +1,12 @@
--- test105.left-P.txt
+++ test105.right-P.txt
@@ -1,7 +1,7 @@
+The Slits
+Gil Scott Heron
David Axelrod
Electric Prunes
-Gil Scott Heron
-The Slits
Faust
The Sonics
The Sonics

24
test/expect106.diff Normal file

@ -0,0 +1,24 @@
--- test106.left-P.txt
+++ test106.right-P.txt
@@ -3,7 +3,7 @@
It is important to specify the year of the copyright. Additional years
should be separated by a comma, e.g.
- Copyright (c) 2003, 2004
+ Copyright (c) 2003, 2004, 2005
If you add extra text to the body of the license, be careful not to
add further restrictions.
@@ -11,7 +11,6 @@
/*
* Copyright (c) CCYY YOUR NAME HERE <user@your.dom.ain>
*
- * Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
@@ -23,3 +22,4 @@
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
+An extra line

5
test/expect107.diff Normal file

@ -0,0 +1,5 @@
--- test107.left-P.txt
+++ test107.right-P.txt
@@ -1 +1 @@
-x
+abcdx

9
test/expect108.diff Normal file

@ -0,0 +1,9 @@
--- test108.left-P.txt
+++ test108.right-P.txt
@@ -1 +1,6 @@
x
+a
+b
+c
+d
+x

13
test/expect109.diff Normal file

@ -0,0 +1,13 @@
--- test109.left-P.txt
+++ test109.right-P.txt
@@ -1,3 +1,10 @@
x
a
b
+c
+d
+e
+f
+x
+a
+b

19
test/expect110.diff Normal file

@ -0,0 +1,19 @@
--- test110.left-P.txt
+++ test110.right-P.txt
@@ -1,4 +1,4 @@
-/* $OpenBSD: usbdevs_data.h,v 1.715 2020/01/20 07:09:11 jsg Exp $ */
+/* $OpenBSD$ */
/*
* THIS FILE IS AUTOMATICALLY GENERATED. DO NOT EDIT.
@@ -10982,6 +10982,10 @@
"RTL8192EU",
},
{
+ USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8192EU_2,
+ "RTL8192EU",
+ },
+ {
USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8188EUS,
"RTL8188EUS",
},

19
test/expect111.diff Normal file

@ -0,0 +1,19 @@
--- test111.left-P.txt
+++ test111.right-P.txt
@@ -1,4 +1,4 @@
-/* $OpenBSD: usbdevs_data.h,v 1.715 2020/01/20 07:09:11 jsg Exp $ */
+/* $OpenBSD$ */
/*
* THIS FILE IS AUTOMATICALLY GENERATED. DO NOT EDIT.
@@ -378,6 +378,10 @@
"RTL8192EU",
},
{
+ USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8192EU_2,
+ "RTL8192EU",
+ },
+ {
USB_VENDOR_TPLINK, USB_PRODUCT_TPLINK_RTL8188EUS,
"RTL8188EUS",
},

21
test/expect112.diff Normal file

@ -0,0 +1,21 @@
--- test112.left-P.txt
+++ test112.right-P.txt
@@ -1,4 +1,4 @@
-1 left
+1 right
2
3
4
@@ -17,6 +17,12 @@
17
18
19
+14
+15
+16 right
+17
+18
+19
20
21
22

10
test/expect113.diff Normal file

@ -0,0 +1,10 @@
--- test113.left-Pw.txt
+++ test113.right-Pw.txt
@@ -3,5 +3,5 @@
C
D
E
-F
-G
+F x
+y G

4
test/expect114.diff Normal file

@ -0,0 +1,4 @@
--- test114.left-P.txt
+++ test114.right-P.txt
@@ -0,0 +1 @@
+A

4
test/expect115.diff Normal file

@ -0,0 +1,4 @@
--- test115.left-P.txt
+++ test115.right-P.txt
@@ -1 +0,0 @@
-A
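The expected outputs above follow a fixed convention for each side of a unified hunk header. A range of one line prints only its start, as in `@@ -1 +1 @@` in expect107. An empty range prints the number of the line before it with a count of 0, as in `-0,0` in expect114 and `+0,0` in expect115. Every other range prints `start,count`. The sketch below reproduces that convention; `format_range` and its parameters are hypothetical, and this is not libdiff's own output code. As far as I know, GNU diff prints ranges the same way.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch (hypothetical helper): format one side of a unified-diff hunk
 * header, e.g. "-1,7" or "+1". `first` is the 1-based number of the
 * first line in the range (for an empty range, the line it would start
 * at) and `count` is the number of lines in the range.
 */
void
format_range(char *buf, size_t bufsize, char sign, int first, int count)
{
	assert(first >= 1 && count >= 0);
	if (count == 1)
		/* A single line is printed as its start line alone. */
		snprintf(buf, bufsize, "%c%d", sign, first);
	else if (count == 0)
		/* An empty range is reported as ending on the line before it. */
		snprintf(buf, bufsize, "%c%d,0", sign, first - 1);
	else
		snprintf(buf, bufsize, "%c%d,%d", sign, first, count);
}
```

For example, `format_range(buf, sizeof(buf), '-', 1, 0)` produces `-0,0`, matching the left side of expect114.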

30
test/expect116.diff Normal file

@ -0,0 +1,30 @@
--- test116.left-P.txt
+++ test116.right-P.txt
@@ -254,7 +254,7 @@
const char *uri, *dirname;
char *proto, *host, *port, *repo_name, *server_path;
char *default_destdir = NULL, *id_str = NULL;
- const char *repo_path;
+ const char *repo_path, *remote_repo_path;
struct got_repository *repo = NULL;
struct got_pathlist_head refs, symrefs, wanted_branches, wanted_refs;
struct got_pathlist_entry *pe;
@@ -275,6 +275,9 @@
goto done;
}
got_path_strip_trailing_slashes(server_path);
+ remote_repo_path = server_path;
+ while (remote_repo_path[0] == '/')
+ remote_repo_path++;
if (asprintf(&gotconfig,
"remote \"%s\" {\n"
"\tserver %s\n"
@@ -285,7 +288,7 @@
"}\n",
GOT_FETCH_DEFAULT_REMOTE_NAME, host, proto,
port ? "\tport " : "", port ? port : "", port ? "\n" : "",
- server_path,
+ remote_repo_path,
mirror_references ? "\tmirror-references yes\n" : "") == -1) {
error = got_error_from_errno("asprintf");
goto done;

64
test/expect117.diff Normal file

@ -0,0 +1,64 @@
--- test117.left-P.txt
+++ test117.right-P.txt
@@ -65,6 +65,8 @@
struct sr_crypto_kdfinfo *, struct sr_crypto_kdfinfo *);
int sr_crypto_create(struct sr_discipline *,
struct bioc_createraid *, int, int64_t);
+int sr_crypto_init(struct sr_discipline *,
+ struct bioc_createraid *);
int sr_crypto_assemble(struct sr_discipline *,
struct bioc_createraid *, int, void *);
int sr_crypto_alloc_resources(struct sr_discipline *);
@@ -117,18 +119,34 @@
sr_crypto_create(struct sr_discipline *sd, struct bioc_createraid *bc,
int no_chunk, int64_t coerced_size)
{
- struct sr_meta_opt_item *omi;
- int rv = EINVAL;
+ int rv = EINVAL;
if (no_chunk != 1) {
sr_error(sd->sd_sc, "%s requires exactly one chunk",
sd->sd_name);
- goto done;
+ return (rv);
}
- if (coerced_size > SR_CRYPTO_MAXSIZE) {
+ sd->sd_meta->ssdi.ssd_size = coerced_size;
+
+ rv = sr_crypto_init(sd, bc);
+ if (rv)
+ return (rv);
+
+ sd->sd_max_ccb_per_wu = no_chunk;
+ return (0);
+}
+
+int
+sr_crypto_init(struct sr_discipline *sd, struct bioc_createraid *bc)
+{
+ struct sr_meta_opt_item *omi;
+ int rv = EINVAL;
+
+ if (sd->sd_meta->ssdi.ssd_size > SR_CRYPTO_MAXSIZE) {
sr_error(sd->sd_sc, "%s exceeds maximum size (%lli > %llu)",
- sd->sd_name, coerced_size, SR_CRYPTO_MAXSIZE);
+ sd->sd_name, sd->sd_meta->ssdi.ssd_size,
+ SR_CRYPTO_MAXSIZE);
goto done;
}
@@ -170,12 +188,8 @@
if (!(bc->bc_flags & BIOC_SCNOAUTOASSEMBLE) && bc->bc_key_disk == NODEV)
goto done;
- sd->sd_meta->ssdi.ssd_size = coerced_size;
-
sr_crypto_create_keys(sd);
- sd->sd_max_ccb_per_wu = no_chunk;
-
rv = 0;
done:
return (rv);

1
test/expect123.diff Normal file

@ -0,0 +1 @@
0a1

9
test/expect124.diff Normal file

@ -0,0 +1,9 @@
--- test124.left-p.txt
+++ test124.right-p.txt
@@ -11,5 +11,5 @@ doSomethingThenPrintHello(int test)
struct testfile *
return_test(int test) {
- return NULL;
+ return test*2;
}

12
test/expect125.diff Normal file

@ -0,0 +1,12 @@
--- test125.left.txt
+++ test125.right.txt
@@ -1,7 +1,7 @@
This is a test
of missing trailing new lines
in context
-this line has a change
+this line has the change
this is the same
this is too
and this one
\ No newline at end of file

28
test/expect126.diff Normal file

@ -0,0 +1,28 @@
--- test126.left.txt
+++ test126.right.txt
@@ -1,4 +1,4 @@
-/* $OpenBSD: a_time_tm.c,v 1.15 2018/04/25 11:48:21 tb Exp $ */
+/* $OpenBSD: a_time_tm.c,v 1.16 2020/12/16 18:35:59 tb Exp $ */
/*
* Copyright (c) 2015 Bob Beck <beck@openbsd.org>
*
@@ -108,10 +108,9 @@
return (-1);
lt = tm;
- if (lt == NULL) {
- memset(&ltm, 0, sizeof(ltm));
+ if (lt == NULL)
lt = &ltm;
- }
+ memset(lt, 0, sizeof(*lt));
/* Timezone is required and must be GMT (Zulu). */
if (bytes[len - 1] != 'Z')
@@ -168,4 +167,4 @@
}
return (type);
-}
+}
\ No newline at end of file
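expect125 and expect126 show how a final line without a trailing newline is rendered. The line is terminated in the output anyway and is followed by the marker `\ No newline at end of file`. The sketch below renders a single line under that assumption. `render_line` is a hypothetical helper, not a libdiff function. It assumes the caller passes the raw bytes of the line, including the `'\n'` when one is present.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch (hypothetical helper): render one diff line into buf with the
 * given prefix (' ', '-' or '+'). `line` holds `len` raw bytes and ends
 * in '\n' unless it is the last line of a file without a final newline,
 * in which case the line is terminated here and the marker is appended.
 */
int
render_line(char *buf, size_t bufsize, char prefix, const char *line,
    size_t len)
{
	assert(len > 0);
	if (line[len - 1] == '\n')
		return snprintf(buf, bufsize, "%c%.*s", prefix, (int)len, line);
	return snprintf(buf, bufsize, "%c%.*s\n\\ No newline at end of file\n",
	    prefix, (int)len, line);
}
```

Rendering the last context line of expect125 as `render_line(buf, sizeof(buf), ' ', "and this one", 12)` yields that line, a newline, and then the marker.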

178
test/results_test.c Normal file

@ -0,0 +1,178 @@
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <stdlib.h>

#include <arraylist.h>
#include <diff_main.h>

#include <diff_internal.h>
#include <diff_debug.h>

void test_minus_after_plus(void)
{
	struct diff_result *result = malloc(sizeof(struct diff_result));
	struct diff_data d_left, d_right;
	char *left_data = "a\nb\nc\nd\ne\nm\nn\n";
	char *right_data = "a\nb\nj\nk\nl\nm\nn\n";
	int i;

	printf("\n--- %s()\n", __func__);

	d_left = (struct diff_data){
		.data = left_data,
		.len = strlen(left_data),
		.root = &d_left,
	};
	d_right = (struct diff_data){
		.data = right_data,
		.len = strlen(right_data),
		.root = &d_right,
	};
	*result = (struct diff_result) {
		.left = &d_left,
		.right = &d_right,
	};

	diff_atomize_text_by_line(NULL, result->left);
	diff_atomize_text_by_line(NULL, result->right);

	struct diff_state state = {
		.result = result,
		.recursion_depth_left = 32,
	};
	diff_data_init_subsection(&state.left, result->left,
	    result->left->atoms.head,
	    result->left->atoms.len);
	diff_data_init_subsection(&state.right, result->right,
	    result->right->atoms.head,
	    result->right->atoms.len);

	/* "same" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[0], 2,
	    &state.right.atoms.head[0], 2);

	/* "plus" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[2], 0,
	    &state.right.atoms.head[2], 3);

	/* "minus" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[2], 3,
	    &state.right.atoms.head[5], 0);

	/* "same" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[5], 2,
	    &state.right.atoms.head[5], 2);

	for (i = 0; i < result->chunks.len; i++) {
		struct diff_chunk *c = &result->chunks.head[i];
		enum diff_chunk_type t = diff_chunk_type(c);

		printf("[%d] %s lines L%d R%d @L %lld @R %lld\n",
		    i, (t == CHUNK_MINUS ? "minus" :
		    (t == CHUNK_PLUS ? "plus" :
		    (t == CHUNK_SAME ? "same" : "?"))),
		    c->left_count,
		    c->right_count,
		    (long long)(c->left_start ? diff_atom_root_idx(result->left, c->left_start) : -1LL),
		    (long long)(c->right_start ? diff_atom_root_idx(result->right, c->right_start) : -1LL));
	}

	diff_result_free(result);
	diff_data_free(&d_left);
	diff_data_free(&d_right);
}

void test_plus_after_plus(void)
{
	struct diff_result *result = malloc(sizeof(struct diff_result));
	struct diff_data d_left, d_right;
	char *left_data = "a\nb\nc\nd\ne\nm\nn\n";
	char *right_data = "a\nb\nj\nk\nl\nm\nn\n";
	struct diff_chunk *c;

	printf("\n--- %s()\n", __func__);

	d_left = (struct diff_data){
		.data = left_data,
		.len = strlen(left_data),
		.root = &d_left,
	};
	d_right = (struct diff_data){
		.data = right_data,
		.len = strlen(right_data),
		.root = &d_right,
	};
	*result = (struct diff_result) {
		.left = &d_left,
		.right = &d_right,
	};

	diff_atomize_text_by_line(NULL, result->left);
	diff_atomize_text_by_line(NULL, result->right);

	struct diff_state state = {
		.result = result,
		.recursion_depth_left = 32,
	};
	diff_data_init_subsection(&state.left, result->left,
	    result->left->atoms.head,
	    result->left->atoms.len);
	diff_data_init_subsection(&state.right, result->right,
	    result->right->atoms.head,
	    result->right->atoms.len);

	/* "same" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[0], 2,
	    &state.right.atoms.head[0], 2);

	/* "minus" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[2], 3,
	    &state.right.atoms.head[2], 0);

	/* "plus" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[5], 0,
	    &state.right.atoms.head[2], 1);

	/* "plus" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[5], 0,
	    &state.right.atoms.head[3], 2);

	/* "same" section */
	diff_state_add_chunk(&state, true,
	    &state.left.atoms.head[5], 2,
	    &state.right.atoms.head[5], 2);

	ARRAYLIST_FOREACH(c, result->chunks) {
		enum diff_chunk_type t = diff_chunk_type(c);

		printf("[%lu] %s lines L%d R%d @L %lld @R %lld\n",
		    (unsigned long)ARRAYLIST_IDX(c, result->chunks),
		    (t == CHUNK_MINUS ? "minus" :
		    (t == CHUNK_PLUS ? "plus" :
		    (t == CHUNK_SAME ? "same" : "?"))),
		    c->left_count,
		    c->right_count,
		    (long long)(c->left_start ? diff_atom_root_idx(result->left, c->left_start) : -1LL),
		    (long long)(c->right_start ? diff_atom_root_idx(result->right, c->right_start) : -1LL));
	}

	diff_result_free(result);
	diff_data_free(&d_left);
	diff_data_free(&d_right);
}

int main(void)
{
	test_minus_after_plus();
	test_plus_after_plus();
	return 0;
}


@ -0,0 +1,20 @@
.PHONY: regress clean

CFLAGS = -fsanitize=address -fsanitize=undefined -g -O3
CFLAGS += -Wstrict-prototypes -Wunused-variable -Wuninitialized

CFLAGS+= -I$(CURDIR)/../../compat/include \
	 -I$(CURDIR)/../../include \
	 -I$(CURDIR)/../../lib

$(CURDIR)/results_test: $(CURDIR)/../results_test.c $(CURDIR)/../../lib/libdiff.a
	gcc $(CFLAGS) -o $@ $^

$(CURDIR)/../../lib/libdiff.a: $(CURDIR)/../../lib/*.[hc] $(CURDIR)/../../include/*.h
	$(MAKE) -C $(CURDIR)/../../lib

regress: $(CURDIR)/results_test
	$(CURDIR)/results_test

clean:
	-rm $(CURDIR)/results_test


@ -0,0 +1,11 @@
.PATH:${.CURDIR}/../../lib
.PATH:${.CURDIR}/..
PROG = results_test
SRCS = results_test.c diff_atomize_text.c diff_main.c
CPPFLAGS = -I${.CURDIR}/../../include -I${.CURDIR}/../../lib
NOMAN = yes
.include <bsd.regress.mk>

7
test/test001.left.txt Normal file

@ -0,0 +1,7 @@
A
B
C
A
B
B
A

6
test/test001.right.txt Normal file

@ -0,0 +1,6 @@
C
B
A
B
A
C

10
test/test002.left.txt Normal file

@ -0,0 +1,10 @@
A
B
C
A
B
B
A
X
Y
Z

9
test/test002.right.txt Normal file

@ -0,0 +1,9 @@
C
B
A
B
A
C
X
Z
Q

5
test/test003.left.txt Normal file
View file

@ -0,0 +1,5 @@
a
b
c
d
e

4
test/test003.right.txt Normal file

@ -0,0 +1,4 @@
x
b
c
y

14
test/test004.left.txt Normal file
View file

@ -0,0 +1,14 @@
void Chunk_copy(Chunk *src, size_t src_start, Chunk *dst, size_t dst_start, size_t n)
{
if (!Chunk_bounds_check(src, src_start, n)) return;
if (!Chunk_bounds_check(dst, dst_start, n)) return;
memcpy(dst->data + dst_start, src->data + src_start, n);
}
int Chunk_bounds_check(Chunk *chunk, size_t start, size_t n)
{
if (chunk == NULL) return 0;
return start <= chunk->length && n <= chunk->length - start;
}

14
test/test004.right.txt Normal file

@ -0,0 +1,14 @@
int Chunk_bounds_check(Chunk *chunk, size_t start, size_t n)
{
if (chunk == NULL) return 0;
return start <= chunk->length && n <= chunk->length - start;
}
void Chunk_copy(Chunk *src, size_t src_start, Chunk *dst, size_t dst_start, size_t n)
{
if (!Chunk_bounds_check(src, src_start, n)) return;
if (!Chunk_bounds_check(dst, dst_start, n)) return;
memcpy(dst->data + dst_start, src->data + src_start, n);
}

7
test/test005.left.txt Normal file

@ -0,0 +1,7 @@
David Axelrod
Electric Prunes
Gil Scott Heron
The Slits
Faust
The Sonics
The Sonics

7
test/test005.right.txt Normal file

@ -0,0 +1,7 @@
The Slits
Gil Scott Heron
David Axelrod
Electric Prunes
Faust
The Sonics
The Sonics

25
test/test006.left.txt Normal file

@ -0,0 +1,25 @@
Below is an example license to be used for new code in OpenBSD,
modeled after the ISC license.
It is important to specify the year of the copyright. Additional years
should be separated by a comma, e.g.
Copyright (c) 2003, 2004
If you add extra text to the body of the license, be careful not to
add further restrictions.
/*
* Copyright (c) CCYY YOUR NAME HERE <user@your.dom.ain>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/

25
test/test006.right.txt Normal file

@ -0,0 +1,25 @@
Below is an example license to be used for new code in OpenBSD,
modeled after the ISC license.
It is important to specify the year of the copyright. Additional years
should be separated by a comma, e.g.
Copyright (c) 2003, 2004, 2005
If you add extra text to the body of the license, be careful not to
add further restrictions.
/*
* Copyright (c) CCYY YOUR NAME HERE <user@your.dom.ain>
*
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
An extra line

1
test/test007.left.txt Normal file

@ -0,0 +1 @@
x

1
test/test007.right.txt Normal file

@ -0,0 +1 @@
abcdx

1
test/test008.left.txt Normal file

@ -0,0 +1 @@
x

6
test/test008.right.txt Normal file

@ -0,0 +1,6 @@
x
a
b
c
d
x

3
test/test009.left.txt Normal file

@ -0,0 +1,3 @@
x
a
b

10
test/test009.right.txt Normal file

@ -0,0 +1,10 @@
x
a
b
c
d
e
f
x
a
b
