CI: work around LeakSanitizer crashes with use_tls=0

Without this fix, we have randomly been getting CI failures due to
LeakSanitizer itself crashing after all the tests in a program have
succeeded. This has been happening randomly for a long time, but
https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1486
made it very reliably repeatable in the job x86_64-debian-full-build
(and no other job) in the test-subsurface-shot program.

--- Fixture 2 (GL) ok: passed 4, skipped 0, failed 0, total 4
Tracer caught signal 11: addr=0x1b8 pc=0x7f6b3ba640f0 sp=0x7f6b2cc77d10
==489==LeakSanitizer has encountered a fatal error.

I was also able to get a core file after twiddling, but there it ended
up with lsan aborting itself rather than a segfault.

We got some clues that use_tls=0 might work around this, from
https://github.com/google/sanitizers/issues/1342
https://github.com/google/sanitizers/issues/1409
and some other projects that have cargo-culted the same workaround.

Using that cause more false leaks to appear, so they need to be
suppressed. I suppose we are not interested in catching leaks in glib
using code, so I opted to suppress g_malloc0 altogether. Pinpointing it
better might have required much more slower stack tracing.

wl_shm_buffer_begin_access() uses TLS, so no wonder it gets flagged.

ld-*.so is simply uninteresting to us, and it got flagged too.

Since this might have been fixed already in LeakSanitizer upstream, who
knows, leave some notes to revisit this when we upgrade that in CI.

This fix seems to make the branch of
https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1486
in my quick testing.

Suggested-by: Derek Foreman <derek.foreman@collabora.com>
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.com>
This commit is contained in:
Pekka Paalanen 2024-04-09 15:41:45 +03:00
parent de8e3168f0
commit 3179e0f0e0
3 changed files with 13 additions and 1 deletions

View file

@ -82,6 +82,7 @@ stages:
variables:
BUILD_OS: debian
LLVM_VERSION: 15
# If you upgrade from bookworm, see the use_tls=0 notes in tests/meson.build.
FDO_DISTRIBUTION_VERSION: bookworm
FDO_DISTRIBUTION_EXEC: 'env FDO_CI_CONCURRENT=${FDO_CI_CONCURRENT} BUILD_ARCH=${BUILD_ARCH} KERNEL_IMAGE=${KERNEL_IMAGE} KERNEL_DEFCONFIG=${KERNEL_DEFCONFIG} LLVM_VERSION=${LLVM_VERSION} FDO_DISTRIBUTION_VERSION=${FDO_DISTRIBUTION_VERSION} bash .gitlab-ci/debian-install.sh'

View file

@ -4,3 +4,9 @@
# fontconfig library because turning off fast unwind -- required to catch other
# originating leaks from fontconfig; would stall our tests timing them out.
leak:libfontconfig
# Workarounds for the LeakSanitizer use_tls=0 workaround,
# see tests/meson.build
leak:wl_shm_buffer_begin_access
leak:g_malloc0
leak:/ld-*.so*

View file

@ -428,8 +428,13 @@ configure_file(output: 'test-config.h', configuration: test_config_h)
test_env = {}
# there are some leaks in fontconfig we can't fix;
# use_tls=0 is a workaround for LeakSanitizer crashing after successful
# program exit when it scans for leaks. Due to use_tls=0 even more
# suppressions had to be added.
# TODO XXX: Try to revert the addition of use_tls=0 when our CI image
# upgrades from Debian Bookworm to something more recent.
if get_option('b_sanitize') in ['address', 'address,undefined' ]
test_env += { 'LSAN_OPTIONS': 'suppressions=@0@'.format(dir_gitlab_ci / 'leak-sanitizer.supp') }
test_env += { 'LSAN_OPTIONS': 'use_tls=0:suppressions=@0@'.format(dir_gitlab_ci / 'leak-sanitizer.supp') }
endif
foreach t : tests