This is tested by binary-leb128, which previously failed to load the module
when the element section type parameter was encoded as a multi-byte LEB128.
However, since we weren't checking to see whether the modules loaded
successfully, this failed silently. The addition of WasmAnalysis forces us to
check whether the Wasm module parses correctly.
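For illustration, a minimal sketch of unsigned LEB128 decoding that accepts multi-byte (including zero-padded) encodings. The function name `decode_uleb128` is hypothetical and is not taken from the loader; the sketch also does not enforce the maximum encoded length that a real Wasm parser would check.

```python
def decode_uleb128(data: bytes, offset: int = 0):
    """Decode an unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[offset]
        offset += 1
        # The low 7 bits carry payload, least-significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the final byte of the encoding.
        if byte & 0x80 == 0:
            return result, offset
        shift += 7
```

Note that `b"\x80\x00"` is a valid two-byte encoding of zero, so a parser that reads only a single byte for such a field would misread the rest of the section.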
This fixes some failures from call_indirect, among others. Instead of using an
uponentry injection, this new implementation hijacks the ".locals" directive to
add a CallOther pcodeop, which is usable from both the decompiler and emulator.
In unreachable code, br_table may underflow the stack, which is OK. However,
br_table needs to push Unknown types back on the stack, rather than the expected
types. This subtlety was caught by the unreached-valid meet-bottom testcase.
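A simplified sketch of the idea, loosely following the validation algorithm in the Wasm specification appendix. The `TypeStack` class and `check_br_table_target` helper are hypothetical names, and control frames are reduced to a single `unreachable` flag for brevity.

```python
UNKNOWN = "unknown"  # stand-in for the bottom/unknown value type


class TypeStack:
    def __init__(self):
        self.values = []
        self.unreachable = False

    def push(self, value_type):
        self.values.append(value_type)

    def pop(self, expected=None):
        if not self.values:
            # Underflow is only legal in unreachable code.
            if self.unreachable:
                return UNKNOWN
            raise ValueError("type stack underflow")
        actual = self.values.pop()
        if expected is not None and UNKNOWN not in (actual, expected) \
                and actual != expected:
            raise ValueError(f"expected {expected}, got {actual}")
        return actual


def check_br_table_target(stack, label_types):
    # Pop the label's operands in reverse order, remembering what was found.
    popped = [stack.pop(t) for t in reversed(label_types)]
    # Push back what was actually popped (possibly UNKNOWN), not label_types.
    for value_type in reversed(popped):
        stack.push(value_type)
```

With an empty stack in unreachable code, checking a target of type `["i32", "i64"]` leaves two `UNKNOWN` entries on the stack rather than `i32` and `i64`.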
After a br, the remaining code in the block is unreachable and types are no
longer tracked on the stack. Thus, multiple null types can creep in, which was
causing (among other things) crashes when analyzing a select instruction after a
br.
Tables and element segments apparently default to holding null refs, and since
we're using 0x00000000 for null (`ref.null`, `ref.is_null`), we need to ensure
that they are initialized appropriately.
In the future, we may want to revisit whether zero is the right value for null:
e.g. can zero be a valid value for some reference type?
The VM may jump to an uninitialized address such as 0x00000000. If no context is
available at that address, this would crash the emulator before the instruction
can even be hit (e.g. for a breakpoint).
float2float may modify the floating point value, but f{32,64}.const instructions
are supposed to load the raw constant value. Fixes test failures on
float_literals.
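To illustrate what loading the raw constant means, here is a small sketch that reads an f32.const immediate as a 32-bit pattern without going through a float conversion. The helper name `f32_const_bits` is hypothetical; the little-endian layout of the immediate follows the Wasm binary format.

```python
import math
import struct


def f32_const_bits(immediate: bytes) -> int:
    """Return the raw 32-bit pattern of an f32.const immediate."""
    if len(immediate) != 4:
        raise ValueError("f32.const immediate must be exactly 4 bytes")
    # Interpret the bytes as an integer so that no float conversion can
    # alter the bit pattern (for example a NaN payload).
    return int.from_bytes(immediate, "little")


# A NaN with a non-default payload: the bit pattern itself is the constant.
nan_immediate = bytes.fromhex("0000a07f")
bits = f32_const_bits(nan_immediate)
value = struct.unpack("<f", nan_immediate)[0]  # for display only
```

Here `bits` holds `0x7FA00000` exactly, whereas a value-level conversion of `value` is not guaranteed to keep the same payload bits.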
load32/load64 were incorrectly taking a laneidx, causing disassembly failures.
Also, take this opportunity to implement some of the SIMD operations using
common pcodeops. This will reduce the emulation burden and make loads and
stores explicit for dataflow analysis.
laneidx is actually just a byte, not a LEB128. Attempting to use 16 arguments in
a pcodeop hits a hardcoded limit of 8 args in PcodeEmit, so switch i8x16.shuffle
to using 4 32-bit arguments instead. This is the same syntax as used by
wasm-objdump by default.
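A sketch of how 16 one-byte lane indices can be grouped into four 32-bit arguments. The function name `pack_shuffle_lanes` is hypothetical, and the little-endian grouping (first lane in the low byte of the first word) is an assumption, not something confirmed by this change.

```python
import struct


def pack_shuffle_lanes(lanes: bytes):
    """Group 16 one-byte lane indices into four 32-bit words."""
    if len(lanes) != 16:
        raise ValueError("i8x16.shuffle takes exactly 16 lane indices")
    if any(lane >= 32 for lane in lanes):
        raise ValueError("shuffle lane indices must be below 32")
    # Assumption: each group of four lanes is read as a little-endian word.
    return list(struct.unpack("<4I", lanes))


words = pack_shuffle_lanes(bytes(range(16)))
print(" ".join(f"0x{w:08x}" for w in words))
```

Four arguments stay well under the 8-argument limit while still carrying all 16 indices.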
Similarly to the x86 AVX2 test, this will allow us to do (basic) testing of Wasm
SIMD opcodes via autovectorization. Only the -O3 binary is expected to contain
SIMD opcodes.
We pass almost every test with -O0 and -O3, with the exception of
pcode_conversions_Main.
There's a rather ugly hack needed to convert certain function pointers from
table indices (used by the actual code) to byte addresses (used by the emulator
to set PC). The way this is implemented is decidedly not ideal. A much better
solution would be to somehow hook readCodePointer in the
ProcessorEmulatorTestAdapter subclass, which would also enable the "procedure
descriptor indirection" fix to be moved into a processor-specific
implementation.
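The conversion itself can be sketched as a two-step lookup. Everything here is hypothetical: the function name `table_index_to_address`, the list-based table, and the address map are illustrative stand-ins and do not reflect how the test adapter stores this information.

```python
def table_index_to_address(index, table, function_addresses):
    """Map a table index (a Wasm function pointer) to a code byte address.

    table: list of function indices, with None for null entries.
    function_addresses: dict from function index to byte address.
    """
    if not 0 <= index < len(table):
        raise IndexError(f"table index {index} out of range")
    func_index = table[index]
    if func_index is None:
        raise ValueError(f"table entry {index} is null")
    return function_addresses[func_index]


# Hypothetical module: table slot 1 refers to function 7 at 0x1040.
example_table = [None, 7, 3]
example_addresses = {3: 0x1000, 7: 0x1040}
pc = table_index_to_address(1, example_table, example_addresses)
```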
Changing the structs to protected in `ProcessorEmulatorTestAdapter` is for
convenience, so that the subclass does not need to go look those up again.
The zero-size check added to `BytesPcodeExecutorStateSpace#read` fixes a bug
which caused an exception when reading 0 bytes (`offset + size - 1` is not a
valid calculation in that case).
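The shape of the guard, as a minimal sketch. This is not the Ghidra implementation; `read_range` is a hypothetical function over a plain byte string that shows why the empty read must be handled before the inclusive end offset is computed.

```python
def read_range(store: bytes, offset: int, size: int) -> bytes:
    """Read `size` bytes starting at `offset`, allowing empty reads."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size == 0:
        # An empty read has no last byte, so skip the end calculation.
        return b""
    last = offset + size - 1  # inclusive end; only meaningful for size >= 1
    if offset < 0 or last >= len(store):
        raise IndexError("read outside of backing store")
    return store[offset:last + 1]
```

Without the early return, a zero-byte read at offset 0 would compute an inclusive end of -1, which is not a valid bound.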
This successfully builds a working binary with Clang 15.
- `mem*` definitions in `misc_BODY.c` and new `main` definition in `tpp.py`
  are for C99 compatibility
- `encoding='utf8'` in build.py produces a more readable log
- `TestInfo_force` in pcode_test.c ensures that the entire `MainInfo` structure
  is included in the binary
The traditional stack analyzer, StackVariableAnalyzer
(NewFunctionStackAnalysisCmd), depends on having registers that contain stack
addresses visible as assembly operands. However, the Wasm disassembler design
hides the Wasm stack registers from the disassembly, so this stack analysis
cannot automatically extract stack variables.
As a fix, adapt FunctionStackAnalysisCmd to operate on the hidden Wasm stack
register operands to detect operations that interact with C stack addresses.
This helps with cross references to C stack variables and allows C stack
variables to be handled properly in the decompiler.
It seems like LLVM, for instance, uses table indices as function pointers, so
this script is likely to be useful for anything compiled with LLVM. As a guess,
analyze_dyncalls is probably only useful for programs compiled using the
Emscripten fastcomp backend.
Without this, we get the error "<pentry> tags within a group must be
distinguished by size or type" when attempting to load the pos-stack compiler
spec. This will break functions that have multiple output arguments, but those
are expected to be rare (and they are not well-supported by Ghidra anyway).
Disassembly and verification are implemented, but almost all of the SIMD opcodes
are just stubbed out with pcodeops for now, so semantics aren't implemented. This
is probably good enough for now.
This patch implements an EmulateInstructionStateModifier for Wasm which provides
support for emulating Wasm instructions.
The memory contents must contain a full module for this to work, because
instruction semantics still depend on module details (e.g. the type of certain
operations depends on metadata like the types of imports or globals).
This allows the whole module to be loaded in memory and eliminates the previous
duplication of the .code bytes in the .module and the .function bytes in RAM.
Using Program throughout is rather lazy, and in many cases totally unnecessary.
Using finer-grained interfaces and classes like AddressFactory allows usage of
the code in non-Program contexts, such as emulation.
The use of 64 bits was mostly a debugging feature to ensure we didn't
accidentally mix references with normal types. 64-bit addresses cause some
problems, though, such as 64-bit immediates (e.g. from ref.func) not always
being treated as addresses in the decompiler.