Segmentation faults when using P/Invoke = pointer issues? Not necessarily
While debugging RavenDB’s new 32-bit pager for Linux-based ARM environments, which has platform-specific functionality implemented in C and P/Invoked from C# code, I ran into an issue: on startup, RavenDB was hitting a segmentation fault and crashing. Since the C# code hadn’t changed much, my immediate suspect was some sort of pointer issue in the C code, such as dereferencing a null pointer.
GDB is awesome for handling segfaults
The GNU Debugger or GDB is very good at tracing such issues. Let’s see how we can find a segfault source in a small example.
Consider the following code:
void throw_segment_fault()
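For readers who want to follow along, a minimal sketch of such a function could look like the one below. Only the name throw_segment_fault comes from the listing; the body (a write through a null pointer) and the helper signal_from_child, which runs the function in a forked child so the crash can be observed without killing the caller, are assumptions made for illustration. The sketch assumes a POSIX system.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed body: write through a null pointer. The volatile qualifier
   keeps the compiler from optimizing the faulting store away. */
void throw_segment_fault(void)
{
    volatile int *p = NULL;
    *p = 42;
}

/* Hypothetical helper: runs throw_segment_fault() in a child process and
   returns the signal that terminated it, 0 if it exited normally,
   or -1 if fork/waitpid failed. */
int signal_from_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        throw_segment_fault();
        _exit(0); /* not reached if the child crashes */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

To reproduce the crash directly under GDB, call throw_segment_fault() from main instead of going through the helper.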
Now, let’s compile this and use GDB to find where the segfault happens:
Note that a.out is the executable compiled from segmentFaultThrower.c.
gcc segmentFaultThrower.c
Running GDB with such parameters will yield the following output:
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
At this point, our application is loaded in GDB but paused. Executing the run command actually starts the program.
(gdb) run
Now, running the bt (backtrace) command will show the stack trace at the point where the segfault happened.
(gdb) bt
This is nice, but GDB can do better! Compiling our application with the -g flag includes debug information in the executable.
So, if we recompile with the flag, start GDB, and issue the run command, we will see the following:
(gdb) run
Running bt will now print source code lines in the stack trace:
(gdb) bt
If segfaults happen when P/Invoking a function, it is not necessarily because of null pointers!
The new 32-bit pager I mentioned at the beginning uses P/Invoke to call C code that accesses operating-system APIs, such as memory-mapping functions.
I made sure to compile the C library with the -g flag and ran RavenDB with GDB attached. I saw the following output (notice the output of the bt command):
Thread 1 "Raven.Server" received signal SIGSEGV, Segmentation fault.
Such output looked weird to me, especially the corrupt stack part, so I looked at the relevant code.
The P/Invoke call on the C# side looked like this:
var rc = rvn_mmap_file(size,
On the C side, the rvn_mmap_file() signature looks like this:
EXPORT int32_t rvn_mmap_file(int64_t sz, int64_t flags, void *handle, int64_t offset, void **addr, int32_t *detailed_error_code)
In this case, int64_t is a typedef for long long and int32_t is a typedef for int.
The first thing I noticed is that the handle parameter value is 0 (which means a null pointer) and the offset parameter has an unreasonably large value.
In the C# code, by the time rvn_mmap_file() is invoked, _handle is guaranteed to have a value (otherwise the code would have failed earlier). Given that, and the corrupt stack notification GDB printed for the bt command, I suspected that some argument offsets were wrong, since the segfault happened while invoking rvn_mmap_file() itself.
After looking some more at the code, I noticed that the flags parameter is int64_t and the definition of the corresponding flags enum in C# looks like this:
[ ]
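Since C# enums have an underlying type of System.Int32 by default, this was in fact the issue: the C# side passed a 4-byte value where the C function expected an 8-byte one, so every argument after flags was read from the wrong place. The fix was simply to change the type of the flags parameter, so the signature became:
EXPORT int32_t rvn_mmap_file(int64_t sz, int32_t flags, void *handle, int64_t offset, void **addr, int32_t *detailed_error_code)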
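To get a feel for how a 4-byte enum passed to an 8-byte parameter could produce exactly these symptoms, here is a minimal sketch. It is a deliberately simplified model, not the real 32-bit ARM calling convention (which also uses registers and alignment padding): the caller packs its arguments back to back into a byte buffer using the sizes C# believed in, and the callee reads them back using the sizes of the original C signature. The struct callee_view, the helpers put and get, the function simulate_mismatch, and all the values are hypothetical, and the result assumes a little-endian machine with 4-byte pointers modeled as uint32_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What the callee believes it received, using the sizes from the
   original (buggy) C signature. Pointers are modeled as 4 bytes. */
typedef struct {
    int64_t  sz;
    int64_t  flags;   /* callee expects 8 bytes here */
    uint32_t handle;
    int64_t  offset;
} callee_view;

/* Append n bytes to the simulated argument area. */
static size_t put(unsigned char *buf, size_t pos, const void *src, size_t n)
{
    memcpy(buf + pos, src, n);
    return pos + n;
}

/* Read n bytes from the simulated argument area. */
static size_t get(const unsigned char *buf, size_t pos, void *dst, size_t n)
{
    memcpy(dst, buf + pos, n);
    return pos + n;
}

callee_view simulate_mismatch(uint32_t handle, uint32_t addr)
{
    unsigned char args[40] = {0};
    const int64_t  sz = 4096;
    const int32_t  flags = 1;   /* caller side: the enum is 4 bytes */
    const int64_t  offset = 0;
    const uint32_t err = 0;     /* stand-in for the error-code pointer */

    /* Caller writes arguments with the sizes it believes in. */
    size_t p = 0;
    p = put(args, p, &sz, sizeof sz);
    p = put(args, p, &flags, sizeof flags);
    p = put(args, p, &handle, sizeof handle);
    p = put(args, p, &offset, sizeof offset);
    p = put(args, p, &addr, sizeof addr);
    p = put(args, p, &err, sizeof err);
    assert(p <= sizeof args);

    /* Callee reads them back with the sizes from its own signature. */
    callee_view v;
    memset(&v, 0, sizeof v);
    size_t q = 0;
    q = get(args, q, &v.sz, sizeof v.sz);
    q = get(args, q, &v.flags, sizeof v.flags);   /* swallows the real handle */
    q = get(args, q, &v.handle, sizeof v.handle); /* low half of the real offset */
    q = get(args, q, &v.offset, sizeof v.offset); /* high half of offset plus addr */
    assert(q <= sizeof args);
    return v;
}
```

In this model, calling simulate_mismatch(0x1000, 0x7FFF0000) yields a handle of 0 and an offset of 0x7FFF000000000000: the real handle ends up in the upper half of flags, and part of addr ends up in offset, which resembles what GDB reported.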
The moral of the story
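Usually, segmentation faults are associated with null pointer dereferences or other kinds of pointer issues, but as we can see here, that doesn’t have to be the case: a mismatch between the parameter types on the two sides of a P/Invoke call can cause one too.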