Examining The Stack For Fun And Profit
Well, maybe not so much for profit, but certainly for fun. This is a wandering journey of exploration and discovery, learning a variety of interesting and useful things.
One of the concerns with an embedded system is how much memory it needs, known as the memory footprint. This consists of the persistent storage needed for the program (i.e. the flash memory or filesystem space that stores the executable image), and the volatile storage needed to hold the data while executing over long periods of runtime (i.e. the RAM in all its flavors).
The RAM consists of 3 areas: static memory, heap, and stack. For the C language:
- Static memory is the fixed allocation of memory used to store global variables, file-scope static variables, and function-scope static variables. These form the .bss and .data segments.
- Heap is the dynamic memory allocated via malloc() and deallocated via free().
- Stack is the dynamic memory allocated automatically by calling functions, consisting of stack frames for each function in the current call stack, containing function local variables and return information (i.e. the address to return to when the function returns). It can also be allocated via the alloca() function, extending the current stack frame. In either case, stack is deallocated automatically when the enclosing function returns.
While heap and stack are both used dynamically, their allocation pools are themselves allocated as fixed, static memory regions that have been reserved for them. The sizes of those regions are defined by the runtime environment, and in some cases can be adjusted from their defaults, particularly in embedded systems running on bare metal or under an RTOS (Real-Time Operating System).
There are two competing requirements when it comes to dynamic memory sizing. First, it's often desirable to run with the minimum amount of memory so that the product can be shipped with smaller memory chips, or smaller capacity memory built into the microcontroller, to minimize cost.
Second, the regions need to be large enough to handle the maximum dynamic allocations that will be needed for the system to run properly over long periods, for all code paths, no matter what it does.
Bad things can happen when dynamic allocations overflow, exceeding the space reserved for them. The system may crash, or it may continue to run, but with corrupted information that causes it to misbehave. Depending on what the system is controlling, this can have serious real-world consequences. A misbehaving music player may be annoying, but a misbehaving engine or factory control system can kill people.
These conditions are known as heap exhaustion and stack overflow. In order to avoid them, you need to know how much heap and stack a system will need, so a common task during embedded system development is measuring memory consumption (note that in some embedded systems, use of heap is prohibited, due to the risks of fragmentation, non-deterministic allocation time, and exhaustion).
These measurements can help drive decisions about what size chips to buy and what changes to make to the software. For situations where the chip sizes have already been established by cost and hardware requirements, they can help drive decisions about whether to use the software, or find alternative software with better measurements.
Different systems have different tools for making measurements. Here, I'm using a Raspberry Pi running Raspbian Linux. These specific techniques should be applicable to any Linux platform. Other, non-Linux systems should have similar capabilities that allow generally analogous techniques.
The advantage of doing this on Linux is that it's a very easy platform to work with. There are a variety of good tools built in, and you can do native development, development and testing on the same device. That's not always true with other embedded system environments, where you have to do cross-development (i.e. the system where you do development, the development system, is completely different from the system that runs the code, the target system).
The disadvantage of doing this on Linux is that Linux has its own set of libraries and specific way of doing things, so this information doesn't always apply as directly to other systems. In particular, it's a general-purpose operating system and runtime environment, not an embedded system. So just bear that in mind when working on those other systems.
In this particular case, I found that a system was using significantly more stack in its initialization than in the rest of its operation. That's unfortunate, because it means that even though the system could run for a long period with a small stack, it needed a larger stack for a brief period, and that extra space would then be wasted for the rest of the time. Sometimes you just have to live with that, but it's worth digging into to understand and see if there's anything you can do about it.
What I discovered was that the getaddrinfo() function used for setting up Linux network socket connections consumes a lot of stack, especially when you consider it's processing what's probably a short hostname string. Investigating further, I realized there was nothing I could do about it, and the Raspberry Pi has plenty of memory for stack, but it serves as a useful exercise for illustrating debugging and analysis techniques.
Start with a simple program to exercise the function. I'll run this under gdb, the GNU debugger, to examine the stack, using some gdb scripting to help. Then I'll use the information I find to examine glibc source code, the GNU C runtime library. Along the way, we'll learn not only about the internals of the function and what it calls, but also something about how dynamic library loading works under Linux.
Full documentation on gdb is at Debugging with GDB. But it's easiest to learn by example. Gdb commands can be abbreviated, so while I'll use some full commands, I'll use a lot of the abbreviations. I'll explain each command briefly as I use it. Hitting return at the prompt repeats or continues the last command (I've added extra blank lines in the listings below for readability, but if you see a prompt with nothing else on the line, that's where I've just hit return for the repeat/continue effect). Gdb is a fantastic tool, well worth learning, widely supported on a range of platforms. In some cases, other tools are built on top of it.
Many of these techniques work on bare-metal embedded systems as well. You can run a cross-tool version of gdb against a remote target device connected via some kind of communications channel. For instance, I do similar things on an ST Micro Nucleo board connected to my laptop via a USB cable. I run openOCD on the laptop, which communicates with the built-in ST-LINK on the Nucleo over the cable. Together, openOCD and the cable provide the communications channel for the ARM cross-tool version of gdb, also running on the laptop, to do debugging of the remote target. I'll be covering that in a later blog post.
Like the Nucleo, the Raspberry Pi processor is also an ARM chip, so it helps to know ARM assembly language and EABI (Embedded Application Binary Interface) for debugging. You don't need to be an expert in it, but you need to be able to read it at a rough level; any previous experience with assembly language helps. I found this tutorial to be just what I needed.
Here's the program (test-getaddrinfo.c), just enough to run the function under test and provide some gdb breakpoint targets (main() doesn't even need any parameters):
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

int
main()
{
    struct addrinfo hints;
    struct addrinfo* address_list;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_protocol = IPPROTO_TCP;

    int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
    return result;
}
Build it with debug symbols via the -g option:
pi@raspberrypi:~/Projects/test-getaddrinfo $ gcc test-getaddrinfo.c -o test-getaddrinfo -g
Here's a set of gdb script functions (stack_functions.gdb) for playing around with the stack. The gdb scripting language is pretty simple, and very worth knowing; here's documentation on it.
# Functions for examining and manipulating the stack in gdb.

# Script constants.
set $one_kb = 1024.0
set $safety_margin = 16

# Raspbian Linux stack parameters.
set $stack_start = 0x7efdf000
set $stack_end = 0x7f000000
set $stack_size = $stack_end - $stack_start

define stack_args
    if $argc < 2
        printf "Usage: stack_args <offset|start> <length|end>\n"
    else
        if $arg0 < $stack_start
            # Assume arg0 is a relative offset from start of stack.
            set $offset = (int)$arg0
        else
            # Assume arg0 is an absolute address, so compute its offset.
            set $offset = (int)$arg0 - $stack_start
        end

        if $arg1 < $stack_start
            # Assume arg1 is a relative length.
            set $length = (int)$arg1
        else
            # Assume arg1 is an absolute address, so compute its length.
            set $length = (int)$arg1 - $stack_start - $offset
        end
    end
end

document stack_args
Usage: stack_args <offset|start> <length|end>

Set stack region offset and length from arguments.
end

define dump_stack
    if $argc < 2
        printf "Usage: dump_stack <offset|start> <length|end>\n"
    else
        stack_args $arg0 $arg1
        set $i = 0
        while $i < $length
            set $addr = $stack_start + $offset + $i
            x/4wx $addr
            set $i = $i + 16
        end
    end
end

document dump_stack
Usage: dump_stack <offset|start> <length|end>

Dumps stack starting at <offset|start> bytes, 4 longwords at a time,
for <length|end> bytes.
end

define clear_stack
    if $argc < 2
        printf "Usage: clear_stack <offset|start> <length|end>\n"
    else
        stack_args $arg0 $arg1
        if $stack_start + $offset + $safety_margin >= $sp
            printf "Error: start is in active stack.\n"
        else
            if $stack_start + $offset + $length + $safety_margin >= $sp
                printf "Error: end is in active stack.\n"
            else
                set $i = 0
                while $i < $length
                    set $addr = $stack_start + $offset + $i
                    set *((int *) $addr) = 0
                    set $i = $i + 4

                    # Takes a while, so give some feedback.
                    if $i % 10000 == 0
                        printf "Cleared %d\n", $i
                    end
                end
            end
        end
    end
end

document clear_stack
Usage: clear_stack <offset|start> <length|end>

Clears stack starting at <offset|start> bytes, one longword at a time,
for <length|end> bytes.
end

define stack_offset
    if $argc < 1
        printf "Usage: stack_offset <address>\n"
    else
        # Cast to int is needed to set $depth when $arg0 is $sp.
        set $addr = (int)$arg0
        set $offset = $addr - $stack_start
        set $depth = $stack_end - $addr

        printf "Address %10d = 0x%08x\n", $addr, $addr
        if $addr < $stack_start || $addr >= $stack_end
            printf "Warning: address is not in stack.\n"
        end
        printf "Stack size %6d = 0x%05x = %5.1fKB, 0x%x-0x%x\n", $stack_size, $stack_size, $stack_size / $one_kb, $stack_start, $stack_end
        printf "Stack offset %6d = 0x%05x = %5.1fKB\n", $offset, $offset, $offset / $one_kb
        printf "Stack depth %6d = 0x%05x = %5.1fKB\n", $depth, $depth, $depth / $one_kb
    end
end

document stack_offset
Usage: stack_offset <address>

Shows stack offset and depth represented by address.
end

define scan_stack
    if $argc < 2
        printf "Usage: scan_stack <offset|start> <length|end>\n"
    else
        stack_args $arg0 $arg1
        set $addr = $stack_start + $offset
        set $i = 0
        while $i < $length && *((int *) $addr) == 0
            set $addr = $stack_start + $offset + $i
            set $i = $i + 4

            # Takes a while, so give some feedback.
            if $i % 10000 == 0
                printf "Scanned %d\n", $i
            end
        end

        if *((int *) $addr) != 0
            if $addr < $sp
                set $offset = $sp - $addr
                printf "Found data %d bytes deeper than current stack frame (0x%x).\n", $offset, $sp
            else
                printf "Stack is clear up to current stack frame (0x%x), it is deepest stack usage.\n", $sp
            end
            stack_offset $addr
            dump_stack $addr-$stack_start 64
        else
            printf "Stack is clear in requested range.\n"
        end
    end
end

document scan_stack
Usage: scan_stack <offset|start> <length|end>

Scans stack for non-zero contents starting at <offset|start> bytes,
one longword at a time, for <length|end> bytes.
end

define stack_walk
    set $first_sp = $sp
    set $last_sp = $sp
    set $total = 0
    frame
    printf "Top stack frame 0x%08x\n\n", $last_sp

    # Loop will error out gracefully when there are no more frames.
    while 1
        up
        set $delta = $sp - $last_sp
        set $total = $total + $delta
        printf "Last stack frame 0x%08x, current 0x%08x, size of last %4d = 0x%03x, total deeper %6d = 0x%05x = %5.1fKB\n\n", $last_sp, $sp, $delta, $delta, $total, $total, $total / $one_kb
        set $last_sp = $sp
    end
end

document stack_walk
Usage: stack_walk

Walks stack frames upward from currently selected frame and computes
incremental and cumulative size of frames, so that stack consumption
can be attributed to specific functions. Use "f 0" to select deepest
frame of call stack, or "f <n>" to select frame <n> higher up in stack.
end
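How do I know where the stack boundaries are for the $stack_start and $stack_end variables? On Linux, the file /proc/<pid>/maps lists the memory regions mapped into the process with that PID, and the line marked [stack] gives the start and end addresses of the stack.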
This type of thing is very system-specific, so a different Linux platform might use different addresses. For a bare-metal system such as the Nucleo board, and possibly for an RTOS, these addresses would be found in the linker control script (.ld file).
The Linux stack grows backwards, from end to start (i.e. from higher address to lower address). The size of the stack is known as its depth. It consists of a series of stack frames, one per function call in a call tree (think of a stack of plates building up, but using frames instead of plates). Each frame consists of all the temporary storage that a function needs. This includes saving processor registers that need to be preserved across calls, and any local variables. In some cases, function parameters may be passed via the stack, but the ARM EABI dictates that functions pass the first group of arguments via registers (R0 through R3 for integer and pointer arguments).
The stack is created as zero-filled memory at process creation. The fact that it's initialized to known values makes it easy to find the deepest point of consumption by searching for the first non-zero location.
Two registers are important for tracking the stack, SP (Stack Pointer) and FP (Frame Pointer). The FP is actually R11. Gdb identifies these symbolically as $sp and $r11.
Pushing data onto the stack and popping data off it automatically changes the SP. Offset values can also be subtracted from the SP and added to it to bulk-allocate and deallocate space.
Here's an enormously important note about memory allocated by subtracting from the SP: this does not change the values of the memory locations in the allocated space. It simply moves the stack boundary to include them, and the memory has whatever values were previously stored there. Thus, the variables or data structures this space maps to in the program are uninitialized. That's why you have to assign values to your local variables in some way before you read them. Otherwise you read random, unknown data left there by whoever wrote to those locations last. This is a common source of bugs.
At certain points in a function, the SP is saved to the FP to mark the frame. The actual mechanics of how and when that is done are specified by the EABI.
To analyze the stack usage, start the program under gdb and pull in the stack functions (the -q option here is quiet mode to suppress boilerplate startup messages):
pi@raspberrypi:~/Projects/test-getaddrinfo $ gdb -q ./test-getaddrinfo
Reading symbols from ./test-getaddrinfo...done.
(gdb) source stack_functions.gdb
The program isn't actually running yet. List the program source and set breakpoints on the main() function and the line containing the final return statement:
(gdb) list
1       #include <sys/socket.h>
2       #include <netdb.h>
3       #include <string.h>
4
5       int
6       main()
7       {
8           struct addrinfo hints;
9           struct addrinfo* address_list;
10
(gdb)
11          memset(&hints, 0, sizeof(hints));
12          hints.ai_family = AF_UNSPEC;
13          hints.ai_socktype = SOCK_STREAM;
14          hints.ai_protocol = IPPROTO_TCP;
15
16          int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
17          return result;
18      }
(gdb) b main
Breakpoint 1 at 0x10480: file test-getaddrinfo.c, line 11.
(gdb) b 17
Breakpoint 2 at 0x104c4: file test-getaddrinfo.c, line 17.
Run the program. When it stops at the first breakpoint, the first executable line of the main() function, show the process memory map to verify the stack addresses (look for the [stack] line in the command output):
(gdb) r
Starting program: /home/pi/Projects/test-getaddrinfo/test-getaddrinfo

Breakpoint 1, main () at test-getaddrinfo.c:11
11          memset(&hints, 0, sizeof(hints));
(gdb) info proc map
process 10163
Mapped address spaces:

        Start Addr   End Addr       Size     Offset objfile
           0x10000    0x11000     0x1000        0x0 /home/pi/Projects/test-getaddrinfo/test-getaddrinfo
           0x20000    0x21000     0x1000        0x0 /home/pi/Projects/test-getaddrinfo/test-getaddrinfo
           0x21000    0x22000     0x1000     0x1000 /home/pi/Projects/test-getaddrinfo/test-getaddrinfo
        0x76e64000 0x76f8e000   0x12a000        0x0 /lib/arm-linux-gnueabihf/libc-2.24.so
        0x76f8e000 0x76f9d000     0xf000   0x12a000 /lib/arm-linux-gnueabihf/libc-2.24.so
        0x76f9d000 0x76f9f000     0x2000   0x129000 /lib/arm-linux-gnueabihf/libc-2.24.so
        0x76f9f000 0x76fa0000     0x1000   0x12b000 /lib/arm-linux-gnueabihf/libc-2.24.so
        0x76fa0000 0x76fa3000     0x3000        0x0
        0x76fb8000 0x76fbd000     0x5000        0x0 /usr/lib/arm-linux-gnueabihf/libarmmem.so
        0x76fbd000 0x76fcc000     0xf000     0x5000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
        0x76fcc000 0x76fcd000     0x1000     0x4000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
        0x76fcd000 0x76fce000     0x1000     0x5000 /usr/lib/arm-linux-gnueabihf/libarmmem.so
        0x76fce000 0x76fef000    0x21000        0x0 /lib/arm-linux-gnueabihf/ld-2.24.so
        0x76ff9000 0x76ffb000     0x2000        0x0
        0x76ffb000 0x76ffc000     0x1000        0x0 [sigpage]
        0x76ffc000 0x76ffd000     0x1000        0x0 [vvar]
        0x76ffd000 0x76ffe000     0x1000        0x0 [vdso]
        0x76ffe000 0x76fff000     0x1000    0x20000 /lib/arm-linux-gnueabihf/ld-2.24.so
        0x76fff000 0x77000000     0x1000    0x21000 /lib/arm-linux-gnueabihf/ld-2.24.so
        0x7efdf000 0x7f000000    0x21000        0x0 [stack]
        0xffff0000 0xffff1000     0x1000        0x0 [vectors]
Those are the addresses I used in the stack functions. Scan the stack from its zero offset for its full length to find the first non-zero value (the stack functions accept either offset values from start of stack or absolute memory addresses):
(gdb) scan_stack 0 $stack_size
Scanned 10000
Scanned 20000
Scanned 30000
Scanned 40000
Scanned 50000
Scanned 60000
Scanned 70000
Scanned 80000
Scanned 90000
Scanned 100000
Scanned 110000
Scanned 120000
Found data 4660 bytes deeper than current stack frame (0x7effeeb0).
Address 2130697340 = 0x7effdc7c
Stack size 135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 126076 = 0x1ec7c = 123.1KB
Stack depth 9092 = 0x02384 = 8.9KB
0x7effdc7c: 0x00000020 0x00002e41 0x61656100 0x01006962
0x7effdc8c: 0x00000024 0x06003605 0x09010806 0x12020a01
0x7effdc9c: 0x14011304 0x16011501 0x18031701 0x1c021a01
0x7effdcac: 0x00012201 0x00000000 0x7effe8f4 0x00000000
The scan found a non-zero 32-bit word at 4660 bytes deeper into the stack than the current stack frame. It prints out the stack size and addresses, the offset of the word, and the total depth that offset represents. Then it dumps the contents of that memory for 16 words. Note that one of the words contains a value that is itself a stack address (i.e. is in the range of the stack addresses 0x7efdf000-0x7f000000).
You might be wondering why there's stuff on the stack deeper than the current frame, when the program hasn't even begun the main() function yet. That's the pre-main code at work. Every system has some startup code that runs before your actual program code. It could be doing library initialization, data initialization (for instance, copying the contents of the .data segment from the executable image into the static memory space set aside for initialized global and static variables), setting up registers, etc. Again, this is very system-dependent.
So where is the current stack frame? Look at the backtrace, which is the list of stack frames for all the functions currently in the call tree up to the breakpoint, and look at the stack offsets represented by the current SP and FP:
(gdb) ba
#0  main () at test-getaddrinfo.c:11
(gdb) stack_offset $sp
Address 2130702000 = 0x7effeeb0
Stack size 135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130736 = 0x1feb0 = 127.7KB
Stack depth 4432 = 0x01150 = 4.3KB
(gdb) stack_offset $r11
Address 2130702044 = 0x7effeedc
Stack size 135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130780 = 0x1fedc = 127.7KB
Stack depth 4388 = 0x01124 = 4.3KB
The backtrace is really short, just main(), with no indication of who called it (though we know that something in the pre-main code got us this far). The SP and FP differ by 44 bytes (0x7effeedc - 0x7effeeb0), indicating the stack frame contains a small amount of data. Overall, the total stack depth at this point is 4.3KB, consisting of the main() stack frame plus whatever the pre-main setup used.
Look at the actual assembly code that the compiler generated for main(). Since it exists as binary machine code in memory, we need to disassemble it back to assembly language (gdb doesn't support decompiling assembly language back to the original C source code, but it does know which lines of source code represent which ranges of assembly instructions via the debug symbols included by the gcc -g option):
(gdb) disassemble
Dump of assembler code for function main:
   0x00010474 <+0>:     push {r11, lr}
   0x00010478 <+4>:     add r11, sp, #4
   0x0001047c <+8>:     sub sp, sp, #40 ; 0x28
=> 0x00010480 <+12>:    sub r3, r11, #40 ; 0x28
   0x00010484 <+16>:    mov r2, #32
   0x00010488 <+20>:    mov r1, #0
   0x0001048c <+24>:    mov r0, r3
   0x00010490 <+28>:    bl 0x10328 <memset@plt>
   0x00010494 <+32>:    mov r3, #0
   0x00010498 <+36>:    str r3, [r11, #-36] ; 0xffffffdc
   0x0001049c <+40>:    mov r3, #1
   0x000104a0 <+44>:    str r3, [r11, #-32] ; 0xffffffe0
   0x000104a4 <+48>:    mov r3, #6
   0x000104a8 <+52>:    str r3, [r11, #-28] ; 0xffffffe4
   0x000104ac <+56>:    sub r3, r11, #44 ; 0x2c
   0x000104b0 <+60>:    sub r2, r11, #40 ; 0x28
   0x000104b4 <+64>:    ldr r1, [pc, #24] ; 0x104d4 <main+96>
   0x000104b8 <+68>:    ldr r0, [pc, #24] ; 0x104d8 <main+100>
   0x000104bc <+72>:    bl 0x10334 <getaddrinfo@plt>
   0x000104c0 <+76>:    str r0, [r11, #-8]
   0x000104c4 <+80>:    ldr r3, [r11, #-8]
   0x000104c8 <+84>:    mov r0, r3
   0x000104cc <+88>:    sub sp, r11, #4
   0x000104d0 <+92>:    pop {r11, pc}
   0x000104d4 <+96>:    andeq r0, r1, r12, asr #10
   0x000104d8 <+100>:   andeq r0, r1, r0, asr r5
End of assembler dump.
Every function consists of a prologue, a body, and an epilogue. The prologue and epilogue are all the automatic code that the compiler generates to enter and exit the function, according to the EABI. The body is the code the compiler generates to implement the logic of the function, according to the C language statements.
The specific details of these vary a bit based on the particular function, but in general:
- The prologue saves off registers that need to be preserved while they get reused by the function and sets up space for local variables.
- The epilogue sets up the function return value, restores saved registers, and deallocates local variables.
When source code is available to gdb, the disassemble command takes option /s to intermingle source and assembly. This makes it easier to see the distinct parts of the function, and is a useful way to learn how the compiler translates C source constructs to assembly; it gets even more interesting with optimized code.
(gdb) disassemble /s
Dump of assembler code for function main:
test-getaddrinfo.c:
7       {
   0x00010474 <+0>:     push {r11, lr}
   0x00010478 <+4>:     add r11, sp, #4
   0x0001047c <+8>:     sub sp, sp, #40 ; 0x28

8           struct addrinfo hints;
9           struct addrinfo* address_list;
10
11          memset(&hints, 0, sizeof(hints));
   0x00010480 <+12>:    sub r3, r11, #40 ; 0x28
   0x00010484 <+16>:    mov r2, #32
   0x00010488 <+20>:    mov r1, #0
   0x0001048c <+24>:    mov r0, r3
   0x00010490 <+28>:    bl 0x10328 <memset@plt>

12          hints.ai_family = AF_UNSPEC;
   0x00010494 <+32>:    mov r3, #0
   0x00010498 <+36>:    str r3, [r11, #-36] ; 0xffffffdc

13          hints.ai_socktype = SOCK_STREAM;
   0x0001049c <+40>:    mov r3, #1
   0x000104a0 <+44>:    str r3, [r11, #-32] ; 0xffffffe0

14          hints.ai_protocol = IPPROTO_TCP;
   0x000104a4 <+48>:    mov r3, #6
   0x000104a8 <+52>:    str r3, [r11, #-28] ; 0xffffffe4

15
16          int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
   0x000104ac <+56>:    sub r3, r11, #44 ; 0x2c
   0x000104b0 <+60>:    sub r2, r11, #40 ; 0x28
   0x000104b4 <+64>:    ldr r1, [pc, #24] ; 0x104d4 <main+96>
   0x000104b8 <+68>:    ldr r0, [pc, #24] ; 0x104d8 <main+100>
   0x000104bc <+72>:    bl 0x10334 <getaddrinfo@plt>
=> 0x000104c0 <+76>:    str r0, [r11, #-8]

17          return result;
   0x000104c4 <+80>:    ldr r3, [r11, #-8]

18      }
   0x000104c8 <+84>:    mov r0, r3
   0x000104cc <+88>:    sub sp, r11, #4
   0x000104d0 <+92>:    pop {r11, pc}
   0x000104d4 <+96>:    andeq r0, r1, r12, asr #10
   0x000104d8 <+100>:   andeq r0, r1, r0, asr r5
End of assembler dump.
Here's the prologue:
   0x00010474 <+0>:     push {r11, lr}
   0x00010478 <+4>:     add r11, sp, #4
   0x0001047c <+8>:     sub sp, sp, #40 ; 0x28
This pushes the FP and the LR (Link Register), which holds the return address into the caller, onto the stack, allocating the saved-register space in the stack frame. It then sets the FP to SP + 4, so that the FP points at the saved LR and marks the frame. Finally, it subtracts 40 bytes from the SP to allocate space for the local variables.
Here's the epilogue:
   0x000104c8 <+84>:    mov r0, r3
   0x000104cc <+88>:    sub sp, r11, #4
   0x000104d0 <+92>:    pop {r11, pc}
   0x000104d4 <+96>:    andeq r0, r1, r12, asr #10
   0x000104d8 <+100>:   andeq r0, r1, r0, asr r5
This sets up the return value in R0 as specified by the EABI, subtracts 4 bytes from the FP and stores the result in the SP, then pops the saved FP and LR. The two andeq lines after the pop aren't instructions at all. They're data words (a literal pool) holding the addresses of the two string constants, which the ldr instructions at +64 and +68 load PC-relative from 0x104d4 and 0x104d8; the disassembler simply decodes them as if they were code.
But there's some subtlety to this. What about the 40 bytes that were subtracted from the SP in the prologue? And why is the LR popped back into the PC?
The FP was set from the value of the SP before the 40 bytes were subtracted (that value plus 4), so it still marks the boundary of the frame. Subtracting 4 from it is therefore sufficient to restore the SP to its previous value, pointing to the saved FP and LR, no matter how much local space was allocated.
By popping the saved LR directly into the PC, the processor automatically resumes executing at that saved address. It's equivalent to popping the saved value into the LR, then jumping to the address in the LR.
These are the kinds of things that the compiler does to generate efficient code. But note that this is unoptimized code. That is, it is code that directly implements the C statements. It's not always obvious that the assembly instructions are a direct implementation, because sometimes the implementation is doing interesting things to generate side effects in registers, that are then used in subsequent statements.
But the compiler has many more tricks up its sleeve. You can have it optimize for speed or for space, which causes it to implement the C code in slightly different ways. There are always multiple ways of accomplishing things, with various tradeoffs. Optimization affects those tradeoffs to achieve a particular goal while still following the C source code logic.
By default, gcc generates unoptimized code, because that allows you to track execution directly in gdb. Optimized code can do weird things from a debugging standpoint, making debugging more difficult. It's still quite possible, it's just more complex.
Look in the body of the function, the code between the prologue and epilogue. Note the arrow pointing to the instruction at offset +12. That's where the current breakpoint is holding execution. Look at the bl instructions:
...
   0x00010490 <+28>:    bl 0x10328 <memset@plt>
...
   0x000104bc <+72>:    bl 0x10334 <getaddrinfo@plt>
Those are the function calls that main() makes. These are branch-and-link instructions: branch to the named function and link back to the caller by saving the address of the instruction following the bl in the LR as the return address.
The function names match the C source code. But what's that @plt stuff? That's the Procedure Linkage Table, part of library loading. We'll look at PLT trampolines in a bit. You didn't realize this included gymnastics, did you?
But for now we know about the stack as it relates to main(). Continue until the next breakpoint, right before main() exits:
(gdb) c
Continuing.

Breakpoint 2, main () at test-getaddrinfo.c:17
17          return result;
That means the program has called the memset() and getaddrinfo() functions, doing whatever stack manipulation they needed, and returned back to this line, where main() is about to return its result, ending the program.
What does the stack look like now?
(gdb) stack_offset $sp
Address 2130702000 = 0x7effeeb0
Stack size 135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130736 = 0x1feb0 = 127.7KB
Stack depth 4432 = 0x01150 = 4.3KB
(gdb) stack_offset $r11
Address 2130702044 = 0x7effeedc
Stack size 135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 130780 = 0x1fedc = 127.7KB
Stack depth 4388 = 0x01124 = 4.3KB
SP and FP look like they did before. That confirms that whatever happened to the stack, it's returned to the state that main() expects; it's maintained the context of main(). What does a scan show?
(gdb) scan_stack 0 $stack_size
Scanned 10000
Scanned 20000
Scanned 30000
Scanned 40000
Scanned 50000
Scanned 60000
Scanned 70000
Scanned 80000
Scanned 90000
Scanned 100000
Scanned 110000
Found data 11648 bytes deeper than current stack frame (0x7effeeb0).
Address 2130690352 = 0x7effc130
Stack size 135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 119088 = 0x1d130 = 116.3KB
Stack depth 16080 = 0x03ed0 = 15.7KB
0x7effc130: 0x76ff94b0 0x7effc1a8 0x76e66c28 0x000004b0
0x7effc140: 0x7effc1ac 0x76fd8548 0x00000001 0x76e6c754
0x7effc150: 0x000004b0 0x76e70804 0x76ff94b0 0x7effc1ac
0x7effc160: 0x7effc1a8 0x00000000 0x76ffecf0 0x76e70804
Something went a lot deeper into the stack. There's data 11,648 bytes deeper in it, for a maximum depth of 15.7KB.
How can we find out who did that? The answer is to set a watchpoint. That's essentially a data breakpoint. The existing breakpoints are execution breakpoints, where gdb interrupts the program when it executes a particular address. For a data breakpoint, i.e. a watchpoint, gdb interrupts the program when it writes to a particular address; it watches to see when the address gets written.
Restart the program. When it breaks at main(), set a watchpoint on the stack address that the scan found, casting it as a pointer to an int (working with watchpoints can be a bit finicky, so make sure these steps are in exactly this order, with this exact syntax):
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/pi/Projects/test-getaddrinfo/test-getaddrinfo

Breakpoint 1, main () at test-getaddrinfo.c:11
11          memset(&hints, 0, sizeof(hints));
(gdb) watch *(int*)0x7effc130
Hardware watchpoint 3: *(int*)0x7effc130
Continue from there, and BAM! Caught the writer in the act:
(gdb) c
Continuing.

Hardware watchpoint 3: *(int*)0x7effc130

Old value = 0
New value = 1996461232
check_match (undef_name=undef_name@entry=0x76df8116 "strcasecmp", ref=0x76df775c, ref@entry=0x59c2869, version=0x22e80, version@entry=0x76fffabc, flags=1, flags@entry=2, type_class=type_class@entry=1, sym=0x76e6c754, sym@entry=0x770037f0, symidx=symidx@entry=1200, strtab=0x76e70804 "", strtab@entry=0x0, map=map@entry=0x76ff94b0, versioned_sym=versioned_sym@entry=0x7effc1ac, num_versions=num_versions@entry=0x7effc1a8)
    at dl-lookup.c:92
92      dl-lookup.c: No such file or directory.
What is the program doing at this point? The backtrace will show that. It's quite a bit longer than the previous backtrace, with a deep call stack:
(gdb) ba
#0 check_match (undef_name=undef_name@entry=0x76df8116 "strcasecmp", ref=0x76df775c, ref@entry=0x59c2869, version=0x22e80, version@entry=0x76fffabc, flags=1, flags@entry=2, type_class=type_class@entry=1, sym=0x76e6c754, sym@entry=0x770037f0, symidx=symidx@entry=1200, strtab=0x76e70804 "", strtab@entry=0x0, map=map@entry=0x76ff94b0, versioned_sym=versioned_sym@entry=0x7effc1ac, num_versions=num_versions@entry=0x7effc1a8) at dl-lookup.c:92
#1 0x76fd8548 in do_lookup_x (undef_name=0xb3850d3a <error: Cannot access memory at address 0xb3850d3a>, undef_name@entry=0x76df8116 "strcasecmp", new_hash=1994852356, new_hash@entry=3011841338, old_hash=0x76fec84c, old_hash@entry=0x7effc218, ref=0x59c2869, result=<optimized out>, result@entry=0x7effc220, scope=0x76fffabc, i=<optimized out>, version=<optimized out>, version@entry=0x22e80, flags=flags@entry=1, skip=skip@entry=0x0, type_class=type_class@entry=1, undef_map=undef_map@entry=0x22ac0) at dl-lookup.c:423
#2 0x76fd8b20 in _dl_lookup_symbol_x (undef_name=0x76df8116 "strcasecmp", undef_map=0x22ac0, ref=0x7effc28c, ref@entry=0x7effc284, symbol_scope=0x22c78, version=0x22e80, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:833
#3 0x76fde10c in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at dl-runtime.c:111
#4 0x76fe5320 in _dl_runtime_resolve () at ../sysdeps/arm/dl-trampoline.S:57
#5 0x76e05eec in __GI_ns_samename (a=a@entry=0x7effcaf8 "test.example.com", b=0x7effcf38 "test.example.com", b@entry=0x402 <error: Cannot access memory at address 0x402>) at ns_samedomain.c:196
#6 0x76dff850 in __GI___res_nameinquery (name=0x402 <error: Cannot access memory at address 0x402>, name@entry=0x0, type=1, class=1, buf=buf@entry=0x7effe088 "~\027\201\203", eom=eom@entry=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:287
#7 0x76dff984 in __GI___res_queriesmatch (buf1=0x7effd4e0 "~\027\001", buf1@entry=0x7effd40c "n", eom1=0x7effd502 "", eom1@entry=0x7effd40c "n", buf2=0x7effe088 "~\027\201\203", eom2=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:342
#8 0x76e00578 in send_dg (ansp2_malloced=<optimized out>, resplen2=<optimized out>, anssizp2=<optimized out>, ansp2=<optimized out>, anscp=<optimized out>, gotsomewhere=<synthetic pointer>, v_circuit=<synthetic pointer>, ns=0, terrno=0x7effe088, anssizp=0x7effd4c0, ansp=0x7effd3fc, buflen2=<optimized out>, buf2=<optimized out>, buflen=<optimized out>, buf=<optimized out>, statp=0x7effd420) at res_send.c:1422
#9 __libc_res_nsend (statp=statp@entry=0x76fa1b50 <_res>, buf=0x7effd40c "n", buf@entry=0x7effd4e0 "~\027\001", buflen=0, buflen@entry=34, buf2=0x0, buf2@entry=0x7effd504 "s\372\001", buflen2=buflen2@entry=34, ans=<optimized out>, ans@entry=0x7effe088 "~\027\201\203", anssiz=<optimized out>, anssiz@entry=2048, ansp=ansp@entry=0x7effe894, ansp2=ansp2@entry=0x7effe898, nansp2=nansp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, ansp2_malloced=ansp2_malloced@entry=0x7effe8a4) at res_send.c:533
#10 0x76dfdd70 in __GI___libc_res_nquery (statp=statp@entry=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=class@entry=1, type=439963904, type@entry=0, answer=0x7effe088 "~\027\201\203", answer@entry=0x0, anslen=2048, anslen@entry=439963904, answerp=0x7effe894, answerp@entry=0x76ffece8 <__stack_chk_guard>, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:222
#11 0x76dfe37c in __libc_res_nquerydomain (statp=statp@entry=0x76fa1b50 <_res>, name=0x7effe088 "~\027\201\203", name@entry=0x10550 "test.example.com", domain=domain@entry=0x0, class=1, class@entry=0, type=439963904, type@entry=2130700444, answer=0x7effe088 "~\027\201\203", answer@entry=0x9d <error: Cannot access memory at address 0x9d>, anslen=2048, anslen@entry=1994515264, answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=0x7effe89c, nanswerp2@entry=0x7effe8a4, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:592
#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700444, type@entry=439963904, answer=0x7effe088 "~\027\201\203", anslen=anslen@entry=2048, answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:376
#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe998, buffer=0x7effea88 "\177", buflen=1024, errnop=errnop@entry=0x7effe99c, herrnop=herrnop@entry=0x7effe9ac, ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
#14 0x76f1dee0 in gaih_inet (name=<optimized out>, name@entry=0x10550 "test.example.com", service=<optimized out>, req=0x7effeeb4, pai=pai@entry=0x7effea40, naddrs=<optimized out>, naddrs@entry=0x7effea4c, tmpbuf=<optimized out>, tmpbuf@entry=0x7effea80) at ../sysdeps/posix/getaddrinfo.c:848
#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeeb0) at ../sysdeps/posix/getaddrinfo.c:2391
#16 0x000104c0 in main () at test-getaddrinfo.c:16
Looking at the stack offset that the current SP represents, we can see this is indeed the deep point:
(gdb) stack_offset $sp
Address 2130690352 = 0x7effc130
Stack size 135168 = 0x21000 = 132.0KB, 0x7efdf000-0x7f000000
Stack offset 119088 = 0x1d130 = 116.3KB
Stack depth 16080 = 0x03ed0 = 15.7KB
Walk the stack, unwinding the stack frame and computing the size of each one (the set height command disables pagination, so gdb doesn't prompt to continue partway through the output):
(gdb) set height 0
(gdb) stack_walk
#0 check_match (undef_name=undef_name@entry=0x76df8116 "strcasecmp", ref=0x76df775c, ref@entry=0x59c2869, version=0x22e80, version@entry=0x76fffabc, flags=1, flags@entry=2, type_class=type_class@entry=1, sym=0x76e6c754, sym@entry=0x770037f0, symidx=symidx@entry=1200, strtab=0x76e70804 "", strtab@entry=0x0, map=map@entry=0x76ff94b0, versioned_sym=versioned_sym@entry=0x7effc1ac, num_versions=num_versions@entry=0x7effc1a8) at dl-lookup.c:92
92 in dl-lookup.c
Top stack frame 0x7effc130
#1 0x76fd8548 in do_lookup_x (undef_name=0xb3850d3a <error: Cannot access memory at address 0xb3850d3a>, undef_name@entry=0x76df8116 "strcasecmp", new_hash=1994852356, new_hash@entry=3011841338, old_hash=0x76fec84c, old_hash@entry=0x7effc218, ref=0x59c2869, result=<optimized out>, result@entry=0x7effc220, scope=0x76fffabc, i=<optimized out>, version=<optimized out>, version@entry=0x22e80, flags=flags@entry=1, skip=skip@entry=0x0, type_class=type_class@entry=1, undef_map=undef_map@entry=0x22ac0) at dl-lookup.c:423
423 in dl-lookup.c
Last stack frame 0x7effc130, current 0x7effc148, size of last 24 = 0x018, total deeper 24 = 0x00018 = 0.0KB
#2 0x76fd8b20 in _dl_lookup_symbol_x (undef_name=0x76df8116 "strcasecmp", undef_map=0x22ac0, ref=0x7effc28c, ref@entry=0x7effc284, symbol_scope=0x22c78, version=0x22e80, type_class=type_class@entry=1, flags=1, skip_map=skip_map@entry=0x0) at dl-lookup.c:833
833 in dl-lookup.c
Last stack frame 0x7effc148, current 0x7effc1d8, size of last 144 = 0x090, total deeper 168 = 0x000a8 = 0.2KB
#3 0x76fde10c in _dl_fixup (l=<optimized out>, reloc_arg=<optimized out>) at dl-runtime.c:111
111 dl-runtime.c: No such file or directory.
Last stack frame 0x7effc1d8, current 0x7effc278, size of last 160 = 0x0a0, total deeper 328 = 0x00148 = 0.3KB
#4 0x76fe5320 in _dl_runtime_resolve () at ../sysdeps/arm/dl-trampoline.S:57
57 ../sysdeps/arm/dl-trampoline.S: No such file or directory.
Last stack frame 0x7effc278, current 0x7effc2a8, size of last 48 = 0x030, total deeper 376 = 0x00178 = 0.4KB
#5 0x76e05eec in __GI_ns_samename (a=a@entry=0x7effcaf8 "test.example.com", b=0x7effcf38 "test.example.com", b@entry=0x402 <error: Cannot access memory at address 0x402>) at ns_samedomain.c:196
196 ns_samedomain.c: No such file or directory.
Last stack frame 0x7effc2a8, current 0x7effc2c0, size of last 24 = 0x018, total deeper 400 = 0x00190 = 0.4KB
#6 0x76dff850 in __GI___res_nameinquery (name=0x402 <error: Cannot access memory at address 0x402>, name@entry=0x0, type=1, class=1, buf=buf@entry=0x7effe088 "~\027\201\203", eom=eom@entry=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:287
287 res_send.c: No such file or directory.
Last stack frame 0x7effc2c0, current 0x7effcae8, size of last 2088 = 0x828, total deeper 2488 = 0x009b8 = 2.4KB
#7 0x76dff984 in __GI___res_queriesmatch (buf1=0x7effd4e0 "~\027\001", buf1@entry=0x7effd40c "n", eom1=0x7effd502 "", eom1@entry=0x7effd40c "n", buf2=0x7effe088 "~\027\201\203", eom2=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:342
342 in res_send.c
Last stack frame 0x7effcae8, current 0x7effcf28, size of last 1088 = 0x440, total deeper 3576 = 0x00df8 = 3.5KB
#8 0x76e00578 in send_dg (ansp2_malloced=<optimized out>, resplen2=<optimized out>, anssizp2=<optimized out>, ansp2=<optimized out>, anscp=<optimized out>, gotsomewhere=<synthetic pointer>, v_circuit=<synthetic pointer>, ns=0, terrno=0x7effe088, anssizp=0x7effd4c0, ansp=0x7effd3fc, buflen2=<optimized out>, buf2=<optimized out>, buflen=<optimized out>, buf=<optimized out>, statp=0x7effd420) at res_send.c:1422
1422 in res_send.c
Last stack frame 0x7effcf28, current 0x7effd368, size of last 1088 = 0x440, total deeper 4664 = 0x01238 = 4.6KB
#9 __libc_res_nsend (statp=statp@entry=0x76fa1b50 <_res>, buf=0x7effd40c "n", buf@entry=0x7effd4e0 "~\027\001", buflen=0, buflen@entry=34, buf2=0x0, buf2@entry=0x7effd504 "s\372\001", buflen2=buflen2@entry=34, ans=<optimized out>, ans@entry=0x7effe088 "~\027\201\203", anssiz=<optimized out>, anssiz@entry=2048, ansp=ansp@entry=0x7effe894, ansp2=ansp2@entry=0x7effe898, nansp2=nansp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, ansp2_malloced=ansp2_malloced@entry=0x7effe8a4) at res_send.c:533
533 in res_send.c
Last stack frame 0x7effd368, current 0x7effd368, size of last 0 = 0x000, total deeper 4664 = 0x01238 = 4.6KB
#10 0x76dfdd70 in __GI___libc_res_nquery (statp=statp@entry=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=class@entry=1, type=439963904, type@entry=0, answer=0x7effe088 "~\027\201\203", answer@entry=0x0, anslen=2048, anslen@entry=439963904, answerp=0x7effe894, answerp@entry=0x76ffece8 <__stack_chk_guard>, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:222
222 res_query.c: No such file or directory.
Last stack frame 0x7effd368, current 0x7effd4c0, size of last 344 = 0x158, total deeper 5008 = 0x01390 = 4.9KB
#11 0x76dfe37c in __libc_res_nquerydomain (statp=statp@entry=0x76fa1b50 <_res>, name=0x7effe088 "~\027\201\203", name@entry=0x10550 "test.example.com", domain=domain@entry=0x0, class=1, class@entry=0, type=439963904, type@entry=2130700444, answer=0x7effe088 "~\027\201\203", answer@entry=0x9d <error: Cannot access memory at address 0x9d>, anslen=2048, anslen@entry=1994515264, answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=0x7effe89c, nanswerp2@entry=0x7effe8a4, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:592
592 in res_query.c
Last stack frame 0x7effd4c0, current 0x7effd780, size of last 704 = 0x2c0, total deeper 5712 = 0x01650 = 5.6KB
#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700444, type@entry=439963904, answer=0x7effe088 "~\027\201\203", anslen=anslen@entry=2048, answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:376
376 in res_query.c
Last stack frame 0x7effd780, current 0x7effdbe0, size of last 1120 = 0x460, total deeper 6832 = 0x01ab0 = 6.7KB
#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe998, buffer=0x7effea88 "\177", buflen=1024, errnop=errnop@entry=0x7effe99c, herrnop=herrnop@entry=0x7effe9ac, ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
326 nss_dns/dns-host.c: No such file or directory.
Last stack frame 0x7effdbe0, current 0x7effe068, size of last 1160 = 0x488, total deeper 7992 = 0x01f38 = 7.8KB
#14 0x76f1dee0 in gaih_inet (name=<optimized out>, name@entry=0x10550 "test.example.com", service=<optimized out>, req=0x7effeeb4, pai=pai@entry=0x7effea40, naddrs=<optimized out>, naddrs@entry=0x7effea4c, tmpbuf=<optimized out>, tmpbuf@entry=0x7effea80) at ../sysdeps/posix/getaddrinfo.c:848
848 ../sysdeps/posix/getaddrinfo.c: No such file or directory.
Last stack frame 0x7effe068, current 0x7effe8e0, size of last 2168 = 0x878, total deeper 10160 = 0x027b0 = 9.9KB
#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeeb0) at ../sysdeps/posix/getaddrinfo.c:2391
2391 in ../sysdeps/posix/getaddrinfo.c
Last stack frame 0x7effe8e0, current 0x7effe9e8, size of last 264 = 0x108, total deeper 10424 = 0x028b8 = 10.2KB
#16 0x000104c0 in main () at test-getaddrinfo.c:16
16 int result = getaddrinfo("test.example.com", "80", &hints, &address_list);
Last stack frame 0x7effe9e8, current 0x7effeeb0, size of last 1224 = 0x4c8, total deeper 11648 = 0x02d80 = 11.4KB
Initial frame selected; you cannot go up.
For each stack frame, this reports the size of the previous frame, and the total deeper stack (i.e. the cumulative space deeper in the stack). What we're looking for is large frames.
Scrolling through this, there are several frames where "size of last" is over 1000 bytes. Generally speaking, anything over a hundred bytes is pretty big. That's especially true on embedded systems, where the entire stack may just be a kilobyte or two, whether for bare-metal or per RTOS thread.
Then looking at the preceding frame in each case, we see the following suspects (I've edited the stack walk output to line up the "size of last" lines with their corresponding frames, and eliminated everything under 1000 bytes):
#5 0x76e05eec in __GI_ns_samename (a=a@entry=0x7effcaf8 "test.example.com", b=0x7effcf38 "test.example.com", b@entry=0x402 <error: Cannot access memory at address 0x402>) at ns_samedomain.c:196
5 Last stack frame 0x7effc2c0, current 0x7effcae8, size of last 2088 = 0x828, total deeper 2488 = 0x009b8 = 2.4KB
#6 0x76dff850 in __GI___res_nameinquery (name=0x402 <error: Cannot access memory at address 0x402>, name@entry=0x0, type=1, class=1, buf=buf@entry=0x7effe088 "~\027\201\203", eom=eom@entry=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:287
6 Last stack frame 0x7effcae8, current 0x7effcf28, size of last 1088 = 0x440, total deeper 3576 = 0x00df8 = 3.5KB
#7 0x76dff984 in __GI___res_queriesmatch (buf1=0x7effd4e0 "~\027\001", buf1@entry=0x7effd40c "n", eom1=0x7effd502 "", eom1@entry=0x7effd40c "n", buf2=0x7effe088 "~\027\201\203", eom2=0x7effe888 "_nss_dns_get\210\340\377~") at res_send.c:342
7 Last stack frame 0x7effcf28, current 0x7effd368, size of last 1088 = 0x440, total deeper 4664 = 0x01238 = 4.6KB
#11 0x76dfe37c in __libc_res_nquerydomain (statp=statp@entry=0x76fa1b50 <_res>, name=0x7effe088 "~\027\201\203", name@entry=0x10550 "test.example.com", domain=domain@entry=0x0, class=1, class@entry=0, type=439963904, type@entry=2130700444, answer=0x7effe088 "~\027\201\203", answer@entry=0x9d <error: Cannot access memory at address 0x9d>, anslen=2048, anslen@entry=1994515264, answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=0x7effe89c, nanswerp2@entry=0x7effe8a4, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:592
11 Last stack frame 0x7effd780, current 0x7effdbe0, size of last 1120 = 0x460, total deeper 6832 = 0x01ab0 = 6.7KB
#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700444, type@entry=439963904, answer=0x7effe088 "~\027\201\203", anslen=anslen@entry=2048, answerp=answerp@entry=0x7effe894, answerp2=answerp2@entry=0x7effe898, nanswerp2=nanswerp2@entry=0x7effe89c, resplen2=resplen2@entry=0x7effe8a0, answerp2_malloced=answerp2_malloced@entry=0x7effe8a4) at res_query.c:376
12 Last stack frame 0x7effdbe0, current 0x7effe068, size of last 1160 = 0x488, total deeper 7992 = 0x01f38 = 7.8KB
#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe998, buffer=0x7effea88 "\177", buflen=1024, errnop=errnop@entry=0x7effe99c, herrnop=herrnop@entry=0x7effe9ac, ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
13 Last stack frame 0x7effe068, current 0x7effe8e0, size of last 2168 = 0x878, total deeper 10160 = 0x027b0 = 9.9KB
#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeeb0) at ../sysdeps/posix/getaddrinfo.c:2391
15 Last stack frame 0x7effe9e8, current 0x7effeeb0, size of last 1224 = 0x4c8, total deeper 11648 = 0x02d80 = 11.4KB
This is all a result of calling getaddrinfo(). What's it doing that takes so much space?
Switch to frame 15 and disassemble the current function, which is __GI_getaddrinfo(). Notice that gdb says there's no such file, since this is a prebuilt library function, so leave off the /s option. The function is long, so I've just shown the prologue:
(gdb) f 15
#15 0x76f1f010 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, hints=<optimized out>, pai=0x7effeec0) at ../sysdeps/posix/getaddrinfo.c:2391
2391 ../sysdeps/posix/getaddrinfo.c: No such file or directory.
(gdb) disassemble
Dump of assembler code for function __GI_getaddrinfo:
0x76f1eef0 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x76f1eef4 <+4>: add r11, sp, #32
0x76f1eef8 <+8>: ldr r6, [pc, #2712] ; 0x76f1f998 <__GI_getaddrinfo+2728>
0x76f1eefc <+12>: sub sp, sp, #1184 ; 0x4a0
The prologue saves off a number of registers that the function will be using (7 plus the usual FP and LR). The last line extends the frame by 1184 bytes, so there's the bulk of that 1224 bytes we saw listed in the stack walk. Adding the 36 bytes that the register save needs (9 registers at 4 bytes each), we get 1220. That's close enough for now.
Why does this function need such a large stack frame? We don't have the source...but we have the source file name and partial path. A quick Internet search for "sysdeps/posix/getaddrinfo.c" turns up a website that lists a version of this file: Woboq getaddrinfo source code. Awesome!
Gdb says we're currently at line 2391 of the file, where it calls gaih_inet (frame 14). Searching the source listing webpage, getaddrinfo() calls gaih_inet() at a different line, 2265:
2263 struct scratch_buffer tmpbuf;
2264 scratch_buffer_init (&tmpbuf);
2265 last_i = gaih_inet (name, pservice, hints, end, &naddrs, &tmpbuf);
That means this listing isn't for the exact same version of the library we're running. Again, this is close enough for now.
One of the arguments to gaih_inet() is tmpbuf, a local variable defined in line 2263. Any time you see a large allocation, things called "buffers" are good candidates for investigation. Clicking on scratch_buffer takes us to its structure declaration:
64 /* Scratch buffer. Must be initialized with scratch_buffer_init
65 before its use. */
66 struct scratch_buffer {
67 void *data; /* Pointer to the beginning of the scratch area. */
68 size_t length; /* Allocated space at the data pointer, in bytes. */
69 union { max_align_t __align; char __c[1024]; } __space;
70 };
BAM again! It contains a character array of 1024 bytes.
Now that we have navigable source code, we can follow this procedure on down the stack. Frame 13, _nss_dns_gethostbyname4_r(), is the next large one reported by the stack walk, with 2168 bytes. Examine its prologue:
(gdb) f 13
#13 0x76e1e340 in _nss_dns_gethostbyname4_r (name=name@entry=0x10550 "test.example.com", pat=pat@entry=0x7effe9a8, buffer=0x7effea98 "\177", buflen=1024, errnop=errnop@entry=0x7effe9ac, herrnop=herrnop@entry=0x7effe9bc, ttlp=ttlp@entry=0x0) at nss_dns/dns-host.c:326
326 nss_dns/dns-host.c: No such file or directory.
(gdb) disassemble
Dump of assembler code for function _nss_dns_gethostbyname4_r:
0x76e1e268 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x76e1e26c <+4>: add r11, sp, #32
0x76e1e270 <+8>: ldr r4, [pc, #812] ; 0x76e1e5a4 <_nss_dns_gethostbyname4_r+828>
0x76e1e274 <+12>: sub sp, sp, #76 ; 0x4c
Again, 9 registers get pushed, but then the frame is only extended by 76 bytes. So something different is going on.
Search for the function name in the Woboq search box. Even though the line numbers don't match exactly due to version skew, they're close, and the right file names are showing up, matching what gdb reports, helping to confirm that we're looking at the right code.
Examining that code, the function doesn't have one of those big scratch_buffer structures, but another suspicious-looking line is this one, allocating 2048 bytes, close to the reported frame size when you add the 36 bytes for register saves (9 registers at 4 bytes each) and the 76 bytes of frame extension:
364 host_buffer.buf = orig_host_buffer = (querybuf *) alloca (2048);
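That looks similar to malloc(), right? But malloc() does heap allocation, not stack. It turns out alloca() is a stack allocator: it carves the space out of the calling function's stack frame, and that space is released automatically when the function returns.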
A little further down the disassembly, we see this manipulation of the SP, so that must be the implementation of alloca():
0x76e1e2bc <+84>: sub r3, sp, #2048 ; 0x800
0x76e1e2c0 <+88>: ldr r1, [pc, #736] ; 0x76e1e5a8 <_nss_dns_gethostbyname4_r+832>
0x76e1e2c4 <+92>: ldr r4, [pc, #736] ; 0x76e1e5ac <_nss_dns_gethostbyname4_r+836>
0x76e1e2c8 <+96>: sub sp, r3, #8
For frame 12, __GI___libc_res_nsearch():
(gdb) f 12
#12 0x76dfe764 in __GI___libc_res_nsearch (statp=0x76fa1b50 <_res>, name=0x10550 "test.example.com", class=0, class@entry=1, type=2130700460, type@entry=439963904, answer=0x7effe098 "<\202\201\203", anslen=anslen@entry=2048, answerp=answerp@entry=0x7effe8a4, answerp2=answerp2@entry=0x7effe8a8, nanswerp2=nanswerp2@entry=0x7effe8ac, resplen2=resplen2@entry=0x7effe8b0, answerp2_malloced=answerp2_malloced@entry=0x7effe8b4) at res_query.c:376
376 res_query.c: No such file or directory.
(gdb) disassemble
Dump of assembler code for function __GI___libc_res_nsearch:
0x76dfe640 <+0>: push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x76dfe644 <+4>: sub sp, sp, #1120 ; 0x460
That looks like a familiar stack frame expansion of 1120 bytes. But searching Woboq for "__GI___libc_res_nsearch" doesn't find a match. Try eliminating some of that stuff that looks like library prefix from the name and search for just "res_nsearch". That gets us to several results, one of which is res_query.c, the file gdb listed as containing the function. Frame 11 is __libc_res_nquerydomain() in the same source file. While it's not quite clear looking at the source file what's going on with frame 12, we can see there's a function res_nquerydomain() in it.
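Why the naming confusion? It comes from glibc's internal symbol conventions. As far as I can tell, the __GI_ prefix marks a hidden internal alias that glibc generates so that calls made from inside the library bind directly to its own copy of a function, and the __libc_ prefix marks names private to the library. The source code is written using the plain names.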
Looking around at frame 11 and 10, we seem to be in the right place, but the function that frame 10 calls, __libc_res_nsend(), doesn't show up as a call in the source file, even when stripping the name down. So we seem to be getting off track. Perhaps it's the library version difference catching up with us.
But looking around at some of the other functions in the file, we can see that in this version of the file, res_nquerydomain() calls context_querydomain_common(), which calls __res_context_querydomain(), which has this local variable, another "buffer":
568 char nbuf[MAXDNAME];
How big is MAXDNAME? Clicking on it shows this:
79 #define MAXDNAME NS_MAXDNAME
Searching for NS_MAXDNAME reveals this:
59 #define NS_MAXDNAME 1025 /*%< maximum domain name */
That's another BAM! These two symbols show up as local buffer sizes in several functions. So that's probably what's going on in our version of the library, another large local buffer, sized for a large worst-case maximum domain name string.
Move on to frame 7, __GI___res_queriesmatch() in file res_send.c. Entering that file name in the Woboq search box gets us to it. Searching for the trailing part of the function name, we find res_queriesmatch(). Scrolling through it, this jumps right out:
377 char tname[MAXDNAME+1];
The exact same story for frame 6, __GI___res_nameinquery(). So these giant buffers are getting allocated repeatedly all down the call stack. Frame 5, __GI_ns_samename() in ns_samedomain.c, has this line:
191 char ta[NS_MAXDNAME], tb[NS_MAXDNAME];
GAAAAH, two of them! Ironically, the string we're using is just "test.example.com". The buffers need to be sized for the maximum possible name, but assuming that at every level is pretty wasteful.
We have our answer about how an extra 11.4KB of stack gets sucked up translating a domain name to an IP address. For a general-purpose OS like Linux, that's not really a big deal. But that would never fly on a small embedded system.
That's why you see custom libraries for embedded systems, streamlined TCP/IP stacks and such. Among other things, they would probably constrain domain names to much shorter strings. This illustrates one of the differences between general-purpose coding, such as for desktops and servers, and embedded systems.
Now we have another tangent. The source for ns_samename() shows it calling strcasecmp(). But the gdb backtrace shows it calling _dl_runtime_resolve() in file sysdeps/arm/dl-trampoline.S. Interesting, an assembly language file with a strange name!
Woboq search reveals a number of architecture-specific versions of dl-trampoline.S. We want the ARM version. It starts with this line:
1 /* PLT trampolines. ARM version.
What the heck is a PLT trampoline? It's the code for the procedure linkage table that triggers dynamic library loading and then jumps to the function in the library, as described in PLT and GOT - the key to code sharing and dynamic libraries.
Why the term "trampoline"? I guess because the first caller to the function runs into the stub, which causes library loading and fixup operations, then bounces into the actual function. Subsequent callers will just jump to the function, skipping the trampoline.
Sounds like something else fun to learn about!