; /home/wcohen/present/202207youarehere/example.c:14
4010ae: e8 8d ff ff ff callq 0x401040
; /home/wcohen/present/202207youarehere/example.c:16
4010b3: 31 c0 xorl %eax, %eax
4010b5: 5b popq %rbx
4010b6: c3 retq
The first instruction, at 0x401060, maps to line 9 of the original source file example.c, the opening { of the main function.
The next instruction, at 0x401061, maps to line 364 of stdlib.h, the inlined atoi function. It sets up one of the arguments for the later strtol call.
The instruction at 0x401065 is also associated with the opening { of the main function.
Instructions 0x401068 and 0x40106d set up the remaining arguments for the strtol call at 0x40106f. Because the compiler has reordered these instructions, stepping through them in the debugger bounces between line 9 of example.c and line 364 of the stdlib.h include file.
You can also see instructions for lines 12, 13, and 14 of example.c intermixed in the llvm-objdump output above. To hide the latency of the divide, the compiler has moved the divide instruction for line 13 (0x401090) ahead of some of the instructions for line 12. As you step through this code in the debugger, it jumps back and forth between lines rather than executing all the instructions for one line before moving on to the next. Also notice that line 13, with the divide operation, is never shown while stepping, yet the divide definitely occurred, because its result appears in the output. This GDB session shows the debugger bouncing between lines while stepping through the program's main function:
(gdb) run 1 2
Starting program: /home/wcohen/present/202207youarehere/example 1 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Breakpoint 1, main (argc=3, argv=0x7fffffffdbe8) at /usr/include/stdlib.h:364
364 return (int) strtol (__nptr, (char **) NULL, 10);
(gdb) print $pc
$10 = (void (*)()) 0x401060
(gdb) next
10 a = atoi(argv[1]);
(gdb) print $pc
$11 = (void (*)()) 0x401061
(gdb) next
11 b = atof(argv[2]);
(gdb) print $pc
$12 = (void (*)()) 0x401074
(gdb) next
10 a = atoi(argv[1]);
(gdb) print $pc
$13 = (void (*)()) 0x40107a
(gdb) next
11 b = atof(argv[2]);
(gdb) print $pc
$14 = (void (*)()) 0x401080
(gdb) next
12 a = a + 1;
(gdb) print $pc
$15 = (void (*)()) 0x401085
(gdb) next
14 printf ("a = %d, b = %f\n", a, b);
(gdb) print $pc
$16 = (void (*)()) 0x4010ae
(gdb) next
a = 2, b = 0.047619
15 return 0;
(gdb) print $pc
$17 = (void (*)()) 0x4010b3
With this simple example, you can see that the order of instructions does not match the original source code. When the program runs normally, you would never observe these changes, but they are quite visible when you step through the code with a debugger. The boundaries between lines of code become blurred. This has other implications: if you set a breakpoint on the line following a variable update, the compiler's instruction scheduler may have moved the update to a point after that breakpoint, so the variable does not yet hold the expected value when the breakpoint is hit.
Which of the instructions for a line gets the breakpoint?
With the previous example.c, the compiler generated multiple instructions to implement individual lines of code. How does the debugger know which of those instructions it should place the breakpoint on? The line information includes an additional statement flag that marks the recommended locations for breakpoints. You can see those instructions marked with S in the column below SBPE in the output of eu-readelf --debug-dump=decodedline example:
DWARF section [31] '.debug_line' at offset 0x50fd:
CU [c] example.c
line:col SBPE* disc isa op address (Statement Block Prologue Epilogue *End)
/home/wcohen/present/202207youarehere/example.c (mtime: 0, length: 0)
9:1 S 0 0 0 0x0000000000401060
10:2 S 0 0 0 0x0000000000401060
/usr/include/stdlib.h (mtime: 0, length: 0)
362:1 S 0 0 0 0x0000000000401060
364:3 S 0 0 0 0x0000000000401060
/home/wcohen/present/202207youarehere/example.c (mtime: 0, length: 0)
9:1 0 0 0 0x0000000000401060
/usr/include/stdlib.h (mtime: 0, length: 0)
364:16 0 0 0 0x0000000000401061
364:16 0 0 0 0x0000000000401065
/home/wcohen/present/202207youarehere/example.c (mtime: 0, length: 0)
9:1 0 0 0 0x0000000000401065
/usr/include/stdlib.h (mtime: 0, length: 0)
364:16 0 0 0 0x0000000000401068
364:16 0 0 0 0x000000000040106f
364:16 0 0 0 0x0000000000401074
/usr/include/bits/stdlib-float.h (mtime: 0, length: 0)
27:10 0 0 0 0x0000000000401074
/usr/include/stdlib.h (mtime: 0, length: 0)
364:10 0 0 0 0x000000000040107a
/home/wcohen/present/202207youarehere/example.c (mtime: 0, length: 0)
11:2 S 0 0 0 0x0000000000401080
/usr/include/bits/stdlib-float.h (mtime: 0, length: 0)
25:1 S 0 0 0 0x0000000000401080
27:3 S 0 0 0 0x0000000000401080
27:10 0 0 0 0x0000000000401080
27:10 0 0 0 0x0000000000401085
/home/wcohen/present/202207youarehere/example.c (mtime: 0, length: 0)
12:2 S 0 0 0 0x0000000000401085
12:8 0 0 0 0x0000000000401085
14:2 0 0 0 0x000000000040108b
13:8 0 0 0 0x0000000000401090
13:4 0 0 0 0x0000000000401098
12:8 0 0 0 0x00000000004010a0
14:2 0 0 0 0x00000000004010a3
12:4 0 0 0 0x00000000004010a8
13:2 S 0 0 0 0x00000000004010ae
14:2 S 0 0 0 0x00000000004010ae
15:2 S 0 0 0 0x00000000004010b3
16:1 0 0 0 0x00000000004010b3
16:1 0 0 0 0x00000000004010b6
16:1 * 0 0 0 0x00000000004010b6
- Groups of instructions are delimited by the path to the source file for those instructions.
- The left column contains the line number and column that the instruction maps back to, followed by the flags.
- The three numbers after the flags are the disc (discriminator), isa, and op (operation index) columns from the header, and the hexadecimal number is the address of the instruction.
If you look carefully at the output, you see that some instructions map back to multiple lines in the code. For example, 0x0000000000401060 maps to both lines 9 and 10 of example.c. The same instruction also maps to lines 362 and 364 of /usr/include/stdlib.h. The mappings are not one-to-one: one line of source code may map to multiple instructions, and one instruction may map to multiple lines of code. When the debugger prints a single line for an instruction, it might not be the one that you expect.
Merging and eliminating lines
As you saw in the detailed line mapping information, mappings are not one-to-one. In some cases, the compiler can eliminate instructions because they have no effect on the final result of the program. The compiler may also merge instructions from separate lines through optimizations such as common subexpression elimination (CSE), and the line information may no longer record that the merged instruction could have come from more than one place in the code.
The following example was compiled on an x86_64 Fedora 36 machine using GCC 12.2.1. Depending on your environment, you may not get the same results, because different compiler versions may optimize the code differently.
Note the if-else statement in the code: both branches perform the same expensive divide, and the compiler factors out the divide operation.
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char* argv[])
{
	int a,b,c;
	a = atoi(argv[1]);
	b = atoi(argv[2]);
	if (b) {
		c = 100/a;
	} else {
		c = 100/a;
	}
	printf ("a = %d, b = %d, c = %d\n", a, b, c);
	return 0;
}
Looking at objdump -dl whichline, you see one divide operation in the binary:
/home/wcohen/present/202207youarehere/whichline.c:13
401085: b8 64 00 00 00 mov $0x64,%eax
40108a: f7 fb idiv %ebx
Line 13 is one of the lines with a divide, but you might suspect that other line numbers are also associated with those addresses. The output of eu-readelf --debug-dump=decodedline whichline shows every line number mapped to them, and line 11, where the other divide occurs, is not among them:
/usr/include/stdlib.h (mtime: 0, length: 0)
364:16 0 0 0 0x0000000000401082
364:16 0 0 0 0x0000000000401085
/home/wcohen/present/202207youarehere/whichline.c (mtime: 0, length: 0)
10:2 S 0 0 0 0x0000000000401085
13:3 S 0 0 0 0x0000000000401085
15:2 S 0 0 0 0x0000000000401085
13:5 0 0 0 0x0000000000401085
If the results are unused, the compiler may not generate any code at all for some lines.
Consider the following example, where the else clause computes c = 100 * a, but does not use it:
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char* argv[])
{
	int a,b,c;
	a = atoi(argv[1]);
	b = atoi(argv[2]);
	if (b) {
		c = 100/a;
		printf ("a = %d, b = %d, c = %d\n", a, b, c);
	} else {
		c = 100 * a;
		printf ("a = %d, b = %d\n", a, b);
	}
	return 0;
}