gprof
's -l option causes the program to perform
line-by-line profiling. In this mode, histogram
samples are assigned not to functions, but to individual
lines of source code. This only works with programs compiled with
older versions of the gcc
compiler. Newer versions of gcc
use a different program - gcov
- to display line-by-line
profiling information.
With the older versions of gcc
the program usually has to be
compiled with a -g option, in addition to -pg, in order
to generate debugging symbols for tracking source code lines.
Note, in much older versions of gcc
the program had to be
compiled with the -a command line option as well.
The flat profile is the most useful output table
in line-by-line mode.
The call graph isn't as useful as normal, since
the current version of gprof
does not propagate
call graph arcs from source code lines to the enclosing function.
The call graph does, however, show each line of code
that called each function, along with a count.
Here is a section of gprof
's output, without line-by-line profiling.
Note that ct_init
accounted for four histogram hits, and
13327 calls to init_block
.
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 30.77 0.13 0.04 6335 6.31 6.31 ct_init Call graph (explanation follows) granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds index % time self children called name 0.00 0.00 1/13496 name_too_long 0.00 0.00 40/13496 deflate 0.00 0.00 128/13496 deflate_fast 0.00 0.00 13327/13496 ct_init [7] 0.0 0.00 0.00 13496 init_block
Now let's look at some of gprof
's output from the same program run,
this time with line-by-line profiling enabled. Note that ct_init
's
four histogram hits are broken down into four lines of source code—one hit
occurred on each of lines 349, 351, 382 and 385. In the call graph,
note how
ct_init
's 13327 calls to init_block
are broken down
into one call from line 396, 3071 calls from line 384, 3730 calls
from line 385, and 6525 calls from 387.
Flat profile: Each sample counts as 0.01 seconds. % cumulative self time seconds seconds calls name 7.69 0.10 0.01 ct_init (trees.c:349) 7.69 0.11 0.01 ct_init (trees.c:351) 7.69 0.12 0.01 ct_init (trees.c:382) 7.69 0.13 0.01 ct_init (trees.c:385) Call graph (explanation follows) granularity: each sample hit covers 4 byte(s) for 7.69% of 0.13 seconds % time self children called name 0.00 0.00 1/13496 name_too_long (gzip.c:1440) 0.00 0.00 1/13496 deflate (deflate.c:763) 0.00 0.00 1/13496 ct_init (trees.c:396) 0.00 0.00 2/13496 deflate (deflate.c:727) 0.00 0.00 4/13496 deflate (deflate.c:686) 0.00 0.00 5/13496 deflate (deflate.c:675) 0.00 0.00 12/13496 deflate (deflate.c:679) 0.00 0.00 16/13496 deflate (deflate.c:730) 0.00 0.00 128/13496 deflate_fast (deflate.c:654) 0.00 0.00 3071/13496 ct_init (trees.c:384) 0.00 0.00 3730/13496 ct_init (trees.c:385) 0.00 0.00 6525/13496 ct_init (trees.c:387) [6] 0.0 0.00 0.00 13496 init_block (trees.c:408)