Profiling C++ applications --- perf, hotspot, gprof2dot, gprof

 

Introduction

To build good software, whether you care about throughput, resource usage, or latency, you first need to know how to measure it. Many free tools are available for profiling and benchmarking your application.

In this post, I would like to show how to use perf and hotspot on Linux to profile a C++ application and visualize the results. The combinations covered are:

hotspot for linux perf
gprof2dot for linux perf
gprof2dot for gprof

Terminology

  • instrumentation profiling
    • requires inserting code hooks that explicitly record metrics
  • sampling profiling
    • periodically interrupts the program and records where it is executing
  • profiling
    • visualizes your program's behavior
      • function call stack
      • function execution time
  • benchmark
    • times your program
    • you use this to understand how long your program needs to run a task

Tools

╔════════════╦══════════════════════════╦════════════════════════════════════════════════════════════════════════════╗
║  Tool name ║ Type                     ║ Comment                                                                    ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║    gprof   ║ sampling profiler        ║ misses key events, so it is not what you want for micro-optimization       ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║    perf    ║ sampling profiler        ║ good; can count cache misses and branch misses                             ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║    VTune   ║ sampling profiler        ║ created by Intel                                                           ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║   DTrace   ║ profiler                 ║                                                                            ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║ valgrind   ║ instrumentation profiler ║ slow                                                                       ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║ callgrind  ║ instrumentation profiler ║ it is too intrusive, does not catch I/O slowness/jitter                    ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║ Optick     ║ instrumentation profiler ║ good for game application profiling                                        ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║ gperftools ║ CPU and heap profiler    ║ it is not representative of a realistic environment                        ║
╠════════════╬══════════════════════════╬════════════════════════════════════════════════════════════════════════════╣
║ hotspot    ║ profiler reader          ║ it can read perf record output file                                        ║
╚════════════╩══════════════════════════╩════════════════════════════════════════════════════════════════════════════╝

perf, hotspot [4], [5]

Installation

# install from the OS package repository
$ sudo apt install linux-tools-$(uname -r) linux-tools-generic

# alternatively, build perf from source
$ sudo apt install flex bison libelf-dev libunwind-dev libaudit-dev libslang2-dev libdw-dev
$ git clone https://github.com/torvalds/linux --depth=1
$ cd linux/tools/perf/
$ make
$ make install
$ sudo cp perf /usr/bin
$ perf

# perf needs sufficient permissions; relax the kernel restrictions as root
$ sudo su # As Root
$ sysctl -w kernel.perf_event_paranoid=-1
$ echo 0 > /proc/sys/kernel/kptr_restrict
$ exit

Usage --- perf

# build your app with debug symbols
$ g++ your_app.cpp -g3 -o your_app

# record, report and annotate
$ perf record ./your_app
$ ls
your_app your_app.cpp perf.data

$ perf report
$ perf annotate

# stat
$ perf stat ./your_app

            861.62 msec task-clock                #    0.999 CPUs utilized          
                67      context-switches          #   77.761 /sec                   
                 0      cpu-migrations            #    0.000 /sec                   
               139      page-faults               #  161.325 /sec                   
     4,393,074,476      cycles                    #    5.099 GHz                    
    14,250,101,192      instructions              #    3.24  insn per cycle         
     2,259,534,632      branches                  #    2.622 G/sec                  
        15,829,588      branch-misses             #    0.70% of all branches        

       0.862198714 seconds time elapsed

       0.857723000 seconds user
       0.004026000 seconds sys
       

Usage --- hotspot

# install hotspot
$ sudo apt-get install hotspot

# usage --- generate perf output
$ perf record --call-graph dwarf <your application>

$ hotspot ./perf.data

gprof [1]

Installation and usage --- gprof

# install gprof on Ubuntu (it is part of binutils)
$ sudo apt-get install binutils

# build your app
$ g++ your_app.cpp -pg -o your_app

# run app
$ ./your_app

# view result
$ ls -hal
gmon.out your_app.cpp your_app
$ gprof ./your_app | grep -v std | grep -v static | grep -v cxx

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00    0.01                 main [1]
                0.00    0.01       1/1           Demo_Word_Ladder() [3]
                0.00    0.00       1/1           CmdOpts<main::Opts>::parse(int, char const* const*) [444]
                0.00    0.00       1/2           main::Opts::~Opts() [437]
-----------------------------------------------
-----------------------------------------------
                0.00    0.01       1/1           main [1]
[3]    100.0    0.00    0.01       1         Demo_Word_Ladder() [3]
                0.00    0.01       1/1           Word_Ladder_Solution::Run_Test(int, int) [4]
                0.00    0.00       1/1           Word_Ladder_Solution::Word_Ladder_Solution() [440]
                0.00    0.00       1/1           Word_Ladder_Solution::~Word_Ladder_Solution() [441]
-----------------------------------------------
                0.00    0.01       1/1           Demo_Word_Ladder() [3]
[4]    100.0    0.00    0.01       1         Word_Ladder_Solution::Run_Test(int, int) [4]
-----------------------------------------------
                0.00    0.01     200/200         Word_Ladder_Solution::Run_Test(int, int) [4]
                

gprof2dot [2], [3]

Installation and usage --- gprof2dot, gprof, perf

# installation (the dot command comes from the graphviz package)
$ pip install gprof2dot
$ sudo apt install graphviz

# view image --- perf
$ perf script | c++filt | gprof2dot -w -f perf | dot -Tpng -o output.png

# view image --- gprof
$ gprof ./your_app | gprof2dot -w | dot -Tpng -o output_gprof.png

Reference

[1] D. (2020, October 7). Profiling with gprof. YouTube. https://www.youtube.com/watch?v=re79V7hNiBY

[2] J. (n.d.). GitHub - jrfonseca/gprof2dot: Converts profiling output to a dot graph. GitHub. https://github.com/jrfonseca/gprof2dot

[3] Hide long description of function while profiling with gprof2dot. (n.d.). Stack Overflow. https://stackoverflow.com/a/30457325/2358836

[4] Neutrino’s Blog: 在 Linux 上使用 Perf 做效能分析(入門篇) [Performance analysis with Perf on Linux (introduction)]. (n.d.). https://tigercosmos.xyz/post/2020/08/system/perf-basic/

[5] K. (n.d.). GitHub - KDAB/hotspot: The Linux perf GUI for performance analysis. GitHub. https://github.com/KDAB/hotspot#debian--ubuntu

 
