Golang性能分析之PProf

PProf是什么

pprof 是用于可视化和分析性能分析数据的工具
pprof 以 profile.proto 读取分析样本的集合，并生成报告以可视化并帮助分析数据（支持文本和图形报告）
profile.proto 是一个 Protocol Buffer v3 的描述文件，它描述了一组 callstack 和 symbolization 信息，作用是表示统计分析的一组采样的调用栈，是很常见的 stacktrace 配置文件格式

可以做什么

CPU Profiling：CPU 分析，按照一定的频率采集所监听的应用程序 CPU（含寄存器）的使用情况，可确定应用程序在主动消耗 CPU 周期时花费时间的位置
Memory Profiling：内存分析，在应用程序进行堆分配时记录堆栈跟踪，用于监视当前和历史内存使用情况，以及检查内存泄漏
Block Profiling：阻塞分析，记录 goroutine 阻塞等待同步（包括定时器通道）的位置
Mutex Profiling：互斥锁分析，报告互斥锁的竞争情况

如何安装使用

Golang自带PProf包

package main
import (
    "log"
    "net/http"
    _ "net/http/pprof"
)
func main() {
    go func() {
        for {
            log.Println(data.Add("https://github.com/Looper"))
        }
    }()
    http.ListenAndServe("0.0.0.0:8889", nil)
}
var datas []string
 
func Add(str string) string {
    data := []byte(str)
    sData := string(data)
    datas = append(datas, sData)
 
    return sData
}

gin框架 go get github.com/gin-contrib/pprof

package app
 
import (
   "edj-orderfee-calculator/common"
   "log"
   "time"
 
   "git.edaijia-inc.cn/edaijia/ego/elogger"
   eloggergin "git.edaijia-inc.cn/edaijia/ego/elogger/gin"
   "github.com/gin-contrib/pprof"
   "github.com/gin-gonic/gin"
)
 
func Run() {
   conf := common.InitConfig()
   elogger.InitELogger(conf.Log)
   r := gin.New()
   pprof.Register(r)//PProf入口
   r.Use(eloggergin.GinLoggerMiddleware())
   r.Use(eloggergin.GinRecoveryMiddleware())
   if conf.Env == "prod" {
      gin.SetMode(gin.ReleaseMode)
   }
   common.Init(conf, r)
   Router(r)
   r.Run(conf.HttpAddr)
}

使用模式

Report generation：报告生成
Interactive terminal use：交互式终端使用
Web interface：Web 界面

分析-CPU Profiling

先写一段CPU占用较高的代码

ch := make(chan int, 15)
// 读取chan
go func() {
   for {
      select {
      case i := <-ch:
         // 只读取15次chan
         fmt.Println(i)
      default:
         // 读取15次chan以后的操作一直在这个空语句无任何IO操作的default条件里死循环，无法出让P，以保证一个GPM关系。
         // 而如果无default条件的话，则系统当读取完15次chan后，当前goroutine会发生 chan IO 阻塞, Go调度器根据GPM的调度关系,会将当前执行关系中的G切换出去，再从LRQ队列中取一个新的G，重新组成一个GPM继续执行，以实现合理利用计算机资源，提高GO的高并发性能
      }
   }
}()
 
// 写入15个值到chan
for i := 0; i < 15; i++ {
   ch <- i
}

执行go tool pprof http://127.0.0.1:8889/debug/pprof/profile(默认进行 30s 的 CPU Profiling，得到一个分析用的 profile 文件)

Fetching profile over HTTP from http://127.0.0.1:8889/debug/pprof/profile
Saved profile in pprof/pprof.samples.cpu.018.pb.gz
Type: cpu
Time: Mar 21, 2022 at 3:42pm (CST)
Duration: 30.13s, Total samples = 25.73s (85.40%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 25.55s, 99.30% of 25.73s total
Dropped 19 nodes (cum <= 0.13s)
      flat  flat%   sum%        cum   cum%
     9.93s 38.59% 38.59%     21.12s 82.08%  runtime.selectnbrecv
     8.60s 33.42% 72.02%     11.19s 43.49%  runtime.chanrecv
     4.42s 17.18% 89.20%     25.56s 99.34%  edj-orderfee-calculator/app.CpuDemo.func1
     2.59s 10.07% 99.26%      2.59s 10.07%  runtime.empty (inline)
     0.01s 0.039% 99.30%      0.15s  0.58%  runtime.findrunnable
         0     0% 99.30%      0.16s  0.62%  runtime.mcall
         0     0% 99.30%      0.16s  0.62%  runtime.park_m
         0     0% 99.30%      0.17s  0.66%  runtime.schedule
(pprof)

flat：函数在 CPU 上运行的时间 flat%：函数在CPU上运行时间的百分比 sum%：是从上到当前行所有函数累加使用 CPU 的比例 cum：这个函数以及子函数运行所占用的时间，应该大于等于flat cum%：这个函数以及子函数运行所占用的比例，应该大于等于flat% 最后一列：函数的名字

通过 list 函数名查看具体的代码

(pprof)  list edj-orderfee-calculator/app.CpuDemo.func1
Total: 25.73s
ROUTINE ======================== edj-orderfee-calculator/app.CpuDemo.func1 in edj-orderfee-calculator/app/app.go
     4.42s     25.56s (flat, cum) 99.34% of Total
         .          .     43:   ch := make(chan int, 15)
         .          .     44:   // 读取chan
         .          .     45:   go func() {
         .          .     46:           for {
         .          .     47:                   select {
     4.42s     25.56s     48:                   case i := <-ch:
         .          .     49:                           // 只读取15次chan
         .          .     50:                           fmt.Println(i)
         .          .     51:                   default:
         .          .     52:                           // 读取15次chan以后的操作一直在这个空语句无任何IO操作的default条件里死循环，无法出让P，以保证一个GPM关系。
         .          .     53:                           // 而如果无default条件的话，则系统当读取完15次chan后，当前goroutine会发生 chan IO 阻塞, Go调度器根据GPM的调度关系,会将当前执行关系中的G切换出去，再从LRQ队列中取一个新的G，重新组成一个GPM继续执行，以实现合理利用计算机资源，提高GO的高并发性能
(pprof)

图像生成

Tips：此功能需要安装 graphviz

Type: cpu
Time: Mar 8, 2022 at 7:07pm (CST)
Duration: 30.17s, Total samples = 23.10s (76.56%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web

其他 Profiling

go tool pprof http://127.0.0.1:8889/debug/pprof/allocs : 分析内存分配
go tool pprof http://127.0.0.1:8889/debug/pprof/block : 分析堆栈跟踪导致阻塞的同步原语
go tool pprof http://127.0.0.1:8889/debug/pprof/cmdline : 分析命令行调用的程序，web下调用报错
go tool pprof http://127.0.0.1:8889/debug/pprof/goroutine : 分析当前 goroutine 的堆栈信息
go tool pprof http://127.0.0.1:8889/debug/pprof/mutex : 分析堆栈跟踪竞争状态互斥锁的持有者
go tool pprof http://127.0.0.1:8889/debug/pprof/threadcreate : 分析堆栈跟踪系统新线程的创建
go tool pprof http://127.0.0.1:8889/debug/pprof/trace : 分析追踪当前程序的执行状况
web 页面 http://127.0.0.1:8889/debug/pprof/

goroutine泄露排查

如果启动了1个goroutine，但并没有符合预期的退出，直到程序结束，此goroutine才退出，这种情况就是goroutine泄露。每个goroutine占用2KB内存，泄露1百万goroutine至少泄露2KB * 1000000 = 2GB内存，为什么说至少呢？goroutine执行过程中还存在一些变量，如果这些变量指向堆内存中的内存，GC会认为这些内存仍在使用，不会对其进行回收，这些内存谁都无法使用，造成了内存泄露。所以goroutine泄露有2种方式造成内存泄露：

goroutine本身的栈所占用的空间造成内存泄露。
goroutine中的变量所占用的堆内存导致堆内存泄露，这一部分是能通过heap profile体现出来的

怎么确定是goroutine泄露引发的内存泄露 判断依据：在节点正常运行的情况下，隔一段时间获取goroutine的数量，如果后面获取的那次，某些goroutine比前一次多，如果多获取几次，是持续增长的，就极有可能是goroutine泄露。

goroutine泄露代码

outCh := make(chan int)
go func() {
   if false {
       <-outCh
   }
   select {}
}()
 
// 每s起100个goroutine，goroutine会阻塞，不释放内存
tick := time.Tick(time.Second / 100)
i := 0
for range tick {
   i++
   fmt.Println(i)
   alloc1(outCh)
}
func alloc1(outCh chan<- int) {
    go alloc2(outCh)
}
 
func alloc2(outCh chan<- int) {
    func() {
        defer fmt.Println("alloc-fm exit")
        // 分配内存，假用一下
        buf := make([]byte, 1024*1024*10)
        _ = len(buf)
        fmt.Println("alloc done")
 
        outCh <- 0 // 53行
    }()
}

执行go tool pprof http://127.0.0.1:8889/debug/pprof/goroutine

Fetching profile over HTTP from http://127.0.0.1:8889/debug/pprof/goroutine
Saved profile in pprof/pprof.goroutine.004.pb.gz
Type: goroutine
Time: Mar 18, 2022 at 11:50am (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1388, 99.93% of 1389 total
Dropped 42 nodes (cum <= 6)
      flat  flat%   sum%        cum   cum%
      1388 99.93% 99.93%       1388 99.93%  runtime.gopark
         0     0% 99.93%       1380 99.35%  edj-orderfee-calculator/app.alloc2
         0     0% 99.93%       1380 99.35%  edj-orderfee-calculator/app.alloc2.func1
         0     0% 99.93%       1380 99.35%  runtime.chansend
         0     0% 99.93%       1380 99.35%  runtime.chansend1
(pprof)
 
 
Fetching profile over HTTP from http://127.0.0.1:8889/debug/pprof/goroutine
Saved profile in pprof/pprof.goroutine.005.pb.gz
Type: goroutine
Time: Mar 18, 2022 at 11:50am (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2751, 100% of 2752 total
Dropped 42 nodes (cum <= 13)
      flat  flat%   sum%        cum   cum%
      2751   100%   100%       2751   100%  runtime.gopark
         0     0%   100%       2743 99.67%  edj-orderfee-calculator/app.alloc2
         0     0%   100%       2743 99.67%  edj-orderfee-calculator/app.alloc2.func1
         0     0%   100%       2743 99.67%  runtime.chansend
         0     0%   100%       2743 99.67%  runtime.chansend1
(pprof)

go tool pprof -base pprof/pprof.goroutine.004.pb.gz pprof/pprof.goroutine.005.pb.gz 我们发现以004为基准005文件新增了goroutine1363个，可以判定为goroutine泄露

Type: goroutine
Time: Mar 18, 2022 at 11:50am (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1363, 100% of 1363 total
      flat  flat%   sum%        cum   cum%
      1363   100%   100%       1363   100%  runtime.gopark
         0     0%   100%       1363   100%  edj-orderfee-calculator/app.alloc2
         0     0%   100%       1363   100%  edj-orderfee-calculator/app.alloc2.func1
         0     0%   100%       1363   100%  runtime.chansend
         0     0%   100%       1363   100%  runtime.chansend1
(pprof)

再通过005文件，go tool pprof pprof/pprof.goroutine.005.pb.gz 看一眼执行到gopark的goroutine数量，即挂起的goroutine数量：2751

Type: goroutine
Time: Mar 18, 2022 at 11:50am (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 2751, 100% of 2752 total
Dropped 42 nodes (cum <= 13)
      flat  flat%   sum%        cum   cum%
      2751   100%   100%       2751   100%  runtime.gopark
         0     0%   100%       2743 99.67%  edj-orderfee-calculator/app.alloc2
         0     0%   100%       2743 99.67%  edj-orderfee-calculator/app.alloc2.func1
         0     0%   100%       2743 99.67%  runtime.chansend
         0     0%   100%       2743 99.67%  runtime.chansend1
(pprof)

Web可视化查看(以goroutine为例)

访问 http://127.0.0.1:8889/debug/pprof/goroutine?debug=1 阻塞的原因是outCh这个写操作无法完成，outCh是无缓冲的通道，并且由于以下代码是死代码，所以goroutine始终没有从outCh读数据，造成outCh阻塞，进而造成无数个alloc2的goroutine阻塞，形成内存泄露

goroutine泄露的场景总结

channel的读或者写：
1. 无缓冲channel的阻塞通常是写操作因为没有读而阻塞
2. 有缓冲的channel因为缓冲区满了，写操作阻塞
3. 期待从channel读数据，结果没有goroutine写
select操作，select里也是channel操作，如果所有case上的操作阻塞，goroutine也无法继续执行

文章目录