This article introduces core dumps and closes with a concrete example that shows the basics of analyzing a core dump with GDB.

1. Basic concepts of core dumps

A core dump is the recorded state of the working memory of a computer program at a specific time, generally when the program has terminated abnormally (crashed). In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information.

2. Enabling core dumps

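acrn@acrn:~$ help ulimit
Options:
-c the maximum size of core files created

The -c option of ulimit sets the maximum size of the core files a process may create. When the limit is 0, no core file is written. To remove the limit for the current shell session, run:

ulimit -c unlimited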
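
The ulimit setting applies only to the current shell and the processes started from it. A program can make the equivalent adjustment for itself through the POSIX getrlimit/setrlimit calls. The following is a minimal sketch, assuming a Linux/POSIX system; the function name enable_core_dumps is hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_CORE limit to the hard limit for this process.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* An unprivileged process cannot raise the soft limit above the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

Calling enable_core_dumps() early in main has the same effect as ulimit -c unlimited only when the hard limit is itself unlimited; otherwise the core size stays capped at the hard limit.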

3. Setting the storage directory and naming format of core files

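The storage directory and the naming format of core files are both set by writing a template to /proc/sys/kernel/core_pattern, which requires root privileges.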

# 1. By default the core file is created in the process's current working directory; this pattern names it core-<program name>-<pid>-<timestamp>, e.g. core-test-3451-1516257740
echo "core-%e-%p-%t" > /proc/sys/kernel/core_pattern

# 2. Prefix a path to collect all core files in one directory, here /root/core-file
echo "/root/core-file/core-%e-%p-%t" > /proc/sys/kernel/core_pattern

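Why set a storage directory for core files? If the program calls chdir, its current working directory may have changed, and the core file is then created in the directory that chdir switched to.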

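Why set a naming format for core files? With the default pattern, the kernel writes the core file into the process's current working directory under the fixed name core. If several programs dump core there, or the same program crashes repeatedly, each new dump overwrites the previous one, so core files from different programs and different crashes need distinct names.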
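
To make the naming format concrete, the sketch below expands the three specifiers used in the patterns above (%e for the program name, %p for the PID, %t for the dump time in seconds since the epoch). It only imitates the substitution for illustration and is not the kernel's implementation; the function name expand_core_pattern and its parameters are hypothetical, and the kernel supports more specifiers than are handled here (see core(5)).

```c
#include <stdio.h>
#include <string.h>

/* Expand the %e, %p, %t and %% specifiers of a core_pattern-style template.
 * Illustrative only: other specifiers are simply dropped.
 * Returns 0 on success, -1 if the result does not fit in out. */
int expand_core_pattern(const char *pattern, const char *exe, long pid,
                        long timestamp, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (const char *c = pattern; *c != '\0'; c++) {
        char piece[64];

        if (*c == '%') {
            if (c[1] == '\0')
                break;              /* a single trailing % is dropped */
            c++;
            switch (*c) {
            case 'e': snprintf(piece, sizeof piece, "%s", exe); break;
            case 'p': snprintf(piece, sizeof piece, "%ld", pid); break;
            case 't': snprintf(piece, sizeof piece, "%ld", timestamp); break;
            case '%': snprintf(piece, sizeof piece, "%%"); break;
            default:  piece[0] = '\0'; break;  /* not handled in this sketch */
            }
        } else {
            piece[0] = *c;
            piece[1] = '\0';
        }
        size_t n = strlen(piece);
        if (used + n + 1 > outsz)
            return -1;              /* output buffer too small */
        memcpy(out + used, piece, n + 1);
        used += n;
    }
    return 0;
}
```

With the pattern "core-%e-%p-%t", the program name "test", PID 3451 and timestamp 1516257740, the function produces core-test-3451-1516257740, the file name used as the example above.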

4. How to tell whether a file is a core dump

4.1 Method 1

On Linux, a core dump file is itself in ELF format, so the readelf command can identify it.

acrn@acrn:~/test$ readelf -h core
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: CORE (Core file)

The Type field of the ELF header reads CORE (Core file).
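
The same check can be done programmatically by reading the ELF header and comparing its e_type field with ET_CORE. The following is a minimal sketch, assuming Linux with glibc's <elf.h> and a core file produced on a machine with the same byte order as the one running the check; the function names is_core_header and is_core_file are hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the buffer starts with an ELF header whose type is ET_CORE,
 * 0 otherwise. No byte swapping is done, so the file is assumed to use
 * the byte order of the machine that runs this code. */
int is_core_header(const unsigned char *buf, size_t len)
{
    Elf32_Ehdr hdr;

    /* e_ident and e_type sit at the same offsets in Elf32_Ehdr and
     * Elf64_Ehdr, so the smaller structure is enough for this check. */
    if (len < sizeof hdr)
        return 0;
    memcpy(&hdr, buf, sizeof hdr);
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;
    return hdr.e_type == ET_CORE;
}

/* Read the first bytes of path and apply is_core_header to them.
 * Returns 0 if the file cannot be opened. */
int is_core_file(const char *path)
{
    unsigned char buf[sizeof(Elf64_Ehdr)];
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return is_core_header(buf, n);
}
```

A call such as is_core_file("core") tests the same Type field that readelf -h prints.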

4.2 Method 2

For a quick check, use the file command:

acrn@acrn:~/test$ file core
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from './a.out'

The phrase to look for in this output is "core file".

5. Example

// FILE:test.c
#include<stdlib.h>

void repeatFree(char *p)
{
    if(NULL != p)
    {
        free(p);
    }
}

int main()
{
    char* pstr =(char*) malloc(10);

    repeatFree(pstr); // first free

    repeatFree(pstr); // second free of the same block (the deliberate bug)

    return 0;
}

Compile the program with debug information (-g) and run it:

acrn@acrn:~/test$ gcc -g test.c -o test
acrn@acrn:~/test$ ./test
*** Error in `./test': double free or corruption (fasttop): 0x000000000164c010 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fb1be3fb7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fb1be40437a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fb1be40853c]
./test[0x400585]
./test[0x4005b6]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fb1be3a4830]
./test[0x400499]
======= Memory map: ========
00400000-00401000 r-xp 00000000 103:02 3938165 /home/acrn/test/test
00600000-00601000 r--p 00000000 103:02 3938165 /home/acrn/test/test
00601000-00602000 rw-p 00001000 103:02 3938165 /home/acrn/test/test
0164c000-0166d000 rw-p 00000000 00:00 0 [heap]
7fb1b8000000-7fb1b8021000 rw-p 00000000 00:00 0
7fb1b8021000-7fb1bc000000 ---p 00000000 00:00 0
7fb1be16e000-7fb1be184000 r-xp 00000000 103:02 8917652 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fb1be184000-7fb1be383000 ---p 00016000 103:02 8917652 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fb1be383000-7fb1be384000 rw-p 00015000 103:02 8917652 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fb1be384000-7fb1be544000 r-xp 00000000 103:02 8913417 /lib/x86_64-linux-gnu/libc-2.23.so
7fb1be544000-7fb1be744000 ---p 001c0000 103:02 8913417 /lib/x86_64-linux-gnu/libc-2.23.so
7fb1be744000-7fb1be748000 r--p 001c0000 103:02 8913417 /lib/x86_64-linux-gnu/libc-2.23.so
7fb1be748000-7fb1be74a000 rw-p 001c4000 103:02 8913417 /lib/x86_64-linux-gnu/libc-2.23.so
7fb1be74a000-7fb1be74e000 rw-p 00000000 00:00 0
7fb1be74e000-7fb1be774000 r-xp 00000000 103:02 8913403 /lib/x86_64-linux-gnu/ld-2.23.so
7fb1be955000-7fb1be958000 rw-p 00000000 00:00 0
7fb1be972000-7fb1be973000 rw-p 00000000 00:00 0
7fb1be973000-7fb1be974000 r--p 00025000 103:02 8913403 /lib/x86_64-linux-gnu/ld-2.23.so
7fb1be974000-7fb1be975000 rw-p 00026000 103:02 8913403 /lib/x86_64-linux-gnu/ld-2.23.so
7fb1be975000-7fb1be976000 rw-p 00000000 00:00 0
7fff6edaa000-7fff6edcb000 rw-p 00000000 00:00 0 [stack]
7fff6edf8000-7fff6edfb000 r--p 00000000 00:00 0 [vvar]
7fff6edfb000-7fff6edfd000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted (core dumped)
acrn@acrn:~/test$ ls
core-test-14363-1618606735 test test.c

Debug with gdb to find where the program failed. The invocation is: gdb <program> <core file>

acrn@acrn:~/test$ gdb test core-test-14363-1618606735
Reading symbols from test...done.
[New LWP 14363]
Core was generated by `./test'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fb1be3b9428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fb1be3b9428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fb1be3bb02a in __GI_abort () at abort.c:89
#2 0x00007fb1be3fb7ea in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fb1be514ed8 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007fb1be40437a in malloc_printerr (ar_ptr=<optimized out>, ptr=<optimized out>, str=0x7fb1be514fa0 "double free or corruption (fasttop)", action=3) at malloc.c:5006
#4 _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3867
#5 0x00007fb1be40853c in __GI___libc_free (mem=<optimized out>) at malloc.c:2968
#6 0x0000000000400585 in repeatFree (p=0x164c010 "") at test.c:8
#7 0x00000000004005b6 in main () at test.c:18

Inside gdb, where or bt prints the backtrace. The two frames that belong to our own program are:

#6  0x0000000000400585 in repeatFree (p=0x164c010 "") at test.c:8
#7 0x00000000004005b6 in main () at test.c:18

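The fault is at line 8 of test.c, in repeatFree: the call from line 18 of main frees a block that was already released, so the same memory is freed twice.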
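
The NULL check in repeatFree does not prevent the crash, because free receives a copy of the pointer and the caller's pstr keeps its old value after the first call. One common way to avoid this class of bug is to pass the address of the pointer and reset it after freeing. The following is a minimal sketch of that idea, not a change taken from the original example; the name freeAndClear is hypothetical.

```c
#include <stdlib.h>

/* Free *pp and reset it to NULL so that a repeated call becomes a no-op.
 * This only protects callers that go through the same pointer variable;
 * other copies of the pointer still dangle. */
void freeAndClear(char **pp)
{
    if (pp != NULL && *pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}
```

In main, replacing both repeatFree(pstr) calls with freeAndClear(&pstr) makes the second call harmless, since pstr is already NULL by then.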

Another detail worth examining is the line: Program terminated with signal SIGABRT, Aborted.

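For details on signals, see signal(7). SIGABRT is signal 6 (the sig=6 in frame #0); it is raised by abort(), and its default action is to terminate the process and dump core.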
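
The terminating signal is also visible to the parent process without gdb. The sketch below, assuming a POSIX system, runs a function in a child process and reports which signal ended it, using waitpid together with the WIFSIGNALED and WTERMSIG macros. WCOREDUMP is not part of POSIX, so it is guarded with #ifdef. The function name child_termination_signal is hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn in a child process and return the number of the signal that
 * terminated it, 0 if it was not killed by a signal, or -1 if fork or
 * waitpid failed. If dumped is non-NULL, it is set to 1 when the child
 * is reported to have dumped core (where WCOREDUMP is available). */
int child_termination_signal(void (*fn)(void), int *dumped)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);   /* reached only if fn returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (dumped != NULL)
        *dumped = 0;
    if (!WIFSIGNALED(status))
        return 0;
#ifdef WCOREDUMP
    if (dumped != NULL)
        *dumped = WCOREDUMP(status) ? 1 : 0;
#endif
    return WTERMSIG(status);
}
```

Passing abort as fn yields SIGABRT, the same signal gdb reports for the double free, because glibc ends the process through abort() when it detects the corruption. Whether a core file is actually written still depends on the core size limit and core_pattern.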

6. Further reading

For a more detailed treatment, see the introduction to core dumps at https://averageradical.github.io/Linux_Core_Dumps.pdf.


References:

  1. Configuring and using core dump generation on Linux (Linux 下生成 core dump 配置和用法)
  2. Core dumps explained in detail (详解coredump)
  3. signal(7)
  4. Wikipedia, Core dump