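How does bash on Linux suggest an install command when a command is not found?
A hands-on example of using system tools to understand how a program works
I have covered several tools in earlier posts.
This post uses a simple example to show how to use those tools to quickly work out how a program operates. This is one of the key skills in debugging.
The goal here is not to explain the answer to the question, but to show how to use the tools to find answers efficiently, whatever situation you run into.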
The problem
In bash on Ubuntu, if you run a command that is not installed, a hint tells you how to install it:
$ apt-rdepends
The program 'apt-rdepends' is currently not installed. You can install it by typing:
sudo apt-get install apt-rdepends
But if you run the command directly through bash, the hint does not appear:
$ bash -c apt-rdepends
bash: apt-rdepends: command not found
How does this work behind the scenes?
Step 1: Look for clues with strace
First, find the PID of the current bash:
$ echo $$
30038
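Open another terminal and trace that process with strace, then go back to the original terminal, run apt-rdepends, and look at the strace output.
At first I only used strace -e open to see whether any unusual files were being read, but that gave no clue. Next I looked at all system calls and still found nothing obvious, but I did see a call to clone(), which means a child process is doing some of the work. So the next step was to trace the child processes as well: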
strace -p 30038 -s 512 -f -o /tmp/t
Part of /tmp/t looks like this:
...
26771 faccessat(AT_FDCWD, "/usr/lib/command-not-found", X_OK) = 0
...
26771 clone(child_stack=0, ...) = 26772
...
26772 execve("/usr/lib/command-not-found", ["/usr/lib/command-not-found", "--", "apt-rdepends"], [/* 88 vars */]) = 0
26772 write(2, "The program 'apt-rdepends' is currently not installed. You can install it by typing:\n", 85) = 85
26772 write(2, "sudo apt install apt-rdepends\n", 30) = 30
This shows that the hint is produced by /usr/lib/command-not-found.
Let's look at its description:
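In a long log it can help to filter down to the execve lines before reading anything else. The following is a minimal sketch, assuming the log was written with strace -f -o so that every line starts with a PID. show_execs is a hypothetical helper name, not part of strace.

```shell
# show_execs is a hypothetical helper: print only the execve() calls
# from a strace log written with `strace -f -o FILE`.
show_execs() {
    # With -f and -o, each line starts with the PID of the calling process.
    grep -E '^[0-9]+ +execve\(' "$1"
}
```

Running show_execs /tmp/t on a log like the one above should leave the /usr/lib/command-not-found line, plus any other programs the traced shells started. strace can also restrict tracing up front with -e trace=execve.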
$ dpkg --search /usr/lib/command-not-found
command-not-found: /usr/lib/command-not-found
$ apt-cache show command-not-found
...
Description-en: Suggest installation of packages in interactive bash sessions
This package will install a handler for command_not_found that looks up programs not currently installed but available from the repositories.
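If you are curious how it does the lookup, you can get the source with apt-get source command-not-found and study it. The next question is: why does bash call it?
Searching the web for /usr/lib/command-not-found would very likely turn up the answer. But I want to practice finding the answer on my own, so I will keep going the "hard way".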
Step 2: Look for clues with gdb
I want the backtrace of bash at the moment it calls /usr/lib/command-not-found, since that may hold some clues.
Preparation:
1. Write a program that just calls sleep(3600), then temporarily put it in place of /usr/lib/command-not-found, so that anything invoking command-not-found hangs while it runs.
2. Install the bash debug symbols:
$ apt-cache search bash | grep dbg
...
bash-dbgsym - debug symbols for package bash
$ sudo apt-get install bash-dbgsym
3. Get the bash source:
$ cd /home/fcamel/dev/
$ apt-get source bash
( creates bash-4.3 )
$ cd bash-4.3
$ mkdir out # create an empty directory for gdb to use later
4. Go back to the original bash and run:
$ echo $$
30038
$ apt-rdepends
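( hangs )
5. Open a new terminal and attach gdb to that bash.
I use cgdb here, which splits the screen into two windows: gdb at the bottom and the source at the top. Only the gdb window is pasted below: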
$ cgdb -p 30038
(gdb) bt
#0 __GI___waitpid at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1 waitchld at .././jobs.c:3224
#2 wait_for at .././jobs.c:2485
#3 execute_command_internal at .././execute_cmd.c:829
#4 execute_command at .././execute_cmd.c:390
#5 reader_loop at .././eval.c:160
#6 main at .././shell.c:756
(gdb) directory /home/fcamel/dev/bash-4.3/out # tell gdb where the source is
Source directories searched: /home/fcamel/dev/bash-4.3/out:$cdir:$cwd
(gdb) up
(gdb) up
(gdb) l # list the source so it is easy to read in this post
2480 # endif
2481 sigaction (SIGCHLD, &act, &oact);
2482 # endif /* MUST_UNBLOCK_CHLD */
2483 queue_sigchld = 1;
2484 waiting_for_child++;
2485 r = waitchld (pid, 1); /* XXX */
2486 waiting_for_child--;
2487 #if 0
2488 itrace("wait_for: blocking wait for %d returns %d child = %p", (int)pid, r, child);
2489 #endif
(gdb) p pid
$1 = 30479
So the next step is to see what process 30479 is doing:
$ cgdb -p 30479
(gdb) bt
#0 __GI___waitpid at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1 waitchld at .././jobs.c:3224
#2 wait_for at .././jobs.c:2485
#3 execute_command_internal at .././execute_cmd.c:829
#4 execute_command at .././execute_cmd.c:390
#5 execute_connection at .././execute_cmd.c:2502
#6 execute_command_internal at .././execute_cmd.c:945
#7 execute_command at .././execute_cmd.c:390
#8 execute_if_command at .././execute_cmd.c:3438
#9 execute_command_internal at .././execute_cmd.c:897
#10 execute_command_internal at .././execute_cmd.c:937
#11 execute_function at .././execute_cmd.c:4547
#12 execute_shell_function at .././execute_cmd.c:4609
#13 execute_disk_command at .././execute_cmd.c:5000
#14 execute_simple_command at .././execute_cmd.c:4236
#15 execute_command_internal at .././execute_cmd.c:787
#16 execute_command at .././execute_cmd.c:390
#17 reader_loop at .././eval.c:160
#18 main at .././shell.c:756
(gdb) directory /home/fcamel/dev/bash-4.3/out
Source directories searched: /home/fcamel/dev/bash-4.3/out:$cdir:$cwd
(gdb) f 2
#2 0x000000000044854b in wait_for (pid=30480) at .././jobs.c:2485
(gdb) p pid
$1 = 30480
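Inspecting 30480 with gdb in the same way shows that it is the replacement program I wrote (you can also verify this with ps axuw | grep 30480).
Back to 30479. Frame 12, execute_shell_function, looks like it might hold a clue: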
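The parent/child chain that gdb revealed through pid can also be cross-checked without a debugger. The following is a rough sketch that reads /proc directly, assuming a Linux /proc filesystem. children_of is a hypothetical helper name used only for illustration.

```shell
# children_of is a hypothetical helper: list the direct children of a PID
# by scanning /proc (Linux only).
children_of() {
    for f in /proc/[0-9]*/status; do
        # The "PPid:" line of /proc/PID/status holds the parent PID.
        if grep -qx "PPid:[[:space:]]*$1" "$f" 2>/dev/null; then
            pid=${f#/proc/}
            pid=${pid%/status}
            # cmdline is NUL-separated; turn the NULs into spaces for display.
            printf '%s %s\n' "$pid" "$(tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null)"
        fi
    done
}
```

In the session above, children_of 30038 should list 30479, and children_of 30479 should list 30480 together with the command line of the stand-in program.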
(gdb) f 12
#12 0x0000000000439e36 in execute_shell_function (var=var@entry=0x127e448, words=0x1526f68) at .././execute_cmd.c:4609
(gdb) l
4604
4605 bitmap = new_fd_bitmap (FD_BITMAP_DEFAULT_SIZE);
4606 begin_unwind_frame ("execute-shell-function");
4607 add_unwind_protect (dispose_fd_bitmap, (char *)bitmap);
4608
4609 ret = execute_function (var, words, 0, bitmap, 0, 0);
4610
4611 dispose_fd_bitmap (bitmap);
4612 discard_unwind_frame ("execute-shell-function");
4613
(gdb) p var
$2 = (SHELL_VAR *) 0x127e448
(gdb) p *var
$3 = {name = 0x127e488 "command_not_found_handle", value = 0x127e4c8 "\t", exportstr = 0x0, dynamic_value = 0x0, assign_func = 0x0, attributes = 8, context = 0}
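This gives a new clue: command_not_found_handle. As the name suggests, it looks like a hook that bash provides for the case where a command is not found.
Step 3: Check the documentation
With a more precise keyword in hand, check whether man bash explains it. Searching the page for command_not_found_handle leads straight to the answer: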
COMMAND EXECUTION
...
If the name is neither a shell function nor a builtin, and contains no slashes, bash searches each element of the PATH for a directory containing an executable file by that name. Bash uses a hash table to remember the full pathnames of executable files (see hash under SHELL BUILTIN COMMANDS below). A full search of the directories in PATH is performed only if the command is not found in the hash table. If the search is unsuccessful, the shell searches for a defined shell function named command_not_found_handle. If that function exists, it is invoked with the original command and the original command’s arguments as its arguments, and the function’s exit status becomes the exit status of the shell. If that function is not defined, the shell prints an error message and returns an exit status of 127.
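So this is a built-in bash feature: when a command is not found, bash checks whether this function is defined and calls it if so. Evidently the system bashrc defines command_not_found_handle so that it calls /usr/lib/command-not-found. A quick search shows that the definition lives in /etc/bash.bashrc. This also explains the bash -c case from the beginning: a non-interactive shell does not read /etc/bash.bashrc, so the function is never defined there and bash prints its own error message.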
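To see the hook in isolation, here is a minimal sketch of a handler. It is not the code from /etc/bash.bashrc: the message text and the command name no_such_command_demo are made up for illustration, and the hook itself needs bash 4.0 or later.

```shell
# Illustrative handler only; Ubuntu's real one calls /usr/lib/command-not-found.
command_not_found_handle() {
    cmd=$1
    shift
    # bash passes the command name followed by its original arguments.
    printf 'demo handler: "%s" was not found (arguments: %s)\n' "$cmd" "$*" >&2
    return 127
}

# An unknown command in this shell now goes through the handler.
no_such_command_demo foo bar || echo "exit status: $?"

# A separate bash started with -c does not inherit the function
# (it was not exported), so it falls back to bash's own error message.
bash -c 'no_such_command_demo foo bar' || echo "exit status from bash -c: $?"
```

The first call should print the demo message, while the second should print the plain "command not found" error, which mirrors the difference between the interactive shell and bash -c described at the start.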
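If the documentation had not covered it, the next step would be to go back to the bash source and read the code related to command_not_found_handle, form some hypotheses, and then return to gdb and set breakpoints to gather more information. If necessary, you can also rebuild bash and add logging to your own build to learn more.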