Troubleshooting in Linux

Amit Kumar
Jul 4, 2019

Linux issues and troubleshooting
===================
Some Linux distributions don’t offer a safe mode or automatic repair tools, but many provide a recovery mode with options that keep your files intact while you repair the problem. Another option is to boot from a Linux live CD or USB and use the tools and commands it provides.

Two Phase approach to troubleshooting
=====================
1) Fault Analysis Phase
- State the problem in clear, understandable terms.
- Gather information: what is not working? Does it affect one user or all users? Is the problem reproducible?
- Identify what is and what is not working.
2) Fault Diagnosis Phase
- Based on the fault analysis and past experience, determine the most probable cause.
- Test and verify the probable causes.
- Take corrective action.
- Most importantly, make sure the fix does not introduce any new problems.

Finally, document the results of the fault analysis and fault diagnosis phases.

Gather information or collect logs
===================
- A complete description of the server.
- Symptoms and error messages that describe exactly what the problem is.
- Whether it affects a single user or a group of users, what they were doing before it occurred, and whether the problem can be reproduced.
- Whether a script is available to reproduce it.
- Whether it is a known problem that occurs frequently or an intermittent one.
- Any changes made to the server prior to this issue.
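
The server description from the checklist above can be captured in one step. This is a minimal sketch, not a standard tool: the function name collect_server_info and the report layout are assumptions made for illustration, and it only uses commands that are normally present (uname, date, uptime).

```shell
#!/bin/sh
# Sketch: write a basic description of the server to a report file.
# collect_server_info and the report layout are illustrative assumptions.
collect_server_info() {
    out="${1:-server-info.txt}"
    {
        echo "Collected: $(date)"
        echo "Hostname:  $(uname -n)"
        echo "Kernel:    $(uname -srm)"
        # /etc/os-release exists on most modern distributions; read it in a
        # subshell so its variables do not leak into the calling shell.
        if [ -r /etc/os-release ]; then
            ( . /etc/os-release; echo "OS:        ${PRETTY_NAME:-unknown}" )
        fi
        echo "Uptime:    $(uptime 2>/dev/null)"
    } > "$out"
}

# Usage: collect_server_info /tmp/server-info.txt
```

Attach the resulting file to the ticket along with the symptoms and the list of recent changes.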

OS logs
=====
Files under /var/log:
- boot.log: messages from bootup.
- messages: standard system error messages.
- anaconda: OS installation logs.
- dmesg: log of boot messages, useful for spotting hardware errors.

Other logs exist for mail, cron, security, etc.
Other directories in /var/log/ exist for cups, httpd, samba, etc.
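
A quick way to start with these files is to pull out only the lines that mention a problem. The sketch below assumes a small keyword list (error, fail, panic, segfault) that is only a starting point, and scan_log is a hypothetical helper name; reading /var/log/messages usually requires root.

```shell
#!/bin/sh
# Sketch: print lines from a log file that mention errors or failures,
# prefixed with their line numbers. The keyword list is an assumption.
scan_log() {
    log="${1:-/var/log/messages}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    grep -inE 'error|fail|panic|segfault' "$log"
}

# Usage: scan_log /var/log/messages
#        scan_log /var/log/boot.log
```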

dmesg utility
========
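dmesg: prints the kernel message buffer, which shows the latest hardware issues.
The buffer has a fixed size, so older messages are lost once it is full. Messages from boot time can also be found in:
- /var/log/boot*
- /var/log/dmesg*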
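
Because dmesg output is long, it helps to filter it. The following is a rough sketch: filter_hw_errors is a hypothetical helper, and the patterns it matches are assumptions about what hardware trouble commonly looks like, so adjust them for your own hardware.

```shell
#!/bin/sh
# Sketch: keep only kernel messages that often indicate hardware trouble.
# Reads from standard input, so it works with live dmesg output or saved files.
filter_hw_errors() {
    grep -iE 'i/o error|hardware error|machine check|ata[0-9]+.*(failed|error)'
}

# Usage: dmesg | filter_hw_errors
#        filter_hw_errors < /var/log/dmesg
```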

Troubleshooting Resources
================
Man pages describe the usage of a command and its available options and configuration parameters.
Many commands and services have a -d/-D option for debugging or a -v/-V option for verbose output.
The /usr/share/doc/ directory contains information about the packages installed on your system, plus release notes and manuals.

Cause of common problems
================
Services not running
- Use the service command to start a service or check its status.
- Use the chkconfig command to enable a service at boot time.

Configuration errors

Firewall (iptables) is prohibiting a connection
- Temporarily stop iptables and test again to determine whether the firewall is blocking the connection.

PAM is prohibiting authentication
- View /var/log/secure for authentication error messages.
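
For the authentication case, counting failures is often the first check. This sketch assumes the two message fragments below are the ones of interest; count_auth_failures is a hypothetical name. On Debian and Ubuntu the equivalent file is /var/log/auth.log.

```shell
#!/bin/sh
# Sketch: count authentication failure lines in the security log.
# The two patterns are assumptions about typical PAM and sshd messages.
count_auth_failures() {
    log="${1:-/var/log/secure}"
    grep -cE 'authentication failure|Failed password' "$log"
}

# Usage: count_auth_failures /var/log/secure
```

A count that jumps right when users start reporting login problems points at PAM or the authentication backend rather than at the network.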

Troubleshooting Boot problems
=================
Configuration errors in the following files can cause boot problems:
- /boot/grub/grub.conf
- /etc/inittab
- /etc/fstab

Boot into rescue mode to correct boot problems.
- Rescue mode boots from the installation media.
- File systems are mounted under /mnt/sysimage.
- Use chroot to change the root directory of the rescue environment to /mnt/sysimage.
- Then use vi, fsck, rpm and other utilities to fix the boot problems.

Use grub-install to reinstall the boot loader.
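
Since a malformed /etc/fstab is a common reason for a failed boot, a simple field-count check can catch the most obvious mistakes. This is only a sketch under the assumption that every entry needs between four and six whitespace-separated fields; check_fstab is a hypothetical helper, and it does not validate devices, mount points or options.

```shell
#!/bin/sh
# Sketch: flag fstab entries with an unexpected number of fields.
# Returns non-zero if at least one suspicious line is found.
check_fstab() {
    fstab="${1:-/etc/fstab}"
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF < 4 || NF > 6 {
            printf "line %d: expected 4 to 6 fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad }
    ' "$fstab"
}

# Usage from rescue mode: check_fstab /mnt/sysimage/etc/fstab
```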

Sosreport or sysreport
==============
sysreport - up to RHEL 4.5
sosreport - after RHEL 4.5

sosreport or sysreport may not collect everything under /var/log/, so that data should be collected separately for further troubleshooting.
The /var/log directory is important because it lets you investigate system issues over a longer timeframe and keeps a record of the error messages. Use the command below to collect all the content of /var/log/:

# zip -r logs.zip /var/log/*
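
Where zip is not installed, tar can do the same job. The sketch below is one possible alternative, not part of sosreport: archive_logs and the timestamped default file name are assumptions made for illustration. Run it as root so that all log files are readable.

```shell
#!/bin/sh
# Sketch: pack a log directory into a compressed tar archive and print its name.
archive_logs() {
    src="${1:-/var/log}"
    dest="${2:-logs-$(date +%Y%m%d-%H%M%S).tar.gz}"
    # -C keeps the paths in the archive relative to the parent of the source directory
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")" && echo "$dest"
}

# Usage: archive_logs /var/log /tmp/logs.tar.gz
```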

Other useful commands for investigating or troubleshooting Linux issues:-
sysstat :- a package of tools to monitor system performance and usage.

It includes:-
/usr/bin/iostat :- reports CPU statistics and I/O statistics for devices, partitions and network file systems.
/usr/bin/mpstat :- reports processor-related statistics.
/usr/bin/sar :- collects, reports and saves system activity information such as CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables, NFS, sockets, etc.
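
The sysstat package is not always installed, so a small wrapper that skips missing tools keeps a collection script from failing halfway. run_if_present is a hypothetical helper; the interval and sample counts in the usage lines (2 seconds, 3 samples) are arbitrary choices.

```shell
#!/bin/sh
# Sketch: run a command only if it is installed, otherwise report that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 is not installed"
    fi
}

# Usage (every 2 seconds, 3 samples):
#   run_if_present iostat -x 2 3       # extended device I/O statistics
#   run_if_present mpstat -P ALL 2 3   # statistics for every processor
#   run_if_present sar -u 2 3          # CPU utilization
```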

dmidecode -q :- prints hardware information from the system's DMI (SMBIOS) table, similar to prtconf on Solaris.
Collecting a crash dump :- kdump is used on Linux.
kdump :- captures a memory image (vmcore) that helps in determining the cause of a system crash.

Typical Causes of NFS Problems
==================
The rpcbind or NFS daemons are not running.
- The NFS daemons are nfs and nfslock.

Syntax errors
- In the mount command on the client
- In the /etc/exports file on the server

Permission problems
- Check UIDs and GIDs.

Firewall is blocking NFS packets
- Check the iptables rules or stop the iptables service.

DNS host name resolution
- Ensure /etc/resolv.conf contains correct entries.
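
One well-known /etc/exports mistake is a space between the client name and its options, which changes who the options apply to. The sketch below flags such lines for review; check_exports is a hypothetical helper and it checks only this one pattern, not the full exports syntax.

```shell
#!/bin/sh
# Sketch: report exports lines where whitespace comes directly before "(".
# "host (rw)" is parsed differently from "host(rw)", so these lines deserve a look.
check_exports() {
    exports="${1:-/etc/exports}"
    awk '
        /^[ \t]*#/ { next }    # skip comment lines
        /[^ \t][ \t]+\(/ { printf "line %d: %s\n", NR, $0 }
    ' "$exports"
}

# Usage: check_exports /etc/exports
```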

Making grub.cfg:
=========
On Red Hat/Fedora/CentOS : grub2-mkconfig -o /boot/grub2/grub.cfg
On Debian/Ubuntu : grub-mkconfig -o /boot/grub/grub.cfg

Note:- “grub-mkconfig” is a tool to create or update a grub.cfg file. It reads /etc/default/grub, then uses the helper scripts in the /etc/grub.d directory to probe the computer for installed operating systems, and writes an appropriate grub.cfg file.
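
The choice between the two commands above can be scripted from the distribution ID. This is a dry-run sketch: grub_mkconfig_command is a hypothetical helper that only prints the command so it can be reviewed before being run as root, and it assumes the ID values found in /etc/os-release (rhel, fedora, centos, debian, ubuntu).

```shell
#!/bin/sh
# Sketch: print the grub.cfg generation command for a given distribution ID.
grub_mkconfig_command() {
    case "$1" in
        rhel|fedora|centos) echo "grub2-mkconfig -o /boot/grub2/grub.cfg" ;;
        debian|ubuntu)      echo "grub-mkconfig -o /boot/grub/grub.cfg" ;;
        *) echo "unknown distribution ID: $1" >&2; return 1 ;;
    esac
}

# Usage: grub_mkconfig_command "$(. /etc/os-release && echo "$ID")"
```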

Installing the grub bootloader:
==================
On Red Hat/Fedora/CentOS : grub2-install --recheck /dev/sda
On Debian/Ubuntu : grub-install --recheck /dev/sda

Note:- “grub-install” writes the appropriate boot code into the MBR (master boot record), the BIOS boot partition, and/or the EFI system partition, as necessary. You do not need to do this every time you change GRUB’s configuration. It only needs to be done if GRUB’s boot code has never been installed on this disk or has to be reinstalled.

The Initial RAM Disk (initrd) :
=================
The initrd provides a mechanism for loading the necessary kernel modules, such as storage and file system drivers, early in the boot process, before the root file system is mounted.

Creating an initrd image with “dracut”:
---------------------------------------
# dracut initrd-new.img

For a different kernel : # dracut initrd-new.img <kernel-version>

Note:- On Debian-based systems, the mkinitramfs tool can be used to create the initrd image.
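
To avoid typing the kernel version by hand, the command can be built from uname -r. This is a dry-run sketch that only prints the command for review: initrd_command and the initrd-<version>.img file name are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: print the initrd build command for a tool and kernel version.
# Defaults to the running kernel when no version is given.
initrd_command() {
    tool="$1"
    kver="${2:-$(uname -r)}"
    case "$tool" in
        dracut)      echo "dracut initrd-$kver.img $kver" ;;
        mkinitramfs) echo "mkinitramfs -o initrd-$kver.img $kver" ;;
        *) echo "unsupported tool: $tool" >&2; return 1 ;;
    esac
}

# Usage: initrd_command dracut
#        initrd_command mkinitramfs 5.10.0-8-amd64
```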

That should be enough to start investigating most Linux issues, whether the system is completely down or still running. Always try to understand the issue first so that you reach the right solution, instead of jumping to conclusions.

Thanks for reading !!
