Linux Real-Time Troubleshooting Scenarios 👨‍💻

Neel Shah
Oct 6, 2023


Understanding the issue is always the crucial first step, and troubleshooting works best when you follow a step-by-step process. Whether you are a Software Developer, a DevOps Engineer or an Architect, Unix/Linux is used widely, so you should know the common issues and the correct approach to resolving them.

Let’s discuss a few of them:

Issue 1: Server is not reachable or unable to connect
Approach / Solution:

.
├── Ping the server by hostname and IP address
│ ├── Hostname and IP address are both pingable
│ │ ├── The server is reachable, so the issue might be on the client side
│ ├── Hostname is not pingable but IP address is pingable
│ │ ├── Could be a DNS issue
│ │ │ ├── Check /etc/hosts
│ │ │ ├── Check /etc/resolv.conf
│ │ │ ├── Check /etc/nsswitch.conf
│ │ │ ├── (Optional) DNS can also be defined in /etc/sysconfig/network-scripts/ifcfg-<interface>
│ ├── Neither hostname nor IP address is pingable
│ │ ├── Check another server on the same network to see whether there is a network-wide access issue
│ │ │ ├── False: The issue is not network-wide but specific to that host/server
│ │ │ ├── True: Might be a network-wide issue
│ │ ├── Log in to the server through the virtual console, if the server is powered on, and check the uptime
│ │ ├── Check that the server has an IP address and that the network interface is UP
│ │ │ ├── (Optional) Also check the IP-related settings in /etc/sysconfig/network-scripts/ifcfg-<interface>
│ │ ├── Ping the gateway and check the routes
│ │ ├── Check SELinux and firewall rules
│ │ ├── Check the physical cable connection
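
The first decision in this tree can be sketched as a small shell helper. This is only a sketch under assumptions: the function names `diagnose_reachability` and `check_server` are invented for illustration, the host name and address in the example are placeholders, and `ping` is assumed to accept `-c` and `-W` as the Linux iputils version does.

```shell
#!/usr/bin/env bash
# Map two ping exit codes (0 = success) onto the branches of the tree above.
diagnose_reachability() {
  local host_rc=$1 ip_rc=$2
  if [ "$host_rc" -eq 0 ] && [ "$ip_rc" -eq 0 ]; then
    echo "reachable: look at the client side"
  elif [ "$ip_rc" -eq 0 ]; then
    echo "dns: check /etc/hosts, /etc/resolv.conf, /etc/nsswitch.conf"
  elif [ "$host_rc" -eq 0 ]; then
    echo "mismatch: the hostname may resolve to a different address"
  else
    echo "unreachable: check network, interface, gateway, firewall, cable"
  fi
}

# Hypothetical wrapper: ping once by name and once by IP, then classify.
check_server() {
  local name=$1 ip=$2 host_rc=0 ip_rc=0
  ping -c 1 -W 2 "$name" >/dev/null 2>&1 || host_rc=$?
  ping -c 1 -W 2 "$ip"   >/dev/null 2>&1 || ip_rc=$?
  diagnose_reachability "$host_rc" "$ip_rc"
}

# Example (placeholder values):
#   check_server app01.example.com 192.0.2.10
```

Keeping the classification separate from the pings makes the logic easy to check without network access.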

Issue 2: Unable to connect to a website or an application
Approach / Solution:

.
├── Ping the server by hostname and IP address
│ ├── False: Follow the troubleshooting diagram above, “Server is not reachable or unable to connect”
│ ├── True: Check the service availability by using the telnet command with the port
│ │ ├── True: Service is running
│ │ ├── False: Service is not reachable or not running
│ │ │ ├── Check the service status using systemctl or another command
│ │ │ ├── Check the firewall/SELinux
│ │ │ ├── Check the service logs
│ │ │ ├── Check the service configuration
└── …
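
Where telnet is not installed, the port check can be approximated with bash itself. The sketch below is not the author's method: it swaps telnet for bash's `/dev/tcp` redirection, assumes the coreutils `timeout` command is available, and uses the invented function names `port_open` and `check_service`.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp redirection, so no telnet client is needed.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Print the next step of the tree depending on the port check.
# The unit name is a caller-supplied guess at the service's systemd unit.
check_service() {
  local host=$1 port=$2 unit=$3
  if port_open "$host" "$port"; then
    echo "open: port $port on $host accepts connections"
  else
    echo "closed: on the server, try 'systemctl status $unit', then firewall, logs, config"
  fi
}

# Example (placeholder values):
#   check_service web01.example.com 443 nginx
```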

Issue 3: Unable to ssh as root or any other user
Approach / Solution:

.
├── Ping the server by hostname and IP address
│ ├── False: Follow the troubleshooting diagram above, “Server is not reachable or unable to connect”
│ ├── True: Check the service availability by using the telnet command with the port
│ │ ├── True: Service is running
│ │ │ ├── Issue might be on the client side
│ │ │ ├── User might be disabled, have a nologin shell, root login might be disabled, or other configuration might block the login
│ │ ├── False: Service is not reachable or not running
│ │ │ ├── Check the service status using systemctl or another command
│ │ │ ├── Check the firewall/SELinux
│ │ │ ├── Check the service logs
│ │ │ ├── Check the service configuration
└── …
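
The "nologin shell" branch can be checked with a few lines of shell. This is a minimal sketch that only reads a passwd-format file, so users from LDAP or other directories are not covered; the function names `login_shell` and `has_nologin_shell` are invented for illustration.

```shell
#!/usr/bin/env bash
# Print the login shell of a user from a passwd-format file.
login_shell() {
  local user=$1 passwd_file=${2:-/etc/passwd}
  awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file"
}

# Succeed if the user's shell blocks interactive logins.
has_nologin_shell() {
  case "$(login_shell "$@")" in
    */nologin|*/false) return 0 ;;
    *) return 1 ;;
  esac
}

# Related checks on the server (need root):
#   sshd -T | grep -Ei 'permitrootlogin|allowusers|denyusers'  # effective sshd settings
#   passwd -S <user>                                           # password lock status
```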

Issue 4: Disk space is full, or disk space needs to be added/extended
Approach / Solution:

.
├── Detecting system performance degradation
│ ├── Application is getting slow/unresponsive
│ ├── Commands are not running (for example, because the / filesystem is full)
│ ├── Logging and other writes fail
├── Analyse the issue
│ ├── Use the df command to find the filesystem that is out of space
├── Action
│ ├── After finding the specific filesystem, use the du command in that filesystem to find which files/directories are large
│ ├── Compress/remove big files
│ ├── Move items to another partition/server
│ ├── Check the health of the disks using the badblocks command (for example: # badblocks -v /dev/sda)
│ ├── Check whether the disks are I/O bound (using iostat)
│ ├── Create a link to the file/dir
├── New disk addition
│ ├── Simple partition
│ │ ├── Add the disk to the VM
│ │ ├── Check the new disk with the lsblk command (df only shows it once it is mounted)
│ │ ├── Use fdisk to create the partition. An LVM partition is the better choice
│ │ ├── Create the filesystem and mount it
│ │ ├── Add an fstab entry to make the mount persistent
│ ├── LVM partition
│ │ ├── Add the disk to the VM
│ │ ├── Check the new disk with the lsblk command
│ │ ├── Use fdisk to create the LVM partition
│ │ ├── Create the PV, VG and LV
│ │ ├── Create the filesystem and mount it
│ │ ├── Add an fstab entry to make the mount persistent
│ ├── Extend LVM partition
│ │ ├── Add the disk and create an LVM partition
│ │ ├── Add the LVM partition (PV) to the existing VG
│ │ ├── Extend the LV and resize the filesystem
└── …
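
The du step and the "Extend LVM partition" branch can be sketched as follows. Treat it as an illustration: `largest_entries` and `lvm_extend_plan` are invented names, `--max-depth` assumes GNU du, and the LVM helper only prints the commands so nothing is changed until you run them yourself with your own device, VG and LV names.

```shell
#!/usr/bin/env bash
# List the N largest entries directly under a directory, in KiB, largest first.
# -x keeps du on one filesystem, matching the filesystem that df flagged.
# The first line of output is the total for the directory itself.
largest_entries() {
  local dir=$1 count=${2:-10}
  du -x -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Print (not run) the commands that extend an existing LV with a new PV.
# lvextend -r resizes the filesystem together with the LV.
lvm_extend_plan() {
  local pv=$1 vg=$2 lv=$3
  echo "pvcreate $pv"
  echo "vgextend $vg $pv"
  echo "lvextend -r -l +100%FREE /dev/$vg/$lv"
}

# Example (placeholder names):
#   df -h
#   largest_entries /var 5
#   lvm_extend_plan /dev/sdb1 vg_data lv_data
```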

Issue 5: Filesystem corrupted
Approach / Solution:

.
├── One of the errors that can leave the system unable to boot
├── Check /var/log/messages, dmesg and other log files
├── If the logs report bad sectors, we have to run fsck
│ ├── True:
│ │ ├── Reboot the system into rescue mode by booting it from a CD-ROM/ISO image
│ │ ├── Proceed with option 1, which mounts the original root filesystem under /mnt/sysimage
│ │ ├── Edit the fstab entries, or create a new file with the help of blkid, and reboot
└── …

Issue 6: fstab file missing or bad entry
Approach / Solution:

.
├── One of the errors that can leave the system unable to boot
├── Check /var/log/messages, dmesg and other log files
├── If the logs report bad sectors, we have to run fsck
│ ├── True:
│ │ ├── Reboot the system into rescue mode by booting it from a CD-ROM/ISO image
│ │ ├── Proceed with option 1, which mounts the original root filesystem under /mnt/sysimage
│ │ ├── Edit the fstab entries, or create a new file with the help of blkid, and reboot
└── …
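
When editing fstab with the help of blkid, one quick sanity check is whether every UUID in the file still belongs to a block device. The helper below is a sketch with invented function names (`fstab_uuids`, `missing_uuids`); it assumes entries use the unquoted `UUID=` form and that the list of known UUIDs comes from `blkid -s UUID -o value`.

```shell
#!/usr/bin/env bash
# List the UUIDs referenced by an fstab-format file. Comment lines are skipped
# because their first field does not start with "UUID=".
fstab_uuids() {
  awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "${1:-/etc/fstab}"
}

# Print fstab UUIDs that are absent from the given list of known UUIDs
# (one per line, e.g. the output of: blkid -s UUID -o value).
missing_uuids() {
  local fstab=$1 known_uuids=$2 uuid
  for uuid in $(fstab_uuids "$fstab"); do
    grep -qxF "$uuid" <<<"$known_uuids" || echo "$uuid"
  done
}

# Example from rescue mode, where the root filesystem is under /mnt/sysimage:
#   missing_uuids /mnt/sysimage/etc/fstab "$(blkid -s UUID -o value)"
```

Any UUID it prints points at an fstab line that can no longer be mounted as written.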

Issue 7: Can’t cd to a directory even though the user has sudo privileges
Approach / Solution:

.
├── Reasons and Resolution
│ ├── Directory does not exist
│ ├── Pathname conflict: relative vs absolute path
│ ├── Parent directory permission/ownership
│ ├── No execute permission on the target directory
│ ├── Hidden directory
└── …
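
Most of these causes can be found by walking the path one component at a time. The function below is a minimal sketch (the name `explain_cd` is invented) that reports the first component that is missing, is not a directory, or lacks execute (search) permission for the current user.

```shell
#!/usr/bin/env bash
# Walk a path component by component and report the first one that would
# make `cd` fail for the current user.
explain_cd() {
  local target=$1 current part
  local IFS=/
  case $target in
    /*) current="" ;;    # absolute path: start from the root
    *)  current="." ;;   # relative path: start from the current directory
  esac
  for part in $target; do
    [ -z "$part" ] && continue
    current="$current/$part"
    if [ ! -e "$current" ]; then
      echo "missing: $current"; return 1
    elif [ ! -d "$current" ]; then
      echo "not a directory: $current"; return 1
    elif [ ! -x "$current" ]; then
      echo "no execute permission: $current"; return 1
    fi
  done
  echo "ok: $target"
}

# Example (placeholder path):
#   explain_cd /var/lib/app/data
```

Note that `sudo cd` does not help here, because `cd` is a shell builtin rather than a program; a root shell such as `sudo -i` is needed to enter a directory the user cannot search.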

Issue 8: Can’t create links
Approach / Solution:

.
├── Reasons and Resolution
│ ├── Target directory/file does not exist
│ ├── Pathname conflict: relative vs absolute path (should be the complete path)
│ ├── Parent directory permission/ownership
│ ├── Target file permission/ownership (as there should be read permission)
│ ├── Hidden directory/file
└── …
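
The first two causes (missing target, relative path) can be avoided by resolving the target before linking. This is a sketch rather than a standard tool: `safe_symlink` is an invented name, and it assumes `readlink -f` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Create a symlink that points at the target's absolute path, refusing to
# create a dangling link when the target does not exist.
safe_symlink() {
  local target=$1 link=$2 abs
  if [ ! -e "$target" ]; then
    echo "target does not exist: $target" >&2
    return 1
  fi
  abs=$(readlink -f "$target") || return 1
  ln -s "$abs" "$link"
}

# Example (placeholder paths):
#   safe_symlink ./releases/current /opt/app/current
```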

Issue 9: Running out of memory
Approach / Solution:

.
├── Types
│ ├── Cache (L1, L2, L3)
│ ├── RAM
│ │ ├── Usage
│ │ │ ├── # free -h
│ │ │ │ ├── Total (total installed memory)
│ │ │ │ ├── Used (memory actually in use)
│ │ │ │ ├── Free (memory not used at all)
│ │ │ │ ├── Shared (shared memory)
│ │ │ │ ├── Buff/cache (kernel buffers and page cache)
│ │ │ │ ├── Available (estimate of memory available to new applications without swapping)
│ │ │ ├── /proc/meminfo
│ │ │ │ ├── Active(file)
│ │ │ │ ├── Inactive(file)
│ │ │ │ ├── Active(anon)
│ │ │ │ ├── Inactive(anon)
│ ├── Swap (virtual memory)
├── Resolution
│ ├── Identify the processes that are using a lot of memory with top, htop, ps, etc.
│ ├── Check the logs for OOM messages and check the memory overcommit settings in sysctl.conf
│ ├── Kill or restart the process/service
│ ├── Prioritize the process using nice
│ ├── Add/extend the swap space
│ ├── Add more physical RAM
└── …
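
The /proc/meminfo fields above are easy to read from a script. The sketch below uses invented helper names (`meminfo_kb`, `mem_available_pct`) and assumes a kernel that exposes `MemAvailable`.

```shell
#!/usr/bin/env bash
# Read one field (in kB) from a meminfo-format file, e.g. MemAvailable.
meminfo_kb() {
  local field=$1 file=${2:-/proc/meminfo}
  awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

# Percentage of total memory that is still available (integer, rounded down).
mem_available_pct() {
  local file=${1:-/proc/meminfo} total avail
  total=$(meminfo_kb MemTotal "$file")
  avail=$(meminfo_kb MemAvailable "$file")
  echo $(( avail * 100 / total ))
}

# Follow-up commands for the Resolution branch:
#   ps -eo pid,comm,%mem,rss --sort=-rss | head -n 6   # top consumers by RSS
#   dmesg | grep -i 'out of memory'                    # OOM-killer messages
#   sysctl vm.overcommit_memory                        # overcommit policy
```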

Issue 10: Add/extend the swap space
Approach / Solution:

.
├── When running out of memory, we may need to add more swap space
│ ├── Create a file with # dd, which reserves the disk blocks for the swap file
│ ├── Set permission 600 and give root ownership
│ ├── # mkswap
│ ├── Turn the swap on with # swapon
│ ├── Add an fstab entry to make it persistent
└── …
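
Written out as commands, the steps look roughly like the output of the helper below. It is a sketch that only prints the commands (the name `swapfile_plan` is invented, and the path and size are caller-supplied), so they can be reviewed before being run as root.

```shell
#!/usr/bin/env bash
# Print (not run) the commands that create and enable a swap file.
swapfile_plan() {
  local path=$1 size_mb=$2
  echo "dd if=/dev/zero of=$path bs=1M count=$size_mb"   # reserve the blocks
  echo "chown root:root $path"
  echo "chmod 600 $path"
  echo "mkswap $path"                                    # write the swap signature
  echo "swapon $path"                                    # enable it now
  echo "echo '$path none swap sw 0 0' >> /etc/fstab"     # enable it at boot
}

# Example: a 2 GiB swap file
#   swapfile_plan /swapfile 2048
# Afterwards, `swapon --show` or `free -h` should list the new swap space.
```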

Issue 11: Unable to run certain commands
Approach / Solution:

.
├── Troubleshooting and Resolution
│ ├── Command
│ │ ├── Could be a system command that non-root users do not have access to
│ │ ├── Could be a user-defined script/command
│ ├── Troubleshooting
│ │ ├── Permission/ownership of the command/script
│ │ ├── sudo permission
│ │ ├── Absolute/relative path of the command/script
│ │ ├── Not defined in the user's $PATH variable
│ │ ├── Command is not installed
│ │ ├── A library the command needs is missing or deleted
└── …
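
Several of these checks can be chained in one function. The sketch below uses an invented name, `why_not_runnable`, and relies on `command -v` and, when it is installed, `ldd`; it reports the first problem it finds.

```shell
#!/usr/bin/env bash
# Report the first reason a command or script cannot be run by the current user.
why_not_runnable() {
  local cmd=$1 path
  case $cmd in
    */*) path=$cmd ;;   # an explicit path was given
    *)   path=$(command -v "$cmd") || { echo "not found in PATH: $cmd"; return 1; } ;;
  esac
  case $path in
    */*) ;;
    *) echo "shell builtin, function or alias: $cmd"; return 0 ;;
  esac
  if [ ! -e "$path" ]; then
    echo "no such file: $path"; return 1
  elif [ ! -x "$path" ]; then
    echo "not executable (check permission/ownership): $path"; return 1
  elif command -v ldd >/dev/null 2>&1 && ldd "$path" 2>/dev/null | grep -q 'not found'; then
    echo "missing shared library: $path"; return 1
  fi
  echo "ok: $path"
}

# Example:
#   why_not_runnable ./deploy.sh
#   sudo -l    # lists what the current user may run through sudo
```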

Issue 12: System unexpectedly reboots or a process restarts
Approach / Solution:

.
├── Troubleshooting and Resolution
│ ├── System reboot/crash reasons
│ │ ├── CPU stress
│ │ ├── RAM stress
│ │ ├── Kernel fault
│ │ ├── Hardware fault
│ ├── Process restart
│ │ ├── System reboot
│ │ ├── Process restarts itself
│ │ ├── Watchdog application
│ │ │ ├── Exists to prevent high stress on system resources
│ │ │ ├── If an application causes stress, the watchdog restarts or terminates it
│ ├── Troubleshooting
│ │ ├── After logging in, check the status using commands like uptime, top, dmesg, journalctl, iostat -xz 1
│ │ ├── Check log files such as syslog, boot.log, dmesg and messages
│ │ ├── Check the custom log path of the application
│ │ ├── If the system is not fully accessible, use a virtual console such as iLO or iDRAC
│ │ ├── Open a case and reach out to the vendor
└── …
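
A quick way to confirm that a reboot really happened is to read the uptime directly. The helpers below are a sketch with invented names (`uptime_seconds`, `recently_rebooted`); the follow-up commands in the comments assume systemd with a persistent journal for `journalctl -b -1`.

```shell
#!/usr/bin/env bash
# Seconds since boot, read from a /proc/uptime-format file.
uptime_seconds() {
  awk '{ print int($1) }' "${1:-/proc/uptime}"
}

# Succeed if the system booted less than max_age seconds ago.
recently_rebooted() {
  local max_age=$1 file=${2:-/proc/uptime}
  [ "$(uptime_seconds "$file")" -lt "$max_age" ]
}

# Follow-up checks once a recent reboot is confirmed:
#   last -x reboot shutdown | head        # reboot and shutdown records
#   journalctl -b -1 -p err --no-pager    # errors logged during the previous boot
#   dmesg -T | tail -n 50                 # latest kernel messages
```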

Issue 13: Unable to get an IP address
Approach / Solution:

.
├── IP Assignment Methods
│ ├── DHCP
│ │ ├── Fixed Allocation
│ │ ├── Dynamic Allocation
│ ├── Static
├── Troubleshooting
│ ├── Check the network settings in the virtualization environment, such as VMware or VirtualBox
│ ├── Check whether an IP address is assigned
│ ├── Check the NIC status from the host side using # lspci, # nmcli, etc.
│ ├── Restart the network service
└── …
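
The "is an IP address assigned" check can be scripted by filtering the brief output of `ip`. This is a sketch: `up_without_ipv4` is an invented name, and it assumes the column layout of `ip -br addr` (interface, state, then addresses).

```shell
#!/usr/bin/env bash
# Read `ip -br addr` output on stdin and print interfaces that are UP
# but carry no IPv4 address.
up_without_ipv4() {
  awk '$2 == "UP" {
         has_v4 = 0
         for (i = 3; i <= NF; i++)
           if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\//) has_v4 = 1
         if (!has_v4) print $1
       }'
}

# Usage on a live system:
#   ip -br addr | up_without_ipv4
#   nmcli device status    # NetworkManager's view of each NIC
```

An interface listed here has link but no address, which usually points at DHCP or a missing static configuration rather than at the NIC itself.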

Issue 14: Backup and restore file permissions in Linux
Approach / Solution:

.
├── Troubleshooting
│ ├── The best option is to create an ACL file of the dirs/files before changing permissions in bulk
│ │ ├── Create the ACL file before changing the permissions (that is, back up the file permissions): ~$ getfacl -R <dir> > permissions.acl
│ │ ├── Restore the file permissions: ~$ setfacl --restore=permissions.acl
│ ├── Restore from the VM snapshot (but not always a good option for production)
│ ├── Rebuild the VM (this option is safe for the future)
└── …

Useful tips related to disk partitions:

.
├── Tips
│ ├── After attaching a new disk to a VM, rescan the SCSI bus so that it shows up in lsblk: ~$ echo "- - -" > /sys/class/scsi_host/host<N>/scan. After resizing an existing disk, rescan that device instead: ~$ echo 1 > /sys/block/sda/device/rescan
│ ├── If we increase the size of an existing disk, the additional space is appended to the disk without affecting the existing filesystem and partition
│ ├── We can also recreate the filesystem on a block device, which overwrites the old filesystem and its data
│ ├── If we have a disk (with a partition/filesystem already created), we can share the .vmdk with another VM. After mounting it there, we have the same data as on the previous VM.
└── END

If this helped you, do share it with your friends. Follow Neel Shah for the latest updates around DevOps!

Credit for the troubleshooting techniques above goes to an anonymous author, found on LinkedIn.

