It is always crucial to understand the issue first. Troubleshooting should then follow the right approach, ideally a step-by-step process. Whether you are a Software Developer, a DevOps Engineer or an Architect, Unix/Linux is widely used, so you should be aware of the common issues and the correct approach to resolving them.
Let's discuss a few of them:
Issue 1: Server is not reachable or unable to connect
Approach / Solution:
.
└── Ping the server by hostname and by IP address
    ├── Hostname and IP address are both pingable
    │   └── The server is reachable, so the issue is probably on the client side
    ├── Hostname is not pingable but IP address is pingable
    │   └── Could be a DNS issue
    │       ├── Check /etc/hosts
    │       ├── Check /etc/resolv.conf
    │       ├── Check /etc/nsswitch.conf
    │       └── (Optional) DNS can also be defined in /etc/sysconfig/network-scripts/ifcfg-<interface>
    └── Neither hostname nor IP address is pingable
        ├── Ping another server on the same network to see whether there is a network-wide issue
        │   ├── False: the issue is not network-wide but specific to that host/server
        │   └── True: might be a network-wide issue
        ├── Log in to the server through the virtual console if it is powered on, and check the uptime
        ├── Check that the server has an IP address and that the network interface is UP
        │   └── (Optional) Also check the IP-related settings in /etc/sysconfig/network-scripts/ifcfg-<interface>
        ├── Ping the gateway and check the routes
        ├── Check SELinux and the firewall rules
        └── Check the physical cable connection
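The first branches of the tree above can be sketched as a small shell function. This is only a rough sketch: `can_ping` and `classify_reachability` are names made up for illustration, and the `ping -c 1 -W 2` flags assume the Linux iputils version of ping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from any standard tool.

# can_ping TARGET: succeed if one ICMP echo request is answered within 2 seconds.
# Assumes Linux iputils ping (-c = count, -W = timeout in seconds).
can_ping() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# classify_reachability HOSTNAME IP: print which branch of the tree applies.
classify_reachability() {
    local host="$1" ip="$2"
    if ! can_ping "$ip"; then
        # The IP does not answer: a host-side or network-side problem.
        echo "host-or-network"
    elif ! can_ping "$host"; then
        # The IP answers but the name does not: look at name resolution.
        echo "dns"
    else
        # Both answer: the server is reachable, so look at the client.
        echo "client-side"
    fi
}

# Example (host name and address are placeholders):
#   classify_reachability web01.example.com 192.0.2.10
```

Keep in mind that some networks block ICMP, so a failed ping is a hint rather than proof that the host is down.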
Issue 2: Unable to connect to a website or an application
Approach / Solution:
.
├── Ping the server by hostname and by IP address
│   ├── False: follow the troubleshooting diagram above, "Server is not reachable or unable to connect"
│   └── True: check the service availability by connecting to its port with the telnet command
│       ├── True: service is running
│       └── False: service is not reachable or not running
│           ├── Check the service status using systemctl or another command
│           ├── Check the firewall/SELinux
│           ├── Check the service logs
│           └── Check the service configuration
└── …
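If telnet is not installed, the port check can be approximated with bash itself. A minimal sketch, assuming bash was built with `/dev/tcp` support (the default on most distributions) and that `timeout` from GNU coreutils is available; `port_status` is a hypothetical name.

```shell
#!/usr/bin/env bash
# port_status HOST PORT: print "open" if a TCP connection succeeds within
# 3 seconds, otherwise "closed". "closed" here also covers filtered ports,
# timeouts and name resolution failures.
port_status() {
    local host="$1" port="$2"
    # In the inner bash, $0 is the host and $1 is the port.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}

# Example (placeholder host):
#   port_status www.example.com 443
# If the port is closed, continue on the server with:
#   systemctl status <service>
```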
Issue 3: Unable to ssh as root or any other user
Approach / Solution:
.
├── Ping the server by hostname and by IP address
│   ├── False: follow the troubleshooting diagram above, "Server is not reachable or unable to connect"
│   └── True: check the service availability by connecting to its port with the telnet command
│       ├── True: service is running
│       │   ├── Issue might be on the client side
│       │   └── User might be disabled or have a nologin shell, root login might be disabled, or other configuration might block the login
│       └── False: service is not reachable or not running
│           ├── Check the service status using systemctl or another command
│           ├── Check the firewall/SELinux
│           ├── Check the service logs
│           └── Check the service configuration
└── …
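Two of the server-side causes above (a nologin shell and disabled root login) are easy to check from the console. The helpers below are a sketch with made-up names; they read `/etc/passwd` and `/etc/ssh/sshd_config` by default, and the config check ignores `Include` and `Match` blocks, so treat its answer as a first hint only. Running `sshd -T` as root prints the effective configuration.

```shell
#!/usr/bin/env bash
# login_shell USER [PASSWD_FILE]: print the user's login shell (7th field).
# Returns non-zero if the user is not listed in the file.
login_shell() {
    local user="$1" file="${2:-/etc/passwd}"
    awk -F: -v u="$user" '$1 == u { print $7; found = 1; exit }
                          END { exit !found }' "$file"
}

# shell_allows_login USER [PASSWD_FILE]: fail for unknown users and for
# shells such as /sbin/nologin or /bin/false.
shell_allows_login() {
    local shell
    shell="$(login_shell "$@")" || return 1
    case "$shell" in
        */nologin|*/false) return 1 ;;
    esac
    return 0
}

# permit_root_login [SSHD_CONFIG]: print the first uncommented
# PermitRootLogin value, or nothing if the keyword is not set.
permit_root_login() {
    local file="${1:-/etc/ssh/sshd_config}"
    awk 'tolower($1) == "permitrootlogin" { print $2; exit }' "$file"
}

# Example:
#   shell_allows_login deploy || echo "deploy cannot log in with its shell"
#   permit_root_login
```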
Issue 4: Disk space is full, or disk space needs to be added/extended
Approach / Solution:
.
├── Detecting system performance degradation
│   ├── Application getting slow/unresponsive
│   ├── Commands are not running (for example, because the / filesystem is full)
│   └── Logging and other writes fail
├── Analyse the issue
│   └── Use the df command to find the filesystem that is running out of space
├── Action
│   ├── After finding the specific filesystem, use the du command in it to find which files/directories are large
│   ├── Compress/remove big files
│   ├── Move the items to another partition/server
│   ├── Check the health status of the disks using the badblocks command (for example: # badblocks -v /dev/sda)
│   ├── Check whether the disk is I/O bound using iostat (per-process I/O needs a tool such as iotop)
│   └── Create a link to the file/dir
├── New disk addition
│   ├── Simple partition
│   │   ├── Add the disk to the VM
│   │   ├── Check the new disk with the lsblk command (df shows it only after it is mounted)
│   │   ├── Use fdisk to create the partition. It is better to have an LVM partition
│   │   ├── Create the filesystem and mount it
│   │   └── Add an fstab entry so the mount persists across reboots
│   ├── LVM partition
│   │   ├── Add the disk to the VM
│   │   ├── Check the new disk with the lsblk command (df shows it only after it is mounted)
│   │   ├── Use fdisk to create the LVM partition
│   │   ├── Create the PV, VG and LV
│   │   ├── Create the filesystem and mount it
│   │   └── Add an fstab entry so the mount persists across reboots
│   └── Extend LVM partition
│       ├── Add the disk and create an LVM partition
│       ├── Add the LVM partition (PV) to the existing VG
│       └── Extend the LV and resize the filesystem
└── …
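The "Analyse" and "Action" steps above usually start with df and du. Here is a small sketch of that triage; `full_filesystems` is a hypothetical helper that filters `df -P` output, and the 90% threshold and the `/var` path are only examples.

```shell
#!/usr/bin/env bash
# full_filesystems THRESHOLD: read `df -P` output on stdin and print
# "<mount point> <use%>" for every filesystem at or above THRESHOLD percent.
# Mount points that contain spaces are not handled in this sketch.
full_filesystems() {
    local threshold="$1"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "95%" -> "95"
        if (use + 0 >= t) print $6, $5
    }'
}

# Typical use:
#   df -P | full_filesystems 90
# Then find the largest entries on the affected filesystem
# (-x stays on one filesystem, sort -h is a GNU sort option):
#   du -xh /var 2>/dev/null | sort -rh | head -n 10
```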
Issue 5: Filesystem corrupted
Approach / Solution:
.
├── One of the errors that can leave the system unable to boot
├── Check /var/log/messages, dmesg and other log files
├── If the logs show bad sectors, we have to run fsck
│   └── True:
│       ├── Reboot the system into rescue mode by booting from a CD-ROM with the ISO attached
│       ├── Proceed with option 1, which mounts the original root filesystem under /mnt/sysimage
│       └── Edit the fstab entries, or create a new fstab with the help of blkid, and reboot
└── …
Issue 6: fstab file missing or bad entry
Approach / Solution:
.
├── One of the errors that can leave the system unable to boot
├── Check /var/log/messages, dmesg and other log files
├── If the logs show bad sectors, we have to run fsck
│   └── True:
│       ├── Reboot the system into rescue mode by booting from a CD-ROM with the ISO attached
│       ├── Proceed with option 1, which mounts the original root filesystem under /mnt/sysimage
│       └── Edit the fstab entries, or create a new fstab with the help of blkid, and reboot
└── …
Issue 7: Can't cd to the directory even if the user has sudo privileges
Approach / Solution:
.
├── Reasons and Resolution
│   ├── Directory does not exist
│   ├── Pathname conflict: relative vs absolute path
│   ├── Parent directory permission/ownership
│   ├── No execute permission on the target directory
│   └── Hidden directory
└── …
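Most of the reasons above can be narrowed down by walking the path one component at a time. The function below is a sketch of that idea; `explain_cd` is a made-up name, and `namei -l <path>` from util-linux prints similar per-component details if it is installed.

```shell
#!/usr/bin/env bash
# explain_cd PATH: report the first path component that would stop `cd`
# for the current user, or "ok" if every component is a searchable directory.
explain_cd() {
    local target="$1" current part
    local -a parts
    case "$target" in
        /*) current="/" ;;   # absolute path
        *)  current="." ;;   # relative path, resolved from the current directory
    esac
    IFS=/ read -r -a parts <<< "$target"
    for part in "${parts[@]}"; do
        [ -z "$part" ] && continue
        current="${current%/}/$part"
        if [ ! -e "$current" ]; then
            echo "missing: $current"; return 1
        elif [ ! -d "$current" ]; then
            echo "not-a-directory: $current"; return 1
        elif [ ! -x "$current" ]; then
            echo "no-execute-permission: $current"; return 1
        fi
    done
    echo "ok"
}

# Example (placeholder path):
#   explain_cd /srv/app/releases/current
```

Run it as the user who cannot cd; root bypasses directory permission checks, so the result differs under sudo.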
Issue 8: Can't create links
Approach / Solution:
.
├── Reasons and Resolution
│   ├── Target directory/file does not exist
│   ├── Pathname conflict: relative vs absolute path (should be the complete path)
│   ├── Parent directory permission/ownership
│   ├── Target file permission/ownership (there should be read permission)
│   └── Hidden directory/file
└── …
Issue 9: Running out of memory
Approach / Solution:
.
├── Types
│   ├── Cache (L1, L2, L3)
│   ├── RAM
│   │   └── Usage
│   │       ├── # free -h
│   │       │   ├── Total (total installed memory)
│   │       │   ├── Used (memory actually in use)
│   │       │   ├── Free (memory not used at all)
│   │       │   ├── Shared (shared memory)
│   │       │   ├── Buff/Cache (buffers and page cache memory)
│   │       │   └── Available (estimate of memory available to new applications without swapping)
│   │       └── /proc/meminfo
│   │           ├── Active(file)
│   │           ├── Inactive(file)
│   │           ├── Active(anon)
│   │           └── Inactive(anon)
│   └── Swap (Virtual Memory)
├── Resolution
│   ├── Identify the processes that are using the most memory with top, htop, ps, etc.
│   ├── Check the logs for OOM (out-of-memory) events and check the memory overcommit settings in sysctl.conf
│   ├── Kill or restart the process/service
│   ├── Prioritize the process using nice
│   ├── Add/extend the swap space
│   └── Add more physical RAM
└── …
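The numbers that `free` prints come from `/proc/meminfo`, so a quick check can be scripted. This is a minimal sketch with hypothetical helper names; it assumes a kernel that exposes `MemAvailable` (3.14 or newer).

```shell
#!/usr/bin/env bash
# meminfo_kb FIELD [FILE]: print the value (in kB) of one /proc/meminfo field,
# for example MemTotal, MemAvailable or "Active(file)".
meminfo_kb() {
    local field="$1" file="${2:-/proc/meminfo}"
    awk -v f="$field:" '$1 == f { print $2; exit }' "$file"
}

# available_percent [FILE]: MemAvailable as a whole-number percentage of MemTotal.
available_percent() {
    local file="${1:-/proc/meminfo}" total avail
    total="$(meminfo_kb MemTotal "$file")"
    avail="$(meminfo_kb MemAvailable "$file")"
    if [ -z "$total" ] || [ -z "$avail" ] || [ "$total" -eq 0 ]; then
        echo "unknown"; return 1
    fi
    echo $(( avail * 100 / total ))
}

# Example:
#   echo "available: $(available_percent)%"
# Largest processes by resident memory (procps ps):
#   ps -eo pid,comm,rss --sort=-rss | head -n 5
# OOM killer traces in the kernel log:
#   dmesg | grep -i "out of memory"
```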
Issue 10: Add/Extend the Swap Space
Approach / Solution:
.
├── When running out of memory, we may need to add more swap space
│   ├── Create a file with # dd, which reserves the disk blocks for the swap file
│   ├── Set permission 600 and give root ownership
│   ├── Format the file as swap with # mkswap
│   ├── Turn the swap on with # swapon
│   └── Add an fstab entry so the swap persists across reboots
└── …
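The steps above map to a handful of commands. Because they need root and write to disk, the sketch below only prints them for review instead of running them; `swapfile_commands`, the `/swapfile` path and the 1024 MB size are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# swapfile_commands PATH SIZE_MB: print (do not run) the commands that
# create, secure, format and enable a swap file, plus the fstab entry.
swapfile_commands() {
    local path="$1" size_mb="$2"
    cat <<EOF
dd if=/dev/zero of=$path bs=1M count=$size_mb
chown root:root $path
chmod 600 $path
mkswap $path
swapon $path
echo '$path none swap sw 0 0' >> /etc/fstab
EOF
}

# Example:
#   swapfile_commands /swapfile 1024
# After running the printed commands as root, verify with:
#   swapon --show
#   free -h
```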
Issue 11: Unable to run certain commands
Approach / Solution:
.
├── Troubleshooting and Resolution
│   ├── Command
│   │   ├── Could be a system command that non-root users are not allowed to run
│   │   └── Could be a user-defined script/command
│   └── Troubleshooting
│       ├── Permission/ownership of the command/script
│       ├── sudo permission
│       ├── Absolute/relative path of the command/script
│       ├── Command's directory is not in the user's $PATH variable
│       ├── Command is not installed
│       └── A library the command needs is missing or deleted
└── …
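A few of the checks above can be sketched in one function. `diagnose_command` is a hypothetical name; it only covers the path, $PATH and execute-permission cases, not sudo rules or missing libraries.

```shell
#!/usr/bin/env bash
# diagnose_command NAME_OR_PATH: print a short reason why a command
# might not run: "missing", "not-executable", "not-in-path" or "ok".
diagnose_command() {
    local cmd="$1"
    case "$cmd" in
        */*)
            # An explicit path was given: check the file itself.
            if [ ! -e "$cmd" ]; then
                echo "missing"
            elif [ ! -x "$cmd" ]; then
                echo "not-executable"
            else
                echo "ok"
            fi
            ;;
        *)
            # A bare name: the shell must find it via $PATH (or as a builtin).
            if command -v "$cmd" >/dev/null 2>&1; then
                echo "ok"
            else
                echo "not-in-path"
            fi
            ;;
    esac
}

# Example:
#   diagnose_command ./deploy.sh
#   diagnose_command rsync
# For a binary that exists but fails to start, look for missing libraries:
#   ldd "$(command -v rsync)" | grep "not found"
```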
Issue 12: System reboots unexpectedly or a process restarts
Approach / Solution:
.
├── Troubleshooting and Resolution
│   ├── System reboot/crash reasons
│   │   ├── CPU stress
│   │   ├── RAM stress
│   │   ├── Kernel fault
│   │   └── Hardware fault
│   ├── Process restart
│   │   ├── System reboot
│   │   ├── Restart itself
│   │   └── Watchdog application
│   │       ├── To prevent high stress on system resources
│   │       └── If an application is causing stress, the watchdog restarts or terminates it
│   └── Troubleshooting
│       ├── After logging in, check the status with commands like uptime, top, dmesg, journalctl, iostat -xz 1
│       ├── Check syslog.log, boot.log, dmesg, messages.log, etc.
│       ├── Check the custom log path of the application
│       ├── If the server is not fully accessible, use a virtual console such as iLO or iDRAC
│       └── Open a case and reach out to the vendor
└── …
Issue 13: Unable to get an IP address
Approach / Solution:
.
├── IP Assignment Methods
│   ├── DHCP
│   │   ├── Fixed Allocation
│   │   └── Dynamic Allocation
│   └── Static
├── Troubleshooting
│   ├── Check the network settings in the virtualization environment (VMware, VirtualBox, etc.)
│   ├── Check whether the IP address is assigned or not
│   ├── Check the NIC status from the host side using # lspci, # nmcli, etc.
│   └── Restart the network service
└── …
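The "is an IP address assigned" check can be scripted on top of the `ip` command from iproute2. A minimal sketch: `ipv4_of` is a made-up helper that filters the one-line-per-address output of `ip -o -4 addr show`, and `eth0` is only an example interface name.

```shell
#!/usr/bin/env bash
# ipv4_of IFACE: read `ip -o -4 addr show` output on stdin and print the
# IPv4 address/prefix assigned to IFACE, or nothing if it has none.
ipv4_of() {
    awk -v ifc="$1" '$2 == ifc && $3 == "inet" { print $4 }'
}

# Example:
#   addr="$(ip -o -4 addr show | ipv4_of eth0)"
#   [ -n "$addr" ] && echo "eth0 has $addr" || echo "eth0 has no IPv4 address"
# Link state (UP/DOWN) of all interfaces:
#   ip -br link show
```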
Issue 14: Backup and restore file permissions in Linux
Approach / Solution:
.
├── Troubleshooting
│   ├── The best option is to save the ACLs of the directories/files before changing permissions in bulk
│   │   ├── Back up the file permissions before changing them: ~$ getfacl -R <dir> > permissions.acl
│   │   └── Restore the file permissions (from the same directory): ~$ setfacl --restore=permissions.acl
│   ├── Restore from the VM snapshot (not always a good option for production)
│   └── Rebuild the VM (this option is safe for the future)
└── …
Useful Tips Related to Disk Partitions:
.
├── Tips
│   ├── After changing the size of an existing disk on a VM (sda here), rescan it with ~$ echo 1 > /sys/block/sda/device/rescan so that lsblk shows the new size. A newly attached disk usually needs a SCSI host scan instead: ~$ echo "- - -" > /sys/class/scsi_host/host<N>/scan
│   ├── If we increase the size of an existing disk, the additional space is appended to the disk without affecting the existing filesystem and partitions
│   ├── We can also recreate the filesystem on a block device, which formats it and erases the old filesystem
│   └── If we have a disk with a partition/filesystem already created, we can attach its .vmdk to another VM. After mounting it there, we get the same data as on the previous VM
└── END
If this helped you, do share it with your friends. Follow Neel Shah for the latest updates on DevOps!
Credit for the troubleshooting techniques above goes to an anonymous author, found on LinkedIn.