Monitoring Linux Performance – 12 Essential Commands for Sysadmins

As a Linux sysadmin, keeping track of system performance is critical. A sluggish or overloaded server can lead to unhappy users, lost revenue, and even system crashes.

Having the right tools and know-how to monitor, analyze and troubleshoot performance issues can help identify problems early and prevent impact to users.

In this comprehensive guide, we will explore 12 powerful command line tools in Linux to inspect different aspects of system performance.

Why Monitoring Performance Matters

While Linux is capable of running smoothly for months without issues, servers under load can develop problems over time that degrade responsiveness, such as:

  • High CPU or RAM utilization
  • Slow disk I/O
  • Network bottlenecks
  • Running out of free memory
  • Hardware component failures

Performance problems manifest in different ways – websites loading slowly, applications hanging or timing out, users unable to login, and so on.

Without monitoring, these issues may only be discovered after users start complaining about disruption, leading to extended downtime and lost business.

With active monitoring, however, abnormalities can be detected early and preemptive action taken before major impact. Proactive monitoring therefore minimizes the problems users face.

Now let's get into the commands…

Monitoring CPU/Process Activity

1. top

The top command provides a dynamic real-time view of the running system. It can display a sorted list of processes ranked by CPU, memory, and other metrics.

Here is an example output:

*(screenshot: top command output)*

Key fields include:

  • %CPU – CPU utilization percentage
  • %MEM – Memory utilization percentage
  • TIME+ – Total CPU time used
  • PID – Process ID

Monitoring top helps identify processes hogging CPU cycles or memory. Spikes may indicate a process is stuck in a loop or leaking memory.

top also shows total CPU utilization in its summary header; if this value is consistently high (over 80%), it likely indicates insufficient processing power.

The limitation is that top only gives a snapshot of current process activity. For longer-term data, a tool like sar is better suited.
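For scripted or logged snapshots, top also has a non-interactive batch mode. A small sketch (the head count is an arbitrary choice, just enough to show the summary header and the busiest processes):

```shell
# One non-interactive snapshot; keep the summary header plus
# the first few process rows (sorted by CPU usage by default).
top -b -n 1 | head -n 12
```

Redirecting this to a file on a cron schedule gives a crude history when sar is not available.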

2. ps

The ps command gives a static, one-time view of running processes. The most common invocations:

ps aux - BSD-style snapshot of all running processes, including their owners
ps -ef - full-format listing of every process

Example showing a truncated list:

*(screenshot: ps output)*

Filtering ps output for a specific process helps determine whether it is running.
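For example, to check on a given daemon (sshd here is just an illustrative name), the bracket trick keeps grep from matching its own process in the list:

```shell
# Check whether a given daemon is running; the [s] stops
# grep from matching its own entry in the process list.
ps aux | grep '[s]shd' || echo "sshd is not running"

# pgrep does the same job directly, printing matching PIDs.
pgrep sshd || echo "no sshd processes found"
```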

3. pidstat

The pidstat tool monitors and reports statistics for Linux tasks/processes. It can measure CPU, memory, and I/O usage per process and system-wide.

Monitoring CPU per process ID:

pidstat -p 1234 -u

*(screenshot: pidstat output per process)*

System-wide CPU stats:

pidstat -u

The tool is very versatile for drilling into usage, either system-wide or at the per-process level, over custom intervals.

Handy for correlating process spikes seen in top / ps with more detailed metrics.
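pidstat also accepts an interval and count, which is what makes those custom intervals possible. A sketch taking three samples two seconds apart (pidstat ships with the sysstat package, which may not be installed by default):

```shell
# Three CPU samples, two seconds apart, for all active tasks
# (pidstat is part of the sysstat package).
command -v pidstat >/dev/null && pidstat -u 2 3 || echo "sysstat is not installed"
```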

Memory Usage Commands

4. free

The most basic command for checking memory utilization is free. With the -m flag, it displays total, used and free memory in MiB:

free -m

           total        used        free      shared  buff/cache   available
Mem:        7854        1075         311           2        6467        6182
Swap:         0           0           0

If available memory is consistently low, or swap usage is high, it likely indicates insufficient RAM. Upgrading memory or trimming the application footprint could help.
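The "available" column is the figure worth alerting on. A minimal sketch of extracting it for a monitoring script (the 500 MiB threshold is an arbitrary example value):

```shell
# Available memory in MiB, taken from the "Mem:" row of free -m.
avail=$(free -m | awk '/^Mem:/ {print $7}')
# 500 MiB is an arbitrary example threshold.
if [ "$avail" -lt 500 ]; then
    echo "WARNING: only ${avail} MiB available"
else
    echo "OK: ${avail} MiB available"
fi
```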

5. /proc/meminfo

A more detailed output is available in /proc/meminfo:

cat /proc/meminfo

MemTotal:        7853332 kB
MemFree:          312884 kB
MemAvailable:    6181796 kB 
...

This helps break down slab and page cache usage in memory along with other advanced metrics. Any abnormal values are easier to spot.

Monitoring /proc/meminfo helps correlate metrics like page faults, swapping activity with application/OS behavior.
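Since the full file runs to dozens of fields, a few headline figures can be pulled out directly (the field selection here is just one example):

```shell
# Pull out the headline memory and swap figures from the kernel.
grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree|Dirty):' /proc/meminfo
```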

Storage Performance Commands

6. iostat

The iostat tool monitors and reports input/output statistics for block devices and partitions.

To view basic disk I/O rates:

iostat -d -x -k

*(screenshot: iostat output)*

Metrics like await (the average time each I/O request spends queued and being serviced) indicate issues if consistently high. This can happen with failing disks or a saturated I/O subsystem.

To break statistics down per partition, the -p option can be used. This is handy for tracing a storage bottleneck to a specific filesystem.

7. ioping

ioping measures disk read/write latency in a way similar to the classic ping. It helps determine whether storage is slowing things down.

ioping -c 10 .

--- . (ext4 /dev/sda1) ioping statistics ---
10 requests completed in 123.4 ms, 81 iops, 324.1 KiB/s
min/avg/max/mdev = 8.1/12.3/25.4/4.2 ms

Here we see the disk handles roughly 80 IOPS with an average latency of 12 ms, which is typical for a spinning disk. On an SSD, latency would be far lower and IOPS far higher.

Latency well above a device's normal baseline (tens of milliseconds for spinning disks, above a millisecond or so for SSDs) typically indicates a struggling disk or array.

Network Monitoring Commands

8. netstat

The netstat command prints network connections, routing tables, interface statistics and more.

Some examples –

Active connections:

netstat -natp

Network interfaces:

netstat -i

Kernel IP routing table:

netstat -r

Any errors in these outputs indicate failures, bottlenecks or misconfiguration. Useful for tracing connectivity issues.
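A common quick check is counting connections grouped by state, for example to spot a flood of TIME-WAIT sockets. A sketch using ss, netstat's modern replacement from the iproute2 package, which takes similar flags:

```shell
# Count TCP sockets grouped by state (LISTEN, ESTAB, TIME-WAIT, ...).
# Skips the header row, then tallies column 1.
ss -tan | awk 'NR > 1 {count[$1]++} END {for (s in count) print s, count[s]}'
```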

9. tcpdump

The tcpdump tool captures network packet data that matches specific criteria. This helps analyze traffic flowing between application servers.

Below dumps packets going to/from port 80:

tcpdump -i any -n -A 'port 80'

*(screenshot: tcpdump output)*

Monitoring raw traffic is useful for verifying known-good behavior and identifying anomalies. Issues like duplicate packets, timeouts, and unusual payloads are all captured.

The limitation is that manually analyzing dumps is complex, so tcpdump is most useful for focused troubleshooting.

System Health and Activity Reporting

10. uptime

The most basic indicator of system health is the uptime command. It displays how long the system has been running, the number of logged-in users, and the 1/5/15 minute load averages.

For example:

uptime

15:17:34 up  2:23,  3 users,  load average: 0.33, 0.36, 0.59

The load averages quantify demand: the average number of processes running or waiting to run over each time period.

Higher-than-normal load is the first indicator of potential performance degradation (usually from a CPU or disk bottleneck).
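Load should be read relative to the core count; as a rule of thumb it becomes concerning when it exceeds the number of CPU cores. A minimal sketch comparing the 1-minute average against the core count:

```shell
# 1-minute load average and the number of CPU cores.
load1=$(cut -d ' ' -f1 /proc/loadavg)
cores=$(nproc)
echo "load ${load1} across ${cores} cores"

# awk handles the floating-point comparison.
if awk -v l="$load1" -v c="$cores" 'BEGIN {exit !(l > c)}'; then
    echo "load exceeds core count"
else
    echo "load is within capacity"
fi
```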

11. dmesg

The Linux kernel manages devices and hardware errors, and keeps an internal diagnostic log called the kernel ring buffer.

Viewing the buffer with dmesg provides valuable troubleshooting data. Any hardware or driver errors and other unusual events get logged here.

dmesg | tail  

...
[93201.359338] bridge firewalling registered
[93211.115069] perf: interrupt took too long (3163 > 3162), lowering kernel.perf_event_max_sample_rate to 63000

Monitoring it regularly helps correlate user reported issues with any kernel alerts for that timeframe.
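Modern dmesg can filter by severity and print human-readable timestamps, which makes that correlation much easier (reading the buffer may require root on systems where kernel.dmesg_restrict is set):

```shell
# Only warnings and errors, with wall-clock timestamps
# instead of seconds-since-boot.
dmesg -T --level=err,warn | tail -n 20
```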

12. sar

The sar tool collects and reports extensive system activity metrics including –

  • CPU
  • Memory
  • Paging
  • Disk I/O
  • Network
  • Kernel tables

It does this system-wide as well as per CPU core and per disk device.

Reports get saved over time. Historical statistics then help determine peak usage patterns and anomalies.

View CPU use over 5 second intervals:

sar -u 5

*(screenshot: sar CPU usage over time)*

For storage devices:

sar -d 5

Handy for longer-term correlation and baselining expected load. Few tools match the breadth of metrics sar collects.
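Those saved reports can be replayed later with the -f flag. A sketch, assuming a RHEL-style archive path (Debian-based distros use /var/log/sysstat instead, and sa01 stands in for the day-of-month file you want):

```shell
# Replay CPU history from a saved daily file. Paths vary by distro:
# /var/log/sa/saDD on RHEL-like systems, /var/log/sysstat/saDD on Debian.
command -v sar >/dev/null && sar -u -f /var/log/sa/sa01 || echo "no sysstat history available"
```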

Real World Example

Let's walk through troubleshooting a hypothetical performance issue leveraging many of these tools.

  • Users start reporting website loading extremely slowly
  • Checking uptime, load averages are high – initial indicator of issue
  • Running top shows overall CPU usage over 90% consistently – system under high load
  • Sort top by CPU usage and MySQL is using 30%+ cycles – potential culprit identified
  • Get MySQL process ID and run pidstat to see CPU usage, disk IO over time – gather supporting metrics
  • Compare iostat disk metrics when MySQL CPU is high vs normal – looks like IO bottleneck
  • Check kernel message buffer (dmesg) and logs for errors – no hardware or driver failures

After gathering all this supporting data, we can conclude the MySQL server is causing resource contention leading to slow website performance under load.

Potential fixes include optimizing database queries, adding CPU capacity, or moving MySQL onto faster storage.

Conclusion

Linux provides a phenomenal set of tools for systems administrators to inspect running servers and troubleshoot issues. Knowing which ones to use for CPU, memory, disk and networking metrics takes time to master.

We covered the 12 most popular command line utilities used by sysadmins daily to monitor Linux performance. Each has specific capabilities, like short-term monitoring, long-term history, and system-level or drill-down per-process views.

Using a combination of these tools provides comprehensive visibility for proactive alerting and speedy diagnosis of performance degradation. They form an invaluable arsenal that every Linux professional should be equipped with to battle outages!