Linux performance analysis: Analyzing Linux performance

Analyzing Linux performance

This blog contains a basic guide for beginners on how to analyze performance issues on Linux servers.

There are many techniques and many ways to approach such issues; I'll try to describe the things that worked for me.
I'm always happy to learn new things, so if you think there are better ways or something is missing, let me know and I'll try to improve the page.

The three major bottlenecks:

1. CPU

2. Disk (IO)

3. Memory

When an application is not performing as expected, we usually try to find out which of the above is limiting it.

If, for example, the application is taking 100% of the server's CPU, it is CPU bound: the reason it can't go faster is a lack of CPU.

In some cases the solution may be adding CPU power; in others, fixing the application's algorithm.
Finding the resource that limits our application is important for system administrators and programmers alike.

Sometimes our application performs poorly because other applications are consuming the system's resources. This is why it is important to start with a broad check of the server and see which processes are using most of the resources.

Starting the analysis:

When starting to examine a new machine, we can begin with the uname command to see what setup we are dealing with. For example, "uname -a" prints all the information uname can report in a single line.

From its output we can learn basic information about the machine: the kernel name, version and release, processor type, hardware platform, operating system and node (host) name. This is useful, for example, to check whether we are on a 32-bit or 64-bit system, on the right host, and so on.
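If you want the same fields from a script, Python exposes them through os.uname() on POSIX systems. Below is a minimal sketch. The helper names and the list of 64-bit "uname -m" values are assumptions made for illustration, and the list is not exhaustive.

```python
import os

# Assumption: a hypothetical, non-exhaustive set of `uname -m` values for 64-bit kernels.
KNOWN_64BIT_MACHINES = {"x86_64", "aarch64", "ppc64", "ppc64le", "s390x", "riscv64", "sparc64"}


def describe_machine() -> dict:
    """Return roughly what `uname -a` prints (os.uname() is POSIX-only)."""
    u = os.uname()
    return {
        "kernel_name": u.sysname,      # e.g. Linux
        "node_name": u.nodename,       # host name
        "kernel_release": u.release,
        "kernel_version": u.version,
        "machine": u.machine,          # hardware name, as printed by `uname -m`
    }


def is_64bit_machine(machine: str) -> bool:
    """Guess whether a `uname -m` value denotes a 64-bit kernel (hypothetical helper)."""
    return machine in KNOWN_64BIT_MACHINES


if __name__ == "__main__":
    info = describe_machine()
    for key, value in info.items():
        print(f"{key:15} {value}")
    print("64-bit kernel:", is_64bit_machine(info["machine"]))
```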

The next tool we usually move to when checking a new setup is top.

It provides a good overview of the system, including data on the most significant processes.

For this blog post I installed an Ubuntu Server image.

Below you can see the output of the top command:

The first line is similar to what the uptime command provides:
the system time, uptime, number of logged-in users and load average.
The load average is very important information, so please start by reading about it in the command's man page:
uptime ubuntu man page

On an idle machine (like the one in the picture above), the load averages are very low, almost zero. If the machine has 2 CPUs and the load average is close to 2, the machine is fully occupied. On Linux, the load average counts two kinds of processes: those running or waiting for a CPU, and those in uninterruptible sleep (usually waiting for disk IO). So it gives us a basic summary of both resources. Memory problems usually turn into IO problems (for example, through swapping), so these numbers often reflect all three kinds of bottleneck.
Another benefit of the load average is that it covers the last 1, 5 and 15 minutes.
So just by looking at these numbers we can see the overall load over the last 15 minutes. By comparing it with the 1- and 5-minute averages, we can tell whether the load is steady, increasing or decreasing.
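The three numbers that uptime and top display also live in /proc/loadavg. Below is a minimal sketch that parses that file and compares the 1-minute average with the 15-minute one. The function names are hypothetical. The 10% tolerance that separates "steady" from a trend is an arbitrary assumption.

```python
from typing import NamedTuple


class LoadAverage(NamedTuple):
    one: float
    five: float
    fifteen: float


def parse_loadavg(text: str) -> LoadAverage:
    """Parse /proc/loadavg, e.g. "0.20 0.18 0.12 1/80 11206" (1, 5, 15 minutes first)."""
    one, five, fifteen = (float(v) for v in text.split()[:3])
    return LoadAverage(one, five, fifteen)


def load_trend(load: LoadAverage, tolerance: float = 0.1) -> str:
    """Classify the load as increasing, decreasing or steady (hypothetical helper).

    Compares the 1-minute average with the 15-minute one; `tolerance` is an
    arbitrary relative threshold chosen for this sketch.
    """
    baseline = max(load.fifteen, 0.01)  # avoid dividing by zero on an idle machine
    change = (load.one - load.fifteen) / baseline
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "steady"


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    print(load, load_trend(load))
```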

As the man page mentions, load averages are not normalized for the number of CPUs, so how high a given value is depends on how many cores the machine has.
To get more information on our cores, we can look at /proc/cpuinfo.
This file has plenty of data on the machine's setup, and you can extract more with some text processing, as done in LINUX: SHOW THE NUMBER OF CPU CORES AND SOCKETS ON YOUR SYSTEM
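Below is a sketch of that kind of /proc/cpuinfo processing in Python. It assumes the x86-style "processor", "physical id" and "core id" fields; other architectures may lack the last two, in which case the sketch reports None. It also divides a load average by the number of logical CPUs, since the raw value is not normalized. Both function names are hypothetical.

```python
def cpu_topology(cpuinfo: str) -> dict:
    """Count logical CPUs, physical cores and sockets in /proc/cpuinfo text.

    Assumption: x86-style "processor", "physical id" and "core id" keys;
    when the last two are missing, cores and sockets are reported as None.
    """
    logical = 0
    sockets = set()
    cores = set()
    for block in cpuinfo.strip().split("\n\n"):  # one block per logical CPU
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields and "core id" in fields:
            sockets.add(fields["physical id"])
            cores.add((fields["physical id"], fields["core id"]))
    return {
        "logical_cpus": logical,
        "sockets": len(sockets) or None,
        "physical_cores": len(cores) or None,
    }


def load_per_cpu(load_average: float, logical_cpus: int) -> float:
    """Roughly 1.0 means one running (or uninterruptible) task per logical CPU."""
    return load_average / logical_cpus


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        topo = cpu_topology(f.read())
    with open("/proc/loadavg") as f:
        one_minute = float(f.read().split()[0])
    print(topo, "1-min load per CPU:", round(load_per_cpu(one_minute, topo["logical_cpus"]), 2))
```

On a hyper-threaded machine, logical_cpus is larger than physical_cores, which matches the one-line-per-logical-CPU view you get in top.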

A simpler way to find the number of cores is to press 1 while in top.
This shows a line per CPU, as in the picture below:

Single-threaded vs multi-threaded

The top view above, with a line per core, is very helpful for understanding CPU bottlenecks in multicore setups.
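top computes its per-core lines from the tick counters in /proc/stat by sampling them twice and comparing. Below is a minimal sketch of the same calculation. It assumes the usual column order (user, nice, system, idle, iowait, irq, softirq, steal, ...). It treats iowait as idle time, and the function names are hypothetical.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (idle_ticks, total_ticks)} for every "cpu" line of /proc/stat."""
    snapshot = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu"):
            continue
        # user nice system idle iowait irq softirq steal; guest time is already in user/nice
        ticks = [int(v) for v in parts[1:9]]
        idle = ticks[3] + ticks[4]  # assumption: iowait counts as idle in this sketch
        snapshot[parts[0]] = (idle, sum(ticks))
    return snapshot


def per_cpu_busy(before, after):
    """Busy percentage per CPU between two snapshots ("cpu" is the all-CPU line)."""
    usage = {}
    for cpu, (idle_after, total_after) in after.items():
        idle_before, total_before = before[cpu]
        elapsed = total_after - total_before
        busy = elapsed - (idle_after - idle_before)
        usage[cpu] = 100.0 * busy / elapsed if elapsed else 0.0
    return usage


def read_proc_stat(path="/proc/stat"):
    with open(path) as f:
        return parse_proc_stat(f.read())


if __name__ == "__main__":
    first = read_proc_stat()
    time.sleep(1)
    for cpu, pct in sorted(per_cpu_busy(first, read_proc_stat()).items()):
        print(f"{cpu:6} {pct:5.1f}%")
```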

After pressing 1 in top on multicore, multi-CPU or hyper-threaded machines, you will see multiple lines, one per logical CPU. If you have 8 virtual CPUs, the maximum CPU usage top can show for a single process is 800% (when the process fully uses all 8 cores).

In many situations the application or task you are profiling is single-threaded, which means it can use only one of the 8 virtual CPUs. Your application may be CPU bound and limited by CPU power, yet the CPU summary line in top will show only 12.5% used (100/8), even though the application is taking 100% of a single core. This happens because top shows the average of all 8 cores, and the other 7 are idle. After pressing 1 and getting a line per core, you can see whether one of the cores is fully utilized.

Another good way to find CPU-bound applications is to check whether the top CPU processes are using 100% CPU (or more). If so, there is a chance that a single thread has reached the limit of one core. Remember that a single process can show up to 800% CPU on an 8-core setup, since its figure sums the usage of all the cores it runs on.
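The arithmetic behind the 12.5% figure is simply an average over the logical CPUs. Below is a tiny sketch with hypothetical function names. The 95% threshold for calling a core saturated is an arbitrary assumption.

```python
def summary_cpu_percent(per_core):
    """top's summary line: the mean busy percentage over all logical CPUs."""
    return sum(per_core) / len(per_core)


def max_process_percent(logical_cpus):
    """Upper bound top can show for one process, since its threads' usage is summed."""
    return 100.0 * logical_cpus


def saturated_cores(per_core, threshold=95.0):
    """Indexes of cores that look fully busy; the threshold is an arbitrary assumption."""
    return [i for i, pct in enumerate(per_core) if pct >= threshold]


if __name__ == "__main__":
    one_busy_of_eight = [100.0] + [0.0] * 7
    print(summary_cpu_percent(one_busy_of_eight))  # 12.5
    print(max_process_percent(8))                  # 800.0
    print(saturated_cores(one_busy_of_eight))      # [0]
```

The summary line hides the saturated core, while the per-core list exposes it. That is why pressing 1 helps.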

In the following example, you can see a two-core setup where 2 single-threaded processes take 100% CPU each (200% in total). Meanwhile, the CPU summary line at the top shows 100%, since it is the average of both cores.

* To simulate a single-threaded application using 100% CPU, you can run "cat /dev/urandom > /dev/null". This command copies random data to /dev/null, which takes a lot of CPU (to generate the random data) but barely touches the other resources.

By default, top sorts processes (or threads) by CPU consumption in descending order, so the processes taking the most CPU appear at the top of the table. This makes it easy to see whether specific processes are taking most of the CPU resources.
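A one-shot, script-friendly version of the same ranking can be built on "ps -eo pid,pcpu,comm". Be aware that ps computes %CPU as CPU time divided by the process's whole lifetime, not over the last refresh interval as top does. The numbers can therefore differ. Below is a minimal sketch; the function name is hypothetical.

```python
import subprocess


def top_cpu_processes(ps_output: str, limit: int = 5) -> list:
    """Parse `ps -eo pid,pcpu,comm` output and return the `limit` busiest rows."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the "PID %CPU COMMAND" header
        parts = line.split(None, 2)          # the command name may contain spaces
        if len(parts) == 3:
            pid, pcpu, comm = parts
            rows.append((float(pcpu), int(pid), comm))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    out = subprocess.run(["ps", "-eo", "pid,pcpu,comm"],
                         capture_output=True, text=True, check=True).stdout
    for pcpu, pid, comm in top_cpu_processes(out):
        print(f"{pid:>8} {pcpu:>6.1f}% {comm}")
```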

Suppose we have a multi-threaded application and want to know whether one of its threads is CPU bound. We can run top with "-H" on the command line, or press "H" while already in top, to switch from the process view to the thread view. This shows threads instead of processes. Since many Java applications name their running threads, we may even see the names of the specific running tasks.
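The thread view is built from the per-thread entries under /proc/<pid>/task/<tid>/. Below is a minimal sketch that lists each thread's name and its accumulated user+system CPU time in clock ticks. This is the same kind of data as top's TIME+ column; divide by os.sysconf("SC_CLK_TCK") to get seconds. The function names are hypothetical. To get a per-thread percentage, you would sample twice and subtract.

```python
import os


def parse_task_stat(stat_line: str):
    """Return (thread_name, utime + stime) from a /proc/<pid>/task/<tid>/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or ")".
    open_paren, close_paren = stat_line.index("("), stat_line.rindex(")")
    name = stat_line[open_paren + 1 : close_paren]
    fields = stat_line[close_paren + 2 :].split()     # fields[0] is the state (field 3)
    utime, stime = int(fields[11]), int(fields[12])   # fields 14 and 15 in proc(5)
    return name, utime + stime


def thread_cpu_ticks(pid: int, proc_root: str = "/proc") -> dict:
    """Map each thread id of `pid` to (name, CPU ticks used so far); hypothetical helper."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    result = {}
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                result[int(tid)] = parse_task_stat(f.read())
        except FileNotFoundError:  # the thread exited while we were scanning
            continue
    return result


if __name__ == "__main__":
    import sys
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    busiest = sorted(thread_cpu_ticks(pid).items(), key=lambda kv: kv[1][1], reverse=True)
    for tid, (name, ticks) in busiest[:10]:
        print(f"{tid:>8} {ticks:>10} {name}")
```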

Below you can see the result of a load test against the default Tomcat server on Ubuntu. The stress test is run with Apache JMeter and consists of simple HTTP GET requests for the default page.
When we run top in default mode, we see a single line for java (since there is a single Tomcat process) taking 134% CPU:

But after running top with "-H" we get:

Now we see each thread in the Tomcat process and the amount of CPU it takes.
Please note that, apart from the PID (which now shows the thread ID), %CPU, TIME+ and COMMAND columns, the values are taken from the shared process, so there is no point in summing them up.

Another useful key in top is "c". Pressing it switches the COMMAND column from displaying just the executable name to displaying the full command line. For example:

Now we can see the full path of the executable and its command-line parameters. Since the column is limited in width, you can use "ps -ef" to get the full command line you are looking for. For example, to find a java command line we can run "ps -ef | grep java". This lists every matching process with its full command line (including the grep command itself).
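On Linux, each process's arguments are stored in /proc/<pid>/cmdline, separated by NUL bytes. Below is a minimal Python equivalent of "ps -ef | grep java". The function names are hypothetical. The optional exclude_pid parameter lets the script skip its own process, which would otherwise match if the pattern appears in its own arguments.

```python
import os


def parse_cmdline(raw: bytes) -> list:
    """Split /proc/<pid>/cmdline: arguments are separated and terminated by NUL bytes."""
    if not raw:
        return []  # kernel threads have an empty command line
    return [arg.decode(errors="replace") for arg in raw.rstrip(b"\0").split(b"\0")]


def find_processes(pattern: str, proc_root: str = "/proc", exclude_pid=None) -> list:
    """Return sorted (pid, argv) pairs whose full command line contains `pattern`."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit() or int(entry) == exclude_pid:
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = parse_cmdline(f.read())
        except OSError:  # process exited or is not readable
            continue
        if argv and pattern in " ".join(argv):
            matches.append((int(entry), argv))
    return sorted(matches)


if __name__ == "__main__":
    for pid, argv in find_processes("java", exclude_pid=os.getpid()):
        print(pid, " ".join(argv))
```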

Basic profiling (for programmers)

There are lots of good professional profilers out there, to name just a few:
C / C++: VTune, OProfile, Callgrind
Java: YourKit, JProfiler, VisualVM, NetBeans Profiler

But often, taking a few snapshots of the process or thread that is consuming the CPU is enough to quickly identify the program's current bottleneck.
For that we can use pstack for C / C++ processes, or jstack for Java processes.
By taking several snapshots of the process's current stack trace, we can see what it is doing most of the time. If in 7 out of 10 stack snapshots the process is inside a method called CUtils::WastingCPUTime, this is probably a good place to start looking for code optimizations.
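Counting which function sits on top of each sampled stack turns the "7 out of 10" reasoning into numbers. Below is a minimal sketch. It assumes the snapshots have already been collected (for example, by running pstack or jstack several times) and parsed into lists of function names, innermost frame first. That parsing step is left out. The function name hot_frames and the CParser::Parse frame in the example are hypothetical.

```python
from collections import Counter


def hot_frames(snapshots, top_only=True):
    """Rank functions by the fraction of snapshots in which they appear.

    Each snapshot is a list of frame names, innermost (currently running) first.
    With top_only=True only the innermost frame counts ("self" time); otherwise
    every function on the stack counts once per snapshot ("total" time).
    """
    counts = Counter()
    for stack in snapshots:
        if not stack:
            continue
        counts.update([stack[0]] if top_only else set(stack))
    total = len(snapshots)
    return [(name, hits / total) for name, hits in counts.most_common()]


if __name__ == "__main__":
    samples = [["CUtils::WastingCPUTime", "main"]] * 7 + [["CParser::Parse", "main"]] * 3
    for name, share in hot_frames(samples):
        print(f"{share:6.0%}  {name}")
```

Counting only the innermost frame points at the code that is actually burning CPU. Counting every frame shows which callers lead there.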

I'm planning to add a lot more to this blog.
This is just a small sample to gather input.
Please leave me a comment if there are specific topics you are interested in.
Thanks in advance,
Avner

Sunday, February 8, 2015
