has got wrong. Sometimes these are temporary performance issues, and other times
they recur periodically. That is when I really want to know what is going wrong.
Of course, I would first like to know what the broad causes are before I even
start debugging. Is the hardware too old to run current software? Has a
particular Linux distro like Ubuntu made bad performance choices, so that I
should just switch distros? Is there a fundamental problem in the way Linux is
designed, so that I need to move to a better-designed OS like BSD, MacOS, or
Vista? Is there a known severe performance bug in some middleware (like the X
server), such as making blocking system calls?
However, these questions are often hard to answer because of the complex ways
in which software systems interact with each other. For example, Firefox may
start eating up memory and cause my editor to be swapped out, so that I cannot
even type. Some may say that Linux provides no isolation between applications.
Others may say that Firefox, or the JavaScript interpreter in Firefox, is a
memory hog. An equally likely guess is that the web page I tried to load
contains badly written JavaScript code or bad data. But these are still
guesses, and I would like to know why I am suffering.
An exact answer requires first finding out which application is
misbehaving. After that, more curious people can profile the application
on some dataset to find the bottlenecks and perhaps a fix. In this post
I will just talk about the broad tools available in Linux that help you
find the misbehaving application or set of applications. After this, one
can use tools like gprof or the specialized profiling tools that come with
a particular application. Most of this is pretty trivial, and I am not
disclosing anything new.
First and foremost, just run the command "top" and check the memory usage, CPU
usage, and overall load. You will see something like:
The load average denotes the load on the system averaged over three time
periods: one, five, and fifteen minutes. It counts the processes that are
running or waiting for a CPU (on Linux it also counts processes blocked in
uninterruptible disk I/O). My current laptop (a Dell Inspiron with a 13.3" form
factor) has two CPUs (a dual-core CPU), so a load below 2 means my CPUs are not
overloaded. If the CPUs were overloaded, I would check which programs at the top
of the list are taking most of the CPU time. The next important thing to look
at in the figure is the memory usage. It shows a total memory of 3GB, of which
roughly 1.6GB is already used. In fact, if the system has been running for a
while, you will find that almost the entire 3GB is used. You may ask why so
much memory is used. The answer is that Linux caches very aggressively: it
keeps any disk block that you read from or write to the disk in memory, which
improves the performance of applications. If memory is available, why not use
it? The kernel reclaims this cache when applications need the memory.
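The numbers in top's header come from files under /proc, so they are easy to check from a script as well. Below is a minimal Python sketch, assuming a Linux system with /proc mounted; the function names are my own, hypothetical ones. It reads the three load averages from /proc/loadavg, compares the one-minute value with the number of CPUs, and splits the used memory from /proc/meminfo into the page cache (Buffers plus Cached, a rough approximation) and the rest, which is what applications actually hold.

```python
import os

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    return tuple(float(field) for field in text.split()[:3])

def parse_meminfo(text):
    """Return /proc/meminfo content as a dict of field name -> number (usually kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info

def cpu_overloaded(loads, ncpus):
    """Treat a 1-minute load above the number of CPUs as overloaded."""
    return loads[0] > ncpus

def memory_summary(info):
    """Split used memory (kB) into page cache and the rest, held by applications."""
    total = info["MemTotal"]
    used = total - info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)  # rough approximation
    return {"total": total, "used": used, "cache": cache, "apps": used - cache}

def main():
    with open("/proc/loadavg") as f:
        loads = parse_loadavg(f.read())
    ncpus = os.cpu_count() or 1
    print(f"load {loads}, {ncpus} CPUs, overloaded: {cpu_overloaded(loads, ncpus)}")
    with open("/proc/meminfo") as f:
        summary = memory_summary(parse_meminfo(f.read()))
    for key, kb in summary.items():
        print(f"{key:>6}: {kb // 1024} MB")

# Only touch the real /proc files when running on Linux.
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    main()
```

On a machine that has been up for a while, the cache line will typically account for a large part of the "used" memory, which is the caching effect described above.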
You will also see that the virtual memory of Xorg (the X server), Firefox, and
Rhythmbox (the music player) is very high. Most of them use more than half a GB
of virtual memory. My first thought was that these applications are memory
hogs. However, I examined the virtual memory of Firefox by running
"less /proc/26745/smaps" (26745 being the process ID of Firefox). I found that
the heap is just 60MB, and the rest of the virtual memory is mostly taken up by
the different libraries mapped in as shared objects. You do not need to worry
about these, as Linux keeps only one copy of each shared object in physical
memory and shares it across all the applications that use it.
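To repeat the smaps check without reading the whole file by eye, here is a minimal Python sketch that adds up the Size: field (the virtual size in kB) of every mapping, grouped into heap, stack, shared libraries, other files and anonymous memory. The grouping rules, in particular treating any path ending in .so or .so.N as a shared library, are my own assumptions, as are the function names and the script name used below.

```python
import re
import sys
from collections import defaultdict

SHARED_LIB = re.compile(r"\.so(\.[\d.]+)?$")  # libc-2.31.so, libstdc++.so.6.0.28, ...

def categorize(path):
    """Rough, hypothetical grouping of a mapping by its pathname column."""
    if path == "[heap]":
        return "heap"
    if path.startswith("[stack"):
        return "stack"
    if SHARED_LIB.search(path):
        return "shared libraries"
    if path.startswith("/"):
        return "other files"
    if path == "":
        return "anonymous"
    return "other"  # e.g. [vdso], [vvar]

def summarize_smaps(text):
    """Sum the Size: field (virtual size, kB) of each mapping, grouped by category."""
    totals = defaultdict(int)
    current = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].endswith(":"):
            # Per-mapping field such as "Size:  132 kB" or "VmFlags: rd wr ..."
            if tokens[0] == "Size:" and current is not None:
                totals[current] += int(tokens[1])
        else:
            # Mapping header: address range, perms, offset, device, inode, [pathname]
            current = categorize(" ".join(tokens[5:]))
    return dict(totals)

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(f"/proc/{sys.argv[1]}/smaps") as f:
        totals = summarize_smaps(f.read())
    for category, kb in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{category:>16}: {kb / 1024:8.1f} MB")
```

Saved as, say, smaps_summary.py, it would be run as "python3 smaps_summary.py 26745". Summing the Rss: or Pss: fields instead of Size: would show physical memory rather than virtual; Pss divides each shared page among the processes that map it, so shared libraries no longer look as large.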
Sometimes the machine may not have enough memory to run all the applications,
and you will find that everything is slow. This may mean that the applications
are hitting the disk because memory is being swapped. To look at the disk
activity, monitor the virtual memory statistics together with disk I/O. Run
"vmstat 1", which prints a new line of statistics every second, and you will
see something like:
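The important things to look at are the swap activity (the si and so columns,
memory swapped in from and out to disk) and the I/O activity (the bi and bo
columns, blocks read from and written to disk). If you see sustained heavy
read or write activity in any of these, your applications are I/O bound.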
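vmstat's output can also be checked from a script. The minimal Python sketch below assumes a procps-style vmstat whose column header includes si, so, bi and bo; it maps each sample line to those column names and flags samples that show any swapping or heavy block I/O. The function names and the threshold of 1000 blocks per second are arbitrary choices of mine, not anything vmstat defines. It skips the first sample, because vmstat's first line reports averages since boot rather than current activity.

```python
import shutil
import subprocess

def parse_vmstat(text):
    """Turn vmstat output into a list of {column name: int} dicts, one per sample."""
    header = None
    samples = []
    for line in text.splitlines():
        tokens = line.split()
        if "si" in tokens and "so" in tokens:
            header = tokens  # e.g. r b swpd free buff cache si so bi bo in cs us sy id wa st
        elif header and tokens and len(tokens) == len(header) and all(t.isdigit() for t in tokens):
            samples.append(dict(zip(header, map(int, tokens))))
    return samples

def io_pressure(samples, io_threshold=1000):
    """Return samples that show any swapping or block I/O above io_threshold (arbitrary)."""
    return [s for s in samples
            if s["si"] + s["so"] > 0 or s["bi"] + s["bo"] > io_threshold]

if __name__ == "__main__" and shutil.which("vmstat"):
    out = subprocess.run(["vmstat", "1", "3"], capture_output=True, text=True).stdout
    samples = parse_vmstat(out)[1:]  # the first sample is an average since boot
    for s in io_pressure(samples):
        print(f"swap in/out: {s['si']}/{s['so']}  blocks in/out: {s['bi']}/{s['bo']}")
```

Matching columns by their header names rather than by position keeps the parser working if a vmstat version adds or reorders columns.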
Another tool to look at is latencytop. It requires root privileges to run, and you will see something like:
You can monitor the overall system latency as well as the latency of individual
applications. It attributes the time that processes spend waiting to causes
such as fsync on the disk or reading from a pipe, and shows the main causes of
latency for a particular application. Thanks to Jon Oberheide for
bringing this tool to my attention.
Finally, if any of your applications is stuck, you may want to use another tool
called strace. First, find out the process ID of the application. For example,
to find the process ID of Firefox, run the command "ps aux|grep firefox". Then
run "strace -p" followed by that process ID, and you will see the system call,
if any, that it is stuck in.
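Both the process lookup and a first hint about why a process is stuck can be read from /proc/<pid>/stat. Here is a minimal Python sketch, assuming Linux; the function names are hypothetical. Unlike "ps aux|grep firefox", it matches the exact command name (which the kernel truncates to 15 characters) and also reports the process state, for example D for a process waiting in uninterruptible disk I/O, before suggesting the strace command to run.

```python
import os
import sys

STATES = {"R": "running", "S": "sleeping", "D": "uninterruptible (usually disk I/O)",
          "Z": "zombie", "T": "stopped"}

def parse_stat(text):
    """Parse /proc/<pid>/stat content into (pid, command name, state letter)."""
    # The command name is in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 2]
    return pid, comm, state

def find_processes(name, proc_root="/proc"):
    """Return [(pid, state)] for processes whose command name equals name."""
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if comm == name:
            matches.append((pid, state))
    return matches

if __name__ == "__main__" and len(sys.argv) > 1:
    for pid, state in find_processes(sys.argv[1]):
        print(f"{pid}: {STATES.get(state, state)}  ->  try: strace -p {pid}")
```

A process that stays in state D is waiting on the kernel, typically for disk I/O, which ties back to the vmstat and latencytop checks above.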