2011-04-03

Tracing From An "idle in transaction" Postgres Backend To The Responsible Process On A Remote System

We've all seen those nasty little "idle in transaction" processes. So I can remember, here's how to go from an unhappy postgres backend to the guilty process on a remote system.

1) Find the unhappy backend

$ ps auxww | grep postgres | grep 'idle in transaction'
postgres 22838 1.2 0.3 74476 31736 ? Rs 16:40 4:36 postgres: nutricate trans_production 10.177.66.254(47257) idle in transaction

2) We now have a remote IP and port number. PostgreSQL 9.0 is quite forthcoming about this, but older versions (and other servers) aren't quite so friendly. Once you have the PID of the unhappy postmaster, you can use either lsof or netstat to find out about the tcp connection, if ps didn't give it to you.

$ sudo netstat -ep | grep $PID
tcp 0 138 db-trans01:postgresql db-reports01:57882 ESTABLISHED postgres 702200 22838/postgres: nutr

3) ssh into the remote server (db-reports01 in the above output) and use netstat again and grep for the port number (57882 in the output above) to figure out which process owns the other end of the tcp connection.

$ sudo netstat -ep | grep 57882
tcp 0 90 db-reports01:57882 db-trans01:postgresql ESTABLISHED nutricate 672248278 16388/python2.6 

4) We now have the PID of the misbehaving process and can delve deeper to find out what it is, and eventually what it's doing.

$ ps auxww | grep 16388
2000 16388 67.6 8.0 275056 89624 pts/8 Rl+ 16:26 212:26 /usr/bin/python2.6 lib/nutricate/django/reports/summarize_parallel.py root

0 comments: