Monday, Dec 31, 2012

Using tcpdump and pt-query-digest for Low Level Traffic Analysis

Recently we had an "interesting" situation arise that required a bit of work and a bit of thought. Some of our slave servers began "hanging". We noticed that a table change hadn't propagated down to one of the slaves, and investigation showed that the master binary log execution position (Exec_Master_Log_Pos in the output of SHOW SLAVE STATUS\G) was not changing. There were no errors in the error log. Stopping the slave on the server and restarting it brought replication back online, but only for a few minutes. Then the exact same behaviour returned: no errors, but replication hung.
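The symptom above (Exec_Master_Log_Pos not advancing) can be checked with a small script. What follows is only a minimal sketch, not the procedure we used at the time. The function names `exec_pos`, `is_stalled` and `check_replication` are made up for illustration, and it assumes the mysql client can log in without prompting (for example via ~/.my.cnf). Bear in mind that an unchanged position is also normal when the master is simply idle, so a "stuck" result is a hint rather than proof.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of MySQL or Percona Toolkit.

# Read `SHOW SLAVE STATUS\G` output on stdin and print Exec_Master_Log_Pos.
exec_pos() {
    awk -F': *' '$1 ~ /^[[:space:]]*Exec_Master_Log_Pos$/ { print $2; exit }'
}

# Succeed (exit status 0) when both samples hold the same non-empty position.
is_stalled() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Sample the position twice, $1 seconds apart (default 30), and report.
# Assumption: `mysql` connects to the slave without extra options.
check_replication() {
    interval="${1:-30}"
    first=$(mysql -e 'SHOW SLAVE STATUS\G' | exec_pos)
    sleep "$interval"
    second=$(mysql -e 'SHOW SLAVE STATUS\G' | exec_pos)
    if is_stalled "$first" "$second"; then
        echo "Exec_Master_Log_Pos stuck at $first"
    else
        echo "Exec_Master_Log_Pos moved: $first -> $second"
    fi
}
```

Running `check_replication 60` on the slave would compare two samples taken a minute apart.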

It was an odd situation, and I suspected that the core problem was a network issue. However, I had to prove that to other people. It seemed like a good time to pull out one of the many tools in the Percona Toolkit. In this case I combined the standard Unix tcpdump tool with pt-query-digest to verify that the slave server was, in fact, periodically just "losing contact" with the master: it would send out a request, but the master would never receive it.

The following was run on the master server. Because this server had multiple slaves, I used tcpdump's host filter to narrow the captured data down to the specific slave (host.ip.address). The data was then piped into pt-query-digest, which gave me the results in real time.

tcpdump -s 65535 -x -nn -q -tttt -i any host host.ip.address | pt-query-digest --type tcpdump --print --noreport

The following was run on the slave: 

tcpdump -s 65535 -x -nn -q -tttt -i any port 3306 | pt-query-digest --type tcpdump --print --noreport

In this case I captured all traffic on port 3306 on any interface. After a few minutes of watching these two commands run simultaneously, it became very clear that the core issue was a network problem (since resolved, thankfully).
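If the evidence needs to be shared afterwards rather than watched live, the same capture can be written to a file and digested later. This is a hedged sketch, not what was run above. The function names, the file names in the usage note and the default packet count are assumptions for illustration. The tcpdump flags are the ones used earlier, plus `-c` to stop after a fixed number of packets.

```shell
#!/bin/sh
# Sketch only: function names and defaults are hypothetical.

# Capture a bounded number of packets as text (needs root privileges).
# $1: tcpdump filter expression, $2: output file, $3: packet count (default 10000)
capture_to_file() {
    tcpdump -s 65535 -x -nn -q -tttt -i any -c "${3:-10000}" "$1" > "$2"
}

# Analyse a previously saved capture with pt-query-digest's default report.
digest_capture() {
    pt-query-digest --type tcpdump "$1"
}
```

On the master this might be called as `capture_to_file 'host host.ip.address' master.tcp.txt`, and on the slave as `capture_to_file 'port 3306' slave.tcp.txt`, followed by `digest_capture` on each file.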