HiveBrain v1.2.0

Extract query fingerprints and tables from large log files using pt-query-digest

Submitted by: @import:stackexchange-dba

Problem

I tried to parse a large log file with pt-query-digest, but it dies with "Out of memory!" when it reads the file. I have 3GB of RAM and 40GB of free space on the HDD; the file is 20GB. I did not find anything in the documentation about memory usage.

I reduced the command to:

pt-query-digest --group-by fingerprint --print --no-report /path/to/largeLogFile.log
pt-query-digest --table-access /path/to/largeLogFile.log


I have 2 questions:

- How can I extract the query fingerprints from a large log file generated by general_log?
- How can I extract all the tables that were used?

Edit:

I tried @DTest's --sample option (with 2 and with 1), but the result is the same: every time, just before it reaches 50% memory usage in htop, it dies (see screenshot):

Running strace just before it dies, I see:

read(3, "phone from teams as t ri"..., 4096) = 4096
mremap(0x5b27a000, 1543499776, 1543503872, MREMAP_MAYMOVE) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1543503872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x64c2d000)                         = 0x8c2e000
mmap2(NULL, 1543634944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x5b07a000
munmap(0x5b07a000, 548864)              = 0
munmap(0x5b200000, 499712)              = 0
mprotect(0x5b100000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 1543503872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "Out of memory!\n", 15)        = 15
munmap(0x5b27a000, 1543499776)          = 0
... [output removed] ...


The only option I can think of so far is splitting the file into "bite-size" pieces (say, 20 files of 1GB each), digesting each one, and merging the results.
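The splitting idea can be sketched in a few lines of Python. This is a minimal illustration, not a tested recipe: it cuts only at line boundaries, but note that a general_log entry for a multi-line query can still straddle a boundary, so a query at each seam may be mangled. All names here (split_log, the part_ prefix) are made up for the example.

```python
def split_log(path, chunk_bytes, prefix="part_"):
    """Split a log file into pieces of roughly chunk_bytes each,
    cutting only at line boundaries so lines stay intact."""
    part, written, out = 0, 0, None
    with open(path, "rb") as f:
        for line in f:
            # Start a new piece once the current one reaches the size limit.
            if out is None or written >= chunk_bytes:
                if out:
                    out.close()
                out = open(f"{prefix}{part:03d}", "wb")
                part += 1
                written = 0
            out.write(line)
            written += len(line)
    if out:
        out.close()
    return part  # number of pieces written
```

Each piece could then be digested separately (e.g. `pt-query-digest --print --no-report part_000`), though merging the per-piece aggregates back into one report would still need to be done by hand.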

Solution

I would try to use the --sample option:

pt-query-digest --sample 2 --print --no-report /path/to/largeLogFile.log


This filters out all but the first 2 occurrences of each query fingerprint. --no-report removes the memory and CPU overhead of aggregating results.
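To make "query fingerprint" concrete: a fingerprint is the query with its literal values abstracted away, so that structurally identical queries collapse to one form. The sketch below is a crude approximation for illustration only — pt-query-digest's actual fingerprinting handles many more cases (comments, collapsed UNIONs, etc.).

```python
import re

def fingerprint(sql):
    """Crude query fingerprint: lowercase, collapse whitespace,
    replace literals with ?, collapse IN-lists to (?+)."""
    s = sql.strip().lower()
    s = re.sub(r"\\'", "", s)            # drop escaped quotes inside strings
    s = re.sub(r"'[^']*'", "?", s)       # single-quoted string literals
    s = re.sub(r'"[^"]*"', "?", s)       # double-quoted string literals
    s = re.sub(r"\b\d+\b", "?", s)       # numeric literals
    s = re.sub(r"\s+", " ", s)           # collapse whitespace
    s = re.sub(r"in \((\?,? ?)+\)", "in (?+)", s)  # collapse IN-lists
    return s
```

For example, `SELECT * FROM t WHERE id = 42` and `SELECT * FROM t WHERE id = 7` both fingerprint to `select * from t where id = ?`, which is what --sample counts occurrences of.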

Code Snippets

pt-query-digest --sample 2 --print --no-report /path/to/largeLogFile.log
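For the second question (which tables were used), if the pt-query-digest route keeps running out of memory, a rough table inventory can be pulled from the log with a streaming regex pass. This is a naive sketch, not equivalent to pt-query-digest's table analysis: it misses subqueries, comma-joins beyond the first table, and schema-qualified names.

```python
import re

# Matches the identifier following FROM / JOIN / INTO / UPDATE,
# optionally backtick-quoted. Naive: ignores db.table qualifiers,
# subqueries, and "FROM a, b" comma-joins beyond the first table.
TABLE_RE = re.compile(r"\b(?:from|join|into|update)\s+`?(\w+)`?",
                      re.IGNORECASE)

def tables_used(sql):
    """Return the set of table names referenced by a query (lowercased)."""
    return {t.lower() for t in TABLE_RE.findall(sql)}
```

Applied line by line over the 20GB log, this keeps only the set of table names in memory, so it never needs to hold the file or the per-query aggregates.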

Context

StackExchange Database Administrators Q#21225, answer score: 2
