patternsqlMinor
Determine when a PostgreSQL database was last changed
Viewed 0 times
postgresqllastdatabasedeterminewhenwaschanged
Problem
I'm looking at altering how backups are done and am wondering if there is a way to determine which databases in a postgreql cluster have not been recently changed?
Instead of using pg_dumpall, I'd like to use pg_dump and only dump those databases that have changed since the last backup (some databases don't get updated very often)-- the idea being that if nothing has changed then the current backup should still be good.
Does anyone know of a way to determine when a specific database was last updated/changed?
Thanks...
Update:
I was hoping to not have to write triggers all over tha place as I have no control over
the creation of databases in one particular cluster (let alone creation of db objects within a database).
Digging further, it looks like there is a correlation between the contents of the $PGDATA/global/pg_database file (specifically the second field) and the directory names under $PGDATA/base.
Going out on a limb, I'd guess that the second field of the pg_database file is the database oid and that each database has its own subdirectory under $PGDATA/base (with the oid for the subdirectory name). Is that correct? If so, is it reasonable to use the file timestamps from the files under $PGDATA/base/* as the trigger for needing a backup?
...or is there a better way?
Thanks again...
Instead of using pg_dumpall, I'd like to use pg_dump and only dump those databases that have changed since the last backup (some databases don't get updated very often)-- the idea being that if nothing has changed then the current backup should still be good.
Does anyone know of a way to determine when a specific database was last updated/changed?
Thanks...
Update:
I was hoping to not have to write triggers all over tha place as I have no control over
the creation of databases in one particular cluster (let alone creation of db objects within a database).
Digging further, it looks like there is a correlation between the contents of the $PGDATA/global/pg_database file (specifically the second field) and the directory names under $PGDATA/base.
Going out on a limb, I'd guess that the second field of the pg_database file is the database oid and that each database has its own subdirectory under $PGDATA/base (with the oid for the subdirectory name). Is that correct? If so, is it reasonable to use the file timestamps from the files under $PGDATA/base/* as the trigger for needing a backup?
...or is there a better way?
Thanks again...
Solution
While using
In the off chance that this may be useful for others, I'm including the backup script that I've put in place. This works for Pg 8.4.x but not for 8.2.x-- YMMV depending on the version of Pg used.
``
my %curr_stats = parse_stats (@stats);
# do a backup if needed
foreach my $datname (sort keys %curr_stats) {
my $needs_backup = 0;
if ($opts{all}) {
$needs_backup = 1;
}
elsif ( ! exists $last_stats{$datname} ) {
$needs_backup = 1;
warn "no last stats for $datname\n" if ($opts{debug});
}
else {
for (qw (tup_inserted tup_updated tup_deleted)) {
if ($last_stats{$datname}{$_} != $curr_stats{$datname}{$_}) {
$needs_backup = 1;
warn "$_ stats do not match for $datname\n" if ($opts{debug});
}
}
}
if ($needs_backup) {
backup_db ($datname);
}
else {
chitchat ("Database \"$datname\" does not currently require backing up.");
}
}
# update the pg_stat_database data
open my $fh, '>', $last_stats_file || die "Could not open $last_stats_file for output. !$\n";
print $fh @stats;
close $fh;
}
sub parse_stats {
my @in = @_;
my %stats;
chomp @in;
select datname, xact_commit from pg_stat_database; as suggested by @Jack Douglas doesn't quite work (apparently due to autovacuum), select datname, tup_inserted, tup_updated, tup_deleted from pg_stat_database does appear to work. Both DML and DDL changes will change the values of tup_* columns while a vacuum does not (vacuum analyze on the other hand...).In the off chance that this may be useful for others, I'm including the backup script that I've put in place. This works for Pg 8.4.x but not for 8.2.x-- YMMV depending on the version of Pg used.
``
#!/usr/bin/env perl
=head1 Synopsis
pg_backup -- selectively backup a postgresql database cluster
=head1 Description
Perform backups (pg_dump*) of postgresql databases in a cluster on an
as needed basis.
For some database clusters, there may be databases that are:
a. rarely updated/changed and therefore shouldn't require dumping as
often as those databases that are frequently changed/updated.
b. are large enough that dumping them without need is undesirable.
The global data is always dumped without regard to whether any
individual databses need backing up or not.
=head1 Usage
pg_backup [OPTION]...
General options:
-F, --format=c|t|p output file format for data dumps
(custom, tar, plain text) (default is custom)
-a, --all backup (pg_dump) all databases in the cluster
(default is to only pg_dump databases that have
changed since the last backup)
--backup-dir directory to place backup files in
(default is ./backups)
-v, --verbose verbose mode
--help show this help, then exit
Connection options:
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port number
-U, --username=NAME connect as specified database user
-d, --database=NAME connect to database name for global data
=head1 Notes
This utility has been developed against PostgreSQL version 8.4.x. Older
versions of PostgreSQL may not work.
vacuum does not appear to trigger a backup unless there is actually
something to vacuum whereas vacuum analyze appears to always trigger a
backup.
=head1 Copyright and License
Copyright (C) 2011 by Gregory Siems
This library is free software; you can redistribute it and/or modify it
under the same terms as PostgreSQL itself, either PostgreSQL version
8.4 or, at your option, any later version of PostgreSQL you may have
available.
=cut
use strict;
use warnings;
use Getopt::Long;
use Data::Dumper;
use POSIX qw(strftime);
my %opts = get_options();
my $connect_options = '';
$connect_options .= "--$_=$opts{$_} " for (qw(username host port));
my $shared_dump_args = ($opts{verbose})
? $connect_options . ' --verbose '
: $connect_options;
my $backup_prefix = (exists $opts{host} && $opts{host} ne 'localhost')
? $opts{backup_dir} . '/' . $opts{host} . '-'
: $opts{backup_dir} . '/';
do_main();
########################################################################
sub do_main {
backup_globals();
my $last_stats_file = $backup_prefix . 'last_stats';
# get the previous pg_stat_database data
my %last_stats;
if ( -f $last_stats_file) {
%last_stats = parse_stats (split "\n", slurp_file ($last_stats_file));
}
# get the current pg_stat_database data
my $cmd = 'psql ' . $connect_options;
$cmd .= " $opts{database} " if (exists $opts{database});
$cmd .= "-Atc \"
select date_trunc('minute', now()), datid, datname,
xact_commit, tup_inserted, tup_updated, tup_deleted
from pg_stat_database
where datname not in ('template0','template1','postgres'); \"";
$cmd =~ s/\ns+/ /g;
my @stats = $cmd`;my %curr_stats = parse_stats (@stats);
# do a backup if needed
foreach my $datname (sort keys %curr_stats) {
my $needs_backup = 0;
if ($opts{all}) {
$needs_backup = 1;
}
elsif ( ! exists $last_stats{$datname} ) {
$needs_backup = 1;
warn "no last stats for $datname\n" if ($opts{debug});
}
else {
for (qw (tup_inserted tup_updated tup_deleted)) {
if ($last_stats{$datname}{$_} != $curr_stats{$datname}{$_}) {
$needs_backup = 1;
warn "$_ stats do not match for $datname\n" if ($opts{debug});
}
}
}
if ($needs_backup) {
backup_db ($datname);
}
else {
chitchat ("Database \"$datname\" does not currently require backing up.");
}
}
# update the pg_stat_database data
open my $fh, '>', $last_stats_file || die "Could not open $last_stats_file for output. !$\n";
print $fh @stats;
close $fh;
}
sub parse_stats {
my @in = @_;
my %stats;
chomp @in;
Code Snippets
#!/usr/bin/env perl
=head1 Synopsis
pg_backup -- selectively backup a postgresql database cluster
=head1 Description
Perform backups (pg_dump*) of postgresql databases in a cluster on an
as needed basis.
For some database clusters, there may be databases that are:
a. rarely updated/changed and therefore shouldn't require dumping as
often as those databases that are frequently changed/updated.
b. are large enough that dumping them without need is undesirable.
The global data is always dumped without regard to whether any
individual databses need backing up or not.
=head1 Usage
pg_backup [OPTION]...
General options:
-F, --format=c|t|p output file format for data dumps
(custom, tar, plain text) (default is custom)
-a, --all backup (pg_dump) all databases in the cluster
(default is to only pg_dump databases that have
changed since the last backup)
--backup-dir directory to place backup files in
(default is ./backups)
-v, --verbose verbose mode
--help show this help, then exit
Connection options:
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port number
-U, --username=NAME connect as specified database user
-d, --database=NAME connect to database name for global data
=head1 Notes
This utility has been developed against PostgreSQL version 8.4.x. Older
versions of PostgreSQL may not work.
`vacuum` does not appear to trigger a backup unless there is actually
something to vacuum whereas `vacuum analyze` appears to always trigger a
backup.
=head1 Copyright and License
Copyright (C) 2011 by Gregory Siems
This library is free software; you can redistribute it and/or modify it
under the same terms as PostgreSQL itself, either PostgreSQL version
8.4 or, at your option, any later version of PostgreSQL you may have
available.
=cut
use strict;
use warnings;
use Getopt::Long;
use Data::Dumper;
use POSIX qw(strftime);
my %opts = get_options();
my $connect_options = '';
$connect_options .= "--$_=$opts{$_} " for (qw(username host port));
my $shared_dump_args = ($opts{verbose})
? $connect_options . ' --verbose '
: $connect_options;
my $backup_prefix = (exists $opts{host} && $opts{host} ne 'localhost')
? $opts{backup_dir} . '/' . $opts{host} . '-'
: $opts{backup_dir} . '/';
do_main();
########################################################################
sub do_main {
backup_globals();
my $last_stats_file = $backup_prefix . 'last_stats';
# get the previous pg_stat_database data
my %last_stats;
if ( -f $last_stats_file) {
%last_stats = parse_stats (split "\n", slurp_file ($last_stats_file));
}
# get the current pg_stat_database data
my $cmd = 'psql ' . $connect_options;
$cmd .= " $opts{database} " if (exists $opts{database});
$cmd .= Context
StackExchange Database Administrators Q#5110, answer score: 9
Revisions (0)
No revisions yet.