Details
-
Bug
-
Status: Done
-
Low
-
Resolution: Fixed
-
None
-
None
Description
**Reported in Launchpad by Pim van der Wal last update 23-08-2017 06:48:06
pt-table-sync 2.2.12
Percona Server 5.5.24
pt-table-sync does not pick up the character set from the table definition that it is using and instead uses the character set that is specified by the client. This can actually cause data to be modified on the master to the wrong character set. For example:
1. Create a utf8 table on the master
create table t1 (a int not null primary key, b varchar(200)) character set=utf8;
2. Insert utf8 data:
insert into t1 (a,b) values (1,'');
3. Remove the data on the slave only (sql_log_bin is only for master-master):
set sql_log_bin=0; delete from t1 where a=1;
4. Run pt-table-sync on the slave without specifying the character set:
pt-table-sync --print --sync-to-master h=localhost,D=test,t=t1
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', '??') /percona-toolkit src_db:test src_tbl:t1 src_dsn=test,P=3306,h=test-host-1,t=t1 dst_db:test dst_tbl:t1 dst_dsn
=test,h=localhost,t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:15724 user:root host:test-host-2/;
This will overwrite the data on the master with ??. Specifying --charset=utf8 fixes the problem for this table:
pt-table-sync --print --sync-to-master h=localhost,D=test,t=t1 --charset=utf8
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', '') /percona-toolkit src_db:test src_tbl:t1 src_dsn:A=utf8,D=test,P=3306,h=test-host-1,t=t1 dst_db:test dst_tbl:t1 dst_dsn:A=utf8,D=test,h=localhost,t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:16095 user:root host:test-host-2/;
Unfortunately that forces the same character set on all tables for that command which makes it harder to run on servers with multiple character sets. Ideally pt-table-sync would figure out what the character set the table in question is and generate the statements using the correct character set. To allow the output to contain multiple character sets the character set for non-default character set strings should be specified like this: _utf8'string' although I can see how this would be difficult for tables with multiple character sets and when output is captured in a file.