[PT-1551] pt-table-checksum fails on MySQL 8.0.11 Created: 11/May/18  Updated: 22/Jun/18  Resolved: 22/Jun/18

Status: Done
Project: Percona Toolkit
Component/s: None
Affects Version/s: None
Fix Version/s: 3.0.11

Type: Bug Priority: High
Reporter: Carlos Salguero Assignee: Carlos Salguero
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 0 minutes
Time Spent: 13 minutes
Original Estimate: Not Specified


 Description   

I am testing pt-table-checksum and this test is failing: t/pt-table-checksum/ignore_columns.t

If you have a sandbox with at least a master and a slave, you can reproduce the problem as follows:

I am going to assume the master is running on port 12345 and the slave is at port 12346

On the master, run:

CREATE DATABASE IF NOT EXISTS test; 
USE test; 

DROP TABLE IF EXISTS `issue_94`; 
CREATE TABLE `issue_94` ( 
    a INT NOT NULL, 
    b INT NOT NULL, 
    c CHAR(16) NOT NULL, 
    INDEX idx (a) 
); 

INSERT INTO issue_94 VALUES 
(1,2,'apple'),
(3,4,'banana'),
(5,6,'kiwi'),
(7,8,'orange'),
(9,10,'grape'),
(11,12,'coconut');

At this point, both the master and the slave have these rows:

mysql --port=12345 -e "SELECT * FROM test.issue_94"
+----+----+---------+ 
| a  | b  | c       | 
+----+----+---------+ 
| 1  | 2  | apple   | 
| 3  | 4  | banana  | 
| 5  | 6  | kiwi    | 
| 7  | 8  | orange  | 
| 9  | 10 | grape   |
| 11 | 12 | coconut | 
+----+----+---------+

On the slave, run:

UPDATE test.issue_94 SET c='';

Now column c is empty in every row on the slave:

mysql --port=12346 -e "SELECT * FROM test.issue_94" 
+----+----+---+
| a  | b  | c |
+----+----+---+
|  1 |  2 |   |
|  3 |  4 |   |
|  5 |  6 |   |
|  7 |  8 |   |
|  9 | 10 |   |
| 11 | 12 |   |
+----+----+---+

The problem starts here.
Running pt-table-checksum with all columns checksummed produces this output:

bin/pt-table-checksum h=127.0.0.1,P=12345,u=msandbox,p=msandbox -d test -t issue_94 
Checking if all tables can be checksummed ... 
Starting checksum ... 
       TS      ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME  TABLE 
05-08T15:57:51   0      1     6      0        1      0    0.094 test.issue_94

which is correct.

Then re-run pt-table-checksum ignoring column c:

bin/pt-table-checksum h=127.0.0.1,P=12345,u=msandbox,p=msandbox -d test -t issue_94 --ignore-columns c 
Checking if all tables can be checksummed ... 
Starting checksum ... 
       TS      ERRORS DIFFS ROWS DIFF_ROWS CHUNKS SKIPPED TIME  TABLE 
05-08T15:57:54   0      1     6      0        1      0    0.033 test.issue_94

As you can see, it reports DIFFS 1, which is incorrect: column c was ignored and all other columns have the same values on both servers.

If you run the same command again:

bin/pt-table-checksum h=127.0.0.1,P=12345,u=msandbox,p=msandbox -d test -t issue_94 --ignore-columns c

it shows the correct value, DIFFS 0. However, if you then re-run pt-table-checksum WITHOUT ignoring column c, the result is wrong again until you run it a second time. This happens both with and without --ignore-columns c: the first run reports a wrong value and the second run reports the correct one.

It seems that the first run is somehow reading a stale value, and re-running the program picks up the correct one.
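
One way to check for a stale value is to look at the rows pt-table-checksum wrote to its replicate table on the slave right after a run. The sketch below assumes the default replicate table (percona.checksums) and the sandbox credentials used above. The function name show_checksum_diffs is hypothetical and is not part of Percona Toolkit.

```shell
# Hypothetical helper: list the chunks whose checksum on this slave
# differs from the value recorded for the master.
# Assumes the default --replicate table, percona.checksums.
show_checksum_diffs() {
    replica_port=$1

    mysql -h 127.0.0.1 -P "$replica_port" -umsandbox -pmsandbox --table -e "
        SELECT db, tbl, chunk, this_cnt, master_cnt, this_crc, master_crc, ts
        FROM percona.checksums
        WHERE master_cnt <> this_cnt
           OR master_crc <> this_crc
           OR ISNULL(master_crc) <> ISNULL(this_crc)"
}
```

Calling show_checksum_diffs 12346 immediately after each pt-table-checksum run, and comparing the ts column between runs, should show whether the row on the slave had already been updated when the tool compared it.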



 Comments   
Comment by Carlos Salguero [ 15/May/18 ]

I found the cause of the issue. 

It seems that replication lag is not being checked correctly in MySQL (or is it a MySQL 8 problem?).
If I add a sleep here, before checking for the differences, it works.

Comment by Carlos Salguero [ 22/Jun/18 ]

PT-1551 New wait for master method to pt-table-checksum

This is part of PT-1554. While I was testing pt-table-checksum, ignore_columns.t was failing because the original method pt-table-checksum used to wait for the slaves to catch up wasn't enough.
I added a new method that calls MySQL's SELECT MASTER_POS_WAIT from the MasterSlave package.
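
For illustration, the general MASTER_POS_WAIT technique can be sketched from the command line. This is not the code added to pt-table-checksum. It is a minimal sketch that assumes the sandbox from the description (master on port 12345, slave on port 12346, user and password msandbox), and the function name wait_for_replica is hypothetical.

```shell
# Hypothetical helper: block until the slave has applied everything
# the master has written to its binary log so far.
# Usage: wait_for_replica MASTER_PORT REPLICA_PORT [TIMEOUT_SECONDS]
wait_for_replica() {
    master_port=$1
    replica_port=$2
    timeout=${3:-60}

    # SHOW MASTER STATUS returns File and Position as its first two columns;
    # --batch prints them tab-separated.
    status=$(mysql -h 127.0.0.1 -P "$master_port" -umsandbox -pmsandbox \
        --batch --skip-column-names -e "SHOW MASTER STATUS") || return 1
    log_file=$(printf '%s\n' "$status" | cut -f1)
    log_pos=$(printf '%s\n' "$status" | cut -f2)

    # MASTER_POS_WAIT returns a value >= 0 once the position is reached,
    # -1 if the timeout expires, and NULL if the slave SQL thread is not running.
    result=$(mysql -h 127.0.0.1 -P "$replica_port" -umsandbox -pmsandbox \
        --batch --skip-column-names \
        -e "SELECT MASTER_POS_WAIT('$log_file', $log_pos, $timeout)") || return 1

    case "$result" in
        NULL|-1|'') return 1 ;;   # not caught up, or replication not running
        *)          return 0 ;;   # slave reached the master's position
    esac
}
```

With the sandbox above, wait_for_replica 12345 12346 30 would return success once the slave has caught up, so a comparison made afterwards should not see the stale value described in the report.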

Generated at Wed Sep 19 05:05:03 UTC 2018 using Jira 7.12.1#712002-sha1:609a50578ba6bc73dbf8b05dddd7c04a04b6807c.