  1. Percona Server for MySQL
  2. PS-5635

Introduce crypt_schema 2 for better error checking in encryption threads.


    Details

      Description

      Keyring decryption error handling is quite different from MK decryption error handling: one comes from MariaDB, the other from MySQL. The MySQL error handling checks whether an encryption key with the given key id can be retrieved from the keyring, and then checks whether the retrieved key is the correct one. All of this happens during tablespace validation, as part of InnoDB startup.

      The encryption threads, on the other hand, check whether a given encryption key id exists in the keyring in the same way MySQL does; however, whether the key is the correct one is never checked on server startup. MariaDB's implementation therefore added error handling in almost every place where a page can be retrieved from the buffer. This is a big diff.

      This approach is hard to maintain because it differs from the one upstream is taking. We have seen this with innodb_corrupt_table_action.

      This ticket is to "convert" MariaDB's error handling into the MySQL one. Please note that encryption by the encryption threads can result in a tablespace being encrypted with multiple key versions.

       

       From commit message:

      This WL implements MySQL way of validating if all needed encryption keys
      are present and valid - for tables encrypted by encryption threads.
      crypt_data, stored on page0 of a space, was extended with an encryption
      validation tag. For an unencrypted space this tag is just the string
      ENC_VAL_TAG_V1_1. crypt_data was also extended with max_key_version.
      When we rotate from not-encrypted to encrypted space (first encryption
      of a space) we execute the following steps (at the start of rotation):
      1) max_key_version is set to the latest version of encryption key that
      we use to encrypt the space.
      2) validation tag is encrypted with max_key_version of encryption key.
      3) DD space’s flag online_encryption is set to true.

      DD_SPACE_ONLINE_ENC_PROGRESS is a new flag added by this WL. We set it to true so
      not fully encrypted tables would be also validated on server startup.
      After a successful rotation DD_SPACE_ONLINE_ENC_PROGRESS flag will be replaced with
      encrypted flag, also min_key_version will be set to max_key_version.
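
      The state changes described above (start of the first encryption, then
      a successful rotation) can be sketched roughly as follows. This is an
      illustration only: SpaceState, start_first_encryption, finish_rotation,
      the dict-based keyring and the toy cipher are all hypothetical names
      introduced here, not the server's real structures or cipher.

```python
import hashlib
from dataclasses import dataclass

ENC_VAL_TAG = b"ENC_VAL_TAG_V1_1"  # plaintext tag named in the commit message


def _keystream(key: bytes, length: int) -> bytes:
    digest = hashlib.sha256(key).digest()
    return bytes(digest[i % len(digest)] for i in range(length))


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for the real cipher: XOR with a keystream, rotate left one byte."""
    mixed = bytes(b ^ s for b, s in zip(data, _keystream(key, len(data))))
    return mixed[1:] + mixed[:1]


def toy_decrypt(data: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt: rotate right one byte, then XOR."""
    unrotated = data[-1:] + data[:-1]
    return bytes(b ^ s for b, s in zip(unrotated, _keystream(key, len(data))))


@dataclass
class SpaceState:
    """Hypothetical view of the page0 crypt_data fields plus the two DD flags."""
    min_key_version: int = 0
    max_key_version: int = 0
    tag: bytes = ENC_VAL_TAG            # unencrypted space: tag is the plain string
    online_enc_in_progress: bool = False
    encrypted: bool = False


def start_first_encryption(space: SpaceState, key_id: int, keyring: dict) -> None:
    # keyring is assumed to map (key_id, version) -> key bytes.
    latest = max(version for (kid, version) in keyring if kid == key_id)
    space.max_key_version = latest                                      # step 1
    space.tag = toy_encrypt(ENC_VAL_TAG, keyring[(key_id, latest)])     # step 2
    space.online_enc_in_progress = True                                 # step 3


def finish_rotation(space: SpaceState) -> None:
    # The in-progress flag is replaced with the encrypted flag and
    # min_key_version catches up with max_key_version.
    space.online_enc_in_progress = False
    space.encrypted = True
    space.min_key_version = space.max_key_version
```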

      On server startup we validate if all needed keys are available (and are
      the correct ones) for such table by executing the steps:
      1) if encrypted flag is set or DD_SPACE_ONLINE_ENC_PROGRESS DD flag is set goto 2;
      else return OK
      2) fetch max_key_version version of a key_id encryption key from
      keyring; else return FAIL
      3) decrypt validation tag with key fetched in 2).
      4) compare decrypted validation tag with ENC_VAL_TAG_V1_1 – if it
      matches return OK; else return FAIL.
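
      The four startup steps above can be sketched as a single function. This
      is a minimal sketch under stated assumptions: validate_space, the
      dict-based keyring and the toy cipher are hypothetical and only stand in
      for the real InnoDB code and cipher.

```python
import hashlib

ENC_VAL_TAG = b"ENC_VAL_TAG_V1_1"  # plaintext tag named in the commit message


def _keystream(key: bytes, length: int) -> bytes:
    digest = hashlib.sha256(key).digest()
    return bytes(digest[i % len(digest)] for i in range(length))


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for the real cipher: XOR with a keystream, rotate left one byte."""
    mixed = bytes(b ^ s for b, s in zip(data, _keystream(key, len(data))))
    return mixed[1:] + mixed[:1]


def toy_decrypt(data: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt: rotate right one byte, then XOR."""
    unrotated = data[-1:] + data[:-1]
    return bytes(b ^ s for b, s in zip(unrotated, _keystream(key, len(data))))


def validate_space(encrypted_flag: bool, online_enc_in_progress: bool,
                   key_id: int, max_key_version: int,
                   stored_tag: bytes, keyring: dict) -> bool:
    """Return True for OK, False for FAIL. keyring maps (key_id, version) -> key."""
    # Step 1: nothing to validate unless one of the two flags is set.
    if not (encrypted_flag or online_enc_in_progress):
        return True
    # Step 2: fetch the max_key_version key; a missing key is a failure.
    key = keyring.get((key_id, max_key_version))
    if key is None:
        return False
    # Steps 3 and 4: decrypt the stored tag and compare with the plaintext.
    return toy_decrypt(stored_tag, key) == ENC_VAL_TAG
```

      A wrong key and a missing key both end in FAIL here; they differ only in
      the step at which validation stops.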

      When the rotation not encrypted => encrypted finishes, we remove the
      DD_SPACE_ONLINE_ENC_PROGRESS DD flag and replace it with the encryption flag.

      The situation is similar when we re-encrypt a space. The difference is
      that the validation tag is encrypted with a range of key versions
      [min_key_version; max_key_version], given min_key_version > 0, i.e. we
      are re-encrypting a fully encrypted space. We will now re-encrypt the
      validation tag with the range of key versions [max_key_version + 1;
      new_max_key_version]. We execute the following steps (at the beginning of
      re-encryption):
      1) we fetch latest version of encryption key from keyring. The latest
      version is saved as new_max_key_version.
      2) we fetch keys with versions [max_key_version + 1;
      new_max_key_version] from keyring.
      3) we re-encrypt already encrypted validation tag with key_versions from
      max_key_version + 1 till new_max_key_version. i.e. tag is already
      encrypted with keys from min_key_version to max_key_version, now we
      encrypt again this tag with max_key_version+1, then this re-encrypted
      tag gets encrypted with max_key_version+2 and so on, till we reach
      version new_max_key_version (tag is also encrypted with
      new_max_key_version).
      4) we update max_key_version to new_max_key_version and we store the
      max_key_version and encrypted validation tag on page0.
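
      The layering in steps 1) to 4) can be illustrated with a short sketch.
      CryptData, begin_reencryption, the dict-based keyring and the toy cipher
      are hypothetical names used only for illustration. The source does not
      say what happens when a key in the new range is missing, so this sketch
      simply lets the lookup raise KeyError.

```python
import hashlib
from dataclasses import dataclass

ENC_VAL_TAG = b"ENC_VAL_TAG_V1_1"  # plaintext tag named in the commit message


def _keystream(key: bytes, length: int) -> bytes:
    digest = hashlib.sha256(key).digest()
    return bytes(digest[i % len(digest)] for i in range(length))


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for the real cipher: XOR with a keystream, rotate left one byte."""
    mixed = bytes(b ^ s for b, s in zip(data, _keystream(key, len(data))))
    return mixed[1:] + mixed[:1]


def toy_decrypt(data: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt: rotate right one byte, then XOR."""
    unrotated = data[-1:] + data[:-1]
    return bytes(b ^ s for b, s in zip(unrotated, _keystream(key, len(data))))


@dataclass
class CryptData:
    """Hypothetical subset of the crypt_data fields stored on page0."""
    min_key_version: int
    max_key_version: int
    tag: bytes  # tag already encrypted with min_key_version..max_key_version


def begin_reencryption(crypt_data: CryptData, key_id: int, keyring: dict) -> None:
    # Step 1: the latest key version in the keyring becomes new_max_key_version.
    new_max = max(version for (kid, version) in keyring if kid == key_id)
    # Steps 2 and 3: wrap the already encrypted tag with each newer key in turn.
    tag = crypt_data.tag
    for version in range(crypt_data.max_key_version + 1, new_max + 1):
        tag = toy_encrypt(tag, keyring[(key_id, version)])  # KeyError if missing
    # Step 4: record the new max_key_version and tag (page0 in the real server).
    crypt_data.max_key_version = new_max
    crypt_data.tag = tag
```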

      When re-encryption is finished we encrypt validation tag plaintext
      (ENC_VAL_TAG_V1_1) with only max_key_version and save it on page0. We
      encrypt with only max_key_version, because this is the only key that we
      are currently using to encrypt this table (all pages have been
      re-encrypted with this key when rotation finishes).

      On server startup we validate if all needed keys are available (and are
      the correct ones) for such table by executing the steps:

      1) if encrypted flag is set goto step 2; else return OK.
      2) we fetch all keys versions from [min_key_version; max_key_version] if
      any of the keys is missing – then FAIL.
      3) we decrypt encrypted validation_tag with fetched keys version –
      starting from max_key_version and going down to min_key_version.
      4) we compare if final decrypted validation tag is equal to
      ENC_VAL_TAG_V1_1.
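
      The range validation above peels the layers off in the reverse of the
      order in which they were applied. A minimal sketch, assuming a
      dict-based keyring and a toy cipher (validate_key_range and both helpers
      are hypothetical, not the server's API):

```python
import hashlib

ENC_VAL_TAG = b"ENC_VAL_TAG_V1_1"  # plaintext tag named in the commit message


def _keystream(key: bytes, length: int) -> bytes:
    digest = hashlib.sha256(key).digest()
    return bytes(digest[i % len(digest)] for i in range(length))


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for the real cipher: XOR with a keystream, rotate left one byte."""
    mixed = bytes(b ^ s for b, s in zip(data, _keystream(key, len(data))))
    return mixed[1:] + mixed[:1]


def toy_decrypt(data: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt: rotate right one byte, then XOR."""
    unrotated = data[-1:] + data[:-1]
    return bytes(b ^ s for b, s in zip(unrotated, _keystream(key, len(data))))


def validate_key_range(encrypted_flag: bool, key_id: int,
                       min_key_version: int, max_key_version: int,
                       stored_tag: bytes, keyring: dict) -> bool:
    """Return True for OK, False for FAIL. keyring maps (key_id, version) -> key."""
    # Step 1: a space without the encrypted flag needs no validation.
    if not encrypted_flag:
        return True
    # Step 2: every version in [min_key_version; max_key_version] must exist.
    keys = []
    for version in range(min_key_version, max_key_version + 1):
        key = keyring.get((key_id, version))
        if key is None:
            return False
        keys.append(key)
    # Step 3: decrypt from max_key_version down to min_key_version.
    tag = stored_tag
    for key in reversed(keys):
        tag = toy_decrypt(tag, key)
    # Step 4: the final result must equal the plaintext tag.
    return tag == ENC_VAL_TAG
```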

      If the server crashes or is shut down before the encryption threads
      finish rotating the tables, we may be in a situation where we need to
      validate multiple key versions from min_key_version to max_key_version.
      However, normally the encryption threads will have a chance to fully
      rotate the spaces and there will be only one key to validate, i.e.
      max_key_version.

      In case a space is fully rotated we encrypt validation tag plaintext
      (i.e. ENC_VAL_TAG_V1_1) with max_key_version and save it to page0. Also
      we set min_key_version to max_key_version.

      There is also a corner case. In case we are validating a space for which
      only a subset of pages is
      encrypted, we may be in a situation that there are multiple encryption
      keys and they do not fully fill the range [min_key_version, max_key_version],
      since min_key_version == 0 is a marker that there are some unencrypted pages,
      without any version. The encryption keys might be in some range [n, max_key_version],
      where n > min_key_version and min_key_version = 0.
      Thus for this situation (min_key_version = 0) we validate the tag after
      each decryption. If the tag matches after any decryption it means we
      have all the valid keys we need to decrypt space.
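
      One way to read this corner case is sketched below: with
      min_key_version = 0, walk down from max_key_version and compare the tag
      after every decryption. The sketch assumes that real key versions start
      at 1 and that a missing key before any match means FAIL; neither detail
      is spelled out in the commit message. validate_partial, the dict-based
      keyring and the toy cipher are hypothetical.

```python
import hashlib

ENC_VAL_TAG = b"ENC_VAL_TAG_V1_1"  # plaintext tag named in the commit message


def _keystream(key: bytes, length: int) -> bytes:
    digest = hashlib.sha256(key).digest()
    return bytes(digest[i % len(digest)] for i in range(length))


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for the real cipher: XOR with a keystream, rotate left one byte."""
    mixed = bytes(b ^ s for b, s in zip(data, _keystream(key, len(data))))
    return mixed[1:] + mixed[:1]


def toy_decrypt(data: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt: rotate right one byte, then XOR."""
    unrotated = data[-1:] + data[:-1]
    return bytes(b ^ s for b, s in zip(unrotated, _keystream(key, len(data))))


def validate_partial(key_id: int, max_key_version: int,
                     stored_tag: bytes, keyring: dict) -> bool:
    """Validation for min_key_version == 0 (some pages still unencrypted)."""
    tag = stored_tag
    # Assumption: versions run from max_key_version down to 1.
    for version in range(max_key_version, 0, -1):
        key = keyring.get((key_id, version))
        if key is None:
            return False  # assumption: a gap before any match is a failure
        tag = toy_decrypt(tag, key)
        # Compare after each decryption: the tag may only be wrapped with
        # versions [n, max_key_version] for some n > 0.
        if tag == ENC_VAL_TAG:
            return True
    return False
```

      For a tag wrapped only with versions 3 and 4, validation succeeds after
      the second decryption and the older keys are never consulted.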

      This WL also changes the way uuid is stored on page0. Instead of using
      uuid as a string we write it as hex on page0.

      This WL also adds a new field to crypt_data, and thus to page0 –
      max_key_version.

      Since the page0 crypt_data information was changed (uuid, max_key_version) we
      also added a new private version (3) of crypt_data.
      Since we are not backward compatible yet with this feature, we fail the
      upgrade if table is either fully or partially encrypted by encryption
      threads.

      When we process redo log for writing crypt_data we also validate if all needed keys
      are available (and are the correct ones).

      This WL also removes MariaDB’s error handling for missing encryption
      keys or for when an incorrect key is used to decrypt a table. Decryption
      with an incorrect key would result in returning a corrupted page. Since
      MySQL does not cope with corrupted pages and in most cases crashes when
      it comes across one, the error handling was a big diff and it was
      hard to maintain. Also, we found out that not all cases were covered.

       

            People

            Assignee:
            robert.golebiowski Robert Golebiowski (Inactive)
            Reporter:
            robert.golebiowski Robert Golebiowski (Inactive)

                Time Tracking

                Estimated: Not Specified
                Remaining: Not Specified
                Logged: 4 weeks, 4 days, 5 hours, 5 minutes