Hi guys,
PHP 7.3.23 + MySQLi 10.1.40-MariaDB-cll-lve DB + Joomla 3.9.22 + SEF Translate 4.7 Commercial
-
on a DB set as latin1_swedish_ci
- with different Tables collation: latin1_swedish_ci / utf8_general_ci / utf8mb4_general_ci / utf8mb4_unicode_ci
First - I change the entire DB collation through phpMyAdmin, following the "
Changing the default collation for a database" guide, by selecting:
utf8mb4_unicode_ci + "Change all tables collations" + "Change all tables columns collations"
Second - Admin Tools > Change Database Collation (on the existing tables) > Choose a collation > utf8mb4_unicode_ci > Apply
15 of all the 356 Tables are not moved and are always as utf8_general_ci
As you can see 2 are from SEF Translate.
1 - Won't this create any kind of issues ? Please, How to fix these tables ?
Or at the moment I just have to ignore them and leave them as they are after the transformation ?
Now, I read about it and, as far as I understand (in particular many thanks to Nicholas' help, Akeeba), it is a MySQL limitation, but also a "bad implementation" from you,
the technical explanation I found - I have the DB, the Table and the Column Collation / character set format. Often it depends on:
MySQL has a limit of 767 bytes for each index. Many extensions use a VARCHAR(255) column which is also indexed without a character limit.
This essentially tells MySQL to create an index with the maximum amount of data the column can hold. Plain old UTF-8 is up to 3 bytes per character and the column is set to use 255 characters therefore the index size is 3x255 = 765 bytes which is less than the 767 byte limit.
If this column were to be converted to UTF8MB4 each character can take up to 4 bytes (that's what MB4 means: MutiByte, up to 4 bytes per character). Therefore the index size would need to be 4*255 = 1020 bytes. However this is bigger than MySQL's 767 byte limit for indices. Therefore the table cannot be converted to UTF8MB4.
There are two solutions BUT they must be implemented
by the developer of the affected extension, you:
- A - to set the column to VARCHAR(191). The index in this case is 4*191 = 764 bytes which is under MySQL's limit. This is perfectly fine for most practical cases. Very few title fields need to go that big, even in languages which use a massive amount of characters per word (e.g. German and Greek)
- B - to keep the columns as VARCHAR(255) but to set a character limit to the index. This is the recommended solution for performance reasons but its applicability depends on the data shape. For example, a developer may choose to create an index on the column but limit it to 64 characters. This will only be a problem if you expect to have multiple rows with their first 64 characters in common. In fact, often, since it is very difficult to determine when creating a mass distributed extension which can be used in contexts, the developer can't even imagine and he prefers to convert the column to VARCHAR(191).
Now, if this is the case, Joomla added UTF8MB4 support nearly three years ago. The real reason for UTF8MB4 support in Joomla is security, because it mitigates a lot of trickery which could lead in SQL injections. Joomla 4 standard is (the old) utf8mb4_unicode_ci.
You should already have addressed it and I'm sure you are on it. It seems really a simple implementation and I'm really surprised that here we are still speaking about "utf8_general_ci", Do you agree ?
2 - Please, What is your plan on it ?