Description
Basic Information
From the forum: https://www.simplemachines.org/community/index.php?topic=593992.0
Some very old attachments and avatars were not migrated properly. For old attachments, I believe from early SMF 2.0 and 1.x (the cutoff is ~2009), the file hash in the attachment record is blank, and the file follows a specific naming convention in the file system. To look up the file and do the conversion, the upgrader has to reproduce that naming convention exactly. Part of that convention is some early logic that removed diacritics (accents) from the file system filenames.
I think this problem has been around a while, but not understood clearly until we had a forum with a LOT of very old attachments...
The "old school" logic looks like this in the 2.1 upgrader:
SMF/other/upgrade_2-1_mysql.sql
Line 473 in 05f4aa8
| // Remove international characters (windows-1252) |
The 2.0 and 1.1 attachment display logic each differed slightly. This logic (getLegacyAttachmentFilename()) has changed multiple times over the years, I think mainly because PHP syntax changes threw curveballs at the old logic. Earlier versions of that source code actually had binary chars in the source (SMF 1.1, getLegacyAttachmentFilename() in Subs.php).
Note that I recently wrote an attachment fixit utility using this logic borrowed from 2.0:
https://github.com/sbulen/sjrbTools/blob/48327cfc4f65277993838ef3d00e479a2a3226b2/smf_attachment_fix.php#L259
(It's been a while, but...) When testing that utility, I found the upgrader logic wasn't working for utf8 source DBs. But I found that emulating some earlier 2.0 upgrader logic for attachment retrieval worked better... This utility was successful at fixing the problem in the forum.
2.0 version for comparison:
SMF/other/upgrade_2-0_mysql.sql
Line 1268 in 05f4aa8
| $clean_name = strtr($filename, 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy'); |
So... The fix...
We need to incorporate that 2.0-like logic into the 2.1 upgrader. It works for utf8 source DBs, whereas the existing logic does not.
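To make the idea concrete, here is a minimal sketch of applying the 2.0 character table to a UTF-8 filename. It is an illustration only, not the upgrader's code: the function name fold_legacy_diacritics() is hypothetical, and it assumes the filename arrives as valid UTF-8. The three-argument form of strtr() translates single bytes, so the sketch splits the accented list into whole characters and uses the array form of strtr() instead.

```php
<?php
// Hypothetical helper (not part of SMF): folds the characters listed in the
// 2.0 upgrader's strtr() call to ASCII. Assumes $filename is valid UTF-8.
function fold_legacy_diacritics(string $filename): string
{
    // Character table copied from the 2.0 upgrader line quoted above.
    $from = 'ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ';
    $to = 'SZszYAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy';

    // Split the accented list into whole UTF-8 characters so that each one
    // pairs with exactly one ASCII replacement, whatever its byte length.
    $chars = preg_split('//u', $from, -1, PREG_SPLIT_NO_EMPTY);
    $map = array_combine($chars, str_split($to));

    // The array form of strtr() replaces whole substrings, not single bytes.
    return strtr($filename, $map);
}

echo fold_legacy_diacritics('Éléphant_Ñandú.jpg'), "\n"; // prints "Elephant_Nandu.jpg"
```

Whether this folding matches the names actually present on disk for a given forum still has to be confirmed against real data.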
The part that is unclear to me, & needs more testing, is whether we need to keep the existing logic (or use different logic) for non-utf8 source DBs.
So this requires more research.
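While that research is open, one possible approach is to try more than one cleanup rule per attachment and keep whichever one points at a file that actually exists. The sketch below assumes exactly that. The function name find_legacy_attachment_file() and the idea of passing name builders as callables are hypothetical, and the builders in the usage example are placeholders, not SMF's real naming convention.

```php
<?php
// Hypothetical helper (not part of SMF): tries each candidate name builder
// in order and returns the first resulting path that exists on disk.
function find_legacy_attachment_file(string $dir, string $filename, array $nameBuilders): ?string
{
    foreach ($nameBuilders as $build) {
        // Each builder turns the stored filename into an on-disk name.
        $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $build($filename);
        if (is_file($path)) {
            return $path;
        }
    }

    // No candidate matched; the caller can log the attachment for review.
    return null;
}

// Usage with placeholder builders (assumptions, for illustration only).
$builders = [
    'as_stored' => fn(string $name): string => $name,
    'folded'    => fn(string $name): string => strtr($name, ['é' => 'e']),
];
var_dump(find_legacy_attachment_file(sys_get_temp_dir(), 'café.jpg', $builders));
```

Ordering the builders puts the preferred rule first, and a null result identifies the attachments that no rule can locate.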
Steps to reproduce
- Upgrade a 2.0 or earlier forum with lots of old attachments, including attachments with diacritics (accents) in the filenames. The key sign that the old-school logic applies is a blank file_hash on the attachment record. These are the attachments that get skipped, because the upgrader cannot find the file in the file system.
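To narrow a test forum down to the affected records, the blank file_hash check can be sketched as a small filter over attachment rows. The function name select_legacy_attachments() is hypothetical, and the row keys id_attach and filename are assumptions about the attachments table; only file_hash is taken from the description above.

```php
<?php
// Hypothetical helper (not part of SMF): keeps only the attachment rows
// whose file_hash is missing or empty, i.e. the legacy-named ones.
function select_legacy_attachments(array $rows): array
{
    return array_values(array_filter($rows, function (array $row): bool {
        return !isset($row['file_hash']) || $row['file_hash'] === '';
    }));
}

// Sample rows; the values are made up for illustration.
$rows = [
    ['id_attach' => 1, 'filename' => 'résumé.doc', 'file_hash' => ''],
    ['id_attach' => 2, 'filename' => 'photo.jpg', 'file_hash' => 'abc123'],
];
print_r(array_column(select_legacy_attachments($rows), 'id_attach'));
```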
Expected result
No response
Actual result
No response
Version/Git revision
3.0 Alpha 4 & 2.1.7
Database Engine
All
Database Version
8.4.4
PHP Version
8.4.5
Logs
Additional Information
No response