Context - Why was this issue created?
The integration tests for the WPSEO_Utils::sanitize_url() method (tests/WP/Inc/Utils_Test.php) are failing on the following case:
'with_non_encoded_non_latin_url' => [
'expected' => 'https://example.com/%da%af%d8%b1%d9%88%d9%87-%d8%aa%d9%84%da%af%d8%b1%d8%a7%d9%85-%d8%b3%d8%a6%d9%88',
'url_to_sanitize' => 'https://example.com/گروه-تلگرام-سئو',
]
The issue seems to be with the wp_parse_url() call, which returns a corrupted string for the URL's path (گر��-ت�گرا�-سئ�_ instead of گروه-تلگرام-سئو).
Considering this test case was written in March 2020, something may have changed in the wp_parse_url() implementation since then.
What is the goal of this issue?
- Restore the original WPSEO_Utils::sanitize_url() behaviour for URLs that have non-encoded non-Latin characters in their path.
What needs to be done to achieve the goal?
- Investigate the current behaviour of wp_parse_url()
- Change WPSEO_Utils::sanitize_url() accordingly
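As a starting point for the investigation step above, here is a minimal diagnostic sketch. It makes some assumptions: `describe_parsed_path()` is a hypothetical helper (not part of Yoast SEO or WordPress), and it calls PHP's native `parse_url()` directly as a stand-in for `wp_parse_url()`, which wraps it, so the script can run outside WordPress. It also records the current `LC_CTYPE` locale, on the unverified guess that byte classification in the underlying C library may differ between environments.

```php
<?php
// Hypothetical diagnostic helper, not part of Yoast SEO or WordPress.
// Assumption: PHP's native parse_url() stands in for wp_parse_url(),
// so this can run without a WordPress installation.
function describe_parsed_path( string $url, string $expected_path ): array {
    $actual_path = parse_url( $url, PHP_URL_PATH );

    return [
        // Passing '0' only queries the current locale; it does not change it.
        'locale'       => setlocale( LC_CTYPE, '0' ),
        'matches'      => ( $actual_path === $expected_path ),
        // Hex dumps make byte-level corruption visible regardless of terminal encoding.
        'expected_hex' => bin2hex( $expected_path ),
        'actual_hex'   => is_string( $actual_path ) ? bin2hex( $actual_path ) : null,
    ];
}

print_r( describe_parsed_path( 'https://example.com/گروه-تلگرام-سئو', '/گروه-تلگرام-سئو' ) );
```

Running this on both a passing and a failing environment and comparing the `locale` and `actual_hex` values should show which bytes get replaced and whether the environment plays a role.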
Does the issue still need UX or research?
No
If available, what are the tips for fixing the problem or possible solutions?
- If wp_parse_url() needs its input to be encoded, change the WPSEO_Utils::sanitize_url() behaviour accordingly. I would limit the change to the specific case covered by the test, i.e., when non-encoded non-Latin characters are present in the URL's path.
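One possible shape for the tip above, as a sketch only: percent-encode every non-ASCII byte before the URL reaches the parser, and leave everything else alone. `encode_non_ascii_bytes()` is a hypothetical helper and not the actual Yoast SEO fix, and PHP's native `parse_url()` is again used as a stand-in for `wp_parse_url()`. The hex digits are lowercased only to match the expected value in the failing test case.

```php
<?php
// Hypothetical helper, not the actual Yoast SEO implementation.
// Percent-encodes each byte in the 0x80-0xFF range, so ASCII characters
// and already-encoded sequences such as "%da%af" pass through unchanged.
function encode_non_ascii_bytes( string $url ): string {
    $encoded = preg_replace_callback(
        '/[\x80-\xFF]/', // No "u" flag: the pattern matches single bytes, not code points.
        static function ( array $match ): string {
            // rawurlencode() emits uppercase hex; lowercase it to match the test's expectation.
            return strtolower( rawurlencode( $match[0] ) );
        },
        $url
    );

    // preg_replace_callback() returns null on failure; fall back to the input in that case.
    return $encoded ?? $url;
}

$url     = 'https://example.com/گروه-تلگرام-سئو';
$encoded = encode_non_ascii_bytes( $url );

// The parser now only sees ASCII in the path.
echo parse_url( $encoded, PHP_URL_PATH ), PHP_EOL;
```

Because only bytes outside the ASCII range are touched, URLs that are plain ASCII or already percent-encoded come back unchanged, which keeps the change limited to the case covered by the test.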
What is the expected result/behavior?
WPSEO_Utils::sanitize_url() should return a correctly encoded URL, as expected in the integration test.
Should documentation be added or updated for this change? And if so, where?
No