The function transliterator_transliterate()
helps you create a clean, transliterated string from user input. It can be used to build URL addresses (slugs) or to sanitize the names of uploaded files.
Install php-intl package
First, install the PHP Intl extension. Without it, calling transliterator_transliterate()
fails with an error like this:
Call to undefined function transliterator_transliterate()
Commands for Ubuntu 16.04 LTS (Xenial Xerus) with PHP 7.0:
sudo apt-get install php-intl
sudo service php7.0-fpm restart
Commands for Ubuntu 20.04 LTS (Focal Fossa) with PHP 7.4:
sudo apt install php-intl
sudo service php7.4-fpm restart
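After the restart, it can be worth confirming that the extension is actually loaded before relying on it. The following is a minimal sketch of such a check; the variable name $intlAvailable and the messages are only illustrative:

```php
<?php
// The intl extension provides transliterator_transliterate().
$intlAvailable = extension_loaded('intl') && function_exists('transliterator_transliterate');

if ($intlAvailable) {
    echo 'intl is available', PHP_EOL;
} else {
    echo 'intl is missing: install php-intl and restart PHP-FPM', PHP_EOL;
}
```

Note that the command line and PHP-FPM can use different php.ini files, so run the check through the web server if that is where the function will be called.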
Basic code sample
This line is the core of the technique:
$string = transliterator_transliterate('Any-Latin;Latin-ASCII;', $string);
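As a sketch of the URL slug use case, the same call can be wrapped in a small helper. The function name slugify() is hypothetical (it is not part of PHP or the intl extension), and using the hyphen as a separator is an assumption:

```php
<?php
// Hypothetical helper, not part of PHP or the intl extension.
function slugify(string $text): string
{
    // Transliterate to ASCII (requires the intl extension).
    $ascii = transliterator_transliterate('Any-Latin;Latin-ASCII;', $text);
    // Lowercase, then collapse every run of other characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    // Drop hyphens left at either end.
    return trim($slug, '-');
}

echo slugify('Žluťoučký kůň!'), PHP_EOL; // zlutoucky-kun
```

Lowercasing after transliteration keeps the character class simple, because by then only ASCII letters remain.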
Sanitize (transliterate) uploaded filename
It transliterates non-ASCII characters into ASCII (毛泽东 -> mao ze dong).
This is the complete test.php. Its fileName() function transliterates the name of an uploaded file into Latin characters:
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Test</title>
</head>
<body>
<?php
function pr($string) {
print '<hr>';
print '"' . fileName($string) . '"';
print '<br>';
print '"' . $string . '"';
}
function fileName($string) {
// remove html tags
$clean = strip_tags($string);
// transliterate
$clean = transliterator_transliterate('Any-Latin;Latin-ASCII;', $clean);
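// replace whitespace with '_', then remove everything except letters, digits, '-' and '_'
// (the hyphen is last in each character class; PCRE2 in PHP 7.3+ rejects '\w-\.' as an invalid range)
$clean = str_replace('--', '-', preg_replace('/[^a-z0-9_-]/i', '', preg_replace(array(
'/\s/',
'/[^\w.-]/'
), array(
'_',
''
), $clean)));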
// replace '-' with '_'
$clean = strtr($clean, array(
'-' => '_'
));
// remove double '__'
$positionInString = stripos($clean, '__');
while ($positionInString !== false) {
$clean = str_replace('__', '_', $clean);
$positionInString = stripos($clean, '__');
}
// remove '_' from the end and beginning of the string
$clean = rtrim(ltrim($clean, '_'), '_');
// lowercase the string
return strtolower($clean);
}
pr('_replace(\'~&([a-z]{1,2})(ac134/56f4315981743 8765475[]lt7ňl2ú5äňú138yé73ťž7ýľute|');
pr(htmlspecialchars('<script>alert(\'hacked\')</script>'));
pr('Álix----_Ãxel!?!?');
pr('áéíóúÁÉÍÓÚ');
pr('üÿÄËÏÖÜ.ŸåÅ');
pr('nie4č a a§ôňäääaš');
pr('Мао Цзэдун');
pr('毛泽东');
pr('ماو تسي تونغ');
pr('مائو تسهتونگ');
pr('מאו דזה-דונג');
pr('მაო ძედუნი');
pr('Mao Trạch Đông');
pr('毛澤東');
pr('เหมา เจ๋อตง');
?>
</body>
</html>
PHP 8 update: Custom transliteration rules
Sometimes you need custom transliteration rules for a specific language. Our example is from Russian, where the letter “ш”, which is read as “sh” in English, is transliterated by transliterator_transliterate()
as “s”.
The following code solves this:
$str = 'Финиш';
$rules = <<<'RULES'
:: NFC ;
ё > e; ж > zh; й > i; х > kh; ц > ts; ч > ch; ш > sh; щ > shch; ъ > ie;
э > e; ю > iu; я > ia;
:: Cyrillic-Latin ;
RULES;
$tls = Transliterator::createFromRules($rules);
echo $tls->transliterate($str) . PHP_EOL;
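The rules above list only lowercase letters, so an uppercase “Ш” would still fall through to the generic Cyrillic-Latin step. The sketch below is one possible extension, not part of the original solution: it adds uppercase variants for a few letters, appends a Latin-ASCII step as an assumption about the desired output, and checks for a failed rule parse:

```php
<?php
// Illustrative rule set: the uppercase rules and the Latin-ASCII step are additions.
$rules = <<<'RULES'
:: NFC ;
Ж > Zh; ж > zh; Ч > Ch; ч > ch; Ш > Sh; ш > sh;
:: Cyrillic-Latin ;
:: Latin-ASCII ;
RULES;

$tls = Transliterator::createFromRules($rules);
if ($tls === null) {
    // createFromRules() returns null when the rules cannot be parsed.
    echo intl_get_error_message(), PHP_EOL;
} else {
    echo $tls->transliterate('Шишка'), PHP_EOL; // Shishka
}
```

Title-case replacements such as “Sh” suit capitalized words. Text written entirely in capitals would need additional rules.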
More about the Transliterator class is on PHP.net. More transliteration solutions are here.