
mb_split

(PHP 4 >= 4.2.0, PHP 5)

mb_split - Split multibyte strings using a regular expression

Description

array mb_split ( string $pattern , string $string [, int $limit = -1 ] )

Splits a multibyte string using the regular expression pattern and returns the result as an array.

Parameters

pattern

The regular expression pattern.

string

The string to be split.

limit

If the optional parameter limit is specified, the string is split into at most limit elements.

Return Values

The result as an array.
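
As an illustration of the limit parameter, here is a minimal sketch. It assumes the mbstring extension is loaded and that the source file is saved as UTF-8; the sample string and variable names are made up for this example.

```php
<?php
// Assumption: regex functions should treat input as UTF-8.
mb_regex_encoding('UTF-8');

$csv = 'あ,い,う,え';

// No limit: split at every comma.
$all = mb_split(',', $csv);
// $all is array('あ', 'い', 'う', 'え')

// limit = 2: at most two elements; the last one keeps the unsplit remainder.
$two = mb_split(',', $csv, 2);
// $two is array('あ', 'い,う,え')
```

With a limit, splitting stops early and the rest of the string is returned untouched in the final element.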

Notes

Note:

The character encoding specified by mb_regex_encoding() will be used as the character encoding for this function by default.
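
A minimal sketch of how this default might be handled in practice: set the regex encoding explicitly before splitting, then restore the previous value. It assumes UTF-8 input; the variable names are illustrative only.

```php
<?php
// Remember the current regex encoding so it can be restored afterwards.
$previous = mb_regex_encoding();

// Assumption: both the pattern and the subject are UTF-8.
mb_regex_encoding('UTF-8');
$parts = mb_split('、', '日、に、本');
// $parts is array('日', 'に', '本')

// Put the earlier setting back so other code is unaffected.
mb_regex_encoding($previous);
```

Restoring the earlier value keeps the change local, which may matter in code that shares a process with other libraries.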

See Also

  • mb_regex_encoding() - Set/get character encoding for multibyte regular expressions
  • mb_ereg() - Regular expression match with multibyte support


User Contributed Notes 6 notes

adjwilli at yahoo dot com
6 years ago
I figure most people will want a simple way to break up a multibyte string into its individual characters. Here's a function I'm using to do that. Change UTF-8 to your chosen encoding.

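<?php
function mbStringToArray($string) {
    $array  = array(); // initialize so an empty string returns an empty array
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen) {
        // Take the first character, then drop it from the string.
        $array[] = mb_substr($string, 0, 1, "UTF-8");
        $string  = mb_substr($string, 1, $strlen, "UTF-8");
        $strlen  = mb_strlen($string, "UTF-8");
    }
    return $array;
}
?>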
gunkan at terra dot es
2 years ago
To split a string like "日、に、本、ほん、語、ご" on the "、" delimiter, I used:

     $v = mb_split('、',"日、に、本、ほん、語、ご");

but it didn't work.

The solution was to set the encodings first:

     mb_regex_encoding('UTF-8');
     mb_internal_encoding("UTF-8");
     $v = mb_split('、',"日、に、本、ほん、語、ご");

and now it works:

Array
(
    [0] => 日
    [1] => に
    [2] => 本
    [3] => ほん
    [4] => 語
    [5] => ご
)
boukeversteegh at gmail dot com
3 years ago
The $pattern argument doesn't use /pattern/ delimiters, unlike other regex functions such as preg_match.

<?php
# Works. No slashes around the pattern.
print_r( mb_split('\s', "hello world") );
# Array (
#     [0] => hello
#     [1] => world
# )

# Doesn't work: the slashes are treated as literal characters.
print_r( mb_split('/\s/', "hello world") );
# Array (
#     [0] => hello world
# )
?>
boukeversteegh at gmail dot com
4 years ago
In addition to Sezer Yalcin's tip.

This function splits a multibyte string into an array of characters. Comparable to str_split().

<?php
function mb_str_split( $string ) {
    # Split at every position that is not right after the start: ^
    # and not right before the end: $
    return preg_split('/(?<!^)(?!$)/u', $string );
}

$string   = '火车票';
$charlist = mb_str_split( $string );

print_r( $charlist );
?>

# Prints:
Array
(
    [0] => 火
    [1] => 车
    [2] => 票
)
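
On PHP 7.4 and later, a user-defined mb_str_split() would collide with the built-in function of the same name. Below is a minimal sketch of a wrapper that prefers the built-in and otherwise falls back to a preg_split idiom. The name split_chars is hypothetical, and the sketch assumes valid UTF-8 input and PHP 7.0 or later for the type declarations.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function split_chars(string $s): array {
    if (function_exists('mb_str_split')) {
        // Built-in since PHP 7.4.
        return mb_str_split($s, 1, 'UTF-8');
    }
    // Fallback: an empty pattern with the u modifier matches between
    // code points; PREG_SPLIT_NO_EMPTY drops the empty edge pieces.
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

$chars = split_chars('火车票');
// $chars is array('火', '车', '票')
```

Either branch works per code point rather than per byte, so a string such as 'Weiße' should also come back as five elements.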
gert dot matern at web dot de
5 years ago
We are talking about multibyte (e.g. UTF-8) strings here, so preg_split will fail for the following string:

'Weiße Rosen sind nicht grün!'

Because I couldn't find a regex that simulates str_split, I optimized adjwilli's first solution a bit:

<?php
$string = 'Weiße Rosen sind nicht grün!';
$stop   = mb_strlen( $string);
$result = array();

for( $idx = 0; $idx < $stop; $idx++)
{
  $result[] = mb_substr( $string, $idx, 1);
}
?>

Here is an example with adjwilli's function:

<?php
mb_internal_encoding( 'UTF-8');
mb_regex_encoding( 'UTF-8');

function mbStringToArray( $string)
{
  $stop   = mb_strlen( $string);
  $result = array();

  for( $idx = 0; $idx < $stop; $idx++)
  {
    $result[] = mb_substr( $string, $idx, 1);
  }

  return $result;
}

// Pass true to print_r() so it returns the dump as a string for echo.
echo '<pre>', PHP_EOL,
  print_r( mbStringToArray( 'Weiße Rosen sind nicht grün!'), true), PHP_EOL,
  '</pre>';
?>

Let me know [by personal email] if anyone finds a regex that simulates str_split with mb_split.
qdb at kukmara dot ru
4 years ago
Another way to str_split a multibyte string:
<?php
$s = 'әӘөүҗңһ';

// UTF-32 uses exactly 4 bytes per character, so str_split() with a
// chunk size of 4 yields one character per element. (UTF-16 uses only
// 2 bytes for these characters, so 4-byte chunks would hold two each.)
//$temp_s=iconv('UTF-8','UTF-32BE',$s);
$temp_s = mb_convert_encoding($s, 'UTF-32BE', 'UTF-8');
$temp_a = str_split($temp_s, 4);
$temp_a_len = count($temp_a);
for ($i = 0; $i < $temp_a_len; $i++) {
    //$temp_a[$i]=iconv('UTF-32BE','UTF-8',$temp_a[$i]);
    $temp_a[$i] = mb_convert_encoding($temp_a[$i], 'UTF-8', 'UTF-32BE');
}

echo('<pre>');
print_r($temp_a);
echo('</pre>');

// It is also possible to work directly in UTF-32:
define('SLS', mb_convert_encoding('/', 'UTF-32BE'));
$temp_s = mb_convert_encoding($s, 'UTF-32BE', 'UTF-8');
$temp_a = str_split($temp_s, 4);
$temp_s = implode(SLS, $temp_a);
$temp_s = mb_convert_encoding($temp_s, 'UTF-8', 'UTF-32BE');
echo($temp_s);
?>