The Normalizer class

Introduction

Normalization transforms characters and sequences of characters into a formally defined underlying representation. It matters most when text is compared for sorting and searching, and it is also used when storing text so that the stored representation is consistent.
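
As an illustration of why this matters for comparison, the following minimal sketch compares two spellings of the same word. It assumes the intl extension is loaded; the sample strings and variable names are chosen only for this example.

```php
<?php
// Two visually identical spellings of "café":
$precomposed = "caf\u{00E9}";   // "é" as a single code point (U+00E9)
$decomposed  = "cafe\u{0301}";  // "e" followed by U+0301 COMBINING ACUTE ACCENT

// A byte-wise comparison treats them as different strings.
var_dump($precomposed === $decomposed); // bool(false)

// Normalizing both sides to the same form makes them comparable.
$a = Normalizer::normalize($precomposed, Normalizer::FORM_C);
$b = Normalizer::normalize($decomposed, Normalizer::FORM_C);
var_dump($a === $b); // bool(true)
```

The same approach applies before storing text: normalizing once on input means later lookups can compare stored values directly.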

The Unicode Consortium has defined a number of normalization forms reflecting the various needs of applications:

  • Normalization Form D (NFD) - Canonical Decomposition
  • Normalization Form C (NFC) - Canonical Decomposition followed by Canonical Composition
  • Normalization Form KD (NFKD) - Compatibility Decomposition
  • Normalization Form KC (NFKC) - Compatibility Decomposition followed by Canonical Composition

Each form is defined as a set of transformations on the text. These transformations are specified by both an algorithm and a set of data files.
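
The difference between the four forms can be sketched by normalizing one short string with each of them. The example assumes the intl extension is loaded; the input string and the $forms and $results names are illustrative only. The results are printed as hexadecimal UTF-8 bytes because the differences are not visible on screen.

```php
<?php
// "é" (U+00E9) followed by the "ﬁ" ligature (U+FB01).
$text = "\u{00E9}\u{FB01}";

$forms = [
    'NFD'  => Normalizer::FORM_D,
    'NFC'  => Normalizer::FORM_C,
    'NFKD' => Normalizer::FORM_KD,
    'NFKC' => Normalizer::FORM_KC,
];

$results = [];
foreach ($forms as $name => $form) {
    // Store the UTF-8 bytes of each normalized string as hex.
    $results[$name] = bin2hex(Normalizer::normalize($text, $form));
    printf("%-4s %s\n", $name, $results[$name]);
}

// Prints:
// NFD  65cc81efac81   (é decomposed, ligature kept)
// NFC  c3a9efac81     (é composed, ligature kept)
// NFKD 65cc816669     (é decomposed, ligature replaced by "fi")
// NFKC c3a96669       (é composed, ligature replaced by "fi")
```

The canonical forms (NFD, NFC) keep the ligature intact, while the compatibility forms (NFKD, NFKC) replace it with the plain letters "f" and "i".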

Class synopsis

Normalizer
/* Constants */
public const int Normalizer::FORM_D;
public const int Normalizer::NFD;
public const int Normalizer::FORM_KD;
public const int Normalizer::NFKD;
public const int Normalizer::FORM_C;
public const int Normalizer::NFC;
public const int Normalizer::FORM_KC;
public const int Normalizer::NFKC;
public const int Normalizer::FORM_KC_CF;
public const int Normalizer::NFKC_CF;
/* Methods */
public static string|null getRawDecomposition(string $string, int $form = Normalizer::FORM_C)
public static bool isNormalized(string $string, int $form = Normalizer::FORM_C)
public static string|false normalize(string $string, int $form = Normalizer::FORM_C)
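
A minimal sketch of how the three methods might be used together is shown below. It assumes the intl extension is loaded; the sample input and variable names are illustrative and not part of the class.

```php
<?php
$decomposed = "cafe\u{0301}"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

// isNormalized() reports whether a string is already in the given form.
$isNfc = Normalizer::isNormalized($decomposed, Normalizer::FORM_C); // false
$isNfd = Normalizer::isNormalized($decomposed, Normalizer::FORM_D); // true

// normalize() returns false on failure, so check before using the result.
$normalized = Normalizer::normalize($decomposed, Normalizer::FORM_C);
if ($normalized === false) {
    throw new RuntimeException('Input could not be normalized.');
}

// getRawDecomposition() takes a single code point and returns its
// decomposition mapping, or null when the code point has none.
$raw = Normalizer::getRawDecomposition("\u{00E9}");

var_dump($isNfc, $isNfd, bin2hex($normalized));
var_dump($raw === null ? null : bin2hex($raw));
```

Checking isNormalized() first can avoid a needless copy when most input is already in the target form.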

Predefined Constants

The following constants define the normalization form used by the normalizer:

Normalizer::FORM_C
Normalization Form C (NFC) - Canonical Decomposition followed by Canonical Composition
Normalizer::FORM_D
Normalization Form D (NFD) - Canonical Decomposition
Normalizer::NFD
Alias of Normalizer::FORM_D
Normalizer::FORM_KC
Normalization Form KC (NFKC) - Compatibility Decomposition, followed by Canonical Composition
Normalizer::NFKC
Alias of Normalizer::FORM_KC
Normalizer::FORM_KC_CF
Normalization Form KC with case folding (NFKC_Casefold)
Normalizer::FORM_KD
Normalization Form KD (NFKD) - Compatibility Decomposition
Normalizer::NFKD
Alias of Normalizer::FORM_KD
Normalizer::NFC
Alias of Normalizer::FORM_C
Normalizer::NFKC_CF
Alias of Normalizer::FORM_KC_CF

Changelog

Version   Description
8.0.0     Normalizer::NONE has been removed.
Table of Contents