This document establishes the rules for encoding and decoding entities. XML specifies five predefined entities: `&amp;` (&), `&lt;` (<), `&gt;` (>), `&apos;` (') and `&quot;` (").
Parsing XML
While parsing the string representation of an XML document:
For the values of CDATA, comment and processing instruction nodes, no special processing is needed; all characters from the input string are passed to the node value as-is.
For the values of text and attribute nodes, all five predefined entities, decimal numeric character references (`&#nnnn;`) and hexadecimal numeric character references (`&#xhhhh;`) are decoded (note that the `x` must be lowercase). The two examples given below for the Yen character apply to all numeric character references.
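| Value | Decoded Value |
| --- | --- |
| `&amp;` | `&` |
| `&lt;` | `<` |
| `&gt;` | `>` |
| `&apos;` | `'` |
| `&quot;` | `"` |
| `&#x9;` | `\t` |
| `&#xA;` | `\n` |
| `&#10;` | `\n` |
| `&#xD;` | `\r` |
| `&#13;` | `\r` |
| `&#165;` | `¥` (e.g.) |
| `&#xA5;` | `¥` (e.g.) |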
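The decoding rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the parser's actual implementation: the names `PREDEFINED`, `_REFERENCE` and `decode_value` are hypothetical, and leaving unknown entity names untouched is a choice made for this sketch only.

```python
import re

# The five predefined XML entities and the characters they stand for.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}

# Matches &#xhhhh; (lowercase "x" only), &#nnnn; and &name; references.
_REFERENCE = re.compile(r"&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([a-z]+));")


def decode_value(raw: str) -> str:
    """Decode references in the raw value of a text or attribute node."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits))
        # Assumption of this sketch: unknown entity names are left as-is.
        return PREDEFINED.get(name, match.group(0))

    # A single pass, so "&amp;lt;" decodes to "&lt;" and not to "<".
    return _REFERENCE.sub(replace, raw)
```

Because the pattern only accepts a lowercase `x`, a reference such as `&#XA5;` is not recognized and passes through unchanged.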
Serializing XML
While serializing an XML document to its string representation:
For the values of CDATA, comment and processing instruction nodes, no special processing is needed.
For the values of text nodes, `&`, `<` and `>` are encoded into predefined entities; other characters are left as-is.
| Value | Encoded Value |
| --- | --- |
| `&` | `&amp;` |
| `<` | `&lt;` |
| `>` | `&gt;` |
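The text node rule amounts to three replacements. The sketch below is illustrative only, and `encode_text` is a hypothetical name; the one detail that matters is that the ampersand is handled first, so the ampersands introduced by the other replacements are not encoded a second time.

```python
def encode_text(value: str) -> str:
    """Encode the value of a text node for serialization."""
    # "&" goes first; otherwise the "&" in "&lt;" would be re-encoded.
    return (
        value.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
```

Quotes, apostrophes, whitespace and non-ASCII characters such as `¥` pass through unchanged.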
For the values of attribute nodes, `&`, `<`, `>` and `"` are encoded (the apostrophe is not encoded since attribute values are always serialized with double quotes). In addition, `\t`, `\n` and `\r` are encoded into numeric character references.
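| Value | Encoded Value |
| --- | --- |
| `&` | `&amp;` |
| `<` | `&lt;` |
| `>` | `&gt;` |
| `"` | `&quot;` |
| `\t` | `&#x9;` |
| `\n` | `&#xA;` |
| `\r` | `&#xD;` |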
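Attribute encoding can be sketched with a single translation table. The names `_ATTRIBUTE_ESCAPES` and `encode_attribute` are hypothetical, and the hexadecimal spelling of the three whitespace references (`&#x9;`, `&#xA;`, `&#xD;`) is an assumption of this sketch; any equivalent numeric character reference would decode to the same character.

```python
# Each character is mapped in one pass, so the replacement order does not matter.
_ATTRIBUTE_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    # Assumed spelling of the whitespace references (hexadecimal form).
    "\t": "&#x9;",
    "\n": "&#xA;",
    "\r": "&#xD;",
})


def encode_attribute(value: str) -> str:
    """Encode an attribute value that will be wrapped in double quotes."""
    return value.translate(_ATTRIBUTE_ESCAPES)
```

Encoding the whitespace characters matters because a parser normalizes literal tabs and line breaks inside attribute values to spaces, while character references are exempt from that normalization.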
Round Trip: String -> XML -> String
A round trip from a string to an XML document and back to a string may produce a different string representation, as described below.
CDATA, comment and processing instruction nodes can safely round trip as their values are not processed at all.
The value of a text node may change because parsing decodes more references than serialization encodes again.
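| Value | Decoded Value | Encoded Value | Same? |
| --- | --- | --- | --- |
| `&amp;` | `&` | `&amp;` | ✓ |
| `&lt;` | `<` | `&lt;` | ✓ |
| `&gt;` | `>` | `&gt;` | ✓ |
| `&apos;` | `'` | `'` | ✗ |
| `&quot;` | `"` | `"` | ✗ |
| `&#x9;` | `\t` | `\t` | ✗ |
| `&#xA;` | `\n` | `\n` | ✗ |
| `&#10;` | `\n` | `\n` | ✗ |
| `&#xD;` | `\r` | `\r` | ✗ |
| `&#13;` | `\r` | `\r` | ✗ |
| `&#165;` | `¥` | `¥` | ✗ |
| `&#xA5;` | `¥` | `¥` | ✗ |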
The value of an attribute node may also change, as shown below.
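| Value | Decoded Value | Encoded Value | Same? |
| --- | --- | --- | --- |
| `&amp;` | `&` | `&amp;` | ✓ |
| `&lt;` | `<` | `&lt;` | ✓ |
| `&gt;` | `>` | `&gt;` | ✓ |
| `&apos;` | `'` | `'` | ✗ |
| `&quot;` | `"` | `&quot;` | ✓ |
| `&#x9;` | `\t` | `&#x9;` | ✓ |
| `&#xA;` | `\n` | `&#xA;` | ✓ |
| `&#10;` | `\n` | `&#xA;` | ✗ |
| `&#xD;` | `\r` | `&#xD;` | ✓ |
| `&#13;` | `\r` | `&#xD;` | ✗ |
| `&#165;` | `¥` | `¥` | ✗ |
| `&#xA5;` | `¥` | `¥` | ✗ |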
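The "Same?" column can be checked mechanically: decode the source value, encode it again with the rules for the node type, and compare with the original string. The sketch below is a hedged illustration; every name in it is hypothetical, and the hexadecimal spelling of the whitespace references in `ATTR_ESCAPES` is an assumption.

```python
import re

ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}
REFERENCE = re.compile(r"&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([a-z]+));")

# Serialization rules per node type, as translation tables.
TEXT_ESCAPES = str.maketrans({"&": "&amp;", "<": "&lt;", ">": "&gt;"})
ATTR_ESCAPES = str.maketrans({
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
    "\t": "&#x9;", "\n": "&#xA;", "\r": "&#xD;",  # assumed spelling
})


def decode(raw: str) -> str:
    """Decode predefined entities and numeric character references."""
    def replace(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits))
        return ENTITIES.get(name, match.group(0))
    return REFERENCE.sub(replace, raw)


def survives_round_trip(raw: str, escapes: dict) -> bool:
    """True if string -> XML -> string reproduces the same string."""
    return decode(raw).translate(escapes) == raw


if __name__ == "__main__":
    for raw in ["&amp;", "&apos;", "&quot;", "&#xA;", "&#10;", "&#165;"]:
        print(raw,
              survives_round_trip(raw, TEXT_ESCAPES),
              survives_round_trip(raw, ATTR_ESCAPES))
```

A value survives only when the serializer happens to emit exactly the spelling that was parsed, which is why `&quot;` survives in an attribute but not in a text node.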
Round Trip: XML -> String -> XML
A round trip from an XML document to its string representation and back to an XML document is safe.
CDATA, comment and processing instruction nodes can safely round trip as their values are not processed at all.
Text nodes are safe because all encoded values are decoded back to their original character representations.
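The safety of this direction for text nodes can be illustrated with Python's standard library, used here as a stand-in rather than as the implementation this document describes. `xml.sax.saxutils.escape` encodes `&`, `<` and `>`, and `unescape` reverses exactly those three, so decoding the encoded value always returns the original characters. The helper name `text_round_trip` is hypothetical.

```python
from xml.sax.saxutils import escape, unescape


def text_round_trip(value: str) -> str:
    """Serialize a text node value, then parse it back."""
    serialized = escape(value)   # "&", "<", ">" become entities
    return unescape(serialized)  # the entities become characters again


sample = "a < b & c > d, 'single', \"double\", ¥"
assert text_round_trip(sample) == sample
```

Even a value that looks like an entity, such as the literal text `&lt;`, is preserved: it is serialized as `&amp;lt;` and decoded back to `&lt;`.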