
How to preserve character references (&lt;) while reading content from an XML file

I am using the code below to read content from an XML file:

public static void toXSD() {
    SAXBuilder saxBuilder = new SAXBuilder();
    Document document;
    try {
        // Backslashes in a Java string literal must be escaped
        document = saxBuilder.build(new File("D:\\Users\\schintha\\Desktop\\Work\\test_files\\SUMMARY_11.xml"));
        for (Element element : document.getRootElement().getChildren()) {
            System.out.println("Name = " + element.getName());
            System.out.println("Value = " + element.getValue());
            System.out.println("Text = " + element.getText());
        }
    } catch (JDOMException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

My input file is shown below:

<?xml version="1.0" encoding="UTF-8"?>
<temp>
   <position>&lt;</position>   
</temp>

The output is:

Name = position
Value = <
Text = <

How can I retrieve &lt; as is, instead of "<"? Here it is not the start of a tag but the value of the "position" element.


Answer

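By the time you read the element, the parser has already resolved &lt; to <, so getValue() and getText() return the decoded character. You can re-escape the parsed text with the escapeXml10 method of the org.apache.commons.text.StringEscapeUtils class from Apache Commons Text: StringEscapeUtils.escapeXml10(element.getValue()). Note that this escapes every <, >, &, " and ' in the value, whether or not it was written as a reference in the file.

A full example is shown below: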

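// Imports assume JDOM 2 and Apache Commons Text on the classpath
import java.io.File;
import java.io.IOException;
import org.apache.commons.text.StringEscapeUtils;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public static void toXSD() {
    SAXBuilder saxBuilder = new SAXBuilder();
    Document document;
    try {
        // Backslashes in a Java string literal must be escaped
        document = saxBuilder.build(new File("D:\\Users\\schintha\\Desktop\\Work\\test_files\\SUMMARY_11.xml"));
        for (Element element : document.getRootElement().getChildren()) {
            System.out.println("Name = " + element.getName());
            // Re-escape the already-parsed text so "<" is printed as "&lt;"
            System.out.println("Value = " + StringEscapeUtils.escapeXml10(element.getValue()));
        }
    } catch (JDOMException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}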

The same input file as in the question:

<?xml version="1.0" encoding="UTF-8"?>
<temp>
   <position>&lt;</position>   
</temp>

This gives the expected output (the value of the "position" element, re-escaped after parsing):

Name = position
Value = &lt;
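
If adding Commons Text is not an option, the same idea can be sketched with the JDK alone. The class below is only an illustration, not part of the original answer: EscapeSketch, escapeXml and readPosition are hypothetical names, the parser is the JDK's built-in DOM parser rather than JDOM, and the helper covers only the five predefined XML entities (unlike escapeXml10, it does not strip characters that are invalid in XML 1.0).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class EscapeSketch {

    // Hypothetical helper: re-escapes the five predefined XML entities.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Parses the XML string and returns the re-escaped text of the
    // first <position> element. The parser decodes "&lt;" to "<",
    // and escapeXml turns it back into "&lt;".
    static String readPosition(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Node position = doc.getElementsByTagName("position").item(0);
            return escapeXml(position.getTextContent());
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<temp><position>&lt;</position></temp>";
        System.out.println("Value = " + readPosition(xml)); // prints "Value = &lt;"
    }
}
```

As with escapeXml10, this re-escapes the decoded text rather than preserving the file's original spelling, so a value written as a plain > in the file would also come back as &gt;.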
User contributions licensed under: CC BY-SA