Validate XML using XSD, a Catalog Resolver, and JAXP DOM for XSLT

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Problem

Background

As this related question describes, there does not appear to be a canonical way to validate XML files against an XSD and then transform them using an XSL template, with file paths resolved through a catalog resolver.

The XSL templates can be XSLT 1.0 or XSLT 2.0, the latter requiring Saxon9HE.

Problem

The given answer works, but it has several undesirable aspects, including:

Using both an XMLCatalogResolver and a CatalogResolver.

Creating an XML catalog resolver instance using the catalog resolver instance.

Traversing a DOM to determine the XSD URI.

Creating a SchemaFactory to perform the validation.

Calling the XML catalog resolver instance to find the local XSD file path.

Passing the catalog resolver instance to the XSL transformer instance.

It seems that existing APIs should handle those aspects of the code, especially the contortions required to extract the XSD URI from the DOM.
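On Java 9 or later, the JDK includes its own catalog API (javax.xml.catalog), and its CatalogResolver interface extends EntityResolver, URIResolver, and LSResourceResolver at once, so a single instance could in principle replace the XMLCatalogResolver/CatalogResolver pair. Below is a minimal sketch of the validation half only, assuming the catalog maps the schema URI with a `system` entry. The class and method names (CatalogValidation, resolver, validate) are hypothetical and not taken from the original repository.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.XMLConstants;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; a sketch, not the original project's code.
final class CatalogValidation {
  /** Creates a JDK (Java 9+) catalog resolver; RESOLVE=continue lets unmatched IDs fall through. */
  static CatalogResolver resolver(URI catalogFile) {
    CatalogFeatures features = CatalogFeatures.builder()
        .with(CatalogFeatures.Feature.RESOLVE, "continue")
        .build();
    return CatalogManager.catalogResolver(features, catalogFile);
  }

  /**
   * Maps schemaUri to a local copy through the catalog's system entries,
   * compiles that schema, and validates xml (throws SAXException if invalid).
   */
  static void validate(CatalogResolver resolver, String schemaUri, String xml) throws Exception {
    InputSource local = resolver.resolveEntity(null, schemaUri);
    // No catalog match: fall back to the original URI.
    String location = (local != null) ? local.getSystemId() : schemaUri;

    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    // Lets the catalog handle resources referenced while parsing the schema (e.g. xs:import).
    factory.setResourceResolver(resolver);
    Schema schema = factory.newSchema(new StreamSource(location));

    schema.newValidator().validate(new StreamSource(new StringReader(xml)));
  }
}
```

Setting RESOLVE to "continue" instead of the default "strict" means an unmapped URI falls back to normal resolution rather than raising a CatalogException. The same resolver object could also be handed to TransformerFactory.setURIResolver for the XSLT step, although that part is not shown or tested here.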

Source

A repository exists that contains the entire example, complete with catalog files, schema definitions, and XML tests. The main source file that has the problems noted above follows:

```
package src;

import java.io.*;
import java.net.URI;
import java.util.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

import javax.xml.parsers.*;
import javax.xml.xpath.*;
import javax.xml.XMLConstants;

import org.w3c.dom.*;
import org.xml.sax.*;

import org.apache.xml.resolver.tools.CatalogResolver;
import org.apache.xerces.util.XMLCatalogResolver;
import static org.apache.xerces.jaxp.JAXPConstants.JAXP_SCHEMA_LANGUAGE;
import static org.apache.xerces.jaxp.JAXPConstants.W3C_XML_SCHEMA;

import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

import javax.xml.transform.dom.DOMSource;
import javax.xml.
```

Solution

```
  /**
   * Retrieves the XML schema definition using an XSD.
   *
   * @param node The document (or child node) to traverse seeking processing
   * instruction nodes.
   * @return null if no XSD is present in the XML document.
   * @throws IOException Never thrown (uses StringReader).
   */
  private static String getSchemaURI( Node node ) throws IOException {
    String result = null;

    if( node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE ) {
      ProcessingInstruction pi = (ProcessingInstruction)node;

      logDebug( "NODE IS PROCESSING INSTRUCTION" );

      if( "xml-model".equals( pi.getNodeName() ) ) {
        logDebug( "PI IS XML MODEL" );

        // Hack to get the attributes.
        String data = pi.getData();

        if( data != null ) {
          final String attributes[] = pi.getData().trim().split( "\\s+" );

          String type = parseNameValue( attributes[0] )[1];
          String href = parseNameValue( attributes[1] )[1];

          // TODO: Schema should = http://www.w3.org/2001/XMLSchema
          //String schema = attributes.getNamedItem( "schematypens" );

          if( "application/xml".equalsIgnoreCase( type ) && href != null ) {
            result = href;
          }
        }
      }
    }
    else {
      // Try to get the schema type information.
      NamedNodeMap attrs = node.getAttributes();

      if( attrs != null ) {
        // TypeInfo.toString() returns values of the form:
        // schemaLocation="uri schemaURI"
        // The following loop extracts the schema URI.
        for( int i = 0; i < attrs.getLength(); i++ ) {
          Attr attribute = (Attr)attrs.item( i );
          TypeInfo typeInfo = attribute.getSchemaTypeInfo();
          String attr[] = parseNameValue( typeInfo.toString() );

          if( "schemaLocation".equalsIgnoreCase( attr[0] ) ) {
            result = attr[1].split( "\\s" )[1];
            break;
          }
        }
      }

      // Look deeper for the schema URI.
      if( result == null ) {
        NodeList list = node.getChildNodes();

        for( int i = 0; i < list.getLength(); i++ ) {
          result = getSchemaURI( list.item( i ) );

          if( result != null ) {
            break;
          }
        }
      }
    }

    return result;
  }
```

First off: the combination of two-space indentation and putting each else on its own line after the closing brace makes this code hard for me to read.

Now, I don't have a solution for your main problems. I think you'll have to ask somewhere else for that; I can't help you refactor out huge parts of your program just like that. All I can do is review the code as it is, based on my knowledge of Java.

I believe this method suffers because you check every value before deciding whether you're going to use it.

```
// Hack to get the attributes.
String data = pi.getData();

if( data != null ) {
  final String attributes[] = pi.getData().trim().split( "\\s+" );
```

`data` has no other uses, yet `pi.getData()` is called a second time. So why not write

```
final String attributes[] = data.trim().split( "\\s+" );
```

instead?

```
final String attributes[] = pi.getData().trim().split( "\\s+" );

String type = parseNameValue( attributes[0] )[1];
String href = parseNameValue( attributes[1] )[1];

// TODO: Schema should = http://www.w3.org/2001/XMLSchema
//String schema = attributes.getNamedItem( "schematypens" );

if( "application/xml".equalsIgnoreCase( type ) && href != null ) {
  result = href;
}
```

After this snippet, the method returns `result`. There's an `else` block, but it is not executed when this branch is taken.

In that light, `type` and `href` have no other uses in this method. Additionally, `result` is still `null` at this point.

So all that's actually needed is this:

```
final String attributes[] = pi.getData().trim().split( "\\s+" );

String type = parseNameValue( attributes[0] )[1];

// TODO: Schema should = http://www.w3.org/2001/XMLSchema
//String schema = attributes.getNamedItem( "schematypens" );

if( "application/xml".equalsIgnoreCase( type )) {
  result = parseNameValue( attributes[1] )[1]; //href
}
```

Checking whether `href` is null is unnecessary: if it were null, you would only be assigning null to a `result` that is already null.
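The whitespace split also relies on `type` appearing before `href`, and it breaks as soon as any pseudo-attribute value contains a space. As a sketch of an alternative (not from the original answer), the processing-instruction data can be parsed by name with a regular expression. PseudoAttributes, parse, and schemaHref are hypothetical names, and the type check mirrors the original comparison against application/xml.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for xml-model PI data such as: href="a.xsd" type="application/xml"
final class PseudoAttributes {
  // name="value" or name='value'; group 1 = name, group 2 or 3 = value.
  private static final Pattern PAIR =
      Pattern.compile("([\\w:.-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

  /** Parses every name/value pseudo-attribute, in any order. */
  static Map<String, String> parse(String data) {
    Map<String, String> result = new LinkedHashMap<>();
    if (data == null) {
      return result;
    }
    Matcher m = PAIR.matcher(data);
    while (m.find()) {
      String value = (m.group(2) != null) ? m.group(2) : m.group(3);
      result.put(m.group(1), value);
    }
    return result;
  }

  /** Returns href when type is application/xml (case-insensitive), otherwise null. */
  static String schemaHref(String data) {
    Map<String, String> attrs = parse(data);
    return "application/xml".equalsIgnoreCase(attrs.get("type")) ? attrs.get("href") : null;
  }
}
```

Because lookups are by name, the order of `href` and `type` no longer matters, and either quote style is accepted.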

I also feel this function should be split into three:

One function for ProcessingInstruction nodes.

One function for determining the schema URI from node.getAttributes().

One function for determining the schema URI from node.getChildNodes().

This gets rid of the deep nesting of statements and makes your code easier to understand.
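A minimal sketch of that three-way split follows, under a few assumptions that go beyond the original: the document is parsed with a namespace-aware DocumentBuilderFactory, and the schema location is read from xsi:noNamespaceSchemaLocation or xsi:schemaLocation with Element.getAttributeNS rather than by parsing TypeInfo.toString(), whose format the DOM API does not specify. SchemaLocator and its helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.XMLConstants;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ProcessingInstruction;

// Hypothetical refactoring sketch; requires a namespace-aware DOM for getAttributeNS.
final class SchemaLocator {
  private static final Pattern XSD_TYPE =
      Pattern.compile("\\btype\\s*=\\s*[\"']application/xml[\"']", Pattern.CASE_INSENSITIVE);
  private static final Pattern HREF = Pattern.compile("\\bhref\\s*=\\s*[\"']([^\"']*)[\"']");

  /** Returns the first schema URI found in document order, or null. */
  static String getSchemaURI(Node node) {
    if (node.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
      return fromProcessingInstruction((ProcessingInstruction) node);
    }
    String result = fromAttributes(node);
    return (result != null) ? result : fromChildren(node);
  }

  /** Handles <?xml-model href="..." type="application/xml"?>. */
  private static String fromProcessingInstruction(ProcessingInstruction pi) {
    String data = pi.getData();
    if (!"xml-model".equals(pi.getTarget()) || data == null || !XSD_TYPE.matcher(data).find()) {
      return null;
    }
    Matcher href = HREF.matcher(data);
    return href.find() ? href.group(1) : null;
  }

  /** Reads xsi:noNamespaceSchemaLocation, else the first location in xsi:schemaLocation. */
  private static String fromAttributes(Node node) {
    if (node.getNodeType() != Node.ELEMENT_NODE) {
      return null;
    }
    Element element = (Element) node;
    String xsi = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI;
    String noNamespace = element.getAttributeNS(xsi, "noNamespaceSchemaLocation").trim();
    if (!noNamespace.isEmpty()) {
      return noNamespace;
    }
    // xsi:schemaLocation holds whitespace-separated "namespace location" pairs.
    String[] pairs = element.getAttributeNS(xsi, "schemaLocation").trim().split("\\s+");
    return (pairs.length >= 2) ? pairs[1] : null;
  }

  /** Recurses into child nodes, stopping at the first hit. */
  private static String fromChildren(Node node) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      String result = getSchemaURI(children.item(i));
      if (result != null) {
        return result;
      }
    }
    return null;
  }
}
```

Each helper now has a single responsibility and at most one level of nesting. Like the original, only the first location in xsi:schemaLocation is returned. If the document is parsed without namespace awareness, getAttributeNS will not find the xsi attributes and the method simply returns null for them.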

Context

StackExchange Code Review Q#62678, answer score: 2
