org.omnaest.utils.xml
Class XMLIteratorFactory

java.lang.Object
  extended by org.omnaest.utils.xml.XMLIteratorFactory

public class XMLIteratorFactory
extends Object

The XMLIteratorFactory is a wrapper around StAX and JAXB which allows splitting the content of a given xml InputStream into Object, Map or String content chunks.

Example:


Code using the XMLIteratorFactory to create an Iterator instance for all book elements:
 Iterator<Book> iterator = new XMLIteratorFactory( inputStream ).doLowerCaseXMLTagAndAttributeNames().newIterator( Book.class );
 

XML snippet:
  <Books>
     <Book>
         <Title>Simple title</Title>
         <author>an author</author>
     </Book>
     <Book>
         <Title>Second simple title</Title>
         <Author>Second author</Author>
     </Book>
  </Books>
 

JAXB annotated class:
 @XmlRootElement(name = "book")
 @XmlType(name = "book")
 @XmlAccessorType(XmlAccessType.FIELD)
 protected static class Book
 {
   @XmlElement(name = "title")
   private String title;
   
   @XmlElement(name = "author")
   private String author;
 }
 

There are several Iterator types offered:
String based: newIterator(QName)
Map based: newIteratorMapBased(QName)
Class type based: newIterator(Class)
The String based type is the fastest in traversal of the original stream and the class type based one the slowest (see the throughput figures in the method details), whereby the slower ones can get some performance improvement by using parallel processing. The Iterator instances are thread safe by default and the Iterator.next() function can be called until a NoSuchElementException is thrown.
Normally an Iterator is not usable in multithreaded environments, since the separate calls to Iterator.hasNext() and Iterator.next() inevitably leave a gap in the locking of an element. This gap can be circumvented by calling doCreateThreadsafeIterators(boolean), which forces Iterator instances to use ThreadLocals internally. Otherwise do not use the Iterator.hasNext() method, since any other Thread can empty the Iterator before the call to Iterator.next() occurs.
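The gap between Iterator.hasNext() and Iterator.next() described above can be avoided on the caller side by relying on Iterator.next() alone. The following is a minimal sketch of that calling pattern, not code from the library: the class name NextOnlyDrainSketch and the method drain are hypothetical, and a plain list iterator stands in for an Iterator created by the XMLIteratorFactory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper, only meant to illustrate the next()-only calling pattern
final class NextOnlyDrainSketch
{
  /** Pulls elements using next() only, so no hasNext()/next() gap can occur. */
  static <E> List<E> drain( Iterator<E> iterator )
  {
    List<E> result = new ArrayList<E>();
    while ( true )
    {
      try
      {
        result.add( iterator.next() );
      }
      catch ( NoSuchElementException e )
      {
        // The iterator is exhausted, possibly emptied by another thread
        return result;
      }
    }
  }

  public static void main( String[] args )
  {
    // A plain list iterator stands in for an iterator created by the factory
    Iterator<String> iterator = Arrays.asList( "a", "b" ).iterator();
    System.out.println( drain( iterator ) );
  }
}
```

Each worker Thread could run such a loop against a shared, thread safe Iterator without ever asking hasNext().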

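The XMLIteratorFactory allows modifying the underlying event stream using e.g.:
doLowerCaseXMLTagAndAttributeNames()
doUpperCaseXMLTagAndAttributeNames()
doRemoveNamespacesForXMLTagAndAttributeNames()
doAddXMLEventTransformer(XMLEventTransformer)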

If the XMLIteratorFactory should only operate on a subset of xml tags within a larger stream, the concept of scopes is available, which can be set up by calling doAddXMLTagScope(QName).
If no start tag of a scope is passed, no events are read, and the reading into a single Iterator stops immediately when an end tag of a scope is matched.

Author:
Omnaest

Nested Class Summary
static class XMLIteratorFactory.JAXBTypeContentConverter<E>
           
static interface XMLIteratorFactory.JAXBTypeContentConverterFactory
           
static interface XMLIteratorFactory.XMLElementSelector
           
static class XMLIteratorFactory.XMLElementSelectorQNameBased
          XMLIteratorFactory.XMLElementSelector which matches a given QName
static interface XMLIteratorFactory.XMLEventTransformer
           
static class XMLIteratorFactory.XMLEventTransformerForTagAndAttributeName
          XMLIteratorFactory.XMLEventTransformer which allows transforming the tag and attribute names.
protected static class XMLIteratorFactory.XMLIterator
           
 
Field Summary
static String DEFAULT_ENCODING
           
static XMLIteratorFactory.JAXBTypeContentConverterFactory DEFAULT_JAXB_TYPE_CONTENT_CONVERTER_FACTORY
           
static XMLInstanceContextFactory XML_INSTANCE_CONTEXT_FACTORY_JAVA_STAX_DEFAULT
           
 
Constructor Summary
XMLIteratorFactory(InputStream inputStream)
          Similar to XMLIteratorFactory(InputStream, ExceptionHandler) using an ExceptionHandlerIgnoring
XMLIteratorFactory(InputStream inputStream, ExceptionHandler exceptionHandler)
          Note: the XMLIteratorFactory does not close the underlying InputStream
 
Method Summary
 XMLIteratorFactory close()
          Closes the internal XMLEventReader which closes all iterators immediately
 XMLIteratorFactory doAddXMLEventTransformer(XMLIteratorFactory.XMLEventTransformer xmlEventTransformer)
           
 XMLIteratorFactory doAddXMLTagScope(QName tagName)
          Returns a new XMLIteratorFactory instance with the configuration of this one but holding an additional xml tag scope restriction.
 XMLIteratorFactory doAddXMLTagTouchBarrier(QName tagName)
          Returns a new XMLIteratorFactory instance with the configuration of this one but holding an additional xml tag touch barrier restriction.
 XMLIteratorFactory doCreateThreadsafeIterators(boolean threadsafe)
          If given true as parameter the returned Iterator instances will use ThreadLocal states.
 XMLIteratorFactory doLowerCaseXMLTagAndAttributeNames()
          This adds an XMLIteratorFactory.XMLEventTransformer which lower-cases the xml tag and attribute names
 XMLIteratorFactory doRemoveNamespacesForXMLTagAndAttributeNames()
          This adds an XMLIteratorFactory.XMLEventTransformer which removes all Namespace declarations on any xml tag and attribute
 XMLIteratorFactory doUpperCaseXMLTagAndAttributeNames()
          This adds an XMLIteratorFactory.XMLEventTransformer which upper-cases the xml tag and attribute names
<E> Iterator<E>
newIterator(Class<? extends E> type)
          Selects xml parts based on Classes annotated with JAXB compliant annotations and uses JAXB to create instances of the given type based on the data of the extracted xml chunks.
 Iterator<String> newIterator(QName qName)
          New Iterator which returns xml content chunks for all xml tags matching the given QName

Performance is fast with about 10000 elements per second being processed
<E> Iterator<E>
newIterator(QName qName, ElementConverter<String,E> elementConverter)
          Similar to newIterator(QName) but allows specifying an additional ElementConverter which post-processes the extracted xml chunks
 Iterator<String> newIterator(XMLIteratorFactory.XMLElementSelector xmlElementSelector)
          Similar to newIterator(QName) but allows specifying a more general XMLIteratorFactory.XMLElementSelector instead of a QName
<E> Iterator<E>
newIterator(XMLIteratorFactory.XMLElementSelector xmlElementSelector, Class<? extends E> type)
          Similar to newIterator(Class) but allows specifying an XMLIteratorFactory.XMLElementSelector to select tags from the xml stream.
<E> Iterator<E>
newIterator(XMLIteratorFactory.XMLElementSelector xmlElementSelector, ElementConverter<String,E> elementConverter)
          Similar to newIterator(QName, ElementConverter) but allows specifying a more general XMLIteratorFactory.XMLElementSelector instead of a QName
 Iterator<Map<String,Object>> newIteratorMapBased(QName qName)
          New Iterator which returns Map entities each based on a single content chunk which are produced for all xml tags matching the given QName

Performance is medium to slow with about 1000 elements per second being processed.
 XMLIteratorFactory setEncoding(String encoding)
          Sets the encoding.
 XMLIteratorFactory setJAXBTypeContentConverterFactory(XMLIteratorFactory.JAXBTypeContentConverterFactory jaxbTypeContentConverterFactory)
          Allows setting another XMLIteratorFactory.JAXBTypeContentConverterFactory which is used to convert xml content to instances of JAXB based types.
 XMLIteratorFactory setXmlInstanceContextFactory(XMLInstanceContextFactory xmlInstanceContextFactory)
          Allows setting an alternative XMLInstanceContextFactory, e.g. to replace the current Java default StAX implementation with another one like Staxon or Jettison for JSON
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_ENCODING

public static final String DEFAULT_ENCODING
See Also:
Constant Field Values

XML_INSTANCE_CONTEXT_FACTORY_JAVA_STAX_DEFAULT

public static final XMLInstanceContextFactory XML_INSTANCE_CONTEXT_FACTORY_JAVA_STAX_DEFAULT

DEFAULT_JAXB_TYPE_CONTENT_CONVERTER_FACTORY

public static final XMLIteratorFactory.JAXBTypeContentConverterFactory DEFAULT_JAXB_TYPE_CONTENT_CONVERTER_FACTORY
Constructor Detail

XMLIteratorFactory

public XMLIteratorFactory(InputStream inputStream,
                          ExceptionHandler exceptionHandler)
Note: the XMLIteratorFactory does not close the underlying InputStream

Parameters:
inputStream - InputStream
exceptionHandler - ExceptionHandler
See Also:
XMLIteratorFactory

XMLIteratorFactory

public XMLIteratorFactory(InputStream inputStream)
Similar to XMLIteratorFactory(InputStream, ExceptionHandler) using an ExceptionHandlerIgnoring

Parameters:
inputStream - InputStream
See Also:
XMLIteratorFactory
Method Detail

doLowerCaseXMLTagAndAttributeNames

public XMLIteratorFactory doLowerCaseXMLTagAndAttributeNames()
This adds an XMLIteratorFactory.XMLEventTransformer which lower-cases the xml tag and attribute names
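To illustrate what lower-casing tag and attribute names on the event stream can look like, here is a minimal sketch using only the standard StAX API of the JDK. It does not show the library's own XMLEventTransformer interface, whose signature is not given on this page; the class LowerCaseEventSketch and its methods are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch based on plain StAX, not the library's implementation
final class LowerCaseEventSketch
{
  private static final XMLEventFactory EVENT_FACTORY = XMLEventFactory.newInstance();

  static QName lowerCase( QName name )
  {
    return new QName( name.getNamespaceURI(), name.getLocalPart().toLowerCase( Locale.ROOT ), name.getPrefix() );
  }

  /** Rebuilds start and end element events with lower-cased names, other events pass through. */
  static XMLEvent transform( XMLEvent event )
  {
    if ( event.isStartElement() )
    {
      StartElement start = event.asStartElement();
      List<Attribute> attributes = new ArrayList<Attribute>();
      for ( Iterator<?> it = start.getAttributes(); it.hasNext(); )
      {
        Attribute attribute = (Attribute) it.next();
        attributes.add( EVENT_FACTORY.createAttribute( lowerCase( attribute.getName() ), attribute.getValue() ) );
      }
      return EVENT_FACTORY.createStartElement( lowerCase( start.getName() ), attributes.iterator(),
                                               start.getNamespaces() );
    }
    if ( event.isEndElement() )
    {
      EndElement end = event.asEndElement();
      return EVENT_FACTORY.createEndElement( lowerCase( end.getName() ), end.getNamespaces() );
    }
    return event;
  }

  /** Collects tag names and "@"-prefixed attribute names of all transformed start elements. */
  static List<String> startTagAndAttributeNames( String xml )
  {
    List<String> names = new ArrayList<String>();
    try
    {
      XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader( new StringReader( xml ) );
      while ( reader.hasNext() )
      {
        XMLEvent event = transform( reader.nextEvent() );
        if ( event.isStartElement() )
        {
          StartElement start = event.asStartElement();
          names.add( start.getName().getLocalPart() );
          for ( Iterator<?> it = start.getAttributes(); it.hasNext(); )
          {
            names.add( "@" + ( (Attribute) it.next() ).getName().getLocalPart() );
          }
        }
      }
      reader.close();
    }
    catch ( XMLStreamException e )
    {
      throw new IllegalStateException( e );
    }
    return names;
  }
}
```

Normalizing the names before unmarshalling is what lets a single JAXB mapping such as name = "book" match both Book and book tags in the input.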

Returns:
new XMLIteratorFactory instance

doRemoveNamespacesForXMLTagAndAttributeNames

public XMLIteratorFactory doRemoveNamespacesForXMLTagAndAttributeNames()
This adds an XMLIteratorFactory.XMLEventTransformer which removes all Namespace declarations on any xml tag and attribute

Returns:
new XMLIteratorFactory instance

doUpperCaseXMLTagAndAttributeNames

public XMLIteratorFactory doUpperCaseXMLTagAndAttributeNames()
This adds an XMLIteratorFactory.XMLEventTransformer which upper-cases the xml tag and attribute names

Returns:
new XMLIteratorFactory instance

doAddXMLEventTransformer

public XMLIteratorFactory doAddXMLEventTransformer(XMLIteratorFactory.XMLEventTransformer xmlEventTransformer)
Parameters:
xmlEventTransformer -
Returns:
new XMLIteratorFactory instance if the given XMLIteratorFactory.XMLEventTransformer is not null otherwise this instance

doAddXMLTagScope

public XMLIteratorFactory doAddXMLTagScope(QName tagName)
Returns a new XMLIteratorFactory instance with the configuration of this one but holding an additional xml tag scope restriction. A scope restriction means that the internal stream is forwarded until it finds the beginning of the given xml tag and is stopped when the end of the same xml tag is reached.

Be aware of the fact that scopes can be nested. To begin reading elements only one of the scopes has to be entered. To stop an Iterator only one of the scopes has to be left.
So it is quite possible that even if one scope is left an enclosing scope is still valid, which means the selection matching is immediately active again until the enclosing scope is left as well.

After a scope has been passed it is possible to iterate further by creating a new Iterator.
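The scope behaviour described above can be pictured with plain StAX. The following is a simplified sketch, not the library's implementation: it handles a single, non-nested scope, and the class XmlScopeSketch with its methods open and textsWithinNextScope is hypothetical. Each call plays the role of one Iterator: it skips forward to the next scope start tag, collects the text of the selected elements, and stops at the scope end tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch based on plain StAX, single non-nested scope only
final class XmlScopeSketch
{
  static XMLEventReader open( String xml )
  {
    try
    {
      return XMLInputFactory.newInstance().createXMLEventReader( new StringReader( xml ) );
    }
    catch ( XMLStreamException e )
    {
      throw new IllegalStateException( e );
    }
  }

  /** Reads the text of all selected elements inside the next occurrence of the scope tag. */
  static List<String> textsWithinNextScope( XMLEventReader reader, QName scope, QName selected )
  {
    List<String> texts = new ArrayList<String>();
    boolean insideScope = false;
    try
    {
      while ( reader.hasNext() )
      {
        XMLEvent event = reader.nextEvent();
        if ( event.isStartElement() )
        {
          QName name = event.asStartElement().getName();
          if ( scope.equals( name ) )
          {
            // Entering the scope activates the selection
            insideScope = true;
          }
          else if ( insideScope && selected.equals( name ) )
          {
            texts.add( reader.getElementText() );
          }
        }
        else if ( insideScope && event.isEndElement() && scope.equals( event.asEndElement().getName() ) )
        {
          // Leaving the scope stops this pass, the reader stays usable for a further pass
          break;
        }
      }
    }
    catch ( XMLStreamException e )
    {
      throw new IllegalStateException( e );
    }
    return texts;
  }
}
```

Calling textsWithinNextScope again on the same reader continues with the next scope occurrence, which mirrors creating a new Iterator after a scope has been passed.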

Parameters:
tagName - QName
Returns:

doAddXMLTagTouchBarrier

public XMLIteratorFactory doAddXMLTagTouchBarrier(QName tagName)
Returns a new XMLIteratorFactory instance with the configuration of this one but holding an additional xml tag touch barrier restriction. A touch barrier restriction means that the internal stream is checked in advance whether the next start element matches the given xml tag. If this is the case, the traversal is stopped and the next element remains unread, so that any further attempt to create a new Iterator of any kind will use the still remaining element of the touch barrier.
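The look-ahead idea behind a touch barrier can be sketched with the peek() method of the standard StAX XMLEventReader. This is an illustration under assumptions, not the library's code: the class TouchBarrierSketch and its methods open and startTagsUntilBarrier are hypothetical, and the sketch simply collects start tag names instead of producing content chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch based on plain StAX look-ahead
final class TouchBarrierSketch
{
  static XMLEventReader open( String xml )
  {
    try
    {
      return XMLInputFactory.newInstance().createXMLEventReader( new StringReader( xml ) );
    }
    catch ( XMLStreamException e )
    {
      throw new IllegalStateException( e );
    }
  }

  /** Collects start tag names until the next start element would match the barrier. */
  static List<String> startTagsUntilBarrier( XMLEventReader reader, QName barrier )
  {
    List<String> names = new ArrayList<String>();
    try
    {
      while ( true )
      {
        // peek() inspects the next event without consuming it
        XMLEvent upcoming = reader.peek();
        if ( upcoming == null )
        {
          break;
        }
        if ( upcoming.isStartElement() && barrier.equals( upcoming.asStartElement().getName() ) )
        {
          // The barrier element stays unread for any later traversal
          break;
        }
        XMLEvent event = reader.nextEvent();
        if ( event.isStartElement() )
        {
          names.add( event.asStartElement().getName().getLocalPart() );
        }
      }
    }
    catch ( XMLStreamException e )
    {
      throw new IllegalStateException( e );
    }
    return names;
  }
}
```

A second traversal on the same reader with a different barrier starts at the untouched barrier element.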

Parameters:
tagName - QName
Returns:
new XMLIteratorFactory instance

doCreateThreadsafeIterators

public XMLIteratorFactory doCreateThreadsafeIterators(boolean threadsafe)
If true is given as parameter, the returned Iterator instances will use ThreadLocal states. As a result, if one Thread resolves true for the Iterator.hasNext() function, the respective value is locked to this Thread. Another Thread would then e.g. get false for the Iterator.hasNext() function, even if the first Thread has not yet pulled the value by invoking the Iterator.next() method.

Even though this allows sharing any created Iterator instance between threads without losing the contract of the Iterator, it must be ensured that any Thread which requests Iterator.hasNext() does actually pull the value. Otherwise the internally held value gets lost when the ThreadLocal is dereferenced.
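The ThreadLocal behaviour can be pictured with a small wrapper around any Iterator. This is a sketch of the idea only, under the assumption that hasNext() reserves the next element for the calling Thread; the class ThreadLocalPrefetchIterator and the helper hasNextOnOtherThread are hypothetical and do not show how the library implements it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: hasNext() reserves an element for the calling thread
final class ThreadLocalPrefetchIterator<E> implements Iterator<E>
{
  private final Iterator<E> source;
  private final ThreadLocal<List<E>> reserved = new ThreadLocal<List<E>>();

  ThreadLocalPrefetchIterator( Iterator<E> source )
  {
    this.source = source;
  }

  @Override
  public boolean hasNext()
  {
    if ( this.reserved.get() != null )
    {
      return true;
    }
    synchronized ( this.source )
    {
      if ( !this.source.hasNext() )
      {
        return false;
      }
      // Pull the element now and lock it to the current thread
      this.reserved.set( Collections.singletonList( this.source.next() ) );
      return true;
    }
  }

  @Override
  public E next()
  {
    List<E> slot = this.reserved.get();
    if ( slot != null )
    {
      this.reserved.remove();
      return slot.get( 0 );
    }
    synchronized ( this.source )
    {
      // Throws NoSuchElementException when the source is exhausted
      return this.source.next();
    }
  }

  /** Asks hasNext() from a second thread and waits for the answer. */
  static boolean hasNextOnOtherThread( final Iterator<?> iterator )
  {
    final AtomicBoolean result = new AtomicBoolean();
    Thread thread = new Thread( () -> result.set( iterator.hasNext() ) );
    thread.start();
    try
    {
      thread.join();
    }
    catch ( InterruptedException e )
    {
      Thread.currentThread().interrupt();
    }
    return result.get();
  }

  public static void main( String[] args )
  {
    Iterator<String> iterator = new ThreadLocalPrefetchIterator<String>( Arrays.asList( "a" ).iterator() );
    System.out.println( iterator.hasNext() );
    System.out.println( hasNextOnOtherThread( iterator ) );
    System.out.println( iterator.next() );
  }
}
```

The sketch also shows the caveat from the text: a Thread that calls hasNext() and then never calls next() keeps its reserved element, so that element is lost to all other threads.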

Parameters:
threadsafe -
Returns:
this

newIterator

public Iterator<String> newIterator(QName qName)
New Iterator which returns xml content chunks for all xml tags matching the given QName

Performance is fast with about 10000 elements per second being processed
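As a rough picture of what splitting a stream into String chunks by QName involves, the following sketch uses only the standard StAX reader and writer of the JDK. It is not the library's implementation, and the exact text of the chunks the library returns may differ; the class XmlChunkSketch and its method chunks are hypothetical names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch based on plain StAX, not the library's implementation
final class XmlChunkSketch
{
  /** Returns one xml String chunk per element whose name equals the given QName. */
  static List<String> chunks( String xml, QName qName )
  {
    List<String> result = new ArrayList<String>();
    try
    {
      XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader( new StringReader( xml ) );
      XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
      StringWriter buffer = null;
      XMLEventWriter writer = null;
      int depth = 0;
      while ( reader.hasNext() )
      {
        XMLEvent event = reader.nextEvent();
        if ( writer == null && event.isStartElement() && qName.equals( event.asStartElement().getName() ) )
        {
          // A matching start tag opens a new chunk
          buffer = new StringWriter();
          writer = outputFactory.createXMLEventWriter( buffer );
        }
        if ( writer != null )
        {
          writer.add( event );
          if ( event.isStartElement() )
          {
            depth++;
          }
          else if ( event.isEndElement() && --depth == 0 )
          {
            // The end tag belonging to the opening tag closes the chunk
            writer.flush();
            writer.close();
            result.add( buffer.toString() );
            writer = null;
          }
        }
      }
      reader.close();
    }
    catch ( XMLStreamException e )
    {
      throw new IllegalStateException( e );
    }
    return result;
  }

  public static void main( String[] args )
  {
    String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
    for ( String chunk : chunks( xml, new QName( "book" ) ) )
    {
      System.out.println( chunk );
    }
  }
}
```

Unlike this eager sketch, which collects all chunks into a List, the Iterator returned by newIterator(QName) hands out chunks one at a time while the stream is traversed.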

Parameters:
qName - QName
Returns:
See Also:
newIterator(QName, ElementConverter)

newIteratorMapBased

public Iterator<Map<String,Object>> newIteratorMapBased(QName qName)
New Iterator which returns Map entities each based on a single content chunk which are produced for all xml tags matching the given QName

Performance is medium to slow with about 1000 elements per second being processed.

For details how xml content is transformed to a Map instance see XMLNestedMapConverter

Parameters:
qName - QName
Returns:
See Also:
XMLNestedMapConverter, newIterator(QName, ElementConverter)

newIterator

public <E> Iterator<E> newIterator(QName qName,
                                   ElementConverter<String,E> elementConverter)
Similar to newIterator(QName) but allows specifying an additional ElementConverter which post-processes the extracted xml chunks
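Post-processing each chunk amounts to wrapping the String based Iterator in a converting one. The sketch below shows that idea with java.util.function.Function standing in for the library's ElementConverter, whose method signature is not given on this page; the class ConvertingIteratorSketch and its method convert are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical sketch: Function is a stand-in for the library's ElementConverter
final class ConvertingIteratorSketch
{
  /** Wraps an iterator of xml chunks so that each chunk is converted lazily on next(). */
  static <E> Iterator<E> convert( final Iterator<String> chunks, final Function<String, E> converter )
  {
    return new Iterator<E>()
    {
      @Override
      public boolean hasNext()
      {
        return chunks.hasNext();
      }

      @Override
      public E next()
      {
        // Conversion happens only when the element is actually pulled
        return converter.apply( chunks.next() );
      }
    };
  }

  public static void main( String[] args )
  {
    Iterator<String> chunks = Arrays.asList( "<a>1</a>", "<a>22</a>" ).iterator();
    Iterator<Integer> lengths = convert( chunks, String::length );
    while ( lengths.hasNext() )
    {
      System.out.println( lengths.next() );
    }
  }
}
```

Because the conversion runs inside next(), several threads pulling from a shared Iterator would also share the conversion work.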

Parameters:
qName - QName
elementConverter - ElementConverter
Returns:

newIterator

public <E> Iterator<E> newIterator(Class<? extends E> type)
Selects xml parts based on Classes annotated with JAXB compliant annotations and uses JAXB to create instances of the given type based on the data of the extracted xml chunks.

Performance is slow with about 500 elements per second being processed

Parameters:
type -
Returns:
Throws:
MissingXMLRootElementAnnotationException

newIterator

public <E> Iterator<E> newIterator(XMLIteratorFactory.XMLElementSelector xmlElementSelector,
                                   Class<? extends E> type)
Similar to newIterator(Class) but allows specifying an XMLIteratorFactory.XMLElementSelector to select tags from the xml stream.

Parameters:
xmlElementSelector - XMLIteratorFactory.XMLElementSelector
type -
Returns:

newIterator

public <E> Iterator<E> newIterator(XMLIteratorFactory.XMLElementSelector xmlElementSelector,
                                   ElementConverter<String,E> elementConverter)
Similar to newIterator(QName, ElementConverter) but allows specifying a more general XMLIteratorFactory.XMLElementSelector instead of a QName

Parameters:
xmlElementSelector - XMLIteratorFactory.XMLElementSelector
elementConverter - ElementConverter
Returns:

newIterator

public Iterator<String> newIterator(XMLIteratorFactory.XMLElementSelector xmlElementSelector)
Similar to newIterator(QName) but allows specifying a more general XMLIteratorFactory.XMLElementSelector instead of a QName

Parameters:
xmlElementSelector -
Returns:

setEncoding

public XMLIteratorFactory setEncoding(String encoding)
Sets the encoding. Default is "UTF-8"

Parameters:
encoding - the encoding to set
Returns:
this

close

public XMLIteratorFactory close()
Closes the internal XMLEventReader which closes all iterators immediately


setXmlInstanceContextFactory

public XMLIteratorFactory setXmlInstanceContextFactory(XMLInstanceContextFactory xmlInstanceContextFactory)
Allows setting an alternative XMLInstanceContextFactory, e.g. to replace the current Java default StAX implementation with another one like Staxon or Jettison for JSON

Parameters:
xmlInstanceContextFactory - XMLInstanceContextFactory
Returns:
this
See Also:
XML_INSTANCE_CONTEXT_FACTORY_JAVA_STAX_DEFAULT

setJAXBTypeContentConverterFactory

public XMLIteratorFactory setJAXBTypeContentConverterFactory(XMLIteratorFactory.JAXBTypeContentConverterFactory jaxbTypeContentConverterFactory)
Allows setting another XMLIteratorFactory.JAXBTypeContentConverterFactory which is used to convert xml content to instances of JAXB based types. See newIterator(Class).

Parameters:
jaxbTypeContentConverterFactory - XMLIteratorFactory.JAXBTypeContentConverterFactory
Returns:
this
See Also:
DEFAULT_JAXB_TYPE_CONTENT_CONVERTER_FACTORY


Copyright © 2013. All Rights Reserved.