Class HtmlParser

java.lang.Object
com.qwazr.library.html.HtmlParser
All Implemented Interfaces:
com.qwazr.extractor.ParserFactory, com.qwazr.extractor.ParserInterface

public class HtmlParser
extends java.lang.Object
implements com.qwazr.extractor.ParserFactory, com.qwazr.extractor.ParserInterface
  • Field Summary

    Fields inherited from interface com.qwazr.extractor.ParserInterface

    CONTENT, LANG_DETECTION, MIME_TYPE, TITLE
  • Constructor Summary

    Constructors 
    Constructor Description
    HtmlParser()  
  • Method Summary

    Modifier and Type Method Description
    com.qwazr.extractor.ParserInterface createParser()  
    com.qwazr.extractor.ParserResult extract(javax.ws.rs.core.MultivaluedMap<java.lang.String,java.lang.String> parameters, java.io.InputStream inputStream, javax.ws.rs.core.MediaType mediaType)  
    com.qwazr.extractor.ParserResult extract(javax.ws.rs.core.MultivaluedMap<java.lang.String,java.lang.String> parameters, java.nio.file.Path filePath)  
    java.util.Collection<com.qwazr.extractor.ParserField> getFields()  
    java.lang.String getName()  
    static org.apache.xerces.parsers.DOMParser getNewDomParser()  
    static org.cyberneko.html.HTMLConfiguration getNewHtmlConfiguration()
    Creates a new NekoHTML configuration.
    java.util.Collection<com.qwazr.extractor.ParserField> getParameters()  
    java.util.Collection<java.lang.String> getSupportedFileExtensions()  
    java.util.Collection<javax.ws.rs.core.MediaType> getSupportedMimeTypes()  
    static org.apache.xerces.parsers.DOMParser getThreadLocalDomParser()  

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • HtmlParser

      public HtmlParser()
  • Method Details

    • getNewHtmlConfiguration

      public static org.cyberneko.html.HTMLConfiguration getNewHtmlConfiguration()
      Creates a new NekoHTML configuration.
      Returns:
      a new org.cyberneko.html.HTMLConfiguration instance
    • getNewDomParser

      public static org.apache.xerces.parsers.DOMParser getNewDomParser()
    • getThreadLocalDomParser

      public static org.apache.xerces.parsers.DOMParser getThreadLocalDomParser()
    • getParameters

      public java.util.Collection<com.qwazr.extractor.ParserField> getParameters()
      Specified by:
      getParameters in interface com.qwazr.extractor.ParserFactory
    • getFields

      public java.util.Collection<com.qwazr.extractor.ParserField> getFields()
      Specified by:
      getFields in interface com.qwazr.extractor.ParserFactory
    • extract

      public com.qwazr.extractor.ParserResult extract(javax.ws.rs.core.MultivaluedMap<java.lang.String,java.lang.String> parameters, java.io.InputStream inputStream, javax.ws.rs.core.MediaType mediaType) throws java.io.IOException
      Specified by:
      extract in interface com.qwazr.extractor.ParserInterface
      Throws:
      java.io.IOException
    • extract

      public com.qwazr.extractor.ParserResult extract(javax.ws.rs.core.MultivaluedMap<java.lang.String,java.lang.String> parameters, java.nio.file.Path filePath) throws java.io.IOException
      Specified by:
      extract in interface com.qwazr.extractor.ParserInterface
      Throws:
      java.io.IOException
    • getName

      public java.lang.String getName()
      Specified by:
      getName in interface com.qwazr.extractor.ParserFactory
    • createParser

      public com.qwazr.extractor.ParserInterface createParser()
      Specified by:
      createParser in interface com.qwazr.extractor.ParserFactory
    • getSupportedFileExtensions

      public java.util.Collection<java.lang.String> getSupportedFileExtensions()
      Specified by:
      getSupportedFileExtensions in interface com.qwazr.extractor.ParserFactory
    • getSupportedMimeTypes

      public java.util.Collection<javax.ws.rs.core.MediaType> getSupportedMimeTypes()
      Specified by:
      getSupportedMimeTypes in interface com.qwazr.extractor.ParserFactory
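
The page does not describe how getNewDomParser() and getThreadLocalDomParser() differ. Going by the names alone, one plausible reading is that the first builds a fresh parser on every call while the second returns an instance cached for the calling thread. The sketch below illustrates that general pattern only. It is not the HtmlParser implementation: it uses the JDK's javax.xml.parsers.DocumentBuilder instead of the Xerces DOMParser, and the class ThreadLocalParserSketch and its methods are hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

// Hypothetical illustration class; not part of com.qwazr.library.html.
public class ThreadLocalParserSketch {

    // One builder per thread: a DocumentBuilder is reusable but not thread-safe.
    private static final ThreadLocal<DocumentBuilder> THREAD_LOCAL_BUILDER =
            ThreadLocal.withInitial(ThreadLocalParserSketch::newBuilder);

    // Assumed analogue of getNewDomParser(): creates a fresh instance on each call.
    public static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Cannot create a DocumentBuilder", e);
        }
    }

    // Assumed analogue of getThreadLocalDomParser(): returns the calling thread's instance.
    public static DocumentBuilder threadLocalBuilder() {
        return THREAD_LOCAL_BUILDER.get();
    }

    // Parses a well-formed XML string with the thread's builder and returns the root tag name.
    public static String rootElementName(String xml) throws Exception {
        Document document = threadLocalBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return document.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootElementName("<html><body/></html>"));        // prints "html"
        System.out.println(threadLocalBuilder() == threadLocalBuilder());   // prints "true"
        System.out.println(newBuilder() == newBuilder());                   // prints "false"
    }
}
```

If HtmlParser follows this pattern, the thread-local accessor would avoid repeated parser construction in code that parses many documents on the same thread, while the "new" accessor would suit callers that need an instance they fully own. Whether it does is an assumption to confirm against the library source.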