public class GooseContentExtractor extends WebPageContentExtractor
Content extractor using Goose. Goose only accepts URLs on the
web; already downloaded HttpResults and local Files are not supported.
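Because Goose only accepts web URLs, a caller holding an arbitrary document location may want to check it before choosing this extractor. The sketch below assumes that "URLs on the web" means absolute `http`/`https` URLs. `GooseInputCheck` and `toWebUrl` are hypothetical names and are not part of Palladian. The sketch only inspects strings, so already downloaded `HttpResult` objects still have to be routed elsewhere by the caller.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Locale;
import java.util.Optional;

public class GooseInputCheck {

    // Hypothetical helper (not part of Palladian): returns a URL only when the
    // location is an absolute http(s) address with a host, which is assumed to be
    // the only input Goose accepts. Local paths, file: URIs, other schemes and
    // unparsable strings yield Optional.empty().
    static Optional<URL> toWebUrl(String documentLocation) {
        if (documentLocation == null) {
            return Optional.empty();
        }
        try {
            URI uri = new URI(documentLocation.trim());
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) {
                return Optional.empty(); // relative path or no host, e.g. "/tmp/a.html"
            }
            String normalized = scheme.toLowerCase(Locale.ROOT);
            if (!normalized.equals("http") && !normalized.equals("https")) {
                return Optional.empty(); // e.g. file: or ftp: locations
            }
            return Optional.of(uri.toURL());
        } catch (URISyntaxException | MalformedURLException e) {
            return Optional.empty(); // not a parsable URI at all
        }
    }

    public static void main(String[] args) {
        String[] locations = {
            "https://example.com/article.html",
            "file:///tmp/article.html",
            "/tmp/article.html"
        };
        for (String location : locations) {
            System.out.println(location + " -> accepted: " + toWebUrl(location).isPresent());
        }
    }
}
```

When the `Optional` is present, its URL could be passed to `setDocument(URL url)`. Otherwise, the input would need a different `WebPageContentExtractor`. The check does not verify that the URL is reachable.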
| Constructor and Description |
|---|
| GooseContentExtractor() |
| Modifier and Type | Method and Description |
|---|---|
| String | getExtractorName() |
| Node | getResultNode() |
| String | getResultText() |
| String | getResultTitle() |
| static void | main(String[] args) |
| WebPageContentExtractor | setDocument(Document document) |
| WebPageContentExtractor | setDocument(File file) |
| WebPageContentExtractor | setDocument(ws.palladian.retrieval.HttpResult httpResult) |
| WebPageContentExtractor | setDocument(String documentLocation) |
| WebPageContentExtractor | setDocument(URL url) |
public WebPageContentExtractor setDocument(File file) throws PageContentExtractorException
setDocument in class WebPageContentExtractor
Throws: PageContentExtractorException

public WebPageContentExtractor setDocument(ws.palladian.retrieval.HttpResult httpResult) throws PageContentExtractorException
setDocument in class WebPageContentExtractor
Throws: PageContentExtractorException

public WebPageContentExtractor setDocument(String documentLocation) throws PageContentExtractorException
setDocument in class WebPageContentExtractor
Throws: PageContentExtractorException

public WebPageContentExtractor setDocument(URL url) throws PageContentExtractorException
setDocument in class WebPageContentExtractor
Throws: PageContentExtractorException

public WebPageContentExtractor setDocument(Document document) throws PageContentExtractorException
setDocument in class WebPageContentExtractor
Throws: PageContentExtractorException

public Node getResultNode()
getResultNode in class WebPageContentExtractor

public String getResultText()
getResultText in class WebPageContentExtractor

public String getResultTitle()
getResultTitle in class WebPageContentExtractor

public String getExtractorName()
getExtractorName in class WebPageContentExtractor

public static void main(String[] args) throws PageContentExtractorException
Throws: PageContentExtractorException

Copyright © 2018. All rights reserved.