public class BoilerpipeContentExtractor extends WebPageContentExtractor
Boilerpipe, as described in "Boilerplate Detection using Shallow Text Features"; Kohlschütter, Christian; Fankhauser, Peter; Nejdl, Wolfgang; 2010, and at http://www.l3s.de/~kohlschuetter/boilerplate/

| Constructor and Description |
|---|
| BoilerpipeContentExtractor() |
| BoilerpipeContentExtractor(de.l3s.boilerpipe.extractors.ExtractorBase extractor) |
| Modifier and Type | Method and Description |
|---|---|
| String | getExtractorName() |
| Node | getResultNode() |
| String | getResultText() |
| String | getResultTitle() |
| BoilerpipeContentExtractor | setDocument(Document document) |
| WebPageContentExtractor | setDocument(File file) |
| BoilerpipeContentExtractor | setDocument(InputSource inputSource) |
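The return types in the method summary differ between the setDocument overloads: the File overload is declared to return WebPageContentExtractor, while the Document and InputSource overloads return BoilerpipeContentExtractor. The following is a minimal, self-contained sketch of what that means for call chaining in Java. PageExtractorSketch, BoilerpipeLikeSketch, ReturnTypeSketch and the lastSource field are hypothetical stand-ins invented for illustration; they are not the real classes and make no claim about how the library is implemented.

```java
import java.io.File;
import java.io.StringReader;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the base class; not the real library type.
abstract class PageExtractorSketch {
    protected String lastSource = "none";

    // Declared on the base class, so it returns the base type.
    PageExtractorSketch setDocument(File file) {
        lastSource = "file:" + file.getName();
        return this;
    }

    abstract String getExtractorName();
}

// Hypothetical stand-in for the subclass.
class BoilerpipeLikeSketch extends PageExtractorSketch {
    // Overrides the base overload but keeps the base return type,
    // mirroring setDocument(File) in the method summary.
    @Override
    PageExtractorSketch setDocument(File file) {
        super.setDocument(file);
        return this;
    }

    // Subclass-only overload that returns the subclass type,
    // mirroring setDocument(InputSource) in the method summary.
    BoilerpipeLikeSketch setDocument(InputSource inputSource) {
        lastSource = "inputSource:" + inputSource.getSystemId();
        return this;
    }

    @Override
    String getExtractorName() {
        return "boilerpipe-like sketch";
    }
}

class ReturnTypeSketch {
    static String demo() {
        InputSource in = new InputSource(new StringReader("<html/>"));
        in.setSystemId("inline");

        // Narrow return type: the chain stays on the subclass.
        BoilerpipeLikeSketch a = new BoilerpipeLikeSketch().setDocument(in);

        // Base return type: the static type widens, the object itself is unchanged.
        PageExtractorSketch b = new BoilerpipeLikeSketch().setDocument(new File("page.html"));
        // b.setDocument(in); // would not compile: this overload is not declared on the base type

        return a.lastSource + " | " + b.lastSource + " | " + (b instanceof BoilerpipeLikeSketch);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Under these assumptions, a chain that starts with the File overload can continue only with methods declared on the base type unless the result is cast back to the subclass, so calling a subclass-returning overload first keeps the chain on the narrower type.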
Methods inherited from class WebPageContentExtractor: getResultText, setDocument, setDocument, setDocument

public BoilerpipeContentExtractor()

public BoilerpipeContentExtractor(de.l3s.boilerpipe.extractors.ExtractorBase extractor)

public WebPageContentExtractor setDocument(File file) throws PageContentExtractorException
setDocument in class WebPageContentExtractor
Throws: PageContentExtractorException

public BoilerpipeContentExtractor setDocument(Document document) throws PageContentExtractorException
setDocument in class WebPageContentExtractor
Throws: PageContentExtractorException

public BoilerpipeContentExtractor setDocument(InputSource inputSource) throws PageContentExtractorException
Throws: PageContentExtractorException

public Node getResultNode()
getResultNode in class WebPageContentExtractor

public String getResultText()
getResultText in class WebPageContentExtractor

public String getResultTitle()
getResultTitle in class WebPageContentExtractor

public String getExtractorName()
getExtractorName in class WebPageContentExtractor

Copyright © 2018. All rights reserved.