org.cogroo.formats.ad
Class ADChunkBasedShallowParserSampleStream
java.lang.Object
org.cogroo.formats.ad.ADChunk2SampleStream
org.cogroo.formats.ad.ADChunkBasedShallowParserSampleStream
- All Implemented Interfaces:
- opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
public class ADChunkBasedShallowParserSampleStream
- extends ADChunk2SampleStream
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus, producing output for
Portuguese Chunker training.
The heuristics used to extract chunks were based on the paper 'A Machine Learning
Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero
Santos and Ruy Milidiú).
Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html
Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica". 12 February 2006.
http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf
Detailed info about the NER tagset:
http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names
Note: Do not use this class; it is for internal use only.
Method Summary
opennlp.tools.chunker.ChunkSample read()
ADChunkBasedShallowParserSampleStream
public ADChunkBasedShallowParserSampleStream(opennlp.tools.util.ObjectStream<String> lineStream,
String commaSeparatedFunctTags,
boolean isIncludePOSTags,
boolean useCGTag,
boolean expandME)
ADChunkBasedShallowParserSampleStream
public ADChunkBasedShallowParserSampleStream(InputStream in,
String charsetName,
String commaSeparatedFunctTags,
boolean isIncludePOSTags,
boolean useCGTag,
boolean expandME)
- Creates a new ChunkSample stream from an InputStream
- Parameters:
in
- the Corpus InputStream
charsetName
- the charset of the Arvores Deitadas Corpus
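To illustrate why charsetName matters, here is a minimal sketch that uses only the JDK and does not touch this class. It assumes, purely for the example, that the corpus bytes are encoded as ISO-8859-1; this page does not state the corpus encoding. CharsetSketch, readLines and sampleBytes are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CharsetSketch {

    /** Decodes the stream with the named charset and returns its lines. */
    static List<String> readLines(InputStream in, String charsetName) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charsetName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for an unknown charset name.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Sample bytes, encoded as ISO-8859-1 (an assumption for this example). */
    static byte[] sampleBytes() {
        return "\u00C1rvores deitadas".getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Matching charset: the accented first letter is decoded correctly.
        System.out.println(readLines(new ByteArrayInputStream(sampleBytes()), "ISO-8859-1"));
        // Mismatched charset: the first byte is not valid UTF-8 and is garbled.
        System.out.println(readLines(new ByteArrayInputStream(sampleBytes()), "UTF-8"));
    }
}
```

If the charset name does not match the file's actual encoding, accented characters in the corpus are silently corrupted rather than rejected, so the value passed as charsetName should be checked against the corpus files being read.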
read
public opennlp.tools.chunker.ChunkSample read()
throws IOException
- Specified by:
read
in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
- Overrides:
read
in class ADChunk2SampleStream
- Throws:
IOException
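The sketch below shows the usual way an opennlp.tools.util.ObjectStream is consumed: read() is called repeatedly until it returns null, which marks the end of the stream. To stay self-contained it declares a small stand-in interface, SampleStream, instead of depending on OpenNLP or on this class. SampleStream, fromList, drain and demo are hypothetical names used only for illustration, and plain strings stand in for ChunkSample objects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    /** Stand-in for opennlp.tools.util.ObjectStream; only read() is modeled. */
    interface SampleStream<T> {
        T read() throws IOException;
    }

    /** Builds a stream over a list; returns null once the list is exhausted. */
    static <T> SampleStream<T> fromList(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Collects every sample by reading until the stream returns null. */
    static <T> List<T> drain(SampleStream<T> stream) throws IOException {
        List<T> samples = new ArrayList<>();
        T sample;
        while ((sample = stream.read()) != null) {
            samples.add(sample);
        }
        return samples;
    }

    /** Drains a three-element stream; strings stand in for ChunkSample. */
    static List<String> demo() {
        try {
            return drain(fromList(Arrays.asList("s1", "s2", "s3")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [s1, s2, s3]
    }
}
```

With the real interface, the same loop would be written against ObjectStream<ChunkSample>, and the caller would either handle or propagate the IOException declared by read().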
Copyright © 2012-2013 CoGrOO. All Rights Reserved.