org.cogroo.formats.ad
Class ADChunkBasedShallowParserSampleStream

java.lang.Object
  extended by org.cogroo.formats.ad.ADChunk2SampleStream
      extended by org.cogroo.formats.ad.ADChunkBasedShallowParserSampleStream
All Implemented Interfaces:
opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>

public class ADChunkBasedShallowParserSampleStream
extends ADChunk2SampleStream

Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus. Its output is used to train the Portuguese Chunker.

The heuristics used to extract chunks are based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 February 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf

Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names

Note: This class is for internal use only. Do not use it directly!


Field Summary
 
Fields inherited from class org.cogroo.formats.ad.ADChunk2SampleStream
OTHER
 
Constructor Summary
ADChunkBasedShallowParserSampleStream(InputStream in, String charsetName, String commaSeparatedFunctTags, boolean isIncludePOSTags, boolean useCGTag, boolean expandME)
          Creates a new ChunkSample stream from an InputStream
ADChunkBasedShallowParserSampleStream(opennlp.tools.util.ObjectStream<String> lineStream, String commaSeparatedFunctTags, boolean isIncludePOSTags, boolean useCGTag, boolean expandME)
           
 
Method Summary
 opennlp.tools.chunker.ChunkSample read()
           
 
Methods inherited from class org.cogroo.formats.ad.ADChunk2SampleStream
close, convertFuncTag, reset, setEnd, setStart
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ADChunkBasedShallowParserSampleStream

public ADChunkBasedShallowParserSampleStream(opennlp.tools.util.ObjectStream<String> lineStream,
                                             String commaSeparatedFunctTags,
                                             boolean isIncludePOSTags,
                                             boolean useCGTag,
                                             boolean expandME)

ADChunkBasedShallowParserSampleStream

public ADChunkBasedShallowParserSampleStream(InputStream in,
                                             String charsetName,
                                             String commaSeparatedFunctTags,
                                             boolean isIncludePOSTags,
                                             boolean useCGTag,
                                             boolean expandME)
Creates a new ChunkSample stream from an InputStream

Parameters:
in - the Corpus InputStream
charsetName - the charset of the Arvores Deitadas Corpus
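
The two kinds of input this constructor takes, a byte stream with a named charset and a comma-separated list of function tags, can be illustrated with a minimal JDK-only sketch. Everything in it is an assumption made for illustration: the class ConstructorArgsSketch, the helpers parseTags and readLines, the sample tag values and the ISO-8859-1 charset are hypothetical and are not taken from the CoGrOO source, whose actual handling of these arguments is not documented here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of org.cogroo.formats.ad.
public class ConstructorArgsSketch {

    // Splits a comma-separated tag list into a set, trimming whitespace
    // and skipping empty entries. Insertion order is preserved.
    public static Set<String> parseTags(String commaSeparatedFunctTags) {
        Set<String> tags = new LinkedHashSet<>();
        for (String tag : commaSeparatedFunctTags.split(",")) {
            String trimmed = tag.trim();
            if (!trimmed.isEmpty()) {
                tags.add(trimmed);
            }
        }
        return tags;
    }

    // Decodes an InputStream with the named charset and returns its lines.
    public static List<String> readLines(InputStream in, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charsetName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: two short lines encoded as ISO-8859-1.
        byte[] corpus = "\u00C1rvores\ndeitadas\n".getBytes("ISO-8859-1");
        System.out.println(readLines(new ByteArrayInputStream(corpus), "ISO-8859-1"));

        // Assumed sample tag list with irregular spacing.
        System.out.println(parseTags("SUBJ, ACC ,PIV"));
    }
}
```

Passing the charset by name means the caller, not the platform default, decides how the corpus bytes are decoded, which matters for accented Portuguese text.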
Method Detail

read

public opennlp.tools.chunker.ChunkSample read()
                                       throws IOException
Specified by:
read in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Overrides:
read in class ADChunk2SampleStream
Throws:
IOException
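
Streams that implement opennlp.tools.util.ObjectStream are normally consumed by calling read() until it returns null. The sketch below shows that loop without depending on OpenNLP or CoGrOO: SampleSource is a hypothetical stand-in that models only read(), and fromList and drain are illustrative helpers, not library methods. The checked IOException is wrapped in an UncheckedIOException purely to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of OpenNLP or CoGrOO.
public class ReadLoopSketch {

    // Stand-in for ObjectStream<T>: read() returns the next item,
    // or null once the source is exhausted.
    public interface SampleSource<T> {
        T read() throws IOException;
    }

    // Builds a source backed by an in-memory list (assumed test data).
    public static <T> SampleSource<T> fromList(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Reads until null and collects every item that was returned.
    public static <T> List<T> drain(SampleSource<T> source) {
        List<T> out = new ArrayList<>();
        try {
            T sample;
            while ((sample = source.read()) != null) {
                out.add(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        SampleSource<String> source = fromList(Arrays.asList("s1", "s2", "s3"));
        System.out.println(drain(source));
    }
}
```

With the real class, the same loop would iterate over opennlp.tools.chunker.ChunkSample objects, and the stream should be closed with close() once reading is finished.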


Copyright © 2012-2013 CoGrOO. All Rights Reserved.