org.cogroo.formats.ad
Class ADChunk2SampleStream

java.lang.Object
  extended by org.cogroo.formats.ad.ADChunk2SampleStream
All Implemented Interfaces:
opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Direct Known Subclasses:
ADChunkBasedHeadFinderSampleStream, ADChunkBasedShallowParserSampleStream

public class ADChunk2SampleStream
extends Object
implements opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>

Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus; its output is used for training the Portuguese Chunker.

The heuristic used to extract chunks is based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 February 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf

Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names

Note: This class is for internal use only; do not use it directly.


Field Summary
static String OTHER
           
 
Constructor Summary
ADChunk2SampleStream(InputStream in, String charsetName)
          Creates a new ChunkSample stream from an InputStream.
ADChunk2SampleStream(opennlp.tools.util.ObjectStream<String> lineStream)
          Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>.
 
Method Summary
 void close()
           
static String convertFuncTag(String t, boolean useCGTags)
           
 opennlp.tools.chunker.ChunkSample read()
           
 void reset()
           
 void setEnd(int aEnd)
           
 void setStart(int aStart)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OTHER

public static final String OTHER
See Also:
Constant Field Values
Constructor Detail

ADChunk2SampleStream

public ADChunk2SampleStream(opennlp.tools.util.ObjectStream<String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object.

Parameters:
lineStream - a stream of lines as String

ADChunk2SampleStream

public ADChunk2SampleStream(InputStream in,
                            String charsetName)
Creates a new ChunkSample stream from an InputStream.

Parameters:
in - the InputStream of the corpus
charsetName - the charset of the Árvores Deitadas corpus
Method Detail

read

public opennlp.tools.chunker.ChunkSample read()
                                       throws IOException
Specified by:
read in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Throws:
IOException

convertFuncTag

public static String convertFuncTag(String t,
                                    boolean useCGTags)

setStart

public void setStart(int aStart)

setEnd

public void setEnd(int aEnd)

reset

public void reset()
           throws IOException,
                  UnsupportedOperationException
Specified by:
reset in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Throws:
IOException
UnsupportedOperationException

close

public void close()
           throws IOException
Specified by:
close in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Throws:
IOException
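
The read, reset and close methods above are specified by opennlp.tools.util.ObjectStream, where read is expected to return null once the stream is exhausted. The following is a minimal sketch of that consumption pattern, not code from this class: to stay self-contained it declares a hypothetical stand-in interface (SimpleObjectStream) and a list-backed stream (ListStream) instead of depending on OpenNLP, and it uses plain strings where a real stream would yield ChunkSample objects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ObjectStreamSketch {

    // Hypothetical stand-in mirroring the three methods listed on this page.
    // It is NOT the real opennlp.tools.util.ObjectStream interface.
    interface SimpleObjectStream<T> {
        T read() throws IOException;      // next item, or null when exhausted
        void reset() throws IOException;  // rewind to the first item
        void close() throws IOException;  // release underlying resources
    }

    // Hypothetical in-memory stream used only to exercise the pattern.
    static final class ListStream<T> implements SimpleObjectStream<T> {
        private final List<T> items;
        private Iterator<T> cursor;

        ListStream(List<T> items) {
            this.items = new ArrayList<>(items);
            this.cursor = this.items.iterator();
        }

        @Override
        public T read() {
            return cursor.hasNext() ? cursor.next() : null;
        }

        @Override
        public void reset() {
            cursor = items.iterator();
        }

        @Override
        public void close() {
            // Nothing to release for an in-memory list.
        }
    }

    // Reads until read() returns null, then closes the stream.
    // Checked IOExceptions are wrapped so the sketch stays short.
    static <T> List<T> drain(SimpleObjectStream<T> stream) {
        List<T> out = new ArrayList<>();
        try {
            try {
                T item;
                while ((item = stream.read()) != null) {
                    out.add(item);
                }
            } finally {
                stream.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        ListStream<String> stream =
                new ListStream<>(Arrays.asList("B-NP", "I-NP", "O"));
        stream.read();   // consume one item
        stream.reset();  // rewind, so drain sees every item again
        System.out.println(drain(stream));
    }
}
```

With a real stream, the same loop would be written against ObjectStream<ChunkSample>, and reset may throw UnsupportedOperationException as declared above.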


Copyright © 2012-2013 CoGrOO. All Rights Reserved.