org.cogroo.formats.ad
Class ADContractionNameSampleStream

java.lang.Object
  extended by org.cogroo.formats.ad.ADContractionNameSampleStream
All Implemented Interfaces:
opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>

public class ADContractionNameSampleStream
extends Object
implements opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>

Parser for the Floresta Sintá(c)tica Arvores Deitadas corpus; the output is intended for Portuguese NER training.

The data contains common multiword expressions. The categories are:
intj, spec, conj-s, num, pron-indef, n, prop, adj, prp, adv

Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html

Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica".
12 February 2006. http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf

Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names

Note: Do not use this class; it is for internal use only.


Constructor Summary
ADContractionNameSampleStream(InputStream in, String charsetName, Set<String> tags)
          Creates a new NameSample stream from an InputStream.
ADContractionNameSampleStream(opennlp.tools.util.ObjectStream<String> lineStream, Set<String> tags)
          Creates a new NameSample stream from a line stream, i.e. an ObjectStream<String>.

Method Summary
 void close()
 opennlp.tools.namefind.NameSample read()
 void reset()

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait


Constructor Detail

ADContractionNameSampleStream

public ADContractionNameSampleStream(opennlp.tools.util.ObjectStream<String> lineStream,
                                     Set<String> tags)
Creates a new NameSample stream from a line stream, i.e. an ObjectStream<String>, such as a PlainTextByLineStream object.

Parameters:
lineStream - a stream of lines, each given as a String
tags - the tags we are looking for, or null for all
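Both constructors take a tags set documented as "the tags we are looking for, or null for all". The sketch below spells that rule out as a plain predicate, using category names from the class description (n, prop, adv). TagFilterSketch and accepts are hypothetical names for illustration only; they are not part of this class, and how the class applies the filter internally is not documented on this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper illustrating the documented meaning of the tags parameter. */
final class TagFilterSketch {

    private TagFilterSketch() {
    }

    /** Keeps a sample of this category if tags is null (meaning "all") or contains it. */
    static boolean accepts(Set<String> tags, String category) {
        return tags == null || tags.contains(category);
    }

    public static void main(String[] args) {
        // Restrict to two of the categories listed in the class description.
        Set<String> nounsOnly = Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList("n", "prop")));

        System.out.println(accepts(nounsOnly, "prop")); // true
        System.out.println(accepts(nounsOnly, "adv"));  // false
        System.out.println(accepts(null, "adv"));       // true: null keeps every tag
    }
}
```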

ADContractionNameSampleStream

public ADContractionNameSampleStream(InputStream in,
                                     String charsetName,
                                     Set<String> tags)
Creates a new NameSample stream from an InputStream.

Parameters:
in - the corpus InputStream
charsetName - the charset of the Arvores Deitadas corpus
tags - the tags we are looking for, or null for all
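The charsetName argument tells the stream how to turn the corpus bytes into text, which matters for accented Portuguese words. The sketch below uses only the JDK to show what that decoding step amounts to; readLines is a hypothetical helper, not a method of this class, and treating the corpus file as ISO-8859-1 is an assumption to verify against the file you actually have.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes an InputStream into lines with a named charset. */
final class CharsetLinesSketch {

    private CharsetLinesSketch() {
    }

    static List<String> readLines(InputStream in, String charsetName) throws IOException {
        List<String> lines = new ArrayList<String>();
        // Charset.forName throws an unchecked exception for an unknown charset name.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: ISO-8859-1 encoded input; "\u00C1" is the letter A with acute accent.
        byte[] latin1 = "\u00C1rvores deitadas\nFloresta".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(readLines(new ByteArrayInputStream(latin1), "ISO-8859-1"));
        // Decoding the same bytes as UTF-8 would corrupt the accented letter.
        System.out.println(readLines(new ByteArrayInputStream(latin1), "UTF-8"));
    }
}
```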
Method Detail

read

public opennlp.tools.namefind.NameSample read()
                                       throws IOException
Specified by:
read in interface opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
Throws:
IOException
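OpenNLP's ObjectStream contract is that read() returns the next element, and null once the data is exhausted, so callers usually loop until null and then close the stream. The sketch below shows that loop against a hypothetical stand-in interface, SampleStream, which has the same three methods as opennlp.tools.util.ObjectStream; it exists only so the example compiles without OpenNLP or CoGrOO on the classpath. With the real class the element type would be NameSample.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in with the same read/reset/close shape as opennlp.tools.util.ObjectStream. */
interface SampleStream<T> {
    /** Returns the next element, or null once the data is exhausted. */
    T read() throws IOException;

    void reset() throws IOException, UnsupportedOperationException;

    void close() throws IOException;
}

/** Hypothetical list-backed implementation used only to drive the example. */
final class ListSampleStream<T> implements SampleStream<T> {
    private final List<T> items;
    private Iterator<T> it;
    private boolean closed;

    ListSampleStream(List<T> items) {
        this.items = items;
        this.it = items.iterator();
    }

    @Override
    public T read() {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public void reset() {
        it = items.iterator();
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class ReadLoopSketch {
    private ReadLoopSketch() {
    }

    /** Calls read() until it returns null, closing the stream even if reading fails. */
    static <T> int drain(SampleStream<T> samples) throws IOException {
        int count = 0;
        try {
            T sample;
            while ((sample = samples.read()) != null) {
                count++; // a real caller would use `sample` here, e.g. feed it to a trainer
            }
        } finally {
            samples.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        ListSampleStream<String> samples = new ListSampleStream<String>(Arrays.asList("s1", "s2"));
        System.out.println(drain(samples)); // 2
    }
}
```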

reset

public void reset()
           throws IOException,
                  UnsupportedOperationException
Specified by:
reset in interface opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
Throws:
IOException
UnsupportedOperationException
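Because reset() is declared to throw UnsupportedOperationException, code that needs more than one pass over the corpus (for example, iterative training) should be ready for it. Whether ADContractionNameSampleStream itself supports reset() is not stated on this page. The sketch below, again using hypothetical stand-in types (ResettableSource, ListSource, TwoPassSketch) rather than the OpenNLP ones, reads a source twice and falls back to a buffered copy of the first pass when reset() is unsupported.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in exposing read() and reset() the way ObjectStream does. */
interface ResettableSource<T> {
    T read() throws IOException; // null at end of data

    void reset() throws IOException, UnsupportedOperationException;
}

/** Hypothetical list-backed source; reset() can be disabled to simulate a one-shot stream. */
final class ListSource<T> implements ResettableSource<T> {
    private final List<T> items;
    private final boolean resettable;
    private Iterator<T> it;

    ListSource(List<T> items, boolean resettable) {
        this.items = items;
        this.resettable = resettable;
        this.it = items.iterator();
    }

    @Override
    public T read() {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public void reset() {
        if (!resettable) {
            throw new UnsupportedOperationException("one-shot stream");
        }
        it = items.iterator();
    }
}

final class TwoPassSketch {
    private TwoPassSketch() {
    }

    /** Returns the elements of two passes; replays a buffer if reset() is unsupported. */
    static <T> List<T> readTwice(ResettableSource<T> source) throws IOException {
        List<T> firstPass = new ArrayList<T>();
        T item;
        while ((item = source.read()) != null) {
            firstPass.add(item);
        }

        List<T> both = new ArrayList<T>(firstPass);
        try {
            source.reset();
            while ((item = source.read()) != null) {
                both.add(item);
            }
        } catch (UnsupportedOperationException e) {
            both.addAll(firstPass); // fallback: replay the buffered first pass
        }
        return both;
    }

    public static void main(String[] args) throws IOException {
        List<String> data = Arrays.asList("a", "b");
        System.out.println(readTwice(new ListSource<String>(data, true)));  // [a, b, a, b]
        System.out.println(readTwice(new ListSource<String>(data, false))); // [a, b, a, b]
    }
}
```

Buffering costs memory proportional to the corpus; when that is too much, re-opening the corpus and constructing a fresh stream for each pass is the usual alternative.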

close

public void close()
           throws IOException
Specified by:
close in interface opennlp.tools.util.ObjectStream<opennlp.tools.namefind.NameSample>
Throws:
IOException


Copyright © 2012-2013 CoGrOO. All Rights Reserved.