java.lang.Object
  org.cogroo.formats.ad.ADChunk2SampleStream
public class ADChunk2SampleStream
Parser for the Floresta Sintá(c)tica Árvores Deitadas corpus; its output is used for training the Portuguese Chunker.
The heuristics used to extract chunks are based on the paper 'A Machine Learning
Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero
Santos and Ruy Milidiú).
Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html
Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica". 12 February 2006.
http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf
Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names
Note: Do not use this class, internal use only!
Field Summary | |
---|---|
static String |
OTHER
|
Constructor Summary | |
---|---|
ADChunk2SampleStream(InputStream in,
String charsetName)
Creates a new ChunkSample stream from an InputStream. |
|
ADChunk2SampleStream(opennlp.tools.util.ObjectStream<String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>. |
Method Summary | |
---|---|
void |
close()
|
static String |
convertFuncTag(String t,
boolean useCGTags)
|
opennlp.tools.chunker.ChunkSample |
read()
|
void |
reset()
|
void |
setEnd(int aEnd)
|
void |
setStart(int aStart)
|
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String OTHER
Constructor Detail |
---|
public ADChunk2SampleStream(opennlp.tools.util.ObjectStream<String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object.
Parameters:
lineStream - a stream of lines as String
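The two constructors differ only in where the lines come from: one takes a ready-made stream of lines, the other takes raw bytes plus a charset name. The following is a minimal sketch, using only JDK classes, of how an InputStream and a charset name can be turned into lines. The class name `LineDecodingSketch`, the helper `readLines`, and the choice of ISO-8859-1 are assumptions made for illustration; this is not the code the class itself uses, and the charset of a given corpus file should be checked before relying on it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of org.cogroo or OpenNLP.
final class LineDecodingSketch {

    // Decodes the bytes of 'in' with the named charset and returns one String per line.
    static List<String> readLines(InputStream in, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charsetName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In-memory bytes stand in for a corpus file; ISO-8859-1 is an assumed charset.
        byte[] bytes = "\u00C1rvores\ndeitadas\n".getBytes("ISO-8859-1");
        List<String> lines = readLines(new ByteArrayInputStream(bytes), "ISO-8859-1");
        System.out.println(lines.size() + " lines read");
    }
}
```

Passing the wrong charset name does not fail; it silently produces garbled accented characters, which is why the charset is an explicit constructor argument.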
public ADChunk2SampleStream(InputStream in, String charsetName)
Creates a new ChunkSample stream from an InputStream.
Parameters:
in - the corpus InputStream
charsetName - the charset of the Árvores Deitadas corpus
Method Detail |
---|
public opennlp.tools.chunker.ChunkSample read() throws IOException
Specified by:
read in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Throws:
IOException
public static String convertFuncTag(String t, boolean useCGTags)
public void setStart(int aStart)
public void setEnd(int aEnd)
public void reset() throws IOException, UnsupportedOperationException
Specified by:
reset in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Throws:
IOException
UnsupportedOperationException
public void close() throws IOException
Specified by:
close in interface opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample>
Throws:
IOException
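The read, reset and close methods follow the usual object-stream contract, in which read() is called repeatedly until it returns null. The sketch below illustrates that consumption pattern with a stand-in interface so that it runs without any external jars. `SampleSource`, `ListSource` and `drain` are hypothetical names introduced here; they are not the real opennlp.tools.util.ObjectStream or this class, and the null-at-end behaviour is assumed to carry over to this class from the interface it implements.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class, not part of org.cogroo or OpenNLP.
final class ObjectStreamSketch {

    // Stand-in mirroring the three methods documented above (read, reset, close).
    interface SampleSource<T> {
        T read() throws IOException;    // assumed to return null once the stream is exhausted
        void reset() throws IOException;
        void close() throws IOException;
    }

    // In-memory source used only to make the sketch runnable.
    static final class ListSource<T> implements SampleSource<T> {
        private final List<T> items;
        private int pos = 0;

        ListSource(List<T> items) { this.items = items; }

        public T read() { return pos < items.size() ? items.get(pos++) : null; }
        public void reset() { pos = 0; }
        public void close() { }
    }

    // Reads every remaining sample from the source until read() returns null.
    static <T> List<T> drain(SampleSource<T> source) throws IOException {
        List<T> out = new ArrayList<>();
        T sample;
        while ((sample = source.read()) != null) {
            out.add(sample);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        ListSource<String> source = new ListSource<>(Arrays.asList("first", "second"));
        List<String> firstPass = drain(source);
        source.reset();                          // rewind to the beginning
        List<String> secondPass = drain(source);
        source.close();
        System.out.println(firstPass.size() + " " + secondPass.size()); // prints "2 2"
    }
}
```

Note that reset() on the documented class is declared to throw UnsupportedOperationException, so a caller should not assume that a second pass is always possible.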