org.cogroo.tools.chunker2
Class ChunkerME

java.lang.Object
  extended by org.cogroo.tools.chunker2.ChunkerME
All Implemented Interfaces:
Chunker

public class ChunkerME
extends Object
implements Chunker

This class implements a maximum-entropy chunker. A chunker finds flat (non-nested) structures, such as noun phrases or named entities, in a sequence of tokens.


Field Summary
static int DEFAULT_BEAM_SIZE
           
 
Constructor Summary
ChunkerME(ChunkerModel model)
          Initializes the current instance with the specified model.
ChunkerME(ChunkerModel model, int beamSize)
          Initializes the current instance with the specified model and the specified beam size.
 
Method Summary
 String[] chunk(String[] toks, String[] tags)
          Generates chunk tags for the given sequence, returning the result in an array.
 double[] probs()
          Returns an array with the probabilities of the last decoded sequence.
 void probs(double[] probs)
          Populates the specified array with the probabilities of the last decoded sequence.
 opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags)
          Returns the top k chunk sequences for the specified sentence with the specified POS tags.
 opennlp.tools.util.Sequence[] topKSequences(String[] sentence, String[] tags, double minSequenceScore)
          Returns the top k chunk sequences for the specified sentence with the specified POS tags, keeping only sequences whose score is at least minSequenceScore.
static ChunkerModel train(String lang, opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample> in, opennlp.tools.util.TrainingParameters mlParams, ChunkerFactory factory)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_BEAM_SIZE

public static final int DEFAULT_BEAM_SIZE
See Also:
Constant Field Values
Constructor Detail

ChunkerME

public ChunkerME(ChunkerModel model,
                 int beamSize)
Initializes the current instance with the specified model and the specified beam size.

Parameters:
model - The model for this chunker.
beamSize - The size of the beam that should be used when decoding sequences.

ChunkerME

public ChunkerME(ChunkerModel model)
Initializes the current instance with the specified model. The default beam size is used.

Parameters:
model - The model for this chunker.
Method Detail

chunk

public String[] chunk(String[] toks,
                      String[] tags)
Description copied from interface: Chunker
Generates chunk tags for the given sequence, returning the result in an array.

Specified by:
chunk in interface Chunker
Parameters:
toks - an array of the tokens or words of the sequence.
tags - an array of the POS tags of the sequence.
Returns:
an array of chunk tags, one for each token in the sequence.
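
The array returned by chunk is parallel to toks, so a caller usually has to group tokens into phrases itself. The following is a minimal, library-independent sketch of that step. It assumes the model emits BIO-style tags such as "B-NP", "I-NP" and "O"; that tag scheme, the class name ChunkSpans and the method group are illustrative assumptions, not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSpans {

    /**
     * Groups tokens into bracketed phrases. Assumes BIO-style chunk tags
     * ("B-X" opens a chunk, "I-X" continues it, anything else is outside).
     * This tag scheme is an assumption; check the tag set of your model.
     */
    public static List<String> group(String[] toks, String[] chunkTags) {
        if (toks.length != chunkTags.length) {
            throw new IllegalArgumentException("toks and chunkTags must have the same length");
        }
        List<String> phrases = new ArrayList<>();
        StringBuilder current = null; // text of the open chunk, if any
        String currentType = null;    // type of the open chunk, e.g. "NP"

        for (int i = 0; i < toks.length; i++) {
            String tag = chunkTags[i];
            boolean continues = tag.startsWith("I-")
                    && current != null
                    && tag.substring(2).equals(currentType);
            if (continues) {
                current.append(' ').append(toks[i]);
                continue;
            }
            // Any other tag closes the open chunk.
            if (current != null) {
                phrases.add("[" + currentType + " " + current + "]");
                current = null;
                currentType = null;
            }
            // "B-X" opens a chunk; a stray "I-X" is treated as an opening too.
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                currentType = tag.substring(2);
                current = new StringBuilder(toks[i]);
            }
        }
        if (current != null) {
            phrases.add("[" + currentType + " " + current + "]");
        }
        return phrases;
    }

    public static void main(String[] args) {
        String[] toks = {"The", "cat", "sat", "on", "the", "mat"};
        // In real use this array would come from chunk(toks, tags).
        String[] chunkTags = {"B-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP"};
        // prints [[NP The cat], [VP sat], [PP on], [NP the mat]]
        System.out.println(group(toks, chunkTags));
    }
}
```

Tokens tagged "O" are simply left out of the result, which keeps the output limited to recognized chunks.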

topKSequences

public opennlp.tools.util.Sequence[] topKSequences(String[] sentence,
                                                   String[] tags)
Description copied from interface: Chunker
Returns the top k chunk sequences for the specified sentence with the specified POS tags.

Specified by:
topKSequences in interface Chunker
Parameters:
sentence - The tokens of the sentence.
tags - The POS tags for the specified sentence.
Returns:
the top k chunk sequences for the specified sentence.

topKSequences

public opennlp.tools.util.Sequence[] topKSequences(String[] sentence,
                                                   String[] tags,
                                                   double minSequenceScore)
Description copied from interface: Chunker
Returns the top k chunk sequences for the specified sentence with the specified POS tags.

Specified by:
topKSequences in interface Chunker
Parameters:
sentence - The tokens of the sentence.
tags - The POS tags for the specified sentence.
minSequenceScore - A lower bound on the score of a returned sequence.
Returns:
the top k chunk sequences for the specified sentence.
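
To illustrate how a beam size and a lower score bound interact, here is a toy beam search written from scratch. It is not the implementation used by this class: the names BeamSketch, Candidate and search are hypothetical, scores are assumed to be sums of log probabilities, and the per-position tag probabilities are fixed rather than conditioned on earlier tags as a real chunker would do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BeamSketch {

    /** A partial or complete tag sequence with its score. */
    public static final class Candidate {
        public final List<String> tags;
        public final double score; // sum of log probabilities (assumed convention)

        public Candidate(List<String> tags, double score) {
            this.tags = tags;
            this.score = score;
        }
    }

    /**
     * Toy beam search. probs[i][j] is the probability of labels[j] at
     * position i. At most beamSize candidates survive each step, and any
     * candidate scoring below minSequenceScore is dropped.
     */
    public static List<Candidate> search(String[] labels, double[][] probs,
                                         int beamSize, double minSequenceScore) {
        List<Candidate> beam = new ArrayList<>();
        beam.add(new Candidate(new ArrayList<>(), 0.0));

        for (double[] position : probs) {
            List<Candidate> next = new ArrayList<>();
            for (Candidate c : beam) {
                for (int j = 0; j < labels.length; j++) {
                    double score = c.score + Math.log(position[j]);
                    // Log scores never increase, so a prefix below the bound
                    // cannot recover and is pruned early.
                    if (score < minSequenceScore) {
                        continue;
                    }
                    List<String> tags = new ArrayList<>(c.tags);
                    tags.add(labels[j]);
                    next.add(new Candidate(tags, score));
                }
            }
            // Keep only the beamSize best candidates, best first.
            next.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
            beam = new ArrayList<>(next.subList(0, Math.min(beamSize, next.size())));
        }
        return beam;
    }

    public static void main(String[] args) {
        String[] labels = {"B-NP", "I-NP", "O"};
        double[][] probs = {
            {0.70, 0.10, 0.20},
            {0.25, 0.60, 0.15}
        };
        for (Candidate c : search(labels, probs, 2, Double.NEGATIVE_INFINITY)) {
            System.out.println(c.tags + " " + c.score);
        }
    }
}
```

A larger beam explores more alternatives at higher cost, while a higher minSequenceScore can shrink the result below k sequences, possibly to none.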

probs

public void probs(double[] probs)
Populates the specified array with the probabilities of the last decoded sequence. The sequence was determined by the previous call to chunk. The specified array should be at least as large as the number of tokens in the previous call to chunk.

Parameters:
probs - An array used to hold the probabilities of the last decoded sequence.

probs

public double[] probs()
Returns an array with the probabilities of the last decoded sequence. The sequence was determined based on the previous call to chunk.

Returns:
An array with one probability for each token passed to the most recent call to chunk.
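
One common use of the per-token probabilities is to spot low-confidence tags. The sketch below works on a plain double[] such as the one probs() returns. The class ChunkConfidence and its methods are hypothetical helpers, and treating the product of the per-token values as an overall sequence probability is an assumption about how the values should be combined.

```java
public class ChunkConfidence {

    /** Returns the index of the least confident token, or -1 for an empty array. */
    public static int weakestToken(double[] probs) {
        int weakest = -1;
        for (int i = 0; i < probs.length; i++) {
            if (weakest < 0 || probs[i] < probs[weakest]) {
                weakest = i;
            }
        }
        return weakest;
    }

    /**
     * Combines per-token probabilities into one value by multiplying them.
     * The sum is taken in log space to avoid underflow on long sentences.
     */
    public static double sequenceProbability(double[] probs) {
        double logSum = 0.0;
        for (double p : probs) {
            logSum += Math.log(p);
        }
        return Math.exp(logSum);
    }

    public static void main(String[] args) {
        // In real use this array would come from probs() after a call to chunk.
        double[] probs = {0.9, 0.4, 0.8};
        System.out.println("weakest token index: " + weakestToken(probs));
        System.out.println("sequence probability: " + sequenceProbability(probs));
    }
}
```

Because probs reflects only the most recent call to chunk, read it before chunking the next sentence.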

train

public static ChunkerModel train(String lang,
                                 opennlp.tools.util.ObjectStream<opennlp.tools.chunker.ChunkSample> in,
                                 opennlp.tools.util.TrainingParameters mlParams,
                                 ChunkerFactory factory)
                          throws IOException
Throws:
IOException


Copyright © 2012-2013 CoGrOO. All Rights Reserved.