public class LuceneKeywordPFE extends LucenePFEBase implements PairFeatureExtractor

Pair feature extractor for keyword ngrams (see KeywordNGramDFE). It can extract ngrams from one or both documents in the pair. Parameters for each document (view 1, view 2) can be set separately, or both documents can be treated together as one extended document.

| Modifier and Type | Field and Description |
|---|---|
| protected boolean | includeCommas |
| static String | KEYWORD_NGRAM_FIELD |
| static String | KEYWORD_NGRAM_FIELD1 |
| static String | KEYWORD_NGRAM_FIELD2 |
| protected int | keywordMaxN |
| protected int | keywordMinN |
| protected int | keywordNgramUseTopK |
| protected Set<String> | keywords |
| protected String | keywordsFile |
| protected boolean | markSentenceBoundary |
| protected boolean | markSentenceLocation |
| protected boolean | markViewBlindNgramsWithLocalView |
| protected int | ngramMaxN1 |
| protected int | ngramMaxN2 |
| protected int | ngramMinN1 |
| protected int | ngramMinN2 |
| static String | PARAM_KEYWORD_NGRAM_INCLUDE_COMMAS |
| static String | PARAM_KEYWORD_NGRAM_MARK_SENTENCE_BOUNDARY |
| static String | PARAM_KEYWORD_NGRAM_MARK_SENTENCE_LOCATION |
| static String | PARAM_KEYWORD_NGRAM_MAX_N |
| static String | PARAM_KEYWORD_NGRAM_MAX_N_VIEW1: Maximum size n of ngrams from View 1 documents. |
| static String | PARAM_KEYWORD_NGRAM_MAX_N_VIEW2: Maximum size n of ngrams from View 2 documents. |
| static String | PARAM_KEYWORD_NGRAM_MIN_N |
| static String | PARAM_KEYWORD_NGRAM_MIN_N_VIEW1: Minimum size n of ngrams from View 1 documents. |
| static String | PARAM_KEYWORD_NGRAM_MIN_N_VIEW2: Minimum size n of ngrams from View 2 documents. |
| static String | PARAM_KEYWORD_NGRAM_USE_TOP_K |
| static String | PARAM_MARK_VIEWBLIND_KEYWORD_NGRAMS_WITH_LOCAL_VIEW: This option collects a FrequencyDistribution of ngrams across both documents of all pairs, but when writing features, the view in which a particular ngram is found is recorded with the ngram. |
| static String | PARAM_NGRAM_KEYWORDS_FILE |
| static String | PARAM_USE_VIEW1_KEYWORD_NGRAMS_AS_FEATURES: Each ngram from View 1 documents is added to the document pair instance as a feature. |
| static String | PARAM_USE_VIEW2_KEYWORD_NGRAMS_AS_FEATURES: Each ngram from View 2 documents is added to the document pair instance as a feature. |
| static String | PARAM_USE_VIEWBLIND_KEYWORD_NGRAMS_AS_FEATURES: All qualifying ngrams from anywhere in either document are used as features. |
| protected boolean | useView1NgramsAsFeatures |
| protected boolean | useView2NgramsAsFeatures |
| protected boolean | useViewBlindNgramsAsFeatures |
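
The three `useView…NgramsAsFeatures` switches and `markViewBlindNgramsWithLocalView` can be combined, with the constraint (stated in the field detail) that local-view marking needs view-blind ngrams to be enabled. The following is a minimal sketch of that combination logic only. The class `ViewFlagsSketch`, the method `enabledGroups`, and the group labels are hypothetical and not part of this API, and whether the real class rejects or silently ignores an inconsistent configuration is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

public class ViewFlagsSketch {

    // Hypothetical helper: lists which feature groups a flag combination would enable.
    // Throwing on an inconsistent combination is an assumption made for illustration.
    static List<String> enabledGroups(boolean useView1, boolean useView2,
                                      boolean useViewBlind, boolean markLocalView) {
        if (markLocalView && !useViewBlind) {
            throw new IllegalArgumentException(
                "local-view marking requires view-blind ngrams to be enabled");
        }
        List<String> groups = new ArrayList<>();
        if (useView1) {
            groups.add("view1");
        }
        if (useView2) {
            groups.add("view2");
        }
        if (useViewBlind) {
            groups.add(markLocalView ? "viewBlind+localView" : "viewBlind");
        }
        return groups;
    }

    public static void main(String[] args) {
        // View 1 ngrams plus view-blind ngrams tagged with the view they were seen in.
        System.out.println(enabledGroups(true, false, true, true));
    }
}
```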

Inherited fields, one line per declaring superclass:

fieldOfTheMoment, kngramUseTopK, ngramBinaryFeatureValuesCombos, ngramUseTopK1, ngramUseTopK2, PARAM_NGRAM_BINARY_FEATURE_VALUES_COMBO, PARAM_NGRAM_USE_TOP_K_VIEW1, PARAM_NGRAM_USE_TOP_K_VIEW2, topKSetView1, topKSetView2, topNOfTheMoment

forceRereadFromIndex, LUCENE_NGRAM_FIELD, luceneDir, PARAM_SOURCE_LOCATION

dfStore, filterPartialStopwordMatches, ngramFreqThreshold, ngramLowerCase, ngramStopwordsFile, ngramUseTopK, PARAM_FILTER_PARTIAL_STOPWORD_MATCHES, PARAM_NGRAM_FREQ_THRESHOLD, PARAM_NGRAM_LOWER_CASE, PARAM_NGRAM_MAX_N, PARAM_NGRAM_MIN_N, PARAM_NGRAM_STOPWORDS_FILE, PARAM_NGRAM_USE_TOP_K, PARAM_TF_IDF_CALCULATION, prefix, stopwords, tfIdfCalculation, topKSet

featureExtractorName, PARAM_UNIQUE_EXTRACTOR_NAME

| Constructor and Description |
|---|
| LuceneKeywordPFE() |

| Modifier and Type | Method and Description |
|---|---|
| Set<Feature> | extract(org.apache.uima.jcas.JCas view1, org.apache.uima.jcas.JCas view2) |
| protected String | getFeaturePrefix() |
| protected String | getFieldName() |
| List<MetaCollectorConfiguration> | getMetaCollectorClasses(Map<String,Object> parameterSettings) |
| protected int | getTopN() |
| protected de.tudarmstadt.ukp.dkpro.core.api.frequency.util.FrequencyDistribution<String> | getViewNgrams(org.apache.uima.jcas.JCas view1, org.apache.uima.jcas.JCas view2) |
| boolean | initialize(org.apache.uima.resource.ResourceSpecifier aSpecifier, Map<String,Object> aAdditionalParams) |

Inherited methods, one line per declaring superclass:

addToFeatureArray

getTopNgrams, logSelectionProcess, passesScreening

afterResourcesInitialized, getLogger, getResourceName

public static final String KEYWORD_NGRAM_FIELD
public static final String PARAM_KEYWORD_NGRAM_MIN_N
protected int keywordMinN
public static final String PARAM_KEYWORD_NGRAM_MAX_N
protected int keywordMaxN
public static final String PARAM_NGRAM_KEYWORDS_FILE
protected String keywordsFile
public static final String PARAM_KEYWORD_NGRAM_MARK_SENTENCE_BOUNDARY
protected boolean markSentenceBoundary
public static final String PARAM_KEYWORD_NGRAM_MARK_SENTENCE_LOCATION
protected boolean markSentenceLocation
public static final String PARAM_KEYWORD_NGRAM_INCLUDE_COMMAS
protected boolean includeCommas
public static final String PARAM_KEYWORD_NGRAM_USE_TOP_K
protected int keywordNgramUseTopK
public static final String PARAM_KEYWORD_NGRAM_MIN_N_VIEW1
protected int ngramMinN1
public static final String PARAM_KEYWORD_NGRAM_MIN_N_VIEW2
protected int ngramMinN2
public static final String PARAM_KEYWORD_NGRAM_MAX_N_VIEW1
protected int ngramMaxN1
public static final String PARAM_KEYWORD_NGRAM_MAX_N_VIEW2
protected int ngramMaxN2
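
The minimum and maximum n can differ between the two views. The following is a rough sketch of what per-view size limits mean in practice. It assumes that a keyword ngram is an ngram built only from tokens that appear in the keyword set, which this page does not spell out. The class `KeywordNgramSketch`, the method `keywordNgrams`, and the `_` separator are hypothetical and are not taken from the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class KeywordNgramSketch {

    // Hypothetical helper: keeps only keyword tokens, then emits all ngrams
    // of length minN..maxN over the remaining token sequence.
    static List<String> keywordNgrams(List<String> tokens, Set<String> keywords,
                                      int minN, int maxN) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (keywords.contains(token)) {
                kept.add(token);
            }
        }
        List<String> ngrams = new ArrayList<>();
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= kept.size(); i++) {
                ngrams.add(String.join("_", kept.subList(i, i + n)));
            }
        }
        return ngrams;
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("but", "not", "however");
        List<String> view1 = List.of("good", "but", "not", "great");
        List<String> view2 = List.of("however", "it", "was", "not", "bad");

        // View 1 uses sizes 1..2, view 2 only size 2 (illustrative settings).
        System.out.println(keywordNgrams(view1, keywords, 1, 2)); // [but, not, but_not]
        System.out.println(keywordNgrams(view2, keywords, 2, 2)); // [however_not]
    }
}
```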
public static final String PARAM_USE_VIEW1_KEYWORD_NGRAMS_AS_FEATURES
protected boolean useView1NgramsAsFeatures
public static final String PARAM_USE_VIEW2_KEYWORD_NGRAMS_AS_FEATURES
protected boolean useView2NgramsAsFeatures
public static final String PARAM_USE_VIEWBLIND_KEYWORD_NGRAMS_AS_FEATURES
protected boolean useViewBlindNgramsAsFeatures
public static final String PARAM_MARK_VIEWBLIND_KEYWORD_NGRAMS_WITH_LOCAL_VIEW
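This option collects a FrequencyDistribution of ngrams across both documents of all pairs, but when writing features, the view in which a particular ngram is found is recorded with the ngram. For example, with a PARAM_NGRAM_USE_TOP_K value of 500, 400 of the top 500 ngrams might happen to come from View 2 documents; whenever one of the 500 ngrams is seen in any document, whether view 1 or view 2, that document's view is recorded.
PARAM_USE_VIEWBLIND_KEYWORD_NGRAMS_AS_FEATURES must also be set to true.
protected boolean markViewBlindNgramsWithLocalView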
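
The behaviour described above (pool counts across both views, keep the top K, then tag each hit with the view it was seen in) might look roughly like this. It is a sketch under stated assumptions, not the library's implementation: the class `ViewBlindMarkingSketch`, the methods `topK` and `features`, the `view1_`/`view2_` prefixes, and the alphabetical tie-breaking are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ViewBlindMarkingSketch {

    // Hypothetical helper: the k most frequent ngrams from the pooled counts.
    // Ties are broken alphabetically (an assumption, to keep the result deterministic).
    static Set<String> topK(Map<String, Integer> pooledCounts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(pooledCounts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Set<String> top = new TreeSet<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Hypothetical helper: feature names for one pair; the view prefix is added
    // only when local-view marking is switched on.
    static Set<String> features(List<String> view1, List<String> view2,
                                Set<String> top, boolean markLocalView) {
        Set<String> out = new TreeSet<>();
        for (String ngram : view1) {
            if (top.contains(ngram)) {
                out.add(markLocalView ? "view1_" + ngram : ngram);
            }
        }
        for (String ngram : view2) {
            if (top.contains(ngram)) {
                out.add(markLocalView ? "view2_" + ngram : ngram);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Counts pooled over both views of all pairs (made-up numbers).
        Map<String, Integer> pooledCounts = Map.of("but_not", 3, "however", 2, "and", 1);
        Set<String> top = topK(pooledCounts, 2);

        List<String> view1 = List.of("but_not", "and");
        List<String> view2 = List.of("but_not", "however");
        System.out.println(features(view1, view2, top, true));
        System.out.println(features(view1, view2, top, false));
    }
}
```

With marking on, the same ngram found in both views yields two distinct features; with marking off, it yields one.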
public static final String KEYWORD_NGRAM_FIELD1
public static final String KEYWORD_NGRAM_FIELD2
public List<MetaCollectorConfiguration> getMetaCollectorClasses(Map<String,Object> parameterSettings) throws org.apache.uima.resource.ResourceInitializationException
getMetaCollectorClasses in interface MetaDependent
Throws: org.apache.uima.resource.ResourceInitializationException

public boolean initialize(org.apache.uima.resource.ResourceSpecifier aSpecifier, Map<String,Object> aAdditionalParams) throws org.apache.uima.resource.ResourceInitializationException
initialize in interface org.apache.uima.resource.Resource
initialize in class LucenePFEBase
Throws: org.apache.uima.resource.ResourceInitializationException

public Set<Feature> extract(org.apache.uima.jcas.JCas view1, org.apache.uima.jcas.JCas view2) throws org.dkpro.tc.api.exception.TextClassificationException
extract in interface PairFeatureExtractor
Throws: org.dkpro.tc.api.exception.TextClassificationException

protected de.tudarmstadt.ukp.dkpro.core.api.frequency.util.FrequencyDistribution<String> getViewNgrams(org.apache.uima.jcas.JCas view1, org.apache.uima.jcas.JCas view2)

protected String getFieldName()
getFieldName in class LuceneFeatureExtractorBase

protected int getTopN()
getTopN in class LuceneFeatureExtractorBase

protected String getFeaturePrefix()
getFeaturePrefix in class NGramFeatureExtractorBase

Copyright © 2013–2018 Ubiquitous Knowledge Processing (UKP) Lab. All rights reserved.