Package org.apache.tika.langdetect.tika
Class TikaLanguageDetector
- java.lang.Object
  - org.apache.tika.language.detect.LanguageDetector
    - org.apache.tika.langdetect.tika.TikaLanguageDetector
public class TikaLanguageDetector extends org.apache.tika.language.detect.LanguageDetector

This is Tika's original, legacy, homegrown language detector. As currently implemented, it computes the vector distance between the character-trigram profile of the input string and that of each language model. Because it works only on trigrams, it is not suitable for short texts: a string of n characters yields at most n - 2 trigrams, which is too few for a reliable profile.
There are better-performing language detectors. This module is still here in the hope that we'll get around to improving it, because it is elegant and could be improved fairly trivially.
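The trigram-distance idea described above can be sketched in plain Java. This is a minimal illustration of the general technique under stated assumptions, not Tika's code: the names `TrigramSketch`, `profile`, `normalise`, `distance`, and `rank` are hypothetical, the relative-frequency normalisation and Euclidean distance are assumed choices, and the toy "models" are single sentences rather than trained language profiles.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of trigram-profile scoring; NOT Tika's implementation. */
final class TrigramSketch {

    /** Counts overlapping character trigrams; a text of length n yields n - 2 of them. */
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String s = text.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Assumption: counts become relative frequencies so texts of different lengths compare. */
    static Map<String, Double> normalise(Map<String, Integer> counts) {
        double total = 0;
        for (int c : counts.values()) total += c;
        Map<String, Double> freq = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freq.put(e.getKey(), e.getValue() / total);
        }
        return freq;
    }

    /** Assumption: Euclidean distance over sparse vectors; a missing trigram counts as 0. */
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double d = e.getValue() - b.getOrDefault(e.getKey(), 0.0);
            sum += d * d;
        }
        for (Map.Entry<String, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) sum += e.getValue() * e.getValue();
        }
        return Math.sqrt(sum);
    }

    /** Orders language codes from the closest model to the farthest. */
    static List<String> rank(String text, Map<String, Map<String, Double>> models) {
        Map<String, Double> input = normalise(profile(text));
        List<String> langs = new ArrayList<>(models.keySet());
        langs.sort(Comparator.comparingDouble(l -> distance(input, models.get(l))));
        return langs;
    }

    public static void main(String[] args) {
        // Toy models from one sentence each; real models come from large corpora.
        Map<String, Map<String, Double>> models = new HashMap<>();
        models.put("en", normalise(profile("the cat sat on the mat and then the dog ate the hat")));
        models.put("de", normalise(profile("der hund und die katze sind in dem haus und schlafen")));
        System.out.println(rank("the dog and the cat", models)); // [en, de]
        System.out.println(profile("hi").size()); // 0: too short to produce any trigram
    }
}
```

The second line of output illustrates the short-text limitation: a two-character input has an empty profile, so its distance to each model depends only on the model itself and says nothing about the input's language.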
Constructor Summary
Constructor | Description
TikaLanguageDetector() |
Method Summary
Modifier and Type | Method | Description
void | addText(char[] cbuf, int off, int len) |
List<org.apache.tika.language.detect.LanguageResult> | detectAll() |
boolean | hasModel(String language) |
org.apache.tika.language.detect.LanguageDetector | loadModels() |
org.apache.tika.language.detect.LanguageDetector | loadModels(Set<String> languages) |
void | reset() |
org.apache.tika.language.detect.LanguageDetector | setPriors(Map<String,Float> languageProbabilities) | not supported
Method Detail
loadModels
public org.apache.tika.language.detect.LanguageDetector loadModels() throws IOException
- Specified by:
  loadModels in class org.apache.tika.language.detect.LanguageDetector
- Throws:
  IOException

loadModels
public org.apache.tika.language.detect.LanguageDetector loadModels(Set<String> languages) throws IOException
- Specified by:
  loadModels in class org.apache.tika.language.detect.LanguageDetector
- Throws:
  IOException

hasModel
public boolean hasModel(String language)
- Specified by:
  hasModel in class org.apache.tika.language.detect.LanguageDetector

setPriors
public org.apache.tika.language.detect.LanguageDetector setPriors(Map<String,Float> languageProbabilities) throws IOException
Not supported.
- Specified by:
  setPriors in class org.apache.tika.language.detect.LanguageDetector
- Parameters:
  languageProbabilities - map from language to probability
- Returns:
- Throws:
  IOException

reset
public void reset()
- Specified by:
  reset in class org.apache.tika.language.detect.LanguageDetector

addText
public void addText(char[] cbuf, int off, int len)
- Specified by:
  addText in class org.apache.tika.language.detect.LanguageDetector

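The `addText(char[] cbuf, int off, int len)` signature matches the familiar `java.io.Writer.write(char[], int, int)` shape, which suggests that only `cbuf[off]` through `cbuf[off + len - 1]` are meaningful on each call. The sketch below shows one way a caller might stream a `java.io.Reader` into such a method in fixed-size chunks. `ChunkedFeed`, `TextSink`, and `feed` are hypothetical stand-ins so the example runs without Tika on the classpath; with Tika, the lambda would be replaced by a call to the detector's `addText`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper; TextSink only mirrors the shape of addText(char[], int, int). */
final class ChunkedFeed {

    /** Hypothetical stand-in for any object exposing addText(char[] cbuf, int off, int len). */
    interface TextSink {
        void addText(char[] cbuf, int off, int len);
    }

    /** Reads the Reader to EOF; after each read only buf[0, n) holds valid characters. */
    static void feed(Reader reader, TextSink sink, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sink.addText(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder collected = new StringBuilder();
        // The buffer is reused between reads, so the sink copies just the valid slice.
        TextSink sink = (cbuf, off, len) -> collected.append(cbuf, off, len);
        feed(new StringReader("Tika reads text in chunks."), sink, 8);
        System.out.println(collected); // Tika reads text in chunks.
    }
}
```

Copying the slice matters because the same `char[]` is overwritten on the next read. A trigram-based consumer that scored each chunk in isolation would also miss trigrams that span chunk boundaries, which is one reason to accumulate the text before calling `detectAll()`.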
detectAll
public List<org.apache.tika.language.detect.LanguageResult> detectAll()
- Specified by:
  detectAll in class org.apache.tika.language.detect.LanguageDetector