Package org.apache.tika.langdetect.tika
Class TikaLanguageDetector
java.lang.Object
org.apache.tika.language.detect.LanguageDetector
org.apache.tika.langdetect.tika.TikaLanguageDetector
public class TikaLanguageDetector
extends org.apache.tika.language.detect.LanguageDetector
This is Tika's original, homegrown language detector, kept as a legacy implementation.
As currently implemented, it computes the vector distance between the
trigrams of the input string and those of the language models.
Because it works only on trigrams, it is not suitable for short texts.
There are better-performing language detectors. This module is still here in the hope that we'll get around to improving it, because it is elegant and could be improved fairly easily.
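The trigram comparison described above can be illustrated with a minimal, self-contained sketch. This is not Tika's code: the class name `TrigramDistanceSketch`, its methods, and the choice to compare relative trigram frequencies with Euclidean distance are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; names and normalization are assumptions, not Tika's code. */
final class TrigramDistanceSketch {

    /** Counts every overlapping three-character window in the text. */
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= text.length(); i++) {
            counts.merge(text.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Total number of trigram occurrences, floored at 1 to avoid division by zero. */
    static double total(Map<String, Integer> profile) {
        long sum = 0;
        for (int c : profile.values()) {
            sum += c;
        }
        return Math.max(sum, 1L);
    }

    /** Euclidean distance between the relative trigram frequencies of two profiles. */
    static double distance(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> trigrams = new HashSet<>(a.keySet());
        trigrams.addAll(b.keySet());
        double totalA = total(a);
        double totalB = total(b);
        double sumOfSquares = 0.0;
        for (String t : trigrams) {
            double diff = a.getOrDefault(t, 0) / totalA - b.getOrDefault(t, 0) / totalB;
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Returns the key of the model closest to the input, or null if no models are given. */
    static String closest(String input, Map<String, Map<String, Integer>> models) {
        Map<String, Integer> inputProfile = profile(input);
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> e : models.entrySet()) {
            double d = distance(inputProfile, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In this sketch a four-character input such as `"abcd"` yields only two trigrams, which hints at why a trigram-only approach has little evidence to work with on short texts.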
Field Summary
Fields inherited from class org.apache.tika.language.detect.LanguageDetector
mixedLanguages, shortText
Constructor Summary
Constructors
TikaLanguageDetector()
Method Summary
Modifier and Type    Method    Description
void    addText(char[] cbuf, int off, int len)
List<org.apache.tika.language.detect.LanguageResult>    detectAll()
boolean    hasModel(String language)
org.apache.tika.language.detect.LanguageDetector    loadModels()
org.apache.tika.language.detect.LanguageDetector    loadModels(Set<String> languages)
void    reset()
org.apache.tika.language.detect.LanguageDetector    setPriors(Map<String, Float> languageProbabilities)    not supported
Methods inherited from class org.apache.tika.language.detect.LanguageDetector
addText, detect, detect, detectAll, getDefaultLanguageDetector, getLanguageDetectors, getLanguageDetectors, hasEnoughText, isMixedLanguages, isShortText, setMixedLanguages, setShortText
Constructor Details
TikaLanguageDetector
public TikaLanguageDetector()
Method Details
loadModels
public org.apache.tika.language.detect.LanguageDetector loadModels() throws IOException
Specified by:
loadModels in class org.apache.tika.language.detect.LanguageDetector
Throws:
IOException
loadModels
public org.apache.tika.language.detect.LanguageDetector loadModels(Set<String> languages) throws IOException
Specified by:
loadModels in class org.apache.tika.language.detect.LanguageDetector
Throws:
IOException
hasModel
public boolean hasModel(String language)
Specified by:
hasModel in class org.apache.tika.language.detect.LanguageDetector
setPriors
public org.apache.tika.language.detect.LanguageDetector setPriors(Map<String, Float> languageProbabilities) throws IOException
Not supported.
Specified by:
setPriors in class org.apache.tika.language.detect.LanguageDetector
Parameters:
languageProbabilities - Map from language to probability
Throws:
IOException
reset
public void reset()
Specified by:
reset in class org.apache.tika.language.detect.LanguageDetector
addText
public void addText(char[] cbuf, int off, int len)
Specified by:
addText in class org.apache.tika.language.detect.LanguageDetector
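Because addText receives a slice of a character buffer, a trigram-based detector fed in chunks has to account for trigrams that straddle two calls. The sketch below shows one way this could be handled, using a rolling three-character window. The class `StreamingTrigramCounter` and its `counts()` accessor are hypothetical names used for illustration and do not describe how TikaLanguageDetector is implemented.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of chunked trigram counting; not Tika's implementation. */
final class StreamingTrigramCounter {
    private final Map<String, Integer> counts = new HashMap<>();
    // Rolling window over the last three characters seen across all calls.
    private final char[] window = new char[3];
    // Characters consumed since construction or the last reset.
    private long seen = 0;

    /** Consumes len characters of cbuf starting at index off. */
    void addText(char[] cbuf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            window[0] = window[1];
            window[1] = window[2];
            window[2] = cbuf[i];
            seen++;
            // The window holds a full trigram only once three characters have arrived.
            if (seen >= 3) {
                counts.merge(new String(window), 1, Integer::sum);
            }
        }
    }

    /** Discards all accumulated text so the counter can be reused. */
    void reset() {
        counts.clear();
        seen = 0;
    }

    /** Trigram counts accumulated so far. */
    Map<String, Integer> counts() {
        return counts;
    }
}
```

With this design, feeding `"hel"` and then `"lo"` produces the same counts as feeding `"hello"` in one call, since the window carries the last two characters over between calls.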
detectAll
public List<org.apache.tika.language.detect.LanguageResult> detectAll()
Specified by:
detectAll in class org.apache.tika.language.detect.LanguageDetector