Index
All Classes and Interfaces|All Packages|Constant Field Values
A
- add(String) - Method in class org.apache.tika.langdetect.tika.LanguageProfile
-
Adds a single occurrence of the given ngram to this profile.
- add(StringBuffer) - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Adds ngrams from a single word to this profile
- add(String, long) - Method in class org.apache.tika.langdetect.tika.LanguageProfile
-
Adds multiple occurrences of the given ngram to this profile.
- addProfile(String, LanguageProfile) - Static method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Adds a single language profile
- addText(char[], int, int) - Method in class org.apache.tika.langdetect.tika.TikaLanguageDetector
- analyze(StringBuilder) - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Analyzes a piece of text
C
- clearProfiles() - Static method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Clears the current map of language profiles
- close() - Method in class org.apache.tika.langdetect.tika.ProfilingWriter
- create(String, InputStream, String) - Static method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Creates a new language profile from a text file (preferably quite large, 5-10k lines)
D
- DEFAULT_NGRAM_LENGTH - Static variable in class org.apache.tika.langdetect.tika.LanguageProfile
- detectAll() - Method in class org.apache.tika.langdetect.tika.TikaLanguageDetector
- distance(LanguageProfile) - Method in class org.apache.tika.langdetect.tika.LanguageProfile
-
Calculates the geometric distance between this and the given other language profile.
F
- flush() - Method in class org.apache.tika.langdetect.tika.ProfilingWriter
-
Ignored.
G
- getCount() - Method in class org.apache.tika.langdetect.tika.LanguageProfile
- getCount(String) - Method in class org.apache.tika.langdetect.tika.LanguageProfile
- getErrors() - Static method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Returns a string of error messages related to initializing language profiles
- getLanguage() - Method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Gets the identified language
- getLanguage() - Method in class org.apache.tika.langdetect.tika.ProfilingWriter
-
Returns the language that best matches the current state of the language profile.
- getName() - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
- getProfile() - Method in class org.apache.tika.langdetect.tika.ProfilingWriter
-
Returns the language profile being built by this writer.
- getRawScore() - Method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
1 - vector distance between the language model and the content
- getSimilarity(LanguageProfilerBuilder) - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Calculates a score for how well two NGramProfiles match each other
- getSorted() - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Returns a sorted list of ngrams.
- getSupportedLanguages() - Static method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Returns what languages are supported for language identification
H
- hasErrors() - Static method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Tests whether there were errors initializing language config
- hasModel(String) - Method in class org.apache.tika.langdetect.tika.TikaLanguageDetector
I
- initProfiles() - Static method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Builds the language profiles.
- initProfiles(Map<String, LanguageProfile>) - Static method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Initializes the language profiles from a user-supplied, already initialized Map.
- isReasonablyCertain() - Method in class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Tries to judge whether the identification is certain enough to be trusted.
L
- LanguageIdentifier - Class in org.apache.tika.langdetect.tika
-
Identifier of the language that best matches a given content profile.
- LanguageIdentifier(String) - Constructor for class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Constructs a language identifier based on a String of text content
- LanguageIdentifier(LanguageProfile) - Constructor for class org.apache.tika.langdetect.tika.LanguageIdentifier
-
Constructs a language identifier based on a LanguageProfile
- LanguageProfile - Class in org.apache.tika.langdetect.tika
-
Language profile based on ngram counts.
- LanguageProfile() - Constructor for class org.apache.tika.langdetect.tika.LanguageProfile
- LanguageProfile(int) - Constructor for class org.apache.tika.langdetect.tika.LanguageProfile
- LanguageProfile(String) - Constructor for class org.apache.tika.langdetect.tika.LanguageProfile
- LanguageProfile(String, int) - Constructor for class org.apache.tika.langdetect.tika.LanguageProfile
- LanguageProfilerBuilder - Class in org.apache.tika.langdetect.tika
-
This class runs an ngram analysis over submitted text; the results can be used for automatic language identification.
- LanguageProfilerBuilder(String) - Constructor for class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Constructs a new ngram profile where minlen=3, maxlen=3
- LanguageProfilerBuilder(String, int, int) - Constructor for class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Constructs a new ngram profile
- load(InputStream) - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Loads an ngram profile from an InputStream (assumes UTF-8 encoded content)
- loadModels() - Method in class org.apache.tika.langdetect.tika.TikaLanguageDetector
- loadModels(Set<String>) - Method in class org.apache.tika.langdetect.tika.TikaLanguageDetector
M
- main(String[]) - Static method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Main method, used for testing only
N
- normalize() - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Normalizes the profile (calculates the ngrams frequencies)
O
- org.apache.tika.langdetect.tika - package org.apache.tika.langdetect.tika
P
- ProfilingWriter - Class in org.apache.tika.langdetect.tika
-
Writer that builds a language profile based on all the written content.
- ProfilingWriter() - Constructor for class org.apache.tika.langdetect.tika.ProfilingWriter
- ProfilingWriter(LanguageProfile) - Constructor for class org.apache.tika.langdetect.tika.ProfilingWriter
R
- reset() - Method in class org.apache.tika.langdetect.tika.TikaLanguageDetector
S
- save(OutputStream) - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
-
Writes NGramProfile content to an OutputStream, encoded as UTF-8
- setPriors(Map<String, Float>) - Method in class org.apache.tika.langdetect.tika.TikaLanguageDetector
-
Not supported.
T
- TikaLanguageDetector - Class in org.apache.tika.langdetect.tika
-
This is Tika's original, homegrown legacy language detector.
- TikaLanguageDetector() - Constructor for class org.apache.tika.langdetect.tika.TikaLanguageDetector
- toString() - Method in class org.apache.tika.langdetect.tika.LanguageIdentifier
- toString() - Method in class org.apache.tika.langdetect.tika.LanguageProfile
- toString() - Method in class org.apache.tika.langdetect.tika.LanguageProfilerBuilder
U
- useInterleaved - Static variable in class org.apache.tika.langdetect.tika.LanguageProfile
W
- write(char[], int, int) - Method in class org.apache.tika.langdetect.tika.ProfilingWriter