| Package | Description |
|---|---|
| org.opencms.search | Implements the main full-text search and indexing functions available in OpenCms. |
| org.opencms.search.documents | Handles indexing of the different document and resource types from the OpenCms VFS for the full-text search. |
| org.opencms.search.extractors | Contains a generic, low-level framework for extracting plain text content from various popular file formats. |
| org.opencms.search.fields | These classes control the mapping of OpenCms content to the Lucene search fields. |
| org.opencms.search.solr | Contains the Solr search integration. |

| Modifier and Type | Method and Description |
|---|---|
I_CmsExtractionResult |
I_CmsSearchIndex.getContentIfUnchanged(CmsResource resource)
Returns the extraction result for the content of the resource from the index, provided the
content is known to be unchanged since it was last indexed.
|
I_CmsExtractionResult |
CmsSearchIndex.getContentIfUnchanged(CmsResource resource) |
I_CmsExtractionResult |
A_CmsSearchIndex.getContentIfUnchanged(CmsResource resource)
Always assumes that there is no unchanged copy of the content, since that depends on the concrete index.
|
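
The contract described for `getContentIfUnchanged` can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `Resource`, `IndexedContent`, `NonCachingIndex` and `CachingIndex` are hypothetical stand-ins rather than OpenCms classes, a plain `String` replaces `I_CmsExtractionResult`, "unchanged" is decided by comparing last-modified timestamps, and `null` signals that no unchanged copy is available.

```java
import java.util.HashMap;
import java.util.Map;

final class UnchangedContentSketch {

    /** Hypothetical stand-in for a VFS resource: a path plus a last-modified timestamp. */
    record Resource(String path, long dateLastModified) {}

    /** Hypothetical index entry: the extracted text and the timestamp it was indexed with. */
    record IndexedContent(String text, long dateLastModified) {}

    /** Mirrors the abstract default: never claims to hold an unchanged copy. */
    static class NonCachingIndex {
        String getContentIfUnchanged(Resource resource) {
            return null;
        }
    }

    /** A concrete index that remembers what it extracted and when. */
    static class CachingIndex extends NonCachingIndex {
        private final Map<String, IndexedContent> entries = new HashMap<>();

        void index(Resource resource, String text) {
            entries.put(resource.path(), new IndexedContent(text, resource.dateLastModified()));
        }

        @Override
        String getContentIfUnchanged(Resource resource) {
            IndexedContent entry = entries.get(resource.path());
            // Reuse the stored text only if the resource has the same timestamp as when it was indexed.
            boolean unchanged = entry != null && entry.dateLastModified() == resource.dateLastModified();
            return unchanged ? entry.text() : null;
        }
    }

    public static void main(String[] args) {
        CachingIndex index = new CachingIndex();
        Resource original = new Resource("/docs/manual.pdf", 100L);
        index.index(original, "extracted text");

        System.out.println(index.getContentIfUnchanged(original));
        System.out.println(index.getContentIfUnchanged(new Resource("/docs/manual.pdf", 200L)));
    }
}
```

A caller would fall back to a full extraction whenever the method returns `null`, which is why the non-caching default is always safe, only slower.
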
| Modifier and Type | Method and Description |
|---|---|
I_CmsExtractionResult |
CmsDocumentPdf.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource containing Adobe PDF data.
|
I_CmsExtractionResult |
CmsDocumentMsOfficeOOXML.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource containing MS Office OOXML data.
|
I_CmsExtractionResult |
CmsDocumentPlainText.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource containing plain text data.
|
I_CmsExtractionResult |
CmsDocumentHtml.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource containing HTML data.
|
I_CmsExtractionResult |
CmsDocumentGeneric.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns an empty extraction result, since content can't be extracted from a generic resource.
|
I_CmsExtractionResult |
I_CmsSearchExtractor.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Extracts the content of a given index resource according to the resource file type and the
configuration of the given index.
|
I_CmsExtractionResult |
CmsDocumentRtf.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource containing RTF data.
|
I_CmsExtractionResult |
CmsDocumentXmlPage.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource of type
CmsResourceTypeXmlPage. |
I_CmsExtractionResult |
CmsDocumentMsOfficeOLE2.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource containing MS Office OLE2 data.
|
I_CmsExtractionResult |
CmsDocumentOpenOffice.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource containing OpenOffice data.
|
I_CmsExtractionResult |
CmsDocumentContainerPage.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a VFS resource of type
CmsResourceTypeContainerPage. |
I_CmsExtractionResult |
CmsDocumentXmlContent.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a given VFS resource of type
CmsResourceTypeXmlContent. |
| Modifier and Type | Method and Description |
|---|---|
void |
CmsExtractionResultCache.saveCacheObject(java.lang.String rfsName,
I_CmsExtractionResult content)
Serializes the given extraction result and saves it in the disk cache.
|
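
As a rough illustration of "serializes the given extraction result and saves it in the disk cache", the sketch below round-trips a result through Java object serialization. It is not the OpenCms implementation: the `Result` record, the cache directory handling and the `getCacheObject` read-back helper are assumptions introduced for this example, and `rfsName` is treated as a plain file name inside the cache directory.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

final class ExtractionCacheSketch {

    /** Hypothetical, simplified extraction result; only the plain text is kept. */
    record Result(String content) implements Serializable {}

    private final Path cacheDir;

    ExtractionCacheSketch(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    /** Serializes the result into a file named rfsName inside the cache directory. */
    void saveCacheObject(String rfsName, Result content) throws IOException {
        Files.createDirectories(cacheDir);
        try (ObjectOutputStream out =
                new ObjectOutputStream(Files.newOutputStream(cacheDir.resolve(rfsName)))) {
            out.writeObject(content);
        }
    }

    /** Hypothetical read-back helper: returns the cached result, or null on a cache miss. */
    Result getCacheObject(String rfsName) throws IOException, ClassNotFoundException {
        Path file = cacheDir.resolve(rfsName);
        if (!Files.isRegularFile(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Result) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ExtractionCacheSketch cache =
            new ExtractionCacheSketch(Files.createTempDirectory("extraction-cache"));
        cache.saveCacheObject("manual.ext", new Result("extracted text"));
        System.out.println(cache.getCacheObject("manual.ext"));
    }
}
```
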
| Modifier and Type | Class and Description |
|---|---|
class |
CmsExtractionResult
The result of a document text extraction.
|
| Modifier and Type | Method and Description |
|---|---|
I_CmsExtractionResult |
I_CmsTextExtractor.extractText(byte[] content)
Extracts the text and meta information from the given binary document.
|
I_CmsExtractionResult |
A_CmsTextExtractor.extractText(byte[] content) |
I_CmsExtractionResult |
I_CmsTextExtractor.extractText(byte[] content,
java.lang.String encoding)
Extracts the text and meta information from the given binary document, using the specified content encoding.
|
I_CmsExtractionResult |
A_CmsTextExtractor.extractText(byte[] content,
java.lang.String encoding) |
I_CmsExtractionResult |
I_CmsTextExtractor.extractText(java.io.InputStream in)
Extracts the text and meta information from the document on the input stream.
|
I_CmsExtractionResult |
CmsExtractorMsOfficeOOXML.extractText(java.io.InputStream in) |
I_CmsExtractionResult |
CmsExtractorRtf.extractText(java.io.InputStream in) |
I_CmsExtractionResult |
CmsExtractorPdf.extractText(java.io.InputStream in) |
I_CmsExtractionResult |
A_CmsTextExtractor.extractText(java.io.InputStream in) |
I_CmsExtractionResult |
CmsExtractorMsOfficeOLE2.extractText(java.io.InputStream in) |
I_CmsExtractionResult |
I_CmsTextExtractor.extractText(java.io.InputStream in,
java.lang.String encoding)
Extracts the text and meta information from the document on the input stream, using the specified content encoding.
|
I_CmsExtractionResult |
CmsExtractorOpenOffice.extractText(java.io.InputStream in,
java.lang.String encoding) |
I_CmsExtractionResult |
A_CmsTextExtractor.extractText(java.io.InputStream in,
java.lang.String encoding) |
I_CmsExtractionResult |
CmsExtractorHtml.extractText(java.io.InputStream in,
java.lang.String encoding) |
I_CmsExtractionResult |
CmsExtractionResult.merge(java.util.List<I_CmsExtractionResult> extractionResults) |
I_CmsExtractionResult |
I_CmsExtractionResult.merge(java.util.List<I_CmsExtractionResult> extractionResults)
Appends, for the locales of the current collection result, the content fields
from all provided extraction results to the current extraction result.
|
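
The four `extractText` overloads listed above lend themselves to a delegation chain in an abstract base class, so that a concrete extractor only has to implement one variant. The sketch below shows one way this could look. It is a simplification under stated assumptions: the `TextExtractor` interface and its implementations are hypothetical stand-ins, a `String` replaces `I_CmsExtractionResult`, a missing encoding defaults to UTF-8, and every overload funnels into the stream-plus-encoding variant (the listing above shows that the real extractors override different overloads).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TextExtractorSketch {

    /** Hypothetical counterpart of the extractor interface, returning plain strings. */
    interface TextExtractor {
        String extractText(byte[] content) throws IOException;
        String extractText(byte[] content, String encoding) throws IOException;
        String extractText(InputStream in) throws IOException;
        String extractText(InputStream in, String encoding) throws IOException;
    }

    /** Base class: three overloads delegate, one is left for subclasses. */
    abstract static class AbstractTextExtractor implements TextExtractor {
        @Override
        public String extractText(byte[] content) throws IOException {
            return extractText(content, null);
        }

        @Override
        public String extractText(byte[] content, String encoding) throws IOException {
            return extractText(new ByteArrayInputStream(content), encoding);
        }

        @Override
        public String extractText(InputStream in) throws IOException {
            return extractText(in, null);
        }
    }

    /** Concrete extractor: decodes the stream as text in the given encoding. */
    static class PlainTextExtractor extends AbstractTextExtractor {
        @Override
        public String extractText(InputStream in, String encoding) throws IOException {
            // Assumption: fall back to UTF-8 when no encoding is supplied.
            Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
            return new String(in.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        TextExtractor extractor = new PlainTextExtractor();
        byte[] data = "plain text".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractor.extractText(data));
    }
}
```
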
| Modifier and Type | Method and Description |
|---|---|
I_CmsExtractionResult |
CmsExtractionResult.merge(java.util.List<I_CmsExtractionResult> extractionResults) |
I_CmsExtractionResult |
I_CmsExtractionResult.merge(java.util.List<I_CmsExtractionResult> extractionResults)
Appends, for the locales of the current collection result, the content fields
from all provided extraction results to the current extraction result.
|
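
The `merge` description is easier to follow with a concrete data shape. The sketch below models an extraction result as a map from locale to named content fields and appends matching fields from other results, but only for locales the current result already has. The `Result` record, the newline separator and the choice to return a new object instead of modifying the receiver are assumptions for illustration, not the behaviour of `CmsExtractionResult`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MergeSketch {

    /** Hypothetical simplified result: locale -> (field name -> extracted text). */
    record Result(Map<Locale, Map<String, String>> contentByLocale) {

        /** Appends the fields of the other results for every locale of this result. */
        Result merge(List<Result> others) {
            Map<Locale, Map<String, String>> merged = new LinkedHashMap<>();
            for (Map.Entry<Locale, Map<String, String>> own : contentByLocale.entrySet()) {
                Locale locale = own.getKey();
                Map<String, String> fields = new LinkedHashMap<>(own.getValue());
                for (Result other : others) {
                    // Locales that only exist in the other results are ignored.
                    Map<String, String> otherFields =
                        other.contentByLocale().getOrDefault(locale, Map.of());
                    otherFields.forEach(
                        (name, text) -> fields.merge(name, text, (a, b) -> a + "\n" + b));
                }
                merged.put(locale, fields);
            }
            return new Result(merged);
        }
    }

    public static void main(String[] args) {
        Result page = new Result(Map.of(Locale.ENGLISH, Map.of("content", "Page text")));
        Result element = new Result(Map.of(
            Locale.ENGLISH, Map.of("content", "Element text"),
            Locale.GERMAN, Map.of("content", "Elementtext")));

        System.out.println(page.merge(List.of(element)).contentByLocale());
    }
}
```

In this example the German content of `element` is dropped, because the collecting result `page` has no German locale.
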
| Modifier and Type | Method and Description |
|---|---|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendAdditionalValuesToDcoument(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extraction,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Overriding this method makes it possible to append some 'extra' values/fields to a document
without overriding the
CmsSearchFieldConfiguration.createDocument(org.opencms.file.CmsObject, org.opencms.file.CmsResource, org.opencms.search.I_CmsSearchIndex, org.opencms.search.extractors.I_CmsExtractionResult) method itself. |
protected I_CmsSearchDocument |
CmsSearchFieldConfigurationOldCategories.appendCategories(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by resource category information based on properties.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendCategories(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by resource category information based on properties.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendContentBlob(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by a field that contains the extracted content blob.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendDates(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by fields for date of creation, content and last modification.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendFieldMapping(I_CmsSearchDocument document,
CmsSearchField field,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by the mappings for the given field.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendFieldMappings(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by the configured field mappings.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendFileSize(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by the "size" field.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendLocales(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extraction,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by the "res_locales" field.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendPath(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by fields for VFS path lookup.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendProperties(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extraction,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Appends all direct properties that are not empty or whitespace-only to the document.
|
protected I_CmsSearchDocument |
CmsSearchFieldConfiguration.appendType(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Extends the given document by a field that contains the resource type name.
|
I_CmsSearchDocument |
I_CmsSearchFieldConfiguration.createDocument(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index,
I_CmsExtractionResult extractionResult)
Creates the document to index.
|
I_CmsSearchDocument |
CmsSearchFieldConfiguration.createDocument(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index,
I_CmsExtractionResult extraction)
Creates the Lucene Document with this field configuration for the provided VFS resource, search index and content.
|
java.lang.String |
CmsSearchFieldMapping.getStringValue(CmsObject cms,
CmsResource res,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched) |
java.lang.String |
I_CmsSearchFieldMapping.getStringValue(CmsObject cms,
CmsResource res,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched)
Returns the String value extracted from the provided data according to the rules of this mapping type.
|
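
The `append...` methods above follow a template-method pattern: `createDocument` builds the document step by step, and a protected hook lets subclasses add extra fields without replacing `createDocument` itself. The sketch below illustrates the idea only. `Document`, `FieldConfiguration`, `WordCountConfiguration`, the reduced parameter lists and the field names `path`, `type` and `wordcount` are all hypothetical and do not correspond to the OpenCms signatures listed in the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldConfigurationSketch {

    /** Hypothetical stand-in for a search document: field name -> value. */
    static class Document {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    /** Base configuration: fixed sequence of append steps plus an extension hook. */
    static class FieldConfiguration {
        final Document createDocument(String path, String type, String extractedText) {
            Document document = new Document();
            document = appendPath(document, path);
            document = appendType(document, type);
            // The hook runs last, so subclasses see the otherwise complete document.
            return appendAdditionalValues(document, extractedText);
        }

        protected Document appendPath(Document document, String path) {
            document.fields.put("path", path);
            return document;
        }

        protected Document appendType(Document document, String type) {
            document.fields.put("type", type);
            return document;
        }

        /** Hook: does nothing by default. */
        protected Document appendAdditionalValues(Document document, String extractedText) {
            return document;
        }
    }

    /** Subclass that adds one extra field through the hook only. */
    static class WordCountConfiguration extends FieldConfiguration {
        @Override
        protected Document appendAdditionalValues(Document document, String extractedText) {
            int words = extractedText.isBlank() ? 0 : extractedText.trim().split("\\s+").length;
            document.fields.put("wordcount", Integer.toString(words));
            return document;
        }
    }

    public static void main(String[] args) {
        Document document = new WordCountConfiguration()
            .createDocument("/sites/default/index.html", "plain", "one two three");
        System.out.println(document.fields);
    }
}
```
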
| Modifier and Type | Method and Description |
|---|---|
I_CmsExtractionResult |
CmsSolrDocumentXmlContent.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index) |
I_CmsExtractionResult |
CmsSolrDocumentContainerPage.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index)
Returns the raw text content of a VFS resource of type
CmsResourceTypeContainerPage. |
I_CmsExtractionResult |
CmsSolrDocumentContainerPage.extractContent(CmsObject cms,
CmsResource resource,
I_CmsSearchIndex index,
java.util.Locale forceLocale)
Extracts the content of a given index resource according to the resource file type and the
configuration of the given index.
|
| Modifier and Type | Method and Description |
|---|---|
protected I_CmsSearchDocument |
CmsSolrFieldConfiguration.appendAdditionalValuesToDcoument(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched) |
protected I_CmsSearchDocument |
CmsSolrFieldConfiguration.appendDates(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched) |
protected I_CmsSearchDocument |
CmsSolrFieldConfiguration.appendFieldMapping(I_CmsSearchDocument document,
CmsSearchField sfield,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched) |
protected I_CmsSearchDocument |
CmsSolrFieldConfiguration.appendFieldMappings(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extractionResult,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched) |
protected I_CmsSearchDocument |
CmsSolrFieldConfiguration.appendLocales(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extraction,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched) |
protected I_CmsSearchDocument |
CmsSolrFieldConfiguration.appendProperties(I_CmsSearchDocument document,
CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extraction,
java.util.List<CmsProperty> properties,
java.util.List<CmsProperty> propertiesSearched) |
protected java.util.List<java.util.Locale> |
CmsSolrFieldConfiguration.getContentLocales(CmsObject cms,
CmsResource resource,
I_CmsExtractionResult extraction)
Retrieves the locales for a content that is neither an XML content nor an XML page.
|