Skip to main content

ATG Search Indexing - overview of different steps in search indexing


Read more about the search indexing behind the scene steps @ http://tips4ufromsony.blogspot.in/2011/12/atg-search-indexing-behind-scene-steps.html

ATG Search prepares searchable content by indexing the products specified in the XML definition file (/atg/commerce/search/ProductCatalogOutputConfig).

Generally there are two types of indexing
1.  Full Indexing  --> all data taken for indexing
2.  Incremental Indexing --> only changed data will be taken for indexing

When full indexing is triggered, following happens:

   1. The out of box component BulkLoader will call IndexedItemsGroup.getGroupMembers() to load the products to the XHTL document. It prevents uncategorized products from getting indexed. The definition file format begins with a top-level item as a product and includes the properties of parent category and childskus. For each product, the set of Variant Producers configured in ProductCatalogOutputConfig is executed to check how many index items are to be created.

   2. XHTML documents are generated for each product, in order to submit to the engine for indexing. The XHTML is generated based on the definition file specified in ProductCatalogOutputConfig. An XHTML document that represents a Commerce product includes information about its parent category’s properties, as well as information about the properties of the child SKUs.

  3. The definition file, product-catalog-output-config is parsed to generate the text and meta properties, to be added to the index. The Text–properties indicates the properties which can be searched on. The Meta-properties indicate the properties which can be sent as constraints for faceted search. The text property will be specified in <text-properties> tag and meta property in <meta-properties> tag. The properties for which there is a custom property accessor specified, the property accessor is used to obtain the value to be indexed.

  4. After all the products have been added, the out of box PostIndexCustomization is executed to add any refineConfig and rankConfig information. This is used by the engine for generating facets and for manipulating the search results

  In case of failure in indexing, check the following logs
- JBoss server logs - <JBOSS_HOME>\server\atg\logs\server.log
- Dumping request logs Folder - <ATG_HOME>\logs\searchEngineActivity\*.xml ( request and response xmls). These logs will provide whether what was the request send to search engine in xml form and what was the response from engine for a query.
- Soap request logs - <ATG_HOME>\Search2007.1\SearchEngine\i686-win32-vc71\bin. This is used for checking the indexing failing.


Comments

  1. Hi, I've been dealing with some questions regarding Search and indexing, I got the process a lot clear now (thanks for that) but I got some questions:
    - if there are problems with facets, like not showing the right facets configured, does that mean there was a problem during the indexing on the PostIndexCustomization?
    - Why during the PostIndexCustomization the indexing can take too long, like 2 hours? And before it took like 30 min, what could be a starting point to find what is wrong?
    - Which one is better, full or incremental indexing? Can both coexist? What do you recommend?

    Thanks a lot!

    ReplyDelete
  2. - If there are issues with facets not showing on the site, first check the data in your database, then the refinement repository, then the refineconfig passed to search engine.

    - I haven't tries incremental indexing.But if you have small changes per day, you can go ahead with incremental, otherwise go with full indexing

    ReplyDelete

Post a Comment

Popular posts from this blog

ATG - more about Forms and Form Handlers

An ATG form is defined by the dsp:form tag, which typically encloses DSP tags that specify form elements, such as dsp:input that provide direct access to Nucleus component properties. Find below a sample dsp:form tag.    <dsp:form action="/testPages/showPersonProperties.jsp" method="post" target="_top">      <p>Name: <dsp:input bean="/samples/Person.name" type="text"/>      <p>Age: <dsp:input bean="/samples/Person.age" type="text" value="30"/>      <p><dsp:input type="submit" bean="/samples/Person.submit"/> value="Click to submit"/>    </dsp:form>   When the user submits the form, the /samples/Person.name property is set to the value entered in the input field.Unlike standard HTML, which requires the name attribute for most input tags; the name attribute is optional for DSP form element tags. If an input tag omits the n...

Eclipse plug-in to create Class and Sequence diagrams

ModelGoon is an Eclipse plug-in avaiable for UML diagram generation from Java code. It can be used to generate Package Dependencies Diagram, Class Diagram, Interaction Diagram and Sequence Diagram. You coud get it from http://marketplace.eclipse.org/content/modelgoon-uml4java Read more about it and see some vedios about how to create the class and sequence diagram @ http://www.modelgoon.org/?tag=eclipse-plugin Find some snapshots below which gives an idea about the diagram generation.

ATG Search architectural flow : Search and Index

I would like to explain the high level ATG Search implementation architecture ( for an online store) through the above diagram. In this diagram 1.x denotes the search functionality and 2.x denotes the indexing functionality. I have given JBoss as the application server. Physical Boxes and Application Servers in the diagram ( as recommended by ATG )  : Estore ( Commerce ) Box --> The box with the estore/site ear (with the site JSPs and Java codes). Search Engine Box --> The box with the search engine application running. Indexing Engine Box --> The box with the indexing engine application running. CA (Content Administration) Box --> The box with the ATG CA ear ( where we could take CA -BCC - Search Administration and configure the search projects) . Search Indexer Box --> The box with the ATG Search Index ear ( to fetch the index data from repository). Note that the engine performing indexing will need access ...

ATG Search - how estore(commerce instance) forms the search engine SOAP URL ?

The comminucation between the Commerce box and the Search engine is through SOAP. Read  more about this architecture @  http://tips4ufromsony.blogspot.in/2011/11/atg-search-architectural-flow-search.html The commerce instance forms the SOAP url just like the below code: private URL getSearchEngineURL(SearchEngine engine) {       SearchEnvironmentHost h =  engine.getSearchEnvironmentHost();       SearchMachine hi = h.getSearchMachine() ;       return new URL( "http://" + hi.getHostname() + ":" + engine.getPort() + "/AEXmlService/" );   } So the commerce instance need the hi.getHostname()  and engine.getPort() to form the url. It is obtained as below: 1. The component / atg/commerce/search/refinement/ CommerceFacetSearchService has the siteName defined, which will be pointing to the environment name defined in the Search Project. Read  more about this search project setup @  http://...

CamStudio - to capture your screen activity into video (Screen casting free software)

CamStudio is a tool (open source) for recording screen activity into standard AVI video files (screen casting software). It also have the audio record feature. It can also used to convert AVIs into Flash Video format. Read more about screencast @  http://en.wikipedia.org/wiki/Screencast . You can download CamStudio from:   http://sourceforge.net/projects/camstudio/    or   http://camstudio.org/  . I have uploaded a demo video, recorded using the Camstudio release 2.6. You could watch a high quality  video @ Youtube:  http://www.youtube.com/watch?v=7S-6aHFcuUM or you could find a video with low resolution below : CamStudio can be used to: Create movies used in user trainings Demonstrate features of a new software Track the progress of a program that executes for a long time Record the sequence of steps that cause the occurrence of bugs in a faulty software Record a movie stream  Convert AVI files to Flash (...