Skip to main content

ATG Search Indexing - overview of different steps in search indexing


Read more about the search indexing behind the scene steps @ http://tips4ufromsony.blogspot.in/2011/12/atg-search-indexing-behind-scene-steps.html

ATG Search prepares searchable content by indexing the products specified in the XML definition file (/atg/commerce/search/ProductCatalogOutputConfig).

Generally there are two types of indexing
1.  Full Indexing  --> all data taken for indexing
2.  Incremental Indexing --> only changed data will be taken for indexing

When full indexing is triggered, following happens:

   1. The out of box component BulkLoader will call IndexedItemsGroup.getGroupMembers() to load the products to the XHTL document. It prevents uncategorized products from getting indexed. The definition file format begins with a top-level item as a product and includes the properties of parent category and childskus. For each product, the set of Variant Producers configured in ProductCatalogOutputConfig is executed to check how many index items are to be created.

   2. XHTML documents are generated for each product, in order to submit to the engine for indexing. The XHTML is generated based on the definition file specified in ProductCatalogOutputConfig. An XHTML document that represents a Commerce product includes information about its parent category’s properties, as well as information about the properties of the child SKUs.

  3. The definition file, product-catalog-output-config is parsed to generate the text and meta properties, to be added to the index. The Text–properties indicates the properties which can be searched on. The Meta-properties indicate the properties which can be sent as constraints for faceted search. The text property will be specified in <text-properties> tag and meta property in <meta-properties> tag. The properties for which there is a custom property accessor specified, the property accessor is used to obtain the value to be indexed.

  4. After all the products have been added, the out of box PostIndexCustomization is executed to add any refineConfig and rankConfig information. This is used by the engine for generating facets and for manipulating the search results

  In case of failure in indexing, check the following logs
- JBoss server logs - <JBOSS_HOME>\server\atg\logs\server.log
- Dumping request logs Folder - <ATG_HOME>\logs\searchEngineActivity\*.xml ( request and response xmls). These logs will provide whether what was the request send to search engine in xml form and what was the response from engine for a query.
- Soap request logs - <ATG_HOME>\Search2007.1\SearchEngine\i686-win32-vc71\bin. This is used for checking the indexing failing.


Comments

  1. Hi, I've been dealing with some questions regarding Search and indexing, I got the process a lot clear now (thanks for that) but I got some questions:
    - if there are problems with facets, like not showing the right facets configured, does that mean there was a problem during the indexing on the PostIndexCustomization?
    - Why during the PostIndexCustomization the indexing can take too long, like 2 hours? And before it took like 30 min, what could be a starting point to find what is wrong?
    - Which one is better, full or incremental indexing? Can both coexist? What do you recommend?

    Thanks a lot!

    ReplyDelete
  2. - If there are issues with facets not showing on the site, first check the data in your database, then the refinement repository, then the refineconfig passed to search engine.

    - I haven't tries incremental indexing.But if you have small changes per day, you can go ahead with incremental, otherwise go with full indexing

    ReplyDelete

Post a Comment

Popular posts from this blog

JBoss - know more about the JBoss directory structure

Fundamentally, the JBoss architecture consists of the JMX MBean server, the microkernel, and a set of pluggable component services - the MBeans. The JBoss Application Server ships with three different server configurations. Within the <JBoss_Home>/server directory, you will find three subdirectories: minimal, default and all. The default configuration is the one used if you don’t specify another one when starting up the server. If you want to know which services are configured in each of these instances, look at the jboss-service.xml file in the <JBoss_Home>/server/<instance-name>/conf/ directory and also the configuration files in the <JBoss_Home>/server/<instance-name>/deploy directory. JBoss 4.0 features an embedded Apache Tomcat 5.5 servlet container. conf --> The conf directory contains the jboss-service.xml bootstrap descriptor file for a given server configuration. This has the jboss-log4j.xml file which configures the Apach...

How to simulate Browser back button

When someone asks how to simulate a back button, they really mean to ask how to create a link that points to the previously visited page. Most browsers tend to keep a list of which websites the user has visited and in what order they have done so. The DOM window object provides access to the browser's history through the history object. Moving backward and forward through the user's history is done using the   back(), forward(), and go() methods of the  history  object. To move backward through history, just do window.history.back() ; This will act exactly like the user clicked on the Back button in their browser toolbar. Find below a sample html code: <html> <head> <script type="text/javascript"> function goBack(){  window.history.back() } </script> </head> <body>    <input type="button" value="Back" onclick="goBack()" /> </body> </html>

ATG Search and how to generate XHTMLs from STG file

The ATG search  indexing will give you the idx and stg files. When I analyse the stg files with some text editors like Textpad or Ultraedit , found some <html> and </html> tags and the contents inside these tags seems to be the same content of the temporary XHTML files , which will be generated during the search indexing for each indexed item. So I deicded to take the contents in between the <html> and </html> tags and save as XHTML file and it works for almost all indexed items. As you might know, these XHTML file’s <head> tag contains all the meta properties ( refine properties ) and the <body> tag have the text properties ( searchable properties ) for each indexed item. Please note that the above steps are not an ATG recommended method to generate the XHTML files. I come across to this simple method to form the XHTML files and I am not 100% sure that this will give all the XHTML files of a search index . But I found this to be very useful f...

CamStudio - to capture your screen activity into video (Screen casting free software)

CamStudio is a tool (open source) for recording screen activity into standard AVI video files (screen casting software). It also have the audio record feature. It can also used to convert AVIs into Flash Video format. Read more about screencast @  http://en.wikipedia.org/wiki/Screencast . You can download CamStudio from:   http://sourceforge.net/projects/camstudio/    or   http://camstudio.org/  . I have uploaded a demo video, recorded using the Camstudio release 2.6. You could watch a high quality  video @ Youtube:  http://www.youtube.com/watch?v=7S-6aHFcuUM or you could find a video with low resolution below : CamStudio can be used to: Create movies used in user trainings Demonstrate features of a new software Track the progress of a program that executes for a long time Record the sequence of steps that cause the occurrence of bugs in a faulty software Record a movie stream  Convert AVI files to Flash (...

Eclipse plugin: InstaSearch – for quick search

InstaSearch is an Eclipse plug-in for performing quick and advanced search of workspace files. This will index the files and when you search for some file contents, it will look with in this index and the search results will be faster, just like the Goolge instant search. It uses Lucene ( http://lucene.apache.org/ ) for indexing and fast searching of files in the workspace. Each search result file then can be previewed using few most matching and relevant lines. A double-click on the match leads to the matching line in the file. Main Features Instantly shows search results Shows a preview using relevant lines Periodically updates the index Matches partial words (e.g. case in CamelCase) Opens and highlights matches in files Searches JAR source attachments Supports filtering by extension/project/working set Download / Installation In Eclipse Helios (3.6) please install using the  Eclipse Marketplace from the Help menu http://marketplace.eclipse.org/s...