Skip to main content

ATG Search Indexing - overview of different steps in search indexing


Read more about the search indexing behind the scene steps @ http://tips4ufromsony.blogspot.in/2011/12/atg-search-indexing-behind-scene-steps.html

ATG Search prepares searchable content by indexing the products specified in the XML definition file (/atg/commerce/search/ProductCatalogOutputConfig).

Generally there are two types of indexing
1.  Full Indexing  --> all data taken for indexing
2.  Incremental Indexing --> only changed data will be taken for indexing

When full indexing is triggered, following happens:

   1. The out of box component BulkLoader will call IndexedItemsGroup.getGroupMembers() to load the products to the XHTL document. It prevents uncategorized products from getting indexed. The definition file format begins with a top-level item as a product and includes the properties of parent category and childskus. For each product, the set of Variant Producers configured in ProductCatalogOutputConfig is executed to check how many index items are to be created.

   2. XHTML documents are generated for each product, in order to submit to the engine for indexing. The XHTML is generated based on the definition file specified in ProductCatalogOutputConfig. An XHTML document that represents a Commerce product includes information about its parent category’s properties, as well as information about the properties of the child SKUs.

  3. The definition file, product-catalog-output-config is parsed to generate the text and meta properties, to be added to the index. The Text–properties indicates the properties which can be searched on. The Meta-properties indicate the properties which can be sent as constraints for faceted search. The text property will be specified in <text-properties> tag and meta property in <meta-properties> tag. The properties for which there is a custom property accessor specified, the property accessor is used to obtain the value to be indexed.

  4. After all the products have been added, the out of box PostIndexCustomization is executed to add any refineConfig and rankConfig information. This is used by the engine for generating facets and for manipulating the search results

  In case of failure in indexing, check the following logs
- JBoss server logs - <JBOSS_HOME>\server\atg\logs\server.log
- Dumping request logs Folder - <ATG_HOME>\logs\searchEngineActivity\*.xml ( request and response xmls). These logs will provide whether what was the request send to search engine in xml form and what was the response from engine for a query.
- Soap request logs - <ATG_HOME>\Search2007.1\SearchEngine\i686-win32-vc71\bin. This is used for checking the indexing failing.


Comments

  1. Hi, I've been dealing with some questions regarding Search and indexing, I got the process a lot clear now (thanks for that) but I got some questions:
    - if there are problems with facets, like not showing the right facets configured, does that mean there was a problem during the indexing on the PostIndexCustomization?
    - Why during the PostIndexCustomization the indexing can take too long, like 2 hours? And before it took like 30 min, what could be a starting point to find what is wrong?
    - Which one is better, full or incremental indexing? Can both coexist? What do you recommend?

    Thanks a lot!

    ReplyDelete
  2. - If there are issues with facets not showing on the site, first check the data in your database, then the refinement repository, then the refineconfig passed to search engine.

    - I haven't tries incremental indexing.But if you have small changes per day, you can go ahead with incremental, otherwise go with full indexing

    ReplyDelete

Post a Comment

Popular posts from this blog

Google Chrome shortcut keys

If you are a Google Chromey guy, please find below the list of shortcut keys for some of the most used features  :-) Find more shortcut keys @  http://www.google.com/support/chrome/bin/static.py?page=guide.cs&guide=25799&topic=28650

Income Tax process and e-filing

http://financeminister.in/income_tax_calculator.php https://incometaxindiaefiling.gov.in Below I am listing the step-by-step activities of the Tax Process that a working professional need to do in a given Financial Year. Here FY refers to Financial Year  and  AY refers to Assesment Year. Each month we will pay the Tax (From APRIL 20xx to MARCH 20xx+1) through our Employer for the FY 20xx – 20xx+1. In the month April 20xx, we will give the investment details to the employer (in our employer specified portal) for the FY 20xx – 20xx+1. In the month January 20xx+1, we will give the investment proof details , Rent receipts… to the Employer Finance Department for the FY 20xx – 20xx+1. In the month MAY/JUNE 20xx+1, employer gives the Form 16 for the FY 20xx – 20xx+1 to us (The proof given by the employer to the employee for the tax paid by the employee). In the month July 20xx+1 (on or before July 31st of every year), we will fill the ITR forms (earlier it was NayaSaral f...

ATG CA - different activity sources used @ BCC

Read about how a new link can be added in BCC home page @  http://tips4ufromsony.blogspot.com/2012/03/atg-ca-bcc-home-screen-how-to-add-new.html Normally an ActivitySource.properties file define the set of actions that it supports under a genericActivityDefinitionFile. But some ActivitySource.properties  define the actions  using the workflowActivityDefinitionFiles. For example consider the default "Content Administration" ,  "SearchAdministration",  " Merchanding "  and "Personalization" options in BCC homepage. Below I listed the ActivitySource.properties and other properties for these links. To get all these activitysource names, just take the / atg/bizui/activity/ActivityManager  component @ dyn/admin. Content Administration ActivitySource  --> /atg/bizui/activity/PublishingActivitySource genericActivityDefinitionFile Search Administration ActivitySource  --> /atg/bizui/...

IFSC and MICR code

For internet banking third party transfer you might need the IFSC or MICR code. IFSC or Indian Financial System Code is an alpha-numeric code that uniquely identifies a bank-branch participating in the NEFT system. MICR or Magnetic Ink Character Recognition is a character recognition technology used primarily by the banking industry to facilitate the processing of cheques. Please find below some useful links to get the IFSC and MICR code for a bank branch: http://rbi.org.in/scripts/neft.aspx http://rbidocs.rbi.org.in/rdocs/content/docs/67440.xls http://bankifsccode.com/ http://banks-india.com/ifsc-code.php http://en.wikipedia.org/wiki/Indian_Financial_System_Code http://en.wikipedia.org/wiki/Magnetic_ink_character_recognition

ATG Search architectural flow : Search and Index

I would like to explain the high level ATG Search implementation architecture ( for an online store) through the above diagram. In this diagram 1.x denotes the search functionality and 2.x denotes the indexing functionality. I have given JBoss as the application server. Physical Boxes and Application Servers in the diagram ( as recommended by ATG )  : Estore ( Commerce ) Box --> The box with the estore/site ear (with the site JSPs and Java codes). Search Engine Box --> The box with the search engine application running. Indexing Engine Box --> The box with the indexing engine application running. CA (Content Administration) Box --> The box with the ATG CA ear ( where we could take CA -BCC - Search Administration and configure the search projects) . Search Indexer Box --> The box with the ATG Search Index ear ( to fetch the index data from repository). Note that the engine performing indexing will need access ...