Skip to main content

ATG Search and how to generate XHTMLs from STG file


The ATG search  indexing will give you the idx and stg files. When I analyse the stg files with some text editors like Textpad or Ultraedit , found some <html> and </html> tags and the contents inside these tags seems to be the same content of the temporary XHTML files , which will be generated during the search indexing for each indexed item. So I deicded to take the contents in between the <html> and </html> tags and save as XHTML file and it works for almost all indexed items. As you might know, these XHTML file’s <head> tag contains all the meta properties ( refine properties ) and the <body> tag have the text properties ( searchable properties ) for each indexed item.

Please note that the above steps are not an ATG recommended method to generate the XHTML files. I come across to this simple method to form the XHTML files and I am not 100% sure that this will give all the XHTML files of a search index . But I found this to be very useful for debugging any ATG search related issues.

Please find the below JAVA code written to genrate the XHTML files from an stg file. The main method is expecting the name of the stg file as a program argument. This will create a folder named “XHTML_Files” in the current directory and will save the XHTML files inside this folder.

Please find the below screen shot of XHTML files generated using this JAVA code :




Please find the below screen shot of a sample XHTML file generated using this JAVA code :





import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

public class ATGSearchStgXHTMLGenerator {
public static void main(String args[]) {
try {
String fileName =  args[0];
String xhtmlfilePath = ".\\XHTML_Files";
if(fileName ==  null){
System.out.println("Give the stg file name as input");
return;
}
File stgfile = new File(fileName);
FileReader stgfileFis = new FileReader(stgfile);
BufferedReader stgfileBr = new BufferedReader(stgfileFis);
String readLine = stgfileBr.readLine();
String outToWrite = null;
String outFileName = null;
File outFile = new File(xhtmlfilePath);
boolean exists = outFile.exists();
if(!exists){
(new File(xhtmlfilePath)).mkdir();
}
FileWriter outFileWriter = null;
int lineNumber = 1;
int xhtmlFileCount = 1;
boolean canWriteXhtmlfile = false;
do {
if(readLine.contains("<html>")
&& readLine.contains("</html>")){
outToWrite = readLine.substring(readLine.indexOf("<html>"),readLine.indexOf("</html>")+7);
outFileName = "\\XHTMLFile_";
canWriteXhtmlfile = true;
}else if(readLine.contains("<html>")
&& !readLine.contains("</html>")){
System.out.println("In the STG file at lineNumber:" +lineNumber+" ERROR: html tag found, but no end tag for");
outToWrite = readLine.substring(readLine.indexOf("<html>"),readLine.length());
outFileName = "\\XHTMLFile_Error_";
canWriteXhtmlfile = true;
}
if(canWriteXhtmlfile){
outFile =  new File(xhtmlfilePath+outFileName+(xhtmlFileCount++)+".xhtml");
outFileWriter =  new FileWriter(outFile);
outFileWriter.write(outToWrite);
outFileWriter.close();
}
readLine = stgfileBr.readLine();
lineNumber++;
canWriteXhtmlfile = false;
} while (readLine != null);
System.out.println("The STG file is processed fully till lineNumber"+lineNumber);
} catch (Exception e) {
e.printStackTrace();
}
}
}

Comments

Popular posts from this blog

Search Facets - how to create a new search facets in ATG Search

A Facet is a search refinement element that corresponds to a property of a commerce item type. ATG supports the search result refinement using the Faceted Search concept. Read more about facted search @  http://en.wikipedia.org/wiki/Faceted_search . Facet can either be ranges or specific values. Each facet is stored in the RefinementRepository as a separate refineElement repository item. Facets are divided into Global and Local facets. Global facets apply to all the categories and local facets only to the category in which they are created. For example Price/Brand can be considered as the facets that are common for all skus and New Release/Coming Soon can be considered as the facets that are specific to Physical Media products like Vidoe/DVD/Blue-ray/Books. We can use the ATG BCC - Merchandising UI to create facets. The Faceting Property depends on the meta-properties defined in the \atg\commerce\search\product-catalog-output-config.xml ( the def...

ATG Search troubleshooting tips

In this blog, I have listed some basic ATG Search troubleshooting tips in some general scenarios. 1. If the index did not deploy, consider the following possible causes : Is the DeployShare property configured @ /atg/search/routing/LaunchingService component ?  Is enough space available @ deployment share box for the index ?  Are the RMI ports configured correctly in the RoutingSystemService component ?  If the search engine application is running in a separate box, this application is invoked through a RemoteLauncher running in these boxes. Check whether these RemoteLaunchers are running in these boxes ? 2. If you have trouble in launching one or more search engines, try the following remedies: If the Search engine is standalone, set the /atg/search/routing/LaunchingService component’s engineDir property to the absolute path of the Search engine directory.  3. Search unavailable in the estore page, even when the SearchEngine is in "Running" st...

Basic design decisions for a commerce search setup ( with an ATG Search view)

In this blog I would like to explain the basic set of configuration/design decisions needed to setup an ATG search project. Most of these design decisions are common for all Enterprise search applications. 1. Decide the searchable properties :   This means the properties that the business want the user to search in the ecommerce platform. In ATG search these are configured as the text properties in the product-catalog-output-config.xml ( the definitionFile of the \atg\commerce\search\ProductCatalogOutputConfig). Usually the displayName of product/sku, displayName of department/category/sub-category, skuId, brandName are the properties configured as searchable. 2. Decide the search refinement properties or the faceted properties :   After a user search for a keyword, search refinement is the next step done to filter his results. ATG supports the search refinement using the Faceted Search concept. Read more about facted search @...

ATG Search - search engine tuning settings

In this blog, I am going to list the best tuning settings for ATG Search engine. The AESoapConfig.xml, AESoapWaspConfig.xml  and AEConfig.xml are the xmls referred below and you can find it @  <ATG_DIR>\<Searchx.x>\SearchEngine\<operating_system>\bin\ folder. (1)  Make sure that the AESoapConfig.xml's rwTimeout is less than or equal to routing's readTimeoutMs. You could find the routing's readTimeoutMs @ atg\search\routing\SearchEngineService component.               rwTimeout is the  length of time in seconds to wait before a read or write operation times out on an active connection. The number can be decreased to improve performance. However, a value that is too low could cause slow connections to be prematurely closed. (2)  Adjust the number of engine threads to match the number of CPUs available to the engine. Note that the minimal value for maxThreads and maxSpar...

ATG - how to prevent Cross-Site attacks using _dynSessConf parameter

Cross-site scripting attacks take advantage of a vulnerability that enables a malicious site to use your browser to submit form requests to another site. In order to protect forms from cross-site attacks in ATG, you can enable form submissions to automatically supply the request parameter _dynSessConf , which identifies the current session through a randomly generated long number. On submission of a form (using dsp:form tag) or activation of a property setting (using dsp:a tag), the request-handling pipeline ( DAFDropletEventServlet ) validates _dynSessConf  against its session confirmation identifier. If it detects a mismatch or missing number, it can block form processing and return an error. To disable this functionality, we could give the following properties (@ /atg/dynamo/Configuration to disable it globally) enforceSessionConfirmation  = false -->  specifies whether the request-handling pipeline requires session confirmation in order to proc...