Google Search Appliance Documentation

Search Protocol Reference

Results Format

This section covers the following topics: Custom HTML and XML Output.

Custom HTML

This section describes the custom HTML results.

Custom HTML Output Overview

Google Search Appliance has a built-in XSLT (eXtensible Stylesheet Language Transformation) server, and can generate custom HTML using your XSL stylesheet. Search requests that include the output parameter set to xml_no_dtd and a valid proxystylesheet parameter value are automatically processed by the XSLT server as requests for custom HTML output.
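
As a minimal sketch of such a request, the following Python builds a search URL that carries both parameters. The host name gsa.example.com, the /search path, and the stylesheet name default_frontend are assumptions for illustration; substitute the values for your own appliance.

```python
from urllib.parse import urlencode

def custom_html_url(host, query, stylesheet):
    """Build a search URL that asks the XSLT server for custom HTML output."""
    params = {
        "q": query,
        "output": "xml_no_dtd",         # required for custom HTML output
        "proxystylesheet": stylesheet,  # selects the XSL stylesheet to apply
    }
    # Assumption: the appliance answers search requests at /search.
    return "http://" + host + "/search?" + urlencode(params)

url = custom_html_url("gsa.example.com", "token ring", "default_frontend")
print(url)
```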

Using the XSL stylesheet specified by the proxystylesheet parameter, the XSLT server applies the transformation rules found in the XSL stylesheet to the standard Google XML results. Although this document assumes that the output generated by applying the XSL stylesheet is HTML, almost any output format can be generated by using appropriate XSL stylesheet rules. For any front end, the default XSL stylesheet can be customized or replaced by the search administrator.

To customize the XSL stylesheet used to generate custom HTML output, see XML Output to determine the XML tags that may be transformed using a customized XSL stylesheet.

Additionally, you can use the proxycustom parameter to pass custom XML tags to the XSLT server. Because including custom XML does not generate search results, this feature is useful for implementing additional static search pages, such as an advanced search page.

Customizations to XSLT stylesheets may result in vulnerability to cross-site scripting (XSS) attacks. Google recommends that you run XSS tests after customizing an XSLT stylesheet.

Notes:

When you request cached results in custom HTML output, the BLOB XML tag and associated value are automatically converted to the original text before the XSL stylesheet rules are applied. When using an XSL stylesheet that customizes cache results, simply use the values of the CACHE_LEGEND_TEXT, CACHE_LEGEND_NOTFOUND, and CACHE_HTML XML tags directly instead of applying a rule on the BLOB subtag.

Internationalization

The Google Search Appliance handles over 20 character encoding schemes. This section discusses special considerations for the custom HTML output format with encoding schemes other than latin1.

To support all the encoding schemes supported by Google, the XSLT server follows a process to ensure that the results are returned in the correct encoding scheme. When search results are requested through the XSLT server, the server translates the results to the UTF8 encoding scheme before applying the selected XSL stylesheet. After the XSL stylesheet rules are applied to generate the results, the results are converted to the encoding scheme that is specified by the output encoding parameter, oe. The one exception to this rule is cached result pages, which are converted to the encoding scheme of the cached document after XSLT processing.
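
The order of the conversions can be pictured with a short Python sketch. It is only an illustration of the three steps described above, not the appliance's implementation: the function name render is hypothetical, the stylesheet step is an identity placeholder, and the encoding names are assumed to be ones that Python recognizes.

```python
def render(raw_results, source_encoding, oe, apply_stylesheet=lambda text: text):
    """Mimic the conversion order: normalize, transform, then re-encode."""
    # Step 1: translate the results to a common form (UTF8 on the appliance;
    # modelled here as decoding the bytes to text).
    text = raw_results.decode(source_encoding)
    # Step 2: apply the XSL stylesheet rules (placeholder: identity by default).
    html = apply_stylesheet(text)
    # Step 3: convert the output to the encoding named by the oe parameter.
    return html.encode(oe)

page = render("café".encode("latin1"), "latin1", "utf8")
```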

Each front end for your search appliance is associated with an underlying stylesheet. All XSL stylesheets must be in latin1 or UTF8 formats.

XML Output

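The description of the XML results format contains the following sections: XML Output Overview, Character Encoding Conventions, Google XML Results DTD, and Google XML Tag Definitions.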

XML Output Overview

For maximum flexibility, Google provides search results in XML format. Using the Google XML results, you can use your own XML parser to customize the display for your search users. If you are using an XSL stylesheet to transform the XML results instead of developing your own XML parser, proceed to Custom HTML.

Notes:

<param name="temp" value="token_ring" original_value="token+ring" />

Character Encoding Conventions

The first line of the XML results indicates which character encoding is used. See XML Standard for information about character encoding (http://www.w3.org/TR/1998/REC-xml-19980210#charencoding).

Certain characters must be escaped when they are included as values in XML tags. These characters are documented in XML Standard (http://www.w3.org/TR/1998/REC-xml-19980210#dt-escape), and are shown in the table that follows. All other characters in the XML results are presented without modification.

Character    Escaped form
<            either &lt; or &#60;
&            either &amp; or &#38;
>            either &gt; or &#62;
'            either &apos; or &#39;
"            either &quot; or &#34;
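
A client that builds or reads such values can rely on a standard XML library instead of hand-written replacements. The sketch below uses Python's xml.sax.saxutils; the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() always handles &, < and >; quotes are added through the extra mapping.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = 'a < b & "c"'
escaped = escape(raw, QUOTES)
restored = unescape(escaped, UNQUOTES)
print(escaped)
```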

Google XML Results DTD

Google XML results can be returned with or without a reference to the most recent DTD (Document Type Definition) describing Google’s XML format. The DTD is a guide to help search administrators and XML parsers understand the XML results output. Because Google’s XML grammar may change from time to time, do not configure your parser to use the DTD to validate the XML results.

XML parsers should not be configured to fetch the DTD every time a search request is performed. Because the DTD is updated infrequently, these fetches create unnecessary delay and bandwidth requirements.

To get results in XML output format, use one of the following parameters in the search request:

output=xml_no_dtd (recommended), or

output=xml

When you use the xml output format, the XML results include the line:

<!DOCTYPE GSP SYSTEM "google.dtd">

The DTD is available on the Google Search Appliance at http://<appliance_hostname>/google.dtd.
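
The following Python sketch shows one way to read results that carry the DOCTYPE line without validating against, or fetching, the DTD. It relies on the standard library's xml.etree.ElementTree, which does not load external DTDs. The sample document is a hand-written fragment for illustration, not captured appliance output.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<!DOCTYPE GSP SYSTEM "google.dtd">\n'
    b'<GSP VER="3.2"><TM>0.05</TM><Q>token ring</Q></GSP>'
)

# The DOCTYPE is read but google.dtd is neither fetched nor used for validation.
root = ET.fromstring(SAMPLE)
print(root.tag, root.get("VER"), root.findtext("TM"))
```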

Google XML Tag Definitions

This section contains an index of Google’s XML tags.

Subtags legend:

 

Symbol  Meaning
?       zero or one instance of the subtag
*       zero or more instances of the subtag
+       one or more instances of the subtag
|       Boolean OR

BLOB
Format/Parent

Text (See Definition)

CACHE_HTML, CACHE_LEGEND_NOTFOUND, CACHE_LEGEND_TEXT

Subtags

None

Definition

This tag contains HTML data in the encoding format that is specified in the attribute. The data is Base64 encoded to preserve the data integrity of cached results that are encoded in a different encoding scheme than the requested results.

Attributes

 

encoding

Text (Encoding Scheme)

The encoding scheme of the HTML data

(See Internationalization for a list of common encoding values)
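
As a rough sketch, a parser can recover the original text of a BLOB by Base64-decoding its content and then decoding the bytes with the scheme named in the encoding attribute. The helper name decode_blob and the sample element are hypothetical, and the code assumes the attribute value is a codec name that Python recognizes.

```python
import base64
import xml.etree.ElementTree as ET

def decode_blob(blob):
    """Return the text carried by a BLOB element."""
    raw = base64.b64decode(blob.text or "")
    # Assumption: the encoding attribute names a codec known to Python.
    return raw.decode(blob.get("encoding", "utf-8"))

# Build a made-up BLOB for the demonstration.
payload = base64.b64encode("<b>café</b>".encode("ISO-8859-1")).decode("ascii")
blob = ET.fromstring('<BLOB encoding="ISO-8859-1">%s</BLOB>' % payload)
text = decode_blob(blob)
```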

 

C
Format/Parent

HAS

Subtags

None

Definition

Indicates that the “cache:” special query term is supported for this search result URL.

Cached results are suppressed and this element is not returned if the <head> tag of the document contains the following <meta> tag: <meta name="ROBOTS" content="noarchive">.

Attributes

 

SZ

Text (Integer + “k”)

Provides the size of the cached version of the search result in kilobytes (“k”). This field is not populated if no cached version of a document is available, which can be the case if robots “noarchive” meta tags are used.

CID

Text

Identifier of a document in the Google Search Appliance cache. To fetch the document from the cache, send a search term of the form:

"cache:" + CID text + ":" + encoded URL.

The encoded URL is available in the UE tag. Send this search term normally, as you would type it into the search form.

ENC

Text

The encoding of the document in the cache. See Internationalization for a list of common values.
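
The cache search term described for the CID attribute can be assembled as in the following Python sketch. The sample R element and all of its values are made up for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<R N="1"><U>http://www.example.com/a b.html</U>'
    '<UE>http://www.example.com/a%20b.html</UE>'
    '<HAS><L/><C SZ="7k" CID="abc123" ENC="UTF-8"/></HAS></R>'
)

def cache_query(result):
    """Return "cache:" + CID + ":" + encoded URL, or None without a C tag."""
    c = result.find("HAS/C")
    if c is None:
        return None  # no cached version is offered for this result
    return "cache:" + c.get("CID") + ":" + result.findtext("UE")

def cached_size_kb(result):
    """Return the SZ attribute as an integer number of kilobytes, if present."""
    c = result.find("HAS/C")
    size = c.get("SZ") if c is not None else None
    return int(size.rstrip("k")) if size else None

result = ET.fromstring(SAMPLE)
term = cache_query(result)
```

The returned term is then sent as the value of the query, just as it would be typed into the search form.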

 

CACHE
Format/Parent

GSP

Subtags

CACHE_URL, CACHE_REDIR_URL, CACHE_LAST_MODIFIED, CACHE_LEGEND_FOUND?, CACHE_LEGEND_NOTFOUND?, CACHE_CONTENT_TYPE, CACHE_LANGUAGE, CACHE_ENCODING, CACHE_HTML

Definition

Encapsulates the cached version of a search result.

Attributes

None

CACHE_CONTENT_TYPE
Format/Parent

Text (MIME type)

CACHE

Subtags

None

Definition

MIME type of the cached result, as specified in the HTTP header that is returned when the document is crawled.

Attributes

None

CACHE_HTML
Format/Parent

Text (HTML) (Custom HTML output only)

CACHE

Subtags

BLOB? (XML output only)

Definition

The cached version of the search result. All search results are stored in HTML format.

Attributes

None

CACHE_ENCODING
Format/Parent

Text

CACHE

Subtags

None

Definition

The encoding scheme of the cached result, as specified in the HTTP header that is returned when the document is crawled. (See Internationalization for a list of common values.)

Attributes

None

CACHE_LANGUAGE
Format/Parent

Text (Google language tag)

CACHE

Subtags

None

Definition

The language of the cached result as determined by Google’s automatic language classification algorithm. The value of this tag is the same as the values used for the automatic language collections without the “lang_” prefix (see Automatic Language Filters).

Attributes

None

CACHE_LAST_MODIFIED
Format/Parent

Text

CACHE

Subtags

None

Definition

Date that the document was crawled, as specified in the Date HTTP header when the document was crawled for this index. The crawler fetches documents from its cache if the web server responds with a 304 (not modified) status code to an if-modified-since request. In this case, the CACHE_LAST_MODIFIED is the date when the document was originally crawled and not the date of the if-modified-since request.

Attributes

None

CACHE_LEGEND_FOUND
Format/Parent

CACHE

Subtags

CACHE_LEGEND_TEXT*

Definition

Encapsulates query terms that are found in the visible text of the cached result returned.

Attributes

None

CACHE_LEGEND_NOTFOUND
Format/Parent

Text (Custom HTML output only)

CACHE

Subtags

BLOB? (XML output only)

Definition

Details of any query terms that are not visible in the cached result returned.

Attributes

None

CACHE_LEGEND_TEXT
Format/Parent

Text (Custom HTML output only)

CACHE_LEGEND_FOUND

Subtags

BLOB (XML output only)

Definition

Details of a query term that is visible in the cached result. Query terms found in the cached result are automatically highlighted using the colors described in the attributes of this tag.

Attributes

 

fgcolor

Color attribute

The foreground color of the query term in the cached result. This value can be used directly in a color attribute for HTML tags.

bgcolor

Color attribute

The background color of the query term in the cached result. This value can be used directly in a color attribute for HTML tags.

CACHE_REDIR_URL
Format/Parent

Text (Absolute URL)

CACHE

Subtags

None

Definition

Final URL of cached result after all redirects are resolved.

Attributes

None

CACHE_URL
Format/Parent

Text (Absolute URL)

CACHE

Subtags

None

Definition

Initial URL of cached result.

Attributes

None

CRAWLDATE
Format/Parent

Text

R

Subtags

None

Definition

An optional element that shows the date when the page was crawled. It is shown only for pages that have been crawled within the past two days.

Attributes

None

CT
Format/Parent

HTML

GSP

Subtags

None

Definition

Search comments.

Example comment: Sorry, no content found for this URL

Attributes

None

CUSTOM
Format/Parent

GSP

Subtags

(Custom XML specified in the search request)

Definition

Encapsulates custom XML tags that are specified in the proxycustom input parameter.

Attributes

None

ENT_SOURCE
Format/Parent

R

Subtags

None

Definition

Identifies the appliance ID (serial number) of the search appliance that contributes to a result.

Example:

<ENT_SOURCE>S5-KUB000F0ADETLA</ENT_SOURCE>

Attributes

None

ENTOBRESULTS
Format/Parent

GSP

Subtags

OBRES

Definition

Encapsulates the results returned by OneBox modules.

Attributes

None

FI
Format/Parent

RES

Subtags

None

Definition

Indicates that document filtering was performed during this search.

See Automatic Filtering for more details.

Attributes

None

FS
Format/Parent

R

Subtags

None

Definition

Additional details about the search result.

Attributes

 

NAME

Text

Name of the result descriptor

VALUE

Text

Value of the result descriptor

GD
Format/Parent

Text (HTML)

GM

Subtags

None

Definition

Contains the description of a KeyMatch result.

Attributes

None

GL
Format/Parent

Text (URL)

GM

Subtags

None

Definition

Contains the URL of a KeyMatch result.

Attributes

None

GM
Format/Parent

GSP

Subtags

GL, GD?

Definition

Encapsulates a single KeyMatch result.

Attributes

None

GSP
Format/Parent

This is the root element.

Subtags

(CT?, CUSTOM?, ENTOBRESULTS, GM*, PARAM+, Q, RES?, Spelling?, Synonyms?, TM) | CACHE

Definition

GSP = “Google Search Protocol”

Encapsulates all data that is returned in the Google XML search results.

Attributes

 

VER

Text

Indicates version of the search results output. The current output version is “3.2”.

HAS
Format/Parent

R

Subtags

L?, C?

Definition

Encapsulates special features that are included for this search result.

Attributes

None

HN
Format/Parent

Text (URL-encoded web directory, see Appendix B: URL Encoding)

R

Subtags

None

Definition

Indicates that filtering has occurred and that additional results are available from the directory where this search result was found. The value of this tag is ready to be used with the site: query term (see Directory Restricted Search).

Attributes

 

U

Text

Server and path components of the directory’s URL.

L
Format/Parent

HAS

Subtags

None

Definition

Indicates that the “link:” special query term is supported for this search result URL.

Attributes

None

LANG
Format/Parent

Text

R

Subtags

None

Definition

Indicates the language of the search result. The LANG element contains a two-letter language code. See Automatic Language Filters for language codes.

Attributes

None

M
Format/Parent

Text (Integer)

RES

Subtags

None

Definition

The estimated total number of results for the search.

The estimate of the total number of results for a search can be too high or too low. See Appendix A: Estimated vs. Actual Number of Results.

Attributes

None

MT
Format/Parent

R

Subtags

None

Definition

Meta tag name and value pairs obtained from the search result.

Only meta tags (see Meta Tags) that are requested in the search request are returned.

Attributes

 

N

Text

Name of the meta tag

V

Text

Value of the meta tag

NB
Format/Parent

RES

Subtags

PU?, NU?

Definition

Encapsulates the navigation information for the result set.

The NB tag is present only if previous or additional results are available.

Attributes

None

NU
Format/Parent

Text (Relative URL)

NB

Subtags

None

Definition

Contains a relative URL pointing to the next results page.

The NU tag is present only when more results are available.

Attributes

None
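
A client can turn the relative URLs in NB into absolute links, as in this Python sketch. The sample XML, the host name, and the start parameter inside the sample URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SAMPLE = """<GSP VER="3.2"><RES SN="11" EN="20"><M>57</M>
<NB><PU>/search?q=token+ring&amp;start=0</PU>
<NU>/search?q=token+ring&amp;start=20</NU></NB>
</RES></GSP>"""

def page_links(root, base_url):
    """Return absolute previous/next URLs; a missing tag yields None."""
    links = {"previous": None, "next": None}
    nb = root.find("RES/NB")
    if nb is None:
        return links  # NB is absent when there is only one page of results
    for name, tag in (("previous", "PU"), ("next", "NU")):
        relative = nb.findtext(tag)
        if relative:
            links[name] = urljoin(base_url, relative)
    return links

links = page_links(ET.fromstring(SAMPLE), "http://gsa.example.com/search")
```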

OBRES
Format/Parent

ENTOBRESULTS

Subtags

The contents of the OBRES element are provided by the OneBox module, and must conform to the OneBox Results Schema. See the specific OneBox module’s documentation for details. See also the Google OneBox for Enterprise Developer’s Guide.

Definition

Encapsulates a result returned by a OneBox module.

Attributes

None

OneSynonym
Format/Parent

HTML

Synonyms

Subtags

None

Definition

A related query for the submitted query, in HTML format.

Attributes

 

q

Text

The URL-encoded version of the related query (see Appendix B: URL Encoding)

PARAM
Format/Parent

GSP

Subtags

None

Definition

The search request parameters that were submitted to the Google Search Appliance to generate these results.

Attributes

 

name

Text

Name of the input parameter

value

HTML

HTML-formatted version of the input parameter value

original_value

Text

Original URL-encoded version of the input parameter value (see Appendix B: URL Encoding)
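
As an illustration, the submitted parameters can be collected into a dictionary by decoding each original_value attribute. The sketch assumes the values use form-style URL encoding (a plus sign for a space); the helper name request_params is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

SAMPLE = """<GSP VER="3.2">
<PARAM name="temp" value="token_ring" original_value="token+ring"/>
<PARAM name="output" value="xml_no_dtd" original_value="xml_no_dtd"/>
</GSP>"""

def request_params(root):
    """Map each PARAM name to its decoded original value."""
    return {
        param.get("name"): unquote_plus(param.get("original_value", ""))
        for param in root.findall("PARAM")
    }

params = request_params(ET.fromstring(SAMPLE))
```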

PARM
Format/Parent

RES

Subtags

PC, PMT*

Definition

Encapsulates all dynamic navigation results.

Attributes

None

PC
Format/Parent

Text (Integer 0 or 1)

PARM

Subtags

None

Definition

Indicates whether the counts are exact or partial: 0 = exact, 1 = partial.

Attributes

None

PMT
Format/Parent

PARM

Subtags

PV+

Definition

Encapsulates results for one attribute. All values are sorted by count or by value, as configured; a maximum of 5k values (PV) are returned and the rest are discarded.

Attributes

 

NM

Text

Metatag name

DN

Text

Display name

IR

Text (Integer)

Attribute is range type (1) or not (0)

T

Text (Integer)

Attribute type: 0-String, 1-Integer, 2-Float, 3-Currency, 4-Date

PU
Format/Parent

Text (Relative URL)

NB

Subtags

None

Definition

Contains relative URL to the previous results page.

The PU tag is present only if previous results are available.

Attributes

None

PV
Format/Parent

PMT

Subtags

None

Definition

Encapsulates the count information for one value.

Attributes

 

V

Text

Value (empty for range attributes)

L

Text

Contains low range value (empty for non-range attribute)

H

Text

Contains high range value (empty for non-range attribute)

C

Text (Integer)

Number of documents that match this value or fall within this range
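
The PARM, PMT, and PV tags can be read together to build a navigation menu. The Python below is a sketch under stated assumptions: the sample XML and the helper name navigation are made up, and only the attributes documented above are used.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<RES SN="1" EN="10"><M>42</M><PARM><PC>0</PC>
<PMT NM="department" DN="Department" IR="0" T="0">
<PV V="Sales" L="" H="" C="30"/><PV V="Support" L="" H="" C="12"/>
</PMT>
<PMT NM="price" DN="Price" IR="1" T="3">
<PV V="" L="0" H="100" C="25"/>
</PMT></PARM></RES>"""

TYPES = {0: "String", 1: "Integer", 2: "Float", 3: "Currency", 4: "Date"}

def navigation(res):
    """Summarize dynamic navigation results, or return None without PARM."""
    parm = res.find("PARM")
    if parm is None:
        return None
    attributes = []
    for pmt in parm.findall("PMT"):
        is_range = pmt.get("IR") == "1"
        values = []
        for pv in pmt.findall("PV"):
            # Range attributes carry L and H; other attributes carry V.
            label = (pv.get("L"), pv.get("H")) if is_range else pv.get("V")
            values.append((label, int(pv.get("C"))))
        attributes.append({
            "name": pmt.get("NM"),
            "display": pmt.get("DN"),
            "type": TYPES[int(pmt.get("T"))],
            "values": values,
        })
    exact = (parm.findtext("PC") or "").strip() == "0"  # 0 = exact, 1 = partial
    return {"exact": exact, "attributes": attributes}

nav = navigation(ET.fromstring(SAMPLE))
```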

Q
Format/Parent

HTML

GSP

Subtags

None

Definition

The search query terms submitted to the Google Search Appliance to generate these results.

Attributes

None

R
Format/Parent

RES

Subtags

CRAWLDATE, FS?, HAS, HN?, LANG, MT*, RK, S?, T?, U, UD, UE

Definition

Encapsulates the details of an individual search result.

Attributes

 

N

Text (Integer)

The index number (1-based) of this search result.

L

Text (Integer)

The recommended indentation level of the results. This value is 1 unless Duplicate Directory Filtering occurs (see Automatic Filtering). In this case, the second directory result has a value of 2.

MIME

Text

The MIME type of the search result.

RES
Format/Parent

GSP

Subtags

FI?, M, NB?, PARM?, R*, XT?

Definition

Encapsulates the set of all search results.

Attributes

 

SN

Text (Integer)

The index (1-based) of the first search result returned in this result set.

EN

Text (Integer)

Indicates the index (1-based) of the last search result returned in this result set.
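
To show how RES and its R children fit together, here is a Python sketch that extracts the result range, the estimated total, and a few fields from each result. The sample XML is hand-written for illustration and the helper name parse_results is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<GSP VER="3.2"><TM>0.05</TM><Q>token ring</Q>
<RES SN="1" EN="2"><M>2</M><XT/>
<R N="1" L="1" MIME="text/html"><U>http://www.example.com/a/</U>
<T>Token &lt;b&gt;ring&lt;/b&gt;</T><RK>10</RK></R>
<R N="2" L="2"><U>http://www.example.com/a/b.html</U>
<T>More on token ring</T><RK>7</RK></R>
</RES></GSP>"""

def parse_results(root):
    """Return the result range, estimated total, and a summary of each R tag."""
    res = root.find("RES")
    if res is None:  # RES is optional in GSP
        return {"first": 0, "last": 0, "estimated_total": 0,
                "exact": False, "results": []}
    results = [{
        "n": int(r.get("N")),
        "indent": int(r.get("L", "1")),  # 2 marks an indented directory result
        "url": r.findtext("U"),
        "title": r.findtext("T"),        # T holds HTML, for example <b> tags
        "rank": int(r.findtext("RK", "0")),
    } for r in res.findall("R")]
    return {
        "first": int(res.get("SN")),
        "last": int(res.get("EN")),
        "estimated_total": int(res.findtext("M", "0")),
        "exact": res.find("XT") is not None,  # XT marks the estimate as exact
        "results": results,
    }

parsed = parse_results(ET.fromstring(SAMPLE))
```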

RK
Format/Parent

Text (Integer in the range 0-10)

R

Subtags

None

Definition

The RK tag assigns a ranking score to each page on a scale from 0 (least important) to 10 (most important) based on how well the result matches the query. When search results are sorted by relevancy, the RK values appear in decreasing order (highest to lowest).

To see the RK values, you must view search results in raw XML, as described in the following steps:

On the Advanced Search page, edit the query parameters:
a. Change the output parameter to &output=xml
b. Remove &proxystylesheet=default_frontend
c. Add &getfield=*

The XML results show the RK parameter for each result, for example: <RK>10</RK>.
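
The same three edits to the query parameters can be applied programmatically. The Python sketch below rewrites a search URL accordingly; the starting URL, the host name, and the client parameter are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def raw_xml_url(url):
    """Rewrite a search URL so that it returns raw XML including RK values."""
    parts = urlsplit(url)
    params = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in ("proxystylesheet", "getfield"):
            continue                      # b. remove proxystylesheet
        if key == "output":
            value = "xml"                 # a. change the output parameter
        params.append((key, value))
    if not any(key == "output" for key, _ in params):
        params.append(("output", "xml"))
    params.append(("getfield", "*"))      # c. add getfield=*
    # safe="*" keeps the asterisk literal instead of encoding it as %2A.
    return urlunsplit(parts._replace(query=urlencode(params, safe="*")))

original = ("http://gsa.example.com/search?q=token+ring&output=xml_no_dtd"
            "&proxystylesheet=default_frontend&client=default_frontend")
rewritten = raw_xml_url(original)
print(rewritten)
```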

Attributes

None

S
Format/Parent

Text (HTML)

R

Subtags

None

Definition

The snippet for the search result.

Query terms appear in bold in the results. Line breaks are included for proper text wrapping.

In documents larger than 300KB, snippets may not contain query terms that occur beyond the first 300KB of the document. For non-HTML documents, the 300KB limit applies to the converted version, not the original document.

Attributes

None

SCOREBIAS
Format/Parent

Text (XML)

R

Subtags

None

Definition

The SCOREBIAS tag can appear zero or more times as a child of the R tag (see R) for each result. The SCOREBIAS tag appears for each result biaser that is applied.

The NAME attribute is the name of the result biaser.

The VALUE attribute indicates the effect of the biaser, for biasers where the strength is expressed symbolically, such as source or collection biasing and metadata biasing.

The search appliance does not include any information about the exact change in score or rank, or the weight of the result biaser.

The following example indicates a medium increase in the PatternScorer result biaser:

<SCOREBIAS NAME="PatternScorer" VALUE="2"/>

Attributes

 

Attribute  Value          Format          Description
NAME       PatternScorer  Text            Used for both source biasing and collection biasing.
NAME       DateBias       Text            Used for date biasing.
NAME       Metadata       Text            Used for metadata biasing.
VALUE      3              Text (integer)  For a strong increase.
VALUE      2              Text (integer)  For a medium increase.
VALUE      1              Text (integer)  For a weak increase.
VALUE      0              Text (integer)  For no change.
VALUE      -3             Text (integer)  For a strong decrease.
VALUE      -2             Text (integer)  For a medium decrease.
VALUE      -1             Text (integer)  For a weak decrease.

For biasers that do not use a symbolic change, such as date biasing, VALUE has these numerical values:
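
For the symbolic levels listed for VALUE (3 through -3), a client might translate each SCOREBIAS tag into a readable label, as in this Python sketch. The helper name describe_biases and the sample R element are hypothetical, and values outside the listed range are passed through unchanged rather than interpreted.

```python
import xml.etree.ElementTree as ET

LEVELS = {
    3: "strong increase", 2: "medium increase", 1: "weak increase",
    0: "no change",
    -1: "weak decrease", -2: "medium decrease", -3: "strong decrease",
}
BIASERS = {
    "PatternScorer": "source or collection biasing",
    "DateBias": "date biasing",
    "Metadata": "metadata biasing",
}

def describe_biases(result):
    """Return (biaser, effect) pairs for every SCOREBIAS child of an R tag."""
    described = []
    for bias in result.findall("SCOREBIAS"):
        name = bias.get("NAME")
        value = int(bias.get("VALUE"))
        described.append((BIASERS.get(name, name),
                          LEVELS.get(value, "value %d" % value)))
    return described

sample = ET.fromstring('<R N="1"><SCOREBIAS NAME="PatternScorer" VALUE="2"/></R>')
effects = describe_biases(sample)
```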

Spelling
Format/Parent

GSP

Subtags

Suggestion+

Definition

Encapsulates alternate spelling suggestions for the submitted query. Only one spelling suggestion is returned at this time.

Attributes

None

Suggestion
Format/Parent

HTML

Spelling

Subtags

None

Definition

An alternate spelling suggestion for the submitted query, in HTML format.

Attributes

 

q

Text

The spelling suggestion.

qe

Text

Internal-only attribute of the spelling suggestion. This attribute works when the search results are transformed on the search appliance, but not on external parsers.

Synonyms
Format/Parent

GSP

Subtags

OneSynonym+

Definition

Encapsulates the related queries for the submitted query. Up to 20 related queries may be returned, depending on the related queries list that is associated with the front end.

Attributes

None

T
Format/Parent

Text (HTML)

R

Subtags

None

Definition

The title of the search result.

Attributes

None

TM
Format/Parent

Text (Floating-point number)

GSP

Subtags

None

Definition

Total server time to return search results, measured in seconds.

Attributes

None

U
Format/Parent

Text (Absolute URL)

R

Subtags

None

Definition

The URL of the search result.

Attributes

None

UD
Format/Parent

Text (URL to display for non-ASCII URLs)

R

Subtags

None

Definition

The URL string to display when the URL that is in the U tag is non-ASCII. Displays UTF-8 characters and IDNA domain names properly.

Attributes

None

UE
Format/Parent

Text (URL-encoded version of the URL)

R

Subtags

None

Definition

The URL-encoded version of the URL that is in the U tag.

Attributes

None

XT
Format/Parent

RES

Subtags

None

Definition

Indicates that the estimated total number of results specified in this search result is exact.

See Automatic Filtering for more details.

Attributes

None