Google Search Appliance Documentation

Search Protocol Reference
Appendices

Appendix A: Estimated vs. Actual Number of Results

The Google Search Appliance does not guarantee the ability to return a particular number of results for any given search query. The total count of results is an estimate of the actual number of results for the search request. This section covers issues relating to this topic.

In search appliance software version 6.2 and later, the estimated number of results differs depending on whether filtering is enabled.

You can use the rc search parameter to request an accurate result count for up to 1M documents, but it might introduce high latency.

Counting Results in Secure Search

The total count of search results is not provided when a secure search is performed, regardless of whether the output format is XML or HTML. A secure search request includes the parameter access=a or access=s.
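A client can check up front whether a request is a secure search and therefore should not expect a total count. The following is a minimal sketch, assuming the request is available as a full URL; the function name and the host gsa.example.com are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def is_secure_search(search_url: str) -> bool:
    """Return True if the request carries access=a or access=s."""
    query = urlsplit(search_url).query
    # parse_qs maps each parameter name to a list of its values.
    access_values = parse_qs(query).get("access", [])
    return any(value in ("a", "s") for value in access_values)


# Hypothetical host, used only for illustration.
print(is_secure_search("http://gsa.example.com/search?q=forms&access=a"))
```

When the function returns True, the display code can skip the "about N results" line instead of showing a missing or misleading count.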

How the Google Search Appliance Determines the Number of Results to Return

When search results are returned, the number of results is determined by one of the following conditions:

To determine if a results page is the last page of available results, check for any of the following conditions:

Navigation

When the total number of results returned is an estimate, the navigation structure for search results is based on this estimate. Google recommends two approaches for generating a navigation scheme for your search results:

Automatic Filtering

When the automatic filtering feature is active, the number of results returned is significantly reduced. Automatic filtering reduces undesirable results such as duplicate entries. You can disable this feature using the instructions in Automatic Filtering.

Filtered search results are identified in the returned results. For example, the <FI/> XML tag is present in XML search results where automatic document filtering occurs.
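As a rough illustration, a client that consumes XML results could test for the FI tag as sketched below. The sample document is a heavily simplified stand-in rather than real appliance output, and the function name is hypothetical; only the presence of the FI element is taken from the text above.

```python
import xml.etree.ElementTree as ET


def results_were_filtered(xml_text: str) -> bool:
    """Return True if an FI element appears anywhere in the XML results."""
    root = ET.fromstring(xml_text)
    # Check the root itself, then search all descendants.
    return root.tag == "FI" or root.find(".//FI") is not None


# Simplified, illustrative XML; real search results contain many more elements.
sample = "<GSP><RES><FI/><R N='1'/></RES></GSP>"
print(results_were_filtered(sample))
```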

Google recommends that the search results page displays a message on the last page similar to the following, when automatic filtering occurs:

In order to show you the most relevant results, we have omitted some entries very similar to the search results already displayed. If you like, you can repeat the search with the omitted results included.

This is the behavior you see in the default output format of the Google Search Appliance.

The final phrase of the message, "repeat the search with the omitted results included", should be a hypertext link that submits the same search again with the parameter filter=0. Google finds that this method of informing users about automatic document filtering is effective. This method is used on the Google Internet search site.
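The link target for the repeated search can be derived from the current search URL. This is a sketch under stated assumptions: the function name and the host gsa.example.com are hypothetical, and the query string is decoded and re-encoded once with the standard library, which may not preserve parameters that require double URL encoding.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def unfiltered_search_url(search_url: str) -> str:
    """Return the same search URL with filter=0 replacing any filter value."""
    parts = urlsplit(search_url)
    # Keep every parameter except an existing filter setting.
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "filter"
    ]
    params.append(("filter", "0"))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(unfiltered_search_url("http://gsa.example.com/search?q=chicken&filter=1"))
```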

If you are using OneBox modules to provide additional query results to your users, note that the results served through a OneBox module are reported separately. The number of OneBox results is not added to the number of standard results.

Appendix B: URL Encoding

Some characters are not safe to use in a URL without first being encoded. Because a Google Search Appliance request is made by using an HTTP URL, the search request must follow URL conventions, including character encoding, where necessary.

The HTTP URL syntax specifies that only alphanumeric characters, the special characters $-_.+!*'(), and the reserved characters ;/?:@=& can be used as values within an HTTP URL request. Because reserved characters are used by the search engine to decode the URL, and some special characters are used to request search features, all non-alphanumeric characters used as a value to an input parameter must be URL-encoded.

To URL-encode a string, replace each non-alphanumeric character with its hexadecimal ASCII value, in the format of a percent sign (%) character followed by two hexadecimal digits. Such an ASCII value may be referred to as an escape code. Spaces can be replaced by the plus sign (+) character for query parameters except when requesting search results by meta name or values.

If you are using the search box on the search appliance, single-encode the special characters $-.+!*'().

If you are using special characters in a search query, double-encode the special characters $-.+!*'().

Underscores (_) do not need to be URL-encoded in the search box or in a search query.

Some input parameters require that the values passed to Google search are double-URL-encoded. This requirement means that you must apply the URL encoding to the string twice in succession to generate the final value. See the input parameter descriptions (Search Parameters) for more information.

Special characters in a query are the ones described as query term separators (see Special Characters: Query Term Separators) and meta tag names and values. Special characters within the document content are not indexed, so they are not searchable. For example, an indexed document containing a paragraph ending with “the *end” is not searchable using the query “%2Aend” in the GSA search box. Only ‘end’ is indexed.

For more information about URL encoding, see the W3C (http://www.w3.org/TR/html401/interact/forms.html#form-content-type) and IETF (http://www.ietf.org/rfc/rfc1738.txt) web sites.

Examples

 

Original string                         URL-encoded string
chicken -teriyaki                       chicken+%2Dteriyaki
admission form site:www.stanford.edu    admission+form+site%3Awww.stanford.edu

Original string                         Double-URL-encoded string
William Shakespeare                     William%2BShakespeare
admission form site:www.stanford.edu    admission%2Bform%2Bsite%253Awww.stanford.edu

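The encoding rules and the examples above can be approximated in a few lines of Python. This is a sketch, not the appliance's own encoder: the function names are hypothetical, and periods and underscores are left unencoded because the examples in this appendix leave them as-is. It always writes a space as a plus sign, so it does not cover the meta name and value case, where that substitution is not allowed.

```python
def gsa_url_encode(text: str) -> str:
    """Single-encode: space becomes '+', other special characters become %XX."""
    encoded = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char == " ":
            encoded.append("+")
        elif (char.isascii() and char.isalnum()) or char in "_.":
            # Assumption: keep '.' and '_' literal, as the examples above do.
            encoded.append(char)
        else:
            encoded.append(f"%{byte:02X}")
    return "".join(encoded)


def gsa_double_url_encode(text: str) -> str:
    """Double-encode: apply the single encoding twice in succession."""
    return gsa_url_encode(gsa_url_encode(text))


print(gsa_url_encode("chicken -teriyaki"))
print(gsa_double_url_encode("William Shakespeare"))
```

Python's built-in urllib.parse.quote_plus is not used here because it leaves the hyphen unencoded, which would not reproduce the chicken+%2Dteriyaki example.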
Appendix C: Date Formatting

The search appliance recognizes dates in most reasonable formats. However, dates that only mention the year (YY or YYYY), such as 2008, are not used. For dates in the format month year, the date is assumed to be the first of the month. The search appliance currently recognizes most Latin1 month names, but not Chinese, Japanese, or Korean month names.

 

Token  Description                                              Example
YYYY   All digits in a year                                     2008
YY     Last two digits of a year                                08
YR     All four digits or only the last two digits of the year  YY, YYYY
M      Month represented by one or two digits                   9 or 09
D      Day of the month represented by one or two digits        7 or 07
MM     Month represented by two digits                          04
DD     Day of the month represented by two digits               07
WK     Day of the week                                          Monday or Mon
MON    Month                                                    March or Mar
O      The relationship of local time to Universal Time (UT)

O is used in a standard date format that follows ISO/IEC 8824.

O is denoted by a plus sign (+), a minus sign (-), or the letter Z. A minus sign indicates that the local time is behind UT; a plus sign, that it is ahead of UT; and the letter Z, that it is equal to UT.

Pacific Standard Time, for example, takes a minus sign because it is behind UT.

Acceptable Date Formats

The following table lists date formats that you can use with the Google Search Appliance.

 

Format          Separator        Example
YYYY-M-D        Hyphen           2008-2-27
YYYY-D-M        Hyphen           2008-27-2
YYYY.M.D        Period           2008.2.27
YYYY.D.M        Period           2008.27.2
YYYY/M/D        Slash            2008/2/27
YYYY/D/M        Slash            2008/27/2
D-M-YYYY        Hyphen           20-2-2008
M-D-YYYY        Hyphen           2-23-2008
D.M.YYYY        Period           20.2.2008
M.D.YYYY        Period           2.23.2008
D/M/YYYY        Slash            20/2/2008
M/D/YYYY        Slash            2/23/2008
YY-MM-DD        Hyphen           09-04-27
DD-MM-YY        Hyphen           27-04-09
MM-DD-YY        Hyphen           04-27-09
YY.MM.DD        Period           09.04.27
DD.MM.YY        Period           27.04.09
MM.DD.YY        Period           04.27.09
YY/MM/DD        Slash            09/04/27
DD/MM/YY        Slash            27/04/09
MM/DD/YY        Slash            04/27/09
WK, D MON, YR   Comma            Tue, 3 March, 2009
WK, MON D, YR   Comma            Tue, March 3, 2009
D MON, YR       Space and comma  2 Jan, 09
MON YYYY        Space            March 2009
MON D, YR       Space and comma  Mar 03, 09
MON YY          Space            Mar 09
YYYYMMDDHHmm    (none)           200903211642 (see Note 1 below)
YYYYMMDDHH      (none)           2009082116
YYYYMMDD        (none)           20090323
YYYYMM          (none)           200903
YYYY            (none)           2009
DDMMYYYY        (none)           23032009
MMDDYYYY        (none)           03232009
YYMMDD          (none)           090225
DDMMYY          (none)           150209
MMDDYY          (none)           021509
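To check in advance whether a date string fits one of the listed formats, a client-side approximation can try a handful of patterns in turn. The sketch below covers only a few of the formats, and the order in which they are tried is an assumption: how the search appliance itself resolves ambiguous strings such as 2-3-2008 is not described here. The function name and the format list are hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# A small subset of the formats listed above, expressed as strptime patterns.
CANDIDATE_FORMATS = [
    "%Y-%m-%d",       # YYYY-M-D
    "%Y-%d-%m",       # YYYY-D-M
    "%d-%m-%Y",       # D-M-YYYY
    "%m-%d-%Y",       # M-D-YYYY
    "%a, %d %B, %Y",  # WK, D MON, YR (full month name, four-digit year)
    "%B %Y",          # MON YYYY (the day defaults to the first of the month)
    "%Y%m%d",         # YYYYMMDD
]


def parse_document_date(text: str) -> Optional[date]:
    """Return the first successful parse, or None if no format matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


print(parse_document_date("2008-27-2"))
print(parse_document_date("March 2009"))
```

The month-and-year case resolves to the first of the month, which matches the behavior described at the start of this appendix.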

Date Formatting Notes

1. The YYYYMMDDHH and YYYYMMDDHHmm patterns are accepted for specifying dates; however, the search appliance does not sort search results by the time of day in document dates. For example, if one document has a meta tag with a value of 200910212150 and a second document has a value of 200910210900, the search appliance discards both dates and sets each document's date to its modification time (because the YYYYMMDDHHmm format does not get parsed).

To specify rules for dates of documents:

5. To add more rules, click the Add More Lines button.

Examples of Rules

 

Rule  URL pattern             Date location  Meta tag name
1     www.foo.com/example/    Title
2     www.foo2.com/archives/  URL
3     www.foo.com/            Meta Tag       publication_date
4     www.foo2.com/           Body
5     /                       Last Modified

Because the document http://www.foo.com/example/foo.html matches the URL pattern in rule 1, the search appliance first checks for the date in the title of the document. The URL does not match rule 2, so the search appliance moves on to rule 3. If the search appliance is unable to find a valid date in the title, it looks for the date in the meta tag named publication_date, according to rule 3. If it is unable to find a valid date in the meta tag, it falls back to the last-modified date reported by the HTTP server, according to rule 5.

The search appliance extracts the date from the http://www.foo2.com/archives/20040605/abc.html URL.

Because the document http://www.foo.com/foo.html does not match the URL pattern in rule 1, the search appliance looks for the date in the meta tag, according to rule 3, and falls back to rule 5 if it cannot find a valid date there.

For the document http://www.foo2.com/foo.html, the search appliance looks for the date in the body and defaults to the last-modified date.

For the document http://www.foo3.com/foo.html, the search appliance looks for the date only in the last-modified header, because the URL matches only the pattern in rule 5.

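The walkthrough above amounts to scanning the rules in order and collecting the date location of every rule whose pattern matches the document URL. The following sketch reproduces that ordering for the example rules. It treats a pattern as a plain substring of the URL, which is a simplification of the appliance's URL pattern matching, and the names RULES and date_locations_for are hypothetical.

```python
# The example rules, in order: (URL pattern, where to look for the date).
RULES = [
    ("www.foo.com/example/", "Title"),
    ("www.foo2.com/archives/", "URL"),
    ("www.foo.com/", "Meta Tag: publication_date"),
    ("www.foo2.com/", "Body"),
    ("/", "Last Modified"),
]


def date_locations_for(url: str) -> list:
    """Return, in rule order, the date locations tried for this URL."""
    # Simplification: a rule matches when its pattern is a substring of the URL.
    return [location for pattern, location in RULES if pattern in url]


print(date_locations_for("http://www.foo.com/example/foo.html"))
```

The search appliance uses the first location in the list that yields a valid date, so later entries act as fallbacks.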
Appendix D: Compressed Results

The Google Search Appliance supports serving compressed results.

The search appliance serves compressed results to browsers that support compression. The browser must send the following HTTP header to the search appliance:

Accept-Encoding: gzip

The search appliance will then serve compressed results. The browser uncompresses the results.

This applies to both XML and XSLT-transformed results. If the Accept-Encoding: gzip header is not present, the results are not compressed.
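A non-browser client can request compressed results in the same way. This is a minimal sketch using the Python standard library. It assumes that the response marks compressed content with the standard Content-Encoding: gzip header and that the body is UTF-8 text; neither detail is stated above. The function names and the host gsa.example.com are hypothetical.

```python
import gzip
import urllib.request
from typing import Optional


def build_compressed_request(search_url: str) -> urllib.request.Request:
    """Build a request that advertises gzip support to the search appliance."""
    return urllib.request.Request(search_url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> str:
    """Decompress the body if it is gzip-encoded, then decode it as text."""
    if (content_encoding or "").lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")  # Assumption: the results are UTF-8.


# Typical use against a live appliance (not executed here):
#   request = build_compressed_request("http://gsa.example.com/search?q=test")
#   with urllib.request.urlopen(request) as response:
#       text = decode_body(response.read(), response.headers.get("Content-Encoding"))
```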