The Cyprus Source of Legal Information

Free, independent and non-profit access to Cypriot Law

Sino Search Basics

When you do a Sino search, you are fundamentally searching for documents which contain some words or phrases. If you can come up with a phrase which you think is distinctive enough, just type it in and hit the return key! If you need to find documents containing more than a single word or phrase, things get a little (but not a lot) more complicated.

If you want more than one phrase or word to appear in the retrieved documents, put the word "and" between them. For example, to find documents containing the phrase "moral rights" as well as the word "copyright", you would type: "moral rights and copyright" (less the quotes of course).

If, on the other hand, you want to find documents containing either or both of two terms, put the word "or" between them. For example, to find documents which contain the words "treaty" or "convention" or the phrase "international agreement", you would search for: "treaty or convention or international agreement". If you wanted to, you could even put these two searches together - as in: "treaty or convention or international agreement and moral rights and copyright".

If you want to find two words or phrases which appear close to each other (for example, the parties to a case), you can use the "near" connector. If you wanted to find cases where Smith sued (or was sued by) Brown, you might type: "smith near brown".

The rest of this document is a fairly detailed description of how Sino searches documents. If you're new to free text searching, you might want to go away and have a play at this point, and come back when you have some questions.

Words and Phrases

Now, let's get technical ... The basic unit of a Sino search is the word. A word is any continuous sequence of alphanumeric characters. Words are case insensitive. All words are searchable other than a relatively small list of common words which is specified for each database. The list of non-searchable words is typically quite small (less than 100 words) and is generally limited to words of little informational content (such as "the", "is", "but" and so forth). Words may be combined into phrases without the need for any special connectors (eg. "pervert the course of justice").
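The word rules above can be illustrated with a small Python sketch. This is not Sino's own code; the function name and the three-word stop list are assumptions made up for the example (the real list of non-searchable words is set per database).

```python
import re

# Hypothetical stop list for illustration only; the real list is
# specified separately for each database.
STOP_WORDS = {"the", "is", "but"}

def searchable_words(text):
    """Split text into lower-cased alphanumeric words and drop stop words."""
    # A word is any continuous run of letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Words are case insensitive, so compare and return them in lower case.
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]

print(searchable_words("Pervert the course of Justice!"))
```

Under these assumptions, punctuation acts only as a separator and "the" is dropped, while the remaining words stay in their original order.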

Sino automatically expands searches to match regular English plurals (that is, a search for "treaty" will also match "treaties" and a search for "contract" will match "contracts"). The search parser allows for Unix shell style pattern matching, including the ability to forward truncate (particularly handy for Norwegian!). The following wild cards are recognised:

    *

        matches any string (including null)

    ?

        matches any single character

    [ ... ]

        matches any one of the enclosed characters. A pair of characters separated by a '-' matches a range of characters (eg [a-c] will match 'a', 'b', or 'c'). If the first character is a '^' or a '!', characters not enclosed are matched (eg [^a-c] will match anything except 'a', 'b' or 'c').

The pattern must match an entire word. To search for words containing substrings, use "*substring*". The left square bracket symbol is also used for boolean grouping. Where you wish to start a word with a [ ... ], you need to put the whole word in quotes (eg "[ab]*ing").
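To get a feel for how these wild cards behave, here is a rough Python approximation using the standard fnmatch module, which follows the same Unix shell conventions. It only imitates the whole-word matching described above and is not how Sino itself is implemented; the function name word_matches is an assumption for the example.

```python
from fnmatch import fnmatchcase

def word_matches(pattern, word):
    """Return True if the shell-style pattern matches the entire word."""
    # fnmatch negates a character class with '!' only, so rewrite '[^' as '[!'.
    pattern = pattern.lower().replace("[^", "[!")
    # Lower-case both sides because words are case insensitive.
    return fnmatchcase(word.lower(), pattern)

print(word_matches("treat*", "Treaties"))    # forward truncation
print(word_matches("right", "copyright"))    # fails: pattern must match the whole word
print(word_matches("*right*", "copyright"))  # substring search
print(word_matches("[^a-c]at", "hat"))       # negated character class
```

The second call shows why a bare substring does not match: the pattern has to cover the entire word, so "*right*" is needed to find "copyright".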

As far as is consistent, Sino also supports regular expressions. It will, for example, treat the sequence ".*" as "*", ignore '^' and '$' characters and will even deal with agrep's '#' character. The main limitation is that sequences such as "[0-9]*" will not work.

Care should be taken when applying pattern matching to ensure that patterns are not ridiculously wild. The Sino search engine has to combine all of the occurrence information for each matched word with a boolean OR. Patterns such as "*" or even "a*" will lead to rather slow search times!

Boolean and Proximity Operators

Words and phrases may be connected together with boolean and proximity operators to form more complex searches. The operators are borrowed from a number of existing free text retrieval systems. They may be used in any combination and regardless of their heritage.

Boolean AND

The boolean AND operator allows you to identify documents which contain two (or more) words or phrases. It may be written as: "and", '+', '&', "&&" or ';'. Some typical searches are:

    - copyright and material form

    - 18 and crimes act 1900

    - defamation and journalist and newspaper

Where the keyword "and" is used to indicate a boolean AND it has low precedence (like on Lexis) - it is only evaluated after both of its arguments have been fully evaluated. Where it is written in any of the other forms, it has a (more traditionally) higher precedence than a boolean OR. The rationale for this is that OR is usually used for synonyms which ought to group tightly and so giving AND a lower precedence is usually more convenient for free text searching and is less likely to lead neophyte searchers into difficulties.

Boolean OR

The boolean OR operator is used to find documents containing either or both of two terms and is typically used to find synonymous words and phrases. It is written in Sino as: "or", ',', '|' or "||". Examples include:

    - section or s

    - husband or wife or spouse

    - proprietary limited or p l or pty ltd
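The grouping produced by the keywords "and" and "or" can be sketched in Python as follows. This is a simplified model, not Sino's parser: it handles only the keyword forms, ignores plural expansion, wild cards and stop words, and the function names are assumptions for the example. Because "and" is evaluated last, the query is split on "and" first and each piece is then treated as a group of "or" alternatives.

```python
import re

def words(text):
    """Lower-cased alphanumeric words of a text, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(doc_words, phrase):
    """True if the words of the phrase appear consecutively in the document."""
    p = words(phrase)
    n = len(p)
    return n > 0 and any(doc_words[i:i + n] == p
                         for i in range(len(doc_words) - n + 1))

def matches(query, text):
    """Evaluate a query made of phrases joined by the keywords 'and' / 'or'."""
    doc = words(text)
    # Keyword "and" has the lowest precedence, so split on it first ...
    and_parts = re.split(r"\s+and\s+", query.lower())
    # ... then every part must be satisfied by at least one "or" alternative.
    return all(
        any(has_phrase(doc, term) for term in re.split(r"\s+or\s+", part))
        for part in and_parts
    )

query = "treaty or convention or international agreement and moral rights and copyright"
print(matches(query, "The Berne Convention protects moral rights under copyright law."))
print(matches(query, "A treaty on copyright."))
```

In this model the query is read as (treaty or convention or international agreement) and (moral rights) and (copyright), so the second text fails because it lacks the phrase "moral rights".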

Boolean NOT

The NOT operator allows you to find documents which contain one thing but not another. It may be written as: "not", '-', or '^'. In practice, this operator is seldom used, but to illustrate:

    - trust not family

    - trade practice act not 51

Proximity Operators

Proximity operators are used to find documents where 2 or more terms appear near each other. Sino indexes documents in terms of where words appear. Consequently, all proximity operators are in terms of word positions. The simplest form of this class of operators is "near" (as used on Info One). This operator requires that words or phrases appear within 50 words of each other. For example:

    - smith near brown

    - 31 near bail act 1900

Although convenient, this operator is obviously a little on the restrictive side. For more flexible proximity searching, you have the choice of Lexis or Status style operators. These take the following forms:

    /n/

        words and phrases must appear within n words of each other (STATUS)

    /m, n/

        words must appear within m to n words of each other (STATUS)

    w/n

        words or phrases must occur within n words of each other (Lexis)

    pre/n

        first word must precede second by less than n words (Lexis)

For example:

    - smith w/10 brown

    - smith /10/ brown

    - smith /-10,10/ brown [ All find the word 'smith' within 10 words of 'brown']

    - smith pre/10 brown

    - smith /1,10/ brown [ Both find 'smith' followed by 'brown' up to 10 words later ]
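Since all of these operators work on word positions, their effect can be modelled with a short Python sketch. This is an illustration under stated assumptions rather than Sino's actual behaviour: it handles single words only, treats the distance limits as inclusive, counts every word (including common ones) as a position, and uses made-up function names.

```python
import re

def words(text):
    """Lower-cased alphanumeric words of a text, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def offset_within(text, first, second, lo, hi):
    """True if some 'second' lies lo..hi words after some 'first'.

    Negative offsets mean 'second' appears before 'first'.
    """
    doc = words(text)
    first_pos = [i for i, w in enumerate(doc) if w == first.lower()]
    second_pos = [i for i, w in enumerate(doc) if w == second.lower()]
    return any(lo <= j - i <= hi for i in first_pos for j in second_pos)

def near(text, a, b):
    # "near": within 50 words, in either order
    return offset_within(text, a, b, -50, 50)

def w(text, a, b, n):
    # "a w/n b", "a /n/ b" or "a /-n,n/ b": within n words, in either order
    return offset_within(text, a, b, -n, n)

def pre(text, a, b, n):
    # "a pre/n b" or "a /1,n/ b": a comes first, b follows within n words
    return offset_within(text, a, b, 1, n)

text = "Smith appealed against the orders made in favour of Brown"
print(w(text, "smith", "brown", 10))    # 'brown' is 9 words after 'smith'
print(pre(text, "brown", "smith", 10))  # wrong order for pre/n
```

The only difference between the two groups of examples above is the lower bound of the offset: a negative bound allows either order, while a bound of 1 requires the first word to come first.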

Named Sections (Segments)

Named section (segment) searching takes one of the following forms:

    - section(searchterms)

    - phrase @ section

Standard named sections are title (the html title of a document) and text (everything).

Precedence

Normally searches are evaluated from left to right. This is subject to the following order of operator precedence (highest to lowest):

    - word

    - ( terms) phrase

    - w/n pre/n w/seg /n/ /m, n/ @ name ( terms )

    - or & &&

    - and not ^ || | , ;

You can use parentheses to alter this. Round, square and curly brackets are all recognised. If you need to make any special symbols literal, these should be enclosed in quotes (double, single or back quotes).


www.cylaw.org