Full Text Search
Postgres has built-in functions to handle Full Text Search
queries. This is like a "search engine" within Postgres.
Preparation#
For this guide we'll use the following example data:
id | title | author | description |
---|---|---|---|
1 | The Poky Little Puppy | Janette Sebring Lowrey | Puppy is slower than other, bigger animals. |
2 | The Tale of Peter Rabbit | Beatrix Potter | Rabbit eats some vegetables. |
3 | Tootle | Gertrude Crampton | Little toy train has big dreams. |
4 | Green Eggs and Ham | Dr. Seuss | Sam has changing food preferences and eats unusually colored food. |
5 | Harry Potter and the Goblet of Fire | J.K. Rowling | Fourth year of school starts, big drama ensues. |
Usage#
The functions we'll cover in this guide are:
to_tsvector()
#
Converts your data into searchable "tokens". to_tsvector()
stands for "to text search vector". For example:
1select to_tsvector('green eggs and ham') 2 3-- Returns 'egg':2 'green':1 'ham':4
Collectively these tokens are called a "document" which Postgres can use for comparisons.
to_tsquery()
#
Converts a query string into "tokens" to match. to_tsquery()
stands for "to text search query".
This conversion step is important because we will want to "fuzzy match" on keywords. For example if a user searches for "eggs", and a column has the value "egg", we probably still want to return a match.
Match: @@
#
The @@
symbol is the "match" symbol for Full Text Search. It returns any matches between a to_tsvector
result and a to_tsquery
result.
Take the following example:
1select *
2from books
3where title = 'Harry';
The equality symbol above (=
) is very "strict" on what it matches. In a full text search context, we might want to find all "Harry Potter" books and so we can rewrite the
example above:
1select *
2from books
3where to_tsvector(title) @@ to_tsquery('Harry');
Basic Full Text Queries#
Search a single column#
To find all books
where the description
contain the word big
:
1select
2 *
3from
4 books
5where
6 to_tsvector(description)
7 @@ to_tsquery('big');
Search multiple columns#
To find all books
where description
or title
contain the word little
:
1select
2 *
3from
4 books
5where
6 to_tsvector(description || ' ' || title) -- concat columns, but be sure to include a space to separate them!
7 @@ to_tsquery('little');
Match all search words#
To find all books
where description
contains BOTH of the words little
and big
, we can use the &
symbol:
1select
2 *
3from
4 books
5where
6 to_tsvector(description)
7 @@ to_tsquery('little & big'); -- use & for AND in the search query
Match any search words#
To find all books
where description
contain ANY of the words little
or big
, use the |
symbol:
1select
2 *
3from
4 books
5where
6 to_tsvector(description)
7 @@ to_tsquery('little | big'); -- use | for OR in the search query
Notice how searching for big
includes results with the word bigger
(or biggest
, etc).
Creating Indexes#
Now that we have Full Text Search working, let's create an index
. This will allow Postgres to "build" the documents pre-emptively so that they
don't need to be created at the time we execute the query. This will make our queries much faster.
Searchable columns#
Let's create a new column fts
inside the books
table to store the searchable index of the title
and description
columns.
We can use a special feature of Postgres called
Generated Columns
to ensure that the index is updated any time the values in the title
and description
columns change.
1alter table
2 books
3add column
4 fts tsvector generated always as (to_tsvector('english', description || ' ' || title)) stored;
5
6create index books_fts on books using gin (fts); -- generate the index
7
8select id, fts
9from books;
Search using the new column#
Now that we've created and populated our index, we can search it using the same techniques as before:
1select
2 *
3from
4 books
5where
6 fts @@ to_tsquery('little & big');
Query Operators#
Visit PostgreSQL: Text Search Functions and Operators
to learn about additional query operators you can use to do more advanced full text queries
, such as:
Proximity: <->
#
The proximity symbol is useful for searching for terms that are a certain "distance" apart.
For example, to find the phrase big dreams
, where the a match for "big" is followed immediately by a match for "dreams":
1select
2 *
3from
4 books
5where
6 to_tsvector(description) @@ to_tsquery('big <-> dreams');
We can also use the <->
to find words within a certain distance of eachother. For example to find year
and school
within 2 words of each other:
1select
2 *
3from
4 books
5where
6 to_tsvector(description) @@ to_tsquery('year <2> school');
Negation: !
#
The negation symbol can be used to find phrases which don't contain a search term.
For example, to find records that have the word big
but not little
:
1select
2 *
3from
4 books
5where
6 to_tsvector(description) @@ to_tsquery('big & !little');