DeployU
Interviews / Databases / How do you do full-text search in PostgreSQL?

How do you do full-text search in PostgreSQL?

practical Full-text Search Interactive Quiz Code Examples

The Scenario

You are a backend engineer at a social media company. You are building a new service that needs to be able to search for posts that contain a certain keyword.

You could use the LIKE operator to do this, but you know that this would be very inefficient for a large table.

The Challenge

Explain how you would do full-text search in PostgreSQL. What are the key features of PostgreSQL’s full-text search, and how would you use them to solve this problem?

Wrong Approach

A junior engineer might try to solve this problem by using the `LIKE` operator. This would be very inefficient for a large table, and it would not provide a very good search experience.

Right Approach

A senior engineer would know that PostgreSQL has a powerful full-text search engine built-in. They would be able to explain how to use the `tsvector` and `tsquery` data types to do full-text search, and they would have a clear plan for how to use them to solve this problem.

Step 1: Understand the Key Concepts

ConceptDescription
tsvectorA data type that represents a document in a format that is optimized for full-text search.
tsqueryA data type that represents a search query.
to_tsvector()A function that converts a string to a tsvector.
to_tsquery()A function that converts a string to a tsquery.
@@A match operator that returns true if a tsvector matches a tsquery.

Step 2: Create a tsvector Column

The first step is to create a tsvector column on your table.

ALTER TABLE posts ADD COLUMN tsv tsvector;

Step 3: Populate the tsvector Column

The next step is to populate the tsvector column with the content of the posts.

UPDATE posts SET tsv = to_tsvector('english', title || ' ' || body);

You can also use a trigger to automatically update the tsvector column whenever a post is inserted or updated.

Step 4: Create a GIN Index

To speed up full-text search queries, you should create a GIN index on the tsvector column.

CREATE INDEX posts_tsv_idx ON posts USING gin(tsv);

Now you can perform a full-text search using the @@ operator.

SELECT title, body FROM posts WHERE tsv @@ to_tsquery('english', 'keyword');

Practice Question

You want to search for posts that contain either the word 'cat' or the word 'dog'. Which of the following would you use?