Posts tagged sql
Now that I have a substantial number of posts and some good quality content, I often find the need to look back at prior posts. Unfortunately, this has consisted of paging through my blog or clicking on tags. To resolve this, I'm introducing the search feature. To visit the search page, hover over Blog
in the header, and click Search Posts.
Probably, the easiest and most effective way to implement search is to use Google Site Search, but I thought that it would be more fun to use PostgreSQL's full text search capabilities. Plus, I'd rather keep my site ad-free, and I don't want to pay. From the title picture, one can see that PostgreSQL full text search has lots of features like the ability to understand grammar, rank matches, and extract the relevant text.
Under the hood, posts are stored as tsvector
s.
phillypham::DATABASE=> SELECT setweight(to_tsvector('Post title'), 'A') || setweight(to_tsvector('The_quick_brown_fox_jumps_over_the_lazy_dog'), 'B') AS tsvector FROM posts LIMIT 1;
tsvector
-----------------------------------------------------------------------------------
'brown':5B 'dog':11B 'fox':6B 'jump':7B 'lazi':10B 'post':1A 'quick':4B 'titl':2A
(1 row)
As you can see, there's the ability to weight the title more heavily than the body.
Now, to make this work properly, I added another column to the posts
table. I indexed it and created a trigger to make sure that it's automatically updated. Then, to integrate properly with Sequelize, I added all these queries to an afterSync
hook.
=> ALTER TABLE posts ADD COLUMN title_body_tsvector tsvector;
=> UPDATE posts SET title_body_tsvector=setweight(to_tsvector('english', title), 'A') || setweight(to_tsvector('english', body), 'B');
=> CREATE INDEX IF NOT EXISTS title_body_search_idx ON posts USING gin(title_body_tsvector);
=> CREATE OR REPLACE FUNCTION posts_trigger() RETURNS trigger AS $$ begin new.title_body_tsvector := setweight(to_tsvector('english', new.title), 'A') || setweight(to_tsvector('english', new.body), 'B'); return new; end $$ LANGUAGE plpgsql;
=> CREATE TRIGGER posts_update BEFORE INSERT OR UPDATE ON posts FOR EACH ROW EXECUTE PROCEDURE posts_trigger();
See Post.js for more details.
Then, to do a search, I execute a raw query and return it as a promise. The query looks like this:
SELECT id,
title,
ts_rank_cd(title_body_tsvector,$QUERY,1) AS rank,
ts_headline('english', title || ' ' || body,$QUERY, 'MaxWords=100') AS headline
FROM posts
WHERE published AND title_body_tsvector @@ $QUERY
ORDER BY rank DESC, id DESC;
The Javascript looks like this:
function(db, tsquery) {
var query = tsquery.indexOf(' ') == -1 ? "to_tsquery('english','" + tsquery + "')" : "plainto_tsquery('english','" + tsquery + "')";
return db.sequelize.query("SELECT id, title, ts_rank_cd(title_body_tsvector," + query + ",1) AS rank, ts_headline('english', title || ' ' || body," + query + ", 'MaxWords=100') AS headline FROM posts WHERE published AND title_body_tsvector @@ " + query + " ORDER BY rank DESC, id DESC",
{ model: db.Post });
}
Note the ternary operator when defining query
. PostgreSQL tsquery
s allow searches of arbitrary complexity with nesting and various boolean operators. Unfortunately, most people will not have the syntax memorized, so plainto_tsquery
lets one search with more natural syntax. If you get the syntax wrong (e.g., forget to close a parentheses), the error will be handled gracefully thanks to promises.
Try it and let me know what you think!