Posts tagged sql

Photo URL is broken

Now that I have a substantial number of posts and some good quality content, I often find the need to look back at prior posts. Unfortunately, this has consisted of paging through my blog or clicking on tags. To resolve this, I'm introducing the search feature. To visit the search page, hover over Blog in the header, and click Search Posts.

Probably, the easiest and most effective way to implement search is to use Google Site Search, but I thought that it would be more fun to use PostgreSQL's full text search capabilities. Plus, I'd rather keep my site ad-free, and I don't want to pay. From the title picture, one can see that PostgreSQL full text search has lots of features like the ability to understand grammar, rank matches, and extract the relevant text.

Under the hood, posts are stored as tsvectors.

phillypham::DATABASE=> SELECT setweight(to_tsvector('Post title'), 'A') || setweight(to_tsvector('The_quick_brown_fox_jumps_over_the_lazy_dog'), 'B') AS tsvector FROM posts LIMIT 1;
                                     tsvector                                      
-----------------------------------------------------------------------------------
 'brown':5B 'dog':11B 'fox':6B 'jump':7B 'lazi':10B 'post':1A 'quick':4B 'titl':2A
(1 row)

As you can see, there's the ability to weight the title more heavily than the body.

Now, to make this work properly, I added another column to the posts table. I indexed it and created a trigger to make sure that it's automatically updated. Then, to integrate properly with Sequelize, I added all these queries to an afterSync hook.

=> ALTER TABLE posts ADD COLUMN title_body_tsvector tsvector;
=> UPDATE posts SET title_body_tsvector=setweight(to_tsvector('english', title), 'A') || setweight(to_tsvector('english', body), 'B');
=> CREATE INDEX IF NOT EXISTS title_body_search_idx ON posts USING gin(title_body_tsvector);
=> CREATE OR REPLACE FUNCTION posts_trigger() RETURNS trigger AS $$ begin new.title_body_tsvector := setweight(to_tsvector('english', new.title), 'A') || setweight(to_tsvector('english', new.body), 'B'); return new; end $$ LANGUAGE plpgsql;
=> CREATE TRIGGER posts_update BEFORE INSERT OR UPDATE ON posts FOR EACH ROW EXECUTE PROCEDURE posts_trigger();

See Post.js for more details.

Then, to do a search, I execute a raw query and return it as a promise. The query looks like this:

SELECT id, 
    title, 
    ts_rank_cd(title_body_tsvector,$QUERY,1) AS rank, 
    ts_headline('english', title || ' ' || body,$QUERY, 'MaxWords=100') AS headline 
    FROM posts 
    WHERE published AND title_body_tsvector @@ $QUERY
    ORDER BY rank DESC, id DESC;

The Javascript looks like this:

function(db, tsquery) {                                
    var query = tsquery.indexOf(' ') == -1 ? "to_tsquery('english','" + tsquery + "')" : "plainto_tsquery('english','" + tsquery + "')";
    return db.sequelize.query("SELECT id, title, ts_rank_cd(title_body_tsvector," + query + ",1) AS rank, ts_headline('english', title || ' ' || body," + query + ", 'MaxWords=100') AS headline FROM posts WHERE published AND title_body_tsvector @@ " + query + " ORDER BY rank DESC, id DESC",
    { model: db.Post });
}

Note the ternary operator when defining query. PostgreSQL tsquerys allow searches of arbitrary complexity with nesting and various boolean operators. Unfortunately, most people will not have the syntax memorized, so plainto_tsquery lets one search with more natural syntax. If you get the syntax wrong (e.g., forget to close a parentheses), the error will be handled gracefully thanks to promises.

Try it and let me know what you think!