SQL Joins

Persistent touts itself as a database-agnostic interface. How, then, are you supposed to do things which are inherently backend-specific? This most often comes up in Yesod when you want to join two tables together. There are some pure-Haskell solutions that are completely backend-agonistic, but there are also more efficient methods at our disposal. In this chapter, we’ll introduce a common problem you might want to solve, and then build up more sophisticated solutions.

Multi-author blog

Since blogs are a well understood problem domain, we’ll use that for our problem setup. Consider a blog engine that allows you to have multiple authors in the database, and each blog post will have a single author. In Persistent, we may model this as:

  1. Author
  2. name Text
  3. Blog
  4. author AuthorId
  5. title Text
  6. content Html

Let’s set up our initial Yesod application to show a blog post index indicating the blog title and the author:

  1. {-# LANGUAGE EmptyDataDecls #-}
  2. {-# LANGUAGE FlexibleContexts #-}
  3. {-# LANGUAGE GADTs #-}
  4. {-# LANGUAGE GeneralizedNewtypeDeriving #-}
  5. {-# LANGUAGE MultiParamTypeClasses #-}
  6. {-# LANGUAGE OverloadedStrings #-}
  7. {-# LANGUAGE QuasiQuotes #-}
  8. {-# LANGUAGE TemplateHaskell #-}
  9. {-# LANGUAGE TypeFamilies #-}
  10. {-# LANGUAGE ViewPatterns #-}
  11. import Control.Monad.Logger
  12. import Data.Text (Text)
  13. import Database.Persist.Sqlite
  14. import Yesod
  15. share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
  16. Author
  17. name Text
  18. Blog
  19. author AuthorId
  20. title Text
  21. content Html
  22. |]
  23. data App = App
  24. { persistConfig :: SqliteConf
  25. , connPool :: ConnectionPool
  26. }
  27. mkYesod "App" [parseRoutes|
  28. / HomeR GET
  29. /blog/#BlogId BlogR GET
  30. |]
  31. instance Yesod App
  32. instance YesodPersist App where
  33. type YesodPersistBackend App = SqlBackend
  34. runDB = defaultRunDB persistConfig connPool
  35. instance YesodPersistRunner App where
  36. getDBRunner = defaultGetDBRunner connPool
  37. getHomeR :: Handler Html
  38. getHomeR = do
  39. blogs <- runDB $ selectList [] []
  40. defaultLayout $ do
  41. setTitle "Blog posts"
  42. [whamlet|
  43. <ul>
  44. $forall Entity blogid blog <- blogs
  45. <li>
  46. <a href=@{BlogR blogid}>
  47. #{blogTitle blog} by #{show $ blogAuthor blog}
  48. |]
  49. getBlogR :: BlogId -> Handler Html
  50. getBlogR _ = error "Implementation left as exercise to reader"
  51. main :: IO ()
  52. main = do
  53. -- Use an in-memory database with 1 connection. Terrible for production,
  54. -- but useful for testing.
  55. let conf = SqliteConf ":memory:" 1
  56. pool <- createPoolConfig conf
  57. flip runSqlPersistMPool pool $ do
  58. runMigration migrateAll
  59. -- Fill in some testing data
  60. alice <- insert $ Author "Alice"
  61. bob <- insert $ Author "Bob"
  62. insert_ $ Blog alice "Alice's first post" "Hello World!"
  63. insert_ $ Blog bob "Bob's first post" "Hello World!!!"
  64. insert_ $ Blog alice "Alice's second post" "Goodbye World!"
  65. warp 3000 App
  66. { persistConfig = conf
  67. , connPool = pool
  68. }

That’s all well and good, but let’s look at the output:

Authors appear as numeric identifiers

SQL Joins - 图1

All we’re doing is displaying the numeric identifier of each author, instead of the author’s name. In order to fix this, we need to pull extra information from the Author table as well. Let’s dive in to getting that done.

Database queries in Widgets

I’ll address this one right off the bat, since it catches many users by surprise. You might think that you can solve this problem in the Hamlet template itself, e.g.:

  1. <ul>
  2. $forall Entity blogid blog <- blogs
  3. $with author <- runDB $ get404 $ blogAuthor
  4. <li>
  5. <a href=@{BlogR blogid}>
  6. #{blogTitle blog} by #{authorName author}

However, this isn’t allowed, because Hamlet will not allow you to run database actions inside of it. One of the goals of Shakespearean templates is to help you keep your pure and impure code separated, with the idea being that all impure code needs to stay in Haskell.

But we can actually tweak the above code to work in Yesod. The idea is to separate out the code for each blog entry into a Widget function, and then perform the database action in the Haskell portion of the function:

  1. getHomeR :: Handler Html
  2. getHomeR = do
  3. blogs <- runDB $ selectList [] []
  4. defaultLayout $ do
  5. setTitle "Blog posts"
  6. [whamlet|
  7. <ul>
  8. $forall blogEntity <- blogs
  9. ^{showBlogLink blogEntity}
  10. |]
  11. showBlogLink :: Entity Blog -> Widget
  12. showBlogLink (Entity blogid blog) = do
  13. author <- handlerToWidget $ runDB $ get404 $ blogAuthor blog
  14. [whamlet|
  15. <li>
  16. <a href=@{BlogR blogid}>
  17. #{blogTitle blog} by #{authorName author}
  18. |]

We need to use handlerToWidget to turn our Handler action into a Widget action, but otherwise the code is straightforward. And furthermore, we now get exactly the output we wanted:

Authors appear as names

SQL Joins - 图2

Joins

If we have the exact result we’re looking for, why isn’t this chapter over? The problem is that this technique is highly inefficient. We’re performing one database query to load up all of the blog posts, then a separate query for each blog post to get the author names. This is far less efficient than simply using a SQL join. The question is: how do we do a join in Persistent? We’ll start off by writing some raw SQL:

  1. getHomeR :: Handler Html
  2. getHomeR = do
  3. blogs <- runDB $ rawSql
  4. "SELECT ??, ?? \
  5. \FROM blog INNER JOIN author \
  6. \ON blog.author=author.id"
  7. []
  8. defaultLayout $ do
  9. setTitle "Blog posts"
  10. [whamlet|
  11. <ul>
  12. $forall (Entity blogid blog, Entity _ author) <- blogs
  13. <li>
  14. <a href=@{BlogR blogid}>
  15. #{blogTitle blog} by #{authorName author}
  16. |]

We pass the rawSql function two parameters: a SQL query, and a list of additional parameters to replace placeholders in the query. That list is empty, since we’re not using any placeholders. However, note that we’re using ?? in our SELECT statement. This is a form of type inspection: rawSql will detect the type of entities being demanded, and automatically fill in the fields that are necessary to make the query.

rawSql is certainly powerful, but it’s also unsafe. There’s no syntax checking on your SQL query string, so you can get runtime errors. Also, it’s easy to end up querying for the wrong type and end up with very confusing runtime error messages.

Esqueleto

Currently, esqueleto is not included in a recent LTS Haskell, and therefore using it may require some extra work, not covered by the book.

Persistent has a companion library- Esqueleto- which provides an expressive, type safe DSL for writing SQL queries. It takes advantage of the Persistent types to ensure it generates valid SQL queries and produces the results requested by the program. In order to use Esqueleto, we’re going to add some imports:

  1. import qualified Database.Esqueleto as E
  2. import Database.Esqueleto ((^.))

And then write our query using Esqueleto:

  1. getHomeR :: Handler Html
  2. getHomeR = do
  3. blogs <- runDB
  4. $ E.select
  5. $ E.from $ \(blog `E.InnerJoin` author) -> do
  6. E.on $ blog ^. BlogAuthor E.==. author ^. AuthorId
  7. return
  8. ( blog ^. BlogId
  9. , blog ^. BlogTitle
  10. , author ^. AuthorName
  11. )
  12. defaultLayout $ do
  13. setTitle "Blog posts"
  14. [whamlet|
  15. <ul>
  16. $forall (E.Value blogid, E.Value title, E.Value name) <- blogs
  17. <li>
  18. <a href=@{BlogR blogid}>#{title} by #{name}
  19. |]

Notice how similar the query looks to the SQL we wrote previously. One thing of particular interest is the ^. operator, which is a projection. blog ^. BlogAuthor, for example, means “take the author column of the blog table.” And thanks to the type safety of Esqueleto, you could never accidentally project AuthorName from blog: the type system will stop you!

In addition to safety, there’s also a performance advantage to Esqueleto. Notice the returned tuple; it explicitly lists the three columns that we need to generate our listing. This can provide a huge performance boost: unlike all other examples we’ve had, this one does not require transferring the (potentially quite large) content column of the blog post to generate the listing.

For the record, it’s possible to achieve this with rawSql as well, it’s just a bit trickier.

Esqueleto is really the gold standard in writing SQL queries in Persistent. The rule of thumb should be: if you’re doing something that fits naturally into Persistent’s query syntax, use Persistent, as it’s database agnostic and a bit easier to use. But if you’re doing something that would be more efficient with a SQL-specific feature, you should strongly consider Esqueleto.

Streaming

There’s still a problem with our Esqueleto approach. If there are thousands of blog posts, then the workflow will be:

  1. Read thousands of blog posts into memory on the server.

  2. Render out the entire HTML page.

  3. Send the HTML page to the client.

This has two downsides: it uses a lot of memory, and it gives high latency for the user. If this is a bad approach, why does Yesod gear you towards it out of the box, instead of tending towards a streaming approach? Two reasons:

  • Correctness: imagine if there was an error reading the 243rd record from the database. By doing a non-streaming response, Yesod can catch the exception and send a meaningful 500 error response. If we were already streaming, the streaming body would simply stop in the middle of a misleading 200 OK respond.

  • Ease of use: it’s usually easier to work with non-streaming bodies.

The standard recommendation I’d give someone who wants to generate listings that may be large is to use pagination. This allows you to do less work on the server, write simple code, get the correctness guarantees Yesod provides out of the box, and reduce user latency. However, there are times when you’ll really want to do a streaming response, so let’s cover that here.

Switching Esqueleto to a streaming response is easy: replace select with selectSource. The Esqueleto query itself remains unchanged. Then we’ll use the respondSourceDB function to generate a streaming database response, and manually construct our HTML to wrap up the listing.

  1. getHomeR :: Handler TypedContent
  2. getHomeR = do
  3. let blogsSrc =
  4. E.selectSource
  5. $ E.from $ \(blog `E.InnerJoin` author) -> do
  6. E.on $ blog ^. BlogAuthor E.==. author ^. AuthorId
  7. return
  8. ( blog ^. BlogId
  9. , blog ^. BlogTitle
  10. , author ^. AuthorName
  11. )
  12. render <- getUrlRenderParams
  13. respondSourceDB typeHtml $ do
  14. sendChunkText "<html><head><title>Blog posts</title></head><body><ul>"
  15. blogsSrc $= CL.map (\(E.Value blogid, E.Value title, E.Value name) ->
  16. toFlushBuilder $
  17. [hamlet|
  18. <li>
  19. <a href=@{BlogR blogid}>#{title} by #{name}
  20. |] render
  21. )
  22. sendChunkText "</ul></body></html>"

Notice the usage of sendChunkText, which sends some raw Text values over the network. We then take each of our blog tuples and use conduit’s map function to create a streaming value. We use hamlet to get templating, and then pass in our render function to convert the type-safe URLs into their textual versions. Finally, toFlushBuilder converts our Html value into a Flush Builder value, as needed by Yesod’s streaming framework.

Unfortunately, we’re no longer able to take advantage of Hamlet to do our overall page layout, since we need to explicit generate start and end tags separately. This introduces another point for possible bugs, if we accidentally create unbalanced tags. We also lose the ability to use defaultLayout, for exactly the same reason.

Streaming HTML responses are a powerful tool, and are sometimes necessary. But generally speaking, I’d recommend sticking to safer options.

Conclusion

This chapter covered a number of ways of doing a SQL join:

  • Avoid the join entirely, and manually grab the associated data in Haskell. This is also known as an application level join.

  • Write the SQL explicitly with rawSql. While somewhat convenient, this loses a lot of Persistent’s type safety.

  • Use Esqueleto’s DSL functionality to create a type-safe SQL query.

  • And if you need it, you can even generate a streaming response from Esqueleto.

For completeness, here’s the entire body of the final, streaming example:

  1. {-# LANGUAGE EmptyDataDecls #-}
  2. {-# LANGUAGE FlexibleContexts #-}
  3. {-# LANGUAGE GADTs #-}
  4. {-# LANGUAGE GeneralizedNewtypeDeriving #-}
  5. {-# LANGUAGE MultiParamTypeClasses #-}
  6. {-# LANGUAGE OverloadedStrings #-}
  7. {-# LANGUAGE QuasiQuotes #-}
  8. {-# LANGUAGE TemplateHaskell #-}
  9. {-# LANGUAGE TypeFamilies #-}
  10. {-# LANGUAGE ViewPatterns #-}
  11. import Control.Monad.Logger
  12. import Data.Text (Text)
  13. import qualified Database.Esqueleto as E
  14. import Database.Esqueleto ((^.))
  15. import Database.Persist.Sqlite
  16. import Yesod
  17. import qualified Data.Conduit.List as CL
  18. import Data.Conduit (($=))
  19. share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
  20. Author
  21. name Text
  22. Blog
  23. author AuthorId
  24. title Text
  25. content Html
  26. |]
  27. data App = App
  28. { persistConfig :: SqliteConf
  29. , connPool :: ConnectionPool
  30. }
  31. mkYesod "App" [parseRoutes|
  32. / HomeR GET
  33. /blog/#BlogId BlogR GET
  34. |]
  35. instance Yesod App
  36. instance YesodPersist App where
  37. type YesodPersistBackend App = SqlBackend
  38. runDB = defaultRunDB persistConfig connPool
  39. instance YesodPersistRunner App where
  40. getDBRunner = defaultGetDBRunner connPool
  41. getHomeR :: Handler TypedContent
  42. getHomeR = do
  43. let blogsSrc =
  44. E.selectSource
  45. $ E.from $ \(blog `E.InnerJoin` author) -> do
  46. E.on $ blog ^. BlogAuthor E.==. author ^. AuthorId
  47. return
  48. ( blog ^. BlogId
  49. , blog ^. BlogTitle
  50. , author ^. AuthorName
  51. )
  52. render <- getUrlRenderParams
  53. respondSourceDB typeHtml $ do
  54. sendChunkText "<html><head><title>Blog posts</title></head><body><ul>"
  55. blogsSrc $= CL.map (\(E.Value blogid, E.Value title, E.Value name) ->
  56. toFlushBuilder $
  57. [hamlet|
  58. <li>
  59. <a href=@{BlogR blogid}>#{title} by #{name}
  60. |] render
  61. )
  62. sendChunkText "</ul></body></html>"
  63. getBlogR :: BlogId -> Handler Html
  64. getBlogR _ = error "Implementation left as exercise to reader"
  65. main :: IO ()
  66. main = do
  67. -- Use an in-memory database with 1 connection. Terrible for production,
  68. -- but useful for testing.
  69. let conf = SqliteConf ":memory:" 1
  70. pool <- createPoolConfig conf
  71. flip runSqlPersistMPool pool $ do
  72. runMigration migrateAll
  73. -- Fill in some testing data
  74. alice <- insert $ Author "Alice"
  75. bob <- insert $ Author "Bob"
  76. insert_ $ Blog alice "Alice's first post" "Hello World!"
  77. insert_ $ Blog bob "Bob's first post" "Hello World!!!"
  78. insert_ $ Blog alice "Alice's second post" "Goodbye World!"
  79. warp 3000 App
  80. { persistConfig = conf
  81. , connPool = pool
  82. }