A Type-Safe Database Query DSL for Scala
December 21st, 2008 | In Scala, ScalaQueryWhen the topic of JDBC wrappers came up on the Scala mailing list a few weeks ago, I mentioned that…
I’d prefer a type-safe DSL for database queries but that’s more
complicated. I’m still experimenting with that, too![]()
I think those experiments worked rather well and the code is now in a state where the major features are usable and I can show some example code which compiles and runs successfully as a proof of concept. Many features (like more SQL operators, more data types and column aliasing) are missing but at least I am confident that it is possible to implement them on this basis.
Just like HaskellDB and LINQ to SQL, I am using a query monad which collects the projections and restrictions on database tables in a composable way to build an SQL statement for a query while producing the correct type for that statement’s result in Scala’s type system. Don’t worry if that sounds too abstract, there will be plenty of example code.
The basic scaffolding for the examples looks like this:
import java.lang.Integer
import com.novocode.squery._
import com.novocode.squery.Implicit._
import com.novocode.squery.session._
import com.novocode.squery.session.SessionFactory._
object SQuery2Test2 {
def main(args: Array[String]) {
val sf = new DriverManagerSessionFactory(
"org.h2.Driver", "jdbc:h2:mem:test1")
sf withSession {
// Database access code goes here
}
}
}
The session._ and SessionFactory._ imports provide a thin wrapper around JDBC connections with session management and query string caching. I am using the H2 Database Engine here because it is very easy to set up for a local in-memory database (just include a single JAR in the classpath and use the driver name and URL from above). SessionFactory contains an implicit session object that can be used in a withSession block. The squery package contains the actual classes and traits needed for the type-safe queries. I will explain the implicit conversions from Implicit._ as we go along.
For every table we want to use in a query, we need a corresponding Table object. Let’s define two tables for Users of some shopping application and the Orders they placed:
object Users extends Table[(Integer, String, String)]("users") {
def id = intColumn("id", O.AutoInc, O.NotNull)
def first = stringColumn("first")
def last = stringColumn("last")
def * = id ~ first ~ last
}
object Orders extends Table[(Integer, Integer, String)]("orders") {
def userID = intColumn("userID", O.NotNull)
def orderID = intColumn("orderID", O.AutoInc, O.NotNull)
def product = stringColumn("product")
def * = userID ~ orderID ~ product
}
Like every object which can be used in or returned from a query, a table is parameterized with the type of the values it represents. This is always a tuple of individual column types, in our case Integers and Strings (note the use of java.lang.Integer instead of Int; more about that later). In this respect SQuery (as I’ve named it for now) is closer to HaskellDB than to LINQ because Scala (like most languages) does not give you access to an expression’s AST at runtime. In LINQ you can write queries using the real types of the values and columns in your database and have the query expression’s AST translated to SQL at runtime. Without this option we have to use meta-objects like Table and Column to build our own AST from these.
The rest of the table definitions should be quite obvious if you’ve ever worked with an SQL database. Each table gets a name (”users” and “orders”) and the individual columns consist of a name, a type (depending on the method by which they are created, like intColumn and stringColumn) and options like AUTO_INCREMENT and NOT NULL.
Each table must define the special “*” column which is a projection of all of the table’s columns in their preferred order. The type of this projection is statically enforced to be the same as the table’s (i.e. a Table[T] contains a “*” method of type ConvertableColumn[T]. There are separate projection classes for different numbers of columns just like Scala has different tuple types. You can use the projection operator “~” to build projections incrementally:
trait SimpleColumn[T] extends ConvertableColumn[T] {
def ~[U](b: SimpleColumn[U]) = new Projection2[T, U](this, b)
}
sealed trait Projection[T <: Product]
extends ConvertableColumn[T] with Product { ... }
final class Projection2[T1,T2](
override val _1: SimpleColumn[T1],
override val _2: SimpleColumn[T2])
extends Tuple2(_1,_2) with Projection[(T1,T2)] {
def ~[U](c: SimpleColumn[U]) = new Projection3(_1,_2,c)
}
//... and so on with Projection3, Projection4, etc.
Before we get to the queries, we should define our tables in the database. In practice you will probably build the tables first in the database and then write the required meta-objects but it can also be done directly from the Table objects:
Users.createTable Orders.createTable
The createTable method is not defined on the tables but on the DDLInvoker class which can be instantiated automatically with an implicit conversion:
object Implicit {
implicit def tableToDDLInvoker[T](t: Table[T]): DDLInvoker[T] =
new DDLInvoker(t)
...
}
class DDLInvoker[T](table: Table[T]) {
lazy val createTableStatement =
new DDLBuilder(table).buildCreateTable
def createTable(implicit session: Session) = ...
}
createTable uses the implicit session I mentioned earlier. If you prefer to make it explicit you can leave out the SessionFactory._ import and use the alternate withSession method:
sf withSession { session =>
Users.createTable(session)
Orders.createTable(session)
}
All other database access methods work the same way. I will always use the implicit version in the examples from now on.
You can also print out the generated SQL statements:
> println(Users.createTableStatement) CREATE TABLE users (id INT NOT NULL AUTO_INCREMENT,first VARCHAR,last VARCHAR) > println(Orders.createTableStatement) CREATE TABLE orders (userID INT NOT NULL,orderID INT NOT NULL AUTO_INCREMENT,product VARCHAR)
Let’s insert some users into the database:
val ins1 = (Users.first ~ Users.last).insert("Homer", "Simpson")
val ins2 = (Users.first ~ Users.last).insertAll(
("Marge", "Simpson"),
("Apu", "Nahasapeemapetilon"),
("Carl", "Carlson"),
("Lenny", "Leonard") )
val ins3 = Users.first.insertAll(
"Santa's Little Helper",
"Snowball" )
println("Inserted "+(ins1+ins2+ins3)+" users")
You can use any single column or projection of columns to insert values of the corresponding type (which is a tuple for projections). All columns of a projection must be from the same table but this is not statically enforced.
Now that we have some content in the database, we can perform queries on it. This is about the simplest SQL query you can write:
SELECT * from users
You can do the same in Scala with a for-comprehension:
val q1 = for(u <- Users) yield u
This is equivalent to
val q1 = Users map { u => u }
There is no map method defined on tables but the table gets lifted automatically to a query with another implicit conversion:
object Implicit {
implicit def tableToQuery[T <: TableBase.T_](t: T) =
Query(t.withOp(new ColumnOp.BaseTableQueryOp(t)))
...
}
The mapping itself is just an identity mapping, so we might as well have written the query as tableToQuery(Users) to get a Query[Users] without any restrictions.
Note the use of t.withOp(...) in tableToQuery. What we want to get from tableToQuery is a node for our query AST which indicates a uniquely named table. This is needed for queries which use multiple instances of the same table, for example to get the latest version of every document in a table documents(id, version, content) you could use:
SELECT content FROM documents t1 WHERE t1.version = ( SELECT max(t2.version) FROM documents t2 WHERE t2.id = t1.id )
new BaseTableQueryOp(t) provides such a unique table instance but on the outside it doesn’t “look like” the table it wraps. We need an instance of the same type as the original table which is marked as being a BaseTableQueryOp but otherwise behaves like that table. This is what the trait WithOp provides in a rather un-functional and hackish way by cloning the receiver object and setting the clone’s op field to the given object (the actual AST node):
trait WithOp[O >: Null] extends Cloneable {
def withOp(op: O): this.type = {
val t = clone
t._op = op
t
}
private[WithOp] var _op: O = _
final def op: O = _op
override def clone(): this.type =
super.clone.asInstanceOf[this.type]
}
object WithOp {
def unapply[O >: Null](w: WithOp[O]) =
if(w.op == null) None else Some(w.op)
}
(Unlike this version the current implementation in SQuery is not generic because of bug #1434.)
We can print out the SQL statement for the query like this:
> println(q1.selectStatement) SELECT t1.id,t1.first,t1.last FROM users t1
This is just what you’d expect (taking into account that SQuery does not use “*” and aliases all tables). The query does not contain a selectStatement field but again there’s an implicit conversion to a QueryInvoker with that field:
object Implicit {
implicit def queryToQueryInvoker[T](q: Query[ConvertableColumn[T]]): QueryInvoker[T,T] = QueryInvoker(q)
...
}
class QueryInvoker[T,R] private (q: Query[ConvertableColumn[T]], mapper: T => R) {
lazy val selectStatement = new QueryBuilder(q).buildSelect
def first(implicit session: Session): Option[R] = ...
def list(implicit session: Session): List[R] = ...
def foreach(f: R => Unit)(implicit session: Session): Unit = ...
def elements(implicit session: Session): CloseableIterator[R] = ...
def mapResult[U](f: (R => U)) = new QueryInvoker[T,U](q, { v:T => f(mapper(v)) })
}
object QueryInvoker {
def apply[T](q: Query[ConvertableColumn[T]]) = new QueryInvoker[T,T](q, { v:T => v })
}
Note that you cannot convert just any Query[_], it has to be a Query[ConvertableColumn[_]]. This is the main reason for separating queries and invokers. The same implicit conversion allows you to use a foreach loop to iterate through all elements selected by a query:
> for(t <- q1) println("User tuple: "+t)
User tuple: (1,Homer,Simpson)
User tuple: (2,Marge,Simpson)
User tuple: (3,Apu,Nahasapeemapetilon)
User tuple: (4,Carl,Carlson)
User tuple: (5,Lenny,Leonard)
User tuple: (6,Santa's Little Helper,null)
User tuple: (7,Snowball,null)
The mapResult method can be used to apply a mapping function to every row which is read from the database. Unlike a projection in the query which gets translated to SQL and executed on the database server this mapping is applied on the client side. The Query class has two type parameters for the type returned by the database query and the mapped type. Let’s define a case class for User objects and use a mapped QueryInvoker to read a list of them from the database:
case class User(id: Integer, first: String, last: String)
val allUsers = q1.mapResult {
case (id,f,l) => User(id,f,l)
}.list
for(u <- allUsers) println("User object: "+u)
If function argument lists and tuples get unified in a future version of Scala, it should be possible to write that mapping simply as “q1.mapResult(User)“.
It is time we looked at a more complex query with a non-trivial projection and a restriction (a WHERE clause in SQL). The following code can be used to find Apu’s last name and ID:
val q2 = for(u <- Users where {_.first is "Apu" }) yield u.last ~ u.id
println("Apu's last name and ID are: " + q2.first)
Compare this with a similar snippet which selects the same values from a List[User] containing all data from the database:
val q2 = for(u <- users if u.first == "Apu") yield (u.last, u.id)
println("Apu's last name and ID are: " + q2.firstOption)
where is a method on Query, so the Users object gets lifted to a query of a new unique instance of the Users table and then where maps this table to a Column[Boolean] which is added to the query as a restriction. Note that you have to use “is” instead of “==” (and “isNot” instead of “!=”) for column expressions because the latter is defined on all objects and cannot be overwritten oder overloaded in a suitable way. The string “Apu” is lifted to a ConstColumn[String] by another implicit conversion:
implicit def stringToConstColumn(v: String) = new ConstColumn(v) with StringColumn
Instead of returning the complete Users tuple from the query we use the projection “yield u.last ~ u.id” to extract only the last name and the ID. The call to first consequently returns an Option[(String,Integer)]. The SQL statement for q2 is:
SELECT t1.last,t1.id FROM users t1 WHERE (t1.first='Apu')
At the moment all constants are inserted directly into the SQL statement. A production-grade library should of course use bind variables by default, with inlined values available as an option.
We need data in the Orders table for the next query, so let’s create some rows. We insert two random orders for every user except Apu and Snowball:
for(u <- allUsers
if u.first != "Apu" && u.first != "Snowball"; i <- 1 to 2)
Orders.insert(u.id, null, "Gizmo "+((Math.random*10)+1).toInt)
So far the queries have only used a single table instance. They can be written in terms of map and filter without making use of the monadic structure of the Query class. Adding a second table to the query gives us a natural way of expressing a join:
val q3 = for {
u <- Users
o <- Orders where { o => (u.id is o.userID) && (u.last isNot null) }
} yield (u.first ~ u.last ~ o.orderID ~ o.product).sortBy(u.first)
println("All Orders by Users with a last name by first name:")
q3.foreach(o => println(" "+o))
Note the use of the “&&” operator in the restriction. This is not the usual boolean “and” but an operator on two Column[Boolean] objects yielding another boolean column. Another new feature seen in this query is the sortBy method in the projection. The SQL statement for this query is:
SELECT t1.first,t1.last,t2.orderID,t2.product FROM users t1, orders t2 WHERE ((t1.id=t2.userID) and (t1.last is not null)) ORDER BY t1.first
Apart from joins, there are two more ways of combining queries: Unions and subqueries. Unions should map nicely to monadic plus operations with the empty query being the zero value but I haven’t tried that yet. Subqueries are used by constructing a special column from a query. This is already implemented as can be seen in the following example:
val q4 = for (
u <- Users;
o <- Orders
where { o => o.orderID is
queryToSubQuery(for {
o2 <- Orders where(o.userID is _.userID)
} yield o2.orderID.max) }
where { _.userID is u.id }
) yield u.first ~ o.orderID
println("Latest Order per User:")
q4.foreach(o => println(" "+o))
The queryToSubQuery function should ideally be implicit. It’s currently not because of bug #1579 which prevents all implicit conversions to various invoker classes from working if queryToSubquery is also implicit.
The subquery is used to find the maximum of a column for each separate value in another column. This common pattern can be extracted into a combinator:
def maxOfPer[T <: TableBase.T_]
(c: T, m: (T => Column[Integer]), p: (T => Column[_])) =
c where { o => m(o) is queryToSubQuery(
for { o2 <- c where( n => p(o) is p(n)) } yield m(o2).max
) }
We can now write the query with maxOfPer like this:
val q4b = for (
u <- Users;
o <- maxOfPer[Orders.type](Orders, _.orderID, _.userID)
where { _.userID is u.id }
) yield u.first ~ o.orderID
In either case, the generated SQL statement is:
SELECT t1.first,t2.orderID FROM orders t2, users t1 WHERE (t2.userID=t1.id) AND (t2.orderID=( SELECT max(t3.orderID) FROM orders t3 WHERE (t2.userID=t3.userID)) )
Another way to create a subquery implicitly is the in operator (and its negation notIn). To find all users which haven’t placed an order we can use:
val q5 = Users where { _.id notIn Orders.map(_.userID) }
println("Users without Orders:")
q5.foreach(o => println(" "+o))
Any query which returns a single table can be used to delete rows with a DeleteInvoker:
println("q5: " + q5.deleteStatement)
println("Deleting them...")
val deleted = q5.delete
println("Deleted "+deleted+" rows")
And finally, we can count the rows selected by that statement (which should of course be 0 after deleting them):
val q6 = q5.map(_.count)
println("q6: " + q6.selectStatement)
println("Users without Orders left: " + q6.first.get)
As I already mentioned at the beginning of this post (which is getting much longer than I anticipated), I am using java.lang.Integer instead of Int for the integer columns. The underlying problem is nullability. While you can specify which columns are created nullable in the database (by placing or omitting the NotNull option), they are all treated as nullable in SQuery and thus they cannot use value types like Int. Alternatively, you could take the same approach as HaskellDB and use value types (or references types which may never be null for types like String) for the non-nullable columns and an Option of the same type for their nullable counterparts. That works fine for all the features I’ve shown so far but it makes typechecking of outer joins difficult (or even impossible) because an outer join has to selectively promote non-nullable columns to nullable ones. I didn’t want to give up outer joins so easily, so I chose to make everything nullable for now.
There’s no public repository for SQuery yet but you can download the Eclipse project as a ZIP file. Just add the H2 JAR (not included) to the classpath to run the supplied example class test.SQuery2Test2 which contains all the queries from this post.
December 22nd, 2008 at 7:55 pm
Great DSL! Very useful!
January 2nd, 2009 at 4:29 am
It would awesome, if you can set a code repository somewhere (github??).
January 13th, 2009 at 8:58 pm
Hi — I second the vote for a public repo; I’m always reluctant to contribute to tarballs because I don’t know how out of date the source I’m working with is. Also, if it’s not too late to make a breaking change to the API, and assuming there’s not a regional spelling with which I’m unfamiliar, the word is “Convertible” not “-able”. Thanks much, I am enthusiastic about a typesafe query language.
January 13th, 2009 at 11:23 pm
I’m using Subversion at home and at work but I think a distributed system would be better suited for this. I’ve already installed git and looked into GitHub several weeks ago. As soon as I find the time to figure out how to use it, I’ll set up a public repo. I’ll also fix the spelling. I’m sure it won’t be the only breaking change.
February 7th, 2009 at 2:32 am
This looks super. Please consider this comment another friendly request for a public repository =)
February 9th, 2009 at 11:20 pm
[...] Wrapping JDBC (Note: referenced blog assumes fairly mature Scala knowledge.) [...]
February 21st, 2009 at 3:32 pm
Public repo now available. See next post for details.
June 10th, 2009 at 12:01 am
Why did you opt for a “where” method, instead of taking advantage of “if” though a filter method on query? If you overloaded filter to receive either a function into Boolean or a function into Column[Boolean] (if I understand how “where” works), then it could be used to work either as a filter over an iterator, or a query modifier.
June 10th, 2009 at 12:15 am
Another idea… if you could make a Scala code generator from DB schema, for the table objects, it would be pretty cool. One could get an existing database (or one generated by other tools) up and running that much quicker.
I imagine an interpreter session, where I have an sf session and just do:
sf withSession { generateScaffold (List(”Users”, “Orders”), “myTables.scala”) }
:load myTables.scala
sf withSession { … }
BTW, I forgot to remark on it, but I loved the idea of a type-safe database framework.
June 15th, 2009 at 9:00 pm
@Daniel: Initially I wanted to use filter() for conditions but I changed it to a separate where() method shortly before I posted this article. The filter(… => Boolean) method is needed if you want to write “Join(u,o) < - Users join Orders" instead of "uo <- Users join Orders; val Join(u,o) = uo" because the compiler creates a filter() call for such a pattern. If I added a filter(... => Column[Boolean]) method for conditions, an error like if(u.id != 42) instead of if(u.id isNot 42) would go unnoticed. Of course, currently I have neither of those methods. I’ll have to see which one to add back when unions and more useful joins are implemented.
And yes, generating the table classes from a DB schema would certainly be nice to have but it’s not one of my top priorities at the moment. Patches are welcome, if someone else wants to have a crack at it
September 14th, 2009 at 3:03 pm
@Daniel @Stefan Just came across this. I think using where is clearer than if – it implies that it will be part of the query and not a condition that is applied afterwards. It also leaves open the possibility of using if instead of where when you know that the you want to do the filtering client-side instead of server-side. Just my 2p worth
@Daniel This is awesome
I’ve been thinking that Scala must have some potential for doing really neat things to tidy up database access in a nicer way than Hibernate does
October 30th, 2009 at 9:43 am
[...] В Scala LINQ может быть сделано на основе плагина компилятора: http://www.sts.tu-harburg.de/people/mi.garcia/ScalaQL/ Подобный подход применим и к JPA. http://szeiger.de/blog/2008/12/21/a-type-safe-database-query-dsl-for-scala/#more-24 [...]
November 10th, 2009 at 9:37 pm
This is pretty cool. I’m surprised this didn’t get more attention/momentum into a complete solution. What’s missing?
November 10th, 2009 at 10:57 pm
@ctran: Scala 2.8 is missing. ScalaQuery has matured enough to for an official 1.0 release. I’m just waiting for Scala 2.8.0.
August 20th, 2010 at 8:16 am
[...] 用于JDBC 类型安全查询的一个Scala版LINQ,顺便再了解下DBC。 [...]
March 14th, 2011 at 3:51 pm
[...] > 用于JDBC类型安全查询的一个Scala版LINQ,顺便再了解下DBC。 [...]