Archive for the ‘ScalaQuery’ Category

Type-Level Computations with Functional Dependencies

Sunday, August 14th, 2011

It has been a long time since my last blog post about ScalaQuery, and in the meantime I have announced new features in my Twitter stream and in the release notes but now I need some more space to explain a refactoring I have been working on which should eventually lead to ScalaQuery 0.10.

Recap

In ScalaQuery 0.9, there is a simple relationship between the types of queries and the types of values they return: A Query[ColumnBase[T]] selects values of type T, so e.g. calling .list on such a query gives you a List[T]. You can create Query objects of types other than ColumnBase[T] forSome { type T } but the implicit conversion to a StatementInvoker (which allows you to call methods such as .list) is not available for them.

Let’s look at the different subtypes of ColumnBase that we can use in queries. First of all, all tables extend ColumnBase, so that Table[T] <: ColumnBase[T].

  object MyTable extends Table[(Int, String, String)]("MY_TABLE") {
    def a = column[Int]("A")
    def b = column[String]("B")
    def c = column[String]("C")
    def * = a ~ b ~ c
  }

  val q1 = Query(MyTable)
  // q1: Query[MyTable]
  val res1 = q1.list
  // res1: List[(Int, String, String)]

And, as implied by the name ColumnBase, Column[T] <: ColumnBase[T], so we can query for individual columns:

  val q2 = for {
    t <- MyTable // automatically lifted to a Query
  } yield t.a
  // q2: Query[Column[Int]]
  val res2 = q2.list
  // res2: List[Int]

So what about selecting multiple columns? The most natural notation would be the same as when you’re querying a Scala collection instead of a database:

  val q3 = for {
    t <- MyTable
  } yield (t.a, t.b)
  // q3: Query[(Column[Int], Column[String])]

But this won’t work. We get a Query[(ColumnBase[A], ColumnBase[B])] but what we need is a Query[ColumnBase[(A, B)]] — wrapped in the opposite order! This ColumnBase[Tuple2[A,B]] is called a Projection2[A,B] in ScalaQuery (actually, a Projection2[A,B] is both a ColumnBase[(A,B)] and a (Column[A], Column[B])). You construct it with the ~ operator:

  val q3b = for {
    t <- MyTable
  } yield t.a ~ t.b
  // q3b: Query[Projection2[Int, String]]
  val res3 = q3b.list
  // res3: List[(Int, String)]

Type-Level Transformation

For the sake of simplicity, instead of the implicit conversions and Invokers we will assume a method list() which takes a Query and returns a list of the results. At the moment, it would look like this:

  def list[T](q: Query[ColumnBase[T]]): List[T]

We need to generalize it to:

  def list[T, U](q: Query[T]): List[U]

where U is a transformed version of T, which we will write as T =>> U. So what kinds of transformations do we need? The existing functionality is covered by:

  ColumnBase[T] =>> T

For tuples, as used in q3, we need another rule:

  (T1, T2) =>> (U1, U2) where T1 =>> U1 and T2 =>> U2

Actually, we need one of those for all arities from 2 to 22:

  (T1, T2, T3) =>> (U1, U2, U3) where T1 =>> U1 and T2 =>> U2 and T3 =>> U3
  ...and so on

Note that these rules are recursive. They enable the transformation of arbitrarily nested tuples.

Just one more rule before we’re done: ScalaQuery’s Projection classes can only wrap Columns, so any primitive value is lifted implicitly:

  t.a ~ t.b ~ 42
  // : Projection3[Int, String, Int]
  //   == (Column[Int], Column[String], Column[Int])

If we allow the use of Tuples instead of Projections, we do not get this implicit lifting anymore, so we need to allow primitive values, too:

  T =>> T where a TypeMapper[T] is available

We can use functional dependencies to have the Scala compiler do the required transformation by turning =>> into a trait, every “where” condition into an implicit parameter, and every rule into an implicit method:

  trait =>> [-T, U]

  implicit def unpackColumnBase[T]: ColumnBase[T] =>> T = null
  implicit def unpackTuple2[T1, T2, U2, U2](implicit u1: T1 =>> U1, u2: T2 =>> U2): (T1, T2) =>> (U1, U2) = null
  ...
  implicit def unpack[T : TypeMapper]: T =>> T = null

In order to avoid ambiguities, we need to prioritize the implicits. We also add a nice error message in case a bad query type is used:

  @implicitNotFound(msg = "Don't know how to unpack ${T} (to ${U})")
  trait =>> [-T, U]

  object =>> extends LowPriority_=>> {
    implicit def unpackColumnBase[T]: ColumnBase[T] =>> T = null
  }

  trait LowPriority_=>> {
    implicit def unpackTuple2[T1, T2, U2, U2](implicit u1: T1 =>> U1, u2: T2 =>> U2): (T1, T2) =>> (U1, U2) = null
    ...
    implicit def unpack[T : TypeMapper]: T =>> T = null
  }

And finally, the list() method gets an implicit parameter to compute the U type.

  def list[T, U](q: Query[T])(implicit u: T =>> U): List[U]

That’s it, basically. We can now call list() with a Query of any nesting of tuples and columns and it will remove all ColumnBase type constructors for the result type. We still need to extract a value of the right type from the query result at run-time. That can be done with a method in an implementation of =>> in a straight-forward way. For the purpose of computing the type at compile-time, the implementation does not matter, so we just used null above.

Another Transformation

There’s still a catch though. Take a look at the following queries:

  val q4a = for {
    a ~ b ~ c <- MyTable.where(_.a === 1).map(t => t.a ~ t.b ~ 4) unionAll
                 MyTable.where(_.a === 2).map(t => t.a ~ t.b ~ 5)
  } yield a ~ b ~ (c*2)

  val q4b = for {
    (a, b, c) <- MyTable.where(_.a === 1).map(t => (t.a, t.b, 4)) unionAll
                 MyTable.where(_.a === 2).map(t => (t.a, t.b, 5))
  } yield a ~ b ~ (c*2)

A unionAll operation takes multiple Queries of the same type and produces another Query of that type which represents their union. The values that come out of this operation need to encode the union as a ScalaQuery AST node. In q4a (old style), this value has the type Projection3[Int, String, Int] which is (Column[Int], Column[String], Column[Int]). The literal values 4 and 5 have been lifted to ConstColumns and the union operation can be properly encoded into the three resulting Columns.

In q4b (new style), the unionAll produces a (Column[String], Column[Int], Int) tuple and we have no way of encoding the union operation into the Int value! To solve this problem, union operations need to return a different type than that of their operands: One where all primitive values have been lifted to Columns, i.e. the opposite of the =>> transformation:

  @implicitNotFound(msg = "Don't know how to reify ${Packed} (to ${Reified})")
  sealed trait Reify[-Packed, Reified]

  object Reify extends ReifyLowPriority {
    implicit final def reifyColumnBase[T <: ColumnBase[_]]: Reify[T, T] = null
  }

  trait ReifyLowPriority {
    implicit final def reifyPrimitive[T](implicit tm: TypeMapper[T]): Reify[T, ConstColumn[T]] = null
    implicit final def reifyTuple2[T1,T2, U1,U2, R1,R2](implicit u1: Reify[T1, R1], u2: Reify[T2, R2]): Reify[(T1,T2), (R1,R2)] = null
    ...
  }

Limitations

The actual reification at run-time is done in =>>, so it is essential that the =>> and Reify implementation match. It would be nicer to encode the reification directly in =>> but this fails due to a limitation in Scala’s type inferencer. Compare the unpacking and reification for ColumnBase:

  implicit def unpackColumnBase[T]: ColumnBase[T] =>> T
  implicit final def reifyColumnBase[T <: ColumnBase[_]]: Reify[T, T]

In both cases, we get a type like A <: ColumnBase[B] but one operation has to capture the A and the other one the B (we need the actual sub-type of ColumnBase for the reification to preserve table types). There is no way (that I know of) of doing both together. Exposing the type parameter through a type alias in ColumnBase does not help. It allows us to write the unpacking in the desired style but T#Alias is always inferred as Any:

  implicit def unpackColumnBase[T <: ColumnBase[_]]: T =>> T#Alias

A second problem is caused by Scala issue SI-3346: Fundep implicits do not work on implicit conversions. I have been able to work around this in some cases but in others the situation looks hopeless. As long is this issue is not resolved, some extra methods calls for explicit unpacking might be needed.

You can find the work in progress in ScalaQuery’s nested-tuples-query2 branch.

New ScalaQuery Features

Monday, December 21st, 2009

It’s been almost 3 months since my last summary of new features in ScalaQuery. I implemented some important changes in the following weeks but I only had little time to work on ScalaQuery after my extended one-month vacation ended in October.

Build system

Builds against newer versions of Scala 2.8 have been going well thanks to sbt 0.6 which separates the Scala version used by the build system from the version used by the project being built. You need to use at least version 0.6.7 of the new xsbt launcher to build ScalaQuery. Other dependencies are downloaded automatically by sbt. I wrote an implementation of sbt’s test interface for JUnit which you can find here (binaries are here on scala-tools.org). It allows you to run ScalaQuery’s JUnit test cases from sbt with the test command.

All published artifacts now contain the Scala version in the artifact ID. You can find the latest ScalaQuery snapshot in scala-tools.org’s snapshots repository as:

groupId:
com.novocode
artifactId:
scala-query_2.8.0.Beta1-RC4
version:
1.0.0-SNAPSHOT

JDBC escape syntax

ScalaQuery now uses JDBC escape syntax and scalar functions where possible for better portability between DB engines:

  • String column concatenation (with the ++ operator) and the startsWith and endsWith methods on String columns have been moved up to BasicProfile.
  • Escape characters for LIKE expressions are added to the queries.
  • You can add other functions to be called with the {fn ...} syntax through SimpleScalarFunction() and SimpleScalarFunction.nullary().
  • Various scalar functions are supported: CONVERT, USER, DATABASE, CURDATE, CURTIME, PI, MOD, ABS, CEILING, FLOOR, SIGN, DEGREES, RADIANS, IFNULL, UCASE, LCASE, LTRIM, RTRIM.
  • Literals for Date, Time and Timestamp values use the JDBC escape syntax.
  • Explicit inner, left outer, right outer and full outer joins are supported. For example:
  val q = for {
    Join(c,p) <- Categories innerJoin Posts on (_.id is _.category)
    _ <- Query orderBy p.id
  } yield p.id ~ c.id ~ c.name ~ p.title

Complex inserts

Queries and columns can now be inserted into tables (using SQL’s INSERT...SELECT syntax). This allows you to use scalar functions when inserting individual records and to copy data produced by a query. For example:

  val q2 = for(s <- Src1 if s.id <= 2) yield s
  Dst2.insert(q2)

Various changes

  • String columns are created as VARCHAR(254) by default (except in H2 where no size is needed).
  • The operators === and != can be used instead of is and isNot. They have the proper precedence when used with other operators like && and ||.

A ScalaQuery Update

Sunday, September 27th, 2009

Since my last post about ScalaQuery one month ago I have wasted quite some time trying to implement the OptionMapper type (which is used to make operations on columns interoperable between base types and Option types) in a way which does not require 2^n implicit functions and 2*n+2 type parameters for operations with n operands. The composable OptionMapper itself is not hard to write (see NewOptionMapper here) but it requires more type fidelity than ScalaQuery offers at the moment: Either Projections need to preserve the subtypes of Columns they are storing, or the required types have to be reconstructed from implicit values where they are needed. You can find my failed attempts in the projection-types and new-option-mapper branches. The first one might actually work once Scala bug #2346 is resolved. The second one would require a much more powerful type inferencer. For now, I just added an OptionMapper3 for 3 parameters (which is needed by the BETWEEN operator).

There are also some interesting new features in ScalaQuery:

Mapped Projections

You can now create a bidirectional mapping of a Projection with the <> operator. This is especially useful for a table’s “*” projection. The <> operator takes a function from the projection tuple type T (either as a tuple or as multiple parameters) to the mapped type R, and a function from R back to Some[T]. The asymmetry with the unnecessary Some wrapper is deliberate, matching exactly the apply und unapply methods of a case class. For example, here is a simple User class and table:

  case class User(first: String, last: String)

  object Users extends Table[User]("users") {
    def first = column[String]("first", O NotNull)
    def last = column[String]("last", O NotNull)
    def * = first ~ last <> (User, User.unapply _)
  }

Whenever you perform an insert, update or query directly on such a table (instead of a projection of individual columns), you can use the mapped type:

  Users.insert(User("Stefan", "Zeiger"))
  val someUsers: List[User] = Users.where(_.first startsWith "S").list

Finders

The new convenience method Table.createFinderBy allows you to create a simple finder query template which selects values from a table by matching on a single column, e.g.:

  val findUserByFirstName = Users.createFinderBy(_.first)

This is equivalent to:

  val findUserByFirstName = for {
    first <- Parameters[String]
    u <- Users if u.first is first
  } yield u

Updatable ResultSets

JDBC allows you to modify a ResultSet which was created with the CONCUR_UPDATABLE result set concurrency. ScalaQuery now offers a high-level interface to this functionality with the mutate method, e.g.:

  val q1 = for(u <- Users if u.last.is("Simpson") || u.last.is("Bouvier")) yield u
  q1.mutate { m =>
    if(m()._3 == "Bouvier") m() = m().copy(_3 = "Simpson")
    else if(m()._2 == "Homer") m.delete()
    else if(m()._2 == "Bart") m.insert((None, "Lisa", "Simpson"))
  }

Statement Options

The new ResultSetType, ResultSetConcurrency and ResultSetHoldability classes allow you to request the corresponding JDBC options when ScalaQuery creates a PreparedStatement, e.g.:

  myDB withSession {
    ResultSetType.ScrollInsensitive {
      // All database calls in this block use TYPE_SCROLL_INSENSITIVE
    }
  }

Or if you don't want to use the implicit threadLocalSession:

  myDB withSession { s1 =>
    ResultSetType.ScrollInsensitive(s1) { s2 =>
      // All database calls on s2 use TYPE_SCROLL_INSENSITIVE
    }
  }

The default for all three options is Auto which gives you values suitable for the invoker method you are calling (e.g. ResultSetConcurrency.Updatable for mutate and ResultSetConcurrency.ReadOnly for all other methods).

Build System

I have rewritten most of the test classes as proper unit tests (using JUnit 4) with assertions. I'd like to investigate ScalaTest and specs further (possibly together with ScalaCheck) in the future to replace JUnit but the binary incompatibilities between different Scala 2.8 snapshot builds would only complicate the build process right now. If you build with sbt, you still need to download a suitable Scala 2.8 snapshot yourself and set the paths in the properties because the build is done in forking mode. This also means that you cannot use the "test" command to run the unit tests from sbt at the moment.

H2 and JUnit are declared as test dependencies in the sbt project definition and the Eclipse build path refers to the local copies in lib_managed. If you want to build ScalaQuery with Eclipse (as I usually do), just run "sbt update" once to pull in the required JARs.

I am using FreeMarker templates and the FMPP tool to create repetitive code for the Projection and Parameters classes. This is not yet integrated into the sbt build process. You can use the supplied Eclipse launcher instead (after downloading FMPP and pointing the launcher to the correct path). The generated sources are checked in, so you only need to do this if you want to change the templates and rebuild the Scala sources.

Profiles, Drivers & More

Thursday, August 27th, 2009

It’s been three weeks already since my last post about ScalaQuery so I’d like to provide an update on some recent changes.

Profiles & Drivers

The biggest news is that ScalaQuery now supports database engine-specific features. Feature sets can be bundled in profiles which are then implemented by drivers (Of course, a driver can also add driver-specific features which are not part of a profile). All previously existing features are now part of the BasicProfile which should work on all SQL databases with little or no customization. The BasicProfile provides:

  • An object called Implicit with all implicit conversions which are required to use the profile’s features (including an implicit value of type BasicProfile which points to the driver implementing the profile).
  • Some methods for creating various statements and query templates. They have default implementations provided by the profile but can be overridden by other profiles or drivers. You do not usually call them directly.
  • A number of TypeMapperDelegate objects for the basic types supported by the profile. You still use TypeMappers defined outside the profile but they do not implement the actual type mapping any more. Instead they ask the profile for a delegate implementation. (Alternatively, I could have moved the existing TypeMapper objects entirely into the profile but mappers for some basic types like Int and Boolean are used internally at many places. They would need to be threaded through several methods and constructors, thus complicating the implementation unnecessarily.)

The BasicProfile is implemented by the BasicDriver.

There is also a new ExtendedProfile for features which are expected to be available in every commonly used SQL database system, albeit with a non-standard syntax. It currently provides string concatenation with the ++ operator (plus startsWith and endsWith methods which can be implemented with LIKE and string concatenation) and the take and drop methods needed for pagination.

If you only need to support a single database engine in your application, you can forget about profiles and just import a specific driver’s implicit conversions. Supporting multiple drivers is almost as simple: Give your DAO class (or whatever you have in your application design) a parameter of the required profile type and import this profile’s implicits. You can then call it with any driver which implements the profile. For example:

  def test(profile: ExtendedProfile) {
    import profile.Implicit._
    println("Using driver: "+profile.getClass.getName)
    val q1 = Users.where(_.name startsWith "quote ' and backslash \\").take(5)
    println(q1.selectStatement)
    val q2 = Users.where(_.name startsWith "St".bind).drop(10).take(5)
    println(q2.selectStatement)
    val q3 = Query(42 ~ "foo")
    println(q3.selectStatement)
    println
  }

  def main(args: Array[String]) {
    test(H2Driver)
    test(OracleDriver)
    test(MySQLDriver)
  }

This prints the different statements required for H2, Oracle and MySQL:

Using driver: com.novocode.squery.combinator.extended.H2Driver$
SELECT t1.name FROM users t1 WHERE (t1.name like 'quote '' and backslash \%') LIMIT 5
SELECT t1.name FROM users t1 WHERE (t1.name like (? || '%')) LIMIT 5 OFFSET 10
SELECT 42,'foo'

Using driver: com.novocode.squery.combinator.extended.OracleDriver$
SELECT * FROM (SELECT t1.name FROM users t1 WHERE (t1.name like 'quote '' and backslash \%')) WHERE ROWNUM <= 5
SELECT * FROM (SELECT t0.*, ROWNUM ROWNUM_O FROM (t1.name,ROWNUM ROWNUM_I FROM users t1 WHERE (t1.name like (? || '%'))) t0) WHERE ROWNUM_O BETWEEN (1+10) AND (10+5) ORDER BY ROWNUM_I
SELECT 42,'foo' FROM DUAL

Using driver: com.novocode.squery.combinator.extended.MySQLDriver$
SELECT t1.name FROM users t1 WHERE (t1.name like 'quote \' and backslash \\%') LIMIT 5
SELECT t1.name FROM users t1 WHERE (t1.name like concat(?,'%')) LIMIT 10,5
SELECT 42,'foo' FROM DUAL

Updates

You can now take a simple query (returning only named columns from a single table; no modifiers except WHERE restrictions) and call it as an UPDATE statement:

  val q = for(u <- Users if u.id is 42) yield u.first ~ u.last
  q.update("foo", "bar")

Filtering

You may already have noticed in my previous post that the Query class now has a filter method which works like where for functions returning a Column[Boolean] or Column[Option[Boolean]], so you can use Scala’s standard if clauses in for-comprehensions on queries. Unlike where, filter also accepts functions returning a plain Boolean value to enable the use of refutable patterns in queries (e.g. when destructuring a Join).

Invokers

Result type remapping and parameter application are now available for all Invokers. These features were pushed down from the statement invokers for monadic queries into the invoker framework. Query templates are parameterized invokers now. They can be applied like any other invoker.

Efficient Parameterized Queries in ScalaQuery

Thursday, August 6th, 2009

One question about ScalaQuery which keeps coming up is that of perfomance. The question is usually in the form “How does it compare to JDBC?” but that’s like comparing apples and, well, apple trees. After all, ScalaQuery is a layer on top of JDBC which provides mainly two things:

  • A nicer, more Scala-like way of handling database connections, performing queries and reading result sets. This is not optional when you access a database in your application. You will wrap SQL statement execution and result set reading in some way or another to abstract from the low-level JDBC API. Anyway, the overhead here is quite low.
  • A way of composing queries with an internal DSL based on a query monad and combinators. That’s the part I want to talk about in this post.

If you want to optimize the query generation away, you can always fall back to the StaticQuery and DynamicQuery classes. These work a bit like iBATIS, except your SQL code is embedded directly in your Scala code and not in some XML files. But you’re still writing SQL! That’s nice for the special cases which are not covered by the combinator queries and which do not need to be composable but it’s probably not the reason why you want to use ScalaQuery in the first place.

When you’re constructing a query in the query monad, you normally have some variables which are used in the query like this:

def userNameByID(id: Int) =
  for(u <- Users if u.id is id) yield u.first

This would insert the user ID directly into the SQL statement, thus requiring a new statement to be generated by ScalaQuery and parsed and optimized by the database server (which can be very expensive) for every invocation of the query. We can improve this by using a bind variable for the user ID:

def userNameByID(id: Int) =
  for(u <- Users if u.id is id.bind) yield u.first

Now the generated SQL statement is always the same (e.g. “SELECT t1.first FROM users t1 WHERE (t1.id=?)“), so the database server needs to calculate an execution plan for it only once and can reuse it on subsequent invocations. But the query is still constructed as a tree of dozens of objects and compiled to the same SQL statement every time you call userNameByID.

This can be remedied with query templates, a recent addition to ScalaQuery:

val userNameByID = for {
  id <- Parameters[Int]
  u <- Users if u.id is id
} yield u.first

If you recall how for-comprehensions are desugared, this gets translated to Parameters[Int].apply(...).flatMap(id => ...). The apply method on the Parameters object takes an implicit TypeMapper for each parameter type you specify and creates a Parameters instance. The flatMap method on Parameters takes a function which creates a Query and returns a QueryTemplate for it:

final class QueryTemplate[P, R](query: Query[ColumnBase[R]]) {
  def apply(param: P) = new AppliedQueryTemplate(built, param, query.value)
  lazy val built = QueryBuilder.buildSelect(query, NamingContext())
}

final class Parameters[P, C](c: C) {
  def flatMap[F](f: C => Query[ColumnBase[F]]): QueryTemplate[P, F] =
    new QueryTemplate[P, F](f(c))
  ...
}

object Parameters {
  def apply[P1](implicit tm1: TypeMapper[P1]) =
    new Parameters[P1, Column[P1]](new ParameterColumn(-1, tm1))
  ...
}

Note that, unlike Query, neither Parameters nor QueryTemplate is a monad. You cannot compose multiple parameter lists or query templates this way but by providing a suitable flatMap method, the parameters can be used as the first generator in a for-comprehension which otherwise operates in the Query monad.

Either one of the userNameByID functions/methods defined above can be used in the same way:

for(t <- userNameByID(3)) println(t)

When you apply the parameters to a QueryTemplate, you get an AppliedQueryTemplate which can be lifted to a suitable Invoker by an implicit conversion (just like a Query). The first time you do this, the SQL code gets generated and then cached in the QueryTemplate for further applications.

If you specify more than one type parameter, the Parameters generator gives you a Projection instead of a single Column. It can be unpacked either with the extractor on the “~” object:

val userNameByIDRangeAndProduct = for {
  min ~ max ~ product <- Parameters[Int, Int, String]
  u <- Users if u.id >= min &&
    u.id <= max &&
    Orders.where(o => (u.id is o.userID) && (o.product is product)).exists
} yield u.first

…or with the Projection extractor (which is slightly more efficient but not as nice to read):

val userNameByIDRangeAndProduct = for {
  Projection(min, max, product) <- Parameters[Int, Int, String]
  u <- Users if u.id >= min &&
    u.id <= max &&
    Orders.where(o => (u.id is o.userID) && (o.product is product)).exists
} yield u.first

ScalaQuery’s invoker framework provides methods for invoking queries with parameters but these are currently used by the simple queries only. I expect to integrate query templates with this system in the future.


Close
E-mail It