Algebraic Data Types In Scala
ObjectOriented Meets Functional
— Scala's punchline, https://scalalang.org
Scala claims to unite objectoriented and functional programming, and offers a rich set features from both worlds.
Developers coming from objectoriented languages—Java in particular—quickly adopt the objectoriented features of Scala but often struggle to find their way around functional programming.
Some aspects of functional programming found their way into objectoriented languages: Higher order functions or combinators like map
and filter
appear in today's C# or Java code, and even a preference for immutable data structures spreads out into conventional objectoriented languages.
But algebraic data types (ADTs) still do not appear in objectoriented programming although these enable the true power of functional programming: Types wellfounded on theory let us model the problem domain in types and thus help us write correctbyconstruction code which provides a higher level of compiletime confidence than possible with the type systems of most objectoriented languages. This article aims to help developers from objectoriented languages understand what it means, and become familiar with the basic set of algebraic data types commonly used in functional programming and its appearance in Scala.
Simple Types
Developers from objectoriented programming languages often conflate “types” with “classes” but in fact represent a much simpler, yet more powerful concept. For this article we define “type” as just a name for a set of values. We can define a Boolean type with standard set notation now:
This type has just two values, the wellknown boolean constants true
and false
.
We can also define more complex types, eg, Int for all integers, positive and negative:
Or a String as sequence of characters where Unicode denotes the set of all unicode code points:
Combined Types
We can combine these simple types in two fundamental ways: Either as a group of values of other types, ie, a product type, or as an alternative between values of different types, ie, a coproduct type or sum type. We can “calculate” with these types just like we can calculate sums and products of numbers, and these types obey similar laws. We say they “form an algebra”, hence the name “algebraic data types”.
Product types
Formally we can define the product of types^{1} T_{1}, T_{2}, … as follows:
C denotes the constructor of the new type. This constructor serves as a “tag” to differentiate between two product types combined of the same types. With the help of this tag we can create two different products of, eg, Boolean and Nat, by giving them different constructors. The following type introduces a generic pair, for example, one of the most basic product types:
We use the common infix notation (v, t) by which pairs appear in almost all contemporary programming languages, from Python to Haskell. Now (42, “Donald Duck”) becomes a value of the type Pair[Int, String]. By using a different constructor we can also give this generic pair a special name, like User with an ID and a name:
We follow the convention to name the constructor like the type, but we can also choose different names for each. This new type combines the same types as Pair[Int, String] but holds different values and thus becomes distinct from a pair. User 42 “Donald Duck” is a value of type User, but not of Pair[Int, String],
Sum types
Like a product combines values of different types at the same time a sum type^{2} provides an alternative between values of different types. Formally we can define a sum of types T_{1}, T_{2}, … as follows:
Again C_{1}, C_{2}, … denote constructors where each constructor lifts another constituent type to the sum type. We can now define the common Either type, as an alternative between two types:
Now we can use the type Either[String, User] to represent the result of finding a user in a database. In case of error we return Left “User not found”—which strictly speaking is of type Either[String, T] for any T—otherwise we return Right user where user is a value of type User.
Types in Scala
In Scala we can already use pairs and tuples—the standard library includes these—and we can also define our own products with case classes:
> (42, "Donald Duck")
res0: (Int, String) = (42,Donald Duck)
> final case class User(id: Int, name: String)
> User(42, "Donald Duck")
res1: User = User(42,Donald Duck)
Scala also supports sum type, but lacks a syntax for these.
A sum type like Either
looks straightforward in Haskell:
data Either a b = Left a  Right b
The same definition in Scala looks considerable more noisy^{3}:
sealed trait Either[L, R]
case class Left[L, R](value: L) extends Either[L, R]
case class Right[L, R](value: R) extends Either[L, R]
This illustrates the common pattern to define sum types in Scala: In the absence of firstlevel support for sum types we must exploit subtyping to achieve the effect of sum types.
We define the type itself as a sealed trait.
The sealed
keyword forces us to define all subtypes in the same file as the trait and thus allows the Scala compiler to subsequently warn if a pattern match over the type does not match all subtypes (“exhaustiveness check”)^{4}.
Then we define each branch of the sum as distinct subclass of the trait, and use the corresponding type parameter as type for the value of either side.
This use of subtyping has important implications for the ergonomics of the type which we will cover in the section after the next.
Shapes of generic types
If we look at the types in the previous section we notice some similiarity between product types. A Pair[Int, String] and our User type have the same shape: Apart from their constructors they have the same values. In fact constructors exist just to introduce an “artificial” distinction between otherwise equal sums or products of types, and thus allow us to give different names to the same type to aid understanding of our programs. But we can “omit” the constructors and abstract over the shape of these types.
This leads us to the famous [Shapeless][] library for Scala which provides types to abstract over the shape of algebraic data types.
At the core of Shapeless lies a special HList
type for a heterogenous list—a list where each element has a different type:
In other words, each alternative of a sum type becomes a distinct type.
The expression Left("Foo")
has type Left[String]
for an arbitrary R
, not Either[String, R]
.
We can then widen Left[String]
to Either[Left, R]
by invoking the subtype relation.
The compiler automatically widens covariant types^{5} but often we do not wish for automatic widening; in particular automatic widening complicates implicit search which in turn impacts type class resolution so libraries like cats and scalaz made most of their types invariant to prevent automatic widening to subtypes.
With invariant types we can find ourselves in a situation where the compiler infers the subtype, ie, a sumtype variant like Left
, but needs the base type, ie, Either
.
The following (simplified) example with OptionT
and EitherT
—both invariant^{6}—does not compile, for instance:
import cats.data._
import cats.Id
sealed trait Sum
case object A extends Sum
val res: EitherT[Id, Sum, Int] = for {
id < OptionT.pure[Id](42).toRight(A)
} yield id
The compiler complains:
type mismatch;
found : cats.data.EitherT[cats.Id,A.type,Int]
required: cats.data.EitherT[cats.Id,Sum,Int]
Note: A.type <: Sum, but class EitherT is invariant in type A.
You may wish to define A as +A instead. (SLS 4.5)
We must explicitly downcast to Sum
with a type ascription to make the code compile:
id < OptionT.pure[Id](42).toRight(A : Sum)
This issue appears frequently and thus impacts the ergonomics of sum types in Scala, in particular when it causes much worse and less understandable compiler errors than the one above.
It appears so frequently that cats and scalaz even have their own family of functions to help with subtyping.
The Functor
type also provides widen
to widen the type of the functor argument:
> A.some.widen
res10: Option[A.type] = Some(A)
> A.some.widen[Sum]
res11: Option[Sum] = Some(A)
Likewise BiFunctor
(a functor with two “sides”, like Either
) also has a leftWiden
which we can use instead of the type ascription above^{7}:
id < OptionT.pure[Id](42).toRight(A).leftWiden
While these functions provide some convenience to cope with subtyping, all in all ergonomics of sum types often falls short of what other functional languages like Haskell—which lack subtyping—can provide.
Summary
Product types combine different types into a new type; sum types describe an alternative of other types. In Scala the former appear as simple and wellknown case classes, whereas the latter have a more intricate representation as subtypes of sealed traits or classes. In some cases this subtyping in sum types interferes with type inference and invariance which makes ADTs in Scala less ergonomic than in other languages like Haskell. Algebraic data types have similar shapes, and we can abstract over these shapes to write generic programs over all kinds of sum or product types. The famous shapeless library provides the necessary infrastructure for this abstraction—in particular a heterogeneous list type as generalization of tuples and generic types to convert concrete algebraic types into the corresponding heterogeneous list.
We hope that this article helped you understand how algebraic data types work, how they appear in Scala, and what the shapeless library contributes to generic programming.

The name “product type” originates from a branch of mathematics called “category theory” a denotes a fundamental way to combine objects in a category. A product type resembles this combination with regards to programming language types, hence the name. We spare category theory in this article—Products and Coproducts from Category Theory for Programmers by Bartosz Milewski provides a gentle introduction to products and coproducts in category theory—but a look at the size of (the set of values of) a product type in relation to the sizes its constituent types gives an intuition about the meaning of the name “product type”. We observe that the size of the product type in fact equals the product of the sizes of its constituent types: V(P) = V(T_{1}) · V(T_{2}) · … ↩︎

We observe that the size of a sum type follows a similar analogy as the size of a product type. A sum type has just as many values as the sum of its constituent types: V(S) = V(T_{1}) + V(T_{2}) + …. The more formal name coproduct originates from category theory as well. The prefix co indicates that category theory considers a sum in some ways as the opposite of a product. ↩︎

We simplify the definition of
Either
for this article. In particular we omit all methods onEither
and elide variance annotations. For the actual definition see scala.util.Either. ↩︎ 
Instead of a trait we can use a sealed abstract class—in case we need to pass values to the base class, because Scala does not support trait parameters yet. Dotty will add these to Scala. ↩︎

The article Of variance and functors provides an indepth explanation of variance and its effects in types. ↩︎

We cannot use
Either
in this example—historically Scala madeEither
covariant. ↩︎ 
This code makes use of partial unification, see Cats FAQ. ↩︎