Schema Language

Record Types

Pegasus Records contain any number of fields, which may be any pegasus type, including primitives, enums, unions, maps and arrays.

For example, a basic record type containing a few fields:

namespace org.example

import org.example.time.DateTime

record Example {
  field1: string
  field2: int?
  field3: DateTime
}

will be generated as:

/** A simple record */
case class Example(field1: String, field2: Option[Int], field3: DateTime)

Record Fields may be optional or may have default values.

| Schema Field | Generated Scala |
| --- | --- |
| field: string | case class Record(field: String) |
| field: string = "message" | case class Record(field: String = "message") |
| field: string? | case class Record(field: Option[String] = None) |
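The following is a minimal sketch, in plain Scala, of how these generated shapes behave at a call site. The `Message` record and its field names are hypothetical, and no Courier-generated code is involved; only standard case class defaults are shown.

```scala
// Hypothetical stand-in for a generated record; not actual Courier output.
object RecordFieldsSketch {
  case class Message(
    title: String,                // required field, no default
    body: String = "message",     // required field with a default value
    note: Option[String] = None)  // optional field

  // Only the field without a default has to be supplied.
  val minimal = Message(title = "hello")

  // Defaults can be overridden by name.
  val full = Message(title = "hello", body = "custom", note = Some("n"))
}
```

Because defaulted and optional fields can be omitted at the call site, adding such a field to a record does not force existing constructor calls to change.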

Doc Strings

Types and fields may be documented using “doc strings”.

/**
 * Doc strings may be added to types. This doc should describe the purposes
 * of the Example type.
 *
 * Doc strings may be formatted using
 * [Markdown](https://daringfireball.net/projects/markdown/).
 */
record Example {
  /**
   * Doc strings may also be added to fields.
   */
   field: string
}

Deprecation

Types and fields may be deprecated.

@deprecated("Use record X instead.")
record Example {
  @deprecated("Use field x instead.")
  field: string
}

Including fields

Records may include fields from other records using "include":

record WithIncluded {
  ...AnotherRecord
}

In Pegasus, field inclusion does not imply inheritance; it is merely a convenience to reduce duplication when writing schemas.
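To illustrate the point about inheritance, here is a rough sketch in plain Scala. The field names (`id`, `label`) are assumptions made for illustration, and the classes are hand-written stand-ins, so the code Courier actually generates may look different.

```scala
object IncludeSketch {
  // Hypothetical records: AnotherRecord declares `id`; WithIncluded includes
  // AnotherRecord's fields and adds `label` of its own.
  case class AnotherRecord(id: Long)
  case class WithIncluded(id: Long, label: String)

  // There is no subtype relationship between the two classes, so going from
  // one to the other is an explicit copy of the shared fields.
  def toAnother(w: WithIncluded): AnotherRecord = AnotherRecord(w.id)
}
```

In this sketch a `WithIncluded` value cannot be passed where an `AnotherRecord` is expected, which is what "does not imply inheritance" means in practice.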

Record Backward Compatibility

The backward compatibility rules for records are:

Compatible changes:

  • Adding an optional field
  • Adding a field with a default (required or optional)

When accessing fields:

  • Unrecognized fields must be ignored.
  • Fields with defaults should always be written, either with the desired value or the default value.
  • The default value for a field should be assumed if the field is absent and is needed by the reader.
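The reader-side rules above can be sketched in plain Scala as follows. The `Greeting` record, the `read` function and the use of `Map[String, Any]` as a stand-in for parsed JSON are all assumptions for illustration; this is not how Courier itself reads data.

```scala
object RecordReaderSketch {
  val DefaultMessage = "hello"

  // Hypothetical record with one required field and one defaulted field.
  case class Greeting(name: String, message: String = DefaultMessage)

  // `raw` stands in for an already-parsed JSON object.
  def read(raw: Map[String, Any]): Option[Greeting] =
    raw.get("name").collect { case name: String =>
      val message = raw.get("message") match {
        case Some(m: String) => m
        // Field absent (or, in this sketch, not a string): assume the default.
        case _ => DefaultMessage
      }
      // Any other keys in `raw` are unrecognized and are simply ignored.
      Greeting(name, message)
    }
}
```

A reader written this way keeps working when a newer writer adds fields it has never seen, and when an older writer omits a field that has a default.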

Primitive Types

The Pegasus primitive types are: int, long, float, double, boolean, string and bytes.

| Schema Type | Scala Type | Example JSON data |
| --- | --- | --- |
| "int" | Int | 100 |
| "long" | Long | 10000000 |
| "float" | Float | 3.14 |
| "double" | Double | 2.718281 |
| "boolean" | Boolean | true |
| "string" | String | "coursera" |
| "bytes" | ByteString | "\u0001\u0002" |

A ‘null’ type also exists, but should generally be avoided in favor of optional fields.

Array Type

Pegasus Arrays are defined with an items type using the form:

array[org.example.Fortune]

Arrays bind to the Scala IndexedSeq type:

IndexedSeq[Fortune]

Under the hood, Courier generates a class FortuneArray extends IndexedSeq[Fortune] type and then provides an implicit conversion from Traversable[Fortune] so that developers can work directly with Scala generic collection types. E.g.:

ExampleRecord(fortunes = Seq(Fortune(...), Fortune(...)))
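The wrapper-plus-implicit-conversion pattern described above can be sketched in plain Scala. `IntSeqSketch`, `Numbers` and `fromIterable` are hypothetical names, and this is not actual Courier output. `Iterable` is used in place of `Traversable` because `Traversable` is deprecated in Scala 2.13.

```scala
import scala.language.implicitConversions

// Hypothetical wrapper class that behaves like a standard indexed collection.
class IntSeqSketch(private val items: Vector[Int]) extends IndexedSeq[Int] {
  override def apply(index: Int): Int = items(index)
  override def length: Int = items.length
}

object IntSeqSketch {
  // Placed in the companion so the compiler finds it whenever an
  // IntSeqSketch is expected and a standard collection is supplied.
  implicit def fromIterable(items: Iterable[Int]): IntSeqSketch =
    new IntSeqSketch(items.toVector)
}

object ArraySketchUsage {
  case class Numbers(values: IntSeqSketch)

  // A plain List is converted implicitly at the call site.
  val numbers = Numbers(values = List(10, 20, 30))
}
```

Because the conversion lives in the companion object of the target type, callers need no extra import to pass an ordinary `Seq` or `List`.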

For example, to define a field of a record containing an array, use:

namespace org.example.fortune

record Fortune {
  arrayField: array[int]
}

This will bind to:

case class Fortune(arrayField: Traversable[Int])

and be generated as case class Fortune(arrayField: IntArray).

Array items may be any pegasus type.

The array types for all primitive value types (IntArray, StringArray, …) are pre-generated by Courier and provided in the courier-runtime artifact in the org.coursera.courier.data package. The generator is aware of these classes and will refer to them instead of generating them when primitive arrays are used.

| Schema type | Scala type |
| --- | --- |
| array[int] | org.coursera.courier.data.IntArray (predefined) |
| array[org.example.Record] | org.example.RecordArray (generated) |

All generated Arrays implement Scala’s IndexedSeq, Traversable and Product traits and behave like a standard Scala collection type. They contain an implicit converter so that a Scala Traversable can be converted to them without the need to do any explicit conversion.

// constructors
val array = IntArray(10, 20, 30)
val fromSeq: IntArray = Seq(10, 20, 30)   // implicitly converted from a Seq
val fromList: IntArray = List(10, 20, 30) // implicitly converted from a List

// collection methods
array(0)

array.map { int => ... }

array.zipWithIndex

array.filter(_ > 20)

array.toSet

Unsurprisingly, Pegasus arrays are represented in JSON as arrays.

| Scala Expression | Equivalent JSON data |
| --- | --- |
| IntArray(1, 2, 3) | [1, 2, 3] |
| RecordArray(Record(field = 1), Record(field = 2)) | [ { "field": 1 }, { "field": 2 } ] |

Ordinarily, arrays are defined inline inside other types. But if needed, typerefs allow an array to be defined in a separate .courier (or .pdsc) file and be assigned a unique type name. See below for more details about typerefs.

Map Type

Pegasus Maps are defined with a values type, and an optional keys type, using the form:

map[int, org.example.Fortune]

Maps bind to the Scala Map type:

Map[Int, Fortune]

Under the hood, Courier generates a class IntToFortuneMap extends Map[Int, Fortune] type and then provides an implicit conversion from Map[Int, Fortune] so that developers can work directly with Scala generic collection types. E.g.:

Fortune(Map("a" -> 1, "b" -> 2))

If no “keys” type is specified, the key type will default to “string”. For example:

map[string, org.example.Note]

will bind to:

Map[String, Note]

and will be generated as class NoteMap extends Map[String, Note].

When complex types are used for “keys”, InlineStringCodec is used to serialize/deserialize complex type keys to JSON strings.

To define a field of a record containing a map, use:

namespace org.example.fortune

record Fortune {
  mapField: map[string, int]
}

This will bind to:

case class Fortune(mapField: Map[String, Int])

and be generated as case class Fortune(mapField: IntMap).

Like arrays, map values can be of any type, and the map types for all primitives are predefined.

| Schema type | Scala type |
| --- | --- |
| map[string, int] | org.coursera.courier.data.IntMap (predefined) |
| map[string, org.example.Record] | org.example.RecordMap (generated) |
| map[org.example.SimpleId, org.example.Record] | org.example.SimpleIdToRecordMap (generated) |

All generated Maps implement Scala’s Map and Iterable traits and behave like a standard Scala collection type. They contain an implicit converter so that a Scala Map can be converted to them without the need to do any explicit conversion.

// constructors
val map = IntMap("a" -> 1, "b" -> 2, "c" -> 3)
val fromMap: IntMap = Map("a" -> 1, "b" -> 2, "c" -> 3) // implicitly converted from a Map

// collection methods
map.get("a")

map.getOrElse("b", 0)

map.contains("c")

map.mapValues { v => ... }

map.filterKeys { _.startsWith("a") }

Maps are represented in JSON as objects:

| Scala Expression | Equivalent JSON data |
| --- | --- |
| IntMap("a" -> 1, "b" -> 2, "c" -> 3) | { "a": 1, "b": 2, "c": 3 } |
| RecordMap("a" -> Record(field = 1), "b" -> Record(field = 2)) | { "a": { "field": 1 }, "b": { "field": 2 } } |
| SimpleIdToRecordMap(SimpleId(id = 1000) -> Record(field = 1)) | { "(id~1000)": { "field": 1 } } |

Ordinarily, maps are defined inline inside other types. But if needed, typerefs allow a map to be defined in a separate .courier (or .pdsc) file and be assigned a unique type name. See below for more details about typerefs.

Union Type

Pegasus Unions are tagged union types.

A union type may be defined with any number of member types. Each member may be any pegasus type except union: primitive, record, enum, map or array.

Union types are defined using the form:

union[MemberType1, MemberType2]

For example, a union that holds an int, string or a Fortune would be defined as:

union[int, string, org.example.Fortune]

The member type names also serve as the “member keys” (sometimes called “union tags”), and identify which union member type data holds.

For example:

| Schema type | Member key | Example JSON data |
| --- | --- | --- |
| "int" | "int" | { "int": 1 } |
| "string" | "string" | { "string": "coursera" } |
| "org.example.Fortune" | "org.example.Fortune" | { "org.example.Fortune": { "message": "Today is your lucky day!" } } |

Let’s look at an example of a union in use. To define a field of a record containing a union of two other records, we would define:

namespace org.example

record Question {
  answerFormat: union[MultipleChoice, TextEntry]
}

This will be generated as:

case class Question(answerFormat: Question.AnswerFormat)

object Question {
  // ...

  sealed abstract class AnswerFormat()
  case class MultipleChoiceMember(value: MultipleChoice) extends AnswerFormat
  case class TextEntryMember(value: TextEntry) extends AnswerFormat
  case class $UnknownMember() extends AnswerFormat // for backward compatibility (see below)
}

Here, because the union was defined inline in the Question record, it is generated as a class scoped within the Question type. It is also assigned a name based on the field it is contained in.

If the union were instead defined with a typeref it would be assigned the name of the typeref and be generated as a top level type. This will be covered in more detail later.

Note that each member type is “boxed” in a <Type>Member case class. This is because Scala does not (yet) support disjoint types directly in the type system.

Here’s how the AnswerFormat union can be used to create a new Question:

| Scala Expression | Equivalent JSON data |
| --- | --- |
| Question(TextEntryMember(TextEntry(...))) | { "answerFormat": { "org.example.TextEntry": { ... } } } |
| Question(MultipleChoiceMember(MultipleChoice(...))) | { "answerFormat": { "org.example.MultipleChoice": { ... } } } |

To read the union, pattern matching may be used, e.g.:

question.answerFormat match {
  case TextEntryMember(textEntry) => ...
  case MultipleChoiceMember(multipleChoice) => ...
  case $UnknownMember() => ... // for backward compatibility (see below)
}

Because the union is defined using a sealed base type, Scala can statically check that the cases used are exhaustive.
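A self-contained sketch of this exhaustiveness check, in plain Scala, is shown here. The member records and their fields are assumptions, `UnknownMember` stands in for the generated `$UnknownMember`, and none of this is actual Courier output.

```scala
object UnionSketch {
  // Hypothetical member records; their fields are assumptions.
  case class MultipleChoice(choices: List[String])
  case class TextEntry(maxLength: Int)

  // `sealed` means all subclasses must be declared in this file.
  sealed abstract class AnswerFormat
  case class MultipleChoiceMember(value: MultipleChoice) extends AnswerFormat
  case class TextEntryMember(value: TextEntry) extends AnswerFormat
  case class UnknownMember() extends AnswerFormat // stands in for $UnknownMember

  def describe(format: AnswerFormat): String = format match {
    case MultipleChoiceMember(mc) => s"choose one of ${mc.choices.size}"
    case TextEntryMember(te)      => s"enter up to ${te.maxLength} characters"
    case UnknownMember()          => "unrecognized answer format"
    // Removing any case above makes the compiler warn that the match
    // may not be exhaustive.
  }
}
```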

The member keys of primitives, maps, arrays and unions are the same as their type names:

| Scala Expression | Equivalent JSON data |
| --- | --- |
| Record(field = IntMember(1)) | { "field": { "int": 1 } } |
| Record(field = StringMember("a")) | { "field": { "string": "a" } } |
| Record(field = IntMapMember(IntMap("a" -> 1))) | { "field": { "map": { "a": 1 } } } |
| Record(field = IntArrayMember(IntArray(1,2,3))) | { "field": { "array": [1, 2, 3] } } |

Ordinarily, unions are defined inside other types. But if needed, typerefs may be used to define a union in a separate .courier (or .pdsc) file and give the union any desired name. See below for more details about typerefs.

Union Backward Compatibility

Strictly speaking, adding members to unions is a backward incompatible change. But in some cases, adding members can be handled in a safe and controlled fashion.

Each Courier generated union type includes a $UnknownMember that indicates an unrecognized union member was read from serialized data. $UnknownMember is primarily intended to help manage changes to the union in systems where readers and writers of the data may be using different versions of a schema, because, in such systems, a reader might receive data containing union members it does not yet recognize.

All readers are expected to check if a union is $UnknownMember when consuming it. If the reader is able to handle $UnknownMember in a reasonable and safe way, it is encouraged to do so. If the reader requires a recognized member and cannot proceed in a reasonable way when $UnknownMember is encountered, the reader should reject the data outright (e.g. if the data was received in an HTTP POST request, respond with a 400 HTTP response status code).
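As a rough sketch of the "reject outright" path, a reader might look like the following. The `Payment` union, its member and the `handlePost` function are hypothetical, `UnknownMember` stands in for `$UnknownMember`, and returning the status code in an `Either` is just one possible design.

```scala
object UnknownMemberSketch {
  sealed abstract class Payment
  case class CardMember(last4: String) extends Payment
  case class UnknownMember() extends Payment // stands in for $UnknownMember

  // Left(statusCode) rejects the request; Right(result) accepts it.
  def handlePost(payment: Payment): Either[Int, String] = payment match {
    case CardMember(last4) => Right(s"charged card ending in $last4")
    // This reader cannot proceed safely without a recognized member.
    case UnknownMember()   => Left(400)
  }
}
```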

Enum Type

Pegasus enum types may contain any number of symbols, for example:

namespace org.example

enum Fruits {
  APPLE
  BANANA
  ORANGE
}

This will be generated as:

object Fruits extends Enumeration

where symbols are referenced as:

Fruits.APPLE

and the enum’s Scala type is:

Fruits.Fruits

Enums are referenced in other schemas either by name, e.g.:

namespace org.example

record FruitBasket {
  fruit: org.example.Fruits
}

...or by inlining their type definition, e.g.:

namespace org.example

record FruitBasket {
  fruit: enum Fruits { APPLE, BANANA, ORANGE }
}

The fully generated enum looks like:

object Fruits extends Enumeration {
  type Fruits = Value

  val APPLE = Value("APPLE")
  val BANANA = Value("BANANA")
  val ORANGE = Value("ORANGE")

  val $UNKNOWN = Value("$UNKNOWN") // for backward compatibility (see below)
}

Enums are represented in JSON as strings, e.g. "APPLE".

Enum documentation, deprecation and properties

Doc comments, @deprecated and properties may be added directly to enum symbols:

namespace org.example

enum Fruits {
  @color = "red"
  APPLE

  /** Yum. */
  @color = "yellow"
  BANANA

  @deprecated
  @color = "orange"
  ORANGE
}

Properties can easily be accessed from Scala code:

Fruits.BANANA.property("color")

Enum Backward Compatibility

Strictly speaking, adding a symbol to an enum is a backward incompatible change. But in some cases, adding symbols can be handled in a safe and controlled fashion.

Each Courier generated enum includes a $UNKNOWN symbol that indicates an unrecognized symbol was read from serialized data. $UNKNOWN is primarily intended to help manage changes to the enum in systems where readers and writers of the data may be using different versions of a schema, because, in such systems, a reader might receive data containing enum symbols it does not yet recognize.

All readers are expected to check if an enum is $UNKNOWN when consuming it. If the reader is able to handle $UNKNOWN in a reasonable and safe way, it is encouraged to do so. If the reader requires a recognized symbol and cannot proceed in a reasonable way when $UNKNOWN is encountered, the reader should reject the data outright (e.g. if the data was received in an HTTP POST request, respond with a 400 HTTP response status code).
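The fallback to an unknown symbol can be sketched with a plain Scala Enumeration. The `parse` helper is hypothetical and `UNKNOWN` stands in for the generated `$UNKNOWN` value; how Courier itself maps unrecognized strings is not shown here.

```scala
object EnumSketch {
  object Fruits extends Enumeration {
    type Fruits = Value
    val APPLE = Value("APPLE")
    val BANANA = Value("BANANA")
    val UNKNOWN = Value("$UNKNOWN") // stands in for the generated $UNKNOWN
  }

  // Hypothetical reader: unrecognized names map to UNKNOWN instead of throwing.
  def parse(name: String): Fruits.Fruits =
    Fruits.values.find(_.toString == name).getOrElse(Fruits.UNKNOWN)

  // A consumer that requires a recognized symbol rejects UNKNOWN explicitly.
  def isAcceptable(fruit: Fruits.Fruits): Boolean = fruit != Fruits.UNKNOWN
}
```

A symbol added by a newer writer, such as a hypothetical "MANGO", reaches an older reader as `UNKNOWN`, and the reader then decides whether it can proceed.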

Typerefs

Pegasus Typerefs provide a lightweight alias to any other type.

They can be used for a variety of purposes. A few common uses:

(1) Provide a name for a union, map, or array so that it can be referenced by name. E.g.:

namespace org.example

typeref AnswerTypes = union[MultipleChoice, TextEntry]

This will be generated as:

abstract class AnswerTypes
case class MultipleChoiceMember(value: MultipleChoice) extends AnswerTypes
case class TextEntryMember(value: TextEntry) extends AnswerTypes

And can be referred to from any other type using the name org.example.AnswerTypes, e.g.:

namespace org.example

record Question {
  answerFormat: org.example.AnswerTypes
}

This is particularly useful because unions, maps and arrays cannot otherwise be named directly like records and enums can.

(2) Provide additional clarity when using primitive types for specific purposes.

namespace org.example

typeref UnixTimestamp = long

No classes will be generated for this typeref. In Scala, typerefs to primitives are simply bound to their reference types (unless the typeref is defined as a custom type, see below for details). E.g. UnixTimestamp will simply be bound to Long in Scala.
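The effect is roughly analogous to a Scala type alias, as in the sketch below. This is an illustration only: the `Event` record is hypothetical, and whether Courier emits an alias or simply writes `Long` in generated signatures is an assumption not confirmed by this document.

```scala
object TyperefSketch {
  // Analogous to `typeref UnixTimestamp = long`: a name, not a new class.
  type UnixTimestamp = Long

  // Hypothetical record with a field of the aliased type.
  case class Event(createdAt: UnixTimestamp)

  // A plain Long is accepted wherever UnixTimestamp is expected...
  val event = Event(createdAt = 1420070400L)

  // ...and the field can be read back as a Long without conversion.
  val asLong: Long = event.createdAt
}
```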

Custom Types

Pegasus Custom Types allow any Scala type to be bound to any pegasus primitive type.

There are two ways to define custom types:

  • For simple Scala case classes with a single element, simply define a typeref and reference the class.
  • For any other type, create a Coercer and define a typeref that references both the class and the coercer.

Custom Types for Scala Case Classes

Coercers are not required for Scala case classes that have only a single element.

For example, to coerce to the Scala case class:

case class SlugId(slug: String)

Define a Pegasus typeref schema like:

namespace org.example.schemas

@scala.class = "org.example.SlugId"
typeref SlugId = string

Coercers

For example, Joda time has a convenient DateTime class. If we wish to use this class in Scala to represent date times, all we need to do is define a pegasus custom type that binds to it:

namespace org.example

@scala.class = "org.joda.time.DateTime"
@scala.coercerClass = "org.coursera.models.common.DateTimeCoercer"
typeref DateTime = string

The coercer is responsible for converting between the pegasus “referenced” type, in this case "string", and the Joda DateTime class:

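import com.linkedin.data.template.{Custom, DirectCoercer}
import org.joda.time.DateTime
import org.joda.time.format.ISODateTimeFormat

class DateTimeCoercer extends DirectCoercer[DateTime] {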
  override def coerceInput(obj: DateTime): AnyRef = {
    DateTimeCoercer.iso8601Format.print(obj)
  }
  override def coerceOutput(obj: Any): DateTime = {
    obj match {
      case string: String => DateTimeCoercer.iso8601Format.parseDateTime(string)
      case _: Any => // ...
    }
  }
}
object DateTimeCoercer {
  registerCoercer()
  def registerCoercer(): Unit = {
    Custom.registerCoercer(new DateTimeCoercer, classOf[DateTime])
  }
  val iso8601Format = ISODateTimeFormat.dateTime()
}
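For readers who want to try the round trip without Pegasus or Joda on the classpath, here is a dependency-free sketch of the same idea. It swaps in `java.time.Instant` for Joda's DateTime, and `SketchCoercer` is a hypothetical trait that only mirrors the two method names used above; it is not the Pegasus `DirectCoercer` interface. Throwing on non-string input is also an assumption, since the fallback case above is elided.

```scala
import java.time.Instant

object CoercerSketch {
  // Hypothetical trait with the same two directions as the coercer above.
  trait SketchCoercer[T] {
    def coerceInput(value: T): AnyRef
    def coerceOutput(raw: Any): T
  }

  object InstantCoercer extends SketchCoercer[Instant] {
    // Custom type -> underlying primitive ("string"), as ISO-8601 text.
    override def coerceInput(value: Instant): AnyRef = value.toString

    // Underlying primitive -> custom type; anything but a String is rejected.
    override def coerceOutput(raw: Any): Instant = raw match {
      case s: String => Instant.parse(s)
      case other =>
        throw new IllegalArgumentException(s"expected an ISO-8601 string, got: $other")
    }
  }
}
```

The two methods are inverses of each other, which is the property a coercer needs so that data written by one service can be read back by another.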

Once a custom type is defined, it can be used in any type. For example, to use the DateTime custom type in a record:

namespace org.example

record Fortune {
  createdAt: org.example.DateTime
}

This will be generated as:

case class Fortune(createdAt: org.joda.time.DateTime)