Schema Language
Record Types
Pegasus Records contain any number of fields, which may be any pegasus type, including primitives, enums, unions, maps and arrays.
For example, a basic record type containing a few fields:
namespace org.example
import org.example.time.DateTime
record Example {
field1: string
field2: int?
field3: DateTime
}
will be generated as:
/** A simple record */
case class Example(field1: String, field2: Option[Int], field3: DateTime)
Record Fields may be optional or may have default values.
Schema Field | Generated Scala |
---|---|
field: string | case class Record(field: String) |
field: string = "message" | case class Record(field: String = "message") |
field: string? | case class Record(field: Option[String] = None) |
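As a rough illustration of how the three field shapes in the table behave at a call site, the sketch below uses a hand-written case class as a stand-in for generated code. The name `Message` and its fields are hypothetical, so the example runs without Courier.

```scala
object RecordFieldsSketch {
  // Hand-written stand-in for a generated record; not actual Courier output.
  case class Message(
    required: String,              // field: string
    greeting: String = "message",  // field: string = "message"
    note: Option[String] = None)   // field: string?

  // Only the required field has to be supplied.
  val minimal: Message = Message(required = "a")

  // Defaults and optional fields can be overridden by name.
  val full: Message = Message(required = "a", greeting = "hi", note = Some("n"))
}
```

Because optional fields default to None, adding one to a schema does not break existing call sites that construct the record by name.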
Doc Strings
Types and fields may be documented using “doc strings”.
/**
* Doc strings may be added to types. This doc should describe the purposes
* of the Example type.
*
* Doc strings may be formatted using
* [Markdown](https://daringfireball.net/projects/markdown/).
*/
record Example {
/**
* Doc strings may also be added to fields.
*/
field: string
}
Deprecation
Types and fields may be deprecated.
@deprecated("Use record X instead.")
record Example {
@deprecated("Use field x instead.")
field: string
}
Including fields
Records may include fields from other records using "include":
record WithIncluded {
...AnotherRecord
}
In Pegasus, field inclusion does not imply inheritance; it is merely a convenience to reduce duplication when writing schemas.
Record Backward Compatibility
The backward compatibility rules for records are:
Compatible changes:
- Adding an optional field
- Adding a field with a default (required or optional)
When accessing fields:
- Unrecognized fields must be ignored.
- Fields with defaults should always be written, either with the desired value or the default value.
- The default value for a field should be assumed if the field is absent and is needed by the reader.
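The reader-side rules above can be sketched with plain Scala collections. This is only an illustration under stated assumptions: the decoded data is modelled as a `Map[String, Any]`, and `Example`, `GreetingDefault` and `read` are hypothetical names rather than part of Courier.

```scala
object RecordReaderSketch {
  // Hypothetical older view of a record whose `greeting` field has a default.
  case class Example(field1: String, greeting: String)

  val GreetingDefault = "message"

  // Reads only the fields this schema version knows about.
  def read(data: Map[String, Any]): Example =
    Example(
      field1 = data("field1").asInstanceOf[String],
      // Absent field with a default: assume the default value.
      greeting = data.get("greeting").map(_.asInstanceOf[String]).getOrElse(GreetingDefault))

  // `addedLater` is unknown to this reader and is simply ignored.
  val fromNewerWriter: Example = read(Map("field1" -> "a", "addedLater" -> 42))
}
```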
Primitive Types
The Pegasus primitive types are: int, long, float, double, boolean, string and bytes.
Schema Type | Scala Type | Example JSON data |
---|---|---|
"int" | Int | 100 |
"long" | Long | 10000000 |
"float" | Float | 3.14 |
"double" | Double | 2.718281 |
"boolean" | Boolean | true |
"string" | String | "coursera" |
"bytes" | ByteString | "\u0001\u0002" |
A ‘null’ type also exists, but should generally be avoided in favor of optional fields.
Array Type
Pegasus Arrays are defined with an items type using the form:
array[org.example.Fortune]
Arrays bind to the Scala IndexedSeq type:
IndexedSeq[Fortune]
Under the hood, Courier generates a class FortuneArray extends IndexedSeq[Fortune] type and then provides an implicit conversion from Traversable[Fortune] so that developers can work directly with Scala generic collection types. E.g.:
ExampleRecord(fortunes = Seq(Fortune(...), Fortune(...)))
For example, to define a field of a record containing an array, use:
namespace org.example.fortune
record Fortune {
arrayField: array[int]
}
This will bind to:
case class Fortune(arrayField: Traversable[Int])
and be generated as case class Fortune(arrayField: IntArray).
Array items may be any pegasus type.
The array types for all primitive value types (IntArray, StringArray, …) are pre-generated by Courier and provided in the courier-runtime artifact in the org.coursera.courier.data package. The generator is aware of these classes and will refer to them instead of generating them when primitive arrays are used.
Schema type | Scala type |
---|---|
array[int] | org.coursera.courier.data.IntArray (predefined) |
array[org.example.Record] | org.example.RecordArray (generated) |
All generated Arrays implement Scala’s IndexedSeq, Traversable and Product traits and behave like a standard Scala collection type. They contain an implicit converter so that a Scala Traversable can be converted to them without the need to do any explicit conversion.
// constructors
val array = IntArray(10, 20, 30)
val array: IntArray = Seq(10, 20, 30)
val array: IntArray = List(10, 20, 30)
// collection methods
array(0)
array.map { int => ... }
array.zipWithIndex
array.filter(_ > 20)
array.toSet
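To show why a plain Scala collection is accepted where a generated array type is expected, here is a minimal sketch of the implicit-conversion pattern. `ScoreArray` and `Scores` are hypothetical, hand-written stand-ins. Unlike the generated classes, `ScoreArray` wraps a Vector instead of extending IndexedSeq, which keeps the sketch independent of the Scala collections version.

```scala
import scala.language.implicitConversions

object ArrayConversionSketch {
  // Hypothetical stand-in for a generated array class; wraps a Vector.
  final class ScoreArray(private val items: Vector[Int]) {
    def apply(i: Int): Int = items(i)
    def length: Int = items.length
    def toVector: Vector[Int] = items
  }

  object ScoreArray {
    def apply(items: Int*): ScoreArray = new ScoreArray(items.toVector)

    // Found automatically because it lives in the target type's companion object.
    implicit def fromSeq(items: Seq[Int]): ScoreArray = new ScoreArray(items.toVector)
  }

  case class Scores(values: ScoreArray)

  // A plain Seq is accepted where ScoreArray is expected.
  val scores: Scores = Scores(values = Seq(10, 20, 30))
}
```

Placing the conversion in the companion object means callers do not need an extra import for it to apply.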
Unsurprisingly, Pegasus arrays are represented in JSON as arrays.
Scala Expression | Equivalent JSON data |
---|---|
IntArray(1, 2, 3) | [1, 2, 3] |
RecordArray(Record(field = 1), Record(field = 2)) | [ { "field": 1 }, { "field": 2 } ] |
Ordinarily, arrays are defined inline inside other types. But if needed, typerefs allow an array to be defined in a separate .courier (or .pdsc) file and be assigned a unique type name. See below for more details about typerefs.
Map Type
Pegasus Maps are defined with a values type, and an optional keys type, using the form:
map[int, org.example.Fortune]
Maps bind to the Scala Map type:
Map[Int, Fortune]
Under the hood, Courier generates a class IntToFortuneMap extends Map[Int, Fortune] type and then provides an implicit conversion from Map[Int, Fortune] so that developers can work directly with Scala generic collection types. E.g.:
Fortune(Map("a" -> 1, "b" -> 2))
If no “keys” type is specified, the key type will default to “string”. For example:
map[string, org.example.Note]
will bind to:
Map[String, Note]
and will be generated as class NoteMap extends Map[String, Note].
When complex types are used for “keys”, InlineStringCodec is used to serialize/deserialize complex type keys to JSON strings.
To define a field of a record containing a map, use:
namespace org.example.fortune
record Fortune {
mapField: map[string, int]
}
This will bind to:
case class Fortune(mapField: Map[String, Int])
and be generated as case class Fortune(mapField: IntMap).
Like arrays, map values can be of any type, and the map types for all primitives are predefined.
Schema type | Scala type |
---|---|
map[string, int] | org.coursera.courier.data.IntMap (predefined) |
map[string, org.example.Record] | org.example.RecordMap (generated) |
map[org.example.SimpleId, org.example.Record] | org.example.SimpleIdToRecordMap (generated) |
All generated Maps implement Scala’s Map and Iterable traits and behave like a standard Scala collection type. They contain an implicit converter so that a Scala Map can be converted to them without the need to do any explicit conversion.
// constructors
val map = IntMap("a" -> 1, "b" -> 2, "c" -> 3)
val map: IntMap = Map("a" -> 1, "b" -> 2, "c" -> 3)
// collection methods
map.get("a")
map.getOrElse("b", 0)
map.contains("c")
map.mapValues { v => ... }
map.filterKeys { _.startsWith("a") }
Maps are represented in JSON as objects:
Scala Expression | Equivalent JSON data |
---|---|
IntMap("a" -> 1, "b" -> 2, "c" -> 3) | { "a": 1, "b": 2, "c": 3 } |
RecordMap("a" -> Record(field = 1), "b" -> Record(field = 2)) | { "a": { "field": 1 }, "b": { "field": 2 } } |
SimpleIdToRecordMap(SimpleId(id = 1000) -> Record(field = 1)) | { "(id~1000)": { "field": 1 } } |
Ordinarily, maps are defined inline inside other types. But if needed, typerefs allow a map to be defined in a separate .courier (or .pdsc) file and be assigned a unique type name. See below for more details about typerefs.
Union Type
Pegasus Unions are tagged union types.
A union type may be defined with any number of member types. Each member may be any pegasus type except union: primitive, record, enum, map or array.
Union types are defined using the form:
union[MemberType1, MemberType2]
For example, a union that holds an int, string or a Fortune would be defined as:
union[int, string, org.example.Fortune]
The member type names also serve as the “member keys” (sometimes called “union tags”), and identify which union member type data holds.
For example:
Schema type | Member key | Example JSON data |
---|---|---|
"int" | "int" | { "int": 1 } |
"string" | "string" | { "string": "coursera" } |
"org.example.Fortune" | "org.example.Fortune" | { "org.example.Fortune": { "message": "Today is your lucky day!" } } |
Let’s look at an example of a union in use. To define a field of a record containing a union of two other records, we would define:
namespace org.example
record Question {
answerFormat: union[MultipleChoice, TextEntry]
}
This will be generated as:
case class Question(answerFormat: Question.AnswerFormat)
object Question {
  // ...
  sealed abstract class AnswerFormat()
  case class MultipleChoiceMember(value: MultipleChoice) extends AnswerFormat
  case class TextEntryMember(value: TextEntry) extends AnswerFormat
  case class $UnknownMember() extends AnswerFormat // for backward compatibility (see below)
}
Here, because the union was defined inline in the Question record, it is generated as a class scoped within the Question type. It is also assigned a name based on the field it is contained in.
If the union were instead defined with a typeref it would be assigned the name of the typeref and be generated as a top level type. This will be covered in more detail later.
Note that each member type is “boxed” in a <Type>Member case class. This is because Scala does not (yet) support disjoint types directly in the type system.
Here’s how the AnswerFormat union can be used to create a new Question:
Scala Expression | Equivalent JSON data |
---|---|
Question(TextEntryMember(TextEntry(...))) | { "answerFormat": { "org.example.TextEntry": { ... } } } |
Question(MultipleChoiceMember(MultipleChoice(...))) | { "answerFormat": { "org.example.MultipleChoice": { ... } } } |
To read the union, pattern matching may be used, e.g.:
question.answerFormat match {
  case TextEntryMember(textEntry) => ...
  case MultipleChoiceMember(multipleChoice) => ...
  case $UnknownMember() => ... // for backward compatibility (see below)
}
Because the union is defined using a sealed base type, Scala can statically check that the cases used are exhaustive.
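A self-contained sketch of the same pattern is shown below. The classes are written by hand to mirror the generated shape, and the `choices` and `maxLength` fields are hypothetical, so it compiles without Courier.

```scala
object UnionMatchSketch {
  // Hand-written stand-ins; names mirror the example above but are not generated code.
  case class MultipleChoice(choices: List[String])
  case class TextEntry(maxLength: Int)

  sealed abstract class AnswerFormat
  case class MultipleChoiceMember(value: MultipleChoice) extends AnswerFormat
  case class TextEntryMember(value: TextEntry) extends AnswerFormat
  case class $UnknownMember() extends AnswerFormat

  def describe(format: AnswerFormat): String = format match {
    case MultipleChoiceMember(mc) => "choices: " + mc.choices.size
    case TextEntryMember(te)      => "text up to " + te.maxLength
    case $UnknownMember()         => "unrecognized"
    // Removing any case above makes the compiler warn that the match is not exhaustive.
  }
}
```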
The member keys of primitives, maps and arrays are the same as their type names:
Scala Expression | Equivalent JSON data |
---|---|
Record(field = IntMember(1)) | { "field": { "int": 1 } } |
Record(field = StringMember("a")) | { "field": { "string": "a" } } |
Record(field = IntMapMember(IntMap("a" -> 1))) | { "field": { "map": { "a": 1 } } } |
Record(field = IntArrayMember(IntArray(1,2,3))) | { "field": { "array": [1, 2, 3] } } |
Ordinarily, unions are defined inside other types. But if needed, typerefs may be used to define a union in a separate .courier (or .pdsc) file and give the union any desired name. See below for more details about typerefs.
Union Backward Compatibility
Strictly speaking, adding members to unions is a backward incompatible change. But in some cases, adding members can be handled in a safe and controlled fashion.
Each Courier generated union type includes a $UnknownMember that indicates an unrecognized union member was read from serialized data. $UnknownMember is primarily intended to help manage changes to the union in systems where readers and writers of the data may be using different versions of a schema, because, in such systems, a reader might receive data containing union members it does not yet recognize.
All readers are expected to check if a union is $UnknownMember when consuming it.
If the reader is able to handle $UnknownMember in a reasonable and safe way, it is encouraged to do so. If the reader requires a recognized member and cannot proceed in a reasonable way when $UnknownMember is encountered, the reader should reject the data outright (e.g. if the data was received in an HTTP POST request, respond with a 400 HTTP response status code).
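The two reader strategies described above might look roughly like this. It is a sketch only: the member payload types, `recognized` and `validate` are hypothetical, and a real service would translate the Left value into its own HTTP response.

```scala
object UnknownMemberSketch {
  // Hand-written stand-ins for generated union members (hypothetical payloads).
  sealed abstract class AnswerFormat
  case class TextEntryMember(value: String) extends AnswerFormat
  case class MultipleChoiceMember(value: List[String]) extends AnswerFormat
  case class $UnknownMember() extends AnswerFormat

  // Tolerant reader: skip members this schema version does not recognize.
  def recognized(formats: Seq[AnswerFormat]): Seq[AnswerFormat] =
    formats.filter {
      case $UnknownMember() => false
      case _                => true
    }

  // Strict reader: reject the data, reporting an HTTP-style 400 status.
  def validate(format: AnswerFormat): Either[Int, AnswerFormat] = format match {
    case $UnknownMember() => Left(400)
    case known            => Right(known)
  }
}
```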
Enum Type
Enum types may contain any number of symbols, for example:
namespace org.example
enum Fruits {
APPLE
BANANA
ORANGE
}
This will be generated as:
object Fruits extends Enumeration
where symbols are referenced as:
Fruits.APPLE
and the enum’s Scala type is:
Fruits.Fruits
Enums are referenced in other schemas either by name, e.g.:
namespace org.example
record FruitBasket {
fruit: org.example.Fruits
}
...or by inlining their type definition, e.g.:
namespace org.example
record FruitBasket {
fruit: enum Fruits { APPLE, BANANA, ORANGE }
}
The fully generated enum looks like:
object Fruits extends Enumeration {
  type Fruits = Value
  val APPLE = Value("APPLE")
  val BANANA = Value("BANANA")
  val ORANGE = Value("ORANGE")
  val $UNKNOWN = Value("$UNKNOWN") // for backward compatibility (see below)
}
Enums are represented in JSON as strings, e.g. "APPLE".
Enum documentation, deprecation and properties
Doc comments, @deprecated and properties may be added directly to enum symbols:
namespace org.example
enum Fruits {
@color = "red"
APPLE
/** Yum. */
@color = "yellow"
BANANA
@deprecated
@color = "orange"
ORANGE
}
Properties can easily be accessed from Scala code:
Fruits.BANANA.property("color")
Enum Backward Compatibility
Strictly speaking, adding a symbol to an enum is a backward incompatible change. But in some cases, adding symbols can be handled in a safe and controlled fashion.
Each Courier generated enum includes a $UNKNOWN symbol that indicates an unrecognized symbol was read from serialized data. $UNKNOWN is primarily intended to help manage changes to the enum in systems where readers and writers of the data may be using different versions of a schema, because, in such systems, a reader might receive data containing enum symbols it does not yet recognize.
All readers are expected to check if an enum is $UNKNOWN when consuming it.
If the reader is able to handle $UNKNOWN in a reasonable and safe way, it is encouraged to do so. If the reader requires a recognized symbol and cannot proceed in a reasonable way when $UNKNOWN is encountered, the reader should reject the data outright (e.g. if the data was received in an HTTP POST request, respond with a 400 HTTP response status code).
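As a rough sketch of this behaviour using only the standard Scala Enumeration, the example below resolves serialized symbols with a fallback to $UNKNOWN and lets a strict reader reject them. The `parse` and `validate` helpers are hypothetical and are not part of the generated code.

```scala
object EnumUnknownSketch {
  // Hand-written copy of the generated shape, for illustration only.
  object Fruits extends Enumeration {
    type Fruits = Value
    val APPLE = Value("APPLE")
    val BANANA = Value("BANANA")
    val ORANGE = Value("ORANGE")
    val $UNKNOWN = Value("$UNKNOWN")
  }

  // Hypothetical helper: resolve a serialized symbol, falling back to $UNKNOWN.
  def parse(symbol: String): Fruits.Fruits =
    Fruits.values.find(_.toString == symbol).getOrElse(Fruits.$UNKNOWN)

  // Strict reader: reject unrecognized symbols with an HTTP-style 400 status.
  def validate(fruit: Fruits.Fruits): Either[Int, Fruits.Fruits] =
    if (fruit == Fruits.$UNKNOWN) Left(400) else Right(fruit)
}
```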
Typerefs
Pegasus Typerefs provide a lightweight alias to any other type.
They can be used for a variety of purposes. A few common uses:
(1) Provide a name for a union, map, or array so that it can be referenced by name. E.g.:
namespace org.example
typeref AnswerTypes = union[MultipleChoice, TextEntry]
This will be generated as:
abstract class AnswerTypes
case class MultipleChoiceMember(value: MultipleChoice) extends AnswerTypes
case class TextEntryMember(value: TextEntry) extends AnswerTypes
And can be referred to from any other type using the name org.example.AnswerTypes, e.g.:
namespace org.example
record Question {
answerFormat: org.example.AnswerTypes
}
This is particularly useful because unions, maps and arrays cannot otherwise be named directly like records and enums can.
(2) Provide additional clarity when using primitive types for specific purposes.
namespace org.example
typeref UnixTimestamp = long
No classes will be generated for this typeref. In Scala, typerefs to primitives are simply bound to their referenced types (unless the typeref is defined as a custom type; see below for details). E.g. UnixTimestamp will simply be bound to Long in Scala.
Custom Types
Pegasus Custom Types allow any Scala type to be bound to any pegasus primitive type.
There are two ways to define custom types:
- For simple Scala case classes with a single element, simply define a typeref that references the class.
- For any other type, create a Coercer and define a typeref that references both the class and the coercer.
Custom Types for Scala Case Classes
Coercers are not required for Scala case classes that have only a single element.
For example, to coerce to the Scala case class:
case class SlugId(slug: String)
Define a Pegasus typeref schema like:
namespace org.example.schemas
@scala.class = "org.example.SlugId"
typeref SlugId = string
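Conceptually, a single-element case class can be bound without a coercer because the conversion in each direction is just wrapping and unwrapping the one field. The sketch below spells that out by hand. `toData` and `fromData` are hypothetical names used for illustration, not Courier APIs.

```scala
object SlugIdSketch {
  case class SlugId(slug: String)

  // Hypothetical illustration of the round trip between the case class
  // and the referenced primitive type ("string" in the typeref above).
  def toData(id: SlugId): String = id.slug
  def fromData(raw: String): SlugId = SlugId(raw)
}
```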
Coercers
For example, Joda time has a convenient DateTime class. If we wish to use this class in Scala to represent date times, all we need to do is define a pegasus custom type that binds to it:
namespace org.example
@scala.class = "org.joda.time.DateTime"
@scala.coercerClass = "org.coursera.models.common.DateTimeCoercer"
typeref DateTime = string
The coercer is responsible for converting the pegasus “referenced” type, in this case "string", to the Joda DateTime class:
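import com.linkedin.data.template.{Custom, DirectCoercer}
import org.joda.time.DateTime
import org.joda.time.format.ISODateTimeFormat
class DateTimeCoercer extends DirectCoercer[DateTime] {
  // Scala type -> Pegasus data: format the DateTime as an ISO 8601 string.
  override def coerceInput(obj: DateTime): AnyRef = {
    DateTimeCoercer.iso8601Format.print(obj)
  }
  // Pegasus data -> Scala type: parse the ISO 8601 string.
  override def coerceOutput(obj: Any): DateTime = {
    obj match {
      case string: String => DateTimeCoercer.iso8601Format.parseDateTime(string)
      case _: Any => // ...
    }
  }
}
object DateTimeCoercer {
  // Register the coercer when the companion object is first initialized.
  registerCoercer()
  def registerCoercer(): Unit = {
    Custom.registerCoercer(new DateTimeCoercer, classOf[DateTime])
  }
  val iso8601Format = ISODateTimeFormat.dateTime()
}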
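The contract a coercer has to satisfy is that the two directions round-trip. The dependency-free sketch below illustrates that contract with java.time.Instant instead of Joda Time. `StringCoercer` and `InstantCoercer` are hypothetical names loosely modelled on the coercer above, not part of Pegasus or Courier.

```scala
import java.time.Instant

object CoercerSketch {
  // Hypothetical minimal interface for a coercer over a "string" typeref.
  trait StringCoercer[T] {
    def coerceInput(value: T): String   // Scala type -> serialized string
    def coerceOutput(raw: String): T    // serialized string -> Scala type
  }

  object InstantCoercer extends StringCoercer[Instant] {
    def coerceInput(value: Instant): String = value.toString    // ISO 8601
    def coerceOutput(raw: String): Instant = Instant.parse(raw)
  }

  // Round trip: parsing the formatted value yields the original instant.
  val original: Instant = Instant.parse("2015-01-01T00:00:00Z")
  val roundTripped: Instant = InstantCoercer.coerceOutput(InstantCoercer.coerceInput(original))
}
```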
Once a custom type is defined, it can be used in any type. For example, to use the DateTime custom type in a record:
namespace org.example
record Fortune {
createdAt: org.example.DateTime
}
This will be generated as:
case class Fortune(createdAt: org.joda.time.DateTime)