Idris 语言文档

注解

Idris 文档已按照 创作共用 CC0 许可协议 发布。因此根据法律规定, Idris 社区 已放弃对 Idris 文档的所有版权以及相关或邻接的权利。

关于 CC0 的更多信息参见:https://creativecommons.org/publicdomain/zero/1.0/deed.zh

Idris 教程

本文档为 Idris 的教程,它简单介绍了如何用 Idris 语言编程。 文档中覆盖了该语言的核心特性,并假定你至少熟悉一门函数式编程语言,如 Haskell 或 OCaml。

注解

Idris 文档已按照 创作共用 CC0 许可协议 发布。因此根据法律规定, Idris 社区 已放弃对 Idris 文档的所有版权以及相关或邻接的权利。

关于 CC0 的更多信息参见:https://creativecommons.org/publicdomain/zero/1.0/deed.zh

引言

在传统编程语言中,类型 之间有明确的区分。例如在 Haskell 中, 下面这些类型分别表示整数、字符、字符列表 以及任意值的列表:

  • Int, Char, [Char], [a]

与此对应,下面这些值分别为上述类型的成员:

  • 42, ’a’, "Hello world!", [2,3,4,5,6]

然而,在带有 依赖类型(Dependent Type) 的语言中,它们的区别并不明显。 依赖类型允许类型「依赖」于值,换句话说,类型是 一等(First-Class) 的语言构造, 即它可以像值一样进行操作。标准范例就是带长度的列表类型 [1] Vect n a, 其中 a 为元素的类型,而 n 为该列表的长度且可以任意长。

当类型包含了描述其性质的值(如列表的长度)时,它就能描述函数自身的性质了。 比如连接两个列表的操作,它拥有性质:结果列表的长度为两个输入列表的长度之和。 因此我们可以为 app 函数赋予如下类型,它用于连接向量(Vector):

app : Vect n a -> Vect m a -> Vect (n + m) a

本教程介绍了 Idris,一个通用的依赖类型函数式编程语言。Idris 项目旨在为可验证的通用编程打造一个依赖类型的语言。 为此,Idris 被设计成了编译型语言,目的在于生成高效的可执行代码。 它还拥有轻量的外部函数接口,可与外部 C 库轻松交互。

目标受众

本教程面向已经熟悉函数式语言(如 HaskellOCaml )的读者,旨在简要地介绍该语言。 尽管大多数概念都会有简单的解释,读者仍须熟悉 Haskell 的语法。 我们亦假设读者有兴趣使用依赖类型来编写和验证系统软件。

对 Idris 更加深入的介绍,见 Edwin Brady 所著的《 Idris 类型驱动开发 》, 其进度要慢得多,涵盖了交互式程序开发以及更多的例子。本书可从 Manning 获取。

示例代码

本教程中的示例代码通过了 Idris 的测试。这些文件可在 Idris 发行版的 samples 目录中找到,因此您可以轻松使用它们。然而,我们强烈建议您不要只是载入后阅读, 而是亲自输入它们。

[1]也许会令人困惑,它在依赖类型编程的文献中通常被称作「向量 (Vector) 」。

入门

前提需求

在安装 Idris 之前,你需要确认是否拥有所有必须的库和工具。你需要:

  • 近期版本的 GHC 。目前测试过的最早版本为 7.10.3.
  • GNU 多精度运算库 (GMP) 可从 MacPorts/Homebrew 和所有主流 Linux 发行版中获取。

下载并安装

如果你满足所有的前提需求,那么安装 Idris 的最简方式就是在命令行输入:

cabal update; cabal install idris

这会安装 Hackage 中的最新版本及其所有依赖。如果你想要最新开发版的话, 可以在 GitHub 上找到它, 然后根据构建指令来安装。

如果你之前从未安装过使用 Cabal 的东西,Idris 可能不在你的 PATH 中。 如果 Idris 可执行文件无法找到,请将 ~/.cabal/bin 添加到 $PATH 环境变量中。Mac OS X 用户则需要添加 ~/Library/Haskell/bin,Windows 用户通常会发现 Cabal 将程序安装在 %HOME%\AppData\Roaming\cabal\bin 中。

为了检查安装是否成功,你需要编写第一个程序。请创建一个名为 hello.idr 的文件, 其中包含以下文本:

module Main

main : IO ()
main = putStrLn "Hello world"

如果你熟悉 Haskell,那会很清楚这段程序在做什么,是如何做的; 如果你对它感到陌生,我们稍后会详细地解释。你可以在命令行中输入 idris hello.idr -o hello 来将程序编译成可执行文件。 这会创建一个名为 hello 的可执行文件,你可以运行它:

$ idris hello.idr -o hello
$ ./hello
Hello world

请注意,美元符号 $ 表示命令行。下面是一些常用的 Idris 命令行选项:

  • -o prog 编译成名为 prog 的可执行文件。
  • --check 无需启动交互式环境就对文件及其依赖进行类型检查。
  • --package pkg 将包添加为依赖,例如通过 --package contrib 来使用 contrib 包。
  • --help 显示用法摘要和命令行选项。

交互式环境

在命令行中输入 idris 来启动交互式环境。你会看到如下内容:

$ idris
    ____    __     _
   /  _/___/ /____(_)____
   / // __  / ___/ / ___/     Version 1.3.1
 _/ // /_/ / /  / (__  )      http://www.idris-lang.org/
/___/\__,_/_/  /_/____/       Type :? for help

Idris>

它会提供一个 ghci 风格的界面,可以像类型检查那样求值表达式、进行定理证明、 编译、编辑、以及执行多种其它操作。命令 :? 会列出所支持的命令。在以下示例中, hello.idr 已被加载,main 的类型已通过检查,之后该程序被编译成了可执行的 hello。 在对某文件类型检查时,如果通过,就会创建它的的字节码版本(本例中为 hello.ibc) 以提升未来的加载速度。在源文件被修改之后,字节码会重新生成。

$ idris hello.idr
     ____    __     _
    /  _/___/ /____(_)____
    / // __  / ___/ / ___/     Version 1.3.1
  _/ // /_/ / /  / (__  )      http://www.idris-lang.org/
 /___/\__,_/_/  /_/____/       Type :? for help

Type checking ./hello.idr
*hello> :t main
Main.main : IO ()
*hello> :c hello
*hello> :q
Bye bye
$ ./hello
Hello world

类型与函数

原语类型

Idris 定义了一些原语(Primitive)类型:IntIntegerDouble 用于数值类型, CharString 用于文本操作,Ptr 则表示外部指针。库中还声明了一些数据类型, 包括 Bool 及其值 TrueFalse。我们可以用这些类型来声明常量。 将以下内容保存到文件 Prims.idr 中,在命令行中输入 idris Prims.idr 以将其加载到 Idris 的交互式环境中:

module Prims

x : Int
x = 42

foo : String
foo = "Sausage machine"

bar : Char
bar = 'Z'

quux : Bool
quux = False

提示

原语(Primitive)本意为不可分割的基本构件,原语类型即最基本的类型。

Idris 文件由一个可选的模块声明(此处为 module Prims),一个可选的导入列表, 一组声明及其定义构成。在本例中并未指定导入。Idris 可由多个模块构成, 每个模块中的每个定义都有它自己的命名空间。这点会在 模块与命名空间 一节中进一步讨论。在编写 Idris 程序时,声明的顺序和缩进都很重要。 函数和数据类型必须先定义再使用,每个定义都必须有类型声明,例如之前列出的 x : Intfoo : String。新声明的缩进级别必须与之前的声明相同。 此外,分号 ; 也可用于结束声明。

库模块 Prelude 会在每个 Idris 程序中自动导入,包括 IO 功能,算术运算, 数据结构以及多种通用函数。Prelude 中定义了一些算术和比较运算符, 我们可以在提示符中使用它们。在提示符中进行求值会给出一个答案及其类型。例如:

*prims> 6*6+6
42 : Integer
*prims> x == 6*6+6
True : Bool

Idris 为原语类型定义了所有的普通算术和比较运算。它们通过接口进行了重载, 并可被扩展以适用于用户定义的类型,这点我们会在 接口 一节中讨论。布尔表达式可通过 if...then...else 构造来测试,例如:

*prims> if x == 6 * 6 + 6 then "The answer!" else "Not the answer"
"The answer!" : String

数据类型

数据类型的声明方式和语法与 Haskell 非常相似。例如,自然数和列表的声明如下:

data Nat    = Z   | S Nat           -- 自然数(零与后继)
data List a = Nil | (::) a (List a) -- 多态列表

以上声明来自于标准库。一进制自然数要么为零(Z), 要么就是另一个自然数的后继(S k);列表要么为空(Nil), 要么就是一个值被添加在另一个列表之前(x :: xs)。 在 List 的声明中,我们使用了中缀操作符 ::。 像这样的新操作符可通过缀序声明(Fixity Declaration)来添加,如下所示:

infixr 10 ::

函数、数据构造器与类型构造器均可用名字作为中缀操作符。它们被括在括号中时, 也可作为前缀形式使用,例如 (::)。中缀操作符可使用下面的任何符号:

:+-*\/=.?|&><!@$%^~#

有些以这些符号构成的操作符无法被用户定义。它们是 :=>-><-=?=|**==>\%~? 以及 !

函数

函数通过模式匹配(Pattern Matching)实现,语法同样和 Haskell 类似。 其主要的不同在于 Idris 的所有函数都需要类型声明,声明中使用单冒号 : (而非 Haskell 的双冒号 :: )。自然数的某些运算函数定义如下,同样来自标准库:

-- 一进制加法
plus : Nat -> Nat -> Nat
plus Z     y = y
plus (S k) y = S (plus k y)

-- 一进制乘法
mult : Nat -> Nat -> Nat
mult Z     y = Z
mult (S k) y = plus y (mult k y)

标准算术运算符 +* 同样根据 Nat 的需要进行了重载, 它们使用上面的函数来定义。和 Haskell 不同的是,类型和函数名的首字母并无大小写限制。 函数名(前面的 plusmult ),数据构造器(ZSNil::) 以及类型构造器(NatList)均属同一命名空间。不过按照约定, 数据类型和构造器的名字通常以大写字母开头。我们可以在 Idris 提示符中测试这些函数:

Idris> plus (S (S Z)) (S (S Z))
4 : Nat
Idris> mult (S (S (S Z))) (plus (S (S Z)) (S (S Z)))
12 : Nat

注解

在显示一个 Nat 元素,如 (S (S (S (S Z)))) 时,Idris 会将其显示为 4plus (S (S Z)) (S (S Z)) 的结果实际上为 (S (S (S (S Z)))), 即自然数 4。这点可在 Idris 提示符中验证:

Idris> (S (S (S (S Z))))
4 : Nat

和算术运算符一样,整数字面也可通过接口重载,因此我们也能像下面这样测试函数:

Idris> plus 2 2
4 : Nat
Idris> mult 3 (plus 2 2)
12 : Nat

你可能会很好奇,既然计算机已经完美内建了整数运算,我们为何还需要一进制的自然数? 主要的原因在于一进制数的结构非常便于推理,而且易与其它数据结构建立联系, 我们之后就会看到。尽管如此,我们并不希望以牺牲效率为代价获得这种便捷。幸运的是, Idris 知道 Nat (以及类似的结构化类型)和数之间的联系, 这意味着 Idris 可以优化它们的表示以及像 plusmult 这样的函数。

where 从句

函数也可通过 where 从句来 局部 地定义。例如,要定义用来反转列表的函数, 我们可以使用辅助函数来累加新的,反转后的列表,并且它无需全局可见:

reverse : List a -> List a
reverse xs = revAcc [] xs where
  revAcc : List a -> List a -> List a
  revAcc acc [] = acc
  revAcc acc (x :: xs) = revAcc (x :: acc) xs

缩进是十分重要的,where 块中函数的缩进层次必须比外层函数更深。

注解

作用域

任何外层作用域中可见,且没有被重新被定义过的名字,在 where 从句中也可见 (这里的 xs 被重新定义了)。若某个名字是某个类型的 形参(Parameter),那么仅当它在类型中出现时才会在 where 从句的作用域中,即,它在整体结构中是固定不变的。

除函数外,where 块中也可包含局部数据声明,以下代码中的的 MyLT 就无法在 foo 的定义之外访问。

foo : Int -> Int
foo x = case isLT of
            Yes => x*2
            No => x*4
    where
       data MyLT = Yes | No

       isLT : MyLT
       isLT = if x < 20 then Yes else No

通常,where 从句中定义的函数和其它顶层函数一样,都需要类型声明。 然而,函数 f 的类型声明可在以下情况中省略:

  • f 出现在顶层定义的右边
  • f 的类型完全可以通过其首次应用来确定

因此,举例来说,以下定义是合法的:

even : Nat -> Bool
even Z = True
even (S k) = odd k where
  odd Z = False
  odd (S k) = even k

test : List Nat
test = [c (S 1), c Z, d (S Z)]
  where c x = 42 + x
        d y = c (y + 1 + z y)
              where z w = y + w

Idris 程序中可以挖 坑(Hole) 来表示未完成的部分。例如,我们可以在「Hello world」 程序中为问候语 greeting 挖一个坑:

main : IO ()
main = putStrLn ?greeting

语法 ?greeting 挖了个坑,它表示程序中尚未写完的部分。这是个有效的 Idris 程序,你可以检查 greeting 的类型:

*Hello> :t greeting
--------------------------------------
greeting : String

检查坑的类型也会显示作用域中所有变量的类型。例如,给定一个未完成的 even 定义:

even : Nat -> Bool
even Z = True
even (S k) = ?even_rhs

我们可以检查 even_rhs 的类型,查看期望的返回类型,以及变量 k 的类型:

*Even> :t even_rhs
  k : Nat
--------------------------------------
even_rhs : Bool

坑非常有用,因为它能帮助我们 逐步地 编写函数。我们无需一次写完整个函数, 而是留下一些尚未编写的部分,让 Idris 告诉我们如何完成其定义。

提示

lhs(left hand side) 与 rhs(right hand side)分别表示等式中等号的左边和右边, 即左式和右式。

依赖类型

一等类型

在 Idris 中,类型是一等(First-Class)的,即它们可以像其它的语言构造那样被计算和操作 (以及传给函数)。例如,我们可以编写一个用来计算类型的函数:

isSingleton : Bool -> Type
isSingleton True = Nat
isSingleton False = List Nat

该函数可从一个 Bool 值计算出适当的类型,布尔值表示其类型是否为一个单例(Singleton)。 我们可以在任何能够使用类型的地方用该函数计算出一个类型。例如,它可用于计算返回类型:

mkSingle : (x : Bool) -> isSingleton x
mkSingle True = 0
mkSingle False = []

它也可拥有不同的输入类型。以下函数能够计算 Nat 列表之和,或者返回给定的 Nat,这取决于单例标记 single 是否为 True

sum : (single : Bool) -> isSingleton single -> Nat
sum True x = x
sum False [] = 0
sum False (x :: xs) = x + sum False xs
向量

依赖类型的一个范例就是「带长度的列表」类型,在依赖类型的文献中, 它通常被称作向量(Vector)。向量作为 Idris 库的一部分,可通过导入 Data.Vect 来使用,当然我们也自己声明它:

data Vect : Nat -> Type -> Type where
   Nil  : Vect Z a
   (::) : a -> Vect k a -> Vect (S k) a

注意我们使用了与 List 相同的构造器名。只要名字声明在不同的命名空间内 (在实践中,通常在不同的模块内),Idris 就能接受像这样的特设(ad-hoc)名重载。 有歧义的构造器名称通常可根据上下文来解决。

这里声明了一个类型族(Type Family),该声明的形式与之前的简单类型声明不太一样。 我们显式地描述了类型构造器 Vect 的类型,它接受一个 Nat 和一个类型作为参数,其中 Type 表示类型的类型。我们说 Vect 通过 Nat索引,并被 Type 参数化 。 每个构造器会产生该类型家族的不同部分。 Nil 只能用于构造零长度的向量, 而 :: 用于构造非零长度的向量。在 :: 的类型中,我们显式地指定了一个类型为 a 的元素和一个类型为 Vect k a 的尾部(Tail)(即长度为 k 的向量), 二者构成了一个长度为 S k 的向量。

List 以及 Nat 这类简单类型一样,我们可以通过模式匹配以同样的方式为 Vect 这样的依赖类型定义函数。 作用于 Vect 的函数的类型能够描述所涉及向量的长度会如何变化。例如,下面定义的 ++ 用于连接两个 Vect

(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) Nil       ys = ys
(++) (x :: xs) ys = x :: xs ++ ys

(++) 的类型描述了结果向量的长度必须为输入向量的长度之和。 如果我们以某种方式给出了错误的定义使其不成立,那么 Idris 就不会接受该定义。 例如:

(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) Nil       ys = ys
(++) (x :: xs) ys = x :: xs ++ xs -- 有误

在经由 Idris 类型检查器检查时,它会给出以下结果:

$ idris VBroken.idr --check
VBroken.idr:9:23-25:
When checking right hand side of Vect.++ with expected type
        Vect (S k + m) a

When checking an application of constructor Vect.:::
        Type mismatch between
                Vect (k + k) a (Type of xs ++ xs)
        and
                Vect (plus k m) a (Expected type)

        Specifically:
                Type mismatch between
                        plus k k
                and
                        plus k m

该错误信息指出两个向量的长度不匹配:我们需要一个长度为 k + m 的向量, 而你提供了一个长度为 k + k 的向量。

有限集

有限集,顾名思义,即元素有限的集合。它作为 Idris 库的一部分,可通过导入 Data.Fin 来使用,当然也可以像下面这样声明它:

data Fin : Nat -> Type where
   FZ : Fin (S k)
   FS : Fin k -> Fin (S k)

从它的签名中,我们可以看出该类型构造器接受一个 Nat,然后产生一个 类型 。 因此,它不是一个「对象的容器」意义上的集合,而是个拥有无名元素的一般集合。举例来说, 就是「存在一个包含五个元素的集合」的那种集合。实际上,它是一个捕获了所有落入零至 (n - 1) 范围内的整数的类型,其中 n 是用于实例化 Fin 类型的参数。例如, Fin 5 可被视作从 0 到 4 之间的整数的类型。

我们来仔细地观察一下它的构造器。

对于拥有 S k 个元素的有限集来说,FZ 是它的第零个元素, FS n 则是它的第 n+1 个元素。 Fin 通过 Nat 来索引, 它表示该集合中元素的个数。由于我们无法构造出属于空集的元素,因此也就无法构造出 Fin Z

如之前提到的, Fin 家族的用途之一在于表示有界的自然数集。由于前 n 个自然数构成了一个含有 n 个元素的有限集,我们可以将 Fin n 视作大于等于零且小于 n 的整数集。

例如,下面的函数根据给定的有界索引 Fin n 找出 Vect 中的元素, 它在 Prelude 中定义为:

index : Fin n -> Vect n a -> a
index FZ     (x :: xs) = x
index (FS k) (x :: xs) = index k xs

该函数从一个向量中找出给定位置的值。位置的边界由该向量的长度所界定 (每种情况下都是 n ),因此无需在运行时进行边界检查。类型检查器保证了 位置不会大于该向量的长度,当然也不会小于零。

注意这里也没有 Nil 的情况,因为这种情况不可能存在。 由于没有类型为 Fin Z 且位置为 Fin n 的元素,因此 n 不可能是 Z。 因此,如果你试图在一个空向量中查找元素,就会得到一个编译时的类型错误, 因为这样做会强行令 nZ

隐式参数

我们再仔细观察一下 index 的类型:

index : Fin n -> Vect n a -> a

它接受两个参数:一个类型为 n 元素有限集的元素,以及一个类型为 an 元素向量。不过这里还有两个名字:na,它们未被显式地声明。index 包含了 隐式 参数。我们也可以将 index 的类型写作:

index : {a:Type} -> {n:Nat} -> Fin n -> Vect n a -> a

隐式参数在类型声明的大括号 {} 中给定,它们并没有在应用 index 时给出, 因为它们的值可以从 Fin nVect n a 的参数类型中推导出来。 任何以小写字母开头,在类型声明中作为形参和索引出现的名字都不会应用到任何实参上, 它们 总是 会作为隐式参数被自动绑定。隐式参数仍然可以在应用时通过 {a=value}{n=value} 来显式地给定,例如:

index {a=Int} {n=2} FZ (2 :: 3 :: Nil)

实际上,无论是隐式还是显式,任何参数都可以给定一个名称。我们可以将 index 声明成这样:

index : (i:Fin n) -> (xs:Vect n a) -> a

写不写它纯属偏好问题,不过有时它能让参数更加明确,有助于函数文档的记录。

此外, {} 在等号左边时可用作模式匹配,即 {var = pat} 获取一个隐式变量, 并试图对「pat」进行模式匹配。例如:

isEmpty : Vect n a -> Bool
isEmpty {n = Z} _   = True
isEmpty {n = S k} _ = False
using」记法

有时为隐式参数提供类型会十分有用,特别是存在依赖顺序,或隐式参数本身含有依赖的情况下。 例如,我们可能希望在以下定义中指明隐式参数的类型,它为向量定义了前提(它也在 Data.VectElem 下定义):

data IsElem : a -> Vect n a -> Type where
   Here :  {x:a} ->   {xs:Vect n a} -> IsElem x (x :: xs)
   There : {x,y:a} -> {xs:Vect n a} -> IsElem x xs -> IsElem x (y :: xs)

IsElem x xs 的实例描述了 xxs 中的一个元素。我们可以构造这样的谓词: 若所需的元素在向量的头部时为 Here,在向量的尾部中时则为 There。例如:

testVec : Vect 4 Int
testVec = 3 :: 4 :: 5 :: 6 :: Nil

inVect : IsElem 5 Main.testVec
inVect = There (There Here)

重要

隐式参数与作用域

在类型签名中,类型检查器会将所有以小写字母开头 并且 没有应用到别的东西上的变量 视作隐式变量。要让上面的代码示例能够编译,你需要为 testVec 提供一个限定名。 在前面的例子中,我们假设该代码处于 Main 模块内。

如果大量使用相同的隐式参数,就会导致定义难以阅读。为避免此问题,可使用 using 块来为任何在块中出现的隐式参数指定类型和顺序:

using (x:a, y:a, xs:Vect n a)
  data IsElem : a -> Vect n a -> Type where
     Here  : IsElem x (x :: xs)
     There : IsElem x xs -> IsElem x (y :: xs)
注:声明顺序与 mutual 互用块

通常,函数与数据类型必须在使用前定义,因为依赖类型允许函数作为类型的一部分出现, 而类型检查会依赖于特定的函数如何定义(尽管这只对全函数成立,见 完全性检查)。 然而,此限制可通过使用 mutual 互用块来放宽,它允许数据类型和函数同时定义:

mutual
  even : Nat -> Bool
  even Z = True
  even (S k) = odd k

  odd : Nat -> Bool
  odd Z = False
  odd (S k) = even k

mutual 块中,首先所有的类型声明会被添加,然后是函数体。 因此,没有一个函数类型可以依赖于块中任何函数的归约行为。

I/O

如果计算机程序不能通过某种方式与用户或系统进行交互,那么它基本上没什么用。在 Idris 这样纯粹(Pure)的语言中,表达式没有副作用(Side-Effect)。而 I/O 的难点在于它本质上是带有副作用的。因此在 Idris 中,这样的交互被封装在 IO 类型中:

data IO a -- IO 操作返回一个类型为 a 的值

我们先给出 IO 抽象的定义,它本质上描述了被执行的 I/O 操作是什么, 而非如何去执行它们。最终操作则由运行时系统在外部执行。我们已经见过一个带 IO 的程序了:

main : IO ()
main = putStrLn "Hello world"

putStrLn 的类型描述了它接受一个字符串,然后通过 I/O 活动返回一个单元类型 () 的元素。它还有一个变体 putStr 用来输出字符串但不换行。

putStrLn : String -> IO ()
putStr   : String -> IO ()

我们可以从用户的输入中读取字符串:

getLine : IO String

Prelude 中定义了很多 I/O 操作,例如为了读写文件,需要包含:

data File -- abstract
data Mode = Read | Write | ReadWrite

openFile : (f : String) -> (m : Mode) -> IO (Either FileError File)
closeFile : File -> IO ()

fGetLine : (h : File) -> IO (Either FileError String)
fPutStr : (h : File) -> (str : String) -> IO (Either FileError ())
fEOF : File -> IO Bool

注意其中几个函数会返回 Either ,因为它们可能会失败。

do」记法

I/O 程序通常需要串连起多个活动,将一个计算的输出送入下一个计算的输入中。 然而,IO 是一个抽象类型,因此我们无法直接访问一个计算的结果。 因此,我们用 do-记法来串连起操作:

greet : IO ()
greet = do putStr "What is your name? "
           name <- getLine
           putStrLn ("Hello " ++ name)

语法 x <- iovalue 执行 IO a 类型的 I/O 操作 iovalue,然后将类型为 a 的结果送入变量 x 中。在这种情况下,getLine 会返回一个 IO String,因此 name 的类型为 String。缩进十分重要: do 语句块中的每个语句都必须从同一列开始。pure 操作允许我们将值直接注入到 IO 操作中:

pure : a -> IO a

后面我们会看到,do-记法比这里的展示更加通用,并且可以被重载。

惰性

通常,函数的参数会在函数被调用前求值(也就是说,Idris 采用了 及早(Eager) 求值策略)。然而,这并不总是最佳的方式。考虑以下函数:

ifThenElse : Bool -> a -> a -> a
ifThenElse True  t e = t
ifThenElse False t e = e

该函数会使用参数 te 二者之一,而非二者都用(我们之后会看到其实它被用于实现 if...then...else 构造)。我们更希望 只有 用到的参数才被求值。为此, Idris 提供了 Lazy 数据类型,它允许暂缓求值:

data Lazy : Type -> Type where
     Delay : (val : a) -> Lazy a

Force : Lazy a -> a

类型为 Lazy a 的值只有通过 Force 强制求值时才会被求值。Idris 类型检查器知道 Lazy 类型,并会在必要时在 Lazy aa 之间插入转换, 反之亦同。因此我们可以将 ifThenElse 写成下面这样,无需任何 ForceDelay 的显式使用:

ifThenElse : Bool -> Lazy a -> Lazy a -> a
ifThenElse True  t e = t
ifThenElse False t e = e

余数据类型

我们可以通过余数据类型,将递归参数标记为潜在无穷来定义无穷数据结构。对于一个余数据类型 T,其每个构造器中类型为 T 的参数都会被转换成类型为 Inf T 的参数。 这会让每个 T 类型的参数惰性化,使得类型为 T 的无穷数据结构得以构建。 余数据类型的一个例子为 Stream,其定义如下:

codata Stream : Type -> Type where
  (::) : (e : a) -> Stream a -> Stream a

它会被编译器翻译成下面这样:

data Stream : Type -> Type where
  (::) : (e : a) -> Inf (Stream a) -> Stream a

以下是如何用余数据类型 Stream 来构建无穷数据结构的例子。 在这里我们创建了一个 1 的无穷流:

ones : Stream Nat
ones = 1 :: ones

要重点注意:余数据类型不允许创建互用的无穷递归数据结构。 例如,以下代码会创建一个无穷循环并导致栈溢出:

mutual
  codata Blue a = B a (Red a)
  codata Red a = R a (Blue a)

mutual
  blue : Blue Nat
  blue = B 1 red

  red : Red Nat
  red = R 1 blue

mutual
  findB : (a -> Bool) -> Blue a -> a
  findB f (B x r) = if f x then x else findR f r

  findR : (a -> Bool) -> Red a -> a
  findR f (R x b) = if f x then x else findB f b

main : IO ()
main = do printLn $ findB (== 1) blue

为了修复它,我们必须为构造器参数的类型显式地加上 Inf 声明,因为余数据类型 不会将它添加到和正在定义的构造器类型 不同 的构造器参数上。例如,以下程序输出「1」。

mutual
  data Blue : Type -> Type where
   B : a -> Inf (Red a) -> Blue a

  data Red : Type -> Type where
   R : a -> Inf (Blue a) -> Red a

mutual
  blue : Blue Nat
  blue = B 1 red

  red : Red Nat
  red = R 1 blue

mutual
  findB : (a -> Bool) -> Blue a -> a
  findB f (B x r) = if f x then x else findR f r

  findR : (a -> Bool) -> Red a -> a
  findR f (R x b) = if f x then x else findB f b

main : IO ()
main = do printLn $ findB (== 1) blue

提示

「归纳数据类型」和「余归纳数据类型」

余数据类型(Codata Type)的全称为余归纳数据类型(Coinductive Data Type), 归纳数据类型和余归纳数据类型是对偶的关系。从语义上看, Inductive Type 描述了如何从更小的 term 构造出更大的 term;而 Coinductive Type 则描述了如何从更大的 term 分解成更小的 term。 二者即为塔斯基不动点定理中的最大不动点(对应余归纳)和最小不动点(对应归纳)。 参考自 Belleve 的回答。

常用数据类型

Idris 包含了很多常用的数据类型和库函数(见发行版中的 libs/ 目录及 文档 )。本节描述了其中的一部分。 作为 Prelude.idr 的一部分,下面描述的函数都会被每个 Idris 程序自动导入,

ListVect

我们已经见过 ListVect 数据类型了:

data List a = Nil | (::) a (List a)

data Vect : Nat -> Type -> Type where
   Nil  : Vect Z a
   (::) : a -> Vect k a -> Vect (S k) a

注意它们的构造器名称是相同的:只要构造器名称在不同的命名空间中声明, 它们就可以被重载(其实一般的名字都可以),并且通常会根据其类型来确定。 作为一种语法糖,任何带有 Nil:: 构造其名的类型都可被写成列表的形式。 例如:

  • [] 表示 Nil
  • [1,2,3] 表示 1 :: 2 :: 3 :: Nil

库中还定义了一些用于操作这些类型的函数。 mapListVect 都进行了重载,它将一个函数应用到列表或向量中的每个元素上。

map : (a -> b) -> List a -> List b
map f []        = []
map f (x :: xs) = f x :: map f xs

map : (a -> b) -> Vect n a -> Vect n b
map f []        = []
map f (x :: xs) = f x :: map f xs

例如,给定以下整数向量,以及一个将整数乘以 2 的函数:

intVec : Vect 5 Int
intVec = [1, 2, 3, 4, 5]

double : Int -> Int
double x = x * 2

函数 map 可像下面这样将该向量中的每个元素乘以二:

*UsefulTypes> show (map double intVec)
"[2, 4, 6, 8, 10]" : String

更多可用于 ListVect 的函数的详情请参阅以下库文件:

  • libs/prelude/Prelude/List.idr
  • libs/base/Data/List.idr
  • libs/base/Data/Vect.idr
  • libs/base/Data/VectType.idr

其中包括过滤、追加、反转等函数。

题外话:匿名函数与操作符段

上面的表达式其实还有更加利落的写法。其中一种就是使用匿名函数(Anonymous Function):

*UsefulTypes> show (map (\x => x * 2) intVec)
"[2, 4, 6, 8, 10]" : String

记法 \x => val 构造了一个匿名函数,它接受一个参数 x 并返回表达式 val。 匿名函数可以接受多个参数,它们以逗号分隔,如 \x, y, z => val。 参数也可以显式地给定类型,如 \x : Int => x * 2,也可使用模式匹配,如 \(x, y) => x + y

*UsefulTypes> show (map (* 2) intVec)
"[2, 4, 6, 8, 10]" : String

(*2) 是将数字乘以 2 的函数的简写,它会被展开为 \x => x * 2。 同样,(2*) 会被展开为 \x => 2 * x

提示

匿名函数在函数式编程中又称为 λ-表达式(Lambda Expression)。

Maybe

Maybe 描述了可选值,表示给定类型的值是否存在:

data Maybe a = Just a | Nothing

Maybe 是一种为可能失败的操作赋予类型的方式。例如,在 List (而非向量)中查找时可能会产生越界错误:

list_lookup : Nat -> List a -> Maybe a
list_lookup _     Nil         = Nothing
list_lookup Z     (x :: xs) = Just x
list_lookup (S k) (x :: xs) = list_lookup k xs

maybe 函数用于处理 Maybe 类型的值,如果值存在就对其应用一个函数, 否则提供一个默认值:

maybe : Lazy b -> Lazy (a -> b) -> Maybe a -> b

注意前两个参数的类型被封装在 Lazy 内。由于二者只有其一会被使用, 而计算完大型表达式之后就丢弃会造成浪费,因此我们将它们标记为 Lazy

元组

值可以通过以下内建的数据类型构成序对(Pair):

data Pair a b = MkPair a b

序对的语法糖可以写成 (a, b),根据上下文,其意思为 Pair a bMkPair a b。元组(Tuple)可包含任意个数的值,它通过嵌套的序对来表示:

fred : (String, Int)
fred = ("Fred", 42)

jim : (String, Int, String)
jim = ("Jim", 25, "Cambridge")
*UsefulTypes> fst jim
"Jim" : String
*UsefulTypes> snd jim
(25, "Cambridge") : (Int, String)
*UsefulTypes> jim == ("Jim", (25, "Cambridge"))
True : Bool
依赖序对

依赖序对允许序对第二个元素的类型依赖于第一个元素的值。

data DPair : (a : Type) -> (P : a -> Type) -> Type where
   MkDPair : {P : a -> Type} -> (x : a) -> P x -> DPair a P

同样,它也有语法糖。(a : A ** P) 表示由 A 和 P 构成的序对的类型,其中名字 a 可出现在 P 中。( a ** p ) 会构造一个该类型的值。例如, 我们可以将一个数和一个特定长度的 Vect 构造成一个序对:

vec : (n : Nat ** Vect n Int)
vec = (2 ** [3, 4])

如果你喜欢,也可以把它写成较长的形式,二者完全等价:

vec : DPair Nat (\n => Vect n Int)
vec = MkDPair 2 [3, 4]

当然,类型检查器可以根据向量的的长度推断出第一个元素的值。 我们可以在希望类型检查器填写值的地方写一个下划线 _,这样上面的定义也可以写作:

vec : (n : Nat ** Vect n Int)
vec = (_ ** [3, 4])

有时我们也更倾向于省略该序对第一个元素的类型,同样,它也可以被推断出来:

vec : (n ** Vect n Int)
vec = (_ ** [3, 4])

依赖序对的一个用处就是返回依赖类型的值,其中的索引未必事先知道。例如, 若按照某谓词过滤出 Vect 中的元素,我们不会事先知道结果向量的长度:

filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)

如果 Vect 为空,结果很简单:

filter p Nil = (_ ** [])

:: 的情况下,我们需要检查 filter 递归调用的结果来提取结果的长度和向量。 为此,我们使用 with 记法,它允许我们对中间值进行模式匹配:

filter p (x :: xs) with (filter p xs)
  | ( _ ** xs' ) = if (p x) then ( _ ** x :: xs' ) else ( _ ** xs' )

我们之后会看到 with 的更多详情。

依赖序对有时被称作「Sigma 类型」。

记录

记录(Record) 数据类型将多个值(记录的 字段(Field))收集在一起。 Idris 提供了定义记录的语法,它也会自动生成用于访问和更新字段的函数。 和数据结构的语法不同,Idris 中的记录遵循与 Haskell 中看起来不同的语法。 例如,我们可以将一个人的名字和年龄用记录表示:

record Person where
    constructor MkPerson
    firstName, middleName, lastName : String
    age : Int

fred : Person
fred = MkPerson "Fred" "Joe" "Bloggs" 30

构造器名称由 constructor 关键字确定,字段where 关键字之后的缩进块中给定(此处为 firstNamemiddleNamelastName 以及 age)。

*Record> firstName fred
"Fred" : String
*Record> age fred
30 : Int
*Record> :t firstName
firstName : Person -> String

我们也可以用字段名来更新一个记录(确切来说,会产生一个更新了给定字段的记录的副本):

*Record> record { firstName = "Jim" } fred
MkPerson "Jim" "Joe" "Bloggs" 30 : Person
*Record> record { firstName = "Jim", age $= (+ 1) } fred
MkPerson "Jim" "Joe" "Bloggs" 31 : Person

语法 record { field = val, ... } 会生成一个更新了记录中给定字段的函数。= 为字段赋予新值,而 $= 通过应用一个函数来更新其值。

每个记录在它自己的命名空间中定义,这意味着字段名可在多个记录中重用。

记录以及记录中的字段可拥有依赖类型。更新允许更改字段的类型,前提是结果是良类型的。

record Class where
    constructor ClassInfo
    students : Vect n Person
    className : String

students 的字段更新为不同长度的向量是安全的,因为它不会影响该记录的类型:

addStudent : Person -> Class -> Class
addStudent p c = record { students = p :: students c } c
*Record> addStudent fred (ClassInfo [] "CS")
ClassInfo [MkPerson "Fred" "Joe" "Bloggs" 30] "CS" : Class

我们也可以用 $= 来更简洁地定义 addStudent

addStudent' : Person -> Class -> Class
addStudent' p c = record { students $= (p ::) } c
嵌套记录的更新

Idris 也提供了便于访问和更新嵌套记录的语法。例如,若一个字段可以通过表达式 c (b (a x)) 访问,那么它就可通过以下语法更新:

record { a->b->c = val } x

这会返回一个新的记录,通过路径 a->b->c 访问的字段会被设置为 val。 该语法是一等的,即 record { a->b->c = val } 本身拥有一个函数类型。 与此对应,你也可以使用以下语法访问该字段:

record { a->b->c } x

$= 记法对嵌套记录的更新亦有效。

依赖记录

记录也可依赖于值。记录拥有 形参,它无法像其它字段那样更新。 形参作为结果类型的参数出现,写在记录类型名之后。例如,一个序对类型可定义如下:

record Prod a b where
    constructor Times
    fst : a
    snd : b

使用前面的 class 记录,班级的大小可用 Vect 及通过 size 参数化该记录的大小来限制其类型。例如:

record SizedClass (size : Nat) where
    constructor SizedClassInfo
    students : Vect size Person
    className : String

注意 它无法再使用之前的 addStudent 函数了,因为这会改变班级的大小。 现在用于添加学生的函数必须在类型中指定班级的大小加一。由于其大小用自然数指定, 新的值可使用 S 构造器递增:

addStudent : Person -> SizedClass n -> SizedClass (S n)
addStudent p c =  SizedClassInfo (p :: students c) (className c)

更多表达式

let 绑定

中间值可使用 let 绑定来计算:

mirror : List a -> List a
mirror xs = let xs' = reverse xs in
                xs ++ xs'

我们也可以在 let 绑定中进行简单的模式匹配。例如,我们可以按如下方式从记录中提取字段, 也可以通过顶层的模式匹配来提取字段:

data Person = MkPerson String Int

showPerson : Person -> String
showPerson p = let MkPerson name age = p in
                   name ++ " is " ++ show age ++ " years old"
列表推导

Idris 提供了 推导 记法作为构建列表的简便写法。一般形式为:

[ expression | qualifiers ]

它会根据逗号分隔的限定式 qualifiers 给定的条件,通过求值表达式 expression 产生的值来生成列表。例如,我们可以按如下方式构建构建勾股三角的列表:

pythag : Int -> List (Int, Int, Int)
pythag n = [ (x, y, z) | z <- [1..n], y <- [1..z], x <- [1..y],
                         x*x + y*y == z*z ]

[a..b] 是另一种构建从 ab 的数列的简便记法。此外, [a,b..c]ab 之差为增量,构建从 ac 的数列。 它可作用于 NatInt 以及 Integer,它们使用了 Prelude 中的 enumFromToenumFromThenTo 函数。

提示

推导式

推导式(Comprehension)来源于集合的构建方法,即 {x|x∈X⋀Φ(x)},其中的 Φ(x) 即为限定式(Qualifier)。详情参见 维基百科

case 表达式

另一种检查 简单 类型的中间值的方法是使用 case 表达式。例如, 以下函数在给定的字符处将字符串分为两部分:

splitAt : Char -> String -> (String, String)
splitAt c x = case break (== c) x of
                  (x, y) => (x, strTail y)

break 是个库函数,它从给定的函数返回 true 的位置将字符串分为一个字符串的序对。 我们接着解构它返回的序对,并移除第二个字符串的第一个字符。

一个 case 表达式可匹配多种情况,例如去检查一个类型为 Maybe a 的中间值。 回想 list_lookup,它按索引查找列表中的元素,若索引越界则返回 Nothing。 我们可以用它来编写 lookup_default,该函数按索引查找元素,若索引越界则返回默认值:

lookup_default : Nat -> List a -> a -> a
lookup_default i xs def = case list_lookup i xs of
                              Nothing => def
                              Just x => x

若索引在界内,我们就会获得该索引对应的值,否则就会获得默认值:

*UsefulTypes> lookup_default 2 [3,4,5,6] (-1)
5 : Integer
*UsefulTypes> lookup_default 4 [3,4,5,6] (-1)
-1 : Integer

限制: case 构造用于对中间表达式进行简单的分析,以此避免编写辅助函数, 它也在内部用于实现 let 和 λ-绑定的模式匹配。它 在以下情况中可用:

  • 每个分支 匹配 一个相同类型的值,并 返回 一个相同类型的值。
  • 结果的类型是「已知」的,即表达式的类型无需对该 case 表达式进行类型检查就能确定。

完全性

Idris 区分 完全(全,Total) 函数与 部分(偏,Partial) 函数。 全函数满足以下情况之一:

  • 对于所有可能的输入都会终止,或
  • 产生一个非空的,有限的,可能为无限结果的前缀

若一个函数是完全的,我们可以认为其类型精确描述了该函数会做什么。例如, 若我们有一个返回类型为 String 的函数,根据它是否完全,我们能知道的东西会有所不同:

  • 若它是全函数,就会在有限的时间内返回一个类型为 String 的值;
  • 若它是偏函数,那么只要它不崩溃或进入无限循环,就会返回一个 String

Idris 对此作了区分,因此它知道在进行类型检查(正如我们在 一等类型 一节所见)的时候,哪些函数可以安全地求值。毕竟,若它在类型检查时试图对一个不终止的函数求值, 那么类型检查将无法终止!因此,在类型检查时只有全函数才会被求值。偏函数仍然可在类型中使用, 但它们不会被进一步求值。

接口

我们经常想要定义能够跨多种不同数据类型工作的函数。例如,我们想要算术运算符至少可以作用于 IntIntegerDouble。我们想要 == 作用于大部分的数据类型。 我们想要以一种统一的方式来显示不同的类型。

为此,我们使用了 接口(Interface),它类似于 Haskell 中的类型类(Typeclass)或 Rust 中的特性(Trait)。为了定义接口,我们提供了一组可重载的的函数。Show 接口就是个简单的例子, 它在 Prelude 中定义,并提供了将值转换为 String 的接口:

interface Show a where
    show : a -> String

它会生成一个类型如下的函数,我们称之为 Show 接口的 方法(Method)

show : Show a => a -> String

我们可以把它读作:「在 a 实现了 Show 的约束下,该函数接受一个输入 a 并返回一个 String。」我们可以通过为它定义接口的方法来实现该接口。例如,NatShow 实现可定义为:

Show Nat where
    show Z = "Z"
    show (S k) = "s" ++ show k
Idris> show (S (S (S Z)))
"sssZ" : String

一个类型只能实现一次某个接口,也就是说,实现无法被覆盖。实现的声明自身可以包含约束。 为此,实现的参数必须为构造器(即数据或类型构造器)或变量(也就是说,你无法为函数赋予实现)。 例如,要为向量定义一个 Show 的实现,我们需要确认其元素实现了 Show, 因为要用它来将每个元素都转换为 String

Show a => Show (Vect n a) where
    show xs = "[" ++ show' xs ++ "]" where
        show' : Vect n a -> String
        show' Nil        = ""
        show' (x :: Nil) = show x
        show' (x :: xs)  = show x ++ ", " ++ show' xs

提示

译者更愿意将它读作:对于一个实现了 ShowaVect n aShow 的实现为……

默认定义

库中定义了 Eq 接口,它提供了比较值是否相等的方法,所有内建类型都实现了它:

interface Eq a where
    (==) : a -> a -> Bool
    (/=) : a -> a -> Bool

要为类型实现一个接口,我们必须给出所有方法的定义。例如,为 Nat 实现 Eq

Eq Nat where
    Z     == Z     = True
    (S x) == (S y) = x == y
    Z     == (S y) = False
    (S x) == Z     = False

    x /= y = not (x == y)

很难想象在哪些情况下 /= 方法不是应用了 == 方法的结果的否定。 因此,利用接口声明中的某个方法为其它方法提供默认的定义会很方便:

interface Eq a where
    (==) : a -> a -> Bool
    (/=) : a -> a -> Bool

    x /= y = not (x == y)
    x == y = not (x /= y)

Eq 的最小完整实现只需要定义 ==/= 二者之一,而不需要二者都定义。 若缺少某个方法定义,且存在它的默认定义,那么就会使用该默认定义。

扩展接口

接口也可以扩展。逻辑上,相等关系 Eq 的下一步是定义排序关系 Ord。 我们可以定义一个 Ord 接口,它除了继承 Eq 的方法外还定义了自己的方法:

data Ordering = LT | EQ | GT
interface Eq a => Ord a where
    compare : a -> a -> Ordering

    (<) : a -> a -> Bool
    (>) : a -> a -> Bool
    (<=) : a -> a -> Bool
    (>=) : a -> a -> Bool
    max : a -> a -> a
    min : a -> a -> a
sort : Ord a => List a -> List a

函数、接口和实现可拥有多个约束。多个约束的列表写在括号内,以逗号分隔,例如:

sortAndShow : (Ord a, Show a) => List a -> String
sortAndShow xs = show (sort xs)
注意:接口与 mutual

mutual 块外,Idris 严格遵循「先定义后使用」的规则。在 mutual 块中, Idris 会分两趟进行繁释(elaborate):第一趟为类型,第二趟为定义。当互用块中包含接口声明时, 第一趟会繁释接口的头部而不繁释方法类型;第二趟则繁释方法类型以及所有的默认定义。

函子与应用子

目前,我们已经见过形参类型为 Type 的单形参接口了。通常,形参的个数可为任意个 (甚至零个),而形参也可为 任意 类型。若形参的类型不为 Type, 我们则需要提供显式的类型声明。例如,Prelude 中定义的函子接口 Functor 为:

interface Functor (f : Type -> Type) where
    map : (m : a -> b) -> f a -> f b

函子允许函数应用到结构上,例如将一个函数应用到 List 的每一个元素上:

Functor List where
  map f []      = []
  map f (x::xs) = f x :: map f xs
Idris> map (*2) [1..10]
[2, 4, 6, 8, 10, 12, 14, 16, 18, 20] : List Integer

定义了 Functor 之后,我们就能定义应用子 Applicative 了, 它对函数应用的概念进行了抽象:

infixl 2 <*>

interface Functor f => Applicative (f : Type -> Type) where
    pure  : a -> f a
    (<*>) : f (a -> b) -> f a -> f b

单子与 do-记法

单子接口 Monad 允许我们对绑定和计算进行封装,它也是 「do」记法 一节中 do-记法的基础。单子扩展了前面定义的 Applicative,其定义如下:

interface Applicative m => Monad (m : Type -> Type) where
    (>>=)  : m a -> (a -> m b) -> m b

do 块中会应用以下语法变换:

  • x <- v; e 变为 v >>= (\x => e)
  • v; e 变为 v >>= (\_ => e)
  • let x = v; e 变为 let x = v in e

IO 实现了 Monad,它使用原语函数定义。我们也可以为 Maybe 定义实现, 其实现如下:

Monad Maybe where
    Nothing  >>= k = Nothing
    (Just x) >>= k = k x

通过它,我们可以定义一个将两个 Maybe Int 相加的函数,并用单子来封装错误处理:

m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = do x' <- x -- 从 x 中提取值
               y' <- y -- 从 y 中提取值
               pure (x' + y') -- 二者相加

xy 均可用,该函数会从二者中提取出值;若其中一个或二者均不可用, 则返回 Nothing (「fail-fast 速错原则」)。Nothing 的情况通过 >>= 操作符来管理,由 do-记法来隐藏。

*Interfaces> m_add (Just 20) (Just 22)
Just 42 : Maybe Int
*Interfaces> m_add (Just 20) Nothing
Nothing : Maybe Int
模式匹配绑定

有时我们需要立即对 do-记法中某个函数的结果进行模式匹配。例如,假设我们有一个函数 readNumber,它从控制台读取一个数,若该数有效则返回 Just x 形式的值,否则返回 Nothing

readNumber : IO (Maybe Nat)
readNumber = do
  input <- getLine
  if all isDigit (unpack input)
     then pure (Just (cast input))
     else pure Nothing

如果接着用它来编写读取两个数,若二者均无效则返回 Nothing 的函数, 那么我们可能想要对 readNumber 进行模式匹配:

readNumbers : IO (Maybe (Nat, Nat))
readNumbers =
  do x <- readNumber
     case x of
          Nothing => pure Nothing
          Just x_ok => do y <- readNumber
                          case y of
                               Nothing => pure Nothing
                               Just y_ok => pure (Just (x_ok, y_ok))

如果有很多错误需要处理,它的嵌套层次很快就会变得非常深!我们不妨将绑定和模式匹配组合成一行。 例如,我们可以对 Just x_ok 的形式进行模式匹配:

readNumbers : IO (Maybe (Nat, Nat))
readNumbers =
  do Just x_ok <- readNumber
     Just y_ok <- readNumber
     pure (Just (x_ok, y_ok))

然而问题仍然存在,我们现在忽略了 Nothing 的情况,所以该函数不再是完全的了! 我们可以把 Nothing 的情况添加回去:

readNumbers : IO (Maybe (Nat, Nat))
readNumbers =
  do Just x_ok <- readNumber | Nothing => pure Nothing
     Just y_ok <- readNumber | Nothing => pure Nothing
     pure (Just (x_ok, y_ok))

此版本的 readNumbers 效果与初版完全一样(实际上,它是初版的语法糖, 并且会直接被翻译回初版的形式)。每条语句的第一部分(Just x_ok <-Just y_ok <-)给出了首选的绑定:若能匹配,do 块的剩余部分就会继续执行。 第二部分给出了候选的绑定,其中的绑定可以有不止一个。

!-记法

在很多情况下,do-记法会让程序不必要地啰嗦,在将值绑定一次就立即使用的情况下尤甚, 例如前面的 m_add。此时我们可以使用更加简短的方式:

m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = pure (!x + !y)

!expr 记法表示 expr 应当在求值后立即被隐式绑定。从概念上讲,我们可以把 ! 看做拥有以下类型的前缀函数:

(!) : m a -> a

然而请注意,它并不是一个真的函数,而是一个语法!在实践中,子表达式 !expr 会在 expr 的当前作用域内尽可能地提升,将它绑定到一个全新的名字 x, 然后用它来代替 !expr。首先表达式会按从左到右的顺序深度优先地上升。在实践中,! 记法允许我们以更直接的方式来编程,同时该记法也标出了哪些表达式为单子。

例如,表达式:

let y = 42 in f !(g !(print y) !x)

会被提升为:

let y = 42 in do y' <- print y
                 x' <- x
                 g' <- g y' x'
                 f g'
单子推导式

我们之间在 更多表达式 一节中看到的列表推导记法其实更通用, 它可应用于任何实现了 MonadAlternative 的东西:

interface Applicative f => Alternative (f : Type -> Type) where
    empty : f a
    (<|>) : f a -> f a -> f a

通常,推导式形式为 [ exp | qual1, qual2, …, qualn ] 其中 quali 可以为:

  • 一个生成式 x <- e
  • 一个 守卫式(Guard),它是一个类型为 Bool 的表达式
  • 一个 let 绑定 let x = e

要翻译一个推导式 [exp | qual1, qual2, …, qualn],首先任何作为 守卫式 的限定式 qual 会使用以下函数翻译为 guard qual

guard : Alternative f => Bool -> f ()

接着该推导式会被转换为 do-记法:

do { qual1; qual2; ...; qualn; pure exp; }

使用单子推导式,m_add 的选取(alternative)定义为:

m_add : Maybe Int -> Maybe Int -> Maybe Int
m_add x y = [ x' + y' | x' <- x, y' <- y ]

习语括号

do-记法为串连提供了另一种写法,而习语则为 应用 提供了另一种写法。 本节中的记法以及大量的例子受到了 Conor McBride 和 Ross Paterson 的论文 「带作用的应用子编程」 [1] 的启发。

首先,让我们回顾一下前面的 m_add。它所做的只是将一个操作符应用到两个从 Maybe Int 中提取的值。我们可以抽象出该应用:

m_app : Maybe (a -> b) -> Maybe a -> Maybe b
m_app (Just f) (Just a) = Just (f a)
m_app _        _        = Nothing

我们可以用它来编写另一种 m_add,它使用了函数应用的选取概念,带有显式的 m_app 调用:

m_add' : Maybe Int -> Maybe Int -> Maybe Int
m_add' x y = m_app (m_app (Just (+)) x) y

与其到处插入 m_app,我们不如使用习语括号(Idiom Brackets)来做这件事。 为此,我们可以像下面这样为 Maybe 提供一个 Applicative 的实现,其中 <*> 的定义方式与前面的 m_app 相同(它已在 Idris 库中定义):

Applicative Maybe where
    pure = Just

    (Just f) <*> (Just a) = Just (f a)
    _        <*> _        = Nothing

按照 <*> 的实现,我们可以像下面这样使用它,其中函数应用 [| f a1 an |] 会被翻译成 pure f <*> a1 <*> <*> an

m_add' : Maybe Int -> Maybe Int -> Maybe Int
m_add' x y = [| x + y |]
错误处理解释器

习语记法在定义求值器时通常很有用。McBride 和 Paterson 就为下面这样的语言描述了求值器 [1]

data Expr = Var String      -- 变量
          | Val Int         -- 值
          | Add Expr Expr   -- 加法

求值会被根据上下文将变量(表示为 String)映射为 Int 值,且可能会失败。 我们定义了数据类型 Eval 来包装求值器:

data Eval : Type -> Type where
     MkEval : (List (String, Int) -> Maybe a) -> Eval a

将求值器包装在数据类型中意味着我们之后可以为它提供接口的实现。我们首先定义一个函数, 它在求值过程中从上下文取出值。

fetch : String -> Eval Int
fetch x = MkEval (\e => fetchVal e) where
    fetchVal : List (String, Int) -> Maybe Int
    fetchVal [] = Nothing
    fetchVal ((v, val) :: xs) = if (x == v)
                                  then (Just val)
                                  else (fetchVal xs)

在定义该语言的求值器时,我们会应用 Eval 上下文中的函数,这样它自然会为 Eval 提供 Applicative 的实现。在 Eval 能够实现 Applicative 之前,我们必须为 Eval 实现 Functor

Functor Eval where
    map f (MkEval g) = MkEval (\e => map f (g e))

Applicative Eval where
    pure x = MkEval (\e => Just x)

    (<*>) (MkEval f) (MkEval g) = MkEval (\x => app (f x) (g x)) where
        app : Maybe (a -> b) -> Maybe a -> Maybe b
        app (Just fx) (Just gx) = Just (fx gx)
        app _         _         = Nothing

现在就可以在求值表达式时,通过应用的习语来处理错误了:

eval : Expr -> Eval Int
eval (Var x)   = fetch x
eval (Val x)   = [| x |]
eval (Add x y) = [| eval x + eval y |]

runEval : List (String, Int) -> Expr -> Maybe Int
runEval env e = case eval e of
    MkEval envFn => envFn env

命名实现

有时我们希望一个类型可以拥有一个接口的多个实现,例如为排序或打印提供另一种方法。 为此,实现可以像下面这样 命名

[myord] Ord Nat where
   compare Z (S n)     = GT
   compare (S n) Z     = LT
   compare Z Z         = EQ
   compare (S x) (S y) = compare @{myord} x y

它像往常一样声明了一个实现,不过带有显式的名字 myord。语法 compare @{myord} 会为 compare 提供显式的实现,否则它会使用 Nat 的默认实现。我们可以用它来反向排序一个 Nat 列表。给定以下列表:

testList : List Nat
testList = [3,4,1]

我们可以在 Idris 提示符中用默认的 Ord 实现来排序,之后使用命名的实现 myord

*named_impl> show (sort testList)
"[sO, sssO, ssssO]" : String
*named_impl> show (sort @{myord} testList)
"[ssssO, sssO, sO]" : String

有时,我们也需要访问命名的父级实现。例如 Prelude 中定义的半群 Semigroup 接口:

interface Semigroup ty where
  (<+>) : ty -> ty -> ty

接着又定义了幺半群 Monoid,它用「幺元」 neutral 扩展了 Semigroup

interface Semigroup ty => Monoid ty where
  neutral : ty

我们可以为 Nat 定义 SemigroupMonoid 的两种不同的实现, 一个基于加法,一个基于乘法:

[PlusNatSemi] Semigroup Nat where
  (<+>) x y = x + y

[MultNatSemi] Semigroup Nat where
  (<+>) x y = x * y

加法的幺元为 0,而乘法的幺元为 1。因此,我们在定义 Monoid 的实现时, 保证扩展了正确的 Semigroup 十分重要。我们可以通过 using 从句来做到这一点:

[PlusNatMonoid] Monoid Nat using PlusNatSemi where
  neutral = 0

[MultNatMonoid] Monoid Nat using MultNatSemi where
  neutral = 1

using PlusNatSemi 从句指明 PlusNatMonoid 应当扩展 PlusNatSemi

确定形参

当接口的形参多于一个时,通过限定形参可帮助查找实现。例如:

interface Monad m => MonadState s (m : Type -> Type) | m where
  get : m s
  put : s -> m ()

在此接口中,查找该接口的实现只需要知道 m 即可,而 s 可根据实现来确定。 它通过在接口声明之后添加 | m 来声明。我们将 m 称为 MonadState 接口的 确定形参(Determining Parameter) ,因为它是用于查找实现的形参。

[1](1, 2) Conor McBride and Ross Paterson. 2008. Applicative programming with effects. J. Funct. Program. 18, 1 (January 2008), 1-13. DOI=10.1017/S0956796807006326 http://dx.doi.org/10.1017/S0956796807006326

模块与命名空间

Idris 程序由一组模块构成。每个模块包含一个可选的 module 声明来命名模块, 一系列 import 语句用于导入其它模块,还有一组类型、接口和函数的声明与定义。 例如,以下模块定义了二叉树类型 BTree (在 Btree.idr 文件中):

module Btree

public export
data BTree a = Leaf
             | Node (BTree a) a (BTree a)

export
insert : Ord a => a -> BTree a -> BTree a
insert x Leaf = Node Leaf x Leaf
insert x (Node l v r) = if (x < v) then (Node (insert x l) v r)
                                   else (Node l v (insert x r))

export
toList : BTree a -> List a
toList Leaf = []
toList (Node l v r) = Btree.toList l ++ (v :: Btree.toList r)

export
toTree : Ord a => List a -> BTree a
toTree [] = Leaf
toTree (x :: xs) = insert x (toTree xs)

修饰符 exportpublic export 说明了哪些名称在其它模块中可见。 之后会详细解释。

这里给出了一个 main 程序(在 bmain.idr 文件中),它利用 Btree 模块对列表排序:

module Main

import Btree

main : IO ()
main = do let t = toTree [1,8,2,7,9,3]
          print (Btree.toList t)

同一名称可以在多个模块中定义:名称可以通过模块名来字 限定(Qualified)Btree 模块中定义的全部名称如下:

  • Btree.BTree
  • Btree.Leaf
  • Btree.Node
  • Btree.insert
  • Btree.toList
  • Btree.toTree

如果名称可以区分,就没必要给出完整的限定名。通过明确的限定,或者根据它们的类型, 可以消除名称的歧义。

模块名和文件名之间没有正式的联系,尽管通常会建议使用同一名字。import 语句用于引用文件名,它通过 . 来分隔路径。例如,import foo.bar 会导入文件 foo/bar.idr,其中通常会有模块声明 module foo.bar。对模块名字的唯一要求是, 包含 main 函数的主模块必须命名为 Main,而文件名无需为 Main.idr

导出修饰符

Idris 允许对模块内容的可见性进行细粒度的控制。默认情况下, 模块中定义的所有名称都是私有的。这有助于最小化接口并隐藏内部细节。Idris 允许将函数、类型和接口标记为 privateexportpublic export。 它们的一般含义如下:

  • private 表示完全不被导出。此为默认情况。
  • export 表示顶级类型是导出的。
  • public export 表示整个定义都是导出的。

更改可见性的另一限制是,定义无法引用可见性更低的名称。例如, public export 的定义无法使用私有名称, 而 export 的类型也无法使用私有名称。 这是为了避免私有名称泄露到模块的接口中。

对于函数的意义
  • export 表示导出类型
  • public export 表示导出类型和定义, 导出之后的定义可被使用。 换言之, 定义本身被认为是模块接口的一部分。名字 public export 比较长是为了让你三思而后行。

注解

Idris 中的同义类型可通过编写函数来创建。在为模块设置可见性时, 如果同义类型要在模块外部使用,那么将它们声明为 public export 可能会更好。 否则,Idris 会无法获知该类型与谁同义。

既然 public export 意味着函数的定义是导出的,那么该函数的定义实际上也就成为了模块 API 的一部分。因此,除非你确实想要导出函数的完整定义,否则最好避免将其声明为 public export

对于数据类型的涵义

对于数据类型,其涵义如下:

  • export 的类型构造器是导出的
  • public export 的类型构造器和数据构造器是导出的
对于接口的涵义

对于接口,其涵义如下:

  • export 的接口名字是导出的
  • public export 的接口名、方法名和默认定义都是导出的
%access 指令

默认的导出模式可以通过 %access 指令更改,例如:

module Btree

%access export

public export
data BTree a = Leaf
             | Node (BTree a) a (BTree a)

insert : Ord a => a -> BTree a -> BTree a
insert x Leaf = Node Leaf x Leaf
insert x (Node l v r) = if (x < v) then (Node (insert x l) v r)
                                   else (Node l v (insert x r))

toList : BTree a -> List a
toList Leaf = []
toList (Node l v r) = Btree.toList l ++ (v :: Btree.toList r)

toTree : Ord a => List a -> BTree a
toTree [] = Leaf
toTree (x :: xs) = insert x (toTree xs)

在这种情况下,没有访问修饰符的任何函数均可以导出为 export,而不是 private

内部模块 API 的传递

另外,通过对 import 使用 public 修饰符,可将模块内导入的模块再次导出。 例如:

module A

import B
import public C

模块 A 会导出名字 a, 以及模块 C 中所有公共或抽象的名称, 但无法再从模块 B 中导出任何东西。

显式命名空间

在定义模块的同时,也会隐式定义一个命名空间。然而,该命名空间也可以 显式 给出。当你想在同一模块内重载名称时,它会非常有用:

module Foo

namespace x
  test : Int -> Int
  test x = x * 2

namespace y
  test : String -> String
  test x = x ++ x

这个(明显人为设计的)模块使用完整的限定名 Foo.x.testFoo.y.test 定义了两个函数,二者可以根据函数类型来消歧义:

*Foo> test 3
6 : Int
*Foo> test "foo"
"foofoo" : String

形参化的块

一组函数的多个参数可通过 parameters 声明进行形参化(Parameterise),例如:

parameters (x : Nat, y : Nat)
  addAll : Nat -> Nat
  addAll z = x + y + z

parameters 形参块的作用是为块中的每个函数、类型和数据构造器添加形参声明。 具体来说,就是将形参添加到参数列表的前面。在此块外,形参必须显式地给定。 因此在 REPL 中调用 addAll 函数时,它会拥有以下函数声明:

*params> :t addAll
addAll : Nat -> Nat -> Nat -> Nat

以及以下定义:

addAll : (x : Nat) -> (y : Nat) -> (z : Nat) -> Nat
addAll x y z = x + y + z

形参块可以嵌套。在为所有类型和数据构造器显式地添加形参时, 块中也可包含数据声明。它们也可以是带有隐式参数的依赖类型:

parameters (y : Nat, xs : Vect x a)
  data Vects : Type -> Type where
    MkVects : Vect y a -> Vects a

  append : Vects a -> Vect (x + y) a
  append (MkVects ys) = xs ++ ys

为了在形参块外使用 Vects 或者 append,我们必须给出参数 xsy。 在这里,我们可以用占位符来代表类型检查器能推断出的值:

*params> show (append _ _ (MkVects _ [1,2,3] [4,5,6]))
"[1, 2, 3, 4, 5, 6]" : String

提示

形参(Parameter)与实参(Argument)

在数学中,对于函数 f(x)f 为函数名,x 则称作函数 f 的形式参数(Parameter),简称形参;在函数应用时,传输函数的参数则称作实际参数 (Argument),简称实参。例如 f(2) 中的 2 即为传入 f(x) 的实参。

在英文原文中,有时并不明确区分形参与实参,一般统译作「参数」。只有当需要明确区分时, 才分别译作「形参」与「实参」。

Idris 包括一套简单的构建系统,它会根据已命名的包描述文件来构建包以及可执行文件。 描述文件可以配合 Idris 编译器来管理开发过程。

包的描述

包的描述包含以下内容:

  • 包头,由关键字 package 后跟一个包名构成。包名可以是任何有效的 Idris 标识符。 iPKG 格式也可包含一个带引号的 version,它接受任何有效的文件名。
  • 描述包内容的字段,<field> = <value>

其中至少有一个 modules 字段,对应的值为逗号分隔的模块列表。例如, 给定一个 Idris 包 maths,包含 Maths.idrMaths.NumOps.idr Maths.BinOps.idrMaths.HexOps.idr 几个模块,其相应的包文件应为:

package maths

modules = Maths
        , Maths.NumOps
        , Maths.BinOps
        , Maths.HexOps

其它包文件的例子可以在主 Idris 仓库的 libs 目录中以及 第三方库 中找到。

使用包文件

Idris 本身是知晓包的,还有一些专门的命令来帮助构建、安装以及清除包。 例如,对于前面给出的 maths 包,我们可以像下面这样使用 Idris:

  • idris --build maths.ipkg 会构建包中的所有模块。
  • idris --install maths.ipkg 会安装包,使其可以被其它 Idris 库和程序访问。
  • idris --clean maths.ipkg 会删除构建时候产生的所有代码及可执行文件。

一旦 maths 包安装完成,命令行选项 --package maths (简写为 -p maths)就可以使用了。例如:

idris -p maths Main.idr

测试 Idris 包

集成的构建系统包含简单的测试框架。该框架会收集 tests 下的 ipkg 文件中列出的函数。所有的测试函数必须返回 IO ()

当输入 idris --testpkg yourmodule.ipkg 后,构建系统会列出单个 main 函数下的 tests 函数,并在你机器上的全新环境中创建一个临时文件。 它会将该临时文件编译成可执行文件并执行它。

测试本身负责报告它们的成功或失败。测试函数通常用 putStrLn 报告测试结果。 测试框架不强加任何报告标准,因此也不会合计测试结果。

我们以下面的函数列表为例,它们在样本包 maths 中名为 NumOps 的模块内定义:

module Maths.NumOps

%access export -- to make functions under test visible

double : Num a => a -> a
double a = a + a

triple : Num a => a -> a
triple a = a + double a

一个限定名为 Test.NumOps 的简单测试模块可声明为:

module Test.NumOps

import Maths.NumOps

%access export  -- to make the test functions visible

assertEq : Eq a => (given : a) -> (expected : a) -> IO ()
assertEq g e = if g == e
    then putStrLn "Test Passed"
    else putStrLn "Test Failed"

assertNotEq : Eq a => (given : a) -> (expected : a) -> IO ()
assertNotEq g e = if not (g == e)
    then putStrLn "Test Passed"
    else putStrLn "Test Failed"

testDouble : IO ()
testDouble = assertEq (double 2) 4

testTriple : IO ()
testTriple = assertNotEq (triple 2) 5

函数 assertEqassertNotEq 分别用于运行预期为通过和失败的相等性测试。 实际上的测试为 testDoubletestTriple,它们在 maths.ipkg 文件中的声明如下

package maths

modules = Maths.NumOps
        , Test.NumOps

tests = Test.NumOps.testDouble
      , Test.NumOps.testTriple

测试框架可通过 idris --testpkg maths.ipkg 命令调用:

> idris --testpkg maths.ipkg
Type checking ./Maths/NumOps.idr
Type checking ./Test/NumOps.idr
Type checking /var/folders/63/np5g0d5j54x1s0z12rf41wxm0000gp/T/idristests144128232716531729.idr
Test Passed
Test Passed

注意当我们使用 assertEqassertNoEq 函数测试时,它们是如何通过打印 Test Passed 来报告测试成功的。

在 Atom 中使用包依赖

如果你在使用 Atom 编辑器,并且依赖了另一个包,例如 import Lightyear 或者 import Pruviloj,那么你需要让 Atom 知道它应该加载什么。最简单的方式是通过 .ipkg 文件来完成。本教程在下一节中会描述一个 ipkg 文件通常会包含哪些内容, 不过这里先给出此平凡例子的简单步骤:

  • 创建一个 myProject 文件夹。
  • 添加一个 myProject.ipkg 文件,包含如下代码:
package myProject

pkgs = pruviloj, lightyear
  • 在 Atom 中,使用文件菜单打开 myProject 文件夹。

更多信息

更多详情,包括可用字段的完整列表,可在参考手册的 Packages 中找到。

示例:良类型的解释器

在本节中,我们使用已经见过的特性来编写一个更大的例子:一个简单的函数式语言解释器, 带有变量、函数应用、二元运算符和 if...then...else 构造。 我们用依赖类型系统来保证所有能够表示的程序都是良类型的。

语言的表示

首先,我们来定义语言中的类型。我们有整数、布尔和函数,用 Ty 来表示:

data Ty = TyInt | TyBool | TyFun Ty Ty

我们可以写一个函数将上面的表示转换为具体的 Idris 类型——要注意类型是一等的, 所以可以像其它值一样参与计算:

interpTy : Ty -> Type
interpTy TyInt       = Integer
interpTy TyBool      = Bool
interpTy (TyFun a t) = interpTy a -> interpTy t

我们将定义一种表示方法,它只能表示良类型的程序。我们通过表达式的类型 其局部变量的类型(上下文)来索引它。上下文可以用 Vect 数据类型来表达。 由于上下文需要被经常使用,因此它会被表示为隐式参数。为此,我们在 using 块中定义了所有需要的东西。(注意后文中的代码需要缩进才能在 using 块中使用):

using (G:Vect n Ty)

表达式通过局部变量的类型和表达式自身的类型来索引:

data Expr : Vect n Ty -> Ty -> Type

表达式的完整表示为:

data HasType : (i : Fin n) -> Vect n Ty -> Ty -> Type where
    Stop : HasType FZ (t :: G) t
    Pop  : HasType k G t -> HasType (FS k) (u :: G) t

data Expr : Vect n Ty -> Ty -> Type where
    Var : HasType i G t -> Expr G t
    Val : (x : Integer) -> Expr G TyInt
    Lam : Expr (a :: G) t -> Expr G (TyFun a t)
    App : Expr G (TyFun a t) -> Expr G a -> Expr G t
    Op  : (interpTy a -> interpTy b -> interpTy c) ->
          Expr G a -> Expr G b -> Expr G c
    If  : Expr G TyBool ->
          Lazy (Expr G a) ->
          Lazy (Expr G a) -> Expr G a

上述代码使用了 Idris 标准库中的 VectFin 类型。它们不在 Prelude 中, 因此我们需要导入它们:

import Data.Vect
import Data.Fin

由于表达式通过其类型来索引,我们可以直接从构造器的定义中得出该语言的类型规则。 下面我们来逐一观察每个构造器。

我们为变量使用了不带名字的表示法 - 它们以 de Bruijn 法来索引。 变量以它们在上下文中从属关系的证明来表示: HasType i G T 是变量 i 在上下文 G 中拥有类型 T 的证明。它的定义如下:

data HasType : (i : Fin n) -> Vect n Ty -> Ty -> Type where
    Stop : HasType FZ (t :: G) t
    Pop  : HasType k G t -> HasType (FS k) (u :: G) t

我们把 Stop 当做最近定义的变量有良类型的证明,而 Pop n 则证明了 如果第 n 个最近定义的变量是良类型的,那么第 n+1 个也是。在实践中, 这意味着我们通过 Var 构造器,使用 Stop 来引用最近定义的变量, Pop Stop 来引用下一个,以此类推。

Var : HasType i G t -> Expr G t

所以,在表达式 \x. \y. x y 中,变量 x 的 de Bruijn 索引为 1, 可表示为 Pop Stop ;变量 y 的 de Bruijn 索引为 0,可表示为 Stop 。 我们通过计算定义和使用之间 λ 的数量来查找它们。

值携带了整数的具体表示:

Val : (x : Integer) -> Expr G TyInt

λ 用于创建函数。在类型为 a -> t 的函数的作用域内,有一个新的类型为 a 的局部变量,它用上下文索引来表示:

Lam : Expr (a :: G) t -> Expr G (TyFun a t)

函数应用接受一个从 at 的函数和一个类型为 a 的值,产生一个类型为 t 的值。

App : Expr G (TyFun a t) -> Expr G a -> Expr G t

我们允许任意的二元运算符,运算符的类型会告诉我们参数的类型:

Op : (interpTy a -> interpTy b -> interpTy c) ->
     Expr G a -> Expr G b -> Expr G c

最后,If 表达式根据给定的布尔值来做出选择,每个分支必须有相同的类型。 我们将对分支使用惰性求值,只有被选择的分支才会被求值:

If : Expr G TyBool ->
     Lazy (Expr G a) ->
     Lazy (Expr G a) ->
     Expr G a

编写解释器

当我们对一个 Expr 求值时,我们需要其作用域内的值及其类型。 Env 是一个根据其作用域中的类型来索引的环境。尽管环境和局部变量类型的向量有着很强联系, 它也只是列表的另一种形式。我们使用了一般的 ::Nil 构造器, 这样就能使用一般的列表语法了。给定一个变量在上下文中定义的证明,我们可以从环境中得到一个值:

data Env : Vect n Ty -> Type where
    Nil  : Env Nil
    (::) : interpTy a -> Env G -> Env (a :: G)

lookup : HasType i G t -> Env G -> interpTy t
lookup Stop    (x :: xs) = x
lookup (Pop k) (x :: xs) = lookup k xs

有了上述内容,解释器就是一个根据特定环境将 Expr 转换成一个具体的 Idris 值的函数了:

interp : Env G -> Expr G t -> interpTy t

作为参考,解释器的完整定义如下。对于每一个构造器,我们把它转换成对应的 Idris 值:

interp env (Var i)     = lookup i env
interp env (Val x)     = x
interp env (Lam sc)    = \x => interp (x :: env) sc
interp env (App f s)   = interp env f (interp env s)
interp env (Op op x y) = op (interp env x) (interp env y)
interp env (If x t e)  = if interp env x then interp env t
                                         else interp env e

我们来逐一观察每一种情况。对于一个变量,我们只要从环境中找出它:

interp env (Var i) = lookup i env

对于一个值,我们只需要返回其实际的表达方式:

interp env (Val x) = x

λ 则比较有意思。我们构造了一个解释 λ 内部作用域的函数,但其环境中带有一个新的值。 因此,目标语言的函数可以用如下的方法来转换成 Idris 函数:

interp env (Lam sc) = \x => interp (x :: env) sc

对于函数应用,我们解释函数及其参数,并直接将函数应用于参数。我们知道,根据其类型, 解释 f 必定会得到一个函数:

interp env (App f s) = interp env f (interp env s)

运算符和条件分支同样会转换成等价的 Idris 构造。对于运算符,我们将函数直接应用到 操作数上;对于 If,我们直接使用 Idris 的 if...then...else 构造。

interp env (Op op x y) = op (interp env x) (interp env y)
interp env (If x t e)  = if interp env x then interp env t
                                         else interp env e

测试

我们可以编写一个简单的测试函数。首先,将两个输入值加起来,\x. \y. y + x 可写成如下形式:

add : Expr G (TyFun TyInt (TyFun TyInt TyInt))
add = Lam (Lam (Op (+) (Var Stop) (Var (Pop Stop))))

更有趣的是阶乘函数 fact,如 \x. if (x == 0) then 1 else (fact (x-1) * x), 可写成如下形式:

fact : Expr G (TyFun TyInt TyInt)
fact = Lam (If (Op (==) (Var Stop) (Val 0))
               (Val 1)
               (Op (*) (App fact (Op (-) (Var Stop) (Val 1)))
                       (Var Stop)))

运行

作为结束,我们编写一个 main 程序,它根据用户的输入来解释阶乘函数:

main : IO ()
main = do putStr "Enter a number: "
          x <- getLine
          printLn (interp [] fact (cast x))

此处的 cast 是一个被重载的函数,在可能的情况下,它将值转为另一种类型。在这里, 它将字符串转换成了整数,在输入不合法时返回 0 。在 Idris 的交互式环境运行中这个程序 会产生如下结果:

$ idris interp.idr
     ____    __     _
    /  _/___/ /____(_)____
    / // __  / ___/ / ___/     Version 1.3.1
  _/ // /_/ / /  / (__  )      http://www.idris-lang.org/
 /___/\__,_/_/  /_/____/       Type :? for help

Type checking ./interp.idr
*interp> :exec
Enter a number: 6
720
*interp>
题外话:cast

Prelude 中定义了一个 Cast 接口来实现类型之间的转换:

interface Cast from to where
    cast : from -> to

这是一个 多参数 接口,定义了转换的源类型和目标类型。在转换被应用的地方, 类型检查器必须能够推导出 两个 参数。原语类型间有意义的转换均已定义。

视角与「with」规则

依赖模式匹配

由于类型可以依赖于值,因此某些参数的形式可根据其它参数的值来确定。例如, 如果我们写出 (++) 的长度隐式参数,就会看出它的形式由向量是否为空来决定:

(++) : Vect n a -> Vect m a -> Vect (n + m) a
(++) {n=Z}   []        ys = ys
(++) {n=S k} (x :: xs) ys = x :: xs ++ ys

假如 n[] 的情况下是一个后继,或者在 :: 的情况下为零, 那么它的定义就不是良类型的。

with 规则:匹配中间值

我们经常需要匹配中间计算(Intermediate Computation)的结果。Idris 受到 Epigram [1] 中视角(View)的启发,为此提供了一种构造,即 with 规则, 它考虑到在依赖型的语言中匹配值会影响我们对其它值的形式的了解。在最简单的形式中, with 规则会为正在定义的函数附加一个额外的参数。

我们已经见过向量过滤函数了。这次我们用 with 来定义它:

filter : (a -> Bool) -> Vect n a -> (p ** Vect p a)
filter p [] = ( _ ** [] )
filter p (x :: xs) with (filter p xs)
  filter p (x :: xs) | ( _ ** xs' ) = if (p x) then ( _ ** x :: xs' ) else ( _ ** xs' )

在这里,with 从句能让我们解构(deconstruct)``filter p xs`` 的结果。 该视角精化的参数模式 filter p (x :: xs) 位于 with 从句的下方, 之后是一条竖线 | 后面跟着解构的中间结果 ( _ ** xs' )。 如果该视角精化的参数模式与原函数的参数模式相同,那么 | 左侧的就是多余的,可以省略:

filter p (x :: xs) with (filter p xs)
  | ( _ ** xs' ) = if (p x) then ( _ ** x :: xs' ) else ( _ ** xs' )

提示

这个例子并不好,它省去了大量的细节,包括什么是视角,如何用视角来解构数据, 以及 with 从句帮你做了什么。参数 p 与结果 (p ** Vect p a) 中的 p 也毫无关系,这点可能会对初学者造成困扰。有条件的读者可参考 Type-Driven Development with Idris 第二部分中的 Views: extending pattern matching 一章,或者重新思考接下来这个例子。

with 从句还可以嵌套:

foo : Int -> Int -> Bool
foo n m with (succ n)
  foo _ m | 2 with (succ m)
    foo _ _ | 2 | 3 = True
    foo _ _ | 2 | _ = False
  foo _ _ | _ = False

如果中间计算本身具有依赖类型,那么其结果会影响其它参数的形式 — 我们可以通过测试 附加参数来了解一个值的形式。在这类情况下,视角精化的参数模式必须是显式的。 例如,一个 Nat 不是偶数就是奇数。如果它是偶数,那么它就等于两个 Nat 之和。 否则,它就是两个 Nat 之和再加一:

data Parity : Nat -> Type where
   Even : Parity (n + n)
   Odd  : Parity (S (n + n))

我们将 Parity 称为 Nat 的一个 视角(View)。它拥有一个 覆盖函数(Covering Function) 来测试 Nat 的奇偶性并构造相应的谓词:

parity : (n:Nat) -> Parity n

我们之后再看 parity 的定义。我们可以用它配合 with 规则来编写一个函数, 它将一个自然数转换成一系列二进制数字(False 为 0,True 为 1,低位在前):

natToBin : Nat -> List Bool
natToBin Z = Nil
natToBin k with (parity k)
   natToBin (j + j)     | Even = False :: natToBin j
   natToBin (S (j + j)) | Odd  = True  :: natToBin j

parity k 的值影响了 k 的形式,因为 parity k 的结果取决于 k。 因此,除了 | 右侧的中间计算结果(EvenOdd)的模式外, 我们还可以写出该结果如何影响 | 左侧的其它模式。即:

  • parity k 求值为 Even 时,我们可以根据 Even 构造器的定义 Parity (n + n),将原始参数 k 精化为模式 (j + j)。这样 (j + j) 就代替了 | 左侧的 k,而 Even 构造器则出现在右侧。精化模式中的自然数 j 会被用在 = 符号的两侧。
  • 否则,当 parity k 求值为 Odd 时,根据 Odd 构造器的定义 Parity (S (n + n)),原始参数 k 会被精化为模式 S (j + j), 它和 Odd 会出现在 | 的两侧,同样自然数 j 会被用在 = 符号的两侧。

注意,在精化模式的两个 j 之间有一个函数 (+),它被允许是因为附加参数 已经确定了此模式的形式。

我们会在下一节 实践中的证明 中回到 parity 上来完成它的定义。

with 与证明

要使用依赖模式匹配进行定理证明,有时必须根据匹配模式显式地构造出证明结果。为此,你可以为 with 从句加上 proof p 后缀,由模式匹配生成的证明会被命名为 p 并加入到作用域中。 例如:

data Foo = FInt Int | FBool Bool

optional : Foo -> Maybe Int
optional (FInt x) = Just x
optional (FBool b) = Nothing

isFInt : (foo:Foo) -> Maybe (x : Int ** (optional foo = Just x))
isFInt foo with (optional foo) proof p
  isFInt foo | Nothing = Nothing           -- here, p : Nothing = optional foo
  isFInt foo | (Just x) = Just (x ** Refl) -- here, p : Just x = optional foo
[1]Conor McBride and James McKinna. 2004. The view from the left. J. Funct. Program. 14, 1 (January 2004), 69-111. https://doi.org/10.1017/S0956796803004829

定理证明

相等性

Idris 可以声明命题的相等性,陈述并证明有关程序的定理。 相等性是内建的,不过这里给出其概念上的定义:

data (=) : a -> b -> Type where
   Refl : x = x

任何类型的任何值之间都可以判断相等性,然而构造相等性证明的唯一方式就是值确实相等。例如:

fiveIsFive : 5 = 5
fiveIsFive = Refl

twoPlusTwo : 2 + 2 = 4
twoPlusTwo = Refl

空类型

存在一个空类型 \bot,它没有构造器。因此,在不使用部分定义或一般递归函数的情况下, 无法构造出空类型的元素(详见 完全性检查 一节)。 由此,我们可以用空类型来证明某些命题是不可能的,例如零不可能是任何自然数的后继:

disjoint : (n : Nat) -> Z = S n -> Void
disjoint n p = replace {P = disjointTy} p ()
  where
    disjointTy : Nat -> Type
    disjointTy Z = ()
    disjointTy (S k) = Void

我们不必太关心该函数如何工作 — 本质上,它应用库函数 replace, 根据相等证明来变换谓词。在本例中,我们利用某些东西无法存在的证明, 将一个可以存在的类型(即空元组)的值,变换成了一个无法存在的类型的值。

一旦拥有了空类型的元素,我们就能证明任何东西。void 在库中定义,用于辅助反证法。

void : Void -> a

简单定理

当对依赖类型进行类型检查时,该类型就会被 规范化(Normalized)。 比如我们想要证明以下关于 plus 的归约行为(Reduction Behaviour)的定理:

plusReduces : (n:Nat) -> plus Z n = n

我们将该定理的陈述写成了类型,就像写出程序的类型一样。实际上证明和程序之间并没有本质的区别。 就我们所关注的而言,证明不过是个程序,只是它的类型精确到足以保证满足我们关心的特殊性质。

我们不会在这里深入细节,Curry-Howard 同构 [1] 解释了这种关系。该证明很平凡, 因为 plus 的定义将 plus Z n 规范化成了 n

plusReduces n = Refl

如果我们换种方式证明该论点,那就有点难了,因为加法是对第一个参数递归定义的。 该证明同样也可以在 plus 的第一个参数 n 上递归:

plusReducesZ : (n:Nat) -> n = plus n Z
plusReducesZ Z = Refl
plusReducesZ (S k) = cong (plusReducesZ k)

cong 是库中定义的一个函数,它指明相等性也适用于函数应用:

cong : {f : t -> u} -> a = b -> f a = f b

我们可以对加法在后继上的递归行为做同样的事情:

plusReducesS : (n:Nat) -> (m:Nat) -> S (plus n m) = plus n (S m)
plusReducesS Z m = Refl
plusReducesS (S k) m = cong (plusReducesS k m)

即便对于如此平凡的定理,一口气构造出证明也有点棘手。当事情变得更复杂时, 需要考虑的就太多了,此时根本无法在这种「批量模式」下构造证明。

Idris 提供了交互式编辑的能力,它可以帮助构造证明。关于在编辑器中交互式构造证明的更多详情, 参见 Theorem Proving

实践中的证明

证明定理的需求可在实践中自然产生。例如,在之前的 视角与「with」规则 一节中, 我们用函数 parity 实现了 natToBin

parity : (n:Nat) -> Parity n

然而,我们并未提供 parity 的定义。我们可能觉得它看起来会像下面这样:

parity : (n:Nat) -> Parity n
parity Z     = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j)))     | Even = Even {n=S j}
  parity (S (S (S (j + j)))) | Odd  = Odd {n=S j}

然而,它会因类型错误而无法编译:

在按照期望的类型
        Parity (S (S (j + j)))
检查 views.parity 中 with 块的右侧时

发现
        Parity (S j + S j) (Even 的类型)
与
        Parity (S (S (plus j j))) (期望的类型)
的类型不匹配

问题在于,在 Even 的类型中规范化 S j + S j 并不能得到我们需要的 Parity 右侧的类型。我们知道 S (S (plus j j)) 等于 S j + S j, 但需要向 Idris 证明它。我们可以先为该定义挖一些 (见 ):

parity : (n:Nat) -> Parity n
parity Z     = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j)))     | Even = let result = Even {n=S j} in
                                          ?helpEven
  parity (S (S (S (j + j)))) | Odd  = let result = Odd {n=S j} in
                                          ?helpOdd

检查 helpEven 的类型会告诉我们需要为 Even 的情况证明什么:

  j : Nat
  result : Parity (S (plus j (S j)))
--------------------------------------
helpEven : Parity (S (S (plus j j)))

由此我们可以编写一个辅助函数,将它的类型 重写 为需要的形式:

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in p

rewrite ... in 语法允许你根据相等性证明来改写它,以此改变表达式需要的类型。 在这里,我们使用了 plusSuccRightSucc,其类型如下:

plusSuccRightSucc : (left : Nat) -> (right : Nat) -> S (left + right) = left + S right

我们可以在 helpEven 的右侧挖个坑来看到 rewrite 的效果,然后一步一步地做。 我们从下面开始:

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = ?helpEven_rhs

先查看一下 helpEven_rhs 的类型:

  j : Nat
  p : Parity (S (plus j (S j)))
--------------------------------------
helpEven_rhs : Parity (S (S (plus j j)))

然后通过应用 plusSuccRightSucc j j 来进行 rewrite 重写, 它会给出等式 S (j + j) = j + S j,从而在类型中用 j + S j 取代 S (j + j) (在这里是 S (plus j j),它由 S (j + j) 规约而来 ):

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in ?helpEven_rhs

现在检查 helpEven_rhs 的类型会告诉我们发生了什么,包括刚才所用的等式的类型 (即 _rewrite_rule 的类型):

  j : Nat
  p : Parity (S (plus j (S j)))
  _rewrite_rule : S (plus j j) = plus j (S j)
--------------------------------------
helpEven_rhs : Parity (S (plus j (S j)))

Odd 的情况使用 rewrite 和另一个辅助函数,我们可以完成 parity

helpEven : (j : Nat) -> Parity (S j + S j) -> Parity (S (S (plus j j)))
helpEven j p = rewrite plusSuccRightSucc j j in p

helpOdd : (j : Nat) -> Parity (S (S (j + S j))) -> Parity (S (S (S (j + j))))
helpOdd j p = rewrite plusSuccRightSucc j j in p

parity : (n:Nat) -> Parity n
parity Z     = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j)))     | Even = helpEven j (Even {n = S j})
  parity (S (S (S (j + j)))) | Odd  = helpOdd j (Odd {n = S j})

rewrite 的完整细节超出了本入门教程的范围,不过定理证明教程 (见 Theorem Proving)中覆盖了它。

完全性检查

如果我们真的想要信任我们的证明,它们定义为 函数是十分重要的 — 也就是说, 一个函数为所有可能的输入情况定义,并且保证会终止。不然我们就能构造出一个空类型的元素, 以它开始我们可以证明任何东西:

-- 利用部分定义的「hd」
empty1 : Void
empty1 = hd [] where
    hd : List a -> a
    hd (x :: xs) = x

-- 不会终止
empty2 : Void
empty2 = empty2

Idris 会在内部检查所有函数的完全性,我们可在提示符中用 :total 命令来检查。 我们会看到上面的两个定义都是不完全的:

*Theorems> :total empty1
可能不完全,由于: empty1#hd
    不完全,因为有遗漏的情况
*Theorems> :total empty2
可能不完全,由于递归路径 empty2

注意这里用了「可能」一词 — 由于停机问题的不可判定性,完全性检查当然永远无法确定。 因此,该检查是保守的。我们也可以将函数标记为完全的,使其在完全性检查失败时产生编译期错误:

total empty2 : Void
empty2 = empty2
对 ./theorems.idr 进行类型检查
theorems.idr:25:empty2 可能不完全,由于递归路径 empty2

令人欣慰的是,我们在 空类型 一节中对零与后继构造器分立的证明是完全的:

*theorems> :total disjoint
Total

完全性检查必然是保守的。要被记录为完全的,函数 f 必须:

  • 覆盖所有可能的输入
  • 良基 的 — 即,当一系列(可能互相)递归的调用再次到达 f 时,它必须能够表明其参数之一已经递减。
  • 使用的数据类型必须 严格为正(strictly positive)
  • 没有调用任何不完全的函数
完全性的指令与编译器参数

默认情况下,Idris 允许所有良类型的定义,无论是否完全。然而在理想情况下, 函数总是要尽可能地完全,因为这能保证它们可以在有限时间内, 对于所有可能的输入提供一个结果。我们可以要求函数是完全的,通过以下两种方式之一:

  • 使用 --total 编译器参数。
  • 为源文件添加 %default total 指令。在这之后的所有定义都会要求为完全的, 除非显式地标记为 partial

%default total 声明 之后 的所有函数都会被要求是完全的。与此相应, %default partial 声明之后的要求则被放宽。

最后,编译器参数 --warnpartial 会为任何未声明完全性的偏函数打印一个警告。

完全性检查的问题

请注意,完全性检查器并不完美!首先,由于停机问题的不可判定性,它必然是保守的, 因此一些 确实完全 的程序不会被检测为完全的。其次,当前实现投入的精力有限, 因此它仍然有可能将不完全的函数当作完全的。你的证明请先不要依赖它!

完全性的提示

有时你确信一个程序是完全的,但 Idris 不这么认为,此时可以对检查器给出提示, 以此来给出终止参数的详情。检查器会确保所有的递归调用链最终都能导致其中一个参数递减到基本情况, 然而有些情况很难辨别。例如,以下定义无法被检查为 total,因为检查器无法确定 filter (< x) xs 一定小于 (x :: xs)

qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs)
   = qsort (filter (< x) xs) ++
      (x :: qsort (filter (>= x) xs))

prelude 中定义的 assert_smaller 旨在解决这个问题:

assert_smaller : a -> a -> a
assert_smaller x y = y

它简单地求值成第二个参数,但也会向完全性检查器断言 y 在结构上小于 x。 当检查器自己无法解决时,它可用于解释完全性的推理。现在上面的例子可重写为:

total
qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs)
   = qsort (assert_smaller (x :: xs) (filter (< x) xs)) ++
      (x :: qsort (assert_smaller (x :: xs) (filter (>= x) xs)))

表达式 assert_smaller (x :: xs) (filter (<= x) xs) 断言 filter 的结果总是小于模式 (x :: xs)

在更极端的情况下,函数 assert_total 能将一个表达式标为总是完全的:

assert_total : a -> a
assert_total x = x

通常,你应当避免使用该函数,不过在对原语进行推理,或者在对外部定义的, 完全性可被外部参数展示的函数(例如 C 库中的)进行推理时,它会非常有用。

[1]Timothy G. Griffin. 1989. A formulae-as-type notion of control. In Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages (POPL ‘90). ACM, New York, NY, USA, 47-58. DOI=10.1145/96709.96714 http://doi.acm.org/10.1145/96709.96714

临时定义

在依赖类型编程中,有时类型检查器需要的类型会和我们编写的程序的类型不同 (在这里是它们的形式不同),然而尽管如此,它们在证明上依然是等价的。 例如,回想一下 parity 函数:

data Parity : Nat -> Type where
   Even : Parity (n + n)
   Odd  : Parity (S (n + n))

我们要把它实现为:

parity : (n:Nat) -> Parity n
parity Z     = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j)))     | Even = Even {n=S j}
  parity (S (S (S (j + j)))) | Odd  = Odd {n=S j}

它简单地指定了零为偶数,一为奇数,然后递归地说明了 k+2 的奇偶性与 k 相同。 显式地标出 n 是奇数还是偶数对于类型推断来说是必须的。然而,类型检查器却拒绝了它:

viewsbroken.idr:12:10: 在解析 ViewsBroken.parity 的右侧时:
    Parity (plus (S j) (S j))
与
    Parity (S (S (plus j j)))
的类型不匹配

具体为:
        plus (S j) (S j)
    与
        S (S (plus j j))
    的类型不匹配

类型检查器告诉我们 (j+1)+(j+1)2+j+j 无法规范化为相同的值。这是因为 plus 是在第一个参数上递归定义的,而在第二个值中,有一个后继符号作用在第二个参数上, 因此它无法帮助归约。这些值明显相等 — 不过我们要如何重写程序来修复此问题?

临时定义

临时定义(Provisional Definition) 允许我们推迟证明的细节以帮助解决此问题。 它主要有两个作用:

  • 成型(Prototyping) 时,它可在所有的证明细节结束前测试程序。
  • 阅读 程序时,推迟证明的细节通常会让过程更清晰,避免读者从底层算法中分心。

临时定义的写法和普通定义相同,只是它以 ?= 而非 = 引入右式。我们将 parity 定义为:

parity : (n:Nat) -> Parity n
parity Z = Even {n=Z}
parity (S Z) = Odd {n=Z}
parity (S (S k)) with (parity k)
  parity (S (S (j + j))) | Even ?= Even {n=S j}
  parity (S (S (S (j + j)))) | Odd ?= Odd {n=S j}

当写成这种形式时,Idris 不会报告类型错误,而是在定理中挖一个坑,以此来修正类型错误。 Idris 会告诉我们有两个证明义务,其名字根据模块和函数名生成:

*views> :m
Global holes:
        [views.parity_lemma_2,views.parity_lemma_1]

其中第一个坑的类型如下:

*views> :p views.parity_lemma_1

---------------------------------- (views.parity_lemma_1) --------
{hole0} : (j : Nat) -> (Parity (plus (S j) (S j))) -> Parity (S (S (plus j j)))

-views.parity_lemma_1>

它的两个参数为 j,一个是模式匹配作用域中的变量,另一个是 value, 它是我们在临时定义右侧给出的值。我们的目标是重写类型以便让我们能使用该值。 我们可以用 Prelude 中的以下定理来达到此目的:

plusSuccRightSucc : (left : Nat) -> (right : Nat) ->
  S (left + right) = left + (S right)

还要再用 compute 来展开 plus 的定义:

-views.parity_lemma_1> compute


---------------------------------- (views.parity_lemma_1) --------
{hole0} : (j : Nat) -> (Parity (S (plus j (S j)))) -> Parity (S (S (plus j j)))

在应用 intros 之后,我们有:

-views.parity_lemma_1> intros

  j : Nat
  value : Parity (S (plus j (S j)))
---------------------------------- (views.parity_lemma_1) --------
{hole2} : Parity (S (S (plus j j)))

接着,我们对称地对 jj 应用 plusSuccRightSucc 重写规则,它会给出:

-views.parity_lemma_1> rewrite sym (plusSuccRightSucc j j)

  j : Nat
  value : Parity (S (plus j (S j)))
---------------------------------- (views.parity_lemma_1) --------
{hole3} : Parity (S (plus j (S j)))

sym 是一个在库中定义的函数,它可以反转重写的顺序:

sym : l = r -> r = l
sym Refl = Refl

我们可以用 trivial 策略来完成此证明,它会在前提中找到 value。 第二个引理的证明方式完全相同。

现在我们可以在提示符中测试 with 规则:匹配中间值 一节中的 natToBin 了。数字 42 的二进制为 101010。其二进制数字以逆序表示:

*views> show (natToBin 42)
"[False, True, False, True, False, True]" : String

暂且相信

Idris 在编译程序前需要完成证明(尽管在提示符中求值可以无需详细证明)。然而有时候, 特别在成型时,不去完成证明反而更容易。在尝试证明它们之前就测试程序甚至可能会更好, 如果测试找到了一个错误,你就会知道最好不要花时间去证明某些东西了!

因此,Idris 提供了一个内建的强迫(coercion)函数,它允许我们使用类型错误的值:

believe_me : a -> b

显然,它的使用必须要非常小心。在成型时它非常有用,在断言外部代码(可能在外部的 C 库中)的性质时也是可以用的。使用了它的 views.parity_lemma_1 的「证明」为:

views.parity_lemma_2 = proof {
    intro;
    intro;
    exact believe_me value;
}

exact 策略允许我们为该证明提供一个确切的值。在本例中,我们断言给出的值是正确的。

示例:二进制数

我们在前面通过 Parity 视角实现了 Nat 到二进制数的转换。在这里, 我们会展示如何用同样的视角来实现已验证的二进制转换。我们首先在与其等价的 Nat 上索引二进制数。这是一种通用的模式,即将它的表示(这里为 Binary)与其含义 (这里为 Nat)关联起来:

data Binary : Nat -> Type where
   BEnd : Binary Z
   BO : Binary n -> Binary (n + n)
   BI : Binary n -> Binary (S (n + n))

BOBI 接受一个二进制数作为其参数并立即将它左移一位, 然后再加零或一作为新的最低位。索引 n + nS (n + n) 描述了左移后再相加的结果与该数值的意义相同。它会产生低位在前的表示。

现在,将 Nat 转换为二进制的函数在其类型中描述了结果二进制数为原始 Nat 的正确表示:

natToBin : (n:Nat) -> Binary n

Parity 视角让定义变得相当简单:把数折半其实就是进行一次右移,尽管我们需要在 Odd 的情况下使用临时定义:

natToBin : (n:Nat) -> Binary n
natToBin Z = BEnd
natToBin (S k) with (parity k)
   natToBin (S (j + j))     | Even  = BI (natToBin j)
   natToBin (S (S (j + j))) | Odd  ?= BO (natToBin (S j))

Odd 情况的问题与 parity 定义中的相同,其证明过程也一样:

natToBin_lemma_1 = proof {
    intro;
    intro;
    rewrite sym (plusSuccRightSucc j j);
    trivial;
}

最后,我们来实现一个 main 程序,它读取用户输入的整数并输出为二进制:

main : IO ()
main = do putStr "Enter a number: "
          x <- getLine
          print (natToBin (fromInteger (cast x)))

当然,为了能让它工作,我们需要为 Binary n 实现 Show

Show (Binary n) where
    show (BO x) = show x ++ "0"
    show (BI x) = show x ++ "1"
    show BEnd = ""

交互式编辑

目前,我们已经见过几个 Idris 的例子了,它的依赖类型系统可以在函数的 类型 中为期望的行为添加更加精确的描述,为函数的正确性增加额外的信心。 我们也见识过类型系统如何帮助开发 EDSL 了,它允许程序员描述目标语言的类型系统。 然而,精确的类型不止赋予了我们程序验证的能力,我们还可以利用类型来帮助 构造正确 的程序。

Idris 的 REPL 提供了一些命令,可基于程序的类型来检查和修改程序的片段, 例如在模式变量中拆分情况,检查坑的类型,甚至还有基本的证明搜索机制。在本节中, 我们解释了如何在文本编辑器中,特别是如何在 Vim 中利用这些特性。 Emacs 的交互式模式也可以。

在 REPL 中编辑

REPL 提供了许多命令,我们稍后会介绍它们。它们基于当前加载的模块来生成新的程序片段。 其一般形式为:

:命令 [行号] [名称]

也就是说,每条命令都会作用在源码中特定行的特定名称上,然后输出一个新的程序片段。 每个命令都有一种原地 更新 源文件的形式:

:命令! [行号] [名称]

当 REPL 被加载时,它还会用 idris --client 在后台启动一个进程,该进程接受并响应 REPL 命令。例如,如果我们在某处运行着 REPL,就可以执行这样的命令:

$ idris --client ':t plus'
Prelude.Nat.plus : Nat -> Nat -> Nat
$ idris --client '2+2'
4 : Integer

文本编辑器可以利用此特性与编辑命令来提供交互式编辑。

编辑命令

:addclause

:addclause n f 命令,缩写为 :ac n f,它为第 n 行声明的函数 f 创建一个模版定义。例如,若从第 94 行开始的代码为:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c

那么 :ac 94 vzipWith 会给出:

vzipWith f xs ys = ?vzipWith_rhs

名称可根据程序员给定的提示来选择,必要时机器还会添加数字使其唯一。 我们可以像下面这样给出提示:

%name Vect xs, ys, zs, ws

它声明了 Vect 类型族的名字应按照 xsyszsws 的顺序来选取。

:casesplit

:casesplit n x 命令,缩写为 :cs n x,它将第 n 行的模式变量 x 拆分为能构成它的多种模式,并移除任何由于一致性错误而不可能出现的情况。 例如,若从第 94 行开始的代码为:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f xs ys = ?vzipWith_rhs

那么 :cs 96 xs 会给出:

vzipWith f [] ys = ?vzipWith_rhs_1
vzipWith f (x :: xs) ys = ?vzipWith_rhs_2

即,模式变量 xs 被拆分成了 []x :: xs 两种情况。与之前一样, 名称的选取启发自相同的规则。若我们(用 :cs!)更新文件后再拆分同一行的 ys,就会得到:

vzipWith f [] [] = ?vzipWith_rhs_3

即,模式变量 ys 被拆分成了 [] 这一个情况,因为 Idris 发现另一种可能的情况 y :: ys 会导致一致性错误。

:addmissing

:addmissing n f 命令,缩写为 :am n f,它为第 n 行的函数 f 添加能覆盖所有输入情况的从句。例如,若从第 94 行开始的代码为:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = ?vzipWith_rhs_1

那么 :am 96 vzipWith 会给出:

vzipWith f (x :: xs) (y :: ys) = ?vzipWith_rhs_2

即,它注意到不存在空向量的情况,然后生成了需要的从句,并消除了会导致不一致性错误的从句。

:proofsearch

:proofsearch n f 命令,缩写为 :ps n f,它试图通过证明搜索、 尝试局部变量的值、递归调用以及所需类型族的构造器来为第 n 行的坑 f 查找一个值。该命令也可以接受一个可选的 提示(Hint) 列表, 也就是可用于尝试解决此坑的函数列表。例如,若从第 94 行开始的代码为:

vzipWith : (a -> b -> c) ->
           Vect n a -> Vect n b -> Vect n c
vzipWith f [] [] = ?vzipWith_rhs_1
vzipWith f (x :: xs) (y :: ys) = ?vzipWith_rhs_2

那么 :ps 96 vzipWith_rhs_1 会给出:

[]

它能工作是因为它在对长度为 0 的 Vect 进行搜索,而空向量是唯一的可能。 同样,在试图解决 :ps 97 vzipWith_rhs_2 时也出乎意料地只有一种可能:

f x y :: (vzipWith f xs ys)

它能工作是因为 vzipWith 拥有足够精确的类型:其结果向量一定非空(即至少有一个 ::);第一个元素的类型必须为 c,而得到它的唯一方法就是将 f 应用于 xy;最后,该向量的尾部只能递归地构造。

:makewith

:makewith n f 命令,缩写为 :mw n f,它为模式添加一个 with 从句。 以之前的 parity 为例。若第 10 行为:

parity (S k) = ?parity_rhs

那么 :mw 10 parity 会给出:

parity (S k) with (_)
  parity (S k) | with_pat = ?parity_rhs

若我们在占位符 _ 处填上 parity k,并用 :cs 11 with_pat 拆分 with_pat 的情况,就会得到以下模式:

parity (S (plus n n)) | even = ?parity_rhs_1
parity (S (S (plus n n))) | odd = ?parity_rhs_2

注意情况拆分规范化了该模式(即给的是 plus 而非 +)。我们会看到在任何情况下, 使用交互式编辑向程序员展示有效的模式都能显著简化依赖模式匹配的实现。

Vim 交互式编辑

Vim 的编辑器模式提供语法高亮和缩进,通过前文所述的命令提供交互式编辑的支持。 交互式编辑使用以下编辑器命令来进行,每一条都会直接更新缓冲区:

  • \d 使用 :addclause 为当前行声明的名字添加模版定义。
  • \c 使用 :casesplit 为光标处的变量执行情况拆分。
  • \m 使用 :addmissing 为光标处的名字添加缺少的情况。
  • \w 使用 :makewith 添加 with 从句。
  • \o 使用 :proofsearch 调用证明搜索来解决光标处的坑。
  • \p 使用 :proofsearch 根据附加的提示调用证明搜索以解决光标处的坑。

还有一些用来调用类型检查器与求值器的命令:

  • \t 显示光标下(全局可见的)名称的类型。对坑而言,它会显示其上下文和预期的类型。
  • \e 提醒要求值的表达式。
  • \r 重新加载缓冲区并执行类型检查。

对应的命令在 Emacs 模式中也可用。其它编辑器的支持可通过使用 idris –client 以相对直接的方式来添加。

语法扩展

Idris 支持以多种方式实现 嵌入式领域特定语言(Embedded Domain Specific Language, EDSL) [1] 。 我们见过的一种方式是扩展 do 记法。另一种重要的方式就是对核心语法进行扩展。 在本节中,我们描述了两种扩展语法的方式:syntax 规则与 dsl 记法。

syntax 规则

我们已经见过 if...then...else 表达式了,然而它并不是内建的。同样, 我们可以定义一个 Prelude 中的函数(它在 惰性 一节中出现过了):

ifThenElse : (x:Bool) -> Lazy a -> Lazy a -> a;
ifThenElse True  t e = t;
ifThenElse False t e = e;

接着用 syntax 声明来扩展核心语法:

syntax if [test] then [t] else [e] = ifThenElse test t e;

syntax 声明的左式描述了语法规则,而右式描述了其展开式。语法规则的构成为:

  • 关键字 — 在这里为 ifthenelse,它必须是有效的标识符。
  • 非终止符(Non-terminal) — 位于方括号内,此处为 [test][t][e],它们表示任意表达式。为避免解析歧义,这些表达式不能在顶层使用语法扩展 (也就是说你可以在括号中使用)。
  • 名称 — 位于大括号内,它表示可在右侧被绑定的名字。
  • 符号 — 位于引号内,例如 ":="。它也可在语法规则中包含保留字,例如 "let""in"

语法规则形式的限制在于它必须包含至少一个符号或关键字,且表示非终止符的变量不能重复。 任何表达式都可以使用,不过如果在一个规则的同一行内有两个非终止符,那么只有简单的表达式会被使用 (即,变量、常量或方括号括起的表达式)。规则可在其定义之前使用,但无法递归地使用。 因此以下语法扩展是有效的:

syntax [var] ":=" [val]                = Assign var val;
syntax [test] "?" [t] ":" [e]          = if test then t else e;
syntax select [x] from [t] "where" [w] = SelectWhere x t w;
syntax select [x] from [t]             = Select x t;

语法宏可被进一步限制为只能在模式(即,只能在模式匹配从句的左侧)中或只能在被标为 patternterm 语法规则的项(即除了模式匹配从句左侧的任何地方)中应用。 例如,假设我们定义了一个区间 Interval,用 So 静态检查以保证下界小于上界:

data Interval : Type where
   MkInterval : (lower : Double) -> (upper : Double) ->
                So (lower < upper) -> Interval

我们可以用 pattern 定义一个语法,它总是匹配 Oh 作为证明论据,用 term 请求提供一个证明项:

pattern syntax "[" [x] "..." [y] "]" = MkInterval x y Oh
term    syntax "[" [x] "..." [y] "]" = MkInterval x y ?bounds_lemma

term 中,语法 [x...y] 会生成一个证明义务 bounds_lemma (可能被重命名)。

最后,语法规则可引入另一种绑定形式。例如,for 循环在每次迭代中绑定一个参数:

syntax for {x} "in" [xs] ":" [body] = forLoop xs (\x => body)

main : IO ()
main = do for x in [1..10]:
              putStrLn ("Number " ++ show x)
          putStrLn "Done!"

注意,我们用形式 {x} 指明 x 表示一个已绑定的变量,它在右侧会被替换。 我们还把 in 放在了引号中,因为它是保留字。

dsl 记法

示例:良类型的解释器 一节中的良类型解释器是个依赖类型编程模式的简单例子。也就是说: 先用依赖类型描述一个 目标语言 及其类型系统,保证只有良类型的程序可被表示, 然后再通过这种方式表示程序。通过这种方式,我们可以编写序列化二进制数据 [2] 或安全运行并发过程 [3] 的程序。

然而,目标语言的形式使其难以在实践中编程。回想一下用 Expr 编写的阶乘程序:

fact : Expr G (TyFun TyInt TyInt)
fact = Lam (If (Op (==) (Var Stop) (Val 0))
               (Val 1) (Op (*) (App fact (Op (-) (Var Stop) (Val 1)))
                               (Var Stop)))

由于这是一种特别有用的模式,因此 Idris 提供了语法重载 [1] 使其在这种目标语言中更易于编程:

mkLam : TTName -> Expr (t::g) t' -> Expr g (TyFun t t')
mkLam _ body = Lam body

dsl expr
    variable    = Var
    index_first = Stop
    index_next  = Pop
    lambda      = mkLam

dsl 块描述了每个语法构造是如何在目标语言中表示的。在这里的 expr 语言中, 任何变量都会被翻译为 Var 构造器,使用 PopStop 来构造 de Bruijn 索引(即,由于变量本身被绑定,所以要统计有多少个绑定);而任何 λ-表达式都会被翻译为 Lam 构造器。mkLam 函数会简单地忽略其第一个参数,它是用户为变量选择的名字。 我们也可以通过这种方式来重载 let 与依赖函数的语法。现在可以将 fact 写成下面这样了:

fact : Expr G (TyFun TyInt TyInt)
fact = expr (\x => If (Op (==) x (Val 0))
                      (Val 1) (Op (*) (app fact (Op (-) x (Val 1))) x))

在这个新的版本中,expr 声明了下一个要被重载的表达式。我们可以利用习语括号, 通过以下声明再进一步:

(<*>) : (f : Lazy (Expr G (TyFun a t))) -> Expr G a -> Expr G t
(<*>) f a = App f a

pure : Expr G a -> Expr G a
pure = id

注意,它无需成为 Applicative 实现的一部分,因为习语括号记法会直接被翻译为 名字 <*>pure,针对特设(ad-hoc)类型的重载也被允许。现在我们可以写成:

fact : Expr G (TyFun TyInt TyInt)
fact = expr (\x => If (Op (==) x (Val 0))
                      (Val 1) (Op (*) [| fact (Op (-) x (Val 1)) |] x))

使用更加特设的重载、接口,以及新的语法规则,我们甚至可以再进一步:

syntax "IF" [x] "THEN" [t] "ELSE" [e] = If x t e

fact : Expr G (TyFun TyInt TyInt)
fact = expr (\x => IF x == 0 THEN 1 ELSE [| fact (x - 1) |] * x)
[1](1, 2) Edwin Brady and Kevin Hammond. 2012. Resource-Safe systems programming with embedded domain specific languages. In Proceedings of the 14th international conference on Practical Aspects of Declarative Languages (PADL‘12), Claudio Russo and Neng-Fa Zhou (Eds.). Springer-Verlag, Berlin, Heidelberg, 242-257. DOI=10.1007/978-3-642-27694-1_18 http://dx.doi.org/10.1007/978-3-642-27694-1_18
[2]Edwin C. Brady. 2011. IDRIS —: systems programming meets full dependent types. In Proceedings of the 5th ACM workshop on Programming languages meets program verification (PLPV ‘11). ACM, New York, NY, USA, 43-54. DOI=10.1145/1929529.1929536 http://doi.acm.org/10.1145/1929529.1929536
[3]Edwin Brady and Kevin Hammond. 2010. Correct-by-Construction Concurrency: Using Dependent Types to Verify Implementations of Effectful Resource Usage Protocols. Fundam. Inf. 102, 2 (April 2010), 145-176. http://dl.acm.org/citation.cfm?id=1883636

杂项

在本节中,我们讨论了多种附加特性:

  • 自动、隐式与默认参数
  • 文学编程(Literate Programming)
  • 通过外部函数与外部库交互
  • 接口
  • 类型提供器(Type Provider)
  • 代码生成,以及
  • 全域层级(Universe Hierarchy)

隐式参数

我们已经见过隐式参数了,它允许忽略能够被类型检查器推断出来的参数,例如:

index : {a:Type} -> {n:Nat} -> Fin n -> Vect n a -> a
自动隐式参数

在其它情况下,可以不通过类型检查来推断参数,而是通过在上下文中搜索恰当的值, 或通过构造证明来推断参数。例如,以下 head 的定义需要证明列表非空:

isCons : List a -> Bool
isCons [] = False
isCons (x :: xs) = True

head : (xs : List a) -> (isCons xs = True) -> a
head (x :: xs) _ = x

如果可以静态地获知列表非空,那是由于它的值是已知的,或上下文中存在对它的证明。 证明可以自动地构造,自动隐式参数允许它静默地发生。我们将 head 定义为:

head : (xs : List a) -> {auto p : isCons xs = True} -> a
head (x :: xs) = x

将隐式参数注解为 auto,表示 Idris 会试图搜索与其类型对应的值来填充它。 它会按照以下顺序尝试:

  • 局部变量,即类型正确的,在模式匹配中绑定的名字或 let 绑定。
  • 所需类型的构造器。若它们有参数,就会递归地搜索到最深 100 层的深度。
  • 带函数类型的局部变量,对参数进行递归地搜索。
  • 任何标出 %hint 注解的,带有相应返回类型的函数。

在找不到证明的情况下,它会像往常一样被显式地给出:

head xs {p = ?headProof}
定义隐式参数

除了让 Idris 自动查找给定类型的值外,有时我们还想要带有具体默认值的隐式参数。在 Idris 中,我们可以用 default 注解来做这件事。尽管它主要是为了在 auto 自动构造证明失败,或找到的值没有帮助时起辅助作用。然而,首先考虑一个更简单的, 不涉及证明的情况可能会更容易。

如果我们想要计算第 n 个斐波那契数(第 0 个斐波那契数定义为 0),我们可以写:

fibonacci : {default 0 lag : Nat} -> {default 1 lead : Nat} -> (n : Nat) -> Nat
fibonacci {lag} Z = lag
fibonacci {lag} {lead} (S n) = fibonacci {lag=lead} {lead=lag+lead} n

定义完之后,fibonacci 5 等价于 fibonacci {lag=0} {lead=1} 5,它会返回第 5 个斐波那契数。注意虽然它可以工作,但这并不是 default 注解的用途,在这里它只用作展示。 通常,default 用于提供定制证明搜索脚本的东西。

隐式转换

Idris 支持创建 隐式转换 ,在需要某项的类型能够匹配时, 它允许将一个类型的值自动转换成另一个类型。这是为了增强便利性并减少冗余。 下面是个故意构造的简单例子:

implicit intString : Int -> String
intString = show

test : Int -> String
test x = "Number " ++ x

通常,我们无法将一个 Int 附加到一个 String 之后,不过隐式转换函数 intString 可以将 x 转换为 String,因此 test 定义的类型是正确的。 隐式转换的实现和其它函数一样,只不过加上了 implicit 修饰符, 且被限制为只能接受一个显式参数。

一次只有一个隐式转换会被应用。也就是说,隐式转换无法被链式调用。如前文所见, 简单类型的隐式转换是不被鼓励的!更常见的做法是,使用隐式转换来减少 EDSL 的冗长度,或者隐藏证明的细节。这些示例超出了本教程的范围。

文学编程

和 Haskell 一样,Idris 支持 文学编程。如果某个文件的扩展名外 .lidr, 那么它会被当做文学编程文件。在文学编程中,除了以大于号 > 开头的行外, 所有的内容都会被视为注释。例如:

> module literate

这是一行注释,主程序在下面

> main : IO ()
> main = putStrLn "Hello literate world!\n"

附加的限制为,程序行(以 > 开头)与注释行(以任何其它字符开头)之间必须有一空行。

外部函数调用

在编程实践中,我们经常需要使用外部库,特别在与操作系统、文件系统、网络 等等 进行交互时。作为 Prelude 的一部分,Idris 提供了轻量的外部函数接口。 我们假定读者有一定的 C 和 gcc 编译器的知识。首先,我们来定一个数据类型, 它描述了我们能够处理的外部类型:

data FTy = FInt | FFloat | FChar | FString | FPtr | FUnit

它们每一个都与 C 的类型直接对应。分别为:intdoublecharchar*void*void。以下函数还描述了它们到对应的 Idris 类型的翻译:

interpFTy : FTy -> Type
interpFTy FInt    = Int
interpFTy FFloat  = Double
interpFTy FChar   = Char
interpFTy FString = String
interpFTy FPtr    = Ptr
interpFTy FUnit   = ()

外部函数由一组输入类型和返回类型描述,它们可以被转换为 Idris 类型:

ForeignTy : (xs:List FTy) -> (t:FTy) -> Type

外部函数被视作不纯粹的,因此 ForeignTy 构建了一个 IO 类型,例如:

Idris> ForeignTy [FInt, FString] FString
Int -> String -> IO String : Type

Idris> ForeignTy [FInt, FString] FUnit
Int -> String -> IO () : Type

我们通过为函数赋予名字、一系列参数的类型和返回值构建了一个外部函数调用。 内建的构造 mkForeign 将该函数的描述转换为一个可由 Idris 调用的函数:

data Foreign : Type -> Type where
    FFun : String -> (xs:List FTy) -> (t:FTy) ->
           Foreign (ForeignTy xs t)

mkForeign : Foreign x -> x

注意编译器期望 mkForeign 能够被完全地应用以构建完整的外部函数调用。例如, putStr 作为一个在运行时系统中定义的外部函数 putStr 的调用,其实现如下:

putStr : String -> IO ()
putStr x = mkForeign (FFun "putStr" [FString] FUnit) x
include 与链接器指令

外部函数调用会按照 Idris 和 C 的值的表示之间对应的转换,被直接翻译为 C 函数的调用。 通常这会需要将额外的库、头文件或目标文件链接进来。我们可以通过以下指令来完成:

  • %lib target x — 将 libx 库包含进来。如果目标为 C,它等价于向 gcc 传递 -lx 选项。如果目标为 Java,该库会被解释为 maven 的依赖关系定位符 groupId:artifactId:packaging:version
  • %include target x — 使用头文件或为给定的后端目标导入 x
  • %link target x.o — 在使用给定的后端目标时链接目标文件 x.o
  • %dynamic x.so — 动态链接共享的解释器目标文件 x.so
测试外部函数调用

一般来说,Idris 解释器(用于类型检查和 REPL)不会处理 IO 活动。除此之外,它既不会生成 C 代码,也不会将它编译成机器码,因此 %lib%include%link 是没有效果的。 IO 活动与 FFI 调用可使用特殊的 REPL 命令 :x EXPR 来测试,而 C 库可通过 :dynamic 命令或 %dynamic 指令来动态地加载到解释器中。例如:

Idris> :dynamic libm.so
Idris> :x unsafePerformIO ((mkForeign (FFun "sin" [FFloat] FFloat)) 1.6)
0.9995736030415051 : Double

类型提供器

Idris 类型提供器,灵感来自 F# 的类型提供器,它能让我们的类型与 Idris 之外的世界建立「联系」。例如,给定一个表示数据库模式(Schema)的类型, 和一个针对它检查过的查询,类型提供器可以在进行类型检查时读取真实数据库的模式。

Idris 类型提供器使用普通的 Idris 可执行语义来运行 IO 活动并提取出结果。 该结果会作为编译代码时的常量被保存。它可以是个类型,此时它能像其它类型一样使用; 它也可以是个值,此时它也可以像其它值一样使用,并作为一个索引被包含在类型中。

类型提供器尚且是个实验性的扩展。要启用它,请使用 %language 指令:

%language TypeProviders

某个类型 t 的提供器 p 不过就是个类型为 IO (Provider t) 的表达式。 %provide 指令会导致类型检查器去执行该活动,并将其结果绑定到一个名字上。 我们最好用一个简单的例子来展示它。类型提供器 fromFile 用于读取文本文件。 如果该文件由字符串 Int 构成,那么它就会提供 Int 类型。否则,它就会提供 Nat 类型。

strToType : String -> Type
strToType "Int" = Int
strToType _ = Nat

fromFile : String -> IO (Provider Type)
fromFile fname = do Right str <- readFile fname
                      | Left err => pure (Provide Void)
                    pure (Provide (strToType (trim str)))

接着我们使用 %provide 指令:

%provide (T1 : Type) with fromFile "theType"

foo : T1
foo = 2

如果名为 theType 的文件由单词 Int 构成,那么 foo 的类型会是 Int。 否则,它会是 Nat。当 Idris 遇到该指令时,它首先会检查确认提供器表达式 fromFile theType 的类型为 IO (Provider Type)。接着它会执行该提供器。 如果其结果为 Provide t,那么 T1 就会被定义为 t。否则,就会产生一个错误。

我们的数据类型 Provider t 定义如下:

data Provider a = Error String
                | Provide a

我们已经见过 Provide 构造器了。Error 构造器允许类型提供器返回有用的错误信息, 本节中的示例为了简单并未提供。更加复杂的类型提供器实现,包括一个静态检查的 SQLite 绑定, 可从外部链接 [1] 获取。

以 C 为编译目标

Idris 的默认编译目标为 C。它通过以下命令编译:

$ idris hello.idr -o hello

此命令等价于:

$ idris --codegen C hello.idr -o hello

当使用以上命令编译时,它会生成一个临时的 C 源码,接着再编译成名为 hello 的可执行文件。

要查看生成的 C 代码,请通过以下命令编译:

$ idris hello.idr -S -o hello.c

要开启优化,请在代码中使用 %flag C 编译指令:

module Main
%flag C "-O3"

factorial : Int -> Int
factorial 0 = 1
factorial n = n * (factorial (n-1))

main : IO ()
main = do
     putStrLn $ show $ factorial 3

要在编译生成的 C 代码时加上调试信息,例如使用 gdb 在 Idris 程序中调试段错误时, 请使用 %flag C 编译指令来包括调试符号,如下所示:

%flag C "-g"

以 JavaScript 为编译目标

Idris 可生成能够运行在浏览器以及 NodeJS 等类似环境中的 JavaScript 代码。 它可以通过 FFI 与 JavaScript 生态进行交互。

代码生成

代码生成分为两种独立的目标。要生成适合在浏览器中运行的代码,请使用以下命令:

$ idris --codegen javascript hello.idr -o hello.js

产生的文件可以像其它 JavaScript 代码那样嵌入到 HTML 中。

生成 NodeJS 代码的方式有点不同。Idris 会输出一个可直接调用 node 运行的 JavaScript 文件。

$ idris --codegen node hello.idr -o hello
$ ./hello
Hello world

考虑到 JavaScript 代码生成器使用 console.log 将文本写入到 stdout, 因此它会自动在每个字符串之后加上换行。而此行为不会在 NodeJS 代码生成器中出现。

使用 FFI

要编写一个有用的应用,我们需要与外部世界进行交流。可能我们想要操作 DOM 或发送 Ajax 请求。为此可以使用 FFI。由于大部分 JavaScript API 需要回调,因此我们需要扩展 FFI 以将函数作为参数来传入。

JavaScript FFI 的工作方式与一般的 FFI 有点不同。它使用位置参数将我们的参数直接插入一段 JavaScript 代码中。

我们可以使用 JavaScript 的原语加法:

module Main

primPlus : Int -> Int -> IO Int
primPlus a b = mkForeign (FFun "%0 + %1" [FInt, FInt] FInt) a b

main : IO ()
main = do
  a <- primPlus 1 1
  b <- primPlus 1 2
  print (a, b)

注意 %n 记法确定了从 0 开始的第 n 个给定的外部函数的参数。 当你需要一个百分号而非位置时,请使用 %% 来代替。

将函数传入外部函数中是非常相似的。假设我们想要从 JavaScript 世界中调用以下函数:

function twice(f, x) {
  return f(f(x));
}

显然我们需要在这里传入一个函数 f (我们可以从 twice 中使用 f 的方式推断出来,如果 JavaScript 有类型的话会更加明显)。

当你给 JavaScript FFI 一个类型为 FFunction 的东西时,它能够理解要将函数作为参数。 以下示例代码在 JavaScript 中调用了 twice 并将结果返回到了 Idris 程序中:

module Main

twice : (Int -> Int) -> Int -> IO Int
twice f x = mkForeign (
  FFun "twice(%0,%1)" [FFunction FInt FInt, FInt] FInt
) f x

main : IO ()
main = do
  a <- twice (+1) 1
  print a

该程序输出 3,正如我们所料。

包含外部的 JavaScript 文件

只要某人使用 JavaScript,他就有可能想要包含外部库,或者通过 FFI 调用存储在外部文件中的函数。JavaScriptNodeJS 代码生成器能够理解 %include 指令。请注意 JavaScriptNodeJS 是由不同的代码生成器处理的, 因此你需要指明所需的目标。这也就表示你可以在同一个 Idris 源文件中分别为 JavaScriptNodeJS 包含不同的文件。

因此如果你想要添加外部的 JavaScript 文件,可以这样做:

对于 NodeJS

%include Node "path/to/external.js"

要在浏览器中使用:

%include JavaScript "path/to/external.js"

给定的文件会被添加的生成代码的顶部。对于库包,你也可以使用 ipkg 文件中的 objs 选项来将 js 文件包含在安装中,并使用:

%include Node "package/external.js"

Idris 的 JavaScriptNodeJS 后端也会在此位置查找文件。

包含 NodeJS 模块

NodeJS 代码生成器也可以通过 %lib 指令来包含模块。

%lib Node "fs"

该指令会编译成以下 JavaScript

var fs = require("fs");
缩减生成的 JavaScript

Idris 会产生非常大的 JavaScript 代码块。然而,生成的代码可通过 Google 的 closure-compiler 缩小。其它的缩减器也可用,不过 closure-compiler 提供了更高级的编译,它会做一些侵入性的内敛和代码消除。Idris 可以充分利用这种编译模式, 强烈建议在传输用 Idris 编写的 JavaScript 应用时使用它。

累积性

由于值可以出现在类型中, 反之亦然,因此类型自然也有类型。例如:

*universe> :t Nat
Nat : Type
*universe> :t Vect
Vect : Nat -> Type -> Type

但是 Type 的类型呢?如果我们询问 Idris,它会报告:

*universe> :t Type
Type : Type 1

如果 Type 是它自己的类型,那么它会因为 Girard 悖论 而导致不一致性,因此在内部存在类型的 层级(Hierarchy) (或 全域,Universe):

Type : Type 1 : Type 2 : Type 3 : ...

全域类型是 积累(Cumulative) 的,也就是说,如果有 x : Type n, 我们也可以有 x : Type m 使得 n < m。如果类型检查器发现了任何不一致性, 它就会生成这种全域约束并报告一个错误。一般来说,程序员无须担心它, 但它确实可以防止(构造出)如下的程序:

myid : (a : Type) -> a -> a
myid _ x = x

idid :  (a : Type) -> a -> a
idid = myid _ myid

myid 应用到其自身会导致在全域层级中出现一个循环,即 myid 第一个参数的类型为 Type,如果要将其应用到自身,那么其级别不能低于所要求的级别。

[1]https://github.com/david-christiansen/idris-type-providers

扩展阅读

有关 Idris 编程的更多信息,以及依赖类型编程的一般问题解答,可从多种来源获取:

[1]Edwin Brady and Kevin Hammond. 2012. Resource-Safe systems programming with embedded domain specific languages. In Proceedings of the 14th international conference on Practical Aspects of Declarative Languages (PADL‘12), Claudio Russo and Neng-Fa Zhou (Eds.). Springer-Verlag, Berlin, Heidelberg, 242-257. DOI=10.1007/978-3-642-27694-1_18 http://dx.doi.org/10.1007/978-3-642-27694-1_18
[2]Edwin C. Brady. 2011. IDRIS —: systems programming meets full dependent types. In Proceedings of the 5th ACM workshop on Programming languages meets program verification (PLPV ‘11). ACM, New York, NY, USA, 43-54. DOI=10.1145/1929529.1929536 http://doi.acm.org/10.1145/1929529.1929536
[3]Edwin C. Brady and Kevin Hammond. 2010. Scrapping your inefficient engine: using partial evaluation to improve domain-specific language implementation. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (ICFP ‘10). ACM, New York, NY, USA, 297-308. DOI=10.1145/1863543.1863587 http://doi.acm.org/10.1145/1863543.1863587

常见问题解答(FAQ)

Agda 跟 Idris 有啥不一样啊?

和 Idris 一样,Agda 也是个带有依赖类型的函数式语言,并且支持依赖模式匹配。 它们都能编写程序和证明。不过,Idris 一开始就侧重于为通用编程而设计,而非 Agda 侧重的定理证明。因此,它支持与系统库和 C 程序的互操作性,以及用于实现 EDSL (领域特定语言)的语言构造。它还包括更高级的编程构造,例如接口(类似于类型类)和 do 记法。

Idris 支持多种后端(默认为 C 和 JavaScript,可以通过插件添加更多),拥有一个 C 编写的,带有垃圾收集器和内建并发消息传递的参考运行时系统。

有标准库文档没?有函数列表没?

预置包的 API 文档已在文档页中列出。

不幸的是,Idris 的默认前导和预置包在文档方面未必完整。其它查找函数的方式包括:

  • REPL 命令:
    • 使用 :apropos 来搜索文档和函数名中的文本。
    • 使用 :search 来根据类型搜索函数。
    • 使用 :browse 来列出给定名字空间中的内容。
  • 使用 REPL 的自动补全功能。
  • 使用 greplibs/ 中搜索源码

如果你发现预置包缺少文档,请随心为它补充一些。当然也可以通知别人来做这件事。 Idris 提供了显示富文档的语法,富文档可使用 :doc 命令浏览,并在生成的 HTML API 文档中列出。

Idris 能拿来干活不?

Idris 主要是个研究工具,用来探索用依赖类型开发软件的可能性,也就是说它的主要目的(还) 不是开发可用于生产的系统。因此,某些方面上它还有些粗糙,也缺少很多库。现在还没人全职搞 Idris,我们目前也没有资源来独立完善该系统。因此,我们不推荐你围绕它来开展业务!

话虽如此,我们非常欢迎帮助推进 Idris 适应生产的贡献,包括(但不限于)额外的库支持、 完善运行时系统(并确保其健壮性)、提供并维护 JVM 后端等等。

为啥 Idris 用了及早求值,不用惰性求值啊?

Idris 采用及早求值主要是为了能更好地预测性能,特别是其长期目标之一就是能够编写高效 且经过验证的底层代码,例如设备驱动以及网络设施。此外,Idris 的类型系统能让我们精确地 指明每个值的类型,因此也就能确定每个值在运行时的形式。在惰性求值的语言中,考虑一个 Int 类型的值:

thing : Int

thing 在运行时是如何表示的?它是一个按照位模式来表示的整数, 还一个是指向了能计算出整数的代码的指针?在 Idris 中,我们决定按照类型来精确地区分它们:

thing_val : Int
thing_comp : Lazy Int

这样,thing_val 就能确保为具体的 Int,而 thing_comp 则是会产生 Int 的计算。

俺咋弄个惰性控制结构呐?

你可以用特殊的 Lazy 类型来创建这种控制结构。例如,Idris 中的 if...then...else... 会被扩展为 ifThenElse 函数的应用。 它在库中对布尔值的默认实现如下:

ifThenElse : Bool -> (t : Lazy a) -> (e : Lazy a) -> a
ifThenElse True  t e = t
ifThenElse False t e = e

te 的类型 Lazy a 指出它们只会在需要时被求值,也就是说,它们是惰性求值的。

REPL 里头的求值行为跟咱想的不一样啊,这咋回事儿= =?

作为一个完全依赖类型的语言,Idris 的求值分为两个阶段,编译时与运行时。在编译时, 它只会对已知完全(即保证会终止且覆盖了所有可能输入)的东西求值, 以此来保持类型检查的可判定性。编译时的求值器属于 Idris 核心的一部分,它用 Haskell 编写,按照值的 HOAS(Higher Order Abstract Syntax 高阶抽象语法)风格的表示来实现。 由于已知所有东西都有规范的形式,选用哪种求值策略实际上也就无关紧要了, 因为它们都会得到相同的答案,而在实践中,它会执行 Haskell 运行时系统选择的任何事情。

按照约定,REPL 使用编译时求值的概念。除了更容易实现外(因为我们可以用求值器), 它在展示项(Term)如何在类型检查器中求值时也非常有用。

Idris> \n, m => (S n) + m
\n => \m => S (plus n m) : Nat -> Nat -> Nat

Idris> \n, m => n + (S m)
\n => \m => plus n (S m) : Nat -> Nat -> Nat

为啥咱不能在类型里用没参数的函数呐?

如果你在类型中使用小写字母开头的名字,且它们没用被应用于任何参数,那么 Idris 会把它视为隐式绑定的参数。例如:

append : Vect n ty -> Vect m ty -> Vect (n + m) ty

这里的 nmty 是隐式绑定的。此规则同样适用于在别处定义的, 带这类名字的函数。例如,你或许还有:

ty : Type
ty = String

即便如此,ty 还是会被视作 append 定义中隐式绑定的参数,而不会让 append 的类型等价于你预想的:

append : Vect n String -> Vect m String -> Vect (n + m) String

采用了此规则后,你无需其它上下文,只要查看 append 的类型就能明白哪些是隐式绑定的名字。

如果你想在类型中使用不被应用的名字,那么有两种选择:你可以显式地限定它,例如, 若 tyMain 的命名空间中定义,你可以这样做:

append : Vect n Main.ty -> Vect m Main.ty -> Vect (n + m) Main.ty

此外,你还可以使用不以小写字母开头的名字,它决不会被隐式绑定:

Ty : Type
Ty = String

append : Vect n Ty -> Vect m Ty -> Vect (n + m) Ty

按照约定,如果你打算将一个名字用作类型同义,那么最好以大写字母开头来避免此限制。

这破程序明显能停,为啥 Idris 还说它有可能不完全?

由于停机问题的不可判定性, Idris 通常无法判定一个程序是否会停止。然而,我们可以找出某些确定可以终止的程序。 Idris 使用「大小改变终止(size change termination)」来寻找从函数返回到自身的递归路径。 在此路径上,至少必有一个参数会收敛到基本情况。

  • Idris 支持相互递归的函数
  • 不过,递归路径上的所有函数必须被完整地应用。此外,Idris 不支持高阶应用。
  • 在递归调用的过程中,Idris 会查找语法上更小的输入参数,以此来识别能收敛到基本情况的参数。 例如,k 在语法上小于 S (S k),因为 kS (S k) 的子项(subterm), 然而 (k, k) 在语法上却不小于 (S k, S k)

如果你有个确信会终止的函数,但 Idris 不信,那么你可以调整程序的结构,或者使用 assert_total 函数。

<<<<<<< HEAD Idris 啥时候能自举啊? ======================

这事不急,虽说从长远来看这主意不错。就目前来说,实现支持自举的库是一项很有价值的工作, 比如说参数解析器以及符合 POSIX 标准的库用于系统交互。

Idris 有全域多态不? Type 是啥类型的?

Idris 并没有全域多态(Universe Polymorphism),而是拥有全域的积累层级 (Cumulative Hierarchy),如 Type : Type 1Type 1 : Type 2 等等。 积累性的意思是,若 x : Type nn <= m,则 x : Type m。 全域的级别总是由 Idris 推导,且无法被显式地指定。执行 REPL 命令 :type Type 1 以及试图为任何类型指定全域级别时,都会产生一个错误。

为啥 Idris 用 Double 不用 Float64

历史上 C 和很多语言都用分别用 FloatDouble 来表示 32 位和 64 位的浮点数。较新的语言,如 Rust 和 Julia 都开始遵循 IEEE 浮点运算标准 (IEEE 754) 的命名规范了。 它将单精度和双精度的数描述为 Float32Float64,其大小在类型名中描述。

由于开发者更熟悉旧有的命名约定,而 Idris 的开发者也选择了它,因此 Idris 采用了 C 风格的约定。也就是说名称 Double 用于描述双精度浮点数,而 Idris 现在还不支持 32 位浮点数。

-ffreestanding 是啥?

在相对路径中拥有自己的库和编译器时,可使用 freestanding 命令行参数来构建 Idris 二进制文件。当构建过程中的安装目录未知时,它对于构建二进制文件来说非常有用。 当传入此参数时,IDRIS_LIB_DIR 环境变量需要设置为相对与 idris 可执行文件所在的 Idris 库的路径。 IDRIS_TOOLCHAIN_DIR 环境变量是可选的,如果设置了它,Idris 就会在该路径下寻找 C 编译器。例如:

IDRIS_LIB_DIR="./libs" \
IDRIS_TOOLCHAIN_DIR="./mingw/bin" \
CABALFLAGS="-fffi -ffreestanding -frelease" \
make

话说「Idris」是个啥名儿 O_O?

有一定年龄的英国人可能很熟悉唱歌贼好听的小火龙妹砸。 你要是不满意,自己想个好词儿啊(手动微笑 :-)

还能有 Unicode 操作符不?

下面是为什么我们不应该支持 Unicode 操作符的原因:

  • 它难以输入(如果你在使用别人的代码,这点就很重要)。很多编辑器都有它自己的输入法, 不过你必须知道怎么输入。
  • 并不是任何软件都能轻松支持它。在一些移动 Email 客户端、基于终端的 IRC 客户端、 以及 Web 浏览器等软件中都会出现渲染问题。
  • 即便我们不在标准库中使用它(绝对不会!),然而只要有人在他们的库代码中用了它, 别人就得去处理它。
  • 有太多字符看起来太像了。单是分不清 0 和 O 就会造成很多麻烦,更不说各式各样的冒号和括号了。

如果使用得当,Unicode 操作符能让代码看起来更漂亮,然而 lhs2TeX 也能。 也许几年后情况有变,软件能更好地应对它,到时候重新审视它才有意义。然而目前, Idris 不会为操作符提供任何 Unicode 符号。

这似乎是个 Wadler 定律 在工作中的实例。

本答案基于 Edwin Brady 对此 推送请求 的回应。

Idris 有社区准则不?

这里 是Idris 社区规范的声明。

还有哪儿能找到解答啊?

Github 的维基上还有个非官方 FAQ, 其中解答了更多技术问题,而且经常更新。

用 Idris 实现带有状态的系统:ST 教程

本教程叙述了如何用 Idris 的 Control.ST 库实现带有状态的系统。

注解

Idris 文档已按照 创作共用 CC0 许可协议 发布。因此根据法律规定, Idris 社区 已放弃对 Idris 文档的所有版权以及相关或邻接的权利。

关于 CC0 的更多信息参见:https://creativecommons.org/publicdomain/zero/1.0/deed.zh

概览

Idris 这种可以直接用类型系统对程序进行推理的函数式语言, 仅凭程序能够编译这一点,我们就能确定该程序会正确运行(即按照类型描述的规范运行)。

然而在现实中,软件依赖于状态,很多组件则依赖于状态机。它们可以描述像 TCP 这样的网络传输协议, 实现事件驱动的系统以及正则表达式的匹配。此外,像网络 Socket 和文件这类的很多基础资源, 都由状态机隐式地管理;特定操作只有在资源处于特定状态时才可行,而这些操作也可以改变底层资源的状态。 例如,只有向已建立连接的网络 Socket 发送消息才有意义,关闭 Socket 会将其状态从“打开”切换为“关闭”。 状态机同样可以编码重要的安全性质。比如,在一个 ATM(自动取款机)的软件实现中, 有一点性质非常重要:ATM 只有在卡片插入且通过密码验证的状态下才能吐出钞票。

本教程将介绍 Control.ST 库,它支持对带有状态和副作用的程序进行编程和推理。 该库已包含在 Idris 发行版当中(目前属于 contrib 包)。本教程假设读者已经熟悉 Idris 教程 中所述的纯函数式的编程方法。ST 库基于《Idris 类型驱动开发》一书中第 13 和 14 章所讨论的内容,如需更多背景信息可以参考此书。

我们可使用 ST 库编写由多个状态转移系统复合而成的程序。它支持两种复合方式:第一,我们能够同时使用数个独立实现的状态转移系统;第二,我们能够基于其它状态转移系统实现新的状态转移系统。

示例:需要登录的数据存储系统

许多软件的组件依赖于状态,有些操作仅在特定状态下才有效。 例如,考虑一个安全的数据存储系统,它只有在用户登录的情况下才能访问某些机密数据。 该系统可以处于以下两种状态之一:

  • 已登录,在此状态下,用户可以访问机密数据
  • 未登录,在此状态下,用户无法访问机密数据

我们可以提供登录、登出和读取数据命令,如下图所示:

login

如果 登录 命令被成功执行,就会使系统状态从 未登录 转移到 已登录 状态。登出 命令会使系统状态从 已登录 转移到 未登录 状态。最重要的是,读取数据 命令仅在系统处于 已登录 状态时才有效。

我们通常使用类型检查器来确保变量和参数的使用一致性。然而,主流的类型系统并不能很好地支持像“某些操作只有在资源处于合适的状态时才被执行”这类性质的静态检查。 例如,在数据存储系统的示例中,检查“用户在执行 读取数据 前是否成功登录”这一性质非常重要。 ST 库能让我们在类型系统中表达这类协议,并且在编译阶段确保机密数据只有在用户处于已登录状态时才能被读取。

大纲

本教程从描述如何操作独立的状态开始(ST 介绍:用状态来工作),引入了 STrans 数据类型来描述带有状态的函数,以及 ST 用来描述顶层的状态转移。 接下来的章节(用类型表示状态机)描述了如何用类型表示状态机,以及如何定义接口以描述带有状态的系统。之后(复合状态机)描述了如何复合带有多个状态机的系统。 它解释了如何实现同时使用多个状态机的系统,以及如何基于低级系统来实现高级的带有状态的系统。 最后(示例:网络 Socket 编程)我们将看到一个带有状态的 API 应用于真实场景的例子,它实现了 POSIX 网络 Socket API。

Control.ST 库在 Edwin Brady 的一篇文稿 “State Machines All The Way Down” 中亦有提及,你可以从这里获取它。 这篇文章展示了本教程中的很多例子,更加深入地描述了它们的动机、设计、以及实现。

ST 介绍:用状态来工作

Control.ST 库提供了在函数中创建、读取、写入和销毁状态,以及在函数类型中跟踪状态变化的功能。它基于资源(Resources)的概念,本质上就是可变变量和依赖类型。 STrans 会在函数运行时跟踪资源的变化:

STrans : (m : Type -> Type) ->
         (resultType : Type) ->
         (in_res : Resources) ->
         (out_res : resultType -> Resources) ->
         Type

类型为 STrans m resultType in_res out_res_fn 的值表示一系列可以操作状态的动作。 其参数分别为:

  • m 表示一个底层的计算上下文(Computation Context),各种动作会在其中执行。 通常,它是一个实现了 Monad 的泛型,但并非必须如此。具体来说,我们无需理解单子 (Monad)就能高效地使用 ST
  • resultType 表示这一系列动作会产生的值的类型。
  • in_res 表示一个在执行动作之前可用的资源列表。
  • out_res 表示一个在执行动作之后可用的资源列表,它随动作的结果而不同。

我们可以用 STrans 在函数的类型中描述状态转移系统(State Transition Systems)。我们会在之后定义资源(Resources),现在你可以先把它看做“现实世界的状态”的抽象表示。通过给定输入资源(in_res)和输出资源(out_res),我们可以描述允许函数执行的前提条件(Precondition)和描述了函数如何影响世界的所有状态的后置条件(Postcondition)

本节以一些 STrans 函数的小例子开始,看看如何执行它们。我们还介绍了 ST,一个类型级的函数,它能让我们简明地描述带有状态的函数的状态转移。

对示例进行类型检查

对于本节和整个教程,你需要 import Control.ST 并通过向 idris 传递 -p contrib 参数来添加 contrib 包。

初试:操作 State

STrans 函数的类型解释了它如何影响一组 Resources。资源拥有一个类型为 Var标签(label),我们会用它在函数中引用该资源,并通过 label ::: type 的形式将该资源的状态写入``Resources`` 列表。

例如,以下函数的输入资源为 x,类型为 State Integer;输出资源的类型仍为 State Integer :

increment : (x : Var) -> STrans m () [x ::: State Integer]
                                     (const [x ::: State Integer])
increment x = do num <- read x
                 write x (num + 1)

本函数通过 read 读取资源 x 中存储的值,将其自增后的结果用 write 写回资源 x 中。稍后我们会看到 readwrite 的类型(见 STrans 的原语操作)。我们还可以创建和删除资源:

makeAndIncrement : Integer -> STrans m Integer [] (const [])
makeAndIncrement init = do var <- new init
                           increment var
                           x <- read var
                           delete var
                           pure x

makeAndIncrement 的类型指明了它的入口([])和出口(const [])中没有可用的资源。它会用 new 创建一个新的 State 资源(接受一个资源的初始值),对其值自增,将该值读出来,然后用 delete 删除它,返回该资源的最终值。我们后面同样会看到 newdelete 的类型。

STrans``(类型为 ``Type -> Type)的参数 m 叫做计算上下文(Computation Context),函数会在其中运行。在这里,该类型级变量指明了我们可以在任何上下文中运行它。我们可以在同一个上下文中用 runPure 运行它。例如,将以上定义保存在 Intro.idr 文件中,然后在 REPL 中运行以下语句:

*Intro> runPure (makeAndIncrement 93)
94 : Integer

通过交互式的,类型驱动的方式来实现 STrans 是个不错的主意。例如,在通过 new init 创建了资源后,你可以为程序剩余的部分留一个坑(Hole)来看到资源的创建是如何影响类型的:

makeAndIncrement : Integer -> STrans m Integer [] (const [])
makeAndIncrement init = do var <- new init
                           ?whatNext

如果你检查一下 ?whatNext 的类型,就会发现有一个可用的资源 var,而在函数调用完成后应当不会再有可用的资源:

  init : Integer
  m : Type -> Type
  var : Var
--------------------------------------
whatNext : STrans m Integer [var ::: State Integer] (\value => [])

这个小例子可以在任何计算上下文 m 中工作。然而通常,我们会在一个更加严格的上下文中工作。例如,我们可能想要编写一个只能在支持交互式程序的上下文中工作的程序。为此,我们需要学习如何从底层上下文中提升(Lift)操作。

提升:使用计算上下文

比如说,我们现在并不想直接把初始整数传入 makeAndIncrement,而是想要把它从控制台读进来。 那么我们就要把一般的工作上下文 m 换成特定的上下文 IO

ioMakeAndIncrement : STrans IO () [] (const [])

lift 函数给了我们访问 IO 操作的方式。我们可以将 ioMakeAndIncrement 定义如下:

ioMakeAndIncrement : STrans IO () [] (const [])
ioMakeAndIncrement
   = do lift $ putStr "Enter a number: "
        init <- lift $ getLine
        var <- new (cast init)
        lift $ putStrLn ("var = " ++ show !(read var))
        increment var
        lift $ putStrLn ("var = " ++ show !(read var))
        delete var

lift 函数能让我们直接使用底层计算上下文(此处为 IO)中的函数。同样,我们很快就会看到 lift 具体的类型。

!-记法

ioMakeAndIncrement 中,我们使用了 !(read var) 从资源中读取信息。 你可以在 Idris 教程(见单子与 do-记法)中找到关于 !-记法的详情。 简单来说,它允许我们直接就地使用 STrans 类型的函数,而不必先将其结果绑定到一个变量。

至少从概念上来说,你可以将它当做拥有以下类型的函数:

(!) : STrans m a state_in state_out -> a

这个语法糖会在执行 do-语句块中的当前动作之前立即绑定一个变量, 然后在 !-表达式的位置就地使用该变量。

然而在实践中,使用像 IO 这样特定的上下文通常是种糟糕的做法。首先,它需要我们在代码中到处泼洒 lift,这会影响可读性。再者,也是更重要的一点,它会降低函数的安全性,我们会在下一节(用类型表示状态机)中看到这一点。

所以我们改用定义接口的方式来限制计算上下文。例如,Control.ST 中定义了 ConsoleIO 接口,它为控制台的基本交互提供了必要的方法:

interface ConsoleIO (m : Type -> Type) where
  putStr : String -> STrans m () res (const res)
  getStr : STrans m String res (const res)

也就是说,我们能够以任何可用的资源 res 读写控制台,这两个方法均不会对可用的资源产生影响。IO 对它的实现如下:

ConsoleIO IO where
  putStr str = lift (Interactive.putStr str)
  getStr = lift Interactive.getLine

现在,我们可以将 ioMakeAndIncrement 定义为:

ioMakeAndIncrement : ConsoleIO io => STrans io () [] (const [])
ioMakeAndIncrement
   = do putStr "Enter a number: "
        init <- getStr
        var <- new (cast init)
        putStrLn ("var = " ++ show !(read var))
        increment var
        putStrLn ("var = " ++ show !(read var))
        delete var

它不仅可以在特定的 IO 中工作,还可以在一般的 io 上下文中工作,我们只需在该上下文中提供一个 ConsoleIO 的实现即可。相较于初版而言,它有以下优点:

  • 所有对 lift 的调用都在接口的实现中,而非在 ioMakeAndIncrement
  • 我们可以提供另一种 ConsoleIO 的实现,比如在基本的 I/O 中支持异常或日志。
  • 在下一节(用类型表示状态机)中我们将会看到,它可以让我们定义安全的 API,以便更加精确地操作具体的资源。

我们之前在同一个上下文中使用 runPure 来运行 makeAndIncrement。而在这里,我们则使用 run,它能够让我们在任何上下文中执行 STrans 程序(只要该上下文实现了 Applicative 即可)。我们可以像下面这样在 REPL 中执行 ioMakeAndIncrement

*Intro> :exec run ioMakeAndIncrement
Enter a number: 93
var = 93
var = 94

用依赖类型操作 State

在第一个 State 的例子中,当我们将值自增后,其类型并未改变。然而,当我们使用依赖类型时,状态的更新同样也会涉及到其类型的更新。例如,当我们向存储在状态中的向量添加一个元素时,其长度会改变:

addElement : (vec : Var) -> (item : a) ->
             STrans m () [vec ::: State (Vect n a)]
                  (const [vec ::: State (Vect (S n) a)])
addElement vec item = do xs <- read vec
                         write vec (item :: xs)

注意你需要 import Data.Vect 来执行此示例。

直接用 update 更新状态

除了分别使用 readwrite 以外,你还可以使用 update,它从一个 State 中读取内容,对它应用一个函数,然后写入其结果。通过 update 你可以将 addElement 写为如下形式:

addElement : (vec : Var) -> (item : a) ->
             STrans m () [vec ::: State (Vect n a)]
                  (const [vec ::: State (Vect (S n) a)])
addElement vec item = update vec (item ::)

然而,我们并不总是能够知道在一系列动作中类型具体是如何变化的。例如,如果我们有一个包含整数向量的状态,那么可以从控制台读取一个输入,只有当该输入为有效的整数时才将它添加到该向量中。根据该整数是否读取成功,我们的输出状态会有不同的类型,简直令人无语。因此,下面两个函数的类型都不太正确:

readAndAdd_OK : ConsoleIO io => (vec : Var) ->
                STrans m ()  -- 返回空元组
                            [vec ::: State (Vect n Integer)]
                     (const [vec ::: State (Vect (S n) Integer)])
readAndAdd_Fail : ConsoleIO io => (vec : Var) ->
                  STrans m ()  -- 返回空元组
                              [vec ::: State (Vect n Integer)]
                       (const [vec ::: State (Vect n Integer)])

不过请记住,输出资源的类型可以从函数的结果中计算出来。 目前,我们用 const 表示输出资源总是保持不变。不过在这里,我们可以用一个类型级函数来计算出输出资源。我们首先将返回空元组改为 Bool,当读取输入成功时它返回 True;然后为输出资源挖一个

readAndAdd : ConsoleIO io => (vec : Var) ->
             STrans m Bool [vec ::: State (Vect n Integer)]
                           ?output_res

如果你检查 ?output_res 的类型,就会看到 Idris 期望一个类型为 Bool -> Resources 的函数,它表示输出资源的类型可以随 readAndAdd 的结果而不同:

  n : Nat
  m : Type -> Type
  io : Type -> Type
  constraint : ConsoleIO io
  vec : Var
--------------------------------------
output_res : Bool -> Resources

所以,当输入无效时,输出资源为 Vect n Integer(例如 readAndAdd 返回 False);当输入有效时,输出资源为 Vect (S n) Integer。我们可以用类型将它表示出来:

readAndAdd : ConsoleIO io => (vec : Var) ->
             STrans io Bool [vec ::: State (Vect n Integer)]
                   (\res => [vec ::: State (if res then Vect (S n) Integer
                                                   else Vect n Integer)])

接着,我们在实现 readAndAdd 时需要为输出的状态返回适当的值。如果为向量添加了一个元素,就返回 True,否则就要返回 False

readAndAdd : ConsoleIO io => (vec : Var) ->
             STrans io Bool [vec ::: State (Vect n Integer)]
                   (\res => [vec ::: State (if res then Vect (S n) Integer
                                                   else Vect n Integer)])
readAndAdd vec = do putStr "Enter a number: "
                    num <- getStr
                    if all isDigit (unpack num)
                       then do
                         update vec ((cast num) ::)
                         pure True     -- 添加一个元素,因此返回 True
                       else pure False -- 没有添加元素,因此返回 False

如果进行交互式开发的话则稍微有点不同。如果我们挖一个坑,那么在知道要返回的值以前,所需的输出状态并不显而易见。例如,在以下未完成的 readAndAdd 定义中,我们为成功的情况留了个坑:

readAndAdd vec = do putStr "Enter a number: "
                    num <- getStr
                    if all isDigit (unpack num)
                       then ?whatNow
                       else pure False

我们可以查看 ?whatNow 的类型,很遗憾信息不足:

  vec : Var
  n : Nat
  io : Type -> Type
  constraint : ConsoleIO io
  num : String
--------------------------------------
whatNow : STrans io Bool [vec ::: State (Vect (S n) Integer)]
                 (\res =>
                    [vec :::
                     State (ifThenElse res
                                       (Delay (Vect (S n) Integer))
                                       (Delay (Vect n Integer)))])

问题是我们只有在知道值会被返回时才能知道需要的输出状态。为了帮助交互式开发, Control.ST 提供了一个 returning 函数,我们可以用它来提前指定返回值,然后更新相应的状态。例如,我们可以将未完成的 readAndAdd 编写为:

readAndAdd vec = do putStr "Enter a number: "
                    num <- getStr
                    if all isDigit (unpack num)
                       then returning True ?whatNow
                       else pure False

它表示在成功的分支中,我们会返回 True?whatNow 应该解释如何相应地更新状态,使其对于返回值 True 来说是正确的。我们只需检查 ?whatNow,就会发现现在的信息多了一点:

  vec : Var
  n : Nat
  io : Type -> Type
  constraint : ConsoleIO io
  num : String
--------------------------------------
whatnow : STrans io () [vec ::: State (Vect n Integer)]
                 (\value => [vec ::: State (Vect (S n) Integer)])

现在这个类型表示,在 STrans 的输出资源列表中,我们可以通过向 vec 添加一个元素来完成其定义:

readAndAdd vec = do putStr "Enter a number: "
                    num <- getStr
                    if all isDigit (unpack num)
                       then returning True (update vec ((cast num) ::))
                       else returning False (pure ()) -- 返回 False,因此无需更新状态

STrans 的原语操作

我们已经写过几个关于 STrans 函数的小例子了,是时候详细地了解这些状态操作函数了。 首先,为了读写状态,我们使用了 readwrite 函数:

read : (lbl : Var) -> {auto prf : InState lbl (State ty) res} ->
       STrans m ty res (const res)
write : (lbl : Var) -> {auto prf : InState lbl ty res} ->
        (val : ty') ->
        STrans m () res (const (updateRes res prf (State ty')))

它们的类型看起来有点吓人,特别是隐式的 prf 参数,其类型为:

prf : InState lbl (State ty) res

它依赖于一个断言 InState。一个类型为 InState x ty res 的值表示在资源列表 res 中,引用 x 的类型必须为 ty。实际上,所有这种类型都表示, 如果一个对某资源的引用存在于资源列表中,那么我们我们只能读取或写入该资源。

给定一个资源标签 res 和一个 res 存在于资源列表中的证明,那么 updateRes 会更新该资源的类型。因此,write 的类型表示该资源的类型会被更新为给定值的类型。

update 的类型与 readwrite 类型类似,它也需要资源的类型为给定函数的输入类型,并将它更新为该函数的输出类型:

update : (lbl : Var) -> {auto prf : InState lbl (State ty) res} ->
         (ty -> ty') ->
         STrans m () res (const (updateRes res prf (State ty')))

new 的类型表示它返回一个 Var,给定一个类型为 state 的初始值, 输出资源包含一个新的类型为 State state 的资源:

new : (val : state) ->
      STrans m Var res (\lbl => (lbl ::: State state) :: res)

新资源的类型为 State state 而非只是 state 这一点很重要,因为这能让我们隐藏 API 的实现细节。在下一节用类型表示状态机中,我们会看到更多关于其意义的内容。

delete 的类型表示,给定一个标签存在于输入资源内的隐式证明,该标签会从资源列表中移除:

delete : (lbl : Var) -> {auto prf : InState lbl (State st) res} ->
         STrans m () res (const (drop res prf))

这里的 drop 是一个类型级函数,它用于更新资源列表,从该列表中移除给定的资源 lbl

我们之前已经用 lift 在底层上下文中运行过函数了。它的类型如下:

lift : Monad m => m t -> STrans m t res (const res)

给定一个 result 值,pure 会返回产生该值的 STrans 程序, 当产生该值时,它会假设当前资源列表是正确:

pure : (result : ty) -> STrans m ty (out_fn result) out_fn

我们可以用 returning 将从 STrans 函数中返回值的过程分为两部分:提供值本身, 以及更新资源列表使其对应于该返回值:

returning : (result : ty) ->
            STrans m () res (const (out_fn result)) ->
            STrans m ty res out_fn

最后,我们已经用 runrunPure 在特定上下文中执行过 STrans 函数了。run 会在任何上下文中执行函数,若该上下文实现了 Applicative, 那么 runPure 会在同一上下文中执行函数:

run : Applicative m => STrans m a [] (const []) -> m a
runPure : STrans Basics.id a [] (const []) -> a

注意在任何情况下,输入和输出资源列表都必须为空。没有一种方法能够提供初始资源列表, 或提取最终的资源。这是有意设计的:它确保了所有的资源管理都在受控的 STrans 环境下进行,并且我们将会看到,这能够让我们实现安全的 API, 以精确的类型来解释在程序的执行过程中,资源是如何被跟踪的。

这些函数构成了 ST 库的核心。在遇到更加复杂的情况时,我们还会用到一些其它的函数, 不过目前所见的函数足以让我们用 Idris 进行细致的状态跟踪和推理了。

ST:直接表示状态转移

我们已经见过一些简单的 STrans 函数的例子了,由于需要提供显式的输入输出资源列表, 它们的类型会变得非常冗长。在需要为原语操作提供类型时这很方便,不过对于更一般的使用来说, 能为独立的资源表示状态转移,而无需完整地给出输入和输出资源列表的话会更加方便。 我们可以用 ST 来做到这一点:

ST : (m : Type -> Type) ->
     (resultType : Type) ->
     List (Action resultType) -> Type

ST 是一个类型级函数,它会为给定的活动(Action)列表计算出对应的 STrans 类型,该类型描述了资源的状态转移。函数类型中的 Action 可接受以下形式(我们之后还会见到其它形式):

  • lbl ::: ty 表示资源 lbl 的开始和结束状态均为 ty
  • lbl ::: ty_in :-> ty_out 表示资源 lbl 以状态 ty_in 开始,以状态 ty_out 结束
  • lbl ::: ty_in :-> (\res -> ty_out) 表示资源 lbl 以状态 ty_in 开始,以状态 ty_out 结束,其中 ty_out 从函数 res 的结果中计算而来。

现在,我们可以将前面的一些函数的类型写成如下形式:

increment : (x : Var) -> ST m () [x ::: State Integer]

即,increment 的开始和结束状态均为 State Integer 状态的 x

makeAndIncrement : Integer -> ST m Integer []

即,makeAndIncrement 的开始和结束均没有资源。

addElement : (vec : Var) -> (item : a) ->
             ST m () [vec ::: State (Vect n a) :-> State (Vect (S n) a)]

即,addElementvecState (Vect n a) 改变为 State (Vect (S n) a)

readAndAdd : ConsoleIO io => (vec : Var) ->
             ST io Bool
                   [vec ::: State (Vect n Integer) :->
                    \res => State (if res then Vect (S n) Integer
                                          else Vect n Integer)]

我们通过这种编写类型的方式,表达出了每个函数如何影响整体资源状态的最小必要条件。 如果某个资源的更新依赖于某个结果(如 readAndAdd),那么我们需要完整地描述它。 否则(如 incrementmakeAndIncrement),我们可以写出输入输出资源列表以避免重复。

Action 也可以描述添加移除状态:

  • add ty,如果该操作返回一个 Var,那么它会添加一个 ty 类型的新资源。
  • remove lbl ty 表示该操作会从资源列表中的状态 ty 开始,移除名为 lbl 的资源。

例如,我们可以写出:

newState : ST m Var [add (State Int)]
removeState : (lbl : Var) -> ST m () [remove lbl (State Int)]

第一个函数 newState 返回一个新的资源标签并将该资源添加到 State Int 类型的资源列表中。第二个函数 removeState 根据给定的标签 lbl 从列表中移除该资源。二者的类型与以下形式等价:

newState : STrans m Var [] (\lbl => [lbl ::: State Int])
removeState : (lbl : Var) -> STrans m () [lbl ::: State Int] (const [])

它们是构造 Action 的原语方法。我们后面还会遇到一些用类型级函数来提高可读性的方式。

除了极少数需要准确完整的 STrans 的情况外,在本教程剩余的部分中,我们通常会使用 ST。在下一节中,我们会看到如何用 ST 提供的设施来为需要安全性的系统编写准确的 API:一个需要登录的数据存储系统。

用类型表示状态机

概览一节中,我们用状态转移图展示了数据存储系统的抽象状态, 以及可以对它执行的动作:

login

我们之所以把它称作该存储系统的抽象状态,是因为具体的状态会包含更多信息。例如,它可能包含用户名、散列化的密码和存储内容等等。然而,就目前我们我们所关心的动作 loginlogoutreadSecret 而言,登录状态决定了哪些动作是有效的。

我们已经见过如何用 ST 操作状态,用依赖类型表示状态了。在本节中,我们会看到如何用 ST 为数据存储系统提供安全的 API。在 API 中,我们会用类型编码上面的状态转移图。 通过这种方式,我们可以只在状态有效时才能执行 loginlogoutreadSecret 操作。

我们已经用过 State 及其原语操作 newreadwritedelete 来操作状态了。而对于数据存储的 API,我们则以定义接口开始(见 Idris 教程中的接口一节)。接口描述了对存储系统的操作,其类型则准确解释了每种操作何时才有效,以及它是如何影响存储系统的状态的。通过接口,我们可以确保这是访问存储系统的唯一的方式。

为数据存储系统定义接口

我们首先在 Login.idr 文件中定义数据类型,它表示存储系统的两种抽象状态,即 LoggedOutLoggedIn

data Access = LoggedOut | LoggedIn

我们可以定义一个数据类型来表示存储系统的当前状态,保存所有必要的信息(如用户名、散列化的密码、存储内容等等),并根据登录状态来参数化该数据类型:

Store : Access -> Type

不过我们现在先不定义具体的类型,而是将以下代码包含在数据存储的接口中,之后再定义具体的类型:

interface DataStore (m : Type -> Type) where
  Store : Access -> Type

我们可以继续为此接口补充其它存储系统的操作。这样做优点众多。通过将接口与其实现相分离,我们可以为不同的上下文提供相应的具体实现。此外,我们还可以编写与存储系统协作的程序而无需知道任何实现的细节。

我们需要用 connect 连接到该存储系统,在结束后用 disconnect 断开连接。 我们为 DataStore 接口添加以下方法:

connect : ST m Var [add (Store LoggedOut)]
disconnect : (store : Var) -> ST m () [remove store (Store LoggedOut)]

connect 的类型表明它会返回一个初始类型为 Store LoggedOut 的新资源。 反之,disconnect 则会给出一个状态为 Store LoggedOut 的资源并移除该资源。 我们可以通过以下(未完成的)定义更加清楚地看到 connect 做了什么:

doConnect : DataStore m => ST m () []
doConnect = do st <- connect
               ?whatNow

注意我们正在一个 一般的上下文 m 中工作,为了能够执行 doConnect,我们必须为 m 实现 DataStore 接口来限制它。如果我们检查 ?whatNow 的类型,就会看到剩下的操作以一个状态为 Store LoggedOut 的资源 st 开始, 以没有可用的资源结束:

  m : Type -> Type
  constraint : DataStore m
  st : Var
--------------------------------------
whatNow : STrans m () [st ::: Store LoggedOut] (\result => [])

接着,我们可以用 disconnect 来移除该资源:

doConnect : DataStore m => ST m () []
doConnect = do st <- connect
               disconnect st
               ?whatNow

现在检查 ?whatNow 的类型会显示我们没有可用的资源:

  m : Type -> Type
  constraint : DataStore m
  st : Var
--------------------------------------
whatNow : STrans m () [] (\result => [])

为了继续完善 DataStore 接口的实现,我们接下来添加一个读取机密数据的方法。 这需要 store 的状态为 Store LoggedIn

readSecret : (store : Var) -> ST m String [store ::: Store LoggedIn]

此时我们可以试着编写一个函数,它先连接到存储系统,然后读取机密数据,之后断开连接。 然而它并不会成功,因为执行 readSecret 需要我们处于已登录状态。

badGet : DataStore m => ST m () []
badGet = do st <- connect
            secret <- readSecret st
            disconnect st

它会产生以下错误,因为 connect 创建了状态为 LoggedOut 的新存储,而 readSecret 需要该存储的状态为 LoggedIn

When checking an application of function Control.ST.>>=:
    Error in state transition:
            Operation has preconditions: [st ::: Store LoggedOut]
            States here are: [st ::: Store LoggedIn]
            Operation has postconditions: \result => []
            Required result states here are: \result => []

该错误信息解释了所需的输入状态(前提条件)和输出状态(后置条件)与该操作中的状态有何不同。为了使用 readSecret,我们需要一种方式将 Store LoggedOut 转换为 Store LoggedIn 状态。我们可以先尝试将 login(登录)指定为以下类型:

login : (store : Var) -> ST m () [store ::: Store LoggedOut :-> Store LoggedIn] -- 类型不正确!

注意,接口中并没有说明 login如何工作的,只是表达了它如何影响所有的状态。即便如此,login 的类型还是有点问题,因为它假设了登录总会成功。如果登录失败(比如在该实现提示输入密码时用户输入了错误的密码),那么它一定不会产生 LoggedIn 状态的存储。

因此,login 需要通过以下类型返回登录是否成功:

data LoginResult = OK | BadPassword

接着,我们可以从结果中计算出结果状态(见用依赖类型操作 State)。我们为 DataStore 接口添加以下方法:

login : (store : Var) ->
        ST m LoginResult [store ::: Store LoggedOut :->
                           (\res => Store (case res of
                                                OK => LoggedIn
                                                BadPassword => LoggedOut))]

如果 login 成功,那么 login 之后的状态会变成 Store LoggedIn。否则,状态仍然为 Store LoggedOut

为完成此接口,我们还需要添加一个退出该存储系统的方法。我们假设退出总是成功,并将存储系统的状态从 Store LoggedIn 转换为 Store LoggedOut

logout : (store : Var) -> ST m () [store ::: Store LoggedIn :-> Store LoggedOut]

这样就完成了此接口。完整代码如下:

interface DataStore (m : Type -> Type) where
  Store : Access -> Type

  connect : ST m Var [add (Store LoggedOut)]
  disconnect : (store : Var) -> ST m () [remove store (Store LoggedOut)]

  readSecret : (store : Var) -> ST m String [store ::: Store LoggedIn]
  login : (store : Var) ->
          ST m LoginResult [store ::: Store LoggedOut :->
                             (\res => Store (case res of
                                                  OK => LoggedIn
                                                  BadPassword => LoggedOut))]
  logout : (store : Var) -> ST m () [store ::: Store LoggedIn :-> Store LoggedOut]

在尝试创建此接口的实现之前,我们来看看如何用它来编写函数,以此来登录数据存储系统、在登录成功后读取机密数据,然后再退出。

用数据存储接口编写函数

我们以编写 getData 函数为例,展示如何使用 DataStore 接口。 该函数用于连接到存储系统并从中读取数据。我们使用该操作的类型来逐步指导开发, 交互式地编写此函数。它的类型如下:

getData : (ConsoleIO m, DataStore m) => ST m () []

该类型表示在进入或退出时没有资源可用。也就是说,整个动作列表为 [],这表示至少从外部来说,该函数完全没有对资源产生作用。换句话说,对于每一个在调用 getData 时创建的资源,我们都需要在退出前删除它。

由于我们要使用 DataStore 接口的方法,因此必须约束计算上下文 m 使其实现 DataStore 接口。我们还有一个 ConsoleIO m 约束,这样就能将我们从存储系统中读取的任何数据或者错误信息显示出来。

我们从连接到存储系统开始,创建一个新的资源 st,然后尝试用 login 登录:

getData : (ConsoleIO m, DataStore m) => ST m () []
getData = do st <- connect
             ok <- login st
             ?whatNow

登录可能成功也可能失败,两种状态可从 ok 值反映出来。如果我们检查 ?whatNow 的类型,就会看到当前存储系统的状态:

  m : Type -> Type
  constraint : ConsoleIO m
  constraint1 : DataStore m
  st : Var
  ok : LoginResult
--------------------------------------
whatNow : STrans m () [st ::: Store (case ok of
                                          OK => LoggedIn
                                          BadPassword => LoggedOut)]
                      (\result => [])

由于 st 的当前状态依赖于 ok 的值,因此我们可以对 ok 分情况讨论来继续推进:

getData : (ConsoleIO m, DataStore m) => ST m () []
getData = do st <- connect
             ok <- login st
             case ok of
                  OK => ?whatNow_1
                  BadPassword => ?whatNow_2

两个分支上的坑 ?whatNow_1?whatNow_2 的类型展现了状态是如何随着登录成功与否而改变的。如果登录成功,那么该存储系统的状态为 LoggedIn:

--------------------------------------
whatNow_1 : STrans m () [st ::: Store LoggedIn] (\result => [])

如果失败,那么它的状态为 LoggedOut:

--------------------------------------
whatNow_2 : STrans m () [st ::: Store LoggedOut] (\result => [])

?whatNow_1 中,由于登录成功,因此可以读取机密数据并将它显示在终端上:

getData : (ConsoleIO m, DataStore m) => ST m () []
getData = do st <- connect
             ok <- login st
             case ok of
                  OK => do secret <- readSecret st
                           putStrLn ("Secret is: " ++ show secret)
                           ?whatNow_1
                  BadPassword => ?whatNow_2

我们要以「无资源可用」的状态来结束 OK 分支,因此需要退出存储系统并断开连接:

getData : (ConsoleIO m, DataStore m) => ST m () []
getData = do st <- connect
             ok <- login st
             case ok of
                  OK => do secret <- readSecret st
                           putStrLn ("Secret is: " ++ show secret)
                           logout st
                           disconnect st
                  BadPassword => ?whatNow_2

注意我们在调用 disconnect 断开连接前,必须logout 退出 st,因为 disconnect 需要存储系统处于 LoggedOut 状态。

此外,我们不能像上一节中 State 的示例那样,简单地用 delete 来删除该资源,因为对于某个类型 ty 来说,delete 只能在资源的类型为 State ty 时起效。如果我们试图用 delete 来代替 disconnect,就会看到以下错误:

When checking argument prf to function Control.ST.delete:
        Can't find a value of type
                InState st (State st) [st ::: Store LoggedOut]

换句话说,类型检查器找不到一个「资源 st 拥有 State st 形式的类型」的证明,因为其类型为 Store LoggedOut。由于 StoreDataStore 接口的一部分,而我们尚未知道 Store 的具体表示,因此我们需要通过此接口的 disconnect 而非直接用 delete 来删除资源。

我们可以将 getData 完成如下,使用模式匹配来绑定候选(见 Idris 教程的单子与 do-记法),而非使用 case 语句来捕获 login 可能产生的错误:

getData : (ConsoleIO m, DataStore m) => ST m () []
getData = do st <- connect
             OK <- login st
                | BadPassword => do putStrLn "Failure"
                                    disconnect st
             secret <- readSecret st
             putStrLn ("Secret is: " ++ show secret)
             logout st
             disconnect st

然而它现在还跑不起来,因为我们还没有任何 DataStore 的实现!如果我们试着在一个 IO 上下文中执行它,就会产生一个没有 DataStore IO 的实现的错误:

*Login> :exec run {m = IO} getData
When checking an application of function Control.ST.run:
        Can't find implementation for DataStore IO

因此,要实现遵循其状态转移图的数据存储系统,最后一步就是提供一个 DataStore 的实现。

实现接口

要在 IO 中执行 getData,我们需要提供一个能够在 IO 上下文中工作的 DataStore 的实现。我们可以这样开始:

implementation DataStore IO where

接着,我们可以让 Idris 根据必要方法的基本定义来填充该接口(在 Atom 中按下 Ctrl-Alt-A,或者在你喜欢的编辑器中按下对应的快捷键来「添加定义」):

implementation DataStore IO where
  Store x = ?DataStore_rhs_1
  connect = ?DataStore_rhs_2
  disconnect store = ?DataStore_rhs_3
  readSecret store = ?DataStore_rhs_4
  login store = ?DataStore_rhs_5
  logout store = ?DataStore_rhs_6

我们首先要确定表示该数据存储系统的方式。为简单起见,我们将数据存储为单个的 String,并使用硬编码的密码来获取访问权限。我们可以将 Store 定义如下,无论在 LoggedOut 还是在 LoggedIn 状态下均使用 String 来表示数据。

Store x = State String

现在我们给出了 Store 的一个具体类型,我们可以实现建立连接、断开连接和访问数据的操作。而由于我们使用了 State,因此也就可以使用 newdeletereadwrite 来操作该存储系统。

坑的类型会告诉我们如何操作状态。例如,?DataStore_rhs_2 坑告诉我们要实现 connect 需要做些什么。我们需要返回一个新的 Var,表示一个类型为 State String 的资源:

--------------------------------------
DataStore_rhs_2 : STrans IO Var [] (\result => [result ::: State String])

我们可以通过创建一个带有某些数据作为存储内容的新变量来实现它(我们可以使用任何 String),然后返回该变量:

connect = do store <- new "Secret Data"
             pure store

对于 disconnect 而言,我们只需删除该资源即可:

disconnect store = delete store

对于 readSecret,我们需要读取机密数据并返回 String。由于我们并不知道该数据的具体表示为 State String,因此可以直接用 read 来访问数据:

readSecret store = read store

我们先来完成 logout,之后回到 login 上来。检查坑的类型会显示以下信息:

  store : Var
--------------------------------------
DataStore_rhs_6 : STrans IO () [store ::: State String] (\result => [store ::: State String])

因此在此小型实现中,我们实际上不用做任何事情!

logout store = pure ()

对于 login,我们需要返回登录是否成功。为此,我们需要提示用户输入密码,并在匹配到硬编码的密码时返回 OK,否则返回 BadPassword

login store = do putStr "Enter password: "
                 p <- getStr
                 if p == "Mornington Crescent"
                    then pure OK
                    else pure BadPassword

下面给出完整的实现以供参考,它能让我们在 REPL 中执行 DataStore 程序:

implementation DataStore IO where
  Store x = State String
  connect = do store <- new "Secret Data"
               pure store
  disconnect store = delete store
  readSecret store = read store
  login store = do putStr "Enter password: "
                   p <- getStr
                   if p == "Mornington Crescent"
                      then pure OK
                      else pure BadPassword
  logout store = pure ()

最后,我们可以像下面这样在 REPL 中尝试它(如果有可用的 IO 实现,那么在 Idris 的 REPL 中,上下文会默认为 IO,因此这里无需显式给出 m 参数):

*Login> :exec run getData
Enter password: Mornington Crescent
Secret is: "Secret Data"

*Login> :exec run getData
Enter password: Dollis Hill
Failure

对于 State 类型的资源,我们只能使用 readwritenewdelete。因此,在 DataStore 的实现或任何已知上下文为 IO 的环境内部,我们可以随意访问数据存储系统,因为这里是实现 DataStore 内部细节的地方。然而,如果我们只有 DataStore m 的约束,那么就无法知道该存储系统是否已实现,因此我们只能通过 DataStore 提供的 API 来访问它。

因此比较好的做法是在 getData 这类函数中使用泛型(Generic)上下文 m,并只根据我们需要的接口进行约束,而非使用具体的上下文 IO

现在我们已经学过如何处理状态,以及如何用接口来封装数据存储这类具体系统的状态转移了。 然而,真正的系统需要能够复合多种状态机。我们一次需要使用多个状态机,或者基于多个状态机来实现一个状态机。

复合状态机

在上一节中,我们定义了 DataStore 接口并用它实现了以下小程序,它能让用户登录该存储系统然后打印存储的内容:

getData : (ConsoleIO m, DataStore m) => ST m () []
getData = do st <- connect
             OK <- login st
                | BadPassword => do putStrLn "Failure"
                                    disconnect st
             secret <- readSecret st
             putStrLn ("Secret is: " ++ show secret)
             logout st
             disconnect st

该函数只用了一个状态,即存储本身。然而,更大的程序通常会有更多的状态,在执行过程中可能还需要添加、删除或更新状态。就本例而言,无限循环,在状态中记录登录失败的次数都是有用的扩展。

此外,状态机还可以有层级,即一个状态机可以通过复合其它状态机来实现。例如,我们可以有一个表示图形系统的状态机,并用它来实现「海龟绘图」这种高级的图形 API,「海龟绘图」会使用图形系统加上一些额外的状态。

在本节中,我们会看到如何使用多个状态工作,以及如何将状态机复合成更高级别的状态机。我们先来看看如何为 getData 添加登录失败的计数器。

使用多个资源来工作

为了了解如何使用多个资源来工作,我们需要修改 getData 使其循环,然后记录用户登录失败的总次数。例如,如果我们编写的 main 程序初始记录次数为 0, 那么会话过程看起来可能会是这样的:

*LoginCount> :exec main
Enter password: Mornington Crescent
Secret is: "Secret Data"
Enter password: Dollis Hill
Failure
Number of failures: 1
Enter password: Mornington Crescent
Secret is: "Secret Data"
Enter password: Codfanglers
Failure
Number of failures: 2
...

我们首先为 getData 添加一个跟踪失败次数的状态资源:

getData : (ConsoleIO m, DataStore m) =>
          (failcount : Var) -> ST m () [failcount ::: State Integer]

getData 的类型检查

如果你跟着本教程写代码,就会发现更新了 getData 的类型后就无法通过编译了。这是意料之中的!我们暂且先将 getData 的定义注释掉,回头再说。

接着,我们创建一个 main 程序,它将状态初始化为 0,然后调用 getData

main : IO ()
main = run (do fc <- new 0
               getData fc
               delete fc)

现在着手实现 getData。我们先来添加一个参数表示允许的失败次数:

getData : (ConsoleIO m, DataStore m) =>
          (failcount : Var) -> ST m () [failcount ::: State Integer]
getData failcount
        = do st <- connect
             OK <- login st
                | BadPassword => do putStrLn "Failure"
                                    disconnect st
             secret <- readSecret st
             putStrLn ("Secret is: " ++ show secret)
             logout st
             disconnect st

然而,它无法通过类型检查,因为我们用来调用 connect 的资源不正确。错误信息描述了资源为何不必配:

When checking an application of function Control.ST.>>=:
    Error in state transition:
            Operation has preconditions: []
            States here are: [failcount ::: State Integer]
            Operation has postconditions: \result => [result ::: Store LoggedOut] ++ []
            Required result states here are: st2_fn

换句话说,connect 需要在进入时没有资源,但我们却有一个资源,即失败次数!这理应没什么问题:毕竟我们需要的资源是拥有的资源的子集,而额外的资源(这里是失败次数)与 connect 无关。因此我们需要一种临时隐藏附加资源的方法。

我们可以用 call 函数来达此目的:

getData : (ConsoleIO m, DataStore m) =>
          (failcount : Var) -> ST m () [failcount ::: State Integer]
getData failcount
   = do st <- call connect
        ?whatNow

我们在这里为 getData 剩下的部分挖了个坑,这样你可以看到 call 的作用。它移除了调用 connect 时资源列表中不必要的部分,然后在返回时恢复了它们。因此 whatNow 的类型表明我们添加了一个新的资源 st,而 failcount依然可用:

  failcount : Var
  m : Type -> Type
  constraint : ConsoleIO m
  constraint1 : DataStore m
  st : Var
--------------------------------------
whatNow : STrans m () [failcount ::: State Integer, st ::: Store LoggedOut]
                      (\result => [failcount ::: State Integer])

在此函数函数的最后,whatNow 表明我们需要以状态 st 结束,然而此时尚有 failcount 可用。我们可以在调用数据存储系统中的操作时添加 call 来消除资源列表,这样完成的 getData 就能带着附加的状态资源工作:

getData : (ConsoleIO m, DataStore m) =>
          (failcount : Var) -> ST m () [failcount ::: State Integer]
getData failcount
        = do st <- call connect
             OK <- call $ login st
                | BadPassword => do putStrLn "Failure"
                                    call $ disconnect st
             secret <- call $ readSecret st
             putStrLn ("Secret is: " ++ show secret)
             call $ logout st
             call $ disconnect st

这样有点啰嗦,实际上我们可以将 call 变成隐式的从而去掉它。默认情况下,你需要显式地添加 call,但如果你导入了 Control.ST.ImplicitCall,那么 Idris 就会在需要的地方插入它。

import Control.ST.ImplicitCall

现在的 getData 就和之前一样了:

getData : (ConsoleIO m, DataStore m) =>
          (failcount : Var) -> ST m () [failcount ::: State Integer]
getData failcount
        = do st <- connect
             OK <- login st
                | BadPassword => do putStrLn "Failure"
                                    disconnect st
             secret <- readSecret st
             putStrLn ("Secret is: " ++ show secret)
             logout st
             disconnect st

这里需要权衡:如果你导入了 Control.ST.ImplicitCall,那么使用多个资源的函数会更加易读,因为没有啰嗦的 call 了。另一方面,Idris 对你函数的类型检查会变得有点困难,这会导致它花费更多时间,错误信息也会少一点帮助。

看一下 call 的类型,你会有所启发:

call : STrans m t sub new_f -> {auto res_prf : SubRes sub old} ->
       STrans m t old (\res => updateWith (new_f res) old res_prf)

被调用的函数有一个资源列表 sub,还有一个隐式证明 SubRes sub old,它证明了被调用函数的资源列表是整个资源列表的子集。尽管资源的顺序可以改变,然而在 old 中出现的资源无法在 sub 列表中出现超过一次(如果你尝试它就会得到一个类型错误)。

函数 updateWith 接受被调用函数的输出资源,然后在当前资源列表中更新它们。此函数会尽可能保持顺序不变,尽管被调用的函数在进行复杂的资源操作时并不总是可以保持顺序。

在被调用的函数中新创建的资源

如果被调用的函数创建了新的资源,那么基于 updateWith 的工作方式,它们通常会出现在资源列表的末尾。你可以在前面未完成的 getData 的定义中看到这一点。

最后我们可以更新 getData 使其可以循环,并在需要时保持更新 failCount

getData : (ConsoleIO m, DataStore m) =>
          (failcount : Var) -> ST m () [failcount ::: State Integer]
getData failcount
   = do st <- call connect
        OK <- login st
           | BadPassword => do putStrLn "Failure"
                               fc <- read failcount
                               write failcount (fc + 1)
                               putStrLn ("Number of failures: " ++ show (fc + 1))
                               disconnect st
                               getData failcount
        secret <- readSecret st
        putStrLn ("Secret is: " ++ show secret)
        logout st
        disconnect st
        getData failcount

注意在这里我们会在每次迭代中建立并断开连接。另一种实现方式是首先用 connect 建立连接,然后调用 getData,其实现如下:

getData : (ConsoleIO m, DataStore m) =>
          (st, failcount : Var) -> ST m () [st ::: Store {m} LoggedOut, failcount ::: State Integer]
getData st failcount
   = do OK <- login st
           | BadPassword => do putStrLn "Failure"
                               fc <- read failcount
                               write failcount (fc + 1)
                               putStrLn ("Number of failures: " ++ show (fc + 1))
                               getData st failcount
        secret <- readSecret st
        putStrLn ("Secret is: " ++ show secret)
        logout st
        getData st failcount

st 的类型 Store {m} LoggedOut 中添加显式的 {m} 是十分重要的,因为这给了 Idris 足够的信息来判断哪一个 DataStore 的实现是用于查找其对应的 Store 的实现的。 否则,如果我们只写 Store LoggedOut,那么将无法获知 Store 所关联到的计算上下文 m

接着我们可以在 main 中只 connectdisconnect 一次:

main : IO ()
main = run (do fc <- new 0
               st <- connect
               getData st fc
               disconnect st
               delete fc)

通过使用 call 或导入 Control.ST.ImplicitCall,我们可以编写使用多个资源的程序, 然后在调用一个只用到全部资源的子集的函数时,将资源列表按需删减。

复合资源:状态机的层级

我们现在已经见过如何在一个函数中使用多个资源了,这对于任何能够操作状态的实际的程序来说都是必须的。我们可以把它看做是「横向的」复合:一次使用多个资源。我们通常还需要「纵向」的复合:基于一个或多个资源来实现单个资源。

在本节中,我们会看到一个这种的例子。首先,我们在一个接口 Draw 中实现一个小型的图形 API,它支持:

  • 打开一个窗口,创建一个双缓冲(double-buffered)的平面(surface)来绘图
  • 在平面上绘制线条和矩形
  • 缓冲区「翻页(flipping)」,在窗口中显示我们刚绘制完毕的图像
  • 关闭一个窗口

提示

绘图并不是即时呈现的,需要一个缓冲区(buffer)来暂存即将呈现的图像。 缓冲区一般有三个基本操作:clear(清空),flip(翻页)和 rewind(重显)。 clear 会将当前缓冲区清空以便重新绘图;flip 会将当前缓冲区中的图像显示在屏幕上,然后清空缓冲区等待下一次绘图;而 rewind 则保持缓冲区内容不变,重新显示到屏幕上。

术语 flip 和 rewind 来源于磁带的播放,rewind 即倒带重放,而 flip 则表示把磁带翻过来播放另一面。flip 一词尚无标准译法,此处译作「翻页」表示本画面完成,开始下一画面。

接着,我们在一个 interface 中用此 API 来为「海龟绘图」实现一个更高级的 API。这不仅需要 Draw 接口,还需要表示海龟的状态(位置,方向和画笔颜色)。

SDL 绑定

为了测试本节中的示例,你需要为 Idris 安装一个非常基本的 SDL 绑定,它可从 https://github.com/edwinb/SDL-idris 获取。这些绑定实现了 SDL API 的一个很小的子集,只用作演示的目的。尽管如此,它们已经足以实现一个小型的绘图程序来展示本节的概念了。

一旦你安装完这个包,就可以通过参数启动 Idris 了,-p sdl 用于 SDL 绑定,-p contrib 用于 Control.ST

Draw 接口

我们要在此 API 中使用 Idris 的 SDL 绑定,因此你需要在安装完该绑定库后导入 Graphics.SDL。我们先来定义 Draw 接口,它包含一个表示平面的数据类型,我们会在其之上绘制线条和矩形:

interface Draw (m : Type -> Type) where
    Surface : Type

我们需要能够通过打开新窗口来创建新的 Surface

initWindow : Int -> Int -> ST m Var [add Surface]

然而这样不太正确。如果我们的程序运行在没有可用的窗口系统的终端环境中,那么打开窗口可能会失败。因此, initWindow 需要以某种方式来应对可能的失败。我们可以通过返回 Maybe Var 而非 Var,以及只在成功时添加 Surface 来做到这一点:

initWindow : Int -> Int -> ST m (Maybe Var) [addIfJust Surface]

它使用了 Control.ST 中定义的类型级函数 addIfJust,该函数返回一个 Action,仅在操作成功时添加资源(也就是说,它返回一个形如 Just val 的结果)。

addIfJustaddIfRight

Control.ST 中定义了能够在操作成功时构造新资源的函数,其中的 addIfJust 会在操作返回 Just ty 时添加资源。此外还有 addIfRight

addIfJust : Type -> Action (Maybe Var)
addIfRight : Type -> Action (Either a Var)

二者均基于下面的原语动作 Add 开发。此动作接受一个函数,该函数从操作的结果中构造出一个资源列表:

Add : (ty -> Resources) -> Action ty

如有需要,你可以用它来创建自己的动作,以此来基于某个操作的结果添加资源。例如,addIfJust 的实现如下:

addIfJust : Type -> Action (Maybe Var)
addIfJust ty = Add (maybe [] (\var => [var ::: ty]))

如果我们能创建窗口,那么也需要能删除它:

closeWindow : (win : Var) -> ST m () [remove win Surface]

我们还需要响应按下键盘或点击鼠标这类事件。Graphics.SDL 库为此提供了 Event 类型,而我们可以用 poll 轮询事件,如果存在的话,它会返回最后一个发生的事件:

poll : ST m (Maybe Event) []

Draw 中剩下的方法包括:flip,它会将从上次 flip 以来绘制的所有图像都显示出来;还有两个绘图的方法 filledRectangledrawLine

flip : (win : Var) -> ST m () [win ::: Surface]
filledRectangle : (win : Var) -> (Int, Int) -> (Int, Int) -> Col -> ST m () [win ::: Surface]
drawLine : (win : Var) -> (Int, Int) -> (Int, Int) -> Col -> ST m () [win ::: Surface]

我们按如下方式定义色彩,四个整数表示色彩通道(红、绿、蓝、不透明度):

data Col = MkCol Int Int Int Int

black : Col
black = MkCol 0 0 0 255

red : Col
red = MkCol 255 0 0 255

green : Col
green = MkCol 0 255 0 255

-- 蓝、黄、品红、青、白类似……

在导入 Graphics.SDL 之后,你就可以像下面这样用 SDL 的绑定实现 Draw 接口了:

implementation Draw IO where
  Surface = State SDLSurface

  initWindow x y = do Just srf <- lift (startSDL x y)
                           | pure Nothing
                      var <- new srf
                      pure (Just var)

  closeWindow win = do lift endSDL
                       delete win

  flip win = do srf <- read win
                lift (flipBuffers srf)
  poll = lift pollEvent

  filledRectangle win (x, y) (ex, ey) (MkCol r g b a)
       = do srf <- read win
            lift $ filledRect srf x y ex ey r g b a
  drawLine win (x, y) (ex, ey) (MkCol r g b a)
       = do srf <- read win
            lift $ drawLine srf x y ex ey r g b a

在本实现中,我们使用 startSDL 来初始化窗口,它在失败时返回 Nothing。由于 initWindow 的类型说明它会在返回形如 Just val 的值时添加一个资源, 因此我们在成功时添加 startSDL 返回的平面,在失败时什么也不做。我们只能在 startDSL 成功时初始化成功。

现在我们有了 Draw 的实现,可以试着写一些函数了,他们在窗口中绘图并通过 SDL 绑定执行它们。例如,假设我们有一个可以绘图的平面 win,那么可以编写 render 函数在黑色背景上绘图:

render : Draw m => (win : Var) -> ST m () [win ::: Surface {m}]
render win = do filledRectangle win (0,0) (640,480) black
                drawLine win (100,100) (200,200) red
                flip win

最后的 flip win 是必须的,因为绘图原语使用了双缓冲区来避免图形闪烁。我们在屏幕之外的缓冲区上绘图,同时显示另一个缓冲区。在调用 flip 时,它会将当前屏幕之外的缓冲区显示出来,并创建一个新的屏幕外缓冲区绘制下一帧。

要将它包含在程序里,我们需要编写一个主循环来渲染图像,同时等待用户关闭应用的事件:

loop : Draw m => (win : Var) -> ST m () [win ::: Surface {m}]
loop win = do render win
              Just AppQuit <- poll
                   | _ => loop win
              pure ()

最后,我们创建一个主程序。如果可能,它会创建窗口,然后运行主循环:

drawMain : (ConsoleIO m, Draw m) => ST m () []
drawMain = do Just win <- initWindow 640 480
                 | Nothing => putStrLn "Can't open window"
              loop win
              closeWindow win

我们可以在 REPL 中用 run 运行它:

*Draw> :exec run drawMain
更高级的接口:TurtleGraphics

「海龟绘图」会操纵一只「海龟」在屏幕上移动,在它移动时用「画笔」来画线。一只海龟拥有描述它位置的属性,它面对的方向,以及当前画笔的颜色。此外,还有一些命令来让海龟向前移动,转一个角度,以及更改画笔的颜色。下面是一种可行的接口:

interface TurtleGraphics (m : Type -> Type) where
  Turtle : Type

  start : Int -> Int -> ST m (Maybe Var) [addIfJust Turtle]
  end : (t : Var) -> ST m () [Remove t Turtle]

  fd : (t : Var) -> Int -> ST m () [t ::: Turtle]
  rt : (t : Var) -> Int -> ST m () [t ::: Turtle]

  penup : (t : Var) -> ST m () [t ::: Turtle]
  pendown : (t : Var) -> ST m () [t ::: Turtle]
  col : (t : Var) -> Col -> ST m () [t ::: Turtle]

  render : (t : Var) -> ST m () [t ::: Turtle]

Draw 一样,我们也需要一个初始化海龟的命令(这里叫做 start),如果它无法创建用来绘图的平面就会失败。此外还有一个 render 方法,它用来渲染窗口中目前已绘制的图像。使用此接口的一个可用的程序如下所示,它画了一个彩色的正方形:

turtle : (ConsoleIO m, TurtleGraphics m) => ST m () []
turtle = with ST do
            Just t <- start 640 480
                 | Nothing => putStr "Can't make turtle\n"
            col t yellow
            fd t 100; rt t 90
            col t green
            fd t 100; rt t 90
            col t red
            fd t 100; rt t 90
            col t blue
            fd t 100; rt t 90
            render t
            end t

with ST do

turtle 中使用 with ST do 是为了区分 (>>=) 的版本,它可能来自于 Monad 或者 ST。虽然 Idris 自己能够解决此问题,不过它会尝试所有可能性,因此使用 with 从句可以减少耗时,加速类型检查。

要实现此接口,我们可以尝试用 Surface 来表示海龟用来作画的平面:

implementation Draw m => TurtleGraphics m where
  Turtle = Surface {m}

知道了 Turtle 被表示为 Surface 之后,我们就能使用 Draw 提供的方法来实现海龟了。然而这还不够,我们还需要存储更多信息:具体来说,海龟有一些属性需要存储在某处。因此我们不仅需要将海龟表示为一个 Surface,还需要存储一些附加的状态。我们可以通过复合资源来做到这一点。

复合资源简介

复合资源由一个资源列表构造而来,它使用以下 Control.ST 中定义的类型实现:

data Composite : List Type -> Type

如果我们有一个复合资源,那么可以用 split 将其分解为构成它的资源,并为每个资源创建新的变量。例如:

splitComp : (comp : Var) -> ST m () [comp ::: Composite [State Int, State String]]
splitComp comp = do [int, str] <- split comp
                    ?whatNow

调用 split comp 会从复合资源 comp 中提取出 State IntState String,并分别将它们存储在变量 intstr 中。如果我们检查 whatNow 的类型,就会看到它影响了资源列表:

  int : Var
  str : Var
  comp : Var
  m : Type -> Type
--------------------------------------
whatNow : STrans m () [int ::: State Int, str ::: State String, comp ::: State ()]
                      (\result => [comp ::: Composite [State Int, State String]])

这样,我们就有了两个新的资源 intstr,而 comp 的类型则被更新为单元类型, 即当前没有保存数据。这点符合预期:我们只是要将数据提取为独立的资源而已。

现在我们提取出了独立的资源,可以直接操作它们(比如,增加 Int 或为 String 添加换行)并使用 combine 重新构造复合资源:

splitComp : (comp : Var) ->
            ST m () [comp ::: Composite [State Int, State String]]
splitComp comp = do [int, str] <- split comp
                    update int (+ 1)
                    update str (++ "\n")
                    combine comp [int, str]
                    ?whatNow

同样,我们可以检查 whatNow 的类型来查看 combine 的作用:

  comp : Var
  int : Var
  str : Var
  m : Type -> Type
--------------------------------------
whatNow : STrans m () [comp ::: Composite [State Int, State String]]
                 (\result => [comp ::: Composite [State Int, State String]])

所以 combine 的作用就是接受既有的资源并将它们合并成一个复合资源。在执行 combine 之前,目标资源必须存在(这里是 comp)且类型为 State ()

查看 splitcombine 的类型,了解它们使用的资源列表的要求是很有启发的。split 的类型如下:

split : (lbl : Var) -> {auto prf : InState lbl (Composite vars) res} ->
        STrans m (VarList vars) res (\vs => mkRes vs ++ updateRes res prf (State ()))

隐式的 prf 参数说明要被分解的 lbl 必须为复合资源。它返回一个由复合资源分解而来的变量列表,而 mkRes 函数则创建一个对应类型的资源列表。最后,updateRes 会将复合资源的类型更新为 State ()

combine 函数则反过来:

combine : (comp : Var) -> (vs : List Var) ->
          {auto prf : InState comp (State ()) res} ->
          {auto var_prf : VarsIn (comp :: vs) res} ->
          STrans m () res (const (combineVarsIn res var_prf))

隐式的 prf 参数确保了目标资源 comp 的类型为 State ()。也就是说,我们不会覆盖任何其它的信息。隐式的 var_prf 参数类似于 call 中的 SubRes,它确保了我们用来构造复合资源的每个变量都确实存在于当前的资源列表中。

我们可以基于 Draw 以及任何需要的附加资源,通过复合资源的方式来实现高级的 TurtleGraphics API。

实现 Turtle

现在我们已经知道如何用一组既有的资源来构造出新的资源了。我们可以用复合资源实现 Turtle,它包括一个用于绘图的 Surface,以及一些表示画笔颜色、位置和方向的独立的状态。我们还有一个线条的列表,它描述了我们在调用 render 时要绘制到 Surface 上的图像:

Turtle = Composite [Surface {m}, -- 用来绘图的平面
                    State Col,  -- 画笔颜色
                    State (Int, Int, Int, Bool), -- 画笔的位置/方向/落笔标记
                    State (List Line)] -- 渲染时要画的线条

一条 Line 由它的起始位置,终止位置和颜色定义:

Line : Type
Line = ((Int, Int), (Int, Int), Col)

首先来实现 start,它会创建一个新的 Turtle``(如果不可能则返回 ``Nothing)。我们从初始化绘图平面开始,然后是所有状态构成的组件。最后,我们将所有的组件组合成一个复合资源来表示海龟:

start x y = do Just srf <- initWindow x y
                    | Nothing => pure Nothing
               col <- new white
               pos <- new (320, 200, 0, True)
               lines <- new []
               turtle <- new ()
               combine turtle [srf, col, pos, lines]
               pure (Just turtle)

然后来实现 end,它会把海龟处理掉。我们先析构复合资源,然后关闭窗口,删除所有独立的资源。记住我们只能用 delete 删除一个 State,因此需要用 split 将复合资源分解掉,用 closeWindow 干净地关闭绘图平面,然后用 delete 删除状态:

end t = do [srf, col, pos, lines] <- split t
           closeWindow srf; delete col; delete pos; delete lines; delete t

对于其它的方法,我们需要用 split 分解资源以获取每一个组件,然后在结束时用 combine 将它们组合成一个复合资源。例如,以下为 penup 的实现:

penup t = do [srf, col, pos, lines] <- split t -- 分解复合资源
             (x, y, d, _) <- read pos          -- 析构画笔位置
             write pos (x, y, d, False)        -- 将落笔标记置为 False
             combine t [srf, col, pos, lines]  -- 重新组合组件

海龟剩下的操作遵循相同的模式。完整的细节见 Idris 发行版中的 samples/ST/Graphics/Turtle.idr 文件。之后就是是渲染海龟创建的图像:

render t = do [srf, col, pos, lines] <- split t -- 分解复合资源
              filledRectangle srf (0, 0) (640, 480) black -- 绘制背景
              drawAll srf !(read lines)         -- 渲染海龟绘制的线条
              flip srf                          -- 缓冲区翻页以显示图像
              combine t [srf, col, pos, lines]
              Just ev <- poll
                | Nothing => render t           -- 继续直到按下按键
              case ev of
                   KeyUp _ => pure ()           -- 按键按下,于是退出
                   _ => render t
 where drawAll : (srf : Var) -> List Line -> ST m () [srf ::: Surface {m}]
       drawAll srf [] = pure ()
       drawAll srf ((start, end, col) :: xs)
          = do drawLine srf start end col       -- 按相应的颜色绘制线条
               drawAll srf xs

示例:网络 Socket 编程

POSIX 的插口(Socket,另译作「套接字」)API 支持跨网络的进程间通信。插口表示网络通信的端点,它可以是以下几种状态之一:

  • Ready 为初始状态
  • Bound 表示已绑定至某个地址,准备接受传入的连接
  • Listening 表示正在监听传入的连接
  • Open 表示准备发送或接受数据
  • Closed 表示不再活动

下图展示了该 API 提供的操作是如何改变其状态的,其中 Ready 为初始状态:

netstate

如果某个连接为 Open 状态,那么我们可以用 send 发送消息到该连接的另一端,也可以用 recv 从另一端接收消息。

contrib 包提供了一个 Network.Socket 模块,该模块提供了创建插口以及发送和接收消息的原语。其中包含以下函数:

bind : (sock : Socket) -> (addr : Maybe SocketAddress) -> (port : Port) -> IO Int
connect : (sock : Socket) -> (addr : SocketAddress) -> (port : Port) -> IO ResultCode
listen : (sock : Socket) -> IO Int
accept : (sock : Socket) -> IO (Either SocketError (Socket, SocketAddress))
send : (sock : Socket) -> (msg  : String) -> IO (Either SocketError ResultCode)
recv : (sock : Socket) -> (len : ByteLength) -> IO (Either SocketError (String, ResultCode))
close : Socket -> IO ()

这些函数虽然涵盖了上图中所有的状态转移,然而却均未解释这些操作是如何影响其状态的!例如,我们完全有可能在一个尚未就绪的插口上发送消息,或者在插口关闭后从中接收消息。

我们可以用 ST 提供更好的 API,它精确地解释了每个操作是如何影响连接的状态的。在本节中,我们会定义一个插口 API,然后用它来实现一个「回显(echo)」服务器,通过原样返回客户端发送的消息来响应客户端的请求。

定义 Sockets 接口

我们不直接用 IO 进行底层插口编程,而是用 ST 实现一个接口来精确地描述每个操作如何影响插口的状态,以及何时创建或删除插口。我们首先来创建描述插口抽象状态的类型:

data SocketState = Ready | Bound | Listening | Open | Closed

接着定义一个接口,用 Sock 类型表示插口,以其当前状态作为该类型构造子的参数:

interface Sockets (m : Type -> Type) where
  Sock : SocketState -> Type

我们用 socket 方法创建插口。插口库中定义了 SocketType,它描述了插口为 TCP、UDP 还是其它形式。我们后面会一直使用 Stream 来表示 TCP 插口。

socket : SocketType -> ST m (Either () Var) [addIfRight (Sock Ready)]

记住 addIfRight 会在操作结果的形式为 Right val 时添加资源。按照此接口的约定,我们用 Either 来表示可能失败的操作,它可以保存任何关于错误的附加信息。如此一来,我们就能一致地使用 addIfRight 和其它类型级函数了。

现在来定义一个服务器。一旦我们创建了插口,就需要用 bind 方法将它绑定到一个端口上:

bind : (sock : Var) -> (addr : Maybe SocketAddress) -> (port : Port) ->
       ST m (Either () ()) [sock ::: Sock Ready :-> (Sock Closed `or` Sock Bound)]

绑定插口可能会失败,例如如果已经有一个插口被绑定到给定的端口上,因此它同样要返回一个类型为 Either 的值。这里的动作使用了类型级函数 or,它的意思是:

  • bind 失败,插口就会转移到 Sock Closed 状态
  • bind 成功,插口就会转移到 Sock Bound 状态,如前图所示

or 的实现如下:

or : a -> a -> Either b c -> a
or x y = either (const x) (const y)

这样一来,bind 的类型就能写成如下等价形式:

bind : (sock : Var) -> (addr : Maybe SocketAddress) -> (port : Port) ->
       STrans m (Either () ()) [sock ::: Sock Ready]
                    (either [sock ::: Sock Closed] [sock ::: Sock Bound])

然而,使用 or 会更加简洁,它也能尽可能直接地反映出状态转移图,同时仍然刻画了它可能失败的性质。

一旦我们将插口绑定到了端口,就可以开始监听来自客户端的连接了:

listen : (sock : Var) ->
         ST m (Either () ()) [sock ::: Sock Bound :-> (Sock Closed `or` Sock Listening)]

Listening 状态的插口表示准备接受来自独立客户端的连接:

accept : (sock : Var) ->
         ST m (Either () Var)
              [sock ::: Sock Listening, addIfRight (Sock Open)]

如果有客户端传入的连接,accept 会在资源列表的最后添加一个新的资源(按照约定,在列表末尾添加资源可以更好地配合 updateWith 工作,如上一节所述)。 现在,我们有了两个插口:一个继续监听传入的连接,另一个准备与客户端通信。

我们还需要能够在插口上发送和接受数据的方法:

send : (sock : Var) -> String ->
       ST m (Either () ()) [sock ::: Sock Open :-> (Sock Closed `or` Sock Open)]
recv : (sock : Var) ->
       ST m (Either () String) [sock ::: Sock Open :-> (Sock Closed `or` Sock Open)]

一旦我们与另一台机器通过插口进行的通信结束, 就需要用 close 关闭连接并用 remove 移除该插口:

close : (sock : Var) ->
        {auto prf : CloseOK st} -> ST m () [sock ::: Sock st :-> Sock Closed]
remove : (sock : Var) ->
         ST m () [Remove sock (Sock Closed)]

close 使用了断言 CloseOK 作为隐式证明参数,它描述了何时可以关闭插口:

data CloseOK : SocketState -> Type where
     CloseOpen : CloseOK Open
     CloseListening : CloseOK Listening

也就是说,我们可以关闭 Open 状态的插口,告诉另一台机器终止通信。我们也可以关闭 Listening 状态下等待传入连接的插口,这会让服务器停止接受请求。

在本节中,我们实现了一个服务器,不过为了完整性,我们还需要在另一台机器上实现客户端来连接到服务器。这可以通过 connect 来完成:

connect : (sock : Var) -> SocketAddress -> Port ->
          ST m (Either () ()) [sock ::: Sock Ready :-> (Sock Closed `or` Sock Open)]

以下完整的接口作为参考:

interface Sockets (m : Type -> Type) where
  Sock : SocketState -> Type
  socket : SocketType -> ST m (Either () Var) [addIfRight (Sock Ready)]
  bind : (sock : Var) -> (addr : Maybe SocketAddress) -> (port : Port) ->
         ST m (Either () ()) [sock ::: Sock Ready :-> (Sock Closed `or` Sock Bound)]
  listen : (sock : Var) ->
           ST m (Either () ()) [sock ::: Sock Bound :-> (Sock Closed `or` Sock Listening)]
  accept : (sock : Var) ->
           ST m (Either () Var) [sock ::: Sock Listening, addIfRight (Sock Open)]
  connect : (sock : Var) -> SocketAddress -> Port ->
            ST m (Either () ()) [sock ::: Sock Ready :-> (Sock Closed `or` Sock Open)]
  close : (sock : Var) -> {auto prf : CloseOK st} ->
          ST m () [sock ::: Sock st :-> Sock Closed]
  remove : (sock : Var) -> ST m () [Remove sock (Sock Closed)]
  send : (sock : Var) -> String ->
         ST m (Either () ()) [sock ::: Sock Open :-> (Sock Closed `or` Sock Open)]
  recv : (sock : Var) ->
         ST m (Either () String) [sock ::: Sock Open :-> (Sock Closed `or` Sock Open)]

我们稍后会看到如何实现它。这些方法大部分都可以直接通过原始的插口 API 在 IO 中实现。不过首先,我们会看到如何用这些 API 实现一个「回显」服务器。

Sockets 实现「回显」服务器

从顶层来说,我们的回显(echo)服务器在开始和结束时均没有资源可用,它使用了 ConsoleIOSockets 接口:

startServer : (ConsoleIO m, Sockets m) => ST m () []

首先我们需要用 socket 创建一个插口,绑定到一个端口并监听传入的连接。它可能会失败,因此我们需要处理它返回 Right sock 的情况,其中 sock 是新的插口变量,不过也可能返回 Left err

startServer : (ConsoleIO m, Sockets m) => ST m () []
startServer =
  do Right sock <- socket Stream
           | Left err => pure ()
     ?whatNow

交互式地实现这类函数是个不错的想法,我们可以通过挖坑来逐步观察整个系统的状态是如何变化的。在成功调用了 socket 之后,我们就有了 Ready 状态的插口:

  sock : Var
  m : Type -> Type
  constraint : ConsoleIO m
  constraint1 : Sockets m
--------------------------------------
whatNow : STrans m () [sock ::: Sock Ready] (\result1 => [])

接着,我们需要将插口绑定到端口,然后开始监听连接。同样,每一步都可能会失败,此时我们会移除该插口。失败总会导致插口转为 Closed 状态,此时我们能做的就是用 remove 移除它:

startServer : (ConsoleIO m, Sockets m) => ST m () []
startServer =
  do Right sock <- socket Stream        | Left err => pure ()
     Right ok <- bind sock Nothing 9442 | Left err => remove sock
     Right ok <- listen sock            | Left err => remove sock
     ?runServer

最后,我们就有了一个监听传入连接的插口:

  ok : ()
  sock : Var
  ok1 : ()
  m : Type -> Type
  constraint : ConsoleIO m
  constraint1 : Sockets m
--------------------------------------
runServer : STrans m () [sock ::: Sock Listening]
                   (\result1 => [])

我们会在一个独立的函数中实现它。runServer 的类型告诉我们 echoServer 的类型必须是什么(我们无需显式地为 Sock 给出参数 m):

echoServer : (ConsoleIO m, Sockets m) => (sock : Var) ->
             ST m () [remove sock (Sock {m} Listening)]

我们可以完成 startServer 的定义:

startServer : (ConsoleIO m, Sockets m) => ST m () []
startServer =
  do Right sock <- socket Stream        | Left err => pure ()
     Right ok <- bind sock Nothing 9442 | Left err => remove sock
     Right ok <- listen sock            | Left err => remove sock
     echoServer sock

echoServer 中,我们会继续接受并相应请求直到出现失败,此时我们会关闭插口并返回。我们从尝试接受传入的连接开始:

echoServer : (ConsoleIO m, Sockets m) => (sock : Var) ->
             ST m () [remove sock (Sock {m} Listening)]
echoServer sock =
  do Right new <- accept sock | Left err => do close sock; remove sock
     ?whatNow

accept 失败,我们就需要关闭 Listening 状态的插口并在返回前移除它,因为 echoServer 的类型要求如此。

通常,逐步实现 echoServer 意味着我们可以在开发过程中检查当前的状态。如果 accept 成功,那么我们既有的 sock 会继续监听连接,此外一个新的 new 插口会被打开用于通信:

  new : Var
  sock : Var
  m : Type -> Type
  constraint : ConsoleIO m
  constraint1 : Sockets m
--------------------------------------
whatNow : STrans m () [sock ::: Sock Listening, new ::: Sock Open]
                      (\result1 => [])

要完成 echoServer,我们需要从 new 插口上接收一条消息,然后原样返回它。在完成后,我们就关闭 new 插口,然后回到 echoServer 的开始处,准备响应下一次连接:

echoServer : (ConsoleIO m, Sockets m) => (sock : Var) ->
             ST m () [remove sock (Sock {m} Listening)]
echoServer sock =
  do Right new <- accept sock | Left err => do close sock; remove sock
     Right msg <- recv new | Left err => do close sock; remove sock; remove new
     Right ok <- send new ("You said " ++ msg)
           | Left err => do remove new; close sock; remove sock
     close new; remove new; echoServer sock

实现 Sockets

为了在 IO 中实现 Sockets,我们需要从给出具体的 Sock 类型开始。我们可以用原始的插口 API(在 Network.Socket 中实现),将 Socket 存储在 State 中来得到具体的类型,而不必关心该插口所处的抽象状态:

implementation Sockets IO where
  Sock _ = State Socket

大部分方法都可以直接用原始的插口 API 来实现,返回相应的 LeftRight。例如,我们可以实现 socketbindlisten

socket ty = do Right sock <- lift $ Socket.socket AF_INET ty 0
                    | Left err => pure (Left ())
               lbl <- new sock
               pure (Right lbl)
bind sock addr port = do ok <- lift $ bind !(read sock) addr port
                         if ok /= 0
                            then pure (Left ())
                            else pure (Right ())
listen sock = do ok <- lift $ listen !(read sock)
                 if ok /= 0
                    then pure (Left ())
                    else pure (Right ())

然而,这里的 accept 有点不同, 因为我们在用 new 为打开连接创建新资源时,它出现在了资源列表的起始处而非末尾。我们可以通过写出不完整的定义来看到这一点, 使用 returning 来查看返回 Right lbl 需要什么资源:

accept sock = do Right (conn, addr) <- lift $ accept !(read sock)
                       | Left err => pure (Left ())
                 lbl <- new conn
                 returning (Right lbl) ?fixResources

使用 new 将资源添加到列表起始处是很方便的,因为通常来说,这会让 Idris 使用隐式 auto 自动构造证明更加容易。另一方面,当我们用 call 来构造更小的资源集合时,updateWith 会将新创建的资源放到列表的末尾处,因为通常这样会减少需要重新排序的资源的数量。

如果我们查看 fixResources 的类型,就会知道结束 accept 需要做的事情:

  _bindApp0 : Socket
  conn : Socket
  addr : SocketAddress
  sock : Var
  lbl : Var
--------------------------------------
fixResources : STrans IO () [lbl ::: State Socket, sock ::: State Socket]
                      (\value => [sock ::: State Socket, lbl ::: State Socket])

当前资源列表的顺序为 lblsock,而我们需要它们的顺序变成 socklbl。为此,Control.ST 提供了原语 toEnd,它会将一个资源移到列表的末尾。这样我们就能完成 accept 了:

accept sock = do Right (conn, addr) <- lift $ accept !(read sock)
                       | Left err => pure (Left ())
                 lbl <- new conn
                 returning (Right lbl) (toEnd lbl)

Sockets 的完整实现见 Idris 发行版中的 samples/ST/Net/Network.idr 文件。你也可以在同目录下的 EchoServer.idr 文件中找到回显服务器。此外还有一个高级网络协议 RandServer.idr,它基于底层的插口 API,通过状态机的层级实现了一个高级的网络通信协议。它还使用线程来异步地处理传入的请求。你可以在 Edwin Brady 的文稿 State Machines All The Way Down (深入理解状态机)中找到关于线程和随机数服务器的更多详情。

The Effects Tutorial

A tutorial on the Effects package in Idris.

Effects and the Control.ST module

There is a new module in the contrib package, Control.ST, which provides the resource tracking facilities of Effects but with better support for creating and deleting resources, and implementing resources in terms of other resources.

Unless you have a particular reason to use Effects you are strongly recommended to use Control.ST instead. There is a tutorial available on this site for Control.ST with several examples (用 Idris 实现带有状态的系统:ST 教程).

注解

The documentation for Idris has been published under the Creative Commons CC0 License. As such to the extent possible under law, The Idris Community has waived all copyright and related or neighbouring rights to Documentation for Idris.

More information concerning the CC0 can be found online at: http://creativecommons.org/publicdomain/zero/1.0/

Introduction

Pure functional languages with dependent types such as Idris support reasoning about programs directly in the type system, promising that we can know a program will run correctly (i.e. according to the specification in its type) simply because it compiles. Realistically, though, things are not so simple: programs have to interact with the outside world, with user input, input from a network, mutable state, and so on. In this tutorial I will introduce the library, which is included with the distribution and supports programming and reasoning with side-effecting programs, supporting mutable state, interaction with the outside world, exceptions, and verified resource management.

This tutorial assumes familiarity with pure programming in Idris, as described in Sections 1–6 of the main tutorial [1]. The examples presented are tested with Idris and can be found in the examples directory of the Idris repository.

Consider, for example, the following introductory function which illustrates the kind of properties which can be expressed in the type system:

vadd : Vect n Int -> Vect n Int -> Vect n Int
vadd []        []        = []
vadd (x :: xs) (y :: ys) = x + y :: vadd xs ys

This function adds corresponding elements in a pair of vectors. The type guarantees that the vectors will contain only elements of type Int, and that the input lengths and the output length all correspond. A natural question to ask here, which is typically neglected by introductory tutorials, is “How do I turn this into a program?” That is, given some lists entered by a user, how do we get into a position to be able to apply the vadd function? Before doing so, we will have to:

  • Read user input, either from the keyboard, a file, or some other input device.
  • Check that the user inputs are valid, i.e. contain only Int and are the same length, and report an error if not.
  • Write output

The complete program will include side-effects for I/O and error handling, before we can get to the pure core functionality. In this tutorial, we will see how Idris supports side-effects. Furthermore, we will see how we can use the dependent type system to reason about stateful and side-effecting programs. We will return to this specific example later.

Hello world

To give an idea of how programs with effects look, here is the ubiquitous “Hello world” program, written using the Effects library:

module Main

import Effects
import Effect.StdIO

hello : Eff () [STDIO]
hello = putStrLn "Hello world!"

main : IO ()
main = run hello

As usual, the entry point is main. All main has to do is invoke the hello function which supports the STDIO effect for console I/O, and returns the unit value. All programs using the Effects library must import Effects. The details of the Eff type will be presented in the remainder of this tutorial.

To compile and run this program, Idris needs to be told to include the Effects package, using the -p effects flag (this flag is required for all examples in this tutorial):

idris hello.idr -o hello -p effects
./hello Hello world!

Outline

The tutorial is structured as follows: first, in Section State, we will discuss state management, describing why it is important and introducing the effects library to show how it can be used to manage state. This section also gives an overview of the syntax of effectful programs. Section Simple Effects then introduces a number of other effects a program may have: I/O; Exceptions; Random Numbers; and Non-determinism, giving examples for each, and an extended example combining several effects in one complete program. Section Dependent Effects introduces dependent effects, showing how states and resources can be managed in types. Section Creating New Effects shows how new effects can be implemented. Section Example: A “Mystery Word” Guessing Game gives an extended example showing how to implement a “mystery word” guessing game, using effects to describe the rules of the game and ensure they are implemented accurately. References to further reading are given in Section Further Reading.

[1]You do not, however, need to know what a monad is!

State

Many programs, even pure programs, can benefit from locally mutable state. For example, consider a program which tags binary tree nodes with a counter, by an inorder traversal (i.e. counting depth first, left to right). This would perform something like the following:

image

We can describe binary trees with the following data type BTree and testTree to represent the example input above:

data BTree a = Leaf
             | Node (BTree a) a (BTree a)

testTree : BTree String
testTree = Node (Node Leaf "Jim" Leaf)
                "Fred"
                (Node (Node Leaf "Alice" Leaf)
                      "Sheila"
                      (Node Leaf "Bob" Leaf))

Then our function to implement tagging, beginning to tag with a specific value i, has the following type:

treeTag : (i : Int) -> BTree a -> BTree (Int, a)

First attempt

Naïvely, we can implement treeTag by implementing a helper function which propagates a counter, returning the result of the count for each subtree:

treeTagAux : (i : Int) -> BTree a -> (Int, BTree (Int, a))
treeTagAux i Leaf = (i, Leaf)
treeTagAux i (Node l x r)
       = let (i', l') = treeTagAux i l in
         let x' = (i', x) in
         let (i'', r') = treeTagAux (i' + 1) r in
             (i'', Node l' x' r')

treeTag : (i : Int) -> BTree a -> BTree (Int, a)
treeTag i x = snd (treeTagAux i x)

This gives the expected result when run at the REPL prompt:

*TreeTag> treeTag 1 testTree
Node (Node Leaf (1, "Jim") Leaf)
     (2, "Fred")
     (Node (Node Leaf (3, "Alice") Leaf)
           (4, "Sheila")
           (Node Leaf (5, "Bob") Leaf)) : BTree (Int, String)

This works as required, but there are several problems when we try to scale this to larger programs. It is error prone, because we need to ensure that state is propagated correctly to the recursive calls (i.e. passing the appropriate i or i’). It is hard to read, because the functional details are obscured by the state propagation. Perhaps most importantly, there is a common programming pattern here which should be abstracted but instead has been implemented by hand. There is local mutable state (the counter) which we have had to make explicit.

Introducing Effects

Idris provides a library, Effects [3], which captures this pattern and many others involving effectful computation [1]. An effectful program f has a type of the following form:

f : (x1 : a1) -> (x2 : a2) -> ... -> Eff t effs

That is, the return type gives the effects that f supports (effs, of type List EFFECT) and the type the computation returns t. So, our treeTagAux helper could be written with the following type:

treeTagAux : BTree a -> Eff (BTree (Int, a)) [STATE Int]

That is, treeTagAux has access to an integer state, because the list of available effects includes STATE Int. STATE is declared as follows in the module Effect.State (that is, we must import Effect.State to be able to use it):

STATE : Type -> EFFECT

It is an effect parameterised by a type (by convention, we write effects in all capitals). The treeTagAux function is an effectful program which builds a new tree tagged with Ints, and is implemented as follows:

treeTagAux Leaf = pure Leaf
treeTagAux (Node l x r)
    = do l' <- treeTagAux l
         i <- get
         put (i + 1)
         r' <- treeTagAux r
         pure (Node l' (i, x) r')

There are several remarks to be made about this implementation. Essentially, it hides the state, which can be accessed using get and updated using put, but it introduces several new features. Specifically, it uses do-notation, binding variables with <-, and a pure function. There is much to be said about these features, but for our purposes, it suffices to know the following:

  • do blocks allow effectful operations to be sequenced.
  • x <- e binds the result of an effectful operation e to a
    variable x. For example, in the above code, treeTagAux l is an effectful operation returning BTree (Int, a), so l’ has type BTree (Int, a).
  • pure e turns a pure value e into the result of an effectful
    operation.

The get and put functions read and write a state t, assuming that the STATE t effect is available. They have the following types, polymorphic in the state t they manage:

get :      Eff t [STATE t]
put : t -> Eff () [STATE t]

A program in Eff can call any other function in Eff provided that the calling function supports at least the effects required by the called function. In this case, it is valid for treeTagAux to call both get and put because all three functions support the STATE Int effect.

Programs in Eff are run in some underlying computation context, using the run or runPure function. Using runPure, which runs an effectful program in the identity context, we can write the treeTag function as follows, using put to initialise the state:

treeTag : (i : Int) -> BTree a -> BTree (Int, a)
treeTag i x = runPure (do put i
                          treeTagAux x)

We could also run the program in an impure context such as IO, without changing the definition of treeTagAux, by using run instead of runPure:

treeTagAux : BTree a -> Eff (BTree (Int, a)) [STATE Int]
...

treeTag : (i : Int) -> BTree a -> IO (BTree (Int, a))
treeTag i x = run (do put i
                      treeTagAux x)

Note that the definition of treeTagAux is exactly as before. For reference, this complete program (including a main to run it) is shown in Listing [introprog].

module Main

import Effects
import Effect.State

data BTree a = Leaf
             | Node (BTree a) a (BTree a)

Show a => Show (BTree a) where
    show Leaf = "[]"
    show (Node l x r) = "[" ++ show l ++ " "
                            ++ show x ++ " "
                            ++ show r ++ "]"

testTree : BTree String
testTree = Node (Node Leaf "Jim" Leaf)
              "Fred"
              (Node (Node Leaf "Alice" Leaf)
                    "Sheila"
                    (Node Leaf "Bob" Leaf))

treeTagAux : BTree a -> Eff (BTree (Int, a)) [STATE Int]
treeTagAux Leaf = pure Leaf
treeTagAux (Node l x r) = do l' <- treeTagAux l
                             i <- get
                             put (i + 1)
                             r' <- treeTagAux r
                             pure (Node l' (i, x) r')

treeTag : (i : Int) -> BTree a -> BTree (Int, a)
treeTag i x = runPure (do put i; treeTagAux x)

main : IO ()
main = print (treeTag 1 testTree)

Effects and Resources

Each effect is associated with a resource, which is initialised before an effectful program can be run. For example, in the case of STATE Int the corresponding resource is the integer state itself. The types of runPure and run show this (slightly simplified here for illustrative purposes):

runPure : {env : Env id xs} -> Eff a xs -> a
run : Applicative m => {env : Env m xs} -> Eff a xs -> m a

The env argument is implicit, and initialised automatically where possible using default values given by implementations of the following interface:

interface Default a where
    default : a

Implementations of Default are defined for all primitive types, and many library types such as List, Vect, Maybe, pairs, etc. However, where no default value exists for a resource type (for example, you may want a STATE type for which there is no Default implementation) the resource environment can be given explicitly using one of the following functions:

runPureInit : Env id xs -> Eff a xs -> a
runInit : Applicative m => Env m xs -> Eff a xs -> m a

To be well-typed, the environment must contain resources corresponding exactly to the effects in xs. For example, we could also have implemented treeTag by initialising the state as follows:

treeTag : (i : Int) -> BTree a -> BTree (Int, a)
treeTag i x = runPureInit [i] (treeTagAux x)

Labelled Effects

What if we have more than one state, especially more than one state of the same type? How would get and put know which state they should be referring to? For example, how could we extend the tree tagging example such that it additionally counts the number of leaves in the tree? One possibility would be to change the state so that it captured both of these values, e.g.:

treeTagAux : BTree a -> Eff (BTree (Int, a)) [STATE (Int, Int)]

Doing this, however, ties the two states together throughout (as well as not indicating which integer is which). It would be nice to be able to call effectful programs which guaranteed only to access one of the states, for example. In a larger application, this becomes particularly important.

The library therefore allows effects in general to be labelled so that they can be referred to explicitly by a particular name. This allows multiple effects of the same type to be included. We can count leaves and update the tag separately, by labelling them as follows:

treeTagAux : BTree a ->  Eff (BTree (Int, a))
                               ['Tag ::: STATE Int,
                                'Leaves ::: STATE Int]

The ::: operator allows an arbitrary label to be given to an effect. This label can be any type—it is simply used to identify an effect uniquely. Here, we have used a symbol type. In general ’name introduces a new symbol, the only purpose of which is to disambiguate values [2].

When an effect is labelled, its operations are also labelled using the :- operator. In this way, we can say explicitly which state we mean when using get and put. The tree tagging program which also counts leaves can be written as follows:

treeTagAux Leaf = do
    'Leaves :- update (+1)
    pure Leaf
treeTagAux (Node l x r) = do
    l' <- treeTagAux l
    i <- 'Tag :- get
    'Tag :- put (i + 1)
    r' <- treeTagAux r
    pure (Node l' (i, x) r')

The update function here is a combination of get and put, applying a function to the current state.

update : (x -> x) -> Eff () [STATE x]

Finally, our top level treeTag function now returns a pair of the number of leaves, and the new tree. Resources for labelled effects are initialised using the := operator (reminiscent of assignment in an imperative language):

treeTag : (i : Int) -> BTree a -> (Int, BTree (Int, a))
treeTag i x = runPureInit ['Tag := i, 'Leaves := 0]
                    (do x' <- treeTagAux x
                        leaves <- 'Leaves :- get
                        pure (leaves, x'))

To summarise, we have:

  • ::: to convert an effect to a labelled effect.
  • :- to convert an effectful operation to a labelled effectful operation.
  • := to initialise a resource for a labelled effect.

Or, more formally with their types (slightly simplified to account only for the situation where available effects are not updated):

(:::) : lbl -> EFFECT -> EFFECT
(:-)  : (l : lbl) -> Eff a [x] -> Eff a [l ::: x]
(:=)  : (l : lbl) -> res -> LRes l res

Here, LRes is simply the resource type associated with a labelled effect. Note that labels are polymorphic in the label type lbl. Hence, a label can be anything—a string, an integer, a type, etc.

!-notation

In many cases, using do-notation can make programs unnecessarily verbose, particularly in cases where the value bound is used once, immediately. The following program returns the length of the String stored in the state, for example:

stateLength : Eff Nat [STATE String]
stateLength = do x <- get
                 pure (length x)

This seems unnecessarily verbose, and it would be nice to program in a more direct style in these cases. provides !-notation to help with this. The above program can be written instead as:

stateLength : Eff Nat [STATE String]
stateLength = pure (length !get)

The notation !expr means that the expression expr should be evaluated and then implicitly bound. Conceptually, we can think of ! as being a prefix function with the following type:

(!) : Eff a xs -> a

Note, however, that it is not really a function, merely syntax! In practice, a subexpression !expr will lift expr as high as possible within its current scope, bind it to a fresh name x, and replace !expr with x. Expressions are lifted depth first, left to right. In practice, !-notation allows us to program in a more direct style, while still giving a notational clue as to which expressions are effectful.

For example, the expression:

let y = 42 in f !(g !(print y) !x)

is lifted to:

let y = 42 in do y' <- print y
                 x' <- x
                 g' <- g y' x'
                 f g'

The Type Eff

Underneath, Eff is an overloaded function which translates to an underlying type EffM:

EffM : (m : Type -> Type) -> (t : Type)
        -> (List EFFECT)
        -> (t -> List EFFECT) -> Type

This is more general than the types we have been writing so far. It is parameterised over an underlying computation context m, a result type t as we have already seen, as well as a List EFFECT and a function type t -> List EFFECT.

These additional parameters are the list of input effects, and a list of output effects, computed from the result of an effectful operation. That is: running an effectful program can change the set of effects available! This is a particularly powerful idea, and we will see its consequences in more detail later. Some examples of operations which can change the set of available effects are:

  • Updating a state containing a dependent type (for example adding an element to a vector).
  • Opening a file for reading is an effect, but whether the file really is open afterwards depends on whether the file was successfully opened.
  • Closing a file means that reading from the file should no longer be possible.

While powerful, this can make uses of the EffM type hard to read. Therefore the library provides an overloaded function Eff There are the following three versions:

SimpleEff.Eff : (t : Type) -> (input_effs : List EFFECT) -> Type
TransEff.Eff  : (t : Type) -> (input_effs : List EFFECT) ->
                              (output_effs : List EFFECT) -> Type
DepEff.Eff    : (t : Type) -> (input_effs : List EFFECT) ->
                              (output_effs_fn : t -> List EFFECT) -> Type

So far, we have used only the first version, SimpleEff.Eff, which is defined as follows:

Eff : (x : Type) -> (es : List EFFECT) -> Type
Eff x es = {m : Type -> Type} -> EffM m x es (\v => es)

i.e. the set of effects remains the same on output. This suffices for the STATE example we have seen so far, and for many useful side-effecting programs. We could also have written treeTagAux with the expanded type:

treeTagAux : BTree a ->
             EffM m (BTree (Int, a)) [STATE Int] (\x => [STATE Int])

Later, we will see programs which update effects:

Eff a xs xs'

which is expanded to

EffM m a xs (\_ => xs')

i.e. the set of effects is updated to xs’ (think of a transition in a state machine). There is, for example, a version of put which updates the type of the state:

putM : y -> Eff () [STATE x] [STATE y]

Also, we have:

Eff t xs (\res => xs')

which is expanded to

EffM m t xs (\res => xs')

i.e. the set of effects is updated according to the result of the operation res, of type t.

Parameterising EffM over an underlying computation context allows us to write effectful programs which are specific to one context, and in some cases to write programs which extend the list of effects available using the new function, though this is beyond the scope of this tutorial.

[1]The earlier paper [3] describes the essential implementation details, although the library presented there is an earlier version which is less powerful than that presented in this tutorial.
[2]In practice, ’name simply introduces a new empty type
[3](1, 2) Edwin Brady. 2013. Programming and reasoning with algebraic effects and dependent types. SIGPLAN Not. 48, 9 (September 2013), 133-144. DOI=10.1145/2544174.2500581 http://dl.acm.org/citation.cfm?doid=2544174.2500581

Simple Effects

So far we have seen how to write programs with locally mutable state using the STATE effect. To recap, we have the definitions below in a module Effect.State

module Effect.State

STATE : Type -> EFFECT

get    :             Eff x  [STATE x]
put    : x ->        Eff () [STATE x]
putM   : y ->        Eff () [STATE x] [STATE y]
update : (x -> x) -> Eff () [STATE x]

Handler State m where { ... }

The last line, Handler State m where { ... }, means that the STATE effect is usable in any computation context m. That is, a program which uses this effect and returns something of type a can be evaluated to something of type m a using run, for any m. The lower case State is a data type describing the operations which make up the STATE effect itself—we will go into more detail about this in Section Creating New Effects.

In this section, we will introduce some other supported effects, allowing console I/O, exceptions, random number generation and non-deterministic programming. For each effect we introduce, we will begin with a summary of the effect, its supported operations, and the contexts in which it may be used, like that above for STATE, and go on to present some simple examples. At the end, we will see some examples of programs which combine multiple effects.

All of the effects in the library, including those described in this section, are summarised in Appendix Effects Summary.

Console I/O

Console I/O is supported with the STDIO effect, which allows reading and writing characters and strings to and from standard input and standard output. Notice that there is a constraint here on the computation context m, because it only makes sense to support console I/O operations in a context where we can perform (or at the very least simulate) console I/O:

module Effect.StdIO

STDIO : EFFECT

putChar  : Char ->   Eff () [STDIO]
putStr   : String -> Eff () [STDIO]
putStrLn : String -> Eff () [STDIO]

getStr   :           Eff String [STDIO]
getChar  :           Eff Char [STDIO]

Handler StdIO IO where { ... }
Handler StdIO (IOExcept a) where { ... }
Examples

A program which reads the user’s name, then says hello, can be written as follows:

hello : Eff () [STDIO]
hello = do putStr "Name? "
           x <- getStr
           putStrLn ("Hello " ++ trim x ++ "!")

We use trim here to remove the trailing newline from the input. The resource associated with STDIO is simply the empty tuple, which has a default value (), so we can run this as follows:

main : IO ()
main = run hello

In hello we could also use !-notation instead of x <- getStr, since we only use the string that is read once:

hello : Eff () [STDIO]
hello = do putStr "Name? "
           putStrLn ("Hello " ++ trim !getStr ++ "!")

More interestingly, we can combine multiple effects in one program. For example, we can loop, counting the number of people we’ve said hello to:

hello : Eff () [STATE Int, STDIO]
hello = do putStr "Name? "
           putStrLn ("Hello " ++ trim !getStr ++ "!")
           update (+1)
           putStrLn ("I've said hello to: " ++ show !get ++ " people")
           hello

The list of effects given in hello means that the function can call get and put on an integer state, and any functions which read and write from the console. To run this, main does not need to be changed.

Aside: Resource Types

To find out the resource type of an effect, if necessary (for example if we want to initialise a resource explicitly with runInit rather than using a default value with run) we can run the resourceType function at the REPL:

*ConsoleIO> resourceType STDIO
() : Type
*ConsoleIO> resourceType (STATE Int)
Int : Type

Exceptions

The EXCEPTION effect is declared in module Effect.Exception. This allows programs to exit immediately with an error, or errors to be handled more generally:

module Effect.Exception

EXCEPTION : Type -> EFFECT

raise : a -> Eff b [EXCEPTION a]

Handler (Exception a) Maybe where { ... }
Handler (Exception a) List where { ... }
Handler (Exception a) (Either a) where { ... }
Handler (Exception a) (IOExcept a) where { ... }
Show a => Handler (Exception a) IO where { ... }
Example

Suppose we have a String which is expected to represent an integer in the range 0 to n. We can write a function parseNumber which returns an Int if parsing the string returns a number in the appropriate range, or throws an exception otherwise. Exceptions are parameterised by an error type:

data Error = NotANumber | OutOfRange

parseNumber : Int -> String -> Eff Int [EXCEPTION Error]
parseNumber num str
   = if all isDigit (unpack str)
        then let x = cast str in
             if (x >=0 && x <= num)
                then pure x
                else raise OutOfRange
        else raise NotANumber

Programs which support the EXCEPTION effect can be run in any context which has some way of throwing errors, for example, we can run parseNumber in the Either Error context. It returns a value of the form Right x if successful:

*Exception> the (Either Error Int) $ run (parseNumber 42 "20")
Right 20 : Either Error Int

Or Left e on failure, carrying the appropriate exception:

*Exception> the (Either Error Int) $ run (parseNumber 42 "50")
Left OutOfRange : Either Error Int

*Exception> the (Either Error Int) $ run (parseNumber 42 "twenty")
Left NotANumber : Either Error Int

In fact, we can do a little bit better with parseNumber, and have it return a proof that the integer is in the required range along with the integer itself. One way to do this is define a type of bounded integers, Bounded:

Bounded : Int -> Type
Bounded x = (n : Int ** So (n >= 0 && n <= x))

Recall that So is parameterised by a Bool, and only So True is inhabited. We can use choose to construct such a value from the result of a dynamic check:

data So : Bool -> Type = Oh : So True

choose : (b : Bool) -> Either (So b) (So (not b))

We then write parseNumber using choose rather than an if/then/else construct, passing the proof it returns on success as the boundedness proof:

parseNumber : (x : Int) -> String -> Eff (Bounded x) [EXCEPTION Error]
parseNumber x str
   = if all isDigit (unpack str)
        then let num = cast str in
             case choose (num >=0 && num <= x) of
                  Left p => pure (num ** p)
                  Right p => raise OutOfRange
        else raise NotANumber

Random Numbers

Random number generation is also implemented by the library, in module Effect.Random:

module Effect.Random

RND : EFFECT

srand  : Integer ->            Eff () [RND]
rndInt : Integer -> Integer -> Eff Integer [RND]
rndFin : (k : Nat) ->          Eff (Fin (S k)) [RND]

Handler Random m where { ... }

Random number generation is considered side-effecting because its implementation generally relies on some external source of randomness. The default implementation here relies on an integer seed, which can be set with srand. A specific seed will lead to a predictable, repeatable sequence of random numbers. There are two functions which produce a random number:

  • rndInt, which returns a random integer between the given lower
    and upper bounds.
  • rndFin, which returns a random element of a finite set
    (essentially a number with an upper bound given in its type).
Example

We can use the RND effect to implement a simple guessing game. The guess function, given a target number, will repeatedly ask the user for a guess, and state whether the guess is too high, too low, or correct:

guess : Int -> Eff () [STDIO]

For reference, the code for guess is given below:

guess : Int -> Eff () [STDIO]
guess target
    = do putStr "Guess: "
         case run {m=Maybe} (parseNumber 100 (trim !getStr)) of
              Nothing => do putStrLn "Invalid input"
                            guess target
              Just (v ** _) =>
                         case compare v target of
                             LT => do putStrLn "Too low"
                                      guess target
                             EQ => putStrLn "You win!"
                             GT => do putStrLn "Too high"
                                      guess target

Note that we use parseNumber as defined previously to read user input, but we don’t need to list the EXCEPTION effect because we use a nested run to invoke parseNumber, independently of the calling effectful program.

To invoke this, we pick a random number within the range 0–100, having set up the random number generator with a seed, then run guess:

game : Eff () [RND, STDIO]
game = do srand 123456789
          guess (fromInteger !(rndInt 0 100))

main : IO ()
main = run game

If no seed is given, it is set to the default value. For a less predictable game, some better source of randomness would be required, for example taking an initial seed from the system time. To see how to do this, see the SYSTEM effect described in Effects Summary.

Non-determinism

The listing below gives the definition of the non-determinism effect, which allows a program to choose a value non-deterministically from a list of possibilities in such a way that the entire computation succeeds:

import Effects
import Effect.Select

SELECT : EFFECT

select : List a -> Eff a [SELECT]

Handler Selection Maybe where { ... }
Handler Selection List where { ... }
Example

The SELECT effect can be used to solve constraint problems, such as finding Pythagorean triples. The idea is to use select to give a set of candidate values, then throw an exception for any combination of values which does not satisfy the constraint:

triple : Int -> Eff (Int, Int, Int) [SELECT, EXCEPTION String]
triple max = do z <- select [1..max]
                y <- select [1..z]
                x <- select [1..y]
                if (x * x + y * y == z * z)
                   then pure (x, y, z)
                   else raise "No triple"

This program chooses a value for z between 1 and max, then values for y and x. In operation, after a select, the program executes the rest of the do-block for every possible assignment, effectively searching depth-first. If the list is empty (or an exception is thrown) execution fails.

There are handlers defined for Maybe and List contexts, i.e. contexts which can capture failure. Depending on the context m, triple will either return the first triple it finds (if in Maybe context) or all triples in the range (if in List context). We can try this as follows:

main : IO ()
main = do print $ the (Maybe _) $ run (triple 100)
          print $ the (List _) $ run (triple 100)

vadd revisited

We now return to the vadd program from the introduction. Recall the definition:

vadd : Vect n Int -> Vect n Int -> Vect n Int
vadd []        []        = []
vadd (x :: xs) (y :: ys) = x + y :: vadd xs ys

Using , we can set up a program so that it reads input from a user, checks that the input is valid (i.e both vectors contain integers, and are the same length) and if so, pass it on to vadd. First, we write a wrapper for vadd which checks the lengths and throw an exception if they are not equal. We can do this for input vectors of length n and m by matching on the implicit arguments n and m and using decEq to produce a proof of their equality, if they are equal:

vadd_check : Vect n Int -> Vect m Int ->
             Eff (Vect m Int) [EXCEPTION String]
vadd_check {n} {m} xs ys with (decEq n m)
  vadd_check {n} {m=n} xs ys | (Yes Refl) = pure (vadd xs ys)
  vadd_check {n} {m}   xs ys | (No contra) = raise "Length mismatch"

To read a vector from the console, we implement a function of the following type:

read_vec : Eff (p ** Vect p Int) [STDIO]

This returns a dependent pair of a length, and a vector of that length, because we cannot know in advance how many integers the user is going to input. We can use -1 to indicate the end of input:

read_vec : Eff (p ** Vect p Int) [STDIO]
read_vec = do putStr "Number (-1 when done): "
              case run (parseNumber (trim !getStr)) of
                   Nothing => do putStrLn "Input error"
                                 read_vec
                   Just v => if (v /= -1)
                                then do (_ ** xs) <- read_vec
                                        pure (_ ** v :: xs)
                                else pure (_ ** [])
  where
    parseNumber : String -> Eff Int [EXCEPTION String]
    parseNumber str
      = if all (\x => isDigit x || x == '-') (unpack str)
           then pure (cast str)
           else raise "Not a number"

This uses a variation on parseNumber which does not require a number to be within range.

Finally, we write a program which reads two vectors and prints the result of pairwise addition of them, throwing an exception if the inputs are of differing lengths:

do_vadd : Eff () [STDIO, EXCEPTION String]
do_vadd = do putStrLn "Vector 1"
             (_ ** xs) <- read_vec
             putStrLn "Vector 2"
             (_ ** ys) <- read_vec
             putStrLn (show !(vadd_check xs ys))

By having explicit lengths in the type, we can be sure that vadd is only being used where the lengths of inputs are guaranteed to be equal. This does not stop us reading vectors from user input, but it does require that the lengths are checked and any discrepancy is dealt with gracefully.

Example: An Expression Calculator

To show how these effects can fit together, let us consider an evaluator for a simple expression language, with addition and integer values.

data Expr = Val Integer
          | Add Expr Expr

An evaluator for this language always returns an Integer, and there are no situations in which it can fail!

eval : Expr -> Integer
eval (Val x) = x
eval (Add l r) = eval l + eval r

If we add variables, however, things get more interesting. The evaluator will need to be able to access the values stored in variables, and variables may be undefined.

data Expr = Val Integer
          | Var String
          | Add Expr Expr

To start, we will change the type of eval so that it is effectful, and supports an exception effect for throwing errors, and a state containing a mapping from variable names (as String) to their values:

Env : Type
Env = List (String, Integer)

eval : Expr -> Eff Integer [EXCEPTION String, STATE Env]
eval (Val x)   = pure x
eval (Add l r) = pure $ !(eval l) + !(eval r)

Note that we are using !-notation to avoid having to bind subexpressions in a do block. Next, we add a case for evaluating Var:

eval (Var x) = case lookup x !get of
                    Nothing => raise $ "No such variable " ++ x
                    Just val => pure val

This retrieves the state (with get, supported by the STATE Env effect) and raises an exception if the variable is not in the environment (with raise, supported by the EXCEPTION String effect).

To run the evaluator on a particular expression in a particular environment of names and their values, we can write a function which sets the state then invokes eval:

runEval : List (String, Integer) -> Expr -> Maybe Integer
runEval args expr = run (eval' expr)
  where eval' : Expr -> Eff Integer [EXCEPTION String, STATE Env]
        eval' e = do put args
                     eval e

We have picked Maybe as a computation context here; it needs to be a context which is available for every effect supported by eval. In particular, because we have exceptions, it needs to be a context which supports exceptions. Alternatively, Either String or IO would be fine, for example.

What if we want to extend the evaluator further, with random number generation? To achieve this, we add a new constructor to Expr, which gives a random number up to a maximum value:

data Expr = Val Integer
          | Var String
          | Add Expr Expr
          | Random Integer

Then, we need to deal with the new case, making sure that we extend the list of events to include RND. It doesn’t matter where RND appears in the list, as long as it is present:

eval : Expr -> Eff Integer [EXCEPTION String, RND, STATE Env]

eval (Random upper) = rndInt 0 upper

For test purposes, we might also want to print the random number which has been generated:

eval (Random upper) = do val <- rndInt 0 upper
                         putStrLn (show val)
                         pure val

If we try this without extending the effects list, we would see an error something like the following:

Expr.idr:28:6:When elaborating right hand side of eval:
Can't solve goal
   SubList [STDIO]
           [(EXCEPTION String), RND, (STATE (List (String, Integer)))]

In other words, the STDIO effect is not available. We can correct this simply by updating the type of eval to include STDIO.

eval : Expr -> Eff Integer [STDIO, EXCEPTION String, RND, STATE Env]

注解

Using STDIO will restrict the number of contexts in which eval can be run to those which support STDIO, such as IO. Once effect lists get longer, it can be a good idea instead to encapsulate sets of effects in a type synonym. This is achieved as follows, simply by defining a function which computes a type, since types are first class in Idris:

EvalEff : Type -> Type
EvalEff t = Eff t [STDIO, EXCEPTION String, RND, STATE Env]

eval : Expr -> EvalEff Integer

Dependent Effects

In the programs we have seen so far, the available effects have remained constant. Sometimes, however, an operation can change the available effects. The simplest example occurs when we have a state with a dependent type—adding an element to a vector also changes its type, for example, since its length is explicit in the type. In this section, we will see how the library supports this. Firstly, we will see how states with dependent types can be implemented. Secondly, we will see how the effects can depend on the result of an effectful operation. Finally, we will see how this can be used to implement a type-safe and resource-safe protocol for file management.

Dependent States

Suppose we have a function which reads input from the console, converts it to an integer, and adds it to a list which is stored in a STATE. It might look something like the following:

readInt : Eff () [STATE (List Int), STDIO]
readInt = do let x = trim !getStr
             put (cast x :: !get)

But what if, instead of a list of integers, we would like to store a Vect, maintaining the length in the type?

readInt : Eff () [STATE (Vect n Int), STDIO]
readInt = do let x = trim !getStr
             put (cast x :: !get)

This will not type check! Although the vector has length n on entry to readInt, it has length S n on exit. The library allows us to express this as follows:

readInt : Eff ()[STATE (Vect n Int), STDIO]
                [STATE (Vect (S n) Int), STDIO]
readInt = do let x = trim !getStr
             putM (cast x :: !get)

The type Eff a xs xs' means that the operation begins with effects xs available, and ends with effects xs’ available. We have used putM to update the state, where the M suffix indicates that the type is being updated as well as the value. It has the following type:

putM : y -> Eff () [STATE x] [STATE y]

Result-dependent Effects

Often, whether a state is updated could depend on the success or otherwise of an operation. In our readInt example, we might wish to update the vector only if the input is a valid integer (i.e. all digits). As a first attempt, we could try the following, returning a Bool which indicates success:

readInt : Eff Bool [STATE (Vect n Int), STDIO]
                   [STATE (Vect (S n) Int), STDIO]
readInt = do let x = trim !getStr
             case all isDigit (unpack x) of
                  False => pure False
                  True => do putM (cast x :: !get)
                             pure True

Unfortunately, this will not type check because the vector does not get extended in both branches of the case!

MutState.idr:18:19:When elaborating right hand side of Main.case
block in readInt:
Unifying n and S n would lead to infinite value

Clearly, the size of the resulting vector depends on whether or not the value read from the user was valid. We can express this in the type:

readInt : Eff Bool [STATE (Vect n Int), STDIO]
            (\ok => if ok then [STATE (Vect (S n) Int), STDIO]
                          else [STATE (Vect n Int), STDIO])
readInt = do let x = trim !getStr
             case all isDigit (unpack x) of
                  False => pureM False
                  True => do putM (cast x :: !get)
                             pureM True

Using pureM rather than pure allows the output effects to be calculated from the value given. Its type is:

pureM : (val : a) -> EffM m a (xs val) xs

When using readInt, we will have to check its return value in order to know what the new set of effects is. For example, to read a set number of values into a vector, we could write the following:

readN : (n : Nat) ->
        Eff () [STATE (Vect m Int), STDIO]
               [STATE (Vect (n + m) Int), STDIO]
readN Z = pure ()
readN {m} (S k) = case !readInt of
                      True => rewrite plusSuccRightSucc k m in readN k
                      False => readN (S k)

The case analysis on the result of readInt means that we know in each branch whether reading the integer succeeded, and therefore how many values still need to be read into the vector. What this means in practice is that the type system has verified that a necessary dynamic check (i.e. whether reading a value succeeded) has indeed been done.

注解

Only case will work here. We cannot use if/then/else because the then and else branches must have the same type. The case construct, however, abstracts over the value being inspected in the type of each branch.

File Management

A practical use for dependent effects is in specifying resource usage protocols and verifying that they are executed correctly. For example, file management follows a resource usage protocol with the following (informally specified) requirements:

  • It is necessary to open a file for reading before reading it
  • Opening may fail, so the programmer should check whether opening was successful
  • A file which is open for reading must not be written to, and vice versa
  • When finished, an open file handle should be closed
  • When a file is closed, its handle should no longer be used

These requirements can be expressed formally in , by creating a FILE_IO effect parameterised over a file handle state, which is either empty, open for reading, or open for writing. The FILE_IO effect’s definition is given below. Note that this effect is mainly for illustrative purposes—typically we would also like to support random access files and better reporting of error conditions.

module Effect.File

import Effects
import Control.IOExcept

FILE_IO : Type -> EFFECT

data OpenFile : Mode -> Type

open : (fname : String)
       -> (m : Mode)
       -> Eff Bool [FILE_IO ()]
                   (\res => [FILE_IO (case res of
                                           True => OpenFile m
                                           False => ())])
close : Eff () [FILE_IO (OpenFile m)] [FILE_IO ()]

readLine  : Eff String [FILE_IO (OpenFile Read)]
writeLine : String -> Eff () [FILE_IO (OpenFile Write)]
eof       : Eff Bool [FILE_IO (OpenFile Read)]

Handler FileIO IO where { ... }

In particular, consider the type of open:

open : (fname : String)
       -> (m : Mode)
       -> Eff Bool [FILE_IO ()]
                   (\res => [FILE_IO (case res of
                                           True => OpenFile m
                                           False => ())])

This returns a Bool which indicates whether opening the file was successful. The resulting state depends on whether the operation was successful; if so, we have a file handle open for the stated purpose, and if not, we have no file handle. By case analysis on the result, we continue the protocol accordingly.

readFile : Eff (List String) [FILE_IO (OpenFile Read)]
readFile = readAcc [] where
    readAcc : List String -> Eff (List String) [FILE_IO (OpenFile Read)]
    readAcc acc = if (not !eof)
                     then readAcc (!readLine :: acc)
                     else pure (reverse acc)

Given a function readFile, above, which reads from an open file until reaching the end, we can write a program which opens a file, reads it, then displays the contents and closes it, as follows, correctly following the protocol:

dumpFile : String -> Eff () [FILE_IO (), STDIO]
dumpFile name = case !(open name Read) of
                    True => do putStrLn (show !readFile)
                               close
                    False => putStrLn ("Error!")

The type of dumpFile, with FILE_IO () in its effect list, indicates that any use of the file resource will follow the protocol correctly (i.e. it both begins and ends with an empty resource). If we fail to follow the protocol correctly (perhaps by forgetting to close the file, failing to check that open succeeded, or opening the file for writing) then we will get a compile-time error. For example, changing open name Read to open name Write yields a compile-time error of the following form:

FileTest.idr:16:18:When elaborating right hand side of Main.case
block in testFile:
Can't solve goal
        SubList [(FILE_IO (OpenFile Read))]
                [(FILE_IO (OpenFile Write)), STDIO]

In other words: when reading a file, we need a file which is open for reading, but the effect list contains a FILE_IO effect carrying a file open for writing.

Pattern-matching bind

It might seem that having to test each potentially failing operation with a case clause could lead to ugly code, with lots of nested case blocks. Many languages support exceptions to improve this, but unfortunately exceptions may not allow completely clean resource management—for example, guaranteeing that any open which did succeed has a corresponding close.

Idris supports pattern-matching bindings, such as the following:

dumpFile : String -> Eff () [FILE_IO (), STDIO]
dumpFile name = do True <- open name Read
                   putStrLn (show !readFile)
                   close

This also has a problem: we are no longer dealing with the case where opening a file failed! The solution is to extend the pattern-matching binding syntax to give brief clauses for failing matches. Here, for example, we could write:

dumpFile : String -> Eff () [FILE_IO (), STDIO]
dumpFile name  = do True <- open name Read | False => putStrLn "Error"
                    putStrLn (show !readFile)
                    close

This is exactly equivalent to the definition with the explicit case. In general, in a do-block, the syntax:

do pat <- val | <alternatives>
   p

is desugared to

do x <- val
   case x of
        pat => p
        <alternatives>

There can be several alternatives, separated by a vertical bar |. For example, there is a SYSTEM effect which supports reading command line arguments, among other things (see Appendix Effects Summary). To read command line arguments, we can use getArgs:

getArgs : Eff (List String) [SYSTEM]

A main program can read command line arguments as follows, where in the list which is returned, the first element prog is the executable name and the second is an expected argument:

emain : Eff () [SYSTEM, STDIO]
emain = do [prog, arg] <- getArgs
           putStrLn $ "Argument is " ++ arg
           {- ... rest of function ... -}

Unfortunately, this will not fail gracefully if no argument is given, or if too many arguments are given. We can use pattern matching bind alternatives to give a better (more informative) error:

emain : Eff () [SYSTEM, STDIO]
emain = do [prog, arg] <- getArgs | [] => putStrLn "Can't happen!"
                                  | [prog] => putStrLn "No arguments!"
                                  | _ => putStrLn "Too many arguments!"
           putStrLn $ "Argument is " ++ arg
           {- ... rest of function ... -}

If getArgs does not return something of the form [prog, arg] the alternative which does match is executed instead, and that value returned.

Creating New Effects

We have now seen several side-effecting operations provided by the Effects library, and examples of their use in Section Simple Effects. We have also seen how operations may modify the available effects by changing state in Section Dependent Effects. We have not, however, yet seen how these operations are implemented. In this section, we describe how a selection of the available effects are implemented, and show how new effectful operations may be provided.

State

Effects are described by algebraic data types, where the constructors describe the operations provided when the effect is available. Stateful operations are described as follows:

data State : Effect where
     Get :      State a  a (\x => a)
     Put : b -> State () a (\x => b)

Effect itself is a type synonym, giving the required type for an effect signature:

Effect : Type
Effect = (result : Type) ->
         (input_resource : Type) ->
         (output_resource : result -> Type) -> Type

Each effect is associated with a resource. The second argument to an effect signature is the resource type on input to an operation, and the third is a function which computes the resource type on output. Here, it means:

  • Get takes no arguments. It has a resource of type a, which is not updated, and running the Get operation returns something of type a.
  • Put takes a b as an argument. It has a resource of type a on input, which is updated to a resource of type b. Running the Put operation returns the element of the unit type.

The effects library provides an overloaded function sig which can make effect signatures more concise, particularly when the result has no effect on the resource type. For State, we can write:

data State : Effect where
     Get :      sig State a  a
     Put : b -> sig State () a b

There are four versions of sig, depending on whether we are interested in the resource type, and whether we are updating the resource. Idris will infer the appropriate version from usage.

NoResourceEffect.sig : Effect -> Type -> Type
NoUpdateEffect.sig   : Effect -> (ret : Type) ->
                                 (resource : Type) -> Type
UpdateEffect.sig     : Effect -> (ret : Type) ->
                                 (resource_in : Type) ->
                                 (resource_out : Type) -> Type
DepUpdateEffect.sig  : Effect -> (ret : Type) ->
                                 (resource_in : Type) ->
                                 (resource_out : ret -> Type) -> Type

In order to convert State (of type Effect) into something usable in an effects list, of type EFFECT, we write the following:

STATE : Type -> EFFECT
STATE t = MkEff t State

MkEff constructs an EFFECT by taking the resource type (here, the t which parameterises STATE) and the effect signature (here, State). For reference, EFFECT is declared as follows:

data EFFECT : Type where
     MkEff : Type -> Effect -> EFFECT

Recall that to run an effectful program in Eff, we use one of the run family of functions to run the program in a particular computation context m. For each effect, therefore, we must explain how it is executed in a particular computation context for run to work in that context. This is achieved with the following interface:

interface Handler (e : Effect) (m : Type -> Type) where
      handle : resource -> (eff : e t resource resource') ->
               ((x : t) -> resource' x -> m a) -> m a

We have already seen some implementation declarations in the effect summaries in Section Simple Effects. An implementation of Handler e m means that the effect declared with signature e can be run in computation context m. The handle function takes:

  • The resource on input (so, the current value of the state for State)
  • The effectful operation (either Get or Put x for State)
  • A continuation, which we conventionally call k, and should be passed the result value of the operation, and an updated resource.

There are two reasons for taking a continuation here: firstly, this is neater because there are multiple return values (a new resource and the result of the operation); secondly, and more importantly, the continuation can be called zero or more times.

A Handler for State simply passes on the value of the state, in the case of Get, or passes on a new state, in the case of Put. It is defined the same way for all computation contexts:

Handler State m where
     handle st Get     k = k st st
     handle st (Put n) k = k () n

This gives enough information for Get and Put to be used directly in Eff programs. It is tidy, however, to define top level functions in Eff, as follows:

get : Eff x [STATE x]
get = call Get

put : x -> Eff () [STATE x]
put val = call (Put val)

putM : y -> Eff () [STATE x] [STATE y]
putM val = call (Put val)

An implementation detail (aside): The call function converts an Effect to a function in Eff, given a proof that the effect is available. This proof can be constructed automatically, since it is essentially an index into a statically known list of effects:

call : {e : Effect} ->
       (eff : e t a b) -> {auto prf : EffElem e a xs} ->
       Eff t xs (\v => updateResTy v xs prf eff)

This is the reason for the Can’t solve goal error when an effect is not available: the implicit proof prf has not been solved automatically because the required effect is not in the list of effects xs.

Such details are not important for using the library, or even writing new effects, however.

Summary

The following listing summarises what is required to define the STATE effect:

data State : Effect where
     Get :      sig State a  a
     Put : b -> sig State () a b

STATE : Type -> EFFECT
STATE t = MkEff t State

Handler State m where
     handle st Get     k = k st st
     handle st (Put n) k = k () n

get : Eff x [STATE x]
get = call Get

put : x -> Eff () [STATE x]
put val = call (Put val)

putM : y -> Eff () [STATE x] [STATE y]
putM val = call (Put val)

Console I/O

Then listing below gives the definition of the STDIO effect, including handlers for IO and IOExcept. We omit the definition of the top level Eff functions, as this merely invoke the effects PutStr, GetStr, PutCh and GetCh directly.

Note that in this case, the resource is the unit type in every case, since the handlers merely apply the IO equivalents of the effects directly.

data StdIO : Effect where
     PutStr : String -> sig StdIO ()
     GetStr : sig StdIO String
     PutCh : Char -> sig StdIO ()
     GetCh : sig StdIO Char

Handler StdIO IO where
    handle () (PutStr s) k = do putStr s; k () ()
    handle () GetStr     k = do x <- getLine; k x ()
    handle () (PutCh c)  k = do putChar c; k () ()
    handle () GetCh      k = do x <- getChar; k x ()

Handler StdIO (IOExcept a) where
    handle () (PutStr s) k = do ioe_lift $ putStr s; k () ()
    handle () GetStr     k = do x <- ioe_lift $ getLine; k x ()
    handle () (PutCh c)  k = do ioe_lift $ putChar c; k () ()
    handle () GetCh      k = do x <- ioe_lift $ getChar; k x ()

STDIO : EFFECT
STDIO = MkEff () StdIO

Exceptions

The listing below gives the definition of the Exception effect, including two of its handlers for Maybe and List. The only operation provided is Raise. The key point to note in the definitions of these handlers is that the continuation k is not used. Running Raise therefore means that computation stops with an error.

data Exception : Type -> Effect where
     Raise : a -> sig (Exception a) b

Handler (Exception a) Maybe where
     handle _ (Raise e) k = Nothing

Handler (Exception a) List where
     handle _ (Raise e) k = []

EXCEPTION : Type -> EFFECT
EXCEPTION t = MkEff () (Exception t)

Non-determinism

The following listing gives the definition of the Select effect for writing non-deterministic programs, including a handler for List context which returns all possible successful values, and a handler for Maybe context which returns the first successful value.

data Selection : Effect where
     Select : List a -> sig Selection a

Handler Selection Maybe where
     handle _ (Select xs) k = tryAll xs where
         tryAll [] = Nothing
         tryAll (x :: xs) = case k x () of
                                 Nothing => tryAll xs
                                 Just v => Just v

Handler Selection List where
     handle r (Select xs) k = concatMap (\x => k x r) xs

SELECT : EFFECT
SELECT = MkEff () Selection

Here, the continuation is called multiple times in each handler, for each value in the list of possible values. In the List handler, we accumulate all successful results, and in the Maybe handler we try the first value in the list, and try later values only if that fails.

File Management

Result-dependent effects are no different from non-dependent effects in the way they are implemented. The listing below illustrates this for the FILE_IO effect. The syntax for state transitions { x ==> {res} x’ }, where the result state x’ is computed from the result of the operation res, follows that for the equivalent Eff programs.

data FileIO : Effect where
     Open : (fname: String)
            -> (m : Mode)
            -> sig FileIO Bool () (\res => case res of
                                                True => OpenFile m
                                                False => ())
     Close : sig FileIO () (OpenFile m)

     ReadLine  :           sig FileIO String (OpenFile Read)
     WriteLine : String -> sig FileIO ()     (OpenFile Write)
     EOF       :           sig FileIO Bool   (OpenFile Read)

Handler FileIO IO where
    handle () (Open fname m) k = do h <- openFile fname m
                                    if !(validFile h)
                                             then k True (FH h)
                                             else k False ()
    handle (FH h) Close      k = do closeFile h
                                    k () ()

    handle (FH h) ReadLine        k = do str <- fread h
                                         k str (FH h)
    handle (FH h) (WriteLine str) k = do fwrite h str
                                         k () (FH h)
    handle (FH h) EOF             k = do e <- feof h
                                         k e (FH h)

FILE_IO : Type -> EFFECT
FILE_IO t = MkEff t FileIO

Note that in the handler for Open, the types passed to the continuation k are different depending on whether the result is True (opening succeeded) or False (opening failed). This uses validFile, defined in the Prelude, to test whether a file handler refers to an open file or not.

Example: A “Mystery Word” Guessing Game

In this section, we will use the techniques and specific effects discussed in the tutorial so far to implement a larger example, a simple text-based word-guessing game. In the game, the computer chooses a word, which the player must guess letter by letter. After a limited number of wrong guesses, the player loses [1].

We will implement the game by following these steps:

  1. Define the game state, in enough detail to express the rules
  2. Define the rules of the game (i.e. what actions the player may take, and how these actions affect the game state)
  3. Implement the rules of the game (i.e. implement state updates for each action)
  4. Implement a user interface which allows a player to direct actions

Step 2 may be achieved by defining an effect which depends on the state defined in step 1. Then step 3 involves implementing a Handler for this effect. Finally, step 4 involves implementing a program in Eff using the newly defined effect (and any others required to implement the interface).

Step 1: Game State

First, we categorise the game states as running games (where there are a number of guesses available, and a number of letters still to guess), or non-running games (i.e. games which have not been started, or games which have been won or lost).

data GState = Running Nat Nat | NotRunning

Notice that at this stage, we say nothing about what it means to make a guess, what the word to be guessed is, how to guess letters, or any other implementation detail. We are only interested in what is necessary to describe the game rules.

We will, however, parameterise a concrete game state Mystery over this data:

data Mystery : GState -> Type

Step 2: Game Rules

We describe the game rules as a dependent effect, where each action has a precondition (i.e. what the game state must be before carrying out the action) and a postcondition (i.e. how the action affects the game state). Informally, these actions with the pre- and postconditions are:

Guess

Guess a letter in the word.

  • Precondition: The game must be running, and there must be both guesses still available, and letters still to be guessed.
  • Postcondition: If the guessed letter is in the word and not yet guessed, reduce the number of letters, otherwise reduce the number of guesses.
Won

Declare victory

  • Precondition: The game must be running, and there must be no letters still to be guessed.
  • Postcondition: The game is no longer running.
Lost

Accept defeat

  • Precondition: The game must be running, and there must be no guesses left.
  • Postcondition: The game is no longer running.
NewWord

Set a new word to be guessed

  • Precondition: The game must not be running.
  • Postcondition: The game is running, with 6 guesses available (the choice of 6 is somewhat arbitrary here) and the number of unique letters in the word still to be guessed.
Get
Get a string representation of the game state. This is for display purposes; there are no pre- or postconditions.

We can make these rules precise by declaring them more formally in an effect signature:

data MysteryRules : Effect where
     Guess : (x : Char) ->
             sig MysteryRules Bool
                 (Mystery (Running (S g) (S w)))
                 (\inword => if inword
                             then Mystery (Running (S g) w)
                             else Mystery (Running g (S w)))
     Won  : sig MysteryRules () (Mystery (Running g 0))
                                (Mystery NotRunning)
     Lost : sig MysteryRules () (Mystery (Running 0 g))
                                (Mystery NotRunning)
     NewWord : (w : String) ->
               sig MysteryRules () (Mystery NotRunning) (Mystery (Running 6 (length (letters w))))
     Get : sig MysteryRules String (Mystery h)

This description says nothing about how the rules are implemented. In particular, it does not specify how to tell whether a guessed letter was in a word, just that the result of Guess depends on it.

Nevertheless, we can still create an EFFECT from this, and use it in an Eff program. Implementing a Handler for MysteryRules will then allow us to play the game.

MYSTERY : GState -> EFFECT
MYSTERY h = MkEff (Mystery h) MysteryRules

Step 3: Implement Rules

To implement the rules, we begin by giving a concrete definition of game state:

data Mystery : GState -> Type where
     Init     : Mystery NotRunning
     GameWon  : (word : String) -> Mystery NotRunning
     GameLost : (word : String) -> Mystery NotRunning
     MkG      : (word : String) ->
                (guesses : Nat) ->
                (got : List Char) ->
                (missing : Vect m Char) ->
                Mystery (Running guesses m)

If a game is NotRunning, that is either because it has not yet started (Init) or because it is won or lost (GameWon and GameLost, each of which carry the word so that showing the game state will reveal the word to the player). Finally, MkG captures a running game’s state, including the target word, the letters successfully guessed, and the missing letters. Using a Vect for the missing letters is convenient since its length is used in the type.

To initialise the state, we implement the following functions: letters, which returns a list of unique letters in a String (ignoring spaces) and initState which sets up an initial state considered valid as a postcondition for NewWord.

letters : String -> List Char
initState : (x : String) -> Mystery (Running 6 (length (letters x)))

When checking if a guess is in the vector of missing letters, it is convenient to return a proof that the guess is in the vector, using isElem below, rather than merely a Bool:

data IsElem : a -> Vect n a -> Type where
     First : IsElem x (x :: xs)
     Later : IsElem x xs -> IsElem x (y :: xs)

isElem : DecEq a => (x : a) -> (xs : Vect n a) -> Maybe (IsElem x xs)

The reason for returning a proof is that we can use it to remove an element from the correct position in a vector:

shrink : (xs : Vect (S n) a) -> IsElem x xs -> Vect n a

We leave the definitions of letters, init, isElem and shrink as exercises. Having implemented these, the Handler implementation for MysteryRules is surprisingly straightforward:

Handler MysteryRules m where
    handle (MkG w g got []) Won k = k () (GameWon w)
    handle (MkG w Z got m) Lost k = k () (GameLost w)

    handle st Get k = k (show st) st
    handle st (NewWord w) k = k () (initState w)

    handle (MkG w (S g) got m) (Guess x) k =
        case isElem x m of
             Nothing => k False (MkG w _ got m)
             (Just p) => k True (MkG w _ (x :: got) (shrink m p))

Each case simply involves directly updating the game state in a way which is consistent with the declared rules. In particular, in Guess, if the handler claims that the guessed letter is in the word (by passing True to k), there is no way to update the state in such a way that the number of missing letters or number of guesses does not follow the rules.

Step 4: Implement Interface

Having described the rules, and implemented state transitions which follow those rules as an effect handler, we can now write an interface for the game which uses the MYSTERY effect:

game : Eff () [MYSTERY (Running (S g) w), STDIO]
              [MYSTERY NotRunning, STDIO]

The type indicates that the game must start in a running state, with some guesses available, and eventually reach a not-running state (i.e. won or lost). The only way to achieve this is by correctly following the stated rules.

Note that the type of game makes no assumption that there are letters to be guessed in the given word (i.e. it is w rather than S w). This is because we will be choosing a word at random from a vector of String, and at no point have we made it explicit that those String are non-empty.

Finally, we need to initialise the game by picking a word at random from a list of candidates, setting it as the target using NewWord, then running game:

runGame : Eff () [MYSTERY NotRunning, RND, SYSTEM, STDIO]
runGame = do srand !time
             let w = index !(rndFin _) words
             call $ NewWord w
             game
             putStrLn !(call Get)

We use the system time (time from the SYSTEM effect; see Appendix Effects Summary) to initialise the random number generator, then pick a random Fin to index into a list of words. For example, we could initialise a word list as follows:

words : ?wtype
words = with Vect ["idris","agda","haskell","miranda",
         "java","javascript","fortran","basic",
         "coffeescript","rust"]

wtype = proof search

注解

Rather than have to explicitly declare a type with the vector’s length, it is convenient to give a hole ?wtype and let Idris’s proof search mechanism find the type. This is a limited form of type inference, but very useful in practice.

A possible complete implementation of game is presented below:

game : Eff () [MYSTERY (Running (S g) w), STDIO]
              [MYSTERY NotRunning, STDIO]
game {w=Z} = Won
game {w=S _}
     = do putStrLn !Get
          putStr "Enter guess: "
          let guess = trim !getStr
          case choose (not (guess == "")) of
               (Left p) => processGuess (strHead' guess p)
               (Right p) => do putStrLn "Invalid input!"
                               game
  where
    processGuess : Char -> Eff () [MYSTERY (Running (S g) (S w)), STDIO]
                                  [MYSTERY NotRunning, STDIO]
    processGuess {g} {w} c
      = case !(Main.Guess c) of
             True => do putStrLn "Good guess!"
                        case w of
                             Z => Won
                             (S k) => game
             False => do putStrLn "No, sorry"
                         case g of
                              Z => Lost
                              (S k) => game

Discussion

Writing the rules separately as an effect, then an implementation which uses that effect, ensures that the implementation must follow the rules. This has practical applications in more serious contexts; MysteryRules for example can be though of as describing a protocol that a game player most follow, or alternative a precisely-typed API.

In practice, we wouldn’t really expect to write rules first then implement the game once the rules were complete. Indeed, I didn’t do so when constructing this example! Rather, I wrote down a set of likely rules making any assumptions explicit in the state transitions for MysteryRules. Then, when implementing game at first, any incorrect assumption was caught as a type error. The following errors were caught during development:

  • Not realising that allowing NewWord to be an arbitrary string would mean that game would have to deal with a zero-length word as a starting state.
  • Forgetting to check whether a game was won before recursively calling processGuess, thus accidentally continuing a finished game.
  • Accidentally checking the number of missing letters, rather than the number of remaining guesses, when checking if a game was lost.

These are, of course, simple errors, but were caught by the type checker before any testing of the game.

[1]Readers may recognise this game by the name “Hangman”.

Further Reading

This tutorial has given an introduction to writing and reasoning about side-effecting programs in Idris, using the Effects library. More details about the implementation of the library, such as how run works, how handlers are invoked, etc, are given in a separate paper [1].

Some libraries and programs which use Effects can be found in the following places:

The inspiration for the Effects library was Bauer and Pretnar’s Eff language [2], which describes a language based on algebraic effects and handlers. Other recent languages and libraries have also been built on this ideas, for example [3] and [4]. The theoretical foundations are also well-studied see [5], [6], [7], [8].

[1]Edwin Brady. 2013. Programming and reasoning with algebraic effects and dependent types. SIGPLAN Not. 48, 9 (September 2013), 133-144. DOI=10.1145/2544174.2500581 https://dl.acm.org/citation.cfm?doid=2544174.2500581
[2]Andrej Bauer, Matija Pretnar, Programming with algebraic effects and handlers, Journal of Logical and Algebraic Methods in Programming, Volume 84, Issue 1, January 2015, Pages 108-123, ISSN 2352-2208, http://math.andrej.com/wp-content/uploads/2012/03/eff.pdf
[3]Ben Lippmeier. 2009. Witnessing Purity, Constancy and Mutability. In Proceedings of the 7th Asian Symposium on Programming Languages and Systems (APLAS ‘09), Zhenjiang Hu (Ed.). Springer-Verlag, Berlin, Heidelberg, 95-110. DOI=10.1007/978-3-642-10672-9_9 http://link.springer.com/chapter/10.1007%2F978-3-642-10672-9_9
[4]Ohad Kammar, Sam Lindley, and Nicolas Oury. 2013. Handlers in action. SIGPLAN Not. 48, 9 (September 2013), 145-158. DOI=10.1145/2544174.2500590 https://dl.acm.org/citation.cfm?doid=2544174.2500590
[5]Martin Hyland, Gordon Plotkin, John Power, Combining effects: Sum and tensor, Theoretical Computer Science, Volume 357, Issues 1–3, 25 July 2006, Pages 70-99, ISSN 0304-3975, (https://www.sciencedirect.com/science/article/pii/S0304397506002659)
[6]Paul Blain Levy. 2004. Call-By-Push-Value: A Functional/Imperative Synthesis (Semantics Structures in Computation, V. 2). Kluwer Academic Publishers, Norwell, MA, USA.
[7]Plotkin, Gordon, and Matija Pretnar. “Handlers of algebraic effects.” Programming Languages and Systems. Springer Berlin Heidelberg, 2009. 80-94.
[8]Pretnar, Matija. “Logic and handling of algebraic effects.” (2010).

Effects Summary

This appendix gives interfaces for the core effects provided by the library.

EXCEPTION

module Effect.Exception

import Effects
import System
import Control.IOExcept

EXCEPTION : Type -> EFFECT

raise : a -> Eff b [EXCEPTION a]

Handler (Exception a) Maybe where { ... }
Handler (Exception a) List where { ... }
Handler (Exception a) (Either a) where { ... }
Handler (Exception a) (IOExcept a) where { ... }
Show a => Handler (Exception a) IO where { ... }

FILE_IO

module Effect.File

import Effects
import Control.IOExcept

FILE_IO : Type -> EFFECT

data OpenFile : Mode -> Type

open : (fname : String)
       -> (m : Mode)
       -> Eff Bool [FILE_IO ()]
                   (\res => [FILE_IO (case res of
                                           True => OpenFile m
                                           False => ())])
close : Eff () [FILE_IO (OpenFile m)] [FILE_IO ()]

readLine  : Eff String [FILE_IO (OpenFile Read)]
writeLine : String -> Eff () [FILE_IO (OpenFile Write)]
eof       : Eff Bool [FILE_IO (OpenFile Read)]

Handler FileIO IO where { ... }

RND

module Effect.Random

import Effects
import Data.Vect
import Data.Fin

RND : EFFECT

srand  : Integer ->            Eff m () [RND]
rndInt : Integer -> Integer -> Eff m Integer [RND]
rndFin : (k : Nat) ->          Eff m (Fin (S k)) [RND]

Handler Random m where { ... }

SELECT

module Effect.Select

import Effects

SELECT : EFFECT

select : List a -> Eff m a [SELECT]

Handler Selection Maybe where { ... }
Handler Selection List where { ... }

STATE

module Effect.State

import Effects

STATE : Type -> EFFECT

get    :             Eff m x [STATE x]
put    : x ->        Eff m () [STATE x]
putM   : y ->        Eff m () [STATE x] [STATE y]
update : (x -> x) -> Eff m () [STATE x]

Handler State m where { ... }

STDIO

module Effect.StdIO

import Effects
import Control.IOExcept

STDIO : EFFECT

putChar  : Handler StdIO m => Char ->   Eff m () [STDIO]
putStr   : Handler StdIO m => String -> Eff m () [STDIO]
putStrLn : Handler StdIO m => String -> Eff m () [STDIO]

getStr   : Handler StdIO m =>           Eff m String [STDIO]
getChar  : Handler StdIO m =>           Eff m Char [STDIO]

Handler StdIO IO where { ... }
Handler StdIO (IOExcept a) where { ... }

SYSTEM

module Effect.System

import Effects
import System
import Control.IOExcept

SYSTEM : EFFECT

getArgs : Handler System e =>           Eff e (List String) [SYSTEM]
time    : Handler System e =>           Eff e Int [SYSTEM]
getEnv  : Handler System e => String -> Eff e (Maybe String) [SYSTEM]

Handler System IO where { ... }
Handler System (IOExcept a) where { ... }

Theorem Proving

A tutorial on theorem proving in Idris.

注解

The documentation for Idris has been published under the Creative Commons CC0 License. As such to the extent possible under law, The Idris Community has waived all copyright and related or neighboring rights to Documentation for Idris.

More information concerning the CC0 can be found online at: http://creativecommons.org/publicdomain/zero/1.0/

Running example: Addition of Natural Numbers

Throughout this tutorial, we will be working with the following function, defined in the Idris prelude, which defines addition on natural numbers:

plus : Nat -> Nat -> Nat
plus Z     m = m
plus (S k) m = S (plus k m)

It is defined by the above equations, meaning that we have for free the properties that adding m to zero always results in m, and that adding m to any non-zero number S k always results in S (plus k m). We can see this by evaluation at the Idris REPL (i.e. the prompt, the read-eval-print loop):

Idris> \m => plus Z m
\m => m : Nat -> Nat

Idris> \k,m => plus (S k) m
\k => \m => S (plus k m) : Nat -> Nat -> Nat

Note that unlike many other language REPLs, the Idris REPL performs evaluation on open terms, meaning that it can reduce terms which appear inside lambda bindings, like those above. Therefore, we can introduce unknowns k and m as lambda bindings and see how plus reduces.

The plus function has a number of other useful properties, for example:

  • It is commutative, that is for all Nat inputs n and m, we know that plus n m = plus m n.
  • It is associative, that is for all Nat inputs n, m and p, we know that plus n (plus m p) = plus (plus m n) p.

We can use these properties in an Idris program, but in order to do so we must prove them.

Equality Proofs

Idris has a built-in propositional equality type, conceptually defined as follows:

data (=) : a -> b -> Type where
   Refl : x = x

Note that this must be built-in, rather than defined in the library, because = is a reserved operator — you cannot define this directly in your own code.

It is propositional equality, where the type states that any two values in different types a and b may be proposed to be equal. There is only one way to prove equality, however, which is by reflexivity (Refl).

We have a type for propositional equality here, and correspondingly a program inhabiting an instance of this type can be seen as a proof of the corresponding proposition [1]. So, trivially, we can prove that 4 equals 4:

four_eq : 4 = 4
four_eq = Refl

However, trying to prove that 4 = 5 results in failure:

four_eq_five : 4 = 5
four_eq_five = Refl

The type 4 = 5 is a perfectly valid type, but is uninhabited, so when trying to type check this definition, Idris gives the following error:

When elaborating right hand side of four_eq_five:
Type mismatch between
        x = x (Type of Refl)
and
        4 = 5 (Expected type)
Type checking equality proofs

An important step in type checking Idris programs is unification, which attempts to resolve implicit arguments such as the implicit argument x in Refl. As far as our understanding of type checking proofs is concerned, it suffices to know that unifying two terms involves reducing both to normal form then trying to find an assignment to implicit arguments which will make those normal forms equal.

When type checking Refl, Idris requires that the type is of the form x = x, as we see from the type of Refl. In the case of four_eq_five, Idris will try to unify the expected type 4 = 5 with the type of Refl, x = x, notice that a solution requires that x be both 4 and 5, and therefore fail.

Since type checking involves reduction to normal form, we can write the following equalities directly:

twoplustwo_eq_four : 2 + 2 = 4
twoplustwo_eq_four = Refl

plus_reduces_Z : (m : Nat) -> plus Z m = m
plus_reduces_Z m = Refl

plus_reduces_Sk : (k, m : Nat) -> plus (S k) m = S (plus k m)
plus_reduces_Sk k m = Refl

Heterogeneous Equality

Equality in Idris is heterogeneous, meaning that we can even propose equalities between values in different types:

idris_not_php : 2 = "2"

Obviously, in Idris the type 2 = "2" is uninhabited, and one might wonder why it is useful to be able to propose equalities between values in different types. However, with dependent types, such equalities can arise naturally. For example, if two vectors are equal, their lengths must be equal:

vect_eq_length : (xs : Vect n a) -> (ys : Vect m a) ->
                 (xs = ys) -> n = m

In the above declaration, xs and ys have different types because their lengths are different, but we would still like to draw a conclusion about the lengths if they happen to be equal. We can define vect_eq_length as follows:

vect_eq_length xs xs Refl = Refl

By matching on Refl for the third argument, we know that the only valid value for ys is xs, because they must be equal, and therefore their types must be equal, so the lengths must be equal.

Alternatively, we can put an underscore for the second xs, since there is only one value which will type check:

vect_eq_length xs _ Refl = Refl

Properties of plus

Using the (=) type, we can now state the properties of plus given above as Idris type declarations:

plus_commutes : (n, m : Nat) -> plus n m = plus m n
plus_assoc : (n, m, p : Nat) -> plus n (plus m p) = plus (plus n m) p

Both of these properties (and many others) are proved for natural number addition in the Idris standard library, using (+) from the Num interface rather than using plus directly. They have the names plusCommutative and plusAssociative respectively.

In the remainder of this tutorial, we will explore several different ways of proving plus_commutes (or, to put it another way, writing the function.) We will also discuss how to use such equality proofs, and see where the need for them arises in practice.

[1]This is known as the Curry-Howard correspondence.

Inductive Proofs

Before embarking on proving plus_commutes in Idris itself, let us consider the overall structure of a proof of some property of natural numbers. Recall that they are defined recursively, as follows:

data Nat : Type where
     Z : Nat
     S : Nat -> Nat

A total function over natural numbers must both terminate, and cover all possible inputs. Idris checks functions for totality by checking that all inputs are covered, and that all recursive calls are on structurally smaller values (so recursion will always reach a base case). Recalling plus:

plus : Nat -> Nat -> Nat
plus Z     m = m
plus (S k) m = S (plus k m)

This is total because it covers all possible inputs (the first argument can only be Z or S k for some k, and the second argument m covers all possible Nat) and in the recursive call, k is structurally smaller than S k so the first argument will always reach the base case Z in any sequence of recursive calls.

In some sense, this resembles a mathematical proof by induction (and this is no coincidence!). For some property P of a natural number x, we can show that P holds for all x if:

  • P holds for zero (the base case).
  • Assuming that P holds for k, we can show P also holds for S k (the inductive step).

In plus, the property we are trying to show is somewhat trivial (for all natural numbers x, there is a Nat which need not have any relation to x). However, it still takes the form of a base case and an inductive step. In the base case, we show that there is a Nat arising from plus n m when n = Z, and in the inductive step we show that there is a Nat arising when n = S k and we know we can get a Nat inductively from plus k m. We could even write a function capturing all such inductive definitions:

nat_induction : (P : Nat -> Type) ->             -- Property to show
                (P Z) ->                         -- Base case
                ((k : Nat) -> P k -> P (S k)) -> -- Inductive step
                (x : Nat) ->                     -- Show for all x
                P x
nat_induction P p_Z p_S Z = p_Z
nat_induction P p_Z p_S (S k) = p_S k (nat_induction P p_Z p_S k)

Using nat_induction, we can implement an equivalent inductive version of plus:

plus_ind : Nat -> Nat -> Nat
plus_ind n m
   = nat_induction (\x => Nat)
                   m                      -- Base case, plus_ind Z m
                   (\k, k_rec => S k_rec) -- Inductive step plus_ind (S k) m
                                          -- where k_rec = plus_ind k m
                   n

To prove that plus n m = plus m n for all natural numbers n and m, we can also use induction. Either we can fix m and perform induction on n, or vice versa. We can sketch an outline of a proof; performing induction on n, we have:

  • Property P is \x => plus x m = plus m x.

  • Show that P holds in the base case and inductive step:

    • Base case: P Z, i.e.
      plus Z m = plus m Z, which reduces to
      m = plus m Z due to the definition of plus.
    • Inductive step: Inductively, we know that P k holds for a specific, fixed k, i.e.
      plus k m = plus m k (the induction hypothesis). Given this, show P (S k), i.e.
      plus (S k) m = plus m (S k), which reduces to
      S (plus k m) = plus m (S k). From the induction hypothesis, we can rewrite this to
      S (plus m k) = plus m (S k).

To complete the proof we therefore need to show that m = plus m Z for all natural numbers m, and that S (plus m k) = plus m (S k) for all natural numbers m and k. Each of these can also be proved by induction, this time on m.

We are now ready to embark on a proof of commutativity of plus formally in Idris.

Pattern Matching Proofs

In this section, we will provide a proof of plus_commutes directly, by writing a pattern matching definition. We will use interactive editing features extensively, since it is significantly easier to produce a proof when the machine can give the types of intermediate values and construct components of the proof itself. The commands we will use are summarised below. Where we refer to commands directly, we will use the Vim version, but these commands have a direct mapping to Emacs commands.

Command Vim binding Emacs binding Explanation
Check type \t C-c C-t Show type of identifier or hole under the cursor.
Proof search \o C-c C-a Attempt to solve hole under the cursor by applying simple proof search.
Make new definition \d C-c C-s Add a template definition for the type defined under the cursor.
Make lemma \l C-c C-e Add a top level function with a type which solves the hole under the cursor.
Split cases \c C-c C-c Create new constructor patterns for each possible case of the variable under the cursor.

Creating a Definition

To begin, create a file pluscomm.idr containing the following type declaration:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n

To create a template definition for the proof, press \d (or the equivalent in your editor of choice) on the line with the type declaration. You should see:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes n m = ?plus_commutes_rhs

To prove this by induction on n, as we sketched in Section Inductive Proofs, we begin with a case split on n (press \c with the cursor over the n in the definition.) You should see:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = ?plus_commutes_rhs_1
plus_commutes (S k) m = ?plus_commutes_rhs_2

If we inspect the types of the newly created holes, plus_commutes_rhs_1 and plus_commutes_rhs_2, we see that the type of each reflects that n has been refined to Z and S k in each respective case. Pressing \t over plus_commutes_rhs_1 shows:

  m : Nat
--------------------------------------
plus_commutes_rhs_1 : m = plus m 0

Note that Z renders as 0 because the pretty printer renders natural numbers as integer literals for readability. Similarly, for plus_commutes_rhs_2:

  k : Nat
  m : Nat
--------------------------------------
plus_commutes_rhs_2 : S (plus k m) = plus m (S k)

It is a good idea to give these slightly more meaningful names:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = ?plus_commutes_Z
plus_commutes (S k) m = ?plus_commutes_S

Base Case

We can create a separate lemma for the base case interactively, by pressing \l with the cursor over plus_commutes_Z. This yields:

plus_commutes_Z : m = plus m 0

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z
plus_commutes (S k) m = ?plus_commutes_S

That is, the hole has been filled with a call to a top level function plus_commutes_Z. The argument m has been made implicit because it can be inferred from context when it is applied.

Unfortunately, we cannot prove this lemma directly, since plus is defined by matching on its first argument, and here plus m 0 has a specific value for its second argument (in fact, the left hand side of the equality has been reduced from plus 0 m.) Again, we can prove this by induction, this time on m.

First, create a template definition with \d:

plus_commutes_Z : m = plus m 0
plus_commutes_Z = ?plus_commutes_Z_rhs

Since we are going to write this by induction on m, which is implicit, we will need to bring m into scope manually:

plus_commutes_Z : m = plus m 0
plus_commutes_Z {m} = ?plus_commutes_Z_rhs

Now, case split on m with \c:

plus_commutes_Z : m = plus m 0
plus_commutes_Z {m = Z} = ?plus_commutes_Z_rhs_1
plus_commutes_Z {m = (S k)} = ?plus_commutes_Z_rhs_2

Checking the type of plus_commutes_Z_rhs_1 shows the following, which is easily proved by reflection:

--------------------------------------
plus_commutes_Z_rhs_1 : 0 = 0

For such trivial proofs, we can let write the proof automatically by pressing \o with the cursor over plus_commutes_Z_rhs_1. This yields:

plus_commutes_Z : m = plus m 0
plus_commutes_Z {m = Z} = Refl
plus_commutes_Z {m = (S k)} = ?plus_commutes_Z_rhs_2

For plus_commutes_Z_rhs_2, we are not so lucky:

  k : Nat
--------------------------------------
plus_commutes_Z_rhs_2 : S k = S (plus k 0)

Inductively, we should know that k = plus k 0, and we can get access to this inductive hypothesis by making a recursive call on k, as follows:

plus_commutes_Z : m = plus m 0
plus_commutes_Z {m = Z} = Refl
plus_commutes_Z {m = (S k)} = let rec = plus_commutes_Z {m=k} in
                                  ?plus_commutes_Z_rhs_2

For plus_commutes_Z_rhs_2, we now see:

  k : Nat
  rec : k = plus k (fromInteger 0)
--------------------------------------
plus_commutes_Z_rhs_2 : S k = S (plus k 0)

Again, the fromInteger 0 is merely due to Nat having an implementation of the Num interface. So we know that k = plus k 0, but how do we use this to update the goal to S k = S k?

To achieve this, Idris provides a replace function as part of the prelude:

*pluscomm> :t replace
replace : (x = y) -> P x -> P y

Given a proof that x = y, and a property P which holds for x, we can get a proof of the same property for y, because we know x and y must be the same. In practice, this function can be a little tricky to use because in general the implicit argument P can be hard to infer by unification, so Idris provides a high level syntax which calculates the property and applies replace:

rewrite prf in expr

If we have prf : x = y, and the required type for expr is some property of x, the rewrite ... in syntax will search for x in the required type of expr and replace it with y. Concretely, in our example, we can say:

plus_commutes_Z {m = (S k)} = let rec = plus_commutes_Z {m=k} in
                                  rewrite rec in ?plus_commutes_Z_rhs_2

Checking the type of plus_commutes_Z_rhs_2 now gives:

  k : Nat
  rec : k = plus k (fromInteger 0)
  _rewrite_rule : plus k 0 = k
--------------------------------------
plus_commutes_Z_rhs_2 : S (plus k 0) = S (plus k 0)

Using the rewrite rule rec (which we can see in the context here as _rewrite_rule [1], the goal type has been updated with k replaced by plus k 0.

Alternatively, we could have applied the rewrite in the other direction using the sym function:

*pluscomm> :t sym
sym : (l = r) -> r = l
plus_commutes_Z {m = (S k)} = let rec = plus_commutes_Z {m=k} in
                                  rewrite sym rec in ?plus_commutes_Z_rhs_2

In this case, inspecting the type of the hole gives:

  k : Nat
  rec : k = plus k (fromInteger 0)
  _rewrite_rule : k = plus k 0
--------------------------------------
plus_commutes_Z_rhs_2 : S k = S k

Either way, we can use proof search (\o) to complete the proof, giving:

plus_commutes_Z : m = plus m 0
plus_commutes_Z {m = Z} = Refl
plus_commutes_Z {m = (S k)} = let rec = plus_commutes_Z {m=k} in
                                  rewrite rec in Refl

The base case is now complete.

Inductive Step

Our main theorem, plus_commutes should currently be in the following state:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z
plus_commutes (S k) m = ?plus_commutes_S

Looking again at the type of plus_commutes_S, we have:

  k : Nat
  m : Nat
--------------------------------------
plus_commutes_S : S (plus k m) = plus m (S k)

Conveniently, by induction we can immediately tell that plus k m = plus m k, so let us rewrite directly by making a recursive call to plus_commutes. We add this directly, by hand, as follows:

plus_commutes : (n : Nat) -> (m : Nat) -> n + m = m + n
plus_commutes Z m = plus_commutes_Z
plus_commutes (S k) m = rewrite plus_commutes k m in ?plus_commutes_S

Checking the type of plus_commutes_S now gives:

  k : Nat
  m : Nat
  _rewrite_rule : plus m k = plus k m
--------------------------------------
plus_commutes_S : S (plus m k) = plus m (S k)

The good news is that m and k now appear in the correct order. However, we still have to show that the successor symbol S can be moved to the front in the right hand side of this equality. This remaining lemma takes a similar form to the plus_commutes_Z; we begin by making a new top level lemma with \l. This gives:

plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)

Unlike the previous case, k and m are not made implicit because we cannot in general infer arguments to a function from its result. Again, we make a template definition with \d:

plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k m = ?plus_commutes_S_rhs

Again, this is defined by induction over m, since plus is defined by matching on its first argument. The complete definition is:

total
plus_commutes_S : (k : Nat) -> (m : Nat) -> S (plus m k) = plus m (S k)
plus_commutes_S k Z = Refl
plus_commutes_S k (S j) = rewrite plus_commutes_S k j in Refl

All holes have now been solved.

The total annotation means that we require the final function to pass the totality checker; i.e. it will terminate on all possible well-typed inputs. This is important for proofs, since it provides a guarantee that the proof is valid in all cases, not just those for which it happens to be well-defined.

Now that plus_commutes has a total annotation, we have completed the proof of commutativity of addition on natural numbers.

[1]Note that the left and right hand sides of the equality have been swapped, because replace takes a proof of x=y and the property for x, not y.

DEPRECATED: Interactive Theorem Proving

警告

The interactive theorem-proving interface documented here has been deprecated in favor of Elaborator Reflection.

Idris also supports interactive theorem proving via tactics. This is generally not recommended to be used directly, but rather used as a mechanism for building proof automation which is beyond the scope of this tutorial. In this section, we briefly discus tactics.

One way to write proofs interactively is to write the general structure of the proof, and use the interactive mode to complete the details. Consider the following definition, proved in 定理证明:

plusReduces : (n:Nat) -> plus Z n = n

We’ll be constructing the proof by induction, so we write the cases for Z and S, with a recursive call in the S case giving the inductive hypothesis, and insert holes for the rest of the definition:

plusReducesZ' : (n:Nat) -> n = plus n Z
plusReducesZ' Z     = ?plusredZ_Z
plusReducesZ' (S k) = let ih = plusReducesZ' k in
                      ?plusredZ_S

On running , two global names are created, plusredZ_Z and plusredZ_S, with no definition. We can use the :m command at the prompt to find out which holes are still to be solved (or, more precisely, which functions exist but have no definitions), then the :t command to see their types:

*theorems> :m
Global holes:
        [plusredZ_S,plusredZ_Z]
*theorems> :t plusredZ_Z
plusredZ_Z : Z = plus Z Z

*theorems> :t plusredZ_S
plusredZ_S : (k : Nat) -> (k = plus k Z) -> S k = plus (S k) Z

The :p command enters interactive proof mode, which can be used to complete the missing definitions.

*theorems> :p plusredZ_Z

---------------------------------- (plusredZ_Z) --------
{hole0} : Z = plus Z Z

This gives us a list of premises (above the line; there are none here) and the current goal (below the line; named {hole0} here). At the prompt we can enter tactics to direct the construction of the proof. In this case, we can normalise the goal with the compute tactic:

-plusredZ_Z> compute

---------------------------------- (plusredZ_Z) --------
{hole0} : Z = Z

Now we have to prove that Z equals Z, which is easy to prove by Refl. To apply a function, such as Refl, we use refine which introduces subgoals for each of the function’s explicit arguments (Refl has none):

-plusredZ_Z> refine Refl
plusredZ_Z: no more goals

Here, we could also have used the trivial tactic, which tries to refine by Refl, and if that fails, tries to refine by each name in the local context. When a proof is complete, we use the qed tactic to add the proof to the global context, and remove the hole from the unsolved holes list. This also outputs a trace of the proof:

-plusredZ_Z> qed
plusredZ_Z = proof
    compute
    refine Refl
*theorems> :m
Global holes:
        [plusredZ_S]

The :addproof command, at the interactive prompt, will add the proof to the source file (effectively in an appendix). Let us now prove the other required lemma, plusredZ_S:

*theorems> :p plusredZ_S

---------------------------------- (plusredZ_S) --------
{hole0} : (k : Nat) -> (k = plus k Z) -> S k = plus (S k) Z

In this case, the goal is a function type, using k (the argument accessible by pattern matching) and ih — the local variable containing the result of the recursive call. We can introduce these as premises using the intro tactic twice (or intros, which introduces all arguments as premises). This gives:

  k : Nat
  ih : k = plus k Z
---------------------------------- (plusredZ_S) --------
{hole2} : S k = plus (S k) Z

Since plus is defined by recursion on its first argument, the term plus (S k) Z in the goal can be simplified, so we use compute.

  k : Nat
  ih : k = plus k Z
---------------------------------- (plusredZ_S) --------
{hole2} : S k = S (plus k Z)

We know, from the type of ih, that k = plus k Z, so we would like to use this knowledge to replace plus k Z in the goal with k. We can achieve this with the rewrite tactic:

-plusredZ_S> rewrite ih

  k : Nat
  ih : k = plus k Z
---------------------------------- (plusredZ_S) --------
{hole3} : S k = S k

-plusredZ_S>

The rewrite tactic takes an equality proof as an argument, and tries to rewrite the goal using that proof. Here, it results in an equality which is trivially provable:

-plusredZ_S> trivial
plusredZ_S: no more goals
-plusredZ_S> qed
plusredZ_S = proof {
    intros;
    rewrite ih;
    trivial;
}

Again, we can add this proof to the end of our source file using the :addproof command at the interactive prompt.

Language Reference

This is the reference guide for the Idris Language. It documents the language specification and internals. This will tell you how Idris works, for using it you should read the Idris Tutorial.

注解

The documentation for Idris has been published under the Creative Commons CC0 License. As such to the extent possible under law, The Idris Community has waived all copyright and related or neighboring rights to Documentation for Idris.

More information concerning the CC0 can be found online at: http://creativecommons.org/publicdomain/zero/1.0/

Code Generation Targets

Idris has been designed such that the compiler can generate code for different backends upon request. By default Idris generates a C backend when generating an executable. Included within the standard Idris installation are backends for Javascript and Node.js.

However, there are third-party code generators out there. Below we describe some of these backends and how you can use them when compiling your Idris code. If you want to write your own codegen for your language there is a stub project on GitHub that can help point you in the right direction.

Official Backends

C Language
Javascript

To generate code that is tailored for running in the browser issue the following command:

$ idris --codegen javascript hello.idr -o hello.js

Idris can produce very big chunks of JavaScript code (hello world weighs in at 1500 lines). However, the generated code can be minified using the closure-compiler from Google.

java -jar compiler.jar --compilation_level ADVANCED_OPTIMIZATIONS --js hello.js
Node.js

Generating code for NodeJS is slightly different. Idris outputs a JavaScript file that can be directly executed via node.

$ idris --codegen node hello.idr -o hello
$ ./hello
Hello world

Third Party

注解

These are third-party code generations and may have bit-rotted or do not work with current versions of Idris. Please speak to the project’s maintainors if there are any problems.

CIL (.NET, Mono, Unity)
idris --codegen cil Main.idr -o HelloWorld.exe \
  && mono HelloWorld.exe

The resulting assemblies can also be used with .NET or Unity.

Requires idris-cil.

Java

Available online

idris hello.idr --codegen java -o hello.jar

Note: The resulting .jar is automatically prefixed by a header including an .sh script to allow executing it directly.

Malfunction

Available online

WebAssembly

Available online

Documenting Idris Code

Idris documentation comes in two major forms: comments, which exist for a reader’s edification and are ignored by the compiler, and inline API documentation, which the compiler parses and stores for future reference. To consult the documentation for a declaration f, write :doc f at the REPL or use the appropriate command in your editor (C-c C-d in Emacs, <LocalLeader>h in Vim).

Comments

Use comments to explain why code is written the way that it is. Idris’s comment syntax is the same as that of Haskell: lines beginning with -- are comments, and regions bracketed by {- and -} are comments even if they extend across multiple lines. These can be used to comment out lines of code or provide simple documentation for the readers of Idris code.

Inline Documentation

Idris also supports a comprehensive and rich inline syntax for Idris code to be generated. This syntax also allows for named parameters and variables within type signatures to be individually annotated using a syntax similar to Javadoc parameter annotations.

Documentation always comes before the declaration being documented. Inline documentation applies to either top-level declarations or to constructors. Documentation for specific arguments to constructors, type constructors, or functions can be associated with these arguments using their names.

The inline documentation for a declaration is an unbroken string of lines, each of which begins with ||| (three pipe symbols). The first paragraph of the documentation is taken to be an overview, and in some contexts, only this overview will be shown. After the documentation for the declaration as a whole, it is possible to associate documentation with specific named parameters, which can either be explicitly name or the results of converting free variables to implicit parameters. Annotations are the same as with Javadoc annotations, that is for the named parameter (n : T), the corresponding annotation is ||| @ n Some description that is placed before the declaration.

Documentation is written in Markdown, though not all contexts will display all possible formatting (for example, images are not displayed when viewing documentation in the REPL, and only some terminals render italics correctly). A comprehensive set of examples is given below.

||| Modules can also be documented.
module Docs

||| Add some numbers.
|||
||| Addition is really great. This paragraph is not part of the overview.
||| Still the same paragraph.
|||
||| You can even provide examples which are inlined in the documentation:
||| ```idris example
||| add 4 5
||| ```
|||
||| Lists are also nifty:
||| * Really nifty!
||| * Yep!
||| * The name `add` is a **bold** choice
||| @ n is the recursive param
||| @ m is not
add : (n, m : Nat) -> Nat
add Z     m = m
add (S n) m = S (add n m)


||| Append some vectors
||| @ a the contents of the vectors
||| @ xs the first vector (recursive param)
||| @ ys the second vector (not analysed)
appendV : (xs : Vect n a) -> (ys : Vect m a) -> Vect (add n m) a
appendV []      ys = ys
appendV (x::xs) ys = x :: appendV xs ys

||| Here's a simple datatype
data Ty =
  ||| Unit
  UNIT |
  ||| Functions
  ARR Ty Ty

||| Points to a place in a typing context
data Elem : Vect n Ty -> Ty -> Type where
  Here : {ts : Vect n Ty} -> Elem (t::ts) t
  There : {ts : Vect n Ty} -> Elem ts t -> Elem (t'::ts) t

||| A more interesting datatype
||| @ n the number of free variables
||| @ ctxt a typing context for the free variables
||| @ ty the type of the term
data Term : (ctxt : Vect n Ty) -> (ty : Ty) -> Type where

  ||| The constructor of the unit type
  ||| More comment
  ||| @ ctxt the typing context
  UnitCon : {ctxt : Vect n Ty} -> Term ctxt UNIT

  ||| Function application
  ||| @ f the function to apply
  ||| @ x the argument
  App : {ctxt : Vect n Ty} -> (f : Term ctxt (ARR t1 t2)) -> (x : Term ctxt t1) -> Term ctxt t2

  ||| Lambda
  ||| @ body the function body
  Lam : {ctxt : Vect n Ty} -> (body : Term (t1::ctxt) t2) -> Term ctxt (ARR t1 t2)

  ||| Variables
  ||| @ i de Bruijn index
  Var : {ctxt : Vect n Ty} -> (i : Elem ctxt t) -> Term ctxt t

||| A computation that may someday finish
codata Partial : Type -> Type where

  ||| A finished computation
  ||| @ value the result
  Now : (value : a) -> Partial a

  ||| A not-yet-finished computation
  ||| @ rest the remaining work
  Later : (rest : Partial a) -> Partial a

||| We can document records, including their fields and constructors
record Yummy where
  ||| Make a yummy
  constructor MkYummy
  ||| What to eat
  food : String

Packages

Idris includes a simple system for building packages from a package description file. These files can be used with the Idris compiler to manage the development process of your Idris programmes and packages.

Package Descriptions

A package description includes the following:

  • A header, consisting of the keyword package followed by the package name. Package names can be any valid Idris identifier. The iPKG format also takes a quoted version that accepts any valid filename.
  • Fields describing package contents, <field> = <value>

At least one field must be the modules field, where the value is a comma separated list of modules. For example, a library test which has two modules foo.idr and bar.idr as source files would be written as follows:

package foo

modules = foo, bar

Other examples of package files can be found in the libs directory of the main Idris repository, and in third-party libraries.

Metadata

From Idris v0.12 the iPKG format supports additional metadata associated with the package. The added fields are:

  • brief = "<text>", a string literal containing a brief description of the package.
  • version = <text>, a version string to associate with the package.
  • readme = <file>, location of the README file.
  • license = <text>, a string description of the licensing information.
  • author = <text>, the author information.
  • maintainer = <text>, Maintainer information.
  • homepage = <url>, the website associated with the package.
  • sourceloc = <url>, the location of the DVCS where the source can be found.
  • bugtracker = <url>, the location of the project’s bug tracker.
Common Fields

Other common fields which may be present in an ipkg file are:

  • sourcedir = <dir>, which takes the directory (relative to the current directory) which contains the source. Default is the current directory.
  • executable = <output>, which takes the name of the executable file to generate. Executable names can be any valid Idris identifier. the iPKG format also takes a quoted version that accepts any valid filename.
  • main = <module>, which takes the name of the main module, and must be present if the executable field is present.
  • opts = "<idris options>", which allows options to be passed to Idris.
  • pkgs = <pkg name> (',' <pkg name>)+, a comma separated list of package names that the Idris package requires.
Binding to C

In more advanced cases, particularly to support creating bindings to external C libraries, the following options are available:

  • makefile = <file>, which specifies a Makefile, to be built before the Idris modules, for example to support linking with a C library. When building, Idris sets the environment variables IDRIS_INCLUDES (with C include flags) and IDRIS_LDFLAGS (with C linking flags) so they can be used from inside the Makefile.
  • libs = <libs>, which takes a comma separated list of libraries which must be present for the package to be usable.
  • objs = <objs>, which takes a comma separated list of additional files to be installed (object files, headers), perhaps generated by the Makefile.
Testing

For testing Idris packages there is a rudimentary testing harness, run in the IO context. The iPKG file is used to specify the functions used for testing. The following option is available:

  • tests = <test functions>, which takes the qualified names of all test functions to be run.

重要

The modules containing the test functions must also be added to the list of modules.

Comments

Package files support comments using the standard Idris singleline -- and multiline {- -} format.

Using Package files

Given an Idris package file test.ipkg it can be used with the Idris compiler as follows:

  • idris --build test.ipkg will build all modules in the package
  • idris --install test.ipkg will install the package, making it accessible by other Idris libraries and programs.
  • idris --clean test.ipkg will delete all intermediate code and executable files generated when building.
  • idris --mkdoc test.ipkg will build HTML documentation for your package in the folder test_doc in your project’s root directory.
  • idris --installdoc test.ipkg will install the packages documentation into Idris’ central documentation folder located at idris --docdir.
  • idris --checkpkg test.ipkg will type check all modules in the package only. This differs from build that type checks and generates code.
  • idris --testpkg test.ipkg will compile and run any embedded tests you have specified in the tests parameter.

When building or install packages the commandline flag --warnipkg will audit the project and warn of any potentiable problems.

Once the test package has been installed, the command line option --package test makes it accessible (abbreviated to -p test). For example:

idris -p test Main.idr

Uniqueness Types

Uniqueness Types are an experimental feature available from Idris 0.9.15. A value with a unique type is guaranteed to have at most one reference to it at run-time, which means that it can safely be updated in-place, reducing the need for memory allocation and garbage collection. The motivation is that we would like to be able to write reactive systems, programs which run in limited memory environments, device drivers, and any other system with hard real-time requirements, ideally while giving up as little high level conveniences as possible.

They are inspired by linear types, Uniqueness Types in the Clean programming language, and ownership types and borrowed pointers in the Rust programming language.

Some things we hope to be able to do eventually with uniqueness types include:

  • Safe, pure, in-place update of arrays, lists, etc
  • Provide guarantees of correct resource usage, state transitions, etc
  • Provide guarantees that critical program fragments will never allocate

Using Uniqueness

If x : T and T : UniqueType, then there is at most one reference to x at any time during run-time execution. For example, we can declare the type of unique lists as follows:

data UList : Type -> UniqueType where
     Nil   : UList a
     (::)  : a -> UList a -> UList a

If we have a value xs : UList a, then there is at most one reference to xs at run-time. The type checker preserves this guarantee by ensuring that there is at most one reference to any value of a unique type in a pattern clause. For example, the following function definition would be valid:

umap : (a -> b) -> UList a -> UList b
umap f [] = []
umap f (x :: xs) = f x :: umap f xs

In the second clause, xs is a value of a unique type, and only appears once on the right hand side, so this clause is valid. Not only that, since we know there can be no other reference to the UList a argument, we can reuse its space for building the result! The compiler is aware of this, and compiles this definition to an in-place update of the list.

The following function definition would not be valid (even assuming an implementation of ++), however, since xs appears twice:

dupList : UList a -> UList a
dupList xs = xs ++ xs

This would result in a shared pointer to xs, so the typechecker reports:

unique.idr:12:5:Unique name xs is used more than once

If we explicitly copy, however, the typechecker is happy:

dup : UList a -> UList a
dup [] = []
dup (x :: xs) = x :: x :: dup xs

Note that it’s fine to use x twice, because a is a Type, rather than a UniqueType.

There are some other restrictions on where a UniqueType can appear, so that the uniqueness property is preserved. In particular, the type of the function type, (x : a) -> b depends on the type of a or b - if either is a UniqueType, then the function type is also a UniqueType. Then, in a data declaration, if the type constructor builds a Type, then no constructor can have a UniqueType. For example, the following definition is invalid, since it would embed a unique value in a possible non-unique value:

data BadList : UniqueType -> Type where
     Nil   : {a : UniqueType} -> BadList a
     (::)  : {a : UniqueType} -> a -> BadList a -> BadList a

Finally, types may be polymorphic in their uniqueness, to a limited extent. Since Type and UniqueType are different types, we are limited in how much we can use polymorphic functions on unique types. For example, if we have function composition defined as follows:

(.) : {a, b, c : Type} -> (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)

And we have some functions over unique types:

foo : UList a -> UList b
bar : UList b -> UList c

Then we cannot compose foo and bar as bar . foo, because UList does not compute a Type! Instead, we can define composition as follows:

(.) : {a, b, c : Type*} -> (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)

The Type* type stands for either unique or non-unique types. Since such a function may be passed a UniqueType, any value of type Type* must also satisfy the requirement that it appears at most once on the right hand side.

Borrowed Types

It quickly becomes obvious when working with uniqueness types that having only one reference at a time can be painful. For example, what if we want to display a list before updating it?

showU : Show a => UList a -> String
showU xs = "[" ++ showU' xs ++ "]" where
  showU' : UList a -> String
  showU' [] = ""
  showU' [x] = show x
  showU' (x :: xs) = show x ++ ", " ++ showU' xs

This is a valid definition of showU, but unfortunately it consumes the list! So the following function would be invalid:

printAndUpdate : UList Int -> IO ()
printAndUpdate xs = do putStrLn (showU xs)
                       let xs' = umap (*2) xs -- xs no longer available!
                       putStrLn (showU xs')

Still, one would hope to be able to display a unique list without problem, since it merely inspects the list; there are no updates. We can achieve this, using the notion of borrowing. A Borrowed type is a Unique type which can be inspected at the top level (by pattern matching, or by lending to another function) but no further. This ensures that the internals (i.e. the arguments to top level patterns) will not be passed to any function which will update them.

Borrowed converts a UniqueType to a BorrowedType. It is defined as follows (along with some additional rules in the typechecker):

data Borrowed : UniqueType -> BorrowedType where
     Read : {a : UniqueType} -> a -> Borrowed a

implicit
lend : {a : UniqueType} -> a -> Borrowed a
lend x = Read x

A value can be “lent” to another function using lend. Arguments to lend are not counted by the type checker as a reference to a unique value, therefore a value can be lent as many times as desired. Using this, we can write showU as follows:

showU : Show a => Borrowed (UList a) -> String
showU xs = "[" ++ showU' xs ++ "]" where
  showU' : Borrowed (UList a) -> String
  showU' [] = ""
  showU' [x] = show x
  showU' (Read (x :: xs)) = show x ++ ", " ++ showU' (lend xs)

Unlike a unique value, a borrowed value may be referred to as many times as desired. However, there is a restriction on how a borrowed value can be used. After all, much like a library book or your neighbour’s lawnmower, if a function borrows a value it is expected to return it in exactly the condition in which it was received!

The restriction is that when a Borrowed type is matched, any pattern variables under the Read which have a unique type may not be referred to at all on the right hand side (unless they are themselves lent to another function).

Uniqueness information is stored in the type, and in particular in function types. Once we’re in a unique context, any new function which is constructed will be required to have unique type, which prevents the following sort of bad program being implemented:

foo : UList Int -> IO ()
foo xs = do let f = \x : Int => showU xs
            putStrLn $ free xs
            putStrLn $ f 42
            pure ()

Since lend is implicit, in practice for functions to lend and borrow values merely requires the argument to be marked as Borrowed. We can therefore write showU as follows:

showU : Show a => Borrowed (UList a) -> String
showU xs = "[" ++ showU' xs ++ "]" where
  showU' : Borrowed (UList a) -> String
  showU' [] = ""
  showU' [x] = show x
  showU' (x :: xs) = show x ++ ", " ++ showU' xs
Problems/Disadvantages/Still to do…

This is a work in progress, there is lots to do. The most obvious problem is the loss of abstraction. On the one hand, we have more precise control over memory usage with UniqueType and BorrowedType, but they are not in general compatible with functions polymorphic over Type. In the short term, we can start to write reactive and low memory systems with this, but longer term it would be nice to support more abstraction.

We also haven’t checked any of the metatheory, so this could all be fatally flawed! The implementation is based to a large extent on Uniqueness Typing Simplified, by de Vries et al, so there is reason to believe things should be fine, but we still have to do the work.

Much as there are with linear types, there are some annoyances when trying to prove properties of functions with unique types (for example, what counts as a use of a value). Since we require at most one use of a value, rather than exactly one, this seems to be less of an issue in practice, but still needs thought.

New Foreign Function Interface

Ever since Idris has had multiple backends compiling to different target languages on potentially different platforms, we have had the problem that the foreign function interface (FFI) was written under the assumption of compiling to C. As a result, it has been hard to write generic code for multiple targets, or even to be sure that if code compiles that it will run on the expected target.

As of 0.9.17, Idris will have a new foreign function interface (FFI) which is aware of multiple targets. Users who are working with the default code generator can happily continue writing programs as before with no changes, but if you are writing bindings for an external library, writing a back end, or working with a non-C back end, there are some things you will need to be aware of, which this page describes.

The IO' monad, and main

The IO monad exists as before, but is now specific to the C backend (or, more precisely, any backend whose foreign function calls are compatible with C.) Additionally, there is now an IO' monad, which is parameterised over a FFI descriptor:

data IO' : (lang : FFI) -> Type -> Type

The Prelude defines two FFI descriptors which are imported automatically, for C and JavaScript/Node, and defines IO to use the C FFI and JS_IO to use the JavaScript FFI:

FFI_C  : FFI
FFI_JS : FFI

IO : Type -> Type
IO a = IO' FFI_C a

JS_IO : Type -> Type
JS_IO a = IO' FFI_JS a

As before, the entry point to an Idris program is main, but the type of main can now be any implementation of IO', e.g. the following are both valid:

main : IO ()
main : JS_IO ()

The FFI descriptor includes details about which types can be marshalled between the foreign language and Idris, and the “target” of a foreign function call (typically just a String representation of the function’s name, but potentially something more complicated such as an external library file or even a URL).

FFI descriptors

An FFI descriptor is a record containing a predicate which holds when a type can be marshalled, and the type of the target of a foreign call:

record FFI where
     constructor MkFFI
     ffi_types : Type -> Type
     ffi_fn : Type

For C, this is:

||| Supported C integer types
public export
data C_IntTypes : Type -> Type where
    C_IntChar   : C_IntTypes Char
    C_IntNative : C_IntTypes Int
    C_IntBits8  : C_IntTypes Bits8
    C_IntBits16 : C_IntTypes Bits16
    C_IntBits32 : C_IntTypes Bits32
    C_IntBits64 : C_IntTypes Bits64

||| Supported C function types
public export
data C_FnTypes : Type -> Type where
    C_Fn : C_Types s -> C_FnTypes t -> C_FnTypes (s -> t)
    C_FnIO : C_Types t -> C_FnTypes (IO' FFI_C t)
    C_FnBase : C_Types t -> C_FnTypes t

||| Supported C foreign types
public export
data C_Types : Type -> Type where
    C_Str   : C_Types String
    C_Float : C_Types Double
    C_Ptr   : C_Types Ptr
    C_MPtr  : C_Types ManagedPtr
    C_Unit  : C_Types ()
    C_Any   : C_Types (Raw a)
    C_FnT   : C_FnTypes t -> C_Types (CFnPtr t)
    C_IntT  : C_IntTypes i -> C_Types i

||| A descriptor for the C FFI. See the constructors of `C_Types`
||| and `C_IntTypes` for the concrete types that are available.
%error_reverse
public export
FFI_C : FFI
    FFI_C = MkFFI C_Types String String

Linking foreign code

This is the example of linking C code.

Example Makefile

Foreign calls

To call a foreign function, the foreign function is used. For example:

do_fopen : String -> String -> IO Ptr
do_fopen f m
   = foreign FFI_C "fileOpen" (String -> String -> IO Ptr) f m

The foreign function takes an FFI description, a function name (the type is given by the ffi_fn field of FFI_C here), and a function type, which gives the expected types of the remaining arguments. Here, we’re calling an external function fileOpen which takes, in the C, a char* file name, a char* mode, and returns a file pointer. It is the job of the C back end to convert Idris String to C char* and vice versa.

The argument types and return type given here must be present in the fn_types predicate of the FFI_C description for the foreign call to be valid.

Note The arguments to foreign must be known at compile time, because the foreign calls are generated statically. The %inline directive on a function can be used to give hints to help this, for example a shorthand for calling external JavaScript functions:

%inline
jscall : (fname : String) -> (ty : Type) ->
          {auto fty : FTy FFI_JS [] ty} -> ty
jscall fname ty = foreign FFI_JS fname ty
C callbacks

It is possible to pass an Idris function to a C function taking a function pointer by using CFnPtr in the function type. The Idris function is passed to MkCFnPtr in the arguments. The example below shows declaring the C standard library function qsort which takes a pointer to a comparison function.

myComparer : Ptr -> Ptr -> Int
myComparer = ...

qsort : Ptr -> Int -> Int -> IO ()
qsort data elems elsize = foreign FFI_C "qsort"
                (Ptr -> Int -> Int -> CFnPtr (Ptr -> Ptr -> Int) -> IO ())
                data elems elsize (MkCFnPtr myComparer)

There are a few limitations to callbacks in the C FFI. The foreign function can’t take the function to make a callback of as an argument. This will give a compilation error:

-- This does not work
example : (Int -> ()) -> IO ()
example f = foreign FFI_C "callbacker" (CFnPtr (Int -> ()) -> IO ()) f

Note that the function that is used as a callback can’t be a closure, that is it can’t be a partially applied function. This is because the mechanism used is unable to pass the closed-over values through C. If we want to pass Idris values to the callback we have to pass them through C explicitly. Non-primitive Idris values can be passed to C via the Raw type.

The other big limitation is that it doesn’t support IO functions. Use unsafePerformIO to wrap them (i.e. to make an IO function usable as a callback, change the return type from IOr to r, and change the = do to = unsafePerformIO $ do).

There are two special function names: %wrapper returns the function pointer that wraps an Idris function. This is useful if the function pointer isn’t taken by a C function directly but should be inserted into a data structure. A foreign declaration using %wrapper must return IO Ptr.

-- this returns the C function pointer to a qsort comparer
example_wrapper : IO Ptr
example_wrapper = foreign FFI_C "%wrapper" (CFnPtr (Ptr -> Ptr -> Int) -> IO Ptr)
                        (MkCFnPtr myComparer)

%dynamic calls a C function pointer with some arguments. This is useful if a C function returns or data structure contains a C function pointer, for example structs of function pointers are common in object-oriented C such as in COM or the Linux kernel. The function type contains an extra Ptr at the start for the function pointer. %dynamic can be seen as a pseudo-function that calls the function in the first argument, passing the remaining arguments to it.

-- we have a pointer to a function with the signature int f(int), call it
example_dynamic : Ptr -> Int -> IO Int
example_dynamic fn x = foreign FFI_C "%dynamic" (Ptr -> Int -> IO Int) fn x

If the foreign name is prefixed by a &, it is treated as a pointer to the global variable with the following name. The type must be just IO Ptr.

-- access the global variable errno
errno : IO Ptr
errno = foreign FFI_C "&errno" (IO Ptr)

If the foreign name is prefixed by a #, the name is pasted in literally. This is useful to access constants that are preprocessor definitions (like INT_MAX).

%include C "limits.h"

-- access the preprocessor definition INT_MAX
intMax : IO Int
intMax = foreign FFI_C "#INT_MAX" (IO Int)

main : IO ()
main = print !intMax

For more complicated interactions with C (such as reading and setting fields of a C struct), there is a module CFFI available in the contrib package.

C heap

Idris has two heaps where objects can be allocated:

FP heap C heap
Cheney-collected Mark-and-sweep-collected
Garbage collections touches only live objects. Garbage collection has to traverse all registered items.
Ideal for FP-style rapid allocation of lots of small short-lived pieces of memory, such as data constructors. Ideal for C-style allocation of a few big buffers.
Finalizers are impossible to support reasonably. Items have finalizers that are called on deallocation.
Data is copied all the time (when collecting garbage, modifying data, registering managed pointers, etc.) Copying does not happen.
Contains objects of various types. Contains C heap items: (void *) pointers with finalizers. A finalizer is a routine that deallocates the resources associated with the item.
Fixed set of object types. The data pointer may point to anything, as long as the finalizer cleans up correctly.
Not suitable for C resources and arbitrary pointers. Suitable for C resources and arbitrary pointers.
Values form a compact memory block. Items are kept in a linked list.
Any Idris value, most notably ManagedPtr. Items represented by the Idris type CData.
Data of ManagedPtr allocated in C, buffer then copied into the FP heap. Data allocated in C, pointer copied into the C heap.
Allocation and reallocation not possible from C code (without having a reference to the VM). Everything is copied instead. Allocated and reallocate freely in C, registering the allocated items in the FFI.

The FP heap is the primary heap. It may contain values of type CData, which are references to items in the C heap. A C heap item contains a (void *) pointer and the corresponding finalizer. Once a C heap item is no longer referenced from the FP heap, it is marked as unused and the next GC sweep will call its finalizer and deallocate it.

There is no Idris interface for CData other than its type and FFI.

Usage from C code
  • Although not enforced in code, CData is meant to be opaque and non-RTS code (such as libraries or C bindings) should access only its (void *) field called data.
  • Feel free to mutate both the pointer data (eg. after calling realloc) and the memory it points to. However, keep in mind that this must not break Idris’s referential transparency.
  • WARNING! If you call cdata_allocate or cdata_manage, the resulting CData object must be returned from your FFI function so that it is inserted in the C heap by the RTS. Otherwise the memory will be leaked.
some_allocating_fun : Int -> IO CData
some_allocating_fun i = foreign FFI_C "some_allocating_fun" (Int -> IO CData) i

other_fun : CData -> Int -> IO Int
other_fun cd i = foreign FFI_C "other_fun" (CData -> Int -> IO Int) cd i
#include "idris_rts.h"

static void finalizer(void * data)
{
    MyStruct * ptr = (MyStruct *) data;
    free_something(ptr->something);
    free(ptr);
}

CData some_allocating_fun(int arg)
{
    size_t size = sizeof(...);
    void * data = (void *) malloc(size);
    // ...
    return cdata_manage(data, size, finalizer);
}

int other_fun(CData cd, int arg)
{
    int result = foo(cd->data);
    return result;
}

The Raw type constructor allows you to access or return a runtime representation of the value. For instance, if you want to copy a string generated from C code into an Idris value, you may want to return a Raw String``instead of a ``String and use MKSTR or MKSTRlen to copy it over.

getString : () -> IO (Raw String)
getString () = foreign FFI_C "get_string" (IO (Raw String))
const VAL get_string ()
{
    char * c_string = get_string_allocated_with_malloc()
    const VAL idris_string = MKSTR(c_string);
    free(c_string);
    return idris_string
}
FFI implementation

In order to write bindings to external libraries, the details of how foreign works are unnecessary — you simply need to know that foreign takes an FFI descriptor, the function name, and its type. It is instructive to look a little deeper, however:

The type of foreign is as follows:

foreign : (ffi : FFI)
       -> (fname : ffi_fn f)
       -> (ty : Type)
       -> {auto fty : FTy ffi [] ty}
       -> ty

The important argument here is the implicit fty, which contains a proof (FTy) that the given type is valid according to the FFI description ffi:

data FTy : FFI -> List Type -> Type -> Type where
     FRet : ffi_types f t -> FTy f xs (IO' f t)
     FFun : ffi_types f s -> FTy f (s :: xs) t -> FTy f xs (s -> t)

Notice that this uses the ffi_types field of the FFI descriptor — these arguments to FRet and FFun give explicit proofs that the type is valid in this FFI. For example, the above do_fopen builds the following implicit proof as the fty argument to foreign:

FFun C_Str (FFun C_Str (FRet C_Ptr))

Compiling foreign calls

(This section assumes some knowledge of the Idris internals.)

When writing a back end, we now need to know how to compile foreign. We’ll skip the details here of how a foreign call reaches the intermediate representation (the IR), though you can look in IO.idr in the prelude package to see a bit more detail — a foreign call is implemented by the primitive function mkForeignPrim. The important part of the IR as defined in Lang.hs is the following constructor:

data LExp = ...
          | LForeign FDesc -- Function descriptor
                     FDesc -- Return type descriptor
                     [(FDesc, LExp)]

So, a foreign call appears in the IR as the LForeign constructor, which takes a function descriptor (of a type given by the ffi_fn field in the FFI descriptor), a return type descriptor (given by an application of FTy), and a list of arguments with type descriptors (also given by an application of FTy).

An FDesc describes an application of a name to some arguments, and is really just a simplified subset of an LExp:

data FDesc = FCon Name
           | FStr String
           | FUnknown
           | FApp Name [FDesc]

There are corresponding structures in the lower level IRs, such as the defunctionalised, simplified and bytecode forms.

Our do_fopen example above arrives in the LExp form as:

LForeign (FStr "fileOpen") (FCon (sUN "C_Ptr"))
         [(FCon (sUN "C_Str"), f), (FCon (sUN "C_Str"), m)]

(Assuming that f and m stand for the LExp representations of the arguments.) This information should be enough for any back end to marshal the arguments and return value appropriately.

注解

When processing FDesc, be aware that there may be implicit arguments, which have not been erased. For example, C_IntT has an implicit argument i, so will appear in an FDesc as something of the form FApp (sUN "C_IntT") [i, t] where i is the implicit argument (which can be ignored) and t is the descriptor of the integer type. See CodegenC.hs, specifically the function toFType, to see how this works in practice.

JavaScript FFI descriptor

The JavaScript FFI descriptor is a little more complex, because the JavaScript FFI supports marshalling functions. It is defined as follows:

mutual
  data JsFn t = MkJsFn t

  data JS_IntTypes  : Type -> Type where
       JS_IntChar   : JS_IntTypes Char
       JS_IntNative : JS_IntTypes Int

  data JS_FnTypes : Type -> Type where
       JS_Fn     : JS_Types s -> JS_FnTypes t -> JS_FnTypes (s -> t)
       JS_FnIO   : JS_Types t -> JS_FnTypes (IO' l t)
       JS_FnBase : JS_Types t -> JS_FnTypes t

  data JS_Types : Type -> Type where
       JS_Str   : JS_Types String
       JS_Float : JS_Types Double
       JS_Ptr   : JS_Types Ptr
       JS_Unit  : JS_Types ()
       JS_FnT   : JS_FnTypes a -> JS_Types (JsFn a)
       JS_IntT  : JS_IntTypes i -> JS_Types i

The reason for wrapping function types in a JsFn is to help the proof search when building FTy. We hope to improve proof search eventually, but for the moment it works much more reliably if the indices are disjoint! An example of using this appears in IdrisScript when setting timeouts:

setTimeout : (() -> JS_IO ()) -> (millis : Int) -> JS_IO Timeout
setTimeout f millis = do
  timeout <- jscall "setTimeout(%0, %1)"
                    (JsFn (() -> JS_IO ()) -> Int -> JS_IO Ptr)
                    (MkJsFn f) millis
  pure $ MkTimeout timeout

Syntax Guide

Examples are mostly adapted from the Idris tutorial.

Source File Structure

Source files consist of:

  1. An optional Module Header.
  2. Zero or more Imports.
  3. Zero or more declarations, e.g. Variables, Data types, etc.

For example:

module MyModule   -- module header

import Data.Vect  -- an import

%default total    -- a directive

foo : Nat         -- a declaration
foo = 5
Module Header

A file can start with a module header, introduced by the module keyword:

module Semantics

Module names can be hierarchical, with parts separated by .:

module Semantics.Transform

Each file can define only a single module, which includes everything defined in that file.

Like with declarations, a docstring can be used to provide documentation for a module:

||| Implementation of predicate transformer semantics.
module Semantics.Transform
Imports

An import makes the names in another module available for use by the current module:

import Data.Vect

All the declarations in an imported module are available for use in the file. In a case where a name is ambiguous — e.g. because it is imported from multiple modules, or appears in multiple visible namespaces — the ambiguity can be resolved using Qualified Names. (Often, the compiler can resolve the ambiguity for you, using the types involved.)

Imported modules can be given aliases to make qualified names more compact:

import Data.Vect as V

Note that names made visible by import are not, by default, re-exported to users of the module being written. This can be done using import public:

import public Data.Vect

Variables

A variable is always defined by defining its type on one line, and its value on the next line, using the syntax

<id> : <type>
<id> = <value>

Examples

x : Int
x = 100
hello : String
hello = "hello"

Types

In Idris, types are first class values. So a type declaration is the same as just declaration of a variable whose type is Type. In Idris, variables that denote a type need not be capitalised. Example:

MyIntType : Type
MyIntType = Int

a more interesting example:

MyListType : Type
MyListType = List Int

While capitalising types is not required, the rules for generating implicit arguments mean it is often a good idea.

Data types

Idris provides two kinds of syntax for defining data types. The first, Haskell style syntax, defines a regular algebraic data type. For example

data Either a b = Left a | Right b

or

data List a = Nil | (::) a (List a)

The second, more general kind of data type, is defined using Agda or GADT style syntax. This syntax defines a data type that is parameterised by some values (in the Vect example, a value of type Nat and a value of type Type).

data Vect : Nat -> Type -> Type where
  Nil  : Vect Z a
  (::) : (x : a) -> (xs : Vect n a) -> Vect (S n) a

The signature of type constructors may use dependent types

data DPair : (a : Type) -> (a -> Type) -> Type where
  MkDPair : {P : a -> Type} -> (x : a) -> (pf : P x) -> DPair a P
Records

There is a special syntax for data types with one constructor and multiple fields.

record A a where
  constructor MkA
  foo, bar : a
  baz : Nat

This defines a constructor as well as getter and setter function for each field.

MkA : a -> a -> Nat -> A a
foo : A a -> a
set_foo : a -> A a -> A a

The types of record fields may depend on the value of other fields

record Collection a where
  constructor MkCollection
  size : Nat
  items : Vect size a

Setter functions are only provided for fields that do not use dependant types. In the example above neither set_size nor set_items are defined.

Co-data

Inifinite data structures can be introduced with the codata keyword.

codata Stream : Type -> Type where
  (::) a -> Stream a -> Stream a

This is syntactic sugar for the following, which is usually preferred:

data Stream : Type -> Type where
  (::) a -> Inf (Stream a) -> Stream a

Every occurence of the defined type in a constructor argument will be wrapped in the Inf type constructor. This has the effect of delaying the evaluation of the second argument when the data constructor is applied. An Inf argument is constructed using Delay (which Idris will insert implicitly) and evaluated using Force (again inserted implicitly).

Furthermore, recursive calls under a Delay must be guarded by a constructor to pass the totality checker.

Operators

Arithmetic
x + y
x - y
x * y
x / y
(x * y) + (a / b)
Equality and Relational
x == y
x /= y
x >= y
x > y
x <= y
x < y
Conditional
x && y
x || y
not x

Conditionals

If Then Else
if <test> then <true> else <false>
Case Expressions
case <test> of
    <case 1>  => <expr>
    <case 2>  => <expr>
    ...
    otherwise => <expr>

Functions

Named

Named functions are defined in the same way as variables, with the type followed by the definition.

<id> : <argument type> -> <return type>
<id> arg = <expr>

Example

plusOne : Int -> Int
plusOne x = x + 1

Functions can also have multiple inputs, for example

makeHello : String -> String -> String
makeHello first last = "hello, my name is " ++ first ++ " " ++ last

Functions can also have named arguments. This is required if you want to annotate parameters in a docstring. The following shows the same makeHello function as above, but with named parameters which are also annotated in the docstring

||| Makes a string introducing a person
||| @first The person's first name
||| @last The person's last name
makeHello : (first : String) -> (last : String) -> String
makeHello first last = "hello, my name is " ++ first ++ " " ++ last

Like Haskell, Idris functions can be defined by pattern matching. For example

sum : List Int -> Int
sum []        = 0
sum (x :: xs) = x + (sum xs)

Similarly case analysis looks like

answerString : Bool -> String
answerString False = "Wrong answer"
answerString True = "Correct answer"
Dependent Functions

Dependent functions are functions where the type of the return value depends on the input value. In order to define a dependent function, named parameters must be used, since the parameter will appear in the return type. For example, consider

zeros : (n : Nat) -> Vect n Int
zeros Z     = []
zeros (S k) = 0 :: (zeros k)

In this example, the return type is Vect n Int which is an expression which depends on the input parameter n.

Anonymous

Arguments in anonymous functions are separated by comma.

(\x => <expr>)
(\x, y => <expr>)
Modifiers
Visibility
public export
export
private
Totality
total
partial
covering

Sets explicitly to which extent pattern matching is terminating and/or exhaustive. A partial pattern matching makes no assumption. A covering pattern matching ensures that pattern matching is exhaustive on its clauses. Furthermore, a total pattern matching enforces both exhaustivity and termination of the evaluation of its clauses.

Implicit Coercion
implicit
Options
%export
%hint
%no_implicit
%error_handler
%error_reverse
%reflection
%specialise [<name list>]

Misc

Qualified Names

If multiple declarations with the same name are visible, using the name can result in an ambiguous situation. The compiler will attempt to resolve the ambiguity using the types involved. If it’s unable — for example, because the declarations with the same name also have the same type signatures — the situation can be cleared up using a qualified name.

A qualified name has the symbol’s namespace prefixed, separated by a .:

Data.Vect.length

This would specifically reference a length declaration from Data.Vect.

Qualified names can be written using two different shorthands:

  1. Names in modules that are imported using an alias can be qualified by the alias.
  2. The name can be qualified by the shortest unique suffix of the namespace in question. For example, the length case above can likely be shortened to Vect.length.
Comments
-- Single Line
{- Multiline -}
||| Docstring (goes before definition)
Multi line String literals
foo = """
this is a
string literal"""

Directives

%lib <path>
%link <path>
%flag <path>
%include <path>
%hide <function>
%freeze <name>
%access <accessibility>
%default <totality>
%logging <level 0--11>
%dynamic <list of libs>
%name <list of names>
%error_handlers <list of names>
%language <extension>

Erasure By Usage Analysis

This work stems from this feature proposal (obsoleted by this page). Beware that the information in the proposal is out of date — and sometimes even in direct contradiction with the eventual implementation.

Motivation

Traditional dependently typed languages (Agda, Coq) are good at erasing proofs (either via irrelevance or an extra universe).

half : (n : Nat) -> Even n -> Nat
half Z EZ = Z
half (S (S n)) (ES pf) = S (half n pf)

For example, in the above snippet, the second argument is a proof, which is used only to convince the compiler that the function is total. This proof is never inspected at runtime and thus can be erased. In this case, the mere existence of the proof is sufficient and we can use irrelevance-related methods to achieve erasure.

However, sometimes we want to erase indices and this is where the traditional approaches stop being useful, mainly for reasons described in the original proposal.

uninterleave : {n : Nat} -> Vect (n * 2) a -> (Vect n a, Vect n a)
uninterleave [] = ([] , [])
uninterleave (x :: y :: rest) with (unzipPairs rest)
  | (xs, ys) = (x :: xs, y :: ys)

Notice that in this case, the second argument is the important one and we would like to get rid of the n instead, although the shape of the program is generally the same as in the previous case.

There are methods described by Brady, McBride and McKinna in [BMM04] to remove the indices from data structures, exploiting the fact that functions operating on them either already have a copy of the appropriate index or the index can be quickly reconstructed if needed. However, we often want to erase the indices altogether, from the whole program, even in those cases where reconstruction is not possible.

The following two sections describe two cases where doing so improves the runtime performance asymptotically.

Binary numbers
  • O(n) instead of O(log n)

Consider the following Nat-indexed type family representing binary numbers:

data Bin : Nat -> Type where
  N : Bin 0
  O : {n : Nat} -> Bin n -> Bin (0 + 2*n)
  I : {n : Nat} -> Bin n -> Bin (1 + 2*n)

These are supposed to be (at least asymptotically) fast and memory-efficient because their size is logarithmic compared to the numbers they represent.

Unfortunately this is not the case. The problem is that these binary numbers still carry the unary indices with them, performing arithmetic on the indices whenever arithmetic is done on the binary numbers themselves. Hence the real representation of the number 15 looks like this:

I -> I -> I -> I -> N
S    S    S    Z
S    S    Z
S    S
S    Z
S
S
S
Z

The used memory is actually linear, not logarithmic and therefore we cannot get below O(n) with time complexities.

One could argue that Idris in fact compiles Nat via GMP but that’s a moot point for two reasons:

  • First, whenever we try to index our data structures with anything else than Nat, the compiler is not going to come to the rescue.
  • Second, even with Nat, the GMP integers are still there and they slow the runtime down.

This ought not to be the case since the Nat are never used at runtime and they are only there for typechecking purposes. Hence we should get rid of them and get runtime code similar to what an Idris programmer would write.

U-views of lists
  • O(n^2) instead of O(n)

Consider the type of U-views of lists:

data U : List a -> Type where
  nil : U []
  one : (z : a) -> U [z]
  two : {xs : List a} -> (x : a) -> (u : U xs) -> (y : a) -> U (x :: xs ++ [y])

For better intuition, the shape of the U-view of [x0,x1,x2,z,y2,y1,y0] looks like this:

x0   y0    (two)
x1   y1    (two)
x2   y2    (two)
   z       (one)

When recursing over this structure, the values of xs range over [x0,x1,x2,z,y2,y1,y0], [x1,x2,z,y2,y1], [x2,z,y2], [z]. No matter whether these lists are stored or built on demand, they take up a quadratic amount of memory (because they cannot share nodes), and hence it takes a quadratic amount of time just to build values of this index alone.

But the reasonable expectation is that operations with U-views take linear time — so we need to erase the index xs if we want to achieve this goal.

Changes to Idris

Usage analysis is run at every compilation and its outputs are used for various purposes. This is actually invisible to the user but it’s a relatively big and important change, which enables the new features.

Everything that is found to be unused is erased. No annotations are needed, just don’t use the thing and it will vanish from the generated code. However, if you wish, you can use the dot annotations to get a warning if the thing is accidentally used.

“Being used” in this context means that the value of the “thing” may influence run-time behaviour of the program. (More precisely, it is not found to be irrelevant to the run-time behaviour by the usage analysis algorithm.)

“Things” considered for removal by erasure include:

  • function arguments
  • data constructor fields (including record fields and dictionary fields of interface implementations)

For example, Either often compiles to the same runtime representation as Bool. Constructor field removal sometimes combines with the newtype optimisation to have quite a strong effect.

There is a new compiler option --warnreach, which will enable warnings coming from erasure. Since we have full usage analysis, we can compile even those programs that violate erasure annotations – it’s just that the binaries may run slower than expected. The warnings will be enabled by default in future versions of Idris (and possibly turned to errors). However, in this transitional period, we chose to keep them on-demand to avoid confusion until better documentation is written.

Case-tree elaboration tries to avoid using dotted “things” whenever possible. (NB. This is not yet perfect and it’s being worked on: https://gist.github.com/ziman/10458331)

Postulates are no longer required to be collapsible. They are now required to be unused instead.

Changes to the language

You can use dots to mark fields that are not intended to be used at runtime.

data Bin : Nat -> Type where
  N : Bin 0
  O : .{n : Nat} -> Bin n -> Bin (0 + 2*n)
  I : .{n : Nat} -> Bin n -> Bin (1 + 2*n)

If these fields are found to be used at runtime, the dots will trigger a warning (with --warnreach).

Note that free (unbound) implicits are dotted by default so, for example, the constructor O can be defined as:

O : Bin n -> Bin (0 + 2*n)

and this is actually the preferred form.

If you have a free implicit which is meant to be used at runtime, you have to change it into an (undotted) {bound : implicit}.

You can also put dots in types of functions to get more guarantees.

half : (n : Nat) -> .(pf : Even n) -> Nat

and free implicits are automatically dotted here, too.

What it means

Dot annotations serve two purposes:

  • influence case-tree elaboration to avoid dotted variables
  • trigger warnings when a dotted variable is used

However, there’s no direct connection between being dotted and being erased. The compiler erases everything it can, dotted or not. The dots are there mainly to help the programmer (and the compiler) refrain from using the values they want to erase.

How to use it

Ideally, few or no extra annotations are needed – in practice, it turns out that having free implicits automatically dotted is enough to get good erasure.

Therefore, just compile with --warnreach to see warnings if erasure cannot remove parts of the program.

However, those programs that have been written without runtime behaviour in mind, will need some help to get in the form that compiles to a reasonable binary. Generally, it’s sufficient to follow erasure warnings (which may be sometimes unhelpful at the moment).

Benchmarks

It can be clearly seen that asymptotics are improved by erasure.

Shortcomings

You can’t get warnings in libraries because usage analysis starts from Main.main. This will be solved by the planned %default_usage pragma.

Usage warnings are quite bad and unhelpful at the moment. We should include more information and at least translate argument numbers to their names.

There is no decent documentation yet. This wiki page is the first one.

There is no generally accepted terminology. We switch between “dotted”, “unused”, “erased”, “irrelevant”, “inaccessible”, while each has a slightly different meaning. We need more consistent and understandable naming.

If the same type is used in both erased and non-erased context, it will retain its fields to accommodate the least common denominator – the non-erased context. This is particularly troublesome in the case of the type of (dependent) pairs, where it actually means that no erasure would be performed. We should probably locate disjoint uses of data types and split them into “sub-types”. There are three different flavours of dependent types now: Sigma (nothing erased), Exists (first component erased), Subset (second component erased).

Case-tree building does not avoid dotted values coming from pattern-matched constructors (https://gist.github.com/ziman/10458331). This is to be fixed soon. (Fixed.)

Higher-order function arguments and opaque functional variables are considered to be using all their arguments. To work around this, you can force erasure via the type system, using the Erased wrapper: https://github.com/idris-lang/Idris-dev/blob/master/libs/base/Data/Erased.idr

Interface methods are considered to be using the union of all their implementations. In other words, an argument of a method is unused only if it is unused in every implementation of the method that occurs in the program.

Planned features

  • Fixes to the above shortcomings in general.
  • Improvements to the case-tree elaborator so that it properly avoids
    dotted fields of data constructors. Done.
  • Compiler pragma %default_usage used/unused and per-function
    overrides used and unused, which allow the programmer to mark the return value of a function as used, even if the function is not used in main (which is the case when writing library code). These annotations will help library writers discover usage violations in their code before it is actually published and used in compiled programs.

Troubleshooting

My program is slower

The patch introducing erasure by usage analysis also disabled some optimisations that were in place before; these are subsumed by the new erasure. However, in some erasure-unaware programs, where erasure by usage analysis does not exercise its full potential (but the old optimisations would have worked), certain slowdown may be observed (up to ~10% according to preliminary benchmarking), due to retention and computation of information that should not be necessary at runtime.

A simple check whether this is the case is to compile with --warnreach. If you see warnings, there is some unnecessary code getting compiled into the binary.

The solution is to change the code so that there are no warnings.

Usage warnings are unhelpful

This is a known issue and we are working on it. For now, see the section How to read and resolve erasure warnings.

There should be no warnings in this function

A possible cause is non-totality of the function (more precisely, non-coverage). If a function is non-covering, the program needs to inspect all arguments in order to detect coverage failures at runtime. Since the function inspects all its arguments, nothing can be erased and this may transitively cause usage violations. The solution is to make the function total or accept the fact that it will use its arguments and remove some dots from the appropriate constructor fields and function arguments. (Please note that this is not a shortcoming of erasure and there is nothing we can do about it.)

Another possible cause is the currently imperfect case-tree elaboration, which does not avoid dotted constructor fields (see https://gist.github.com/ziman/10458331). You can either rephrase the function or wait until this is fixed, hopefully soon. Fixed.

The compiler refuses to recognise this thing as erased

You can force anything to be erased by wrapping it in the Erased monad. While this program triggers usage warnings,

f : (g : Nat -> Nat) -> .(x : Nat) -> Nat
f g x = g x  -- WARNING: g uses x

the following program does not:

f : (g : Erased Nat -> Nat) -> .(x : Nat) -> Nat
f g x = g (Erase x)  -- OK

How to read and resolve erasure warnings

Example 1

Consider the following program:

vlen : Vect n a -> Nat
vlen {n = n} xs = n

sumLengths : List (Vect n a) -> Nat
sumLengths       []  = 0
sumLengths (v :: vs) = vlen v + sumLengths vs

main : IO ()
main = print . sumLengths $ [[0,1],[2,3]]

When you compile it using --warnreach, there is one warning:

Main.sumLengths: inaccessible arguments reachable:
  n (no more information available)

The warning does not contain much detail at this point so we can try compiling with --dumpcases cases.txt and look up the compiled definition in cases.txt:

Main.sumLengths {e0} {e1} {e2} =
  case {e2} of
  | Prelude.List.::({e6}) => LPlus (ATInt ITBig)({e0}, Main.sumLengths({e0}, ____, {e6}))
  | Prelude.List.Nil() => 0

The reason for the warning is that sumLengths calls vlen, which gets inlined. The second clause of sumLengths then accesses the variable n, compiled as {e0}. Since n is a free implicit, it is automatically considered dotted and this triggers the warning.

A solution would be either making the argument n a bound implicit parameter to indicate that we wish to keep it at runtime,

sumLengths : {n : Nat} -> List (Vect n a) -> Nat

or fixing vlen to not use the index:

vlen : Vect n a -> Nat
vlen [] = Z
vlen (x :: xs) = S (vlen xs)

Which solution is appropriate depends on the usecase.

Example 2

Consider the following program manipulating value-indexed binary numbers.

data Bin : Nat -> Type where
    N : Bin Z
    O : Bin n -> Bin (0 + n + n)
    I : Bin n -> Bin (1 + n + n)

toN : (b : Bin n) -> Nat
toN  N = Z
toN (O {n} bs) = 0 + n + n
toN (I {n} bs) = 1 + n + n

main : IO ()
main = print . toN $ I (I (O (O (I N))))

In the function toN, we attempted to “cheat” and instead of traversing the whole structure, we just projected the value index n out of constructors I and O. However, this index is a free implicit, therefore it is considered dotted.

Inspecting it then produces the following warnings when compiling with --warnreach:

Main.I: inaccessible arguments reachable:
  n from Main.toN arg# 1
Main.O: inaccessible arguments reachable:
  n from Main.toN arg# 1

We can see that the argument n of both I and O is used in the function toN, argument 1.

At this stage of development, warnings only contain argument numbers, not names; this will hopefully be fixed. When numbering arguments, we go from 0, taking free implicits first, left-to-right; then the bound arguments. The function toN has therefore in fact two arguments: n (argument 0) and b (argument 1). And indeed, as the warning says, we project the dotted field from b.

Again, one solution is to fix the function toN to calculate its result honestly; the other one is to accept that we carry a Nat with every constructor of Bin and make it a bound implicit:

O : {n : Nat} -> Bin n -> Bin (0 + n + n)
I : {n : Nat} -> bin n -> Bin (1 + n + n)

References

[BMM04]Edwin Brady, Conor McBride, James McKinna: Inductive families need not store their indices

The IDE Protocol

The Idris REPL has two modes of interaction: a human-readable syntax designed for direct use in a terminal, and a machine-readable syntax designed for using Idris as a backend for external tools.

Protocol Overview

The communication protocol is of asynchronous request-reply style: a single request from the client is handled by Idris at a time. Idris waits for a request on its standard input stream, and outputs the answer or answers to standard output. The result of a request can be either success, failure, or intermediate output; and furthermore, before the result is delivered, there might be additional meta-messages.

A reply can consist of multiple messages: any number of messages to inform the user about the progress of the request or other informational output, and finally a result, either ok or error.

The wire format is the length of the message in characters, encoded in 6 characters hexadecimal, followed by the message encoded as S-expression (sexp). Additionally, each request includes a unique integer (counting upwards), which is repeated in all messages corresponding to that request.

An example interaction from loading the file /home/hannes/empty.idr looks as follows on the wire::

00002a((:load-file "/home/hannes/empty.idr") 1)
000039(:write-string "Type checking /home/hannes/empty.idr" 1)
000025(:set-prompt "/home/hannes/empty" 1)
000032(:return (:ok "Loaded /home/hannes/empty.idr") 1)

The first message is the request from idris-mode to load the specific file, which length is hex 2a, decimal 42 (including the newline at the end). The request identifier is set to 1. The first message from Idris is to write the string Type checking /home/hannes/empty.idr, another is to set the prompt to */home/hannes/empty. The answer, starting with :return is ok, and additional information is that the file was loaded.

There are three atoms in the wire language: numbers, strings, and symbols. The only compound object is a list, which is surrounded by parenthesis. The syntax is:

A ::= NUM | '"' STR '"' | ':' ALPHA+
S ::= A | '(' S* ')' | nil

where NUM is either 0 or a positive integer, ALPHA is an alphabetical character, and STR is the contents of a string, with " escaped by a backslash. The atom nil is accepted instead of () for compatibility with some regexp pretty-printing routines.

The state of the Idris process is mainly the active file, which needs to be kept synchronised between the editor and Idris. This is achieved by the already seen :load-file command.

The available commands include:

(:load-file FILENAME [LINE])
Load the named file. If a LINE number is provided, the file is only loaded up to that line. Otherwise, the entire file is loaded.
(:interpret STRING)
Interpret STRING at the Idris REPL, returning a highlighted result.
(:type-of STRING)
Return the type of the name, written with Idris syntax in the STRING. The reply may contain highlighting information.
(:case-split LINE NAME)
Generate a case-split for the pattern variable NAME on program line LINE. The pattern-match cases to be substituted are returned as a string with no highlighting.
(:add-clause LINE NAME)
Generate an initial pattern-match clause for the function declared as NAME on program line LINE. The initial clause is returned as a string with no highlighting.
(:add-proof-clause LINE NAME)
Add a clause driven by the <== syntax.
(:add-missing LINE NAME)
Add the missing cases discovered by totality checking the function declared as NAME on program line LINE. The missing clauses are returned as a string with no highlighting.
(:make-with LINE NAME)
Create a with-rule pattern match template for the clause of function NAME on line LINE. The new code is returned with no highlighting.
(:make-case LINE NAME)
Create a case pattern match template for the clause of function NAME on line LINE. The new code is returned with no highlighting.
(:make-lemma LINE NAME)
Create a top level function with a type which solves the hole named NAME on line LINE.
(:proof-search LINE NAME HINTS)
Attempt to fill out the holes on LINE named NAME by proof search. HINTS is a possibly-empty list of additional things to try while searching.
(:docs-for NAME [MODE])
Look up the documentation for NAME, and return it as a highlighted string. If MODE is :overview, only the first paragraph of documentation is provided for NAME. If MODE is :full, or omitted, the full documentation is returned for NAME.
(:apropos STRING)
Search the documentation for mentions of STRING, and return any found as a list of highlighted strings.
(:metavariables WIDTH)
List the currently-active holes, with their types pretty-printed with WIDTH columns.
(:who-calls NAME)
Get a list of callers of NAME.
(:calls-who NAME)
Get a list of callees of NAME.
(:browse-namespace NAMESPACE)
Return the contents of NAMESPACE, like :browse at the command-line REPL.
(:normalise-term TM)
Return a highlighted string consisting of the results of normalising the serialised term TM (which would previously have been sent as the tt-term property of a string).
(:show-term-implicits TM)
Return a highlighted string consisting of the results of making all arguments in serialised term TM (which would previously have been sent as the tt-term property of a string) explicit.
(:hide-term-implicits TM)
Return a highlighted string consisting of the results of making all arguments in serialised term TM (which would previously have been sent as the tt-term property of a string) follow their usual implicitness setting.
(:elaborate-term TM)
Return a highlighted string consisting of the core language term corresponding to serialised term TM (which would previously have been sent as the tt-term property of a string).
(:print-definition NAME)
Return the definition of NAME as a highlighted string.
(:repl-completions NAME)
Search names, types and documentations which contain NAME. Return the result of tab-completing NAME as a REPL command.
:version
Return the version information of the Idris compiler.

Possible replies include a normal final reply::

(:return (:ok SEXP [HIGHLIGHTING]))
(:return (:error String [HIGHLIGHTING]))

A normal intermediate reply::

(:output (:ok SEXP [HIGHLIGHTING]))
(:output (:error String [HIGHLIGHTING]))

Informational and/or abnormal replies::

(:write-string String)
(:set-prompt String)
(:warning (FilePath (LINE COL) (LINE COL) String [HIGHLIGHTING]))

Proof mode replies::

(:start-proof-mode)
(:write-proof-state [String] [HIGHLIGHTING])
(:end-proof-mode)
(:write-goal String)

Output Highlighting

Idris mode supports highlighting the output from Idris. In reality, this highlighting is controlled by the Idris compiler. Some of the return forms from Idris support an optional extra parameter: a list mapping spans of text to metadata about that text. Clients can then use this list both to highlight the displayed output and to enable richer interaction by having more metadata present. For example, the Emacs mode allows right-clicking identifiers to get a menu with access to documentation and type signatures.

A particular semantic span is a three element list. The first element of the list is the index at which the span begins, the second element is the number of characters included in the span, and the third is the semantic data itself. The semantic data is a list of lists. The head of each list is a key that denotes what kind of metadata is in the list, and the tail is the metadata itself.

The following keys are available:
name
gives a reference to the fully-qualified Idris name
implicit
provides a Boolean value that is True if the region is the name of an implicit argument
decor
describes the category of a token, which can be type, function, data, keyword, or bound.
source-loc

states that the region refers to a source code location. Its body is a collection of key-value pairs, with the following possibilities:

filename
provides the filename
start
provides the line and column that the source location starts at as a two-element tail
end
provides the line and column that the source location ends at as a two-element tail
text-formatting
provides an attribute of formatted text. This is for use with natural-language text, not code, and is presently emitted only from inline documentation. The potential values are bold, italic, and underline.
link-href
provides a URL that the corresponding text is a link to.
quasiquotation
states that the region is quasiquoted.
antiquotation
states that the region is antiquoted.
tt-term
A serialised representation of the Idris core term corresponding to the region of text.

Source Code Highlighting

Idris supports instructing editors how to colour their code. When elaborating source code or REPL input, Idris will locate regions of the source code corresponding to names, and emit information about these names using the same metadata as output highlighting.

These messages will arrive as replies to the command that caused elaboration to occur, such as :load-file or :interpret. They have the format::

(:output (:ok (:highlight-source POSNS)))

where POSNS is a list of positions to highlight. Each of these is a two-element list whose first element is a position (encoded as for the source-loc property above) and whose second element is highlighting metadata in the same format used for output.

Semantic Highlighting & Pretty Printing

Since v0.9.18 Idris comes with support for semantic highlighting. When using the REPL or IDE support, Idris will highlight your code accordingly to its meaning within the Idris structure. A precursor to semantic highlighting support is the pretty printing of definitions to console, LaTeX, or HTML.

The default styling scheme used was inspired by Conor McBride’s own set of stylings, informally known as Conor Colours.

Legend

The concepts and their default stylings are as follows:

Idris Term HTML LaTeX IDE/REPL
Bound Variable Purple Magenta  
Keyword Bold Underlined  
Function Green Green  
Type Blue Blue  
Data Red Red  
Implicit Italic Purple Italic Magenta  

Pretty Printing

Idris also supports the pretty printing of code to HTML and LaTeX using the commands:

  • :pp <latex|html> <width> <function name>
  • :pprint <latex|html> <width> <function name>

Customisation

If you are not happy with the colours used, the VIM and Emacs editor support allows for customisation of the colours. When pretty printing Idris code as LaTeX and HTML, commands and a CSS style are provided. The colours used by the REPL can be customised through the initialisation script.

Further Information

Please also see the Idris Extras project for links to editor support, and pre-made style files for LaTeX and HTML.

DEPRECATED: Tactics and Theorem Proving

警告

The interactive theorem-proving interface documented here has been deprecated in favor of Elaborator Reflection.

Idris supports interactive theorem proving, and the analyse of context through holes. To list all unproven holes, use the command :m. This will display their qualified names and the expected types. To interactively prove a holes, use the command :p name where name is the hole. Once the proof is complete, the command :a will append it to the current module.

Once in the interactive prover, the following commands are available:

Basic commands

  • :q - Quits the prover (gives up on proving current lemma).
  • :abandon - Same as :q
  • :state - Displays the current state of the proof.
  • :term - Displays the current proof term complete with its yet-to-be-filled holes (is only really useful for debugging).
  • :undo - Undoes the last tactic.
  • :qed - Once the interactive theorem prover tells you “No more goals,” you get to type this in celebration! (Completes the proof and exits the prover)

Commonly Used Tactics

Compute
  • compute - Normalises all terms in the goal (note: does not normalise assumptions)
----------                 Goal:                  ----------
 (Vect (S (S Z + (S Z) + (S n))) Nat) -> Vect (S (S (S (S n)))) Nat
-lemma> compute
----------                 Goal:                  ----------
 (Vect (S (S (S (S n)))) Nat) -> Vect (S (S (S (S n)))) Nat
-lemma>
Exact
  • exact - Provide a term of the goal type directly.
----------                 Goal:                  ----------
 Nat
-lemma> exact Z
lemma: No more goals.
-lemma>
Refine
  • refine - Use a name to refine the goal. If the name needs arguments, introduce them as new goals.
Trivial
  • trivial - Satisfies the goal using an assumption that matches its type.
----------              Assumptions:              ----------
 value : Nat
----------                 Goal:                  ----------
 Nat
-lemma> trivial
lemma: No more goals.
-lemma>
Intro
  • intro - If your goal is an arrow, turns the left term into an assumption.
----------                 Goal:                  ----------
 Nat -> Nat -> Nat
-lemma> intro
----------              Assumptions:              ----------
 n : Nat
----------                 Goal:                  ----------
 Nat -> Nat
-lemma>

You can also supply your own name for the assumption:

----------                 Goal:                  ----------
Nat -> Nat -> Nat
-lemma> intro number
----------              Assumptions:              ----------
 number : Nat
----------                 Goal:                  ----------
Nat -> Nat
Intros
  • intros - Exactly like intro, but it operates on all left terms at once.
----------                 Goal:                  ----------
 Nat -> Nat -> Nat
-lemma> intros
----------              Assumptions:              ----------
 n : Nat
 m : Nat
----------                 Goal:                  ----------
 Nat
-lemma>
let
  • let - Introduces a new assumption; you may use current assumptions to define the new one.
----------              Assumptions:              ----------
 n : Nat
----------                 Goal:                  ----------
 BigInt
-lemma> let x = toIntegerNat n
----------              Assumptions:              ----------
 n : Nat
  x = toIntegerNat n: BigInt
----------                 Goal:                  ----------
 BigInt
-lemma>
rewrite
  • rewrite - Takes an expression with an equality type (x = y), and replaces all instances of x in the goal with y. Is often useful in combination with ‘sym’.
----------              Assumptions:              ----------
 n : Nat
 a : Type
 value : Vect Z a
----------                 Goal:                  ----------
 Vect (mult n Z) a
-lemma> rewrite sym (multZeroRightZero n)
----------              Assumptions:              ----------
 n : Nat
 a : Type
 value : Vect Z a
----------                 Goal:                  ----------
 Vect Z a
-lemma>
sourceLocation
  • sourceLocation - Solve the current goal with information about the location in the source code where the tactic was invoked. This is mostly for embedded DSLs and programmer tools like assertions that need to know where they are called. See Language.Reflection.SourceLocation for more information.

Less commonly-used tactics

  • applyTactic - Apply a user-defined tactic. This should be a function of type List (TTName, Binder TT) -> TT -> Tactic, where the first argument represents the proof context and the second represents the goal. If your tactic will produce a proof term directly, use the Exact constructor from Tactic.
  • attack - ?
  • equiv - Replaces the goal with a new one that is convertible with the old one
  • fill - ?
  • focus - ?
  • mrefine - Refining by matching against a type
  • reflect - ?
  • solve - Takes a guess with the correct type and fills a hole with it, closing a proof obligation. This happens automatically in the interactive prover, so solve is really only relevant in tactic scripts used for helping implicit argument resolution.
  • try - ?

The Idris REPL

Idris comes with a REPL.

Evaluation

Being a fully dependently typed language, Idris has two phases where it evaluates things, compile-time and run-time. At compile-time it will only evaluate things which it knows to be total (i.e. terminating and covering all possible inputs) in order to keep type checking decidable. The compile-time evaluator is part of the Idris kernel, and is implemented in Haskell using a HOAS (higher order abstract syntax) style representation of values. Since everything is known to have a normal form here, the evaluation strategy doesn’t actually matter because either way it will get the same answer, and in practice it will do whatever the Haskell run-time system chooses to do.

The REPL, for convenience, uses the compile-time notion of evaluation. As well as being easier to implement (because we have the evaluator available) this can be very useful to show how terms evaluate in the type checker. So you can see the difference between:

Idris> \n, m => (S n) + m
\n => \m => S (plus n m) : Nat -> Nat -> Nat

Idris> \n, m => n + (S m)
\n => \m => plus n (S m) : Nat -> Nat -> Nat

Customisation

Idris supports initialisation scripts.

Initialisation scripts

When the Idris REPL starts up, it will attempt to open the file repl/init in Idris’s application data directory. The application data directory is the result of the Haskell function call getAppUserDataDirectory "idris", which on most Unix-like systems will return $HOME/.idris and on various versions of Windows will return paths such as C:/Documents And Settings/user/Application Data/appName.

The file repl/init is a newline-separate list of REPL commands. Not all commands are supported in initialisation scripts — only the subset that will not interfere with the normal operation of the REPL. In particular, setting colours, display options such as showing implicits, and log levels are supported.

Example initialisation script
:colour prompt white italic bold
:colour implicit magenta italic

The REPL Commands

The current set of supported commands are:

Command Arguments Purpose
<expr>   Evaluate an expression
:t :type <expr> Check the type of an expression
:core <expr> View the core language representation of a term
:miss :missing <name> Show missing clauses
:doc <name> Show internal documentation
:mkdoc <namespace> Generate IdrisDoc for namespace(s) and dependencies
:apropos [<package list>] <name> Search names, types, and documentation
:s :search [<package list>] <expr> Search for values by type
:wc :whocalls <name> List the callers of some name
:cw :callswho <name> List the callees of some name
:browse <namespace> List the contents of some namespace
:total <name> Check the totality of a name
:r :reload   Reload current file
:l :load <filename> Load a new file
:cd <filename> Change working directory
:module <module> Import an extra module
:e :edit   Edit current file using $EDITOR or $VISUAL
:m :metavars   Show remaining proof obligations (holes)
:p :prove <hole> Prove a hole
:a :addproof <name> Add proof to source file
:rmproof <name> Remove proof from proof stack
:showproof <name> Show proof
:proofs   Show available proofs
:x <expr> Execute IO actions resulting from an expression using the interpreter
:c :compile <filename> Compile to an executable [codegen] <filename>
:exec :execute [<expr>] Compile to an executable and run
:dynamic <filename> Dynamically load a C library (similar to %dynamic)
:dynamic   List dynamically loaded C libraries
:? :h :help   Display this help text
:set <option> Set an option (errorcontext, showimplicits, originalerrors, autosolve, nobanner, warnreach, evaltypes, desugarnats)
:unset <option> Unset an option
:color :colour <option> Turn REPL colours on or off; set a specific colour
:consolewidth auto|infinite|<number> Set the width of the console
:printerdepth <number-or-blank> Set the maximum pretty-printing depth, or infinite if nothing specified
:q :quit   Exit the Idris system
:w :warranty   Displays warranty information
:let (<top-level-declaration>)… Evaluate a declaration, such as a function definition, instance implementation, or fixity declaration
:unlet :undefine (<name>)… Remove the listed repl definitions, or all repl definitions if no names given
:printdef <name> Show the definition of a function
:pp :pprint <option> <number> <name> Pretty prints an Idris function in either LaTeX or HTML and for a specified width.

Using the REPL

Getting help

The command :help (or :h or :?) prints a short summary of the available commands.

Quitting Idris

If you would like to leave Idris, simply use :q or :quit.

Evaluating expressions

To evaluate an expression, simply type it. If Idris is unable to infer the type, it can be helpful to use the operator the to manually provide one, as Idris’s syntax does not allow for direct type annotations. Examples of the include:

Idris> the Nat 4
4 : Nat
Idris> the Int 4
4 : Int
Idris> the (List Nat) [1,2]
[1,2] : List Nat
Idris> the (Vect _ Nat) [1,2]
[1,2] : Vect 2 Nat

This may not work in cases where the expression still involves ambiguous names. The name can be disambiguated by using the with keyword:

Idris> sum [1,2,3]
When elaborating an application of function Prelude.Foldable.sum:
        Can't disambiguate name: Prelude.List.::,
                                 Prelude.Stream.::,
                                 Prelude.Vect.::
Idris> with List sum [1,2,3]
6 : Integer
Adding let bindings

To add a let binding to the REPL, use :let. It’s likely you’ll also need to provide a type annotation. :let also works for other declarations as well, such as data.

Idris> :let x : String; x = "hello"
Idris> x
"hello" : String
Idris> :let y = 10
Idris> y
10 : Integer
Idris> :let data Foo : Type where Bar : Foo
Idris> Bar
Bar : Foo
Getting type information

To ask Idris for the type of some expression, use the :t command. Additionally, if used with an overloaded name, Idris will provide all overloadings and their types. To ask for the type of an infix operator, surround it in parentheses.

Idris> :t "foo"
"foo" : String
Idris> :t plus
Prelude.Nat.plus : Nat -> Nat -> Nat
Idris> :t (++)
Builtins.++ : String -> String -> String
Prelude.List.++ : (List a) -> (List a) -> List a
Prelude.Vect.++ : (Vect m a) -> (Vect n a) -> Vect (m + n) a
Idris> :t plus 4
plus (Builtins.fromInteger 4) : Nat -> Nat

You can also ask for basic information about interfaces with :doc:

Idris> :doc Monad
Interface Monad

Parameters:
    m

Methods:
    (>>=) : Monad m => m a -> (a -> m b) -> m b

        infixl 5

Instances:
    Monad id
    Monad PrimIO
    Monad IO
    Monad Maybe

...

Other documentation is also available from :doc:

Idris> :doc (+)
Prelude.Interfaces.(+) : Num ty => ty -> ty -> ty

infixl 8

The function is Total
Idris> :doc Vect
Data type Prelude.Vect.Vect : Nat -> Type -> Type

Arguments:
        Nat
        Type

Constructors:

Prelude.Vect.Nil : (a : Type) -> Vect 0 a


Prelude.Vect.:: : (a : Type) -> (n : Nat) -> a -> (Vect n a) -> Vect (S n) a

infixr 7

Arguments:
        a
        Vect n a
Idris> :doc Monad
Interface Monad

Parameters:
    m

Methods:
    (>>=) : Monad m => m a -> (a -> m b) -> m b
        Also called bind.
        infixl 5

        The function is Total
    join : Monad m => m (m a) -> m a
        Also called flatten or mu

        The function is Total
Implementations:
    Monad (IO' ffi)
    Monad Stream
    Monad Provider
    Monad Elab
    Monad PrimIO
    Monad Maybe
    Monad (Either e)
    Monad List
Finding things

The command :apropos searches names, types, and documentation for some string, and prints the results. For example:

Idris> :apropos eq
eqPtr : Ptr -> Ptr -> IO Bool


eqSucc : (left : Nat) -> (right : Nat) -> (left = right) -> S left = S right
S preserves equality

lemma_both_neq : ((x = x') -> _|_) -> ((y = y') -> _|_) -> ((x, y) = (x', y')) -> _|_


lemma_fst_neq_snd_eq : ((x = x') -> _|_) -> (y = y') -> ((x, y) = (x', y)) -> _|_


lemma_snd_neq : (x = x) -> ((y = y') -> _|_) -> ((x, y) = (x, y')) -> _|_


lemma_x_eq_xs_neq : (x = y) -> ((xs = ys) -> _|_) -> (x :: xs = y :: ys) -> _|_


lemma_x_neq_xs_eq : ((x = y) -> _|_) -> (xs = ys) -> (x :: xs = y :: ys) -> _|_


lemma_x_neq_xs_neq : ((x = y) -> _|_) -> ((xs = ys) -> _|_) -> (x :: xs = y :: ys) -> _|_


prim__eqB16 : Bits16 -> Bits16 -> Int

prim__eqB16x8 : Bits16x8 -> Bits16x8 -> Bits16x8

prim__eqB32 : Bits32 -> Bits32 -> Int

prim__eqB32x4 : Bits32x4 -> Bits32x4 -> Bits32x4

prim__eqB64 : Bits64 -> Bits64 -> Int

prim__eqB64x2 : Bits64x2 -> Bits64x2 -> Bits64x2

prim__eqB8 : Bits8 -> Bits8 -> Int

prim__eqB8x16 : Bits8x16 -> Bits8x16 -> Bits8x16

prim__eqBigInt : Integer -> Integer -> Int

prim__eqChar : Char -> Char -> Int

prim__eqFloat : Double -> Double -> Int

prim__eqInt : Int -> Int -> Int

prim__eqString : String -> String -> Int

prim__syntactic_eq : (a : Type) -> (b : Type) -> (x : a) -> (y : b) -> Maybe (x = y)

sequence : Traversable t => Applicative f => (t (f a)) -> f (t a)


sequence_ : Foldable t => Applicative f => (t (f a)) -> f ()


Eq : Type -> Type
The Eq interface defines inequality and equality.

GTE : Nat -> Nat -> Type
Greater than or equal to

LTE : Nat -> Nat -> Type
Proofs that n is less than or equal to m

gte : Nat -> Nat -> Bool
Boolean test than one Nat is greater than or equal to another

lte : Nat -> Nat -> Bool
Boolean test than one Nat is less than or equal to another

ord : Char -> Int
Convert the number to its ASCII equivalent.

replace : (x = y) -> (P x) -> P y
Perform substitution in a term according to some equality.

sym : (l = r) -> r = l
Symmetry of propositional equality

trans : (a = b) -> (b = c) -> a = c
Transitivity of propositional equality

:search does a type-based search, in the spirit of Hoogle. See Type-directed search (:search) for more details. Here is an example:

Idris> :search a -> b -> a
= Prelude.Basics.const : a -> b -> a
Constant function. Ignores its second argument.

= assert_smaller : a -> b -> b
Assert to the totality checker than y is always structurally
smaller than x (which is typically a pattern argument)

> malloc : Int -> a -> a


> Prelude.pow : Num a => a -> Nat -> a


> Prelude.Interfaces.(*) : Num a => a -> a -> a


> Prelude.Interfaces.(+) : Num a => a -> a -> a
... (More results)

:search can also look for dependent types:

Idris> :search plus (S n) n = plus n (S n)
< Prelude.Nat.plusSuccRightSucc : (left : Nat) ->
                                  (right : Nat) ->
                                  S (left + right) = left + S right
Loading and reloading Idris code

The command :l File.idr will load File.idr into the currently-running REPL, and :r will reload the last file that was loaded.

Totality

All Idris definitions are checked for totality. The command :total <NAME> will display the result of that check. If a definition is not total, this may be due to an incomplete pattern match. If that is the case, :missing or :miss will display the missing cases.

Editing files

The command :e launches your default editor on the current module. After control returns to Idris, the file is reloaded.

Invoking the compiler

The current module can be compiled to an executable using the command :c <FILENAME> or :compile <FILENAME>. This command allows to specify codegen, so for example JavaScript can be generated using :c javascript <FILENAME>. The :exec command will compile the program to a temporary file and run the resulting executable.

IO actions

Unlike GHCI, the Idris REPL is not inside of an implicit IO monad. This means that a special command must be used to execute IO actions. :x tm will execute the IO action tm in an Idris interpreter.

Dynamically loading C libraries

Sometimes, an Idris program will depend on external libraries written in C. In order to use these libraries from the Idris interpreter, they must first be dynamically loaded. This is achieved through the %dynamic <LIB> directive in Idris source files or through the :dynamic <LIB> command at the REPL. The current set of dynamically loaded libraries can be viewed by executing :dynamic with no arguments. These libraries are available through the Idris FFI in type providers and :exec.

Colours

Idris terms are available in amazing colour! By default, the Idris REPL uses colour to distinguish between data constructors, types or type constructors, operators, bound variables, and implicit arguments. This feature is available on all POSIX-like systems, and there are plans to allow it to work on Windows as well.

If you do not like the default colours, they can be turned off using the command

:colour off

and, when boredom strikes, they can be re-enabled using the command

:colour on

To modify a colour, use the command

:colour <CATEGORY> <OPTIONS>

where <CATEGORY is one of keyword, boundvar, implicit, function, type, data, or prompt, and is a space-separated list drawn from the colours and the font options. The available colours are default, black, yellow, cyan, red, blue, white, green, and magenta. If more than one colour is specified, the last one takes precedence. The available options are dull and vivid, bold and nobold, italic and noitalic, underline and nounderline, forming pairs of opposites. The colour default refers to your terminal’s default colour.

The colours used at startup can be changed using REPL initialisation scripts.

Colour can be disabled at startup by the --nocolour command-line option.

Compilation, Logging, and Reporting

This section provides information about the Idris compilation process, and provides details over how you can follow the process through logging.

Compilation Process

Idris follows the following compilation process:

  1. Parsing
  2. Type Checking
    1. Elaboration
    2. Coverage
    3. Unification
    4. Totality Checking
    5. Erasure
  3. Code Generation
    1. Defunctionalisation
    2. Inlining
    3. Resolving variables
    4. Code Generation

Type Checking Only

With Idris you can ask it to terminate the compilation process after type checking has completed. This is achieved through use of either:

  • The command line options
    • --check for files
    • --checkpkg for packages
  • The REPL command: :check

Use of this option will still result in the generation of the Idris binary .ibc files, and is suitable if you do not wish to generate code from one of the supported backends.

Reporting Compilation Process

During compilation the reporting of Idris’ progress can be controlled by setting a verbosity level.

  • -V, or alternatively --verbose and --V0, will report which file Idris is currently type checking.
  • --V1 will additionally report: Parsing, IBC Generation, and Code Generation.
  • --V2 will additionally report: Totality Checking, Universe Checking, and the individual steps prior to code generation.

By default Idris’ progress reporting is set to quiet--q, or --quiet.

Logging Internal Operation

For those that develop on the Idris compiler, the internal operation of Idris is captured using a category based logger. Currently, the logging infrastructure has support for the following categories:

  • Parser (parser)
  • Elaborator (elab)
  • Code generation (codegen)
  • Erasure (erasure)
  • Coverage Checking (coverage)
  • IBC generation (ibc)

These categories are specified using the command-line option: --logging-categories CATS, where CATS is a quoted colon separated string of the categories you want to see. By default if this option is not specified all categories are allowed. Sub-categories have yet to be defined but will be in the future, especially for the elaborator.

Further, the verbosity of logging can be controlled by specifying a logging level between: 1 to 10 using the command-line option: --log <level>.

  • Level 0: Show no logging output. Default level
  • Level 1: High level details of the compilation process.
  • Level 2: Provides details of the coverage checking, and further details the elaboration process specifically: Interface, Clauses, Data, Term, and Types,
  • Level 3: Provides details of compilation of the IRTS, erasure, parsing, case splitting, and further details elaboration of: Implementations, Providers, and Values.
  • Level 4: Provides further details on: Erasure, Coverage Checking, Case splitting, and elaboration of clauses.
  • Level 5: Provides details on the prover, and further details elaboration (adding declarations) and compilation of the IRTS.
  • Level 6: Further details elaboration and coverage checking.
  • Level 7:
  • Level 8:
  • Level 9:
  • Level 10: Further details elaboration.

Environment Variables

Several paths set by default within the Idris compiler can be overridden through environment variables. The provided variables are:

  • IDRIS_CC Change the C compiler used by the C backend.
  • IDRIS_CFLAGS Change the C flags passed to the C compiler.
  • TARGET Change the target directory i.e. data dir where Idris installs files when installing using Cabal/Stack.
  • IDRIS_LIBRARY_PATH Change the location of where installed packages are found/installed.
  • IDRIS_DOC_PATH Change the location of where generated idrisdoc for packages are installed.

注解

In versions of Idris prior to 0.12.3 the environment variables IDRIS_LIBRARY_PATH and TARGET were both used to affect the installation of single packages and direct where Idris installed its data. The meaning of these variables has changed, and command line options are preferred when changing where individual packages are installed.

The CLI option –ibcsubdir can be used to direct where generated IBC files are placed. However, this means Idris will install files in a non-standard location separate from the rest of the installed packages. The CLI option –idrispath <dir> allows you to add a directory to the library search path; this option can be used multiple times and can be shortened to -i <dir>. Similary, the –sourcepath <dir> option can be used to add directories to the source search path. There is no shortened version for this option as -s is a reserved flag.

Further, Idris also supports options to augment the paths used, and pass options to the code generator backend. The option –cg-opt <ARG> can be used to pass options to the code generator. The format of <ARG> is dependent on the selected backend.

Idris’ Internals

Note: this is still a fairly raw set of notes taken by David Christiansen at Edwin’s presentation at the 2013 Idris Developers Meeting. They’re in the process of turning into a useful guide - feel free to contribute.

This document assumes that you are already familiar with Idris. It is intended for those who want to work on the internals.

People looking to develop new back ends may want to look at [[Idris back end IRs|Idris-back-end-IRs]]

Core/TT.hs

Idris is compiled to a simple, explicit core language. This core language is called TT because it looks a bit like a Π. It’s a minimal language, with a locally nameless representation. That is, local variables are represented with de Bruijn indices and globally-defined constants are represented with names.

The TT datatype uses a trick that is common in the Idris code: it is polymorphic over the type of names stored in it, and it derives Functor. This allows fmap to be used as a general-purpose traversal.

There is a general construction for binders, used for λ, Π, and let-bindings. These are distinguished using a BinderType.

During compilation, some terms (especially types) will be erased. This is represented using the Erased constructor of TT. A handy trick when generating TT terms is to insert Erased where a term is uniquely determined, as the typechecker will fill it out.

The constructor Proj is a result of the optimizer. It is used to extract a specific constructor argument, in a more economical way than defining a new pattern-matching operation.

The datatype Raw represents terms that have not yet been typechecked. The typechecker converts a Raw to a TT if it can.

Core/CaseTree.hs

Case trees are used to represent top-level pattern-matching definitions in the TT language.

Just as with the TT datatype, the deriving Functor trick is used with SC and CaseAlt to get GHC to generate a function for mapping over contained terms.

Constructor cases (ConCase in CaseAlt) refer to numbered constructors. Every constructor is numbered 0,1,2,…. At this stage in the compiler, the tags are datatype-local. After defunctionalization, however, they are made globally unique.

The n+1 patterns (SucCase) and hacky-seeming things are to make code fast – please ask before “cleaning up” the representation.

Core/Evaluate.hs

This module contains the main evaluator for Idris. The evaluator is used both at the REPL and during type checking, where normalised terms need to be compared for equality.

A key datatype in the evaluator is a context. Contexts are mappings from global names to their values, but they are organized to make type-directed disambiguation quick. In particular, the main part of a name that a user might type is used as the key, and its values are maps from namespaces to actual values.

The datatype Def represents a definition in the global context. All global names map to this structure.

Type and Term are both synonyms for TT.

Datatypes are represented by a TyDecl with the appropriate NameType. A Function is a global constant term with an annotated type, Operator represents primitives implemented in Haskell, and CaseOp represents ordinary pattern-matching definitions. CaseOp has four versions for different purposes, and all are saved because that’s easiest.

CaseInfo: the tc_dictionary is because it’s a type class dictionary which makes totality checking easier.

The normalise* functions give different behaviors - but normalise is the most common.

normaliseC - “resolved” means with names converted to de Bruijn indices as appropriate.

normaliseAll - reduce everything, even if it’s non-total

normaliseTrace - special-purpose for debugging

simplify - reduce the things that are small - the list argument is the things to not reduce.

Core/Typecheck.hs

Standard stuff. Hopefully no changes are necessary.

Core/Elaborate.hs

Idris definitions are elaborated one by one and turned into the corresponding TT. This is done with a tactic language as an EDSL in the Elab monad (or Elab’ when there’s a custom state).

Lots of plumbing for errors.

All elaboration is relative to a global context.

The string in the pair returned by elaborate is log information.

See JFP paper, but the names don’t necessarily map to each other. The paper is the “idealized version” without logging, additional state, etc.

All the tactics take Raws, typechecking happens there.

claim (x : t) assumes a new x : t.

PLEASE TIDY THINGS UP!

proofSearch flag to try’ is whether the failure came from a human (so fail) or from a machine (so continue)

Idris-level syntax for providing alternatives explicitly: (| x, y, z |) try x, y, z in order, and take the first that succeeds.

Core/ProofState.hs

Core/Unify.hs

Deals with unification. Unification can reply with: - this works - this can never work - this will work if these other unification problems work out (eg unifying f x with 1)

match_unify: same thing as unification except it’s just matching name against name, term against term. x + y matches to 0 + y with x = 0. Used for <== syntax as well as type class resolution.

Idris/AbsSyntaxTree.hs

PTerm is the datatype of Idris syntax. P is for Program. Each PTerm turns into a TT term by applying a series of tactics.

IState is the major interpreter state. The global context is the tt_ctxt field.

Ctxt maps possibly ambiguous names to their referents.

Idris/ElabDecls.hs

This is where the actual elaboration from PTerm to TT happens.

Idris/ElabTerm.hs

build is the function that creates a Raw. All the “junk” is to deal with things like metavars and so forth. It has to remember what names are still to be defined, and it doesn’t yet know the type (filled in by unificaiton later). Also case expressions have to turn into top-level functions.

resolveTC is type class resolution.

Core Language Features

  • Full-spectrum dependent types
  • Strict evaluation (plus Lazy : Type -> Type type constructor for explicit laziness)
  • Lambda, Pi (forall), Let bindings
  • Pattern matching definitions
  • Export modifiers public, abstract, private
  • Function options partial, total
  • where clauses
  • “magic with”
  • Implicit arguments (in top level types)
  • “Bound” implicit arguments {n : Nat} -> {a : Type} -> Vect n a
  • “Unbound” implicit arguments — Vect n a is equivalent to the above in a type, n and a are implicitly bound. This applies to names beginning with a lower case letter in an argument position.
  • ‘Tactic’ implicit arguments, which are solved by running a tactic script or giving a default argument, rather than by unification.
  • Unit type (), empty type Void
  • Tuples (desugaring to nested pairs)
  • Dependent pair syntax (x : T ** P x) (there exists an x of type T such that P x)
  • Inline case expressions
  • Heterogeneous equality
  • do notation
  • Idiom brackets
  • Interfaces (like type classes), supporting default methods and dependencies between methods
  • rewrite prf in expr
  • Metavariables
  • Inline proof/tactic scripts
  • Implicit coercion
  • codata
  • Also Inf : Type -> Type type constructor for mixed data/codata. In fact codata is implemented by putting recursive arguments under Inf.
  • syntax rules for defining pattern and term syntactic sugar
  • these are used in the standard library to define if ... then ... else expressions and an Agda-style preorder reasoning syntax.
  • Uniqueness typing using the UniqueType universe.
  • Partial evaluation by %static argument annotations.
  • Error message reflection
  • Eliminators
  • Label types 'name
  • %logging n
  • %unifyLog

Language Extensions

Type Providers

Idris type providers are a way to get the type system to reflect observations about the world outside of Idris. Similarly to F# type providers, they cause effectful computations to run during type checking, returning information that the type checker can use when checking the rest of the program. While F# type providers are based on code generation, Idris type providers use only the ordinary execution semantics of Idris to generate the information.

A type provider is simply a term of type IO (Provider t), where Provider is a data type with constructors for a successful result and an error. The type t can be either Type (the type of types) or a concrete type. Then, a type provider p is invoked using the syntax %provide (x : t) with p. When the type checker encounters this line, the IO action p is executed. Then, the resulting term is extracted from the IO monad. If it is Provide y for some y : t, then x is bound to y for the remainder of typechecking and in the compiled code. If execution fails, a generic error is reported and type checking terminates. If the resulting term is Error e for some string e, then type checking fails and the error e is reported to the user.

Example Idris type providers can be seen at this repository. More detailed descriptions are available in David Christiansen’s WGP ‘13 paper and M.Sc. thesis.

Elaborator Reflection

The Idris elaborator is responsible for converting high-level Idris code into the core language. It is implemented as a kind of embedded tactic language in Haskell, where tactic scripts are written in an elaboration monad that provides error handling and a proof state. For details, see Edwin Brady’s 2013 paper in the Journal of Functional Programming.

Elaborator reflection makes the elaboration type as well as a selection of its tactics available to Idris code. This means that metaprograms written in Idris can have complete control over the elaboration process, generating arbitrary code, and they have access to all of the facilities available in the elaborator, such as higher-order unification, type checking, and emitting auxiliary definitions.

The Elaborator State

The elaborator state contains information about the ongoing elaboration process. In particular, it contains a goal type, which is to be filled by an under-construction proof term. The proof term can contain holes, each of which has a scope in which it is valid and a type. Some holes may additionally contain guesses, which can be substituted in the scope of the hole. The holes are tracked in a hole queue, and one of them is focused. In addition to the goal type, proof term, and holes, the elaborator state contains a collection of unsolved unification problems that can affect elaboration.

The elaborator state is not directly available to Idris programs. Instead, it is modified through the use of tactics, which are operations that affect the elaborator state. A tactic that returns a value of type a, potentially modifying the elaborator state, has type Elab a. The default tactics are all in the namespace Language.Reflection.Elab.Tactics.

Running Elaborator Scripts

On their own, tactics have no effect. The meta-operation %runElab script runs script in the current elaboration context. Before you can use %runElab, you will have to enable the language extension by adding %language ElabReflection in your file (or by passing -X ElabReflection to the idris executable from your command line). For example, the following script constructs the identity function at type Nat:

idNat : Nat -> Nat
idNat = %runElab (do intro `{{x}}
                     fill (Var `{{x}})
                     solve)

On the right-hand side, the Idris elaborator has the goal Nat -> Nat. When it encounters the %runElab directive, it fulfills this goal by running the provided script. The first tactic, intro, constructs a lambda that binds the name x. The name argument is optional because a default name can be taken from the function type. Now, the proof term is of the form \x : Nat => {hole}. The second tactic, fill, fills this hole with a guess, giving the term \x : Nat => {hole≈x}. Finally, the solve tactic instantiates the guess, giving the result \x : Nat => x.

Because elaborator scripts are ordinary Idris expressions, it is also possible to use them in multiple contexts. Note that there is nothing Nat-specific about the above script. We can generate identity functions at any concrete type using the same script:

mkId : Elab ()
mkId = do intro `{{x}}
          fill (Var `{{x}})
          solve

idNat : Nat -> Nat
idNat = %runElab mkId

idUnit : () -> ()
idUnit = %runElab mkId

idString : String -> String
idString = %runElab mkId

Interactively Building Elab Scripts

You can build an Elab script interactively at the REPL. Use the command :metavars, or :m for short, to list the available holes. Then, issue the :elab <hole> command at the REPL to enter the elaboration shell.

At the shell, you can enter proof tactics to alter the proof state. You can view the system-provided tactics prior to entering the shell by issuing the REPL command :browse Language.Reflection.Elab.Tactics. When you have discharged all goals, you can complete the proof using the :qed command and receive in return an elaboration script that fills the hole.

The interactive elaboration shell accepts a limited number of commands, including a subset of the commands understood by the normal Idris REPL as well as some elaboration-specific commands. It also supports the do-syntax, meaning you can write res <- command to bind the result of command to variable res.

General-purpose commands:

  • :eval <EXPR>, or :e <EXPR> for short, evaluates the provided expression and prints the result.
  • :type <EXPR>, or :t <EXPR> for short, prints the provided expression together with its type.
  • :search <TYPE> searches for definitions having the provided type.
  • :doc <NAME> searches for definitions with the provided name and prints their documentation.

Commands for viewing the proof state:

  • :state displays the current state of the term being constructed. It lists both other goals and the current goal.
  • :term displays the current proof term as well as its yet-to-be-filled holes.

Commands for manipulating the proof state:

  • :undo undoes the effects of the last tactic.
  • :abandon gives up on proving the current lemma and quits the elaboration shell.
  • :qed finishes the script and exits the elaboration shell. The shell will only accept this command once it reports, “No more goals.” On exit, it will print out the finished elaboration script for you to copy into your program.

Failure

Some tactics may fail. For example, intro will fail if the focused hole does not have a function type, solve will fail if the current hole does not contain a guess, and fill will fail if the term to be filled in has the wrong type. Scripts can also fail explicitly using the fail tactic.

To account for failure, there is an Alternative implementation for Elab. The <|> operator first tries the script to its left. If that script fails, any changes that it made to the state are undone and the right argument is executed. If the first argument succeeds, then the second argument is not executed.

Querying the Elaboration State

Elab includes operations to query the elaboration state, allowing scripts to use information about their environment to steer the elaboration process. The ordinary Idris bind syntax can be used to propagate this information. For example, a tactic that solves the current goal when it is the unit type might look like this:

triv : Elab ()
triv = do compute
          g <- getGoal
          case (snd g) of
            `(() : Type) => do fill `(() : ())
                               solve
            otherGoal => fail [ TermPart otherGoal
                              , TextPart "is not trivial"
                              ]

The tactic compute normalises the type of its goal with respect to the current context. While not strictly necessary, this allows triv to be used in contexts where the triviality of the goal is not immediately apparent. Then, getGoal is used, and its result is bound to g. Because it returns a pair consisting of the current goal’s name and type, we case-split on its second projection. If the goal type turns out to have been the unit type, we fill using the unit constructor and solve the goal. Otherwise, we fail with an error message informing the user that the current goal is not trivial.

Additionally, the elaboration state can be dumped into an error message with the debug tactic. A variant, debugMessage, allows arbitrary messages to be included with the state, allowing for a kind of “printf debugging” of elaboration scripts. The message format used by debugMessage is the same for errors produced by the error reflection mechanism, allowing the re-use of the Idris pretty-printer when rendering messages.

Changing the Global Context

Elab scripts can modify the global context during execution. Just as the Idris elaborator produces auxiliary definitions to implement features such as where-blocks and case expressions, user elaboration scripts may need to define functions. Furthermore, this allows Elab reflection to be used to implement features such as interface deriving. The operations declareType, defineFunction, and addImplementation allow Elab scripts to modify the global context.

Using Idris’s Features

The Idris compiler has a number of ways to automate the construction of terms. On its own, the Elab state and its interactions with the unifier allow implicits to be solved using unification. Additional operations use further features of Idris. In particular, resolveTC solves the current goal using interface resolution, search invokes the proof search mechanism, and sourceLocation finds the context in the original file at which the elaboration script is invoked.

Recursive Elaboration

The elaboration mechanism can be invoked recursively using the runElab tactic. This tactic takes a goal type and an elaboration script as arguments and runs the script in a fresh lexical environment to create an inhabitant of the provided goal type. This is primarily useful for code generation, particularly for generating pattern-matching clauses, where variable scope needs to be one that isn’t the present local context.

Learn More

While this documentation is still incomplete, elaboration reflection works in Idris today. As you wait for the completion of the documentation, the list of built-in tactics can be obtained using the :browse command in an Idris REPL or the corresponding feature in one of the graphical IDE clients to explore the Language.Reflection.Elab.Tactics namespace. All of the built-in tactics contain documentation strings.

Static Arguments and Partial Evaluation

As of version 0.9.15, Idris has support for partial evaluation of statically known arguments. This involves creating specialised versions of functions with arguments annotated as %static.

(This is an implementation of the partial evaluator described in this ICFP 2010 paper. Please refer to this for more precise definitions of what follows.)

Partial evaluation is switched off by default since Idris 1.0. It can be enabled with the --partial-eval flag.

Introductory Example

Consider the power function over natural numbers, defined as follows (we’ll call it my_pow since pow already exists in the Prelude):

my_pow : Nat -> Nat -> Nat
my_pow x Z = 1
my_pow x (S k) = mult x (my_pow x k)

This is implemented by recursion on the second argument, and we can evaluate the definition further if the second argument is known, even if the first isn’t. For example, we can build a function at the REPL to cube a number as follows:

*pow> \x => my_pow x 3
\x => mult x (mult x (mult x 1)) : Nat -> Nat
*pow> it 3
27 : Nat

Note that in the resulting function the recursion has been eliminated, since my_pow is implemented by recursion on the known argument. We have no such luck if the first argument is known and the second isn’t:

*pow> \x => my_pow 2 x
\x => my_pow 2 x : Nat -> Nat

Now, consider the following definition which calculates x^2 + 1:

powFn : Nat -> Nat
powFn x = plus (my_pow x (S (S Z))) (S Z)

Since the second argument to my_pow here is statically known, it seems a shame to have to make the recursive calls every time. However, Idris will not in general inline recursive definitions, in particular since they may diverge or duplicate work without some deeper analysis.

We can, however, give Idris some hints that here we really would like to create a specialised version of my_pow.

Automatic specialisation of pow

The trick is to mark the statically known arguments with the %static flag:

my_pow : Nat -> %static Nat -> Nat
my_pow k Z = 1
my_pow k (S j) = mult k (my_pow k j)

When an argument is annotated in this way, Idris will try to create a specialised version whenever it accounts a call with a concrete value (i.e. a constant, constructor form, or globally defined function) in a %static position. If my_pow is defined this way, and powFn defined as above, we can see the effect by typing :printdef powFn at the REPL:

*pow> :printdef powFn
powFn : Nat -> Nat
powFn x = plus (PE_my_pow_3f3e5ad8 x) 1

What is this mysterious PE_my_pow_3f3e5ad8? It’s a specialised power function where the statically known argument has been specialised away. The name is generated from a hash of the specialised arguments, and we can see its definition with :printdef too:

*petest> :printdef PE_my_pow_3f3e5ad8
PE_my_pow_3f3e5ad8 : Nat -> Nat
PE_my_pow_3f3e5ad8 (0arg) = mult (0arg) (mult (0arg) (PE_fromInteger_7ba9767f 1))

The (0arg) is an internal argument name (programmers can’t give variable names beginning with a digit after all). Notice also that there is a specialised version of fromInteger for Nats, since type class dictionaries are themselves a particularly common case of statically known arguments!

Specialising Type Classes

Type class dictionaries are very often statically known, so Idris automatically marks any type class constraint as %static and builds specialised versions of top level functions where the class is instantiated. For example, given:

calc : Int -> Int
calc x = (x * x) + x

If we print this definition, we’ll see a specialised version of + is used:

*petest> :printdef calc
calc : Int -> Int
calc x = PE_+_954510b4 (PE_*_954510b4 x x) x

More interestingly, consider vadd which adds corresponding elements in a vector of anything numeric:

vadd : Num a => Vect n a -> Vect n a -> Vect n a
vadd [] [] = []
vadd (x :: xs) (y :: ys) = x + y :: vadd xs ys

If we use this on something concrete as follows…

test : List Int -> List Int
test xs = let xs' = fromList xs in
              toList $ vadd xs' xs'

…then in fact, we get a specialised version of vadd in the definition of test, and indeed the specialised version of toList:

test : List Int -> List Int
test xs = let xs' = fromList xs
          in PE_toList_888ae67 (PE_vadd_33f98d3d xs' xs')

Here’s the specialised version of vadd:

PE_vadd_33f98d3d : Vect n Int -> Vect n Int -> Vect n Int
PE_vadd_33f98d3d [] [] = []
PE_vadd_33f98d3d (x :: xs) (y :: ys) = ((PE_+_954510b4 x y) ::
                                       (PE_vadd_33f98d3d xs ys))

Note that the recursive structure has been preserved, and the recursive call to vadd has been replaced with a recursive call to the specialised version. We’ve also got the same specialised version of + that we had above in calc.

Specialising Higher Order Functions

Another case where partial evaluation can be useful is in automatically making specialised versions of higher order functions. Unlike type class dictionaries, this is not done automatically, but we might consider writing map as follows:

my_map : %static (a -> b) -> List a -> List b
my_map f [] = []
my_map f (x :: xs) = f x :: my_map f xs

Then using my_map will yield specialised versions, for example to double every value in a list of Ints we could write:

doubleAll : List Int -> List Int
doubleAll xs = my_map (*2) xs

This would yield a specialised version of my_map, used in doubleAll as follows:

doubleAll : List Int -> List Int
doubleAll xs = PE_my_map_1f8225c4 xs

PE_my_map_1f8225c4 : List Int -> List Int
PE_my_map_1f8225c4 [] = []
PE_my_map_1f8225c4 (x :: xs) = ((PE_*_954510b4 x 2) :: (PE_my_map_1f8225c4 xs))

Specialising Interpreters

A particularly useful situation where partial evaluation becomes effective is in defining an interpreter for a well-typed expression language, defined as follows (see the Idris tutorial, section 4 for more details on how this works):

data Expr : Vect n Ty -> Ty -> Type where
     Var : HasType i gamma t -> Expr gamma t
     Val : (x : Int) -> Expr gamma TyInt
     Lam : Expr (a :: gamma) t -> Expr gamma (TyFun a t)
     App : Lazy (Expr gamma (TyFun a t)) -> Expr gamma a -> Expr gamma t
     Op  : (interpTy a -> interpTy b -> interpTy c) -> Expr gamma a -> Expr gamma
           Expr gamma c
     If  : Expr gamma TyBool -> Expr gamma a -> Expr gamma a -> Expr gamma a

dsl expr
    lambda = Lam
    variable = Var
    index_first = stop
    index_next = pop

We can write a couple of test functions in this language as follows, using the dsl notation to overload lambdas; first a function which multiplies two inputs:

eMult : Expr gamma (TyFun TyInt (TyFun TyInt TyInt))
eMult = expr (\x, y => Op (*) x y)

Then, a function which calculates the factorial of its input:

eFac : Expr gamma (TyFun TyInt TyInt)
eFac = expr (\x => If (Op (==) x (Val 0))
            (Val 1)
            (App (App eMult (App eFac (Op (-) x (Val 1)))) x))

The interpreter’s type is written as follows, marking the expression to be evaluated as %static:

interp : (env : Env gamma) -> %static (e : Expr gamma t) -> interpTy t

This means that if we write an Idris program to calculate a factorial by calling interp on eFac, the resulting definition will be specialised, partially evaluating away the interpreter:

runFac : Int -> Int
runFac x = interp [] eFac x

We can see that the call to interp has been partially evaluated away as follows:

*interp> :printdef runFac
runFac : Int -> Int
runFac x = PE_interp_ed1429e [] x

If we look at PE_interp_ed1429e we’ll see that it follows exactly the structur of eFac, with the interpreter evaluated away:

*interp> :printdef PE_interp_ed1429e
PE_interp_ed1429e : Env gamma -> Int -> Int
PE_interp_ed1429e (3arg) = \x =>
                             boolElim (x == 0)
                                      (Delay 1)
                                      (Delay (PE_interp_b5c2d0ff (x :: (3arg))
                                                                 (PE_interp_ed1429e (x :: (3arg)) (x - 1)) x))

For the sake of readability, I have simplified this slightly: what you will really see also includes specialised versions of ==, - and fromInteger. Note that PE_interp_ed1429e, which represents eFac has become a recursive function following the structure of eFac. There is also a call to PE_interp_b5c2d0ff which is a specialised interpeter for eMult.

These definitions arise because the partial evaluator will only specialise a definition by a specific concrete argument once, then it is cached for future use. So any future applications of interp on eFac will also be translated to PE_interp_ed1429e.

The specialised version of eMult, without any simplification for readability, is:

PE_interp_b5c2d0ff : Env gamma -> Int -> Int -> Int
PE_interp_b5c2d0ff (3arg) = \x => \x1 => PE_*_954510b4 x x1

Miscellaneous

Things we have yet to classify, or are two small to justify their own page.

The Unifier Log

If you’re having a hard time debugging why the unifier won’t accept something (often while debugging the compiler itself), try applying the special operator %unifyLog to the expression in question. This will cause the type checker to spit out all sorts of informative messages.

Namespaces and type-directed disambiguation

Names can be defined in separate namespaces, and disambiguated by type. An expression with NAME EXPR will privilege the namespace NAME in the expression EXPR. For example:

Idris> with List [[1,2],[3,4],[5,6]]
[[1, 2], [3, 4], [5, 6]] : List (List Integer)

Idris> with Vect [[1,2],[3,4],[5,6]]
[[1, 2], [3, 4], [5, 6]] : Vect 3 (Vect 2 Integer)

Idris> [[1,2],[3,4],[5,6]]
Can't disambiguate name: Prelude.List.::, Prelude.Stream.::, Prelude.Vect.::

Alternatives

The syntax (| option1, option2, option3, ... |) type checks each of the options in turn until one of them works. This is used, for example, when translating integer literals.

Idris> the Nat (| "foo", Z, (-3) |)
0 : Nat

This can also be used to give simple automated proofs, for example: trying some constructors of proofs.

syntax Trivial = (| Oh, Refl |)

Totality checking assertions

All definitions are checked for coverage (i.e. all well-typed applications are handled) and either for termination (i.e. all well-typed applications will eventually produce an answer) or, if returning codata, for productivity (in practice, all recursive calls are constructor guarded).

Obviously, termination checking is undecidable. In practice, the termination checker looks for size change - every cycle of recursive calls must have a decreasing argument, such as a recursive argument of a strictly positive data type.

There are two built-in functions which can be used to give the totality checker a hint:

  • assert_total x asserts that the expression x is terminating and covering, even if the totality checker cannot tell. This can be used for example if x uses a function which does not cover all inputs, but the caller knows that the specific input is covered.
  • assert_smaller p x asserts that the expression x is structurally smaller than the pattern p.

For example, the following function is not checked as total:

qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs) = qsort (filter (<= x) xs) ++ (x :: qsort (filter (>= x) xs)))

This is because the checker cannot tell that filter will always produce a value smaller than the pattern x :: xs for the recursive call to qsort. We can assert that this will always be true as follows:

total
qsort : Ord a => List a -> List a
qsort [] = []
qsort (x :: xs) = qsort (assert_smaller (x :: xs) (filter (<= x) xs)) ++
                  (x :: qsort (assert_smaller (x :: xs) (filter (>= x) xs))))

Preorder reasoning

This syntax is defined in the module Syntax.PreorderReasoning in the base package. It provides a syntax for composing proofs of reflexive-transitive relations, using overloadable functions called step and qed. This module also defines step and qed functions allowing the syntax to be used for demonstrating equality. Here is an example:

import Syntax.PreorderReasoning
multThree : (a, b, c : Nat) -> a * b * c = c * a * b
multThree a b c =
  (a * b * c) ={ sym (multAssociative a b c) }=
  (a * (b * c)) ={ cong (multCommutative b c) }=
  (a * (c * b)) ={ multAssociative a c b }=
  (a * c * b) ={ cong {f = (* b)} (multCommutative a c) }=
  (c * a * b) QED

Note that the parentheses are required – only a simple expression can be on the left of ={ }= or QED. Also, when using preorder reasoning syntax to prove things about equality, remember that you can only relate the entire expression, not subexpressions. This might occasionally require the use of cong.

Finally, although equality is the most obvious application of preorder reasoning, it can be used for any reflexive-transitive relation. Something like step1 ={ just1 }= step2 ={ just2 }= end QED is translated to (step step1 just1 (step step2 just2 (qed end))), selecting the appropriate definitions of step and qed through the normal disambiguation process. The standard library, for example, also contains an implementation of preorder reasoning on isomorphisms.

Pattern matching on Implicit Arguments

Pattern matching is only allowed on implicit arguments when they are referred by name, e.g.

foo : {n : Nat} -> Nat
foo {n = Z} = Z
foo {n = S k} = k

or

foo : {n : Nat} -> Nat
foo {n = n} = n

The latter could be shortened to the following:

foo : {n : Nat} -> Nat
foo {n} = n

That is, {x} behaves like {x=x}.

Existence of an implementation

In order to show that an implementation of some interface is defined for some type, one could use the %implementation keyword:

foo : Num Nat
foo = %implementation

‘match’ application

ty <== name applies the function name in such a way that it has the type ty, by matching ty against the function’s type. This can be used in proofs, for example:

plus_comm : (n : Nat) -> (m : Nat) -> (n + m = m + n)
-- Base case
(Z + m = m + Z) <== plus_comm =
    rewrite ((m + Z = m) <== plusZeroRightNeutral) ==>
            (Z + m = m) in Refl

-- Step case
(S k + m = m + S k) <== plus_comm =
    rewrite ((k + m = m + k) <== plus_comm) in
    rewrite ((S (m + k) = m + S k) <== plusSuccRightSucc) in
        Refl

Reflection

Including %reflection functions and quoteGoal x by fn in t, which applies fn to the expected type of the current expression, and puts the result in x which is in scope when elaborating t.

Bash Completion

Use of optparse-applicative allows Idris to support Bash completion. You can obtain the completion script for Idris using the following command:

idris --bash-completion-script `which idris`

To enable completion for the lifetime of your current session, run the following command:

source <(idris --bash-completion-script `which idris`)

To enable completion permanently you must either:

  • Modify your bash init script with the above command.
  • Add the completion script to the appropriate bash_completion.d/ folder on your machine.

Tutorials on the Idris Language

Tutorials submitted by community members.

注解

The documentation for Idris has been published under the Creative Commons CC0 License. As such to the extent possible under law, The Idris Community has waived all copyright and related or neighboring rights to Documentation for Idris.

More information concerning the CC0 can be found online at: http://creativecommons.org/publicdomain/zero/1.0/

Type Providers in Idris

Type providers in Idris are simple enough, but there are a few caveats to using them that it would be worthwhile to go through the basic steps. We also go over foreign functions, because these will often be used with type providers.

The use case

First, let’s talk about why we might want type providers. There are a number of reasons to use them and there are other examples available around the net, but in this tutorial we’ll be using them to port C’s struct stat to Idris.

Why do we need type providers? Well, Idris’s FFI needs to know the types of the things it passes to and from C, but the fields of a struct stat are implementation-dependent types that cannot be relied upon. We don’t just want to hard-code these types into our program… so we’ll use a type provider to find them at compile time!

A simple example

First, let’s go over a basic usage of type providers, because foreign functions can be confusing but it’s important to remember that providers themselves are simple.

A type provider is simply an IO action that returns a value of this type:

data Provider a = Provide a | Error String

Looks familiar? Provider is just Either a String, given a slightly more descriptive name.

Remember though, type providers we use in our program must be IO actions. Let’s write a simple one now:

module Provider
-- Asks nicely for the user to supply the size of C's size_t type on this
-- machine
getSizeT : IO (Provider Int)
getSizeT = do
  putStrLn "I'm sorry, I don't know how big size_t is. Can you tell me, in bytes?"
  resp <- getLine
  case readInt resp of
     Just sizeTSize => pure (Provide sizeTSize)
     Nothing => pure (Error "I'm sorry, I don't understand.")
-- the readInt function is left as an exercise

We assume that whoever’s compiling the library knows the size of size_t, so we’ll just ask them! (Don’t worry, we’ll get it ourselves later.) Then, if their response can be converted to an integer, we present Provide sizeTSize as the result of our IO action; or if it can’t, we signal a failure. (This will then become a compile-time error.)

Now we can use this IO action as a type provider:

module Main
-- to gain access to the IO action we're using as a provider
import Provider

-- TypeProviders is an extension, so we'll enable it
%language TypeProviders

-- And finally, use the provider!
-- Note that the parentheses are mandatory.
%provide (sizeTSize : Int) with getSizeT

-- From now on it's just a normal program where `sizeTSize` is available
-- as a top-level constant
main : IO ()
main = do
  putStr "Look! I figured out how big size_t is! It's "
  putStr (show sizeTSize)
  putStr " bytes!"

Yay! We… asked the user something at compile time? That’s not very good, actually. Our library is going to be difficult to compile! This is hardly a step up from having them edit in the size of size_t themselves!

Don’t worry, there’s a better way.

Foreign Functions

It’s actually pretty easy to write a C function that figures out the size of size_t:

int sizeof_size_t() { return sizeof(size_t); }

(Why an int and not a size_t? The FFI needs to know how to receive the return value of this function and translate it into an Idris value. If we knew how to do this for values of C type size_t, we wouldn’t need to write this function at all! If we really wanted to be safe from overflow, we could use an array of multiple integers, but the SIZE of size_t is never going to be a 65535 byte integer.)

So now we can get the size of size_t as long as we’re in C code. We’d like to be able to use this from Idris. Can we do this? It turns out we can.

foreign

With foreign, we can turn a C function into an IO action. It works like this:

getSizeT : IO Int
getSizeT = foreign FFI_C "sizeof_size_t" (IO Int)

Pretty simple. foreign takes a specification of what function it needs to call and that function’s return type.

Running foreign functions

This is all well and good for writing code that will typecheck. To actually run the code, we’ll need to do just a bit more work. Exactly what we need to do depends on whether we want to interpret or compile our code.

In the interpreter

If we want to call our foreign functions from interpreted code (such as the REPL or a type provider), we need to dynamically link a library containing the symbols we need. This is pretty easy to do with the %dynamic directive:

%dynamic "./filename.so"

Note that the leading “./” is important: currently, the string you provide is interpreted as by dlopen(), which on Unix does not search in the current directory by default. If you use the “./”, the library will be searched for in the directory from which you run idris (not the location of the file you’re running!). Of course, if you’re using functions from an installed library rather than something you wrote yourself, the “./” is not necessary.

In an executable

If we want to run our code from an executable, we can statically link instead. We’ll use the %include and %link directives:

%include C "filename.h"
%link C "filename.o"

Note the extra argument to the directive! We specify that we’re linking a C header and library. Also, unlike %dynamic, these directives search in the current directory by default. (That is, the directory from which we run idris.)

Putting it all together

So, at the beginning of this article I said we’d use type providers to port struct stat to Idris. The relevant part is just translating all the mysterious typedef’d C types into Idris types, and that’s what we’ll do here.

First, let’s write a C file containing functions that we’ll bind to.

/* stattypes.c */
#include <sys/stat.h>

int sizeof_dev_t() { return sizeof(dev_t); }
int sizeof_ino_t() { return sizeof(ino_t); }
/* lots more functions like this */

Next, an Idris file to define our providers:

-- Providers.idr
module Providers

%dynamic "./stattypes.so"
%access export

sizeOfDevT : IO Int
sizeOfDevT = foreign FFI_C "sizeof_dev_t" (IO Int)
{- lots of similar functions -}

-- Indicates how many bits are used to represent various system
-- stat types.
public export
data BitWidth = B8 | B16 | B32 | B64

Show BitWidth where
  show B8 = "8 bits"
  show B16 = "16 bits"
  show B32 = "32 bits"
  show B64 = "64 bits"

-- Now we have an integer, but we want a Provider BitWidth.
-- Since our sizeOf* functions are ordinary IO actions, we
-- can just map over them.
bytesToType : Int -> Provider BitWidth
bytesToType 1 = Provide B8  -- "8 bit value"
bytesToType 2 = Provide B16
bytesToType 4 = Provide B32
bytesToType 8 = Provide B64
bytesToType _ = Error "Unrecognised integral type."

getDevT : IO (Provider BitWidth)
getDevT = map bytesToType sizeOfDevT
{- lots of similar functions -}

Finally, we’ll write one more idris file where we use the type providers:

-- Main.idr
module Main
import Providers
%language TypeProviders
%provide (DevTBitWidth : BitWidth) with getDevT

-- We can now use DevTBitWidth in our program!
main : IO ()
main = putStrLn $ "size of dev_t: " ++ show DevTBitWidth

The Interactive Theorem Prover

This short guide contributed by a community member illustrates how to prove associativity of addition on Nat using the interactive theorem prover.

First we define a module Foo.idr

module Foo

plusAssoc : plus n (plus m o) = plus (plus n m) o
plusAssoc = ?rhs

We wish to perform induction on n. First we load the file into the Idris REPL as follows:

$ idris Foo.idr

We will be given the following prompt, in future releases the version string will differ:

     ____    __     _
    /  _/___/ /____(_)____
    / // __  / ___/ / ___/     Version 0.9.18.1
  _/ // /_/ / /  / (__  )      http://www.idris-lang.org/
 /___/\__,_/_/  /_/____/       Type :? for help

Idris is free software with ABSOLUTELY NO WARRANTY.
For details type :warranty.
Type checking ./Foo.idr
Metavariables: Foo.rhs
*Foo>

Explore the Context

We start the interactive session by asking Idris to prove the hole rhs using the command :p rhs. Idris by default will show us the initial context. This looks as follows:

*Foo> :p rhs
----------                 Goal:                  ----------
{ hole 0 }:
 (n : Nat) ->
 (m : Nat) ->
 (o : Nat) ->
 plus n (plus m o) = plus (plus n m) o

Application of Intros

We first apply the intros tactic:

-Foo.rhs> intros
----------              Other goals:              ----------
{ hole 2 }
{ hole 1 }
{ hole 0 }
----------              Assumptions:              ----------
 n : Nat
 m : Nat
 o : Nat
----------                 Goal:                  ----------
{ hole 3 }:
 plus n (plus m o) = plus (plus n m) o

Induction on n

Then apply induction on to n:

-Foo.rhs> induction n
----------              Other goals:              ----------
elim_S0
{ hole 2 }
{ hole 1 }
{ hole 0 }
----------              Assumptions:              ----------
 n : Nat
 m : Nat
 o : Nat
----------                 Goal:                  ----------
elim_Z0:
 plus Z (plus m o) = plus (plus Z m) o

Compute

-Foo.rhs> compute
----------              Other goals:              ----------
elim_S0
{ hole 2 }
{ hole 1 }
{ hole 0 }
----------              Assumptions:              ----------
 n : Nat
 m : Nat
 o : Nat
----------                 Goal:                  ----------
elim_Z0:
 plus m o = plus m o

Trivial

-Foo.rhs> trivial
----------              Other goals:              ----------
{ hole 2 }
{ hole 1 }
{ hole 0 }
----------              Assumptions:              ----------
 n : Nat
 m : Nat
 o : Nat
----------                 Goal:                  ----------
elim_S0:
 (n__0 : Nat) ->
 (plus n__0 (plus m o) = plus (plus n__0 m) o) ->
 plus (S n__0) (plus m o) = plus (plus (S n__0) m) o

Intros

-Foo.rhs> intros
----------              Other goals:              ----------
{ hole 4 }
elim_S0
{ hole 2 }
{ hole 1 }
{ hole 0 }
----------              Assumptions:              ----------
 n : Nat
 m : Nat
 o : Nat
 n__0 : Nat
 ihn__0 : plus n__0 (plus m o) = plus (plus n__0 m) o
----------                 Goal:                  ----------
{ hole 5 }:
 plus (S n__0) (plus m o) = plus (plus (S n__0) m) o

Compute

-Foo.rhs> compute
----------              Other goals:              ----------
{ hole 4 }
elim_S0
{ hole 2 }
{ hole 1 }
{ hole 0 }
----------              Assumptions:              ----------
 n : Nat
 m : Nat
 o : Nat
 n__0 : Nat
 ihn__0 : plus n__0 (plus m o) = plus (plus n__0 m) o
----------                 Goal:                  ----------
{ hole 5 }:
 S (plus n__0 (plus m o)) = S (plus (plus n__0 m) o)

Rewrite

-Foo.rhs> rewrite ihn__0
----------              Other goals:              ----------
{ hole 5 }
{ hole 4 }
elim_S0
{ hole 2 }
{ hole 1 }
{ hole 0 }
----------              Assumptions:              ----------
 n : Nat
 m : Nat
 o : Nat
 n__0 : Nat
 ihn__0 : plus n__0 (plus m o) = plus (plus n__0 m) o
----------                 Goal:                  ----------
{ hole 6 }:
 S (plus n__0 (plus m o)) = S (plus n__0 (plus m o))

Trivial

-Foo.rhs> trivial
rhs: No more goals.
-Foo.rhs> qed
Proof completed!
Foo.rhs = proof
  intros
  induction n
  compute
  trivial
  intros
  compute
  rewrite ihn__0
  trivial

Two goals were created: one for Z and one for S. Here we have proven associativity, and assembled a tactic based proof script. This proof script can be added to Foo.idr.