Lately I've been investigating what it would take to provide some sort of type analysis to the scheme programming language. In the process I wanted to learn a bit about how the Guile Scheme type system works. Naturally my first step was to ask the kind folks over at the #guile irc channel for some pointers. Along the way someone over there was interested as well, and asked me to write up what I found. It's been a while since I've blogged, so here we are.
This topic really digs into the internals of Guile's compiler, which is of course a very large topic. I'll also try my best to explain the bits that a reader absolutely must know to understand the main points. I'll try to only assume a reasonably proficient understanding of programming (in any language) and link to resources that provide more in depth background on the Guile compiler itself. I don't think a reader must know the Scheme programming language in order to understand what's happening here, but we will be using the scheme language throughout the post. Here is a quick mental mapping that should get you through.
// A top level variable
var foo = "foo"
// A top level function definition
function foo(bar, baz) {
return bar + baz
}
// A function call
foo(1, 2)
;; A top level variable
(define foo "foo")
;; A top level function (procedure) definition
(define (foo bar baz)
(+ bar baz))
;; A function call
(foo 1 2)
One last note before we get started. I'm almost going to walk through all the steps that I used to learn this topic. I say almost because along the way I went down many paths that proved to be useless or just plain wrong. Here I only show the questions that led to useful answers. With that, let's dig in!
In order to understand the details here, it is incredibly helpful to understand the Guile Compiler Tower. So best to go read that page and come back (don't worry it's not long). It is also pretty helpful to read the entire Compiling to the Virtual Machine section. But don't worry if you don't understand everything, or just don't have the time. We'll cover the important bits here. In particular the Intermediate Language (IL) that we are interested in here is the Continuation Passing Style IL, as that is the compiler pass that does type analysis.
Without getting all Compiler Theory™ on you, for our purposes it will probably suffice to consider CPS as a big goto table. Yes, goto is considered harmful, but as a tool for a compiler back-end it is actually pretty practical.
Guile is great, it exposes the entire compiler to the user, so we can start exploring right away. For our example let's consider a simple "hello world" program.
(define (main)
(display "Hello world!\n"))
In order to compile this to CPS we can just use the compile procedure in the Guile repl.
scheme@(guile-user)> (define hello-world
'(define (main)
(display "Hello world!\n")))
scheme@(guile-user)> (compile hello-world #:to 'cps)
$1 = #<intmap 0-18>
wtf is that #<intmap 0-21>
thing? Well, like I said, this is essentially a
big goto lookup table, so Guile has an efficient data structure for that: the
intmap found in the CPS Soup
section of the manual. Now, let's import the intmap module so that we can
inspect what the compiler just gave us. At the Guile repl again we call the
use-modules
procedure.
scheme@(guile-user)> (use-modules (language cps intmap))
Now we can start to poke at our intmap, but it is probably also useful to
assign a sensible name to this intmap, so let's do that now. We'll just use the
identifier that the repl automatically assigned to the result of our
expression (namely $1
).
scheme@(guile-user)> (define hello-world-cps $1)
An intmap is essentially a mapping of small integers to CPS values. Let's see
what the first value (at index 0
) is, for that we'll use intmap-ref
.
intmap-ref
is very similar to list-ref
and vector-ref
.
scheme@(guile-user)> (intmap-ref hello-world-cps 0)
$2 = #<cps (kfun () 0 1 18)>
Here we can see that we got some sort of CPS value out. It is a cps record of
type $kfun
. You can go find all the possible CPS record types in the
CPS in Guile
section of the manual, but (spoiler alert $kfun
defines a function entry).
Looking at that manual page we see that $kfun
has five arguments.
src This is the source location of the original scheme expression. First of all this property isn't printed. And second of all, we didn't compile our program from a file anyway, so this would be pretty useless for us. So we'll just ignore it.
meta
The meta
field is some sort of association list describing the
properties of the procedure (function). We aren't entirely sure
what this is yet. But none of our functions have arguments, and
there are no variables, so it should come as no surprise that it is
empty.
self
The documentation for the self
field says:
self is a variable bound to the procedure being called, and which may be used for self-references
So it should make sense that this value is 0
. Remember we are looking at
the CPS value at index 0
in our intmap.
tail
The documentation for the tail
field says:
tail is the label of the $ktail for this function, corresponding to the function’s tail continuation
So it seems that this field should be the "goto" label for when we are done with this function and are going to return.
clause
The documentation for the clause
field says:
clause is the label of the first $kclause for the first case-lambda clause in the function, or otherwise #f
So this should be the "goto" label for the first expression of the function body.
Well, that was fun. If we look back at what the repl printed initially this all lines up pretty nicely.
'()
0
1
18
Of course we could do the same for all indices in the intmap from 0-18, but
that sounds annoying. Let's instead write a function to do it for us! First
we'll need another module (language cps utils)
. This will give us two
very useful functions intmap-keys
and intmap-map
, along with a bunch of
other goodies that we won't need today.
scheme@(guile-user)> (use-modules (language cps utils))
Now let's write our function, doesn't have to be too special.
scheme@(guile-user)> (define (print-intmap intmap)
(intmap-map (lambda entry
(format #t "~a\n" entry))
intmap))
And now we're ready to see what our intmap contains!
scheme@(guile-user)> (print-intmap hello-world-cps)
(0 #<cps (kfun () 0 1 18)>)
(1 #<cps (ktail)>)
(2 #<cps (kargs (val) (3) (continue 1 (values 3)))>)
(3 #<cps (kargs () () (continue 2 (unspecified)))>)
(4 #<cps (kargs (arg) (4) (continue 3 (primcall scm-set!/immediate (box . 1) 2 4)))>)
(5 #<cps (ktail)>)
(6 #<cps (kargs (arg) (7) (continue 5 (call 6 7)))>)
(7 #<cps (kargs (arg) (6) (continue 6 (const "Hello world!\n")))>)
(8 #<cps (kargs (box) (8) (continue 7 (primcall scm-ref/immediate (box . 1) 8)))>)
(9 #<cps (kargs () () (continue 8 (primcall cached-toplevel-box (0 display #t))))>)
(10 #<cps (kclause (() () #f () #f) 9)>)
(11 #<cps (kfun ((name . main)) 5 5 10)>)
(12 #<cps (kargs () () (continue 4 (fun 11)))>)
(13 #<cps (kargs (module) (9) (continue 12 (primcall cache-current-module! (0) 9)))>)
(14 #<cps (kargs (main) (2) (continue 13 (primcall current-module #f)))>)
(15 #<cps (kargs (arg) (10) (continue 14 (primcall define! #f 1 10)))>)
(16 #<cps (kargs (mod) (1) (continue 15 (const main)))>)
(17 #<cps (kargs () () (continue 16 (primcall current-module #f)))>)
(18 #<cps (kclause (() () #f () #f) 17)>)
$3 = #<intmap 0-18>
Whew! That's a lot of stuff! That's because there is a lot of implicit control flow in the higher level scheme language compared to the CPS IL that makes all of that control flow explicit. Let's dissect this.
We've already seen that our compiled CPS starts at index 0 with some function, and that the first expression in that function is at index 18. Nearly every expression I've compiled follows this format. The entry to the program is at index 0, and then there is some boilerplate at the very end of the intmap. Here we see that from indices 18-17 Guile is setting up the current module environment.
I'm afraid we can't tell much about the first field of the $kclause
record at
index 18. Like the meta
field of the first kfun
record we encountered, it
is empty, and I suppose we also kind of expect it to be empty. Anyway, here is
what the documentation says:
CPS Continuation: $kclause arity cont alternate
A clause of a function with a given arity. Applications of a function with a compatible set of actual arguments will continue to the continuation labelled cont, a $kargs instance representing the clause body. If the arguments are incompatible, control proceeds to alternate, which is a $kclause for the next clause, or #f if there is no next clause.
At any rate it is a sure bet that after this $kclause
we jump to index 17
where we encounter a $kargs
record. Here's what the documentation has to say
about that:
CPS Continuation: $kargs names vars term
Bind the incoming values to the variables vars, with original names names, and then evaluate term.
So this seems like it is pretty much let
, and that makes sense, even though
ours is empty. The reason for that is that the current-module
$primcall
record takes no arguments. But don't worry, next we jump to index 16 where we
see a $kargs
record that does have a parameter. This one is called mod
and it is the result of the current-module
call we just stepped through.
It's assigned to variable 1
. We aren't quite sure yet why this variable is
associated to index 1 though. It could be an arbitrary number, or it could
relate back to our CPS index 1. So for now just remember that the
current-module is located at variable index 1. From here we continue to index
15 with the constant symbol 'main
. At index 15 we see that we now call the
define!
primitive with the variable at index 1 ((current-module)
),
and the variable at index 10 ((const main)
). As far as we can tell this
places the main
symbol in our module. For the documentation on these
primitive calls you'll have to jump over to the
Intrinsic Call Instructions
section of the manual. Then we jump through indices 14-13 which seems to be
more environment setup, though I can't seem to find the documentation for this.
Finally once we jump through index 12 we reach our main function. He we jump to
index 4 with the value of (fun 11)
. If we quickly scan up we can see our
function body starting at index 11 and ending with the $ktail
record at index 5.
Then from indices 4-1 we see the ending boilerplate for the current module. We'll skip
over this part.
As we've seen, our function body is located across indices 11-5.
Here we see something slightly different though. The
meta
field of our $kfun
at index 11 actually has some values! In our
case it is just the name of the function because it takes no arguments. Then we
see that the self
field is actually index 5 and the tail
field is also at
index 5, and the clause
field is at index 10 as expected. The self and tail
fields being the same seems to indicate that this function cannot recurse,
however that may be a tenuous assumption. At any rate, let's move on to our
function body at index 10.
Again we see a relatively empty $kclause
that forwards us to
index 9 where see another empty $kargs
record that continues to index 8
assigning the display
box that we just evaluated into variable 8, named box
and continues to index 7 by unboxing our display
procedure. At index 7 we see
that we store our unboxed display
procedure at variable 6 and continue
to index 6 with our "Hello world!\n"
constant string. At index 6 we see that
our string constant is assigned to variable 7. Then we continue to index 5 with
the result of (call 6 7)
. Our display
function is stored in variable 6, and
our constant string is stored in variable 7, so you can think of this as
(call display "Hello world!\n")
. From there we continue to our procedure's
$ktail
record which means this is the entire body. And from there we jump
back to index 4 to finish up the program's entry $kfun
.
After a few shuffling around of arguments we finally arrive at the $ktail
of
the entire program at index 1. Remember back when we started our journey that
the tail
field of the entry function was located at index 1? Well we're done
now! Yay!
Whew, yes. Let's figure out what Guile thinks of our types shall we? For this
we will need to deviate from the manual and look at some code 😱. Here we will
want to dig through the Guile code-base for something that looks like it does
some type inference. Lucky for you I did just that and found the module
(language cps types)
. You can find it in the Guile source tree at the path
module/language/cps/types.scm
. Yes, it does sound like a pain to go find the
Guile source code and open this file in a text editor. So here is a snippet
from the top of the file that gives a quick overview.
;;; Commentary:
;;;
;;; Type analysis computes the possible types and ranges that values may
;;; have at all program positions. This analysis can help to prove that
;;; a primcall has no side-effects, if its arguments have the
;;; appropriate type and range. It can also enable constant folding of
;;; type predicates and, in the future, enable the compiler to choose
;;; untagged, unboxed representations for numbers.
;;;
;;; For the purposes of this analysis, a "type" is an aspect of a value
;;; that will not change. Guile's CPS intermediate language does not
;;; carry manifest type information that asserts properties about given
;;; values; instead, we recover this information via flow analysis,
;;; garnering properties from type predicates, constant literals,
;;; primcall results, and primcalls that assert that their arguments are
;;; of particular types.
;;;
;;; A range denotes a subset of the set of values in a type, bounded by
;;; a minimum and a maximum. The precise meaning of a range depends on
;;; the type. For real numbers, the range indicates an inclusive lower
;;; and upper bound on the integer value of a type. For vectors, the
;;; range indicates the length of the vector. The range is the union of
;;; the signed and unsigned 64-bit ranges. Additionally, the minimum
;;; bound of a range may be -inf.0, and the maximum bound may be +inf.0.
;;; For some types, like pairs, the concept of "range" makes no sense.
;;; In these cases we consider the range to be -inf.0 to +inf.0.
;;;
;;; Types are represented as a bitfield. Fewer bits means a more precise
;;; type. Although normally only values that have a single type will
;;; have an associated range, this is not enforced. The range applies
;;; to all types in the bitfield. When control flow meets, the types and
;;; ranges meet with the union operator.
;;;
;;; It is not practical to precisely compute value ranges in all cases.
;;; For example, in the following case:
;;;
;;; (let lp ((n 0)) (when (foo) (lp (1+ n))))
;;;
;;; The first time that range analysis visits the program, N is
;;; determined to be the exact integer 0. The second time, it is an
;;; exact integer in the range [0, 1]; the third, [0, 2]; and so on.
;;; This analysis will terminate, but only after the positive half of
;;; the 64-bit range has been fully explored and we decide that the
;;; range of N is [0, +inf.0]. At the same time, we want to do range
;;; analysis and type analysis at the same time, as there are
;;; interactions between them, notably in the case of `sqrt' which
;;; returns a complex number if its argument cannot be proven to be
;;; non-negative. So what we do instead is to precisely propagate types
;;; and ranges when propagating forward, but after the first backwards
;;; branch is seen, we cause backward branches that would expand the
;;; range of a value to saturate that range towards positive or negative
;;; infinity (as appropriate).
;;;
;;; A naive approach to type analysis would build up a table that has
;;; entries for all variables at all program points, but this has
;;; N-squared complexity and quickly grows unmanageable. Instead, we
;;; use _intmaps_ from (language cps intmap) to share state between
;;; connected program points.
;;;
This looks like a pretty good description of a Lattice Constraint Propagation algorithm. The tricky bit might be that it seems like Guile handles dynamic ranges which I personally haven't found in the literature. Of course, for this type of type inference (no pun intended, but certainly appreciated), I've actually had a hard time finding a wealth of information. There is a lot of information on Hindley-Milner style type systems though, and I'm only one man (who wasn't trained in computer science), so your mileage may vary. Anywho, this sounds like what we're looking for, so how can I run it in the repl?
Well, in particular there is one procedure that looks interesting called
infer-types
. Here is an excerpt of the leading comment and doc-string that
I pulled from whatever Guile-3.0 version I currently have checked out.
;; For best results, the labels in the function starting should be
;; topologically sorted (renumbered). Otherwise the backward branch
;; detection mentioned in the module commentary will trigger for
;; ordinary forward branches.
(define (infer-types conts kfun)
"Compute types for all variables bound in the function labelled
@var{kfun}, from @var{conts}. Returns an intmap mapping labels to type
entries.
A type entry is a vector that describes the types of the values that
flow into and out of a labelled expression. The first slot in the type
entry vector corresponds to the types that flow in, and the rest of the
slots correspond to the types that flow out. Each element of the type
entry vector is an intmap mapping variable name to the variable's
inferred type. An inferred type is a 3-vector of type, minimum, and
maximum, where type is a bitset as a fixnum."
...)
It seems like once we have our CPS we should be able to call this procedure with it (somehow) to get the types out. Through some trial and error I figured out that the parameters you want to pass are:
hello-world-cps
$kfun
value.Let's call this procedure! Index 0 seems like as good a choice as any right?
We'll ignore the whole comment about sorting from the source. I'm sure it is
improtant, but our case is so simple let's hope it doesn't actually cause a
problem. When we call
compile
on our simple function does it already spit out a sorted intmap?
¯_(ツ)_/¯. We'll import the types module and then call infer-types
with
our cps.
scheme@(guile-user)> (use-modules (language cps types))
scheme@(guile-user)> (infer-types hello-world-cps 0)
$4 = #<intmap 0-4,12-18>
Hey, look at that, we have another intmap. Let's assign it to some meaningful name and then peek at what's inside.
scheme@(guile-user)> (define hello-world-types $4)
scheme@(guile-user)> (print-intmap hello-world-types)
(0 #(#<intmap> #<intmap 0>))
(1 #(#<intmap 0-4,9-10>))
(2 #(#<intmap 0-4,9-10> #<intmap 0-4,9-10>))
(3 #(#<intmap 0-2,4,9-10> #<intmap 0-4,9-10>))
(4 #(#<intmap 0-2,4,9-10> #<intmap 0-2,4,9-10>))
(12 #(#<intmap 0-2,9-10> #<intmap 0-2,4,9-10>))
(13 #(#<intmap 0-2,9-10> #<intmap 0-2,9-10>))
(14 #(#<intmap 0-2,10> #<intmap 0-2,9-10>))
(15 #(#<intmap 0-1,10> #<intmap 0-2,10>))
(16 #(#<intmap 0-1> #<intmap 0-1,10>))
(17 #(#<intmap 0> #<intmap 0-1>))
(18 #(#<intmap 0> #<intmap 0>))
$5 = #<intmap 0-4,12-18>
True to the source code's word we have an intmap of vectors that contain intmaps. There is one particular line in the source that I'd like to call out (because at first I only skimmed this file, and so it took me forever to realize).
The first slot in the type entry vector corresponds to the types that flow in, and the rest of the slots correspond to the types that flow out.
Just to reiterate, the first item in each vector is the inferred type of the inputs and the remaining are the inferred types of the outputs.
As usual, let's just look at the first entry (the program entry) to test our assumptions (it should be pretty boring). First let's look at the input types to our program. We'll access the first element of the inner intmap, that is stored in the first element of the vector that is stored in the first element of our outter intmap. So we are looking at indices (0, 0, 0).
scheme@(guile-user)> (intmap-ref (vector-ref (intmap-ref hello-world-types
0)
0)
0)
ice-9/boot-9.scm:1669:16: In procedure raise-exception:
not found 0
Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue.
scheme@(guile-user) [1]> ,q
We get an error, though we shouldn't be surprised. The printed representation of the first intmap in the first vector has no indices, because obviously the entry point to our program has no inputs!
Let's look at the outputs, to do that we'll just look at index 1 of the first vector. Looking at the printed representation it should have an entry at index 0.
scheme@(guile-user)> (intmap-ref (vector-ref (intmap-ref hello-world-types
0)
1)
0)
$6 = #(67108863 -inf.0 +inf.0)
Now we got a vector with a weird looking number at the front, and then two
trailing entries for -inf.0
and +inf.0
. So at least we immediately see the
variable can range from negative to positive infinity. But what is that first
number? Well let's remember the source comments
An inferred type is a 3-vector of type, minimum, and maximum, where type is a bitset as a fixnum.
We clearly see the minimum and maximum, but what does the type mean. Well we can
find more clues in the exports from language/cps/types.scm
. In particular the
module exports constants for all types starting on line 99 and ending on line 124.
It's a little hard to tell exactly what they all are from the source alone, but I bet they are all integer constants. Let's see if I'm right by just copy/pasting them into the repl
scheme@(guile-user)> &fixnum
$7 = 1
scheme@(guile-user)> &bignum
$8 = 2
scheme@(guile-user)> &flonum
$9 = 4
scheme@(guile-user)> &complex
$10 = 8
scheme@(guile-user)> &fraction
$11 = 16
scheme@(guile-user)>
scheme@(guile-user)> &char
$12 = 32
scheme@(guile-user)> &special-immediate
$13 = 64
scheme@(guile-user)> &symbol
$14 = 128
scheme@(guile-user)> &keyword
$15 = 256
scheme@(guile-user)> &procedure
$16 = 512
scheme@(guile-user)> &pointer
$17 = 1024
scheme@(guile-user)> &fluid
$18 = 2048
scheme@(guile-user)> &pair
$19 = 4096
scheme@(guile-user)> &immutable-vector
$20 = 8192
scheme@(guile-user)> &mutable-vector
$21 = 16384
scheme@(guile-user)> &box
$22 = 32768
scheme@(guile-user)> &struct
$23 = 65536
scheme@(guile-user)> &string
$24 = 131072
scheme@(guile-user)> &bytevector
$25 = 262144
scheme@(guile-user)> &bitvector
$26 = 524288
scheme@(guile-user)> &array
$27 = 1048576
scheme@(guile-user)> &syntax
$28 = 2097152
scheme@(guile-user)> &other-heap-object
$29 = 4194304
scheme@(guile-user)>
scheme@(guile-user)> ;; Special immediate values.
scheme@(guile-user)> &null &nil &false &true &unspecified &undefined &eof
$30 = 0
$31 = 1
$32 = 2
$33 = 3
$34 = 4
$35 = 5
$36 = 6
scheme@(guile-user)>
scheme@(guile-user)> ;; Union types.
scheme@(guile-user)> &exact-integer &exact-number &real &number &vector
$37 = 3
$38 = 19
$39 = 23
$40 = 31
$41 = 24576
scheme@(guile-user)>
scheme@(guile-user)> ;; Untagged types.
scheme@(guile-user)> &f64
$42 = 8388608
scheme@(guile-user)> &u64
$43 = 16777216
scheme@(guile-user)> &s64
$44 = 33554432
Well it sure seems like they are. It also seems like each one has just a single
bit set. So let's clean this up a bit and print the values with some padding so
we can compare them. First the regular format
procedure isn't up to the task,
so we'll import the ice-9
version that has hex/binary format strings. Then we
define a list of all the types. And then we'll loop through each of them
printing the hex, binary, symbol name, and decimal values for each type.
scheme@(guile-user)> (use-modules (ice-9 format))
scheme@(guile-user)> (define types
'(&fixnum
&bignum
&flonum
&complex
&fraction
&char
&special-immediate
&symbol
&keyword
&procedure
&pointer
&fluid
&pair
&immutable-vector
&mutable-vector
&box
&struct
&string
&bytevector
&bitvector
&array
&syntax
&other-heap-object
;; Special immediate values.
&null &nil &false &true &unspecified &undefined &eof
;; Union types.
&exact-integer &exact-number &real &number &vector
;; Untagged types.
&f64
&u64
&s64))
scheme@(guile-user)> (for-each (lambda (type-pair)
(format #t "~8,'0x ~32'0b ~a ~a\n" (car type-pair)
(car type-pair)
(cdr type-pair)
(car type-pair)))
;; Here we combine the symbol name with the value of
;; of the symbol. In the end we get something like this
;; '((1 . &fixnum) (2 . &bignum) ...)
(map (lambda (type)
(cons (eval type (current-module)) type))
types))
00000001 00000000000000000000000000000001 &fixnum 1
00000002 00000000000000000000000000000010 &bignum 2
00000004 00000000000000000000000000000100 &flonum 4
00000008 00000000000000000000000000001000 &complex 8
00000010 00000000000000000000000000010000 &fraction 16
00000020 00000000000000000000000000100000 &char 32
00000040 00000000000000000000000001000000 &special-immediate 64
00000080 00000000000000000000000010000000 &symbol 128
00000100 00000000000000000000000100000000 &keyword 256
00000200 00000000000000000000001000000000 &procedure 512
00000400 00000000000000000000010000000000 &pointer 1024
00000800 00000000000000000000100000000000 &fluid 2048
00001000 00000000000000000001000000000000 &pair 4096
00002000 00000000000000000010000000000000 &immutable-vector 8192
00004000 00000000000000000100000000000000 &mutable-vector 16384
00008000 00000000000000001000000000000000 &box 32768
00010000 00000000000000010000000000000000 &struct 65536
00020000 00000000000000100000000000000000 &string 131072
00040000 00000000000001000000000000000000 &bytevector 262144
00080000 00000000000010000000000000000000 &bitvector 524288
00100000 00000000000100000000000000000000 &array 1048576
00200000 00000000001000000000000000000000 &syntax 2097152
00400000 00000000010000000000000000000000 &other-heap-object 4194304
00000000 00000000000000000000000000000000 &null 0
00000001 00000000000000000000000000000001 &nil 1
00000002 00000000000000000000000000000010 &false 2
00000003 00000000000000000000000000000011 &true 3
00000004 00000000000000000000000000000100 &unspecified 4
00000005 00000000000000000000000000000101 &undefined 5
00000006 00000000000000000000000000000110 &eof 6
00000003 00000000000000000000000000000011 &exact-integer 3
00000013 00000000000000000000000000010011 &exact-number 19
00000017 00000000000000000000000000010111 &real 23
0000001f 00000000000000000000000000011111 &number 31
00006000 00000000000000000110000000000000 &vector 24576
00800000 00000000100000000000000000000000 &f64 8388608
01000000 00000001000000000000000000000000 &u64 16777216
02000000 00000010000000000000000000000000 &s64 33554432
Well, I think that should satisfy our curiosity. It turns out we were almost
right. The "raw" types are indeed components of a bit mask. But thankfully
Guile has also conveniently provided us with some relevant unions. For
example the type code for number is 00000000000000000000000000011111
so that
encompasses all of the appropriate number subtypes. But does it answer our
question of what the number 67108863 means? An easy hypothesis to make is that
it is the union of all types. Let's test that theory at the repl!
scheme@(guile-user)> (apply logior (map (lambda (type)
(eval type (current-module)))
types))
$45 = 67108863
And we get our magic number!. So the return type of the first function is any scheme value! Now we should be able to go hunting in our intmap for a sensible type, don't we have a constant string somewhere?
If we examine the intmap for our types, we see something troubling. The
procedure body we are interested in seems to be missing. We only have entries
for the $kfun
representing the entire program. The string we are looking for
only exists in the main
procedure's body. So we'll want to go analyze that
instead. That's ok though, this should all look pretty familiar.
scheme@(guile-user)> (define main-types (infer-types hello-world-cps 11))
Now let's crack this intmap open and see what we have. If we look at index 6 of
our CPS intmap we can see that there should be a &string
type flowing into it
from the constant string at index 7. So we'll want to pull out the CPS value in
the main-types
intmap for index 6, and then look at the first entry in the
vector that we get back.
scheme@(guile-user)> (print-intmap (vector-ref (intmap-ref main-types 6) 0))
(5 #(67108863 -inf.0 +inf.0))
(6 #(67108863 -inf.0 +inf.0))
(7 #(131072 13 13))
(8 #(32768 2 +inf.0))
$46 = #<intmap 5+0-3>
This seems to be the set of variables that our main
procedure can see.
32768
which is &box
with a range of 2 to +inf.0. This
is our define
box.131072
which is our "Hello world!\n"
string
constant. You can see that Guile has even conveniently provided us with
bounds for the string's length! It is 13 characters long.display
function, so I would have expected to see
512
for &procedure
. But perhaps that is just not something Guile infers
for us, or possibly this is due to not sorting our input intmap? A question
for another day though.$kargs
that stores a variable to
index 5. We'll also leave this as another question for another day.It seems that we haven't quite sorted everything out yet, but we certainly have seen the tools that will help us fill in the gaps. We've also verified enough of our assumptions that it seems pretty clear we are on the right track.
There are still some open questions to this analysis.
if
).At any rate, this post is already long enough, and we have covered our hello-world example to my satisfaction. The rest will have to wait for another time.
UPDATE: That time has come, see Understanding the Guile CPS Variable Indices Moor Better.
Yes, I included the repl prompt in all the code snippets that I expected a reader to copy/paste, which is just annoying. It means you have to copy paste them all one at a time. So here is a big dump of the entire repl history. Now you can dump this into your own repl and explore on your own!
Happy Hacking!
(define hello-world
'(define (main)
(display "Hello world!\n")
#;"Hello world!\n"))
(compile hello-world #:to 'cps)
(use-modules (language cps intmap))
(define hello-world-cps $1)
(intmap-ref hello-world-cps 0)
(use-modules (language cps utils))
(define (print-intmap intmap)
(intmap-map (lambda entry
(format #t "~a\n" entry))
intmap))
(print-intmap hello-world-cps)
(use-modules (language cps types))
(infer-types hello-world-cps 0)
(define hello-world-types $4)
(print-intmap hello-world-types)
(intmap-ref (vector-ref (intmap-ref hello-world-types
0)
0)
0)
,q
(intmap-ref (vector-ref (intmap-ref hello-world-types
0)
1)
0)
&fixnum
&bignum
&flonum
&complex
&fraction
&char
&special-immediate
&symbol
&keyword
&procedure
&pointer
&fluid
&pair
&immutable-vector
&mutable-vector
&box
&struct
&string
&bytevector
&bitvector
&array
&syntax
&other-heap-object
;; Special immediate values.
&null &nil &false &true &unspecified &undefined &eof
;; Union types.
&exact-integer &exact-number &real &number &vector
;; Untagged types.
&f64
&u64
&s64
(define types
'(&fixnum
&bignum
&flonum
&complex
&fraction
&char
&special-immediate
&symbol
&keyword
&procedure
&pointer
&fluid
&pair
&immutable-vector
&mutable-vector
&box
&struct
&string
&bytevector
&bitvector
&array
&syntax
&other-heap-object
;; Special immediate values.
&null &nil &false &true &unspecified &undefined &eof
;; Union types.
&exact-integer &exact-number &real &number &vector
;; Untagged types.
&f64
&u64
&s64))
(use-modules (ice-9 format))
(for-each (lambda (type-pair)
(format #t "~8,'0x ~32'0b ~a ~a\n" (car type-pair)
(car type-pair)
(cdr type-pair)
(car type-pair)))
;; Here we combine the symbol name with the value of
;; of the symbol. In the end we get something like this
;; '((1 . &fixnum) (2 . &bignum) ...)
(map (lambda (type)
(cons (eval type (current-module)) type))
types))
(apply logior (map (lambda (type)
(eval type (current-module)))
types))
(define main-types (infer-types hello-world-cps 11))
(print-intmap (vector-ref (intmap-ref main-types 6) 0))