I read the first post and thought someone should at the very least post the lambda cube. This isn't my area of expertise, since I've done very little with dependent types (staying firmly on the left side of the cube), but it outlines some useful categories of type systems here.
https://en.wikipedia.org/wiki/Lambda_cube
I wrote a small post on that: https://azdavis.net/posts/lambda-cube/
Hope it’s helpful!
Cool that you worked on an implementation of the CoC.
I've been somewhat wanting to go back and revisit ATAPL and read chapter 2 on this subject: https://www.cis.upenn.edu/~bcpierce/attapl/frontmatter.pdf
That is a good post. I've linked to it from mine!
Thanks! But I don’t think it quite worked?
Ought to be fixed now
I see we need to add special syntax to the signature for dependent type variables.
If you take Zig, its `comptime` parameters are kind of similar. They can be used to create functions that return types or whose output type depends on the inputs, etc. It seems to fulfil the three axioms at the start, no? The erasure stuff seems just as relevant.
Can I say that `comptime` is dependent types in imperative clothing? Or is there a missing capability making it strictly weaker?
You could probably say that. AFAIK there isn't a single valid definition of a dependently typed language any more than there is a single valid definition of a functional language.
I usually go with "you can make a dependent pair", which is two terms where the type of the second depends on the value of the first; I think you could do that in Zig with a bit of handwaving around whether the comptime expression really is the first value or not.
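For intuition, here's a rough TypeScript sketch of the shape of a dependent pair. All names are invented, and TypeScript can only approximate the idea when the first component is a type-level literal rather than an arbitrary runtime value:

type Second<B extends boolean> = B extends true ? number : string;

interface DepPair<B extends boolean> {
    first: B;          // the value the second component's type depends on
    second: Second<B>; // number when first is true, string when it is false
}

const p1: DepPair<true> = { first: true, second: 42 };
const p2: DepPair<false> = { first: false, second: "hello" };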
What Zig's doing is also, as best I understand it all, the only real path to anything like dependent types in imperative languages: make a separate sub-language for constant expressions that has its own evaluation rules that run at compile time. See also C++ constexpr, or the constants permitted in a Rust static array size parameter (practically a dependent type itself!)
A functional language's basic building block is an expression and terms are built out of composed expressions; normalisation means reducing a term as far as possible, and equality is (handwaving!) when two terms reduce to exactly the same thing. So you can do your dependent terms in the usual source language with the usual evaluation rules, and those terms are freely usable at runtime or compile time, it's all the same code.
An imperative language's basic building block is a memory peek or poke, with programs composed of sequences of peeks and pokes scheduled by control flow operators. It's much less clear what "normalisation" looks like here, so imperative languages need that separate constant-value embedded language.
Zig's sub-language and evaluation rules are exactly the same as the Zig language except that `type` is a first class type and the TypeInfo of any type can be both examined and modified. In Zig, generics are implemented by computing types--most often by a function returning a struct that is specialized to a type argument to the function.
I've been writing some D code and, while D "template" functions can take compile-time arguments that are types, they can only produce types via mixins, which are strings that are injected at the point of invocation and then parsed by the compiler. This gives you all of the power of Zig and then some, but it's not nearly as elegant (it would help a lot if D's token quote construct q{...} had lisp-like quasi-quote functionality). OTOH, I found when writing Zig it could become extremely confusing to have the comptime and runtime languages intermixed, which is often done for loop unrolling and "conditional compilation" -- whether it's a comptime or runtime operation is in many cases a function of whether the variables involved are comptime or runtime types. In D it's generally clearer because comptime operations are distinguished with such things as static if for flow control, enum for declaring comptime variables, foo!(comptime args)(runtime args) for function calls, etc.
Not really. With dependent types, you can have a function that returns a type that depends on a runtime value; with Zig's comptime, it can only depend on a compile-time value.
Note that a language with dependent types doesn't actually "generate" types at runtime (as Zig does at compile-time). It's really about managing proofs that certain runtime objects are instances of certain types. E.g. you could have a "prime number" type, and functions that compute integers would need to include a proof that their result is a prime number if they want to return an instance of the prime number type.
Using dependent types well can be a lot of work in practice. You basically need to write correctness proofs of arbitrary properties.
Sorry if this is obvious, but do these languages let programmers explicitly assert that a certain, relevant property holds at runtime?
Or do these compilers insist on working these proofs out on their own, without the benefit of programmer-supplied assertions?
I'm not familiar with Zig, but I'm pretty sure comptime is not quite as powerful.
In the post, I didn't give any examples of situations where the input to a dependent function was not known at compile time. However, that was just due to the lack of time when writing the post.
I gave this example:
useValue : Bool -> String
useValue True = "The number is: " ++ natToString (getValue True)
useValue False = "The message is: " ++ getValue False

The difference is that nothing in dependent types requires that the depended-on value `b` is known at compile time. You could pass any Bool to `getValue`, including one that was read as input from the user. It would just be the case that, before using the value returned by `getValue b`, you'd have to check (at runtime) the value of `b`. Only then would you know `pickType b`, which would tell you the type of `getValue b`. My apologies that the post was unclear about this. (I've just updated it.)

If you don't know at compile time what the type of the returned value is, how do you know whether the remainder of the expression is still correct?
The choice is either to have a total function that will be able to deal with every type and check at compile time whether they exist or to defer to the runtime and then throw an exception if the program doesn’t know what to do with the value.
The value returned by `getValue b` has type `pickType b`. It cannot be used in a way that depends on its type until you check the value of `b`. If `b = True`, then you know that `pickType b = pickType True = Nat` and that therefore `getValue b` has type `Nat`. There is no option to check the type of a value at runtime. You simply cannot use it in such a way as to create a runtime type error. (This is what it means for the types to be erased. The runtime doesn't know what type it's supposed to be!)
You're already in trouble here right? What's the type of thingy?

So maybe you write:

thingy: (pickType someBool) = getValue someBool

which is fine and great! Then you write:

print (thingy + 3) -- thingy can be an int right?

But at this point thingy is of type "pickType someBool" while + is expecting Int. So it's up to you to restructure your code to prove to the type checker that someBool is in fact True. For example:

print ((if someBool then thingy else 5) + 3)

^ and here your code proves the fact! So in practice, when you want to explore the results that are hidden behind the dependent types, you will need to have written code that unwraps them like this. But the whole point is that the underpinning type theory means you will be fine while doing it! You've proven that it'll be fine, so if you add thingy to 3 you'll have an integer.

I think of this stuff a bit like encryption with a password. You have this bundle of data, but to get in, you need to provide proof that what you're doing won't blow up. You'll need to write your code in a certain way to do it. But if you provide the right proof, then we're golden.
Consider how Rust monomorphizes generic functions, making a copy for each possibility. I imagine a compiler could generate a different continuation for each possible return type, along with a runtime check that branches to the appropriate continuation.
It seems like that might cause code bloat, though?
I commented this before but deleted it because I hadn't tested my code properly. Anyway, here is a version that works.
You're right Zig isn't quite as powerful, but you don't need type erasure if you have enough monomorphization. It's a bit annoying to use since you have to capture the comptime version of b inside the inline switch prongs, and get_value does need to take a comptime bool instead of a runtime one. It's functionally the same, except that instead of using the bool to prove the type of the type-erased value is correct and delaying the actual if (b) check to runtime, we're moving it to compile time and instead proving that b has the right value for each case, then generating specialized code.
This does NOT mean that it isn't dependent on user input, the call do_thing(get_user_input()) would absolutely work. do_thing has no compile-time parameters at all.
I don't have the natToString thing because it's a bit more annoying in Zig and would obscure the point.
You could define a function

fn pick_type(comptime b: bool) type {
    if (b) return u32 else return []const u8;
}

and change the signature

fn get_value(comptime b: bool) pick_type(b)

if you wish.

Perhaps more convincingly, you can use inline for with a similar effect. It's obviously horrible code nobody would ever write but I think it might illustrate the point.

fn do_thing(b: bool) void {
    inline for (0..2) |i| { // go through all bool variants
        const known_b = (i != 0);
        if (known_b == b) {
            // Essentially what we have now is a known_b which is proven
            // at compile-time to be equal to whatever runtime b we have.
            // So we're free to do any kind of dependent type things we'd like
            // with this value.
            const value = get_value(known_b);
            // Here we know the type of value but it still depends on the runtime b.
            if (known_b) std.log.info("{d}", .{value}) // treated as an int
            else std.log.info("{s}", .{value}); // treated as a string
            // We can also just decide based on the type of value at this point.
            if (@TypeOf(value) == u32) ... else ...;
            // Or, a switch
            switch (@TypeOf(value)) {
                u32 => ...,
                []const u8 => ...,
            }
        }
    }
}
I am slowly trying to understand dependent types, but the explanation is a bit confusing to me. I understand the mathematical terminology of a function that may return a type, but... since function types take a value and return a value, they are by definition in another universe from, say, morphisms that would take a type and return a type.
In the same way, I see a value as an ur-element and types as sets of values. So even if there is syntactic sugar around the value <-> type equivalence, I'd naively think that we could instead define some type morphism, and that might be more accurate. The value parameter would merely be declared through a type parameter constrained to be a singleton. In the same way, an ur-element is not a set but a member of a set.
Then the question is representation but that could be left as an optimization. Perhaps that it is already what is done.
Example:
type Nine int = {9}

And then the rest is generic functions, parameterizable by 9, or actually, Nine.
So nothing too different from a refinement of int.
Basically, 'value' would be a special constraint on a type parameter in normal parametric polymorphism implementations. There would probably be derived constraint information such as size etc...
But I guess, the same issue of "which refinement types can be defined, while keeping everything decidable" remains as an issue.
Also how to handle runtime values? That will require type assertions, just like union types? Or is it only a compile time concept and there is no runtime instantiations. Only some kind of const generics?
A typeof function could be an example of dependent type though? Even at runtime?
Just wondering...
In the style of the linked post, you'd probably define a generic type (well, one of two generic types):
type ExactlyStatic : (0 t: Type) -> (0 v: t) -> Type
type ExactlyRuntime : (0 t: Type) -> (v: t) -> Type
Then you could have the type (ExactlyStatic Int 9) or the type (ExactlyRuntime Int 9).
The difference is that ExactlyStatic Int 9 doesn't expose the value 9 to the runtime, so it would be fully erased, while (ExactlyRuntime Int 9) does.
This means that the runtime representation of the first would be (), and the second would be Int.
> Also how to handle runtime values? That will require type assertions, just like union types?
The compiler doesn't insert any kind of runtime checks that you aren't writing in your code. The difference is that now when you write e.g. `if condition(x) then ABC else DEF` inside of the two expressions, you can obtain a proof that condition(x) is true/false, and propagate this.
Value representations will typically be algebraic-data-type flavored (so, often a tagged union) but you can use erasure to get more efficient representations if needed.
In type theory, all singleton types are isomorphic and have no useful distinguishing characteristics (indeed, this is true of all types of the same cardinality – and even then, comparing cardinalities is always undecidable and thus irrelevant at runtime). So your Nine type doesn’t really make sense, because you may as well just write Unit. In general, there is no amount of introspection into the “internal structure” of a type offered; even though parametricity does not hold in general (unless you postulate anticlassical axioms), all your functions that can run at runtime are required to be parametric.
Being isomorphic is not the same as being identical, or substitutable for one another. Type theory generally distinguishes between isomorphism and definitional equality and only the latter allows literal substitution. So a Nine type with a single constructor is indeed isomorphic to Unit but it's not the same type, it carries different syntactic and semantic meaning and the type system preserves that.
Some other false claims are that type theory does not distinguish types of the same cardinality. Type theory is usually intensional not extensional so two types with the same number of inhabitants can have wildly different structures and this structure can be used in pattern matching and type inference. Cardinality is a set-theoretic notion but most type theories are constructive and syntactic, not purely set-theoretic.
Also parametricity is a property of polymorphic functions, not of all functions in general. It's true that polymorphic code can't depend on the specific structure of its type argument but most type theories don't enforce full parametricity at runtime. Many practical type systems (like Haskell with type classes) break it with ad-hoc polymorphism or runtime types.
How does it become Unit if it is an integer of value 9? Why would cardinalities be undecidable if they are encoded discretely in the type?
For instance, type Nine int = {9} would not be represented as 9. It would probably be a fat pointer. It is not just an int, it would not even have the same operations (9 + 9 is 18 which is an int but is not of type Nine, but that's fine, a fat pointer does not need to share the same set of operations as the value it wraps).
It could be seen as a refinement of int? I am not sure that it can truly be isomorphic? My suspicion was that it would only be somewhat isomorphic at compile time, for type checking, and if there is a mechanism for auto unwrapping of the value?
There's only one possible value of type Nine; there's only one possible value of type Unit. They're isomorphic: there's a pair of functions to convert from Nine to Unit and from Unit to Nine whose compositions are identities. Both functions are just constants that discard their argument un-inspected. "nineToUnit _ = unit" and "unitToNine _ = {9}".
You've made up your language and syntax for "type Nine int = {9}" so the rules of how it works are up to you. You're sort of talking about it like it's a refinement type, which is a type plus a predicate over values of the type. Refinement types aren't quite dependent types: they're sort of like a dependent pair where the second term is of kind Proposition, not Type; your type in Liquid Haskell would look something like 'type Nine = Int<\n -> n == 9>'.
Your type Nine carries no information, so the most reasonable runtime representation is no representation at all. Any use of the Nine boils down to pattern matching on it, and a pattern match on a Nine only has one possible branch, so you can ignore the Nine term altogether.
Why doesn't it seem exactly true?
It is an integer. It is in the definition. And any value should be equal to nine. By construction Nine could have been given the same representation as int, at first. Except this is not enough to express the refinement/proposition.
One could represent it as a fat pointer with a space allocated to the set of propositions/predicates to check the value of.
That allows to check for equality for instance.
That information would not be discarded.
This is basically a subtype of int.
As such, it is both a dependent type and a refinement type. While it is true that not all refinement types are dependent types, because of cardinality.
I think Maxatar response in the same thread puts it in words that are possibly closer to the art.
Hi aatd86! We had a different thread about existence a couple days ago, and I'm just curious -- is English your second language? What's your first language? I'm guessing French, or something from Central Europe.
Thanks for humoring me!
Functions that return types are indeed at a higher level than those that don't.
Values can be seen as Singleton types. The key difference is the universe they live in. In the universe of types, the level one level below is perceived as a value. Similarly, in the universe of kinds, types appear to be values.
> Basically, 'value' would be a special constraint on a type parameter in normal parametric polymorphism implementations
Yes this is a level constraint.
> But I guess, the same issue of "which refinement types can be defined, while keeping everything decidable" remains as an issue.
If you're dealing with fully computable types, nothing is decidable.
> Also how to handle runtime values? That will require type assertions, just like union types? Or is it only a compile time concept and there is no runtime instantiations. Only some kind of const generics?
A compiler with dependent types is essentially producing programs that are themselves embedded with their input. There cannot be a distinction between runtime and compile time, because in general type checking requires you to be able to run a program. Compilation essentially becomes deciding which parts you want to evaluate now and which parts you want to defer until later.
> A typeof function could be an example of dependent type though? Even at runtime?
Typeof is just const.
Typeof: (T : type) -> (x:T) -> Type
Final note: I've written some compilers for toy dependently typed languages. By far dependent typing makes both the language and the compiler easier not harder. This is because Haskell and c++ and other languages with type systems and metaprogramming or generics actually have several languages: the value language that we are familiar with, but also the type language.
In Haskell, this is the class/instance language, which is another logic programming language atop the lambda calculus language. This means that to write a Haskell compiler you have to write a compiler for Haskell and for the logic programming language (which is Turing complete, btw) that is the type language.
Similarly in c++, you have to implement c++ AND the template system which is a functional programming language with incredibly confusing semantics.
On the other hand for a dependently typed language, you just write one language... The types are talked about in the same way as values. The only difficulty is type inference. However, an explicit dependently typed language is actually the easiest typed compiler to write because it's literally just checking for term equality which is very easy! On the other hand with a dependent language, type inference is never going to be perfect so you're a lot freer to make your own heuristic decisions and forego HM-like perfection.
> Instead of writing the type directly, we're calling pickType and using the type returned. The typechecker evaluates pickType True at compile time, sees it equals Nat, and then it knows you're saying myNat has type Nat.
But how does the compiler evaluate `pickType True` at compile time? Does the typechecker contain an interpreter for (some subset of) the language? It seems potentially circular for a compiler to invoke functions defined in the code that it's compiling.
The compilers for Zig, D, Nim, and perhaps other languages contain an interpreter for the whole language that can run at compile time. It's not circular--compiling code and executing code are two quite different things (the interpreter does not contain yet another parser--it works on the AST). You simply can do calculations the results of which can be used in compile-time only contexts such as initializers. The trick in Zig is that the compile-time interpreter supports types as first-class objects, so you can compute types and implement generics this way. In Nim, the interpreter supports computations over the AST. In D, types can be passed as generic arguments, and computed strings can be "mixed in" to the source code and compiled at that point (on-the-fly code generation).
Yes, the compiler contains an interpreter for the entire language, not just a subset. It is circular as you rightly point out, but there's no contradiction here.
Typically the interpreter limits itself to only evaluating pure functions or functions that do not perform any kind of IO, but in principle this restriction is not necessary. For example Jai's compiler allows completely arbitrary execution at compile time including even downloading arbitrary data from the Internet and using that data to make decisions about how to compile code.
Thank you for the explanation. This reminds me somewhat of an F# type provider, which can execute arbitrary code at compile time.
https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/ty...
Saying dependent types can be erased is like saying math only contains tautologies. Both are _technically_ true.
"But we'd like to give a type to this function. In most languages (Haskell, Typescript, C++, Rust, C#, etc.) there is no way to do that.[2] "
Can't you do this with a GADT? With a GADT, the type of the output can be specified by which branch of the GADT the input matches. Seems quite a similar idea.
GADTs let you do similar things in the simplest cases, they just don't work as well for complex constraints.
Like, append: Vector n t -> Vector m t -> Vector (n+m) t. With GADTs you need to manually lift (+) to work on the types, and then provide proofs that it's compatible with the value-level.
Then you often need singletons for the cases that are more complex- if I have a set of integers, and an int x, and I want to constrain that the item is in the set and is also even, the GADT approach looks like:
doThing :: Singleton Int x -> Singleton (Set Int) s -> IsInSet x s -> IsEven x -> SomeOutput
Which looks reasonable, until you remember that the Singleton type has to have a representation which is convenient to prove things about, not a representation that is convenient/efficient to operate on, which means your proofs will be annoying or your function itself will hardly resemble a "naive" untyped implementation.
Dependent types fix this because your values are "just" values, so it would be something like
doThing :: (x : int) -> (s : set int) -> so (is_in s x) -> so (even x) -> SomeOutput
Here, x and s are just plain values, not a weird wrapper, and is_in and even are ordinary (and arbitrary) library functions.
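As an aside on the append example above: TypeScript's variadic tuple types can fake a sliver of it. The "length" lives in the shape of a tuple type rather than in a Nat index, and is only tracked for literal tuples. A sketch, not equivalent to the GADT or dependent version:

function append<A extends unknown[], B extends unknown[]>(a: [...A], b: [...B]): [...A, ...B] {
    return [...a, ...b];
}

const v = append([1, 2], ["x"]); // inferred as [number, number, string], so the combined length is visible to the checker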
I mean, you can trivially do it with a (non-G) ADT, if you're content with wrapping the return value.
OTOH you can do this in TypeScript easily without wrapping; it could return a union type `number | string`.
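A minimal sketch of that union-type version, with hypothetical names loosely following the article's getValue example:

function getValue(b: boolean): number | string {
    return b ? 42 : "hello";
}

const someBool = Math.random() < 0.5;
const v = getValue(someBool);

// The union has to be narrowed at runtime before use:
if (typeof v === "number") {
    console.log(v + 3);
} else {
    console.log(v.toUpperCase());
}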
Isn't that just saying you can do dependent types in TypeScript easily so long as you're willing to give up type checking?
No, because 1) you can't do (full) dependent types in TypeScript, 2) it's not clear what "dependent types" would even mean without type checking, and 3) using the union type you aren't giving up type checking at all.
The limitation is what comes later in the article - the dependent type can express the relationship between the argument and the returned type. But that's "out of scope" of the question of "typing a function that can return either a number or a string depending on the boolean argument". It only becomes a problem later on because to use the result you need to test the type (using `typeof`) even if you know which boolean you passed in; that information doesn't end up tracked by the type system - although there actually are a couple of different ways to do this that satisfy this example in TypeScript, as suggested by some other comments.
Just using a union type in TypeScript isn't dissimilar to my suggestion of wrapping the result in an ADT; it's just that TypeScript makes this lightweight.
Actually, it's even simpler: you should just be able to use signature overloading:
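A hedged sketch of what such overloads could look like, reusing the article's getValue example (a guess at the idea, not the commenter's original snippet):

function getValue(b: true): number;
function getValue(b: false): string;
function getValue(b: boolean): number | string {
    return b ? 42 : "hello";
}

const n = getValue(true);  // typed as number
const s = getValue(false); // typed as string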
The article's assertion that TypeScript can't represent this is quite false.

This made me realize TypeScript's conditional types are closer to dependent types than I thought.
Yes. You can give pickType a type in TS. Something like pickType<B extends boolean>(b: B): B extends true ? string : number.
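Expanding that signature into a small, self-contained sketch; the cast is needed because TypeScript won't check the function body against an unresolved conditional type:

type PickType<B extends boolean> = B extends true ? string : number;

function pickTypeValue<B extends boolean>(b: B): PickType<B> {
    // Some configurations may need `as unknown as PickType<B>` here.
    return (b ? "hello" : 42) as PickType<B>;
}

const s = pickTypeValue(true);  // string
const n = pickTypeValue(false); // number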
I’d love to read a post explaining how TS conditional types are or are not a form of dependent types. Or, I’d like to understand dependent types well enough to write that post.
Well written! I learned stuff. Dependent types were new to me. Cheers.
Off the topic of dependent types, but this is way too hilarious to pass up..
> Imagine if pickType did a network request to a server and then returned whatever type the server said. Sounds like a bad idea!
Fucking .. isn't this what LSPs are?! Finally something functional programming people and I can agree on. That does sound like a bad idea.
I think it's a bit apples and oranges. I was suggesting that compilation itself should probably not involve nondeterministic network requests. If I understand LSP correctly, it only uses network requests to allow your editor to communicate with a binary that provides type information. But the compiler behavior is unchanged. Honestly LSPs seem pretty reasonable to me.
What's the fundamental difference between a syntax highlighter and the frontend of a compiler? I would argue not much, apart from the fact that people are willing to have their syntax highlighter randomly explode, but are much more concerned about their compiler randomly exploding.
The fundamental difference is that a compiler's actions determine the semantics of the program; a syntax highlighter has no effect on it. And I don't think the concern here is about things blowing up. (Not that things blowing up isn't concerning, but it just doesn't have anything to do with this discussion.)
The fundamental difference is that in his case any conforming compiler has to do the network request. Whereas with LSP it's just an implementation detail of the editor.
The fundamental difference is that a compiler determines the semantics of the program being compiled, and having that be non-deterministic is disastrous ... whereas a syntax highlighter has no effect on the semantics of the program that is generated from the code being highlighted. Things blowing up is a complete red herring.
My point is "why do we tolerate network requests in LSPs, but not compiler frontends"?
If you think about it, the LSP is a way of doing libraries, which is dramatically worse than just inventing a binary interface and doing actual dynamic libraries.
If you follow the LSP model to its natural conclusion, everything becomes this horrible, disconnected morass of servers talking to each other on your local machine..
It's not network requests that are a problem. It's nondeterministic network requests. Build systems do make network requests, but assuming the network is up and the build servers are running, the result of a compile should always be the same. If the server isn't reachable then the build fails.
Also, while build systems can do stuff over the network, the program isn't allowed to run arbitrary network code at compile time. At most it can import a library at a particular URL. The build system will need to download the library the first time, but it can also cache libraries so no download is needed.
> It's not network requests that are a problem. It's nondeterministic network requests.
Ohhhhh, right, my bad. I forgot that you could just make network requests deterministic with the __do_it_deterministically flag. Of courseeee, how could I be so silly.
..
Think about it; you just illustrated my point exactly. If the network request is assumed to always complete, with the desired payload, and the client/server exist on the same machine, then why the fuck is there a network request?? An LSP server is just a library, but instead of being loaded into the process address space like a normal library, there's a separate server for some unknown reason.
Anyways, /rant
A local network connection is the lowest-common-denominator way to do foreign function calls. This allows the language server to be written in any language you like without having to deal with the complexity of linking code written in different languages together. It just starts another process and connects to it.
For example, VS Code is written in TypeScript and the Go language server is written in Go.
Do the client and server necessarily exist on the same machine? I thought vscode had an architecture where the frontend and backend could be on separate machines, although I don't know if the LSP crosses that gap.
Also, I'm sure there's a way to do it, but if someone wanted to write an LSP in Haskell and the editor was written in JavaScript, I'm not sure it would be as straightforward to load it as a dynamic library as you suggest it should be.
Theoretically, the client/server don't have to be on the same machine, but in practice they mostly are.
As for the interface.. any language that can't speak the C ABI isn't a real language (read: they all can).. but this is besides the point.
Counterpoint; when's the last time you used an editor written in JavaScript and an LSP written in Haskell? I bet never.
If you install the VSCode plugin for Haskell, I’m pretty sure that’s what you get?
I think VSCode is written in TypeScript, but yes, I found counterpoint very strange too!
A compiler is allowed to halt on a syntax error. Syntax highlighters ideally should not give up on colouring the rest of the document due to mismatched parentheses (or braces or any other delimiter pair).
Many syntax highlighters in fact do so, but this simply isn't what the point was about network indeterminacy--compilation determines the semantics of the program being compiled, syntax highlighters do not.
Fair point
No, LSPs return the name/metadata of a concrete type. Dependent typing means that the return type of any given function in your (static) program can depend on a runtime value, e.g. user input... In well-defined ways ofc.
So, you're saying it's outside the scope of an LSP to return information about a dependent type because it's .. not a concrete type? That sounds wrong.
I can make literally any language support dependent types that have struct, enum, switch, and assert. You make a boxed type (tagged union, algebraic datatype, variant, discriminated union, fucking, whatever), and just assert whenever you pack/unpack it on function entry/exit. I do this all the time.
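A minimal TypeScript sketch of that tagged-union-plus-assert pattern; every name here is invented for illustration:

type Boxed =
    | { tag: "num"; value: number }
    | { tag: "str"; value: string };

function pack(isNum: boolean, value: number | string): Boxed {
    // Assert on pack: the tag must agree with the payload.
    if (isNum) {
        if (typeof value !== "number") throw new Error("expected a number");
        return { tag: "num", value };
    }
    if (typeof value !== "string") throw new Error("expected a string");
    return { tag: "str", value };
}

function unpackNum(b: Boxed): number {
    // Assert on unpack: the caller claims the num variant.
    if (b.tag !== "num") throw new Error("expected the num variant");
    return b.value;
}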
In plain English, my quip boils down to 'why do we tolerate network requests in our syntax highlighters, when we don't tolerate them in our compiler frontends?'
Because in this world my vim editor gets to benefit from the money and time microsoft spent building a typescript syntax highlighter.
If I waited for one that spoke the c abi so someone could load it via dll instead of http, well, I'd still be waiting.