Techniques for implementing heterogeneous collections in Haskell: http://www.has...

jlouis · on Aug 1, 2014

There is no easy way around this if you really want a collection which is heterogenous, but often, you would just cook up a struct/record-like object for these things.

Trading off easy heterogenous lists buys you a better type checker, more static knowledge, faster programs and so on. It also has the advantage that mapping data into the system can be done safely. The price is that self-describing data (JSON, Transit, Msgpack, ...) is more unwieldy to work with - but a little combinator library, scrap-your-boilerplate-generics, or a reflective code generator goes a long way to solve that problem:

- OCaml: Yojson.Basic.Util - Haskell: Generics, Rank2 polymorphism - Go: JSON unmarshalling

are all examples of how to handle that problem.

kazinator · on Aug 1, 2014

You seem to be implying that the generic lists shown in Python or Lisp are somehow unsafe. They must be since if we throw them away, or wrap them in some clumsy structs/records, we can "buy" safety. Even if that is the case, we cannot sell Haskell as some kind of semantically rich language for doing cutting edge research, when coding up basic things like this is a struggle. If I have to cook up things with structures, I might as well use C.

Here is another simple problem: divide two integers so that the result has integer type when the numbers divide evenly, otherwise rational type.

edwardkmett · on Aug 1, 2014

On the other hand, here's a problem.

Extend your numeric type tower to handle whatever new numeric types I come up with in an internally consistent manner.

In haskell I have number types for things that automatically compute derivatives. I have number types for functions that go to number types so all vector spaces work like numbers as well. I have number types for arbitrary precision floating point numbers, and I can build these things on top of each other.

It is a trade-off. We give up a bit of one thing to get something else.

You can take either side of the deal. I'd argue that the weight of benefit is on the side where we don't have a magic type tower to reason about, but it is a perfectly reasonable stance to say that the thing you want is more important to you.

On the other hand, I can turn around and argue that if what I really want is an ad hoc tower of numeric types I can just make a type for that and work inside it.

data Number = Int Int | Rational Rational | Double Double | Complex (Complex Double) | ...

instance Num Number

Now I can opt into your model. Can you opt into mine?

samth · on Aug 1, 2014

You're totally right that there's a tradeoff here -- in Typed Racket, where we have strong types for the numeric tower plus the flexibility that the parent wants, arithmetic isn't extensible.

However, it's not the case that `data Number = ...` gets you everything. In particular, it gives up on the types! :)

For example, in Typed Racket:

-> (: norm : Real Real -> Real)

-> (define (norm x y) (sqrt (+ (sqr x) (sqr y))))

-> (norm -3 12)

- : Real

12.36931687685298

We've proved that the `norm` function always produces `Real` answers, even though `sqrt` might produce complex numbers given negative inputs. The sum type you've given won't let you prove that.

GregBuchholz · on Aug 1, 2014

>Can you opt into mine?

In Python, not only can you "opt in" to those features (in a completely natural way) but those features already exist in the form of highly popular libraries:

NumPy - http://www.numpy.org/

SymPy - http://www.sympygamma.com/input/?i=diff%28x**2%29

mpmath - http://mpmath.org/

...and the story is similar with Common Lisp, whose generic functions dispatch on function parameter type (just like Haskell allows with type classes), allowing you to create libraries for vector arithmetic, symbolic algebra, and arbitrary precision floating point math, again in a completely natural style.

The best forms of language advocacy probably involve showing the great things you've done with a language and forgetting about comparisons to the competition. Build it, and they will come.

dllthomas · on Aug 1, 2014

"The best forms of language advocacy probably involve showing the great things you've done with a language and forgetting about comparisons to the competition."

Great things the parent has done with the language:

https://www.youtube.com/watch?v=cefnmjtAolY

AnimalMuppet · on Aug 1, 2014

I am by no means an expert, but I think that Haskell handles your second problem without blinking (because integer is a subtype of rational).

For your first problem, C doesn't help in the general case. If I got a JSON package, I don't know at compile time what the types are going to be, so I don't know how to lay out the struct.

If I don't know anything about the types, the only things I can (safely) do are put them in untyped collections or convert them to strings. I can't add them - they might not be numeric. I can't convert them to uppercase - they might not be strings. (I can do all those things if you're going to return a Maybe - and maybe that's the right approach if you got a JSON package over the wire.)

kazinator · on Aug 1, 2014

Pardon me, I should have said "ratio" rather than "rational". (definition: rationals are partitioned into ratios and integers).

dllthomas · on Aug 1, 2014

"divide two integers so that the result has integer type when the numbers divide evenly, otherwise rational type."

    kazinatorDiv x y = case x `divMod` y of
        (q, 0) -> Left q
        _ -> Right (x % y)

the_af · on Aug 1, 2014

I don't see that they're a struggle. They are more verbose than with Python or Lisp, sure, but you gain something in return. On the other hand, you seldom find yourself needing to write those heterogeneous lists in Haskell, so what's the actual problem?

evincarofautumn · on Aug 2, 2014

I am really interested to know why I would want such a thing. In years of programming, I have never needed a heterogeneous collection except in a dynamically typed language where I can’t express more precise types. My values are always related by some set of operations I want to perform on them.

In C++, for example, I might have a “vector<unique_ptr<GameObject>>” containing instances of various subclasses of “GameObject” such as “Player” or “Enemy”. “GameObject” has some virtual member functions such as “update()” and “render()”.

In Haskell I can do exactly the same thing with a “GameObject” type class containing the “update” and “render” methods, and implement “GameObject” for my “Player” and “Enemy” types.

In Python I do a dynamic check on every value in the collection, on every iteration, to see if “update” and “render” are present and callable. It just seems wasteful because you know that information ahead of time—a non–game object should never end up in that collection, so why waste brain power and computing power considering the possibility?

There must be some advantage I’m not seeing here.

dasil003 · on Aug 1, 2014

The difference being that if you expend the effort you can write a heterogenous collection that does is guaranteed to run for any input, whereas with Lisp or Python you must be very very careful about that because one false step and your program will blow up.

kazinator · on Aug 1, 2014

This doesn't jive with the observation that some Lisp programmers stay in the same REPL for weeks while experimenting with all kinds of new or undebugged code.

dasil003 · on Aug 1, 2014

You are utterly missing the point. Of course any code that can be written in a type-safe manner in Haskell can be written in another language to perform the exact some machine operations. The point is that the compiler guarantees that this code will never break as long as the code compiles. In less type-safe languages you simply don't have these guarantees, no matter how careful you are, eventually someone will come along and break an assumption that kept the original code working.

dragonwriter · on Aug 1, 2014

> The point is that the compiler guarantees that this code will never break as long as the code compiles.

A program that compiles in Haskell can still break, consider:

  main = do 
    print $ head []

yawaramin · on Aug 2, 2014

You shouldn't consider this to be a break. Sure, it's crappy API design--lots of people have pointed out that `head` should return a `Maybe` because an empty list doesn't have a head--but triggering an error in a partial function is not a break. Taking the head of an empty list is no more a break than dividing something by zero.

dragonwriter · on Aug 2, 2014

> You shouldn't consider this to be a break.

I can't see any reasonable definition of "break" for this context for which that is valid; its exactly the same kind of breakage that strong type checking reduces the incidence of in Haskell compared to dynamic languages -- runtime errors resulting from a function call not being able to complete without error for all inputs that the language will allow to be passed to it.

> Taking the head of an empty list is no more a break than dividing something by zero.

I agree; both are clearly examples of things that can cause a program to break.

dasil003 · on Aug 2, 2014

Good point, but in general I think it's fair to say that if Java's type system prevents the type of breakages that are possible in ruby 50% of the time, Haskell prevents 99.9% of such breakages.

kazinator · on Aug 1, 2014

"Code will never break if it compiles?"

So just compile and ship; no testing?

dasil003 · on Aug 1, 2014

I'm talking about maintenance, obviously you must make sure the code does the correct thing initially. I have maintained a 100k LOC Rails app for 7 years with near 100% test coverage. And guess what? Stuff breaks. 99% of the stupid breakages I have experienced would have been impossible in Haskell.

What's the longest you've maintained something?

AnimalMuppet · on Aug 1, 2014

And it probably blows up on them somewhat often. That is, the code blows up - the REPL doesn't. Also, the REPL tells them why it blew up.

So I don't think your observation answers the GP's point at all.

kazinator · on Aug 1, 2014

So by "blow up", you don't literally mean "explode", like BSOD, hard lock up, "Segmentation Fault (core dumped)", etc.

There was some computation which produced a wrong result, which was detected, handled and diagnosed.

In what development tool does this never happen?

rvn1045 · on Aug 1, 2014

You should probably never be doing that anyways in Haskell.

kazinator · on Aug 1, 2014

Not to mention in Fortran or Pascal.

gertef · on Aug 1, 2014

Or Python. What are you going to do with [1.0, "abc", True], besides throw a TypeError?

dragonwriter · on Aug 1, 2014

> What are you going to do with [1.0, "abc", True], besides throw a TypeError?

  for x in [1.0, "abc", True]:
      print(x)

Of course, all the legitimate things you can do with it illustrate ways in which the collection isn't really heterogenous, its just as collection of elements which have different types that are all within the same sum type.

The thing is that in dynamic languages, there's one universal sum type, and every function effectively takes arguments and returns results of that type. In Haskell, you have to plan more for the specific sum types you are going to be using, which in certain situations makes some things more complicated. Of course, it also means that things that would be in documentation or approximately verified in tests can be statically verified by the compiler, so there is a trade off.

Chattered · on Aug 1, 2014

But how did you get the collection in the first place? If all you wanted to do was print the elements, you would have made it a collection of strings, and populated it by calling a "toString" function each time you added an element.

kazinator · on Aug 1, 2014

"The thing is that in dynamic languages, there's one universal sum type, and every function effectively takes arguments and returns results of that type."

That is a reductionistic view, like saying that everything is ultimately ones and zeros. The storage place in a dynamic language, such as a variable, or array element or whatever, is not understood to have a type in that language. The machine language (or whatever) implementation kernel of that language understands that to have a type, but that's in a different language. That kernel knows that, say, if the least significant two bits of the storage location are 00, then it's a pointer to a heap object, and that kind of thing. That code must be written safely: like only if those two bits are 00 must it treat the value as a pointer, and otherwise not. This is all implementation detail, not accessible from the high level language (or not without special escape hatches). The type of the dynamic objects is their dynamic type; and when dynamic languages feature compile-time type inference, that inference works with those types, not with the fictitious "universal sum type".

dragonwriter · on Aug 1, 2014

> That is a reductionistic view

Maybe, so what?

> like saying that everything is ultimately ones and zeros.

Well, except that saying everything is ultimately ones and zeros isn't (at least in this context) a useful starting point for explaining anything.

> The storage place in a dynamic language, such as a variable, or array element or whatever, is not understood to have a type in that language.

I'm not talking about storage places, I'm talking about the logical design -- a function (regardless of the features of the language) is designed to operate on some set of values and return a result from some set of values. These sets can ar the types I'm interested in -- whether they are implemented as (static) types in a language, or as tags / dynamic types, or whatever, or even if the language had no concepts of types at all (and, in most real languages, and can be true true even withHaskell, as in the head function on lists).

And to the extent that "heterogenous" collections are useful, they are useful specifically because they are not really heterogenous, even if the shared logical type to which the elements must belong is not coherently defined as a discrete "type" within the type system of the language.

coolsunglasses · on Aug 2, 2014

I'm pretty sure "Rank2 type + Show a =>" suffices for your "print" use-case.

judk · on Aug 2, 2014

For the record, the "technique" linked is a 1-line import statement and a one-word tag to box an item as it added to a collection.

GregBuchholz · on Aug 1, 2014

Haskell:

    (1, "abc", 4.0)

kazinator · on Aug 1, 2014

Are you sure that is a list, and not something else, like a tuple?

dllthomas · on Aug 1, 2014

It's precisely a tuple, obviously. Are you sure you said "list", not "heterogeneous collection"?

If you know what is where in the list, then you semantically have a tuple.

If you don't know what is where in the list, then you need some way of telling things apart. At which point an ADT is frequently the right solution, isn't any sort of advanced technique, and requires very little in the way of boilerplate.