This seems near-proof to me that a gradual inflight translation of a codebase from ANY one language to another, entirely different one, is feasible, as long as you build a data interop layer of some sort (JSON, etc.). I imagine that if it were powering a web app, dual-deployment would become an additional (but potentially manageable) concern during the "transition" period.
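To make the interop-layer idea concrete, here's a minimal sketch of what the OCaml side of such a boundary could look like, assuming the Yojson library and an invented one-JSON-object-per-line protocol over stdin/stdout (the Python half would shell out to this and parse the reply; the field names are made up):

    (* Hypothetical interop endpoint: read one JSON request per line on
       stdin, write one JSON reply on stdout. *)
    let () =
      let request = Yojson.Safe.from_string (input_line stdin) in
      let name = Yojson.Safe.Util.(request |> member "name" |> to_string) in
      let reply = `Assoc [ ("greeting", `String ("hello, " ^ name)) ] in
      print_endline (Yojson.Safe.to_string reply)

The wire format, rather than either language's object model, becomes the contract between the old and new halves, so each side can be swapped out independently.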
This seems a much safer/saner way to do total rewrites/refactorings.
Note that it practically demands decent (if not impeccable) test coverage (he even admits that many parts were not tested... the only saving grace being that, due to the semantics of the two specific languages here, he was able to use roughly the same logic for the less-tested portions, reducing risk).
Also note that, according to the graph, application and testing performance will be at their worst in the middle of this process. That is the point where some managers would probably decide to back out and bail on it, which is why I thought it was important to note.
> as long as you build a data interop layer of some sort (JSON, etc.). I imagine that if it were powering a web app, dual-deployment would become an additional (but potentially manageable) concern during the "transition" period
Which brings up an interesting idea:
* All codebases are in some kind of inflight transition. Usually not migrating across languages, but often migrating across authors and authoring styles, sometimes migrating across underlying platforms. So, things that make these inflight transitions easier might well be practices one should consider adopting.
* The "data interop layer" might simply be another way of understanding another principle: if your data structures/formats are (a) legible and (b) well-fitted to your problem domain, your program is probably going to be easier to understand and modify... maybe even when it comes to modifications that might seem extreme.
Or to use words attributed to Linus Torvalds: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
When you think about the data first you can usually synthesize the program. It's almost a game now to try and write a DOOM engine from only the description of the WAD format and its wonderful documentation. I'm under the impression that if you think about your data and how to serialize it first, the code will fall out of it.
So I'm starting to think more about formal specifications and documenting serialization formats first, and worrying about the code second.
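As a toy illustration of putting the data first (assuming Yojson again; the type and field names are invented), you can pin down the record and its wire format before writing any logic, and the rest of the program tends to organize itself around them:

    (* Data first: the type and its (de)serializers are the spec;
       everything else is written against them. *)
    type point = { x : float; y : float }

    let point_to_json { x; y } =
      `Assoc [ ("x", `Float x); ("y", `Float y) ]

    let point_of_json json =
      let open Yojson.Safe.Util in
      { x = json |> member "x" |> to_float;
        y = json |> member "y" |> to_float }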
The canonical way to do concurrency these days is with a monadic library like Lwt or Async, a mechanism akin to promises/deferreds. They have their own implementations of channels/events/streams/what-have-you.
There is some interesting work being done on an effects system [1], which would be another option for type-safe concurrency; hopefully it will land soon alongside the multicore work.
There is also the old-fashioned way with the stdlib's Event module [2] (in conjunction with Thread), but that's not used as often, mostly because Lwt/Async offer richer, safer abstractions.
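For completeness, here's a tiny sketch of that last style (built against the threads library; the message is invented):

    (* CSP-style rendezvous with the stdlib: a spawned thread sends on a
       typed channel, the main thread blocks until the send synchronizes. *)
    let () =
      let ch : string Event.channel = Event.new_channel () in
      let sender () = Event.sync (Event.send ch "hello from a thread") in
      let t = Thread.create sender () in
      print_endline (Event.sync (Event.receive ch));
      Thread.join t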
Lwt's channels look more like file handles. I think the GP was asking about channels in the CSP sense; it looks like Lwt has that too, in the form of streams.
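For instance, a minimal Lwt_stream sketch (values invented) behaves much like a CSP channel:

    (* A producer pushes values into the stream; pushing None closes it,
       which ends the consumer's iteration. *)
    let () =
      let stream, push = Lwt_stream.create () in
      push (Some "hello");
      push (Some "world");
      push None;
      Lwt_main.run (Lwt_stream.iter print_endline stream)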
Forgive my ignorance, but what is TFAA? All my web searches yield irrelevant results, such as "Triveni Faridabad Allottees Association" or "Trifluoroacetic anhydride".
I regularly use another OCaml program: Unison, the file syncer. It's sort of like a bidirectional rsync. It's a fantastic tool but nearly abandonware. My suspicion has always been that the open source project languishes because so few people know OCaml well enough to work on it.
In order not to become abandoned, open source software needs not only to be open source, but also to have easily accessible source control, an issue tracker, and other community-required tools.
In the case of Unison: it's a program by Benjamin Pierce, a brilliant computer scientist and the author of one of the best books I've read (well, am reading), "Types and Programming Languages". I know he put quite some research effort into that tool, but I don't see the community-related infrastructure around it being in place, which is why it looks (or maybe is) rather abandoned.
Btw, here's another fun video of B. Pierce making some intriguing statements regarding a popular file-sync program called Dropbox https://www.youtube.com/watch?v=Y2jQe8DFzUM
If it's a fantastic tool, does it matter if it's abandonware? Is nobody working on it because it does what it's supposed to do, is reasonably bug-free, and so it needs no work?
Good to see this back up on here - am just about to start porting a Python codebase to OCaml myself (www.stackhut.com) and have been reading this to help. Am thoroughly looking forward to all the typed, functional goodness :)
"Most errors are picked up by the type checker immediately at the start of the build, rather than by the unit-tests at the end. That saves a lot of time."
Better tooling would help so that you'd get the error checking as you type. Are there any good configurations for vim or Emacs, for example?
Merlin for Vim is an absolute delight to use. You get auto-complete, indentation, compile-on-save, and type checking right within your editor, with less than 5 lines of configuration.
For someone new to the type system, it helps a lot to compulsively keep checking the types of the expressions as you go along building the program. Highly recommended.
I used Merlin pretty extensively, yet this example had me sitting there with an open mouth, staring in awe.
The funniest thing is that in my experience Merlin just seems to work. When I think about how many hoops I need to jump through with the Clojure REPL to connect it to CIDER in Emacs, Merlin just works in the background without me having to think about it at all. Very impressive.
Excellent writeup. It's a very useful example of a refactor done right. I'd love to know more about the JSON interface between the Python and OCaml code, since "integration" is usually where refactors like this get hard.
I'd love to see a comparison with recent Rust. Looking at that graph with 2013 results, I'm surprised how much slower Rust was compared to Haskell and OCaml.
Then I realized that it wasn't totally clear to me what json_list_to_str_vector() should do, exactly, and so I don't even know how it would compare.
When this article was written, Rust had a big runtime. It was a very different language. OCaml is pretty fast, but I would still expect Rust to fare much, much better today.
If you read the original article [0], it did very well on the speed/size bench (5/5) but took a big hit on ease of writing (1/5), ending up with 48, between C# (47) and Haskell (49), and the following summary note:
> Everything would be incredibly fast, but getting new contributors would be very difficult due to the learning curve. There’s a risk of crashes as the library is not entirely memory safe, and there are likely to be changes ahead to the language. Probably writing the whole thing in ATS would be too much work for anyone.
ATS didn't make it to round 2[1] on grounds of use difficulty and difficulty of separating memory-safe and memory-unsafe code.
That seems prescient, as the original ATS1 (ATS/Anairiats) was replaced by ATS2 (ATS/Postiats) a few months later. I don't know how compatible the two are, but the FAQ puts ATS1 and ATS2 in different categories [2].
Color perception is very subjective, because we each learn color categories differently in childhood.
That specific color sits in the range between yellow and red, although it is closer to an ideal yellow. Some people learned that this still counts as yellow, while others learned that this yellow is "red enough" to count as orange.
From your link, compare the red RGB bar to the clearly orange bars at the side of the screen. Additionally, the "Analogous Colors" section contains an even more obvious orange, and a pink, but no sign of red.
Arguing about this over the internet is somewhat comical, as your computers are probably showing each of you at least slightly differing colors, and at most entirely different colors, due to things like screen quality, viewing angle (if LCD), brightness/contrast settings, etc. Trying to use specific descriptions that can be replicated by a computer is a good step, but it's ultimately of little use if you can't be sure the same color is shown the same way to whoever you're talking to.
I mean, obviously, the color is blue. Or is it gold? In any case, it's definitely a dress...
Display technology also varies quite a bit. Some cheaper devices tend to oversaturate certain hues, reds in particular. The same color could very well look a bit more orange on one display and completely yellow on another.