Hacker News
Python to OCaml: Retrospective (roscidus.com)
196 points by antouank on Jan 26, 2016 | 55 comments



This seems near-proof to me that a gradual inflight translation of a codebase from ANY one language, to another, entirely different one, is feasible, as long as you build a data interop layer of some sort (JSON, etc.). I imagine that if it were powering a web app, dual-deployment would become an additional (but potentially manageable) concern during the "transition" period.

This seems a much safer/saner way to do total rewrites/refactorings.

Note that it practically demands decent (if not impeccable) test coverage (he even admits that many parts were not tested... the only saving grace being that due to the semantics of the 2 specific languages here, he was able to use roughly the same logic for the less-tested portions, reducing risk).

Also note that according to the graph, application and testing performance are at their worst in the middle of this process. At that point, some managers would probably decide to back out/bail on it, which is why I thought it was important to note.
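To make the "data interop layer" idea concrete, here is a minimal sketch in Python. It is not how the article's author did it; every name in it (`legacy_add`, `ported_upper`, `handle`, `call`) is hypothetical, and the "ported" handler is plain Python standing in for code that would normally run in the new language in a separate process. The only point is that callers see JSON text, so a handler can move from one table to the other without the caller changing.

```python
import json

# Hypothetical operation still implemented in the old codebase.
def legacy_add(args):
    return args["a"] + args["b"]

# Hypothetical operation that has already been ported. Here it is plain
# Python standing in for the new implementation in another process.
def ported_upper(args):
    return args["text"].upper()

LEGACY = {"add": legacy_add}
PORTED = {"upper": ported_upper}

def handle(request_json):
    """Decode a JSON request, dispatch it, and return a JSON response."""
    request = json.loads(request_json)
    op, args = request["op"], request["args"]
    # Prefer the ported handler; fall back to the legacy one.
    handler = PORTED.get(op) or LEGACY.get(op)
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown op: " + op})
    return json.dumps({"ok": True, "result": handler(args)})

def call(op, **args):
    """Caller-side helper: everything crosses the boundary as JSON text."""
    response = json.loads(handle(json.dumps({"op": op, "args": args})))
    if not response["ok"]:
        raise RuntimeError(response["error"])
    return response["result"]
```

Under these assumptions, `call("add", a=2, b=3)` and `call("upper", text="ocaml")` look identical to the caller even though they are served by different sides of the migration.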


> as long as you build a data interop layer of some sort (JSON, etc.). I imagine that if it were powering a web app, dual-deployment would become an additional (but potentially manageable) concern during the "transition" period

Which brings up an interesting idea:

* All codebases are in some kind of inflight transition. Usually not migrating across languages, but often migrating across authors and authoring styles, sometimes migrating across underlying platforms. So, things that make these inflight transitions easier might well be practices one should consider adopting.

* The "data interop layer" might simply be another way of understanding another principle: if your data structures/formats are (a) legible and (b) well-fitted to your problem domain, your program is probably going to be easier to understand and modify... maybe even when it comes to modifications that might seem extreme.

Or to use words attributed to Linus Torvalds: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."


When you think about the data first, you can usually synthesize the program. It's almost a game now to try and write a DOOM engine from only the description of the WAD format and its wonderful documentation. I'm under the impression that if you think about your data and how to serialize it first, the code will fall out of it.

So I'm starting to think more about formal specifications and documenting serialization formats first and worrying about the code second.

update: punctuation corrections
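As a small illustration of code "falling out of" a format description, here is a sketch of a WAD directory reader in Python. It assumes the commonly documented layout (a 12-byte header with a 4-byte `IWAD`/`PWAD` tag, a lump count and a directory offset, followed somewhere by 16-byte directory entries, all little-endian). The helper `build_wad` is hypothetical and exists only so the example can run without a real WAD file.

```python
import struct

HEADER = struct.Struct("<4sii")  # identification, lump count, directory offset
ENTRY = struct.Struct("<ii8s")   # lump offset, lump size, 8-byte padded name

def read_directory(data):
    """Return a list of (name, offset, size) tuples for every lump."""
    ident, num_lumps, dir_offset = HEADER.unpack_from(data, 0)
    if ident not in (b"IWAD", b"PWAD"):
        raise ValueError("not a WAD file")
    lumps = []
    for i in range(num_lumps):
        pos, size, raw_name = ENTRY.unpack_from(data, dir_offset + i * ENTRY.size)
        # Names are NUL-padded to 8 bytes; keep only the part before the padding.
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        lumps.append((name, pos, size))
    return lumps

def build_wad(lumps):
    """Hypothetical helper: build a tiny in-memory PWAD from (name, bytes) pairs."""
    body = b""
    entries = b""
    for name, payload in lumps:
        entries += ENTRY.pack(HEADER.size + len(body), len(payload), name.encode("ascii"))
        body += payload
    return HEADER.pack(b"PWAD", len(lumps), HEADER.size + len(body)) + body + entries
```

The two `struct.Struct` definitions are close to a transcription of the format description, which is the point being made above: once the layout is written down, the reader is mostly mechanical.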


So, data-driven development, basically? https://en.wikipedia.org/wiki/Data-driven_programming




I think rewrites are seldom worth it. There's no need to rewrite everything just because it's not in your favourite language (looking at you, JS land).


What about OCaml and concurrency? What are the options out there?

After reading:

http://roscidus.com/blog/blog/2013/09/28/ocaml-objects/

the language sounds quite interesting. Does it support something like channels?


The canonical way to do concurrency these days is with a monadic library like Lwt or Async, which is a mechanism akin to promises/deferreds. They have their own implementations of channels/events/streams/what-you-call-it.

There is some interesting work being done on an effects system [1] which would be another option for type-safe concurrency; hopefully it will land soon alongside the multicore work.

There is also the old fashioned way with the stdlib's Event module[2] (in conjunction with Thread), but that's not used as often, mostly because Lwt/Async offer richer, safer abstractions.

[1] http://kcsrk.info/ocaml/multicore/2015/05/20/effects-multico...

[2] http://caml.inria.fr/pub/docs/manual-ocaml/libref/Event.html
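For readers coming from Python, the promise/deferred style can be sketched with `asyncio`. This is an analogy only, not Lwt or Async code: `fetch_length` and `total_length` are hypothetical names, and the sleep stands in for real I/O. Each task is a value representing a result that is not available yet, and `await` is where the caller waits for it.

```python
import asyncio

async def fetch_length(text, delay):
    # Stand-in for an I/O operation; sleeping yields to the event loop.
    await asyncio.sleep(delay)
    return len(text)

async def total_length(words):
    # Start every operation up front, then wait for all of the results.
    tasks = [asyncio.create_task(fetch_length(word, 0.01)) for word in words]
    lengths = await asyncio.gather(*tasks)
    return sum(lengths)
```

Running `asyncio.run(total_length(["lwt", "async"]))` drives the event loop until every task has resolved. All of this happens on a single thread, with tasks interleaving at the `await` points.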


OT: Yaron Minsky has a rather interesting view on concurrency in OCaml. Do check out his latest interview on SE Daily - http://softwareengineeringdaily.com/2015/11/09/automated-tra...


LWT (a popular IO library) has channels: https://ocsigen.org/lwt/dev/api/Lwt_io

That's what TFAA (and MirageOS) use.


LWT's channels look more like filehandles. I think the GP was asking about channels in the CSP sense. It looks like LWT has that too in the form of streams.
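To show what "channels in the CSP sense" means, as opposed to file-handle-like channels, here is a Python sketch using `asyncio.Queue`. It is not Lwt's API, and the names (`producer`, `consumer`, `run_pipeline`, `DONE`) are hypothetical. A strict CSP channel is unbuffered; the queue below has a buffer of one item, which is the closest simple approximation in `asyncio`.

```python
import asyncio

DONE = object()  # sentinel marking the end of the stream

async def producer(channel, items):
    for item in items:
        await channel.put(item)  # suspends while the channel is full
    await channel.put(DONE)

async def consumer(channel):
    received = []
    while True:
        item = await channel.get()  # suspends while the channel is empty
        if item is DONE:
            return received
        received.append(item * 2)

async def run_pipeline(items):
    # One-slot buffer, so the producer cannot run far ahead of the consumer.
    channel = asyncio.Queue(maxsize=1)
    _, doubled = await asyncio.gather(producer(channel, items), consumer(channel))
    return doubled
```

The two coroutines never share state directly; the only thing that passes between them is what goes through the channel.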


Forgive my ignorance, but what is TFAA? All my web searches yield irrelevant results, such as "Triveni Faridabad Allottees Association" or "Trifluoroacetic anhydride".


The Fine Article Above?


The F(ine|ucking) Article's Author.


"The fucking article's author"


I regularly use another OCaml program: Unison, the file syncer. It's sort of like a bidirectional rsync. It's a fantastic tool but nearly abandonware. My suspicion has always been that the open source project languishes because so few people know enough OCaml to work on it.


In order to not become abandoned, open source software needs to be not only open source, but also to have easily accessible source control, issue tracker and other community-required tools.

In the case of Unison, it's a program written by Benjamin Pierce, a brilliant computer scientist and the author of one of the best books I've read (well, am reading), "Types and Programming Languages". I know he put quite some research effort into that tool, but I don't see community-related infrastructure in place around it, which is why it looks (or maybe is) rather abandoned.

Btw, here's another fun video of B. Pierce making some intriguing statements regarding a popular file-sync program called Dropbox https://www.youtube.com/watch?v=Y2jQe8DFzUM


If it's a fantastic tool, does it matter if it's abandonware? Is nobody working on it because it does what it's supposed to do, is reasonably bug-free, and so it needs no work?


How does it compare to syncthing?


Good to see this back up on here - am just about to start porting a Python codebase to OCaml myself (www.stackhut.com) and have been reading this to help. Am thoroughly looking forward to all the typed, functional goodness :)


You will learn how bad Python's scoping rules actually are.


Yep, a source of constant frustration when moving between them!


"Most errors are picked up by the type checker immediately at the start of the build, rather than by the unit-tests at the end. That saves a lot of time."

Better tooling would help so that you'd get the error checking as you type. Are there any good configurations for vim or Emacs, for example?


Merlin for Vim is an absolute delight to use. You get auto-complete, indentation, compile-on-save, type checking right within your editor with less than 5 lines of configuration.

For someone new to the type system, it helps a lot to compulsively keep checking the types of the expressions as you go along building the program. Highly recommended.

PS: If you prefer screenshots - https://twitter.com/prakharsriv9/status/689141428161802241


Also, destructuring for pattern-matching is pretty cool.


It surely is. But how does Merlin help with that?



I used Merlin pretty extensively, yet this example had me sitting there with an open mouth, staring in awe.

The funniest thing is that in my experience Merlin just seems to work. When I think about how many hoops I need to jump through with the Clojure REPL to connect it to CIDER in Emacs, Merlin just works in the background without me having to think about it at all. Very impressive.


Yes, you can have a look at merlin: https://github.com/the-lambda-church/merlin


Excellent writeup. It's a very useful example of a refactor done right. I'd love to know more about the JSON interface between the Python and OCaml code, since "integration" is usually where refactors like this get hard.


From 2014; the discussion from back then is here: https://news.ycombinator.com/item?id=7858276


I'd love to see a comparison with recent Rust. Looking at that graph with 2013 results, I'm surprised how much slower Rust was compared to Haskell and OCaml.


For fun I started porting the example over: https://gist.github.com/steveklabnik/f86cba4da1dc9c5c68e0

Then I realized that it wasn't totally clear to me what json_list_to_str_vector() should do, exactly, and so I don't even know how it would compare.

When this article was written Rust had a big runtime. It was a very different language. OCaml is pretty fast, but I would still expect Rust to fare much, much better today.


> Then I realized that it wasn't totally clear to me what json_list_to_str_vector() should do, exactly, and so I don't even know how it would compare.

It just converts a JSON-encoded list of strings into a Vec<String> doesn't it? That's what the other languages do:

* get data from some envvar

* decode it from json to an array/list/vector of strings

* concatenate argv[1..] to the result of (2)

* execute this new argslist (the first item being the name of the program)
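The four steps above can be sketched in Python as follows. This is not the benchmark program from the article: the environment variable name `LAUNCHER_ARGS` and the function names are hypothetical, chosen only for illustration.

```python
import json
import os
import sys

ENV_VAR = "LAUNCHER_ARGS"  # hypothetical variable name, not from the article

def build_argv(encoded, extra_args):
    """Decode a JSON list of strings and append the extra arguments."""
    base = json.loads(encoded)
    if not isinstance(base, list) or not all(isinstance(a, str) for a in base):
        raise ValueError("expected a JSON list of strings")
    if not base:
        raise ValueError("the list must start with a program name")
    return base + list(extra_args)

def main():
    # Steps 1-3: read the variable, decode it, append our own arguments.
    argv = build_argv(os.environ[ENV_VAR], sys.argv[1:])
    # Step 4: replace the current process; this does not return on success.
    os.execvp(argv[0], argv)
```

For example, with `LAUNCHER_ARGS='["echo", "hello"]'` set and one extra argument `world`, calling `main()` would end up executing `echo hello world`. Keeping the pure part (`build_argv`) apart from the `exec` call makes the decoding logic easy to test on its own.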


Ah that makes sense. Oh well.


Thank you for the writeup. I loved the charts and the writing.


Was a great writeup before, now even enhanced. +1.


ATS, hmmm, never heard of it. It seems to have done really well; what's the catch?


If you read the original article[0] it did very well on the speed/size bench (5/5) but took a big hit on ease of writing (1/5) ending up with 48 between C# (47) and Haskell (49) and the following summary note:

> Everything would be incredibly fast, but getting new contributors would be very difficult due to the learning curve. There’s a risk of crashes as the library is not entirely memory safe, and there are likely to be changes ahead to the language. Probably writing the whole thing in ATS would be too much work for anyone.

ATS didn't make it to round 2[1] on grounds of use difficulty and difficulty of separating memory-safe and memory-unsafe code.

That seems prescient, as the original ATS1 (ATS/Anairiats) was replaced by ATS2 (ATS/Postiats) a few months later. I don't know how compatible the two are, but the FAQ puts ATS1 and ATS2 in different categories[2].

[0] http://roscidus.com/blog/blog/2013/06/09/choosing-a-python-r...

[1] http://roscidus.com/blog/blog/2013/06/20/replacing-python-ro...

[2] https://github.com/githwxi/ATS-Postiats/wiki/ATS-implementat...


Is it just me or is there a mistake here http://i.imgur.com/65GPWq6.png

He says the color of the UI part is orange but I see yellow.


Color perception is very subjective, because color learning differs from childhood to childhood.

That specific color is between the yellow and red range, although it is closer to the ideal yellow. Some people learned that this still counts as yellow, while others learned that this yellow is "red enough" to count as orange.


This xkcd color name survey seems apropos.

http://blog.xkcd.com/2010/05/03/color-survey-results/


Fascinating. I see it as yellow.

This also hints at the "qualia" mystery...


Whilst colour can be subjective, in this case it is clearly yellow and a mistake by the OP, because the preceding bar is clearly orange.


> the preceding bar is clearly orange.

With a hue of 7˚[0] the middle color is in the middle of the red range[1], it's on the pink side of scarlet (8.5˚, 100%, 100%).

The rightmost bar has a hue of 47˚[2] making it an orange-yellow[3], calling it an orange (if a light one) is not insane.

[0] http://www.color-hex.com/color/f12910

[1] http://www.workwithcolor.com/red-color-hue-range-01.htm

[2] http://www.color-hex.com/color/fbcc1a

[3] http://www.workwithcolor.com/orange-yellow-color-hue-range-0...
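The hue figures quoted above can be checked from the hex codes with Python's standard `colorsys` module. The helper name `hue_degrees` is just for this sketch; it converts the RGB components to HSV and reports the hue in degrees.

```python
import colorsys

def hue_degrees(hex_color):
    """Return the HSV hue of a colour like 'f12910' in degrees (0 to 360)."""
    hex_color = hex_color.lstrip("#")
    # Split into red, green and blue components scaled to the range 0..1.
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue * 360
```

For `f12910` this gives roughly 6.7 degrees and for `fbcc1a` roughly 47.5 degrees, in line with the 7 and 47 quoted above; `ff2400` (scarlet) comes out at about 8.5 degrees.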


From your link, compare the red RGB bar to the clearly orange bars at the side of the screen. Additionally the 'Analogous Colors' contains an even more obvious orange, and pink, but no sign of red.


> From your link, compare the red RBG bar to the clearly orange bars at the side of the screen.

I see red all around.

> Additionally the 'Analogous Colors' contains an even more obvious orange, and pink, but no sign of red.

Hue-wise, red is in the middle of orange and pink, if your analogous colors are orange and pink your base is a red. That's exactly what you get for scarlet: http://www.color-hex.com/color/ff2400 or pure straight no-frills red: http://www.color-hex.com/color/ff0000


Arguing about this over the internet is somewhat comical, as your computers are probably showing you at least slightly different colors, and at most entirely different colors, due to things like screen quality, viewing angle (if LCD), brightness/contrast settings, etc. Trying to use specific descriptions that can be replicated by a computer is a good step, but ultimately of little use if you can't be sure that the same color is displayed the same way to whoever you are talking to.

I mean, obviously, the color is blue. Or is it gold? In any case, it's definitely a dress...


Display technology also varies quite a bit. Some cheaper devices tend to saturate certain hues like redness. It could very well look a bit more orange on one display and look completely yellow on another.


Not everything has to be binary. In fact, outside of math and deterministic CS, we hardly have the comfort of being clearly wrong or right. https://www.cs.rit.edu/~rlaz/prec20092/slides/BayesianDecisi...


The middle bar looks red to me.


It's clearly red.


Looks yellow to me.


There are oranges which have that color, but I wouldn't buy one.



