This seems near-proof to me that a gradual inflight translation of a codebase from ANY one language to another, entirely different one, is feasible, as long as you build a data interop layer of some sort (JSON, etc.). I imagine that if it were powering a web app, dual-deployment would become an additional (but potentially manageable) concern during the "transition" period.
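To make the interop-layer idea concrete, here's a minimal sketch of what the OCaml side of such a boundary could look like, assuming the Yojson library and an invented one-JSON-object-per-line protocol over stdin/stdout (the Python half would shell out to this and parse the reply; the field names are made up):

    (* Hypothetical interop endpoint: read one JSON request per line on
       stdin, write one JSON reply on stdout. *)
    let () =
      let request = Yojson.Safe.from_string (input_line stdin) in
      let name = Yojson.Safe.Util.(request |> member "name" |> to_string) in
      let reply = `Assoc [ ("greeting", `String ("hello, " ^ name)) ] in
      print_endline (Yojson.Safe.to_string reply)

The wire format, rather than either language's object model, becomes the contract between the old and new halves, so each side can be swapped out independently.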
This seems a much safer/saner way to do total rewrites/refactorings.
Note that it practically demands decent (if not impeccable) test coverage (he even admits that many parts were not tested... the only saving grace being that, due to the semantics of the two specific languages here, he was able to use roughly the same logic for the less-tested portions, reducing risk).
Also note that, according to the graph, application and testing performance will be at their worst in the middle of this process. That is the point where some managers would probably decide to back out and bail on it, which is why I thought it was important to note.
> as long as you build a data interop layer of some sort (JSON, etc.). I imagine that if it were powering a web app, dual-deployment would become an additional (but potentially manageable) concern during the "transition" period
Which brings up an interesting idea:
* All codebases are in some kind of inflight transition. Usually not migrating across languages, but often migrating across authors and authoring styles, sometimes migrating across underlying platforms. So, things that make these inflight transitions easier might well be practices one should consider adopting.
* The "data interop layer" might simply be another way of understanding another principle: if your data structures/formats are (a) legible and (b) well-fitted to your problem domain, your program is probably going to be easier to understand and modify... maybe even when it comes to modifications that might seem extreme.
Or to use words attributed to Linus Torvalds: "Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
When you think about the data first you can usually synthesize the program. It's almost a game now to try and write a DOOM engine from only the description of the WAD format and its wonderful documentation. I'm under the impression that if you think about your data and how to serialize it first, the code will fall out of it.
So I'm starting to think more about formal specifications and documenting serialization formats first, and worrying about the code second.
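As a toy illustration of putting the data first (assuming Yojson again; the type and field names are invented), you can pin down the record and its wire format before writing any logic, and the rest of the program tends to organize itself around them:

    (* Data first: the type and its (de)serializers are the spec;
       everything else is written against them. *)
    type point = { x : float; y : float }

    let point_to_json { x; y } =
      `Assoc [ ("x", `Float x); ("y", `Float y) ]

    let point_of_json json =
      let open Yojson.Safe.Util in
      { x = json |> member "x" |> to_float;
        y = json |> member "y" |> to_float }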
The canonical way to do concurrency these days is with a monadic library like Lwt or Async, a mechanism akin to promises/deferreds. They have their own implementations of channels/events/streams/what-have-you.
There is some interesting work being done on an effects system [1], which would be another option for type-safe concurrency; hopefully it will land soon alongside the multicore work.
There is also the old-fashioned way with the stdlib's Event module [2] (in conjunction with Thread), but that's not used as often, mostly because Lwt/Async offer richer, safer abstractions.
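For completeness, here's a tiny sketch of that last style (built against the threads library; the message is invented):

    (* CSP-style rendezvous with the stdlib: a spawned thread sends on a
       typed channel, the main thread blocks until the send synchronizes. *)
    let () =
      let ch : string Event.channel = Event.new_channel () in
      let sender () = Event.sync (Event.send ch "hello from a thread") in
      let t = Thread.create sender () in
      print_endline (Event.sync (Event.receive ch));
      Thread.join t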
Lwt's channels look more like file handles. I think the GP was asking about channels in the CSP sense; it looks like Lwt has that too, in the form of streams.
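For instance, a minimal Lwt_stream sketch (values invented) behaves much like a CSP channel:

    (* A producer pushes values into the stream; pushing None closes it,
       which ends the consumer's iteration. *)
    let () =
      let stream, push = Lwt_stream.create () in
      push (Some "hello");
      push (Some "world");
      push None;
      Lwt_main.run (Lwt_stream.iter print_endline stream)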
Forgive my ignorance, but what is TFAA? All my web searches yield irrelevant results, such as "Triveni Faridabad Allottees Association" or "Trifluoroacetic anhydride".
I regularly use another OCaml program: Unison, the file syncer. It's sort of like a bidirectional rsync. It's a fantastic tool but nearly abandonware. My suspicion has always been that the open source project languishes because so few people know OCaml well enough to work on it.
In order not to become abandoned, open source software needs not only to be open source, but also to have easily accessible source control, an issue tracker, and other community-required tools.
In the case of Unison: it's a program by Benjamin Pierce, a brilliant computer scientist and the author of one of the best books I've read (well, am reading), "Types and Programming Languages". I know he put quite some research effort into that tool, but I don't see the community-related infrastructure around it being in place, which is why it looks (or maybe is) rather abandoned.
Btw, here's another fun video of B. Pierce making some intriguing statements regarding a popular file-sync program called Dropbox https://www.youtube.com/watch?v=Y2jQe8DFzUM
If it's a fantastic tool, does it matter if it's abandonware? Is nobody working on it because it does what it's supposed to do, is reasonably bug-free, and so it needs no work?
Good to see this back up on here - am just about to start porting a Python codebase to OCaml myself (www.stackhut.com) and have been reading this to help. Am thoroughly looking forward to all the typed, functional goodness :)
"Most errors are picked up by the type checker immediately at the start of the build, rather than by the unit-tests at the end. That saves a lot of time."
Better tooling would help so that you'd get the error checking as you type. Are there any good configurations for vim or Emacs, for example?
Merlin for Vim is an absolute delight to use. You get auto-complete, indentation, compile-on-save, and type checking right within your editor, with less than 5 lines of configuration.
For someone new to the type system, it helps a lot to compulsively keep checking the types of the expressions as you go along building the program. Highly recommended.
I used Merlin pretty extensively, yet this example had me sitting there with an open mouth, staring in awe.
The funniest thing is that in my experience Merlin just seems to work. When I think about how many hoops I need to jump through with the Clojure REPL to connect it to CIDER in Emacs, Merlin just works in the background without me having to think about it at all. Very impressive.
Excellent writeup. It's a very useful example of a refactor done right. I'd love to know more about the JSON interface between the Python and OCaml code, since "integration" is usually where refactors like this get hard.
I'd love to see a comparison with recent Rust. Looking at that graph with 2013 results, I'm surprised how much slower Rust was compared to Haskell and OCaml.
Then I realized that it wasn't totally clear to me what json_list_to_str_vector() should do, exactly, and so I don't even know how it would compare.
When this article was written, Rust had a big runtime. It was a very different language. OCaml is pretty fast, but I would still expect Rust to fare much, much better today.
If you read the original article [0], it did very well on the speed/size bench (5/5) but took a big hit on ease of writing (1/5), ending up with 48, between C# (47) and Haskell (49), and the following summary note:
> Everything would be incredibly fast, but getting new contributors would be very difficult due to the learning curve. There’s a risk of crashes as the library is not entirely memory safe, and there are likely to be changes ahead to the language. Probably writing the whole thing in ATS would be too much work for anyone.
ATS didn't make it to round 2[1] on grounds of use difficulty and difficulty of separating memory-safe and memory-unsafe code.
That seems prescient, as the original ATS1 (ATS/Anairiats) was replaced by ATS2 (ATS/Postiats) a few months later. I don't know how compatible the two are, but the FAQ puts ATS1 and ATS2 in different categories [2].
Color perception is very subjective, because we each learn color categories differently in childhood.
That specific color sits in the range between yellow and red, although it is closer to an ideal yellow. Some people learned that this still counts as yellow, while others learned that this yellow is "red enough" to count as orange.
From your link, compare the red RGB bar to the clearly orange bars at the side of the screen. Additionally, the "Analogous Colors" section contains an even more obvious orange, and a pink, but no sign of red.
Arguing about this over the internet is somewhat comical, as your computers are probably showing each of you at least slightly differing colors, and at most entirely different colors, due to things like screen quality, viewing angle (if LCD), brightness/contrast settings, etc. Trying to use specific descriptions that can be replicated by a computer is a good step, but it's ultimately of little use if you can't be sure the same color is shown the same way to whoever you're talking to.
I mean, obviously, the color is blue. Or is it gold? In any case, it's definitely a dress...
Display technology also varies quite a bit. Some cheaper devices tend to oversaturate certain hues, reds in particular. The same color could very well look a bit more orange on one display and completely yellow on another.