Beyond 10,000 Lines: Lessons Learned from a Large Phoenix Project (infinite.red)
239 points by mikecarlton on Nov 9, 2016 | 84 comments



Also I found Dialyzer (success typing checker) to be helpful in large projects:

http://learnyousomeerlang.com/dialyzer

The way success typing works is a bit like static typing, but when it cannot deduce the types it assumes success. However, when it finds a discrepancy, it is always right. So the more type annotations you add, and the more precise they are, the more benefit you get from it.

It has also been around for many years. I think Python only very recently started getting the same kind of thing via MyPy. Of course, they had to call it something different (Optional Static Typing).

Elixir has a wrapper around it, it seems: https://github.com/jeremyjh/dialyxir but I've never used it (I use Erlang mostly).


Totally agree. Dialyzer/Dialyxir can provide a high degree of confidence about types.

The wonderful thing about it is that you don't have to get all the type annotations in from the beginning; you can add them over time. This allows you to use "unspecified" types for quick prototyping and make them more specific as the project evolves, which addresses a major complaint about traditional type systems such as Java's.
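For example (a minimal sketch with made-up module and function names), a first pass can be deliberately loose and then tightened later:

    defmodule Accounts do
      # First pass: effectively unspecified, fine for quick prototyping.
      @spec lookup(term()) :: any()
      def lookup(id), do: Map.get(%{1 => %{name: "ann"}}, id)

      # Later pass: a precise spec. Dialyzer can now flag callers that
      # pass a non-integer id or mishandle the :error case.
      @spec rename(integer(), String.t()) :: {:ok, map()} | {:error, :not_found}
      def rename(id, new_name) do
        case lookup(id) do
          nil -> {:error, :not_found}
          user -> {:ok, %{user | name: new_name}}
        end
      end
    end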

Another tool that I find useful is Credo[1] which "is a static code analysis tool for the Elixir language with a focus on teaching and code consistency".

The Ecto[2] project uses a tool called Ebert[3] that automatically runs Credo for each pull-request and comments with the issues found. Here you can see an example of Ebert's bot commenting on a PR[4]

[1] https://github.com/rrrene/credo [2] https://github.com/elixir-ecto/ecto [3] https://ebertapp.io/ [4] https://github.com/elixir-ecto/ecto/pull/1785


I feel like the bit about OTP misses the point of OTP.

Erlang is not just about distributed computing (in fact, it never was; any affinity for distributed computing was more of a side-effect of Erlang's design). Rather, it's about fault-tolerance. Supervision trees and "let it crash" are the cornerstone of Erlang programming, and therefore by extension the cornerstone of Elixir programming.

Meanwhile, OTP applications build on this in a way that permits composability. It's kind of like microservices behind the scenes, but they feel like a monolith; you build up your system from lots of different OTP applications that work together to provide a unified whole.

Elixir and Erlang web frameworks (Phoenix, Sugar, Chicago Boss (IIRC), etc.) already do a lot of this for you by kicking off various OTP dependencies; for example, your average Phoenix or Sugar application will in turn start Plug, Ecto, and various other OTP apps, and these will in turn spin up their own dependencies (like Cowboy and Postgrex, respectively).

Basically, it's not quite right to equate OTP to just distributed computing. OTP is at the heart and soul of the vast majority of software written for BEAM.
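For the fault-tolerance half, a minimal Elixir sketch (module names are hypothetical; MyApp.Cache and MyApp.Worker are assumed to be GenServers, and this uses the current Supervisor API):

    defmodule MyApp.Supervisor do
      use Supervisor

      def start_link(arg) do
        Supervisor.start_link(__MODULE__, arg, name: __MODULE__)
      end

      def init(_arg) do
        children = [
          MyApp.Cache,   # hypothetical GenServers; if one crashes,
          MyApp.Worker   # the supervisor restarts it ("let it crash")
        ]
        Supervisor.init(children, strategy: :one_for_one)
      end
    end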


Could anyone more knowledgeable explain whether or not my feeling is correct that Elixir becoming (relatively) popular is making Erlang a less viable choice? Whereas the total number of libraries may be increasing, code that might have been written in Erlang is now written in Elixir, and established Erlang codebases (RabbitMQ for one) have started to migrate (parts) to Elixir. Calling Erlang from Elixir is easy, but what about the reverse? It reminds me of how, at least in the early days, Play Framework (largely Scala) could be used from Java, but not without much friction. Should I prefer Erlang as a language for its maturity and fewer constructs, it would be a pity to then miss out on new libraries or frameworks, such as Phoenix, that revolve around Elixir.


The impedance mismatch between Erlang/Elixir is much smaller than that between Java/Scala. All data types are the same, functional idioms and conventions are the same, exception handling and propagation is the same. None of that is true of Java/Scala.



I did expect it to be theoretically possible, but how realistic/sensible is it really to do an Erlang project with Elixir libraries or frameworks? Elixir brings a lot of its own tools, conventions and so on. Will it soon take over my project, making it a better option to just use Elixir for everything, or would it be very manageable and not interfere much with the rest?


We use both Erlang and Elixir where I work. We successfully use Erlang libs in our Elixir projects pretty easily. We recently started trying to do the reverse and use an Elixir app inside an existing Erlang app. It has been significantly more work. The main issues have been around dependency management.

I wasn't personally working on this, so I apologize for being fuzzy on the details, but I understand that getting rebar3 to fetch all the deps that would normally be managed by hex was not possible or at least non-trivial. There was talk of having to manually install each dep that you knew the Elixir lib would be requiring.

I think it was sorted into something workable, but if anyone has better understanding I would love to be pointed to some resources!


Had a quick chat this morning with the people that worked on it. The package rebar3_elixir_compile[0] makes this pretty easy. However, our target Elixir lib is not a public hex package, which requires using git submodules.

Not a seamless setup, but it does work. As the sibling comment suggests, perhaps using mix for everything would help.

[0] https://github.com/barrel-db/rebar3_elixir_compile


Hex is going to be the main repo for Erlang too, and rebar3 should be good to use with it. You could even just use mix to build an Erlang project.
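As a sketch of the latter (project name is made up; Mix compiles .erl files found under erlc_paths alongside your Elixir code):

    defmodule MyProj.MixProject do
      use Mix.Project

      def project do
        [
          app: :my_proj,
          version: "0.1.0",
          erlc_paths: ["src"],   # plain Erlang sources get compiled too
          deps: deps()
        ]
      end

      defp deps do
        # Erlang packages from hex work like any other dependency:
        [{:cowboy, "~> 1.0"}]
      end
    end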


If anyone from Erlang Solutions is around, they'd probably be the most qualified to answer.


Claudio from Erlang Solutions here.

I'll ask around in the company for more information.

I've worked briefly on wrapping an existing Erlang app in an Elixir (Nerves) container, and I did have a few issues around dependency management which required small changes here and there, but nothing major.

One area where I usually spend more time than I'd like is to get type specifications in good shape so that dialyzer doesn't report too many warnings, but that gets better with every Erlang/Elixir release.


Thank you for the info, you guys rock!


Anything more you can tell by now? :)


> Should I prefer Erlang as a language for its maturity and fewer constructs, it would be a pity to then miss out on new libraries or frameworks, such as Phoenix, that revolve around Elixir.

If you are just starting out and have to make a choice between Elixir and Erlang, go with Elixir. I can't see any reason to start with Erlang instead of Elixir. You can use all of the Erlang ecosystem from Elixir. As a matter of fact, most Elixir projects call into Erlang, since Erlang has a huge standard library. Also, Elixir has nice macros which make your code a lot DRYer.
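For example, Erlang modules are just atom-prefixed names from Elixir, so the whole standard library is a call away:

    # Erlang's queue, crypto and timer modules, used directly from Elixir:
    q = :queue.new()
    q = :queue.in(:job1, q)
    {{:value, :job1}, _rest} = :queue.out(q)

    digest = :crypto.hash(:sha256, "hello")
    :timer.sleep(100)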


> 5. Avoid Non-RESTful Routes

Every "large" project gets to a point where some routes aren't totally RESTful. It happens. But it's not great advice to say that non-RESTful routes are always a code smell.


What is a totally RESTful route to begin with? I thought REST had nothing to do with "routes", URLs or resources, but with how the state of the application is communicated in an API. It seems to me that Roy Fielding wrote something from a certain perspective, and someone came along, read the paper and decided to apply his writings to something totally unrelated because he didn't like SOAP... I personally stopped using that word so I don't have to confront and debate "REST purists", and just went back to talking about web services, because that's what it is.


As commonly applied it means don't reinvent an RPC+identity+caching+encryption+compression mechanism on top of HTTP because HTTP/S itself is already an RPC mechanism with optional encryption + compression + caching support and URIs provide identity.

HTTP verbs are the operations you wish to perform. GET on a "/" resource is effectively a listing, GET on "/identifier" retrieves a specific item. PUT updates, POST creates new, DELETE is obvious. SOAP was a bad idea because it stuffed an extra RPC layer on top of HTTP's existing layer, requiring you to parse the body to find the content. Thus almost anything interacting with SOAP had to understand both HTTP and SOAP; plain HTTP tools were useless.

HTTP provides out-of-band signaling and extensions with headers. It also has a built-in mechanism to negotiate wire and content formats with Accept/Content-type. All of this means it automatically supports graceful degradation and backwards-compatibility. HTTP is stateless so unless you go out of your way to break that property it scales really well.

URIs identify the resource you want to perform the operation on. Items that are children are located "under" their parents: "/parent/42/childtype/child_id". Again - why introduce some extra system for describing these relationships and identifying resources when URIs already do a fine job of that?

If you want to take it even further you can use URIs in your data types. For example if a child needs to indicate its parent you can provide the parent URI (just the path/query/fragment portion) rather than a parent identifier. Why should the client care how you generate identifiers? It also means you can change them in the future (e.g. Int to String). Hide implementation details of the server from clients when possible.
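As a hypothetical sketch, a child's response body might look like this, with "parent" as an opaque URI path instead of a raw id (the field names are made up):

    # Clients follow "parent" as a link and never learn (or depend on)
    # how the server generates identifiers.
    %{
      "id" => "/parent/42/childtype/7",
      "name" => "some child",
      "parent" => "/parent/42"
    }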


Well articulated and concise explanation. Thank you.


The idea is to limit exposed routes to the standard CRUD routes: create (/new), read (/show), update (/edit), delete (/destroy).


Ugh, no. The idea is to use standard HTTP verbs (GET, POST, PUT, DELETE) with noun routes, like:

    Create: POST /resources (or PUT /resources/42)  
    Read: GET /resources/42  
    Update: PUT /resources/42 (or PATCH /resources/42)  
    Delete: DELETE /resources/42
    List/Search: GET /resources
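In Phoenix that whole table is one router macro (a sketch, assuming a ResourceController exists; :new and :edit are excluded since those are HTML form routes):

    # In the router, this generates exactly the routes above:
    resources "/resources", ResourceController, except: [:new, :edit]
    # index  GET    /resources
    # show   GET    /resources/:id
    # create POST   /resources
    # update PUT    /resources/:id (and PATCH)
    # delete DELETE /resources/:id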


I thought the idea was to make an API behave like a webpage by using hyperlinks that represent the state of the application, so in theory a "smart" client could navigate an API autonomously, like a human interacts with a web page? Because otherwise it's no different from RPC.


I think you are thinking of HATEOAS. Read about it here: https://en.wikipedia.org/wiki/HATEOAS


Agreed. I'm working on an API where most of the endpoints are RESTful, but there's a handful where it makes total sense to break the rules in order to make everyone's lives easier.


I agree, as REST isn't a silver bullet. I do, however, also agree with TFA that you should investigate it as a "code smell", but that should be done before you actually implement it: "Do I actually need to break from the REST pattern, or is this evidence of bad modeling?"


Avoid verbose REST altogether unless your core business is delivering a REST API for a non-evolving purpose.

Use commands/queries from CQRS, route to a single endpoint in a RESTful fashion, and spend more time doing instead of writing boilerplate.


Does anyone have a small-ish gig where I could learn Phoenix/Elixir? I am a senior-level programmer, but all my experience is with PHP/Drupal. I would very much like to work on a smaller side gig (15-20 hrs a week) learning Phoenix/Elixir. A lot of experience is going to transfer, I hope (problem solving is not really language dependent), so I think I offer a pretty good deal: you could get a very experienced coder at a language learner's rates :)


Not mine, but here's the CMS behind thechangelog.com:

https://github.com/thechangelog/changelog.com

Also take a look at the code powering hex.pm:

https://github.com/hexpm/hex_web


I've been hoping to pick it up on weekends & time off but startup hours don't leave much free time. I worked on a few simple heroku / json phoenix apps last year. Hoping I can put together a useful but small web app next.


I can recommend that both of you get the book "Elixir in Action". It's fantastic! It starts very simple and step by step explores the must-know concepts of Elixir.

I started phoenix before fully understanding Elixir and had a hard time. Then I got that book and when I was halfway through I loved Elixir already.

Now I use phoenix for almost everything web related.


Programming Phoenix is also excellent. I think it helps that it was written by the core contributors to Phoenix itself (including Jose Valim, who created Elixir). I really appreciated looking not only at how things work, but also at why they work the way they do. The authors clearly have a real sense of excitement about Phoenix, and it's contagious.


The RedFour instructional is a solid intro to Elixir as well.


In the section titled "3. Write Fewer, Valuable Tests", I don't understand the terminology ("controllers and plugs"), which I assume is some stuff in the Phoenix framework that I have no idea about. Can somebody explain what "controllers and plugs" are here? I'm very interested in learning the more general idea (how to effectively reduce lines of tests) from this nugget, which I could hopefully translate to the different languages & frameworks I use...

Quoting the relevant fragment in full:

"We focused our automated tests on our controller actions and plugs rather than going for 100% test coverage. Since these are the main ways that the Phoenix application interfaces with the outside world, they’re the critical points of failure.

Controller tests also exercise a lot of the code paths in your application, making it less necessary to unit test every single module. As a result, you end up with fewer tests, which makes it easier to do refactoring, provided your changes preserve the behavior of the controllers and plugs."


Controllers are the normal "C" inside MVC: the stuff that talks to your models and passes data to the view / template layer.

Plug [0] on the other hand is something like "rack" in the ruby world, a "A specification for composable modules between web applications". Phoenix is built on plug.

Put very simply, you could say that plugs are pipes, or similar to middleware: functions that take the connection, transform it or do some checks, then pass it to the next plug. Like a pipe full of functions.

Because Phoenix is built on it, it is very easy to write custom plugs and add them to your request / transform pipeline.
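A minimal module plug, to make that concrete (a sketch; the module and header names are made up):

    defmodule MyApp.Plugs.RequireApiKey do
      import Plug.Conn

      # init/1 runs once; its result is passed to every call/2.
      def init(opts), do: opts

      # call/2 takes the conn and transforms it, or halts the pipeline.
      def call(conn, _opts) do
        case get_req_header(conn, "x-api-key") do
          [key] when key != "" -> assign(conn, :api_key, key)
          _ -> conn |> send_resp(401, "missing api key") |> halt()
        end
      end
    end

You'd then add `plug MyApp.Plugs.RequireApiKey` to a router pipeline or a controller.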

Someone correct me if I explained that wrong.

[0]: https://github.com/elixir-lang/plug


So, it's slang for plug-in(s)? (services, extensions, ...)


They did black-box unit testing instead of proper unit testing.

e.g. if a method is private, it's not tested.

That's the usual way when full reliability is not worth the dev time, or when devs think functional tests alone are enough but still want a pat on the back for having unit tests and coverage numbers.

In their case it's the latter, as you can see from: "Controller tests also exercise a lot of the code paths in your application"

Classic case of functional tests being called unit tests. Not that functional is better or worse, but correct names are better no matter what.


The thing about unit testing is that you have to decide what your "unit" is. It could be down to the function/method level, but that's not always appropriate. Such a strict testing framework can leave you immobilized when you want or need to refactor.

Their HTTP API is their interface for the application. It makes a great deal of sense to focus on that level for their testing (note: they didn't say they didn't test internals, they said they focused on the external interfaces).
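Concretely, one request-level test exercises the router, plugs, controller and views together; a sketch, assuming the ConnCase helper that Phoenix generates and a hypothetical /api/users route:

    defmodule MyApp.UserControllerTest do
      use MyApp.ConnCase

      test "GET /api/users lists users", %{conn: conn} do
        conn = get(conn, "/api/users")
        # One assertion, many code paths covered end to end.
        assert is_list(json_response(conn, 200)["data"])
      end
    end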


They don't call them unit tests though do they?


> Lessons Learned from a Large Phoenix Project

More like "large" projects.


10,000 lines is supposed to be 'big'? wtf?

I write something like 25kLoC/year (of shipping code, generally very complex stuff) and I don't even program full-time. The two projects I am working on now are 35kloc (the smaller one) and 250kloc (the medium-sized one).

If someone thinks 10kloc is big, I have a hard time thinking of that person as a professional programmer.

(Numbers listed here exclude blank lines and comments.)


What does it matter how many you write? How many do you delete?

I have a project that is 5k lines of code and roughly 11k lines of tests and specs (Cucumber). It is a rewrite of a project that was 50k lines of code with 1.2k lines of tests, and it had less functionality and fewer features than it does now.

Lines of code are meaningless when it comes to how much value they provide. I personally prefer when a codebase is smaller, because it means some thought was put into it and it most likely has fewer bugs as a result.


I believe jblow's games (Braid, The Witness) have fewer lines of code than one would typically expect. I think you should assume he's not writing code like a government contractor.


Upvote for "government contractor"; I once saw 5k lines of code for a simple change of address in the front end alone!


I said shipping code, by which I specifically mean that which is left after all deletions (of which there are many).


Some of the gain in concision seems to come from the language, but I think in this case it comes very much from leveraging frameworks and libraries which are themselves quite a bit bigger than the app. I cloned a few of the repositories and excluded tooling used for documentation, testing, and so on. It's non-exhaustive, since I only wanted a ballpark figure, but we have:

    1.4 million lines of Erlang
    309 thousand lines of C
    120 thousand lines of Elixir

...and lots of other code and things I've excluded. These are lines of code, not counting blank lines or comments.

Now, we could unwrap this count at any layer: why stop at Erlang/OTP? Why not include the C runtime and the kernel? The real point here is that a lot of this code implements things that are directly used by Phoenix apps, so I think it's fair to say its concision comes from being very good at leveraging other code.

(I don't write Elixir code but I've seen some good results from teams who have. None of it is magic though.)


I don't really think one can judge someone else's work by the lines of code they write per year. I thought IBM's management failures demonstrated that kind of metric is meaningless. Furthermore, 10,000 lines of Elixir/Ruby is not like 10,000 lines of, say, Java that don't do much except wire up dependencies.


INDEED. 10K lines RoR = 100K lines JEE. More or less.

(OK, I made those numbers up, but reading Java code is bloody tiring due to the silly walks and typical paste-o-graphic work styles)


One interesting thing I've found with my limited experience with Elixir is that the code is very compact. What would often take 10+ LoC in other languages can be succinctly expressed as 1-2 lines.

But I do in general agree with your take on it :)


> would often take 10+ LoC in other languages can be succinctly expressed as 1-2 lines.

Every time I have heard this kind of claim (with modern languages), it turned out not to be true except for trivial code or straw-man bad code in the 'bigger' language. So if you have real-world examples that have real-world effort put in, I'd like to see them! (I would be happy to be wrong.)

5x-10x productivity increase would be huge if it actually existed; it would be so unstoppable that everyone would switch to the new really-great language immediately. That hasn't happened, which should be a clue that maybe the increase is not there.

Even a 20% decrease in cost of engineering would be so large as to be unignorable.


Concurrent, preemptive socket servers with robust failure handling are "trivial" in Erlang. I'd say 10x productivity would be an understatement compared to other languages. Writing FPU code? 0.1x productivity would be generous. It depends on the problem you're solving.

Obviously we're throwing around random values like 5x and 10x "productivity" but it's more nuanced than that. There's more than LoC that can be measured as "productivity": how about bug count and severity per LoC written, refactoring cost, performance, robustness, library support, setup time, etc. And many more metrics.

Metrics are valued differently depending on the programmer and the problem. E.g., who cares if my CRUD web app has memory leaks and crashes randomly; it's stateless! There's no one really-great language, because every programmer has their own productivity priorities.

Often, though, LoC is used as a poor proxy for this multi-dimensional "productivity" value.


> 5x-10x productivity increase would be huge if it actually existed

When writing software, how fast you can type the code is rarely the limiting factor for speed of development – the architecting and consideration of interplay between components takes the bulk of the time. The grandparent claimed code reduction (which has intrinsic maintainability benefits) but made no statements about general cost of engineering.


> the architecting and consideration of interplay between components takes the bulk of the time.

Which is supposed to be what is simplified as LOC goes down.

So if a supposed 5x-10x code reduction (which I've never seen real evidence of) doesn't lead to 5x-10x productivity increase, how much increase is there supposed to be? Surely more than zero?


> Which is supposed to be what is simplified as LOC goes down.

I don't think so. If you can express the same concepts with the same interfaces and functionality in 1kloc vs 10kloc, most of your time has probably still gone into figuring out the interfaces and connections.

> So if a supposed 5x-10x code reduction (which I've never seen real evidence of) doesn't lead to 5x-10x productivity increase, how much increase is there supposed to be? Surely more than zero?

Oh, certainly more than zero! Sometimes much more. But there's simply not a one-size-fits-all formula for the relationship between lines of code written and productivity.

Anyways, not really sure what you're getting at. Your original comment was that 10kloc isn't "big"; the rebuttal is that lines of code is a naive way of looking at system complexity, which is presumably what you mean by "big".


You find having to read 1kloc vs 10kloc the same?


I don't believe that's what was said. But in any case, the answer is generally no, depending on your definition of "read".


20% is a low enough bar that you can cite a well-known example: Swift vs. Objective-C removes entire sections like headers and adds sorely needed typechecks. Our internal results are less code and fewer defects per line of code. However, a large number of people haven't switched. I don't blame them either; the transition (split code base) is painful, and Apple makes things worse with language churn.

For your 5x-10x case, what is true is that the gain is genuinely possible, but it is just as likely to come from libraries as from language constructs. Because of that it's often 5x-10x in a limited area.


In Elixir you never have to write an iteration; no need to dance the "for-loop-with-counter" dance.

Pattern matching makes it easy to bind variables and validate their values in one line (so no need for an if statement).

In Elixir, you usually don't catch exceptions; you let it crash. So all the code to handle failures/errors doesn't exist; it is handled by OTP.

No need to write any communication layer, since OTP has one built in that implements location transparency.

These are the kind of low-hanging fruits off the top of my head. I'll point out that fewer lines of code doesn't automatically translate to increased productivity. Also, transitioning to a brand new stack is not always justifiable/possible even with the promise of significantly increased productivity; it's really not that simple.
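Two of those fruits in code form (a trivial sketch):

    # No counter loop; transform the whole list declaratively:
    doubled = Enum.map([1, 2, 3], &(&1 * 2))

    # Bind and validate in one match; anything but {:ok, _} crashes the
    # process, and a supervisor deals with it ("let it crash"):
    {:ok, values} = {:ok, doubled}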


It's more that there are a lot of things you get for free with OTP/BEAM that you cannot get at all, or that would take a lot of effort, in other environments. So in those instances it can easily be even 20 to 1 or more, and those are also far from trivial: from hot code reloading, to calling code on any cluster node, to process supervision, and on and on.
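For instance, running code on another node in the cluster is one built-in call away (a sketch; the node name is made up):

    # Connect to a peer node and call a function there; no RPC framework
    # or serialization layer to write. Node.connect/1 returns true on success.
    true = Node.connect(:"worker@10.0.0.2")
    :rpc.call(:"worker@10.0.0.2", String, :upcase, ["hello"])
    #=> "HELLO"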


Checked exceptions

Lack of functional composition

Mandatory types (for every trivial parameter object or "lambda")

Excessive Object-Whatever-Mapping

Coping with mutable state in large amounts of effectively global data.

Excessive partitioning due to the size impacts of the above hardships

Then we can start to talk about the lengths of the lines...


FYI: this criticism is aimed at a certain "Enterprise" language / libraries / culture which is pretty much mandatory outside of the Bay Area.


Note that a decrease in LOC does not imply a corresponding increase in productivity. Our Scala code is easily smaller than its equivalent Java representation, but:

1. Thinking about the problem and modeling it properly is still hard, and language independent

2. The language still has its own quirks/failings that you have to work around

PS. My comment has nothing to do with Scala per se - I'm using it as an example.


Even if you multiply it by 3, it's a small codebase. And Phoenix does a bunch of code generation, from what I've seen.


10,000 lines is actually a lot for Elixir. Its syntax and built-in standard library make the code very compact.


Two things:

1) Elixir is relatively young, so there aren't many projects that have had a chance to grow to be huge. So yeah, take everything you read about it with that grain of salt. People haven't grown to hate it yet... maybe everyone will always love it, but I wouldn't bet on it.

2) When used in certain contexts, Elixir applications may end up being quite a bit smaller than comparable Java/Scala/Python/whatever apps. A lot of what we end up writing with microservices is just basic RPC and fault-tolerance boilerplate, which OTP takes care of out of the box. Example: Pinterest rewriting a 10kloc Java app in 1kloc of Elixir, running it on fewer servers: https://engineering.pinterest.com/blog/introducing-new-open-...


Pro-tip: never look at number of LOCs to measure the size or complexity of a project, unless you are comparing two systems written in the same language. Even then, comparisons might be tricky due to factors such as libraries used.


This is why I'm wary of posts about Elixir. I love the language a lot but the blog posts and articles on it are mostly hype by people without much experience.


I'd say 10k LoC is where design and architecture start to become vital, you can't just hack away without breaking stuff.

I personally think copy & paste is an anti-pattern (instead, use features such as generics), but many think it is a godsend. Using code reuse rather than copying code can make a massive difference in size and maintainability; less is definitely more!


Dealing with that at work now. A dev added to the project won't use the existing structure; he just recreates and pastes stuff in over and over :-(

Part of that is a political problem, of course, but was not an issue earlier.


I spent the better part of yesterday afternoon writing a complex SQL query. I could have written a few straightforward queries instead, and I would still be here writing, debugging and testing the code to merge their results. I have this single LOC now instead of 100+. That line is worth 100.


I regularly delete or replace several thousand lines of code without changing the functionality of the application, put in by programmers like you who are obsessed with the amount of LOC they produce. I am not sure why anybody thinks that a lengthy solution is great, or that we should judge programmers by the LOC they produce. I strongly believe that the better the programmer, the shorter the solution they produce.


It's like judging the quality of a book by number of pages.


Run a code duplication detector on your code. I am curious whether you actually need 250,000 SLOC, or if it's really just copying and pasting around.


It's relatively large for a Phoenix web app, given that there aren't that many out there yet.


Trivial, but it warms my heart to see short-ish identifiers with gaps in the names (underscores here, but other languages use dashes, dots, etc.), rather than reallyLongStudlyCaps names. I love that Elixir uses the Ruby convention of short names with underscores in them.

C++ is the tragedy that keeps smiting the industry with its children. (e.g. - Java, and its silly names)

"But we need long names, and need to compress them since they contain so many words". No, you need to better partition things so there is enough context around what the name is attached to, so that the name can be simple.


Started messing around with Phoenix recently, so this is an interesting data point. I have to say that the very basic architecture diagram from the PragProg book basically sold me on using it for my next side project(s).

    connection |> endpoint |> router |> pipeline |> controller

Not sure what it is, but it must be a combination of "|> seems pretty awesome" and "well yeah, it obviously makes sense to get a request as a struct and chain functions in this way". It just clicked and aligned so perfectly with my mental model :)
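That mental model is just what |> does everywhere; it rewrites nested calls into a left-to-right pipeline:

    # Nested form reads inside-out:
    String.split(String.downcase(String.trim("  Hello World  ")), " ")

    # Piped form reads in the order the data flows:
    "  Hello World  "
    |> String.trim()
    |> String.downcase()
    |> String.split(" ")
    #=> ["hello", "world"]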


I was expecting the IT and devops book, The Phoenix Project: http://itrevolution.com/books/phoenix-project-devops-book/


Typo in the title: it's Phoenix, not Phoeniz.


Not sure if the words "admin", "admins" or "mods" trigger anything? Mentioning just in case.


try also dang


No, those don't have any magic, but if something like this persists you can always email us at hn@ycombinator.com.


Ah, okay then. I was curious, thanks for the info.

Idea: make users with high karma capable of using benign, unassuming and non-common parts of speech to ping the admins?

...I think I've been thinking about security too much


Just click the post timestamp and click flag first.


...I am so dense. That makes perfect sense, thanks.


> I am so dense.

Maybe just busy with real work?





