Ruby `reject!` (accidentallyquadratic.tumblr.com)
306 points by dmit on Feb 20, 2017 | 141 comments



I love this blog, but I'm not sure that I agree with this:

> In my mind, I’d just rather not have a reject! at all, and callers who need to mutate an array in-place can figure out how to do safely with respect to their own use cases.

We ought to have a standard version of the reject! code precisely because it's so difficult to get right. Saying application developers should solve the problem themselves just sounds like some sort of political move: 'sure they may get the edge cases wrong, but then it will be their fault not ours'.

One of the best things about Ruby is that having helper methods for every kind of possible operation frees application developers from re-solving these edge-case problems and allows them to just write their business logic.


I think the argument here is to embrace the immutability of arrays in Ruby. The reason this is called `reject!` rather than `reject` is that the designers of Ruby wanted you to favor returning new objects over mutating objects in place. Sure, you might need to do it from time to time, and I'm really happy we have methods that enable such behavior, but this brings to mind the old adage "just because you can, doesn't mean you should". The author might have been trying to say in that quote that app developers should find other ways of solving whatever problem they have without resorting to mutating objects in place. I don't think we should go as far as to exclude such methods from Ruby when everyone finds them useful from time to time, but I do think we should stay away from mutating objects in place, and instead return new versions of said objects and use those.


i wouldn't have said that the ! is there because the designers want you to favour immutable rather than mutable operations. it's there for the same reason confirmations are there for deleting files; they want you to be sure you know what you're doing, because the immutable operation is always safe while the mutable operation is sometimes unsafe.

if the ! for mutable operations weren't there, people would often be very surprised by the behaviour of reverse, sort, etc because in languages like ruby (python, etc too) they expect things to be fairly safe by default.

there are plenty of cases when mutating an object is far preferable to doing the same in an immutable way, for both speed and algorithmic complexity.


Small Ruby conventions note: ! means "dangerous", not "mutable." Yeah, sometimes the kind of danger you're looking at is mutability, but it's a bit broader than that.


To expand, ! means it's a more dangerous version of the method without the ![0]. A method could be dangerous, but not have a safer counterpart, and thus be named without a !.

[0] https://www.ruby-forum.com/topic/176830#773946


That's not what the ! means. Matz has said a few times that this is not the case. "Bang" methods do not necessarily mean dangerous, just different.


That's actually the meaning Matz gives !. http://www.rubyletter.com/blog/2017/01/31/when-to-use-ruby-b...


It's neither. It's reject! because reject already exists. Ruby isn't Scheme: a ! suffix doesn't have a simple conventional meaning. (Unlike a ? suffix.)

The standard library has lots of side-effecting methods which don't come with a ! suffix. On the same Array type that has reject!, there's delete, pop, push, among others. Only some of them have pure equivalents - e.g. last for pop.


Methods in ruby only have the bang if there's another version of the method and this one might surprise you.

Push and pop don't have it because they mutate the array like they do in almost every non-hardcore-functional language, so that's the expected behavior.

Things like map! and reject! have them because there _is_ another version, and _this_ one might bite you. It has nothing to do with side-effects or immutability per se, that's just the most common case.


My impression is that the convention is a ! suffix means both that

1) There's another, similar, method without the suffix.

2) The ! version has some side effect (usually mutating something or throwing an exception) that the non-! doesn't, which doesn't mean that the non-! has no side effects (`ApplicationRecord#save` and `#save!` both have side effects, but only `#save!` throws on failure).
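
A hypothetical illustration of that pattern (class and method names are mine, not from any library): both methods have the side effect, but only the bang version raises on failure.

    class Account
      def initialize(balance)
        @balance = balance
      end

      # Plain version: mutates, but signals failure by returning nil.
      def withdraw(amount)
        return nil if amount > @balance
        @balance -= amount
      end

      # Bang version: same side effect, but raises instead of returning nil.
      def withdraw!(amount)
        withdraw(amount) or raise ArgumentError, 'insufficient funds'
      end
    end

    acct = Account.new(10)
    acct.withdraw(100)    # => nil
    acct.withdraw!(100)   # raises ArgumentError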

Is there any idiomatic Ruby library where ! methods don't follow this convention?


Immutability by default is the right idea, but the nature of arrays as a data structure is that they are (should be) less expensive to transform in-place than by creating a copy.

Mutation per se isn't harmful — if our ape brains were better able to think like a digital computer, we'd never have created functional programming (or even procedural, but I digress). What's harmful is our ape-brain habit to use mutation incorrectly (i.e. a buggy implementation). So a very large part of the problem can be solved with a (standard) library that provides (in theory) correct, performant implementations of common mutation patterns (such as Ruby's `Array#reject!`).

(I realize you were playing devil's advocate; I sort of am too, since I rarely use mutating methods when I write Ruby).


Edit: I know this sounds like I am trying to dismiss the problem. That's not my point. I'm trying to say the problem is compounded by the choice to bundle the solutions to several different problems. If people have criticisms I would love to read a reply. </Edit>

> We ought to have a standard version of the reject! code precisely because it's so difficult to get right.

No, it's easy to get right in any specific case. It's iterating over a dang list, it's the first thing you learn how to do as a programmer. It's only hard to get right if you're trying to solve it for every use case with a single interface.

Essentially: you are taking two different algorithms and trying to switch between them behind a single interface. Meaty if-statements behind flags are an indication this is happening. It's cool if you want that, but put it in a library and give it a GitHub repo.

An example: I don't use a library to merge attributes on objects. I cut and paste one of a few snippets or write from scratch. Sometimes you want to mutate an object, sometimes you want a copy, sometimes you want to shallow copy or keep a reference to the parent. I could write a nasty complicated function that intelligently solved "the problem". But all that does is bundle together a bunch of different solutions to different problems and make it hard for me to understand which one it picked.

Edit 2: I could be persuaded that what I'm saying is just not Rubyish. I'm a JavaScript programmer (after many years of Ruby) so maybe there's some truth in that. I do think that distancing ourselves from the mechanics of list manipulation can be bad for our code quality.


> No, it's easy to get right in any specific case. It's iterating over a dang list, it's the first thing you learn how to do as a programmer. It's only hard to get right if you're trying to solve it for every use case with a single interface.

Isn't any correct implementation of a function that retains only the elements that pass some predicate going to have to do basically the same thing that reject! does, or else be slow (creating a new list instead of mutating in place)?

I guess there's a somewhat simpler implementation that would work for some use cases (those in which order is not important) that just swaps in the last element instead of removing. But if maintaining order is important the complexity is similar.

I think that what would happen if Ruby didn't include reject! is that most programmers would just needlessly create new arrays all the time, since that's easier to write.


> [won't] any function that retains ... elements that pass some predicate going to have to do basically the same thing that reject! does ...?

Well, possible things you might want that still fall under the banner of "in-place reject":

- single slice at the end, vs atomic iterations as the article stated

- traversing the list in different directions, or according to priority

- aborting part way through

- any of a million domain concerns

If you rolled your own iteration, you have a platform for handling any of these. If you used a library, even the standard library, you need to start over from scratch, and either build your own iteration anyway, or find another library.

The exception to this is something like Haskell where the abstractions of the abstractions are modelled in the type system, so you never have to start anything over you just bisect your types and reassign responsibilities around the wound.


In this case, perhaps mutability is the real problem. Array#filter is quite fast in JS and never mutates the array.


it might be quite fast, but it's never going to be as fast as a mutable version of the same thing. there are plenty of cases where you have many (millions?) of elements and want to remove only a few. in that case, copying an array of many elements for the sake of immutability is quite wasteful

*edit: i say never, but if you do cool stuff with referencing array ranges so that you don't copy, but reference the unchanged parts of the previous array then sure, but that's more of a low level feature of the language (eg erlang and clojure both do this i thiiiink?) than something you'd want to implement in a library because it can be suuuuper tricky to get right


No one does immutability with copying, that would be a joke. There is also nothing tricky about not copying full lists on append in a userland lib; it just takes more advanced data structures.

Yes, it won't be as fast in modifying. But it will be a lot faster in other operations, so it's a trade off and you choose appropriately for your use case.

Plus, it's a lot easier to read and understand "immutable code".


Like anything, it depends. I frequently work on pieces of code that cannot be optimized algorithmically, where memory allocation is the bottleneck.

A version which doesn't allocate a new chunk of memory will always be faster.


I'm going to assume this bottlenecked-by-mem-alloc work isn't Ruby.

The rule of thumb I personally go by is that if you're working in e.g. Ruby or Python it's better to favor immutability over mutability because mem alloc will almost never be the problem.

I say this having worked on a Python (pypy) product where mem alloc WAS the bottleneck in a particular area. So I know it's not always true, but almost always. Probably preaching to the choir here :)


I can't disagree with you there. In this context you're probably right 99% of the time


> It's iterating over a dang list,

Oof! reject! is not "iterating over a list," but compacting an array. It's a pretty simple operation that should be available in a standard library. There will be some hairy edge cases around nonlocal exits and accesses to the array itself within the reject block, but it's not too hard to create a standard implementation that properly handles most of the edge cases, and documents the ones that aren't handled.

Roughly, you move elements into place as you go, but don't shift all the higher-index elements down right away. Document that the block can only access elements with greater indices than the current element, and you're good to go.
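
Roughly, in plain Ruby, something like this sketch (the method name is mine, and it ignores the non-local-exit edge cases mentioned above):

    # Order-preserving, single-pass in-place reject: copy kept elements down
    # to a write index as we scan, then truncate once at the end.
    def compact_reject!(ary)
      write = 0
      ary.each do |elem|
        unless yield(elem)
          ary[write] = elem
          write += 1
        end
      end
      ary.pop(ary.length - write)   # one truncation, not one shift per rejection
      ary
    end

    nums = [1, 2, 3, 4, 5]
    compact_reject!(nums) { |x| x.even? }   # nums is now [1, 3, 5]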


> I don't use a library to merge attributes on objects. I cut and paste one of a few snippets or write from scratch. Sometimes you want to mutate an object, sometimes you want a copy, sometimes you want to shallow copy or keep a reference to the parent. I could write a nasty complicated function that intelligently solved "the problem". But all that does is bundle together a bunch of different solutions to different problems and make it hard for me to understand which one it picked.

Rubyish-ness aside, this just seems like a worse solution than a library.

1. You're cluttering your files with code that isn't essential to your business logic (and although you might be able to recognise a particular snippet as in-place-merge vs shallow-copy-merge, anyone new who's reading your code will need to read each snippet as they come across it to understand what high-level operation you're performing).

2. Especially in JS, more copied code equals more parse time (which can be a real killer as a project grows).

3. If your code does have a bug (like being accidentally quadratic), you now have a harder time finding and updating all of the places that use it.

4. You've spent your own time writing, customising, debugging and maintaining these snippets when you could have just written _.extend() and had short, clear, working code whose behaviour most JS developers already understand.

I'm not saying never write your own snippet — there will be plenty of times when you need some really specific behaviours or performance characteristics — but the default way to do common data-shuffling operations should be a clearly named function call to a shared library.


1. I could be wrong but I took the original comment to mean he copies and pastes the functions, not actually writing out the algo for every instance (I don't think I could possibly argue that's a good idea). An even better solution is to just put it in its own module with an implementation for each.

2. I don't know if I entirely buy the more code=more parse time being a real reason here if you're willing to pull in the entire underscore library just to get _.extend

3. See response to #1

4. I disagree with this, for the reasons you stated with #2 actually - I really can't stand bringing in one gigantic library for a single piece of functionality. It's the exact reason that jQuery lingers in so many projects when it's really just there to provide a coherent XHR request interface.

I think the general direction of smaller reusable modules that node pushed everyone towards is the best solution.


I totally agree: small specialised modules are great. Underscore is only worthwhile if you need some significant fraction of it (but it makes for an easy example).

I think 4. is actually really important — regardless of whether you're using big or small libraries — it doesn't take long before the maintenance cost of any code you write starts to become a drag; any maintenance you can avoid by using open source libraries should be considered an investment in future productivity. Of course, that doesn't apply if the third party code is crap or not suited to your immediate use case.


In regards to 4: if you use ES6+ modules and Webpack 2 (or rollup), you can import only the stuff you need from lodash.

That said, my experience is that in some cases you still end up with a bunch more code than expected because of internal dependencies, but perhaps that code would've been necessary anyways.


Nearly all bugs in anyone's code come from things you get right 99% of the time, but mess up that 1%. If Ruby's standard library provides a correct implementation of `Array#reject!`, it's eliminating one thing, however insignificant, that you might mess up. That even standard library implementers occasionally mess up iterating over a dang array (`#reject!` was correct but inadvertently suboptimal) is a good example of how, in the large, "just get it right" is much easier said than done.

Another good reason is that the Ruby standard library implements many methods, such as `#reject!`, in C instead of pure Ruby, for vastly better performance. You can do that yourself of course, but that sort of defeats the purpose of using Ruby. Try implementing some `Array` iteration methods in pure Ruby with for loops, then benchmark them against the standard equivalent; they'll usually be dozens to hundreds of times slower.
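
If you want to try that comparison, a minimal setup might look like this (select rather than reject!, and a while loop rather than for, just to keep it short; the numbers you get will vary):

    require 'benchmark'

    ARY = (1..100_000).to_a

    # A pure-Ruby select written as an explicit loop, to compare against the
    # C-implemented built-in.
    def select_pure(ary)
      out = []
      i = 0
      while i < ary.length
        out << ary[i] if yield(ary[i])
        i += 1
      end
      out
    end

    Benchmark.bm(10) do |bm|
      bm.report('built-in')  { 100.times { ARY.select { |x| x.even? } } }
      bm.report('pure Ruby') { 100.times { select_pure(ARY) { |x| x.even? } } }
    end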


Make two versions RejectSlow() and RejectThreadUnsafe() ?


Your Java is showing :-)


Since he capitalized the method names, it's more likely C#.


The C# user just writes arr.Where(el => el == condition);


Yes, but this function is defined as in-place replacement.


So arr = arr.Where(el => el == condition);

I'd push back in code review on anyone who tried to do something that isn't obvious, idiomatic (read: LINQ) C#.


Showing my own ignorance. My heuristics have been updated -- thanks!


It could be Algol or any of its derivatives. :)


Good eye.


`reject!` seems like a very straightforward case to me, one that makes sense to be in the standard library. How would different use cases change its behavior? What are the "meaty if-statements behind flags" here?


Did you read the article? It explains exactly what went wrong.


Rather than an edge case, I'd call this a case of trying to satisfy two incompatible goals: fast and internally consistent after each deletion.

Either implementation will be a poor choice for some use cases (hence the bug reports about both).


Couldn't you easily satisfy both goals by moving the "copy all elements not marked for deletion into a new array" code into a "finally" clause?

If I remember correctly, that should ensure the code is executed even if the block contains a break.

That said, I'd wager Ruby's strange way of handling breaks in blocks passed to functions is the real source of problems here.


`..into a new array` is precisely what `reject` does. We're talking about `reject!` which modifies the existing array.


Isn't the simple solution for doing it in place "swap the rejected element's position with the current end of the array, decrement the size of the array"? What am I missing?


Wouldn't that change the order of elements?


The only guarantee I saw was that the output array did not contain the rejected elements. (And looking at the reference docs, that's all it says.)

It appears order is maintained in practice, but it's not actually guaranteed by the docs :)


No. [1,2,3].reject! { |x| x == 1 } would give you [3,2], rearranging the order of elements. It would work for sets.
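
A rough sketch of that swap-with-last approach (the method name is mine), which shows the reordering:

    # Unordered in-place reject: overwrite a rejected slot with the last
    # element and shrink the array. Linear, but does not preserve order.
    def reject_unordered!(ary)
      i = 0
      while i < ary.length
        if yield(ary[i])
          ary[i] = ary[-1]
          ary.pop
        else
          i += 1
        end
      end
      ary
    end

    reject_unordered!([1, 2, 3]) { |x| x == 1 }   # => [3, 2]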


Maybe there's a finalization step that's necessary to free up memory? This seems like the most straightforward solution, though.


xg15's comment still makes sense if you replace "copy all ..." with "slide the remaining elements down by the number of removed elements" to match the mutative behavior.

However, a bigger problem is concurrent modification: What happens if your reject! predicate has a side effect which modifies the array? Ideally you'd receive a "ConcurrentModificationException" or the like.


That would still require O(n) space (to store the list of elements to keep or remove) so why not simply use the regular `reject` ?


No, before sliding elements, you set a local variable "dirty" to true. You iterate over all elements and keep the ones that are not rejected by moving them down to the end of the kept region (the current max index). When your code finishes normally, dirty is set to false.

You have an "ensure" block (finally) protecting the code which handles the cases where dirty is true (moving all the remaining elements after the current max). Then you truncate the array.


Because you're saving the O(N) time copy...

iirc, Ruby's array doesn't expose the distinction between length and capacity, but internally it may decide to amortize the cost of such copies. For example, if reject only removes a small % of elements, it may choose not to shrink capacity.


You're not saving the O(N) time copy, just replacing it with an O(N) slide.

(That's assuming you copy at all, instead of simply updating the reference.)


I meant O(N) space, as you said "That would still require O(n) space" - but a slide does not require additional space.


In a multithreaded context, even that isn't perfect; another thread can observe the array in an inconsistent state while you're mutating it. I wonder what Ruby does here?


If you're in a multithreaded context, then you need to handle locking of shared mutable memory; Ruby doesn't fix that for you (but it does make it pretty easy with stdlib Mutex and block syntax). MRI happens to make many basic operations thread-safe because of the GIL, but you shouldn't get into the habit of relying on that naively because it will stop working as you expect as soon as you start thinking of complex operations like reject! as 'just another atomic operation'.

Here's an example of how you'd manage the contention yourself:

    my_array = [1, 2, 3]
    my_array_lock = Mutex.new

    Thread.new do
        my_array_lock.synchronize do
            my_array.reject! { |x| x % 2 == 0 }
        end
    end

    Thread.new do
        my_array_lock.synchronize do
            my_array.push(4)
        end
    end

    Thread.new do
        my_array_lock.synchronize do
            # Because we've serialised our operations on the array, there's no chance
            # that we read the elements in an incorrect order
            my_array.each do |x|
                puts x
            end
        end
    end


Yeah, I like ruby (and especially rails) because it has stuff like this. I also like that most languages have sort() as part of the stdlib. If you need something specific, sometimes they come with more than one sort, or it would be easy enough to write yourself, but I appreciate and expect a high-level language to include the basics so I don't need to keep data structures and algorithms books within reach.

Rails, especially rails 5 with API-only mode is particularly great as well.


The problem with `reject!` is that it's too ad-hoc: the more fundamental operation is to filter the elements of a stream. The input stream could be generated from any data structure (not necessarily an array), and the output stream could similarly be redirected into any data structure (not necessarily an array, let alone physically the same array).


Not clear what you're suggesting here. Can you implement reject! using your proposed, more general interface? If so, how? Is it more efficient than just creating a copy and then overwriting the original, like this?

  ary = ary.reject {|x| x > 2}
If there's a language agnostic insight to be had here, I'd love to see it. Actually, even if it's language dependent, I'd like to see it.


Have something akin to C++'s `std::copy_if`.

    auto first = vec.begin();
    auto last = vec.end();
    last = std::copy_if(first, last, first,
        [](int x) { return x > 2; });
    auto size = std::distance(first, last);
    vec.resize(size);
(Ironically, in C++ this doesn't always work, because not all types are copyable or movable. But Ruby has no such problems.)


You can simplify that code a bit, since the contents of a `std::vector` are guaranteed to be contiguous in C++03 and later, and taking a pointer offset is therefore well-defined and correct.

  const auto last{ std::copy_if(vec.cbegin(), vec.cend(), vec.begin(),
                                [](const auto x) { return x > 2; }) };
  vec.resize(last - vec.cbegin());
Edit: As someone mentioned below, C++ also provides `std::remove_if()`, for a more exact analogue to Ruby `Enumerable#reject`. http://en.cppreference.com/w/cpp/algorithm/remove

  const auto last{ std::remove_if(vec.cbegin(), vec.cend(), vec.begin(),
                                  [](const auto x) { return x <= 2; }) };
  vec.resize(last - vec.cbegin());


`std::remove_if` only takes two iterator arguments, and it's too ad hoc in exactly the same way Ruby's `reject!` is too ad hoc: it's the result of specializing `std::copy_if` to the case when the zeroth and second iterator arguments agree.


My mistake; I read about it (http://en.cppreference.com/w/cpp/algorithm/remove) too fast and thought it was the simple inverse of copy. I wasn't aware of it before this thread.

In the unlikely event some future reader is copy-pasting these snippets, here's the corrected version:

  const auto last{ std::remove_if(vec.begin(), vec.end(),
                                  [](const auto x) { return x <= 2; }) };
  vec.resize(last - vec.cbegin());


remove_if is a convenience method; for full generality you can use remove_copy_if. I don't see how any of this allows you to achieve your idea of two independent containers backed by the same storage and thereby eliminate the extra step of calling resize.


Okay, but with copy_if, you need an extra resize step at the end, which is annoying in C++ and that Ruby users currently have the luxury of avoiding. And this seems fundamental. The reason that copy_if can't call resize for you is that it assumes nothing about the underlying container; indeed there may not even be one. You say this is a good thing, and there are certainly advantages, but it doesn't seem to solve this problem, unless I'm missing something. Or maybe you're saying that the need to call resize is worth it, since you get a more general interface for the price?


The resize step at the end is a defect of how C++'s collections work. Ideally, I'd like to be able to say that there are two collections, possibly stored in the same buffer so long as they don't overlap:

(0) A collection of elements to be processed (given by an InputIterator range).

(1) A collection of already processed elements (given by an OutputIterator range).

When the algorithm starts running, (1) is empty. With each iteration of the algorithm's loop, (0) becomes smaller and (1) becomes larger [maybe]. When the algorithm halts, (0) is empty.

Notice that, while both collections share the same underlying buffer, each has its own size. So there is no need to perform a `resize()` step at the end.


That sounds super neat, and also super hard to specify and implement, in either C++ or Ruby.


Shameless plug:

After years working with Ruby, I felt that "bang" methods should only be used for methods which mutate the receiver. Many methods that do mutate the receiver do not have a "!" at the end.

To that end I created a little library called "Bang All The Things" :) https://github.com/pmarreck/ruby-snippets/blob/master/bang_a...

I'm convinced that this simplification helps prevent bugs, at the minor cost of changing some conventions possibly borrowed from other languages.

I have since (mostly) moved on to Elixir which does not have the "mutability problem" at all.


> I felt that "bang" methods should only be used for methods which mutate the receiver.

I mentioned this upthread too, but it's a bit broader than that. Consider http://api.rubyonrails.org/classes/ActiveRecord/FinderMethod... vs http://api.rubyonrails.org/classes/ActiveRecord/FinderMethod... for example. The ! version raises an exception rather than return nil.


I actually like the ActiveRecord convention more than destructive-to-receiver -- it was even adopted by the elixir community (which makes even more sense than ruby, as there really aren't destructive updates in elixir).


Good points. It would be great if there were two separately available suffixes, one to signify receiver mutation and one to signify raising vs nil. I certainly think Elixir went the right way here on the latter front (as it doesn't have to worry about receiver mutation).


Agreed. For whatever reason, my mind really gravitates towards the idea of bang being a mutator. That should have been a standard.

`delete` always gets me.


I think that perhaps the most interesting part of this article is that it comes from a blog called 'accidentallyquadratic' with more than one entry.


I remember coming across it before when the left-pad incident happened, and marvelling at the fact there were already many entries there then. The problem is now, whenever a new entry on this blog gets linked, I find myself compelled to go back and read all the entries in it again. I can't help thinking that over time, that process is going to become prohibitively slow.


It's probable that, as time goes on, the interestingness threshold for linking to a post on the blog will get higher, perhaps even exponentially.


This should be the highest voted comment on hackernews, or we're doing something wrong.


it's pretty elegant.


One of the older posts eloquently discusses why "accidentally quadratic" behavior is both so recurring and so insidious: http://accidentallyquadratic.tumblr.com/post/113840433022/wh...


Raise your hand if you have not only written accidentally quadratic code, but managed to compose said code in such a way that you ended up with something even worse!

*raises hand*

Imagine you have a React SPA where users can select a number of items for which they want more information (I can't give more details at this moment). Said information has to be fetched from the server upon selection.

I recently discovered that the website would fetch all selected items whenever a new item was selected, including previously selected items, regardless of whether it had already been fetched. So if I start with one selection and build up to n, that is 1 + 2 + .. + n-1 + n = (n²+n)/2[0]. Whoops.

To make it worse, I had also somehow managed to set up my components in such a way that for each fetch being received, React would remount (that's right, re-mount) all of the selected items. Don't ask me how. I'll just say I have only been doing web stuff since last year and leave it at that.

If we assume a user clicks fast enough to select every item before the fetches are received, that would be the previous equation multiplied by n selections, so.. (n³+n²)/2. Yikes!

So yeah... that was stupidly slow. Here's what I did to fix it; obvious but worth spelling out:

    - only fetch what hasn't been/isn't being fetched
    - only re-render the component that was updated
    - don't immediately do the above when the user 
      selects an item; let them make a selection first,
      then click a "show information" button, then
      fetch all the needed information in a *single* fetch
      and render it in a *single* update to the components.
[0] https://www.wolframalpha.com/input/?i=sum+one+to+n


Looking forward to sister blog "Inadvertently Exponential" and the satirical "Maliciously Factorial".


Many years ago, I got a bug report about a feature in an ex-coworker's code. The complaint was that the program would hang when you added the tenth item in a GUI. It turned out that it didn't really hang, but took a half hour to execute because the algorithm was inadvertently exponential. Further examination of the code revealed that the result of the exponential calculation wasn't even being used. So I didn't have to figure out how to make the algorithm linear - I just removed the useless code.


Yes! And posts dealing with quadratic behavior in languages!


One of the more confusing parts of the C++ standard library algorithms is remove/remove_if. A new user would think calling remove(v.begin(), v.end(), value_to_remove) would result in the value being removed from the container[1]. Instead, they'll find that all it did was shift around the ordering of the vector and leave the elements you wanted removed in place at the end (if you read the linked article you can already see where this is going) which is a very unintuitive result. They might look up the function signature and notice it returns an iterator which is also surprising.

Even after discovering they have to call the container's erase method using the result of remove and an iterator to the end of the container[2], they'd probably just decide that remove has a terrible design. It takes experience and some data structures knowledge to realize why it is designed the way it is.

The design of remove means every element that remains is shifted (in the case of an array/vector) at most one time. The naive approach results in the problem described in the article where you are constantly shifting the elements forward as you progress through the container.

The usability of remove is still a problem (the concept shouldn't need a wikipedia page explaining it) but the way it was done was done for good reasons (within the design decisions of STL at least). It's probably fixed in the C++ ranges proposal but I haven't looked.

D's standard algorithm library remove[3] (which is very much based on Stepanov's STL too) goes one performance step further and lets you specify the swap strategy so that if stability isn't a requirement the number of shifts required is the absolute minimum possible (essentially moving elements from the end of the range into spots freed up by the items being removed).

1. Slightly more experienced users would quickly notice that it can't remove the value, it has no reference to the container to change its length.

2. So common and unintuitive it gets its own wikipedia page: https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom

3. http://dlang.org/phobos/std_algorithm_mutation.html#.remove


My homegrown C dynamic array has a remove() macro with the same purpose (I didn't know it was in C++), except that it takes a range instead of an index and doesn't move elements to the end, because you check out/delete an item first and then remove it, not vice versa. It also has an insert function that does, guess what. The module takes around one screen.

The cool thing is that it is not C++, where remove is hardcore STL magic; I can just iterate over the elements' memory and remove them in sliding ranges, so the multi-remove problem simply doesn't exist. Idk why C++ overengineers everything. We abstract a generic vector away from our convenience only to store items in a regular array, because it is the only fast solution for generic cases.


This is one of those things that makes perfect sense after you actually try to implement it, but not necessarily beforehand. I think this happens a lot with C++: design decisions are quite well thought through, but it is hard to appreciate this without taking time to understand the rationale.


Why is `reject!` even a separate, and different implementation than `select!`? Perf reasons?

Haskell's Data.List doesn't seem to even have a remove/reject. Clojure's remove is simply a complement of filter.


They are identical (inverted) implementations as of ruby 2.3, so I think it was just an oversight.


Same reason ruby has both "if" and "unless", or capybara has .to_not as well as .not_to... there are times when the inverse expression is just more natural and less confusing.

Rubocop, on the other hand (against what I've just said), will tell you in almost every case where you might use unless that, in agreement with what I think you are saying here, the preferred idiom is not to use the inverse; you should only say "unless" when what you mean is unambiguously !if.

IOW sometimes you really mean reject


I don't think the commenter is asking why the method exists, but more why the bug exists with reject! and not select!. Why are the underlying implementations of those two very similar methods different?


I caught that too late to avoid the downvoting, shame on me for commenting before I rtfa


Ah, all good. It happens.


Because it's Ruby.

I've been using Ruby (in the context of Rails apps) for almost six months now, after a decade or so in mostly Python. Ruby is just "messy" in my opinion, and having the separate methods Array#select and Array#reject is just a single symptom of that.

Another example is that something in Rails adds Array#fourth. I can't imagine a circumstance where using my_array.fourth is clearer than my_array[3].


Avoiding this should be as simple as using the inverse, `select!`, and then flipping whatever the check is in the block, right?
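
Concretely, something like this (same toy predicate as elsewhere in the thread):

    ary = [1, 2, 3, 4]
    ary.reject! { |x| x > 2 }    # ary is now [1, 2]

    ary = [1, 2, 3, 4]
    ary.select! { |x| x <= 2 }   # ary is now [1, 2] -- same result, flipped check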


I think that only half avoids it, or shifts the bad side of the big O to the opposite of wherever it is now. select! also mutates in place.

But I'm not totally sure about that, after reading the Rubydoc for Array[1], only reject! has the warning about mutating the array every time the block is called, select! does not have any such warning. You might be right.

[1]: https://ruby-doc.org/core-2.2.0/Array.html#method-i-reject-2...


I can't agree with the conclusion. Sure complexity is complex, but the solution here would seem to be better performance analysis of the standard library rather than blindly avoiding risk. How do you know where to stop? All built in block-accepting methods for enumerator types would seem to be equally risky in this analysis.


Without much knowledge at all, I'd agree with the conclusion that "block-accepting methods for enumerator types would seem to be equally risky". It sounds to me like having non-pure functions that can terminate at any point. And that sounds like asking for trouble in your code.

I like my safeties in languages though.


So what is the right way to do the same thing? Use `select` and get a new array containing the elements we want to keep, instead of using `reject!`?


That's what languages with immutable data structures do. Internally they implement simple data structures (lists, arrays) with more complex ones so as not to take a performance hit. For example, they don't want to copy every byte every time they add or remove from a list.
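
A toy illustration of the sharing idea in Ruby (just a sketch, not how these libraries actually do it):

    # A persistent cons list: "adding" to the front reuses the old cells
    # instead of copying them.
    Cons = Struct.new(:head, :tail)

    list1 = Cons.new(3, Cons.new(2, Cons.new(1, nil)))
    list2 = Cons.new(4, list1)   # O(1): list1's cells are shared, not copied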

In ascending order of complexity

http://theerlangelist.blogspot.com/2013/05/working-with-immu...

http://concurrencyfreaks.blogspot.com/2013/10/immutable-data...

http://debasishg.blogspot.com/2010/05/grokking-functional-da...

In the case of Ruby, it's a language built on mutability, so let's use mutable data. There are already plenty of languages with immutable data to use if we want to.


There's a non-mutating version of `reject!` called `reject`. (Likewise there's a mutating version of `select` called `select!`.)


Quite a comprehensive post! What makes the Ruby block an essential feature of the language is its power and elegance.


I think it's worth clarifying: the fix in svn r49255 is in git tag v2_3_1, so if you're on ~2.3.1 then you're back to a linear #reject!.


I find it weird that the case is called a "bug" multiple times. Sure, if it's O(n^2), that's a problem, but if the code works correctly at the end, it's not a bug.


O(n^2) algorithms are insidious in that they will work fine on test data, but will ruin performance when there's an unusually large input in production. They're landmines in your code.


One of the more interesting approaches I've seen lately is in Andrei Alexandrescu's "An Algebra for Expressing Time Complexity of Generic Functions" article[1]. The gist is you (or more likely, the library writer) annotate functions with their complexity guarantees then you can make static assertions about your runtime performance expectations which result in a compile time error if the algorithms/data structures being used cannot meet them. In theory that would prevent the O(n^2) landmine.

1. http://erdani.com/d/bigo.html


Alternatively, it will ruin performance on artificially large test data, but will work fine in production :)


Certainly. But does it make it a bug?


You do know that the most popular general-purpose, comparison-based sorting algorithm quicksort has a worst case complexity of O(n^2), right?


Good implementations tend to address this with introspection, randomization, etc.


Unfortunately, it's not that simple. It really depends on context.

There are cases where accidental quadratic complexity would definitively be considered a bug. If a fleet of GitHub servers were burning cycles due to a quadratic reject!, I'm pretty sure the engineers would classify the misbehavior as such.


Yes it is? Asymptotic complexity is a well defined property of an algorithm. It is an output, and it doesn't match the expected output.


"reject!" is not an algorithm. It is a semantic element of the ruby language, which is implemented using an algorithm.

That algorithm, in this case, is a quadratic one.


No, because a linear implementation of bsearch is clearly a bug.


Can you point out where in the function definition this was defined?


It's ruby, so the expected complexity of any line of code is "it might complete sometime in my lifetime".

No surprises for users here, which is why that bug lasted for about 4 years with no one caring.

If the documentation stated it was O(n), or benchmarks tested it was fast, then it being slow would become a bug.

As is, it's just what it is.


Well, probably the reason it lasted for about 4 years with no one caring is because .reject! is a fairly rarely used function.


Would a sleep(100000) at the top of the method implementation not count as a bug either?


I don't see how this is relevant. That certainly is a bug if such a call is not a part of the algorithm. Here the function is correct in the sense that it returns as expected.


The function with a long sleep also returns as expected, and is just as much a part of that particular algorithm as the quadratic implementation.


Quadratic performance is not part of the algorithm. End of story.


I am not your student and this is not a classroom. I really dislike that tone.


How is that a bug? Unless sleep was a typo or something. A bug isn't related to performance, a bug is just a mistake.


This quadratic implementation was a mistake.


No it wasn't, the developer probably knew exactly what they were doing.


If the function was implemented as an infinite loop, would you consider the code working correctly as well? After all, the result after execution of the function is always correct (ex falso quodlibet).


An infinite loop is a loop that never ends, it'd not be possible to implement a function that's expected to return in terms of an infinite loop.

Here the algorithm is quadratic but will, theoretically, always complete. Thus there is no bug, but a performance problem. I think that a bug is a mistake in a programme that causes erroneous behaviour.

Say if I was writing a programme which needed to make a computation which is doable only with a quadratic algorithm. Should I not write that programme at all because it'd be a bug, albeit in most cases it'd do the processing I needed?

I'd rather call this a mistake as long as the output from the function is as expected. I'm not saying that it's a negligible mistake to use a quadratic algo where there is a linear one though. Just that I doubt it is fundamentally a bug.


> Say if I was writing a programme which needed to make a computation which is doable only with a quadratic algorithm. Should I not write that programme at all because it'd be a bug, albeit in most cases it'd do the processing I needed?

This only makes sense if you completely ignore context, which doesn't make any sense at all. The point/"bug"/mistake here is using a quadratic algorithm when it isn't necessary, not just the fact that the algorithm is quadratic. If something can be solved in O(1) time, it's a mistake to use an O(n) algorithm, and similarly, if something can be solved in O(2^(n!)) time, it's a mistake to use a O(9999^(n!)) algorithm (assuming constant factors are of similar orders of magnitude etc, as they are in the OP).


Programs without a specification have only surprising behaviour, no bugs. As far as I can see "reject!" does not specify complexity, so in this sense it would be completely fine. Additionally it claims to change the array instantly every time the block is called, which doesn't seem to leave any option other than making it at least quadratic (presuming a flat representation for arrays). This feels quite overspecified, as accessing the array from within the block seems like quite an unusual case to cater for.

Though, this behaviour was changed after all which leaves me wondering how much Ruby people care about backward compatibility.


It's not a bug in the method, but it should be.

A peeve I have with Ruby is that methods in the standard library don't have defined big-O complexities. If they did, I don't think reject! would have been specified to be O(n²), and so it would be a bug.


I'd love a type-system-like checking feature of a language that made you specify (and would verify) the time complexity of functions.

Then it's explicit in reviews when someone's doing something dumb (specifying high complexity for new algos), at compile time when a lower-complexity function calls a higher-complexity one with the same sized input, or in CI when an algorithm isn't experimentally within the complexity it claims to be.


I suspect the "verify" part of that is going to slam head-first into the Halting Problem, given that some of the quadratic behavior only happens (see the blog) for particular patterns of use or input data.

You'd also likely want to somehow encode space complexity as well, otherwise silly things like "precompute every possible result and put it in a giant lookup table" will register as "O(1)".


If it can take down your webserver due to unexpected CPU overload, then I would consider it a bug.


If it's n^2 it may never return in your lifetime. Does the code work? Nobody knows. In this situation you may consider it a bug.


O() doesn't say anything about how long it'll take. O(n) may never return in your lifetime too.


O(1) might not either :)


Because you might be dead already. Which would be a weird edge case.


No. O(1) just means the algorithm always takes no more than some arbitrary, but fixed amount of time to complete, regardless of the size of the input. That fixed amount of time could be a billion years (or more).


If it consistently takes a billion years, it's O(1).


That's true, but the odds are way lower.


O(0) would though, right?


You mean O(1) (alternatively, O(k)) -- from the definition of Big O notation, O(0) is nonsensical. But even then; O(1) just means "constant time", it does not mean "soon."


> from the definition of Big O notation, O(0) is nonsensical

Nitpick: not sure it's nonsensical. Plug g(x) = 0 into the usual definition, and you get |f(x)| ≤ k⋅0 for some k in R, which reduces to f(x) = 0. Which is not satisfiable by any nontrivial algorithm, so not very useful, but not nonsensical.


O(n^2) is not the same as O(2^n)


It's still not O(n log n), O(n), or O(1) though.


Can't wait for the entry on quicksort.



