I was thinking of creating a system like this to find duplicate code in a progra...

pierrec · on June 19, 2016

There are tools that do this, though of course they analyze the source, not the binaries as is done here (binaries can legitimately contain lots of duplicate code, introduced by the compiler for optimization).

https://en.wikipedia.org/wiki/List_of_tools_for_static_code_...

Waterluvian · on June 19, 2016

I bet an analysis like that will find a ton of code that shouldn't be abstracted. Maybe at some compilation level more optimization can be done. But perfectly optimized code is likely much less maintainable. Nevertheless, it would be very revealing and educational.

heurist · on June 20, 2016

I imagine it would be like code coverage. You don't always want to unit test every single branch and 100% doesn't mean you're bug-free, but it's still useful as a metric.

gravypod · on June 19, 2016

Is there any reason why something shouldn't be abstracted?

At least in a functional or procedural realm, I don't think that this would be a detriment.

Ace17 · on June 19, 2016

Structural equivalence doesn't necessarily mean there's some missed abstraction:

  struct Point { int x, y; };

  struct Fraction { int num, denom; };

Would it be desirable to replace both above definitions by the following "abstraction"?

  struct StructWithTwoIntegers { int first, second; };

gravypod · on June 19, 2016

I'd just call it a union. Since it's a union of two integers.

The name of the variables should tell you what the data means.

The data structure should tell you how it's organized in memory.

Edit: You can also abstract the data type from the union. You can have unions of other types as well, so why have 30+ union types? You can also have unions of more than a single pair.

Also, I feel that rather than having the data structure represent the meaning, variable-based meaning is the best way to go, as it will lead to clearer names.

   union_t<int> location_coordinate = {
      .x = 0,
      .y = 0,
   };

This would be accessed by:

   location_coordinate.x; // == x
   location_coordinate.y; // == y

Rather than

   location_t position = ...;

I personally like longer variable names as it makes it explicitly easy to see exactly how it's expected to behave.

Ace17 · on June 20, 2016

You just moved the problem.

Suppose you have two "add" functions: one for fractions, one for points:

  Fraction add(Fraction a, Fraction b)
  {
    Fraction result;
    result.num = a.num * b.denom + b.num * a.denom;
    result.denom = a.denom * b.denom;
    return result;
  }

  Point add(Point a, Point b)
  {
    Point result;
    result.x = a.x + b.x;
    result.y = a.y + b.y;
    return result;
  }

How would you write this in C++?

Obviously the code below won't do:

  union_t<int> add_fractions(union_t<int> a, union_t<int> b);
  union_t<int> add_points(union_t<int> a, union_t<int> b);

Because you have now lost type safety (i.e. you can add fractions to points). To get back strong type safety, you could write:

  struct Fraction : public union_t<int> {};
  struct Point : public union_t<int> {};

At this point, you're mostly back to square one, except that Fraction users must use members named "x" and "y" ... (and I'm not even talking about operator overloading)

(By the way, "union" is an unfortunate choice of words, as the language already defines it to mean something completely different. Maybe "pair" is more appropriate?).
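A minimal sketch of the trade-off described above, assuming a hypothetical tagged `Pair` template (`Pair`, `PointTag` and `FractionTag` are illustrative names, not something from the thread). The tag restores type safety without inheritance, but the two `add` bodies still cannot be shared, and the members lose their meaningful names:

```cpp
#include <type_traits>

// Hypothetical generic pair; the Tag parameter exists only to make
// otherwise identical instantiations distinct types.
template <typename T, typename Tag>
struct Pair {
    T first;
    T second;
};

struct PointTag {};
struct FractionTag {};

using Point = Pair<int, PointTag>;        // first = x,   second = y
using Fraction = Pair<int, FractionTag>;  // first = num, second = denom

// Same layout, but the compiler treats them as unrelated types.
static_assert(!std::is_same<Point, Fraction>::value,
              "tagged pairs must be distinct types");

// Points add component-wise.
constexpr Point add(Point a, Point b) {
    return Point{a.first + b.first, a.second + b.second};
}

// Fractions need cross-multiplication, so the body is not shareable.
constexpr Fraction add(Fraction a, Fraction b) {
    return Fraction{a.first * b.second + b.first * a.second,
                    a.second * b.second};
}

// add(Point{1, 2}, Fraction{1, 2});  // does not compile: no matching overload
```

Mixing a point with a fraction is rejected at compile time, yet every user now reads `first` and `second` instead of `x`/`y` or `num`/`denom`, which is roughly the "back to square one" situation.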

slavik81 · on June 19, 2016

If two things just happen to be similar, you go through a whole lot of work to make them share code, then you go through a whole lot of work to separate them again when one changes.
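To make that concrete, here is a small sketch with an invented scenario (the function names and the percentages are hypothetical). Two rules start out textually identical, get merged, and then one of them changes:

```cpp
// Step 1: two rules that merely happen to look the same today.
constexpr int shipping_fee_cents(int order_cents) { return order_cents / 10; }
constexpr int sales_tax_cents(int order_cents) { return order_cents / 10; }

// Step 2: the "deduplicated" helper that a clone detector might suggest.
constexpr int ten_percent_cents(int order_cents) { return order_cents / 10; }

// Step 3: the tax rule changes (assumed here to become 8%), so the shared
// helper either grows a mode switch or has to be split apart again.
enum class Charge { Shipping, Tax };

constexpr int charge_cents(Charge kind, int order_cents) {
    return kind == Charge::Tax ? order_cents * 8 / 100  // new tax rule
                               : order_cents / 10;      // shipping unchanged
}
```

After step 3 the merged function carries a branch per caller, which is arguably harder to read than the two original one-liners.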

ximeng · on June 19, 2016

You can design wide or deep with abstraction: repeat similar code, or abstract it out and branch down. Both introduce complexity, and it's possible to have too many abstraction layers.

https://plus.google.com/+aerotwist/posts/1QhcnQizuPc

Convenient proxy factory bean superclass for proxy factory beans that create only singletons.

elcapitan · on June 19, 2016

If you analyze the binary, wouldn't you most likely find lots of inlined code anyway? Code that is neatly abstracted already, but which the compiler then optimizes to avoid too many function calls.

kwhitefoot · on June 19, 2016

Visual Studio has a feature that searches for Code Clones. Works pretty well.

But I would love to have a command line tool that did not depend on VS and Windows.
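For what it's worth, the core of a naive source-level clone finder is small. The sketch below is only an assumption about how such a tool could start, not how Visual Studio or any existing tool works: it strips whitespace from each line, slides a fixed-size window over the non-blank lines, and reports windows that occur more than once. `normalize` and `find_clones` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Remove all whitespace so formatting differences do not hide a clone.
// (Crude: a real tool would tokenize instead of deleting spaces.)
std::string normalize(const std::string& line) {
    std::string out;
    for (char c : line) {
        if (c != ' ' && c != '\t' && c != '\r' && c != '\n') out.push_back(c);
    }
    return out;
}

// For every run of `window` consecutive non-blank lines that appears more
// than once, return the 0-based original line indices where the run starts.
std::vector<std::vector<std::size_t>> find_clones(
        const std::vector<std::string>& lines, std::size_t window) {
    assert(window > 0);

    std::vector<std::string> kept;    // normalized non-blank lines
    std::vector<std::size_t> origin;  // original index of each kept line
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::string n = normalize(lines[i]);
        if (!n.empty()) {
            kept.push_back(n);
            origin.push_back(i);
        }
    }

    // Map each window's text to every place it starts.
    std::map<std::string, std::vector<std::size_t>> seen;
    for (std::size_t i = 0; i + window <= kept.size(); ++i) {
        std::string key;
        for (std::size_t j = 0; j < window; ++j) {
            key += kept[i + j];
            key += '\n';
        }
        seen[key].push_back(origin[i]);
    }

    std::vector<std::vector<std::size_t>> clones;
    for (const auto& entry : seen) {
        if (entry.second.size() > 1) clones.push_back(entry.second);
    }
    return clones;
}
```

This only catches exact textual clones modulo whitespace. Renamed variables or reordered statements would need token or syntax-tree comparison, and a command-line front end would still have to read files and print locations.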

techbio · on June 20, 2016

I'd contribute to that. Especially if it were operating on a source tree that was heavily abstracted already.