Thinking in an array language

unnouinceput · on May 14, 2022

Any performance benchmarks between a simple C implementation of matrix multiplication like explained in the article and K one?

Story time: 2 years ago, right before 1st lockdown, I landed a client. A data scientist who had already implemented an algorithm which dealt with matrices, implemented by a previous programmer he hired. Said implementation was in Python, no more than 10 line altogether, which performed well when matrix size where small, like 10x10. But the problem was, his real work need it to have matrices of size 10^6 x 10^6. Not only the Python implementation had to be ran on a beast of server with 4TB of memory, it also took 3 days to finish. And while the algorithm was small in size in Python, its explaining paper was 4 pages in total, which took me 1 week to understand. And then an entire month to implement in C. But in the end, when all was said and done, the run time when used with real data was only 20 minutes and consumed only 8 GB of memory, though it did required at least 16 virtual processors.

Hence my question, in the end performance is what it matters, not number of lines.

mlochbaum · on May 14, 2022

K timing, squaring a 100x100 matrix in ngn/k:

   a:100 0N#?10000    / 100x100 double matrix
   \t:1000 a(+/*)\:a  / Total of 1k runs in ms
  1514

Following C program outputs 272 when compiled and run with -O3 -march=native.

  #include <stdio.h>
  #include <time.h>
  size_t monoclock(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return 1000000000*ts.tv_sec + ts.tv_nsec;
  }
  
  int main() {
    const size_t n = 100;
    double a[n][n], r[n][n];
    size_t t = monoclock();
    for (size_t iter=0; iter<1000; iter++) {
      for (size_t i=0; i<n; i++) {
        for (size_t j=0; j<n; j++) {
          double sum = 0;
          for (size_t k=0; k<n; k++) sum += a[i][k]*a[k][j];
          r[i][j] = sum;
        }
      }
    }
    printf("%lld\n", (monoclock() - t)/1000000);
  }

So K is slower, by a factor of 5 rather than the 200 you saw with Python. K appears to scale to larger matrices better than C: 8664 in K vs 2588 in C for 200x200, but I can't be bothered to sort out heap allocation for the C code to do larger sizes. I would certainly not say that your story supports the idea that performance is what matters. In the month you took to implement the C version, the Python solution could have run ten times over, and there are many computations that only ever need to be performed once. Besides, I bet you were glad to have working Python code to use as a reference!

jiggawatts · on May 15, 2022

I prefer "middle of the road" languages that are high-level AND readable AND have decent performance optimisation for bulk operations. Python with C libraries suffices for a lot of people, Julia similarly is getting popular.

Even the older Mathematica language blows both K and C out of the water for readability and performance:

    m = Table[RandomReal[], 100, 100];
    t = RepeatedTiming[MatrixPower[m, 2]];
    First[t]*1000*1000

    18.0245

You would have to know literally zero about Mathematica's Wolfram Language to be able to read that clearly! From a standing start you could understand what another person has created. For K, you'd have to have memorized the K-specific syntax. For C, if you hadn't seen the standard patterns for matrix multiplication you'd have to read the code carefully. A lot of it is just noise, like the verbose for loop syntax.

Oh, and Mathematica's matrix power function is:

- Parallel! If I use a 10K x 10K matrix as an input, it uses about 75% CPU on my 8-core laptop. It can complete a single multiplication in 5.3 seconds. For laughs, try that with either K or C and see what you get...

- Extends to negative or fraction powers.

- Has optimisations for applying the matrix power directly to a vector.

- Is extensively documented, unlike the terse K snippet or the hand-rolled C code: https://reference.wolfram.com/language/ref/MatrixPower.html....

Essentially what I'm trying to say is that terseness is an anti-pattern, and doesn't even begin to approach the utility of a well designed high-level language intended for teams of collaborating humans.

johndough · on May 14, 2022

When you compile the C example with the compiler option "-Wall", you'll get a warning that the variable 'r' is set but not used, so the compiler would be free to simply skip the for loops. In fact, if you compile with clang instead of gcc, the compiler will do just that and you'll get almost zero computation time.

It would be better to do something with the computed result so the compiler does not remove the computation, e.g. print a randomly selected value.

I also benchmarked the fixed C code against Python and "Python" (which uses OpenBLAS under the hood in my case) was 10 times faster:

    import time
    import numpy as np

    a = np.random.rand(100, 100)
    t = time.perf_counter()

    for _ in range(1000): a @ a

    print(time.perf_counter() - t, "seconds")

Implementation matters a lot for matrix multiplication.

mlochbaum · on May 14, 2022

Yes, definitely a quick and dirty benchmark (I did test after I posted to see if initializing a does anything; it didn't). Timings for J below, since I think it's the most focused on linear algebra. The remarkable thing about the K code from the article is that it's all made from totally general-purpose pieces that have nothing to do with matrix products, and K interpreters don't have any clue what a matrix product is. In J the matrix product is written +/ .* with the generalized dot product operator . (which does need the preceding space, oof) and handled by specialized code. Given that, I found this measurement a little disappointing: about as fast as my C code in the 100x100 case and slightly faster in the 200x200 case.

     a =: ?100 100$0
     <.1e6 * 1000 (6!:2) '+/ .*~ a'
  269
     a =: ?200 200$0
     <.1e6 * 1000 (6!:2) '+/ .*~ a'
  1796

harshreality · on May 15, 2022

Naive implementations of stock matrix math can't get anywhere close to numpy or julia, which both use BLAS and automatically parallelize across cores.

  % python matrix.py
  Timing 10 squares of a random 10000 x 10000 matrix
  97.3976636590669 seconds
  python matrix.py  364.41s user 8.10s system 379% cpu 1:38.25 total

julia has more overhead, and the first multiply triggers code compilation so there's an additional warm-up square outside of the timing loop, but its "warm" performance is equivalent to numpy. Turning on extra optimizations (-O3) can even make it a couple seconds faster than numpy once warmed up.

  % julia matrix.jl
  Timing 10 squares of a random 10000 x 10000 matrix
   97.787679 seconds (31 allocations: 7.451 GiB, 0.33% gc time)
  julia matrix.jl  405.34s user 8.13s system 375% cpu 1:50.09 total

If you're going to wait for that C implementation, or the other comment's K implementation, to finish that loop, you'll want a book.

unnouinceput · on May 14, 2022

Nice benchmarking, thank you for your effort.

Also the scientist was leading a team, so my program would've been used by at least 20 people, 10 times per day. That was why Python one was a no go for them from beginning.

harshreality · on May 14, 2022

Can you share the algorithm, or anything computationally equivalent, for people to try benchmarking different implementations?

unnouinceput · on May 14, 2022

The one I wrote 2 years ago? That's intellectual property of said data scientist, not mine. Al I can say is that I parallelize it a lot, hence the entire month. From programming point of view is a mess and hard to follow its ~5k lines. Usually parallel programming is a mess, you should take a look at any parallelization CUDA code available on GitHub.

harshreality · on May 14, 2022

Was the original just regular python, or numpy?

The C version wasn't GPU-targeted though from your description. I'm curious what other implementations would be capable of, for instance julia, maybe gpu-targeted.

unnouinceput · on May 15, 2022

We discussed, since we already agreed on parallelization, if he wanted CUDA, since that would've been even faster. But after discussing with his team, he said no GPU dependent implementation and I started the work. He never shared why no GPU implementation and I didn't pressed the matter further since I was already knee deep in trying to understand the algorithm which was the bigger stone to crack at the time.

carapace · on May 14, 2022

I was just playing with Nils M Holm's Klong this morning: https://t3x.org/klong/index.html (Klong rather than the others mostly because the C implementation looks like C so I have a ghost of a chance of actually grokking it.)

These folks are really onto something, but I think they get sidetracked in the (admittedly very very fun) minutia of the languages and lose sight of the crucial insight in re: mathematical notation, to wit: it's a means of human communication.

For APL or K to get out of their niches would require, I am convinced, something like a tome of documentation of a ratio of about 1.5 paragraphs per line of code. That would give us mere mortals a fighting chance at grokking these tools.

A similar problem plagues the higher-order stuff they're pursuing over in Haskell land. I know e.g. "Functional programming with bananas, lenses, envelopes and barbed wire" and "Compiling to Categories" are really important and useful, but I can't actually use them unless some brave Prometheus scales Olympus and returns with the fire.

Stuff dribbles out eventually. Type inference and checking have finally made it into the mainstream after how many decades?

razetime · on May 15, 2022

Here's some links relating to this style of code that you may find useful:

https://docs.google.com/document/d/1W83ME5JecI2hd5hAUqQ1BVF3...

https://github.com/tlack/b-decoded

https://chat.stackexchange.com/rooms/90748/conversation/ngn-...

They're not 1.5 paragraphs per line, but enough to give a taste of the implementation style.

carapace · on May 15, 2022

Thank you. :)

jodrellblank · on May 14, 2022

> "would require, I am convinced, something like a tome of documentation of a ratio of about 1.5 paragraphs per line of code. That would give us mere mortals a fighting chance at grokking these tools."

You can download a free PDF copy of Mastering Dyalog APL by Bernard Legrand which is 700+ pages, from here:

https://www.dyalog.com/mastering-dyalog-apl.htm

carapace · on May 15, 2022

That's an amazing reference, but it's about the language, I was thinking more of walk-throughs of code in the language. E.g., for some BQN code: https://news.ycombinator.com/item?id=30913872

There was a better example a couple of weeks ago here in a thread, someone had done a bit of APL or K for Advent of Code or something and posted a line, and someone else broke it down and explained how it worked. I spent an hour just now with Algolia trying to find it but I failed. :(

Holm's Klong docs have a good example in https://t3x.org/klong/klong-intro.txt.html where he explains how a table formatter function works.

Mathematical equations are usually embedded in papers that explain them. (I mean, I've read papers that were basically equations one-after-another with just scraps of interstitial prose, but they were heavy going.)

jodrellblank · on May 15, 2022

> "There was a better example a couple of weeks ago here in a thread, someone had done a bit of APL or K for Advent of Code or something and posted a line, and someone else broke it down and explained how it worked. I spent an hour just now with Algolia trying to find it but I failed. :("

It wasn't a couple of weeks ago, but I did that for a line here: https://news.ycombinator.com/item?id=30463080

Or could it have been on an Advent of Code link? There have been some explanations in the answers mega-threads on Reddit. Anyway, yes I agree more explanations would be benficial - and I think there would be room for an animated explainer website with small blocks representing the array elements, coloured by how they are grouped by each primitive operation, and visually showing them moving round and splitting and combining. Such a thing would make a lot more sense for an array language than for many languages.

carapace · on May 15, 2022

Ach! Yes, thank you! That comment! LOL I feel a little silly now.

Your explanation was fantastic, and yeah I think the availability of more information like that would go a long way towards lowering the barrier for people to pick up these languages. Even if they don't actually use APL and it's ilk they can still get a better idea of how to use things like Numpy by getting familiar with the concepts and idioms that support them.

(I forgot to mention "A History of APL in 50 Functions" https://www.jsoftware.com/papers/50/ I just started working my way through that and it's been very helpful.)

> and I think there would be room for an animated explainer website with small blocks representing the array elements, coloured by how they are grouped by each primitive operation, and visually showing them moving round and splitting and combining. Such a thing would make a lot more sense for an array language than for many languages.

That would be pretty awesome. Reminds me of Guo's Python Tutor https://pythontutor.com/

jodrellblank · on May 18, 2022

Thanks; yes I agree more examples like that would help. That PythonTutor site looks brilliant, I will have to play with it some more. It's a lot like I was imagining.

rramadass · on May 15, 2022

https://code.jsoftware.com/wiki/Books

carapace · on May 15, 2022

That looks great! Cheers!

rramadass · on May 16, 2022

You might also find this interesting: https://news.ycombinator.com/item?id=30412654#30416737

carapace · on May 16, 2022

YES! Thank you! A Rosetta stone at last! Prometheus has come and my hearth is lit.

btheshoe · on May 14, 2022

This all seems very similar to writing vectorized code in numpy

eismcc · on May 14, 2022

That’s because numpy is based on J which is based on APL

MontyCarloHall · on May 15, 2022

Yet another Iverson Ghost!

https://dev.to/bakerjd99/numpy-another-iverson-ghost-9mc

da39a3ee · on May 15, 2022

What?

  matmul: (+/*)\:

ogogmad · on May 14, 2022

Array languages might make good maths notation. It's terse and easy to write, and there's a logical naming scheme (for instance, matrix multiplication is just (+/*)\: ). I suppose the trick is to think of (+/*)\: as one unit.

pxeger1 · on May 14, 2022

APL, an array language, literally started out as a computer adaptation of traditional mathematical notation.

https://aplwiki.com/wiki/Iverson_notation

https://aplwiki.com/wiki/Comparison_with_traditional_mathema...

hoosieree · on May 14, 2022

Math notation is highly context dependent (is + addition or boolean or?) and yet authors rarely feel the need to provide context.

If they wrote in an array language instead of LaTeX, not only would it make writing papers easier (+/ is shorter than either \Sigma or \sum), but it would be trivially reproducible, due to being an executable notation.

grayclhn · on May 14, 2022

Yeah… for actual papers “easier to read” is 1000x more important than “easier to write.”

kmstout · on May 14, 2022

Somewhere on YouTube is a talk where Gerald Sussman described mathematics (or at least mathematical presentation) as "impressionistic."

Mathnerd314 · on May 14, 2022

I think he says that in every talk he gives. https://youtu.be/HB5TrK7A4pI?t=1090, https://youtu.be/arMH5GjBwUQ?t=377, https://youtu.be/EbzQg7R2pYU?t=3201, etc. There's a 2002 paper where he uses the term: https://dspace.mit.edu/bitstream/handle/1721.1/6707/AIM-2002...

IshKebab · on May 14, 2022

"What if we could make entire programs as unreadable as regexes?" -K

LAC-Tech · on May 14, 2022

This is such a lame take.

Everything is unreadable until you learn how to read it.

IshKebab · on May 15, 2022

Everything is not equally readable once you have learnt it.

icsa · on May 14, 2022

And yet regular expressions are a part of most programming languages as a library or built-in syntax.

Why is that?

bear8642 · on May 14, 2022

Probably due to they're a great notation for the problem area which for regex is concisely describing text patterns.

For example 'a*b' is any number of 'a's followed by 'b'.

How else would you concisely state that?

LAC-Tech · on May 14, 2022

How else would you concisely state that?

Presumably people who hate array languages think all 3 character regexes should instead be big nested loops, so they are "readable".

samatman · on May 15, 2022

Regexes in the Unix tradition are a user interface as much as a programming language. Not that there's a sharp distinction, but it's almost a trite observation that regexes per se shine for ad hoc string searching but show their weakness when they start becoming parts of programs.

When writing a program, I prefer to use a PEG, giving the less compact notation `'a'* 'b'` but also letting me say `'a'* b` and define b as its own rule, including recursion for the useful cases. It helps that it's more powerful, being little more than a formalization of the post-regular strategies used in Perl-style 'regular' expressions while embracing recursion.

For '/' in vim, grep, wherever? Yeah regex is fine, that's what it was designed for.

IshKebab · on May 15, 2022

I can't remember the names but I've seen at least two alternative syntaxes recently that are a lot more readable. At least one of them fixed the issue of regex mixing up control in-band with data. So your example would be something like

    "a"* "b"

Much more readable and less error-prone.

bear8642 · on May 15, 2022

> the issue of regex mixing up control in-band with data

Could you explain this? I don't quite understand what the problem is. Do you mean something like sed's regex substitute command?

woojoo666 · on May 15, 2022

I believe they mean the operators and operands are all mixed up, eg in `ab`, and this makes it so you have to escape all sorts of characters, but if you split it into `"a" "b"` then the separation is clear

IshKebab · on May 15, 2022

I mean it isn't clear whether a character is a control character (* + ? [ ] - etc) or a literal character because they're all mixed up. The rules about which is which are too complex, extensive and varying.

If you use syntax like "a"* "b" then it's really obvious - the stuff in quotes is literal text, everything else is control.

Lots of formats make the same mistake, e.g. YAML.

IshKebab · on May 15, 2022

They're very quick to write and they (in appropriate cases) would be quite difficult to implement otherwise (tedious state machine stuff). They are still massively overused though. Using a regex at all is a huge red flag. Sometimes they are appropriate, but not in 90% of cases in my experience.

Anyway I'm not sure the same is true for K. At least for the given example the for loop was not exactly difficult to write.

snidane · on May 14, 2022

Array notation is great, but only for single array operations and for dense array operations.

The moment you need to run complex group bys and store non-contiguous data efficiently, it gets awkward pretty quick.

On the other hand, operations on dense data is pretty cumbersome in SQL. You can only do so much with limited support of proper scan algorithms, merge joins or bolted on window functions.

Please somebody combine APL with SQL and you win the programming language wars.

hoosieree · on May 14, 2022

> Please somebody combine APL with SQL and you win the programming language wars.

kdb+ and q fit this description. Docs here: https://code.kx.com/q/basics/qsql/

Here's an example.

In SQL:

    SELECT stock, SUM(amount) AS total FROM trade GROUP BY stock

In q:

    q)select total:sum amt by stock from trade

eismcc · on May 14, 2022

Was gonna say the same plus kdb+ is a columnar store so you can get vectorizarion in your sql execution as well.

mlochbaum · on May 14, 2022

I would say APL-family languages today have largely addressed these concerns with operators such as Key[0][1] and nested arrays. J also has built-in support for sparse arrays. Some more complicated things like storing a list of lists in a compact representation (perhaps lengths and data) aren't supported natively, but I'd consider that a niche concern as a list of lists will have similar performance, just with more memory use.

There's a lot of database software built on array languages, with kdb and Jd being the most prominent as far as I know.

[0] https://aplwiki.com/wiki/Key

[1] https://code.jsoftware.com/wiki/Vocabulary/slashdot#dyadic

nerdponx · on May 14, 2022

R? Julia? Python with Pandas even.

geophile · on May 14, 2022

Sort of related: thinking in relational algebra, or SQL. It appears to be "natural" to think about computing one atomic value at a time, in loops, or slightly less intuitively, recursive functions. (That latter choice may follow from whether your first language was pure Lisp.)

I was fortunate to have a teacher whose database course drilled relational algebra into us. This was in the 70s, shortly after Codd's paper, and well before SQL was invented, much less established. Now I think about much computation algebraically (and often functionally). But I do see that this is "unnatural" for many, including students, having taught databases for several years.

SQL reflects this. I often see students writing nested subqueries, because that is more procedural, where joins would be a cleaner choice. A colleague of mine wrote a paper many years ago, pointing out that thinking procedurally is more "natural" for many: https://dl.acm.org/doi/10.1145/319628.319656. But thinking set-at-a-time instead of one-at-a-time is a valuable skill, not that far off from thinking functionally.

snidane · on May 14, 2022

It's easier not to mess up table based filters using explicit semi-join operators (eg. in, not in, exists) instead of using regular joins because joins can introduce duplicates.

Give me 'any join' operation - ie. just select the first value instead of all, and I'll happily use joins more. They are actually more intuitive.

It's not that relational algebra is untintuitive. It's because standard SQL sucks.

magicalhippo · on May 14, 2022

Indeed, I've taught myself to only use JOIN when I actually need some data from the table I join. For everything else I use EXISTS and friends.

I was thinking SQL could do with a keyword for that, like maybe FILTER, that looks like a JOIN but works like EXISTS.

snidane · on May 14, 2022

Clickhouse implements an explicit SEMI join. It can be called semi or any, it doesn't really matter. It's just another join modifier

    [OUTER|SEMI|ANTI|ANY|ASOF]

https://clickhouse.com/docs/en/sql-reference/statements/sele...

nerdponx · on May 14, 2022

My problem with semijoins is that the semantics of "what exactly does a SELECT evaluate to inside an expression" are sometimes murky and might vary across databases.

magicalhippo · on May 14, 2022

Could you expand a little?

nerdponx · on May 16, 2022

If I write

    WHERE x IN (SELECT ...)

what the heck is the result of evaluating the inner query, in the outer expression?

Maybe I am missing something, but the exact meaning to vary a lot across different databases. Some seem to have a standalone "table" data type, while others don't.

magicalhippo · on May 16, 2022

I might be missing something as I'm self-taught, but the inner select specifies a set, and you "just" do a simple set membership test?

How it's implemented is as usual up to the database server implementation. Ones I've used creates a temporary table (like it does in so many other cases), and as such EXISTS is usually faster.

But I wouldn't rely on this when moving to another implementation, and use the query planner to see, just as I'd view the assembly output when moving to a new compiler.

Again, I don't have tons of experience, so concrete (counter) examples are welcome.

fiddlerwoaroof · on May 14, 2022

“First” doesn’t make sense without an order

jodrellblank · on May 14, 2022

It even has its own tag on StackOverflow: https://stackoverflow.com/questions/tagged/greatest-n-per-gr...

People who want it, want it with an order.

Look at

https://stackoverflow.com/questions/121387/fetch-the-row-whi...

and

https://stackoverflow.com/questions/3800551/select-first-row...

and

https://stackoverflow.com/questions/8748986/get-records-with...

and their combined thousands of votes and dozens of answers, all full of awkward workaround or ill-performing or specialised-for-one-database-engine code for this common and desirable thing which would be trivial with a couple of boring loops in Python.

snidane · on May 14, 2022

It does make sense for semi-joins. I care about the key, not the value.

Random order is also a valid order.

TFortunato · on May 14, 2022

Having taught it, do you have any recommendations for folks who are looking to improve their thinking in relations/sets/SQL skills?

geophile · on May 16, 2022

I teach, in order, more or less (there is some interleaving):

- Data modeling - Relational algebra - SQL - DBMS architecture (buffering, btrees, query processing, index selection, ...) - Query optimization - Transactions

A few assignments have students write an in-memory relational algebra implementation, and then use it to write queries. A typical query is a deeply nested set of function calls (as a "one-liner"). And only then to we get to SQL. The hope is that RA is so ingrained, that the connections to SQL are easier to see. And this background is also really useful in understanding query optimization.

All of this material, including assignments, is available online (this is my own webserver): http://geophile.com/115.

co_dh · on May 14, 2022

https://arxiv.org/abs/1803.05316

agumonkey · on May 14, 2022

I think this is related to "wholemeal" programming as some haskellers/FP-ists do, thinking in sets/tree/graphs operations.

webmaven · on May 14, 2022

> SQL reflects this. I often see students writing nested subqueries, because that is more procedural, where joins would be a cleaner choice.

In my experience, in the non-ad-hoc use-case, views can often be substituted for the procedural approach, forming the equivalent of a POSIX pipe.

*> A colleague of mine wrote a paper many years ago, pointing out that thinking procedurally is more "natural" for many: https://dl.acm.org/doi/10.1145/319628.319656. But thinking set-at-a-time instead of one-at-a-time is a valuable skill, not that far off from thinking functionally.

Hmm. Given the proliferation of tabular data tools (especially spreadsheets) over the intervening 40 years, I wonder if those results would remain the same today (and whether there would be any difference among Excel power users that use pivot tables, etc.)