It’s a real shame Dijkstra rubbed so many people the wrong way.
Maybe his incisive polemic, which I greatly enjoy, was in the end little more than pandering to a certain elitist sensibility.
To make manageable programs, you have to trade off execution speed both on the CPU and in the organization. His rather mathematized prescriptions imply we should hire quarrelsome academics such as him to reduce performance and slow down product development (initially…), all in the interest of his rarefied sensibilities of elegance and simplicity.
These are all expository diagrams. Second-hand and auxiliary by nature.
Diagrams can be authoritative, and the ones I’ve seen will break some or all of these rules because they represent natural heuristics that practitioners are expected to fill in themselves.
> In the aggregate, almost no programmer can think up code faster than they can type it in.
And thank god! Code is a liability. The price of code is coming down, but selling code has been almost entirely supplanted by selling features (SaaS) as a business model. The early cloud services have become legacy dependencies by now (great work if you can get it). Maintaining code is becoming a central business concern in all sectors governed by IT (i.e. all sectors, eating the world and all that).
On a per-feature basis, more code means higher maintenance costs, more bugs and greater demands on developer skills and experience. Validated production code that delivers proven customer value is not something you refactor on a whim (unless you plan to go out of business), and the fact that you did it in an evening thanks to ClippyGPT means nothing; the costly part is always what comes after: demonstrating value or maintaining trust in a competitive market with a much shallower capital-investment moat.
I find the prevailing use of “recursion”, i.e. β-reduction all the way to ground terms, to be an impoverished sense of the term.
By all means, you should be familiar with the evaluation semantics of your runtime environment. If you don’t know the search depth of your input, or can’t otherwise constrain stack height beforehand, beware the specter of symbolic recursion; but recursion is so much more.
Functional reactive programs do not suffer from stack overflow on recursion (implementation details notwithstanding). By Church-Turing, every symbolic-recursive function can be translated to a looped iteration over some memoization of intermediate results. The stack is just a special case of such memoization. Popular functional programming patterns provide other bottled-up memoization strategies: Fold/Reduce, map/zip/join, either/amb/cond, the list goes on (heh). Your iterating loop is still traversing the solution space in a recursive manner, provided you dot your ‘i’s and carry your remainders correctly.
Heck, any feedback circuit is essentially recursive, and I have never seen an IIR filter blow the stack (by itself, mind you).
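To make that concrete, here is a minimal sketch of the translation in TypeScript; the tree type and function names are mine, purely for illustration:

```typescript
interface Tree {
  value: number;
  children: Tree[];
}

// Symbolic recursion: every intermediate result is parked on the call stack,
// so depth is bounded by stack height.
function sumRecursive(node: Tree): number {
  return node.value + node.children.map(sumRecursive).reduce((a, b) => a + b, 0);
}

// The same traversal as a loop: an explicit work list and accumulator take over
// the stack's job, i.e. they memoize the intermediate results on the heap.
function sumIterative(root: Tree): number {
  const pending: Tree[] = [root];
  let total = 0;
  while (pending.length > 0) {
    const node = pending.pop()!;
    total += node.value;
    pending.push(...node.children);
  }
  return total;
}
```

The loop still walks the solution space recursively; only the place where the intermediate results are parked has changed.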
> By Church-Turing, every symbolic-recursive function can be translated to a looped iteration over some memoization of intermediate results. The stack is just a special case of such memoization.
Ah, so functional reactive programs don’t suffer from stack overflow on recursion, they just suffer from memoisation overflow? ;)
Electronic circuits with feedback could be thought of as tail end recursion. :)
And in imperative languages one can get away with no call-stack recursion at all: for example setTimeout(f, 0) in JS, or a goroutine in Go. Of course one needs an accumulator in that case.
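A tiny sketch of the setTimeout flavour, with the accumulator made explicit (the function and parameter names are mine, just to show the shape):

```typescript
// Sum 1..n without ever deepening the call stack: each step is rescheduled on
// the event loop via setTimeout(..., 0) and the running result rides along in `acc`.
function sumTo(n: number, acc: number, done: (total: number) => void): void {
  if (n === 0) {
    done(acc);
    return;
  }
  setTimeout(() => sumTo(n - 1, acc + n, done), 0);
}

sumTo(100_000, 0, (total) => console.log(total)); // no stack overflow, just a lot of timer ticks
```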
Most vendors use three indexes for triples and four or six for quads. All the indexes are covering, which is to say they triplicate all the data; in other words, the database consists only of indexes.
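As a toy illustration of “the database consists only of indexes” (the maps and names here are mine, not any vendor’s actual layout):

```typescript
type Triple = { s: string; p: string; o: string };

// Three covering indexes, one per access pattern; there is no separate base table.
const spo = new Map<string, Triple[]>(); // lookup by subject
const pos = new Map<string, Triple[]>(); // lookup by predicate
const osp = new Map<string, Triple[]>(); // lookup by object

function addTo(index: Map<string, Triple[]>, key: string, t: Triple): void {
  const bucket = index.get(key) ?? [];
  bucket.push(t);
  index.set(key, bucket);
}

// "Triplicate all data": every triple goes into every index in full.
function insert(t: Triple): void {
  addTo(spo, t.s, t);
  addTo(pos, t.p, t);
  addTo(osp, t.o, t);
}

insert({ s: "alice", p: "knows", o: "bob" });
console.log(pos.get("knows")); // answered entirely from the POS index
```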
I refer to RDF as the "absurd normal form". When my friends and I do DB design, it's almost inevitable that we fall into what eventually becomes the "ThingThing" table: a many-to-many joiner of everything to everything else. (That's when we giggle, leave the room, go to lunch, and then come back when we've returned to our senses.)
But for RDF it's exactly what I want. I'm not interested in schemas and such for this work, so it's perfect for my scenario.
I’m completely flabbergasted by the number of comments implying copyright concepts such as “fair use” or “derivative work” apply to trained ML models. Copyright is for _people_, as are the rights, responsibilities and exemptions it entails.
This has gone far beyond anthropomorphising and we need to like get it together, man!
My initial interpretation was that you're saying fair use is irrelevant to the situation because machine learning models aren't themselves legal persons. But, fair use doesn't solely apply to manual creation - use of traditional algorithms (e.g: the snippets, caching, and thumbnailing done by search engines) is still covered by fair use. To my understanding, that's why ronsor pointed out that ML models are tools used by people (and those people can give a fair use defense).
Possibly you instead meant that fair use is relevant, but people are wording remarks in a way that suggests the model itself is giving a fair use defense to copyright infringement, rather than the persons training or using it?
Well then I could have been much clearer because I meant something like the latter.
An ML model can neither hold nor be in breach of copyright, so any discussion about how it works, and how that relates to how people work or “learn”, is beside the point.
What actually matters is, first, the details of how the source material was collated, and later the particular legal details surrounding attribution. The latter involves breaking new legal ground, and IANAL, so I will reserve judgement. The former, the collation of source material for training, is emphatically not unexplored legal or moral territory. People are acting as though none of the established processes apply in the case of LLMs and handwave about “learning” to defend it.
> and how that relates to how people work or “learn”, is beside the point
It is important (for the training and generation stages) to distinguish between whether the model copies the original works or merely infers information from them - as copyright does not protect against the latter.
> The first part, collation of source material for training is emphatically not unexplored legal or moral territory.
Similar to Authors Guild v. Google, Inc., where Google internally made entire copies of millions of in-copyright books:
> > While Google makes an unauthorized digital copy of the entire book, it does not reveal that digital copy to the public. The copy is made to enable the search functions to reveal limited, important information about the books. With respect to the search function, Google satisfies the third factor test
Or in the ongoing Thomson Reuters v. Ross Intelligence case where the latter used the former's legal headnotes for training a language model:
> > verbatim intermediate copying has consistently been upheld as fair use if the copy is "not reveal[ed] to the public."
That it's an internal transient copy is not inherently a free pass, but it is something the courts take into consideration, as mentioned more explicitly in Sega v. Accolade:
> > Accolade, a commercial competitor of Sega, engaged in wholesale copying of Sega's copyrighted code as a preliminary step in the development of a competing product [yet] where the ultimate (as opposed to direct) use is as limited as it was here, the factor is of very little weight
And, given that training a machine learning model is a considerably different purpose from what the images were originally intended for, it's likely to be considered transformative, as in Campbell v. Acuff-Rose Music:
> > The more transformative the new work, the less will be the significance of other factors
Listen, most website and book authors want to be indexed by Google. It brings a potential audience their way, so most don’t make use of their _right_ to be de-listed. For these models, there is no plausible benefit to the original creators, and so one has to argue they have _no_ such right to be “de-listed” in order to get any training data currently under copyright.
> It brings a potential audience their way, so most don’t make use of their _right_ to be de-listed.
The Authors Guild lawsuit against Google Books ended in a 2015 ruling that Google Books is fair use, and as such authors don't have a right to be de-listed. It's not the case that they have a right to be de-listed but choose not to make use of it.
The same would apply if collation of data for machine learning datasets is found to be fair use.
> one has to argue they have _no_ such right to be “de-listed” in order to get any training data currently under copyright.
Datasets I'm aware of already respect machine-readable opt-outs, so if that were to be legally enforced (as it is by the EU's DSM Directive for commercial data mining), I don't think it'd be the end of the world.
There's a lot of power in a default; the set of "everything minus opted-out content" will be significantly bigger than "nothing plus opted-in content" even with the same opinions.
With the caveat that I was exactly wrong about the books de-listing, I feel you are making my point for me and retreating to a more pragmatic position about defaults.
The (quite entertaining) saga of Nightshade tells a story about what content creators’ “default position” is going to be going forward, and everyone else will follow. You would be a fool not to: the AI companies are trying to do an end run around you, using your own content, to make a profit without compensating you, leaving you with no recourse.
> I feel you are making my point for me and retreating to a more pragmatic position about defaults
I'm unclear on what stance I've supposedly retreated from. My position is that an opt-out is not necessary under current US law, but that it wouldn't be the worst-case outcome if new regulation were introduced to mandate it.
> The (quite entertaining) saga of Nightshade tells a story about what content creators’ “default position” is going to be going forward, and everyone else will follow
By "default" I refer not to the most common choice, but to the outcome that results from inaction. There's a bias towards this default even if the majority of rightsholders do opt to use Nightshade (which I think is unlikely).
Oh come on, you’re being insincere. Whether or not the model is learning from the work just like people do is hotly debated, as if it would make a difference. Fair use is even brought up. Fair use! Even if it applied, these training sets collate all of everything.
These are technicalities IMO. There is nothing more to EA than the current institutions, people and praxis. If they become unfashionable or dishonored, their moment will pass.
It’s not some special new kind of cause, it’s just charity with an almost intolerably smug self-image.
EA was a philosophy before it was a culture. Sure, the culture has wankers. And unfortunately the wankers are loud, which can make you think everyone who does EA is smug and unrealistic.
But the idea of bringing statistical rigor to charity is an important one, and it really was a development -- it came shockingly late. Statistics came late to baseball too, and plenty of people fought the idea, but it won pretty fast, because people in baseball care about winning. *If* people doing charity really care about achieving the most good, it will win there, too.
How could you think I didn't read what you wrote when I disagreed so specifically with it? EA is something new -- not as a goal, clearly (Bentham had already gotten there) but as a method. And it is more than the people who identify with it, just like mathematics can be distinguished from mathematicians.
If there is something new about EA, it is, as a matter of public record, not the use of statistically rigorous cost/benefit analysis. The hubris!
Anyhow, I will admit that the (putative) indifference to specific causes, so long as they are E and A, and the (ostensibly) apolitical posture are original.
They are of course transparently performative. Animal welfare cannot be measured, only assumed, and should therefore, by their own “philosophy”, rank lower than the lowest net-positive measurable utility at any price.
Whatever else “effective” implies, it must include some change in society to be materially meaningful. While this is not a standard the Longtermist arm adheres to, the rest of EA and the “philosophy” demand it. It doesn’t take all that much imagination to see which political project EA is most aligned with…