drvortex's comments | Hacker News

I think it is just that Obsidian has a nerd vibe. Like Obsidian is to Notion what Neovim is to VSCode. It isn’t immediately obvious why it is better, but one of them is more l33t hax0r.

I used Obsidian for 2 years. But I ditched it because it was local only, and its sync capabilities (without storage) cost more than an entire office suite + cloud storage subscription package. Ridiculous that they expect 8 Euro per month just to sync (not even store) my data.

I now use UpNote, which has cloud sync, works cross-platform, and has a one-time purchase option that is less than 50 bucks.


You can sync Obsidian with whatever solution you want to. It’s just flat text files on your filesystem. I store my notes in OneCloud and it syncs fine. Heck, there is even a free plugin to sync your notes using Git.

The subscription is a convenience but in no way required.


Yeah, hearing the complaints about Sync is just confusing. I didn't have the money to pay for yet another subscription, but I wanted to have the same notes on my phone as on my computer, and have some backup somewhere. I just googled it and within 30 minutes I had a completely free Git syncing plugin working on my laptop and phone installations, with a private repository that backs up and holds the complete history.

It was very easy and immediately discoverable. Some day when my savings account starts going up again I'll pay for Sync, but it was trivial to get the "Obsidian Git" plugin working in the meantime.
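
For anyone curious, the plugin really just automates a handful of git commands run against the vault folder. A minimal standalone sketch of the same idea in Python (the vault path and the remote are assumptions about your own setup; the plugin handles all of this, plus conflict handling, for you):

    import subprocess
    from datetime import datetime, timezone

    VAULT = "/path/to/your/vault"  # hypothetical: an existing git repo with a remote configured

    def git(*args):
        # Run a git command inside the vault without raising on a non-zero exit.
        return subprocess.run(["git", "-C", VAULT, *args], check=False)

    def sync():
        git("add", "-A")
        # Commit only if something is actually staged; otherwise git complains.
        if git("diff", "--cached", "--quiet").returncode != 0:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            git("commit", "-m", f"vault sync {stamp}")
        git("pull", "--rebase")
        git("push")

    if __name__ == "__main__":
        sync()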


But is there a way to do that with my iPhone and Windows PC that doesn't take 1-2 hours?


You can use git to backup Obsidian for free, it works on every device, even iOS.

And that's where Obsidian is "obviously better" than notion, it has plugins that anyone can develop.

Another reason why it's better, which is also why it can be so easily backed up to git, is that it uses simple markdown files with 1 file = 1 note. If Obsidian stops working one day for... reasons? You still have all your notes and can open them in any markdown editor.


> It isn’t immediately obvious why it is better

It's pretty obvious. It's an open format with local files and has plugins. It sucks all over the place, but it has a solid enough foundation for people to tinker with it to death. Something they can't really do with most other tools in that space.


I think you'd want notes to last a long time; I personally am skeptical of Notion for this reason.

But I also found obsidian to be a bit brittle (especially if you don't pay for sync).

I've just reverted back to a notebook for my journal and then files in ~/notes that I grep.


I hear what you're saying - though I've had good luck (i.e. I haven't had to give it a thought in two years) with Syncthing pumping my vaults (and all business files, for that matter) to phone, laptop, and a backup NAS.


Good idea. There's no browser extension for autofill visible on the website, and no mobile clients visible. Unfortunately, that's a deal breaker, as there is no reason for me to switch away from Bitwarden.


Hi, yes, I understand - there are no extensions or mobile clients available yet. I wanted to see if people would actually be interested in this kind of product before committing to that kind of development. Thank you for the feedback!


Forgot to mention: the application itself, https://www.cryptex-vault.com/app, is a PWA - meaning Android (Chrome, Firefox) or iOS (Safari) will let you install it and use it as a standalone application.


Your code is not in that thing. That thing has merely read your code and adjusted its own generative code.

It is not directly using your code any more than programmers are using print statements. A book can be copyrighted, the vocabulary of language cannot. A particular program can be copyrighted, but snippets of it cannot, especially when they are used in a different context.

And that is why this lawsuit is dead on arrival.


> Your code is not in that thing. That thing has merely read your code and adjusted its own generative code.

This is kinda smug, because it overcomplicates things for no reason, and only serves as a faux technocentric strawman. It just muddies the waters for a sane discussion of the topic, which people can participate in without a CS degree.

The AI models of today are very simple to explain: it's a product built from code (already regulated, produced by the implementors) and source data (usually works that are protected by copyright and produced by other people). It would be a different product if it hadn't used the training data.

The fact that some outputs are similar enough to source data is circumstantial, and not important other than for small snippets. The elephant in the room is the act of using source data to produce the product, and whether the right to decide that lies with the (already copyright protected) creator or not. That's not something to dismiss.


It's not something to dismiss, but it is something that has already been addressed: Authors Guild v Google. Google Books is built upon scanning millions of books from libraries without first gaining permission from copyright holders, and this was found not to be a violation of copyright.

Building a product on top of copyright works that does not directly distribute those works is legal. More specifically, a computer consuming a copyright work is not a violation of copyright.


At the time the suit was launched, Google search would only display snippet views. By its very nature, this presents the attribution to the user, enabling them to separately obtain a license for the content.

This would be more or less analogous to Copilot linking to lines in repositories. If Copilot was doing that, there wouldn't be much outrage.

The fact that they are producing the entire relevant snippet, without attribution and in a way that does not necessitate referencing the source corpus, suggests the transgression is different. It is further amplified by the fact that the output itself is typically integrated in other copyrighted works.


Attribution is irrelevant in Authors Guild; the books were not released under open source licenses where attribution is sufficient to meet the licensing terms. Google never sought or obtained licenses from any of the publishers, and the court ruled such a license was not needed, as Google's usage of the contents of the books (scanning them to build a product) did not represent a copyright infringement.

Attribution is mentioned in this filing because such attribution would be sufficient to meet the licensing terms for some of the alleged infringements.

It's an irrelevant discussion though; the suit does not make a claim that the training of Copilot was an infringement, which is where Authors Guild is a controlling precedent.


Attribution goes directly to factors 1, 3, and 4 of the fair use test.


In some contexts it's used to characterize the purpose of the copying, but it's not a consideration that was made in Authors Guild.


> Authors Guild v Google. Google Books is built upon scanning millions of books from libraries

I agree it's relevant precedent, but not exactly the same. Libraries are a public good, and more importantly, Google Books references the original works. In short, I don't think that's the final word in all seemingly related cases.

> More specifically, a computer consuming a copyright work is not a violation of copyright.

I don't agree with this way of describing technology, as if humans weren't responsible for operating and designing the technology. Law is concerned with humans and their actions. If you create an autonomous scraper that takes copyrighted works and distributes them, you are (morally) responsible for the act of distributing them, even if you didn't "handle" them or even see them yourself.

Neither of the important aspects – remixing and automation – is novel, but the combination is. That's what we should focus on, instead of treating AI as some separate anthropomorphized entity.


Your disagreement and feelings about how copyright and the law should work are valid, but they have very little to do with how copyright is addressed judicially in the United States.


>Authors Guild v Google

In that case Google paid some hundred million dollars to companies and authors, created a registry collecting revenues and passing them to rightsholders, provided an opt-out for already scanned books, etc. Hey, it doesn't sound that bad for the same thing to happen with Copilot.


But Copilot has been shown to distribute (parts of) the copyrighted works used to create it. That’s the difference.


A) No it doesn't; there's nothing in the Copilot model or the plugin that represents or constitutes a reproduction of copyrighted code being distributed by GH/MS. The allegation is that it generates code that constitutes a copyright violation. This distinction is not academic; it's significant, and it represents an unexplored area of copyright law.

B) "parts of" copyright works are not themselves sufficient to constitute a copyright violation. The violation must be a substantial reproduction. While it's up to the court to determine if the alleged infringements demonstrated in the suit (I'm sure far more will be submitted if this case moves forward) meet this bar, from what I've seen none of them have.

Historically the bar is pretty high for software, hundreds or thousands of lines depending on use case. A purely mechanical description of an operation is not sufficient for copyright, you cannot copyright an implementation of a matrix transformation in isolation no matter what license you slap on the repo. Recall that the recent Google v Oracle case was litigated over tens of thousands of lines of code and found to be fair use because of the context of those lines.

I've yet to see a demonstrated case of Copilot generating code that is both non-transformative and represents a significant reproduction of the source work.


> The allegation is it generates code that constitutes a copyright violation.

The weights of Copilot very likely contain verbatim parts of the copyrighted code, just like a zip archive does. It chooses semi-randomly which parts to show and sometimes breaks copyright by displaying large enough pieces.

https://news.ycombinator.com/item?id=33458603


Speculation, and furthermore the model itself isn't distributed to consumers.


Say you publish a song and copyright it. Then I record it and save it in .xz format. It's not an MP3; it is not an audio file. Say I split it into N chunks and share them with N different people. Or with the same people, but on N different dates. Say I charge them $10 a month for doing that, and I don't pay you anything.

Am I violating your copyright? Am I entitled to do that?

To make it funnier: say that instead of the .xz, I "compress" it via π compression [1]. So what I share with you is a pair of π indices and data lengths for each chunk, from which you can "reconstruct" the audio. Am I illegally violating your copyright by sharing that?

[1] https://github.com/philipl/pifs
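
For the curious, here is a toy Python sketch of that π-compression idea: instead of the data itself you keep only an offset into the digits of π plus a length, and "decompress" by reading the digits back out. It assumes the mpmath package is installed, and it is only workable for tiny inputs, since real data sits at astronomically large offsets:

    from mpmath import mp

    mp.dps = 10_000                     # how many digits of pi to search
    PI_DIGITS = str(mp.pi)[2:]          # drop the leading "3."

    def pi_compress(digits):
        # The (offset, length) pair is the entire "compressed file".
        offset = PI_DIGITS.find(digits)
        if offset < 0:
            raise ValueError("not found in the digits searched")
        return offset, len(digits)

    def pi_decompress(offset, length):
        return PI_DIGITS[offset:offset + length]

    offset, length = pi_compress("999999")   # the six nines of the Feynman point
    print(offset, pi_decompress(offset, length))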


What you are actually giving people is a set of chords that happen to show up in your song, the machine can suggest an appropriate next chord.

It’s also smart enough to rebuild your song from the chords _if you ask it to_.


I take your code and compress it into a tar.gz file. I'll call that file "the model". Then I ask an algorithm (gzip) to infer some code using "the model". The algorithm (gzip) just learned how to code by reading your code. It just happens to have it memorized in its model.
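
Rendered literally, a tongue-in-cheek Python sketch (the file names are made up):

    import tarfile

    def train(model_path, source_file):
        # "Learning": store a compressed copy of the training data.
        with tarfile.open(model_path, "w:gz") as model:
            model.add(source_file)

    def infer(model_path, source_file):
        # "Inference": emit the memorized copy verbatim.
        with tarfile.open(model_path, "r:gz") as model:
            return model.extractfile(source_file).read().decode()

    train("the_model.tar.gz", "your_code.py")
    print(infer("the_model.tar.gz", "your_code.py"))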


Yeah, and that’s completely fine.

I’ve seen this point made before, but it assumes you use the entire input as output, which is silly.


Oh no, I'm not using the entire input, just a few functions of interest. And not the copyright headers of course.


With the exception that there are infinite types of chords in this case. And even though many musicians follow familiar chord structures, the underlying melodies and rhythms are unique enough for anyone familiar to differentiate "Red Hot Chili Peppers" from "All-American Rejects". And now there is a system where All-American Rejects hit a few buttons and a song is generated (using audio samples of "Under the Bridge") that sounds like "Under the Bridge pt 2, All-American Rejects Boogaloo".

That's why it's actionable and why there is meat on the bone for this case. The real issue is going to be whether they can convince a jury that this software is just stealing code, and whether it's wrong if a robot does it.



I was thinking of something similar as a counter argument and lo and behold, it’s a real thing maths has solved with a real implementation.


This analogy is flawed


This is demonstrably false. It is a system outputting character-for-character repository code.[1]

[1]: https://news.ycombinator.com/item?id=33457517


If I use Photoshop to create an image that is identical to a registered trademark, is the rights violation my fault or Adobe’s fault?


Photoshop can't produce copyrighted images on its own.


To play devil's advocate: Co-Pilot can't reproduce copyrighted work without appropriate user input.

Just trying to demonstrate a point- this analogy seems flawed.


If I draw some eyes in Photoshop, it won't automatically draw the Mona Lisa around it for me.


Until you sprinkle a bit of Stable Diffusion v2 or v3 on it, or perhaps some GAN.

The more I think about it, the more this all seems like another dimension of Jack and the Magic Beanstalk crossed with The Matrix.


If you Google Mona Lisa the result is the Mona Lisa. If you query Copilot for a common piece of code you get that code.


Google doesn't sell its search feature as a product that you can just plagiarize the results from and they're yours. Microsoft does that with Copilot.

Copilot is as much of a search engine as Stable Diffusion or DALL-e are, which is to say they aren't at all. If you want to compare it to a search engine, despite it being a tortured metaphor, the most apt comparison is not to Google, but to The Pirate Bay if TPB stored all of their copyrighted content and served it up themselves.


With Copilot it's your responsibility not to use it as a search engine to copy-paste code. It's completely obvious when it's being used as a search engine so it's not a problem at all.

Stable Diffusion works on completely different principles, and it can't exactly replicate pixels from its training data.


So the problem you have with it is the UI?


No, because that's not a trademark violation in any way. Using GPL code in a non-GPL project is a violation of copyright law, though.


Ok, cool. Presumably that is because it's smart enough to know that there is only one (public) solution to the constraints you set (like asking it to reproduce licensed code).

Now, while you may be able to get it to reproduce one function, reproducing a whole file, let alone the whole repository, seems extremely unlikely.


[flagged]


Individual words can't be copyrighted.


It can be modified to not do that (for example, by mutating the code into a "synonym" that is functionally but not visually identical).

It can also be modified to be opt-in only (only code from people who permit it to be learned from would be used in the product).


Perhaps you are right, and it could be so modified.

Could be, but isn’t. And that matters.


plagiarism with some words swapped is still plagiarism


Just to be clear: I cannot prove that they have used my code, but for the sake of argument, let's assume so.

They would have directly used my code when they trained the thing. I see it as the equivalent of creating a zip file. My code is not directly in the zip file either; only by the act of unzipping does it come back, which requires a sequence of math steps.


But there is no equivalent of "unzipping" for Copilot.

This is a generative neural network. It doesn't contain a copy of your code; it contains weightings that were slightly adjusted by your code. Getting it to output a literal copy is only possible in two cases:

- If your code solves a problem that can only be solved in a single way, for a given coding style / quality level. The AI will usually produce the same result, given the same input, and it's going to be an attempt at a solution. This isn't copyright violation.

- If 'your' code has actually already been replicated hundreds of times over, such that the AI was over-trained on it. In that case it's a copyright violation... but how come you never went after the hundreds of other violations?


There is no guarantee that an ML network only produces the input data under those two conditions. But even for

> If 'your' code has actually already been replicated hundreds of times over, such that the AI was over-trained on it. In that case it's a copyright violation... but how come you never went after the hundreds of other violations?

Replication is not a violation if the terms of the license are followed. Many open source projects are replicated hundreds of times with no license violation - that doesn't mean that you can now ignore the license.

But even if they did violate the license, that doesn't give you the right to do it too. There is no requirement to enforce copyright consistently - see e.g. mods for games which are more often than not redistributing copyrighted content and derivatives of it but usually don't run into trouble because they benefit the copyright owner. But try to make your own game based on that same content and the original publisher will not handle it in the same way as those mods. Same for OSS licenses: The original author does not lose any rights to sue you if they have ignored technical license violations by others when those uses are acceptable to the original author.


Neural nets can and do encode and compress the information they're trained on, and can regurgitate it given the right inputs. It is very likely that someone's code is in that neural net, encoded/compressed/however you want to look at it, which Copilot doesn't have a license to distribute.

You can easily see this happen, the regurgitation of training data, in an overfitted neural net.
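
For a toy illustration of that regurgitation effect, here is a deliberately crude stand-in in Python (a character n-gram table rather than a neural net, trained on a made-up one-snippet corpus, nothing to do with Copilot's actual architecture). It only stores "which character tends to follow which context", yet it reproduces its training data verbatim when prompted:

    from collections import defaultdict

    TRAIN = "def add(a, b):\n    return a + b\n"   # the entire "training set"
    N = 4

    def fit(text):
        # Record which character follows each N-character context.
        model = defaultdict(list)
        for i in range(len(text) - N):
            model[text[i:i + N]].append(text[i + N])
        return model

    def generate(model, prompt, max_len=100):
        out = prompt
        while len(out) < max_len and out[-N:] in model:
            out += model[out[-N:]][0]   # deterministic: always take the first option
        return out

    print(generate(fit(TRAIN), "def "))   # prints the training snippet back verbatim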


This is not necessarily true; the function space defined by the hidden layers might not contain an exact duplicate of the original training input for all (or even most) of the training inputs. Things that are very well represented in the training data probably have a point in the function space that is "lossy compression" level close to the original training image, though - not so much in terms of fidelity as in changes to minor details.


When I say encoded or compressed, I do not mean verbatim copies. That can happen, but I wouldn't say it's likely for every piece of training data Copilot was trained on.

Pieces of that data are encoded/compressed/transformed, and given the right incantation, a neural net can put them together to produce a piece of code that is substantially the same as the code it was trained on. Obviously not for every piece of code it was trained on, but there's enough to see this effect in action.


> which Copilot doesn't have a license to distribute

when you upload code to a public repository on github.com, you necessarily grant GitHub the right to host that code and serve it to other users. the methods used for serving are not specified. this is above and beyond whatever license you choose for your own code.

you also necessarily grant other GitHub users the right to view this code, if the code is in a public repository.


Host that code. Serve that code to other users. It does not grant the right to create derivative works of that code outside the purview of the code's license. That would be a non-starter in practice; see every repository with GPL code not written by the repository creator.

Whether the results of these programs are somehow Not A Derivative Work is the question at hand here, not "sharing". I think (and I hope) that the answer to that question won't go the way the AI folks want it to go; the amount of circumlocution needed to excuse that the not-actually-thinking-and-perceiving program is deriving data changes from its copyright-protected inputs is a tell that the folks pushing it know it's silly.


copilot isn't creating derivative works: copilot users are.

the human at the keyboard is responsible for what goes into the source code being written.

to aid copilot users here, they are creating tools to give users more info about the code they are seeing: https://github.blog/2022-11-01-preview-referencing-public-co...


Your argument is essentially the same as the argument that the pirate bay didn't infringe copyright, it only facilitated infringement.

And we all saw how well that went legally.


Actually The Pirate Bay was even less of an infringement, as they did not distribute the copyrighted content or derivatives themselves, only indexed where it could be found. With Copilot, all the content you're getting goes through Microsoft.


That is not similar at all, that is not how machine learning works, OMG.


Machine learning is not important to this line of argument. We are talking about the legal responsibility of a tool.


Pirate Bay couldn't be used to do anything but infringe copyright, practically. That's not true for Copilot.


Nonsense. It tracked millions of legitimate torrents.


The page surrounding the code in the GitHub UI is a derivative work, isn't it?

It's an html file containing both the licensed code and some other html


It still has attribution.


The relevant part of GitHub's terms of service:

"4. License Grant to Us

We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video.

This license does not grant GitHub the right to sell Your Content. It also does not grant GitHub the right to otherwise distribute or use Your Content outside of our provision of the Service, except that as part of the right to archive Your Content, GitHub may permit our partners to store and archive Your Content in public repositories in connection with the GitHub Arctic Code Vault and GitHub Archive Program."

https://docs.github.com/en/site-policy/github-terms/github-t...

I don't think these terms allow using content for Copilot.


It's served under the terms of my licenses when viewed on GitHub. Both attribution and licenses are shared.

This is like saying GitHub is free to do whatever they want with copyrighted code that's uploaded to their servers, even use it for profit while violating its licenses. According to this logic, Microsoft can distribute software products based on GPL code to users without making the source available to them in violation of the terms of the GPL. Given that Linux is hosted on GitHub, this logic would say that Microsoft is free to base their next version of Windows on Linux without adhering to the GPL and making their source code available to users, which is clearly a violation of the GPL. Copilot doing the same is no different.


Then GitHub should make sure that people only upload stuff they are the copyright owner of… which it has never done, warned about, or tried to enforce.


> It is not directly using your code any more than programmers are using print statements. A book can be copyrighted, the vocabulary of language cannot. A particular program can be copyrighted, but snippets of it cannot, especially when they are used in a different context.

So what? Why shouldn't we update the rules of copyright to catch up to advances in technology?

Prior to the invention of the printing press, we didn't have copyright law. Nobody could stop you from taking any book you liked, and paying a scribe to reproduce it, word for word, over and over again. You could then lend, gift, or sell those copies.

The printing press introduced nothing novel to this process! It simply increased the rate at which ink could be put to pages. And yet, in response to its invention, copyright law was created, that banned the most obvious and simple application of this new technology.

I think it's entirely reasonable for copyright law to be updated, to ban the most obvious and simple application of this new technology, both for generating images, and code.


> Your code is not in that thing. That thing has merely read your code and adjusted its own generative code.

Completely incorrect. False dichotomy. It's widely known that AI can and does memorize things just like humans do. Memorization isn't a defense to violating copyright, and calling memorization "adjusting a generative model" doesn't make it stop being memorization.

If you memorized Microsoft's code in your brain while working there and exfiltrated it, the fact that it passed through your brain wouldn't be a defense. Substituting "generative model" for "brain" and the fact that it's a tool used by third parties doesn't change this.




> but snippets of it cannot

Yeah they can, and the whole functions that Copilot spits out are quite obviously covered by copyright.

> especially when they are used in a different context.

That doesn't matter.


It is essentially a weighted sum of your code and other copyright holders' code. Do not let the mystique of AI fool you. Copilot does not learn, it glues.


I agree.

If I read JRR Tolkien and then go and write a fantasy novel following an unexpected hero on his dangerous quest to undo evil, I haven't infringed, even if I use some of Tolkien's better turns of phrase.


Games aren't even allowed to use the word "hobbit" without paying royalties. I'm sure you have no idea what you're talking about.


Hmm. Are you sure that's true?


What a long-winded article on what has been known to scientists for decades as "emergence". Emergent properties are system-level properties that are not obvious or predictable from the properties of individual components. Observing one ant is unlikely to tell you that several of these creatures can build an anthill.


Your comment was very puzzling to me, as I couldn't figure out what kind of misunderstanding about this article would prompt a comment such as this. But finally a possibility occurred to me: perhaps you think the point of this article was simply to say that there exist "systems that defy detailed understanding". It is possible that one could think that, if one went in with preconceived expectations based only on title of the post. (But this is a very dangerous habit in general, as outside of personal blogs like this one, almost always headlines in publications aren't chosen by the author.)

But we all know such systems already: for instance, people! No, this post is a supplement/subsidiary to the previous one ("Computers can be understood" — BTW here's another recent blog post making the same point: https://jvns.ca/blog/debugging-attitude-matters/), carving out exceptions to the general rule, and illustrating concretely why these are exceptions (and what works instead). It is useful to the practitioner as a rule-of-thumb for having a narrow set of criteria for when to avoid aiming to understand fully (and alternative strategies for such cases). Otherwise, it's very easy to throw up one's hands and say "computers are magic; I can't possibly understand this".

(The point of the article here is obvious from even just the first or last paragraphs of the article IMO.)


Yes, but to a lot of people that sounds like a lot of woo-woo. What this article does is explain it in a clear and persuasive way to the people in a particular field.

The fact that you didn't pick this up leads me to think you are more interested in being smart than helpful, but perhaps I am wrong about that.


$5 per month for sync? Really? Are you kidding me?

Firefox and Chrome both already sync bookmarks, so why not just let the browser sync them?


I wanted something that would sync across multiple browsers and devices.


And yet, most people do not care.

The elephant in the room is, did the gene editing work? If yes, are those babies protected from HIV?

If yes...the world is not the same anymore.


There is no spin here. The fellow is remorseless and distributed software that he did not own, nor was licensed to distribute, violating the license agreements of both Dell and Microsoft.

No sympathy here. Also, clickbait headline.


The fact that this is a criminal offense is what is well and truly fucked up.


He was making discs designed to intentionally deceive customers into thinking they were genuine discs from Dell and Microsoft, and doing it for profit. Isn't that classic criminal fraud?


The block is indeed in place, but trivially circumvented using a VPN.

Please be careful with your mirror. You are now susceptible to the same lawsuit as the original Project Gutenberg. I am not sure where you are in the world, but Germany has legal assistance treaties with the US/UK/Scandinavia, and all EEC and Schengen countries.

https://www.congress.gov/treaty-document/108th-congress/27/d... https://www.mlat.info/mlat-index


Because of జ్ఞా ?


No.


How often will we reinvent LaTeX ?

