I guess it depends on the position - if working with people is a regular part of the job, then social skills are part of the role, and an important one at that.
I always bring this back to photoshopping. It's been around forever.
That teacher in the article - if she had been photoshopped instead of "deep faked", would anything have been different?
Just a bit weird to me that people are predicting the AI end of the world with this stuff when it's really just a slightly faster way to do something we've had for ages.
Search engines have a special legal carve-out, but otherwise granting access to browse a site ABSOLUTELY DOES NOT mean you have any rights to take its content and do whatever you want with it. In the US, all works are automatically granted copyright with all rights reserved, and the owner can choose to relax or waive those rights at their discretion - which most blog posts, social media posts, etc. do not.
- Crawl Limitations: Search engines typically adhere to guidelines provided by website owners through the robots.txt file. This file instructs web crawlers on which parts of a website they are allowed to access and index. Website owners can use these instructions to control the extent to which search engines crawl and display their copyrighted content.
- Indexing vs. Displaying: Search engines primarily index web pages to create a searchable database of information. They do not generally host or display full copyrighted content directly. Instead, search results usually provide brief snippets, page titles, and links that direct users to the original source. This approach aims to respect copyright by driving traffic to the copyright holders' websites.
- Fair Use Considerations: In some cases, search engines may display limited portions of copyrighted content under the fair use doctrine, which allows for the limited use of copyrighted material for purposes such as commentary, criticism, news reporting, or educational purposes. The application of fair use can be subjective and depends on the specific circumstances of each case.
Replace "search engine" with "LLMs", it's (practically) the same.
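To illustrate the robots.txt mechanism described above, here's a minimal sketch of how a well-behaved crawler checks permissions, using Python's stdlib `urllib.robotparser`. The robots.txt content, site URL, and bot name are made up for the example:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish
robots_txt = """
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)  # normally: parser.set_url("https://example.com/robots.txt"); parser.read()

# A well-behaved crawler checks before fetching each URL
print(parser.can_fetch("*", "https://example.com/articles/1"))  # allowed
print(parser.can_fetch("*", "https://example.com/private/x"))   # disallowed
print(parser.can_fetch("ExampleBot", "https://example.com/"))   # this bot is fully blocked
```

Note that robots.txt is purely advisory: nothing technically stops a crawler from ignoring it, which is part of why the LLM-scraping debate exists at all.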
> I care about longevity, will I get local .md files so I can easily back up and have them to move to other programs if the worst were to happen?
Yes, you can easily download PDF, HTML and MD files.
> Do you have a tagging system? Hard to see in the screenshot.
No, not yet - this is only our second MVP and public beta. We have a tagging system planned, though, because it's important for our social blog features.
> What about linking notes? To be honest, joplin sucks a bit about this.
Not yet, because we want to avoid too much complexity on the UI side, but yes, the features will grow as the app grows :)
> How do you pay for hosting? Joplin has Joplin Cloud, and it's a couple bucks a month. How do you make it free, especially in the long run?
We have sponsored hosting from Vercel because the project is FOSS; all I need to cover is the Firebase side.
We did talk about introducing paid plans, but that's the distant future for now. We'll adjust according to the feedback we get, and it also depends on how people use the app.
That little stinger at the end was not as surprising as they thought it was :P
It's very cool tech, but it's far from transparent. It has a very obvious "autotune"-like sound to it that jumps right out. When they edited that one word, it was obvious it had been edited.
Again, super cool tech, just not going to replace voice actors or anything.
For me it's more like I wake up and check whether humans have been replaced yet. Oh good, it's another day that I don't have to share one-time pads with my mother to ensure that I'm talking to her and not a simulant performing fraud on a massive scale.
Imagine when they have robots indistinguishable from humans, with some sort of real-skin face that can be morphed into any existing or nonexistent face. I mean, it almost seems easy, or inevitable at least.
Most of the time you want to confirm that you're talking to someone from a given context -- they own a specific Twitter account, or you met them at a party last week, or they sent you an email or were present in a meeting that you want to have a conversation about.
Government ID doesn't help much with those -- it's actually the thing that is not strong enough.
Yes and it's going to be done through digital IDs. Unless something dramatic happens, we're poised to turn to digital IDs linked to your real ID and in turn validating access to apps/communication.
And authoritarians everywhere will rejoice (and they will give out the means to duplicate these IDs to a select few in case they need to generate evidence that 'you' have offended the state).
The only sensible approach to this problem (assuming it is a real problem) is trusted individuals certifying others as human.
There are alternatives to using the state for this, but they are difficult and fraught with UX issues. Perhaps a decentralised web of trust, or some sort of blockchain-based registrar of trust that can trace trust routes between mutually distrusting individuals.
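The "trace trust routes" idea can be sketched as a toy graph search (all names and the trust graph here are invented for illustration): each person certifies the people they have personally verified, and two mutually distrusting strangers are connected if a chain of certifications links them - essentially a breadth-first search over the trust graph, much like PGP's web of trust:

```python
from collections import deque

# Hypothetical trust graph: who has personally verified (certified) whom.
trust = {
    "alice": ["bob", "carol"],
    "bob":   ["dave"],
    "carol": ["erin"],
    "dave":  ["frank"],
    "erin":  [],
    "frank": [],
}

def trust_path(graph, start, target):
    """Return a chain of certifications from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in graph.get(path[-1], []):
            if nxt == target:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Alice has never met Frank, but a certification chain links them.
print(trust_path(trust, "alice", "frank"))  # ['alice', 'bob', 'dave', 'frank']
```

The hard part isn't the graph search, of course - it's getting honest certifications and sensible UX at scale, which is exactly where these schemes have historically struggled.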
Unless such a system is in place, international and strong before states start playing in this space, there isn't much chance of beating a state's approach to the problem.
Just look at https certificates. The current system involves browsers shipping configured to trust a whole bunch of entities I don't really trust, and there has been relatively little interest in trying to build a working decentralised approach to site security.
I'll also add that some of the state proposals actually have their priorities in a good place, such as prioritizing working in the open: sharing research, working with standards groups, and building open source tooling.
It would be nice if we could come to a thorough solution that actually covers all the bases, rather than all these companies creating their own digital ID services that just encourage us to do silly things like photographing our ID front and back.
I mean hell it's taken like 20 years for privacy by design to become an ISO standard? That sort of timeline is not something we can really tolerate as more and more people continue relying on online services and in turn wind up trusting horribly outdated techniques/general malaise about data.
To me it has a very obvious "Hindi is my native language" accent. I mean, after literally the first sentence: "The research team at Meta is excited to share our work...". Ouch. The "our work": just ouch. I was wondering why it wasn't a native English speaker presenting the video, when the video is precisely about generating speech.
The first seven seconds are particularly bad.
Don't get me wrong: I've got a lovely french accent when I speak english.
This has either been trained on too many audiobooks spoken by non-natives or they've used their own tech, where the "reference audio" given as input was from a non-native.
In any case something is seriously off.
At 1:59, the "Hi guys, thank you for tuning in! Today we are going to show you..."... That is obviously a Hindi speaker speaking (it's an example of fixing a real voice by removing background sounds).
I think the main voice of the video was done by the same person who did the example at 1:59, and that they used it as their example of providing a "reference audio".
And that person ain't a native English speaker.
To compare: when the reference audio uses a proper English accent (the example with the "diverse ecosystem" at 0:52), the output from the text-to-speech sounds native.
I think they just fucked up the demo video; the tech itself may already be ready for prime time.
Maybe they deliberately chose an accent that wasn't native English to demonstrate the style transfer capability. I think the ability of the system to output accented voices is a strength not a weakness, so long as it can do other accents too.
I'm surprised you had such a negative reaction to the Hindi accent! To me, it was no more difficult to understand than my colleagues who speak English as a second language.
To me, this is a style choice for the demo. Not evidence that they "fucked" it up. Accents are common - everyone has one! It's nice to see the model can support your personal voice even if it's not completely neutral English.
>Nonetheless, a form of speech known to linguists as General American is perceived by many Americans to be "accent-less", meaning a person who speaks in such a manner does not appear to be from anywhere in particular. The region of the United States that most resembles this is the central Midwest, specifically eastern Nebraska (including Omaha and Lincoln), southern and central Iowa (including Des Moines), parts of Missouri, Indiana, Ohio and western Illinois (including Peoria and the Quad Cities, but not the Chicago area).
> Nonetheless, a form of speech known to linguists as General American is perceived by many Americans to be "accent-less"
TLDR: "neutral English" is like "neutral water temperature" - it feels neither hot nor cold because it matches one's body temperature. It's subjective, and calling it "temperatureless water" would be even less accurate.
I'd put emphasis on "perceived" and "American" in that statement, and also note that this is limited to regional accents: General American is unambiguously American. Similar to General American, many countries have developed a "Newscaster" accent, e.g. Received Pronunciation for Britain, but it's not considered neutral as it is the "upper class" accent.
In every language I've known well enough to distinguish accents, I've realized newscasters adopt a distinct accent/cadence that's not commonly used. But I wouldn't call it "accentless" - it's just another accent that may/may not have evolved from a culturally dominant regional accent (or dominant figure from a specific region.)
The accent was obvious enough that I wonder if they might not have been trying to hide it at all. Maybe they just happened to pick somebody from the team with a very mild accent.
The accent was part of the show. They demonstrated how to create an accent from scratch: sample a voice in the accent's original language (Hindi, French) and then have that voice read text in the target language (English). Voilà, accent.
For video narration elocution, I'd say it was most of the way there.
When. Narrating. Videos. One. Tends. To. Speak. Differently.
Or, the more important case -- if I'm listening to audio-version-of-X, is it sufficiently human-like that I can forget that it's synthesized voice?
To me, yes.
Easy to tell if you're specifically listening for it, but to use an analogy one doesn't typically read novels and parse closely for grammar, does one? Your attention is elsewhere, on the content and plot.
In my opinion it isn't exactly great. It doesn't show much creativity. E.g., I asked it to make a club dance song but with classical instruments as well, and it was unable to do that. It's mono and can often be out of tune.
We will see how it improves, but this certainly isn't taking away musicians' jobs in its current state.
Those examples are really hard to listen to, except for maybe the techno/dancebeat ones.
Part of the problem is the awful sound; it would be much more interesting to have scores generated by AI and then have musicians record them. As it is, you can't really tell why it sounds so awful.
Give it 3 months. We've gone through this already with generative image AI. Another vibe I'm getting from reading these comments is that musicians clearly have discerning taste. Unfortunately, I don't think the public really does. I certainly don't. All these samples sound "good" to me.
OpenAI's Jukebox, which is maybe two years old now, is musically much more creative and interesting. Its sound quality is horrendous and it's amusingly unstable, but the density of good ideas in what it produces is orders of magnitude better than this.