canadianwriter's comments

I guess it depends on the position - if you have to work with people as a regular part of the job, social skills would be part of the role, and an important one at that.


While it doesn't perfectly match actual usage, interest has dropped MASSIVELY.

https://trends.google.com/trends/explore?date=today%201-m&q=...

It's kind of interesting to see stuff like that.


If they had told the parents the images were photoshopped instead of AI-generated, would that have changed anything?

Curious whether changing the language used changes how people think of it...


I always bring this back to photoshopping, which has been around forever.

The teacher in the article - would anything have been different if she had been photoshopped instead of "deep faked"?

It's just a bit weird to me that people are predicting the AI end of the world over this stuff when it's really just a slightly faster way to do something we've had for ages.


Photoshop doesn't create photorealistic live video of you. Even voices can be matched.

"Slightly" faster is vastly underselling it, deepfakes can be rendered in real time now. The same tech powers TikTok filters.


Not sure what you are implying here - just because something is free doesn't mean you can use it in a commercial product...


What about search engines?

If you post something to the public internet, you lose privacy ... that's how the internet works.

For this we have robots.txt and authentication ... if a site allows you to browse its content, it's free to take, whatever the purpose.
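For illustration, here's a minimal sketch of what respecting robots.txt looks like from a crawler's side, using Python's standard urllib.robotparser (the bot name and URLs are just placeholders):

    import urllib.robotparser

    # Fetch and parse the site's robots.txt (example.com is a placeholder).
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A crawler identifying itself as "MyBot" checks whether a path is allowed
    # before fetching it; note robots.txt is advisory, not an access control.
    if rp.can_fetch("MyBot", "https://example.com/private/page.html"):
        print("allowed by robots.txt")
    else:
        print("disallowed by robots.txt")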


Search engines have a special legal carve-out, but otherwise granting access to browse a site ABSOLUTELY DOES NOT mean you have any rights to take it and do whatever you want with it. In the US, all works are automatically granted copyright with all rights reserved, and the owner can choose to relax or waive those rights at their discretion - which most blog/social media posts, etc. do not.


- Crawl Limitations: Search engines typically adhere to guidelines provided by website owners through the robots.txt file. This file instructs web crawlers on which parts of a website they are allowed to access and index. Website owners can use these instructions to control the extent to which search engines crawl and display their copyrighted content.

- Indexing vs. Displaying: Search engines primarily index web pages to create a searchable database of information. They do not generally host or display full copyrighted content directly. Instead, search results usually provide brief snippets, page titles, and links that direct users to the original source. This approach aims to respect copyright by driving traffic to the copyright holders' websites.

- Fair Use Considerations: In some cases, search engines may display limited portions of copyrighted content under the fair use doctrine, which allows for the limited use of copyrighted material for purposes such as commentary, criticism, news reporting, or educational purposes. The application of fair use can be subjective and depends on the specific circumstances of each case.

Replace "search engine" with "LLMs", it's (practically) the same.


For a heavy Joplin user - can you compare the features? It looks like an awesome UI, but I wanna make sure it'll be good before I jump!

I care about longevity: will I get local .md files so I can easily back them up and move to other programs if the worst were to happen?

Do you have a tagging system? Hard to see in the screenshot.

What about linking notes? To be honest, Joplin sucks a bit at this.

How do you pay for hosting? Joplin has Joplin Cloud, and it's a couple bucks a month - how do you make it free, especially in the long run?


> I care about longevity: will I get local .md files so I can easily back them up and move to other programs if the worst were to happen?

Yes, you can easily download PDF, HTML and MD files.

> Do you have a tagging system? Hard to see in the screenshot.

No, not yet, since this is our second MVP and public beta. We do have a tagging system planned though, because it's important for our social blog features.

> What about linking notes? To be honest, Joplin sucks a bit at this.

Not yet, because we want to avoid too much complexity on the UI side, but yes, the features will grow as the app grows :)

> How do you pay for hosting? Joplin has Joplin Cloud, and it's a couple bucks a month - how do you make it free, especially in the long run?

I have sponsored hosting from Vercel because the project is FOSS. All I need to do is take care of the Firebase side. We did talk about introducing paid plans, but that's the far future for now; we'll adjust according to the feedback we get, and it also depends on how people are using the app.


like... a pin?


In the US, some phone companies have been using voice recognition to authenticate customers when they call. That will definitely have to come to an end.


HMRC in the UK also uses it. It has never worked once for me, and I don't even have any kind of accent.


That little stinger at the end was not as surprising as they thought it was :P

It's very cool tech, but it's far from transparent. It has a very obvious "autotune"-like sound to it that jumps right out. When they edited that one word, it was obvious it had been edited.

Again, super cool tech, just not going to replace voice actors or anything.


For me it's more like I wake up and check if humans have been replaced yet. Oh good, it's another day that I don't have to share one-time pads with my mother to ensure that I'm talking to her and not a simulant performing fraud on a massive scale.


Imagine when they have robots indistinguishable from humans, with some sort of realistic skin face that can be morphed into any existing or non-existing face. I mean, it almost seems easy, or inevitable at least.


Cryptographic proof of personhood is going to be a thing, is it not? Outside of BigTech, Signal is as poised as WorldCoin to be just that.


I’m just not convinced that anything not tied directly to a government issued ID is going to be strong enough.


Most of the time you want to confirm that you're talking to someone from a given context -- they own a specific Twitter account, or you met them at a party last week, or they sent you an email or were present in a meeting that you want to have a conversation about.

Government ID doesn't help much with those -- it's actually the thing that is not strong enough.


Yes, and it's going to be done through digital IDs. Unless something dramatic happens, we're poised to turn to digital IDs linked to your real ID, which in turn validate access to apps/communication.


And authoritarians everywhere will rejoice (and they will give out the means to duplicate these IDs to a select few in case they need to generate evidence that 'you' have offended the state).


The only sensible approach to this problem (assuming it is a real problem) is trusted individuals certifying others as human.

There are alternatives to using the state for this, but they are difficult and fraught with UX issues. Perhaps a decentralised web of trust, or some sort of blockchain-based registrar of trust that can trace trust routes between mutually distrusting individuals.
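To make "trace trust routes" concrete, here's a minimal sketch in Python with a made-up attestation graph - in a real web of trust each edge would be a signed attestation verified against the certifier's public key:

    from collections import deque

    # Toy web of trust: each person maps to the people they have
    # personally certified as human. All names are made up.
    attestations = {
        "alice": ["bob", "carol"],
        "bob": ["dave"],
        "carol": ["dave", "erin"],
        "dave": ["frank"],
    }

    def trust_path(source, target):
        # Breadth-first search for a chain of attestations from source to target.
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                return path
            for nxt in attestations.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # no chain of trust exists

    print(trust_path("alice", "frank"))  # ['alice', 'bob', 'dave', 'frank']

The hard part isn't the graph traversal, it's the UX of getting those attestations made and revoked honestly.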

Unless such a system is in place, international and strong before states start playing in this space, there isn't much chance of beating a state's approach to the problem.

Just look at HTTPS certificates. The current system involves browsers shipping preconfigured to trust a whole bunch of entities I don't really trust, and there has been relatively little interest in trying to build a working decentralised approach to site security.


I'll also add that some of the state proposals have their priorities in quite a nice place, such as prioritizing working in the open: sharing research, working with standards groups, and building open-source tooling.

It would be nice if we could come to a thorough solution that actually covers all the bases, rather than all these companies creating their own digital ID services that just encourage us to do silly things like photographing our ID front and back.

I mean, hell, it's taken like 20 years for privacy by design to become an ISO standard. That sort of timeline is not something we can really tolerate as more and more people rely on online services and in turn wind up trusting horribly outdated techniques and a general malaise about data.


> It has a very obvious "autotune"

To me it has a very obvious "Hindi is my native language" accent. I mean, after literally the first sentence: "The research team at Meta is excited to share our work...". Ouch. The "our work": just ouch. I was wondering why it wasn't a native English speaker presenting the video, when the video is precisely about generating speech.

The first seven seconds are particularly bad.

Don't get me wrong: I've got a lovely French accent when I speak English.

This has either been trained on too many audiobooks spoken by non-natives or they've used their own tech, where the "reference audio" given as input was from a non-native.

In any case something is seriously off.

At 1:59, the "Hi guys, thanks you for tuning in! Today we are going to show you..."... That is obviously an Hindi speaker speaking (it's an example of fixing a real voice by removing background sounds).

I think that the main voice of the video was done by the same person who did the example at 1:59. And I think that they used their example of using a "reference audio".

And that person ain't a native english speaker.

To compare: when the reference audio uses a proper english accent (the example with the "diverse ecosystem" at 0:52), then the output from the text-to-speech sounds native.

I think they just fucked the demo video and it may already be ready for prime time.


Maybe they deliberately chose an accent that wasn't native English to demonstrate the style transfer capability. I think the ability of the system to output accented voices is a strength, not a weakness, so long as it can do other accents too.


I'm surprised you had such a negative reaction to the Hindi accent! To me, it was no more difficult to understand than my colleagues who speak English as a second language.

To me, this is a style choice for the demo. Not evidence that they "fucked" it up. Accents are common - everyone has one! It's nice to see the model can support your personal voice even if it's not completely neutral English.


> It's nice to see the model can support your personal voice even if it's not completely neutral English

There is no such thing as "neutral" English.


>Nonetheless, a form of speech known to linguists as General American is perceived by many Americans to be "accent-less", meaning a person who speaks in such a manner does not appear to be from anywhere in particular. The region of the United States that most resembles this is the central Midwest, specifically eastern Nebraska (including Omaha and Lincoln), southern and central Iowa (including Des Moines), parts of Missouri, Indiana, Ohio and western Illinois (including Peoria and the Quad Cities, but not the Chicago area).


> Nonetheless, a form of speech known to linguists as General American is perceived by many Americans to be "accent-less"

TLDR: "neutral English" is like "neutral water temperature" - it feels neither hot not cold because it matches ones body temperature. It's subjective, and terming it "temperatureless water" is even less accurate.

I'd put emphasis on "perceived" and "American" in that statement, and also note that this is limited to regional accents: General American is unambiguously American. Similar to General American, many countries have developed a "Newscaster" accent, e.g. Received Pronunciation for Britain, but it's not considered neutral as it is the "upper class" accent.

In every language I've known well enough to distinguish accents, I've realized newscasters adopt a distinct accent/cadence that's not commonly used. But I wouldn't call it "accentless" - it's just another accent that may/may not have evolved from a culturally dominant regional accent (or dominant figure from a specific region.)


The accent was obvious enough that I wonder if they might not have been trying to hide it at all. Maybe they just happened to pick somebody from the team with a very mild accent.


The accent was part of the show. They demonstrated how to create an accent from scratch: sample a voice in the accent's original language (हिंदी, français) and then have that voice read text in the target language (English). Voilà, accent.


For video narration elocution, I'd say it was most of the way there.

When. Narrating. Videos. One. Tends. To. Speak. Differently.

Or, the more important case -- if I'm listening to audio-version-of-X, is it sufficiently human-like that I can forget that it's synthesized voice?

To me, yes.

Easy to tell if you're specifically listening for it, but to use an analogy one doesn't typically read novels and parse closely for grammar, does one? Your attention is elsewhere, on the content and plot.


SPOILER ALERT :p

It's more surprising nowadays if an article about AI doesn't have a "twist" that what you read/heard/saw was AI.


I agree with you that v1 isn't a suitable replacement... right now. But what about v2? v3?


I don't think GP is saying it won't improve. The thread is about the current state of it, which is what Meta wrote this article about.


What stinger at the end?


Not releasing code or weights under the false pretense of misuse.


No, they are referring to the end of the video, where they have a “surprise twist” that the video narration was autogenerated and not a real person.


I mean that wasn’t even slightly non-obvious.


False? The misuse opportunities are obvious.


Reproducing the code is a matter of time, and a short one at that.


It has buttons you can push! Seriously, for an instrument, the feel of the buttons is kind of a big deal.

This is meant for live "playing" as well as recording, so the feel of the buttons rather than a touch screen is important.

You should check out videos of people using the original Push as a live instrument - it can take serious talent.


Non-clickbait link: https://aitestkitchen.withgoogle.com/experiments/music-lm

It's part of AI Test Kitchen; if you were already part of that for playing with Bard and stuff, you just need to log in and give it a go.

If not, you'll have to get on the waitlist. When I asked I got access within a day, but it might be much slower now with so much interest.

Here are a bunch of examples of the music: https://google-research.github.io/seanet/musiclm/examples/

In my opinion it isn't exactly great. It doesn't show much creativity. E.g., I asked it to make a dance song like you'd hear in a club, but with classical instruments as well - it was unable to do that. It's mono and can often be out of tune.

We will see how it improves, but this certainly isn't taking away musicians' jobs in its current state.


Those examples are really hard to listen to, except for maybe the techno/dancebeat ones.

Part of the problem is the awful sound; it would be much more interesting to have scores generated by AI and then have musicians record them. As it is, you can't really tell why it sounds so awful.


Yes, I actually found that painful to listen to. Interesting to see where it goes, but for this erstwhile musician it's like fingernails on a chalkboard.


Give it 3 months. We've gone through this already with generative image AI. Another vibe I'm getting from reading these comments is that musicians clearly have discerning taste. Unfortunately, I don't think the public really does. I certainly don't. All these samples sound "good" to me.


Yeah, I'm sure it will get better, but if you have a musical ear this is pure torture.


OpenAI's Jukebox, which is maybe two years old now, is musically much more creative and interesting. Its sound quality is horrendous and it's amusingly unstable, but the density of good ideas in what it produces is orders of magnitude better than this.

