ftyers · on Dec 18, 2023

ftyers · on Dec 6, 2023

[comment removed]

vidarh · on Dec 6, 2023

Frankly this does seem like a massive barrier to me.

It's certainly causing me to lose interest, and I suspect it's driving away a lot of people, not least because it was not at all obvious to me there was some way of speeding up getting a language in the first place.

It was already off-putting not to be given a way to write sentences or record right away.

But now that I know, I have no interest in wasting time contributing to a UI translation I actively don't want to be subjected to, but would happily contribute recordings and sentences on occasion if the language was enabled because the potential for speech recognition and tts utility is entirely separate in value from UI.

This whole approach feels really backwards to me, and the really short list of languages no longer surprises me.

EDIT: I see I actually have had it bookmarked a long time, and presumably lost interest once before due to the lack of my language.

EDIT2: As much as the Norwegian UI is already annoying me and I've already spotted at least one spelling mistake in it, and one translation that is "correct" that thoroughly annoys me, I'll see if I can submit some sentences at least.

yorwba · on Dec 6, 2023

> it was not at all obvious to me there was some way of speeding up getting a language in the first place.

Yeah, that's the biggest failing of Common Voice in my opinion. Getting a new language up to speed could be much improved by simply adding a few links to documentation, but even the existing links are broken, which I reported in March 2022... https://github.com/common-voice/common-voice/issues/3637

> I have no interest in wasting time contributing to a UI translation I actively don't want to be subjected to

Translating the UI may still help you get other people to record, even if you don't want to use it yourself.

> I'll see if I can submit some sentences at least

If you want to go faster, there's also a project to extract sentences from Wikipedia etc. in small doses that Mozilla's and Wikimedia's lawyers have agreed are fair use. I think you'd only need to define how Norwegian Bokmål separates sentences. (E.g. split after a period, but not if the period belongs to a common abbreviation like "etc." in the preceding sentence.) https://github.com/Common-Voice/cv-sentence-extractor
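To make the abbreviation rule concrete, here is a rough Python sketch. It is not the extractor's actual rule format or code, just an illustration of the idea; the function name `split_sentences` and the abbreviation list are made up for the example, and a real Bokmål list would need to be much longer.

```python
# Hypothetical abbreviation list for illustration only; not taken from
# cv-sentence-extractor. Entries are lowercase and include the final period.
ABBREVIATIONS = {"etc.", "osv.", "f.eks.", "bl.a.", "dvs."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split on whitespace-separated tokens that end in . ! or ?,
    unless the token is a known abbreviation."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    # Keep any trailing text that had no closing punctuation.
    if current:
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("Vi kjøpte epler, pærer osv. på torget. Det var fint."))
```

One known weakness of this simple rule: a sentence that genuinely ends with an abbreviation is not split there, so it gets merged with the sentence that follows.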

ftyers · on Dec 6, 2023

> If you want to maximize the utility of a dataset like this, you really would want to let each speaker at least assign a lot of tags/labels to their profile; even if you don't want to deal with the hornet nest of trying to figure out all the distinctions, even unstructured labels would be a start, and ideally allowing people to tag individual recordings as well, because there are a lot more variations than just "language" and "accent" here.

This is exactly what the freeform accent (actually "variant") field is. You can add as many tags as you like. https://foundation.mozilla.org/en/blog/how-we-are-making-com...

vidarh · on Dec 6, 2023

Then the guidance on the site really needs to be updated, as that's not what the help in the profile section says, and starting to type the auto-completing options didn't really give reason to suspect that either.

ftyers · on Dec 6, 2023

Target segment was a way of including specific subdatasets. For example, the digits dataset, which was just the digits 0-9 and yes/no.

ftyers · on Nov 13, 2023

That's what I thought... The price amortises over the length of the marriage, so it actually works out fairly reasonably.

ftyers · on Nov 11, 2023

True, but it's also interesting to see what survived. My wife is a Nahuatl speaker, and some of the stuff in the book on Omens is still part of the culture, e.g. about owls being a sign of death.

ftyers · on Nov 11, 2023

The sad thing is that there isn't really anything new here. It's the Anderson and Dibble translation, and some random extra stuff. For 15 years' work it's quite a limited contribution. In addition, it's not freely licensed. I'm working on a free/open-source licensed edition with linguistic annotation. If anyone is interested, ask for the link, it's on GitHub.

ftyers · on Nov 11, 2023

It's also available online at UNAM, https://temoa.iib.unam.mx/ along with a lot of other texts (see pages beginning with CF "Códice Florentino"), e.g. https://temoa.iib.unam.mx/cf_05_v

asimpletune · on Nov 11, 2023

Isn't it also the "García Garagarza" Spanish-English translation? Or is that what you mean by some extra stuff?

ftyers · on Nov 11, 2023

Yeah, that's what I mean, i.e. it's mostly existing published stuff. The new stuff is some partial summarisation in Eastern Huasteca Nahuatl, and some spoken audio (by EHN speakers), although it's unclear what the audio gains. Without training, it's not really intelligible to most speakers of modern varieties.

belugacat · on Nov 11, 2023

Link?

ftyers · on Nov 11, 2023

Here is an example of what we're producing so far for book 5 "the omens": https://github.com/ftyers/UD_Classical_Nahuatl-FloCo/blob/ma...

The repo is kind of messy, it's mostly me and some students working on it, but we're pretty passionate. Let me know if you'd like to get involved! :)

ftyers · on Nov 11, 2023

Note that the annotation is alignable with images of the original manuscript, which are online at the Library of Congress. I.e. https://www.loc.gov/item/2021667850/

The [orig] contains all the original token and line breaks.

nathancahill · on Nov 12, 2023

That's really cool. I went deep into epigraphy when I lived in Guatemala and was regularly finding carved pottery in our garden beds. Spent a lot of time annotating paper copies of glyphs.

What's the way to view/edit the .conllu files?

ftyers · on Nov 12, 2023

They're generated from a tonne of Python scripts, but feel free to get in contact and I can take you through how it works! :)
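For just looking inside a .conllu file: it's plain text in the standard CoNLL-U layout, so comment lines start with `#`, each token is one line of ten tab-separated columns, and a blank line ends a sentence. A minimal reader could look something like the sketch below. This is not one of the project's scripts; `read_conllu` and the sample sentence are invented for illustration, and multiword-token and empty-node lines are simply kept as ordinary rows.

```python
# The ten standard CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Invented English sample, built with explicit tabs so it survives copy/paste.
SAMPLE = "\n".join([
    "# text = Owls hoot.",
    "\t".join(["1", "Owls", "owl", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "hoot", "hoot", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
]) + "\n\n"


def read_conllu(text):
    """Return a list of sentences, each with its comments and token rows."""
    sentences = []
    comments, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if comments or tokens:
                sentences.append({"comments": comments, "tokens": tokens})
                comments, tokens = [], []
        elif line.startswith("#"):
            comments.append(line[1:].strip())
        else:
            tokens.append(dict(zip(FIELDS, line.split("\t"))))
    # Handle a final sentence with no trailing blank line.
    if comments or tokens:
        sentences.append({"comments": comments, "tokens": tokens})
    return sentences


for sentence in read_conllu(SAMPLE):
    for tok in sentence["tokens"]:
        print(tok["id"], tok["form"], tok["upos"], tok["head"], tok["deprel"])
```

Since the files are generated, any hand edits would presumably be overwritten the next time the scripts run, so this is more useful for reading than for editing.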

ftyers · on Nov 9, 2023

What are the main things that USCIS looks at when deciding to approve an LPR application? Do they really require affidavits? How long are applications taking at the moment? And for advance parole too?

proberts · on Nov 9, 2023

Do you mean marriage-based green card applications? These are super easy and fast now (less than 6 months in many cases) and don't require a lot of evidence of the bona fides of the marriage. A few affidavits can be helpful but they're absolutely not required, particularly if the other evidence is solid.

ftyers · on Nov 9, 2023

I meant for work-based ones. E.g. employer sponsored (by public university) ... But yes, with additional spouse dependent. (Both non-citizens, with H1B and H4 respectively)

ftyers · on Nov 7, 2023

Drew Devault: https://drewdevault.com/2018/10/05/Dont-sign-a-CLA.html

ftyers · on Oct 28, 2023

NSF definitely does, and it's often (at least at R1s) >50%.