Hacker News | ivanpashenko's comments

Sounds pretty advanced! Can you share some examples of deterministic dimensions for scoring?

And what about LLM scoring: does the LLM output just passed/not_passed, or something more?


So for context, our app (https://nativi.sh) is a language correction app. It takes in text and cleans it up to make it sound more fluent/correct; it's basically geared towards being Grammarly for your second language.

For some of our deterministic LLM tests, we have inputs that have known spelling errors but no wrong-word errors, or some other combination of errors. If the config under test doesn't identify the issue, or identifies issues that we know aren't there, then it's marked as being wrong for that test case. Then we can test across config x language x kind_of_error.
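A minimal sketch of how such a deterministic check could be wired up. This is not the actual nativi.sh framework; the case format, the `run_suite` and `score_case` names, and the shape of the detector callables are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical test cases: each one lists the error kinds known to be present.
CASES = [
    {"language": "en", "text": "I recieved teh package.", "expected": {"spelling"}},
    {"language": "en", "text": "I received the package.", "expected": set()},
]


def score_case(expected, detected):
    """Pass only on an exact match: no missed issues, and no issues
    reported that are known not to be in the input."""
    return set(detected) == set(expected)


def run_suite(configs, cases):
    """Tally [passed, total] per (config, language, kind_of_error).

    configs: assumed mapping of config name -> callable(text, language)
             returning the set of error kinds that config reported.
    """
    results = defaultdict(lambda: [0, 0])
    for name, detect in configs.items():
        for case in cases:
            detected = detect(case["text"], case["language"])
            ok = score_case(case["expected"], detected)
            # Clean inputs are bucketed under the placeholder kind "none".
            kinds = case["expected"] or {"none"}
            for kind in kinds:
                key = (name, case["language"], kind)
                results[key][1] += 1
                results[key][0] += int(ok)
    return dict(results)
```

In a real setup each detector would wrap a call to the model with a given prompt/config; here it can be any function, which keeps the scoring itself fully deterministic.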

For the LLM vibe-driven scoring, we have it set up to just do a head-to-head between the current leading config (usually what's in prod) and the new candidate config rather than generating an abstract score. It will flag "x config straight up failed question N based on some_reason(s)" so that we can manually check it.
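A rough sketch of that head-to-head loop, assuming the judge is a callable that returns a small dict. The `compare` function, the `HeadToHead` class, and the verdict keys (`winner`, `failed`, `reasons`) are hypothetical, not the real framework's interface; in practice the judge would wrap an LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class HeadToHead:
    # Win counts per side, plus human-readable flags for manual review.
    wins: dict = field(
        default_factory=lambda: {"leader": 0, "candidate": 0, "tie": 0}
    )
    flags: list = field(default_factory=list)


def compare(questions, leader, candidate, judge):
    """Run both configs on every question and let the judge pick a winner.

    judge(question, leader_output, candidate_output) is assumed to return
    {"winner": "leader" | "candidate" | "tie",
     "failed": [sides that outright failed], "reasons": [strings]}.
    """
    report = HeadToHead()
    for n, question in enumerate(questions, start=1):
        verdict = judge(question, leader(question), candidate(question))
        report.wins[verdict["winner"]] += 1
        reasons = "; ".join(verdict.get("reasons", []))
        for side in verdict.get("failed", []):
            # Flag outright failures so a person can check them by hand.
            report.flags.append(f"{side} config failed question {n}: {reasons}")
    return report
```

Comparing against the current leader avoids having to calibrate an absolute score; the flags list is what would get reviewed manually.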

My partner wrote the testing framework. She's been thinking about cleaning it up and open sourcing it.


"7 likes / no comments" --> should I read it as: people are interested in other people's experience, but have nothing to share about their own? - No prompts in production? - No testing or other routines around them yet?

Please share your current status :)


Just launched http://ineedicons.com (custom-made outline icons). Will see soon if it has legs.


And how is it going? Do people use it?


It is going well! (At least, I'm quite happy).

Between HN and Product Hunt on the same day we took in about 8,000 unique visitors, which resulted in over 350 signups.

Big drop-off since that early traffic spike, but it looks like 40% of my traffic is still returning users. So that is interesting...

That traffic spike exposed a few big bugs which we closed this week, and now I'm figuring out next steps (marketing automation, more user acquisition, increasing sharing/virality).

Also, the more users I talk to the more I understand their use cases.

All in all, I'm loving it despite juggling this and my day job :-)


How do you send it, via email?


Which YCF debacle do you mean?



Got it, thx!


Nice. How does it work?


Kinda like my own personal HN (lol)... I post links with hashtags in the title. I added a login but you can post anonymously. I just haven't worked on it in ages.


Everyone who writes to your public email has to pay. So basically new contacts. For friends and people you know, you use your private email.

This is what a solution could look like: http://wrte.io


I could see it being used. Lots of people get hundreds of emails a day being pitched to or asking for help. Charging people (even a small amount) might get people to put more effort into their emails.


My bad. Is the question not clear enough?


instead of not charging (like it is now).

