jansenderr's comments | Hacker News

Loman AI | Founding Engineer | Austin, TX | Hybrid | Full-Time | $175k - 225k + equity | https://loman.ai

-----------------------------------------------------------------------------------------------------------------------

We are building the best-in-class phone AI for restaurants, enabling autonomous agents to handle orders, reservations, and customer inquiries. As a founding engineer, you’ll shape our core product while working directly with cutting-edge LLMs, developing third-party integrations, and designing scalable APIs.

Unfortunately, we cannot offer visa sponsorship.

Stack: nodejs, typescript, postgres, fastify, docker, redis, react

-----------------------------------------------------------------------------------------------------------------------

Apply at jansen@loman.ai


join us!


Loman AI | Software Engineer | Austin, TX | Hybrid | Full-Time | https://loman.ai

--------------------------------------

We are building the best-in-class phone AI for restaurants, enabling autonomous agents to handle orders, reservations, and customer inquiries.

Hiring for multiple positions with experience in:

* AI Agents

* Integrations

* Cross-platform (react native)

Unfortunately, we cannot offer visa sponsorship.

Stack: nodejs, typescript, openai, postgres, fastify, docker, redis, react, react native, tailwind

--------------------------------------

Apply at jansen@loman.ai


Loman AI | Founding Engineer | Austin, TX | Hybrid | Full-Time | $125-175k + 0.5-1.5% equity | https://loman.ai

We are building the best-in-class phone AI for restaurants, enabling autonomous agents to handle orders, reservations, and customer inquiries. As a founding engineer, you’ll shape our core product while working directly with cutting-edge LLMs, developing third-party integrations, and designing scalable APIs.

Unfortunately, we cannot offer visa sponsorship.

Stack: nodejs, typescript, openai, postgres, fastify, docker, redis

Apply at jansen@loman.ai


There seem to be many ways to vary this tech stack. What is the logic behind each of the decisions you made?


That's a long story. Any project you're currently working on that our decision-making process could help with?


Trying to build a real-time telephone agent. So the same requirements as your project, except going through Twilio, I would think.


Really? I think you are being extremely unfavorable to the authors here. They are simply stating that the blood pressure results went down, but in a non-significant way. There is nothing wrong with that, and they are not making claims either way; they are just reporting the results of the study and literally describing what happened. Please enlighten me as to what claims they are allegedly making. That the blood pressure decreased in a non-significant way?


"Non-significant" in scientific means "within the error bars". It means that it is indistinguishable from randomness.


That's a bit of an oversimplification that leaves out important qualifiers. "Non-significant" usually means it's indistinguishable from a null hypothesis when a certain level of randomness is allowed in a single, isolated trial.

Who picks the null hypothesis? How is it picked, i.e. why does that specific hypothesis get favourable treatment? What level of randomness should one allow? What does it even mean for an experiment to be a single, isolated trial? How can anything be?

Those are critical questions to understand the concept, and your explanation just pretends they don't exist.


You may be technically correct, but in real life where the rest of us live, it means this study can't be used to draw any conclusions.


That's another fallacy of frequentist reasoning: that we have to draw definitive conclusions from evidence, that something is definitely false until we have "statistical significance", at which point it all of a sudden becomes definitely true.

In real life, to borrow your description, we can hold varying levels of belief in statements depending on how strong the evidence is, and the magnitude of the payoff in the various cases.

Maybe the probability that the effect reported in the study in question is real is 51 %. That's still more than 50 %. Whether that difference is meaningful to you is not something someone else can decide.


Nobody who knows what they are doing, and uses statistics, can flip from something being definitely true to definitely false. At best, they can find overwhelmingly convincing probabilities close to 0 or 1.

Honest scientists who use statistics do not make the claim that an effect does not exist. Rather, they say that the experiment that was conducted did not produce sufficient evidence (to a numerically defined standard) to justify believing in the effect.

That is to say, that the existence of the effect, given the results of the experiment, has a low likelihood, and that low likelihood can be statistically quantified.

What that means is that exactly the same results as were observed will, or would, with a high probability, also be observed if the experiment occurs in the null hypothesis universe: the world in which the effect is absent.

So even if we are not in that universe (the effect is real), the experiment didn't show it.

The experiment simply doesn't discriminate between the null hypothesis and its negation to a level that could convince one to hold a probabilistic belief in the existence of the effect.


> the existence of the effect, given the results of the experiment, has a low likelihood, and that low likelihood can be statistically quantified

You have this completely backwards. It means that the likelihood of the null hypothesis was not below some threshold such that it can be "ruled out". It says absolutely nothing about the likelihood of the data if the effect exists.


Of course, but the fact that people apply a binary threshold tells you that they want to be able to rule out some things from their models entirely, and include other things as something that's as good as a true fact.


What does a non-binary threshold look like, and how is it different from just fine-tuning a regular binary threshold to err more or less on the side of caution?


It's not about a non-binary threshold. The problem is having a threshold in the first place.

Say that given the evidence there's a 9 % chance the null hypothesis is true. A frequentist used to a 10 % significance level would then say the effect is true. A frequentist trained on a 5 % significance level would say it is false.

But that's just an arbitrary cutoff that by itself means nothing.

If instead we look at a practical scenario where we would use this result, we understand the problem space better. Maybe we have figured out how to get limited rights to the transpositions of famous music to other A4s, and this would cost a lot to do, but earn us some money if we do it and the effect is real.

Should we acquire those rights or not? Ask the 5 % frequentist and they would say "there's no significant difference, so you shouldn't." Ask the 10 % frequentist and they would say the opposite. Who do we listen to?

Let's ask the poker player. They will ask "What exactly does it cost to do this, and how much will you earn if it works out? That matters!"

So let's say it costs us $100 per song to get the rights to the transposed version, and we think it will earn us $102 per song if the effect is true. Now we can just plug in, remembering that there's a 9 % chance the null hypothesis is correct:

-100 + 102 * 0.91 = -7.18 per song

Not good. What if we made $127 per song with the same cost?

-100 + 127 * 0.91 = +15.57 per song

Worth doing, at the same probability of the null hypothesis!

In other words, you can't determine what's significant until you know how you will use the result.

Statistics by itself is meaningless. It gains meaning only when it's used to choose between actions.
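
As a rough sketch of that expected-value calculation in Python (the $100 cost, the $102 and $127 payoffs, and the 9 % null probability are just the illustrative figures above, not real numbers):

    # Expected value per song of buying the rights, given an assumed
    # probability that the 432 Hz effect does not exist (the null).
    def expected_value_per_song(cost, payoff_if_real, p_null):
        p_real = 1 - p_null
        return -cost + payoff_if_real * p_real

    print(expected_value_per_song(100, 102, 0.09))  # about -7.18: not worth it
    print(expected_value_per_song(100, 127, 0.09))  # about +15.57: worth it

The decision flips at the same 9 % probability of the null; only the payoff changed.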


> Should we acquire those rights, or not?

That's a binary decision, which is bad, so we shouldn't.

In accordance with the confidence probability, we should buy into a percentage of the rights, and transpose the music to an interpolated value between A=432 and A=440 Hz.


Huh, yeah, you might be right!


This is not about fallacy or frequentism.

You badly misunderstand.


The null hypothesis, in a nutshell, is the proposition that the effects which the experiment is designed to look for do not exist.

The obvious and only possible null hypothesis in this situation is that tuning from 432 to 440 Hz does absolutely diddly squat to the listener's physiology.


(Next day reply, after reading this discussion a couple of times)

> "432 Hz tuned music was associated with a slight decrease of mean (systolic and diastolic) blood pressure values (although not significant)"

I can see why this line is fine to some and bothersome to others.

Strictly, it's just describing their results. No problem. The numbers are what they are.

On the other hand, why draw attention to the difference in means, when they are about to tell you not to take it very seriously?

This version avoids that:

"432 Hz tuned music was not associated with a significant difference in mean (systolic and diastolic) blood pressure values"

Maybe it's my age, origin, or personality, but I prefer this version.


"Not significant" means that the probability is >=5% their result was obtained by chance.

We've settled as a community on a convention that we don't claim an effect is real until it is supported by data ("statistically significant"), i.e. less than 5% likely to be explained by chance in your results.

"Significant" does not mean big or important in this context. It means better than 5% unlikely to be (un)lucky data.


The threshold for significance lies in the eye of the beholder. A particle physicist might not be satisfied with anything over 0.01 %. A social scientist might be happy to see 10 %.

The 5 % number you mention is completely arbitrary and often woefully inappropriate.

Look at it from a betting perspective. Can you earn more than 10 × your investment if the null hypothesis is false? Then anything less likely than 10 % is significant.


It's a convention for scientific reporting. Your trades are not bound by this convention.

The parameter value is not arbitrary. It's a convention arrived at after hundreds of years. If it were arbitrary, p=0.999 or p=0.00001 would be just as good. We've settled on p=0.05 being usefully convincing but not crazy demanding to obtain by experiment with noisy measurements.


Null hypothesis testing was invented less than 100 years ago by Fisher, who completely arbitrarily picked 0.05 [0]. That value was not arrived at through wisdom of experience, and certainly not after hundreds of years of practice.

Though it has now indeed become conventional to test with p=0.05, there is nothing wrong with reporting an effect that fails the null hypothesis test. At least that is the position of the American Statistical Association [1].

[0] https://www.cantorsparadise.com/what-is-the-meaning-of-p-val...

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5017929/


Thanks for these refs. I read [1] carefully and I take your point that it’s ok to strictly report whatever the data says.

On the value itself, we are quibbling about the meaning of ‘arbitrary’: Fisher certainly could have chosen another value, but not all values would be considered useful. Some expertise about the nature of real world data and the minds of statisticians is encoded in the chosen value.

If I propose that we change the convention to use 1e-12 instead and you think ‘that’s too small, I prefer it the way it is’, then it’s not arbitrary in the sense I mean.


The thing you seem to be missing is that there's no one number that's a meaningful limit for all purposes.

What probability you accept as significant should depend entirely on how you plan to use the results. Something with a p value of a staggering 70 % (i.e. it's more likely not true than true) is significant if the payoff is good when it's true, and the cost is small when it's not true.

And 70 % is very far from 5 %!

Then again, if the payoff is tiny compared to the cost, you might ask for a p-value of less than 0.01 %, in order for it to make sense to take the chance on it.

Think like a poker player: a hand that has 1/4 chance of winning needs better than a 3-to-1 payout when it wins to be playable. Conversely, when the pot offers you a 3-to-1 payout, you better make sure your hand has more than a 1/4 chance of winning.
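
A tiny sketch of that pot-odds arithmetic (the 1/4 win probability is the number from the example; the function name is just for illustration):

    # Break-even payout: a bet is worth taking when
    # p_win * payout > (1 - p_win) * stake.
    def required_payout_ratio(p_win):
        # Minimum payout-to-stake ratio for a positive expected value.
        return (1 - p_win) / p_win

    print(required_payout_ratio(0.25))  # 3.0: you need better than 3-to-1 to play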


I understand and agree with that. Pascal’s Wager is the extreme example: unlimited upside makes the likelihood irrelevant.


They didn't claim it was real, did they? They just read out the result, which was lower but not in a significant way. I've read hundreds of papers that do the same.


"was lower but not in a significant way"

By convention, this means "indistinguishable from", so reporting that it is lower is an unsupported claim. They would be equally justified in reporting that it was higher, i.e. not at all.


It was lower though, just not significantly so (depending on your threshold for significance) - that's the standard way of reporting it. You can't just chunk out part of the sentence and take issue with it; the sentence in its entirety is accurate.


Yes I agree that the sentence is accurate.

Perhaps I’m too zealous about it, and maybe the conventions vary, but I was trained to avoid using words like ‘lower’ here.


> which was lower but not in a significant way

But if "result [was] lower but not in a significant way" means "result was not proven to be lower", how does saying "lower but not really lower" make ever any sense? It seems to me that such nonsensical formulation ought not to be ever used by anyone.


Because significance thresholds can vary pretty dramatically. Plenty of experiments in physics, for instance, have reported results even though they didn't yet reach the 5-sigma threshold (about 3x10^-7). In physics something can be highly, highly likely but still not 'significant' enough to warrant a discovery claim. They simply couch it as: hey, this was the result, and even though it isn't 'significant', the high likelihood may warrant additional research here. Reporting a binary significant/not-significant is far less useful.


> "Not significant" means that the probability is >=5% their result was obtained by chance.

Ackchually... p-value represents the probability that results like these would be observed even if there was no difference between the two choices, simply due to chance.
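
One way to picture that definition is a quick simulation under the null hypothesis; the group sizes, noise level, and observed difference below are made-up illustration numbers, not values from the study:

    # Monte Carlo picture of a p-value: how often a difference at least as large
    # as the observed one appears when there is truly no effect at all.
    import random

    def simulated_p_value(n_per_group, observed_diff, sigma, trials=20_000):
        hits = 0
        for _ in range(trials):
            a = [random.gauss(0, sigma) for _ in range(n_per_group)]
            b = [random.gauss(0, sigma) for _ in range(n_per_group)]
            diff = sum(a) / n_per_group - sum(b) / n_per_group
            if abs(diff) >= observed_diff:
                hits += 1
        return hits / trials

    # e.g. 16 listeners per group, a 3 mmHg observed drop, 10 mmHg of noise
    print(simulated_p_value(16, 3.0, 10.0))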


The entire point of non-significant is that you cannot say that anything happened.

The authors are making an extremely serious mistake here, and you shouldn't be pushing back.

