joshdickson · 2025-04-03T14:59:15 1743692355

I have tracked my macro intake seriously for years and use the database every day, as do many folks who used the initial app releases. It's actually more valuable to me to have the data in this format, even estimated, because what happens with other apps is you get gaps in macronutrient reporting on things like Omega 3's, and you wonder 'Am I not eating any Omega 3's or does the database containing the food I ate just not include them?'. In that case I'd much rather have an LLM that had access to as much relevant data as I could feed it reason through approximate nutrient distribution and give me the best estimate it could.

Appreciate the feedback!

joshdickson · 2025-04-03T14:46:02 1743691562

That is missing a milligram label, thank you for pointing that out. Fix uploading now.

johnisgood · 2025-04-03T14:57:58 1743692278

That is what I thought.

BTW when you hover over the ingredients, you just get back the name. Are you guys going to do something with it in the future? Right now there is a visual feedback (the cursor changes), but it is not useful yet. I am not entirely sure what I would have expected, perhaps a description of what it is, and upon clicking on it, it could have information gathered from various sources, like examine.com and what have you, but that would be a huge change on its own, the short description upon mouse hover-over should work for now and may not be a huge change.

joshdickson · 2025-04-03T15:11:37 1743693097

The goal, without question, is 100% full coverage on citations for every piece of data that's in the database, even if the citation is an LLM's general reasoning (which for o1-pro is both quite good and often includes study citations).

Right now you'll see that aggregated on some items like this where the reported data is an ensemble of all of the linked resources: https://www.opennutrition.app/search/eggs-eeG7JQCQipwf

Frankly, I just couldn't justify the additional time and monetary expense in doing that if I released this initial version and nobody cared or found it useful. This dataset was also compiled before tools like Claude Citations came out which could make it easier. That is the nature of AI-driven data; I think this is useful now, it is also the worst it will ever be.

johnisgood · 2025-04-03T15:13:41 1743693221

I am not complaining by the way, it was more of a feature request, for example when you hover over an ingredient (e.g. "Choline", "Tryptophan", etc.), it may display a somewhat concise description of that ingredient (e.g. "Tryptophan is ..."). It is fine as it is either way, all things considered.

joshdickson · 2025-04-03T22:11:09 1743718269

Ah, that's a good idea, I should add that. Also, I appreciate the "you guys": as a solo developer, it's always nice when someone thinks your efforts are the product of a larger team :)

johnisgood · 2025-04-04T06:04:09 1743746649

Yeah I thought more people worked on it, it looks good. :P

Keep it as accurate as possible, and maintainable, and then it will be easy to add larger features. If no one else does, I might add a calorie tracker of some sort, it would be helpful to my mom. It is helpful as it is even now. How difficult would it be to add translations right now? She might look for "tojás" which is "egg" in Hungarian, and I would like her to be able to do that at some point.

joshdickson · 2025-04-03T14:40:46 1743691246

> Those rigorous validation steps were also created with LLMs, correct?

Not really. I do explain in the methodology post how good o1-pro is at the task, but reaching that conclusion took a lot of manual effort on my part reviewing the LLM's reasoning, and even so, o1-pro is not perfect.

yamihere · 2025-04-03T15:15:05 1743693305

Nice! Thanks for responding.

>> Outputs undergo rigorous validation steps, including cross-checking with advanced auditing models such as OpenAI’s o1-pro, which has proven especially proficient at performing high-quality random audits.

>> there was a lot of manual effort involved in coming to that conclusion with my own effort to review the LLM's reasoning

So, the randomly audited entries seemed reasonable to you – not even the data itself, just the reasoning about the generated data. Did the manual reviews stop once things started looking good enough? Are the audits ongoing, to fill out the rest of the dataset? Would those be manually double-checked as well?

>> I became interested in exploring how recent advances in generative AI could enable entirely new kinds of consumer products—ones whose core innovations leveraged AI but didn’t explicitly market themselves as “AI products.”

Once again: Why not market this as an AI product? This is LLMs all the way down.

People are already interested in using this dataset. I was. Now, LLM generated “usually close enough to not be actively harmful” data is being distributed as a source for any and all to use. I think your disclaimer is excellent. Does your license require an equivalent disclaimer be provided by those using this data?

joshdickson · 2025-04-03T15:44:10 1743695050

> not even the data itself, just the reasoning about the generated data

Poor phrasing on my end -- yes, absolutely the end data as well as the reasoning, as the reasoning tends to include the final answer.

Maybe I should! Appreciate the feedback.

yamihere · 2025-04-03T16:03:29 1743696209

Thanks again. Mine was an uncharitable interpretation, apologies for that. I appreciate your engagement with critical comments without coming off as defensive or snarky.

This looks like a lot of work and good will were poured into it, and I can see how it can be useful to a fitness focused audience.

You control the messaging on the site and in your apps, and you make it clear that this is not authoritative data. Everything built on top of this needs to have the same messaging, but it has probably been ingested into multiple LLMs already.

I think some sort of licensing requirement that the LLM source of this data be prominently disclosed will not keep this from becoming a source of truth for other datasets, products, and services; but, it is still worth the effort. All you can do is all you can do, right?

joshdickson · 2025-04-03T16:22:52 1743697372

The idea of including that requirement in the license is a good idea and I had not considered it, but I will -- frankly my motivations have been more on the citation side of things such that the need for quality disclaimers is not as great. Thank you for the suggestion.

joshdickson · 2025-04-03T14:21:16 1743690076

> or worse, use it as a primary source and become discouraged

I would hope these people download the free app so they can actually track their food, which has extensive tooling to track weight trends and expenditure changes over time :). But yes, you should be able to customize the assumptions, I just have about 100 more of these things to add and didn't want to wait longer to see feedback.

hluska · 2025-04-03T15:01:25 1743692485

I don’t know bud, but when I work with diet and nutrition, I feel like I owe users accuracy more than I deserve feedback. Maybe we have a different sense of ethics.

joshdickson · 2025-04-03T14:11:22 1743689482

> Could you possibly add an option to see the nutrient content per 100g serving?

In the top-right of the table in the web search, you can change the toggle from "Per Serving" to "Per 100g", though this is just for the table view.
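For anyone doing the same conversion on their own copy of the data, the arithmetic behind that toggle is a rescale by serving weight. Below is a minimal Go sketch, assuming a hypothetical `Nutrients` map and a serving weight in grams; neither is taken from the actual OpenNutrition schema, and the numbers in `main` are placeholders rather than real nutrition values.

```go
package main

import "fmt"

// Nutrients maps a nutrient name to an amount per serving.
// This shape is hypothetical, not the OpenNutrition schema.
type Nutrients map[string]float64

// Per100g rescales per-serving amounts to a 100 g basis.
// It returns an error when the serving weight is not positive.
func Per100g(perServing Nutrients, servingGrams float64) (Nutrients, error) {
	if servingGrams <= 0 {
		return nil, fmt.Errorf("serving weight must be positive, got %v", servingGrams)
	}
	factor := 100.0 / servingGrams
	out := make(Nutrients, len(perServing))
	for name, amount := range perServing {
		out[name] = amount * factor
	}
	return out, nil
}

func main() {
	// Placeholder values for a 50 g serving, for illustration only.
	item := Nutrients{"protein_g": 6, "fat_g": 5}
	scaled, err := Per100g(item, 50)
	if err != nil {
		panic(err)
	}
	fmt.Printf("protein per 100 g: %.1f\n", scaled["protein_g"])
}
```

Returning a new map rather than mutating the input keeps the per-serving values available for the default view.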

joshdickson · 2025-04-03T14:09:31 1743689371

TL;DR: They are estimates from giving an LLM (generally o3 mini high due to cost, some o1 preview) a large corpus of grounding data to reason over and asking it to use its general world knowledge to return estimates it was confident in, which, when escalating to better LLMs like o1-pro and manual verification, proved to be good enough that I thought they warranted release.

You can read about the background on how I did them in more detail in the about/methodology section: https://www.opennutrition.app/about (see "Technical Approach")

Xiol32 · 2025-04-03T14:16:03 1743689763

You need to add a disclaimer for this data. People could rely on them being accurate, and you simply can't prove they are.

joshdickson · 2025-04-03T14:23:24 1743690204

There is a large disclaimer that states, among other things, "We strive to ensure accuracy and quality using authoritative sources and AI-based validation; however, we make no guarantees regarding completeness, accuracy, or timeliness. Always confirm nutritional data independently when accuracy is critical." on every page on the website where that kind of in-depth data is available.

adamas · 2025-04-03T14:34:46 1743690886

At that point, if you are not sure a data point is accurate, should you really display it? You have no proof apart from "The LLM said it was ok", which is kind of poor.

sswatson · 2025-04-03T16:17:45 1743697065

I disagree with the idea that data must be accompanied by a guarantee of accuracy to be used or published. That standard would rule out almost all datasets for which the underlying data is not programmatically generated.

My guess is that this dataset is probably more accurate on the whole than many datasets used by the kinds of calorie-tracking apps that outsource their collection of nutrition information to users. But an analysis would be required.

Regardless, the only workable approach is to describe the provenance of your data and explain what steps have been taken to ensure accuracy. Then anyone who wants to use the data can account for that information.

joshdickson · 2025-04-03T14:04:14 1743689054

The OpenNutrition app does that :)

Logging foods by image is a great way to get started being accountable with eating, and I'll use it if I'm out and don't want to manually figure out all the different components of something, but it's impossible for even the most well-trained human eye to understand food composition visually. A lot of AI-focused diet apps have gone in this direction as their primary method of input because it removes the need for a database, but the marketing these apps run, claiming that this is in any way accurate as a primary search mechanism, to me really borders on abject dishonesty and sets users up for long-term failure. Just because an ingredient is invisible when prepared doesn't mean it's not there.

joshdickson · 2025-04-03T13:55:05 1743688505

Ah, that is an embarrassing bug. Mobile Safari does not do that. Thank you for the report; I'm looking into why that is now.

Edit: Should be patched in Desktop Safari now.

jonesy827 · 2025-04-03T16:03:50 1743696230

It's still erroring in Firefox on macOS and Windows. I see a CORS error on the XHR request.

joshdickson · 2025-04-03T16:24:13 1743697453

Should be back up now, I didn't scale up quickly enough for the traffic. My apologies and thank you for the report.

joshdickson · 2025-04-03T13:51:27 1743688287

Thank you so much for checking out the project and the bug report. The dataset includes alternate names for each of the non-branded grocery products and those are indexed into the Go-based prefix & full-word search engine that I wrote to answer queries. Sometimes they can be a bit over-prioritized in the search experience, but, I'd still rather have them :)
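To illustrate the general idea of indexing alternate names into a prefix and full-word search, here is a minimal Go sketch. It is not the actual OpenNutrition engine: the `Index` type, `Build`, `Search`, and the sample IDs are all hypothetical, it handles single-word queries only, and its ranking (full-word hits first, then prefix-only hits) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Index is a toy inverted index over primary and alternate food names.
type Index struct {
	tokens   []string            // sorted unique tokens, used for prefix scans
	postings map[string][]string // token -> IDs of foods whose names contain it
}

// Build indexes every name (primary or alternate) listed for each food ID.
func Build(names map[string][]string) *Index {
	idx := &Index{postings: map[string][]string{}}
	ids := make([]string, 0, len(names))
	for id := range names {
		ids = append(ids, id)
	}
	sort.Strings(ids) // iterate in a fixed order so results are deterministic
	for _, id := range ids {
		seen := map[string]bool{}
		for _, name := range names[id] {
			for _, tok := range strings.Fields(strings.ToLower(name)) {
				if seen[tok] {
					continue
				}
				seen[tok] = true
				if _, ok := idx.postings[tok]; !ok {
					idx.tokens = append(idx.tokens, tok)
				}
				idx.postings[tok] = append(idx.postings[tok], id)
			}
		}
	}
	sort.Strings(idx.tokens)
	return idx
}

// Search returns IDs matching a single-word query: full-word matches
// first, followed by foods that only match the query as a prefix.
func (idx *Index) Search(query string) []string {
	q := strings.ToLower(strings.TrimSpace(query))
	if q == "" {
		return nil
	}
	var results []string
	seen := map[string]bool{}
	add := func(ids []string) {
		for _, id := range ids {
			if !seen[id] {
				seen[id] = true
				results = append(results, id)
			}
		}
	}
	add(idx.postings[q]) // full-word matches
	// Tokens sharing a prefix are contiguous in sorted order,
	// so scan forward from the first token >= q.
	for i := sort.SearchStrings(idx.tokens, q); i < len(idx.tokens) && strings.HasPrefix(idx.tokens[i], q); i++ {
		add(idx.postings[idx.tokens[i]])
	}
	return results
}

func main() {
	idx := Build(map[string][]string{
		"egg":      {"Egg", "Hen egg"},
		"eggplant": {"Eggplant", "Aubergine"},
	})
	fmt.Println(idx.Search("egg"))
	fmt.Println(idx.Search("auber"))
}
```

Because alternate names go into the same postings as primary names, a query for an alternate name can surface an item as readily as its primary name, which is one way the over-prioritization mentioned above could arise; a real engine would likely weight the two differently.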

joshdickson · on Dec 30, 2023

It’s a lot harder to figure out if you have product market fit when you’re choosing a starting point that significantly reduces your addressable market.