Really? I work in AI and my biggest concern is that I don't see any real products coming out of this space. I work closer to the models, and people in this specific area are making progress, but when I look at what's being done downstream I see nothing, save demos that don't scale beyond a few examples.
> in the 80s there were less defined products, and most everything was a prototype that needed just a bit more research to be commercially viable.
This is literally all I see right now. There's some really fun hobbyist stuff happening in the image gen area that I think is here to stay, but LLMs haven't broken out of the "autocomplete on steroids" use cases.
> today's stuff is useful now
Can you give me five examples of profitable use cases for LLMs, other than coding assistants, that aren't still in the "needed just a bit more research to be commercially viable" stage?
I love working in AI, think the technology is amazing, and do think there are some underexploited (though less exciting) use cases, but all I see is big promises and under-delivery. I would love to be proven wrong.
1. Content Creation:
LLMs can be used to generate high-quality, human-like content such as articles, blog posts, social media posts, and even short stories. Businesses can leverage this capability to save time and resources on content creation, and improve the consistency and quality of their online presence.
2. Customer Service and Support:
LLMs can be integrated into chatbots and virtual assistants to provide fast, accurate, and personalized responses to customer inquiries. This can help businesses improve their customer experience, reduce the workload on human customer service representatives, and provide 24/7 support.
3. Summarization and Insights:
LLMs can be used to analyze large volumes of text data, such as reports, research papers, or customer feedback, and generate concise summaries and insights. This can be valuable for businesses in fields like market research, financial analysis, or strategic planning.
4. HR Candidate Screening:
Use case: Using LLMs to assess job applicant resumes, cover letters, and interview responses to identify the most qualified candidates.
Example: A large retailer integrating an LLM-based recruiting assistant to help sift through hundreds of applications for entry-level roles.
5. Legal Document Review:
Use case: Employing LLMs to rapidly scan through large volumes of legal contracts, case files, and regulatory documents to identify key terms, risks, and relevant information.
Example: A corporate law firm deploying an LLM tool to streamline the due diligence process for mergers and acquisitions.
I'm working on AI tools for teachers and I can confidently say that GPT is just unbelievably good at generating explanations, exercises, quizzes, etc. The onus to review the output is on the teacher obviously, but given they're the subject matter experts, a review is quick and takes a fraction of the time that it would take to otherwise create this content from scratch.
As a teacher, I have no shortage of exercises, quizzes, etc. The internet is full of this kind of stuff and I have no trouble finding more than I'll ever need. 95% of my time and mental capacity in this situation goes to deciding what makes sense in my particular pedagogical context. What wording works best for my particular students? Explanations are even harder. I find out almost daily that explanations which worked fine last year don't work any more and I have to find a new way, because the new students' prior knowledge, the words they use and know, and so on are different again.
>As a teacher, I have no shortage of exercises, quizzes, etc. The internet is full of this kind of stuff and I have no trouble finding more than I'll ever need
All of which takes valuable time that we teachers are extremely short on.
I've been a classroom teacher for more than 20 years, and I know how painful it is to piece together a hodgepodge of resources into lessons. Yes, the information is out there, but a one-click option to gather it into a cohesive unit saves me valuable time.
>95% of my time and mental capacity in this situation goes to deciding what makes sense in my particular pedagogical context. What wording works best for my particular students?
Which is exactly what GPT is amazing at. Brainstorming, rewriting, suggesting new angles of approach is GPT's main strength!
>Explanations are even harder.
Prompting GPT to give useful answers is part of the art of using these new tools. Ask GPT to speak in a different voice, take on a persona or target a different age group and you'll be amazed at what it can output.
> I find out almost daily that explanations which worked fine last year don't work any more
Exactly! Reframing your own point of view is hard work; GPT can be an invaluable assistant in this area.
> Which is exactly what GPT is amazing at. Brainstorming, rewriting, suggesting new angles of approach is GPT's main strength!
No, it isn't. It just increases noise. I don't need any more info, I just need to decide "how?".
> Prompting GPT to give useful answers is part of the art of using these new tools. Ask GPT to speak in a different voice, take on a persona or target a different age group and you'll be amazed at what it can output.
I'm not amazed. At best it sounds like some 60+ year old (like me) trying to pass for a 14-year-old after only hearing second-hand how young people talk. Especially in small cultures like ours here (~1M people).
I have teachers in my family, and their lives have been basically ruined by people using ChatGPT-4 to cheat on their assignments. They spend their weekends trying to work out whether someone has "actually written" this or not.
So sorry, we're back to spam generator. Even if it's "good spam".
One potential fix, or at least a partial mitigation, could be to weight homework 50% and exams 50%, and if a student's exam grades differ from their homework grades by a significant amount (e.g. 2 standard deviations) then the lower grade gets 100% weight. It's a crude instrument, but it might do the job.
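The weighting rule described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function name `final_grades`, the sample scores, and the reading of "2 standard deviations" as twice the population standard deviation of the class's homework-minus-exam gaps are all hypothetical choices, not part of the original suggestion.

```python
from statistics import pstdev


def final_grades(homework, exams, k=2.0):
    """Blend homework and exam scores 50/50 per student.

    If a student's homework/exam gap exceeds k standard deviations of the
    class's gaps (an assumed reading of the rule), the lower score counts 100%.
    """
    gaps = [h - e for h, e in zip(homework, exams)]
    threshold = k * pstdev(gaps)
    grades = []
    for h, e, gap in zip(homework, exams, gaps):
        if abs(gap) > threshold:
            grades.append(min(h, e))  # unusually large gap: lower score only
        else:
            grades.append(0.5 * h + 0.5 * e)  # normal case: equal weights
    return grades


# Hypothetical class of five; the last student has strong homework
# but a much weaker exam result.
homework = [80, 75, 90, 85, 95]
exams = [78, 70, 88, 86, 40]
grades = final_grades(homework, exams)
```

In a small class a single outlier inflates the standard deviation itself, so a real scheme would probably need a fixed threshold or a more robust spread measure.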
a bit dramatic. there has to be an adjustment of teaching/assessing, but nothing that would "ruin" anyone's life.
>So sorry, we're back to spam generator. Even if it's "good spam".
is it spam if it's useful and solves a problem? I don't agree it fits the definition any more.
Teachers are under immense pressure, GPT allows a teacher to generate extension questions for gifted students or differentiate for less capable students, all on the fly. It can create CBT material tailored to a class or even an individual student. It's an extremely useful tool for capable teachers.
> is it spam if it's useful and solves a problem? I don't agree it fits the definition any more.
Who said generating an essay is useful, sorry? What problem does that solve?
Your comments come across as overly optimistic and dismissive, like you have something to gain personally and aren’t interested in listening to others' feedback.
I'm developing tools to help teachers generate learning material, exercises and quizzes tailored to student needs.
>Who said generating an essay is useful, sorry? What problem does that solve?
Useful learning materials aligned with curriculum outcomes, taking into account learner needs and current level of understanding, are literally the bread and butter of teaching.
I think those kinds of resources are both useful and solve a very real problem.
>Your comments come across as overly optimistic and dismissive, like you have something to gain personally and aren’t interested in listening to others' feedback.
Fair point. I do have something to gain here. I've given a number of example prompts in my replies to this thread that are extremely useful for a working teacher. I don't think I'm being overly optimistic, and I'm not talking in vague hypotheticals: the tools I'm building are already proving very useful.
> a bit dramatic. there has to be an adjustment of teaching/assessing, but nothing that would "ruin" anyone's life.
If you don't have the power to just change your mind about what the entire curriculum and/or assessment context is, it can be a workload increase of dozens of hours per week or more. If you do have the power, and do want to change your entire curriculum, it's hundreds of hours one-time. "Lives basically ruined" is an exaggeration, but you're preposterously understating the negative impact.
> is it spam if it's useful and solves a problem?
Whether or not it's useful has nothing to do with whether or not it's spam. I'm not claiming that your product is spam -- I'll get back to that -- but your reply to the spam accusation is completely wrong.
As for your hypothesis, I've had interactions where it did a good job of generating alternative activities/exercises, and interactions where it strenuously and lengthily kept suggesting absolute garbage. There's already garbage on the internet, we don't need LLMs to generate more. But yes, I've had situations where I got a good suggestion or two or three, in a list of ten or twenty, and although that's kind of blech, it's still better than not having the good suggestions.
>Whether or not it's useful has nothing to do with whether or not it's spam.
I think it has a lot to do with it. I can't see how generating educational content for the purpose of enhancing student outcomes with content reviewed by expert teachers can fall under the category of spam.
>As for your hypothesis, I've had interactions where it did a good job of generating alternative activities/exercises, and interactions where it strenuously and lengthily kept suggesting absolute garbage.
I'd like to present concrete examples of what I would consider to be useful content for a K-12 teacher.
This would align with Year 9 Maths for the Australian Curriculum.
This is an extremely valuable tool for:
- A graduate teacher struggling to keep up with creating resources for new classes
- An experienced teacher moving to a new subject area or year level
Bear in mind that the GPT output is not necessarily intended to be used verbatim. A qualified specialist teacher with often times 6 years of study (4 year undergrad + 2 yr Masters) is the expert in the room who presumably will review the output, adjust, elaborate etc.
As a launching pad for tailored content for a gifted student, or lower level, differentiated content for a struggling student the GPT response is absolutely phenomenal. Unbelievably good.
I've used Maths as an example, however it's also very good at giving topic overviews across the Australian Curriculum.
Here's one for elements of poetry: structure and forms
Again, an amazing introduction to the topic (I can't remember the exact curriculum outcome it's aligned to) which gives the teacher a structured intro which can then be spun off into exercises, activities or deep dives into the sub topics.
> I've had situations where I got a good suggestion or two or three, in a list of ten or twenty
This is a result of poor prompting. I'm working with very structured, detailed curriculum documents and the output across subject areas is just unbelievably good.
There are countless existing, human-vetted, purpose-designed bodies of work full of material like the stuff your ChatGPT just "created". Why not use those?
Also, each of your examples had at least one error. Did you not see them?
>Also, each of your examples had at least one error. Did you not see them?
I didn't. Could you point them out?
>There are countless existing, human-vetted, purpose-designed bodies of work full of material like the stuff your ChatGPT just "created". Why not use those?
As a classroom teacher I can tell you that piecing together existing resources is hard work, because resource A is in this textbook (which might not be digital), resource B is on that website and quiz C is on another site. Sometimes it's very difficult or even impossible to put all these pieces together in a cohesive manner. GPT can do all that and more.
The point is not to replace all existing resources with GPT; that is all-or-nothing logic. It's another tool in the tool belt which can save time and provide new ways of doing things.
I also have teachers in my family. Most of their time is spent adjusting the syllabus schedule and guiding (orally) the stragglers. Exercises, quizzes and explanations are routine enough that good teachers I know can generate them on the spot.
>Exercises, quizzes and explanations are routine enough that good teachers I know can generate them on the spot.
Every year there are thousands of graduate teachers looking for tools to help them teach better.
>good teachers I know can generate them on the spot
Even the best teacher can't create an interactive multiple choice quiz with automatic marking, tailored to a specific class (or even a specific student) on the spot.
I've been teaching for 20+ years, I have a solid grasp of the pain points.
> Even the best teacher can't create an interactive multiple choice quiz with automatic marking, tailored to a specific class (or even a specific student) on the spot.
Neither can "AI" though, so what's the point here?
Here's an example of a question and explanation which aligns to Australian Curriculum elaboration AC9M9A01_E4, explaining why \frac{3^4}{3^4} = 1 and 3^{4-4} = 3^0
This is a relatively high level explanation. With proper prompting (which, sorry I don't have on hand right now) the explanation can be tailored to the target year level (Year 9 in this case) with exercises, additional examples and a quiz to test knowledge.
This is just the first example I have on hand and is just barely scratching the surface of what can be done.
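Written out, the identity in the example above follows from the quotient law for indices, a^m / a^n = a^{m-n}. The block below is a sketch of that standard argument, not the generated explanation the comment refers to.

```latex
\[
\frac{3^4}{3^4}
  = \frac{3 \times 3 \times 3 \times 3}{3 \times 3 \times 3 \times 3}
  = 1,
\qquad
\frac{3^4}{3^4} = 3^{4-4} = 3^0
\quad\Longrightarrow\quad
3^0 = 1 .
\]
```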
The tools I'm building are aligned to the Australian Curriculum, and as someone with a lot of classroom experience I can tell you that this kind of tailored content (explanations, exercises, etc.) is a literal godsend for teachers regardless of experience level.
Bear in mind that the teacher with a 4 year undergrad in their specialist area and a Masters in teaching can use these initial explanations as a launching pad for generating tailored content for their class and even tailored content for individual students (either higher or lower level depending on student needs). The reason I mention this is because there is a lot of hand-wringing about hallucinations. To which my response is:
- After spending a lot of effort vetting the correctness of responses for a K-12 context, I've found that hallucinations are not an issue in practice. The training corpus is saturated with correct data at this level.
- In the unlikely scenario of hallucination, the response is vetted by a trained teacher who can quickly edit and adjust responses to suit their needs
Let’s call it what it is: taking poorly organized existing information and making it organized and interactive.
“Here are some SharePoint locations, site maps, and wikis. Now regurgitate this info to me as if you are a friendly call center agent.”
Pretty cool but not much more than pushing existing data around. True AI I think is being able to learn some baseline of skills and then through experience and feedback adapt and be able to formulate new thoughts that eventually become part of the learned information. That is what humans excel at and so far something LLMs can’t do. Given the inherent difficulty of the task I think we aren’t much closer to that than before as the problems seem algorithmic and not merely hardware constrained.
>taking poorly organized existing information and making it organized and interactive.
Which is extremely valuable!
>Pretty cool but not much more than pushing existing data around.
Don't underestimate how valuable it is for teachers to do exactly that. Taking existing information, making it digestible, presenting it in new and interesting ways is a teacher's bread and butter.
It’s valuable for use cases where the problem is “I don’t know the answer to this question and don’t know where to find it.” That’s not in and of itself a multibillion dollar business when the alternative doesn’t cost that much in the grand scheme of things (asking someone for help or looking for the answer).
Are you suggesting a chatbot is a suitable replacement for a teacher?
I’ve rarely if ever seen a model fully explain mathematical answers outside of simple geometry and algebra to what I would call an adequate level. It gets the answer right more often than explaining why that is the correct answer. For example, it finds a minimal case to optimization, but can’t explain why that is the minimal result among all possibilities.
They're currently already relying on overworked, underpaid interns who draft those documents. The lawyer is checking it anyway. Now the lawyer and his intern have time to check it.
I suggest we do not repeat the myth and urban legend that LLMs are good for legal document review.
I had a couple of real use cases with real clients who were hyped about using LLMs for document review to save on salaries, for English-language documents.
We've found Kira, Luminance and similar due diligence project management tools to be useful timesavers if done right. But not LLMs.
Due to longer context windows, it is possible to ask LLMs the usual hazy questions that people ask in a due diligence review (many of which can be answered dozens of different ways by human lawyers). Is there a most favoured nation provision in the contract, is there a financial cap limiting the liability of the seller or the buyer, governing law etc.
Considering risks of uploading such documents into ChatGPT, you are stuck with Copilot M365 etc. or some outrageously expensive "legal specific" LLMs that I cannot test.
Out of curiosity, I asked Copilot five rather simple questions about each of three different agreements (where we had the golden answers), and the results were quite uneven, but mostly useless. For one contract, it incorrectly reported for all questions that they could not be answered based on the contract (while the answers were clearly included in the document). For another, two questions were answered correctly, two were not answered precisely (such as giving the governing law as "USA" when the correct answer was Michigan, even after reprompting for a state-level answer), and one answer was hallucinated. For the third, three answers were hallucinated, one was answered correctly, and one provision was not found.
Of course, it's better to have a LEGAL specific benchmark for this, but 75% hallucination in complex questions is not something that helps your workflow (https://hai.stanford.edu/news/hallucinating-law-legal-mistak...)
I, at least, don't recommend LLMs to anyone for legal document review, even for English-language documents.
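For a sense of scale, the Copilot trial described above can be tallied as follows. This is a rough sketch: the category labels are shorthand introduced here for illustration, and the grouping follows the wording of the comment rather than any formal benchmark.

```python
from collections import Counter

# Outcome of each of the five questions per contract, as described in the
# comment above. The category names are hypothetical shorthand.
results = {
    "contract_1": ["wrongly_unanswerable"] * 5,
    "contract_2": ["correct", "correct", "imprecise", "imprecise", "hallucinated"],
    "contract_3": ["hallucinated"] * 3 + ["correct", "not_found"],
}

# Count outcomes across all three contracts.
tally = Counter(outcome for outcomes in results.values() for outcome in outcomes)
total = sum(tally.values())
correct_share = tally["correct"] / total
```

On this reading, only a small minority of the fifteen answers were fully correct, which is the point the comment makes about usefulness in a review workflow.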
I have no idea what type of law you're talking about here, but (given the context of the thread) I can guarantee you major firms working on M&As are most definitely not using underpaid interns to draft those documents. They are overpaid qualified solicitors.
I’ve been doing RLHF and adjacent work for 6 months. The model responses across a wide array of subject matter are surface level. Logical reasoning, mathematics, step by step, summarization, extraction, generation. It’s the kind of output the average C student is doing.
We specifically don’t do programming prompts/responses nor advanced college to PhD level stuff, but it’s really mediocre at this level and these subject areas. Programming might be another story, I can’t speak to that.
All I can go off is my experience but it’s not been great. I’m willing to be wrong.
> It’s the kind of output the average C student is doing.
Is the output of average C students not commercially valuable in the listed fields? If AI is competing reliably with students then we've already hit AGI.
Except for number 3, the rest are more often disastrous or insulting to users and those depending on the end products/services of these things. Your reasoning is so bad that I'm almost tempted to think you're spooning out PR-babble astro-turf for some part of the industry. Here's a quick breakdown:
1. content: Nope, except for barrel-bottom content sludge of the kind formerly done by third world spam spinning companies, most decent content creation stays well away from AI except for generating basic content layout templates. I work as a writer and even now, most companies stay well away from using GPT et al for anything they want to be respected as content. Please..
2. Customer service: You've just written a string of PR corporate-speak AI seller bullshit that barely corresponds to reality. People WANT to speak to humans, and except for very basic inquiries, they feel insulted if they're forced into interaction with some idiotic stochastic parrot of an AI for any serious customer support problems. Just imagine some guy trying to handle a major problem with his family's insurance claim or urgently access money that's been frozen in his bank account, and then forced to do these things via the half-baked bullshit funnel that is an AI. If you run a company that forces that upon me for anything serious in customer service, I would get you the fuck out of my life and recommend any friend willing to listen does the same.
3. This is the one area where I'd grant LLMs some major forward space, but even then with a very keen eye to reviewing anything they output for "hallucinations" and outright errors unless you flat out don't care about data or concept accuracy.
4. For reasons related to the above (especially #2) what a categorically terrible, rigid way to screen human beings with possible human qualities that aren't easily visible when examined by some piece of machine learning and its checkbox criteria.
5. Just, Fuck No... I'd run as fast and far as possible from anyone using LLMs to deal with complex legal issues that could involve my eventual imprisonment or lawsuit-induced bankruptcy.
2. I think you overestimate the caliber of query received in most call centres. Even when it comes to private banks (for those who've been successful in life), the query is most often something small like holding their hand and telling them to press the "login" button.
Also these all tend to have an option where you simply ask it and it will redirect you to a person.
Those agents deal with the same queries all day. Despite what you think, your problem likely isn't special; in most cases you may as well start calling the agents "stochastic parrots" too while you're at it.
IMO the unreasonable uselessness of LLMs is because for most tasks involving language the accuracy needs to be unbelievably high to have any real value at all.
We just don't have that.
We have autocomplete on steroids and many people are fooling themselves that if you just take more steroids you will get better and better results. The metaphor is perfect because if you take more and more steroids you get fewer and fewer results.
It is why in reality we have had almost no progress since April 2023 and ChatGPT 4.