
Two decades ago, I was trying to use classical machine vision to tell the difference between cut and uncut grass, to guide a self-driving lawnmower.

I concluded that it couldn't be done with classical machine vision, and that this "neural network" nonsense wasn't going to catch on. Very slow, computationally inefficient, full of weirdos making grandiose claims about "artificial intelligence" without the results to back it up, and they couldn't even explain how their own stuff worked.

These days - you want to find the boundary between cut and uncut grass, even though lighting levels can change and cloud cover can change and shadows can change and reflections can change and there's loads of types of grass and grass looks different depending on the angle you look from? Just label some data and chuck a neural network at it, no problemo.
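For the curious, a minimal sketch of what "chuck a neural network at it" looks like these days - assuming you've already captured and labelled frames (the dataset class below is a stand-in that yields fake data, and DeepLabV3 from torchvision is just one reasonable default, not the only choice):

    import torch
    import torchvision
    from torch.utils.data import DataLoader, Dataset

    # Stand-in dataset: in practice each item is a camera frame plus a hand-drawn mask
    # where every pixel is labelled 0 = uncut grass, 1 = cut grass.
    class CutGrassDataset(Dataset):
        def __len__(self):
            return 64
        def __getitem__(self, i):
            image = torch.rand(3, 256, 256)              # fake camera frame
            mask = torch.randint(0, 2, (256, 256))       # fake per-pixel label
            return image, mask

    loader = DataLoader(CutGrassDataset(), batch_size=8, shuffle=True)

    # Off-the-shelf segmentation network, headed for two classes (cut / uncut).
    model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=2)
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    model.train()
    for images, masks in loader:
        logits = model(images)["out"]                    # (N, 2, H, W) per-pixel logits
        loss = loss_fn(logits, masks)                    # masks: (N, H, W) integer labels
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()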






> These days - you want to find the boundary between cut and uncut grass, even though lighting levels can change and cloud cover can change and shadows can change and reflections can change and there's loads of types of grass and grass looks different depending on the angle you look from? Just label some data and chuck a neural network at it, no problemo.

If only.

Having been faced with the same problem in the real world:

1) There isn't a data bank of millions of images of cut / uncut grass

2) If there were, there's always the possibility of sample bias. E.g. all the cut photos happen to have been taken early in the day and all the uncut ones late in the day, and we end up with a "time-of-day" detector. Sample bias is oddly common in vision data sets, and machine learning is very good at latching onto complex sample bias

3) With something like a lawnmower, you don't want it to kill people or run over flowerbeds. There can be actual damages. It's helpful to be able to understand and validate things.

Most machine vision algorithms I actually used in projects (small n) made zero use of neural networks, and were 100% classical algorithms I understand.

Right now, the closest analogy is where NLP was in the BERT era. At that point, neural techniques were helpful for some tasks and achieved stochastically interesting performance, but were well below the level needed for general use, and 95% of what I wanted to do used classical NLP. IF I had a large data set AND could do transfer learning from BERT AND didn't need things to work 100% of the time, BERT was great.
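For concreteness, that transfer-learning recipe looked roughly like this - a sketch using the Hugging Face transformers library, with a toy two-label task and made-up sentences standing in for whatever you actually cared about:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Start from pretrained BERT and fine-tune a small classification head on your own labels.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
    optimiser = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Toy labelled examples standing in for "a large data set".
    texts = ["the report was filed on time", "the report never arrived"]
    labels = torch.tensor([1, 0])

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    model.train()
    out = model(**batch, labels=labels)   # the library computes the classification loss
    out.loss.backward()
    optimiser.step()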

Systems like DALL-E and its reverse are moving us in the right direction. Once we're at GPT / Claude / etc.-level performance, life will be different, and there's a light at the end of the tunnel. For now, though, ML-based machine vision is still a pretty limited way to go.

Think of it this way. What's cheaper:

1) A consulting project for a human expert in machine vision (tens or hundreds of thousands of dollars)

2) Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)


I don’t think people fully appreciate yet how much of LLMs’ value comes from their underlying dataset (i.e. the entire internet - probably quadrillions of tokens of text?) rather than the model + compute itself.

If you’re trying to predict something within the manifold of data on the internet (which is incredibly vast, but not infinite), you will do very well with today’s LLMs. Building an internet-scale dataset for another problem domain is a monumental task, still with significant uncertainty about “how much is enough”.

People have been searching for the right analogy for “what type of company is OpenAI most like?” I’ll suggest they’re like an oil company, but without the right to own oil fields. The internet is the field, the model is the refining process (different refineries mostly yield the same output, with some variations - not dissimilar from petroleum products), and the process / model is a significant asset. And today, Nvidia is the only manufacturer of refining equipment.


This is an interesting analogy. Of course oil extraction and refining are very complex, but most of the value in that industry is simply the oil.

If you take the analogy further, while oil was necessary to jumpstart the petrochemical industry, biofuels and synthetic oil could potentially replace the natural stuff while keeping the rest of the value chain intact (maybe not economical, but you get the idea). Is there a post-web source of data for LLMs once the well has been poisoned by bots? Maybe interactive chats?


> If only.

I will admit that "no problemo" made it sound easier than it actually is. But in the past I considered it literally impossible whereas these days I'm confident it is possible, using well known techniques.

> There isn't a data bank of millions of images of cut / uncut grass

True - but in my case I literally already had a robot lawnmower equipped with a camera. I could have captured a hundred thousand images pretty quickly if I'd known it was worth the effort.

> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.

I agree - at the time I was actually exploring a hybrid approach which would have used landmarks for navigation when close enough to detect the landmarks precisely, and cut/uncut boundary detection for operating in the middle of large expanses of grass, where the landmarks are all distant. And a map for things like flowerbeds, and a LIDAR for obstacle tracking and safety.

So the scope of what I was aiming for was literally cut/uncut grass detection, not safety-of-life human detection :)


Out of curiosity: Why would you need cut/uncut grass detection? If you have all the other stuff in place, what's the incremental value-add? It seems like you should be able to cut on a regular schedule, or if you really want to be fancy, predict how much grass has grown since you last cut it from things like the weather.

I wanted to steer the mower along the cut/uncut grass boundary, just like a human operator does. Image segmentation into cut/uncut grass would be the input to a steering control feedback loop - much like lane-following cruise control.

I hoped by doing so I could produce respectable results without the need to spend $$$$$ on a dual-frequency RTK GPS & IMU system.
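For anyone wondering what that feedback loop looks like, here's a toy sketch. It assumes the segmenter hands you a per-frame boolean mask, that the already-cut region fills the left side of the image, and the gain is plucked from thin air:

    import numpy as np

    def steering_command(cut_mask, k_p=0.8):
        """Toy proportional steering: keep the cut/uncut boundary centred in the frame.

        cut_mask: HxW boolean array from the segmenter, True where grass is already cut.
        Assumes the cut region fills the left side of the image.
        Returns a value in [-1, 1]; negative steers left, positive steers right.
        """
        height, width = cut_mask.shape
        near_rows = cut_mask[-height // 4:, :]          # only the rows closest to the mower
        boundary_x = near_rows.sum(axis=1).mean()       # cut pixels per row ~ boundary column
        error = (boundary_x - width / 2) / (width / 2)  # how far the boundary is off-centre
        return float(np.clip(-k_p * error, -1.0, 1.0))

    # Example: boundary sits at column 400 in a 640-wide frame -> gentle correction back to centre.
    mask = np.zeros((480, 640), dtype=bool)
    mask[:, :400] = True
    print(steering_command(mask))                       # about -0.2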


Out of curiosity, since you already have lidar on the machine, why not use it to detect grass height?

> What's cheaper

If you don’t have the second, how can you trust the first? Without the dataset to test on, your human experts will deliver you slop and be confident about it. And you will only realise the many ways their hand-finessed algorithms fail once you are trying to field the algorithm.

> With something like a lawnmower, you don't want it to kill people or run over flowerbeds.

Best to not mix concerns though. Not killing people with an automatic lawnmower is about the right mechanical design, appropriately selected slow speed, and bumper sensors. None of this is an AI problem. We don’t have to throw out good engineering practices just because the product uses AI somewhere. It is not an all-or-nothing thing.

The flowerbed avoidance question might or might not be an AI problem depending on design decisions.

> Hiring cheap contractors to build out a massive dataset of photos of grass (millions of dollars)

I think that you are overestimating the effort here. The dataset doesn’t have to be so huge. Transfer learning and similar techniques have reduced the data requirements by a lot. If all you want is a grass height detector, you can place stationary cameras in your garden, collect a bunch of data and automatically label it based on when you mowed the grass. That will obviously only generalise to your garden, but if this is only a hobby project maybe that is all you want? If this is a product you intend to sell to the general public then of course you need access to a lot of different gardens to test it on. But that is just the nature of product testing anyway.
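The automatic labelling part can be as dumb as a timestamp rule - something like this sketch, where the mowing times and the 48-hour window are made up:

    from datetime import datetime

    # Hypothetical mowing log: frames captured within a window after a mow get labelled "cut".
    mow_log = [datetime(2024, 6, 1, 18, 0), datetime(2024, 6, 8, 18, 0)]

    def label_frame(timestamp, hours_after_mow=48):
        for mow_time in mow_log:
            hours_since = (timestamp - mow_time).total_seconds() / 3600
            if 0 <= hours_since <= hours_after_mow:
                return "cut"
        return "uncut"

    print(label_frame(datetime(2024, 6, 2, 9, 0)))   # "cut"   (15 hours after a mow)
    print(label_frame(datetime(2024, 6, 6, 9, 0)))   # "uncut" (grass has had days to grow back)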


> If you don’t have the second, how can you trust the first? Without the dataset to test on, your human experts will deliver you slop and be confident about it.

1. Test datasets can be a lot smaller than training datasets.

2. For tasks like image segmentation, having a human look at a candidate segmentation and give it a thumbs up or a thumbs down is much faster than having them draw out the segments themselves.

3. If labelling needs 20k images segmented at 1 minute per image but testing only needs 2k segmentation results checked at 5 seconds per image, you can just do the latter yourself in a few hours, no outsourcing required.
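The arithmetic, for anyone checking:

    labelling_hours = 20_000 * 60 / 3600   # drawing segments yourself: ~333 hours
    checking_hours  = 2_000 * 5 / 3600     # thumbs up / thumbs down: ~2.8 hours
    print(labelling_hours, checking_hours)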


> If you don’t have the second, how can you trust the first? Without the dataset to test on, your human experts will deliver you slop and be confident about it.

One of the key things is that if you don't understand how things work, your test dataset needs to be the world. A classical system can be analyzed, and you can pick a test dataset which maximally stresses it. You can also engineer environments where you know it will work, and 9 times out of 10, part of the use of classical machine vision in safety-critical systems is to understand the environments it works in, and to only use it in such environments.

Examples:

- Placing the trackball sensor inside of the mouse (or the analogue for a larger machine) allows the lighting and everything else to be 100% controlled

- If it's not 100% controlled, in an industrial environment, you can still have well-understood boundaries.

You test beyond those bounds, and you understand that it works there, and by interpolation, it's robust within the bounds. You can also analyze things like error margin since you know if an edge detection is near the threshold or has a lot of leeway around it.
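As a toy illustration of that margin analysis with a classical pipeline (OpenCV Sobel plus a fixed threshold; the synthetic frame and the numbers are arbitrary), you can report exactly how much headroom each detection has:

    import cv2
    import numpy as np

    # Synthetic stand-in frame: dark on the left, bright on the right -> one vertical edge.
    frame = np.zeros((100, 100), dtype=np.uint8)
    frame[:, 50:] = 200

    gx = cv2.Sobel(frame, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(frame, cv2.CV_32F, 0, 1)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)

    threshold = 80.0
    edges = magnitude > threshold
    margin = magnitude - threshold          # headroom before a small change flips a decision
    near_miss = (margin > 0) & (margin < 10)
    print("edge pixels:", edges.sum(), "of which near the threshold:", near_miss.sum())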

One of the differences with neural networks is that you don't understand the failure modes, so it's hard to know the axes to test on. Some innocuous change in the background might throw it completely. You don't have really meaningful, robust measures of confidence, so you don't know if some minor change somewhere won't throw things. That means your test set needs to be many orders of magnitude bigger.

For nitpickers: You can do sensitivity analysis, look at how strongly things activate, or a dozen other things, but the keywords there were "robust" and "meaningful."


Not that you're wrong, but when faced with a similar problem, I got a lot of mileage out of telling an intern to try a network trained to detect the boundary between ground-based, potentially-tall features and everything else (e.g., background and sky), and measuring the height from a low camera. Voila, tall areas and not-tall areas.

Funnily enough, now with the advent of GPS+RTK lawnmower robots, fancy AI is not even needed anymore. They follow very exact, pre-determined patterns and paths, and do a great job.

Yeah GPS+RTK was what I went with in the end.

Didn't work as well as I'd hoped back in those days though, as you could lose carrier lock if you got too close to trees (or indeed buildings), and our target market was golf courses which tend to have a lot of trees. And in those days a dual-frequency RTK+IMU setup was $20k or more, which is expensive for a lawnmower.


No tool is perfect for every job. That said, the positioning of the RTK unit is crucial. Possibly look for a mower which can work with multiple RTK units, or reposition your existing one for better coverage.

I find that even though signals get significantly weaker under trees, mine still works wonderfully in a complex large garden scenario. It will depend on your exact unit/model, as well as their firmware and how it chooses to deal with these scenarios.


That, and you can now train the perfect lawnmower in an entirely virtual environment before dropping it into a physical body. You do your standard GAN thing, have a network that is dedicated to creating the gnarliest lawn mowing problems possible, bang through a few thousand generations of your models, and then hone the best of the best. There are some really astonishing examples that have been published this last year or so - like learning to control a hand from absolute first principles, and perfecting it.

This is all pretty much automated by Nvidia’s toolkits, and you can do it cheaply on rented hardware before dropping your pretrained model into cheap kit - what a time to be alive.
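To be clear about the shape of that loop: the sketch below isn't Nvidia's toolkit or a real GAN, just throwaway stand-ins (a parameterised "environment" and a fake policy score) showing the generate-hard-cases / train-against-them cycle:

    import random

    # Throwaway stand-ins: an "environment" is just a few difficulty knobs,
    # and the "policy score" is fake arithmetic for how well the mower copes.
    def random_environment():
        return {"slope": random.uniform(0, 30),        # degrees
                "wet_grass": random.random(),          # 0..1
                "obstacles": random.randint(0, 10)}

    def policy_score(policy, env):
        difficulty = env["slope"] / 30 + env["wet_grass"] + env["obstacles"] / 10
        return policy["skill"] - difficulty

    policy = {"skill": 1.0}
    for generation in range(1000):
        # "Adversary": sample candidate environments, keep the one the current policy handles worst.
        candidates = [random_environment() for _ in range(32)]
        hardest = min(candidates, key=lambda env: policy_score(policy, env))
        # "Training": nudge the policy on the hardest case (a real setup would run RL in simulation here).
        if policy_score(policy, hardest) < 0:
            policy["skill"] += 0.01

    print(policy)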


FYI: A comment like this one is more helpful with links. There's one below with a few. If you happen to read this, feel free to respond, or to hit "edit" and add them.

What is classical machine vision? For an image recognition problem, wouldn't you use a convolutional neural network? (And I might be a bit outdated here, considering that CNNs were used for image recognition a decade ago).


