I'm just going to call this out as bullshit. This isn't YOLOv5. I doubt they even did a proper comparison between their model and YOLOv4.
Someone asked for it not to be called YOLOv5 and their response was just awful [1]. They also blew off a request to publish a blog/paper detailing the network [2].
Hey all - OP here. We're not affiliated with Ultralytics or the other researchers. We're a startup that enables developers to use computer vision without being machine learning experts, and we support a wide array of open source model architectures for teams to try on their data: https://models.roboflow.ai
Beyond that, we're just fans. We're amazed by how quickly the field is moving and we did some benchmarks that we thought other people might find as exciting as we did. I don't want to take a side in the naming controversy. Our core focus is helping developers get data into any model, regardless of its name!
YOLOv5 seems to have one important advantage over v4, which your post helped highlight:
Fourth, YOLOv5 is small. Specifically, a weights file for YOLOv5 is 27 megabytes. Our weights file for YOLOv4 (with Darknet architecture) is 244 megabytes. YOLOv5 is nearly 90 percent smaller than YOLOv4. This means YOLOv5 can be deployed to embedded devices much more easily.
Naming controversy aside, it's nice to have some model that can get close to the same accuracy at 10% of the size.
Naming it v5 was certainly ... bold ... though. If it can't outperform v4 in any scenario, is it really worthy of the name? (On the other hand, if v5 can beat v4 in inference time or accuracy, that should be highlighted somewhere.)
FWIW I doubt anyone who looks into this will think roboflow had anything to do with the current controversies. You just showed off what someone else made, which is both legit and helpful. It's not like you were the ones that named it v5.
On the other hand... visiting https://models.roboflow.ai/ does show YOLOv5 as "current SOTA", with some impressive-sounding results:
SIZE: YOLOv5 is about 88% smaller than YOLOv4 (27 MB vs 244 MB)
SPEED: YOLOv5 is about 180% faster than YOLOv4 (140 FPS vs 50 FPS)
ACCURACY: YOLOv5 is roughly as accurate as YOLOv4 on the same task (0.895 mAP vs 0.892 mAP)
Then it links to https://blog.roboflow.ai/yolov5-is-here/ but there doesn't seem to be any clear chart showing "here's v5 performance vs v4 performance under these conditions: x, y, z"
Out of curiosity, where did the "180% faster" and 0.895 mAP vs 0.892 mAP numbers come from? Is there some way to reproduce those measurements?
Crucially, we're tracking "out of the box" performance, i.e., if a developer grabbed X model and used it on a sample task, how could they expect it to perform? Further research and evaluation is recommended!
For size, we measured the sizes of our saved weights files for Darknet YOLOv4 versus the PyTorch YOLOv5 implementation.
For inference speed, we checked "out of the box" speed using a Colab Notebook equipped with a Tesla P100. We used the same task[1] for both - e.g. see the YOLOv5 Colab notebook[2]. For Darknet YOLOv4 inference speed, we translated the Darknet weights using the Ultralytics YOLOv3 repo (as we've seen many do for deployments)[3]. (To achieve top YOLOv4 inference speed, one should reconfigure Darknet carefully with OpenCV, CUDA, cuDNN, and carefully monitor batch size.)
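In case it's useful, here's roughly the shape of that kind of timing loop (a rough sketch, not the exact notebook code; load_model is just a placeholder for however you load the detector under test):

    import time
    import torch

    # Rough sketch of an "out of the box" FPS check on a GPU (e.g. a Colab P100).
    model = load_model().to('cuda').eval()            # load_model is a placeholder, not a real API
    img = torch.rand(1, 3, 640, 640, device='cuda')   # dummy input at the eval resolution

    with torch.no_grad():
        for _ in range(10):            # warm-up so CUDA init / cuDNN autotuning isn't timed
            model(img)
        torch.cuda.synchronize()       # flush queued kernels before starting the clock
        start = time.time()
        n = 100
        for _ in range(n):
            model(img)
        torch.cuda.synchronize()
        elapsed = time.time() - start

    print(f"{n / elapsed:.1f} FPS at batch size 1")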
For accuracy, we evaluated the task above with mAP after quick training (100 epochs) of the smallest YOLOv5s model against the full YOLOv4 model (using the recommended 2000*n training iterations, where n is the number of classes). Our example is a small custom dataset; these results should also be investigated on e.g. the 90-class COCO dataset.
This is why I have so much doubt. To claim it's better in any meaningful way you need to show it on the same framework, across varied datasets and input sizes, and you should be able to use it on your own detection problem and see some benefit over the previous version.
> SIZE: YOLOv5 is about 88% smaller than YOLOv4 (27 MB vs 244 MB)
Is that a benefit of Darknet vs PyTorch, of YOLOv4 vs YOLOv5, or did you win the NN lottery [1]?
> SPEED: YOLOv5 is about 180% faster than YOLOv4 (140 FPS vs 50 FPS)
Again, where does this improvement come from?
> ACCURACY: YOLOv5 is roughly as accurate as YOLOv4 on the same task (0.895 mAP vs 0.892 mAP)
A difference of 0.1% in accuracy can be huge; for example, the difference between 99.9% and 100% could require an insanely larger neural network. Even well below 99% accuracy, it seems clear to me that network size can still limit the accuracy you can reach.
For example, if you really don't care so much for accuracy, you can really squeeze the network down [2].
It's about time for Roboflow to pull this article. It seems highly unlikely that a 90% smaller model would provide similar accuracy, and the result seems to come from a single small custom dataset. Please make a real COCO comparison instead.
> It's about time for Roboflow to pull this article.
The article still adds value by suggesting how one would run the network and in general the site seems to be about collating different networks.
Perhaps a disclaimer could be good, reading something like: "the speed improvements mentioned in this article are currently being tested". As a publisher, when you print somebody else's words, unless quoted, they are said with your authority. The claims are very big and it doesn't feel like enough testing has been done yet to even verify that they hold true.
Very cool business model! How long have you been at it? I've been pushing for a while (unsuccessfully, so far) for the NIH to cultivate a team providing such a service to our many biomedical imaging labs. It seems pretty clear to me that this sort of AI hub model is going to win out in at least the medium term versus spending money on lots of small redundant AI teams each dedicated to a single project. What sort of application sectors have you found success with?
Nice, I really respect research coming out of NIH. (Happen to know Travis Hoppe?) Coincidentally, our notebook demo for YOLOv5 is on the blood cell count and detection dataset: https://public.roboflow.ai/object-detection/bccd
We've seen 1000+ different use cases. Some of the most popular are in agriculture (weeds vs crops), industrials / production (quality assurance), and OCR.
Do you know of any battery-powered drones that can pick out invasive plants? I've been looking for this to use on trails, but since the plant's sap is highly poisonous, drones seem to be the logical solution.
I somewhat agree on the naming issue. I don't think yolov5 is semantically very informative. But by the way, if you read the issues from a while back you'll see that AlexeyAB's fork basically scooped them, hence the version bump. Ultralytics probably would have called this Yolov4 otherwise. This repo has been in the works for a while.
For history, Ultralytics originally forked the core code from some other Pytorch implementation which was inference-only. Their claim to fame is that they were the first to get training to work in Pytorch. This took a while, probably because there is actually very little documentation for Yolov3 and there was confusion over what the loss function actually ought to be. The darknet repo is totally uncommented C with lots of single letter variable names. AlexeyAB is a Saint.
That said, should it be a totally new name? The changes are indeed relatively minor in terms of architecture, it's still yolo underneath (in fact I think the classification/regression head is pretty much unchanged). The v4 release was also quite contentious. Actually their previous models used to be called yolov3-spp-ultralytics.
Probably I would have gone with efficient-yolo or something similar. That's no worse than fast/faster rcnn.
I disagree on your second point though. Demanding a paper when the author says "we will later" is hardly a blow off. Publishing and writing takes time. The code is open source, the implementation is there. How many times does it happen the other way around? And before we knock Glenn for this, as far as I know, he's running a business, not a research group.
Disclosure: I've contributed (in minor ways) to both this repository and Alexey's darknet fork. I use both regularly for work and I would say I'm familiar enough with both codebases. I mostly ignore the benchmarks because performance on coco is meaningless for performance on custom data. I'm not affiliated with either group, in case it's not clear.
> But by the way, if you read the issues from a while back
> you'll see that AlexeyAB's fork basically scooped them,
> hence the version bump.
Yeah that sucks, but it does mean they should have done some proper comparison with YOLOv4.
> This took a while, probably because there is actually very
> little documentation for Yolov3 and there was confusion
> over what the loss function actually ought to be. The
> darknet repo is totally uncommented C with lots of single
> letter variable names. AlexeyAB is a Saint.
Maybe I'm alone, but I found it quite readable. You can quite reasonably understand the source in a day.
> The v4 release was also quite contentious.
Kind of, I am personally still evaluating this network fully.
> I disagree on your second point though. Demanding a paper
> when the author says "we will later" is hardly a blow off.
Check out the translation of "you can you up,no can no bb" (see other comments).
> And before we knock Glenn for this, as far as I know, he's
> running a business, not a research group.
I understand, but it seems very unethical to take the name of an open source framework and network that publishes its improvements in some form, bump the version number, and then claim it's faster without actually doing an apples to apples test. It would have seemed appropriate to contact the person who carried the torch after pjreddie stepped down from the project.
On the whole I agree about darknet being readable, it seemed well written and I've found it useful to grok how training libraries are written. I think they've moved to other backends now for the main computation though.
But.. it was still very much undocumented (and there were details missing from the paper). I think this almost certainly led to some slowdown in porting to other frameworks. And the fact it's written in C has probably limited how much people are willing to contribute to the project.
> Check out the translation of "you can you up,no can no bb" (see other comments).
That's from an 11 day old github account with no history, not Ultralytics as far as I know.
> Kind of, I am personally still evaluating this network fully.
Contention referring to the community response rather than the performance of the model itself.
Ah, I misspoke. I meant pjreddie. pjreddie kind of endorsed YOLOv4. Did he endorse YOLOv5?
Although YOLOv4 isn't anything new architecture-wise, it tried all the tricks in the book on the existing YOLO architecture to increase its speed and performance, and its method and experimental results were published as a paper; it provided value to humanity.
YOLOv5 seems to have taken the YOLO name mainly to boost the startup's name recognition without giving much back (they did provide a YOLOv3 PyTorch implementation, but that was before taking the YOLOv5 name). I wonder what pjreddie would think of YOLOv5.
> Someone asked it to not be called YOLOv5 and their response was just awful [1]
I don't see any response by them at all. Do you mean the comment by WDNMD0-0? I can't see any reason to believe they're connected to the company, have I missed something?
I've not heard that one before either. Is it a reference to the Dark Tower? ("[he] has forgotten the face of his father") or did Stephen King borrow it from somewhere else?
This has been a punchline in China for many years, and I doubt it comes from English literature. I guess the meaning is similar (last name ~= name of the father).
Edit: obviously I should have googled Dark Tower first lol.
Also a slight edit, I wrote name initially. Of course in the books it's "face of his father", but it still sounds similar [1]. To admit to forgetting the face of one's father is to be deeply shameful, to accuse someone of it is insinuating they should be ashamed of themselves.
> Edit: Although as yeldarb explains in a comment here[3],
> it's probably a bit more complicated than that.
Legally speaking I'm not sure anything wrong was really done here.
Morally speaking, it seems quite unethical. AlexeyAB has really been carrying the torch of the Darknet framework and the YOLO neural network for quite some time (with pjreddie effectively handing it over to him).
AlexeyAB has been providing support on pjreddie's abandoned repository (e.g. [1]) and actively working on improvements in a fork [2]. If you look at the contributors graphs, he really has been keeping the project alive [3] (vs Darknet by pjreddie [4]).
Probably the worst part in my opinion is that they have also seemingly bypassed the open source nature of the project. This is quite damning.
So, the question I have is whether AlexeyAB got some sort of endorsement from pjreddie, or if they just took over the name by nature of being the most active fork? If it's the latter, ultralytics' actions don't seem quite as bad (although they still feel kind of off-putting, especially with how some of the responses to calls for a name change were formulated).
I guess given the info I have now, to me it boils down to whether there's precedent for the next version of the name to be taken by whoever is doing the work on it? If the original author never endorsed AlexeyAB (I don't know one way or another), then perhaps AlexeyAB should have changed the name but referenced or paid homage to YOLO in some way?
Eh, this is all starting to feel a bit too close to youtube drama for my liking.
I welcome forward progress in the field, but something about this doesn't sit right with me. The authors have an unpublished/unreviewed set of results and they're already co-opting the YOLO name (without the original author) for it and all of this to promote a company? I guess this was inevitable when there's so much money in ML but it definitely feels against the spirit of the academic research community that they're building upon.
It exports the data in yolo format (e.g. it has coordinates in yolo's [0..1] range), so it's straightforward to spit it out to disk and start a yolo training run on it.
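For anyone who hasn't worked with that format before: each image gets a .txt file with one line per box, "class x_center y_center width height", all normalized to [0..1]. A quick sketch of the conversion from pixel boxes (the function and field names here are just illustrative):

    # Convert a pixel-space box to a YOLO/darknet label line:
    #   <class_id> <x_center> <y_center> <width> <height>, all normalized to [0..1].
    def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
        x_c = (x_min + x_max) / 2 / img_w
        y_c = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

    # e.g. a 200x100 px box starting at (50, 80) in a 640x480 image, class 0:
    print(to_yolo_line(0, 50, 80, 250, 180, 640, 480))
    # -> "0 0.234375 0.270833 0.312500 0.208333"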
I don't think we could've paid human labelers to create tags that thorough or accurate.
All the tags for all experiments can be grabbed via https://www.tagpls.com/tags.json, so over time we hope the site will become more and more valuable to the ML community.
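If you're scripting against it, something like this (plain Python, nothing tagpls-specific beyond the URL) fetches the dump once and caches it locally rather than re-downloading it on every run:

    import json
    import os
    import urllib.request

    # Download the full tag dump once and cache it; it's only a few MB, but egress is what costs money.
    CACHE = "tags.json"
    if not os.path.exists(CACHE):
        urllib.request.urlretrieve("https://www.tagpls.com/tags.json", CACHE)

    with open(CACHE) as f:
        tags = json.load(f)

    print(f"{len(tags)} top-level entries")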
tagpls went from 50 users to 2,096 in the past three weeks. The database size also went from 200KB a few weeks ago to 1MB a week ago and 2MB today. I don't know why it's becoming popular, but it seems to be.
I'm a bit worried about the bill. It's up to $50 and rising: https://imgur.com/ZgmXsWU almost entirely egress bandwidth. Be gentle with those `curl` statements. :)
(I think that's due to a poor architectural decision on my part, which is solvable, and not due to egress bandwidth via the API endpoint. But it's always fun to see a J curve in your bill... It's about $1 a day right now. https://imgur.com/4gUTLO7)
Can you set it up so that it's only available via cloud? I'm sure that would bother people, but is a better alternative to losing access or you going broke :)
We're motivated to keep this as open as possible. I really like the idea of an open dataset that continues to grow with time. If it keeps growing, then within a couple years it should have a vast quantity of tags on a variety of diverse datasets, which we hope might prove helpful.
Thanks! We've decided to license the data as CC-0. We'll add that to the footer.
We don't host any images directly – we merely serve a list of URLs (e.g. https://battle.shawwn.com/tfdne.txt). But any data served via the API endpoints is CC-0.
I need a dataset and tags for hair, face, neck, arms, left breast, right breast, nipple, torso. Any tips? I'm training a GAN, but I need to specifically segment the parts, as I don't want nipples in the middle of a face. I don't want to have to manually annotate 1,000 images
Those are also drawings/anime, not photos. We have an /r/pics experiment (SFW, 99 tags https://www.tagpls.com/exp?n=r-pics) and /r/gonewild (NSFW, 57 tags https://www.tagpls.com/exp?n=r-gonewild) but currently I haven't gathered enough urls to be very useful -- it only scrapes about 100 or so images every half hour. So there is a lack of tags right now on human photos. We also have a pps experiment (NSFW, exactly what you think it is, 306 tags https://www.tagpls.com/exp?n=pps) but I assume that's not quite what you were looking for.
I love that it's porn (and specifically furry/hentai) which pushes the limits of image recognition and creativity within computer vision. Between this and the de-censoring tool "DeepCreamPy", I can't look most data scientists in the face anymore.
that's a great name, turning jagged edges back to smooth and applying reverse Gaussian blur /s
on a serious note, it's kind of interesting to think about the authenticity/accuracy if it's just filled in... e.g. turning black and white pictures back to color: was it actually green or blue?
Yeah, I mean, the tagging is awesome, but I'm thinking I'll need more image segmentation than object recognition. With a segmentation map, I can make a great image->image translator.
It looks like an HN user on an EC2 server decided to fetch data from our firebase as quickly as possible, running up a $3,700 bill. Once (or if) that's sorted out, and once we verify tagpls can handle HN's load without charging thousands of dollars, we'll add an "about" page to tagpls and submit it.
The idea with the site is that you can tag your own datasets, and then get the data suitable for yolo training. We've done that ourselves to train an anime hand detector, and other users have reported similar successes. I could've been a bit clearer about that.
Has anyone (beyond maybe self-driving software) tried using object tagging as a way to start introducing physics into a scene? E.g. human and bicycle have same motion vector, increases likelihood that human is riding bicycle. Bicycle and human have size and weight ranges that could be used to plot trajectory. Bicycles riding in a straight line and trees both provide some cues as to the gravity vector in the scene. Etc. etc.
Seems like the camera motion is probably already solved with optical flow/photogrammetry stuff, but you might be able to use that to help scale the scene and start filtering your tagging based on geometric likelihood.
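Toy sketch of the kind of check I mean (the track format, thresholds, and function name are all made up for illustration): if a person track and a bicycle track share roughly the same motion vector, bump the prior on a "riding" relation.

    import numpy as np

    # Each track carries a label and a recent per-frame displacement (dx, dy) in pixels.
    # All names and thresholds here are illustrative, not from any real tracker.
    def likely_riding(person_track, bike_track, cos_thresh=0.9, speed_ratio=0.75):
        v_p = np.array(person_track["velocity"], dtype=float)
        v_b = np.array(bike_track["velocity"], dtype=float)
        n_p, n_b = np.linalg.norm(v_p), np.linalg.norm(v_b)
        if n_p < 1e-6 or n_b < 1e-6:
            return False  # one of them is stationary; no motion evidence either way
        cos_sim = v_p @ v_b / (n_p * n_b)         # moving in the same direction?
        ratio = min(n_p, n_b) / max(n_p, n_b)     # at a similar speed?
        return cos_sim > cos_thresh and ratio > speed_ratio

    person = {"label": "person", "velocity": (12.0, -1.0)}
    bike = {"label": "bicycle", "velocity": (11.5, -0.8)}
    print(likely_riding(person, bike))  # True: same direction, similar speed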
The idea of hierarchical reference frames (outlined a bit by Jeff Hawkins here https://www.youtube.com/watch?v=-EVqrDlAqYo&t=3025 ) seems pretty compelling to me for contextualizing scenes to gain comprehension. Particularly if you build a graph from those reference frames and situate models tuned to the type of object at the root of each frame (vertex). You could use that to help each model learn, too. So if a bike model projects a 'riding' edge towards the 'person' model, there wouldn't likely be much learning. e.g. [Person]-(rides)->[Bike] would have likely been encountered already.
However if the [Bike] projects the (rides) edge towards the [Capuchin] sitting in the seat, the [Capuchin] model might learn that capuchins can (ride) and furthermore they can (ride) a [Bike].
I've been turning over these same thoughts for years. I don't do much work in the neural network subfield, but I have done a lot with computer vision, and I always found myself wanting more robust physical estimation techniques that didn't require external data.
Yeah, I wish the flagship phone manufacturers would put the hardware back into the phone to take 3D photos... even better if you can get point cloud data to go with it. The applications right now are kind of cheesy, but they will get better, and if the majority of photos taken pivot to including depth information I think it could really drive better capabilities from our phones.
Eyes are very hard to make and coordinate, yet there are almost no cyclops in nature.
In theory you could also do this with visual-inertial odometry, e.g. monocular SLAM. But this is definitely something we're looking at in my group (I do CV for ecology), especially for object detection where geometry (absolute size) is a good way to distinguish between two confusing classes. A good candidate here is aerial imagery. If you've calibrated the camera and you know your altitude, then you know your ground sample distance (m/px).
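The arithmetic is just the pinhole model; as a rough worked example (the sensor/lens numbers below are typical 1"-sensor drone values, not any particular camera):

    # Ground sample distance for a nadir-pointing camera, simple pinhole model.
    sensor_width_mm = 13.2      # illustrative 1" sensor
    focal_length_mm = 8.8
    image_width_px = 5472
    altitude_m = 100.0

    gsd_m_per_px = (sensor_width_mm / 1000) * altitude_m / ((focal_length_mm / 1000) * image_width_px)
    print(f"{gsd_m_per_px * 100:.2f} cm/px")   # ~2.74 cm/px

    # So an object spanning 40 px is roughly 40 * gsd, i.e. about 1.1 m across --
    # often enough to separate two visually similar classes of very different size.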
Most flagships can do this though; any multicamera phone can get some kind of stereo. Google do it with the PDAF pixels for smart bokeh (they have some nice blog posts about it). I don't know if there is a way to do that in an API though (or to obtain the depth map).
I work mostly with RGB/Thermal, if that counts. My PhD was in stereo/lidar fusion, so I've always been into mixing sensors :)
I've also done some work on satellite imaging which is 13-band (Sentinel 2). Lots of people in ecology use the Parrot Sequoia which is four-band multispectral. There really isn't much published work in ML beyond RGB, which I find interesting - yes there's RGB-D and LIDAR but it's mostly for driving applications. Part of the reason I'm so familiar with the yolo codebases is that I've had to modify them a lot to work with non-standard data. There's nothing that stops you from using n-channel images, but you will almost certainly have to hack every off the shelf solution to make it work. RGB and 8-bit is almost always hard coded, augmentation also often fails with non RGB data (albumentations is good though). A bigger issue is there's a massive lack of good labelled datasets for non rgb imagery.
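For what it's worth, the model-side change is usually tiny; it's the hard-coded 3-channel/8-bit assumptions in loading and augmentation that fight you. A minimal PyTorch sketch of widening a 3-channel first conv to n channels (generic nn.Conv2d, not any particular yolo repo's layer names):

    import torch
    import torch.nn as nn

    def widen_first_conv(conv: nn.Conv2d, in_channels: int) -> nn.Conv2d:
        """Replace a 3-channel first conv with an n-channel one, copying the RGB
        filters and initializing the extra channels with their mean."""
        new_conv = nn.Conv2d(in_channels, conv.out_channels, conv.kernel_size,
                             stride=conv.stride, padding=conv.padding,
                             bias=conv.bias is not None)
        with torch.no_grad():
            new_conv.weight[:, :3] = conv.weight
            if in_channels > 3:
                extra = conv.weight.mean(dim=1, keepdim=True)
                new_conv.weight[:, 3:] = extra.repeat(1, in_channels - 3, 1, 1)
            if conv.bias is not None:
                new_conv.bias.copy_(conv.bias)
        return new_conv

    old = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
    new = widen_first_conv(old, in_channels=5)        # e.g. RGB + thermal + NIR
    print(new(torch.rand(1, 5, 640, 640)).shape)      # torch.Size([1, 32, 640, 640])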
On the plus side, in a landscape where everyone is fighting over COCO, there is still a lot of low hanging fruit to pick I think.
I've not done any hyperspectral, very hard to (a) get labelled data (there's AVIRIS and EO-1/Hyperion maybe) (b) it's very hard to label, the images are enormous and (c) the cameras are stupid expensive.
By the way, even satellite imaging ML applications tend to overwhelmingly use just the RGB channels and not the full extent of the data.
Whoa that's awesome! Love hearing contemporary technology used to detect/diagnose/monitor the environment and our ecological impact. Boots on ground will always be important but the horizontal scaling you can get out of imaging I would imagine really helps prioritize where you turn your attention. Thanks for the info and best of luck!
There seems to be an unfair comparison between the various network architectures. The reported speed and accuracy improvements should be taken with a bit of scepticism for two reasons.
* This is the first YOLO released natively in PyTorch. PyTorch is among the fastest ML frameworks around, so some of YOLOv5's speed improvements may be attributable to the platform it was implemented on rather than to actual scientific advances. Previous YOLOs were implemented in Darknet, and EfficientDet is implemented in TensorFlow. It would be necessary to train them all on the same platform for a fair speed comparison.
* EfficientDet was trained on the 90-class COCO challenge (1), while YOLOv5 was trained on 80 classes (2).
Great points, and we're hoping Glenn releases a paper to complement the performance claims. We are also planning more rigorous benchmarking regardless.
re: PyTorch being a confounding factor for speed - we converted YOLOv4 to PyTorch to achieve 50 FPS. Darknet would likely top out around 10 FPS on the same hardware.
In February 2020, PJ Reddie noted he would discontinue research in computer vision.
He actually stopped working on it because of ethical concerns. I'm inspired that he made this principled choice despite being quite successful in this field.
Yeah, they made the most popular PyTorch implementation of YOLOv3 as well so they're not entering out of the blue, though. https://github.com/ultralytics/yolov3
The author of YOLOv3 quit working on Computer Vision due to ethical concerns. YOLOv4, which built on his work in v3, was released by different authors last month. I'd expect more YOLOvX's from different authors in the future. https://twitter.com/pjreddie/status/1230524770350817280
Latency is measured at batch=32 and then divided by 32? That means a full batch actually takes 500 milliseconds to process, which is not the latency you'd actually see for a single frame.
I have never seen a more fake comparison.
Why benchmark using 32-bit FP on a V100? That means it’s not using tensor cores, which is a shame since they were built for this purpose.
There’s no reason not to benchmark using FP16 here.
If you click around enough you'll see they benchmarked in 32-bit FP. Glad they have a mixed precision training option, but I really think it's a mistake in 2020 to do work related to efficient inference using 32-bit FP.
The problem is that your conclusions aren't independent of this choice. A different network might be far better in terms of accuracy/speed tradeoffs when evaluated at a lower precision. But there is no reason to use 32-bit precision for inference, so this is just a big mistake.
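For anyone wondering what the switch even looks like on the inference side in PyTorch, it's roughly this (load_detector is a placeholder for however you load the model; torch.cuda.amp is the other common route if you want to keep FP32 weights):

    import torch

    # Half-precision inference sketch: on a V100 this routes convs/matmuls through the
    # tensor cores rather than the FP32 CUDA cores.
    model = load_detector().to('cuda').eval().half()          # load_detector is a placeholder
    img = torch.rand(1, 3, 640, 640, device='cuda').half()

    with torch.no_grad():
        out = model(img)   # same forward pass, roughly half the memory, usually much higher throughput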
Yeah, it's pretty unethical. Looks like they just stole the name without any care. There doesn't seem to be any relationship between these guys and the original YOLO group.
If it's not trademarked, perhaps not much? I think it's pretty misleading, but the fight for attention is on! Using an established brand in your title will get more clicks.
I really like the work done by AlexeyAB on darknet YOLOv4 and by the original author Joseph Redmon on YOLOv3. These guys deserve a lot more respect than any other version of YOLO.
This is not the first time something has been fishy. Back in the early stages of the repo, they were advertising on the front page that they were achieving similar mAP to the original Darknet version, only for it to come out that they hadn't trained and tested it on the COCO dataset.
YOLO is the neural network; Darknet is the framework. Without both YOLOv4 and "YOLOv5" running on the same framework, it's nearly impossible to make any kind of meaningful comparison.
I am very interested in loading YOLO onto a Raspberry Pi + Coral.ai - does anyone know a good tutorial on how to get started? I tried before, and with Darknet it was not easy at all, but now with PyTorch there seem to be ways of loading that onto Coral. I am familiar with Raspberry Pi dev, but not so much with ML or TPUs, so I think it'd mostly be a tutorial on bridging the different technologies.
(might need to wait a couple of months since this was just released)
Hm, on this page it has something written in an eastern language under YOLO; https://github.com/ultralytics says Madrid, Spain, but then they say "Ultralytics is a U.S.-based particle physics and AI startup".
Just recently IBM announced with a loud PR move that the company is getting out of the face recognition business. Guess what? Wall Street doesn't want to keep subsidizing IBM's subpar face recognition technology when open source and Google solutions are pushing the state of the art.
Not something to brag about. Facial recognition has very few applications outside of total surveillance. We should not respect those who lend it their time and effort.
It's not exclusive. Bad actors are working on whatever they are paid to build, by other bad actors with less technical acumen and more money.
Edit: I should add that most of the actual progress is being made by smart people who think it's an interesting problem and are unaware or uncaring of the clear outcome of such tech.
Being able to distinguish between people is pretty foundational to being able to personalize AI applications. If you wanted to make a smart home actually smart and not just full of inconvenient remote controlled appliances, this is pretty necessary.
There are obviously privacy concerns with this example, it’d ideally be fully on-prem.
>Facial recognition has very few applications outside of total surveillance.
That's not really for you to decide, is it? You're absolutely free to have that opinion of course.
>We should not respect those who lend it their time and effort.
Also your choice of course. Facial recognition is essentially a light integration of powerful underlying technologies. Should 'we' ostracize those working on machine learning, computer vision, network and distributed computing, etc?
The question is always the same: is all technical/scientific progress desirable? But it seems that this question isn't asked anymore - "move fast and break things", am I right?
I'm much more worried about people using your arguments to try and shut down the discussion than about people trying to open the debate, because once the Pandora's box of mass surveillance and mass adoption of face recognition is open, there won't be any way to go back.
When I see Predator drones and FBI Stingray planes above every major US city during protests, I already know we're not going in the "let's talk about this before reaching the point of no return" direction.
Once the tech is out there it's simply a question of "when" it will be used for borderline illegal activities, especially in the US where you have these different entities (FBI, CIA, NSA, DEA, &c.) basically acting in their own bubble and doing whatever they want until it's leaked and/or gets outrageous enough to draw the public's attention.
I mean, there were unidentified armed forces marching in US streets last week; if people don't see this as the biggest red flag in recent US history, I don't know what they need.
You didn't really address the author's point, which was that there don't appear to be compelling uses of facial recognition technology beyond mass automated surveillance.
I can't think of other uses and I'd be interested if you can come up with some.
> 2) ensure candidate X is actually candidate X and not a paid person to take the exam in name of candidate X
Can you imagine the bureaucratic nightmare that would be unleashed upon yourself if "the system" decides you aren't who you say you are because of the way you aged, an injury, surgery or a few new freckles?
This already happens sometimes with birth certificates and identity theft, and it's awful for those who have to experience it. I'd hate to have a black box AI inflicting that upon others for inexplicable reasons.
Biometric authentication is one that comes to mind. Facial recognition running locally on my own photo library would also be useful for organizing photos. A cloud-free local-only home automation system that can tell the difference between owners/housemates/guests and customize behavior accordingly would also be nice.
I'm looking into YOLO for this, but it's more to verify that your selfie matches the image on the document, and we want to avoid sending highly sensitive information to third party providers.
The current service we use, while accurate, costs 50 cents per verification...
Edit: reading through this thread, if the model isn't super massive, we could offer on-browser verification! 27MB is still a hefty download though.
> Facial recognition is essentially a light integration of powerful underlying technologies. Should 'we' ostracize those working on machine learning, computer vision, network and distributed computing, etc?
Couldn't you argue the same way against just about any kind of IED or booby trap? Yet people tend to ostracize those who make them more than they do people who make ball bearings and nails.
> Someone asked for it not to be called YOLOv5 and their response was just awful [1]. They also blew off a request to publish a blog/paper detailing the network [2].
I filed a ticket to get to the bottom of this with the creators of YOLOv4: https://github.com/AlexeyAB/darknet/issues/5920
[1] https://github.com/ultralytics/yolov5/issues/2
[2] https://github.com/ultralytics/yolov5/issues/4