Show HN: TechYaks – Best of 50k tech talks ranked by confidence intervals

yaj54 · on Sept 14, 2018

Hey HN, here’s the result of gathering as many tech talks as I could find and then trying a bunch of ranking heuristics to find one that produced reasonable results. I’m currently using the lower bound of each talk’s Wilson score confidence interval based on likes and dislikes.
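
For reference, here is a minimal sketch of the Wilson lower bound as it is usually written for like/dislike data. The function name, the `z=1.96` default (roughly a 95% interval) and the sample talks are assumptions for illustration, not the site's actual code.

```python
import math


def wilson_lower_bound(likes, dislikes, z=1.96):
    """Lower bound of the Wilson score interval for the fraction of likes.

    z=1.96 corresponds to roughly a 95% interval (an assumed default).
    """
    n = likes + dislikes
    if n == 0:
        return 0.0  # no votes: nothing to be confident about
    p_hat = likes / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


# Hypothetical talks as (title, likes, dislikes).
talks = [("A", 10, 0), ("B", 500, 20), ("C", 2, 0)]
ranked = sorted(talks, key=lambda t: wilson_lower_bound(t[1], t[2]), reverse=True)
```

With these made-up numbers, B (500 likes, 20 dislikes) ranks above A (10 likes, 0 dislikes), which in turn ranks above C (2 likes, 0 dislikes): a perfect ratio on very few votes earns less confidence than a slightly imperfect ratio on many.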

I find good tech talks to be a combo of entertainment and broadening my toolbox of programming concepts. When I hit a good talk I generally do a double take, “wha, I have not thought that way before.” There are definitely some good talks in this list.

There are a few great “Awesome Talks” lists that I’ve enjoyed perusing, but I’ve found that talk title intrigue does not seem to correlate with talk quality. So in lists of 50+ talks I have a hard time finding the “next best talk”.

I’m keen to get feedback on the site as is, but also on whether there is interest in a “top tech talks in the last month” (or X unit of time) style of digest.

Hopefully there is a talk in here that gives you a double take.

Enjoy, ~yaj

e12e · on Sept 14, 2018

Maybe these are mostly too new, or you have a different (more practical, hands-on?) definition of tech talk - but from a quick look the only speakers I expected - and found - were Sandi Metz and Rob Pike.

If it's practical, I'm surprised not to see the js "wat" lightning talk (which I now can't seem to find...).

If it's more general "best of", I'd expect something like Guy Steele "growing a language" : https://youtu.be/_ahvzDzKdB0

Douglas Engelbart "the mother of all demos": https://youtu.be/yJDv-zdhzMY

Alan Kay "doing with images makes symbols": https://youtu.be/p2LZLYcu_JY Or, if that's too long, the much more condensed ted talk: "a powerful idea about teaching ideas": https://youtu.be/Eg_ToU7m1MI (Maybe that's not a "tech talk"?)

Rich Hickey "simple made easy" : https://youtu.be/34_L7t7fD_U

To name a few off the top of my head.

yaj54 · on Sept 14, 2018

You're right those are all great talks (and fit in my definition of a tech talk). I just checked, and none of them are in my dataset, which I'll admit I'm surprised about. But they (and related ones) will make it into the next round.

Tuna-Fish · on Sept 15, 2018

The issue seems to be that they are not typically watched on youtube. For example, the "simple made easy" linked above is a low-quality pirate youtube copy, the proper place to watch it is here:

https://www.infoq.com/presentations/Simple-Made-Easy

bcbrown · on Sept 14, 2018

> js "wat" lightning talk (which I now can't seem to find...)

https://www.destroyallsoftware.com/talks/wat

lgregg · on Sept 15, 2018

I really enjoyed that. Thanks for finding it.

O_H_E · on Sept 15, 2018

Just watched Sandi Metz's talk because of your comment, and it sucks to be forced to go study biology/sociology/literature after this ;)

chrisweekly · on Sept 14, 2018

>"interest in a “top tech talks” in the last month (or X unit of time) style of digest."

Yes!

svl · on Sept 14, 2018

Is your dataset limited to only videos on youtube? The Fronteers conference has been publishing its videos on vimeo, and that includes some really "awesome" ones: https://vimeo.com/fronteers/videos/sort:plays

yaj54 · on Sept 14, 2018

It is currently, but I'd like it not to be. One issue is my current ranking alg uses both likes and dislikes, but Vimeo only does likes, so I can't currently cross-compare between youtube and Vimeo without switching up my ranking alg. I'm curious about trying a version of my current alg that just uses likes and views though, which would be more portable.

nerdponx · on Sept 14, 2018

You could also use "net promoter" scoring: fraction of likes minus fraction of dislikes. I don't think it has any theoretical basis but the NPS system [0] is fairly popular.

[0]: https://www.netpromoter.com/

ekianjo · on Sept 15, 2018

NPS is one of the worst metrics I have ever seen in analytics. It especially sucks when you try to make predictive models to understand what drives NPS.

LittlePeter · on Sept 14, 2018

Note that in NPS there is a third category: the passives. In the case of likes there are only two.

yaj54 · on Sept 14, 2018

I'm not very familiar with NPS but I also have view count, so passives could be considered to be (view_count - likes - dislikes) right?

LittlePeter · on Sept 14, 2018

Yeap! Views are also done by promoters and detractors. Assuming one view per person you could get number of passives as: passives = views - promoters - detractors. Once you have passives, you can compute the NPS.
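
A rough sketch of that calculation, assuming one view per person as stated above. The function name is made up for illustration, and treating likes as promoters and dislikes as detractors is the thread's analogy rather than the official NPS survey method.

```python
def net_promoter_score(likes, dislikes, views):
    """NPS-style score: percent of promoters minus percent of detractors.

    Assumes one view per person, so passives = views - likes - dislikes.
    """
    if views <= 0:
        return 0.0
    # Guard against view counts that lag behind the vote counts.
    passives = max(views - likes - dislikes, 0)
    total = likes + dislikes + passives
    return 100.0 * (likes - dislikes) / total


# Hypothetical video: 50 likes, 10 dislikes, 1000 views.
example = net_promoter_score(50, 10, 1000)
```

Because most viewers never vote, passives dominate the total, so the score stays close to zero and mostly reflects how large a share of viewers bothered to like the video.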

comboy · on Sept 14, 2018

Do you know of some generalization that instead of just positive and negative ratings would work with real numbers? E.g. rating could be anything between 0 and 1.

Great job btw.

yaj54 · on Sept 14, 2018

What I know about ranking comes from Evan Miller's blog. Here's a post that deals with star ratings: https://www.evanmiller.org/ranking-items-with-star-ratings.h...

jules · on Sept 14, 2018

Have you tried the simple Bayesian approach with a Beta prior? [1] I'd be interested to learn how it does.

  pretend_upvotes = 4
  pretend_downvotes = 4

  def score(item_upvotes, item_downvotes):
    upvotes = item_upvotes + pretend_upvotes
    downvotes = item_downvotes + pretend_downvotes
    return upvotes / float(upvotes + downvotes)

[1] http://julesjacobs.github.io/2015/08/17/bayesian-scoring-of-...

yaj54 · on Sept 14, 2018

Interesting, I like the simplicity of that. Do you have any info on how to determine good initial values for the prior, i.e. good values for pretend_upvotes and pretend_downvotes in this example? Would it make sense to use average_upvotes and average_downvotes, or values that have that ratio?

jules · on Sept 14, 2018

Values that have that ratio might be good, but I'm not sure about the magnitude because maybe the average number of votes is too high so that the prior overwhelms the data. The scores get pulled towards that ratio as you increase the magnitude. If the ratio is close to 0 it has the effect of downranking videos with few votes, and if the ratio is close to 1 it has the effect of upranking videos with few votes. The effect might be too strong if you use the average magnitude. It might also be good to set the ratio a bit lower than the average ratio if you want to rank conservatively.

Parametrising it like you suggest might make it easier to experiment:

  ratio = 0.5
  number = 100
  pretend_upvotes = ratio*number
  pretend_downvotes = (1-ratio)*number

You could even set ratio to 0, but I actually think it makes sense to rank 1 up / 2 down above 101 up / 200 down, because the latter is definitely bad whereas the former might be good.
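
To make that comparison concrete, here is a small sketch that wraps the parametrised prior in a helper. `make_scorer` is a made-up name, and `number=100` is just the example value from above; `number` is assumed to be greater than 0 so that an item with no votes does not divide by zero.

```python
def make_scorer(ratio, number):
    """Build a score function with a Beta-style prior of `number` pretend votes.

    `ratio` is the fraction of pretend votes that are upvotes.
    Assumes number > 0.
    """
    pretend_upvotes = ratio * number
    pretend_downvotes = (1 - ratio) * number

    def score(item_upvotes, item_downvotes):
        upvotes = item_upvotes + pretend_upvotes
        downvotes = item_downvotes + pretend_downvotes
        return upvotes / (upvotes + downvotes)

    return score


neutral = make_scorer(ratio=0.5, number=100)
pessimistic = make_scorer(ratio=0.0, number=100)

# ratio = 0.5: 1 up / 2 down (about 0.495) beats 101 up / 200 down (about 0.377).
# ratio = 0.0: the order flips, about 0.0097 versus about 0.252.
```

So with a neutral prior the barely-voted item keeps the benefit of the doubt, while a ratio of 0 treats every unseen vote as a downvote and pushes it to the bottom.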

achompas · on Sept 15, 2018

You can either estimate the prior as part of a hierarchical model, or use empirical Bayesian estimation. I spoke last year about an example of EBE applied to music trends:

https://mobile.twitter.com/achompas/status/88732699382138880...
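
One common way to do the empirical Bayes step is a method-of-moments fit of a Beta prior to the like fractions already in the dataset. The sketch below shows that general technique, not necessarily what the linked talk used; `fit_beta_prior` and the `min_votes` cutoff are assumptions for illustration. The cutoff is there because fractions from talks with very few votes are noisy and would inflate the variance.

```python
from statistics import mean, pvariance


def fit_beta_prior(talks, min_votes=20):
    """Method-of-moments estimate of Beta(alpha, beta) from (up, down) pairs.

    Only talks with at least `min_votes` votes are used (an assumed cutoff).
    """
    fractions = [up / (up + down) for up, down in talks if up + down >= min_votes]
    if len(fractions) < 2:
        raise ValueError("need at least two talks with enough votes")
    m = mean(fractions)
    v = pvariance(fractions)
    if v == 0 or v >= m * (1 - m):
        raise ValueError("variance is not compatible with a Beta distribution")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Hypothetical (upvotes, downvotes) pairs.
alpha, beta = fit_beta_prior([(90, 10), (80, 20), (95, 5), (70, 30)])
```

The resulting `alpha` and `beta` can stand in for pretend_upvotes and pretend_downvotes, so both the ratio and the magnitude of the prior come from the data instead of being picked by hand.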

thanatropism · on Sept 14, 2018

You probably want a Beta distribution, which has finite support in [0,1] and also doubles as the conjugate prior for the binomial.
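
A hedged sketch of how that could look for ratings in [0, 1]: treat a rating r as r of an upvote and 1 - r of a downvote and add those to the prior's pseudo-counts. For fractional ratings this is a shrinkage heuristic rather than an exact conjugate update, and the names and default prior (8 pretend votes centred on 0.5) are assumptions chosen to match the 4 + 4 pretend votes in the example above.

```python
def shrunk_mean(ratings, prior_mean=0.5, prior_weight=8.0):
    """Mean rating pulled towards `prior_mean` by `prior_weight` pretend votes.

    Each rating must lie in [0, 1]; r counts as r upvotes and 1 - r downvotes.
    """
    if any(r < 0 or r > 1 for r in ratings):
        raise ValueError("ratings must be between 0 and 1")
    alpha = prior_mean * prior_weight + sum(ratings)
    beta = (1 - prior_mean) * prior_weight + sum(1 - r for r in ratings)
    return alpha / (alpha + beta)


# With only 0/1 ratings this reduces to the upvote/downvote score:
# two upvotes and one downvote give (4 + 2) / (8 + 3).
binary_case = shrunk_mean([1, 1, 0])
```

An item with no ratings scores exactly `prior_mean`, and the score moves towards the item's own average as ratings accumulate.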

comboy · on Sept 14, 2018

The good stuff is here[1]. But it sucks that the paper[2] is from 1927 and I still have to pay to get access to it..

1. https://www.evanmiller.org/how-not-to-sort-by-average-rating...

2. Probable Inference, the Law of Succession, and Statistical Inference Edwin B. Wilson

ssl232 · on Sept 14, 2018

While I wholeheartedly agree about academic paywalls sucking, the paper PDF was literally the first result on DuckDuckGo [1, 2], at least for me!

[1] https://duckduckgo.com/?q=Probable+Inference%2C+the+Law+of+S...

[2] http://www.barestatistics.nl/uploads/1/1/7/9/11797954/wilson...

Agnu · on Sept 14, 2018

The paper is available on Sci-Hub.

dmos62 · on Sept 14, 2018

Also Library Genesis (http://libgen.io/) should be on everyone's list of places where papers (and books) can be found for free.

yesenadam · on Sept 15, 2018

http://libgen.io/scimag/index.php?s=Probable%20Inference,%20...

aviv · on Sept 14, 2018

Too many people get hung up on the perfect scalable tech stack. Most people don't need to waste more time watching another tech talk. What they need is more business skills and how to make money.

th0ma5 · on Sept 14, 2018

I was about to complain that my PyOhio video about dot matrix printers was missing, but it is in the 2-month Python section.

So now that I'm ashamed of my ego... there were several talks that were much better from that conference and it is a shame they didn't get more views. Mine was picked up by Hack-a-Day so it got a boost, but many of the other talks were better in all kinds of measures like amount of content, social relevancy, etc... It sucks that we still don't have a good system other than view count!

Great project, however, well done regardless.

sweezyjeezy · on Sept 14, 2018

The disguise on your self-plug is pretty thin man...

bastijn · on Sept 15, 2018

He didn't link it. That's bonus to the stealth check. The DM approves.

robax · on Sept 14, 2018

This looks awesome! I haven’t heard of most of these talks so I’m excited to dive in this weekend.

Also, kudos on the UI. It’s minimal, easy to use, and works as advertised. Good software!

O_H_E · on Sept 15, 2018

> It’s minimal, easy to use, and works as advertised. Good software

It sucks that these are rare nowadays.

Maybe I could start a suckless list :)

mongol · on Sept 14, 2018

Nice idea. Would like to see it extended to broader topics, for example talks about history.

garysieling · on Sept 14, 2018

I built something with a broader dataset but a different ranking technique, basically a bunch of cruder custom ranking rules.

https://www.findlectures.com

mongol · on Sept 14, 2018

Seems quite popular right now!

garysieling · on Sept 15, 2018

Oops, back up now!

mongol · on Sept 18, 2018

The links in the search results don't work on Android (neither Chrome nor Firefox).

yaj54 · on Sept 14, 2018

Do you mean history outside of tech? Or history of (software / CS) tech? A lot of times my favorite talks are historical perspective (tech) talks, so I would also like to get more of those in there. A "history" topic filter would be sweet even on the talks I'm already searching but categorizing by topic is not something I've tackled here yet.

mongol · on Sept 14, 2018

I mean "regular history". About ancient civilizations, bronze age collapse, Napoleon, the Russian revolution, things like that. It is a personal preference for me to listen to such talks, but I think there could be interest in many other topics as well. I have come to the conclusion that apart from literature, talks in academic settings are more likely to be interesting, compared with documentaries produced for television.

yaj54 · on Sept 14, 2018

Cool. The hard part is collecting the list of relevant talks; the easy part is ranking them. If there is a way to get the right set of academic history talks it wouldn't be that hard to go from there.

I haven't watched many historical lectures, but I do enjoy documentaries and historical fiction. Another idea I've kicked around is a site that organizes a bunch of historical documentaries and historical fiction into a geo-timeline visualization, and includes the ability for people to comment on and discuss the historical accuracy of the films. But that's a different project...

christudor · on Sept 15, 2018

What kind of stuff do you listen to at the moment? Where do you go to find these kinds of talks?

angel_j · on Sept 14, 2018

One thing I wonder about ratings is, are you measuring a user's individual interests, or whether they think the content is good / bad / shareable?

And what do the users think they are ranking?

Personally, I'd rather be ranking for what interests me, so that I get more of that; but I feel most networks are trying to extract a different signal, and that this produces crappy recommendations and useless rankings.

icc97 · on Sept 14, 2018

Excellent set of links, plus I love the simple UI.

wolco · on Sept 14, 2018

This is great. The formula to determine the best is going to miss the most recent talks if likes/dislikes are what is used.

yaj54 · on Sept 14, 2018

True-ish. Since the formula uses confidence intervals[0] it does better with recent talks than a simple (likes - dislikes). But you're right, breaking into "all time" status is tough because there are a number of talks with many hundreds of likes and 1 or 2 dislikes.

0: https://www.evanmiller.org/how-not-to-sort-by-average-rating...

platz · on Sept 14, 2018

It's telling that 95% of the most upvoted talks from 'all time' are all after 2010

yaj54 · on Sept 14, 2018

I'll admit, "all time" is slightly inaccurate. Actually, now I'm curious how far back my data actually goes.

platz · on Sept 14, 2018

In fairness, YouTube hasn't been around forever either, so I'm not sure the difference between the upload date and the talk date is actually surfaceable

DerSaidin · on Sept 14, 2018

Would be awesome if they were tagged, so I could look at topics I'm interested in.

zerr · on Sept 14, 2018

Any 70s/80s/90s talks?

angersock · on Sept 14, 2018

Uncle Bob's stuff missing on purpose?

https://www.youtube.com/watch?v=WpkDN78P884 was kinda a classic for the Rails community.

yaj54 · on Sept 14, 2018

Not on purpose at all, I've learned a lot from Uncle Bob. 50k talks is certainly not all of the talks ever, and there are many good talks not included in the 6 lists I've generated (which is a total of only about 230 talks). I'd like to add more filters to be able to home in on top talks for different contexts.

suyash · on Sept 14, 2018

What is "50k"? What are your criteria for selecting talks?

yaj54 · on Sept 14, 2018

50k is 50,000. Which is a lot of talks, but certainly not all of them. My abstract criteria for a talk being considered a "tech talk" is a live lecture given from the front of a room to a live multi-person audience with a subject matter related to computing. The 50k is just the number my hacky talk finder script found.

suyash · on Sept 14, 2018

So your pool is limited to 50k talks?

dstick · on Sept 14, 2018

Nice! Is it possible to implement some kind of search based filter?

Just a simple string match would be awesome :)

yaj54 · on Sept 14, 2018

Not at the moment on the site, but there is a lot of potential there for sure. I do have that capability on my local copy of the data and it's fun.

icc97 · on Sept 14, 2018

If it's open source they could use algolia.

rb808 · on Sept 14, 2018

awesome awesome. I've given up going to meetups and conferences, looking at youtube is so much better for finding good content.

I'd love other languages and software topics too.

drexlspivey · on Sept 15, 2018

That David Beazley talk at no. 6 is sick. Dude is a wizard.

ILikeConemowk · on Sept 15, 2018

Nice!

Which sources did you focus on?