Wiby is amazing and one of my favorite ways to discover the more unique side of the internet. I love to kill time by clicking "Surprise me", and half the RSS feeds I subscribe to came from it!
Thank you so much for making Wiby! I have truly enjoyed what it's brought!!
The very first search brought up something very, very interesting for me [0] (bonus points for looks, of course). I love and miss the old internets. Happy to know there is Wiby! (I use Millionshort and Marginalia.)
Thank you. My index is puny compared to yours, and that is because I only index the pages submitted by guests and don't go any further. The index is in the tens of thousands. The hyperlink crawling feature was added and tested out specifically for the release, as I understand that some people will want to use it heavily to build up a much larger index. My computers were super cheap and handle my puny index well, because it's puny. I have no idea how well my approach compares to others in terms of performance.
This is really amazing, and it allows niche communities to build up "useful links" that can be easily searched, WITHOUT having to use outside search engines.
>There are several forms to control the search engine. There is no central form linking everything together, just a collection of different folders that you can rename if you want.
Maybe a good addition would be an /admin/ page that links to all of the admin functions.
This is awesome - thanks for sharing. I just so happen to have a domain left lying around for the Searx instance I never spun up, so I might end up using it for a weekend project with this.
Love Wiby, been playing with it awhile now. I especially enjoy the "surprise me" button from time to time.
The guide linked in the README is comprehensive and a good read; it also gives instructions on how to scale and distribute load.
If the author can answer: how big did the fulltext table become for x entries on wiby.me, and what is a common response time at N searches per minute for this dataset?
Would you offer a /traffic or /stats page within /about/? DuckDuckGo shows traffic, though not index stats.
I don't see it in the windex schema yet, but it would be interesting to know what the actual hit rate is on a search corpus and how many clickthroughs there are for any given search term. Answering these kinds of questions adds computation and record keeping, though.
Thanks for open-sourcing this; it's an interesting mix of languages and tools!
>how big did the fulltext table become for x entries on wiby.me
I want Wiby to consist mainly of human-submitted pages, so for 99% of the index, only the pages submitted by users are indexed and no further crawling is done. However, I recognized that lacking the ability to crawl through links would make it less useful for others, so I added in the crawling capability to my liking and tested it accordingly. I imagine others might want to depend heavily on hyperlink crawling for their use case, but there is a tradeoff in the quality of the pages that get indexed and the resources they require.
>and what is a common response time at N searches per minute for this dataset?
Hard to say exactly, as I haven't run many benchmarks, but my goal is to keep multi-word queries to within about a second. Single-word queries are very fast. My 4 computers handle hundreds of thousands of queries per day because Wiby is being barraged by a nasty spam botnet with thousands of constantly changing IPs. If I don't keep them in check, they will eventually eat up all the available CPU.
>Would you offer a /traffic or /stats page within /about/? DuckDuckGo shows traffic, though not index stats.
Probably not on mine since I don't get enough traffic for it to be of that much interest to me. I privately use goaccess to get a general idea of daily traffic.
I like this approach as a possible basis for a personal search engine that only contains stuff I have been looking at. For that it would be helpful to have some kind of browser extension that can autosubmit everything in my history (a rough sketch of the submission side follows this comment). Ideally that extension would also autoaccept every submission, so that it can work fully in the background without my intervention.
Also helpful would be a whitelist/blacklist feature: say, Wikipedia and Stack Overflow may always be autoaccepted, certain other sites may always be rejected, and the rest go through the regular review process.
Then I could use that as my default search engine and branch out when I don't find what I am looking for. For that it would also be cool if there were a way to search Wiby and another search engine in parallel and display, say, 5 results from each.
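To make the autosubmit idea concrete, here is a minimal sketch in C with libcurl that posts every URL read from stdin (e.g. a dump of browser history) to a self-hosted instance's submission form. The endpoint path (/submit/) and the form field name (url) are assumptions on my part; check the actual submission form in the repo for the real path and field name.

    /* Sketch: batch-submit URLs to a self-hosted Wiby instance.
       Hypothetical endpoint and field name; verify against the real form. */
    #include <stdio.h>
    #include <string.h>
    #include <curl/curl.h>

    int main(void) {
        char line[2048];
        CURL *curl = curl_easy_init();
        if (!curl)
            return 1;

        /* One URL per line on stdin */
        while (fgets(line, sizeof(line), stdin)) {
            line[strcspn(line, "\r\n")] = '\0';   /* strip the newline */
            if (line[0] == '\0')
                continue;

            char *esc = curl_easy_escape(curl, line, 0);  /* URL-encode */
            if (!esc)
                continue;
            char post[4300];
            snprintf(post, sizeof(post), "url=%s", esc);
            curl_free(esc);

            curl_easy_setopt(curl, CURLOPT_URL, "http://localhost/submit/");
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post);
            if (curl_easy_perform(curl) != CURLE_OK)
                fprintf(stderr, "failed to submit: %s\n", line);
        }
        curl_easy_cleanup(curl);
        return 0;
    }

The whitelist/blacklist part would then live in the review step, auto-approving submissions whose host matches a trusted list and rejecting known-bad hosts before anything reaches the manual queue.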
For what it's worth, I ended up putting Marginalia Search behind Cloudflare to deal with what I assume is the same group. At worst I saw 30k queries per hour.
My unsubstantiated hunch based on looking at the types of queries, which at least for me were over-specified as all hell and within the sphere of pharmaceuticals, e-shopping and the like, is that they're gambling on the search engine being backed by Google or Bing, and they're effectively trying to poison their typeahead suggestion data.
I'd guess they're just aiming their gatling gun at whatever sites have an OpenSearch specification, without much oversight.
It's also crossed my mind that it might be some sketchy law firm looking for DMCA violations, since a fair bunch of the queries looked like they were after various forms of contraband. It seems weird that they'd use a botnet, though. Most of the IPs seemed to be enterprise routers with public-facing admin pages and the like. Does not seem above board at all.
With this code you will start out with a blank index and will have to start making submissions to your search engine to build the index, but you can search the results as soon as the pages get crawled. The video demo provides a practical example.
The internet, somewhat ironically, really needs a search engine that works in the current day. You can't find anything anymore. It's like Google has been un-invented.
Hopefully some day soon the internet will be searchable again.
There would be a lot of trust required to use the data for anything, but things like Common Crawl save a lot of time. Does Wiby support starting with that?
A group of people would have to band together to share the same table (windex) with each other. Personally I am interested in seeing people try to cultivate their own niche indexes instead of working towards a common one.
You can certainly change the crawler's database connection from "localhost" to an IP address on a different machine, but I am unsure how that works with that type of proxy (I had to look up what SOCKS5 is). Sounds like it can work, though.
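For the "different machine" part, a sketch of what that change might look like, assuming the crawler connects through the MySQL C API; the host, credentials, and database name below are all placeholders:

    /* Sketch: point the crawler's database connection at a remote host
       instead of localhost. All connection details here are placeholders. */
    #include <stdio.h>
    #include <mysql.h>

    int main(void) {
        MYSQL *con = mysql_init(NULL);
        if (con == NULL)
            return 1;

        /* was: "localhost" */
        if (mysql_real_connect(con, "192.0.2.10", "crawler", "secret",
                               "wiby", 3306, NULL, 0) == NULL) {
            fprintf(stderr, "connect failed: %s\n", mysql_error(con));
            mysql_close(con);
            return 1;
        }

        /* ... crawl and write to the windex table as usual ... */
        mysql_close(con);
        return 0;
    }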
Well, what I meant was: say I have a server with the IP 13.223.12.212 and I want to run the crawler there, but I would like to fetch the actual websites from the IP 23.215.23.15 (i.e. my proxy, SOCKS5 being one of several protocols for doing this).
If you get what I mean :P
I assume it's possible if I just change some of the curl options in the crawler code.
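If the crawler fetches pages through libcurl's easy interface, a single option may be enough. A minimal sketch; the proxy port is a guess (1080 is the SOCKS default), and where exactly this slots into the crawler's fetch code is something to verify:

    /* Sketch: route a libcurl fetch through a SOCKS5 proxy. */
    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        CURL *curl = curl_easy_init();
        if (!curl)
            return 1;

        curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/");
        /* socks5h:// also resolves hostnames on the proxy side */
        curl_easy_setopt(curl, CURLOPT_PROXY, "socks5h://23.215.23.15:1080");
        /* If the proxy needs credentials:
           curl_easy_setopt(curl, CURLOPT_PROXYUSERPWD, "user:pass"); */

        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "fetch failed: %s\n", curl_easy_strerror(res));

        curl_easy_cleanup(curl);
        return res == CURLE_OK ? 0 : 1;
    }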