Hacker News
Startpage.com: Privacy-oriented search engine (startpage.com)
134 points by activatedgeek on Jan 10, 2021 | 84 comments



A caveat: I stopped using StartPage after it was sold to an advertising firm and switched to https://lite.duckduckgo.com/ instead. The sale doesn't necessarily mean StartPage is any less private (because you can sell to an ethical advertising firm, why not) but it is something to keep in mind.

Ref: https://www.reddit.com/r/privacy/comments/di5rn3/startpage_i...


Also worth noting is that since the acquisition, Startpage has added these support pages:

- Startpage CEO Robert Beens discusses the investment from Privacy One / System1 [1]

- What is Startpage's relationship with Privacy One/System1 and what does this mean for my privacy protections? [2]

- What is the Startpage privacy-guarding data flow? [3]

Some further context [4].

[1] https://support.startpage.com/index.php?/Knowledgebase/Artic...

[2] https://support.startpage.com/index.php?/Knowledgebase/Artic...

[3] https://support.startpage.com/index.php?/Knowledgebase/Artic...

[4] https://blog.privacytools.io/relisting-startpage/


I would highly recommend Searx instead. You don't have to host your own instance either; there are many available at https://searx.space.

It's essentially a "proxy" search engine for many different ones. It has some really cool features as well as a dark mode.


I like searx quite a bit. I would use it exclusively if there was a well-functioning instance available. Unfortunately some features (search for files, search social media) didn't work on the instances I've tried, and there seems to be some issue with setting it as your default search engine on Android. For me it works fine for a few searches, then at a certain point searching for anything from the browser bar just redirects you to the site's homepage and you have to start over there. Local results are a bit lacking, but this is essentially by design, and adding a zip code or whatever usually helps.


IIRC if you disable POST requests, Chromium should show the searX instance in the Search Engines settings menu.


Thanks for the tip on DDG lite, never heard of it. Sounds like it reduces assets from 2 MB to 33 KB and makes fewer calls to populate the results. Will have to use it for a bit and see if result quality is comparable to standard DDG.

https://lifehacker.com/use-duckduckgo-lite-for-absurdly-fast...


There is also something in between lite and full version. No JavaScript. https://html.duckduckgo.com/


Why not use both?

Putting aside what happens if one allows JavaScript and uses "modern" browsers, Startpage generally does not seem to require any more data from users than DDG. A small shell script can be used to search Startpage or DDG (or almost any other search engine) from the command line without sending any unnecessary data, like unnecessary headers, cookies or hidden form variables. The best part is that by not using the "modern" browser to send the search, one can easily automate editing the results page before viewing it in a browser, discarding all the cruft. I like to just return the URLs. (I notice that Startpage also (a) supports HTTP/1.1 pipelining, i.e., multiple page requests over a single TCP connection, which Google does not, and (b) allows bans to be overcome by solving an easily read captcha, which seems to prevent further bans; Google imposes automatic temporary bans that cannot be overcome by solving a captcha.)
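
For anyone who wants to try the "send a bare request, keep only the URLs" idea above, here is a minimal sketch in Python rather than shell. It is not the commenter's script. The endpoint path, the `q` parameter name, the `plain-search/0.1` User-Agent string and the helper names are all assumptions made for illustration, and the markup of a real results page is not taken from this thread.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumption: the no-JavaScript endpoint takes the query as a "q" parameter.
SEARCH_URL = "https://html.duckduckgo.com/html/"


def build_request(query):
    """Build a GET request with no cookies and no hidden form fields.

    urllib still adds its own default headers (such as Host) when sending.
    """
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})
    # Hypothetical User-Agent string, chosen only for this sketch.
    return urllib.request.Request(url, headers={"User-Agent": "plain-search/0.1"})


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from anchor tags, in page order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(("http://", "https://")) and href not in self.urls:
            self.urls.append(href)


def extract_urls(page):
    """Discard everything on the page except the de-duplicated link targets."""
    collector = LinkCollector()
    collector.feed(page)
    return collector.urls
```

Fetching would then look like `extract_urls(urllib.request.urlopen(build_request("common crawl")).read().decode())`. On a real results page the links may be relative or wrapped in redirects, so the filter in `handle_starttag` would likely need adjusting.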

The biggest problem I see with the major search engines and these minor search engines that repackage results from the major ones is that they are too often limiting the number of results returned. For example, Google limits to something like 200-300. In the early days of the web, search engines used to brag about how many pages were searched, and they proved their claim by how many results they returned. Today search engines want to localise and limit the results. Not to mention promoting their own websites. I also notice that repeated searches, where one collects the full result set rather than just the first page, yield different results.

Not every query is a question and not every user is interested in an instantaneous "answer" or the most popular website. That type of quick searching certainly has its place but it is not "research" and will not lead users to learn much about what actually exists on the web, or how to think critically about the web's content. Some users may want to search for pages and then evaluate the pages themselves. Exploration and discovery. Those users are treated as "bots" in order to justify what can only be anti-competitive practices. The sad consequence of this "limiting" behaviour is to keep curious users from ever learning what actually exists on the web (versus what a "search engine" decides to promote, or demote).

I have been playing around with Common Crawl data and it seems woefully limited in scope. A web index should be public information but these search engines sure as heck do not treat it as such.


You definitely have some useful tips here.

But with or without a shell script, Startpage keeps your privacy. Our web app acts as a proxy between your endpoint and the rest of the web. We couldn't collect user data even if we wanted to. You deserve an explanation, and here's why:

Startpage is delighted that users are conscientious about our privacy practices, as they should be. Privacy-aware users are the kind of users we enjoy serving. Asking questions is always a good idea. As we’ve stated, System1 is interested in Startpage’s anonymous contextual advertising revenue, not in our data. Mainly because we don’t store any.

Even if they wanted to change our privacy policies, it wouldn’t be possible. Our co-owners and Surfboard Holding BV still have authority in our company. Our infrastructure is all in the European Union, where the strict GDPR legislation applies and the US CLOUD Act doesn’t.

Maintaining user privacy is our reason for being. We thank you for your curiosity and your vigilance. We hope you continue to ask questions and enjoy Startpage. We are gladly answering all your questions.


Certainly food for thought. Unless startpage.com has revenue, and they leave it unchanged, I would have to be cynical and say that it's only a matter of time before the advertising shows up.


They can make money without advertising. Selling your search queries correlated to your browser fingerprint, for example.


Indeed, which is probably worse. Advertising could in theory be done ethically: unpersonalized and untracked.


I believe all advertising to be unethical, so there's that.


I think I am probably not far off feeling the same way, so I am curious: what leads you to that view?


They have ads, but they aren't bad. Aside from stuff like searx, they are about the best I've used.


Shameless plug: I run Okeano [1], a privacy-friendly [2] search engine that aims to use 80% of profits to purchase river interceptors from the Ocean Cleanup Project and deploy them to the world's most polluting rivers.

We support a domain blocklist [3] natively and have !waves (similar to !bangs).

We're bootstrapped and not owned by an advertising company (startpage.com is owned by System1).

[1] https://okeano.com

[2] https://okeano.com/privacy

[3] https://okeano.com/blocklist


Coincidentally I also see Ecosia[1] on HN front page right now, a search engine that plants trees.

1. https://www.ecosia.org/


that uses Facebook analytics, apparently


Made an account just to say this: search results are good, maybe better than DDG, can't say for sure. Made a couple of searches and the results are very relevant to what I am searching for. And it's faster than DDG, at least for me. I like it. The UI is nice too, clean AF. I don't care much for the ocean cleaning part to be honest, yes, shame on me. I guess I am cynical since some companies push that altruistic stuff to advertise themselves. Don't mean no ill, it's great what you're doing. Considering using it as my main search engine. Do you proxy exact results from Bing, or are the results managed before being shown to the user?

Just searched 4chan, no direct link to 4chan in first page, only "about" it.


Hi. Thank you! I try to keep the UI clean and simple. It's nice to hear somebody appreciates that.

Regarding results, I wrote an answer to that question here: https://news.ycombinator.com/item?id=25717814

That 4chan results page is embarrassing indeed. Currently many things in the pipeline, but will fix.

If you find any quirks, my email is in my profile.

- David


Thank you to you too! I think folks at r/privacy and r/privacytoolsio would be interested in this; it could bring some more users.


They have a page with links to privacy policies https://www.privacytools.io/providers/search-engines/


I like this. But without adblock I see no ads. Also, would it be possible to have a subscription-based no-ad version so we don't see ads and don't feel guilty that we aren't helping out by not clicking any ads? I guess it would be hard to stay private because it would mix a paid account ID with search queries, but maybe there's a way.


We aren't running ads... yet. We need more users before we can make a contract where we aren't required by the ad company to send user data.

Paid plan has been on my mind for a while now.. and as you said, it's complicated. It's in the pipeline.


Would be a nice secondary business idea for some to create an ad company to cater to smaller online platforms like yours without requiring user data. Another complicated prospect but at least it would give you a starting point.

There's also things like https://coil.com/ who seem like they help support online content creators. I wonder if there's a way to treat search results like "content".


> There's also things like https://coil.com/ who seem like they help support online content creators. I wonder if there's a way to treat search results like "content".

It is possible. I built the search engine [0] that was the first to integrate Coil as a monetization source. It is pretty small, but Coil payments do cover about 2% of the monthly cost to run the service.

Infinity Search also uses Coil. [1]

Here is an article with some thoughts around monetizing a privacy based search engine [2].

---------

[0] https://www.runnaroo.com/

[1] https://webmonetization.org/

[2] https://coil.com/p/runnaroo/Privacy-and-Search-Engine-Moneti...


thanks, this is some good insight!


My site sells and hosts our own ads and I have thought about starting something like this.

It would be interesting to sell space against specific queries for a time duration vs per click or per impression.

This approach doesn’t lend itself as much to optimizing for every individual user action, but instead for the quality of the content and users as a whole.

I would focus on a few niches to start which have lots of ad spend that you could get a piece of. Maybe you could then pour those dollars directly into improving the organic results for those niches.


also, dark mode pls


In the pipeline!


Do you index webpages yourself or piggyback off Bing/Google?


It uses Bing as a backup and for most general search. We have our own index that focuses on specific communities, including HN. Eventually you'll see more tailored search for that index, including a "privacy rank" and page size.


It's probably not the best idea to go against established conventions, but I think it'd be pretty cool if you used tilde instead of exclamation marks for waves. :)


Yes, I think this is a good idea. Might make this optional or as an alternative. Added to the pipeline.


https://okeano.com/reports gives me

> Can't find what you're looking for.


Yes, sorry. Have to fix that. We are not making money yet so no reports to show.


I appreciate the UI work on Okeano, but could you please support a dark mode option?


Thank you. Yes, working on it.


please test your site with nojs.

js is not an option with many devices and useragents.

thank you for doing what you do.


It's only 22 kb of JS and at least 2/3 of that is Axios which I'm about to replace with Fetch.. but I understand :)

In the pipeline!


Private.sh ( https://private.sh ) actually encrypts your search query and washes it through a proxy before delivering it to the search engine entity, which decrypts it, performs the search, and encrypts the results before sending them back through the same channel.


Also worth noting is that Private.sh is run by Private Internet Access / Kape Technologies [1] in partnership with Gigablast for its search index. [2]

[1] https://www.voxmarkets.co.uk/articles/kape-technologies-to-a...

[2] https://gigablast.com/blog.html#privatesearch


Private.sh is between Imperial Family Companies [1] and Gigablast!

[1] https://imperialfamily.com/


Aren't they an adware company?


How is that encryption scheme any better than https?


Regardless of HTTPS, DDG or Startpage see your IP address and search query, and you'll have to trust they don't log it even passively.

In this case, your query is encrypted on the client side, passed through a proxy, decrypted at the engine, search is performed, and then results are encrypted, passed through the proxy, and the client side decrypts and displays the results.

USER encrypts search --- Proxy --- Search engine decrypts search, searches, encrypts results --- Proxy --- USER decrypts results and displays.

The search engine does not know your IP, and Private.SH does not know what you searched for.
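
To make the data flow above concrete, here is a toy simulation of who sees what. It is a sketch only and not Private.sh's actual code or protocol. The Diffie-Hellman parameters, the hash-based XOR cipher, the class names and the example IP addresses are all assumptions chosen so the example runs with nothing but the standard library, and the cipher is not secure.

```python
import hashlib
import secrets

# Toy parameters for illustration only. NOT secure, NOT what Private.sh uses.
P = 2**255 - 19  # a well-known prime, used here as a plain Diffie-Hellman modulus
G = 2


def keystream_xor(key, data):
    """XOR data with a SHA-256 based keystream (toy cipher, no authentication)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def derive_key(shared, label):
    """Derive separate keys for the query and the results from one shared secret."""
    return hashlib.sha256(label + shared.to_bytes(32, "big")).digest()


class SearchEngine:
    def __init__(self):
        self._priv = secrets.randbelow(P - 2) + 1
        self.public = pow(G, self._priv, P)
        self.seen = []  # (sender address, plaintext query) as observed by the engine

    def handle(self, sender_ip, client_pub, ciphertext):
        shared = pow(client_pub, self._priv, P)
        query = keystream_xor(derive_key(shared, b"query"), ciphertext).decode()
        self.seen.append((sender_ip, query))
        results = "results for: " + query  # stand-in for a real search
        return keystream_xor(derive_key(shared, b"results"), results.encode())


class Proxy:
    IP = "203.0.113.1"  # hypothetical proxy address

    def __init__(self, engine):
        self.engine = engine
        self.seen = []  # (client address, ciphertext) as observed by the proxy

    def relay(self, client_ip, client_pub, ciphertext):
        self.seen.append((client_ip, ciphertext))
        return self.engine.handle(self.IP, client_pub, ciphertext)


def private_search(query, client_ip, proxy, engine_public):
    """Client side: encrypt to the engine's public key, decrypt the reply."""
    eph = secrets.randbelow(P - 2) + 1
    client_pub = pow(G, eph, P)
    shared = pow(engine_public, eph, P)
    ciphertext = keystream_xor(derive_key(shared, b"query"), query.encode())
    reply = proxy.relay(client_ip, client_pub, ciphertext)
    return keystream_xor(derive_key(shared, b"results"), reply).decode()
```

After one call to `private_search`, `proxy.seen` holds the client's address plus ciphertext only, while `engine.seen` holds the plaintext query plus the proxy's address. That separation holds only as long as the two parties do not pool their logs.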


"Private.SH does not know what you searched for."

but

"your query is encrypted on the client side"

and then

"the client side decrypts and displays the results"

So all this encryption/decryption code, where does it come from?

If the answer is Private.SH, then Private.SH can in fact know what the user searched for and the results they got by feeding the user code that sends that information (or even just the encryption keys) back to Private.SH

Also, I'm not clear on how the search engines are supposed to be able to decrypt something encrypted by the client. What actually happens there?


Most of it's answered here https://private.sh/how-it-works.html

So you're using the search engine's public key to encrypt it, meaning the proxies can't decrypt it. But yes, you have to trust the client-side code, which is an insurmountable problem.

On the plus-side, the code is really short and easy to read. Perhaps a standalone app with reproducible builds could solve this, but that's much more of a pain than simply entering your query straight from the browser.

Edit: I was also going to mention that you can download the Chrome/Firefox extensions by themselves, but the download link has an expired certificate, which doesn't instill much confidence.


"you have to trust the client-side code, which is an insurmountable problem"

That depends on what you're trying to achieve, who you're willing to trust, and what you're willing to do.

If your goal is to do searches without having to trust client-side code from a search engine or Private.SH, then you could (assuming they have support for such a workflow) do your own encryption using a tool you do trust, such as gpg, then submit the encrypted query to Private.SH, which would hand it off to the search engine.

The search engine could then decrypt it, perform the query, and encrypt the results to your public key (which would be contained in the encrypted query they got) and pass them back to Private.SH, which would then pass the encrypted results back to the user.

This way no code from Private.SH nor the search engine has to be trusted.

Of course, this does not help if Private.SH is secretly owned by, compromised by, or has a data-sharing agreement with some entity you don't want your data to be seen by (such as the search engine, hostile agency, data harvesting/reselling organization, etc).

This latter possibility is what I really don't see an easy way to mitigate.

For all we know any/all of these "privacy respecting" services might be owned by Google, Palantir, some other data harvesting corporation, government agency, intelligence service, etc.


Hm, I think your idea of letting the search engine encrypt the results with your public key would solve the issue of private.sh being the untrusted party. The search engine would be able to send back the encrypted results which private.sh wouldn't be able to see.

If both are untrusted parties in cahoots with each other, then there's no getting around private.sh and gigablast sharing both the IP (courtesy of private.sh) and query (courtesy of gigablast) with each other.


Oh, so the search provider is a separate entity. Interesting, looking forward to seeing their source code.


And because it's not open source, you have to trust them on all that, right?


Which "search engine entity" are they sending the queries to? It doesn't appear to be Google or Bing and the search results seem pretty bad..


are you planning to add nojs support or is that not an option for your tech?


I have had a very good experience with Startpage. Unlike DDG, it's a Google proxy, so the search quality tradeoff is much less stark (not non-existent, as there's no personalization...)


I noticed this week / month that DDG must have gotten an upgrade; its results are a lot better, to the point of beating Google in my search patterns.


No personalisation is a positive for me.


Random anecdote on the intangible value of "Privacy" for real-world users: I run a news website with the upsell argument of zero ads, tracking, or third-party cookies and have gained no significant increase in conversions from it.


The anonymous view feature is cool. A comparison with DDG would be nice.


As cool as it is, I usually find myself using https://archive.is as a browser.

For a time, I used https://brow.sh but its hosted html browser is not up anymore.


That must be a great experience...

RMS has improved upon that if you are interested in privacy to that extent: https://lwn.net/Articles/262570/


It is, especially for Medium and Substack posts, websites that my DNS resolver or ISP block, and for webpages that refuse to load with uMatrix in its default setting.


uMatrix in default settings pretty much breaks every website.

You definitely need to whitelist the Cloudflare CDN and other popular CDNs like Amazon S3, things like jquery.com, and maybe Google for the reCAPTCHA (unless you whitelist Google for individual websites).


How do they make money?


They have ads.


Are they keyword-based ads like DDG?


Yes. It's in their privacy policy.

https://www.startpage.com/en/privacy-policy/


When Google had to cave in to copyright and made their image search shit on mobile, I switched to Startpage, which doesn't care and lets you download images, anonymously to boot.


Startpage had a strong beginning, e.g. ixquick. I think I first learned of it through Katherine Albrecht. It's now a pitiful mutant of its origins, which I miss. Options are waning, but I've been using MetaGer[1] with fair results. I wish Scroogle was still up.

https://en.m.wikipedia.org/wiki/MetaGer


Re: Scroogle &c. - there are some Searx instances which manage to return Google results, e.g. https://searx.be - and this is what I've generally settled on. (Bing-backed searches, including DDG, don't end up working very well for me.)


In what aspects is it more private than the duck?


The two noteworthy aspects are: 1) if you click on an ad on startpage, you're inside Google's network. If you click on an ad on the duck, you're inside Bing's network. 2) the duck is independent and startpage is owned by an advertising company.


DDG is located in the US and logs partial IP information. Startpage is headquartered in (and under the legal jurisdiction of) the Netherlands. Also, it's a bit naive to call System1 just an advertising company. It's a bit larger of a company than that. Sadly, an ex-Startpage consultant did a lot of work to confuse and manipulate users into not understanding their partnership.


Is a decentralized search engine possible?


There are two that I know of:

YaCy: https://github.com/yacy/yacy_search_server (functional)

Seeks: https://github.com/beniz/seeks (defunct?)

---

There's also SearX, which isn't distributed but is a metasearch engine (pulls results from multiple search engines) that you can self-host [0] or use one of its many mirrors [1].

[0] https://github.com/searx/searx

[1] https://searx.space/
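
For readers unfamiliar with the term, the "metasearch" idea (pull ranked results from several engines, then merge them) can be sketched in a few lines. This is an illustrative round-robin merge under my own assumptions, not SearX's actual ranking logic, and `merge_results` is a hypothetical helper name.

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Round-robin merge of ranked URL lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    # Take the rank-1 hit from every engine, then the rank-2 hits, and so on.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

A real metasearch engine would also have to normalise URLs and weight engines differently; the sketch only shows the merge-and-deduplicate step.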




For a private search engine I prefer DDG. For a private social network, Plumebio.


so they say. Sorry. I use them to search for certain things but don't expect much in protection


ddg has a bang code for startpage.

!sp

takes you to their home page.

!sp privacy

does that search on sp.

!sp duck duck go

does that search on sp.

EDIT: Ahem ...

!ddg

!ddg recursion
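
If you want to script the bang searches above, a small helper can build the URL. This is a sketch assuming that DDG reads the query from a `q` parameter on its home page URL; `bang_url` is a hypothetical helper name, not part of any DDG API.

```python
from urllib.parse import urlencode

# Assumption: the search terms, bang included, go in the "q" query parameter.
DDG_URL = "https://duckduckgo.com/"


def bang_url(bang, terms=""):
    """Return a search URL whose query starts with the given bang code."""
    query = f"!{bang} {terms}".strip()
    return DDG_URL + "?" + urlencode({"q": query})
```

For example, `bang_url("sp", "privacy")` builds the URL for the `!sp privacy` search shown above, with the `!` percent-encoded.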


DDG also has !s for Startpage


Cool. When I want a bang code, my first (!sp) or second guess is usually there.


Is it FOSS?



