A caveat: I stopped using StartPage after it was sold to an advertising firm and switched to https://lite.duckduckgo.com/ instead. The sale doesn't necessarily mean StartPage is any less private (you can sell to an ethical advertising firm, why not), but it is something to keep in mind.
I like searx quite a bit. I would use it exclusively if there were a well-functioning instance available. Unfortunately some features (search for files, search social media) didn't work on the instances I've tried, and there seems to be some issue with setting it as your default search engine on Android. For me it works fine for a few searches, then at a certain point searching for anything from the browser bar just redirects you to the site's homepage and you have to start over there. Local results are a bit lacking, but this is essentially by design, and adding a zip code or whatever usually helps.
Thanks for the tip on DDG lite, never heard of it. Sounds like it reduces assets from 2 MB to 33 KB and makes fewer calls to populate the results. Will have to use it for a bit and see if result quality is comparable to standard DDG.
Putting aside what happens if one allows Javascript and uses "modern" browsers, Startpage generally does not seem to require any more data from users than DDG. A small shell script can be used to search Startpage or DDG (or almost any other search engine) from the command line without sending any unnecessary data, like unnecessary headers, cookies or hidden form variables. The best part is that by not using the "modern" browser to send the search, one can easily automate editing the results page before viewing it in a browser, discarding all the cruft. I like to just return the URLs. (I notice that Startpage also (a) supports HTTP/1.1 pipelining, i.e., multiple page requests over a single TCP connection, which Google does not; and (b) allows bans to be overcome by solving an easily read captcha, and this seems to prevent further bans, whereas Google imposes automatic temporary bans that cannot be overcome by solving a captcha.)
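To make the idea concrete, here is a rough sketch of the same approach in Python rather than shell. It is not the script described above, just an illustration under some assumptions: the endpoint `https://lite.duckduckgo.com/lite/`, the form field `q`, and the `User-Agent` string are my guesses and may need adjusting, and the names `build_request`, `extract_urls` and `search` are made up for this example. It sends only a query (no cookies, no referrer) and keeps nothing from the results page except the absolute URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: endpoint and form field name; adjust for the engine you use.
SEARCH_URL = "https://lite.duckduckgo.com/lite/"


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) links from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.urls.append(href)


def extract_urls(html):
    """Return the absolute URLs found in html, in order, without duplicates."""
    parser = LinkExtractor()
    parser.feed(html)
    seen, unique = set(), []
    for url in parser.urls:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def build_request(query):
    """Build a POST request carrying only the query and one explicit header."""
    body = urlencode({"q": query}).encode("ascii")
    return Request(SEARCH_URL, data=body,
                   headers={"User-Agent": "minimal-search/0.1"})


def search(query):
    """Send the query and return only the result URLs (needs network access)."""
    with urlopen(build_request(query), timeout=10) as response:
        page = response.read().decode("utf-8", "replace")
    return extract_urls(page)
```

Calling `search("common crawl")` would print nothing by itself; loop over the returned list to print one URL per line. Some engines wrap result links in redirects, so the extracted URLs may need an extra unwrapping step.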
The biggest problem I see with the major search engines, and with the minor search engines that repackage results from the major ones, is that they too often limit the number of results returned. For example, Google limits to something like 200-300. In the early days of the web, search engines used to brag about how many pages were searched, and they proved their claim by how many results they returned. Today search engines want to localise and limit the results. Not to mention promoting their own websites. I also notice that repeated searches, where one collects the total results rather than just the first page, yield different results.
Not every query is a question and not every user is interested in an instantaneous "answer" or the most popular website. That type of quick searching certainly has its place but it is not "research" and will not lead users to learn much about what actually exists on the web, or how to think critically about the web's content. Some users may want to search for pages and then evaluate the pages themselves. Exploration and discovery. Those users are treated as "bots" in order to justify what can only be anti-competitive practices. The sad consequence of this "limiting" behaviour is to keep curious users from ever learning what actually exists on the web (versus what a "search engine" decides to promote, or demote).
I have been playing around with Common Crawl data and it seems woefully limited in its scope. A web index should be public information but these search engines sure as heck do not treat it as such.
But with or without a shell script, Startpage keeps your privacy. Our web app acts as a proxy between your endpoint and the rest of the web. We couldn't collect user data even if we wanted to. You deserve an explanation, and here's why:
Startpage is delighted that users are conscientious about our privacy practices, as they should be. Privacy-aware users are the kind of users we enjoy serving. Asking questions is always a good idea.
As we’ve stated, System1 is interested in Startpage’s anonymous contextual advertising revenue, not in our data. Mainly because we don’t store any.
Even if they wanted to change our privacy policies, it wouldn’t be possible. Our co-owners and Surfboard Holding BV still have authority in our company. Our infrastructure is all in the European Union, where the strict GDPR legislation applies and the US CLOUD Act doesn’t.
Maintaining user privacy is our reason for being. We thank you for your curiosity and your vigilance. We hope you continue to ask questions and enjoy Startpage. We are glad to answer all your questions.
Certainly food for thought. Unless startpage.com has revenue, and they leave it unchanged, I would have to be cynical and say that it's only a matter of time before the advertising shows up.
Shameless plug: I run Okeano [1], a privacy-friendly [2] search engine that aims to use 80% of profits to purchase river interceptors from the Ocean Cleanup Project and deploy them to the world's most polluting rivers.
We support domain blocklists [3] natively and have !waves (similar to !bangs).
We're bootstrapped and not owned by an advertising company (startpage.com is owned by System1).
Made an account just to say this: search results are good, maybe better than DDG, can't say for sure. Made a couple of searches and the results are very relevant to what I am searching for. And it's faster than DDG, at least for me. I like it. UI is nice too, clean AF. I don't care much for the ocean cleaning part to be honest, yes, shame on me. I guess I am cynical since some companies push that altruistic stuff to advertise themselves. Don't mean no ill, it's great what you're doing. Considering using it as my main search engine. Do you proxy exact results from Bing, or are the results managed before showing them to the user?
Just searched 4chan, no direct link to 4chan in first page, only "about" it.
I like this. But without adblock I see no ads. Also, would it be possible to have a subscription-based no-ad version so we don't see ads and don't feel guilty that we aren't helping out by not clicking any ads? I guess it would be hard to stay private because it would mix a paid account ID with search queries, but maybe there's a way.
Would be a nice secondary business idea for some to create an ad company to cater to smaller online platforms like yours without requiring user data. Another complicated prospect but at least it would give you a starting point.
There's also things like https://coil.com/ who seem like they help support online content creators. I wonder if there's a way to treat search results like "content".
> There's also things like https://coil.com/ who seem like they help support online content creators. I wonder if there's a way to treat search results like "content".
It is possible. I built the search engine [0] that was the first to integrate Coil as a monetization source. It is pretty small, but Coil payments do cover about 2% of the monthly cost to run the service.
Infinity Search also uses Coil. [1]
Here is an article with some thoughts around monetizing a privacy based search engine [2].
My site sells and hosts our own ads and I have thought about starting something like this.
It would be interesting to sell space against specific queries for a time duration vs per click or per impression.
This approach doesn’t lend itself as much to optimizing for every individual user action, but instead to optimizing for the quality of the content and the users as a whole.
I would focus on a few niches to start which have lots of ad spend that you could get a piece of. Maybe you could then pour those dollars directly into improving the organic results for those niches.
It uses Bing as a backup and for most general search. We have our own index that focuses on specific communities, including HN. Eventually you'll see more tailored search for that index, including a "privacy rank" and page size.
It's probably not the best idea to go against established conventions, but I think it'd be pretty cool if you used tilde instead of exclamation marks for waves. :)
Private.sh ( https://private.sh ) actually encrypts your search query and washes it through a proxy before delivering it to the search engine, which decrypts it, performs the search, and encrypts the results before sending them back through the same channel.
Regardless of https, ddg or startpage see your IP address and search query and you'll have to trust they don't log it even passively.
In this case, your query is encrypted on the client side, passed through a proxy, decrypted at the engine, search is performed, and then results are encrypted, passed through the proxy, and the client side decrypts and displays the results.
USER encrypts search --- Proxy --- Search engine decrypts search, searches, encrypts results --- Proxy --- USER decrypts results and displays.
The search engine does not know your IP, and Private.SH does not know what you searched for.
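A toy model of that flow may help show who gets to see what. This is emphatically not Private.sh's actual code or cipher: the Diffie-Hellman group is far too small to be secure, the hash-based XOR stream is only a stand-in for real authenticated encryption, and every name here (`Engine`, `Proxy`, `private_search`, and so on) is hypothetical. The only point is that the proxy handles the IP and an opaque blob, while the engine handles the query and no IP.

```python
import hashlib
import secrets

# Toy parameters for illustration only: a 127-bit Mersenne prime is
# far too small for real use.
P = 2**127 - 1
G = 3


def keypair():
    """Return a (private, public) Diffie-Hellman pair in the toy group."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def shared_key(private, other_public):
    """Derive a 32-byte key from the Diffie-Hellman shared secret."""
    secret = pow(other_public, private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def xor_stream(key, label, data):
    """XOR data with a SHA-256 counter keystream (encrypts and decrypts).

    The label keeps the query stream and the results stream distinct.
    """
    stream_key = hashlib.sha256(key + label).digest()
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(
            hashlib.sha256(stream_key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Engine:
    """Search engine: can read the query, never receives the client IP."""

    def __init__(self):
        self.private, self.public = keypair()
        self.seen = []

    def handle(self, client_public, blob):
        key = shared_key(self.private, client_public)
        query = xor_stream(key, b"query", blob).decode()
        self.seen.append(query)
        results = f"results for {query!r}"  # placeholder for a real search
        return xor_stream(key, b"results", results.encode())


class Proxy:
    """Proxy: sees the client IP and ciphertext, holds no decryption key."""

    def __init__(self, engine):
        self.engine = engine
        self.seen = []

    def forward(self, client_ip, client_public, blob):
        self.seen.append((client_ip, blob))
        return self.engine.handle(client_public, blob)


def private_search(proxy, engine_public, client_ip, query):
    """Client side: encrypt to the engine's public key, decrypt the reply."""
    private, public = keypair()
    key = shared_key(private, engine_public)
    blob = xor_stream(key, b"query", query.encode())
    reply = proxy.forward(client_ip, public, blob)
    return xor_stream(key, b"results", reply).decode()
```

After one call, `proxy.seen` holds the IP next to ciphertext only, and `engine.seen` holds the plaintext query with no IP attached. The model says nothing about where the client-side code comes from or whether the two parties collude.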
"the client side decrypts and displays the results"
So all this encryption/decryption code, where does it come from?
If the answer is Private.SH, then Private.SH can in fact know what the user searched for and the results they got, by feeding the user code that sends that information (or even just the encryption keys) back to Private.SH.
Also, I'm not clear on how the search engines are supposed to be able to decrypt something encrypted by the client. What actually happens there?
So you're using the search engine's public key to encrypt it, meaning the proxies can't decrypt it. But yes, you have to trust the client-side code, which is an insurmountable problem.
On the plus side, the code is really short and easy to read. Perhaps a standalone app with reproducible builds could solve this, but that's much more of a pain than simply entering your query straight from the browser.
Edit: I was also going to mention that you can download the chrome/firefox extension by themselves, but the download link has an expired certificate which doesn't instill much confidence.
"you have to trust the client-side code, which is an insurmountable problem"
That depends on what you're trying to achieve, who you're willing to trust, and what you're willing to do.
If your goal is to do searches without having to trust client-side code from a search engine or Private.SH, then you could (assuming they have support for such a workflow) do your own encryption using a tool you do trust, such as gpg, then submit the encrypted query to Private.SH, which would hand it off to the search engine.
The search engine could then decrypt it, perform the query, encrypt the results to your public key (which would be contained in the encrypted query they got) and pass them back to Private.SH, which would then pass the encrypted results back to the user.
This way no code from Private.SH nor the search engine has to be trusted.
Of course, this does not help if Private.SH is secretly owned by, compromised by, or has a data-sharing agreement with some entity you don't want your data to be seen by (such as the search engine, hostile agency, data harvesting/reselling organization, etc).
This latter possibility is what I really don't see an easy way to mitigate.
For all we know any/all of these "privacy respecting" services might be owned by Google, Palantir, some other data harvesting corporation, government agency, intelligence service, etc.
Hm, I think your idea of letting the search engine encrypt with your public key would solve the issue of private.sh being the untrusted party. The search engine would be able to send back the encrypted results, which private.sh wouldn't be able to see.
If both are untrusted parties in cahoots with each other, then there's no getting around private.sh and gigablast sharing both the IP (courtesy of private.sh) and the query (courtesy of gigablast) with each other.
I have had very good experience with Startpage. Unlike DDG, it's a Google proxy, so the search quality tradeoff is much less stark (not non-existent, as there's no personalization...)
Random anecdote on the intangible value of "Privacy" for real-world users: I run a news website with the upsell argument of zero ads, tracking, or third-party cookies and have gained no significant increase in conversions from it.
It is, especially for Medium and Substack posts, websites that my DNS resolver or ISP block, and for webpages that refuse to load with uMatrix in its default setting.
uMatrix in default settings pretty much breaks every website.
You definitely need to whitelist the Cloudflare CDN and other popular CDNs like Amazon S3, things like jquery.com, and maybe Google for the reCAPTCHA (unless you whitelist Google for individual websites).
When Google had to cave in to copyright and made their image search shit on mobile, I switched to Startpage, which doesn't care and lets you download images, anonymously to boot.
Startpage had a strong beginning, e.g. as ixquick. I think I first learned of it through Katherine Albrecht. It's now a pitiful mutant of its origins, which I miss. Options are waning, but I've been using MetaGer[1] with fair results. I wish Scroogle was still up.
Re: Scroogle &c. - there are some Searx instances which manage to return Google results, e.g. https://searx.be - and this is what I've generally settled on. (Bing-backed searches, including DDG, don't end up working very well for me.)
The two noteworthy aspects are: 1) if you click on an ad on startpage, you're inside Google's network. If you click on an ad on the duck, you're inside Bing's network. 2) the duck is independent and startpage is owned by an advertising company.
DDG is located in the US and logs partial IP information. Startpage is headquartered in (and under the legal jurisdiction of) the Netherlands. Also, it's a bit naive to call System1 just an advertising company. It's a bit larger of a company than that. Sadly, an ex-Startpage consultant did a lot of work to confuse and manipulate users into not understanding their partnership.
There's also SearX, which isn't distributed but is a metasearch engine (pulls results from multiple search engines) that you can self-host [0] or use one of its many mirrors [1].
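For anyone wondering what "pulls results from multiple search engines" amounts to, here is a minimal sketch of the merging step. It is not SearX's actual ranking code; the reciprocal-rank scoring, the URL normalization, and the names `normalize` and `merge_results` are all assumptions made up for this illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host, drop trailing slash and fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    Each engine contributes 1/rank for a URL, so pages that rank well on
    several engines float to the top. Ties are broken alphabetically.
    """
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        counted = set()  # count each URL at most once per engine
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            if key not in counted:
                counted.add(key)
                scores[key] += 1.0 / rank
    return sorted(scores, key=lambda u: (-scores[u], u))
```

For example, a URL ranked second by one engine and first by another scores 1.5 and beats a URL ranked first by only one engine, which scores 1.0.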
Ref: https://www.reddit.com/r/privacy/comments/di5rn3/startpage_i...