More

wraptile · 2025-08-29T08:22:45 1756455765

time and time again people fall for proprietary lock-in to the point where it's hard to exert any sympathy.

wraptile · 2025-08-29T08:16:29 1756455389

Really enjoyed this and Python is basically what saved my software career.

Education system in Lithuania had turbo pascal in high-school and mostly java and c/c++ in university and while I really loved pascal in high-school the switch to Java was so jarring - "people really enjoy this? maybe I should do something else" is what I thought during my first year of college. Luckily Python started to become really big online and was such a joy to use and be a part of the community it really cleared up this notion that programming sucks. To this day this experience has stuck to me when approaching any new activity - is there a Python here somewhere that would unsuck this?

I'm quite a polyglot these days and will write Java if needed but Python is still my daily driver and it just feels right. If I'm doing something 10 hours a day, I'd like to feel good while doing it and that's exactly what Python delivers.

wraptile · 2025-08-28T09:49:44 1756374584

It's just 2 paragraphs for the lazy:

Participants were guided through pre-recorded audio instructions accompanied with evocative ambient music played through a speaker in the lab to breathe normally for 10 minutes (baseline) then engage in HVB, encouraged by the tempo of the music progressively increasing to the end of HVB. Some examples of the recorded instructions are presented below.

“Mouth wide open, pulling on the inhale, that’s it. No pauses at the top of the inhale, or the bottom of the exhale. Full body breaths. Breathing in to your whole body. Keep breathing. Getting comfortable, finding your rhythm. Keep going. As you’re breathing, it’s now time to let go of any intention you have, of any expectations you have, just focusing on the breath. Keep going. Active inhale, passive exhale. The music is going to keep on rising, so fall into the rhythm and let your breath guide you. Your job is just to keep breathing, pulling on that inhale. Surrendering to the exhale. Keep that breathing circular, that’s it. Keep going. Whatever sensations you’re feeling, let them come, let them rise, enjoy them. Stay focused. Give yourself fully to the breath. It’s your closest friend. It will be with you from the moment of your birth and stay by your side until you die. You can trust it.”

patrickmay · 2025-08-28T13:48:37 1756388917

It's sometimes called "circular breathing." There are a few versions of an active breathing meditation called Quantum Light Breath (which has nothing to do with either quantum mechanics or light). It's definitely worth trying.

vanderZwan · 2025-08-28T19:16:44 1756408604

> which has nothing to do with either quantum mechanics or light

As is tradition with these kinds of things.

kashunstva · 2025-08-29T13:23:59 1756473839

> Keep breathing. Getting comfortable

Is it just me or is this commonly discomfort-inducing? Hyperventilation is so associated with anxiety and panic that I don’t see how anything remotely pleasant can come of this. Assuming one’s acid-base balance and oxygenation are normal, I don’t quite see the point here.

wraptile · 2025-08-27T10:13:41 1756289621

You can actually do both. This way you have full control - invest where you control the market and sabotage where you don't.

wraptile · 2025-08-26T10:56:35 1756205795

Agree but fortunately Python's types are entirely optional!

I'm very familiar with pyright and still I start most of my new projects without types and start sprinkling them in once I have a good base working already. This works so well that every time I pick up a static language I just get turned off by the friction of mandatory types and go back to Python. The only exception is Typescript where I can just Any everything temporarily as well.

wraptile · 2025-08-25T09:36:08 1756114568

> thousounds of requests an hour from bots

That's not much for any modern server so I genuinely don't understand the frustration. I'm pretty certain gitea should be able to handle thousands of read requests per minute (not per hour) without even breaking a sweat.

q3k · 2025-08-25T09:50:26 1756115426

Serving file content/diff requests from gitea/forgejo is quite expensive computationally. And these bots tend to tarpit themselves when they come across eg. a Linux repo mirror.

https://social.hackerspace.pl/@q3k/114358881508370524

rollcat · 2025-08-25T11:05:22 1756119922

I think at this point every self-hosted forge should block diffs from anonymous users.

Also: Anubis and go-away, but also: some people are on old browsers or underpowered computers.

diggan · 2025-08-25T15:53:13 1756137193

> Serving file content/diff requests from gitea/forgejo is quite expensive computationally

One time, sure. But unauthenticated requests would surely be cached, authenticated ones skip the cache (just like HN works :) ), as most internet-facing websites end up using this pattern.

Sesse__ · 2025-08-25T20:05:25 1756152325

There are _lots_ of objects in a large git repository. E.g., I happen to have a fork of VLC lying around. VLC has 70k+ commits (on that version). Each commit has about 10k files. The typical AI crawler wants, for every commit, to download every file (so 700M objects), every tarball (70k+ .tar.gz files), and the blame layer of every file (700M objects, where blame has to look back on average 35k commits). Plus some more.

Saying “just cache this” is not sustainable. And this is only one repository; the only reasonable way to deal with this is some sort of traffic mitigation, you cannot just deal with the traffic as the happy path.

q3k · 2025-08-25T16:48:07 1756140487

You can't feasibly cache large reposotories' diffs/content-at-version without reimplementing a significant part of git - this stuff is extremely high cardinality and you'd just constantly thrash the cache the moment someone does a BFS/DFS through available links (as these bots tend to do).

hinkley · 2025-08-25T22:16:27 1756160187

We were seeing over a million hits per hour from bots and I agree with GP. It’s fucking out of control. And it’s 100x worse at least if you sell vanity URLs, because the good bots cannot tell that they’re sending you 100 simultaneous requests by throttling on one domain and hitting five others instead.

p3rls · 2025-08-25T17:44:16 1756143856

and this is how the entire web was turned into wordpress slop and cryptoscams

wraptile · 2025-08-25T09:29:15 1756114155

I'm unconvinced that secure communications is the bottle neck when it comes to criminal prosecution. We can expand police power without sacrificing our communications like that.

Anecdotally, take a look at China where privacy doesn't exist and yet Chinese syndicates are responsible for a major chunk of the issues you've listed. So clearly lack of privacy doesn't even correlate with decreased criminal behavior.

wildzzz · 2025-08-25T16:20:17 1756138817

Probably because any successful Chinese crime syndicate has the backing of both the government and big business.

wraptile · 2025-08-26T04:28:24 1756182504

Which happens due to totalitarian control of CCP which prohibits self correction mechanisms we have in democratic societies, so what's the Goldilocks area of authoritarianism here? My bet is that compromising all secure communications is all the way in the big bears bed, if we're sticking to the Goldilocks analogy. It's just a fundamental dead-end without fantasy scenarios like benevolent dictatorship which we all know doesn't exist in the real world.

wraptile · 2025-08-21T11:31:37 1755775897

I'm a scraper developer and Anubis would have worked 10 - 20 years ago, but now all broad scrapers run on a real headless browser with full cookie support and costs relatively nothing in compute. I'd be surprised if LLM bots would use anything else given the fact that they have all of this compute and engineers already available.

That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times" because handling something custom is very difficult in broad scale. It took the author just few minutes to solve this but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation which is likely just not worth it.

You can actually see this in real life if you google web scraping services and which targets they claim to bypass - all of them bypass generic anti-bots like Cloudflare, Akamai etc. but struggle with custom and rare stuff like Chinese websites or small forums because scraping market is a market like any other and high value problems are solved first. So becoming a low value problem is a very easy way to avoid confrontation.

jandrese · 2025-08-21T15:42:25 1755790945

> That being said, one point is very correct here - by far the best effort to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times" because handling something custom is very difficult in broad scale.

Isn't this what Microsoft is trying to do with their sliding puzzle piece and choose the closest match type systems?

Also, if you come in on a mobile browser it could ask you to lay your phone flat and then shake it up and down for a second or something similar that would be a challenge for a datacenter bot pretending to be a phone.

DanielHB · 2025-08-21T14:27:24 1755786444

How do you bypass cloudflare? I do some light scrapping for some personal stuff, but I can't figure out how to bypass it. Like do you randomize IPs using several VPNs at the same time?

I usually just sit there on my phone pressing the "I am not a robot box" when it triggers.

wraptile · 2025-08-22T03:22:40 1755832960

It's still pretty hard to bypass it with open source solutions. To bypass CF you need:

- an automated browser that doesn't leak the fact it's being automated

- ability to fake the browser fingerprint (e.g. Linux is heavily penalized)

- residential or mobile proxies (for small scale your home IP is probably good enough)

- deployment environment that isn't leaked to the browser.

- realistic scrape pattern and header configuration (header order, referer, prewalk some pages with cookies etc.)

This is really hard to do at scale but for small personal scripts you can have reasonable results with flavor of the month playwright forks on github like nodriver or dedicated tools like Flaresolver but I'd just find a web scraping api with low entry price and just drop 15$ month and avoid this chase because it can be really time consuming.

If you're really on budget - most of them offer 1,000 credits for free which will get you avg 100 pages a month per service and you can get 10 of them as they all mostly function the same.

DanielHB · 2025-08-25T07:58:04 1756108684

I do it maybe once a month to fetch <1000 URLs. I do it from my home PC with my internet connection. I was just using puppeteer (headless chromium), I will try making it use my own normal browser instance instead of the built-in one.

Thanks for the tips!

hinach4n · 2025-08-21T14:47:17 1755787637

I believe usually you would bypass by using residential ips / proxies?

DanielHB · 2025-08-21T15:47:21 1755791241

I run it through my home network and I'm still triggering it. I add 2s delays between page load and it still triggers

jijijijij · 2025-08-21T20:18:42 1755807522

Well, if that's true... I am so sorry to tell you this, it looks like you are in fact a robot.

1gn15 · 2025-08-22T03:50:28 1755834628

I use Camoufox for the browser and "playwright-captcha" for the CAPTCHA solving action. It's not fully reliable but it works.

Gander5739 · 2025-08-21T20:11:12 1755807072

Flaresolverr can bypass it.

buckle8017 · 2025-08-21T16:16:40 1755793000

Ironically by runnung cloudflare warp.

miki123211 · 2025-08-21T14:54:20 1755788060

This only works if you're a low-value site (which admittedly most sites are).

hahn-kev · 2025-08-21T12:56:22 1755780982

Bot blocking through obscurity

lbhdc · 2025-08-21T13:24:04 1755782644

That's really the only option available here, right? The goal is to keep sites low friction for end users while stopping bots. Requiring an account with some moderation would stop the majority of bots, but it would add a lot of friction for your human users.

brookst · 2025-08-21T15:50:39 1755791439

The other option is proof of work. Make clients use JS to do expensive calculations that aren’t a big deal for single clients, but get expensive at scale. Not ideal, but another tool to potentially use.

tovej · 2025-08-21T14:07:56 1755785276

I like it, make the bot developers play whack-a-mole.

Of course, you're going to have to verify each custom puzzle aren't you.

sam0x17 · 2025-08-21T16:14:51 1755792891

> It took the author just few minutes to solve this but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation which is likely just not worth it.

These are trivial for an AI agent to solve though, even with very dumb watered down models.

andai · 2025-08-21T13:26:22 1755782782

You can also generate custom solutions at scale with LLMs. So each user could get a different CAPTCHA.

josh-sematic · 2025-08-21T14:12:53 1755785573

At that point you’re probably spending more money blocking the scrapers than you would spend just letting them through.

lbhdc · 2025-08-21T14:24:11 1755786251

That seems like it would make bot blocking saas (like cloudflare or tollbit) more attractive because it could amortize that effort/cost across many clients.

wraptile · 2025-08-21T03:04:55 1755745495

> _starting_ at $799, $999 and $1199.

At 126GB storage, which is basically unusable in 2025 and it's a phone you want to last to 2030. This storage bait needs to be made illegal, it literally costs almost nothing to manufacture and exists purely to punish and trick the consumer.

bmicraft · 2025-08-21T12:40:24 1755780024

If you don't have multiple 10g+ games or hours of video on your phone then 128GB can easily be enough.

wraptile · 2025-08-21T13:36:15 1755783375

That's beside the point but I don't game on my phone (s22 with 128gb) and constantly have to shuffle storage just for photos, videos and music streaming cache. Storage is so cheap it makes no sense to pay 1,000$ for a device and then be a slave to manually managing it. It's insulting.

bmicraft · 2025-08-21T20:22:45 1755807765

Okay yeah I fully agree there. I was thinking of my 128GB 6a I bought for a third of that new.

wraptile · 2025-08-20T06:20:01 1755670801

I haven't written cursive for years and inspired by this article just tried it out and it still works! I never had a pretty hand writting and it's still just as ugly but very much functional.

Generally, I still do hand writing in terms of visualizing software with pen and paper but not in cursive but print letters as glace value is much more important here than information density and speed of cursive.

I find these fears really unfounded tbh. If we really need to hand write I think anyone can learn this skill in couple of days as we still have great hand dexterity, maybe even better than previous generations.