Hacker News | gregable's comments

Very well put together. If you are curious about the weighted version, I tried to explain it some here: https://gregable.com/2007/10/reservoir-sampling.html

There's also a distributed version, easy with a map reduce.

Or the very simple algorithm: generate a random number for each item in the stream and keep the top N items ordered by that random number.
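
A minimal Python sketch of that random-tag approach. It also hints at why the distributed version is easy: each shard keeps its own top N tags, and a reducer keeps the top N of the union. The names `sample_top_n` and `merge_samples` are made up for illustration and do not come from the linked post.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Entry = Tuple[float, int, T]  # (random tag, stream index, item)


def sample_top_n(stream: Iterable[T], n: int,
                 rng: Optional[random.Random] = None) -> List[Entry]:
    """Tag every item with a uniform random number and keep the n largest tags."""
    rng = rng or random.Random()
    heap: List[Entry] = []  # min-heap, so heap[0] is the weakest kept entry
    for index, item in enumerate(stream):
        # The index breaks ties so the items themselves are never compared.
        entry = (rng.random(), index, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)
    return heap


def merge_samples(a: List[Entry], b: List[Entry], n: int) -> List[Entry]:
    """Reducer step: the global top n is the top n of the per-shard top n lists."""
    return heapq.nlargest(n, list(a) + list(b), key=lambda e: e[0])
```

Usage would look something like `merge_samples(sample_top_n(shard_a, 10), sample_top_n(shard_b, 10), 10)`, keeping the tags around until the final merge.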


Two notes on the weighted version. First, the straightforward implementation of selecting the top N when ranked by POW(RANDOM(), 1.0 / weight) has stability problems when the weights are very large or very small. Second, the resulting sample does not have the same distribution in expectation as the population from which it was drawn. This is especially so when the overall weight is concentrated in a small number of population elements. But such samples are workable approximations in many cases.

I discuss these issues more here: https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....
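
On the stability point, a rough Python sketch of one common workaround: rank by `log(u) / weight` instead of `u ** (1 / weight)`. The logarithm is monotonic, so the ordering is the same, but the key no longer collapses toward 0 or 1 when weights are extreme. The function name `weighted_sample` and the choice to skip non-positive weights are assumptions for illustration, not something taken from either linked post.

```python
import heapq
import math
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_sample(items: Iterable[Tuple[T, float]], n: int,
                    rng: Optional[random.Random] = None) -> List[T]:
    """Keep the n items with the largest log(u) / weight keys."""
    rng = rng or random.Random()
    heap: List[Tuple[float, int, T]] = []  # min-heap of (key, index, item)
    for index, (item, weight) in enumerate(items):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        # Same ordering as u ** (1 / weight), but computed in log space.
        key = math.log(u) / weight
        entry = (key, index, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

This only addresses the numerical issue. The second note, about the sample's distribution differing from the population's when weight is concentrated in a few elements, still applies.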


I worked on AMP and AMP email for a while at Google, but these are just my thoughts. HN has always pulled out the pitchforks on this topic, so I'm not surprised to see the same again. I disagree with a number of things this article claims:

> Build an AMP site, and you’d get preferential placement in search results ... The implicit stick, though, was that without an AMP page, your site wouldn’t rank as highly as it may have previously. And

There was an AMP news carousel that would appear at the top of news results. The web result order, however, didn't prefer AMP. Depending on how you looked at it, this was preferential or it wasn't. The "wasn't" perspective is that this carousel was much like showing image or video results: it was a different format, and there was a result spot reserved for some docs of that format if the query warranted it.

Interestingly, when Google first started rolling out carousels for images or videos in normal results, website owners protested as well as it was competition for visibility. I don't hear that argument as much any more.

Regardless, the AMP carousel has been gone for a while AFAIK.

> “We are here to make the web great again,” said Google’s vice president of news, Richard Gingras in 2015, only months after Donald Trump brought that phrase into the vernacular

Yeah, that aged poorly.

> [AMP] brought back the dynamics of the mobile versus the desktop web, for one. Instead of the same web for everyone, you now had one page on mobile, another page on desktop

That was a website owner choice. AMP pages could be responsive and work just fine on desktop. Many sites did exactly that, though you often never realized they were AMP pages. The goal of the project was always to optimize mobile performance, but it worked well for desktop too. Search provided a mechanism where you could choose to pair an amp and non-amp page, only showing AMP for mobile. I suspect sites did this because non-amp allowed all of the bespoke javascript they wanted on desktop, including things that were kinda terrible for user experience but improved ROI. Super heavy javascript, ads that were difficult to dismiss, all sorts of jank.

> And, more critically, it lessened your control over your site. ... ad tech and other scripts on your site might be incapable of running on your AMP site

AMP is a subset of HTML plus some javascript libraries. The subset thing means you had a limited API. That was the point though, the limited API was restricted to the set of things that could be forced to be performant. That is "control" in some sense, but it wasn't control in the common sense of limiting content or ad networks or whatnot. Virtually every ad network had a library for running on AMP.

> AMP required allowing any AMP CDN to cache your pages.

You can and always could create amp pages that are not served by AMP CDNs. The tradeoff is that search results couldn't preload the page for the user, as there is a hard privacy constraint that the user can't initiate network traffic to the publisher until they indicate intent with a click. So without the CDN, it wasn't quite as fast, but it was still typically pretty fast.

> As Ray Tomlinson, who implemented and sent the first email from ARPANET in 1971 said about adding formatting to email: “That’s too complicated: we just want to send messages to people.”

This is a valid perspective on what email is or should be. I don't feel strongly that it's the only perspective, but it's certainly valid. The argument however is really against HTML email, not AMP email in particular. I think most of the rest of the arguments apply pretty equally to both.

If you look at HTML email in webmail clients, they all work on the principle of sanitization: take arbitrary HTML, modify it to remove anything dangerous, and then render the rest. "Anything dangerous" means removing all javascript, most or all CSS, and large swaths of the HTML tag space, as well as rewriting all image URLs, etc.
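
To make the sanitization idea concrete, here is a toy allowlist sanitizer in Python. It is only a sketch of the principle: the allowlists, the `EmailSanitizer` and `sanitize` names, and the `proxy.example` URL are all hypothetical, and no real webmail client is claimed to work this way. It is not safe for production use.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote

# Hypothetical allowlists; real clients each have their own, much longer rules.
ALLOWED_TAGS = {"p", "b", "i", "a", "br", "table", "tr", "td", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
DROP_WITH_CONTENT = {"script", "style"}
VOID_TAGS = {"br", "img"}
SAFE_LINK_PREFIXES = ("http://", "https://", "mailto:")


class EmailSanitizer(HTMLParser):
    def __init__(self, image_proxy: str) -> None:
        super().__init__(convert_charrefs=True)
        self.image_proxy = image_proxy
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, but their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and anything not allowlisted
            if tag == "a" and not value.lower().startswith(SAFE_LINK_PREFIXES):
                continue  # drops javascript: and other unexpected schemes
            if tag == "img" and name == "src":
                value = self.image_proxy + quote(value, safe="")
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text: str, image_proxy: str = "https://proxy.example/?u=") -> str:
    parser = EmailSanitizer(image_proxy)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Even this toy version shows why senders have to target a narrow subset: anything outside the allowlist silently disappears.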

This would result in pretty garbled results except senders have adapted to only send the subset of HTML that won't be garbled. However, it's not easy to do. Take a look at https://templates.mailchimp.com/resources/email-client-css-s... which shows what each email client accepts. It's much much worse than browser incompatibility, though you also have to handle browser differences too.

In a sense, this limited HTML API is conceptually similar to AMP. AMP was just able to add back some of the interactive functionality that had been stripped away. And AMP had the possibility of becoming a standard compatibility API for webmail clients, one that was open source, had maintained validators that could be tested against, etc.

I think it had the chance to really make HTML email better. Of course, if your perspective is that HTML email is fundamentally bad, then that's not really a win.

> You’d need to authenticate your domain with DKIM, DMARC, and SPF—good ideas, regardless. You’d also need to send a sample email to both Google and Yahoo!, and register your domain with each of them. Then, if you were lucky, within 5 days you’d be approved to start sending AMP emails.

I think the plan was always originally to expand this to a general availability format. However, AMP email launched in 2019 and Google largely shifted away from AMP shortly thereafter, so the project never got enough momentum to get to that state, sadly IMHO.


> AMP is a subset of HTML plus some javascript libraries. The subset thing means you had a limited API. That was the point though, the limited API was restricted to the set of things that could be forced to be performant. That is "control" in some sense, but it wasn't control in the common sense of limiting content or ad networks or whatnot. Virtually every ad network had a library for running on AMP.

Javascript libraries that MUST be loaded from one specific Google CDN.

If I load the exact same libraries from my own domain, suddenly it's not "valid" AMP anymore.

It's not a standard if it only works with one specific implementation.


> It's not a standard if it only works with one specific implementation.

IMO, that's sort of what a standard is, but the word is not strictly defined.

I think you are trying to argue that it's not open. The source is on github, and does accept contributions, but effectively Google controls who can commit to it. Depending on your definition of open, that's a valid argument.

You can load those libraries from other locations, but Google search results won't be able to cache it because of the privacy concerns I mentioned in my top level comment. It's not "valid", but the only consequence of the invalidity is no caching, and that consequence is unavoidable given the privacy constraint. It still shows up in search results.

The Google javascript library URL serves with no cookies, is publicly cacheable, and is an identical file to what you can build from source on github.


Except you can't. Every browser on iOS uses Safari's rendering engine. Chrome/Firefox on iOS are effectively reskinned Safari. This is an Apple requirement. The rendering engine is the important part here when talking about standards and such.


> effectively reskinned Safari

It's worse than that, even - IIRC the renderer that other browsers have to use is slower and more limited than the one Safari uses.

So other browsers are effectively reskinned hobbled Safari.


A rendering engine is not a browser. Are all the Chrome engine variants really just Chrome in a skin? I don’t think so; they all have unique properties that set them apart, as do Orion, Firefox, Brave, etc. on iOS.


I wouldn't consider other chromium browsers reskinned because they're using the chromium engine as a dependency, by choice. They can customize it as much or as little as they want (and I'd imagine they do to various extents).

Browsers on iOS can't - they are required (legally, not technologically) to use (a worse version of) Safari's engine. Chrome for iOS is not the browser that the chrome team wants to distribute, it's a browser Apple made that has been customized to the extent that Apple allows it to be customized. What is that if not reskinned?


Every time this discussion happens a non-trivial number of people reveal they’ve fallen into this trap of believing other browsers are allowed on iOS. Feels like a consumer protection issue, at some level.


The only browser that seems to be able to get around this is Orion. No idea how they are doing it.


Orion is WebKit. Safari’s rendering engine is WebKit.


I tried Orion (m1 MBP) recently. From about 3wks ago til a few days ago. I liked the UI. But there were a lot of pages that didn’t work correctly. I persevered for a while. But gave up a few days ago and went back to Brave.


I know it’s WebKit. But they are somehow allowing extensions, which none of the other iOS browsers has managed afaik.


Likely just emulating/providing the javascript interfaces needed for FF and Chrome extensions to run.


Where there's a will, there's a way!


A large and growing fraction of this lumber is used for wood pellet production to be burned in the EU as "green" energy.

Sure, these trees are technically renewable over decades to centuries but this doesn't matter all that much when we need to rapidly reach net zero in only about 25 years.


It was a massive climate change investment.


The only thing that's going to solve climate change is technology. It's the same technology we've had since the 50's. Not turning off our lights, setting the AC to 80, drinking through paper straws, or clearing thousands of acres for solar panels.


Assuming you're talking about nuclear, why do you think it would be cheaper/faster/easier/etc to decarbonize with nuclear than with solar + battery?


Solar is faster in the short term (because people think it's somehow "clean") but not sustainable in long term. It requires too much land and upkeep for not great output. It would be better to go all in on nuclear and surrounding technologies and solve energy once and for all.


I disagree. I'm willing to reconsider though. Here's what I think.

Solar is immensely cheap, to the point where panels are often cheaper than arbitrary other surfaces. Maintenance is very low, much lower than for nuclear. And land is not an issue for the US or China, the two places where decarbonizing energy is most important. Both have massive swaths of desert that are uninhabited, and where the addition of shade will likely be a net benefit to the local ecosystem.

I agree that solar panel creation produces a fair amount of pollution, but then, so does nuclear power generation. In both cases this can and should be dealt with safely.


You say that like it’s a good thing.


Did the climate stop changing?


Remind Americans their climate change is better when they are paying at the grocery store.

Edit: Yea. That’s what I thought.


So in the limit, buying POW coins is using that money to consume energy. Selling the coins is taking that money away from energy consumption.


It slips both ways though. I suspect banning incandescent bulbs is probably a good change, even though there were people who obviously wanted them.


I sometimes am not clear on what the goal is.

The article seems to argue that the goal is very narrowly to reduce the amount of plastic bags created/consumed and then claims a study shows that the bans do indeed achieve that goal. It's hard to imagine this goal not being achieved, but it's too narrow.

I haven't seen any study showing that total plastic trash, incorrectly disposed, is reduced. It could be hard to study, I admit. I'd love to know the amount of the reduction as well. My guess would be there is a reduction, but it is fairly small.

For example, in the San Jose survey: https://web.archive.org/web/20230512013405/https://www.sanjo... pre-ban creek and litter surveys only showed 9% single-use plastic bags and this dropped to 2%.

I'd imagine a 7 percentage point reduction is the upper bound on the impact, but it could be smaller than that if other litter increased. Maybe that's high enough to make the ban worth the inconvenience; I don't know what the right threshold should be.

Broader goals could include reducing total plastic production, reducing fossil fuel mining, etc. I'm more suspicious that these goals are not being meaningfully affected by bag bans.


I think the suggestion is that it's a battery with low long term storage costs, not that it's 100% efficient round trip.


How do the efforts compare? Does MethaneSAT add anything beyond what these already do? Genuine question, I'm unfamiliar.

