
The TLB is only an indirect cause, because the kernel scheduler preempts processes fairly infrequently (at 100 or 1000 Hz, or dynamically, but still capped to a small rate).

Scheduling quanta are kept so large precisely to keep the TLB-flush overhead of a context switch low. If the network demands more interaction (say, 100k req/s across all workers), each quantum must process the whole bundle of requests - perhaps a thousand of them - that piled up while the worker was asleep. This works as designed: you're supposed to use up your entire quantum, not terminate it early by issuing blocking IO per request. One prerequisite is that your network/disk protocol be pipelineable (most are, because that's how we deal with network/seek latencies).
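
A minimal sketch of that batching in Go (the request/response types and handle() are invented for illustration, not from any real server): block once per wakeup, then drain the whole backlog and answer it as one pipelined batch.

    package main

    import "fmt"

    type request int
    type response int

    // handle stands in for the actual per-request work.
    func handle(r request) response { return response(r * 2) }

    // drainAndProcess blocks once per wakeup, then drains everything
    // that piled up while the worker was descheduled, so the quantum
    // is spent on work rather than on per-request blocking IO.
    func drainAndProcess(in <-chan request, out chan<- response) {
        for {
            r, ok := <-in // block once, until something is queued
            if !ok {
                return
            }
            batch := []request{r}
        drain:
            for {
                select {
                case r, ok := <-in: // non-blocking drain of the backlog
                    if !ok {
                        break drain
                    }
                    batch = append(batch, r)
                default:
                    break drain
                }
            }
            for _, r := range batch { // pipelined responses, one flush
                out <- handle(r)
            }
        }
    }

    func main() {
        in := make(chan request, 1024)
        out := make(chan response, 1024)
        for i := 0; i < 5; i++ {
            in <- request(i)
        }
        close(in)
        go drainAndProcess(in, out)
        for i := 0; i < 5; i++ {
            fmt.Println(<-out)
        }
    }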

But at a certain point the overhead of the pipelining itself (message queues growing too deep) becomes so great that you have to switch to threading.

Hardcore threading advocates, on the other hand, need to account for the overhead of atomics (whether for locking or for "lockless" algorithms): an atomic must wait for all pending write-backs to flush. Threading gets a lot of bad rep not because "kernels suck at it", but because the person making such a statement wrote their program as an exercise in lock contention and/or in too much write-cache pollution per single atomic.
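
To illustrate the contention half with a toy sketch (my own example, not anyone's real program): eight goroutines hammering one shared atomic bounce a single cache line between cores, while per-worker padded shards touch only their own line and are summed once at the end.

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
    )

    const workers = 8
    const ops = 100000

    // Pad each shard to its own 64-byte cache line to avoid false sharing.
    type shard struct {
        n int64
        _ [56]byte
    }

    func main() {
        var wg sync.WaitGroup

        // Contended: every increment serializes on one cache line and
        // waits out the other cores' pending write-backs.
        var shared int64
        for w := 0; w < workers; w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := 0; i < ops; i++ {
                    atomic.AddInt64(&shared, 1)
                }
            }()
        }
        wg.Wait()

        // Sharded: each worker owns its line; one cheap sum at the end.
        shards := make([]shard, workers)
        for w := 0; w < workers; w++ {
            wg.Add(1)
            go func(w int) {
                defer wg.Done()
                for i := 0; i < ops; i++ {
                    atomic.AddInt64(&shards[w].n, 1)
                }
            }(w)
        }
        wg.Wait()

        var total int64
        for i := range shards {
            total += shards[i].n
        }
        fmt.Println(shared, total) // same count, very different cache traffic
    }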

The threading-vs-process tradeoff, then, is deep-pipeline overhead versus frequent queue-flush and locking overhead.

Typically, you need to meet somewhere in the middle for best performance, which is when you end up with threads with job queues - those basically emulate the process-induced queues within a threaded model.
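
A minimal sketch of that middle ground, assuming nothing beyond the standard library: each worker goroutine owns a private job queue (its "process inbox"), so the hot path never takes a shared lock.

    package main

    import (
        "fmt"
        "sync"
    )

    type job func() int

    func main() {
        const workers = 4
        queues := make([]chan job, workers)
        results := make(chan int, 64)
        var wg sync.WaitGroup

        for w := 0; w < workers; w++ {
            queues[w] = make(chan job, 16) // per-worker inbox
            wg.Add(1)
            go func(q <-chan job) {
                defer wg.Done()
                for j := range q { // drain own queue only
                    results <- j()
                }
            }(queues[w])
        }

        for i := 0; i < 16; i++ {
            i := i
            queues[i%workers] <- func() int { return i * i } // round-robin dispatch
        }
        for _, q := range queues {
            close(q)
        }
        wg.Wait()
        close(results)

        sum := 0
        for r := range results {
            sum += r
        }
        fmt.Println("sum of squares 0..15 =", sum) // 1240
    }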


    > Threading gets a lot of bad rep
    > not because "kernels suck at it",
    > but because the person making such
    > a statement wrote their program as
    > an exercise in lock contention
Well put. I'm going to have this printed on a plaque and hung above my desk.


HTTP/2 is all about "tcp-inside-tcp" states.

https://github.com/golang/net/blob/master/http2/http2.go#L81

Wherever possible, they do the sensible thing: a goroutine for each flow, fed from the muxed frame parser via channels. But per-flow state must still be tracked in the topmost flow dispatcher - a goroutine can't tear itself down on timeouts, inspect OOB signaling on the topmost stream, and so on.
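
The shape of it, as a toy sketch (the frame/dispatcher types here are invented for illustration; this is not golang.org/x/net/http2's actual API): one parser demuxes frames to per-stream goroutines over channels, while the dispatcher keeps the per-stream state so that teardown is decided up top, not inside the goroutine.

    package main

    import (
        "fmt"
        "sync"
    )

    type frame struct {
        streamID int
        payload  string
        end      bool // END_STREAM-style flag
    }

    type dispatcher struct {
        mu      sync.Mutex
        streams map[int]chan frame // per-stream state lives up here
        wg      sync.WaitGroup
    }

    func (d *dispatcher) route(f frame) {
        d.mu.Lock()
        ch, ok := d.streams[f.streamID]
        if !ok { // first frame on this stream: spin up its goroutine
            ch = make(chan frame, 8)
            d.streams[f.streamID] = ch
            d.wg.Add(1)
            go func(id int, in <-chan frame) {
                defer d.wg.Done()
                for fr := range in {
                    fmt.Printf("stream %d: %q\n", id, fr.payload)
                }
            }(f.streamID, ch)
        }
        d.mu.Unlock()
        ch <- f
        if f.end { // teardown decided at the dispatcher, not in the goroutine
            d.mu.Lock()
            close(ch)
            delete(d.streams, f.streamID)
            d.mu.Unlock()
        }
    }

    func main() {
        d := &dispatcher{streams: make(map[int]chan frame)}
        for _, f := range []frame{
            {1, "GET /", false}, {3, "GET /img", false},
            {1, "headers done", true}, {3, "headers done", true},
        } {
            d.route(f)
        }
        d.wg.Wait()
    }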


> IPFS feels closest to it as of now.

I'm rather skeptical. IMO, IPFS is just overengineered BitTorrent. At the lowest layer the semantics are the same, except with really bad ideas thrown into the mix - wantlists instead of bitmaps, forcing a DAG in places where there's no use for it - all driving the overall performance of the network into the ground.


AFAIK you can't get sued because you are accidentally hosting a fragment of copyrighted material over IPFS, whereas seeding copyrighted material over BitTorrent is a crime in the USA.


What do you mean "accidentally"? You either have the file pinned or not. If it's not pinned, you're not seeding anything. If it is pinned, you're liable the same as with BT.


Git would come closest. Note that while it's not common now (aside from some early canaries, like Scuttlebutt), Git works perfectly fine as a P2P opennet. You are much less trapped on GitHub, because you can just take your repo, complete history and all, anywhere you want - be it a central server or some sort of P2P overlay.


Isn't Darcs based on some kind of "theory of patches" too? That was the whole selling point versus Git: Git takes snapshots and calculates diffs, while Darcs takes diffs and calculates snapshots.


>Then when we've all sunk into all these convenient cloud services and "easy to use" disposable devices, we'll have lost all of our privacy and power.

Convenience can be engineered into P2P, too.

With Bitcoin, people definitely get the convenience of the unregulated speculative asset they wanted (presumably because real estate is even more obtuse than Bitcoin).

With BitTorrent, people definitely get the convenience of access to obscure content (though Netflix is a great counterexample).

> And yet we'll have people argue that these open source and federated/distributed systems are "too confusing" and "not practical" and that we shouldn't even try to avoid this future.

I think the best route would be that of Linux vs Android. This has already happened to a certain degree with Mastodon, for instance - someone "privatizes" the underlying open fabric and puts a nice "convenience trap" on top of it to "attract adoption".

The issue here is mostly that investors in such an endeavor seek total control of the userbase - an engineered, artificial "inconvenience of switching platforms".

Internet behemoths should not be called out over the "dangers of getting regulated and handing control to the government" (frankly, anything can happen, not just that), but over keeping their platforms un-interoperable on purpose in a bid to attain a monopoly through network effects, burdening the entrapped users with the "inconvenience of limited frontend choice". They should be called out the same way we called out Microsoft back then, or, say, Comcast now.


All pubs on the wiki are indeed overloaded. Interestingly, if one sets up their own, the other pubs eventually sync with it; only the desktop client seems unhappy with laggy pubs. Is that by design?

FWIW, you can use pub.lua.cz:8008:@xYSW6eVu8gTS/nTSXZiH97dgKZ+wp7NkomR6WKK/PBI=.ed25519~iQ16RuvjKZqy/RhiXXmW9+6wuZNq+SBI8evG3PotxvI= if you have trouble connecting to the ones on github.

Feel free to add it to the wiki; I do plan to run it long term, but I am not a GitHub user.


If you want a "modern web" example: nntpchan. Despite the name, it's not related to Usenet directly; it only uses the same method of federated pub/sub replication.


The answer is that it's sort of comparing apples and oranges - libraries, or general C++ inlining cruft, can indeed inflate the binary size a lot.

A much fairer comparison is source size, side by side: Chromium's source is about 4x larger than Opera's when not counting any third-party dependencies.

Or, even better, compile times. A Chromium (or Firefox) build is a half-day job on a mid-range laptop (especially with 4 or 8 GB of memory).

Opera builds in about 20 minutes. I was also pleasantly surprised that the codebase is not particularly bitrotten: both VS2015 and modern GCC could cope with it.


Opera used a relatively small subset of C++ for compatibility with some diabolical compilers for embedded devices, so in a sense it's less surprising that modern compilers had no problem: there's very little that's complex going on.


I'm posting this from Chropera as we speak, and I used Presto for a decade before that.

While your points are spot on, there's one thing about Presto: performance, or more specifically resource usage. Modern layout engines are simply not careful, because apparently all their devs have 32 GB of RAM and 8 cores, so users have to as well.

The problem is the argument "people generally don't have 50 tabs open, so why bother". Well, it bothers the people who could comfortably do so in Presto; it's not just nostalgia goggles.

As for Vivaldi, I really tried to like it, putting up with its numerous subtle bugs. Then I deobfuscated its source code one day and realized that if it ever becomes stable, it will be the day a large-scale Node.js project manages to do so. For some reason (cheap labor?) they decided Java or TypeScript wasn't the route.


The reasons Presto coped with less memory are complex and not really down to carefulness (plenty of resources go into memory consumption in both Blink and EdgeHTML). They come down to architectural decisions that led to comparatively poor performance (on JS-heavy stuff especially!) and site-compatibility bugs (using different types for things that are web-observable, especially), plus some that led to comparative instability (heck, Chrome especially has UI lagginess with large numbers of tabs due to IPC, but gains stability, and the ability for multiple tabs to be doing stuff at once, as a result).

Vivaldi is mostly formed of ex-Opera employees (especially early, long-term ones!), and I expect they're getting salaries at least comparable to when they left Opera (i.e., a bit below market average for the cities they're in). Why are they using JS? I can't really tell, beyond a desire to use web technology as much as possible; I'd totally agree TypeScript or Flow would be better!


> using different types for things that are web observable, especially

Can you be more specific?

> has UI lagginess with large numbers of tabs due to IPC

The IPC lock contention is part of the issue (as is the tab-per-process model), but those can all be worked around (sorta - well, Chromium got rid of the no-sandbox ifdefs a year ago...).

> Vivaldi is mostly formed of ex-Opera employees

Morten Stenshorn, Dave Rune, Rafal Chlodnicki, Sigbjorn Finne - more than half of the names in there! - are the names signed under the layout engine and the ES engine in the leaked tree. The same people you can find on https://operasoftware.github.io/upstreamtools/

How many people could I track down who worked on Opera 12 and now on Vivaldi? None.

I'm willing to counter-speculate: whatever made the original Presto great was not its cofounder and CEO, but the people who actually wrote the code.


> Can you be more specific?

Sure: often using shorts where other browsers used ints, using floats where others used doubles, or even using ints where others used doubles (back when much of that code was written, Presto was far more concerned than other browsers about performance on devices without FPUs). All of this could easily be observed through layout and scripting, and much of it led to some of the hardest-to-tackle site-compatibility bugs, which were ultimately never fully fixed (and most never had any fix shipped). `z-index` was a pain point, as was the use of ints for all percentages in CSS.
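
A toy illustration of the point (in Go, not browser code; the numbers are arbitrary): the same computation done at different widths rounds differently, and both layout and scripts can see it.

    package main

    import "fmt"

    func main() {
        // A percentage-of-a-length computation at two widths; the
        // results disagree in the low digits, which scripts can observe.
        const pct, base = 33.333333, 754.0
        f32 := float32(pct) / 100 * float32(base)
        f64 := pct / 100 * base
        fmt.Println(f32, f64)

        // And a value that survives an int but overflows a short.
        wide := 40000
        fmt.Println(int16(wide), int32(wide)) // -25536 vs 40000
    }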

> How many people I could track down who worked on opera 12 and now vivaldi? None.

Almost all of those listed at https://vivaldi.com/team/ worked on either the Presto-based Opera Desktop or Presto itself (which were, once Presto became a thing, distinct teams). Heck, Yngve was literally the first person aside from the founders to work on Opera, and owned much of the network code in Presto till near the end; Petter, I think, was the only other person to have worked at Opera for over 20 years, having worked on the Desktop browser for the majority (all?) of that time.

If my memory is correct, the names in the source tree were module owners and module QA, which amounts to a fraction of the number of people involved (especially historically: those who had moved on to other things are people you're unlikely to find by name in the source tree). At any given point maybe a quarter of the then-current team had names anywhere in the source tree… and that's only considering Presto. Desktop had an even smaller ratio, as far as I'm aware.

If you looked in the ES engine, I'm pretty sure my name should be there as the module QA (though I'm not sure that was still true at the point desktop 12.1x forked). So, uh, I might know the people involved and their employment history… Very few people from the old Core department (which worked on Presto) remain at Opera - maybe 20% at the absolute most - and AFAIK they mostly left voluntarily.

As for those you listed, half of them joined relatively late, not long before the move to WebKit (announced Feb 2013). And Dave Rune isn't a person; I presume you're mixing up Dave Vest and Rune Lillesveen?


Love how simple the algorithm is compared to iDCT-based ones - very good job!

> No text at all

Or indeed anything that maps poorly to gradients. For that, I'm thinking: instead of storing pixel values per sample vertex, store only DCT coefficient(s) per tri, the result of which gets texture-mapped onto the tri's surface. Think JPEG, but with variably sized tris instead of 8x8 quads.

JPEG artifacts would then be much more fine-grained around edges.

EDIT: It would not need to be as complex as a full DCT, because a lot of information is already carried by tri shape/positioning at the edges. The idea is to have a "library of various gradient shapes" to pick from, not a full 8x8 DCT matrix capable of reconstituting 8x8 bitmap approximations.
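
A toy sketch of the reconstruction side (my own made-up three-entry "gradient library", not the article's algorithm): per-triangle coefficients over a tiny basis, evaluated at barycentric coordinates the way a texture would be sampled.

    package main

    import "fmt"

    // Three toy basis functions over barycentric coords (u, v, w):
    // a constant (DC) term plus two linear gradient directions.
    var basis = []func(u, v, w float64) float64{
        func(u, v, w float64) float64 { return 1 },     // DC
        func(u, v, w float64) float64 { return u - w }, // gradient toward vertex A
        func(u, v, w float64) float64 { return v - w }, // gradient toward vertex B
    }

    // shade reconstructs the value at a barycentric point from the
    // per-triangle coefficients.
    func shade(coeff []float64, u, v float64) float64 {
        w := 1 - u - v
        s := 0.0
        for i, c := range coeff {
            s += c * basis[i](u, v, w)
        }
        return s
    }

    func main() {
        coeff := []float64{0.5, 0.3, -0.1} // stored per tri, not per vertex
        fmt.Println(shade(coeff, 1, 0))         // at vertex A: 0.8
        fmt.Println(shade(coeff, 0, 1))         // at vertex B: 0.4
        fmt.Println(shade(coeff, 0, 0))         // at vertex C: 0.3
        fmt.Println(shade(coeff, 1.0/3, 1.0/3)) // centroid: DC term only, 0.5
    }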

Once again, thanks for the inspiring implementation.

