SolidJS and dom-expressions are the best things that have happened to the front-end since React; they are influencing the whole ecosystem, from templating to Signals. It will be very, very hard to come up with better ideas. It may not be that popular, but it's leading the way.
I made this table listing the typings used by SolidJS, Voby, Vue, Preact, React, Pota, VSCode-LSP and Chrome, so you can easily compare type definitions between frameworks and the browser.
Note: there are a few inaccuracies, like the `<audio>` tag not including typings; that's because extended interfaces aren't resolved, but _most_ of the stuff is there.
To be completely honest, the people who write these specifications live disconnected from reality; they don't use the stuff they specify. That stuff works for very simple things, but when your forms evolve you realise you'd be better off just writing the whole thing yourself.
This. Doing relatively common things like cross-referencing other fields ("did the user specify a country? If so, we can validate the postcode/zip code they just entered, but if not we'll have to wait until they pick a country") almost immediately requires JS to handle, and as soon as you're using JS then it's all just easier to do in code than trying to mess around with validation properties.
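A minimal sketch of that country/postcode case using only the native constraint validation API; the field ids and the per-country patterns here are made up for illustration:

```ts
// Minimal sketch: cross-field validation with plain DOM APIs.
// Assumes hypothetical inputs with ids "country" and "postcode".
const country = document.querySelector<HTMLSelectElement>('#country')!;
const postcode = document.querySelector<HTMLInputElement>('#postcode')!;

function validatePostcode(): void {
  if (!country.value) {
    // No country yet: defer validation rather than guessing a format.
    postcode.setCustomValidity('');
    return;
  }
  // Hypothetical per-country patterns; real rules are far messier.
  const patterns: Record<string, RegExp> = {
    US: /^\d{5}(-\d{4})?$/,
    GB: /^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$/i,
  };
  const pattern = patterns[country.value];
  const ok = !pattern || pattern.test(postcode.value);
  // setCustomValidity('') marks the field valid; any non-empty message
  // marks it invalid and shows up in the browser's native validation UI.
  postcode.setCustomValidity(ok ? '' : 'Postcode does not match the selected country.');
}

country.addEventListener('change', validatePostcode);
postcode.addEventListener('input', validatePostcode);
```

Even this tiny amount of logic already lives entirely in JS; the HTML attributes alone can't express it.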
Yep. This is great until you need a cross-browser date picker, at which point you need to implement a bunch of stuff yourself. It’s frustrating how primitive HTML forms are, after so many years.
What's hilarious is they do have UI for it in about:config "dom.forms.datetime.timepicker". It makes me so angry that it's not on by default. It works fine!
Correctly using HTML forms won’t quickly lead to browser improvements... so the result is that users will hate your forms.
Users/your customers might even think that you’re to blame, and not $browserVendor.
I’ll go one further and say that the customers are absolutely justified in blaming the developer instead of the browser. If a developer knowingly chooses a built-in form control whose common implementations are bad for their users, how are they not at fault for the resulting experience?
“This site only uses functionality provided by the HTML spec” is not a useful goal in and of itself. Using the right tool for the job, which might be JavaScript, is always more important.
The `required` attribute, which this article is about, is an HTML5 thing and first appeared in browsers in 2010-2011. So sure, not brand spanking new, but the web was already being used to write modern apps by then. There's no good reason for the validation features to be so shabby.
Even `required` doesn't work properly. Browsers do very odd and inconsistent things when your required field is hidden at submit time, like in a basic tabbed or multi-step form.
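The usual workaround I know of is to disable the controls on the steps that aren't visible, since disabled controls are skipped by constraint validation and aren't submitted; a rough sketch, assuming hypothetical `data-step` panels:

```ts
// Minimal sketch: keep `required` from tripping over hidden steps in a
// multi-step form by disabling every control outside the visible step.
function showStep(form: HTMLFormElement, step: number): void {
  form.querySelectorAll<HTMLElement>('[data-step]').forEach((panel) => {
    const active = Number(panel.dataset.step) === step;
    panel.hidden = !active;
    // Disabled controls are ignored by constraint validation and are not
    // submitted, so a hidden `required` field can no longer block submit
    // (or fail silently, depending on the browser).
    panel
      .querySelectorAll<HTMLInputElement | HTMLSelectElement | HTMLTextAreaElement>(
        'input, select, textarea',
      )
      .forEach((control) => {
        control.disabled = !active;
      });
  });
}
```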
It tries to fetch a sitemap in case there's some missing link, but it starts from the root and crawls internal links. There's a new mode, added this morning, for SPAs: the option `--spa` will write the original HTML instead of the generated/rendered one. That way some apps _will_ work better.
It saves the generated/rendered HTML, but I have just added a `spa` mode that will save the original HTML without modifications. This makes most simple web apps work.
I have also updated the local server to fetch missing resources from origin. For example, a web app may load some JS modules only when you click buttons or links; when that happens and the requested file is not in the zip, it will fetch it from origin and update the zip. So you can mostly back up an SPA by crawling it first and then using it for a bit to fetch the missing resources/modules.
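Not the tool's actual code, but the fetch-on-miss idea looks roughly like this sketch, which assumes the crawl output sits unpacked in a `./mirror` directory rather than inside the zip, and that `ORIGIN` points at the original site:

```ts
// Sketch of a fetch-on-miss local server (assumptions: mirror on disk,
// Node 18+ for the global fetch, no query-string or path sanitising).
import { createServer } from 'node:http';
import { readFile, writeFile, mkdir } from 'node:fs/promises';
import { dirname, join, normalize } from 'node:path';

const ORIGIN = 'https://pota.quack.uy'; // the site being mirrored (placeholder)
const ROOT = './mirror';                // unpacked crawl output (placeholder)

createServer(async (req, res) => {
  const urlPath = !req.url || req.url === '/' ? '/index.html' : req.url;
  const localPath = join(ROOT, normalize(urlPath));
  try {
    // Hit: the crawler already saved this file, serve it from disk.
    res.end(await readFile(localPath));
  } catch {
    // Miss: fetch from origin, store it, then serve it, so the mirror
    // fills itself in as you click around the app.
    const upstream = await fetch(ORIGIN + urlPath);
    const body = Buffer.from(await upstream.arrayBuffer());
    await mkdir(dirname(localPath), { recursive: true });
    await writeFile(localPath, body);
    res.writeHead(upstream.status, {
      'content-type': upstream.headers.get('content-type') ?? 'application/octet-stream',
    });
    res.end(body);
  }
}).listen(8080);
```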
My main use case is the docs site https://pota.quack.uy/ , which Google cannot index properly. On https://www.google.com/search?q=site%3Apota.quack.uy you will see that some titles/descriptions don't match what the content of the page is about. As the full site is rendered client-side via JavaScript, I can just crawl it myself and save the HTML output to actual files. Then I can serve that content with nginx or any other web server, without having to do the expensive thing of SSR via Node.js. Not to mention that being able to do SSR with modern JavaScript frameworks is not trivial and requires engineering time.
I’m not quite understanding: you’re saying you deploy your site one way, then crawl it, then redeploy it via the zipfile you created? And why is SSR relevant to the discussion?
Modern websites execute JavaScript that renders the DOM nodes displayed in the browser.
For example, if you look at this site in the browser, https://pota.quack.uy/ , and then run `curl https://pota.quack.uy/`, do you see any of the text that is rendered in the browser in the output of the curl command?
You don't, because curl doesn't execute JavaScript, and that text comes from JavaScript. One way to fix this problem is to have a Node.js instance running that does SSR, so when your curl command connects to the server, the Node instance executes the JavaScript and the result is streamed/served to curl (Node is running a web server).
Another way, without having to execute JavaScript on the server, is to crawl the site yourself, let's say on localhost (you do not even need to deploy), and then upload the result to a web server that can serve the files.
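For the curious, that "crawl yourself" step boils down to something like this Puppeteer sketch (the URL and output path are placeholders, not the project's real code):

```ts
// Sketch: load the page in headless Chrome, wait for the client-side
// render, and write the resulting DOM to a plain .html file that nginx
// (or any static server) can serve.
import puppeteer from 'puppeteer';
import { writeFile } from 'node:fs/promises';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// 'networkidle0' waits until the page has stopped making requests,
// a rough proxy for "the framework has finished rendering".
await page.goto('http://localhost:3000/', { waitUntil: 'networkidle0' });

// page.content() returns the serialized DOM after JavaScript ran,
// i.e. the text that curl alone would never see.
await writeFile('index.html', await page.content());

await browser.close();
```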
Yes! You know, I was considering this the previous couple of days; I was looking around for how to construct an `mhtml` file to serve all the files at the same time. Unrelated to this project, I had the use case of a client wanting to keep an offline version of one of my projects.
> Although UNIX philosophy posits that it's good to have many small files, I like your idea for its contribution to reducing clutter (imagine running 'tree' in both scenarios) and also avoiding running out of inodes in some file systems (maybe less of a problem nowadays in general, not sure as I haven't generated millions of tiny files recently).
It's pretty rare for any website to have many files, as they optimize to have as few files as possible (fewer network requests, which could be slower than just shipping a big file). I have crawled the React docs as a test, and it's a 147 MB zip file with 3,803 files (including external resources).
Trying to use this for mirroring a documentation site. Disappointed that 1. it runs quite slowly, and 2. it kept outputting error messages like "ProtocolError: Protocol error (Page.bringToFront): Not attached to an active page". Not sure what the reason is.
Big fan of HTTrack! It reminds me of the old days and makes me sad about the current state of the web.
I am not sure if HTTrack has progressed beyond fetching resources; it's been a long time since I last used it. What my project does is spin up a real web browser (Chrome in headless mode, which means it's hidden) and then let the JavaScript on that website execute, which means it will display/generate the fancy HTML that you can then save as-is into an index.html. It saves all kinds of files; it doesn't care about the extension or MIME type, it tries to save them all.
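That "save everything" part can be approximated by listening to every network response in Puppeteer and writing the raw bytes to disk; a rough sketch with simplified paths, not the project's actual implementation:

```ts
// Sketch: capture every response the headless browser sees, regardless
// of extension or MIME type, and store its body on disk.
import puppeteer from 'puppeteer';
import { writeFile, mkdir } from 'node:fs/promises';
import { dirname, join } from 'node:path';

const browser = await puppeteer.launch();
const page = await browser.newPage();

page.on('response', async (response) => {
  try {
    const { pathname } = new URL(response.url());
    const file = join('./archive', pathname === '/' ? '/index.html' : pathname);
    await mkdir(dirname(file), { recursive: true });
    // response.buffer() gives the raw bytes, so binary assets (images,
    // fonts, .riv files, ...) are stored byte-for-byte as well.
    await writeFile(file, await response.buffer());
  } catch {
    // Some responses (redirects, aborted requests) have no body; skip them.
  }
});

await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
await browser.close();
```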
> It saves all kinds of files; it doesn't care about the extension or MIME type, it tries to save them all.
That’s awesome to know, I will give it a try. One website I remember trying to download has all sorts of animations with the .riv extension, and it didn’t work well with HTTrack. I will try it with this soon, thanks for sharing it!
Status codes: I am displaying the list because, on a JavaScript-driven application, you mostly don't want codes other than 200 (besides media).
I thought about robots.txt, but as this is software that you are supposed to run against your own website, I didn't consider it worthwhile. You have a point about speed requirements and prohibited resources (though it's not like skipping over them would add any security).
I haven't put much time/effort into an update step. Currently it resumes via checkpoints if the process exited (it saves the current state every 250 URLs; if any URL is missing it can continue, otherwise it's done).
Currently I use only one thread for scraping; I do not need more. It gets the job done. Also, I know too little to play more with Python "celery" threads.
My project can be used for various things, depending on your needs. Recently I have been playing with using it as a 'search engine': I am scraping the Internet to find cool stuff. The results are in https://github.com/rumca-js/Internet-Places-Database. Not all domains are interesting, though.
> Status codes: I am displaying the list because, on a JavaScript-driven application, you mostly don't want codes other than 200 (besides media).
What? Why? Regardless of the programming language used to generate content, the standard, well-known HTTP status codes should be returned as expected. If your JS-served site gives me a 200 code when it should be a 404, you're wrong.
I think you are misunderstanding: your application is expected to give mostly 200 codes. If you get a 404, then a link is broken or a page is misbehaving, which is exactly why that page URL is displayed on the console with a warning.