SolidJS and dom-expressions are the best things that have happened to the front-end since React; they are influencing the whole ecosystem, from templating to Signals. It will be very, very hard to come up with better ideas. It may not be that popular, but it's leading the way.
I made this table listing the typings used by SolidJS, Voby, Vue, Preact, React, Pota, VSCode-LSP and Chrome, so you can easily compare type definitions between frameworks and the browser.
Note: there are a few inaccuracies, like the `<audio>` tag not including typings; that's because extended interfaces aren't resolved, but _most_ of the stuff is there.
To be completely honest, the people who write these specifications live disconnected from reality; they don't use the stuff they specify. That stuff works for very simple things, but when your forms evolve you realise you'd be better off just writing the whole thing yourself.
This. Doing relatively common things like cross-referencing other fields ("did the user specify a country? If so, we can validate the postcode/zip code they just entered, but if not we'll have to wait until they pick a country") almost immediately requires JS to handle, and as soon as you're using JS then it's all just easier to do in code than trying to mess around with validation properties.
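A minimal sketch of that country/postcode case using only the native constraint validation API; the field ids and the per-country patterns here are made up for illustration:

```ts
// Minimal sketch: cross-field validation with plain DOM APIs.
// Assumes hypothetical inputs with ids "country" and "postcode".
const country = document.querySelector<HTMLSelectElement>('#country')!;
const postcode = document.querySelector<HTMLInputElement>('#postcode')!;

function validatePostcode(): void {
  if (!country.value) {
    // No country yet: defer validation rather than guessing a format.
    postcode.setCustomValidity('');
    return;
  }
  // Hypothetical per-country patterns; real rules are far messier.
  const patterns: Record<string, RegExp> = {
    US: /^\d{5}(-\d{4})?$/,
    GB: /^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$/i,
  };
  const pattern = patterns[country.value];
  const ok = !pattern || pattern.test(postcode.value);
  // setCustomValidity('') marks the field valid; any non-empty message
  // marks it invalid and shows up in the browser's native validation UI.
  postcode.setCustomValidity(ok ? '' : 'Postcode does not match the selected country.');
}

country.addEventListener('change', validatePostcode);
postcode.addEventListener('input', validatePostcode);
```

Even this tiny amount of logic already lives entirely in JS; the HTML attributes alone can't express it.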
Yep. This is great until you need a cross-browser date picker, at which point you need to implement a bunch of stuff yourself. It’s frustrating how primitive HTML forms are, after so many years.
What's hilarious is they do have UI for it in about:config "dom.forms.datetime.timepicker". It makes me so angry that it's not on by default. It works fine!
Correctly using HTML forms won’t quickly lead to browser improvements... so the result is that users will hate your forms.
Users/your customers might even think that you’re to blame, and not $browserVendor.
I’ll go one further and say that the customers are absolutely justified in blaming the developer instead of the browser. If a developer knowingly chooses a built-in form control whose common implementations are bad for their users, how are they not at fault for the resulting experience?
“This site only uses functionality provided by the HTML spec” is not a useful goal in and of itself. Using the right tool for the job, which might be JavaScript, is always more important.
The `required` attribute, which this article is about, is an HTML5 thing and first appeared in browsers in 2010-2011. So sure, not brand spanking new, but the web was already being used to write modern apps by then. There's no good reason for the validation features to be so shabby.
Even `required` doesn't work properly. Browsers do very odd and inconsistent things when your required field is hidden at submit time, like in a basic tabbed or multi-step form.
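The usual workaround I know of is to disable the controls on the steps that aren't visible, since disabled controls are skipped by constraint validation and aren't submitted; a rough sketch, assuming hypothetical `data-step` panels:

```ts
// Minimal sketch: keep `required` from tripping over hidden steps in a
// multi-step form by disabling every control outside the visible step.
function showStep(form: HTMLFormElement, step: number): void {
  form.querySelectorAll<HTMLElement>('[data-step]').forEach((panel) => {
    const active = Number(panel.dataset.step) === step;
    panel.hidden = !active;
    // Disabled controls are ignored by constraint validation and are not
    // submitted, so a hidden `required` field can no longer block submit
    // (or fail silently, depending on the browser).
    panel
      .querySelectorAll<HTMLInputElement | HTMLSelectElement | HTMLTextAreaElement>(
        'input, select, textarea',
      )
      .forEach((control) => {
        control.disabled = !active;
      });
  });
}
```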
It tries to fetch a sitemap in case there's some missing link, but it starts from the root and crawls internal links. There's a new mode, added this morning, for SPAs: the option `--spa` will write the original HTML instead of the generated/rendered one. That way some apps _will_ work better.
It saves the generated/rendered HTML, but I have just added a `spa` mode that will save the original HTML without modifications. This makes most simple web apps work.
I have also updated the local server to fetch missing resources from origin. For example, a web app may load some JS modules only when you click buttons or links; when that happens and the requested file is not in the zip, it will fetch it from origin and update the zip. So you can mostly back up an SPA by crawling it first and then using it for a bit to fetch the missing resources/modules.
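Not the tool's actual code, but the fetch-on-miss idea looks roughly like this sketch, which assumes the crawl output sits unpacked in a `./mirror` directory rather than inside the zip, and that `ORIGIN` points at the original site:

```ts
// Sketch of a fetch-on-miss local server (assumptions: mirror on disk,
// Node 18+ for the global fetch, no query-string or path sanitising).
import { createServer } from 'node:http';
import { readFile, writeFile, mkdir } from 'node:fs/promises';
import { dirname, join, normalize } from 'node:path';

const ORIGIN = 'https://pota.quack.uy'; // the site being mirrored (placeholder)
const ROOT = './mirror';                // unpacked crawl output (placeholder)

createServer(async (req, res) => {
  const urlPath = !req.url || req.url === '/' ? '/index.html' : req.url;
  const localPath = join(ROOT, normalize(urlPath));
  try {
    // Hit: the crawler already saved this file, serve it from disk.
    res.end(await readFile(localPath));
  } catch {
    // Miss: fetch from origin, store it, then serve it, so the mirror
    // fills itself in as you click around the app.
    const upstream = await fetch(ORIGIN + urlPath);
    const body = Buffer.from(await upstream.arrayBuffer());
    await mkdir(dirname(localPath), { recursive: true });
    await writeFile(localPath, body);
    res.writeHead(upstream.status, {
      'content-type': upstream.headers.get('content-type') ?? 'application/octet-stream',
    });
    res.end(body);
  }
}).listen(8080);
```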
My main use case is the docs site https://pota.quack.uy/ , which Google cannot index properly. On https://www.google.com/search?q=site%3Apota.quack.uy you will see that some titles/descriptions don't match what the content of the page is about. As the full site is rendered client-side via JavaScript, I can just crawl it myself and save the HTML output to actual files. Then I can serve that content with nginx or any other web server, without having to do the expensive thing of SSR via Node.js. Not to mention that being able to do SSR with modern JavaScript frameworks is not trivial and requires engineering time.
I’m not quite understanding: you’re saying you deploy your site one way, then crawl it, then redeploy it via the zipfile you created? And why is SSR relevant to the discussion?
Modern websites execute JavaScript that renders the DOM nodes displayed in the browser.
For example, if you look at this site in the browser, https://pota.quack.uy/ , and then run `curl https://pota.quack.uy/`, do you see any of the text that is rendered in the browser in the output of the curl command?
You don't, because curl doesn't execute JavaScript, and that text comes from JavaScript. One way to fix this problem is to have a Node.js instance running that does SSR, so when your curl command connects to the server, the Node instance executes the JavaScript and the result is streamed/served to curl (Node is running a web server).
Another way, without having to execute JavaScript on the server, is to crawl the site yourself, let's say on localhost (you do not even need to deploy), and then upload the result to a web server that can serve the files.
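For the curious, that "crawl yourself" step boils down to something like this Puppeteer sketch (the URL and output path are placeholders, not the project's real code):

```ts
// Sketch: load the page in headless Chrome, wait for the client-side
// render, and write the resulting DOM to a plain .html file that nginx
// (or any static server) can serve.
import puppeteer from 'puppeteer';
import { writeFile } from 'node:fs/promises';

const browser = await puppeteer.launch();
const page = await browser.newPage();

// 'networkidle0' waits until the page has stopped making requests,
// a rough proxy for "the framework has finished rendering".
await page.goto('http://localhost:3000/', { waitUntil: 'networkidle0' });

// page.content() returns the serialized DOM after JavaScript ran,
// i.e. the text that curl alone would never see.
await writeFile('index.html', await page.content());

await browser.close();
```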
Yes! You know, I was considering this the previous couple of days; I was looking around for how to construct an `mhtml` file to serve all the files at the same time. Unrelated to this project, I had the use case of a client wanting to keep an offline version of one of my projects.
> Although UNIX philosophy posits that it's good to have many small files, I like your idea for its contribution to reducing clutter (imagine running 'tree' in both scenarios) and also avoiding running out of inodes in some file systems (maybe less of a problem nowadays in general, not sure as I haven't generated millions of tiny files recently).
It's pretty rare for any website to have many files, as they optimize to have as few files as possible (fewer network requests, which could be slower than just shipping a big file). I have crawled the React docs as a test, and it's a 147 MB zip file with 3,803 files (including external resources).
Trying to use this for mirroring a documentation site. Disappointed that 1. it runs quite slowly, and 2. it kept outputting error messages like "ProtocolError: Protocol error (Page.bringToFront): Not attached to an active page". Not sure what the reason is.
Big fan of HTTrack! It reminds me of the old days and makes me sad about the current state of the web.
I am not sure if HTTrack has progressed beyond fetching resources; it's been a long time since I last used it. What my project does is spin up a real web browser (Chrome in headless mode, which means it's hidden) and then let the JavaScript on that website execute, which means it will display/generate the fancy HTML that you can then save as-is into an index.html. It saves all kinds of files; it doesn't care about the extension or MIME type, it tries to save them all.
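That "save everything" part can be approximated by listening to every network response in Puppeteer and writing the raw bytes to disk; a rough sketch with simplified paths, not the project's actual implementation:

```ts
// Sketch: capture every response the headless browser sees, regardless
// of extension or MIME type, and store its body on disk.
import puppeteer from 'puppeteer';
import { writeFile, mkdir } from 'node:fs/promises';
import { dirname, join } from 'node:path';

const browser = await puppeteer.launch();
const page = await browser.newPage();

page.on('response', async (response) => {
  try {
    const { pathname } = new URL(response.url());
    const file = join('./archive', pathname === '/' ? '/index.html' : pathname);
    await mkdir(dirname(file), { recursive: true });
    // response.buffer() gives the raw bytes, so binary assets (images,
    // fonts, .riv files, ...) are stored byte-for-byte as well.
    await writeFile(file, await response.buffer());
  } catch {
    // Some responses (redirects, aborted requests) have no body; skip them.
  }
});

await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
await browser.close();
```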
> It saves all kinds of files; it doesn't care about the extension or MIME type, it tries to save them all.
That’s awesome to know, I will give it a try. One website I remember trying to download has all sorts of animations with the .riv extension, and it didn’t work well with HTTrack. I will try it with this soon, thanks for sharing it!
Status codes: I am displaying the list because, on a JavaScript-driven application, you mostly don't want codes other than 200 (besides media).
I thought about robots.txt, but as this is software that you are supposed to run against your own website, I didn't consider it worthwhile. You have a point about speed requirements and prohibited resources (though it's not like skipping over them would add any security).
I haven't put much time/effort into an update step. Currently it resumes via checkpoints if the process exited (it saves the current state every 250 URLs; if any URL is missing it can continue, otherwise it's done).
Currently I use only one thread for scraping; I do not need more. It gets the job done. Also, I know too little to play more with Python "celery" threads.
My project can be used for various things, depending on your needs. Recently I have been playing with using it as a 'search engine': I am scraping the Internet to find cool stuff. The results are in https://github.com/rumca-js/Internet-Places-Database. Not all domains are interesting, though.
> Status codes: I am displaying the list because, on a JavaScript-driven application, you mostly don't want codes other than 200 (besides media).
What? Why? Regardless of the programming language used to generate content, the standard, well-known HTTP status codes should be returned as expected. If your JS-served site gives me a 200 code when it should be a 404, you're wrong.
I think you are misunderstanding: your application is expected to give mostly 200 codes. If you get a 404, then a link is broken or a page is misbehaving, which is exactly why that page URL is displayed on the console with a warning.