
I don't quite get the point of the DynamicScraper... Any real use cases for that?



For example, go to http://www.imdb.com

On the right, in the "Opening This Week" sidebar, you'll notice a movie titled "Love Is Strange".

With that in mind, press Ctrl+U (view html source).

Try to search for the word "Strange" anywhere in the source. (It's not there.) If it's not there, how did it get shown on the screen?!

The answer is that it is "dynamically" loaded. A simple scraper that only works on a static download of the HTML source won't be able to retrieve that string. You need a web scraper that can process dynamic pages (i.e. execute JavaScript).

Btw, you'll notice that you can find the string "Strange" via F12 (Developer Tools). That's because the F12 inspector shows the HTML after the DOM has been dynamically modified by JavaScript, whereas Ctrl+U shows only the source as it was downloaded.
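
To make the Ctrl+U vs. F12 difference concrete, here is a minimal sketch in Python using only the standard library. Both HTML strings are made up for illustration (they are not IMDb's real markup), and `SidebarText`, `sidebar_text` and the `opening-this-week` id are hypothetical names. The same static parser is run over the source as downloaded and over a snapshot of the DOM after the script has filled in the sidebar.

```python
from html.parser import HTMLParser

# Hypothetical page source, as Ctrl+U or a plain HTTP download would show it.
# The sidebar is an empty placeholder that a script fills in later.
STATIC_SOURCE = """
<html><body>
  <div id="opening-this-week"></div>
  <script src="/js/sidebar.js"></script>
</body></html>
"""

# Hypothetical DOM snapshot after the script has run (what F12 would show).
RENDERED_DOM = """
<html><body>
  <div id="opening-this-week"><a href="/title/1">Love Is Strange</a></div>
  <script src="/js/sidebar.js"></script>
</body></html>
"""


class SidebarText(HTMLParser):
    """Collects the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data.strip())


def sidebar_text(html):
    """Return the text inside the (hypothetical) sidebar element."""
    parser = SidebarText("opening-this-week")
    parser.feed(html)
    return " ".join(chunk for chunk in parser.chunks if chunk)


print(repr(sidebar_text(STATIC_SOURCE)))  # nothing to extract from the raw source
print(repr(sidebar_text(RENDERED_DOM)))   # the title is there once the DOM is filled in
```

A static scraper only ever sees the first string, so no amount of clever parsing will get the title out of it. A dynamic scraper has to run the page's scripts first and then read something like the second string.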


The latter probably runs the script as though you are within the context of a web page (so full Ajax/JS support).

I assume the Simple version might be written entirely in Node.js, so it parses the HTML content but does no dynamic scripting.

The important thing to note is that with the DynamicScraper you can't use closures in your internal functions: they won't be executed within your Node.js context, but inside PhantomJS, where variables from the enclosing Node.js scope don't exist.
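
A rough analogy for why closures break, sketched in Python rather than with the scraper library itself. The assumption here is that the function reaches the other runtime as source text and is re-created there, which is how I understand PhantomJS page evaluation to work. `EXTRACT_SOURCE` and `run_in_fresh_context` are made-up names for illustration only.

```python
# Source text of the extraction function, as it would be shipped to the
# other runtime. It refers to `selector`, which it does not define itself.
EXTRACT_SOURCE = """
def extract(page):
    return page[selector]
"""


def run_in_fresh_context(source, page, **injected):
    """Re-create the function from its source in a namespace that holds only
    the explicitly injected values, then call it."""
    namespace = dict(injected)
    exec(source, namespace)
    return namespace["extract"](page)


page = {"title": "Love Is Strange"}
selector = "title"  # lives in the caller's scope only

# Relying on the outer variable fails: the fresh context has never seen it.
try:
    run_in_fresh_context(EXTRACT_SOURCE, page)
    outcome = "ok"
except NameError:
    outcome = "NameError"

# Passing the value across the boundary explicitly works.
passed = run_in_fresh_context(EXTRACT_SOURCE, page, selector="title")

print(outcome)
print(passed)
```

So anything the function needs has to be handed over explicitly or computed inside the function, rather than captured from the surrounding scope.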

As for a use case, I do it for https://myshopdata.com to allow retailers to extract their product information with rich content and variation support (even if it is only loaded when the user interacts with a dropdown of variations). It then lets you publish this to marketplaces, while keeping the information in sync through monitoring.


I _think_ the latter interprets JavaScript while the former only allows you to read the HTML as it was served?



