geoscripting · on April 22, 2011

For a quick, one-time job, it works very well, and you don't have to bother configuring SSH/FTP for it :). Of course, one could build a lot on top of DRb; I just wanted to show how easy it would be :)

geoscripting · on Jan 15, 2010

That will come as soon as I learn more about Fantom

jamesbritt · on Jan 16, 2010

Looking forward to it.

geoscripting · on Dec 1, 2009

strict is a standard Perl module. HTML::TreeBuilder seems to work just as well with malformed HTML.

geoscripting · on Dec 1, 2009

I used Selenium or WebDriver for that. Not bad.

geoscripting · on June 2, 2009

Most coders are used to working solely with an IDE, without ever knowing how things work in the background. I know this is basic for people who know how an IDE works, but most of them don't.

geoscripting · on Feb 25, 2009

You could give HtmlUnit a try.

geoscripting · on Feb 24, 2009

It is recommended that you use the API that Google provides for searching, but I fail to see why telnetting would be illegal. After all, both Firefox and telnet use sockets to do their job.
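
To illustrate the sockets point, here is a minimal Python sketch of what you would otherwise type into telnet by hand. The function names `build_request` and `fetch` are made up for this example, and the host you pass in is up to you; this is not what any browser literally does, only the same kind of plain-text request sent over a TCP socket.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build the plain-text HTTP/1.1 GET request one would type into telnet."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close so we know when to stop reading
        "",
        "",                   # blank line terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0) -> bytes:
    """Open a TCP socket, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("example.com")` would return the raw status line, headers, and body as bytes. Whether a given site's terms allow that is a separate question from whether it is technically possible.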

radu_floricica · on Feb 24, 2009

Again, the comment above wasn't meant to be taken literally, but just because they both use sockets doesn't mean they're just as legal.

Imagine a web site whose terms of service state you cannot use software to circumvent ads. Or where part of the security is done client-side (stupid, yes, but not impossible). Skipping the browser breaches at least the terms of service, and may be construed as hacking. I think even Google discourages automated searching and prefers you use its API, which (at least some years ago) wasn't free for commercial use. I may be wrong in this particular case, but the important point is you may want to check the specific TOS before skipping the browser.

geoscripting · on Feb 24, 2009

The example wasn't written in Java for performance gain. It so happened that I had NetBeans open, and it was easier for me to write it in Java at the moment :).

sho · on Feb 24, 2009

Easier!? You wrote pages and pages describing the most inefficient way imaginable to do something I can do in 5 lines of Ruby, and you call it easier? And unless I'm very much mistaken, you'd have to compile the code anew whenever the cookie changed?

Well, good luck to you, and the more script kiddies you confuse the better I guess, but there are seriously much better ways to do this. Go look at Ruby Mechanize (I think it's also available for Python); coming from Java you will be blown away by just how easy this kind of thing is. How do you think we all test? ; )

Update: Oh I see you know Mechanize from another article. So why not just use that ... you do know it can do all that logging in stuff for you, right?

geoscripting · on Feb 24, 2009

Yes, I have worked with mechanize before. I was using mechanize even when there was only the Perl version. I added a comment to the article explaining my choice.

sho · on Feb 24, 2009

Fair enough. I guess the surprised reaction you're getting is because web testing frequently involves doing this kind of thing, so, being a community of web programmers, everyone here knows it backwards. I didn't really think of the angle you mentioned where someone wouldn't know all the relevant techniques and just want to get something working ASAP. For that, taking the cookie from FF might indeed be a time saver.

Anyway always good to see everyone chime in with their opinion so thanks for the conversation starter.

BTW, is anyone else nervous about the day the teenage h4xx0rs discover how easy this kind of thing is these days ..

geoscripting · on Feb 24, 2009

There's also mechanize for Python. It can be found at http://wwwsearch.sourceforge.net/mechanize/ and handles cookies by default; it is a pretty good tool.
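
For anyone wondering what "handles cookies" amounts to, here is a rough sketch using only the Python standard library rather than mechanize itself (so this is a stand-in, not mechanize's API). The helper name `make_session` is made up for the example. The idea is that an opener wired to a cookie jar stores any `Set-Cookie` it receives and sends it back on later requests to the same site.

```python
import http.cookiejar
import urllib.request


def make_session():
    """Return (opener, jar): an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores cookies from responses in the jar and
    # attaches matching cookies to every later request made via the opener.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

With that in place, something like `opener.open(login_url, data)` followed by `opener.open(members_url)` would carry the session cookie along automatically; both URLs here are placeholders.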

geoscripting · on Feb 24, 2009

More and more websites are adding Turing tests after relatively short intervals of time. I consider that to be a good thing, and there are some good ways to stop/identify spiders. You can check out this site: http://stackoverflow.com/questions/450835/how-do-you-stop-sc... . It shows some pretty good techniques that one can use to defend against spiders.