Provide a link or two as examples so people can quickly try it out. Make them clickable rather than something people have to copy and paste. Maybe have a grid of popular products for this. It lets people try your service and also gives them ideas of the types of products they might track.
I could see something like this being a really quick way to get alerts when a competitor changes their price. Not sure if there would be legal issues in promoting this though.
In the text here ("Enter your email in the field above for future price notifications"), the field is below, not above. At least, it is in my browser.
This is a really great way to mine data. Seriously, as an eCommerce store, I'd pay for access to this data.
Let me send you a product feed, and any time a customer requests something from it (on my site or someone else's), let me know and I'll send them a deal email.
Could be. If they don't want to compete solely based on price. Or don't believe that price is their strong point.
There are many reasons to compete on price but just as many reasons to not compete on price including service, support, delivery charges, home delivery options, shopping local. Or simply not wanting to support a race-to-the-bottom Walmart world.
Yes, forcing retailers to be part of your system is great for your business model, but that doesn't necessarily mean it's good for theirs. And since they may never know that you even exist (if a customer doesn't visit because of your site, how will they know?), they may be losing traffic without even knowing why.
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
Just curious, what kind of traffic levels are you seeing from HN? Just in case any of us want to do a load test to validate that our own sites can handle it.
It depends on how complicated your website is, in some cases on what technology you're using combined with the size of the server you're renting, and on whether it's 1,500 requests in the first second or 25 a second. Either way, load tests for your site would be good for you.
There's no good excuse for ANY website or platform these days to not be able to handle this amount of traffic with absolute minimal effort. In fact, handling 10x more should be trivial (and cheap).
You can design it to be light and fast, but you wouldn't know unless you test it, right? Good to know that a cheap droplet can at least handle rendering the landing page though.
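If anyone wants to try this at home, a minimal load test is only a screenful of stdlib Python. Everything below (the URL, the request counts) is a placeholder; point it at your own staging box, never at someone else's site.

    # Minimal load-test sketch, standard library only.
    # URL and numbers are placeholders; only test servers you own.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    URL = "https://staging.example.com/"   # hypothetical target
    REQUESTS = 500
    CONCURRENCY = 25

    def hit(_):
        start = time.monotonic()
        try:
            with urlopen(URL, timeout=10) as resp:
                resp.read()
                ok = resp.status == 200
        except Exception:
            ok = False
        return ok, time.monotonic() - start

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(hit, range(REQUESTS)))

    latencies = [t for ok, t in results if ok]
    print(f"{len(latencies)}/{REQUESTS} succeeded, "
          f"mean {sum(latencies) / max(len(latencies), 1):.3f}s")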
I too am curious; price detection is a very well-trodden area.
I've seen page structure analysis, manual XPath/CSS selector writing, machine learning, pattern matching, micro-formats, large central databases, etc.
I understand that no company wants to unveil their secrets, but this is Hacker News. If you're going to promote your product here, at least give us some tech to munch on.
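To give the thread something concrete to chew on: the crudest version of the micro-format plus pattern-matching approach fits in a screenful. Everything here (the URL, the class-name guesses) is illustrative, not a claim about how this particular product works.

    # Crude price detection: schema.org microdata first, then a regex over
    # tags whose class/id contains "price". Stdlib only; real systems need
    # headless rendering and per-domain overrides.
    import re
    from html.parser import HTMLParser
    from urllib.request import Request, urlopen

    PRICE_RE = re.compile(r'[$£€]\s?\d{1,6}(?:[.,]\d{2})?')

    class PriceFinder(HTMLParser):
        def __init__(self):
            super().__init__()
            self.candidates = []
            self._in_price_tag = False

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            # schema.org microdata: <span itemprop="price" content="35.00">
            if a.get('itemprop') == 'price':
                if a.get('content'):
                    self.candidates.append(a['content'])
                self._in_price_tag = True
            # common class/id conventions: "price", "product-price", ...
            elif 'price' in (a.get('class') or '') or 'price' in (a.get('id') or ''):
                self._in_price_tag = True

        def handle_endtag(self, tag):
            self._in_price_tag = False   # crude: any close tag resets

        def handle_data(self, data):
            if self._in_price_tag:
                self.candidates.extend(PRICE_RE.findall(data))

    req = Request("https://shop.example.com/product/123",   # placeholder URL
                  headers={'User-Agent': 'price-sketch/0.1'})
    finder = PriceFinder()
    finder.feed(urlopen(req).read().decode('utf-8', 'replace'))
    print(finder.candidates)   # a later heuristic picks the "real" price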
Well, the first page[1] I tested it on worked fine, whereas pricep.in seemed to fail. Also, I didn't need to create an account or install an extension.
It seems your product page no longer works, so of course it fails ;)
Our tools (apps and browser extensions) have a 'report' button, so whenever someone encounters an issue, we get notified and we will add it to our 'to fix' list :)
Or a half dozen or more other sites that have done this.
I think using heuristics as mentioned in one of their previous answers might be somewhat novel. But yeah... I worked on a product that did auto-detection of which element was the price on a page at least five years ago. I'd love to learn the differentiators.
I worked on a product like this for a long while, so I can appreciate how hard the problem you're trying to solve is. Some things I realized along the way:
* Price capture needs to happen in a headless browser (e.g. PhantomJS), rather than just capturing the HTML with a GET. Too many sites use JavaScript for raw HTML analysis to be feasible.
* You can get > 50% of the pricing information with fairly simple matching on the class/id value in the HTML tag. But you need a headless browser to make sure the tag is visible. And since most product pages contain multiple prices, you need some heuristic to determine the relevant price (roughly sketched after this list). Oh, and watch out for "reduced from" prices too (e.g. "Old Price: $50, New Price: $35").
* It doesn't hurt to be able to override the general heuristic on a domain-by-domain basis; that saved me a lot of headaches.
* You need to be honest with yourself about how reliable the price capture algorithm is, and build up a regression database of known good pages, so that when you change the algorithm, nothing else breaks. Also, you need to keep ahead of site redesigns!
* Product URLs tend to look messy, but tend not to change very often, if at all. I was worried about retailers e.g. changing product identifiers, but changing URLs hurts their SEO, so they don't do it. You will find "zombie" products, though - things which appear to be still on sale, but aren't linked anywhere on the site. Deciding when a product is sold out is tricky.
* The best user experience presents the items the user is watching as a "shopping basket". (I took design cues from Pinterest.) For a really slick experience, you should pick out the product name and image (Facebook meta-data helps here; see the PS below) and include them in your "pinned" products.
* Cutting-and-pasting URLs is a hassle. Consider writing a browser extension or a bookmarklet - users don't like to have the browsing flow interrupted by having to click across tabs. Having the price capture done inline on the page really impresses people.
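To make the multiple-prices and override points concrete, here is roughly the shape my selection layer took. The selectors and domains below are invented examples, and the candidate tuples are whatever your headless browser hands back:

    from urllib.parse import urlparse

    # Hand-written overrides for sites where the general heuristic fails.
    # These selectors/domains are invented examples.
    DOMAIN_OVERRIDES = {
        "shop.example.com": "#our_price",
        "store.example.org": ".sale-price",
    }

    # Words that usually mark the *wrong* price on a product page.
    NEGATIVE_HINTS = ("was", "rrp", "old", "list", "saving", "strike")

    def pick_price(candidates, url):
        """candidates: (css_path, visible, text, value) tuples captured by
        a headless browser; returns the value we believe is the price."""
        override = DOMAIN_OVERRIDES.get(urlparse(url).hostname)
        if override:
            for path, visible, text, value in candidates:
                if override in path and visible:
                    return value
        scored = []
        for path, visible, text, value in candidates:
            if not visible:
                continue              # hidden tags are never the price
            blob = (path + " " + text).lower()
            score = 2 if "price" in blob else 0
            if any(hint in blob for hint in NEGATIVE_HINTS):
                score -= 3            # "Old Price: $50" style traps
            scored.append((score, value))
        return max(scored)[1] if scored else None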
Best of luck with this! I've yet to see someone solve this problem well, and I eventually moved on to other things after losing a lot of my hair. :-)
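PS: if anyone wants the meta-data tip made concrete, Open Graph tags are just <meta property="og:..."> elements, so a stdlib parser gets you the name and image (the URL is a placeholder):

    # Pull product name/image from Open Graph meta tags for the "pinned
    # product" card. Stdlib-only sketch.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class OGParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.og = {}

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                a = dict(attrs)
                prop = a.get("property", "")
                if prop.startswith("og:") and "content" in a:
                    self.og[prop] = a["content"]

    page = urlopen("https://shop.example.com/product/123")   # placeholder
    p = OGParser()
    p.feed(page.read().decode("utf-8", "replace"))
    print(p.og.get("og:title"), p.og.get("og:image"))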
I'd suggest a more descriptive title here. From the title, I thought/hoped it would give me price info for local stores, not just online. One of my constant annoyances is the fact that it's difficult if not impossible to find the lowest price for grocery items without physically visiting the stores you want to survey.
The CFAA, which makes "exceeding authorized access" to a computer system both a crime and a tort, and the Copyright Act, which has been interpreted to mean that copies of HTML pages, even ones that exist only for microseconds in RAM, are subject to copyright, so infringement claims can be brought against anyone who downloaded the page.
It's also breach of contract (which I'm labeling separately from "illegal" to avoid nitpickers, even though it could be included) due to the individual ToS on each site, which almost always include boilerplate forbidding the access of the site by "automated means" in addition to forbidding "commercial" or other non-personal use.
Before you raise the common counterarguments, please know that others have done so before you, and the courts have generally sharply disagreed with them. There is no respect for non-Google data scraping in the judiciary.
I know Google has been treated as if it's a sui generis case, but surely Archive.org and many others are scraping and not being shut down.
It's weird that the comparison between posting a robots.txt and a "no trespassers" sign hasn't been upheld. Or has it?
IMO the tort of copyright infringement hasn't kept up well with tech changes, but transient cache copies are handled in EU law IIRC.
Similarly with your mention of the CFAA (UK CMA has some similar terms), they're very loosely drafted. Not accounting for the need to communicate the limits of allowed access is silly though; there's a presumption of allowed access online IMO (that isn't mirrored offline) and going against that presumption should require explicit withdrawal of consent.
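For what it's worth, robots.txt is exactly that explicit-withdrawal mechanism, and honoring it from code is a few lines with the standard library (the user-agent string and URLs are placeholders):

    # Check robots.txt before fetching a page.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://shop.example.com/robots.txt")   # placeholder domain
    rp.read()

    if rp.can_fetch("price-sketch/0.1", "https://shop.example.com/product/123"):
        print("allowed to fetch")
    else:
        print("disallowed; skip this page")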
First, IANAL. But there are people building businesses on this kind of thing even though it's almost universally not allowed. Google is the most prominent: the judiciary has decided that a different set of rules applies to Google versus smaller companies, so Google gets away with it, presumably because they're massive and can out-litigate anyone who objects.
Usually, if you cease and desist upon receipt of a C&D, you won't get sued unless you actually damaged the site. I assume companies like import.io and Scrapinghub adhere to those. I know that in Scrapinghub's case in particular, they won't go behind a login to get something, in order to avoid heat.
Fundamentally, however, anything that makes scraping a key component of its business model is a high-risk business under current US law. People can sue, and have sued, scrapers out of existence, often leaving a trail of screwed customers, laid-off employees, and dejected founders owing huge liability judgments.