Show HN: Price changes alerts for any store (thepriceminer.com)
79 points by adibalcan on Aug 31, 2016 | 60 comments


Provide a link or two as examples so people can quickly try it out. Make it clickable rather than something they have to copy and paste. Maybe have a grid of popular products for this. This allows people to try your service and also gives them ideas of the types of products they might track.

I could see something like this being a really quick way to get alerts when a competitor changes their price. Not sure if there would be legal issues in promoting this though.

In the text here ("Enter your email in the field above for future price notifications"), the field is below, not above. At least, it is in my browser.


Good idea! We will change it as soon as possible.


Done


Alternative: https://pricep.in/


Why isn't the currency specified? I'm looking for an item on Amazon France and the price should be displayed in euros, not in USD.


We get the correct price, but we strip the currency (for numerical comparison).
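
For anyone curious what that normalization might look like, here is a minimal sketch in Python (an illustration of the general idea, not thepriceminer's actual code), assuming the scraped price arrives as raw text like "$1,299.00" or "1.299,00 €":

    import re

    def to_number(price_text):
        """Strip currency symbols/codes and parse the remaining digits to a float."""
        digits = re.sub(r"[^\d.,]", "", price_text)  # drop currency markers and whitespace
        if "," in digits and "." in digits:
            # Whichever separator comes last is treated as the decimal separator.
            if digits.rfind(",") > digits.rfind("."):
                digits = digits.replace(".", "").replace(",", ".")
            else:
                digits = digits.replace(",", "")
        elif "," in digits:
            # A lone comma followed by exactly two digits is read as a decimal comma,
            # otherwise as a thousands separator. Locale quirks live here.
            head, _, tail = digits.rpartition(",")
            digits = head.replace(",", "") + "." + tail if len(tail) == 2 else digits.replace(",", "")
        return float(digits)

    assert to_number("$1,299.00") == 1299.0
    assert to_number("1.299,00 €") == 1299.0
    assert to_number("13,999/-") == 13999.0

Getting the separator convention right per locale is the fiddly part, which may be behind the Amazon France and Amazon India oddities mentioned elsewhere in the thread.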


Another one I wrote for my friends and me a while back:

https://salesniper.com


That's a great domain name. Did you register it or have to buy it from someone?


We registered it.


This is a really great way to mine data. Seriously, as an eCommerce store owner I'd pay for access to this data.

Let me send you a product feed, and let me know any time a customer requests something from it (on my site or someone else's), and I'll send them a deal email.


Please use this form to contact us: https://thewebminer.com/contact


Here's an alternative for heuristic extraction which appears to handle more complex data: https://www.diffbot.com/


Is this market specific? I tried this product [0] and it tells me the product has a price of 32 rather than 13,999/-.

[0] http://www.amazon.in/dp/B01DDP7GZK/ref=br_imp?pf_rd_m=A1VBAL...


It's heuristic


These can be shut down very easily by the sites if they want to.


Why, yes, they can. But is it in their best interest to devote significant resources to the endeavor?


Could be. If they don't want to compete solely on price, or don't believe that price is their strong point.

There are many reasons to compete on price, but just as many reasons not to, including service, support, delivery charges, home delivery options, and shopping local. Or simply not wanting to support a race-to-the-bottom Walmart world.

Yes, forcing retailers to be part of your system is great for your business model, but that doesn't necessarily mean it's good for theirs. And since they may never know that you even exist (if a customer doesn't visit because of your site, how will they know?), they may be losing traffic without even knowing why.


Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.


The server is overloaded. We will fix it soon.


now works :)


Couldn't this work as a browser extension instead?


Alternative for the UK http://pricehare.com


You might want to update your version of Nginx and add some caching. I am getting a 504:

nginx/1.4.6 (Ubuntu)


Thanks for the feedback, we will fix this.


Gives a server error when clicking Go with no link entered.


Gives an internal server error when Go is clicked without a link.


Try now ;)


So, it's like camelcamelcamel, but for more sites?


We have too many visitors from HN.


Just curious, what kind of traffic levels are you seeing from HN? Just in case any of us want to do a load test to validate that our own sites can handle it.


Top 5 on HN for ~3 hrs peaked around 1,500 views per hour


That doesn't sound like a lot, that's a hit every couple of seconds.


It depends on how complicated your website is, what technology you're using, the size of the server you're renting, and whether those 1,500 hits come in the first second or arrive spread out at 25 a minute. Either way, load tests for your site would be good for you.


There's no good excuse for ANY website or platform these days not to be able to handle this amount of traffic with minimal effort. In fact, handling 10x more should be trivial (and cheap).


Well if you're ultra cheap like me, you'll have to think deeply about scaling up your $5 instance lol

There are a lot of factors that go into site performance. Knowing your site can handle what other people have seen is a good start.


Or you can design it to be light and fast from day 1. I had no trouble with the same level of traffic during a Show HN on a $5 droplet.


You can design it to be light and fast, but you wouldn't know unless you test it, right? Good to know that a cheap droplet can at least handle rendering the landing page, though.
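
If you want to run a quick burst test yourself, here is a minimal sketch in Python; the URL and the request counts are placeholders, and a real test would use a dedicated tool (ab, wrk, etc.) and hit more than just the landing page:

    # Fire a burst of concurrent GETs at a page and report rough latencies.
    # Enough to see whether a small instance falls over under an HN-style spike.
    # Note: urlopen raises on non-2xx responses, so failures surface as exceptions.
    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    URL = "https://example.com/"   # placeholder: point this at your own site
    REQUESTS = 100                  # total requests in the burst
    CONCURRENCY = 25                # simultaneous connections

    def fetch(_):
        start = time.time()
        with urlopen(URL, timeout=10) as resp:
            resp.read()
            return resp.status, time.time() - start

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        results = list(pool.map(fetch, range(REQUESTS)))

    ok = sum(1 for status, _ in results if status == 200)
    latencies = sorted(elapsed for _, elapsed in results)
    print(f"successful: {ok}/{REQUESTS}")
    print(f"median latency: {latencies[len(latencies) // 2]:.2f}s, worst: {latencies[-1]:.2f}s")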


This is based on a unique technology which detects the price on product pages across various stores.


What makes it unique? Since this is HN a few more details would be appreciated.


It's a heuristic algorithm which detects the price element (as a CSS selector) without any site-specific configuration.
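
They don't say more than that, but for a flavour of what such a heuristic can look like, here is a rough sketch (a generic approach, not necessarily what thepriceminer.com does): score every element whose text looks like a price, prefer ones whose class or id hints at "price", and report a CSS selector for the winner.

    import re
    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    # Text that looks like a price: a currency marker next to digits, in either order.
    PRICE_RE = re.compile(r"(?:[$€£]|USD|EUR|GBP)\s*\d[\d.,]*|\d[\d.,]*\s*(?:[$€£]|USD|EUR|GBP)")
    # Attribute values hinting the element is the price.
    HINT_RE = re.compile(r"price|amount|cost", re.I)

    def guess_price_selector(html):
        """Return a best-guess CSS selector for the price element, or None."""
        soup = BeautifulSoup(html, "html.parser")
        best, best_score = None, 0
        for el in soup.find_all(True):
            text = el.get_text(" ", strip=True)
            # Only consider short elements whose text contains something price-like.
            if not text or len(text) > 40 or not PRICE_RE.search(text):
                continue
            score = 1
            attrs = " ".join(el.get("class", [])) + " " + (el.get("id") or "")
            if HINT_RE.search(attrs):
                score += 3      # class/id mentions price
            if el.find(True) is None:
                score += 1      # prefer leaf nodes over wrapper containers
            if score > best_score:
                best, best_score = el, score
        if best is None:
            return None
        if best.get("id"):
            return "#" + best["id"]
        if best.get("class"):
            return best.name + "." + best["class"][0]
        return best.name

A real system would presumably layer on many more signals (visibility, position, font size, sale vs. regular price), which is where the hard work is.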


I too am curious; price detection is a very well-trodden area.

I've seen page structure analysis, manual XPath/CSS selector writing, machine learning, pattern matching, micro-formats, large central databases, etc.

I understand that no company wants to unveil its secrets, but this is Hacker News. If you're going to promote your product here, at least give us some tech to munch on.


How is it different from pricep.in?


Well, the first page[1] I tested it on worked fine, whereas pricep.in seemed to fail. Also, I didn't need to create an account or install an extension.

[1] https://www.lowes.ca/routers/dewalt-dwp611pk-125-hp-variable...


It seems your product page no longer works, so of course it fails ;)

Our tools (apps and browser extensions) have a 'report' button, so whenever someone encounters an issue, we get notified and we will add it to our 'to fix' list :)


I added them both at the same time, and the page is still working now.


Or a half dozen or more other sites that have done this.

I think using heuristics, as mentioned in one of their previous answers, might be somewhat novel. But yeah... I worked on a product that did auto-detection of which element on a page was the price at least five years ago. I'd love to learn the differentiators.


Please give us feedback


I worked on a product like this for a long while, so I can appreciate how hard the problem you're trying to solve is. Some things I realized along the way:

* Price capture needs to happen in a headless browser (e.g. PhantomJS), rather than just fetching the HTML with a GET. Too many sites render prices with JavaScript for raw HTML analysis to be feasible. (See the sketch after this list.)

* You can get > 50% of the pricing information with fairly simple matching on the class/id value in the HTML tag. But you need a headless browser to make sure the tag is visible. And since most product pages contain multiple prices, you need some heuristic to determine the relevant one. Oh, and watch out for "reduced from" prices too (e.g. "Old Price: $50, New Price: $35").

* It doesn't hurt to be able to override the general heuristic on a domain-by-domain basis; that saved me a lot of headaches.

* You need to be honest with yourself about how reliable the price capture algorithm is, and build up a regression database of known good pages, so that when you change the algorithm, nothing else breaks. Also, you need to keep ahead of site redesigns!

* Product URLs tend to look messy, but tend not to change very often, if at all. I was worried about retailers e.g. changing product identifiers, but changing URLs hurts their SEO, so they don't do it. You will find "zombie" products, though - things which appear to be still on sale, but aren't linked anywhere on the site. Deciding when a product is sold out is tricky.

* The best user experience presents the items the user is watching as a "shopping basket". (I took design cues from Pinterest.) For a really slick experience, you should pick out the product name and image (Facebook meta-data helps here) and include them in your "pinned" products.

* Cutting-and-pasting URLs is a hassle. Consider writing a browser extension or a bookmarklet - users don't like to have the browsing flow interrupted by having to click across tabs. Having the price capture done inline on the page really impresses people.
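
As a concrete illustration of the first two points, a minimal sketch of headless capture with a visibility check. It uses Playwright as a stand-in for a headless browser (an illustrative choice on my part, not the stack I actually used); the example URL and selector are placeholders.

    from playwright.sync_api import sync_playwright  # pip install playwright

    def capture_price(url, selector):
        """Load the page in a headless browser and return the visible price text, if any."""
        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            page = browser.new_page()
            page.goto(url, wait_until="networkidle")   # let JS-rendered prices load
            price = None
            candidate = page.locator(selector).first
            if candidate.count() and candidate.is_visible():  # skip hidden "old price" nodes
                price = candidate.inner_text()
            browser.close()
            return price

    # e.g. capture_price("https://example.com/product/123", "span.product-price")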

Best of luck with this! I've yet to see someone solve this problem well, and I eventually moved on to other things after losing a lot of my hair. :-)


Really interesting feedback. Thanks!


I just tried this link and it gave me a price of 154:

https://www.amazon.ca/Samsung-Smartphone-32-GB-Unlocked-Inte...


I'd suggest a more descriptive title here. From the title, I thought/hoped it would give me price info for local stores, not just online. One of my constant annoyances is the fact that it's difficult if not impossible to find the lowest price for grocery items without physically visiting the stores you want to survey.


You are right. Thanks for the feedback.


Nice and simple idea, pretty well executed with not many bells and whistles. I like it. :)


Thanks!


This is probably illegal in the United States. Expect to receive cease and desist demands.


Illegal on what basis?


The CFAA, which states "exceeding authorized access" to a computer system is both a crime and a tort, and the Copyright Act, which has been interpreted to mean that copies of HTML pages, even if they exist only for microseconds in RAM, are subject to copyright and thus, copyright infringement claims can be brought against anyone who downloaded the page.

It's also breach of contract (which I'm labeling separately from "illegal" to avoid nitpickers, even though it could be included) due to the individual ToS on each site, which almost always include boilerplate forbidding the access of the site by "automated means" in addition to forbidding "commercial" or other non-personal use.

Before you raise the common counterarguments, please know that others have done so before you, and the courts have generally sharply disagreed with them. There is no respect for non-Google data scraping in the judiciary.


I know Google has been treated as a sui generis case, but surely Archive.org and many others are scraping and not being shut down.

It's weird that the comparison between posting a robots.txt and a "no trespassers" sign hasn't been upheld? Or has it?

IMO copyright law hasn't kept up well with tech changes, but transient cache copies are handled in EU law, IIRC.

Similarly with your mention of the CFAA (the UK CMA has some similar terms): they're very loosely drafted. Not accounting for the need to communicate the limits of allowed access is silly, though; there's a presumption of allowed access online, IMO (one that isn't mirrored offline), and going against that presumption should require explicit withdrawal of consent.

If I were drafting the law ...


What about import.io?


First, IANAL. But there are people building businesses on this kind of thing even though it's almost universally not allowed. Google is the most prominent: the judiciary has decided that a different set of rules applies to Google vs. smaller companies, so Google gets away with it. That's because they are massive and were probably able to pay bribes to the judges to get them to agree with them.

Usually, if you cease and desist upon receipt of a C&D, you won't get sued unless you actually damaged the site. I assume companies like import.io and Scrapinghub adhere to those. I know that Scrapinghub in particular won't go behind a login to get something, in order to try to avoid heat.

Fundamentally, however, anything that makes scraping a key component of its business model is a high-risk business under current US law. People can and have sued scrapers out of existence, often leaving a trail of screwed customers, laid off employees, and dejected founders owing huge liability judgments.



