
no. that's the decision we made in order to be as privacy focused as possible. there's no way to know whether the same person comes back to a site the day after or later. they will always be counted as a new visitor after the first day.



Not sure if privacy is a valid argument here. As long as visitors are anonymised and it is self-hosted, I don't see how this would invade a person's privacy.

The problem with Google Analytics is not that they track a user on a single domain, the problem is that users are tracked on a *.google.com domain and Google knows who you are based on your other google sessions and knows everything you do on the internet because every website uses GA. With a self hosted product that wouldn't be the case, so the privacy is given by nature even if you'd track a user across a space of a month or longer.


I agree, self-hosting and decentralizing analytics data is the biggest improvement to user privacy that can be done today and has multiple other benefits for both the user and the webmaster.

Now, if only the EU would push more against data centralization instead of writing cookie policies that result in horrendous user experience on the web...


That is not how Google Analytics works. Its cookie (the Client ID) is a first-party cookie set on the domain of the website hosting it.

Google Analytics optionally integrates with both Google Ads and DoubleClick, and both of those integrations do a cookie-match against .google.com or .doubleclick.net cookies. But those integrations are optional and off by default.


this will limit your solution from many deployments.

Telling returning visitors apart from first-time visitors is quite important and helps to assess whether viewership, audience and customer base are growing over time.

For startups the "how many unique visitors do you get in a month" may be an important KPI and you're saying your solution cannot answer this question, so another solution will need to be deployed.

Unique visitor data is also needed to assess the effectiveness of campaigns and run e-commerce operations. There are often campaigns to bring back a user who previously didn't buy (email, ads etc). It's important to measure the effectiveness of these investments separately in web analytics, given the campaigns will be different for new and recurring visitors.


Correlation will hold between unique hashes and unique visitors, so you can still assess the increase in unique visitors for your campaign. Even the percentage of these accesses that are returning is likely constant. All you would be giving up is the ability to measure variation in the returning percentage across several days (even so, you could probably modify the code to change the salt every x days without losing much of the privacy benefits).
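For illustration, the salted-hash idea could be sketched roughly like this in Python. This is a sketch under stated assumptions, not Plausible's actual code: the hash inputs (IP address, user agent, site domain), the function names `salt_for` and `visitor_hash`, and the `rotation_days` parameter are all hypothetical.

```python
import hashlib
from datetime import date


def salt_for(day: date, secret: bytes, rotation_days: int = 1) -> bytes:
    """Return a salt that stays constant for `rotation_days` days, then changes.

    Assumption: the salt is derived from a server-side secret so the sketch
    is deterministic. A real deployment would more likely generate a random
    salt per period and delete the old one, so past hashes cannot be rebuilt.
    """
    period = day.toordinal() // rotation_days
    return hashlib.sha256(secret + str(period).encode()).digest()


def visitor_hash(ip: str, user_agent: str, domain: str, salt: bytes) -> str:
    """Hash the (hypothetical) visitor attributes together with the salt."""
    payload = "|".join((domain, ip, user_agent)).encode()
    return hashlib.sha256(salt + payload).hexdigest()


if __name__ == "__main__":
    secret = b"server-side-secret"  # hypothetical value
    today = salt_for(date(2020, 1, 1), secret)
    tomorrow = salt_for(date(2020, 1, 2), secret)
    # Same visitor, same salt period: identical hash, so counted once.
    print(visitor_hash("203.0.113.7", "ExampleUA/1.0", "example.com", today))
    # Same visitor after the salt rotates: a different hash, so counted as new.
    print(visitor_hash("203.0.113.7", "ExampleUA/1.0", "example.com", tomorrow))
```

With `rotation_days=1` a visitor looks new every day; raising it to 7 or 30 would let the same hash persist across that window, which is the trade-off being discussed.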

There will always be value to be extracted from the invasion of your users' privacy, but you also hit diminishing returns over this increasingly invasive probing. Plausible is aiming for "good enough" whilst respecting people's privacy, and that is a good compromise IMHO.

There is a trade-off. You will never get 100% of the information without all the tracking, but there is information that represents more bang for the privacy buck.

Would you not have an acceptable error increase in your decisions with a bit less information and a lot less privacy invasion?

EDIT> I think more control to the user is better, so instead of canvas fingerprinting, shady cross-site tracking and all, I would rather have a uuid that my browser sent, but that I controlled, so I could be anonymous when I want to and be tracked when I don't care, or when I genuinely agree it adds value.


"this will limit your solution from many deployments."

I think tech companies (particularly Google) have conditioned us to expect analytics [1] to be an essential component of all apps and web services. Developers have happily accepted this, rather than questioned it (unless they happen to be the ones being tracked). But actually, analytics may not need to be as detailed (or as intrusive) as many think it needs to be.

Here is a blog post from Whimsical (an online flow chart tool) who decided to remove Google Analytics:

> "We realized that all our tracking stuff had barely marginal value. We were just accumulating data because it might be useful someday. And because everybody else was doing it. Yet 100% of our product decisions over the past year were based on strategy, qualitative feedback and our own needs." (My emphasis)

From: https://whimsical.com/blog/choosing-privacy

[1] Words like 'Analytics', 'Telemetry', 'Web Beacon' etc are examples of the dishonesty of the tech industry in using words to hide their real purpose and soften their impact. All of these words are about tracking online behaviour, but no-one would dare use the clearer, more honest word – tracking – in their app or web copy.


Apples and oranges. You have taken the example of a SaaS tool that might not need that level of information. But a newspaper or a magazine absolutely needs deep information in order to sell advertising and sponsorship on their site.


yeah, i understand. we're trying to have a balance between these:

1. privacy of site visitors

2. compliance with privacy regulations

3. useful and actionable data for site owners

it's difficult to track people from visit to visit or from one device to another without breaking the first two (cookies, browser fingerprinting...) so we had to make some decisions.

in general sites that try to get visitor consent to cookies and/or to tracking realise that the majority of visitors don't give it, so even data that may not be as accurate as full-on tracking becomes very valuable.


The problem is that I most likely have to get consent (opt-in) from the user in order to collect that data. At that point I can either trick my visitors into giving it by using various dark patterns, or I end up with a dataset that is pretty much useless since nobody will opt in.

If those are my options I prefer not having to annoy my visitors with consent banners over extended and persistent analytics data. That's why we actually migrated from Google Analytics to Plausible.

By the way: Import of analytics data from Google would be a great feature if anyone from Plausible is still reading here.


If you have a customer relationship, then they can sign in (and accept your ToS) - and then you can, if they allow, track them that way. Similarly for re-activation campaigns, one can use something like UTM. Don't need to track everyone on the internet with generic stats mechanisms (like Plausible provides) for that.
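As a rough sketch of the UTM approach: a re-activation email links to the site with campaign tags in the query string, and the site reads them on arrival without needing any persistent visitor identifier. The URL, campaign values, and the `utm_params` helper below are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# The five conventional UTM parameter names.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def utm_params(url: str) -> dict[str, str]:
    """Return the UTM tags present in a landing URL, ignoring other parameters."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


if __name__ == "__main__":
    # Hypothetical link from a win-back email campaign.
    landing = "https://example.com/pricing?utm_source=newsletter&utm_campaign=winback&ref=x"
    print(utm_params(landing))
```

The campaign is attributed from the link itself, so it works the same whether or not the visitor was seen before.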


It's an intentional limitation. If you want that data, go with Google Analytics or some other option.



