Show HN: Plausible – Self-Hosted Google Analytics alternative (plausible.io)
351 points by markosaric on Oct 6, 2020 | 139 comments



Hello HN!

We started developing Plausible early last year and launched our SaaS business; you can now self-host Plausible on your own server too! The project is battle-tested: it runs on more than 5,000 sites and we’ve counted 180 million page views in the last three months.

Plausible is a standard Elixir/Phoenix application backed by a PostgreSQL database for general data and a ClickHouse database for stats. On the frontend we use TailwindCSS for styling and React to make the dashboard interactive.

The script is lightweight at 0.7 KB. Cookies are not used and no personal data is collected. There’s no cross-site or cross-device tracking either.

We build everything in the open with a public roadmap so would love to hear your feedback and feature requests. Thank you!


Hey markosaric! Very interesting product and I'm really tempted to try it out. One suggestion to add to your documentation: you should mention drawbacks in your "comparison" pages.

For example, the page where you compare Plausible to Google Analytics lists only advantages and doesn't indicate that Plausible could in any way be worse. But based on the description of how Plausible works, I'm fairly certain that people behind VPNs will not be properly counted (everyone with the same user-agent will be counted as if they were the same person). Use of VPNs is fairly widespread, so lack of fingerprinting will actually prevent you from accurately assessing how many visitors a site has.


Thanks!

It really depends on the perspective you look at it from. That post is written from the visitor privacy / privacy regulations perspective. If you use GA full on, you get some advantages, such as storing cookies on devices, but if you either make GA "privacy friendly" (enable IP anonymization, disable User-ID, disable cookies, etc.) or ask visitors to consent to being tracked, you will get less accurate numbers than with our method.

We haven't noticed any issues with VPN usage, as you still need people to use the same IP and the same user agent, and to visit the same domain, all on the same day, to be counted as one visitor with us.


Love the product! Since you asked, one request that I would find helpful is having the stat comparison with the previous day based on the same time of day rather than the full-day total. The comparisons almost always show some -X% because you are comparing a 24-hour period yesterday to a (24 - X)-hour period today.


thanks! makes sense and i'd like to see that change too! i've added the request to our github now so we'll see what can be done: https://github.com/plausible/analytics/issues/344


How do you track sessions without cookies? IP address?


We generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.

hash(daily_salt + website_domain + ip_address + user_agent)

This generates a random string of letters and numbers that is used to calculate unique visitor numbers for the day. Old salts are deleted to avoid the possibility of linking visitor information from one day to the next.

See full details here: https://plausible.io/data-policy
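In code, the scheme above looks roughly like this (a minimal Elixir sketch; the choice of SHA-256 and hex encoding is illustrative rather than exactly what we ship):

    # Hypothetical sketch of hash(daily_salt + website_domain + ip_address + user_agent)
    defmodule VisitorId do
      # `daily_salt` is assumed to be a random value regenerated once per day
      # and discarded afterwards, so identifiers cannot be linked across days.
      def generate(daily_salt, domain, ip_address, user_agent) do
        :crypto.hash(:sha256, daily_salt <> domain <> ip_address <> user_agent)
        |> Base.encode16(case: :lower)
      end
    end

    # VisitorId.generate(salt, "example.com", "203.0.113.7", "Mozilla/5.0 ...")
    # returns the same string for the same visitor all day, and a different one tomorrow.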


Doesn’t this mean that if you accidentally serve your website on two domains (e.g. example.com and example.net with no redirect) it will count one visitor twice if they visit both domains? What is the benefit to including the domain in the hash?

Not really keen on the use of the IP address. I’ve been behind load balancing proxies and weird mobile networks often enough to know that I can appear from a dozen different IPs in the space of an hour just by browsing the web normally with default settings.

Have you considered requesting a 24 hour private cacheable resource and counting the requests on the server? Or is the browser cache too unreliable?


To be more accurate: we use the site_id in the hash. You can serve your website from many different domains, but as long as the site_id in your script tag is the same, the hash remains the same.

The site id is included in the hash to prevent cross-site tracking. Otherwise the hash would act almost like a third-party cookie and people could be tracked across different sites.

> Not really keen on the use of the IP address.

Yeah, it's not ideal. We do check the X-Forwarded-For header so as long as the proxies are being good citizens, the client IP is present in that header.
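Roughly speaking, that header handling could look like this (an illustrative Plug-based sketch rather than our exact code, taking the left-most X-Forwarded-For entry as the client and falling back to the socket's peer address):

    defmodule ClientIp do
      import Plug.Conn, only: [get_req_header: 2]

      def fetch(%Plug.Conn{} = conn) do
        case get_req_header(conn, "x-forwarded-for") do
          [forwarded | _] ->
            # Header looks like "client, proxy1, proxy2"; the left-most entry
            # is the original client, assuming the proxies behave.
            forwarded |> String.split(",") |> List.first() |> String.trim()

          [] ->
            conn.remote_ip |> :inet.ntoa() |> to_string()
        end
      end
    end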


> Have you considered requesting a 24 hour private cacheable resource and counting the requests on the server? Or is the browser cache too unreliable?

Nice idea.


Is it possible to track users over a week or month? Often the same user returns to our website several times before buying, and it's important to learn about this behavior.


no. that's the decision we made in order to be as privacy focused as possible. there's no way to know whether the same person comes back to a site the day after or later. they will always be counted as a new visitor after the first day.


Not sure if privacy is a valid argument here. As long as visitors are anonymised and it is self-hosted, I don't see how this would invade a person's privacy.

The problem with Google Analytics is not that they track a user on a single domain; the problem is that users are tracked on a *.google.com domain and Google knows who you are based on your other Google sessions and knows everything you do on the internet because every website uses GA. With a self-hosted product that wouldn't be the case, so privacy is preserved by nature even if you tracked a user across the space of a month or longer.


I agree, self-hosting and decentralizing analytics data is the biggest improvement to user privacy that can be done today and has multiple other benefits for both the user and the webmaster.

Now, if only the EU would push more for data decentralization instead of writing cookie policies that result in a horrendous user experience on the web...


That is not how Google Analytics works. Its cookie (the Client ID) is a first-party cookie set on the domain of the website hosting it.

Google Analytics optionally integrates with both Google Ads and DoubleClick, and both of those integrations do a cookie-match against .google.com or .doubleclick.net cookies. But those integrations are optional and off by default.


This will exclude your solution from many deployments.

Distinguishing returning visitors from first-time visitors is quite important and helps to assess whether viewership, audience and customer base are growing over time.

For startups the "how many unique visitors do you get in a month" may be an important KPI and you're saying your solution cannot answer this question, so another solution will be needed to be deployed.

Unique visitor data is also needed to assess the effectiveness of campaigns and to run e-commerce operations. There are often campaigns to bring back a user who previously didn't buy (email, ads, etc.). It's important to measure the effectiveness of these investments separately in web analytics, given the campaigns will be different for new and recurring visitors.


The correlation between unique hashes and unique visitors will hold well enough to assess the increase in unique visitors from your campaign. Even the percentage of those visits that are returning visitors is likely roughly constant. All you would be giving up is the ability to measure variation in the returning percentage across several days (even so, you could probably modify the code to change the salt every X days without losing much of the privacy benefit).

There will always be value to be extracted from the invasion of your users' privacy, but you also hit diminishing returns with this increasingly invasive probing. Plausible is aiming for "good enough" whilst respecting people's privacy, and that is a good compromise IMHO.

There is a trade-off. You will never get 100% of the information without all the tracking, but there is information that represents more bang for the privacy buck.

Would you not accept a small increase in error in your decisions in exchange for a bit less information and a lot less privacy invasion?

EDIT> I think more control for the user is better. Instead of canvas fingerprinting, shady cross-site tracking and all that, I would rather have a UUID that my browser reports but that I control, so I could be anonymous when I want to and be tracked when I don't care, or when I genuinely agree it adds value.


"this will limit your solution from many deployments."

I think tech companies (particularly Google) have conditioned us to expect analytics [1] to be an essential component of all apps and web services. Developers have happily accepted this rather than questioned it (unless they happen to be the ones being tracked). But actually, analytics may not need to be as detailed (or as intrusive) as many think it needs to be.

Here is a blog post from Whimsical (an online flow chart tool) who decided to remove Google Analytics:

> "We realized that all our tracking stuff had barely marginal value. We were just accumulating data because it might be useful someday. And because everybody else was doing it. Yet 100% of our product decisions over the past year were based on strategy, qualitative feedback and our own needs." (My emphasis)

From: https://whimsical.com/blog/choosing-privacy

[1] Words like 'Analytics', 'Telemetry', 'Web Beacon' etc are examples of the dishonesty of the tech industry in using words to hide their real purpose and soften their impact. All of these words are about tracking online behaviour, but no-one would dare use the clearer, more honest word – tracking – in their app or web copy.


Apples and oranges. You have taken the example of a SaaS tool who might not need that level of information. But, a newspaper or a magazine absolutely needs deep information in order to sell advertising and sponsorship on their site.


yeah, i understand. we're trying to have a balance between these:

1. privacy of site visitors

2. compliance with privacy regulations

3. useful and actionable data for site owners

it's difficult to track people from visit to visit or from one device to another without breaking the first two (cookies, browser fingerprinting...) so we had to make some decisions.

in general, sites that try to get visitor consent for cookies and/or tracking realise that the majority of visitors don't give it, so even data that may not be as accurate as full-on tracking becomes very valuable.


The problem is that I most likely have to get consent (opt-in) from the user in order to collect that data - at that point I can either trick my visitors into giving it by using various dark patterns, or I end up with a dataset that is pretty much useless since nobody will opt in.

If those are my options I prefer not having to annoy my visitors with consent banners over extended and persistent analytics data. That's why we actually migrated from Google Analytics to Plausible.

By the way: Import of analytics data from Google would be a great feature if anyone from Plausible is still reading here.


If you have a customer relationship, then they can sign in (and accept your ToS) - and then you can, if they allow it, track them that way. Similarly, for re-activation campaigns one can use something like UTM parameters. You don't need to track everyone on the internet with generic stats mechanisms (like Plausible provides) for that.


It's an intentional limitation. If you want that data, go with Google Analytics or some other option.


> hash(daily_salt + website_domain + ip_address + user_agent)

That's a pretty interesting technique!

I built a privacy-friendly analytics project https://github.com/sheshbabu/freshlytics and I've been wondering how to correctly count unique visitors to a website. I don't store cookies or any PII at all, so by definition it's hard to distinguish between two different visits - are they from the same person or different people?

An alternative approach is used by Simple Analytics - https://docs.simpleanalytics.com/uniques where they use referrer header to derive unique visits. They mention that they don't use IP addresses as they're considered fingerprinting.

But it looks like a hash function (whose salt gets rotated daily) strikes a good balance between fingerprinting and maintaining user privacy. Any downsides to this approach?
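For reference, my loose reading of the Simple Analytics referrer heuristic linked above (illustrative only, not their actual code) is that a pageview counts as a new visit when the Referer is missing or points to another site:

    defmodule ReferrerUniques do
      def unique_visit?(referrer, own_host) do
        case referrer && URI.parse(referrer).host do
          nil -> true                # no referrer: treat as a new (direct) visit
          host -> host != own_host   # external referrer: new visit; internal: same visit
        end
      end
    end

    # ReferrerUniques.unique_visit?(nil, "example.com")                             #=> true
    # ReferrerUniques.unique_visit?("https://example.com/page", "example.com")      #=> false
    # ReferrerUniques.unique_visit?("https://news.ycombinator.com/", "example.com") #=> true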


downsides are basically that in order to gain the benefits for user privacy and compliance with regulations, you lose a bit of accuracy depending on the situation. we cannot see whether the same person returns to a site on a different day, so we count them as a new unique visitor.

i assume that using the referrer header to count uniques has even more downsides as i imagine the number of unique visitors with that method would be much higher than it actually is.


Doesn't that depend on visitors not sharing an IP address i.e. not behind something that does NAT?


yes, that's a trade-off in this privacy first approach. if several people are on the same ip address, visiting the same website and having the same user agent on the same day, they look the same to us.


That seems pretty robust, but I have a question about the salt. How does the daily salt part work? How is it stored during the day? Are historic salt values stored?


The salt is stored in a Postgres database and we run a daily job that generates a new salt and deletes old ones.
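Something along these lines (a simplified sketch; the table name and use of raw Postgrex here are illustrative rather than our exact implementation):

    defmodule SaltRotation do
      # Meant to run once per day from a scheduled job.
      def rotate(conn) do
        new_salt = :crypto.strong_rand_bytes(16) |> Base.encode64()

        Postgrex.query!(conn, "INSERT INTO salts (value, inserted_at) VALUES ($1, now())", [new_salt])

        # Deleting old salts is what makes it impossible to link today's
        # visitor hashes to earlier days.
        Postgrex.query!(conn, "DELETE FROM salts WHERE inserted_at < now() - interval '1 day'", [])

        new_salt
      end
    end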


It's so great the whole code is available under the MIT license.

After seeing many open source projects with paid services go for more protective licenses, like the AGPL or the newer eventually-open licenses (e.g. the Business Source License), I wonder what your rationale was for still picking MIT.


thanks! we're big fans of open source so we wanted as permissive a licence as possible.

one of the things we spend a lot of time thinking about is how to make our project sustainable while still being as open as possible in everything we do.

a bit more detail in this post: "How to pay your rent with your open source project" https://plausible.io/blog/open-source-funding


It's all fun and games until someone spins up a competitive hosted service using your MIT licensed codebase.


And ..... they've switched to AGPL.


Q: Since users of Plausible are sending you their visitors' IP addresses, doesn't that mean GDPR applies? GDPR mentions "disclosure by transmission" as an example of processing[0].

Since that's the case, doesn't that mean you need a legal basis for processing the data[1], you need a way to allow users to opt out of collection of their data[2], and your clients need to sign data protection agreements[3] with Plausible?

I tried looking for Plausible's point of view on this and how they address the above, but I couldn't find anything.

[0] https://gdpr-info.eu/art-4-gdpr/

[1] https://gdpr-info.eu/art-6-gdpr/

[2] https://gdpr-info.eu/art-21-gdpr/

[3] https://gdpr-info.eu/art-28-gdpr/


Plausible is a breath of fresh air in the sea of products and services that try to sell one's identity in exchange for a free service. I have started using it for one of my sites, then recently migrated another, and I'm planning to do the same with the rest of my projects.

Moreover, Plausible being an open-source product, it gives anyone a chance to contribute to it and make it even better. As soon as I realised that it was written in Elixir/Phoenix, I couldn't wait to find ways to help. Although my contributions to the project have been small so far, the guys were really kind and addressed the changes I pointed out almost immediately.

Great work!


thank you!

our Self-Hosted version was possible thanks to several contributors on our GitHub https://github.com/plausible/analytics/issues

One contributor from HN helped us reduce our script down to 0.7 KB from 1.4 KB or so earlier this year too... all contributions are very welcome!


> https://docs.plausible.io/self-hosting

Is there any way to self-host without using Docker?


I've not tried it, but there is a build script for it in AUR: https://aur.archlinux.org/packages/plausible-git/


It's possible but not officially supported. You would have to install all the dependencies on your machine and build the mix release yourself.


I was curious what sort of JavaScript wizardry was used to accomplish this. It appears to be lots of deleted code. Still, good for the contributor! Here is the PR: https://github.com/plausible/analytics/pull/68/files


Hard to trace because we have a code change and file renames in the same merge. I don't like that. Renames and white-space changes should be their own commits.


I agree with what you’ve said. I’ll even join you in praising plausible, well done!

However I still wonder about the TCO of self hosting.

For big companies TCO is a number on a spreadsheet. For a technical person it includes the total number of minutes per year spent over and above a hosted solution. The concerns I have about this are:

1). I think it’s extremely easy to underestimate this metric. It’s not just the download, setup or configuration (although if those parts go wrong it could turn from minutes into hours).

It’s also the fact that it’s going to have more downtime that you have to fix. You’re going to have to stay current on updates and relevant security issues. You’re going to have to run it somewhere, which, although lightweight, takes up some resources. That doesn’t mean the software is not as good. It just means a hosted service has people (supposedly) managing all that crap.

2). Say the TCO is not deceptively low and it really isn’t very much extra time per year. Now let’s decide how much of your time is worth giving up that could be spent on critical-path or uniquely value-adding functionality for your own product or service that could make you more successful. Seriously, what’s the amount of time you’re willing to give up per year on that?

I don’t know the exact right answer to number two. All of this is a trade-off right? Not just the TCO but also the control someone else has over you, and the openness and utility of open source.

But it’s a concern. If for no other reason than it’s not obvious when a new product comes along.

Every minute we have is opportunity cost. The only thing we get to choose is which bucket to put it in.


I mean, they have a hosted service you could just pay for if you're worried about KTLO time


I don't quite understand your concern here. If you value your time, why not just use their paid service?

What are your expected PVs (page views) per month? Out of all the hosted analytics out there, Plausible is one of the most affordable solutions, with a very low entry-level price.


The OP seems to be talking more about the benefits of self-hosting, hence that line of reasoning.

Secondly, I was specifically not critical of plausible in fact I was positive toward them.

The issue in general is that the trade-off is subtle and not obvious, and I think a fair number of people think it's an easy decision one way or another, but it often isn't.


Isn't the whole premise "Self-Hosted"? There are lots of organizations that still don't use online services and prefer things to be in house and in their own data center.


There are people who jump off bridges too; not sure what your point is.

Also, I was specifically not talking about larger organization-level decisions as much as smaller startups or individuals who have to make the decision.

The premise is that even after all the years of experience we have with SaaS versus self-hosted, it's still not an obvious decision. It seems like some people can be dismissive of it as an easy decision, but the devil is in the details and the specifics of the situation.


It occurs to me that even with a self-hosted solution, project maintainers could benefit from having a broader corpus of sample data from various use cases.

One of the complaints about European armaments in the middle ages was that the guys making the plate also made the arrowheads and crossbow bolts so there was never the same arms race you had in Japan, as a common counterexample.

I think it would be cool to see a tool like this one, along with a data anonymizer, playing cat and mouse trying to scrub data versus inferring PII from it. I think that might entice more white-hat security folks to investigate this problem space, and it feels like we could use more of that sort of attention on this domain.


Plausible Analytics [1] is an MIT-licensed alternative to Google Analytics. It is hosted at plausible.io but can also be self-hosted. The app server is written in Phoenix/Elixir. The self-hosted version is distributed as a Docker image. It is configured [2] with a PostgreSQL server for user data, a ClickHouse server for analytics data, and an SMTP server for transactional email.

EDIT: according to markosaric, the data policy restricts the granularity of Active Users [3] to daily statistics for privacy reasons so the common Monthly Active Users (MAU) stat is not available.

[1] https://github.com/plausible/analytics

[2] https://docs.plausible.io/self-hosting-configuration

[3] https://en.wikipedia.org/wiki/Active_users


I have lost count of all the Google Analytics alternatives. It seems there is a new one popping up every week. This is not criticism, but I'm wondering why there are so many people developing their own alternative product.

For me an attractive GA alternative has to be:

- 100% self-hosted via Docker containers in Kubernetes

- Able to configure a connection string to a datastore outside my kubes cluster

- One deployment for the data-gathering service

- A second deployment for an admin tool/dashboard

- Provides at least basic metrics around acquisition, user device and location data

Nice to have:

- Open source so people can scrutinise the code before deploying and feed back bug fixes/feature requests, etc.

Can anyone recommend a GA alternative which ticks these boxes?


Sounds to me like we need one more Google Analytics alternative. But this will surely be the last one, after that we are feature-complete!



:)


> This is not criticism, but I'm wondering why there are so many people developing their own alternative product.

Recent privacy laws introduced requirements which Google Analytics can't meet out of the box - at the very least there's lots of uncertainty. Website owners like to avoid having to annoy their users with consent banners. Of course it's hard to get a definitive answer, but it seems using Google Analytics without a consent banner doesn't comply with local regulation in Germany and/or Europe.



This comes very close to a GA alternative; it's nice and fat-free. The daily salt reset is a deal breaker, though (it basically means people visiting the site are considered unique every day, which breaks monthly/weekly unique stats and conversion tracking spanning more than 24 hours).

Our current pipeline for one app has a 14-20 day conversion window and we rely on this data to optimize. We also have a marketplace app where conversion periods span several days depending on the product and promotions. With the daily anonymization reset, every conversion will appear to have taken less than 24 hours.

The hunt for a GA alternative continues. BTW does anyone know/recommend a great self hosted alternative to Mixpanel?


thanks! from our experience and testing, most website visitors say no to giving tracking consent / storing cookies (up to 90+%), so this was a good balance between making something that's compliant with regulations and something that provides useful data for many use cases.


My guess is giving up conversion tracking is the #1 hurdle for adopting alternate privacy-friendly (aka. no conversion tracking) analytics solutions.

It means that people using a privacy-friendly analytics solution literally cannot track conversions accurately. If you can't track conversions, you also can't run advertising campaigns effectively, because so much of advertising relies on conversion tracking to know whether ad spend is generating an ROI or just burning money.

Then again, perhaps the target users of a privacy friendly analytics solution aren't in the same category of people who are running paid ad campaigns on Adwords, etc.


yeah, all the privacy regulations make it difficult to do all the tracking that some businesses are used to. it's clear from testing that if things such as GDPR were actually enforced and sites asked for consent, the majority of visitors would say no, which would make their current strategy difficult. Plausible was built with all these regulations in mind, so we're more of a solution for businesses that want to comply with the regulations but still get some useful data.


Couldn't you generate monthly/weekly salts to track longer-term metrics? Repeat visitors probably would not see this as that much more of an intrusion than daily stats.


Have you heard of Fathom Analytics? Maybe it meets your needs.

https://usefathom.com


Worth checking out Snowplow Analytics as a self-hosted version of Mixpanel. Good guys out of London that have largely bootstrapped it, but the business is pretty big now -- some good customers using it at large scale.


There was an interview on Changelog earlier this year with these guys, it was pretty interesting to hear how they work and got started, etc...

https://changelog.com/podcast/396


thanks! the Changelog guys were good to us and helped give us a push earlier this year. they invited us on the show after one of our posts was shared on HN!


Plausible is not a Google Analytics alternative. Yes, it shows you traffic on pages, but it does not support some essential features, like proper events/goals with metadata.

https://github.com/plausible/analytics/issues/134


Plausible was not built to be a clone or a full-on, feature-by-feature replacement. It is an alternative that works for many people who think GA is complicated, slow, privacy-intrusive and so on.

We do try to add what we believe are the 20% of GA features that 80% of GA users need, without adding too much complexity.

We already do support event / goal tracking and we have metadata for goals as the next big feature on our public roadmap: https://github.com/plausible/analytics/projects/1


I'm waiting for it to switch. I liked the interface and your general way of doing business, but being unable to track outbound links or conversions was a deal breaker.


Kudos for linking that issue, which makes your criticism quite constructive. The maintainers have responded in the thread, which I think is relevant:

> We do need some more info on the use cases for this in order to come up with the best solution possible so any feedback is appreciated. Here's what I'm getting from previous posts...

It's great when you can contribute to open source by merely explaining your use cases.


This seems to have the same issue as Pirsch[0] in that (i) a mobile user will have multiple IPs during a day as they roam between wifi networks and mobile networks - which will incorrectly over-report unique visitors and (ii) NAT gateways with similar devices behind (a home with 4 iPhones, a workplace with 5000 identically imaged desktops, all similar devices at a starbucks, an entire lab of computers at a school) will incorrectly under-report unique users.

An IPv4 address is an unreliable way to count "a visitor". At best it represents "at least one network (~building) of 1-n visitors". You could argue 0-n visitors (some sort of automation/crawler). It could also represent more than one network (a temporary IP obtained for less than a day, an IP obtained for a day that changes during your definition of a day).

User-agent reduces the scope, but doesn't eliminate the problem (given recommended evergreen updates and identically imaged/uncustomized devices).

[0]: https://news.ycombinator.com/item?id=23668212


Thank you for mentioning our lib! :)

I mostly agree with your comment. It's really hard to solve these issues without violating the visitors' privacy or saving state on the client device. I think the approach is "good enough" if you want to know how well your website performs or to detect trends, but you will definitely need more invasive methods to gain more insight.


Plausible is the best thing that has happened to us at https://hellonext.co. Such a beautiful product, and has helped us get more customers from the EU as well, since they know that we are not really in the business of helping other companies track them and push ads down their browsers.


Genuine question here... do your users care that much about privacy?

The reason I ask is that I know only two people in my life who care enough about privacy to take steps to protect themselves either online or in the real world; for example, I know many people with Alexas and Google Nests, Android phones (I have one myself), etc.

I'm just curious if people are coming to you BECAUSE of this or if it's just a coincidence.

Edit: spelling and grammar


People may not care 100%, but all other things equal I think most people would prefer the privacy-friendly service rather than the creepy one.

Furthermore, regardless of users' opinions, the GDPR forces you to ask for consent before stalking the user. So even if the privacy aspect itself isn't the problem, the user experience is (as in, the user needs to click accept before you can track them, and most people won't unless you make the prompt obnoxious, but making it obnoxious also breaches the GDPR).


Many people don't outwardly care about privacy because they've never had it violated (or never knew it happened)

Ask them their credit card number. They care.


Actually, yes. Especially the enterprise customers. Their legal teams used to give us a tough time trying to understand what trackers we use; this increased especially after the GDPR came into existence.

I wouldn’t say we are closing deals ‘BECAUSE’ of this. But it has been a contributing factor in convincing customers to trust us and literally proving the point that we are not these data-hungry people.


that's good to hear! thank you!


It would be nice to see how it compares to other products. Lately I've been noticing quite a few GA alternatives, would be nice to know which one is really worth looking into.


we have a live demo here where you can see exactly what we offer with our own website stats: https://plausible.io/plausible.io

otherwise i've published a comparison with Google Analytics: https://plausible.io/vs-google-analytics

and a comparison with Matomo: https://plausible.io/vs-matomo


Somewhat related, I have a list of privacy friendly Google-Analytics alternatives at https://workflowy.com/s/analytics-software/0yDQ899MfOsE2WAO


Any solution you'd recommend for non-web analytics?


I’ve used Mixpanel and found it had nice APIs. I’ll update the list for the ones that support it.


thanks for including Plausible on that list!


As an outsider to web backends: is Google Analytics used as a web-service API for doing analytics on your own website? And now with Plausible you can install an analytics engine on your own website/webserver and use it "natively" instead of relying on a big-tech service API?

Is that what's going on? If yes, are big-tech service APIs so popular that indie web-backend engineers use them all over the place? My impression was that most of the backend work is done on the webserver, and part of the work is to survey GitHub open-source projects and install the ones you like (and the ones that are popular) on your webserver.

Wasn't the community shift from PHP to Node.js/npm (or Ruby, or Python, take your pick) touted as some kind of mini-revolution, something that'd make your life easier as a backend dev? Turns out you guys still prefer that someone else (in this case, 4-eyes big-tech) does the serious work for you?

I'm really not sure I understand the landscape.


depends on what study you look at, but Google Analytics is installed on anywhere from 50% to 85% of all websites on the web. that might not be a big issue if Google weren't also the largest advertising company on the web.

a blog post that helped "launch" Plausible earlier this year was "Why you should stop using Google Analytics on your website" https://plausible.io/blog/remove-google-analytics


Yeah but I fail to understand when that happened in the past?

Google Analytics appeared in 2005, and as far as I know as a tech person, they never marketed it aggressively.

How did we go from sometime in 2003, when analytics packages (no matter how basic) were downloaded from sourceforge or gnu or whatever and installed on your website ...

to sometime in 2015 when you're working for a small/indie tech company, your website is mostly working and you're thinking of doing analytics on your website and someone suggests to you "you should just use Google Analytics, everyone does it, it's what you do, why would you use something else or write your own" ... ???

Also if Google Analytics is so popular among web-backends, there must be other service-apis that backend-devs use without even a second thought?


I think you're a bit confused by what Google Analytics is. It's not a service API, it's got nothing to do with the back-end. You put a snippet of JS on your page, and then the agency that does your SEO can log in to Google to see a dashboard of site traffic. There's no back-end involvement, and back-end developers never need to know that analytics is occurring at all.

In terms of what happened to get us here? Analytics became a marketing and BI function. Your choice of front-end analytics is going to be made by your CMO, not your CTO. I mean, your CTO can decide whatever he wants for logging & application analytics, but the CMO is also going to want an analytics suite. For a variety of reasons, this favors hosted front-end-centric JS-based solutions rather than logfiles.

The other major thing is that the industry consolidated from 2005-2010. There were like a dozen large vendors, but Omniture bought most of them and then Adobe bought Omniture. By 2015 there are basically two that anyone cares about: Google Analytics, and Adobe Analytics, and GA is free which makes it the default. In both cases, they are part of larger "suites" that package other tools, like campaign targeting and A/B testing.

The other things used without a second thought are Firebase (and Firebase Analytics is now merging with Google Analytics) and Facebook tracking pixels.


Thanks. Very helpful information!


i guess the whole web has moved in that direction over the years: from keep it simple, self-host everything and reduce all dependencies, to use the cloud and third-party services as much as possible. just check any of the mainstream websites and see how many connections to external services they make.

even in the case of Plausible, we have this self-hosted (free as in beer and free as in speech) product and a hosted cloud (free as in speech) product, and the vast majority of people prefer to use our hosted cloud version.


Makes sense. Thank you for your comments.


Been using Plausible since I started working on https://btfy.io. I actually copied their algorithm to detect unique visitors for Btfy also. Really great work. Will be open-sourcing mine too soon.


thank you! happy to hear about that :)


At first glance the design of this feels very close to fathom, another privacy-focused analytics tool https://usefathom.com/


https://goaccess.io/ does the same analytics and does not require exposing a dedicated service


server logs are good but unfortunately not very accurate for most site owners thanks to all the bots. with javascript analytics, it's much easier to filter those out. i did a study here comparing the numbers between server-side and client-side analytics: https://plausible.io/blog/server-log-analysis


Your analysis is quite incorrect. Bots and browsers produce fairly different logs in terms of file access pattern, timing, user agent and known source networks.

Filtering users from bots can be done on the server side.

Identifying users that block javascript trackers cannot be done using javascript trackers.
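As a toy illustration of the user-agent part alone (real filtering would also look at access patterns, timing and source networks, as noted above; the pattern list here is arbitrary):

    defmodule LogFilter do
      # Keep only access-log lines whose user agent doesn't look like a bot.
      def human_lines(log_lines) do
        bot_pattern = ~r/bot|crawl|spider|slurp|curl|wget|python-requests/i
        Enum.reject(log_lines, &Regex.match?(bot_pattern, &1))
      end
    end

    # LogFilter.human_lines([
    #   ~s(1.2.3.4 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (X11; Linux x86_64)"),
    #   ~s(5.6.7.8 - - "GET / HTTP/1.1" 200 "Googlebot/2.1")
    # ])
    # keeps only the first line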


I think server logs are more accurate; they tell you exactly what's going on, but like you said, bots will show up as well.

On the other hand, a bunch of folks (600M+) block JS analytics/trackers, so JS-based analytics will deflate the actual number of visitors, and is thus less accurate too.


I wanted to support and use Plausible, mainly for the drill-down feature, which Fathom doesn't have yet.

But, I had two main concerns about this product from my trial:

1) Slow response time (~300ms) of the script when using a custom domain.

2) I liked its simple design and features, but it looks like it's already starting to bloat up with trivial things like UTM tracking.

Some small UI bugs (like time range buttons not working) also need ironing out.

Waiting for Cloudflare Web Analytics to compare.


Anyone else getting an error in Safari when trying to visit the site?

"Safari can't open the page 'https://plausible.io/self-hsted-web-analytics" because Safari can't establish a secure connection to the server 'plausible.io'.


They're most likely listed in your Pi-hole or whatever you use to block trackers.

Same thing here, had to override the block.


hmm that's weird! just checked and seems to work all fine. no HN hug of death today :)

do you perhaps run any adblocker or something like that?

the link is: https://plausible.io/self-hosted-web-analytics


It would be nice if projects like this had a page explaining why you would choose them over the 20 other[0] open source analytics options.

[0]: https://github.com/awesome-selfhosted/awesome-selfhosted#ana...


I wish there were an ethical, FOSS analytics with cross-site interest tracking. As a Games as a Service developer, I need to be able to know my audience characteristics (such as age) that help me understand where to further develop the product. It could give the user the choice of what information to share and use diffe


Why not... ask your users about their interests and demographics?

I think the problem is the analytics mindset. That you're entitled to have that information without a user actually providing it to you. Ask people. No, not all of them will answer, and that's okay.


Completely agree.

As a user, if you try to gather information about me without my consent I will do everything in my power to stop you, but if you ask me instead, I will help you out — probably with a smile on my face.


Does anyone know if they are using a stock theme?

I really like the design (and structure) of their website. Or was the design self-made?


Hey, I use TailwindUI (https://tailwindui.com/components) and a lot of it is custom as well.


I think this is tailwind-ui (https://tailwindui.com/)


Why use Plausible rather than Snowplow?


Using Snowplow for simple visitor stats is like hosting your static website in Kubernetes.

It's a different use case and level of complexity.


Notice to the authors: it looks like you've built this using Tailwind UI components (which are great!). I don't think, though, that their license allows them to be used in open source projects.


Tailwind UI License allows for components to be used in open source projects, as long as the project's primary purpose isn't to distribute those components :)

Edit: License link for anyone interested https://www.notion.so/Tailwind-UI-License-644418bb34ad4fa29a...


Hm - I remember asking the same question on their Discord channel and at that time (IIRC) the project maintainers told me otherwise. That's good to know!


> Can I use Tailwind UI in open source projects?

> Yep! As long as what you're building is some sort of actual website and not a derivative component library, theme builder, or other product where the primary purpose is clearly to repackage and redistribute our components, it's totally okay for that project to be open source.

https://tailwindui.com/pricing


> Yep! As long as what you're building is some sort of actual website and not a derivative component library

The license itself doesn't seem to have clear relicensing/sublicensing terms, so it's not clear at all how it works for either open source or proprietary products. If it actually does allow downstream open source licensing, then while it prevents the direct licensee from creating a component library, it doesn't prevent a downstream licensee of the open source product from doing so; if its usage restrictions (which on their own would prevent a project from being open source) continue to apply to downstream users, then it cannot be used in open source projects.


while i'm not a lawyer, that does sound extremely far-fetched.

you're also allowed to publish code with an MIT licence which can only be used with a 100% proprietary binary blob which can only be acquired by paying a third party. the specified licence only applies to the published code after all.

said code might be useless without a licence to use that binary blob, but that has no meaning wrt the published open source code.


Link to free, self-hosted version: https://plausible.io/self-hosted-web-analytics


that's the one that's linked to in this post :)


I would love to hear how this compares to Posthog!

I just switched from Google Analytics (by way of firebase) to Posthog, which seems very comparable. I liked Posthog because it has a react native library.


from what i know, Posthog is focused more on app/product analytics while we're focused on web analytics.


Any way to run this on a shared hosting with PHP/MySQL? If that works out somehow I would give this a try over Matomo, which is just "too big" for my use case.


If you want something smaller than Matomo that works on any shared hosting you can check out the platform that I'm building[0], it does have some extra features compared to the lighter Plausible, but you can disable those if you want to keep it simple.

[0]: https://usertrack.net


Is there any way to compare visits from today to yesterday, or today to last week?

I was thinking that it could be a filter on a segment, but it doesn't look possible when I tried the demo. Otherwise, I really like the look of usertrack.


Thanks!

There is no way to compare two date intervals in the UI yet, but the code is actually there; I only have to add some UI for it.

I was thinking to just add a checkbox to enable "compare with last interval", so whatever segments you have selected, when the checkbox is enabled you would also see graphs of the previous period. E.g. if you have chosen "last 2 weeks", it would compare "last 2 weeks" with "minus 4 weeks, minus 2 weeks".
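The previous-interval arithmetic itself is simple, e.g. (an illustrative Elixir sketch, not userTrack's actual code):

    defmodule Intervals do
      # Given a date range, the previous interval is the same number of days
      # immediately before it.
      def previous(%Date{} = from, %Date{} = to) do
        len = Date.diff(to, from) + 1
        {Date.add(from, -len), Date.add(to, -len)}
      end
    end

    # "Last 2 weeks" ending 2020-10-04:
    # Intervals.previous(~D[2020-09-21], ~D[2020-10-04])
    # #=> {~D[2020-09-07], ~D[2020-09-20]}   (i.e. minus 4 weeks to minus 2 weeks)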


> I was thinking to just add a checkbox to enable "compare with last interval", so whatever segments you have selected, when the checkbox is enabled you would also see graphs of the previous period. E.g. if you have chosen "last 2 weeks", it would compare "last 2 weeks" with "minus 4 weeks, minus 2 weeks".

Yes, I think that would work really well as it would be simple to understand. You could also have the option to only show the differences or at least highlight the differences as that is what you are probably trying to see.


Agree, one thing that is missing now is the total numbers for the selected date interval. So if you select the last 2 weeks, you see the graphs, but it doesn't show the total visitors for those last 2 weeks (unless you go to the main domains list). If I implemented the interval comparison and the display of visitor totals, it could also show +24% (compared to the last interval) for stats like visitors, conversions, etc.

I am planning to implement both of those features really soon (this month), but I am currently working on releasing a simpler analytics platform for WordPress, based on userTrack (https://wplytic.com), as the current pricing might be a bit too high for some WordPress users and they might also not need all the advanced features of userTrack (heatmaps, session recordings, multi-user, multi-domains, A/B tests, etc).


Main contributor here. As long as you have Docker installed on the host, you're good to go. If you don't have it, you can install as follows: https://docs.docker.com/get-docker/


Shared hosting doesn’t give you docker or root. You have a webroot that usually only supports PHP or static files.


Excited to see support for hash-based routes got added recently! That was the major deterrent for me last time I saw this posted, time to give it another shot I think.


yep, we've added that. instructions are here: https://docs.plausible.io/hash-based-routing/


+1 from a two-month user. Haven't had the time to mess around with Umami so I jumped on it. The UI needs some polish, but otherwise it's a solid alternative.


thank you! what would you like improved in the UI?


Nice product but the fact you can't tell new from old customers is a non-starter for most businesses.


Is there info about what pages a user viewed and the browser version? I don't see it on the demo.


we have the "top pages" report which includes all the pages users have viewed. you can click on any page to see the full drill down too.

we only show browsers at this time but there's been a request to add browser versions too. we plan to do it but cannot promise anything in terms of timeline for when it will be done. See https://github.com/plausible/analytics/issues/151


I'm really wondering how it can identify returning visitors without using cookies. Local storage?


there was another question about this that i responded to in this thread. no local storage is used.

We generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.

hash(daily_salt + website_domain + ip_address + user_agent)

This generates a random string of letters and numbers that is used to calculate unique visitor numbers for the day. Old salts are deleted to avoid the possibility of linking visitor information from one day to the next.

See full details here: https://plausible.io/data-policy


cool thanks. sorry, still can't see the other question even after you made me aware of its existence.



Fingerprinting. I'm working on a competing product which is also open-source. This is how I do it: https://github.com/emvi/pirsch/blob/master/fingerprint.go

And I think that is what most privacy-focused solutions do at the moment.



