
https://github.com/intel/hyperscan/ implements this, along with multi-pattern regex matching and a PCRE-compatible engine (Chimera).
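
The multi-pattern compile is the interesting part. Here's a minimal sketch using the third-party python-hyperscan bindings; the keyword arguments follow those bindings' documented usage, so treat the exact signatures as an assumption:

    import hyperscan

    db = hyperscan.Database()
    patterns = (
        (rb"fo+", 0, 0),
        (rb"evil\d+", 1, hyperscan.HS_FLAG_CASELESS),
    )
    expressions, ids, flags = zip(*patterns)
    # All patterns are compiled into one database and matched in a single pass.
    db.compile(expressions=expressions, ids=ids,
               elements=len(patterns), flags=flags)

    def on_match(pattern_id, start, end, flags, context):
        print(f"pattern {pattern_id} matched ending at byte {end}")

    db.scan(b"EVIL42 foo", match_event_handler=on_match)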


Then visit fortune.com; it's one of those sites.


It opens OK. My ad blocker's counter registers 1-2 blocked items for every second I stay there.

If it's different on your PC, it could be because I'm in Montenegro; it's in Europe, but we're not yet in the EU.


I have to disagree with the statement that those techniques are not being used in the wild. I've observed a porn advertising network delivering some JS that opened a popup with a PDF served from a third-party domain (cookies included) in the background and then immediately closed it again. I was wondering what that was about. Now it's clear to me.


A friend of mine might have noticed something similar (on a news site, of course).


It would be very useful if you could point us to such examples! (I'm an author of the paper)


Pornhub. It could of course be a popup playing a different role (e.g. being part of a "you need to upgrade your vulnerable software naow!1" scheme) that's only visible if no blockers at all are used.


What's the nature of your data? Is it static or dynamic?


^ Important. When I'm trying to provide free access, I always look for ways to mitigate cost. If the API can update slowly, say once a day, then I stick everything behind a static CDN or similar so it costs next to nothing: just a small CPU somewhere to rebuild the JSON endpoints each update cycle.
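
For example, a minimal sketch of that build step in Python; fetch_latest_records and the output paths are hypothetical placeholders:

    import json
    import pathlib

    OUT = pathlib.Path("public/api/v1")

    def fetch_latest_records():
        # Hypothetical data source; swap in your real query/aggregation.
        return [{"id": 1, "value": "example"}, {"id": 2, "value": "demo"}]

    def rebuild_endpoints():
        # Run once per update cycle (e.g. a daily cron job); a CDN in
        # front then serves the files for next to nothing.
        OUT.mkdir(parents=True, exist_ok=True)
        records = fetch_latest_records()
        (OUT / "records.json").write_text(json.dumps(records))
        # One file per record gives cheap /api/v1/<id>.json lookups.
        for r in records:
            (OUT / f"{r['id']}.json").write_text(json.dumps(r))

    if __name__ == "__main__":
        rebuild_endpoints()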


That's been exactly my experience. Most of the time is spent connecting or waiting for the server response (TTFB). Using an async I/O event-loop approach on top of epoll/kqueue, you can handle thousands of concurrent connections. You then push the responses to your worker nodes, which process the data in a multi-threaded fashion. Stream-processing frameworks like Apache Spark or Storm work great for that.
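
A minimal sketch of that fetch stage, assuming the third-party aiohttp library (Python's asyncio event loop sits on epoll on Linux and kqueue on BSD/macOS); the URLs are placeholders:

    import asyncio
    import aiohttp  # third-party: pip install aiohttp

    URLS = [f"https://example.com/page/{i}" for i in range(1000)]

    async def fetch(session, sem, url):
        # The semaphore caps concurrency; the event loop multiplexes
        # all open sockets on a single thread.
        async with sem:
            async with session.get(url) as resp:
                return url, resp.status, await resp.read()

    async def main():
        sem = asyncio.Semaphore(500)
        async with aiohttp.ClientSession() as session:
            results = await asyncio.gather(
                *(fetch(session, sem, u) for u in URLS),
                return_exceptions=True)
        # Hand the bodies off to worker nodes / a stream processor here.
        ok = sum(1 for r in results if not isinstance(r, Exception))
        print(ok, "pages fetched")

    asyncio.run(main())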


Crawling may be cheap, but you also want to store that data and make it queryable without waiting minutes for an answer. That makes it way more expensive.


GitLab exposes a list of newly created projects over its API: https://gitlab.com/api/v4/projects?order_by=created_at

And GitHub allows searching by commit hash: https://github.com/search?q=hash%3A04e699c8bc970423f243eca3e...

By combining those two you could get a list of projects that exist on both GitLab and GitHub. Using the created_at field from both APIs, you could figure out which one was there first and which one was imported/pushed onto the other platform (see the sketch below).

(You would of course miss all projects that have already been deleted on GitHub, although forks should still exist, which should help in most cases.)
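
Roughly, the comparison could look like this in Python with requests. The endpoint paths come from the public docs, but pagination, auth, and rate limiting are glossed over, and bridging the two APIs via a commit SHA from GitLab's commits endpoint is my assumption:

    import requests

    GITLAB = "https://gitlab.com/api/v4"
    GITHUB = "https://api.github.com"
    # GitHub's commit search long required this preview media type and is
    # heavily rate limited without authentication.
    GH_HEADERS = {"Accept": "application/vnd.github.cloak-preview+json"}

    # 1. Newly created public projects on GitLab (first page only).
    projects = requests.get(f"{GITLAB}/projects",
                            params={"order_by": "created_at"}).json()

    for project in projects:
        # 2. Take one commit SHA from the project's default branch.
        commits = requests.get(
            f"{GITLAB}/projects/{project['id']}/repository/commits").json()
        if not commits:
            continue
        sha = commits[0]["id"]

        # 3. Search GitHub for the same commit hash.
        hits = requests.get(f"{GITHUB}/search/commits",
                            params={"q": f"hash:{sha}"},
                            headers=GH_HEADERS).json().get("items", [])
        for hit in hits:
            full_name = hit["repository"]["full_name"]
            # 4. Compare creation timestamps to guess the import direction.
            repo = requests.get(f"{GITHUB}/repos/{full_name}").json()
            print(project["path_with_namespace"], project["created_at"],
                  "<->", full_name, repo.get("created_at"))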


Maybe I'm missing the obvious, but would you mind elaborating on how this strategy works?


Herd mentality is very much in evidence in the stock market; there's often no logic to it. By standing apart from the herd, and sometimes running in the opposite direction, you can often make a killing. Example: the 2007-2008 crash, when a lot of people panicked and sold because the market dropped (because a lot of other people were also selling).


In my experience it's mostly the same.

Since demand far exceeds supply, most consultancies take on less experienced candidates, then train them and/or pair them with a more experienced colleague.

I've worked for several smaller consultancies focusing on different domains and technology stacks, and I always learned on the job.


If you don't mind me asking, how do you think this helped or hurt your career?


tl;dr: TLS inspection is just another tool in your toolbox to control your corporate network traffic. While it might help to avert infections and detect exfiltration traffic, it's by no means required by GDPR.

The reasoning for this is mostly:

Malware can use TLS to load malicious payloads and to exfiltrate data, and data loss and data breaches are exactly what the GDPR targets. Decrypting the traffic lets you detect the malicious activity, prevent the infection, and notice the exfiltration, which can help you stay GDPR compliant.

IANAL, but as long as it's a black box, the traffic is neither stored nor otherwise accessible, the logs don't contain any personal information, and the users know about this processing, it should be okay.

