
https://github.com/intel/hyperscan/ implements this, along with multi-pattern regex matching and a PCRE-compatible engine (Chimera).
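
The multi-pattern compile is the interesting part. Here's a minimal sketch using the third-party python-hyperscan bindings; the keyword arguments follow those bindings' documented usage, so treat the exact signatures as an assumption:

    import hyperscan

    db = hyperscan.Database()
    patterns = (
        (rb"fo+", 0, 0),
        (rb"evil\d+", 1, hyperscan.HS_FLAG_CASELESS),
    )
    expressions, ids, flags = zip(*patterns)
    # All patterns are compiled into one database and matched in a single pass.
    db.compile(expressions=expressions, ids=ids,
               elements=len(patterns), flags=flags)

    def on_match(pattern_id, start, end, flags, context):
        print(f"pattern {pattern_id} matched ending at byte {end}")

    db.scan(b"EVIL42 foo", match_event_handler=on_match)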


Then visit fortune.com; it's one of those sites.


It opens OK. My ad blocker's counter registers 1-2 blocked items for every second I stay there.

If it's different on your PC, it could be because I'm in Montenegro; it's in Europe, but we're not yet in the EU.


I have to disagree with the statement that those techniques are not being used in the wild. I've observed a porn advertising network delivering some JS that opened a popup with a PDF served from a third-party domain (cookies included) in the background and then immediately closed it again. I was wondering what that was about. Now it's clear to me.


A friend of mine might have noticed something similar (on a news site, of course).


It would be very useful if you could point us to such examples! (I'm an author of the paper)


Pornhub. It could of course be a popup playing a different role (e.g. being part of a "you need to upgrade your vulnerable software naow!1" scheme) that's only visible if no blockers at all are used.


What's the nature of your data? Is it static or dynamic?


^ Important. When I'm trying to provide free access, I always look for ways to mitigate cost. If the API can update slowly, say once a day, then I stick everything behind a static CDN or similar so it costs next to nothing: just a small CPU somewhere to rebuild the JSON endpoints each update cycle.
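
For example, a minimal sketch of that build step in Python; fetch_latest_records and the output paths are hypothetical placeholders:

    import json
    import pathlib

    OUT = pathlib.Path("public/api/v1")

    def fetch_latest_records():
        # Hypothetical data source; swap in your real query/aggregation.
        return [{"id": 1, "value": "example"}, {"id": 2, "value": "demo"}]

    def rebuild_endpoints():
        # Run once per update cycle (e.g. a daily cron job); a CDN in
        # front then serves the files for next to nothing.
        OUT.mkdir(parents=True, exist_ok=True)
        records = fetch_latest_records()
        (OUT / "records.json").write_text(json.dumps(records))
        # One file per record gives cheap /api/v1/<id>.json lookups.
        for r in records:
            (OUT / f"{r['id']}.json").write_text(json.dumps(r))

    if __name__ == "__main__":
        rebuild_endpoints()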


That's been exactly my experience. Most of the time is spent connecting or waiting for the server response (TTFB). Using an async I/O event-loop approach on top of epoll/kqueue, you can handle thousands of concurrent connections. You then push the responses to your worker nodes, which process the data in a multi-threaded fashion. Stream-processing frameworks like Apache Spark or Storm work great for that.
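
A minimal sketch of that fetch stage, assuming the third-party aiohttp library (Python's asyncio event loop sits on epoll on Linux and kqueue on BSD/macOS); the URLs are placeholders:

    import asyncio
    import aiohttp  # third-party: pip install aiohttp

    URLS = [f"https://example.com/page/{i}" for i in range(1000)]

    async def fetch(session, sem, url):
        # The semaphore caps concurrency; the event loop multiplexes
        # all open sockets on a single thread.
        async with sem:
            async with session.get(url) as resp:
                return url, resp.status, await resp.read()

    async def main():
        sem = asyncio.Semaphore(500)
        async with aiohttp.ClientSession() as session:
            results = await asyncio.gather(
                *(fetch(session, sem, u) for u in URLS),
                return_exceptions=True)
        # Hand the bodies off to worker nodes / a stream processor here.
        ok = sum(1 for r in results if not isinstance(r, Exception))
        print(ok, "pages fetched")

    asyncio.run(main())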


Crawling may be cheap, but you also want to store that data and make it queryable without waiting minutes for an answer. That makes it way more expensive.


GitLab exposes a list of newly created projects over its API: https://gitlab.com/api/v4/projects?order_by=created_at

And GitHub allows searching by commit hash: https://github.com/search?q=hash%3A04e699c8bc970423f243eca3e...

By combining those two you could get a list of projects that exist on both GitLab and GitHub. Using the created_at field from both APIs, you could figure out which one was there first and which one was imported/pushed onto the other platform (see the sketch below).

(You would of course miss all projects that have already been deleted on GitHub, although forks should still exist, which should help in most cases.)
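
Roughly, the comparison could look like this in Python with requests. The endpoint paths come from the public docs, but pagination, auth, and rate limiting are glossed over, and bridging the two APIs via a commit SHA from GitLab's commits endpoint is my assumption:

    import requests

    GITLAB = "https://gitlab.com/api/v4"
    GITHUB = "https://api.github.com"
    # GitHub's commit search long required this preview media type and is
    # heavily rate limited without authentication.
    GH_HEADERS = {"Accept": "application/vnd.github.cloak-preview+json"}

    # 1. Newly created public projects on GitLab (first page only).
    projects = requests.get(f"{GITLAB}/projects",
                            params={"order_by": "created_at"}).json()

    for project in projects:
        # 2. Take one commit SHA from the project's default branch.
        commits = requests.get(
            f"{GITLAB}/projects/{project['id']}/repository/commits").json()
        if not commits:
            continue
        sha = commits[0]["id"]

        # 3. Search GitHub for the same commit hash.
        hits = requests.get(f"{GITHUB}/search/commits",
                            params={"q": f"hash:{sha}"},
                            headers=GH_HEADERS).json().get("items", [])
        for hit in hits:
            full_name = hit["repository"]["full_name"]
            # 4. Compare creation timestamps to guess the import direction.
            repo = requests.get(f"{GITHUB}/repos/{full_name}").json()
            print(project["path_with_namespace"], project["created_at"],
                  "<->", full_name, repo.get("created_at"))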


Maybe I'm missing the obvious, but would you mind elaborating on how this strategy works?


Herd mentality is very much in evidence in the stock market; there's often no logic to it. By standing apart from the herd, and sometimes running in the opposite direction, you can often make a killing. Example: the 2007-2008 crash, when a lot of people panicked and sold because the market dropped (because a lot of other people were also selling).


In my experience it's mostly the same.

Since demand far exceeds supply, most consultancies take on less experienced candidates, then train them and/or pair them with a more experienced colleague.

I've worked for several smaller consultancies focusing on different domains and technology stacks, and I always learned on the job.


If you don't mind me asking, how do you think this helped or hurt your career?


tl;dr: TLS inspection is just another tool in your toolbox to control your corporate network traffic. While it might help to avert infections and detect exfiltration traffic, it's by no means required by GDPR.

The reasoning for this is mostly:

Malware can use TLS to load malicious payloads and to exfiltrate data, and data loss and data breaches are exactly what the GDPR targets. Decrypting the traffic lets you detect the malicious activity, prevent the infection, and notice the exfiltration, which can help you stay GDPR compliant.

IANAL, but as long as it's a black box, the traffic is neither stored nor otherwise accessible, the logs don't contain any personal information, and the users know about this processing, it should be okay.

