Fwaf – Machine Learning Driven Web Application Firewall (fsecurify.com)
79 points by Faizann20 on May 14, 2017 | 18 comments


It seems like what it actually "learned" is no better than banning some keywords, for example:

  >>> p=lambda x:lgs.predict(vectorizer.transform([x]))
  >>> p("/product.php?name=etc")
  array([1])
  >>> p("/login.php?name=rfoo&pass=hehe")
  array([1])
  >>> p("/download.php?file=/root/.bashrc")
  array([0])
  >>> p("/example/test/q=" + lorem + "<script>alert(1)</script>") # len(lorem) = 4488
  array([0])
(FYI 1 means malicious and 0 means clean)
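
For comparison, a hypothetical keyword filter (banned terms picked here purely to match those four outputs, not taken from the project) behaves the same way with no learning at all:

```python
# A trivial substring ban that reproduces the four predictions above:
# "etc" and "pass" are flagged, while a real path-traversal attempt and
# an XSS payload slip through. The banned list is invented for illustration.
def keyword_ban(query, banned=("etc", "pass")):
    # Flag the query iff it contains any banned substring.
    return 1 if any(k in query for k in banned) else 0

keyword_ban("/product.php?name=etc")             # flagged
keyword_ban("/login.php?name=rfoo&pass=hehe")    # flagged
keyword_ban("/download.php?file=/root/.bashrc")  # missed, like the model
```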


If you have a look at the data, there is a wide variety of malicious commands that it can detect.


Doesn't look like he did any cross-validation, hence the high accuracy. Always keep a hold-out set to test against.


In case someone is having difficulty with the link, here is an alternative:

https://web.archive.org/web/20170514081124/http://fsecurify....

Apologies for the inconvenience.

Thanks


Like others have said, you might be overfitting your training data here: your model is just memorising the examples you give it and would fail if somebody slightly varied a payload (by inserting some whitespace, say).
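
As an illustration (the memorised pattern here is invented), an exact-substring blacklist learned by rote breaks on a trivially perturbed payload:

```python
# A blacklist that has "memorised" one payload verbatim from training data.
memorised = {"<script>alert(1)</script>"}

def flags(query):
    # Substring match against memorised payloads only.
    return any(p in query for p in memorised)

flags("/q=<script>alert(1)</script>")   # caught: seen verbatim in training
flags("/q=<script >alert(1)</script>")  # evaded: one extra space
```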

Another thing to keep in mind is that an accuracy of 99% doesn't mean much in an unbalanced problem like yours (many more clean queries than malicious ones).

What you should show instead is precision (of the ones labeled malicious, how many are actually malicious?) and recall (out of the malicious queries in the dataset, how many did your model label as malicious?)
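
A toy example of why this matters (counts invented for illustration): a degenerate model that labels everything clean still gets 95% accuracy while catching zero attacks.

```python
# 95 clean queries, 5 malicious, and a "model" that predicts clean for all.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0

print(accuracy)  # high, despite the model being useless
print(recall)    # zero: no malicious query was caught
```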


Try to also do cross-validation (check sklearn's stratified K-Folds) to be more confident you're not just overfitting.
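
A minimal sketch of that, assuming scikit-learn is installed; the toy imbalanced dataset below is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.RandomState(0)
# 90 "clean" points around 0 and 10 "malicious" points around 3 (9:1 imbalance).
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

# StratifiedKFold keeps the 9:1 class ratio in every fold, so each test
# fold actually contains malicious samples to score recall against.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="recall")
print(scores.mean())
```

Scoring with `recall` rather than `accuracy` is deliberate, given the class imbalance discussed above.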


The similarities in name _and_ logo to F-Secure are a little bothersome.


They have a straight F :p



Fun, will look into this! Wonder if anyone can point to other datasets?


The website is running a bit slowly, but the page does load. If you are having any problems, please wait a minute and the page will load.


All I got is a blank page.



Why use trigrams (n = 3)?


I tried different n-grams and trigrams performed best.


Trigrams are a known sweet spot. It's a practical heuristic; I have no proofs to link. I wrote a Wikipedia parser and witnessed similar results in practice.

Going above 3 adds very, very little precision while guzzling space. Using 2 shows a drop in precision.

Probably has to do with how human-built systems are built (we make them in our image, and we seem to have a thing for threes: map(subject, verb, object) -> (origin, data, destination), etc.)

Addendum: if you know what PCA is, I'd wager that added n-gram dimensions share a linear dependency with lower dimensions, so a statistical resemblance (covar(A,B) -> 0) that adds very little to the data's variability once you start adding dims above 3.
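
The "guzzling space" side of the trade-off is easy to eyeball by counting distinct character n-grams on a few sample queries (the queries below are made up; the exact numbers depend entirely on the corpus):

```python
# Sketch: the distinct character n-gram vocabulary grows with n,
# inflating the feature space. Sample queries are invented.
samples = [
    "/login.php?name=admin",
    "/download.php?file=report.pdf",
    "/product.php?name=etc",
    "/search.php?q=hello+world",
]

def char_ngrams(text, n):
    # All length-n character substrings of text.
    return {text[i:i + n] for i in range(len(text) - n + 1)}

sizes = {}
for n in (2, 3, 4):
    vocab = set()
    for s in samples:
        vocab |= char_ngrams(s, n)
    sizes[n] = len(vocab)
print(sizes)
```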


Why the downvotes without a correction?


Woo, saving this. I'm planning a similar project once I'm done with my studies.



