A tool for recovering passwords from pixelized screenshots (github.com/beurtschipper)
449 points by maydemir on Dec 6, 2020 | hide | past | favorite | 124 comments


Why is pixelation preferable to a big black obliterating box?

To me, it seems like a lot of wasted time and effort and potential arms race between encoders and decoders and the risk of being exposed when you could just put a big black box over whatever you wish to obscure.


>Why is pixelation preferable to a big black obliterating box?

From a security perspective it has zero advantages, but I guess some people like the aesthetics?

>To me, it seems like a lot of wasted time and effort and potential arms race between encoders and decoders and the risk of being exposed when you could just put a big black box over whatever you wish to obscure.

There doesn't need to be an arms race; one can just erase whatever needs to be hidden, put fake random text over it, and then pixelate that. I.e., a prettier "black box", but still the same core method as a black box. In principle it doesn't even need to be any more work: it'd be easy enough to throw together a simple "pixelate erase" script that'd take a selection, apply an average solid color and random text over it, then pixelate the result, all in one automated step.

The problem is when people think any modification of the original information is good enough. Anything based on the original could leak information, so you need to erase first and then apply any desired aesthetics afterward.
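Such a script could be sketched as below (a pure-Python toy on a grayscale grid; in practice you'd use Pillow or similar, and the fake "text" here is just random dark pixels). The key property: because the region is erased first, the pixelated output is independent of whatever secret was there.

```python
import random

def erase_then_pixelate(img, x0, y0, w, h, block=4, fill=128):
    """Overwrite a region with a constant fill (destroying the original
    pixels), then apply a box-filter pixelation over that region."""
    out = [row[:] for row in img]
    # Step 1: erase -- nothing derived from the original survives.
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            out[y][x] = fill
    # Step 2 (aesthetics only): scatter fake random "text" pixels.
    rng = random.Random(0)  # fixed seed: fake content is reproducible
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            if rng.random() < 0.2:
                out[y][x] = 0
    # Step 3: pixelate with a simple box filter (average per block).
    for by in range(y0, y0 + h, block):
        for bx in range(x0, x0 + w, block):
            ys = range(by, min(by + block, y0 + h))
            xs = range(bx, min(bx + block, x0 + w))
            avg = sum(out[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out
```

Pixelating two screenshots that differ only inside the redacted region yields identical output, so there is nothing left for a Depix-style tool to recover.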


Yep, just replace the original password with "password" and pixelate that, if it really must look like a real password. And make sure never to use "password" as a real password, but that's a different level of shooting oneself in the foot.


Or better yet, embed easter-eggs. Possibly insults to the people prying into your secrets, or possibly alternative announcements.


Or better yet, put "Itriedtodepixelizeyourpassword".


"Depix Error: Please report on GitHub"


"All Depix users are reported to the FBI."


"x is not in the depixers file. This incident will be reported."

https://xkcd.com/838/


Be sure to drink your ovaltine


> replace the original password with "password" and pixelate that, if it really must look like a real password

Or replace the original password with "password" and leave it at that.


Maybe replace it with a black strip and pixelate that


Why pixelate something that isn't a real password?


To make it look like it was a password. Only people who depixelate it will notice what the actual string is.

I think this whole pixelation thing is just for making it look nice. I, personally, would just use something like asterisks without any pixelation.


> take a selection, apply average solid color and random text over it, then pixelate the result

Averaging the color leaks information. To avoid leaking info, destroy the pixels as the first step. Get colors from outside the selection.


What useful information could solid color and random text leak even if depixelized? I assume the average solid color is just so the new pixelized region doesn't stand out against the rest of the non-pixelized ones, being an average color.

Applying it to an HN comment the result would be a solid average of #F6F6EF (beige background) and #290027 (text), so a solid darker beige with some random black text over, all pixelized. How can any of this be used to recover the original text?


It could speed up brute-force attacks. Imagine a targeted attack that has both an "average color" of the rendered password (produced by known background and foreground colors and a known typeface) and a strong cryptographic hash of the password, where the hash takes, for example, ~0.1s to calculate.

Before trying to hash a candidate password, the attacker can first calculate its average color to check whether it even remotely matches, which can be much faster than the hash function.

Average color could be a rough predictor of password length too, depending on circumstances.
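A toy sketch of that prefilter in Python: the per-character "ink coverage" table is invented purely for illustration (a real attack would derive it from the known typeface), and PBKDF2 stands in for the slow hash.

```python
import hashlib

# Hypothetical per-character ink coverage (fraction of dark pixels when
# rendered); in a real attack this would come from the known font.
INK = {c: 0.30 + 0.01 * (ord(c) % 10) for c in "abcdefghijklmnopqrstuvwxyz"}

def predicted_average(word):
    """Cheap stand-in for rendering: mean ink coverage of the glyphs."""
    return sum(INK[c] for c in word) / len(word)

def expensive_hash(word):
    """Slow password hash (PBKDF2 as a stand-in for the target's KDF)."""
    return hashlib.pbkdf2_hmac("sha256", word.encode(), b"salt", 200_000)

def crack(candidates, target_avg, target_hash, tol=0.005):
    """Hash only candidates that pass the cheap average-color check."""
    hashed = 0
    for word in candidates:
        if abs(predicted_average(word) - target_avg) > tol:
            continue  # rejected by the cheap color check, no hash needed
        hashed += 1
        if expensive_hash(word) == target_hash:
            return word, hashed
    return None, hashed
```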


Ok, maybe I had a different interpretation of the scenario OP presented. I read the "solid block of average color" as simply an average of 2 colors, not a weighted average. So black text on white background would always result in rgb(128,128,128) (#808080) regardless of how many pixels of each color you had in the block. The only things that leaks are the color of the background and of the text but these are already known from the surrounding parts.

This makes sense for purely aesthetic reasons because the pixelized block will not stand out against the combination of background + text around it. A white page with black text would have a gray fuzzy area where it's pixelized.

If a weighted average is used and you can determine the fill factor of the text inside the box that would be some information leakage, as small as it may be.


In principle it leaks information, so it is always worse practice than not leaking. For example, one could imagine an edge case where, given a set of average-colour password images collected over time, for a password field whose presentation has a dynamic element that is exposed in the averaging, useful information could be discerned and used to supplement other attack vectors.

Say for example, the password field had one of those password strength colour indicators that subtly changed the background or font colour to indicate password strength, and the images were captured and obfuscated at that point. Information useful to an attacker would now be embedded in the colour average.


Because pixelation doesn't attract attention as much as a black box and is therefore a better placeholder. A black box with sharp edges has much more contrast than all the legit elements in the image and becomes visually dominant, whereas typically you want to attract attention to something else on the page.

For a similar reason, designers use "lorem ipsum" placeholder text rather than all-white or all-black placeholders when mocking up a layout.


The clear answer is to use pixelated lorem ipsum for obfuscation. :)


Boom, startup idea right there. Provide secure-pixelize as a service, and monetize by selling on the side whatever info was sensitive enough for people to want to hide


White box then or a bunch of


This reminds me of the Markup tool[1] on iPhone. People use it to redact from screenshots, but it was (is?) actually slightly transparent by default.

[1] https://9to5mac.com/2018/03/13/ios-markup-reveal-redact-sens...


One of the "Underhanded C" contest winners many years ago was an entry for a redaction tool, which drew black boxes on an image in such a way that the software developer could later effectively recover the redacted information easily, while still being able to claim plausible deniability. The trick was that it produced a PPM file where single-digit pixel values became "0", double-digit values became "00", and three-digit values became "000" (all valid values for black in PPM).
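A minimal reconstruction of the encoding in Python (the actual contest entry was in C; this only demonstrates the digit-count leak):

```python
def redact_ppm_values(values):
    """'Black out' pixel values for a plain-text (P3) PPM, but emit as many
    zero digits as the original value had decimal digits. Every token is a
    legal spelling of black, so the output image looks fully redacted."""
    return " ".join("0" * len(str(v)) for v in values)

def leaked_digit_counts(tokens):
    """What the tool's author can later recover: each pixel's magnitude
    class (1 digit: 0-9, 2 digits: 10-99, 3 digits: 100-255)."""
    return [len(t) for t in tokens.split()]
```

Every token renders as black, but the token lengths reveal each original value's magnitude class.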


Reminds me of an old Perl module which translated code into a trinary alphabet with the characters space, tab, and new line. The resulting coded file is just an import statement and whitespace, but runs just fine.



Wait, shit I have totally done this. God damn it. Fortunately nothing super sensitive but damn.


Interesting. I recently hacked together a tool that shows the area on the screen I point to at maximum threshold, to make things like this visible. I have it mapped to a hotkey and I fire it up every time I see a redacted area.

https://github.com/guidovranken/threshold-zoom



Even big black boxes require some amount of skill.

https://twitter.com/Phthalaldehyde/status/133471474074092748...


Other fun things I've seen (e.g. when double-blind reviewing):

- Someone makes the background color black in Word and saves as PDF. The text is still selectable.

- Someone draws a black rectangle over the text in Word. The text is still searchable, and the rectangle can be removed with any simple PDF editor.


Can't facepalm enough. Just, amazing incompetence.


People say the same thing about scans of printouts. But I guess there is a reason for those.


Those at least destroy any non-visible data.


I feel like contrast enhancement or similar would be able to bring that back.


That’s also why they scan in b&w with a potato. Also has the side-effect of making many graphs/charts impossible to interpret.


> "with a potato"

This is either a typo or I want to know more about it.


It's a running Internet gag: if the quality of something produced by a technical product is bad, you might say, "Was this filmed/calculated/etc. with a potato?"

https://www.google.com/amp/s/amp.knowyourmeme.com/memes/reco...


But customers are always right, which is why Acrobat these days has special features to do this. Besides, deleting the letters under it could affect layouts.


> Besides, deleting the letters under it could affect layouts.

Which is in and of itself a security risk. Not only can the length of the redaction give an estimate of the length of the password, but for variable-width fonts, it could even rule out many passwords using pixel measurements between the text on either side.


The only reason I can think of: pixelation has a more elegant touch than covering everything with a box (be it black, or some better-suited color).

It took a while to convince people not to use a bit of Gaussian blur, because it's insecure. Well, ready for round 2...


What stops one from making a more secure pixelation algorithm?

One that will shuffle around or maybe just randomly derive the pixelated blocks if they're within a certain color threshold of similarity?

I still generally like pixelation over drawing a box for aesthetic purposes.


Why bother? Just have the tool put random text to fit then pixelize that.


Well, you can increase the size of the blocks, maybe to full font height. But at some point in the future, some ML guy will probably decipher even that.


For elegance, it's a bit artificial, being tied to the technology of things that use pixels and popular software that zooms in on pixelated images as a grid of single-color squares. Maybe some unintelligible character-like symbols would look more elegant?


A pixelated area communicates more clearly that there’s information present that is hidden from you- but not from others. It also can’t be mistaken for a design element.


For stylistic reasons.

It looks less jarring when pixelated compared to being blackened out.


Why always black boxes? The box could be of any color. In this case, white works best.


It's not. There's never going to be a tool that does this with black bars.

The only benefit I can think of is legitimacy. Having something blurred there suggests that there was actually something there.


There is still a risk to leak the length of the password. If the font is known and the size of the rendered text can be inferred that would limit the search space considerably.


Personal snippet: if I want to hide/cross out something I wrote on paper, I usually change letters to something else first, like u to g or i to d (nothing preplanned, whatever comes to mind), before I cross out the text with forward slashes, backward slashes, and horizontal lines.


I sometimes write down my banking passphrase on paper (the bank makes me select letters from it at specific positions). Afterwards I overwrite it a few times with the alphabet, a few numbers, and other random letters. It's totally unreadable.


I thought I was the only one!

Such a great trick. Just write random letters or even random sentences over your handwriting perhaps 2 or 3 times, then heavily cross it out and it's impossible to recover.


> There is still a risk to leak the length of the password. If the font is known and the size of the rendered text can be inferred that would limit the search space considerably.

I do a lot of screencast videos and what I typically do is put a black rectangle well past the point where my secret text really ends.

If my secret is:

    mysupersecretpassword
You might end up seeing a black bar of:

    ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

What I'm most afraid of is accidentally having literally 1 frame of video where the bar isn't present. This is super easy to do if you're just scrolling through a timeline while editing. You need to really dive in and move frame by frame to ensure you get everything. I usually extend it a few frames in either direction just to be safe too.


Your fear is one of the reasons I have z/x bound to 1 frame backwards/forwards in my editing software, it lets me quickly look for such issues.

However, glitches in editing software can still rarely occur, so for very confidential footage I first censor it, render it out in an intermediary format, check the render to make sure it's 100% correctly censored, and then use that rendered footage to continue editing.

Also, mixing/messing with frame rates is an easy way to shoot yourself in the foot.


Hi, I'm thinking of learning how to edit videos. What software do you use, and what is the best way to learn it, for someone who isn't a master at photo/video editing, but is a power user of computers generally?


YMMV but I used a free trial of Adobe Premiere Pro to learn the basic concepts for editing clips produced by my Nikon D850 DSLR. At work I had access to Premiere Elements and decided this was good enough for me, as a hobbyist, given that going with Pro would massively increase my existing monthly fee to Adobe. So I paid the one-off license fee for Elements.

I'm currently looking at BlackMagic's DaVinci Resolve, which has a quite well featured video and audio editing package for free. But 'all' the free version does is edit video clips. You have to pay for additional functionality. Premiere Elements, in contrast, has lots of useful things like creating/handling non-video assets such as text for titles and credits, loads of (sometimes cheesy) clip transitions, etc etc. Resolve may be okay if you hate Adobe.

You'll need something more sophisticated if you are targeting non-standard playback devices, or want to participate in collaborative workflows, or grade raw sensor clips (assuming you can get this data off your video device). As a real power user you might also want to look at ffmpeg, something I simply haven't had time to get my head round.

Again, YMMV.

[Edit] I consider myself an 'advanced' stills photographer, who is much happier using manual rather than auto settings. When I started with video, however, I quickly discovered how much I didn't know. For example, what is the relationship between the shutter speed of the camera and the frame rate of the video? This was helpfully ignored in the camera's manual, and took some googling to find out. Video autofocus is also pants, on the D850, for the type of wildlife and action photography I like (it always refocuses just as the bird lands). So I'm teaching myself to use manual focusing, just like they do in the movies. A video tripod head was also essential so I could pan, elevate and focus with only two hands rather than three.


> I'm currently looking at BlackMagic's DaVinci Resolve, which has a quite well featured video and audio editing package for free. But 'all' the free version does is edit video clips. You have to pay for additional functionality. Premiere Elements, in contrast, has lots of useful things like creating/handling non-video assets such as text for titles and credits, loads of (sometimes cheesy) clip transitions, etc etc.

The free version lets you create titles, animations, transitions and you get access to their after effects comparable tool for fancy animations as well as color grading and audio / video editing.

It's a very reasonable free offering. I use it all the time to edit podcasts. I want to use it for editing video too but I have issues with it adding artifacts to the rendered videos, I think it's because it doesn't like my 6 year old GTX 750ti.


Yeah I think blurring is more aesthetic, and until recently was seemingly just as secure. It reminds me of how people have used the Photoshop swirl tool to obscure faces, but you could just use the swirl tool in the other direction to undo the effect.



Black bars alone may still leak info.

First, if you add black bars on your own and don't use professional redaction features of software, you might miss the OCR text layer of the PDF, or the bar might be added as a separate object entirely which means it can also be removed later on.

Second, if you don't use monospace text, the width of the text you are redacting will reveal information about it. That's why monospace fonts are so commonly used in the intelligence community for example.

Third, if you just add a black bar to a screenshot, there might be residual values of the text left in adjacent, seemingly white portions of the image; they might not be entirely white due to compression effects. Better to run it through a filter before publishing.


> Second, if you don't use monospace text, the width of the text you are redacting will reveal information about it.

If you do use monospace text, width reveals the exact redacted character count. If you don't use monospace text, it constrains the possible contents in a more complicated way, but it leaks information either way.


It leaks much less information when monospaced, although that information is easier to retrieve. Anyone can visually inspect and see exactly how many monospaced characters were removed, but someone with the right tools and skills can narrow down the potential characters quite a bit if it's a short piece of non-monospaced text.
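A toy sketch of that narrowing-down in Python; the per-glyph advance widths are invented for illustration, where a real attack would read them from the actual font's metrics.

```python
# Hypothetical per-glyph advance widths (in pixels) for a proportional
# font; real values would come from the font file itself.
WIDTHS = {c: 5 + (ord(c) % 7) for c in "abcdefghijklmnopqrstuvwxyz"}

def text_width(word):
    """Total rendered width of a word under the assumed metrics."""
    return sum(WIDTHS[c] for c in word)

def candidates_for_bar(words, bar_width, slack=1):
    """Keep only words whose rendered width fits the redaction bar."""
    return [w for w in words if abs(text_width(w) - bar_width) <= slack]
```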


Yes: in the PDF, add bars, and then print it again as a PDF, but as an image.


Even black bars all leak the face that there’s something hidden


Do you mean the typeface/font? If it is just a word we are hiding in a sentence/paragraph, it would be like a fill-in-the-blank task for the reader: much more chance to guess from context. There, the leaked length will not matter.


Oh sorry, my bad, you mean fact, not face. Yes, I agree.


ah. yeah fact :-)


This is why the real professionals use the same mechanism they always have: cut the offending text out of the paper with an X-Acto knife and make a photocopy.


Also helps prevent any issues related to double sided printing and bleedthrough.


> First, if you add black bars on your own and don't use professional redaction features of software, you might miss the OCR text layer of the PDF, or the bar might be added as a separate object entirely which means it can also be removed later on.

Looking at you, Preview on macOS.


These kinds of techniques have been discussed here many times. See for example: https://news.ycombinator.com/item?id=8078747

I think one of the reasons people use blurring/pixelation is that it looks nicer than black boxes.

Wondering if it would make sense to build a tool that renders a pixelated version of lorem ipsum or something like that. It would be secure while also looking good.


The added benefit of this would be wasting the time of a would-be hacker


Security by attrition.


Why is this advertised for passwords only? It should work for any pixelated text. Just because cracking passwords sounds like a cool use case? This project is shooting itself in the foot. Plus it's not like passwords are ever pixelated on-screen, they're generally stylized as asterisks. But sweet leet hacking skills bro


A similar method was used to uncover the name of a game mode in a teaser video from Nintendo. Since it was a video, the pixels would change as things moved, and you could reverse the pixelation fairly well.


What about that bitcoin wallet that was blurred on TV and they managed to reverse engineer it?

Cannot find source.


https://www.bbc.com/news/technology-41737248

> Two French hackers used their computer skills to reconstruct a blurred-out code on TV and claim bitcoins worth $1,000 (£760).


It was Bitcoin Cash, so the 3 are worth, today, ~$1320.

I think the hackers could have made more money if they got real jobs doing valuable work.


The money they _directly_ got out of this might not have been their only motivation.


Funny they say bitcoins when 1 bitcoin was at least $1000 back then. I'm pretty sure the usual grammar rules apply so you can't use the plural unless you have at least 2 bitcoins. Super minor but interesting how a vastly deflated currency that is super divisible can mess with our usual language.


Any source for this? That sounds interesting.


It looks like this may be due more to a mistake in the censored area than a general pattern you can use, but here it is: https://twitter.com/Lattie9001/status/1027204063811850240

These kinds of mistakes show up constantly. Another one that often works: when people draw black bars over words, the bars are often not 100% opaque, and you can chuck them into GIMP and drag the levels around until the original word is exposed.
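A toy illustration of that levels trick in pure Python, on grayscale values (the 95% opacity is an assumed number): the almost-opaque bar leaves tiny residual differences, and a linear levels stretch amplifies them back to full contrast.

```python
def composite_bar(pixels, alpha=0.95):
    """Draw an almost-opaque black bar over grayscale pixels (0-255)."""
    return [round(p * (1 - alpha)) for p in pixels]

def stretch_levels(pixels):
    """Linearly remap min..max back to 0..255 (the GIMP 'levels' drag)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]
```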


Reminds me of the case of the child abuser who was caught by simply reversing a Photoshop filter.

Reference: https://boingboing.net/2007/10/08/untwirling-photo-of.html


I tried this to get a feeling how practical this is.

It seems to me that this would only work in very rare situations. It needs some kind of pattern that is created with the same font. This pattern gets really large if you add a lot of chars (e.g., a-z already makes it huge).

I tried this with a really simple example (numbers only, cutting exact frame of pixelated image, same font for pattern) and it didn't produce any useful result.

TBH that doesn't look all that impressive. It's not a "throw pixelated image in here and get the unpixelated result" tool.

(It's still of course valid to make the point that pixelation can be insecure. It's of course very much possible that much better such tools - likely by throwing in some ML - are possible and may even exist in the hands of people who won't share them on github.)


> It needs some kind of pattern that is created with the same font.

If the goal is to de-pixelize some text in a public website (login form, username, phone, etc.) this is as simple as right click > inspect > input the sequence > screenshot.


Relatedly (and this is probably not surprising to anyone here): if you draw a black box on a PDF file to cover sensitive information, chances are a simple screen reader will still be able to extract the information just fine.


The safest way to redact PDFs is to convert them to images, draw black boxes on the sensitive content and create a new PDF from these images.

This way the redacted PDF won't have any metadata from the original (e.g. bookmarks) nor risk any weird PDF redaction bugs.

The downside is that your redacted PDF is (probably) larger than the original and isn't accessible. You can mitigate this by using OCR but it still isn't perfect and has a high chance of messing up tables or any special text placement.

Some fancy combination of encodings (e.g. using JBIG2 except for images) can probably help with the filesize.

---------

The best way to handle sensitive information is probably to use codenames (for people, places, etc) so you can share the original documents without worrying so much.


That's why you always use the Adobe Acrobat Redaction tool. It'll obliterate everything and the metadata too!


Found this out the other day on that story from the port explosion. They had a pdf with blacked out names, but you could just copy the text and paste it and it wasn't blacked out.


Redact ---> print ---> scan ---> distribute.

I think US courts and lawyers are finally starting to learn this.


Paperless edition: Redact ---> convert to image ---> convert to PDF ---> distribute.


Even if you do black box -> separate screenshot? Is that because the OCR is encoded in white pixels you didn't redact?


If you take a screenshot of a PDF file that's been blacked out, then you're fine. There is no more OCR.


Asking for a friend: what's current state of the art in reversing pixelated censorship in Japanese movies? Obviously a tougher problem because the original is not known, but you'd think a well-trained GNN should be able to make a pretty good guess.


Decensoring Hentai with Deep Neural Networks: https://github.com/deeppomf/DeepCreamPy


Interesting, but seems to only work on static cartoon images where regions to uncensor have been manually identified.


IIRC there was another project which does the identifying part automatically, and then it's just a matter of doing it all in a loop for each frame.


The presumable backstory of this repo:

This is really the collision between designers and programmers. For aesthetic reasons, designers try to cover text up with a blur. Programmers see this from a functional standpoint; they see some security issues. Programmers act to prove their methodology is superior, and that is the story of how this repo came into existence.


So as I understand it this technique rests on a number of assumptions:

- You know the exact parameters used to render the text

- You can render new text with the exact same parameters

- The pixelated image hasn’t been ruined by color quantization or other destructive compression


I could see someone assembling a corpus of sample text images covering commonly available fonts across major operating systems and having a version of this tool that brute forces all of them to find the best match.

It would also be interesting to see how well it worked if there were differences in font rendering or compression, as you say. I wonder if it might still be close enough to make a partial match in some cases.


That's basically how I did OCR 20 years ago :) One can also note that the kernels inside a text-trained NN would contain somewhat similar blocks, so what the net would be doing is similar matching.


Where are passwords even pixelized? I always see them as dots.


Let's say, a software manual, where you show some screenshots or make a screencast, but for some reason it's not possible to use dummy data and you need to hide the real data (because it's customers' names, or the like).


I found the author's blog post on LinkedIn more informative:

https://www.linkedin.com/pulse/recovering-passwords-from-pix...


That link is also in the first section of the repository's README.


Seems like pixelization algorithms should add some degree of brightness and chromatic random noise in order to defeat this?

That would preserve the same aesthetics (a blurred username or identifier that doesn't call attention to itself) while making it impossible to match against.

E.g. in Photoshop, the "Filter > Pixelate > Mosaic" option should have a checkbox called "Secure" or "Security noise". Or there should be a separate filter called "Pixelate > Redact". Ideally it would use some intelligence to figure out the size of characters/symbols in the selected area and automatically figure out the right combination of pixel size and noise.
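A rough sketch of such a "secure" pixelation in Python (pure stdlib, grayscale grid; block size and noise amplitude are arbitrary choices). Note that erasing the original first, as suggested elsewhere in the thread, is still the only approach with zero leakage; noise only makes matching harder.

```python
import secrets

def pixelate_with_noise(img, block=4, amplitude=24):
    """Box-filter pixelation, then add cryptographically random noise to
    each block's average so it can't be matched against re-renderings."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            avg = sum(img[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            # Uniform noise in [-amplitude, +amplitude], clamped to 0..255.
            noise = secrets.randbelow(2 * amplitude + 1) - amplitude
            val = max(0, min(255, avg + noise))
            for y in ys:
                for x in xs:
                    out[y][x] = val
    return out
```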


Or flood the whole rectangle with black. If that's the only layer of the picture, there's no need to worry if the "secure noise" is secure enough, or if it stays secure in two years.


Is "pixelized" the norm somewhere? I'm used to "pixelated."


The way I use them, "pixelated" refers to images rendered to a target resolution in pixels. "Pixelized" refers to an image that is transformed to resemble pixel art of lower quality (in terms of resolution and color depth).

A screen from Super Mario Bros is pixelated. That's what the art is supposed to look like.

A screen from Broken Age running in Retro Mode is pixelized. The art isn't actually intended to look like that, and it's just for nostalgic effect.


"pixelated" has a pre-existing meaning in English (discombobulated, airheaded, tipsy). An English speaker already familiar with the word might choose "pixelized" to avoid the collision.


You're thinking of "pixilated," not "pixelated."


Could it be that the author's first language is not English? "Pixelized" sounds like a literal translation of the French word for "pixelated".


Judging from the author's username, he/she is almost certainly Dutch, and in Dutch we tend to use English terminology in most technical matters. But we don't always use it 100% correctly, so that could be a reason.


I've always thought that people's confidence in pixelation filters was unjustified, and that reversal algorithms are eminently achievable. This goes for pixelation of faces for hiding identities, too. Deep learning, or even just genetic algorithms testing lots of possibilities and re-pixelating, should be able to infer the original image.


Reminds me of how pedos were using the blur tool, and by using the unblur tool there you go, haha.

Still a cool edge case.


So that raises the question: can you do the same with faces? Sure, the contrast is a lot worse, but with added frames from video content, maybe?


Lots of suggestions saying to use black boxes instead... unless you save to a PDF and your editor just added your black box as a layer.


I think it would also work on a photo with very small, unreadable text. Or am I missing something?


Old-school way:

    $ 7z a 12345.7z passwords.txt
    $ cat 12345.7z >> cats.jpg
Try: https://i.imgur.com/McqO2eX.jpg


I prefer hard blur for my screenshots (ShareX).


Would a neural network be able to do something like this?


From the article: "Since the linear box filter is a deterministic algorithm, pixelizing the same values will always result in the same pixelated block."

The author used this property to create the de-pixelizing algorithm. A neural network could do it, but it would be overkill.
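That property can be illustrated with a toy lookup-style attack (this is not the actual Depix code; the 1-D "glyph renderings" are invented): pixelate every candidate rendering deterministically and keep the ones whose pixelated form matches the observed blocks.

```python
# Hypothetical 1-D "renderings" of a few characters in a known font.
GLYPHS = {
    "a": (0, 255, 255, 0),
    "b": (255, 255, 0, 0),
    "c": (0, 0, 255, 255),
}

def box_pixelate(cells, block=2):
    """Deterministic 1-D box filter: average each block of values."""
    return tuple(
        sum(cells[i:i + block]) // block
        for i in range(0, len(cells), block)
    )

def depixelate(observed, candidates, block=2):
    """Pixelate every candidate rendering; return exact matches."""
    return [c for c in candidates
            if box_pixelate(GLYPHS[c], block) == observed]
```

Because the same input always produces the same pixelated block, an exact match identifies the original; a neural network isn't needed for this step.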


“recovering”



