Regex Puzzle

bluesmoon · on July 6, 2017

Well, there is https://regexcrossword.com/

samjs · on July 6, 2017

The OP converted into this format: https://regexcrossword.com/playerpuzzles/595e5542d2433

angry_albatross · on July 6, 2017

There's an error in the second "Across" expression. [^PZVJG]{4}(.)[EFUG]{6}(.)[^\sPZVJI]{2} should be [^PZVJG]{4}(.)[EFUG]{6}\1[^\sPZVJI]{2}

ColinDabritz · on July 6, 2017

This threw me off for a bit.

The \1 is from the original puzzle, and refers to the value of the first (capture group). I ran into this issue while filling out the puzzle. The \1 is required to 'propagate' the value in that square to other places. The puzzle is ambiguous without this fix.

Thanks for the pointer, I've added the note in the comments. Hopefully the puzzle is editable.

Lovely puzzle, and it's a great quote. :)
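
A minimal Python sketch of the difference between the two versions of that clue. The two 14-character test rows are invented for illustration and are not taken from the puzzle's solution; the point is only that `(.)` accepts any character in the 12th square, while `\1` forces it to repeat the 5th.

```python
import re

# The two variants of the second "Across" clue quoted in the thread.
LOOSE = re.compile(r"[^PZVJG]{4}(.)[EFUG]{6}(.)[^\sPZVJI]{2}")
FIXED = re.compile(r"[^PZVJG]{4}(.)[EFUG]{6}\1[^\sPZVJI]{2}")

# Hypothetical rows, made up for this example only.
same = "AAAA" + "R" + "EEEEEE" + "R" + "AA"  # squares 5 and 12 agree
diff = "AAAA" + "O" + "EEEEEE" + "R" + "AA"  # squares 5 and 12 differ

loose_accepts_diff = LOOSE.fullmatch(diff) is not None
fixed_accepts_diff = FIXED.fullmatch(diff) is not None
fixed_accepts_same = FIXED.fullmatch(same) is not None
```

With the loose pattern both rows pass, which is why the square looked under-constrained; with the backreference only the row whose two squares agree passes.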

rpearl · on July 6, 2017

Fixed the typo: https://regexcrossword.com/playerpuzzles/595e8c8e86584

kreetx · on July 6, 2017

Nice!

bshimmin · on July 6, 2017

Brilliant. My dad is 71, loves puzzles (like cryptic crosswords and Sudoku), is a huge technophobe, and has just retired. This should keep him busy until about 2022.

baron816 · on July 6, 2017

Technophobe or technophile?

bshimmin · on July 6, 2017

Technophobe. Just acquiring the necessary Google skills to find out what the proper regex rules are will probably take him till Christmas. But hey, the man loves a challenge.

sergiosgc · on July 6, 2017

Technophobe, probably. A technophile crossword aficionado should have this solved by dinner time.

canada_dry · on July 6, 2017

Regex is one of those tools that I use a couple times a year - usually for cleaning up lousy input data.

I always end up spending a fair amount of time using tools like:

http://regex.inginf.units.it/

https://regex101.com/

http://www.regexr.com/

And of course stackoverflow.

carom · on July 6, 2017

I use https://www.debuggex.com/ a lot when it's a complex expression. The visualization really helps.

squeaky-clean · on July 6, 2017

I will always keep a Windows install at the ready just for RegexBuddy. I use it mostly to take a regex and generate the code I need for it (e.g. find the first numbered group in match in javascript), without having to remember language specific details.

rgb122 · on July 6, 2017

Why don't you just learn Linux and pcre? What use are 2-bit windows tools? Don't fill your head with windows - a dead os walking

dahart · on July 6, 2017

I think parent said why: there's no RegexBuddy on Linux. I can see you're new, so I won't be harsh but HN isn't the place for this kind of commentary. Everyone here understands the difference between Windows and Linux. Judging and/or trolling over Windows is boring, take it over to Reddit or something.

squeaky-clean · on July 6, 2017

Yes, thank you. I of course use Linux, but I really don't care to remember the specifics of regex libraries across PHP, Python, JS or Java. So I just work out my regex, and from the drop downs choose "Use->Javascript->Chrome->Get Text From Numbered Group". And it spits out like 6 lines of JS that will handle cases of it either being found or not. You can choose the names of the ingoing and outgoing variables.

You don't always get to pick the flavor of regex engine you're using and I've sort of become (partly because of RegexBuddy) the "regex expert" at the office. Aside from `re` in Python I don't even remember the names of the regex libraries. Why should I?
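
For comparison, here is roughly what that "get text from numbered group" boilerplate looks like in Python's `re`. This is a hand-written sketch, not RegexBuddy output, and the helper name `first_group` is made up for the example.

```python
import re
from typing import Optional


def first_group(pattern: str, text: str) -> Optional[str]:
    """Return the text of capture group 1, or None if the pattern is not found."""
    match = re.search(pattern, text)
    if match is None:
        # No match anywhere in the text.
        return None
    # May also be None if group 1 exists but did not take part in the match.
    return match.group(1)
```

Something like `first_group(r"id=(\d+)", "user id=42")` returns the captured digits, and a text without a match returns `None` instead of raising.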

dahart · on July 6, 2017

The funny part (to me) is that it was already obvious you use Linux or mac -- some flavor of (star)nix -- because you said you keep a Windows install at the ready. That implies to me that Windows isn't your primary OS. I keep a Windows install at the ready too, for a whole bunch of reasons that have nothing to do with how much I like or dislike Windows.

I would love to be able to remember regex specifics from lib to lib and app to app, but try as I might, I can't. I never know if I have lookaheads or backrefs or named captures available and what the syntax is, I can't remember if there are named character classes. I end up reading the docs, again, almost every time I dig into a regex problem. Same reason for me- I use too many flavors of regex libs. If I could stick to one language, I'd have some hope.

I haven't tried RegexBuddy, but now I'm going to because of your comment, thanks for sharing!

squeaky-clean · on July 6, 2017

I highly recommend it if you have to deal with regex a lot. I really wish it was open-source, or there was some OSS alternative as good as it, but oh well. The tools linked above are great for simpler usage.

The built in regex step-debugger is also great, though I've learned that if I have to rely on that, it's probably not a task well suited to regex.

fapjacks · on July 6, 2017

If, however, you're into judging and/or trolling over Linux, then welcome to HN! There are tons of upvotes to be had in comments disparaging Linux.

dahart · on July 6, 2017

I'm sorry you feel that way, that sounds like maybe you've had some bad experiences here and are turning pessimistic about HN. Just because it happens doesn't mean it's universally accepted. Personally I would say exactly the same thing in response to the above comment if it was talking about Linux instead of windows and had nothing substantive to support it. Yes, you can find people who agree with any insults you want to throw out, but how about we try to be the good guys instead?

Find some awesomeness here on HN -- there's a lot of that too -- and comment about it. Build something and show it off or support someone else who's built something cool. Write comments that contribute meaningfully and are positive and you will find more upvotes than you can possibly imagine, if upvotes are what you're going for. And you're right, you can find lots of upvotes (and downvotes too) by judging other people's legitimate choices without even bothering to understand them, but that doesn't make HN a better place, it doesn't help anyone learn, and it might not make you happy in the long run either, even if it's fun for you at the time.

We get to decide if this place is cool and supportive and fun to play in, or lame and thorny and dangerous to share any of your thoughts for fear someone will be an ass about it and insult your choice of tools. I choose the former, and I'm happy to spread the love and request that people refrain from trying to knock others down or engage in flame wars.

I generally upvote people who reply to me just as a way to say thanks for reading what I wrote and engaging with me. Here's one for you. I hope you'll find more peace on HN and get exposure to more of the positive side of it. There are some incredibly amazing people here, and it's true there are also some destructive forces too. I hope you can let the crap roll off and seek out more of the good stuff!

fapjacks · on July 8, 2017

This pipe dream is what drew me to HN ten years ago. And while there are specifically rules against drawing comparisons between HN and reddit, years ago, the parallel made sense. These days, being totally honest, I find reddit to be more approachable and friendly and informative in the types of subreddits I read than HN, the vast majority of times. The reason I keep coming back to HN is for the once-in-a-blue-moon comments which blow your mind. If those comments stopped happening, or if they started happening on reddit, I'd never come back. There is absolutely a creed which is the centerline of HN, and deviation from it -- even totally reasonable, level-headed deviation -- is often punished with downvotes into invisibility. I've specifically created a Chrome browser extension to remove all point/color information from post comments -- and recently uploaded it to Github [0] -- just to keep my own voting habits from being influenced by this brutal kraken of sameness that exists on HN. My post above is a little bit tongue-in-cheek, but in my experience, one of the subjects you can most find downvotes with is the classic win/*nix flamewar. Even something totally reasonable and said in a straightforward, matter-of-fact way which promotes something about Linux in a thread about Windows is annihilated. Unwanted facts and differing viewpoints are erased from existence. That's HN, even if there's some silver lining in there occasionally.

[0] https://github.com/fapjacks/antihnbs

eru · on July 6, 2017

Perl compatible regular expressions are not regular expressions at all. (https://en.wikipedia.org/wiki/Regular_expression#Formal_defi...)

Drdrdrq · on July 6, 2017

Really? I use it fairly often, usually for cleaning input data and similar. But I only use a subset of regex functionalities that works across different engines and that doesn't make problems with escaping strings (no backslashes and similar).

seven800 · on July 6, 2017

You might also be interested in a tool I made which generates random strings that match a given regex:

http://regexicon.com/

Very useful when code reviewing other people's regular expressions.

simlevesque · on July 6, 2017

I love the fact that you can have unit tests for your regexes on regex101.

vacri · on July 6, 2017

As I get more experienced in operations, I find regex to be more and more invaluable. I used to dread doing a regex, now I get enjoyment out of sorting out a tricky one.

rosstex · on July 6, 2017

That first link, oh my god! I can have some fun with this.

hokkos · on July 6, 2017

I've worked on a project where some XSD files defined fields with regex restrictions; some rules over the fields also added other, stricter regexps or negative regexps depending on context, in a format called Schematron. I had to generate XML files conforming to those XSDs, so I used some tools around the Z3 solver and Microsoft.Automata to generate strings conforming to multiple regexps. They would convert the regexps to finite automata and intersect them, then walk the result from the starting state to a final one over a charset.

Links :

https://www.microsoft.com/en-us/research/publication/symboli...

https://www.microsoft.com/en-us/download/details.aspx?id=523...

It now seems to be Open Source (MIT):

https://github.com/AutomataDotNet/Automata
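
The intersect-and-walk idea can be sketched in a few lines of Python. This is not the Automata library's API: the DFAs below are written out by hand as plain dicts (Python's `re` does not expose its automata), and the function name is an assumption made for the example.

```python
from collections import deque

# A DFA is (start, accepting_states, transitions), where
# transitions[state][char] gives the next state; a missing entry rejects.


def shortest_common_string(dfa_a, dfa_b, alphabet):
    """Breadth-first walk of the product automaton.

    Returns the shortest string accepted by both DFAs,
    or None if their intersection is empty.
    """
    start_a, accept_a, delta_a = dfa_a
    start_b, accept_b, delta_b = dfa_b
    start = (start_a, start_b)
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (sa, sb), word = queue.popleft()
        if sa in accept_a and sb in accept_b:
            return word
        for ch in alphabet:
            na = delta_a.get(sa, {}).get(ch)
            nb = delta_b.get(sb, {}).get(ch)
            if na is None or nb is None:
                continue  # one of the two automata rejects this step
            nxt = (na, nb)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + ch))
    return None


# Hand-built DFA for a+b*
A_PLUS_B_STAR = (0, {1, 2}, {0: {"a": 1}, 1: {"a": 1, "b": 2}, 2: {"b": 2}})
# Hand-built DFA for "ends in bb" over the alphabet {a, b}
ENDS_IN_BB = (0, {2}, {0: {"a": 0, "b": 1}, 1: {"a": 0, "b": 2}, 2: {"a": 0, "b": 2}})
# Hand-built DFA for "starts with b" over the alphabet {a, b}
STARTS_WITH_B = (0, {1}, {0: {"b": 1}, 1: {"a": 1, "b": 1}})

both = shortest_common_string(A_PLUS_B_STAR, ENDS_IN_BB, "ab")
neither = shortest_common_string(A_PLUS_B_STAR, STARTS_WITH_B, "ab")
```

A pair of product states is only visited once, so the walk terminates even when the individual languages are infinite.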

eru · on July 6, 2017

There's also redgrep (https://github.com/google/redgrep) that supports intersection and complements of regular expressions.

I am toying with the idea of writing a little game where player A thinks of a regular expression, and player B tries to guess. If B guesses right, they win. If B guesses wrong, A has to provide a false positive and a false negative (if they exist), and B gets to guess again.

Can you think of ways to automate the roles of A and/or B?

long · on July 6, 2017

In computer science academia, this kind of game is called grammar induction (of which inferring regular expressions is a special case).

A classic algorithm for inferring regular expressions was given by Angluin: https://people.eecs.berkeley.edu/~dawnsong/teaching/s10/pape...

(This isn't quite the same setup as you're thinking of but there are a ton of variations on the basic idea)

eru · on July 7, 2017

Thanks. I had figured out that grammar induction was the right word to look for a while ago. (But took me a bit to find it.) I know the paper you linked to, but yes, it's not quite the right setup.

long · on July 7, 2017

There's a conference on grammar induction called ICGI; might wanna browse through the proceedings to see if there's anything closer.

eru · on July 7, 2017

Thanks!

I'm basically interested in the equivalent of the "guessing game" for regular expressions. (See eg https://stackoverflow.com/questions/5440688/the-guess-the-nu... for a rational number solution.)

With a fixed guesser, that would encode all regular expressions / finite automata as sequences of binary digits. (But in an interestingly different way from just serializing the table for a DFA, or writing down the regular expression in ASCII characters.)

long · on July 7, 2017

So I do AI research on something pretty related to the guessing game -- I'll shoot you an email.

hokkos · on July 10, 2017

I think AutomataDotNet can do all that :

Automating regular expression generation seems easy: use RE fragments and aggregate them, or walk the type hierarchy of the RE AST and generate them randomly.

B needs to guess A's RE, so we need to generate examples of strings belonging to it to give hints: this is exactly the use case of AutomataDotNet.

Also, if B guesses an RE that is equivalent to A's RE, it seems unfair not to award a win, so we need to tell whether two REs belong to the same equivalence class. AutomataDotNet does have an AreEquivalent method.

You can automate the generation of a false positive and a false negative with the Minus method, which creates an automaton that accepts A-B or B-A, and then generate an example from it.
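
A rough Python stand-in for player A's side of the game, for anyone who wants to try it without the .NET library. It does not use automata at all: it simply enumerates short strings over a small alphabet, so a result of `(None, None)` only means no difference was found up to `max_len`, not that the two expressions are equivalent. The function name and defaults are assumptions for this sketch.

```python
import re
from itertools import product


def counterexamples(secret, guess, alphabet="ab", max_len=4):
    """Return (false_positive, false_negative) for `guess` against `secret`.

    false_positive: matched by the guess but not by the secret.
    false_negative: matched by the secret but not by the guess.
    Either is None if no such string of length <= max_len exists.
    """
    secret_re, guess_re = re.compile(secret), re.compile(guess)
    false_pos = false_neg = None
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            in_secret = secret_re.fullmatch(word) is not None
            in_guess = guess_re.fullmatch(word) is not None
            if in_guess and not in_secret and false_pos is None:
                false_pos = word
            if in_secret and not in_guess and false_neg is None:
                false_neg = word
    return false_pos, false_neg


hint = counterexamples(r"a+b*", r"a*b+")
```

Because strings are tried shortest first, the hints handed back to B are the shortest ones within the search bound.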

eru · on July 10, 2017

Thanks, I'll have a look.

You are right about the equivalence classes: for that you want to talk about the corresponding DFA (which have a unique normal form in the shape of the minimum DFA).

I am not sure about the rest of what you are saying: in general even just minimizing regular expressions is EXP-SPACE complete, if I remember right.

Yes, generation of false negative and false positive ain't so hard---theory agrees with you. But automating the guesser is, as far as I know.

hokkos · on July 11, 2017

I was talking about generating hints to give the RE guesser an opportunity to find the secret RE. You seem to be talking about automating RE generation based on the pattern of the hints. Yeah, I didn't think about that; the other answer seems to talk about it.

jfries · on July 6, 2017

How did you solve backreferences with that approach?

eru · on July 6, 2017

You don't. Backreferences leave the space of mathematical regular expressions that all this nice theory works for.

hokkos · on July 6, 2017

I had the luck that XSD regex doesn't support backreferences.

jgrahamc · on July 6, 2017

Worth doing this by hand to exercise your knowledge of regular expressions. My solution (SPOILER): http://imgur.com/a/9iK9J

simias · on July 6, 2017

Well done. I don't know what you think but I found that most of the time the character classes would intersect perfectly (i.e. there'd only be one character possible once you intersect both sides of a single square). That made it pretty easy overall since for the vast majority of the board you don't have to worry about the "context".

But I guess if it's meant for an audience of folks not very familiar with regexes it's difficult enough as it is.

jgrahamc · on July 6, 2017

I thought it was pretty easy given that the character classes meant that it was pretty easy to take a row/column and eliminate possibilities.

stedaniels · on July 6, 2017

Well, thanks. I tried to cheat with https://github.com/blukat29/regex-crossword-solver and got hit with lex parsing errors! My limited python and 5 minute effort resulted in failure! At least I got to read the message though :-)

rootlocus · on July 6, 2017

I'm assuming the solution isn't unique because I found some positions that are under-constrained.

vhold · on July 6, 2017

I only found one column to have multiple solutions, and saved it for last, at which point only one option made sense.

angry_albatross · on July 6, 2017

Which positions? I didn't find any that were under-constrained.

rootlocus · on July 6, 2017

Assuming 0 indexing: row 0 column 4. Constrained by rows 0, 2, 3 where rows 0 and 2 have [^XZVCHFJLQM] and [^\sPQFB] and row 3 has [OYSRU]. All of: O, Y, S, R and U match. Am I missing something?

jgrahamc · on July 6, 2017

In row 2, column 4 the (.) is reused via \1 in column 11 where it has to be R.
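
The per-square elimination described in this sub-thread is just set intersection over the character classes. A small Python sketch, assuming the puzzle alphabet is uppercase letters plus space and using the three classes quoted above:

```python
import re
import string

# Assumed alphabet for the puzzle: uppercase letters plus space.
ALPHABET = string.ascii_uppercase + " "


def candidates(*char_classes):
    """Characters in ALPHABET that are matched by every given class."""
    return sorted(
        ch for ch in ALPHABET
        if all(re.fullmatch(cc, ch) for cc in char_classes)
    )


# The three classes quoted for the disputed square.
without_backref = candidates(r"[^XZVCHFJLQM]", r"[^\sPQFB]", r"[OYSRU]")
# The \1 ties this square to one that must be R; modelled here as the class [R].
with_backref = candidates(r"[^XZVCHFJLQM]", r"[^\sPQFB]", r"[OYSRU]", r"[R]")
```

Without the backreference all five letters survive, which matches the under-constrained reading; with it only one is left.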

70jS8h5L · on July 6, 2017

You did the same thing I did I imagine. The regexcrossword.com link has a small mistake:

[^PZVJG]{4}(.)[EFUG]{6}(.)[^\sPZVJI]{2}

should be

[^PZVJG]{4}(.)[EFUG]{6}\1[^\sPZVJI]{2}

(note the \1 )

vacri · on July 6, 2017

Ah, but you have underscores where there should be spaces! ;)

KineticLensman · on July 6, 2017

This BBC report refers to a puzzle released by the UK's National Cyber Security Centre [1], as part of an online recruitment effort.

[1] https://www.ncsc.gov.uk/news/take-our-regex-crossword-challe...

cag_ii · on July 6, 2017

Interestingly, Bletchley Park is known to have used crossword puzzles published in The Daily Telegraph as a recruitment tool for "codebreakers" during WWII.

https://en.wikipedia.org/wiki/Bletchley_Park#Personnel

arien · on July 6, 2017

So I suppose it's a one-time thing only? A shame, it was quite fun to solve!

simlevesque · on July 6, 2017

There you go: https://regexcrossword.com/

Cephlin · on July 6, 2017

Wow, finally a crossword I have a chance at!

dbrgn · on July 6, 2017

If you want a challenge, try this one: http://twiki.org/p/pub/Codev/TWikiPresentation2013x03x07/reg...

dmit · on July 6, 2017

Originally from the MIT Mystery Hunt:

https://devjoe.appspot.com/huntindex/puzzle/mit2013601

PDF: http://web.mit.edu/puzzle/www/2013/coinheist.com/rubik/a_reg...

rspeer · on July 6, 2017

Note that as a Mystery Hunt puzzle, the goal of the puzzle isn't just to fill in the grid, it's to find the answer, a secret word or phrase that would be filled into another puzzle (the metapuzzle).

The puzzles generally don't tell you how to extract the answer, but the idea is you know it when you see it.

gregable · on July 6, 2017

I also created an HTML-based version of this one some time ago that allows rotation and color codes the rows as matching or not: https://gregable.com/p/regexp-puzzle.html

proactivesvcs · on July 6, 2017

Thanks for the gregex in HTML format! (gets coat)

andyjohnson0 · on July 6, 2017

I know that there are problems to do with regex matching that are NP-hard. So I'm wondering if it is possible to attack this puzzle using an algorithm that simplifies the individual regexes using knowledge of the regexes that they interact with?

eutectic · on July 6, 2017

This problem is NP-hard by reduction from SAT. Treat each column as a truth variable and use the rows to encode CNF clauses. For example, `(A | ^C)` becomes `(1..)|(..0)`. Then set all the column regexes to `(0*)|(1*)` to enforce a consistent truth value for each variable.

jonahx · on July 6, 2017

Could you elaborate on the encoding? What are valid mappings?

eutectic · on July 6, 2017

A variable becomes '1' in the corresponding position and '.' everywhere else, and similarly with negations of variables and '0'. The regex is then just an alternation of these sub-regexes. This incurs just a linear blow-up in the number of variables, for the '.'s.

For an n x m grid, you can encode any CNF formula with n clauses on m variables. See here if you are unfamiliar with CNF: https://en.wikipedia.org/wiki/Conjunctive_normal_form
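
The encoding above can be written out as a short Python sketch. The function names and the literal representation `(variable_index, is_positive)` are assumptions made for this example, and the brute-force search at the end is exponential; it is only there to show that the resulting crossword accepts exactly the satisfying assignments.

```python
import re
from itertools import product


def clause_to_regex(clause, num_vars):
    """Encode one CNF clause as a row regex.

    `clause` is a list of (variable_index, is_positive) literals. Each
    literal becomes a run of '.' with '1' (or '0' when negated) at the
    variable's position; the literals are joined by alternation.
    """
    parts = []
    for var, positive in clause:
        cells = ["."] * num_vars
        cells[var] = "1" if positive else "0"
        parts.append("(" + "".join(cells) + ")")
    return "|".join(parts)


COLUMN_REGEX = "(0*)|(1*)"  # every cell in a column carries the same value


def satisfying_assignments(clauses, num_vars):
    """Brute-force the crossword: every row holds the same assignment
    string, so each column is a constant run of '0' or '1'."""
    rows = [re.compile(clause_to_regex(c, num_vars)) for c in clauses]
    col = re.compile(COLUMN_REGEX)
    found = []
    for bits in product("01", repeat=num_vars):
        row = "".join(bits)
        columns_ok = all(col.fullmatch(b * len(clauses)) for b in bits)
        if columns_ok and all(r.fullmatch(row) for r in rows):
            found.append(row)
    return found


# (A | not C) and (not A | B) over variables A, B, C
example = [[(0, True), (2, False)], [(0, False), (1, True)]]
solutions = satisfying_assignments(example, 3)
```

The first clause of `example` produces exactly the `(1..)|(..0)` pattern from the comment above.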

eutectic · on July 6, 2017

You could expand each regex to a regex on the whole table, and then take the intersection of the corresponding NFA/DFAs. Unfortunately, I suspect this takes exponential (or worse?) time in the worst case.

chpatrick · on July 7, 2017

You can solve it using a SAT solver like Z3. There are particularly elegant solutions in Haskell that basically interpret the regex but on "symbolic" characters rather than real ones. You can then ask what values these characters can take such that all regexes match.

Some implementations: https://github.com/ekmett/ersatz/tree/master/examples/regexp... https://gist.github.com/LeventErkok/4942496

IanCal · on July 6, 2017

It certainly should be. The limited set of constructs in the puzzle helps as well.

For example, you can split each one of these regexes up into smaller ones based on positioning. Now some of them are simply "match any of the following characters", which can be combined together with set intersections (and something similar with "none of these characters").

mcbobbington · on July 6, 2017

I love regexes. In addition to doing cool things and saving time, I feel like I'm a "real programmer" whenever I write a good one.

gargarplex · on July 6, 2017

This comic artistically renders that feeling. I, too, know it well.

https://xkcd.com/208/

Already__Taken · on July 6, 2017

Anyone know a decent Android app for these? The MIT one has the most insane and broken scrolling functionality, it's shocking.

Xophmeister · on July 7, 2017

That wasn't as hard as I thought it would be. I was worried that, without start/end of string anchors, things could get quite hairy, but the biggest stretch of logic was just, "There are five spaces for me to fit a character, an optional character and two two-character sequences. Therefore that optional character must not appear."

Emyr42 · on July 12, 2017

Column H pattern starts [MVFU]{2}, and 3 of those options don't match the Row 0 pattern, leaving "U"

The published solution says H0 should be "S".

Emyr42 · on July 12, 2017

The version at https://www.ncsc.gov.uk/content/files/regex_cross_hard_v3.pn... has S[MVU]... for column H.

Guess nobody tested it.

shabble · on July 6, 2017

Does any common regex format/dialect require '\-' for a literal hyphen? AFAIK it's only special inside character classes, and escaping it doesn't necessarily work there if it would form a valid range identifier.

movablesed · on July 6, 2017

I don't know of any that require it. But it's common to see punctuation characters escaped like that because Perl (and PCRE and its various cousins/descendants) allows you to escape any non-meta-character and have it treated as a literal.

I suppose the two main benefits are

(a) neither the writer nor the reader has to remember which punctuation characters are meta-characters (you just have to remember that it's always a literal if it's escaped), and

(b) in implementations like PHP's which try to replicate the Perl-style 'delimited' syntax (e.g., /foo/), it prevents characters in the pattern from conflicting with the delimiters.

Maybe there's some other advantage but I can't think of what.
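
A quick check of the hyphen behaviour in one concrete engine, Python's `re`. Other flavours may differ, so treat this as a data point rather than a general rule.

```python
import re

range_class = re.compile(r"[a-z]")     # hyphen between two characters: a range
literal_class = re.compile(r"[a\-z]")  # escaped hyphen: 'a', '-' or 'z'
edge_class = re.compile(r"[az-]")      # trailing hyphen is also a literal
outside = re.compile(r"a\-b")          # escaped hyphen outside a class

results = {
    "range matches m": bool(range_class.fullmatch("m")),
    "range matches -": bool(range_class.fullmatch("-")),
    "escaped matches m": bool(literal_class.fullmatch("m")),
    "escaped matches -": bool(literal_class.fullmatch("-")),
    "trailing matches -": bool(edge_class.fullmatch("-")),
    "outside matches a-b": bool(outside.fullmatch("a-b")),
}
```

In this engine the escape is harmless outside a class and changes the meaning inside one, which fits the "escape it so nobody has to remember" habit described above.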

jwilk · on July 6, 2017

Direct link to the crossword:

https://ichef.bbci.co.uk/images/ic/976xn/p057t19t.jpg

ape4 · on July 6, 2017

Since the clues are machine parsable it should be machine solvable.

hermanschaaf · on July 6, 2017

It is indeed machine-solvable; I wrote a solver for regexcrossword.com puzzles a while back (https://github.com/hermanschaaf/regex-crossword-solver). It was great fun, maybe even more than solving the puzzles by hand!
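
For tiny grids even a naive search works. The sketch below is not the linked solver; it is a brute-force illustration in Python that enumerates every string matching each row clue and keeps the grids whose columns also match. The 2x2 clues are a small example chosen for illustration, and the approach blows up quickly on a grid the size of the one in the article.

```python
import re
import string
from itertools import product


def solve(row_patterns, col_patterns, alphabet=string.ascii_uppercase):
    """Return every grid (as a tuple of row strings) satisfying all clues."""
    width = len(col_patterns)
    cols = [re.compile(p) for p in col_patterns]
    # For each row, list every string that fully matches that row's clue.
    row_options = []
    for pattern in row_patterns:
        rx = re.compile(pattern)
        words = ("".join(t) for t in product(alphabet, repeat=width))
        row_options.append([w for w in words if rx.fullmatch(w)])
    solutions = []
    for grid in product(*row_options):
        # Read each column top to bottom and test it against its clue.
        columns = ("".join(row[c] for row in grid) for c in range(width))
        if all(rx.fullmatch(col) for rx, col in zip(cols, columns)):
            solutions.append(grid)
    return solutions


answer = solve([r"HE|LL|O+", r"[PLEASE]+"], [r"[^SPEAK]+", r"EP|IP|EF"])
```

Because the clues are handed straight to the regex engine, backreferences such as `(.)\1` need no special handling in this kind of search.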

mtharrison · on July 11, 2017

Will your tool work on this puzzle though? I don't think so because it has backreferences.

gumby · on July 6, 2017

Nice! At Kepler's in Mountain View you can buy a version of Scrabble that uses regexes. The designer used to sell it in front of the shop -- he is obviously a programmer.

_lflx · on July 6, 2017

Could I stop by this afternoon and expect it to be in stock or was this a temporary offering?

gumby · on July 6, 2017

It wasn't a short-term item, but poor Kepler's has shrunk so much who knows if it's in stock or not. I would call them. It wasn't described as using regexes of course, so you'll have to say something like that special version of scrabble.

The designer is local so if they no longer stock it you could look online...but it's better to get it from the shop if you can.

eutectic · on July 6, 2017

Can you please expand a bit on how it worked?

gumby · on July 6, 2017

Never played it, but from looking at the box and talking to the inventor, it was just that the set of letters included regex operators, and those you could put on the board as well, meaning other people could use them as well to make words.

timdierks · on July 6, 2017

I believe column E is under-constrained; a solution with column E = "YYYY " or "OOOO " passes the tests, but is clearly not what's intended.

timdierks · on July 6, 2017

Never mind, this was an error in the https://regexcrossword.com/playerpuzzles/595e5542d2433 version, which has a (.) where it should have a \1 in row 2 (thanks to @angry_albatross).

angry_albatross · on July 6, 2017

In the expression in the second row, the 5th character must match the 12th character, which must be R.

IanCal · on July 6, 2017

Fun! I made a few mistakes by writing letters sideways which was then confusing (C vs U, for example), but this was a nice puzzle.

gozur88 · on July 6, 2017

That's a very odd thing to see in a mainstream publication.