
"Cute" comes across as very dismissive. I'm not sure if you intended that. lens-regex-pcre is just a wrapper around PCRE, so anything that works in PCRE will work, for example, from your Mozilla reference:

    ghci> :set -XQuasiQuotes -XOverloadedStrings
    ghci> :m + Control.Lens Control.Lens.Regex.Text
    ghci> "California rolls $6.99\nCrunchy rolls $8.49\nShrimp tempura $10.99" ^.. [regex|\p{Sc}\s*[\d.,]+|] . match
    ["$6.99","$8.49","$10.99"]

"Spacing combining mark" seems to be "Mc" so this works:

https://unicode.org/reports/tr18/#General_Category_Property

    ghci> "foo bar \x093b baz" ^.. [regex|\p{Mc}|] . match
    ["\2363"]

(U+093b is a spacing combining mark, according to https://graphemica.com/categories/spacing-combining-mark)
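For a single character like this you don't strictly need a regex: `generalCategory` from `Data.Char` in base reports the Unicode general category directly. A small sketch; `isSpacingMark` is just an illustrative name, and the answers come from the Unicode tables bundled with your GHC, which may not match the Unicode version your PCRE was built with.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Illustrative helper: True for characters in the Unicode "Mc"
-- (spacing combining mark) category, the class \p{Mc} asks for.
isSpacingMark :: Char -> Bool
isSpacingMark c = generalCategory c == SpacingCombiningMark

main :: IO ()
main = do
  print (generalCategory '\x093b')                  -- SpacingCombiningMark
  print (filter isSpacingMark "foo bar \x093b baz") -- "\2363"
```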

I think in general that Haskellers would probably move to parser combinators in preference to regex when things get this complicated. I mean, who wants to read "\p{Sc}\s*[\d.,]+" in any case?
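For comparison, here's a rough sketch of the same price search written with parser combinators. It uses `Text.ParserCombinators.ReadP` from base so it needs no extra packages (in practice people would more likely reach for megaparsec or attoparsec), and `price` and `findPrices` are made-up names. Note that `isSpace` is close to, but not identical to, PCRE's `\s`.

```haskell
import Data.Char (GeneralCategory (CurrencySymbol), generalCategory, isDigit, isSpace)
import Text.ParserCombinators.ReadP (ReadP, munch, munch1, readP_to_S, satisfy)

-- One price, mirroring \p{Sc}\s*[\d.,]+ piece by piece.
price :: ReadP String
price = do
  sym    <- satisfy ((== CurrencySymbol) . generalCategory) -- \p{Sc}
  spaces <- munch isSpace                                    -- \s*
  amount <- munch1 (\c -> isDigit c || c `elem` ".,")        -- [\d.,]+
  return (sym : spaces ++ amount)

-- Try the parser at each position, like an unanchored regex search.
-- munch/munch1 are greedy, so there is at most one parse per position.
findPrices :: String -> [String]
findPrices [] = []
findPrices s@(_ : rest) =
  case readP_to_S price s of
    ((p, remaining) : _) -> p : findPrices remaining
    []                   -> findPrices rest

main :: IO ()
main = print (findPrices "California rolls $6.99\nCrunchy rolls $8.49\nShrimp tempura $10.99")
-- ["$6.99","$8.49","$10.99"]
```

Each regex fragment becomes a named step, which is arguably easier to read and extend than `\p{Sc}\s*[\d.,]+`, at the cost of more lines.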



U+093b is still in the BMP. By the way, what text encodings for source files are supported by GHC? Escaping everything isn't fun.

And I am not sold on lens-regex-pcre documentation; "anything that works in PCRE will work" comes across as very dismissive. What string-like types are supported? What version of PCRE or PCRE2 does it use?


> U+093b is still in the BMP

I'm sorry, I don't know what that means. If you have a specific character you'd like me to try then please tell me what it is. My Unicode expertise is quite limited.
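For readers with the same question: the Basic Multilingual Plane (BMP) is the first 65,536 code points, U+0000 to U+FFFF. Characters above it, such as most emoji or U+1D165 MUSICAL SYMBOL COMBINING STEM (also category Mc), need a surrogate pair in UTF-16 and four bytes in UTF-8, so they are a harder test of Unicode support than U+093B. A quick check in plain Haskell (`inBMP` is an illustrative helper; this doesn't exercise PCRE itself):

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- Illustrative helper: the BMP covers U+0000..U+FFFF; anything above
-- it is a supplementary ("astral") code point.
inBMP :: Char -> Bool
inBMP c = ord c <= 0xFFFF

main :: IO ()
main = do
  let stem = '\x1D165' -- MUSICAL SYMBOL COMBINING STEM, outside the BMP
  print (inBMP '\x093b', inBMP stem) -- (True,False)
  print (generalCategory stem)       -- SpacingCombiningMark
```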

> I am not sold on lens-regex-pcre documentation

Nor me. It seems to leave a lot to be desired. In fact, I don't see the point of this lens approach to regex.

> "anything that works in PCRE will work" comes across as very dismissive

Noted, thanks, and apologies. That was not my intention. I was trying to make a statement of fact in response to your question.

> By the way, what text encodings for source files are supported by GHC?

UTF-8, I think. For example, pasting that character into GHCi yields:

    ghci> import qualified Data.Text.IO as T
    ghci> mapM_ T.putStr ("foo bar ः baz" ^.. [regex|\p{Mc}|] . match)
    ः

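GHC expects source files to be ASCII or UTF-8, so you can write non-ASCII characters directly instead of escaping them; a literal and its hex or decimal escape denote the same `Char`. A tiny check, with U+20AC EURO SIGN chosen arbitrarily:

```haskell
-- This file must be saved as UTF-8 for the literal '€' to be read
-- correctly, since that is the encoding GHC assumes for source files.
main :: IO ()
main = do
  let literal = '€'      -- typed directly
      hexEsc  = '\x20ac' -- hex escape
      decEsc  = '\8364'  -- decimal escape
  print (literal == hexEsc && hexEsc == decEsc) -- True
  print (fromEnum literal)                      -- 8364
```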
> What string-like types are supported?

ByteString (raw byte arrays) and Text (Unicode; the internal representation is UTF-8 since text-2.0, UTF-16 in earlier versions), as you can see from:

https://hackage.haskell.org/package/lens-regex-pcre
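The choice matters for `\p{...}` classes: a `Text` value counts code points, while a `ByteString` is only bytes, and U+093B takes three bytes in UTF-8. A quick illustration with the `text` and `bytestring` packages that ship with GHC (how lens-regex-pcre hands each type to PCRE internally isn't shown here):

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  let t = T.pack "\x093b"            -- U+093B as Text
  print (T.length t)                 -- 1 (one code point)
  print (B.unpack (TE.encodeUtf8 t)) -- [224,164,187] (three UTF-8 bytes)
```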

> What version of PCRE or PCRE2 does it use?

Whatever your system version is. For me on Debian it's:

    Package: libpcre3-dev
    Source: pcre3
    Version: 2:8.39-15


> version of PCRE

It uses pcre-light (https://hackage.haskell.org/package/pcre-light), which seems to link against the system version. So it depends on what you install. With Nix, it will be part of your system expression, of course.
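To see which PCRE you actually linked against, PCRE1 exposes `pcre_version()` in its C API, which you can call through the FFI. A minimal sketch, assuming the PCRE1 library and headers are installed (e.g. Debian's `libpcre3-dev`) and that you link it yourself with `-lpcre`; PCRE2 has a different API, so this won't apply there:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Build with: ghc PcreVersion.hs -lpcre
-- (assumes the PCRE1 library is installed on the system)
import Foreign.C.String (CString, peekCString)

-- PCRE1's C API: const char *pcre_version(void);
foreign import ccall unsafe "pcre_version"
  c_pcre_version :: IO CString

main :: IO ()
main = c_pcre_version >>= peekCString >>= putStrLn
```

On a Debian system like the one above this should report an 8.39 release string, though the exact format is up to the library.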



