> UTF-8 regular expression matching shouldn't be different from ASCII at all, as far as I can tell.
Not really. First, the '.' operator has to match a whole code point rather than a single byte. This can still be done fast, but if you actually want to claim Unicode support you should also consider:
- Unicode collation orders
- Unicode string equivalence algorithms
- Unicode normalization algorithms
This will also take care of all the Unicode weirdness that no one actually uses or cares about but that you still must implement to claim compatibility: presentation forms, combining diacritics, ligatures, double-width characters and other odd stuff.
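A quick illustration of the gap, using Python's standard `re` and `unicodedata` modules. These were picked only because they are easy to run; they aren't the engine under discussion. The sketch shows three things. A byte-level '.' matches one byte of a multi-byte UTF-8 sequence. A code-point-level '.' still cannot match a base letter plus a combining mark. Canonical and compatibility equivalence (combining diacritics, ligatures, fullwidth forms) only line up after normalization. Collation is not covered.

```python
import re
import unicodedata

# "é" spelled two canonically equivalent ways.
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + U+0301 COMBINING ACUTE ACCENT

# Byte-oriented matching: '.' consumes a single byte, but "é" is two bytes in UTF-8.
utf8_bytes = precomposed.encode("utf-8")            # b'\xc3\xa9'
byte_match = re.fullmatch(rb".", utf8_bytes)        # None: one '.' can't span both bytes

# Code-point-oriented matching: '.' consumes one code point.
codepoint_match = re.fullmatch(r".", precomposed)   # matches: a single code point

# A code point is still not a user-perceived character:
# the decomposed form is two code points, so a single '.' fails.
decomposed_match = re.fullmatch(r".", decomposed)   # None

# Canonical equivalence: the two spellings compare equal only after normalization.
raw_equal = decomposed == precomposed                                   # False
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed     # True

# Compatibility equivalence: ligatures and fullwidth forms fold under NFKC.
ligature_fi = "\ufb01"   # U+FB01 LATIN SMALL LIGATURE FI
fullwidth_a = "\uff21"   # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
ligature_search = re.search(r"fi", ligature_fi)                         # None: no literal "fi"
ligature_folded = unicodedata.normalize("NFKC", ligature_fi) == "fi"    # True
fullwidth_folded = unicodedata.normalize("NFKC", fullwidth_a) == "A"    # True

if __name__ == "__main__":
    print("byte-level '.' on é:", byte_match)
    print("code-point '.' on precomposed é:", codepoint_match is not None)
    print("code-point '.' on decomposed é:", decomposed_match)
    print("raw equal / NFC equal:", raw_equal, nfc_equal)
    print("'fi' in ligature / NFKC folded:", ligature_search, ligature_folded)
    print("fullwidth A folds to A:", fullwidth_folded)
```

None of this makes the matcher itself slower. The cost is in deciding which of these layers (code points, grapheme clusters, normalization form) your engine promises to handle.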