
Thank you for publishing your work. Do you know of any similar projects with examples of custom tokenizers, e.g. for synonyms, snowball, but written in C?



SQLite itself is written in C, so you can use the FTS5 custom tokenizer API directly: https://www.sqlite.org/fts5.html#custom_tokenizers

The text arrives as UTF-8 bytes, so any C code would have to decode those into Unicode codepoints and then do a lot of other text processing on top of that. In practice some kind of library would also be needed. I don't know of any such projects.
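To give a feel for the byte-to-codepoint step, here is a minimal sketch of a UTF-8 decoder in plain C. The names `utf8_decode` and `utf8_count` are made up for illustration and are not part of SQLite or FTS5. It only covers decoding; the case folding, stemming, and synonym handling a real tokenizer needs are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an SQLite API): decode one codepoint from s[0..n).
 * Returns the number of bytes consumed (1-4), or 0 if the input is empty,
 * truncated, or malformed (bad continuation byte, overlong form, surrogate,
 * or value above U+10FFFF). */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t v, min;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }          /* ASCII fast path */

    if ((s[0] & 0xE0) == 0xC0)      { len = 2; v = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; min = 0x10000; }
    else return 0;                                      /* stray continuation or invalid lead byte */

    if (n < len) return 0;                              /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;            /* must be 10xxxxxx */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (v < min) return 0;                              /* overlong encoding */
    if (v > 0x10FFFF) return 0;                         /* outside Unicode range */
    if (v >= 0xD800 && v <= 0xDFFF) return 0;           /* UTF-16 surrogate */

    *cp = v;
    return len;
}

/* Hypothetical helper: count codepoints in a buffer, or return -1 if the
 * buffer contains invalid UTF-8. */
long utf8_count(const unsigned char *s, size_t n)
{
    long count = 0;
    size_t pos = 0;
    while (pos < n) {
        uint32_t cp;
        size_t used = utf8_decode(s + pos, n - pos, &cp);
        if (used == 0) return -1;
        pos += used;
        count++;
    }
    return count;
}
```

A tokenizer would run a loop like the one in `utf8_count` over the input, classify each codepoint, and report tokens by their byte offsets in the original buffer. That is why the decoder returns the number of bytes consumed.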



