
Sure :-) even then, you have even more fun... imagine if the next item was Sep 1-Dec 1 and the list was known to be sorted! The parser would need to be stateful to be able to disambiguate!!

Even worse (well, maybe similar, but more surprising): imagine if on Feb 28, you parse "Feb 29-Mar 1"... both of those dates could land in an entirely different year than "Feb 28-Mar 1" would, depending on whether the current year is a leap year...
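To make the leap-year case concrete, here is a minimal Python sketch. It assumes "resolving" a year-less date means picking the earliest matching calendar date on or after a reference date; the name `next_occurrence` and that rule are my own assumptions for illustration, not the behavior of any existing library.

```python
from datetime import date


def next_occurrence(month: int, day: int, on_or_after: date) -> date:
    """Earliest date with this month/day that is not before `on_or_after`."""
    # Scanning 9 consecutive years covers the longest gap between leap
    # years in the Gregorian calendar (8 years, e.g. 1896 -> 1904).
    for year in range(on_or_after.year, on_or_after.year + 9):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # e.g. Feb 29 in a non-leap year
        if candidate >= on_or_after:
            return candidate
    raise ValueError(f"cannot resolve {month}/{day} after {on_or_after}")


today = date(2021, 2, 28)  # hypothetical "current date" in a non-leap year

# "Feb 28-Mar 1": both ends stay in the current year.
start_a = next_occurrence(2, 28, today)    # 2021-02-28
end_a = next_occurrence(3, 1, start_a)     # 2021-03-01

# "Feb 29-Mar 1": the start jumps to the next leap year and drags the end with it.
start_b = next_occurrence(2, 29, today)    # 2024-02-29
end_b = next_occurrence(3, 1, start_b)     # 2024-03-01
```

Run the same thing with a reference date of Feb 28, 2020 and both intervals land in 2020, which is exactly the surprise.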

I dare say I have not yet seen a single parser in my life that handles such issues. In fact I don't think I've seen a parser that can parse a date interval, or that can even do "parse this string assuming it is after that date", or anything like that.
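For what it's worth, here is roughly what I'd want such an API to look like, as a hedged Python sketch. It only understands the "Mon D-Mon D" shape, takes an explicit "not before" date, and threads that date through a list known to be sorted by start date. `parse_interval`, `parse_sorted`, and the resolution rule (earliest match on or after the anchor) are all hypothetical names and choices of mine.

```python
import re
from datetime import date

MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}
PATTERN = re.compile(r"\s*([A-Za-z]{3}) (\d{1,2})\s*-\s*([A-Za-z]{3}) (\d{1,2})\s*")


def _resolve(month, day, floor):
    """Earliest date with this month/day that is on or after `floor`."""
    for year in range(floor.year, floor.year + 9):  # spans any leap-year gap
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # invalid in this year, e.g. Feb 29 in a non-leap year
        if candidate >= floor:
            return candidate
    raise ValueError(f"cannot resolve {month}/{day} after {floor}")


def parse_interval(text, not_before):
    """Parse 'Mon D-Mon D', assuming the start is on or after `not_before`."""
    m = PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    start = _resolve(MONTHS[m[1].title()], int(m[2]), not_before)
    end = _resolve(MONTHS[m[3].title()], int(m[4]), start)  # end follows start
    return start, end


def parse_sorted(items, not_before):
    """Parse a list sorted by start date; the state is the previous start."""
    results, cursor = [], not_before
    for item in items:
        start, end = parse_interval(item, cursor)
        results.append((start, end))
        cursor = start  # the next item cannot start before this one
    return results


rows = parse_sorted(["Sep 1-Dec 1", "Dec 1-Mar 1", "Feb 1-Apr 1"], date(2020, 1, 1))
```

The last item only resolves to Feb 2021 because the parser remembers that the previous one started in Dec 2020; parsed in isolation against Jan 1, 2020 it would come out as Feb 2020.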

And all of these problems come before we even consider time zones, leap seconds, daylight saving time, syntactic ambiguities, etc... not just how they affect individual dates, but also ordered dates (or intervals) like above...




I would say that you are mixing up "parsing" and "calendaring" (or something of the sort). As far as I understand, parsing is syntactic analysis, i.e. going from a linear structure to a more complex structure (usually a tree); it should not add anything to the tree that was not in the linear structure. It shouldn't consult a semantic context (such as the current date) to produce an AST.


By parsing I mean the usual sense of the word for dates... strptime, Date.Parse, etc... i.e. turning a string into a date (or multiple dates). You can call it something else if you'd like.


> I dare say I have not yet seen a single parser in my life that handles such issues. In fact I don't think I've seen a parser that can parse a date interval, or that can even do "parse this string assuming it is after that date", or anything like that.

You can google "nlp date extraction" maybe? I've used libraries in the past that do this.

Note: they're far from being perfect. I ended up not using any, as each had their weird corner cases.

Here's an old one: http://natty.joestelmach.com/try.jsp#

I tried with: "first week of december to end of january"

And it gave me: Tue Dec 01 15:31:30 UTC 2020 and Fri Jan 31 15:31:30 UTC 2020

Edit: this one seems more polished: http://nlp.stanford.edu:8080/sutime/process


Cool, thanks for the tip!



