You need some heuristic that defines what a "good" regex is, since there are undoubtedly many regexes that describe the same corpus.
A simple heuristic is to prefer the smallest regex.
So in your example, given the training examples:
aba
abaa
aaaaba
and the counter examples:
abba
ba
ab
To a human, it's clear the intended pattern is probably "a+ba+". That's much smaller than ("aba" | "abaa" | "aaaaba") & !("abba" | "ba" | "ab"), so under this heuristic it would be the "better" regex.
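
To make the "smallest regex" heuristic concrete, here is a minimal brute-force sketch in Python. It is only an illustration, not a practical learner: it assumes "smallest" means fewest characters, it only tries patterns built from the tokens a, b, + and *, and the function name and the max_len cutoff are made up for this example.

```python
import itertools
import re

def smallest_consistent_regex(positives, negatives, tokens="ab+*", max_len=6):
    """Return the shortest pattern (by character count) that fully matches
    every positive example and none of the negative examples.
    Returns None if nothing of length <= max_len works."""
    for length in range(1, max_len + 1):
        # Try every string of this length over the token set.
        for combo in itertools.product(tokens, repeat=length):
            pattern = "".join(combo)
            try:
                compiled = re.compile(pattern)
            except re.error:
                continue  # not a syntactically valid regex, skip it
            matches_all = all(compiled.fullmatch(s) for s in positives)
            matches_none = not any(compiled.fullmatch(s) for s in negatives)
            if matches_all and matches_none:
                return pattern
    return None

positives = ["aba", "abaa", "aaaaba"]
negatives = ["abba", "ba", "ab"]
print(smallest_consistent_regex(positives, negatives))  # prints a+ba+
```

The search is exponential in the pattern length, so it only works for toy cases like this one, but it shows how the counter examples prune the shorter candidates: "a*ba+" is rejected because it matches "ba", and "a+ba*" is rejected because it matches "ab".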