Hacker News | new | past | comments | ask | show | jobs | submit | d66's comments

I'm not sure that's true. I have a Uxn emulator for the eZ80-based AgonLight2 that already runs some real (console-based) ROMs: https://git.phial.org/d6/uxn-ez80/

(I'm new to eZ80 assembly so the project is going slower than it otherwise might.)

The AgonLight2 has 512K of RAM and a 20 MHz CPU, which is more than enough for Varvara.

I agree that 8-bit computers of the era (e.g. PET, Apple II, C64, TI/99a, etc.) don't have enough RAM to give Varvara its own 64k of memory (though it wouldn't be hard to design a Varvara variant with a smaller memory space), but otherwise there really aren't major barriers. As far as permacomputing goes, there are plenty of hosts out there with enough memory to comfortably run Varvara (anything 32-bit will be fine, and most 16-bit computers would also be fine).

Based on my eZ80-based implementation I think any of the eZ80-based TI calculators with 128k or more of user-accessible memory could implement Varvara without major problems.


That would be fantastic! I'd be delighted to see it! It would make those TI calculators the new smallest machines with a full Uxn/Varvara implementation, and much lower power than the GBA, which is the current champion there. I'd really like to get some experience with what that's like, because I've found the experience of using Uxn to be inspiring so far.

I also hadn't seen the AgonLight2 before.

One quibble, though: the vast majority of 32-bit computers are microcontrollers, and most of them have less than 64KiB of RAM. (My comment at https://news.ycombinator.com/item?id=44611710 goes into more detail about the popular STM32 family.) They would be perfectly adequate for a self-sufficient personal computing experience (most of them have more total memory than the computers I used to run Turbo Pascal on, just less RAM, plus their CPUs are 1000 times faster, and they can read and write SD cards) but probably not with Varvara.


That's a fair point.

A few of us have been discussing a modified Varvara spec that limits the system to a smaller amount of memory (e.g. 32k, 16k, or 8k). I think a spec like that would unlock many implementations that are not currently practical.


I think a more important consideration for currently produced microcontroller hardware is the ability to run from ROM, because then you can load your program into relatively abundant Flash instead of the much scarcer RAM. Some way of using virtual memory (take a look at Forth's approach, which doesn't require hardware support but doesn't work for code, only data) would also make a big difference. All of this doesn't necessarily depend on native-code compilation; microcontroller memory is a much more tightly pinching shoe than microcontroller speed.


It wouldn't be idiomatic Uxntal but there's a relatively simple modification where load/store is from RAM space but jump/pc is from ROM space. For a subset of existing programs it would work fine, although the assembler would need some modifications to make the memory layout(s) a bit clearer.
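A minimal sketch of that split, in Python for illustration (all names here are hypothetical, not from any real Uxn implementation): instruction fetches read from a ROM image while load/store opcodes touch a separate, smaller RAM bank.

```python
# Hypothetical split-address-space machine: jump/pc reads ROM,
# load/store reads and writes RAM.
rom = bytes([0x80, 0x05])   # program text, flashed / read-only
ram = bytearray(256)        # scarce on-chip RAM, data only

pc = 0

def fetch():
    """Fetch the next program byte from ROM space and advance pc."""
    global pc
    b = rom[pc]
    pc += 1
    return b

def store8(addr, v):
    """Store a byte into RAM space."""
    ram[addr] = v & 0xFF

def load8(addr):
    """Load a byte from RAM space."""
    return ram[addr]

store8(0x10, fetch())        # e.g. copy an immediate from ROM into RAM
assert load8(0x10) == 0x80
```

The point of the separation is exactly the one above: the program image can live in relatively abundant Flash, and only data competes for RAM.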


Another minor correction: the well-known TI home computer was called the TI-99/4A, and it was a 16-bit machine, sharing the same architecture as TI's ill-fated line of minicomputers.


There's no requirement that stacks grow upwards in Varvara.

If your stacks grew downwards then you could use LE instructions to operate on 16-bit values on the stack. You'd still need to support loading from BE memory into the LE stack but that might not be too bad (two 8-bit operations instead of one 16-bit operation).

The device ports are also specified as BE so in theory you'd need to split those reads/writes up too. However, in almost all cases those are done directly from the stack values so I bet most ROMs would work fine with LE devices and LE stacks. LE devices would only cause issues when someone used 8-bit reads/writes from part of a 16-bit port.
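To illustrate the byte-splitting described above, here's a minimal Python sketch (names are illustrative, not from a real implementation): a 16-bit big-endian memory load becomes two 8-bit reads, pushed high byte first so the stack layout is unchanged.

```python
ram = bytearray(0x10000)   # Uxn-style 64k address space
stack = []                 # byte stack; top of stack is the end of the list

def load16_be(addr):
    """Load a big-endian 16-bit value using two 8-bit reads."""
    hi = ram[addr]                  # high byte is stored first in BE memory
    lo = ram[(addr + 1) & 0xFFFF]
    stack.append(hi)                # high byte pushed first, as Uxn expects
    stack.append(lo)
    return (hi << 8) | lo

ram[0x0100], ram[0x0101] = 0x12, 0x34
assert load16_be(0x0100) == 0x1234
assert stack[-2:] == [0x12, 0x34]
```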


  negation (~α): strings not matched by α
  difference (α - β): strings matched by α but not β
  intersection (α & β): strings matched by α and β
  exclusive-or (α ^ β): strings matched by α or β but not both
  inclusion (α > β): does α match all strings β matches?
  equality (α = β): do α and β match exactly the same strings?
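These operations can be spot-checked by brute force: enumerate every string over a small alphabet up to a fixed length and compare the match sets with Python's `re`. This is only a finite approximation, not the exact decision procedure the page implements.

```python
import re
from itertools import product

def strings(alphabet="ab", max_len=3):
    """All strings over the alphabet up to max_len, including epsilon."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def matches(pattern):
    """The (truncated) language of a regex as a Python set."""
    return {s for s in strings() if re.fullmatch(pattern, s)}

a, b = matches("a.*"), matches(".*b")
assert matches("a.*b") <= a & b   # intersection, on this finite sample
assert "a" in a - b               # difference: matched by a but not b
assert "b" in a ^ b               # exclusive-or: matched by exactly one
```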


expressions like (...){1,256} are very heavyweight and the Scala.js code ends up timing out or crashing the browser.

if you replace that with (...)+ then it seems to work (at least for me). smaller expressions like (...){1,6} should be fine.


Just wondering, what is it about testing repetition [a-z]{1,256} with an upper bound that's so heavy? Intuitively it feels like greedy testing [a-z]+ should actually be worse since it has to work back from the end of the input.


This depends heavily on how repetition is implemented.

With a backtracking-search implementation of regexes, bounded iteration is pretty easy.

But the linked webpage appears to compile regexes to finite state machines (it shows you the finite-state machine, for instance), and e.g. [a-z]{1,256} will have 256 states: 256 times the 1 state needed for [a-z]. If [a-z] were a complex regex, you could get a combinatorial explosion.

This alone probably isn't the issue? 256 is not a very large number. But I suspect there are follow-on algorithmic issues. This is just speculation, but I wouldn't be surprised if that 256-state machine were computed by determinizing a more naively generated machine via the subset construction, which has worst-case exponential running time.


you're right. inclusion/intersection/etc. aren't actually computed via a DFA but instead are computed directly on the regular expression representation itself. and large disjunctions (with 256 branches) are what gets very heavy.

(it's possible to instead do these operations on DFAs but at the time i found it hard to get from an automaton back to a reasonable-looking regular expression.)


the library uses a fairly simple data representation where x{m,n} is compiled using conjunction and disjunction. so x{1,4} ends up being represented as x|xx|xxx|xxxx.

this simplifies the code for testing equality and inclusion, since logically x{n} is just xx... (n times) and x{m,n} is just x{m}|x{m+1}|...|x{n}.

but when you have x{m,n} and n-m is large you can imagine what kind of problems that causes.
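a quick sketch of the blow-up (illustrative Python, not the library's actual Scala): expanding x{m,n} into x{m}|x{m+1}|...|x{n} uses m + (m+1) + ... + n copies of x, i.e. O(n^2) for x{1,n}.

```python
def expand(x, m, n):
    """Naively expand x{m,n} into a flat disjunction."""
    return "|".join(x * k for k in range(m, n + 1))

assert expand("x", 1, 4) == "x|xx|xxx|xxxx"
# For x{1,100}, the total number of x's is 1 + 2 + ... + 100 = 5050:
assert len(expand("x", 1, 100).replace("|", "")) == 100 * 101 // 2
```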


Seems like it should at least be x(|x(|x(|x))) instead of x|xx|xxx|xxxx to avoid quadratic blow-up.


yes, that is the actual construction: the disjunction data type only supports an lhs and rhs, so that is the only possible way to represent it.

i wrote it the way i did for clarity in the comments.
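for comparison, a sketch of that right-nested lhs/rhs construction (again illustrative Python, not the actual Scala): writing "(|…)" for ε|…, the total number of copies of x stays linear in n.

```python
def nested(x, n):
    """Build x{1,n} as the right-nested disjunction x(|x(|x(|x)))."""
    out = x                          # innermost level is just x
    for _ in range(n - 1):
        out = f"{x}(|{out})"         # each outer level wraps x(ε|...)
    return out

assert nested("x", 4) == "x(|x(|x(|x)))"
assert nested("x", 100).count("x") == 100   # linear, vs 5050 when flattened
```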


normally you would use an ordinal number [1] to label individual elements of an infinite set while using a cardinal number [2] to measure the size of the set.

i believe the set of words over a finite alphabet (with more than one member) is countably infinite, since every word has finite length. this means that the cardinality of .* is ℵ₀ (the cardinality of the natural numbers), not that of the reals.

unfortunately, i don't think that cardinality gets us very far when trying to differentiate the "complexity" of expressions like [ab]* from ([ab]*c[de]*)*[x-z]*. probably some other metric should be used (maybe something like Kolmogorov complexity).

[1] https://en.wikipedia.org/wiki/Ordinal_number

[2] https://en.wikipedia.org/wiki/Cardinal_number


I wouldn't say that's their 'normal' usage. Sure, you can use them like that, but fundamentally ordinal numbers are equivalence classes of well-ordered sets in the same way that cardinal numbers are equivalence classes of sets.

As you've rightly noted, the latter equivalence class gets us nothing, so throwing away the ordering is a bit of a waste. Of all mathematical concepts, 'size' is easily the most subjective, so picking one that is interesting is better than trying to be 'correct'.

In particular a*b* is exactly equivalent to ω^2, since a^n b^m < a^x b^y iff n < x or n=x and m<y. This gives an order preserving isomorphism between words of the form a^n b^m and tuples (n,m) with lexicographic ordering.


interesting!

what would [ab]* be? for computing an ordinal number the only real difficulty is how to handle kleene star: given ord(X) how do we calculate ord(X*)?

but as you probably noticed i'm a bit out of my depth when dealing with ordinals.


Unless I'm very mistaken adding onto the end of a string doesn't affect lexicographic order, so that's effectively [ab]*. The ordinality of [ab] is simple, it's 2, but the kleene star is a bit of an odd one, it's not quite exponentiation.

To reason about the kleene star it's a bit simpler to consider something like R^*n, where you repeat up to n times. Obviously R^*0 = 1 and R^*S(n) can be built from R^*n by picking an element of R^*n and appending either nothing or an element of R, here the element of R^*n determines most of the order and 'nothing' orders in front. For technical reasons the corresponding ordinal is (1+R) R^*n, which is backwards from how you'd expect it and how you'd normally define exponentiation.

The kleene star can be recovered by taking the limit, identifying R^*n with its image in R^*S(n). This also doesn't quite work as nicely as you'd hope (normally the image is just a downward-closed subset; it's not in this case).

I think [ab]* is equivalent to something like the rational part of the Cantor set. Not sure if there's a simpler way to describe it, it's nowhere near as simple as 2^ω, which is just ω.


Hmm, turns out this fails to be an ordinal because regular languages aren't well-ordered (except if you reverse the lexicographic order, maybe?). They are what is called an order type, and it looks like it should be possible to identify them with subsets of the rationals, if you wanted to.

Perhaps reversing the lexicographic order makes more sense, in that case longer tuples simply order last so R^* = 1 + R + R^2 + ..., the limit here is much easier since R^*n = 1 + R + ... + R^n is downwards closed as a subset of R^*S(n).

Then again in that scenario [ab]* is simply ω because it is effectively the same as just writing an integer in binary, so it is less interesting in a way.


I would say [ab]* has order type omega. That's because the set of strings it represents is: {epsilon, ab, abab, ababab, abababab, ...} which looks a lot like omega. (epsilon = empty string)

What this exercise really amounts to is finding a canonical way to order a regular language (the set of strings a regexp matches). For example, a*b* could be {epsilon, a, aa, aaa, aaaa, aaaaa, ..., b, bb, bbb, bbbb, bbbbb, bbbbbb, ..., ab, aab, aaab, aaaab, ..., abb, aabb, ..., ...} which looks a lot like omega ^ 2 (not 2 * omega like I said before). However, you could also re-arrange the set to look like omega: {epsilon, a, b, aa, ab, bb, aaa, aab, abb, bbb, ...} (strings of length 1, length 2, length 3, etc)

I propose the following: for any two strings in the regular language, the one that comes first is the one whose kleene-star repetition counts come first lexicographically. More concretely, for the language a*b*, aaaab represents a repetition count of (4, 1) and bbb represents (0, 3). (0, 3) comes before (4, 1) lexicographically, so bbb comes before aaaab. This admits the following ordering: {epsilon, b, bb, bbb, bbbb, ..., a, abb, abbb, abbbb, ..., aa, aab, aabb, aabbb, ..., ...} which is omega ^ 2 which "feels" right to me. Another rule is for the regular language (X|Y) and two strings x from X and y from Y, x should always come before y in our ordered set representation.
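This ordering is easy to sketch for a*b* (hypothetical helper, Python for illustration): map each string to its repetition counts (n, m) and let Python's tuple comparison do the lexicographic part.

```python
def counts(s):
    """Repetition counts (n, m) for a string a^n b^m in a*b*."""
    n = len(s) - len(s.lstrip("a"))   # number of leading a's
    return (n, len(s) - n)            # remaining characters are b's

words = ["aaaab", "bbb", "", "b", "a", "abb", "aa"]
ordered = sorted(words, key=counts)   # tuples compare lexicographically
assert ordered == ["", "b", "bbb", "a", "abb", "aa", "aaaab"]
```

As in the text, bbb = (0, 3) sorts before aaaab = (4, 1).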

Hold on, what about nested kleene-stars? (a*)* is just a*, but (a*b*)* is distinct from a*b*. However, the "kleene-star counts" analysis from above breaks down because there are now multiple ways to parse strings like aab. I don't really know how to classify these regular languages as ordinals yet.

I don't really see any useful applications of this, but it's still fun to think about. The game I'm playing is thinking about a random ordinal and trying to come up with a regular language that, under my ordering rule above, looks like that ordinal. Let's try omega * 2 (which looks like this: {0, 1, 2, 3, 4, 5, ..., omega, omega + 1, omega + 2, omega + 3, ...}, i.e. 2 copies of omega "concatenated"; note that in ordinal arithmetic 2 * omega would just be omega):

a*|b* = {epsilon, a, aa, aaa, aaaa, aaaaa, ..., b, bb, bbb, bbbb, ...} => omega * 2.

Some more examples:

omega ^ 3: a*b*c*

omega ^ 2 + omega: a*b*|c*

Maybe we can write down some composition rules:

let X and Y be regular languages and ord(X) and ord(Y) be their ordinal representations. Then,

X|Y => ord(X) + ord(Y)

XY => ord(X) * ord(Y)

X* => ord(X) * omega

I haven't checked if these actually work, this is just a long rambly comment of dubious mathematical value.


[ab] is a character class, not a parenthesized catenation. It's (a|b), not (ab). So [ab]* is {epsilon, a, b, aa, ab, ba, bb, ...}


the page actually does give these. for α := [a-z]{2,4} the page gives |α| = 475228.

however, as others have pointed out any non-trivial use of the kleene star means the result will be ∞. in this case the page will list numbers that roughly correspond to "number of strings with N applications of kleene star" in addition to infinity.
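the finite part is easy to double-check by hand: for [a-z]{2,4} the count is just the number of strings of length 2, 3, and 4.

```python
# spot-check of the count reported for [a-z]{2,4}: 26^2 + 26^3 + 26^4
count = sum(26 ** k for k in range(2, 5))
assert count == 475228
```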

