(I'm a co-founder at Overleaf.com, which does collaborative 'LaTeX in the browser' in a different sense.)
I like the idea of a 'sane' subset of LaTeX that is easy to publish to the web. There are tools like LaTeXML and TeX4ht that try to convert general LaTeX documents to (X)HTML, but it's a very hard problem.
Some difficulties arise from the fact that TeX is just very hard to parse in general. Even the first stage of parsing TeX is Turing complete [1]. This makes it hard to write tooling e.g. for linting (though tools exist, e.g. chktex) or creating a WYSIWYG editor backed by LaTeX [2]. (edit: or creating a good LaTeX auto-complete [4])
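As a tiny illustration of why (plain TeX; \quiet is a name I made up for this sketch):

    \def\quiet{\catcode`\!=14\relax}% give "!" the role of "%"
    \quiet
    ! this whole line is now a comment -- no static parser can know that
    ! without first executing \quiet

Because category codes can be reassigned at runtime, even tokenization depends on execution, so there is no fixed grammar to parse against.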
Others arise from TeX's extensibility --- there are many thousands of packages that define their own commands and environments for different types of documents and different disciplines. This extensibility is on the one hand one of the main reasons that TeX and LaTeX are still actively used some 40 years after TeX's initial release, but on the other hand a major challenge for conversion to HTML. The LaTeXML project has many custom bindings [3] for these packages, but it's far from complete.
I guess the main question is whether we can find the right subset, and this project looks like a great start.
I really like the idea of adding math support to Unicode via combining characters. It's more complicated than anything Unicode currently deals with, but not that much more complicated, and the idea of being able to put math into anything that currently accepts strings is just so enticing. We should treat math as its own language and render it as we would any other human language with an unusual way of laying out characters.
It's an interesting idea. At what point, though, do we draw the line between what a character set (like Unicode) should handle and what should be handled by a higher-level layer? I'm thinking that things like boldness, italicisation, and superscripts aren't really the job of a character set.
Unicode only defines the codepoints for characters; it doesn't require anyone to actually make them look good. Since those characters were only included to represent mathematical texts where formatting needs to be preserved, it's unlikely anyone is spending much effort on making them look good as running text.
The bold and italic characters actually belong to the Mathematical Alphanumeric Symbols [0] block, so they're strictly meant for math notation rather than general formatting. The superscripts are part of the Spacing Modifier Letters [1] block, which is used for IPA. You'll also sometimes find other formatting quirks that are deprecated in Unicode and meant for compatibility purposes.
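For example, one codepoint from each block (taken straight from the code charts):

    𝐀  U+1D400  MATHEMATICAL BOLD CAPITAL A     (Mathematical Alphanumeric Symbols)
    𝐴  U+1D434  MATHEMATICAL ITALIC CAPITAL A   (Mathematical Alphanumeric Symbols)
    ʰ  U+02B0   MODIFIER LETTER SMALL H         (Spacing Modifier Letters)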
I'd say that if the formatting changes the meaning of the language, Unicode should support it. So if you are searching through text, any change to your query string that should constrain which text matches ought to be representable in Unicode. At a minimum, Unicode should support anything that affects the semantic equality of strings.
Hmmm, certainly Unicode ought to be able to represent mathematics as a script like any other. However, the complexity involved is non-trivial. To make things easier, whatever Unicode might do for math should have a mapping to and from TeX or MathJax. In any case, Unicode is rather complex as it is; I'm not sure I look forward to this extra level of complexity :(
> I like the idea of a 'sane' subset of LaTeX that is easy to publish to the web.
That's probably the only approach that really makes sense:
> During the past decade I was surprised to learn that the writing of programs for TeX and Metafont proved to be much more difficult than all the other things I had done (like proving theorems or writing books). The creation of good software demands a significantly higher standard of accuracy than those other things do, and it requires a longer attention span than other intellectual tasks.
Just a bit of historical correction. The article/post says:
"Ten years later, in 1978, his work bore fruit"
This gets things pretty wrong. He got the idea in 1977, and his estimate of "this will take 6 months" was pretty close, in that the initial version was finished sometime in 1978. It then took about another ten years to be "actually done". (Rewrite, add features, fix bugs, create Metafont, create WEB, etc...)
Classic TeX would be damn near useless in the age of Unicode, so you're looking at something like XeLaTeX or LuaTeX. The problem is that it's really easy to implement a really basic form of TeX, but unless you already planned for the really hard cases, maintaining your implementation is going to become intractable. TeX's real text typesetting is almost always woefully ignored, even though _everything_ has to typeset beautifully, not just math. And in modern versions of TeX, that has to happen without insane syntax: no dedicated macros just to get a Unicode character we can already "just write", none just for diacritics, or for something as simple as mixing two writing scripts that necessitate two entirely different fonts.
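Something like this is all it takes on a modern engine (a sketch, untested; the font names are placeholders for whatever is installed on your system):

    % compile with xelatex or lualatex; font names are placeholders
    \documentclass{article}
    \usepackage{fontspec}
    \setmainfont{Linux Libertine O}
    \newfontfamily\greekfont[Script=Greek]{GFS Didot}
    \begin{document}
    naïve, café --- typed directly, no dedicated diacritic macros.
    {\greekfont Ελληνικά} mixes in a second script and font inline.
    \end{document}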
As soon as you want to mention names of people, English text often requires Unicode characters. Looking up some examples, the first random paper I took from arXiv mentioned three surnames that needed Unicode; the second needed four, including the name of one of the authors herself.
FWIW my own name would need Unicode too (Κώστας Μιχαλόπουλος) but I always use its romanized form (Kostas Michalopoulos) in English. I think that is common when writing English text with names from languages that do not use the Latin (or a derived) alphabet.
An answer here would be way too long, but the short answer is "no". The technologies that were available at the time of TeX meant that TeX had to do all kinds of things that in today's world are bizarre.
TeX has seen a lot of improvements over the last 30 years, and modern TeX engines such as XeTeX and LuaTeX have removed a lot of the insane pain points that came with traditional TeX, which worked well only because there was literally nothing better at the time.
A modern TeX engine will let you just write what you want to write, using all of Unicode as your playground, using modern OpenType fonts, and with real vector graphics. None of those things can be done with the original TeX --- it's not just that they're hard, they're literally impossible without rewriting it from the ground up. Which is why we HAVE modern TeX engines: just because it worked doesn't mean it was good. It was merely the best available at the time.
There is no "you". If the idea is to make a thing for the web, the audience is everyone there, not that one guy who insist they will only ever use English.
I wonder whether this is the right approach. TeX itself is one of the most heavily documented programs in existence. Not only are its workings documented in detail in The TeXbook (and a host of other books by other authors, such as Eijkhout's TeX by Topic) but even the program itself has been written in a “literate programming” style, with pretty formatted source code (with profuse comments) available in print (Vol B of Computers and Typesetting) and as a PDF (http://texdoc.net/texmf-dist/doc/generic/knuth/tex/tex.pdf), there's a detailed history/retrospective and log of every change that went into the program (see Chapters 10 and 11 of the book Literate Programming, though the log without explanation is also available online http://texdoc.net/texmf-dist/doc/generic/knuth/errata/errorl...), and there are even 12 hours of video of Knuth talking about the internals of the program (https://www.youtube.com/watch?v=bbqY1mTwrj8&index=12&list=PL...).
So when the article says:
> To reproduce all of LaTeX in the browser is too much to ask
I wonder why? The file tex.web is less than 25000 lines long, much of it comments, so I'd estimate that TeX itself is only about 20000 sloc (in fact tangle on tex.web generates a Pascal file tex.p which is only 6115 lines long). This is not a lot IMO, and it would be a lot better to actually re-implement this, with additional support for things like getting the parse tree etc.
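For reference, producing that Pascal file is a one-liner on a stock TeX Live install (assuming tex.web is in the working directory):

    tangle tex.web    # emits tex.p plus the string pool tex.pool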
I was wondering recently if/how it would be possible to piggyback on LaTeX's gorgeous typesetting (the placement of the letters) to bring justified text to websites.
I want to do a PoC that absolutely positions all the letters of a basic document, as placed by TeX, for my screen size.
Are there any other solutions to document typesetting with latex-like features? TeX is very obtuse for someone who hasn't been using it for a long time.
A common solution is to use LaTeX, but to use it indirectly: write in Markdown and convert to PDF using Pandoc [1], which uses LaTeX in the background. This is (part of) the process used in RMarkdown [2], for example. That way, you get all the benefits of TeX and LaTeX but without most of the pain.
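A typical invocation looks like this (a sketch; paper.md is a placeholder name, and --pdf-engine requires pandoc 2.x, where it replaced --latex-engine):

    pandoc paper.md -o paper.pdf --pdf-engine=xelatex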
I've seen some people do org-mode -> TeX -> research paper. It's very impressive. I just wish there was something like that with a more GUI/polished feel.
I've been using org-mode and exporting it to HTML, then making an @media print style sheet and exporting the HTML/print CSS to PDF through PrinceXML.
It's been amazing. LaTeX equations are exported as PNGs (for the PDF export, because I don't think Prince does MathJax, but org-mode can export to MathJax). I have my bibliography with bibtex2html. And templating my PDFs becomes so much easier than with LaTeX: it's just HTML and CSS! My figures are numbered, captioned, and referenced throughout the text, same for tables. My table of contents is generated. Code is highlighted. And I have access to ditaa for ASCII flow charts and a bunch of other stuff (for making UML in ASCII with PNG export for the PDF, for example). It also handles Excel-like tables with formulae (it's possible to have Lisp formulae! So cool!) in text mode. And of course, you can plot your tables through gnuplot inside your org file; you tell it which columns and rows, the type of graph, etc. :)
It's also easy to include other org files, or to drop down to raw HTML for the export (rather than org-mode -> HTML) if need be (for a picture that spans over 2 pages, for instance).
Give it a try, you might like it ;) In the end it's just an org mode export to HTML to PDF with the print CSS media query. But it works remarkably well and you have all the org mode features.
Any particular reason why you don't use org-mode's latex export (org-latex-export-to-pdf / C-c C-e l p) directly? It will render math nicely, not as embedded images, etc.
It's really because of theming. I was trying to theme my LaTeX document, but I don't know TeX well. I do know CSS well, though. So theming my headers, my margins, my blockquotes, my images etc. is very easy in CSS. I have no idea how to achieve this easily with TeX.
For lightweight stuff, there's vanilla Markdown, but you have no control over formatting. For more serious work using markdown, you can try out Ulysses[0] or Scribus[1].
And, if you feel like spending an obscene amount of money, on the order of $10k, there's Arbortext APP[2]. (I don't know why this even exists?)
There was Lout[1], but it seems to be abandoned. I really liked it, especially the simplified syntax (compared to LaTeX). It was also Unicode-safe by design.
UNIX did exactly that for 40 years, until AT&T ripped troff out of standard UNIX installations.
Look into groff and possibly heirloom doctools. It's fairly difficult to learn and the default macro packages on most installations may be somewhat difficult to come to terms with/adjust for your own needs. You're definitely expected to learn basic troff macros to hack up a macro package if needed. See also: http://www.schaffter.ca/mom/ and https://utroff.org/
You might want to check out LyX. It is a GUI editor that generates beautiful TeX documents, but it is designed to be a user-friendly document processor rather than just a TeX GUI.
It's of course never going to be as good looking as MathJax or something like that -- but it may be more appropriate to be able to treat it as plain Unicode text in some cases.
For instance, it works in title fields across the web and search engines will understand it better than anything else.
There is not really a need to modify LaTeX at all to make it run in the browser. It already exists. Without modifying a single line of code, we have implemented a full browser-based port of LaTeX as part of our Browsix project, which makes it possible to run full, unmodified Unix applications inside the browser. See http://browsertex.org and http://browsix.org (and http://bpowers.net and https://jvilk.com/ and http://plasma.cs.umass.edu).
I actually did 'LaTeX in the browser' as a master's thesis in 2014, but never continued developing it afterwards, be it as an open-source project or with commercial intent in mind. Although I thought, at the time, it was at least on par with the few solutions that were out there, and it solved the task of instant updates and real-time collaborative work on a document pretty gracefully.
Some neat improvements would have been versioning and so on, but you know, I never made it that far after picking up a job. Kind of a shame...
I read the post but I still don't understand: is it possible to define new commands using \def or \newcommand? At first I thought these were what the author meant by "macro", but later he says
> We are exploring ways for users to define non-default environment behaviors in the browser. The same goes for macros used outside the dollar and double-dollar fences.
But I can't use \def or \newcommand to define things that appear inside dollar signs either.
Oh I see, thanks. For what it's worth, I would definitely include this example in the demo; it's basically the first thing I wanted to use. Given your pipeline, it makes sense that the \newcommand definitions themselves have to appear inside dollar signs (not just when they are used), but for people with a TeX background it's pretty unintuitive.
Also, you should definitely use \langle and \rangle in place of < and > for bra-ket notation :)
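Something like this, for instance (a sketch under the constraint described above that definitions live inside the fences; \bra and \ket are names I'm choosing for illustration):

    $\newcommand{\bra}[1]{\langle #1 |}$
    $\newcommand{\ket}[1]{| #1 \rangle}$
    Then $\bra{\phi} A \ket{\psi}$ gets proper angle brackets
    instead of less-than/greater-than signs.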
I love the output of LaTeX, but the language itself (and its dependencies and packages) is an absolute horror show.
I've never understood how people can bear to learn it: writing it is painful, its tooling is abysmal, and it rarely seems to work except on the machine of the person who wrote it.
> it rarely seems to work except on the machine of the person who wrote it.
I and many others edit the same documents at the university where I work, without significant issues. Distributions like TeXLive (https://www.tug.org/texlive/) provide a consistent all-inclusive cross-platform solution.
TeXStudio would be a perfect example of its abysmal tooling. It’s better than the CLI tools but it’s an awful editor and highlights how incompatible with a good writing experience LaTeX is.
Yes, many people produce good work in it - its output is fantastic, after all - but an editor that would have been a substandard user experience in the 90s is the best LaTeX has in tooling.
Can you try to phrase that more precisely/constructively by including a reason why it is "awful" or give an example of "good tooling"?
As far as editing goes, latexmk, syntax highlighting and good shortcuts are all I ever use and am perfectly happy with (emacs+auctex). It is a different paradigm than WYSIWYG, but different does not say anything about good or bad.
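In practice the whole build loop is one command (main.tex being a placeholder name):

    latexmk -pdf -pvc main.tex    # -pvc watches the file and recompiles on save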
Now, writing new LaTeX classes, I agree: that is very unintuitive and would greatly benefit from simplification, templates and tools.
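Even a minimal class is mostly ritual boilerplate (a sketch; mynotes.cls is a made-up name):

    % mynotes.cls --- a hypothetical minimal class wrapping article
    \NeedsTeXFormat{LaTeX2e}
    \ProvidesClass{mynotes}[2019/01/01 v0.1 minimal notes class]
    \LoadClass[11pt]{article}
    \RequirePackage{geometry}
    \geometry{margin=2.5cm}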
I could go into a long, detailed, breakdown of how bad TeXStudio is but, frankly, if they want UI/UX work they should pay for it. Which they clearly don’t.
It’s... decent enough in the pack of “open source UI” but that isn’t a high bar.
Here’s the thing about that (oft repeated) line about WYSIWYG vs WYSIWYW: it’s bullshit.
There’s no justification for it other than the deficiencies of the tooling and tool chain. It’s an excuse.
Also, it should be considered that it's impossible to make breaking changes in the LaTeX language; otherwise you lose the ability to compile a paper from 30 years ago.
But if you're trying to do something simple, I would say go for pandoc and use whatever format you're comfortable, then convert it to TeX: https://pandoc.org/
Because it looks much better. If you want proper kerning, ligatures, spacing, layouts etc. just reusing all the work that has been put into TeX is much better than trying to catch up.
[1] https://tex.stackexchange.com/questions/4201/is-there-a-bnf-...
[2] https://www.overleaf.com/blog/81 --- my first attempt at rich text on Overleaf, many years ago
[3] http://dlmf.nist.gov/LaTeXML/manual/customization/customizat...
[4] https://www.overleaf.com/blog/523-a-data-driven-approach-to-...