Rosio Pavoris: a blog

Towards a better BBCode

Everyone knows BBCode is a pain to work with, and while WordPress supports limited HTML in user comments, it should be obvious that HTML is no better. The needless repetition of tag names in SGML-based languages and their insistence on the proper nesting of tags make them all hideous and error-prone. We can do better.
The discussions of learned societies on the subject have been less than satisfactory, so I decided to just implement my own mark-up language, based on the venerable S-expression:

{b This} is {i {u expert} {o mark-up}}.

This will turn into:

This is expert mark-up.

The immediate effect is that nesting problems and text redundancy disappear. The syntax also lends itself to easy function composition:

{b.i.o.u EXPERT}

EXPERT

Finally, for this first version (see note 0 below), we also support function iteration:

{sup*3 To the moon}{sub*3 and back.}

To the moonand back.

It goes without saying that this can be combined with function composition in arbitrarily complex expressions, with the iteration operator having a higher precedence than the function composition operator.
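Because iteration binds more tightly than composition, a parser can split a function expression on `.` first and only then apply any `*n` suffix to each piece. Here is a minimal Python sketch of that expansion. The name `expand_tags` is hypothetical, and it assumes iteration counts are plain positive decimal integers.

```python
def expand_tags(expr: str) -> list[str]:
    """Expand a function expression such as "b.i*2.u" into a flat list of tags.

    Composition ('.') has lower precedence than iteration ('*'), so the
    expression is split on '.' first, and each piece may carry a '*n' suffix.
    The first tag in the result is the outermost one.
    """
    tags: list[str] = []
    for piece in expr.split("."):
        name, star, count = piece.partition("*")
        if not name:
            raise ValueError(f"empty function name in {expr!r}")
        if star:
            # Only positive decimal counts are accepted in this sketch.
            if not count.isdecimal() or int(count) < 1:
                raise ValueError(f"bad iteration count in {piece!r}")
            tags.extend([name] * int(count))
        else:
            tags.append(name)
    return tags
```

For example, `expand_tags("b.i*2.u")` gives `['b', 'i', 'i', 'u']`: the `*2` applies only to `i`, never to the whole composition.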

I’ve elected to use curly braces rather than the more typical parentheses, because curly braces barely see any use in natural language, which is where this mark-up would generally be used. If you do need literal curly braces, you can escape them with a backslash (and if you need a literal \{, you can escape your backslash with a backslash).
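The escaping rule is small enough to handle directly in the tokenizer. Below is a Python sketch (the `tokenize` name and token shapes are illustrative). The post doesn't say what happens to a backslash before any other character, so this sketch assumes it is kept as literal text.

```python
def tokenize(src: str) -> list[tuple[str, str]]:
    """Split SexpCode source into ("open", "{"), ("close", "}") and ("text", ...) tokens.

    A backslash escapes the next character when that character is '{', '}'
    or another backslash; any other backslash is kept as literal text.
    """
    tokens: list[tuple[str, str]] = []
    buf: list[str] = []

    def flush() -> None:
        # Emit any pending plain text as a single token.
        if buf:
            tokens.append(("text", "".join(buf)))
            buf.clear()

    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src) and src[i + 1] in "{}\\":
            buf.append(src[i + 1])  # the escaped character becomes plain text
            i += 2
            continue
        if ch == "{":
            flush()
            tokens.append(("open", ch))
        elif ch == "}":
            flush()
            tokens.append(("close", ch))
        else:
            buf.append(ch)
        i += 1
    flush()
    return tokens
```

Under this rule, `\\{b x}` yields a literal backslash followed by a real `{b x}` function, while `\{b x\}` is just the text `{b x}`.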

As a proof of concept, and because I eat my own dog food, I’ve written and enabled a WordPress plugin that adds SexpCode support to blog comments. For sanity, iteration doesn’t go beyond *3. Supported tags are b, i, u, s, o, sub, sup, code, spoiler, quote, blockquote, and m. If you want to use it yourself, adding more tags or changing their definitions should be straightforward.
If you use an unsupported or empty tag, or your braces are unbalanced (except for closing braces at the end), the plugin will assume you’re actually trying to post C-like code and disable SexpCode for your comment.
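Taken together, the rules above could be implemented roughly as follows. This is a Python sketch, not the plugin's actual code. The HTML emitted for each tag is a guess, iteration counts above 3 are clamped rather than rejected (an assumption), and the sketch is strict about braces: it does not implement the exception for closing braces at the end, since the post doesn't spell out its exact behaviour.

```python
import html
import re
from typing import Optional

# Hypothetical HTML for each supported tag; the post names the tags, not their output.
TAGS = {
    "b": ("<b>", "</b>"), "i": ("<i>", "</i>"), "u": ("<u>", "</u>"),
    "s": ("<s>", "</s>"),
    "o": ('<span style="text-decoration: overline">', "</span>"),
    "sub": ("<sub>", "</sub>"), "sup": ("<sup>", "</sup>"),
    "code": ("<code>", "</code>"),
    "spoiler": ('<span class="spoiler">', "</span>"),
    "quote": ("<q>", "</q>"),
    "blockquote": ("<blockquote>", "</blockquote>"),
    "m": ("<pre>", "</pre>"),
}
MAX_ITERATION = 3  # assumption: larger counts are clamped to this

# A function expression runs from just after '{' up to the first whitespace.
FUNC = re.compile(r"([^\s{}\\]*)\s")


class NotSexpCode(Exception):
    """Signals that the comment should be shown as plain text instead."""


def expand(expr: str) -> list[str]:
    tags: list[str] = []
    for piece in expr.split("."):
        name, star, count = piece.partition("*")
        if name not in TAGS:  # also catches the empty tag ""
            raise NotSexpCode(f"unsupported tag {name!r}")
        if star and (not count.isdecimal() or int(count) < 1):
            raise NotSexpCode(f"bad iteration count in {piece!r}")
        n = min(int(count), MAX_ITERATION) if star else 1
        tags.extend([name] * n)
    return tags


def _parse(src: str, i: int, depth: int) -> tuple[str, int]:
    out: list[str] = []
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src) and src[i + 1] in "{}\\":
            out.append(html.escape(src[i + 1]))  # escaped brace or backslash
            i += 2
        elif ch == "{":
            m = FUNC.match(src, i + 1)
            if m is None:
                raise NotSexpCode("malformed function")
            tags = expand(m.group(1))
            inner, i = _parse(src, m.end(), depth + 1)
            out.append("".join(TAGS[t][0] for t in tags) + inner
                       + "".join(TAGS[t][1] for t in reversed(tags)))
        elif ch == "}":
            if depth == 0:
                raise NotSexpCode("unmatched closing brace")
            return "".join(out), i + 1
        else:
            out.append(html.escape(ch))
            i += 1
    if depth > 0:
        raise NotSexpCode("unclosed brace")
    return "".join(out), i


def render(src: str) -> Optional[str]:
    """Return HTML for a comment, or None to leave SexpCode disabled for it."""
    try:
        return _parse(src, 0, 0)[0]
    except NotSexpCode:
        return None
```

With this approach, something like `int main() { return 0; }` hits an empty tag at `{ return` and is left untouched, which is the fallback behaviour described above.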

Ladies and gentlemen, BBCode was our COBOL. This is our Lisp.

Edit: People who want to implement this themselves should be following this document rather than this post.

Edit again: Play with it!

Edit again again: More implementations:

Know of another implementation (SexpCode+ or SexpCode)? Let me know!


0 Future versions of the language are expected to add support for function arguments (for things like url, img, and colour) and the ability to define aliases (for example, {define exp b.i.o.u}, which would let you use a new exp function as if it were b.i.o.u).
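An alias of this kind amounts to substitution inside a function expression. Here is a hypothetical Python sketch of how `{define exp b.i.o.u}` might be handled. It ignores iteration for brevity. The `define` and `resolve` names and the self-reference check are illustrative; none of them come from any released version.

```python
def define(directive: str, aliases: dict[str, str]) -> None:
    """Record an alias from a directive such as "define exp b.i.o.u"."""
    keyword, name, expr = directive.split()
    if keyword != "define":
        raise ValueError(f"not a define directive: {directive!r}")
    aliases[name] = expr


def resolve(expr: str, aliases: dict[str, str],
            seen: frozenset[str] = frozenset()) -> list[str]:
    """Expand a composition like "exp.s" into plain tag names, following aliases."""
    tags: list[str] = []
    for name in expr.split("."):
        if name in aliases:
            if name in seen:  # guard against e.g. {define loop loop.b}
                raise ValueError(f"alias {name!r} refers to itself")
            tags.extend(resolve(aliases[name], aliases, seen | {name}))
        else:
            tags.append(name)
    return tags
```

Because aliases are expanded before rendering, `exp` can be composed like any built-in: `exp.s` becomes `b.i.o.u.s`.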

49 Comments

  1. kitchen said,

    To the moonand back?

  2. Anonymous said,

    I hate you for making yet another idiotic non-intuitive markup syntax. What the hell exactly is wrong with Markdown? People *write like that anyway* when no markup is available.

  3. Cairnarvon said,

    Just look at any website that actually uses Markdown to see exactly what’s wrong with it. For all of BBCode’s faults, at least it doesn’t bleed styles all over innocent text because someone thought it would be a good idea to make very common symbols syntactically significant.
    People may write *like* _this_ when no mark-up is available, but they also write_like_this when they don’t want mark-up at all but are just referring to, for example, filenames or URLs.

    Application of style should be deliberate and unambiguous. Markdown fails on both counts.

  4. agz said,

    I like it!!

    I’m just learning LaTeX and was craving something exactly like this!

    Hope it will evolve!

  5. Mike Samuel said,

    I like the syntax and this seems a great alternative to HTML for user styled content.

    One of the problems with user-styled content is when it can bleed into the page e.g. via links with javascript: URLs.

    SexpCode makes it easy to tell whether a block is balanced by counting brackets, but it would be nice to have a mechanism to mark out things that are third-party content and so shouldn’t have any ability to manipulate the same-origin.

    If one could put {sandbox …} around balanced SexpCode and be sure that the result did not have any authority over the embedding origin, then it would be a very attractive alternative to HTML for people considering doing error-prone sanitization.

    Any link appearing inside a {sandbox …} would have to be an absolute non-http/non-https link that is not to the same origin — so could not possibly load content containing flash which could be tricked into communicating with the embedding page.

  6. UMH Memesmith!gNlkr4vCuc said,

    I LOVE YOU! I LOVE YOUR POST! I READ IT 5 TIMES! KEEP POSTING!

  7. Kadin said,

    Needs strikethrough.

  8. bobmandan said,

    needs blink.

  9. Eli Barzilay said,

    Have a look at the Scribble syntax in PLT Scheme: http://docs.plt-scheme.org/scribble/ — it’s nearly the same as what you have, and it *is* just an alternative syntax for S-expressions, which means that it interacts very nicely with the language. And yes, even as a pure markup — without the language — it is still extremely useful and deals with exotic needs like unquoting, or different quotation delimiters so it is easy to write documentation for the system inside the system. (Write a manual for your language in your language, and this point becomes painfully obvious when you’re lost in a sea of backslashes..)

  10. Haxus the Lesser said,

    I was against this when you first posted it, but I’ve come around to it. If I am still awake in a few hours’ time, I’ll try implementing it for myself.

  11. Saving the Band » Comments Now Support SexpCode said,

    [...] Crolla has specified a replacement for BBCode (that atrocious square-bracketed markup language for bulletin boards) [...]

  12. Ivan Lazar Miljenovic said,

    With regards to failures of Markdown, this is in part up to the implementation: pandoc (http://johnmacfarlane.net/pandoc/) is smart enough to realise that underscores within text shouldn’t be converted to italics, etc.

  13. Haxus the Lesser said,

    Hey Xarn, what do you think the chances are that we can convince Mr Vacbob to use this?

  14. kerkeslager said,

    A few suggestions while this new language is still young and can be changed:

    1. Brackets ([ and ]) don’t require a shift like braces, and are pretty much as unused in natural language. Could I persuade you to use brackets and give everyone’s pinkies a break?

    2. In CSS, the period is used to indicate classes. Something similar to your function composition is done with commas. So it makes more sense to use commas where you are using periods, and save periods for use with classes.

  15. Amos Robinson said,

    where’s the defmacro?

  16. Cairnarvon said,

    Eli Barzilay said,
    Write a manual for your language in your language, and this point becomes painfully obvious when you’re lost in a sea of backslashes..

    We were considering a syntax for something similar to (some) BBCode’s [#][/#] earlier. Bun suggested {. text .}, where the period can be any non-alphanumeric character or string, but I don’t think we reached a consensus yet.
    I’ll take a look at Scribble later. I know a lot of people have tried S-expression-based mark-up languages before, but in good tradition, I haven’t looked at any of them.

    kerkeslager said,
    1. Brackets ([ and ]) don’t require a shift like braces, and are pretty much as unused in natural language. Could I persuade you to use brackets and give everyone’s pinkies a break?

    On my keyboard neither requires shift, but both require Alt Gr. Maybe it would be alright to allow both. Some other guy on Czechia had a variant which allowed {<[(, but that’s probably overkill.

    2. In CSS, the period is used to indicate classes. Something similar to your function composition is done with commas. So it makes more sense to use commas where you are using periods, and save periods for use with classes.

    If my keyboard had it I’d use ∘ instead. A period is as far as I’m willing to compromise.

    Amos Robinson said,
    where’s the defmacro?

    Coming up!

    As a general warning to anyone holding their breath, I should probably point out that this started as a way to troll #bbcode and associates, and so far isn’t intended as a more general language than is needed for our art. Since we’re nothing if not unemployed, er, driven by a need to better the world, though, it will probably get seriously out of hand before it dies.

  17. richardus!Ep8pui8Vw2 said,

    I have a job interview in the morning Xarn.

  18. Peter Woo said,

    “People may write *like* _this_ when no mark-up is available, but they also write_like_this when they don’t want mark-up at all but are just referring to, for example, filenames or URLs.”

    It’s also very easy to confuse *italics* and **bold**. And hyperlinks: the anchor text goes in brackets and the URL in parentheses, like [google](http://google.com). First, this is difficult to remember (because it’s arbitrary), and second, 50% of the time people link to Wikipedia the link is broken, because Wikipedia uses parentheses in many of its URLs. The syntax for writing a list also gets me sometimes. And the italics syntax is overloaded with both asterisks and UNDERSCORES, of all things?!

    Personally, I think the following would be perfectly satisfactory: **bold**, __underlined__, //italicized//, ``teletype``, [link]@[url], 2^[superscript], a_[subscript]. Fairly unambiguous, except perhaps for the teletype, and by using two characters for each simple wrap we avoid most (if not all) would-be collisions with the underlying text.

  19. kerkeslager said,

    > On my keyboard neither requires shift, but both require Alt Gr. Maybe it would be alright to allow both. Some other guy on Czechia had a variant which allowed {<[(, but that’s probably overkill.

    Ah, I'll take this as an incentive to increase my knowledge of other keyboard layouts. It might be interesting to see an analysis of what percentage of users use what keyboard layout.

  20. Witek Baryluk said,

    Please look into ehtml, used by the YAWS web server to express HTML directly in Erlang. It is quite similar. Here is an example: http://yaws.hyber.org/code.yaws?file=/dynamic.yaws

  21. Alex said,

    Reminds me a little bit of RTF, which is in turn based on LaTeX. I like it.

  22. uxbot said,

    I wrote a similar s-exp thing in 4k of plt scheme … it provides all the usual inline elements plus a few block elements (ol ul dl), and with lazy evaluation can guarantee valid xhtml by ensuring that block elements are not placed within inline elements. error: (em (ol lulz ??? profit))
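The block-inside-inline check described here is easy to express over an s-expression tree. The following Python sketch is not uxbot's PLT Scheme code; the element sets and the tuple representation are assumptions chosen for illustration.

```python
# Assumed element sets; the comment only mentions ol, ul and dl explicitly.
BLOCK = {"ol", "ul", "dl", "li", "dt", "dd", "blockquote", "p", "pre"}
INLINE = {"em", "strong", "code", "a", "sub", "sup"}


def check(node, inside_inline: bool = False) -> None:
    """Raise ValueError if a block element appears anywhere inside an inline one.

    A node is either a string or a tuple (tag, *children), mirroring an
    s-expression such as (em (ol lulz ??? profit)).
    """
    if isinstance(node, str):
        return
    tag, *children = node
    if tag in BLOCK and inside_inline:
        raise ValueError(f"block element {tag!r} inside an inline element")
    for child in children:
        check(child, inside_inline or tag in INLINE)
```

Here `check(("em", ("ol", "lulz", "???", "profit")))` raises, while an `em` nested inside an `li` inside an `ol` passes.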

  23. Anonymous said,

    To the moonand back.

  24. Cairnarvon said,

    No plugin can withstand WordPress’ overzealous and brain-damaged HTML mangling, unfortunately. Apparently <sup><sup><sup>this</sup></sup></sup> is considered invalid XHTML.

    Edit: Who says source code is a poor substitute for documentation? Issue fixed.

  25. judah said,

    Hmm, what would be better is just having a rich text editor. Consider the Text Layout Framework. This example is overkill, but you could make a lightweight version of it for your users.

  26. judah said,

    link, http://labs.adobe.com/technologies/textlayout/demos/
    again… overkill…

  27. Anonymous said,

    {sup*12/}

  28. bunnyhero said,

    oooh, shiny! very geeky but kind of neat.

    i hate markup that converts *paired asterisks* to bold text (this includes markdown, as well as gmail’s google talk client), because a lot of people i know like to use asterisks to indicate action, like *poke* or *hug*. bolding those doesn’t make sense (to me, anyway).

  29. Anonymous said,

    Suggestion: {code {f^n x}}, rather than {code {f*n x}}, to better match standard mathematical notation.

  30. Anonymous said,

    Suggestion: x}, rather than x}, to better match standard mathematical notation.

  31. Anonymous said,

    Suggestion: {code { f^n x }}, rather than {code { f*n x }}, to better match standard mathematical notation.

  32. Anonymous said,

    Oh well; that’ll do.

  33. Cairnarvon said,

    Good thing I get the unprocessed comments e-mailed to me.
    code takes one argument and (unlike ABBC’s [code]) does not imply verbatim. You probably want to use {{code SexpCode}.verbatim text} or {m.verbatim text}.

    Suggestion: {f^n x}, rather than {f*n x}, to better match standard mathematical notation.

  34. Cairnarvon said,

    And to actually answer your question: that probably makes sense as an alternative syntax, and is easy to implement. I’ll add it.
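Treating `^` as a synonym for `*` only affects the part of a parser that reads iteration suffixes. A Python sketch, assuming tag names consist of lowercase ASCII letters (the `expand` name is hypothetical):

```python
import re

# One composed piece: a tag name, optionally followed by '*n' or '^n'.
PIECE = re.compile(r"(?P<name>[a-z]+)(?:[*^](?P<count>[0-9]+))?")


def expand(expr: str) -> list[str]:
    """Expand e.g. "b.sup^2" into ['b', 'sup', 'sup']; '^' and '*' mean the same."""
    tags: list[str] = []
    for piece in expr.split("."):
        m = PIECE.fullmatch(piece)
        if m is None:
            raise ValueError(f"malformed piece {piece!r}")
        count = int(m["count"]) if m["count"] else 1
        tags.extend([m["name"]] * count)
    return tags
```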

  35. Anonymous said,

    Some other guy on Czechia had a variant which allowed {<[(

    Wait, was my post just acknowledged by Xarn?

    And as for any {defmacro …}s, I don't think that it's really needed. There are only so many additional tags you might want to use (expert, map, foldr/l), and those could be included in the implementation from the start.

  36. Anonymous said,

    Wait, so code tags are inline, while m tags force a newline? I think that’s counterintuitive.

  37. Cairnarvon said,

    <pre> is set to be a block-level tag here; nothing to do with SexpCode. I did that a while ago; I don’t remember why. I should change that.

    SexpCode+ has define now, so the defmacro discussion is probably moot for the time being.

    Edit: Actually I don’t know anything about HTML or CSS. Anyway, fixed.

  38. Anonymous said,

    Hi, I proposed a revision to your SexpCode in my URL. What do you think about it and why?

  39. Cairnarvon said,

    It’s a stupid idea, for the reasons given in the thread and because it doesn’t actually improve anything.

    SexpCode uses the period because that’s been used for function composition in mathematics for years; it’s not just an arbitrary delimiter, and trying to make it one just shows you haven’t read your SICP.

  40. SHiNKiROU Blog » BBCode Problems (and Solutions) said,

    [...] SExpCode: S-expression-based markup language. [...]

  41. SHiNKiROU!wSaCDPDEl2 said,

    (I’m the one who posted about BBCode problems)

    To make function arguments work, TeX-based syntax should be used.

    \b{text text text}
    \b.i{bold and italic}
    \url(http://www.google.com){Google}
    \url(http://google.com/). (. is self-closing)

    and add heredoc syntax and Perl qw// syntax so you can choose your own delimiters.

  42. Cairnarvon said,

    Function arguments already work quite nicely (if {url http://example.com/ Text} feels too ambiguous, you can always use {{url http://example.com/} Text}), and it already has a heredoc syntax, with Bun’s alternative verbatim syntax. The standard is much more featureful than the original post suggests.

    As for alternative delimiters, meh. There’s no clean way to do it, and taking cues from Perl has never done anyone any favours in the past.
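To illustrate how `{url http://example.com/ Text}` and `{{url http://example.com/} Text}` can produce the same link, here is a Python sketch of a hypothetical `url` handler that works on the text between the outermost braces. The helper names and the http/https whitelist are assumptions, the latter echoing Mike Samuel's concern about javascript: URLs. Nested tags inside the link text are not handled.

```python
import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: reject javascript: and friends


def link(href: str, text: str) -> str:
    """Build an anchor element, refusing URLs with a non-whitelisted scheme."""
    if urlsplit(href).scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError(f"refusing to link to {href!r}")
    return f'<a href="{html.escape(href)}">{html.escape(text)}</a>'


def render_url(body: str) -> str:
    """Render the inside of either {url URL Text} or {{url URL} Text}.

    In the first form the URL ends at the first space; in the second, the
    inner braces make the argument's extent explicit.
    """
    if body.startswith("{"):
        close = body.index("}")
        head, text = body[1:close], body[close + 1:].strip()
        name, _, href = head.partition(" ")
    else:
        name, _, rest = body.partition(" ")
        href, _, text = rest.partition(" ")
    if name != "url":
        raise ValueError(f"not a url function: {name!r}")
    return link(href.strip(), text)
```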

  43. Sven said,

    This markup may be great from a programmer’s point of view, but from a user’s point of view it’s terrible if no proper explanation is given. For a programmer it’s logical to indicate the start and end of a block with curly braces, but ordinary users rarely type them and won’t understand how it works. They won’t see the logic of an opening and a closing brace.

  44. Cairnarvon said,

    That’s true for every single mark-up language out there. Yes, they all confuse at least some users, but absolutely nothing is gained by only catering to the lowest common denominator.

  45. Anonymous said,

    Your BBCode substitute concept looks remarkably like Curl code. You might want to check that out:
    http://en.wikipedia.org/wiki/Curl_%28programming_language%29#Curl_as_lightweight_markup
    http://www.curl.com/

  46. Anonymous said,

    This is nice and all but isn’t this just another way of saying “I haven’t read my SICP”?

  47. richardus!Ep8pui8Vw2 said,

    http://sprunge.us/DjII

  48. Serg. said,

    Nice idea to have an alternative mark-up language ;)

    I think it can be useful, but I don’t think a lot of people will actually use it.

  49. Andrew McGlinchey Wesleyan said,

    I simply couldn’t leave your web site prior to suggesting that I actually loved the standard information an individual provide for your visitors?
    Is gonna be back often in order to check up on new posts
