Towards a better BBCode
Everyone knows BBCode is a pain to work with, and while WordPress supports limited HTML in user comments, it should be obvious that HTML is no better. The unnecessary repetition of SGML-based languages and their insistence on the proper nesting of tags make them all hideous and needlessly error-prone. We can do better.
The discussions of learned societies on the subject have been less than satisfactory, so I decided to just implement my own mark-up language, based on the venerable S-expression:
{b This} is {i {u expert} {o mark-up}}.
This will turn into:
This is expert mark-up.
The immediate effect is that nesting problems and text redundancy disappear. The syntax also lends itself to easy function composition:
{b.i.o.u EXPERT}
EXPERT
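As a rough illustration of what composition means here, the following Python sketch expands a dotted function expression into nested tags. The helper name compose and the tag-to-HTML mapping are assumptions made for this example, not the plugin's actual code; in particular, the HTML chosen for o (read here as overline) is a guess.

```python
# Hypothetical mapping from SexpCode function names to HTML wrappers.
# These definitions are assumptions; the real plugin may differ.
TAGS = {
    "b": ("<b>", "</b>"),
    "i": ("<i>", "</i>"),
    "u": ("<u>", "</u>"),
    "o": ('<span style="text-decoration: overline">', "</span>"),
}


def compose(expr, text):
    """Apply a dotted composition such as 'b.i.o.u' to text.

    As in mathematics, the rightmost function is applied first,
    so 'b.i' wraps the text in <i> and then in <b>.
    """
    for name in reversed(expr.split(".")):
        open_tag, close_tag = TAGS[name]
        text = open_tag + text + close_tag
    return text


print(compose("b.i", "EXPERT"))  # <b><i>EXPERT</i></b>
```

Under this reading, {b.i.o.u EXPERT} is simply shorthand for {b {i {o {u EXPERT}}}}.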
Finally, for this first version,0 we also support function iteration:
{sup*3 To the moon}{sub*3 and back.}
To the moonand back.
It goes without saying this can be combined with function composition in arbitrarily complex expressions, with the iteration operator having a higher precedence than the function composition operator.
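One way to picture the precedence rule is the sketch below, which flattens a function expression into the list of functions to apply. The name expand is hypothetical, and this is only one possible reading of the rule, not the reference implementation.

```python
def expand(expr):
    """Expand 'b.sup*3' into ['b', 'sup', 'sup', 'sup'].

    Splitting on '.' first and handling '*' inside each piece gives
    iteration the higher precedence: 'b.sup*3' means b . (sup*3),
    not (b.sup)*3.
    """
    functions = []
    for piece in expr.split("."):
        name, star, count = piece.partition("*")
        repeat = int(count) if star else 1
        functions.extend([name] * repeat)
    return functions


print(expand("b.sup*3"))  # ['b', 'sup', 'sup', 'sup']
```

If composition bound tighter instead, the same expression would yield b, sup, b, sup, b, sup, which is a different nesting.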
I’ve elected to use curly braces rather than the more typical parentheses, because curly braces barely see any use in natural language, which is where this mark-up would generally be used. If you do need literal curly braces, you can escape them with a backslash (and if you need a literal \{, you can escape your backslash with a backslash).
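The escaping rule can be sketched as a small tokenizer. This is a minimal illustration with a hypothetical function name and token format; how a backslash before any other character is treated is not specified in the post, so keeping it as literal text is an assumption.

```python
def tokenize(source):
    """Split source into ('open',), ('close',) and ('text', s) tokens."""
    tokens = []
    buffer = []

    def flush():
        # Emit any pending literal text as a single token.
        if buffer:
            tokens.append(("text", "".join(buffer)))
            buffer.clear()

    i = 0
    while i < len(source):
        ch = source[i]
        # A backslash before '{', '}' or '\' makes that character literal.
        if ch == "\\" and i + 1 < len(source) and source[i + 1] in "{}\\":
            buffer.append(source[i + 1])
            i += 2
            continue
        if ch == "{":
            flush()
            tokens.append(("open",))
        elif ch == "}":
            flush()
            tokens.append(("close",))
        else:
            # Assumption: a lone backslash elsewhere is ordinary text.
            buffer.append(ch)
        i += 1
    flush()
    return tokens


print(tokenize(r"a \{b\} {c}"))
# [('text', 'a {b} '), ('open',), ('text', 'c'), ('close',)]
```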
As a proof of concept, and because I eat my own dog food, I’ve written (and enabled) a WordPress plugin that enables this SexpCode in blog comments. For sanity, iteration doesn’t go beyond *3. Supported tags are b, i, u, s, o, sub, sup, code, spoiler, quote, blockquote, and m. If you want to use it yourself, adding more tags or changing their definitions should be straightforward.
Using an unsupported or empty tag, or having unbalanced braces (except for closing braces at the end), makes the plugin assume you’re actually trying to post C-like code, and disables SexpCode for your comment.
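A check along these lines might look like the sketch below. It is not the plugin's code: the function name is hypothetical, the exception for braces at the end is read here as "closing braces missing at the end are tolerated", and neither the *3 iteration cap nor nested function expressions are modelled.

```python
# Tag list taken from the post; everything else here is an assumption.
SUPPORTED = {"b", "i", "u", "s", "o", "sub", "sup", "code",
             "spoiler", "quote", "blockquote", "m"}


def sexpcode_enabled(comment):
    """Return False if the comment should be left unprocessed."""
    depth = 0
    i = 0
    while i < len(comment):
        ch = comment[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "{":
            depth += 1
            # The function expression runs up to the next whitespace or brace.
            j = i + 1
            while (j < len(comment) and not comment[j].isspace()
                   and comment[j] not in "{}"):
                j += 1
            for piece in comment[i + 1:j].split("."):
                name = piece.partition("*")[0]
                if name not in SUPPORTED:  # also catches the empty tag
                    return False
            i = j
            continue
        if ch == "}":
            depth -= 1
            if depth < 0:  # stray closing brace
                return False
        i += 1
    # depth > 0 means closing braces are missing at the end; tolerated here.
    return True


print(sexpcode_enabled("{b This} is {i {u expert} {o mark-up}}."))  # True
print(sexpcode_enabled("if (x) { return; }"))                       # False
```

The second call fails because the brace is followed by a space, which this sketch treats as an empty tag, so C-like code is left alone.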
Ladies and gentlemen, BBCode was our COBOL. This is our Lisp.
Edit: People who want to implement this themselves should be following this document rather than this post.
Edit again: Play with it!
Edit again again: More implementations:
- sexpcode, a CLI utility and a C library
- sexpcode.py, Python bindings for that C library
Know of another implementation (SexpCode+ or SexpCode−)? Let me know!
0 Future versions of the language are expected to add support for function arguments (for things like url, img, and colour) and the ability to define aliases (for example, {define exp b.i.o.u}, which would let you use a new exp function as if it were b.i.o.u).
kitchen said,
May 25th, 2010 at 6:47 pm
To the moonand back?
Anonymous said,
May 25th, 2010 at 8:51 pm
I hate you for making yet another idiotic non-intuitive markup syntax. What the hell exactly is wrong with Markdown? People *write like that anyway* when no markup is available.
Cairnarvon said,
May 25th, 2010 at 8:59 pm
Just look at any website that actually uses Markdown to see exactly what’s wrong with it. For all of BBCode’s faults, at least it doesn’t bleed styles all over innocent text because someone thought it would be a good idea to make very common symbols syntactically significant.
People may write *like* _this_ when no mark-up is available, but they also write_like_this when they don’t want mark-up at all but are just referring to, for example, filenames or URLs.
Application of style should be deliberate and unambiguous. Markdown fails on both counts.
agz said,
May 27th, 2010 at 1:10 am
I like it!!
I’m just learning LaTeX and have been craving something exactly like this!
Hope it will evolve!
Mike Samuel said,
May 27th, 2010 at 1:13 am
I like the syntax and this seems a great alternative to HTML for user styled content.
One of the problems with user-styled content is when it can bleed into the page e.g. via links with javascript: URLs.
SexpCode makes it easy to tell whether a block is balanced by counting brackets, but it would be nice to have a mechanism to mark out things that are third-party content and so shouldn’t have any ability to manipulate the same-origin.
If one could put {sandbox …} around balanced SexpCode and be sure that the result did not have any authority over the embedding origin, then it would be a very attractive alternative to HTML to people considering doing error prone sanitization.
Any link appearing inside a {sandbox …} would have to be an absolute non-http/non-https link that is not to the same origin — so could not possibly load content containing flash which could be tricked into communicating with the embedding page.
UMH Memesmith!gNlkr4vCuc said,
May 27th, 2010 at 1:59 am
I LOVE YOU! I LOVE YOUR POST! I READ IT 5 TIMES! KEEP POSTING!
Kadin said,
May 27th, 2010 at 2:16 am
Needs strikethrough.
bobmandan said,
May 27th, 2010 at 2:56 am
needs blink.
Eli Barzilay said,
May 27th, 2010 at 2:58 am
Have a look at the Scribble syntax in PLT Scheme: http://docs.plt-scheme.org/scribble/ — it’s nearly the same as what you have, and it *is* just an alternative syntax for S-expressions, which means that it interacts very nicely with the language. And yes, even as a pure markup — without the language — it is still extremely useful and deals with exotic needs like unquoting, or different quotation delimiters so it is easy to write documentation for the system inside the system. (Write a manual for your language in your language, and this point becomes painfully obvious when you’re lost in a sea of backslashes..)
Haxus the Lesser said,
May 27th, 2010 at 3:04 am
I was against this when you first posted it, but I’ve come around to it. If I am still awake in a few hours time, I’ll try implementing it for myself.
Saving the Band » Comments Now Support SexpCode said,
May 27th, 2010 at 3:18 am
[...] Crolla has specified a replacement for BBCode (that atrocious square-bracketed markup language for bulletin boards) [...]
Ivan Lazar Miljenovic said,
May 27th, 2010 at 3:20 am
With regards to failures of Markdown, this is in part up to the implementation: pandoc (http://johnmacfarlane.net/pandoc/) is smart enough to realise that underscores within text shouldn’t be converted to italics, etc.
Haxus the Lesser said,
May 27th, 2010 at 3:35 am
Hey Xarn, what do you think the chances are that we can convince Mr Vacbob to use this?
kerkeslager said,
May 27th, 2010 at 4:10 am
A few suggestions while this new language is still young and can be changed:
1. Brackets ([ and ]) don’t require a shift like braces, and are pretty much as unused in natural language. Could I persuade you to use brackets and give everyone’s pinkies a break?
2. In CSS, the period is used to indicate classes. Something similar to your function composition is done with commas. So it makes more sense to use commas where you are using periods, and save periods for use with classes.
Amos Robinson said,
May 27th, 2010 at 5:47 am
where’s the defmacro?
Cairnarvon said,
May 27th, 2010 at 6:46 am
We were considering a syntax for something similar to (some) BBCode’s [#][/#] earlier. Bun suggested {. text .}, where the period can be any non-alphanumeric character or string, but I don’t think we reached a consensus yet.
I’ll take a look at Scribble later. I know a lot of people have tried S-expression-based mark-up languages before, but in good tradition, I haven’t looked at any of them.
On my keyboard neither requires shift, but both require Alt Gr. Maybe it would be alright to allow both. Some other guy on Czechia had a variant which allowed {<[(, but that’s probably overkill.
If my keyboard had it I’d use ∘ instead. A period is as far as I’m willing to compromise.
Coming up!
As a general warning to anyone holding their breath, I should probably point out that this started as a way to troll #bbcode and associates, and so far isn’t intended as a more general language than is needed for our art. Since we’re nothing if not {s unemployed} driven by a need to better the world, though, it will probably get seriously out of hand before it dies.
richardus!Ep8pui8Vw2 said,
May 27th, 2010 at 7:14 am
I have a job interview in the morning Xarn.
Peter Woo said,
May 27th, 2010 at 9:30 am
“People may write *like* _this_ when no mark-up is available, but they also write_like_this when they don’t want mark-up at all but are just referring to, for example, filenames or URLs.”
It’s also very easy to confuse *italics* and **bold**. And hyperlinks… the anchor goes in brackets, and the url in parentheses like [google](http://google.com). First of all it’s difficult to remember this (because it’s arbitrary), and second of all 50% of the time people link to wikipedia that link is broken because wikipedia uses parentheses in many of their urls. The syntax for writing a list also gets me sometimes. And the italics syntax is overloaded with both asterisks and UNDERSCORES, of all things?!
Personally, I think the following would be perfectly satisfactory: **bold**, __underlined__, //italicized//, “teletype“, [link]@[url], 2^[superscript], a_[subscript]. Fairly unambiguous, except for perhaps the teletype, and by using two characters for each simple wrap we avoid most (if not all) would-be collisions with the underlying text.
kerkeslager said,
May 27th, 2010 at 2:06 pm
> On my keyboard neither requires shift, but both require Alt Gr. Maybe it would be alright to allow both. Some other guy on Czechia had a variant which allowed {<[(, but that’s probably overkill.
Ah, I'll take this as an incentive to increase my knowledge of other keyboard formats. It might be interesting to see an analysis of what percentage of users use what keyboard layout.
Witek Baryluk said,
May 27th, 2010 at 3:45 pm
Please look into ehtml used by YAWS web server, to express HTML in the Erlang directly. It is quite similar. Here is example http://yaws.hyber.org/code.yaws?file=/dynamic.yaws
Alex said,
May 30th, 2010 at 9:35 pm
Reminds me a little bit of RTF, which is in turn based on LaTeX. I like it.
uxbot said,
May 30th, 2010 at 9:58 pm
I wrote a similar s-exp thing in 4k of plt scheme … it provides all the usual inline elements plus a few block elements (ol ul dl), and with lazy evaluation can guarantee valid xhtml by ensuring that block elements are not placed within inline elements. error: (em (ol lulz ??? profit))
Anonymous said,
May 31st, 2010 at 2:10 am
To the moonand back.
Cairnarvon said,
May 31st, 2010 at 2:22 am
No plugin can withstand WordPress’ overzealous and brain-damaged HTML mangling, unfortunately. Apparently <sup><sup><sup>this</sup></sup></sup> is considered invalid XHTML.
Edit: Who says source code is a poor substitute for documentation? Issue fixed.
judah said,
May 31st, 2010 at 10:28 pm
hmm what would be better is just having a rich text editor. consider Text Layout Framework. this example is overkill but you could make a lightweight version of this for your users.
judah said,
May 31st, 2010 at 10:35 pm
link, http://labs.adobe.com/technologies/textlayout/demos/
again… overkill…
Anonymous said,
June 5th, 2010 at 4:30 pm
{sup*12/}
bunnyhero said,
June 5th, 2010 at 6:46 pm
oooh, shiny! very geeky but kind of neat.
i hate markup that converts *paired asterisks* to bold text (this includes markdown, as well as gmail’s google talk client), because a lot of people i know like to use asterisks to indicate action, like *poke* or *hug*. bolding those doesn’t make sense (to me, anyway).
Anonymous said,
June 20th, 2010 at 6:48 pm
Suggestion: {code {f^n x}}, rather than {code {f*n x}}, to better match standard mathematical notation.
Anonymous said,
June 20th, 2010 at 6:49 pm
Suggestion:
x}, rather thanx}, to better match standard mathematical notation.
Anonymous said,
June 20th, 2010 at 6:49 pm
Suggestion: {code { f^n x }}, rather than {code { f*n x }}, to better match standard mathematical notation.
Anonymous said,
June 20th, 2010 at 6:49 pm
Oh well; that’ll do.
Cairnarvon said,
June 20th, 2010 at 7:00 pm
Good thing I get the unprocessed comments e-mailed to me.
code takes one argument and (unlike ABBC’s [code]) does not imply verbatim. You probably want to use {{code SexpCode}.verbatim text} or {m.verbatim text}.
Cairnarvon said,
June 20th, 2010 at 7:06 pm
And to actually answer your question: that probably makes sense as an alternative syntax, and is easy to implement. I’ll add it.
Anonymous said,
June 28th, 2010 at 12:59 am
Wait, was my post just acknowledged by Xarn?
And as for any {defmacro …}s, I don't think that it's really needed. There are only so many additional tags you might want to use (expert, map, foldr/l), and those could be included in the implementation from the start.
Anonymous said,
June 28th, 2010 at 1:01 am
Wait, so code tags are inline, while m tags force a newline? I think that’s counterintuitive.
Cairnarvon said,
June 28th, 2010 at 1:03 am
<pre> is set to be a block-level tag here; nothing to do with SexpCode. I did that a while ago, I don’t remember why. I should change that.
SexpCode+ has define now, so the defmacro discussion is probably moot for the time being.
Edit: Actually I don’t know anything about HTML or CSS. Anyway, fixed.
Anonymous said,
June 28th, 2010 at 7:12 am
Hi, I proposed a revision to your SexpCode in my URL. What do you think about it and why?
Cairnarvon said,
June 28th, 2010 at 9:56 am
It’s a stupid idea; for the reasons given in the thread, and because it doesn’t actually improve anything.
SexpCode uses the period because that’s been used for function composition in mathematics for years; it’s not just an arbitrary delimiter, and trying to make it one just shows you haven’t read your SICP.
SHiNKiROU Blog » BBCode Problems (and Solutions) said,
July 29th, 2010 at 6:06 am
[...] SExpCode: S-expression-based markup language. [...]
SHiNKiROU!wSaCDPDEl2 said,
August 2nd, 2010 at 2:27 am
(I’m the one who posted about BBCode problems)
To make function arguments work, TeX-based syntax should be used.
\b{text text text}
\b.i{bold and italic}
\url(http://www.google.com){Google}
\url(http://google.com/). (. is self-closing)
and add heredoc syntax and Perl qw// syntax so you can choose your own delimiters.
Cairnarvon said,
August 2nd, 2010 at 2:36 am
Function arguments already work quite nicely (if {url http://example.com/ Text} feels too ambiguous, you can always use {{url http://example.com/} Text}), and it already has a heredoc syntax, with Bun’s alternative verbatim syntax. The standard is much more featureful than the original post suggests.
As for alternative delimiters, meh. There’s no clean way to do it, and taking cues from Perl has never done anyone any favours in the past.
Sven said,
November 14th, 2010 at 1:54 pm
This markup may be great from a programmer’s point of view, but from a user’s point of view it’s terrible if no proper explanation is given. For a programmer it’s logical to indicate the start and end of a block with curly braces, but a user barely uses those and won’t understand how it works. They won’t see the logic of an opening and a closing brace.
Cairnarvon said,
November 14th, 2010 at 7:59 pm
That’s true for every single mark-up language out there. Yes, they all confuse at least some users, but absolutely nothing is gained by only catering to the lowest common denominator.
Anonymous said,
November 19th, 2010 at 1:31 pm
Your BBCode substitute concept looks remarkably like Curl code. You might want to check that out:
http://en.wikipedia.org/wiki/Curl_%28programming_language%29#Curl_as_lightweight_markup
http://www.curl.com/
Anonymous said,
January 9th, 2011 at 9:58 am
This is nice and all but isn’t this just another way of saying “I haven’t read my SICP”?
richardus!Ep8pui8Vw2 said,
February 25th, 2011 at 4:55 am
http://sprunge.us/DjII
Serg. said,
May 23rd, 2011 at 8:34 pm
Nice idea to have an alternative mark-up language ;)
I think it can be useful, but I don’t think a lot of people will actually use it