August 16th, 2007
Why Use a Language-Powered Domain Specific Language?
Following my previous post on Domain Specific Languages (DSLs), I had the pleasure of reading some responses.
Aristotle does not like using eval (source):
I mean, evaluating another source file every time you instantiate an object in that class? Awesome! If I had to maintain his code I’d refactor that part out of existence with a quickness!
Ovid agrees but is a bit more thoughtful and explains DSLs better than I did, though we differ in opinion over the use of eval. Ovid’s most important point is that DSLs can also be implemented by writing your own lexer and parser instead of using eval. I did fail to mention that, but I think it lends yet more credence to my previous assertion that Ruby has better support for DSLs than Perl; otherwise the Perl crowd would not be so set against using eval (on strings) instead of writing a parser.
What a pity! There are certainly instances where eval is used in a sloppy and careless manner, but proper use to facilitate DSLs transcends that. My use of eval in DSLs may not always be the best, but I am convinced eval is a useful tool that should not be summarily dismissed just because its bathwater got a bit grimy back in its infancy.
Where we do still disagree is over the Rubyist habit of using language-powered DSLs in configuration files. The closest I have ever seen in Perl was a configuration file made up of a large number of variable assignments. The programmer simply did not want to deal with a parser to handle the hundreds of variables being assigned and all of the string variations that might be encountered, so he had the configuration eval’d directly. Is that bad programming? Maybe so in Perl terms, but one cannot ignore the benefits he realized without any loss of maintainability in his code:
- He avoided needing to invent a configuration file format. Perl defined all the syntax for him.
- He avoided needing to write a parser that would need to understand his assignments to various scalar, array, and hash values.
- He did not need to figure out how to parse strings with special characters, such as quotes, so his strings could automatically accommodate arbitrary data as needed by administrators of the application.
- He got clean handling of line breaks and multi-line statements for free.
- He could reuse already assigned values inside other values without having to invent his own string-substitution syntax.
- He got comments, and their syntax, for free as well.
You win all of this automatically when your configuration file is nothing more than assignments. Ruby programmers take it further and gain even more leverage by providing methods for use in the configuration.
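To make the idea concrete, here is a minimal Ruby sketch of the same pattern. The file name, class name, and settings are invented for illustration, and I use instance variables rather than bare locals to keep the loader short; the point is only that the language does all of the parsing.

```ruby
# settings.conf -- nothing but Ruby assignments (hypothetical example):
#
#   @admin_email = "ops@example.com"
#   @retries     = 3
#   @data_dirs   = ["/var/spool/app", "C:\\app data"]   # quoting handled by the language
#   @log_file    = "#{@data_dirs.first}/app.log"        # reuse earlier values for free
#   # comments come for free, too

class Config
  def initialize(path)
    instance_eval(File.read(path), path)   # the language is the parser
  end

  def [](name)
    instance_variable_get("@#{name}")
  end
end

config = Config.new("settings.conf")
puts config[:log_file]                     # => /var/spool/app/app.log
```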
I have a hard time believing that an application providing its own code to deal with all of the above, instead of just letting the language handle it, is going to be easier to maintain, as the Perl users I quoted opine. I am particularly convinced of this because I have written programs of both kinds: those that use the power of the language in a DSL and those that carry along their own parser, defining their own syntax and language. I suspect the reason many Perl enthusiasts do not agree is that they have never written a language-powered DSL in an appropriate situation. And if the most vocal Perl users (and their leadership, too?) are lambasting the idea, then anyone in that community who honestly thinks the language-powered DSL is the right way to go faces no small social obstacle.
With language-powered DSLs I have written a simple vulnerability scanner in 75 lines of Ruby. The configuration file to check the one specific vulnerability I needed checked was 30 lines. 105 lines of code to check one vulnerability is no big deal, but the program is vulnerability agnostic: a configuration file only specifies a port, what to spew to that port, and what to expect back from the system if it is or is not vulnerable, and it is already sophisticated enough to handle a multi-part conversation. I believe the Perl community’s recommended way to do this in about the same amount of code would be to have a module, rather than a configuration file, for each vulnerability. With a DSL I do not have to mess with package overhead or with making sure my @INC array contains the directory holding my vulnerability information.
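Purely to show the shape of such a configuration, here is a hypothetical sketch along the same lines; port, conversation, spew, and expect are invented names for this example, not the actual words from my scanner.

```ruby
# example_check.conf -- a hypothetical vulnerability description, in plain Ruby:
#
#   port 8080
#   conversation do
#     spew   "GET /admin HTTP/1.0\r\n\r\n"
#     expect "200 OK",           :vulnerable
#     expect "401 Unauthorized", :not_vulnerable
#   end

# A minimal harness that gives those words meaning:
class Check
  attr_reader :steps

  def initialize(path)
    @steps = []
    instance_eval(File.read(path), path)   # the check file is just Ruby
  end

  def port(number = nil)
    @port = number if number
    @port
  end

  def conversation(&block)
    instance_eval(&block)                  # run the steps in order, with this check as self
  end

  def spew(data)
    @steps << [:spew, data]
  end

  def expect(pattern, verdict)
    @steps << [:expect, pattern, verdict]
  end
end

check = Check.new("example_check.conf")
# The scanner would now connect to check.port, replay the :spew steps, and
# match the responses against the :expect patterns to report a verdict.
```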
I have also written a fetchmail/procmail hybrid in 512 lines of Ruby. I needed something that could log into a POP server and download my mail like fetchmail does, but also review the contents of each message and make procmail-like delivery decisions before deciding to delete or keep the message on the server (that keep-or-delete decision being the reason I could not simply use fetchmail and procmail). My personal configuration file for this is about 60 lines, but there is no reason you would need more than 5 if you wanted very basic functionality. I plan to share this program in the future, and I would be happy to speed that along if enough people show interest.
What did I gain, in addition to the benefits I noted above, in my DSL for the fetchmail/procmail hybrid?
- No need to decide how to implement if, while, and other control constructs. Since this program has to make decisions, the configuration can use these constructs whenever the user needs them.
- More elegant constructs specific to the task at hand. These are methods that (usually) expect blocks of code, which are then run conditionally (and possibly multiple times) based on the needs of the application.
- Users could assign different values to a configuration directive based on what host they are running from. I found this feature useful in my personal configuration: I can have mail delivered locally when I am on my Linux host but delivered onto a network drive when I am using my Windows host, as the sketch below illustrates. This kind of flexibility is technically available even when you are limited to variable assignment, but I encourage its use when appropriate.
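For illustration only, here is a sketch in that spirit; MailRules, maildir, on_message, and the message methods are invented names, not the real program’s interface.

```ruby
# A stub of the object that evaluates the configuration:
class MailRules
  attr_reader :maildir_path, :handlers

  def initialize
    @handlers = []
  end

  def load_file(path)
    instance_eval(File.read(path), path)   # the configuration file is plain Ruby
    self
  end

  # Words available inside the configuration file:
  def maildir(path)
    @maildir_path = path
  end

  def on_message(&block)
    @handlers << block
  end
end

# my_mail.conf might then read (all of it ordinary Ruby):
#
#   require 'socket'
#
#   # Host-dependent delivery location; a plain 'if' does the work.
#   if Socket.gethostname =~ /linuxbox/
#     maildir "/home/me/Maildir"
#   else
#     maildir "Z:/mail"
#   end
#
#   # A block that runs once per downloaded message.
#   on_message do |msg|
#     if msg.subject =~ /\[URGENT\]/i
#       msg.deliver_to "urgent"
#       msg.keep_on_server               # leave a copy on the POP server
#     else
#       msg.deliver_to "inbox"
#     end
#   end

rules = MailRules.new.load_file("my_mail.conf")
# For each message fetched, the program builds a message object and calls
# rules.handlers.each { |handler| handler.call(message) } before deciding
# whether to keep or delete that message on the server.
```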
Do not use language-powered DSLs when writing a parser is clearly superior, such as when the input cannot be trusted. You cannot give the full power of the language to untrusted input unless you want a security nightmare. As I noted above, I have written my own special-use language in Ruby in order to parse code from untrusted sources, and I had to handle all the issues I mentioned: the conditional and looping constructs, parsing strings that accommodate arbitrary data (through escape sequences), parsing arithmetic operators so they can be used inline, and so on. It was a full language executed inside my application. So I do know what kind of code that entails, and it is most certainly not more maintainable than using the power of the language in your DSL.
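As a hypothetical reminder of why that caveat matters, anything handed to eval runs with the program’s full privileges:

```ruby
# This "configuration" line is really arbitrary code.  Here it only prints,
# but it could just as easily have been File.delete or a system() call.
hostile_config = 'puts "a configuration file can do anything the program can"'
eval(hostile_config)
```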
If you have been told in the past by experts that “eval is bad”, then I urge you to open your mind a little. Listen to what they have to say; they make some valid points about when not to use eval, but do not take that to mean one should never use it. I used to be in your camp, too, but when faced (in my case directly faced) with obvious benefit after obvious benefit, I came around. I urge you to take the plunge as well.
August 16th, 2007 at 9:19 am
This is perhaps one of the most significant issues I have with a “language-powered DSL” (a.k.a. fluent interfaces). If there is even the slightest danger of untrusted input, then I strongly object to them. However, people will disagree as to what constitutes “untrusted input”. To my mind, it means any code coming from outside the program, even if it’s just a simple file that someone down in accounting can edit. I may be overly paranoid on this point, however.
More importantly, the whole concept of a fluent interface is just good programming. It’s a difficult skill to develop. That some people refer to these as DSLs where others do not makes a lot of the point moot. In Perl’s case, we do have a string eval which does this, but if implemented incorrectly it can lose some context (I’d love to be able to easily run an eval against a higher stack frame).
For myself, I distinguish between fluent interfaces and DSLs because the former, as mentioned, is just good programming and is not restricted to a problem domain, while the latter is harder to implement and tends to be more restricted but offers greater expressiveness when suitably done. Plus, by separating the lexing and the parsing, you gain a lot of flexibility if someone needs to submit their data in an XML format but your DSL is natural language: just write a separate lexer and use the same grammar! The decoupling of these steps is trivial, and since Perl is so powerful when it comes to text manipulation, it is well suited to this.
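A toy sketch of the decoupling described here (written in Ruby to match the rest of the examples on this page; the formats, token names, and rule language are all invented):

```ruby
# Two lexers that emit the same token stream, so a single (deliberately trivial)
# parser serves both input formats.
def lex_natural(line)
  action, who, _, when_ = line.split       # e.g. "allow alice on fridays"
  [[:action, action], [:who, who], [:when, when_]]
end

def lex_xml(line)                          # e.g. <rule action="allow" who="alice" when="fridays"/>
  attrs = Hash[line.scan(/(\w+)="([^"]*)"/)]
  [[:action, attrs["action"]], [:who, attrs["who"]], [:when, attrs["when"]]]
end

# The parser never knows which lexer produced its tokens.
def parse(tokens)
  Hash[tokens]
end

p parse(lex_natural('allow alice on fridays'))
p parse(lex_xml('<rule action="allow" who="alice" when="fridays"/>'))
# Both calls yield the same rule: {:action=>"allow", :who=>"alice", :when=>"fridays"}
```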
That being said, it really sounds like you and I don’t have a significant difference of opinion here once we recognize that we’re using different definitions for similar terms, hence the disconnect.
(And I really wish you had a preview button here.)
August 16th, 2007 at 1:11 pm
I agree. Thanks!
While I like to promote the use of DSLs via string eval, I would not use it for any of the examples you present where a lexer and parser would be superior (further affirming your point that we agree on a lot). Your point about taking data in different formats is not one I had considered, but I would not trust externally submitted data enough to pass it to eval anyway.
I have never considered a situation where a parser might be taking input from any one of several lexers. Mine have always been married in a way because I was focused on one type of input. Nifty idea for me to churn on! 😀
August 17th, 2007 at 7:18 am
Cosine,
Why exactly are you evaluating random code for configuration? Doesn’t Ruby have decent support for configuration files?
I replied before about this – the example you give is so unhelpful that I don’t see any case at all for what you’re going on about.
In my experience, the occasions where you want some kind of ‘quick and dirty, on-the-fly evaluated user input’ that you call a DSL are exactly the times when you have a domain-specific problem that a domain-specific API, with a proper parser, should handle well.
For example, in my work I handle human- and computer-created data feeds from Air Traffic services, the Met Office, the US DoD, ICAO, etc. All of this data is parsed properly through a lexer with validation and edge-case handling, and then fed into the system through the same clear, documented OO API it will be retrieved through.
I’m trying but failing to see a good scenario for what you’re proposing. A good DSL is very hard to create, not technically but because of the difficulty of the design decisions; the implementation in any language is the most trivial part. Perhaps the reason we don’t use quick-and-dirty pseudo-DSLs the way Ruby programmers do is that many of us know a better solution to the problem is usually already available and done properly on CPAN, or else requires up-front design time to do it right (which rules out a DSL in 99.9% of cases).