Language Transformations and You: Syntactic Preprocessing

Alright, time for another post about language transformations, and believe me, today’s topic is an exciting one! As I explained in Language Transformations and You: Lexical Preprocessing, Lexical Preprocessing is great for including or excluding source code…but doesn’t really cut it in many other situations. That’s where Syntactic Preprocessing comes in, and it goes much further than lexical preprocessing. Instead of scanning for compiler-specific preprocessor directives, it parses the code in search of text that matches patterns that you yourself define within macros.

What this means is that, depending on the language, you can add new syntax and operators, change existing syntax and operators, and even add new keywords! This kind of customization is huge in languages like Nim, Lisp and Scheme, but unfortunately it hasn’t quite caught on with many of today’s mainstream languages. Sure, you could use a lexical preprocessor such as the C preprocessor to substitute bits of text, but I would agree with the common objection to that approach: a simple code substitution usually isn’t needed, since you can use functions to do the same thing. Syntactic Preprocessing isn’t just a simple swap of a keyword for some code. Instead, it lets you completely change how you write the language through fully descriptive patterns that compile into whatever code you need.
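To make the lexical/syntactic difference concrete, here is a toy JavaScript sketch; the SQUARE macro and both expander functions are hypothetical. The lexical expander pastes the argument text in place, which is how the C preprocessor behaves with #define SQUARE(x) x * x. The syntactic expander keeps the argument as a single expression, modelled here by wrapping it in parentheses; a real macro system would operate on a parsed syntax tree instead.

```javascript
// Toy illustration only. SQUARE, expandLexically and expandSyntactically
// are hypothetical names, not part of any real preprocessor.
var MACRO_CALL = /SQUARE\(([^()]*)\)/g;

// Lexical: substitute raw text, ignoring the structure of the argument.
function expandLexically(source) {
  return source.replace(MACRO_CALL, function (_, arg) {
    return arg + " * " + arg;
  });
}

// Syntactic: the argument stays one expression after expansion
// (simulated with parentheses, since this toy works on strings).
function expandSyntactically(source) {
  return source.replace(MACRO_CALL, function (_, arg) {
    return "(" + arg + ") * (" + arg + ")";
  });
}

var source = "SQUARE(1 + 2)";
var lexical = expandLexically(source);       // "1 + 2 * 1 + 2"
var syntactic = expandSyntactically(source); // "(1 + 2) * (1 + 2)"

console.log(lexical, "=", Function("return " + lexical)());     // 5
console.log(syntactic, "=", Function("return " + syntactic)()); // 9
```

The lexical version silently changes the meaning because operator precedence splits the pasted argument apart; that is the class of bug syntactic macros are designed to avoid.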

Why is Syntactic Preprocessing so important? Mainly, it allows you to create your own domain specific language, which can result in cleaner code that is easier to maintain. Remember our object hierarchy fun? Instead of a mess of nested for loops that all follow the exact same format and that you constantly have to rewrite, you can write a macro that accepts cleaner code like parent#child#child#>callback. The compiler transforms that pattern into the messier code that your application actually needs in order to function (pardon the pun).

Here’s another example. JavaScript will soon have the spread operator with the arrival of ES6. Unfortunately, it could take months or even years before browsers implement all of ES6 (looking at you, Safari and Opera…you’re behind IE’s tech preview!). In the meantime, you can create macros to ease the pain and give you access to some of the features of the spread operator:

macro (..el) {
  // Flatten arrays.
  rule {
    // We're going to use recursion here to ensure that
    // elements that are arrays are also spread.
    [$x:invoke(..el) (,) ...]
  } => {
    $x (,) ...
  }
  // Simply return anything that isn't an array.
  rule {
    $x
  } => {
    $x
  }
}

macro (..>) {
  // For an array, iterate over each element in the
  // array and use the ..el macro on it.
  rule {
    [$x:invoke(..el) (,) ...]
  } => {
    [$x (,) ...]
  }
}

macro (~) {
  // For a function call after the ~ macro, allow a
  // single array to be spread as the function's parameters.
  rule {
    $x(..>$y)
  } => {
    $x.apply(null, $y)
  }
}

// var num = ~some(..>app)
// Outputs: var num = some.apply(null, app);

// var arr = ..>[1,2,3,[1,[2, [3, [foo]]]]]
// Outputs: var arr = [1,2,3,1,2,3,foo];

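It can help to see what those macros stand for at run time. The sketch below uses two hypothetical helpers, deepFlatten and spreadCall, and uses the string "foo" instead of the identifier foo so it runs on its own. Note one difference: the ..> macro flattens every level of nesting, while the real ES6 spread operator only expands one level, as the last line shows.

```javascript
// Run-time counterparts of the compile-time macros above.
// deepFlatten and spreadCall are hypothetical helper names.

// Recursively flatten nested arrays, like the ..el / ..> rules.
function deepFlatten(value) {
  if (!Array.isArray(value)) {
    return [value];
  }
  var result = [];
  for (var i = 0; i < value.length; i++) {
    result = result.concat(deepFlatten(value[i]));
  }
  return result;
}

// Pass the elements of args as separate parameters,
// which is what the ~ macro expands to.
function spreadCall(fn, args) {
  return fn.apply(null, args);
}

console.log(JSON.stringify(deepFlatten([1, 2, 3, [1, [2, [3, ["foo"]]]]])));
// [1,2,3,1,2,3,"foo"]

console.log(spreadCall(Math.max, [4, 9, 2])); // 9

// Native ES6 spread only expands the outermost array:
console.log(JSON.stringify([...[1, [2, [3]]]])); // [1,[2,[3]]]
```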

Right? Macros are awesome! They give you so much control over the language. Some developers may not like having a domain language or even using macros, but the way I see it, you need to use the best tool for the job. A domain language that ensures compliance with your application’s needs, encourages conformance to proper design patterns and still compiles down to the code that you want to run in production is a highly effective tool.

Macros sound pretty great so far…but what if you don’t want to use Scheme or Lisp? Luckily there are options for many of the modern languages out there:

  • C, C++ and Objective-C – Unfortunately, they only have the C preprocessor’s #define, which does a simple text substitution and cannot provide as much customization. C# is even more limited, since its #define only declares conditional-compilation symbols, and Java has no preprocessor at all. Beyond that, you would have to change their compilers.
  • Go – No macros here either, but the go generate tool gives you a standard way to generate code, which is just as awesome!
  • JavaScript – sweet.js – This is what I used in the spread operator example above, and it can be incorporated into a Grunt or Gulp build.
  • Perl – macro
  • PHP – No macros or even a #define directive. You’re on your own here…
  • Python – macropy
  • Rust – Macros are built into the language. In fact, Rust’s pattern-based macros are very similar to sweet.js macros, and both projects came out of Mozilla.

I know what you’re thinking…you could go even further than a domain language and create an entirely new language that transpiles into another language…or even into several languages, allowing your team to develop in one language and get native code for desktops, mobile OSes and even web applications. Of course, you would be right…but I’m not going to discuss that further until my next post, Language Transformations and You: Transpiling!

Language Transformations and You!

Language Transformations have long been a huge and important part of the everyday life of many developers. What are Language Transformations? I’m glad you asked. For lack of a better phrase, Language Transformations are what I’m calling the collection of terms, technologies and concepts that transform one programming language into another. L.T. covers everything from preprocessor directives to language macros to transpiling. Most importantly, it can change the way that you develop websites or applications for the better.

Let’s start with an example. You’re working on a large enterprise application that processes invoices. These invoices have billable items, and each item has sub-items. This kind of object hierarchy is very common in Object Oriented and Object Based languages. What happens if you have a collection of invoices and you need to total up the quantities of all the sub-items on every item?

// For the purposes of this example, let's ignore some
// of the more functional awesomeness that we can use
// in JavaScript and any other way we could improve the
// performance of this bit of code.
totalQuantity = function (invoices) {
    var quantity = 0;

    // for...of visits the elements themselves; for...in would
    // only visit the array indexes ("0", "1", ...).
    for (var invoice of invoices) {
        for (var item of invoice.items) {
            for (var subItem of item.subItems) {
                quantity += subItem.quantity;
            }
        }
    }

    return quantity;
};

Alright, quite an eyesore…it could use some improvements, but it gets the job done. Now, what if you need the total cost of the sub-items? It would require almost exactly the same code:

totalCost = function (invoices) {
    var cost = 0;

    for (var invoice of invoices) {
        for (var item of invoice.items) {
            for (var subItem of item.subItems) {
                cost += subItem.cost;
            }
        }
    }

    return cost;
};

The only key difference is that we’ve replaced the word quantity with the word cost. While there are many ways to refactor this code, the underlying problem persists. What if the same thing happens when printing out a list of invoices for each sub-account of a main account? Will your original solution cover it? Or will you write a second solution for the Account -> subAccount -> Orders conundrum? Plus, what if you need to optimize one of your hierarchy traversal solutions…and then realize that all of them need to be optimized as well?

Enter Language Transformations. By using Language Transformations, you can improve the readability and reuse of your code while also simplifying how your application is written. These transformations come at a cost, of course: either you spend time on an extra compilation or build step, or you take a performance hit at run time. These cons need not scare you away from writing your own domain specific language through Language Transformations. The pros of readability, convention, reuse, a single point of maintenance, code specialization and code simplicity far outweigh waiting a few more moments while compiling. For run-time transformations, you will need to weigh the performance cost against the benefits for your own application.
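For comparison, here is roughly what the run-time side of that trade-off can look like. In this minimal sketch a hypothetical reduceHierarchy helper owns the nested loops, so totalQuantity and totalCost only supply a path of property names and a callback. The invoice data is made up for illustration. The same helper could cover Account -> subAccount -> Orders with a different path; the price is an extra callback invocation per leaf at run time, which is the kind of cost a compile-time macro avoids.

```javascript
// Hypothetical helper: walk a path of child-collection property names
// (e.g. ["items", "subItems"]) below each root and fold every leaf
// with a callback, so the nested loops live in exactly one place.
function reduceHierarchy(roots, path, callback, initial) {
  var acc = initial;

  function visit(node, depth) {
    if (depth === path.length) {
      acc = callback(acc, node); // reached a leaf: fold it in
      return;
    }
    var children = node[path[depth]] || [];
    for (var i = 0; i < children.length; i++) {
      visit(children[i], depth + 1);
    }
  }

  for (var i = 0; i < roots.length; i++) {
    visit(roots[i], 0);
  }
  return acc;
}

var totalQuantity = function (invoices) {
  return reduceHierarchy(invoices, ["items", "subItems"], function (sum, sub) {
    return sum + sub.quantity;
  }, 0);
};

var totalCost = function (invoices) {
  return reduceHierarchy(invoices, ["items", "subItems"], function (sum, sub) {
    return sum + sub.cost;
  }, 0);
};

// Example data, made up for illustration.
var invoices = [
  { items: [{ subItems: [{ quantity: 2, cost: 5 }, { quantity: 1, cost: 3 }] }] },
  { items: [{ subItems: [{ quantity: 4, cost: 10 }] }] }
];

console.log(totalQuantity(invoices)); // 7
console.log(totalCost(invoices));     // 18
```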

Now that I’ve shown you why you might want to transform your languages, I’m going to introduce all of the pieces in a nice little teaser. Don’t worry, I will explain each of them in its own article over the next few weeks.

Lexical Preprocessing

Lexical Preprocessing is the idea of adding semantic fluff to your source code that the compiler then interprets in order to take various actions on that source code. This fluff, the Preprocessing Directives, is never included in the final compiled output. As such, it can be a fairly useful tool for developers to adapt the source code for different environments, applications, sites and uses. Its main disadvantage, however, is that it can also clutter your code if it isn’t handled appropriately. Let’s look at an example in C#:

using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
#if DEBUG
	loggingLevel = LoggingLevels.Debug;
#else
	loggingLevel = LoggingLevels.Production;
#endif

	//Do some SQL Stuff
}

// Output for Debug:
using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
	loggingLevel = LoggingLevels.Debug;

	//Do some SQL Stuff
}

//Output for Prod:
using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
	loggingLevel = LoggingLevels.Production;

	//Do some SQL Stuff
}

The compiler reads through the code, checks whether the DEBUG symbol is defined (Visual Studio defines it by default for Debug builds) and then outputs only the matching branch of the directive.

As you can see, this allows you to change the behavior of the application before run time by telling the compiler to transform the code based on specific rules. This can be useful when properly applied; however, if you use it often, it can easily clutter up your code.
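The C# compiler handles these directives itself, but the idea is easy to see in a toy version. The sketch below is written in JavaScript to match the other examples; preprocess is a hypothetical helper that only supports a single symbol per #if plus #else and #endif (no #elif, no boolean expressions). It simply keeps or drops whole lines, which is why directives never appear in the output.

```javascript
// Toy lexical preprocessor: keeps or drops whole lines based on
// #if SYMBOL / #else / #endif and a list of defined symbols.
// Assumptions: one symbol per #if, no #elif, no expressions.
function preprocess(source, definedSymbols) {
  var output = [];
  var conditions = []; // one boolean per currently open #if block

  source.split("\n").forEach(function (line) {
    var directive = line.trim().match(/^#(if|else|endif)\b\s*(\w*)/);

    if (directive) {
      if (directive[1] === "if") {
        conditions.push(definedSymbols.indexOf(directive[2]) !== -1);
      } else if (directive[1] === "else") {
        conditions.push(!conditions.pop()); // switch to the other branch
      } else {
        conditions.pop(); // #endif closes the innermost block
      }
      return; // directives themselves never reach the output
    }

    // A line survives only if every enclosing branch is active.
    var active = conditions.every(function (c) { return c; });
    if (active) {
      output.push(line);
    }
  });

  return output.join("\n");
}

var source = [
  "// Set logging",
  "#if DEBUG",
  "loggingLevel = LoggingLevels.Debug;",
  "#else",
  "loggingLevel = LoggingLevels.Production;",
  "#endif"
].join("\n");

console.log(preprocess(source, ["DEBUG"])); // keeps the Debug line
console.log(preprocess(source, []));        // keeps the Production line
```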

Syntactic Preprocessing (Macros)

Some compilers include an awesome feature known as Syntactic Preprocessing, or “Macros” for short. These aren’t the scripts that you use to automate MS Office or your computer; they are predefined rules that tell a compiler how to transform your code. Macros are similar to Lexical Preprocessing, but the key difference is this: instead of including, excluding or substituting blocks of code based on the environment or compiler state, macros perform substitutions based on the structure of the code itself, which lets them change the syntactical behavior of a language. Libraries like sweet.js bring this to JavaScript, and preprocessor languages like CoffeeScript, Less and Sass offer similar syntactic conveniences, although they are really separate languages that compile to JavaScript or CSS.

Remember our fun object hierarchy dilemma? By using macros you can create your own domain specific syntax that allows you to handle the solution fairly elegantly:


aggCost = function(total, data){
    return total + data.cost;
};

totalCost = invoices#items#subItems#>aggCost;

// Output:

totalCost = function (invoices) {
    var cost = 0;

    for (var invoice of invoices) {
        for (var item of invoice.items) {
            for (var subItem of item.subItems) {
                cost = aggCost(cost, subItem);
            }
        }
    }

    return cost;
};

The compiler looks at the # operator and uses it to build out the for loops. Once it sees the #> operator, it knows that it should call the callback inside the innermost for loop. You can define rules that name the accumulator variable (here, cost) based on the callback’s name. In fact, you can have the macro substitute whatever you need based on rules; just remember that since this is a compile-time step, it cannot inspect run-time values like the length of an array or the return value of a function.

Note: There are some languages that allow syntax manipulation at run time. In that case, your macros can check the state of a variable or the return of a function.
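To show what that compile step produces, here is a toy expander for the # and #> syntax. It is a sketch under heavy assumptions: expandHierarchy is hypothetical, it works on plain strings rather than on token trees the way a real macro system such as sweet.js would, it names the accumulator acc instead of deriving cost from the callback’s name, and it emits for...of loops. The generated text is compiled with new Function only so the example can check itself; in a real build it would be written to the output file instead.

```javascript
// Toy expander for the hypothetical "root#child#...#>callback" syntax.
// Produces JavaScript source text for a function over the root collection.
function expandHierarchy(expression) {
  var parts = expression.split("#>");
  var path = parts[0].split("#"); // e.g. ["invoices", "items", "subItems"]
  var callback = parts[1];        // e.g. "aggCost"

  var lines = ["function (" + path[0] + ") {", "  var acc = 0;"];
  var indent = "  ";

  // Each # segment becomes one for...of loop. The root is the function
  // parameter; every deeper level reads a property of the level above.
  for (var depth = 0; depth < path.length; depth++) {
    var collection = depth === 0
      ? path[0]
      : "level" + (depth - 1) + "." + path[depth];
    lines.push(indent + "for (var level" + depth + " of " + collection + ") {");
    indent += "  ";
  }

  // The #> callback runs inside the innermost loop.
  lines.push(indent + "acc = " + callback + "(acc, level" + (path.length - 1) + ");");

  // Close every loop, then return the accumulator.
  for (var d = 0; d < path.length; d++) {
    indent = indent.slice(2);
    lines.push(indent + "}");
  }
  lines.push("  return acc;", "}");
  return lines.join("\n");
}

var generated = expandHierarchy("invoices#items#subItems#>aggCost");
console.log(generated);

// Only for demonstration: compile the generated text at run time.
function aggCost(total, data) {
  return total + data.cost;
}
var totalCost = new Function("aggCost", "return " + generated)(aggCost);

var invoices = [
  { items: [{ subItems: [{ cost: 5 }, { cost: 3 }] }] },
  { items: [{ subItems: [{ cost: 10 }] }] }
];
console.log(totalCost(invoices)); // 18
```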

Code highlighting issues aside, your source code is now a lot easier to read and is much cleaner. Just like any domain specific language, you will have to share its meaning with anyone else developing in the code base. To me, that’s not a terrible cost when you see how much more maintainable this methodology is in certain scenarios. Need to change how all of your object hierarchies are traversed? Simply update your macro, compile and test.

Transpiling

Transpiling is a relatively new term for compiling from one language into another at or near the same level of abstraction. It’s a more escalated approach to language transformation: you code in one language, like C#, and the output is in another, like Objective-C. This approach generally isn’t used for customization, but rather for familiarity. Perhaps your company has an application written in C# or JavaScript and you want to put it on Google’s Play Store or Apple’s App Store. You could rewrite the entire application…or you could transpile it from C# into Java and Objective-C.

It’s not limited to that scenario either. Google has its own transpiler called Traceur that transpiles ECMAScript 6 (future JavaScript) to ECMAScript 5 (the JavaScript that current browsers support). Transpiling is also how TypeScript and AtScript are transformed into JavaScript.
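Under the hood, a transpiler parses source code into a tree and then emits that tree as source code in the target language. The toy back end below skips parsing and shows only the emit step: one hand-built expression tree printed as both JavaScript and Python. The node format and the emit function are invented for this sketch; real tools such as Traceur or TypeScript handle far more (scoping, types, runtime helpers).

```javascript
// Toy transpiler back end: one expression tree, two target languages.
// The node format and emit() are invented for this sketch.
var SYNTAX = {
  js:     { and: "&&",  or: "||", not: "!",    "true": "true", "false": "false" },
  python: { and: "and", or: "or", not: "not ", "true": "True", "false": "False" }
};

function emit(node, target) {
  var syntax = SYNTAX[target];
  switch (node.type) {
    case "bool":
      return syntax[String(node.value)];
    case "name":
      return node.name;
    case "not":
      return syntax.not + emit(node.operand, target);
    case "and":
    case "or":
      // Parenthesize every binary expression so precedence never changes.
      return "(" + emit(node.left, target) + " " + syntax[node.type] + " " +
        emit(node.right, target) + ")";
    default:
      throw new Error("Unknown node type: " + node.type);
  }
}

// isAdmin or (isOwner and not locked)
var tree = {
  type: "or",
  left: { type: "name", name: "isAdmin" },
  right: {
    type: "and",
    left: { type: "name", name: "isOwner" },
    right: { type: "not", operand: { type: "name", name: "locked" } }
  }
};

console.log(emit(tree, "js"));     // (isAdmin || (isOwner && !locked))
console.log(emit(tree, "python")); // (isAdmin or (isOwner and not locked))
```

The same tree yields equivalent code in both languages, which is the core promise of transpiling: write once, emit whatever your production environment needs.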

With transpiling, you can get a whole different set of language features that are compliant with your production environment’s language needs.

That’s all I have on Language Transformations this week. As I said before, this will be a 4-part series, where I go in depth into the various styles of Language Transformation that will allow you to revolutionize how you handle your domain’s specific development needs. I hope you are as excited as I am to take a closer look at Lexical Preprocessing, Syntactic Preprocessing and Transpiling.