Language Transformations and You: Syntactic Preprocessing

Alright, time for another exciting post about language transformations, and believe me, today’s post is incredibly exciting! You see, as I explained in Language Transformations and You: Lexical Preprocessing, Lexical Preprocessing is great for including source code…but it doesn’t really cut it in many other situations. Syntactic Preprocessing, however, is always exciting! It goes further than lexical preprocessing: instead of scanning for compiler-specific preprocessor directives, it actually parses the code in search of text that matches patterns that you yourself define within macros.

What this means is that, depending on the language, you can add new syntax and operators, change existing syntax and operators, and even add new keywords! This kind of customization is huge in languages like Nim, Lisp or Scheme, but unfortunately hasn’t quite caught on in many of today’s mainstream languages. Sure, you could use the C preprocessor or a Lexical Preprocessor to substitute bits of text as stated here, but I would have to agree with that answer…a simple code substitution isn’t needed, since you can use functions to do the same thing. Syntactic Preprocessing isn’t just a simple swap of a keyword for some code. Instead, it allows you to completely change how you write the language through fully descriptive patterns that can compile into whatever bit of code you need.

Why is Syntactic Preprocessing really important? Mainly, it allows you to create your own domain-specific language, which can result in cleaner code that is easier to maintain. Remember our object hierarchy fun? Instead of constantly writing a mess of multiple for loops that all follow the exact same format, you can instead write a macro that takes cleaner code like parent#child#child#>callback. The compiler transforms that pattern into the messier actual code that your application needs in order to function (pardon the pun).

Here’s another example. Javascript will soon have the spread operator with the arrival of ES6. Unfortunately, it could take months or even years before browsers implement all of ES6 (looking at you, Safari and Opera…you’re behind IE’s tech preview!). In the meantime, you can create macros to ease the pain and give you access to some of the features of the spread operator:

macro (..el) {
  // Flatten arrays.
  rule {
    // We're going to use recursion here to ensure that
    // elements that are arrays are also spread.
    [$x:invoke(..el) (,) ...]
  } => {
    $x (,) ...
  }
  // Simply return anything that isn't an array.
  rule {
    $x
  } => {
    $x
  }
}

macro (..>) {
  // For an array, iterate over each element in the
  // array and use the ..el macro on it.
  rule {
    [$x:invoke(..el) (,) ...]
  } => {
    [$x (,) ...]
  }
}

macro (~) {
  // For a function call after the ~ macro, allow a
  // single array to be spread as the function's parameters.
  rule {
    $x(..>$y)
  } => {
    $x.apply(null, $y)
  }
}

// var num = ~some(..>app)
// Outputs: var num = some.apply(null, app);

// var arr = ..>[1,2,3,[1,[2, [3, [foo]]]]]
// Outputs: var arr = [1,2,3,1,2,3,foo];

Here’s a gist of the above macro!
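For context on that last rule, the expansion leans entirely on plain ES5. Here’s a minimal sketch (the `some` function and `app` array are made-up names, mirroring the comments above) of what `some.apply(null, app)` actually does:

```javascript
// Function.prototype.apply lets an array supply a function's
// positional arguments, which is the heart of "spreading"
// an array into a call in ES5.
function some(a, b, c) {
    return a + b + c;
}

var app = [1, 2, 3];

// Equivalent to the ES6 spread call some(...app):
var num = some.apply(null, app);

console.log(num); // 6
```

So the `~` macro isn’t magic; it just saves you from typing the `.apply(null, …)` boilerplate by hand.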

Right? Macros are awesome! They give you so much control over the language. Some developers may not like having a domain language or even using macros, but the way I see it, you should use the best tool for the job. A domain language that fits your application’s needs, ensures conformance to proper design patterns and still compiles down to the code you want run in production is a highly effective tool.

Macros sound pretty great so far…but what if you don’t want to use Scheme or Lisp? Luckily there are options for many of the modern languages out there:

  • C, C++, C#, Java, and Objective-C – Unfortunately, these only have the #define preprocessor, which does a simple code substitution and cannot provide as much customization. However, you can modify their compilers.
  • Go – No macros here either, but they have a way to generate code, which is just as awesome!
  • Javascript – sweetjs – This is what I used in the above spread operator example, and can be incorporated with Grunt or Gulp.
  • Perl – macro
  • PHP – No macros or even a #define directive. You’re on your own here…
  • Python – macropy
  • Rust – included in the language – In fact, Rust’s macros are quite similar to sweet.js’s, since both come out of Mozilla.

I know what you’re thinking…you could go even further than a domain language and create an entirely new language that transpiles into another language, or even several languages, allowing your team to develop in one language and get native code for desktops, mobile OSes and even web applications. Of course, you would be right…but I’m not going to discuss that further until my next post, Language Transformations and You: Transpiling!

Language Transformations and You: Lexical Preprocessing

As I stated in Language Transformations and You!, I will be spending the next few weeks discussing how you can improve your skill set with various Language Transformation techniques. This week’s big topic will be Lexical Preprocessing, which is a method of changing your source code based on specific compiler-recognized tokens. Is it worth your time? Let’s take a deeper look at it and I’ll let you decide.

Source Code Importing

The most pervasive example of Lexical Preprocessing is importing source code from one file into another. This is one of the most important tools in a developer’s repertoire because it allows you to organize your source code. As the old saying goes, “A place for everything, and everything in its place.” Now, I could probably rant about the importance of organizing your files into small reusable modules until I’m blue in the face, but I will give you the benefit of the doubt and assume you yourself rant about the very same thing. 🙂

Anyway, when source code is being interpreted or compiled, the compiler needs to be able to find the references for all of the code that you used within your files. It handles that during Lexical Preprocessing by loading source from any file that you have imported with, of course, an import, include or using statement (or whatever other keyword token your language of choice uses).

Unfortunately…there is a downside (there always is with technology): mocking out imported files for unit testing is a huge pain with normal Source Code Importing. In fact, in many languages it can’t be done at all without some really hacky and evil code. The solution? Dependency Injection. Since this is a topic about Lexical Preprocessing, I won’t get into the gory details…suffice it to say, you definitely need to learn about Dependency Injection. I would start here, here and also in a TLDR post here.
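Without going into those gory details, here’s a minimal sketch in Javascript of why Dependency Injection sidesteps the mocking problem (the service and logger names are hypothetical): instead of importing a concrete dependency at the top of the file, you pass it in, so a unit test can hand over a fake.

```javascript
// The service never imports a logger; it receives one.
function InvoiceService(logger) {
    this.logger = logger;
}

InvoiceService.prototype.save = function (invoice) {
    this.logger.log("saving invoice " + invoice.id);
    return true;
};

// In production you inject a real logger; in a unit test,
// a stub that just records what was logged.
var calls = [];
var fakeLogger = { log: function (msg) { calls.push(msg); } };

var service = new InvoiceService(fakeLogger);
service.save({ id: 42 });

console.log(calls[0]); // "saving invoice 42"
```

No import needs to be intercepted at all, which is exactly the pain point that plain Source Code Importing creates.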

That one con aside, you’ve probably used this technique so often that you barely notice it anymore. It’s become a part of your routine. Add a file, include a file, use the imported contents within your new file. You’ve already begun transforming your source code into being a more readable and well organized project through the use of Lexical Preprocessing. Take a look in the mirror and high five yourself. Yeah, you’re pretty awesome. Now clean off your hand print before anyone realizes you just high fived your reflection. Don’t worry, your secret is safe with me.

“So”, you ask as you grab that bottle of window cleaner, “what else can Lexical Preprocessing do?”

I’m glad you asked.

Source Code Substitution

You see, Lexical Preprocessing isn’t just about importing source code. It’s also about changing your source code based on static, compiler-recognized tokens. Let’s say you have a utility or application base code file that you use to define various settings, objects and functions used throughout your application. You might have database connection objects, file paths and boolean settings that change the behavior of your application. What if you need those to change based on whether or not the application is running in production? One common approach is to use Source Code Substitution to change the compiler’s output based on various compiler and environmental states.

Remember our database connection logging example in the previous article?

using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
#if DEBUG
	loggingLevel = LoggingLevels.Debug;
#else
	loggingLevel = LoggingLevels.Production;
#endif

	//Do some SQL Stuff
}

// Output for Debug:
using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
	loggingLevel = LoggingLevels.Debug;

	//Do some SQL Stuff
}

//Output for Prod:
using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
	loggingLevel = LoggingLevels.Production;

	//Do some SQL Stuff
}

With the use of the #if, #else and #endif preprocessor directives, we can change the output of our source code to fit our application’s environment. This tells the compiler that if DEBUG is defined, it should only include the line where we set the loggingLevel to Debug. Otherwise, it should be set to Production. That’s a pretty nifty trick to use in order to ensure that our application runs exactly as we need it to in a developer environment or a production environment.

The only problem…is that it can clutter up your code if you use it too heavily. Remember your utility file? What if most of the settings in there change depending on the application’s environment? It would be a sloppy mess of preprocessor directives mixed in with your code, and determining the finished output of a source file would be a nightmare. Of course, if kept in one file, it can allow your entire application to behave differently as needed and keep the mess contained.

A better solution? Put your settings into separate configuration files based on your different application environments. Then you can load whichever one you need based upon the environment. It’s cleaner, more organized and easier to maintain. In fact, I would argue that in any situation where you feel you need to use preprocessor directives, you probably don’t. By writing well organized, modular, functional code that utilizes configuration files, proper source code importing and even Dependency Injection, you can make your application behave differently in almost any environment that you care to configure.
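As a quick sketch of that configuration-file approach in Javascript (the keys, values and environment names here are invented for illustration), each environment gets its own settings object and one lookup replaces every scattered directive:

```javascript
// One settings object per environment; in a real project these
// would live in separate config files loaded at startup.
var configs = {
    development: { loggingLevel: "debug", dbHost: "localhost" },
    production:  { loggingLevel: "error", dbHost: "db.internal" }
};

function loadConfig(env) {
    // Fall back to development settings when the environment
    // is missing or unknown.
    return configs[env] || configs.development;
}

var config = loadConfig(process.env.NODE_ENV);

console.log(loadConfig("production").loggingLevel); // "error"
```

The application code then reads from `config` and never needs to know which environment it is running in.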

There are even build tools that allow you to configure or script source code importing (like Grunt or Gulp with Javascript), so you can tell the build to import entirely different files based on your environment. Having tooling that powerful allows you not only to separate behavior into different files, but also to use the right behavior for your environment without adding a bunch of compiler jargon to your code files. How awesome is that?

So, to sum up, you most likely already use Lexical Preprocessing often through source code importing. Its one con is fairly minor, seeing as you can still write incredible, well organized code. Some languages even allow you to write meaningful unit tests alongside source code importing without the use of Dependency Injection.

The other main use, source code substitution, should be used sparingly, as there are better ways to make your application behave differently in various environments than by substituting code.

Stay tuned next week for my favorite part of Language Transformations: Syntactic Preprocessing!

Language Transformations and You!

Language Transformations have long been a huge and important part of the everyday life of many developers. What are Language Transformations? I’m glad you asked. You see, Language Transformations are what I’m calling, for lack of a better phrase, the collection of various terms, technologies and concepts that transform one programming language into another. L.T. covers everything from using preprocessor directives to transpiling to language macros. Most importantly, it can change the way that you develop websites or applications for the better.

Let’s start with an example. You’re working on a large enterprise application that processes invoices. These invoices have billable items and each item has sub-items. This object hierarchy is a very common occurrence in various Object Oriented and Object Based languages. What happens if you have a collection of invoices, and on each you need to total up the quantities of the sub-items on each item?

// For the purposes of this example, let's ignore some
// of the more functional awesomeness that we can use
// in javascript and any other way we could improve the 
// performance of this bit of code.
totalQuantity = function (invoices) {
    var quantity = 0;

    // Classic indexed loops; for...in over an array would iterate
    // its keys (indices), not the invoice objects themselves.
    for (var i = 0; i < invoices.length; i++) {
        var invoice = invoices[i];
        for (var j = 0; j < invoice.items.length; j++) {
            var item = invoice.items[j];
            for (var k = 0; k < item.subItems.length; k++) {
                quantity += item.subItems[k].quantity;
            }
        }
    }

    return quantity;
};

Alright, quite an eyesore…it could use some improvements, but it gets the job done. Now, what if you need to get the cost of each item? It would require almost exactly the same code:


totalCost = function (invoices) {
    var cost = 0;

    for (var i = 0; i < invoices.length; i++) {
        var invoice = invoices[i];
        for (var j = 0; j < invoice.items.length; j++) {
            var item = invoice.items[j];
            for (var k = 0; k < item.subItems.length; k++) {
                cost += item.subItems[k].cost;
            }
        }
    }

    return cost;
};

The only key difference is that we’ve replaced the word quantity with the word cost. While there are many ways to refactor this code, the problem still persists. What if the same thing happens when printing out a list of invoices of a sub-account on a main account? Will your original solution cover it? Or will you have a second solution for the Account -> subAccount -> Orders conundrum? And what if you need to optimize one of your hierarchy traversal solutions…and realize that all of them need to be optimized too?
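To make that refactoring point concrete, here is one conventional, macro-free refactor (the helper name and sample data are illustrative): extract the shared traversal into a function that takes the property names and an aggregation callback. It removes the duplication between totalQuantity and totalCost, but it still hard-codes exactly three levels, so a differently shaped hierarchy would need yet another helper.

```javascript
// Generic three-level traversal with a pluggable aggregator.
function aggregate(parents, childKey, subKey, callback, initial) {
    var total = initial;
    for (var i = 0; i < parents.length; i++) {
        var children = parents[i][childKey];
        for (var j = 0; j < children.length; j++) {
            var subs = children[j][subKey];
            for (var k = 0; k < subs.length; k++) {
                total = callback(total, subs[k]);
            }
        }
    }
    return total;
}

// A tiny sample hierarchy to exercise the helper.
var invoices = [
    { items: [{ subItems: [{ quantity: 2, cost: 5 }, { quantity: 1, cost: 3 }] }] }
];

var totalQuantity = aggregate(invoices, "items", "subItems",
    function (t, s) { return t + s.quantity; }, 0);
var totalCost = aggregate(invoices, "items", "subItems",
    function (t, s) { return t + s.cost; }, 0);

console.log(totalQuantity, totalCost); // 3 8
```

The depth problem is exactly the kind of thing the macro-based approach later in this series is meant to solve.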

Enter Language Transformations. By using Language Transformations, you can improve the readability and reuse of your code, while also simplifying how your application is written. These transformations come at a cost of course: either you have to spend time with another compilation and build step or you take a performance hit during run time. These cons need not scare you away from writing your own domain specific language through the use of Language Transformations. The pros of readability, convention, reuse, a single point of maintenance, code specialization and code simplicity far outweigh waiting a few more moments while compiling. In the case of any run time transformations, you will need to weigh the performance cost against the benefits for your own application.

Now that I’ve shown you why you need to transform your languages, I’m going to explain all of the pieces in a nice little teaser for you. Don’t worry, I will explain each of these with their own article over the next few weeks.

Lexical Preprocessing

Lexical Preprocessing is the idea of adding semantic fluff to your source code that the compiler then interprets in order to take various actions upon the source code. The fluff, or Preprocessor Directives, is never included in the final compiled output of the source code. As such, it can be a fairly useful tool for developers who need to change the source code for different environments, applications, sites and uses. Its main disadvantage, however, is that it can also clutter your code if it isn’t handled appropriately. Let’s look at an example in C#:

using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
#if DEBUG
	loggingLevel = LoggingLevels.Debug;
#else
	loggingLevel = LoggingLevels.Production;
#endif

	//Do some SQL Stuff
}

// Output for Debug:
using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
	loggingLevel = LoggingLevels.Debug;

	//Do some SQL Stuff
}

//Output for Prod:
using(SqlConnection con = new SqlConnection(connectionString)){
	// Set logging
	loggingLevel = LoggingLevels.Production;

	//Do some SQL Stuff
}

The compiler reads through the code, checks the status of DEBUG and then outputs the proper line for the directive.

As you can see, this allows you to change the behavior of the application before runtime by telling the compiler to transform the language based on specific rules. This can be useful when properly applied; however, if you use it fairly often, it can easily clutter up your code.

Syntactic Preprocessing (Macros)

Some compilers include an awesome feature known as Syntactic Preprocessing, or “Macros” for short. These aren’t the same scripts that you use to automate MS Office or your computer; these are predefined rules that tell a compiler how to transform your code. While similar to Lexical Preprocessing, the key difference is that instead of including, excluding or substituting blocks of code based on the environment or compiler state, Macros can change the syntactical behavior of a language by performing the same substitution operations based on the code itself. This is the approach taken by various preprocessor languages like CoffeeScript, Less and Sass, and by preprocessor libraries like Sweet.js.

Remember our fun object hierarchy dilemma? By using macros you can create your own domain specific syntax that allows you to handle the solution fairly elegantly:


aggCost = function(total, data){
    return total + data.cost;
}

totalCost = invoices#items#subItems#>aggCost;

// Output:

totalCost = function (invoices) {
    var cost = 0;

    for (var i = 0; i < invoices.length; i++) {
        var invoice = invoices[i];
        for (var j = 0; j < invoice.items.length; j++) {
            var item = invoice.items[j];
            for (var k = 0; k < item.subItems.length; k++) {
                cost = aggCost(cost, item.subItems[k]);
            }
        }
    }

    return cost;
};

The compiler looks at the # operator and uses it to build out the for loops. Once it sees the #> operator, it knows that it should place a callback within the previously made for loop. You can define the rules to create your cost variable based on the name. In fact, you can have it substitute whatever you need based on rules; just remember that since this is a compile-time step, it cannot interpret run-time properties like the length of an array or the return value of a function.

Note: There are some languages that allow syntax manipulation at run time. In that case, your macros can check the state of a variable or the return of a function.

Code highlighting issues aside, your source code is now a lot easier to read and is much cleaner. Just like any domain specific language, you will have to share its meaning with anyone else developing in the code base. To me, that’s not a terrible cost when you see how much more maintainable this methodology is in certain scenarios. Need to change how all of your object hierarchies are traversed? Simply update your macro, compile and test.

Transpiling

Transpiling is a relatively new term that means compiling from one language into another language at or near the same level of abstraction. Basically, this is a more escalated approach to language transformation, where you code in one language, like C#, and the output is Objective-C. This approach generally isn’t used for customization, but rather for familiarity. Perhaps your company has an application that is written in C# or Javascript and you want to put that application on Google’s Play Store or Apple’s App Store. You could rewrite the entire application…or you could transpile it from C# into Java and Objective-C.

It’s not just limited to that specific instance either. Google has its own transpiler called Traceur that allows you to transpile ECMAScript 6 (future Javascript) into ECMAScript 5 (the Javascript of current modern browsers). Transpiling is also how TypeScript and AtScript are transformed into Javascript.
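To make the idea concrete, here is a rough before-and-after sketch of the kind of rewrite such a transpiler performs. This is hand-written for illustration; real Traceur output is more involved:

```javascript
// ES6 input (shown as comments so this file stays valid ES5):
//   var square = (x) => x * x;
//   var [first, ...rest] = [1, 2, 3];

// A hand-transpiled ES5 equivalent:
var square = function (x) { return x * x; };

var _tmp = [1, 2, 3];
var first = _tmp[0];
var rest = _tmp.slice(1);

console.log(square(first), rest);
```

Same behavior, same level of abstraction; only the syntax has been lowered to what older runtimes understand.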

With transpiling, you can get a whole different set of language features that are compliant with your production environment’s language needs.

That’s all I have on Language Transformations this week. As I said before, this will be a 4 part series, where I go in depth into various styles of Language Transformation that will allow you to revolutionize how you handle development for your domain’s specific needs. I hope you are as excited as I am to take a closer look at Lexical Preprocessing, Syntactic Preprocessing and Transpiling.