What are Expression Templates?
Expression Templates are a programming technique for implementing an efficient operator overloading within C++. Since traditional variants of overloading operators suffer from severe performance lacks, Veldhuizen and Vandevoorde invented concurrently the Expression Templates programming technique.
This method delays the evaluation of the operators and stores the expression in a suitable templated object. The expression is not computed until it is assigned to a vector. Then the nested expression objects become evaluated by one loop iteration. Optimizing the code via the inlining, the resulted program becomes nearly as efficient as the corresponding C code.
In the following, we present a simple version of the ET technique, introducing the operating classes and overloaded operators for accomplishing some vector calculations. After understanding this examples, it should be easy to extend the ET technique to more complex matters.
A simple implementation sample
easyet.h – This file contains the implementations concerning the ET constructs for adding, subtracting two vectors, the unary minus and the multiplication of a vector by a double constant.
The wrapper class Expr encapsulates all possible templated expressions into a single interface. By inheriting the operating classes from the wrapper, we can pass every operation object like a wrapped object. The other way around, the introduced cast operator within the wrapper class allows us to pass a Expr object, wherever an unwrapped operation object is to be passed.
The operating classes store the expressions as references, since the life time of the expression object lasts until the evaluation is completed. The accessing operator[ ] serves for computing the expression for the i-th component, referring the evaluation to the stored branches of the expression tree. We emphasize the inheritance of the operating classes from the wrapper class with the operating class type itself as template parameter. This allows an efficient inheritance, accomplished at compile time. Thus, the full inlining optimizations are applicable, since we have no virtual functions and the compiler can analyze all types.
Finally, the overloaded operators take one or two Expr objects. Hence they initialize and return the corresponding operating object. Different from former ET implementations, we solely need to program one instance of an operator, except of the operator*, where we are forced to distinguish between the two combinations ( double * Expr , Expr * double ).
Remark: This example contains just a very simple implementation of ET. However, this program can be extended in several manners.
- Adding a couple of further operations
- Supporting several accesses, e.g matrices, tensors, polynomials
- Type-free generalized implementation using traits
Do not be surprised, if ET programs compile longer than any C counterpart does. Due to the nested template constructs and the inlining optimizations, compiling of ET is pretty hard work for any compiler!
vector.h – this underlying data structure displays all necessary changes for adapting a common user-defined data structure to an ET machine.
The Vector class is pretty similar to any common array-like class. However, we would like to emphasize some differences:
- Our class Vector is inherited from the previously presented wrapper class Expr, as well. Hence, we can handle any vector like a wrapped object.
- The overloaded assignment operator serves for the component-wise evaluation of a wrapped object and assigning the result into the vector. Since our wrapper class Expr has no overloaded operator[ ], we need to cast the wrapped object into a wrapper-free one.
Remark: The class Vector still contains a dynamic array, which size has not to be fixed at compile time!
main.cc – a short usage of the previously presented expression templates and vector classes
This tiny example shows the application of the ET implementation. The very math-like notation supports and eases the coding of mathematical problems. C++ allows the operator overloading, the performance is guaranteed by the ET library.
Fast Expression Templates
Despite of the tremendous performance improvements of the operator overloading in C++, there are some cases, where the efficiency of ET implementations falls short in comparison to their C or FORTRAN counterparts. In principle, this is a general optimization lack, an aliasing problem, that solely appears while vectors repeatedly occur in a right-hand side expression. E.g., consider the north, south, west, east denotation of stencil computations arising with the discretization of partial differential equations:
For those cases, we introduce a faster version of the underlying vector class and thus, we can reach the expected performance. As downside, we have to equip the vector class with an unique integer.
fast_vector.h – The coding of the fast vector version.
Remark: Principally, it is applicable to combine the two implementations of class Vector and class Fast_Vector. Changing the fast version to solely holding pointers without holding any data, we can use the fast vectors just for the performance critical parts. Thus, we have to assign the pointer of the class Vector to the class Fast_Vector before starting the computations.
This modified version of the vector class is equipped with a template enumeration. Additionally, the data pointer is declared as static. Hence, we need the extra initialization of the static array, added after the vector class. Except of these changes, the class Fast_Vector is equal to the previous class Vector.
Remark: Despite of the static data pointer, the array and its size remains dynamic. Thus, we have the same flexibility as we had in the implementation above.
main_fast.cc – Application of the fast vectors in the main program.
This example demonstrates the differences to the previous one, and does not contain any special problem concerning performance lacks.
Herein we can easily see, that the initialization of the vectors is more work for the user. We have to ensure, that every enumeration id is unique over the whole application. However, these fast vectors have only to be used for the performance critical parts and by doing so, we can reuse the class Fast_Vector for independent calculations.