Difference between revisions of "M2TBackend"

Revision as of 13:01, 22 January 2008

Common platform for M2T languages

This page describes both the ideas behind the common M2T backend and its implementation.

The backend originated as part of the evolution of the Xpand and Xtend languages, and the packages are currently named accordingly. It is however intended as a runtime environment for all M2T languages, supporting common performance optimizations, interoperability and potential reuse of other code. If it proves to be useful this way, it should probably be moved to a non-xtend namespace to reflect its common nature.

Comments are very welcome and indeed necessary to incorporate the requirements of languages other than Xpand and Xtend and to make the backend useful to them.

The backend code is currently located in the modeling CVS at org.eclipse.m2t/org.eclipse.xpand/plugins/org.eclipse.xtend.backend* with related code at org.eclipse.m2t/org.eclipse.xpand/plugins/org.eclipse.xtend.middleend*. This distinction will be explained in the remainder of this document.

Overview

Design goals

The design of the backend was driven by the following forces:

Performance. In large projects, generator speed is an issue, and the backend is designed with performance in mind. This requirement is what actually sparked its development in the first place.
Compiler. For performance and/or obfuscation reasons, the backend will serve as a basis for compilation into Java classes.
Language independence. Concrete languages evolve, convenience syntax is added, and different languages have different concrete syntaxes in any case. To leverage the performance tuning effort, the backend is designed to be largely independent of the concrete syntax of the languages built on top of it. The developer skills and mindset required for frontend and backend development are quite different, and the separation gives the performance and generality work on the backend a more stable basis. This is probably a point where some implicit assumptions will prove less general than desirable, so feedback from other M2T language development teams is necessary.
Independent of parse tree. There is a strict and complete separation between the data structures used by the backend and those used by the frontends. The previous item explained how this separation is useful for the backend, but the development of frontend tooling also benefits. Since the parse tree of the frontend need not directly serve as a basis for execution, it becomes simpler to implement features like fault tolerant parsing.
Language interoperability. The common backend is intended to facilitate interoperability of languages, i.e. making it as simple as possible to have code in one language call code written in another.
Reuse of Tooling. The backend will incorporate support for tooling that requires runtime support - debugging, profiling etc. - in such a way as to minimize implementation effort for the different languages that wish to support them.

Layers

The backend serves as the runtime environment, and its data structures are independent of the concrete syntax of a given language. The frontend tooling on the other hand is intended to use its own AST that should be free of runtime concerns.

Therefore a translation layer is introduced, called the middle end for want of a better term. It is specific to each concrete language, and its purpose is to transform the AST of the frontend into the data structures required by the backend. This mainly involves the following transformations, which are explained in more detail in subsequent sections:

functions. The data structures representing code are structured around the key abstraction of function in the backend. A function is a piece of code that can be called using parameters and that returns an object.
primitive operations. The code inside a function is represented by a tree of expressions. Since stability is one of the key design requirements of the backend, the middle ends must map the specific functionality of that language onto the given set of expression nodes available in the backend.
types. Since M2T languages, like many other languages, operate on data, data types have a representation in the backend. Every middle end must therefore transform data types from the language-specific representation into the common representation of the backend.
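
The mapping of language-specific constructs onto a fixed set of backend expression nodes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Expression interface, the Literal and IfExpression nodes, the AstLiteral and AstUnless frontend classes, and the "unless" construct itself are hypothetical and are not taken from the actual backend code. The point is only that the middle end expresses a convenience construct using existing backend nodes instead of requiring a new one.

```java
public class MiddleEndSketch {

    /** Placeholder for the backend's runtime data; left empty in this sketch. */
    static class ExecutionContext { }

    // Backend side: a small, fixed set of expression nodes (hypothetical).
    interface Expression {
        Object evaluate(ExecutionContext ctx);
    }

    static class Literal implements Expression {
        final Object value;
        Literal(Object value) { this.value = value; }
        @Override public Object evaluate(ExecutionContext ctx) { return value; }
    }

    static class IfExpression implements Expression {
        final Expression condition;
        final Expression thenPart;
        final Expression elsePart;
        IfExpression(Expression condition, Expression thenPart, Expression elsePart) {
            this.condition = condition;
            this.thenPart = thenPart;
            this.elsePart = elsePart;
        }
        @Override public Object evaluate(ExecutionContext ctx) {
            boolean test = Boolean.TRUE.equals(condition.evaluate(ctx));
            return test ? thenPart.evaluate(ctx) : elsePart.evaluate(ctx);
        }
    }

    // Frontend side: AST nodes of an imaginary language with an "unless" construct.
    static class AstLiteral {
        final Object value;
        AstLiteral(Object value) { this.value = value; }
    }

    static class AstUnless {
        final AstLiteral condition;
        final AstLiteral body;
        final AstLiteral otherwise;
        AstUnless(AstLiteral condition, AstLiteral body, AstLiteral otherwise) {
            this.condition = condition;
            this.body = body;
            this.otherwise = otherwise;
        }
    }

    // Middle end: translates frontend AST nodes into backend expression nodes.
    static Expression translate(AstLiteral ast) {
        return new Literal(ast.value);
    }

    static Expression translate(AstUnless ast) {
        // "unless" has no backend node of its own; it becomes an IfExpression
        // with the two branches swapped.
        return new IfExpression(translate(ast.condition), translate(ast.otherwise), translate(ast.body));
    }

    public static void main(String[] args) {
        AstUnless ast = new AstUnless(new AstLiteral(false), new AstLiteral("body"), new AstLiteral("otherwise"));
        System.out.println(translate(ast).evaluate(new ExecutionContext())); // prints "body"
    }
}
```

With this split, adding further convenience syntax to the frontend only changes the translate methods; the backend node set stays untouched.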

Execution sequence

In line with its performance goal, the backend is as static in its execution as possible. Anything that can be evaluated by the middle end is deliberately not supported in the backend.

One prominent aspect that is affected by this decision is parsing of source code, which is left entirely to the middle end. The backend is designed so as to never parse any resources. This decision deeply affects the function resolution strategy and other implementation aspects of the backend, and therefore it should be reviewed especially carefully for its implications at an early stage if possible!

From an execution perspective, several steps must be performed in order to execute a program:

Call the middle end to transform the program into the backend representation. This requires the frontend AST as an input, so the middle end will call the corresponding front end parser. The output of this step is an initialized backend data structure.
There is a conscious design decision at this point. At first glance, it would also be possible to pass the front end AST to the middle end instead of having the middle end call the front end parser. This approach would work well for single source files, but it would be difficult to maintain if one source file referenced another, potentially even one written in a different language. Therefore the decision was made to have the middle end call the front end parser.
Initialize the runtime data structure for the backend. This step can be implicitly performed by a facade, but it allows detailed control over reuse of runtime data across several invocations (see below for details).
Actually call the backend to "execute" the data structures returned by the middle end.
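
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the backend's actual API: only the name ExecutionContext and the shape of Function.invoke appear on this page, while MiddleEnd, ToyMiddleEnd, the load method, and the globals map are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ExecutionSequenceSketch {

    /** Runtime data for the backend; the "globals" map is an assumption. */
    static class ExecutionContext {
        final Map<String, Object> globals = new HashMap<>();
    }

    /** Backend abstraction for callable code, in the shape shown on this page. */
    interface Function {
        Object invoke(ExecutionContext ctx, Object[] params);
    }

    /** Hypothetical middle end: it locates and parses the source itself. */
    interface MiddleEnd {
        Function load(String resourceName);
    }

    /** Toy middle end that skips parsing and returns a hard-wired function. */
    static class ToyMiddleEnd implements MiddleEnd {
        @Override
        public Function load(String resourceName) {
            // A real middle end would call its frontend parser here and
            // translate the resulting AST into backend data structures.
            return (ctx, params) ->
                ctx.globals.get("greeting") + ", " + params[0] + " (from " + resourceName + ")";
        }
    }

    public static Object run(String resourceName, Object arg) {
        // Step 1: the middle end transforms the program into the backend representation.
        Function entryPoint = new ToyMiddleEnd().load(resourceName);

        // Step 2: initialize the runtime data structure; a caller could keep
        // this object and reuse it across several invocations.
        ExecutionContext ctx = new ExecutionContext();
        ctx.globals.put("greeting", "Hello");

        // Step 3: have the backend execute what the middle end returned.
        return entryPoint.invoke(ctx, new Object[] { arg });
    }

    public static void main(String[] args) {
        System.out.println(run("demo.ext", "world")); // prints "Hello, world (from demo.ext)"
    }
}
```

The run method plays the role of the facade mentioned in step two: it hides the context setup for simple cases, while a caller needing control over reuse would perform the three steps itself.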

Functions

Key abstractions

Functions are the backend abstraction for any code that can be called. They are represented by the Function interface (see below for details), they can be passed parameters, and they always return an object (see the section on usage patterns for a description of how, for example, a template language is intended to be implemented on top of this). The Function interface has a method to invoke it. The ExecutionContext parameter can be ignored for now; it is a data structure containing all runtime-relevant data and is explained later in this document:

public interface Function {
    public Object invoke (ExecutionContext ctx, Object[] params);
    
    ... // other methods that will be explained later
}

There is no distinction between "stand-alone functions" and "methods defined on a type" - they are all represented by functions. If a concrete language supports "method" style calls, these must be mapped to functions with an additional first parameter of the "class" type by the respective middle end.

So, for example

MyType.doSomething (int a, String b)

would be represented by

doSomething (MyType this, int a, String b)

Functions have no name per se, they are like function pointers in C in this regard. This supports a functional style of programming, Closures etc. In contexts where functions require a name, they are represented by the NamedFunction class that is basically a container for a function and a string representing its name.
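
The mapping of a method-style call onto a plain function can be sketched like this. The Function interface follows the shape shown above; the fields of NamedFunction, the empty ExecutionContext, and the MyType class with its label field are assumptions made for the sake of a runnable example.

```java
public class MethodMappingSketch {

    /** Placeholder for the runtime data structure; left empty in this sketch. */
    static class ExecutionContext { }

    interface Function {
        Object invoke(ExecutionContext ctx, Object[] params);
    }

    /** Container pairing a function with a name; the field layout is an assumption. */
    static class NamedFunction {
        final String name;
        final Function function;
        NamedFunction(String name, Function function) {
            this.name = name;
            this.function = function;
        }
    }

    /** Hypothetical model type standing in for "MyType". */
    static class MyType {
        final String label;
        MyType(String label) { this.label = label; }
    }

    static NamedFunction doSomething() {
        // MyType.doSomething(int a, String b) becomes
        // doSomething(MyType this, int a, String b): the receiver is params[0].
        Function f = (ctx, params) -> {
            MyType self = (MyType) params[0];
            int a = (Integer) params[1];
            String b = (String) params[2];
            return self.label + ":" + a + ":" + b;
        };
        return new NamedFunction("doSomething", f);
    }

    public static void main(String[] args) {
        NamedFunction nf = doSomething();
        Object result = nf.function.invoke(
            new ExecutionContext(), new Object[] { new MyType("m"), 42, "x" });
        System.out.println(nf.name + " -> " + result); // prints "doSomething -> m:42:x"
    }
}
```

The function itself stays anonymous; the name lives only in the NamedFunction wrapper, which is what allows the same Function object to be passed around like a closure.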

Polymorphism

It is possible - even desirable - for several functions to have the same name, as long as they differ in their parameter types. Every function invocation made by the backend is done polymorphically, i.e. if there are several functions with the right name and number of parameters, the best fit is picked and invoked.

This choice is made based on the actual runtime types of the parameter objects and not, as in Java, on the "reference type" (a concept that the backend has no notion of). Resolution therefore happens at runtime rather than at compile time, and if there is no matching function (or no single "best match"), a runtime error occurs.
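
A minimal sketch of such runtime resolution is given below. The page does not define how the "best fit" is determined, so the rule used here is an assumption: a candidate wins if, in every parameter position, its declared type is the same as or a subtype of the corresponding type of every other applicable candidate. The Candidate class and the resolve and exampleFunctions methods are hypothetical names, and Java classes stand in for backend types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class PolymorphicDispatchSketch {

    /** A candidate function: declared parameter types plus a tag for display. */
    static class Candidate {
        final String tag;
        final Class<?>[] paramTypes;
        Candidate(String tag, Class<?>... paramTypes) {
            this.tag = tag;
            this.paramTypes = paramTypes;
        }

        /** True if every actual argument is an instance of the declared type. */
        boolean accepts(Object[] args) {
            if (args.length != paramTypes.length) return false;
            for (int i = 0; i < args.length; i++) {
                if (!paramTypes[i].isInstance(args[i])) return false;
            }
            return true;
        }

        /** True if each declared type is the same as or a subtype of the other's. */
        boolean atLeastAsSpecificAs(Candidate other) {
            for (int i = 0; i < paramTypes.length; i++) {
                if (!other.paramTypes[i].isAssignableFrom(paramTypes[i])) return false;
            }
            return true;
        }
    }

    /** Picks the single best applicable candidate or fails at runtime. */
    static Candidate resolve(List<Candidate> candidates, Object... args) {
        List<Candidate> applicable = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.accepts(args)) applicable.add(c);
        }
        for (Candidate c : applicable) {
            boolean best = true;
            for (Candidate other : applicable) {
                if (!c.atLeastAsSpecificAs(other)) { best = false; break; }
            }
            if (best) return c;
        }
        throw new IllegalStateException(
            applicable.isEmpty() ? "no matching function" : "no single best match");
    }

    static List<Candidate> exampleFunctions() {
        return Arrays.asList(
            new Candidate("f(Collection, Object)", Collection.class, Object.class),
            new Candidate("f(List, Object)", List.class, Object.class),
            new Candidate("f(Object, Collection)", Object.class, Collection.class));
    }

    public static void main(String[] args) {
        List<Candidate> f = exampleFunctions();
        // The actual type ArrayList decides, so the List variant beats the Collection variant.
        System.out.println(resolve(f, new ArrayList<String>(), "x").tag); // prints "f(List, Object)"
        System.out.println(resolve(f, "x", new ArrayList<String>()).tag); // prints "f(Object, Collection)"
    }
}
```

Under this assumed rule, calling resolve with two lists fails with "no single best match", because f(List, Object) and f(Object, Collection) are each more specific in a different position, and calling it with a string and a number fails with "no matching function".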

Let us for example assume we have the following functions ("List" being a subtype of "Collection"):

  f(Collection c, Object o);
  f(List       c, Object o);
  f(Object o, Collection c);

first parameter type	second parameter type	function being invoked
a	b	c

If we call the function "f" with the parameters "a" and 1 (i.e. a string and a number), the first of the functions is invoked.

If we call "f" with 1 and "a", the second function is invoked.

If we call "f" with two numbers, neither of the functions matches, and a runtime exception is thrown.

If we call "f" with two strings, both functions match, and neither matches better than the other, so there is no single best match and a runtime error occurs.

FDC

Linking

Libraries

polymorphism

static linking

syslib; mapping of function names

guards

caching

Linking and Libraries

syslib

Types

Kinds of Expression

Usage patterns

templates via return type

Internal data structures

package overview ExecutionContext



