Persistency and APIs

Different goals, perhaps common tools

There are at least two completely different and often incompatible goals in selling a library:

Sell the library as such to people with considerable software experience, who regard it as a component and need it only as one toolkit in an application they have already designed and are developing with their own favoured toolkits: in this case the best chance of selling comes from the intrinsic quality of the library, and from whether and how well it fits, or can be made to fit, with whatever other toolkits the customer has already chosen.
Sell the library to people who need to use it as the core of an application they are writing from scratch without great software skills: in this case the best chance of selling comes from offering a ready-made selection of integrated toolkits that come with the library and that the client can easily customize in some high-level way.

The two goals are often incompatible because facilitating integration into an arbitrary collection of toolkits is not needed in the second case, and is quite different from facilitating easy customization of a well-chosen, static collection of toolkits.

However, at least some of the tools used in the two situations are generally the same.

Very high level concepts: functionals and reflection

TODO: mention debuggers, interpreters, domains

Dealing with persistency and API integration requires two very high-level and rarely used concepts, functionals and reflection, which in turn rest on metadata; all of them go by many different names:

metadata: Metadata here is data that describes other data (its structure and properties) rather than its content, for example the list of fields in a record, or the list of parameters and the body of a function.
functionals: These are second-order functions, that is, functions whose arguments are metadata (for example the description of a function) rather than ordinary data.
reflection: The ability of a program to inspect and operate on itself, that is, on its own types and functions.

These two concepts are essential for both persistency and integration because:

Persistency depends on the ability to take an arbitrary piece of memory and a type, and to reflect on the type to convert the content of that memory to some other format.
Integration depends on the ability to redefine the function invocation functional (which in C/C++ is normally implicit, but is there) for functions called from or by other languages, so that argument lists and the like get converted.

Both persistence and integration depend on having metadata that describes the types to persist or the functions to integrate, and functionals that store and load the data to persist or that convert the call frame from one language's ABI to another's.

The important choices concern the details of how and when, not of what.

The possible choices

TODO: Mention GCC extensions, SL/5

How to generate the metadata, and in which format, and when to use it.
What kind of save/restore functional to write, and where.
What kind of call conversion functional to write, and where.

In some languages this is easier than in others; for example in Lisp, since programs and data structures have exactly the same representation, a program is in effect its own metadata.

In Java and Objective-C the compiler embeds a significant amount of metadata in the compiled code; in other languages there are builtin primitives to reflect on function calls.

The main problem is that neither C nor C++ has any easy way to generate metadata or to write general functionals. Some extended versions do, but the extensions are as a rule not portable.

It is therefore in general very difficult to write general-purpose save/restore or call conversion functionals for C or C++. This means that special-purpose ones have to be written instead, and some degree of flexibility has to be given up.

There are several different alternatives, each giving up flexibility in a different way.

Less flexible but more feasible alternatives

Metadata hacks

The big problem with metadata extraction is that extracting truly accurate metadata requires fully parsing the source, with exactly the same processing that the compiler does.

Ideally, therefore, this would be done by the compiler; but if the compiler does not do it and cannot be modified, that is just not an option.

Using any other tool will to some extent produce inaccurate metadata; the issue is how often and how inaccurate.

The metadata is generated by a separate tool

The metadata describing a program's data structures or functions can be generated by a tool other than the compiler. This can be a preprocessor or a postprocessor, for example:

A tool that scans the debugging information generated by the compiler, since a source-oriented debugger is a fully reflective program that needs extensive metadata.
After all, the compiler usually generates fairly complete and accurate metadata in the form of debugging information, and this may be processed back into source form.
A version of GCC that converts the program into a tree represented in XML (special thanks to Marek for pointing it out).
The problem with this approach is that it generates metadata that is accurate only with regard to a binary compiled by GCC, and on some platforms that just is not a viable option.
A header file scanner that extracts function declarations (e.g. proto and unproto).

The metadata is generated manually

This requires writing by hand a description of the data structures and functions in an API. This is often done before the fact, for example for RPC-oriented programs.
There are several API description languages, for example those associated with ILU, SWIG, or DCOM.

The metadata is generated partly manually and partly by a preprocessor

This usually involves adding some manual tags to the definitions or declarations of types and functions. These tags are either used by a special purpose preprocessor or by a