Keldysh Institute for Applied Mathematics

M.M.Gorbunov-Possadov

Open Systems Journal, 1998, N. 3, pp. 45–49 (in Russian)

M.M.Gorbunov-Possadov

EXTENSIBLE HOMOGENEOUS SET
AS A BASIS FOR REUSE

Translation from Russian by Victor Sadikov

One more way to install a reusable component in a program is proposed. The special construct provides a means for extracting from the program a number of extensible homogeneous sets. For example, let's consider a compiler for an extensible language. Here, the table of keywords, blocks of the parser for analysis of language elements, blocks of the code generator will become such homogeneous sets. Under installation, a component breaks up into a number of items, and each of those is added to one of the extracted homogeneous sets. For example, let there be a need to add to the above-mentioned compiler the realization of a new language element. Here, the connection of this new component is reduced to updating the table by new keywords and to adding the blocks of the parser and the code generator to the appropriate homogeneous sets. At the same time any installed reusable component is kept up as an entity. Thanks to this the readability of the program text is improved and the removing the given component is facilitated.

1. Introduction
2. Is class always suitable?
3. Examples
4. Layers and simple connectivity
5. The implementation of the distribution
6. The homogeneous set
7. Conclusion
Bibliography

1. Introduction

When designing a reusable component, it is necessary to decide two questions: where the boundary of the reusable functional part of algorithm lies and in what form that part should be realised? Even attaching due importance to the first question of the functional boundary, we can't escape the fact that the range of possible answers to the second question of the form of program realisation has the essential influence on the ultimate program design. We examine some popular constructs, which provide the reuse of a program component.

The term 'reuse' causes very miscellaneous associations to a programmer coming across it for the first time. Sometimes a common subroutine for calculating the sine is meant, or there may appear the image of a colleague dropping in a hurry 'A year or two ago I happened to write something like what you need. Look it up on my disk. You may find something useful and adapt it.' The latter case of an occasional reuse is rather improbable and extremely ineffective, so it hardly deserves any attention. Contrarily, the matter with the sine is all right, for the technique of reusing subroutines has been well known for some decades, proved itself to be good, and it is not going to resign.

The role allotted to the subroutines is usually that of elementary bricks, or rather, terminal leaves for growing tree of a program. Such a leaf can be either written anew or borrowed from the library. Neither choice is likely to present any difficulties. Though, if any other part of the program tree is to be reused, then a bit more complicated constructs must be applied.

Quite often it is the trunk of a program tree or, in other words, a program framework that is reused. The newly created program pieces, which determine new aspects of the application, are put into the sockets of its framework. For instance, in MS Windows almost every application has to perform some ritual operations on system objects before getting down to immediate message handling. The direct realisation of these operations would take dozens of source code lines. Many modern programming environments put these lines in the framework and allow the programmer to get down to business immediately and fill in the sockets intended for the specific components of the application.

A more interesting example of reusing a framework is the technique of computational experiment. Thousands of times during a computational experiment as new calculated data are to be received, the current mathematical model ought to be modified, but the new modification of the model requires a new program to realise it. Ultimately, the more stable program part, which is invariable from run to run, becomes a framework. Each of numerous aspects varying during the experiment is represented in the framework by its own socket. Every time a new executable version of the program is to be built, the realisation of the currently required value of the variable aspect should be placed into the corresponding socket. So, the change of a variable aspect is strictly localised; it takes only replacing the contents of the socket corresponding to that aspect.

However, it is not always that subroutines and a framework guarantee a success. As the culmination of reuse there appears 'programming-via-installing'. The existing programming knowledge should be presented as a collection of modules accumulated in a common depository, from which developing programs could extract building units. Indeed, almost every algorithmic fragment written today has been already realised hundreds and thousands of times by somebody else. So it is extremely vexing to repeat this way time after time. It would be very profitable, if the main effort could be directed to selecting and customizing the reusable components, which might be borrowed even from different sources, and to arranging their team-work rather than to directly coding new modules. Roughly speaking, 'programming-via-installing' turns programmer into installer. But reusable subroutine and framework are insufficient in this case. In order to fill the intermediate domain of a program, there must be invoked more sophisticated reusable constructs, which are to be more closely interrelated.

If 'programming-via-installing' sticks to the object-oriented style, than it is a class that meets to some degree the above stated requirements of a reusable construct for an intermediate domain. Class is a kind of synthesis of the important features of a subroutine and a framework, providing for the reuse of them. On the one hand, the methods of a class are called to accomplish auxiliary tasks and in this way the class looks much like a subroutine, a leaf of a program tree. On the other hand, the refinements of virtual methods as well as the re-definition of common methods are permitted. In this way the class plays the role of the trunk of a program tree that is the role of a framework, in sockets of which newly defined methods are placed. The framework usually oversteps the limits of a single class: in modern software, as a rule, it takes a form of class library [1].

It might seem that the tool kit of these three constructs — subroutine, framework, class — could fulfil any requirement for making a reusable component. Still further, subroutine and framework might seem to be ordinary degenerated cases of class, and class itself might seem a universal construct, satisfying all needs of reuse. But, as a matter of fact, the reuse of classes aggravates some problems, which can be usually coped with easily in the case of such simple constructs as subroutines and frameworks.

2. Is class always suitable?

If a class implanted into a program is substantial and complicated enough, it makes the rest of the program comply with its nontrivial interface. On account of nontrivial interfaces, the opportunity for a linkage between large reusable classes, which, being developed independently, know nothing of the interfaces of each other, is rather illusory. It is unlikely that such classes can get along together. Strictly speaking, the disagreement of the interfaces can be overcome by brokers, special adapting subroutines, but in this case the structure of the program will look like a wild jungle. So, brokers can't be seriously regarded as a regular means of reuse support.

Thus the role of reusable components can be played, not by classes, but only by those conforming to the receiving environment. The more the degree of conformance is reached, the easier the linkage is performed and the more the efficiency of the final program is obtained. The closest coalescence of a class and a program can be achieved when their interpenetration takes place, when certain components are taken from the text of the class description and are directly "implanted" into the text of the receiving program. Unfortunately, convenient constructs for expressing such a profound conformance are usually absent from widespread programming environments.

The reusable classes work very successfully at lower levels of program hierarchy, and due to this usage they have been included in widespread libraries for a long time. But as a rule, an effective program can't be created without any additions to the library classes. The plain hierarchical structure of the program composed of only library classes turns out to be rather weak, because class can have external connections only of two sorts (as have been already mentioned). Methods of the class can be called from outside as ordinary subroutines, and the class itself can play the role of a hierarch, allowing overwriting of its virtual methods. All in all, the pure vertical hierarchical structure resembles the heap of separate fragile broom twigs. Horizontal connections are essential to the program. They unite its components into one complete and efficient block. The lack of facility to organise horizontal connections detracts much from existing classes and hampers reuse of classes in the middle domain of a program.

There is one more pretty obvious doubt about the suitableness of the class technique for the realisation of reusable components. While the principal goal of a library subroutine and a framework is the reuse, a class is in general the result of decomposing, or designing a rational object-oriented structure to the program under development. So, it would be a bit unrealistic to expect that it does happen to be the most appropriate form to represent a reusable component.

Some positive results towards the adaptation of class and object to the needs of reuse are obtained by Microsoft in their products known under the abbreviations OLE, COM, and ActiveX [2–3]. It is regrettable that they deal only with a class and an object, and thus the area to search for prospective reusable program constructs is a priory restricted. The Microsoft's researchers are also obsessed by other ideas that are quite reasonable in itself, but narrow unreasonably the general statement of the problem of reuse. They would like to link in one program reusable components written in different programming languages; to make the platform-neutral components; and to provide for their distributed processing. In order to achieve the desirable multilingual programming and distributed multi-platform processing, the potentiality of the interaction between a component and the program has to be limited to such an extent that in this case any interpenetration between them seems to be just a pipedream. But the following examples show that there is nothing preternatural about this construct. The interpenetration between components permits of the compact realisation that will make available the considerable reserve of efficient extraction and implementation of the reusable parts.

There are also not enough horizontal connections in JavaBeans [4] and other analogous projects based on the class conception. On this account, in spite of the great number of the examples showing the successful application of reusable classes, the class is unable to be the universal reusable component. The neutral term `a large reusable component' seems to be more accurate for denoting constructs intermediate between a framework and a subroutine.

Let's now consider the links appearing usually between a large component and the program receiving it, and the mechanics required to support these links.

3. Examples

We shall start with examples illustrating the particularity of reusing large program components.

The first example is far from dealing with programming. We will demonstrate how easily the links between framework and subroutine can be established. Assume that we intend to bore a hole, we have an appropriate electric drill (a framework) but there isn't a twist bit of the right diameter in our workshop. Fortunately, we know that our friend has the right twist bit (a subroutine). We are asking him for it, fixing it in the chuck (a socket) of the drill (provided that the interfaces of the chuck and the bit are compatible), and the problem of the reuse of the drill and the bit will be successfully solved.

Now we observe what will be brought about by the complication of the environment in question. Assume that we are buying some tool (a large reusable component) in order to extend the facilities of our workshop (a program). Buying this commodity has a few dimensions or a few layers. The tool itself must be placed on a shelf in the storeroom; all the documentation, manuals, guarantees etc. are to be collected in a drawer of the desk; one more layer is a record in the ledger of the family; another layer is the notice of regular service, which goes with the reminders about monthly rent and friends' birthdays in the notepad, and so on. We are not particularly interested in the initial implementation of the layers (their obvious program analogues are the components of the class or object); but the most interesting thing is their distribution among several simply connected depositories (storeroom, document file, ledger, notepad), which happens at the moment of the installation of a new tool.

An example from IBM PC hardware. The connections between the system block and the keyboard are, as a matter of fact, a kind of a framework-subroutine relationship. The keyboard cable only needs to be plugged into the corresponding socket on the backside of the system block to complete the mounting. Thanks to the distinct functional specialisation of a keyboard and the standardisation of the connector (or, to be precise, of the keyboard interface), any system block and any keyboard can be successfully connected. Nevertheless, under a little closer examination, the keyboard looks like an object of the keyboard class, where the methods correspond with numerous soft/hardware conventions providing the actual, rather complicated, interaction with the system block.

The relationship between the system block and the keyboard is a sample of a perfect engineering solution, which have made easier the lives of both the producers of hardware and the users. However, being a dull routine, the attachment of the keyboard has little to do with the high goals of reuse and 'programming-via-installing' in question. Actually, the system block just can't dispense with the keyboard, neither can the keyboard with the system block. Linking them up, nobody expects any new significant function, apart from those that were predetermined by the designers of IBM PC architecture when Queen Anne was alive.

The attachment of, for example, a CD drive is a quite different matter. When the architecture was designed, the invention of this peculiar device could be hardly thought of. Nevertheless, everything necessary for it has been prepared. The device itself is fixed in one of the homogeneous sockets, which are traditionally called sockets for 5" floppy drives, although they have been always suitable for placing a wide diversity of devices. The power supplement is provided by means of one of the homogeneous multipurpose power connectors, and the information exchange is provided with the help of one of the homogeneous interface connectors. The interpenetration of the system block and the device is enhanced by the fact that one or more homogeneous interrupts can be allocated for the new device as well.

In other words, being installed the CD drive delegates its representatives into a number of layers of the system block: the layer of placing (the 5" drive layer), the layer of the interface cable, the layer of interrupts etc. The relationship within any one layer is a bit simpler than the whole relationship between the pair 'keyboard-system block' mentioned above, but the aggregate group of the layers offers an opportunity for the most intensive interaction.

Any PC user has an opportunity to assemble, or rather to construct, a particular hardware configuration, which can be unique in terms of functional features, and which can suite him best. When it is said 'to construct', the attachment of this or that keyboard is scarcely implied. The constructing is based on the technique placing the devices wanted into the homogeneous 5" sockets, plugging them into homogeneous slots of the PCI bus, and other similar attachments based on the technique of delegating some representatives of the device under attachment into different layers of the system block. It is this technique that allows one to improve the hardware of a computer freely enough. And it is likely that this technique is lacking in support for the reuse of large program components in consideration.

An example from operating systems. The procedure considered above of attaching a CD ROM device is very similar to the way of installing a new large application into an operating system. A new item is added to the homogeneous set of the application menu items. In order to achieve an effective interaction with the accepting environment, the new application also delegates to the operating system not only its executable code but also some drivers, various system resource requirements, etc. Thus the installation of the new applications supplements the homogeneous layers of similar elements, which were formed by the foregoing applications. The variety of the layers involved in the installation determines the depth of integration of the application and operating system.

The examples just cited (arranging a workshop, upgrading a personal computer, installing an application) are based on interpenetration between a new component and an accepting environment. Admittedly, neglecting the technical details and turning to the intuition, the interpenetration seems more attractive and closer to the cherished goal of 'programming-via-installing' then, say, the reuse of library subroutines or frameworks.

Now an example from the area of programming. Let's assume that there is a compiler for an extensible language, and a reusable component implementing a new language construct is to be added to it. On installing, the new component is distributed among a number of layers: new reserved words will be installed in the table of the scanner, a new block will be added to the parser, and a block will appear in the code generator (Fig. 1).

Fig.1. Installation of a new component in compiler

Our observations are summarized in the table.

Receiving environment	Installed component	Installation construct
Drill	Twist bit	Variant socket: chuck of the drill
Workshop	Tool	Extensible homogeneous layers (sets): shelves in the storeroom, documentation and manuals, records in the ledger, notices of regular service
IBM PC	Keyboard	Variant socket: socket on the backside of the system block
IBM PC	CD drive	Extensible homogeneous layers (sets): sockets for 5" floppy drives, power connectors, interface connectors, interrupts
Operating system	Application	Extensible homogeneous layers (sets): application menu, executable code files, drivers, resource requirements
Compiler for an extensible language	Implementation of a new language construct	Extensible homogeneous layers (sets): table of the reserved words, parser's blocks, code generator's blocks

Such examples can be offered by scores. But even without them, each experienced programmer will agree that a large program usually decomposes into a number of functional layers, and that if a component to be added is nontrivial it is unlikely to be assimilated by only one layer, but in order to be naturally integrated into its new environment, the component will probably have some effect on a few of them.

4. Layers and simple connectivity

Program layers is not a speculative conception, but a thoroughly constructive one. As a rule, a layer is a simply connected fragment of a source code, which greatly improves the readability of the program, and facilitates dealing with it. But in terms of the technology of applying reusable components, the simple connectivity of layer turn into a serious obstacle, for when being installed, a multilayered component has to be split up into some isolated pieces, which will be localised on different layers of the program.

It stands to reason that the distribution of a new component among the layers by no means implies that the component ceases existing as a single whole. Having installed a reusable component, the developer is in his right at any subsequent moment to inspect at full what exactly he has implanted into his program. Furthermore, the component itself makes one more functional program layer, orthogonal to the layers among which it is distributed. Such a layer is of no less interest, and so the possibility of observing it systematically is undoubtedly profitable in terms of program readability. Besides, in the case of removing, the preserved unity of the component will allow us to apply the technologically irreproachable operation of deleting it as a whole instead of the complicated and unreliable undo of many separate changes.

At the same time, the arguments in favour of the splitting the component are none the less compelling. In the example given above, the developer of the compiler must be granted a facility to observe the current state of the reserved word table of the implementing language in compact form, that is without the necessity of skipping over the blocks of parser and code generator, which don't interest him at the moment. If the words composing the mentioned table are textually connected then the implementation of the table will be more efficient. Both these considerations prove that the reserved words should be extracted from the added reusable component and, roughly speaking, inserted into the source code of the table. (But even neglecting the interests of program layers, it is not always that the splitting of a reusable component can be avoided, for example, the component can include some graphic or sound fragments, which usually don't belong to the text of the program, but are preferably implemented as separate files stored somewhere apart.)

It is easy to see that the constructs available in popular algorithmic languages and classes are among them, can be of little help for the proper support of the reusable component spread over several layers. In these languages the facility of distributing the parts of a component among more than one layer is either totally absent or extremely limited. It is very difficult to account for this situation rationally. The necessity of distributing appears often enough, and its implementation is quite an easy matter. The scheme of the implementation of the distribution will be illustrated below by the example of the multilayered compiler.

5. The implementation of the distribution

Let's return to the compiler for an extensible language, or rather to the installation of a new language construct implementation in it. Let's discuss only one narrow subtask, how to organise the distribution of newly defined reserved words. Assume that the compiler is written in C [5], and let's try to supply C with the tools supporting the distribution of a component. Let's make use of the technique of preprocessing, which already exists in C; and improve the set of preprocessor directives by adding three more: #Install_in, #Horizon, and #End_of_horizon.

Specifically, let the new construct of the extensible language contain only one new reserved word — newword. Then into the text of its realisation the following line must be placed:

#Install_in keyword newword

where #Install_in is a new directive of the C preprocessor, which provides for the insertion of its second argument (newword) into the set of reserved words, whose name is specified by the first argument (keyword).

The directive #Install_in is involved in forming the set keyword, the elements of which will be employed at least at two locations in the program. One location is in the head file, which is shared by the scanner and the parser, where every reserved word is associated with a number. The other location is in the reserved word table of the scanner.

Assume that the name of the enumeration constant, which defines the number associated with a reserved word, is formed by concatenating the prefix "w_" and the corresponding reserved word. Then the enumeration specification defining the constants in the head file will look like:

enum { #Horizon keyword w_ ## keyword #End_of_horizon , };

The set socket enclosed between the lines #Horizon and #End_of_horizon stands for a compile-time loop. The loop is to be repeated, as many times as there are elements in the given set. The name of the set is pointed out by the #Horizon directive. Apart from this, the name of the set serves also as the loop variable, which runs sequentially through all elements of the set. On every repetition of the cycle the lines enclosed between #Horizon and #End_of_horizon are added to the output of preprocessor, with the loop variable being replaced by the next element from the set. The delimiter, which is placed between the repetitions of the cycle, is written in the #End_of_horizon directive. Recall that the ## operator signifies the compile-time concatenation of two lexemes. So, after the preprocessing our enumeration specification is to turn into

enum { ... , w_newword, ... };

Now let's look how the reserved word table is formed with the help of the set socket.

char * words [] = { #Horizon keyword #keyword, #End_of_horizon };

The # operator on line 3 serves for putting the text replacing the loop variable in quotation marks, just in the same manner as it is conventional for arguments in C macro definitions. After preprocessing the table will assume the following form:

char * words [] = { ... , "newword", ... , };

where the reserved words appear in exactly the same sequence as the constants in the considered enumeration do. Notice that the operand is missing in the #End_of_horizon directive: the comma has moved into the body of the cycle, for here it is used not as a delimiter, but as a terminator, which is also allowed by the C syntax.

Eventually, the problem of distributing a reserved word is successfully resolved. On the one hand the word is kept in the body of the reusable component, and so the component has invariably a readable, complete text view, and what's more, the component can be efficiently removed from the program at any moment. On the other hand the word automatically appears in all the layers, where it is wanted, and due to this fact all layers also obtain an adequate readable text representation.

The realisation of the directives #Install_in, #Horizon, and #End_of_horizon is a bit different from the common scheme of preprocessing. Before beginning with the macro expansion of the set socket #Horizon ... #End_of_horizon, the elements of the specified set must be found out, and this requires the analysis of all texts, belonging to the program under compilation, and the execution of all #Install_in directives occurring in these texts. The same analysis must be carried out on the interactive request when the developer wants to inspect the complete list of the elements of a particular set. Moreover, since the expansion of the set socket may be rather intricate, it seems useful to equip the preprocessor with the facility of browsing its output interactively.

It has been mentioned that with the given scheme of the distribution among the layers, a reusable component implanted into the program doesn't cease to exist in the form of a simply connected text, and so it is not at all a difficult task to remove it from the program. All consequents of the installation of the component will be completely eliminated, if the component is just removed from the totality of the source texts participating in the linking of the program. Because all sets are rebuilt on every linkage of a program, after the removal of any component they will have automatically the right contents.

Concluding the section dealing with the example, we should remark that the distribution of reusable component parts related to the parser and to the code generator among the respective layers can be realised in an analogous way. The tools described above need only a slight improvement.

6. The homogeneous set

An essential feature of the proposed set is the homogeneity of its elements. Homogeneous sets can be composed of elements of any type, for example: a set of identifiers, a set of algorithmic fragments, a set of other language constructs, etc. But no liberties are permitted within a particular set, all the elements must conform to strict limitations. It is owing to these limitations that the elements of the set can be lined up in the set socket.

Without a doubt, the homogeneity of the set merits a constructive realisation. Only in C, due to its careless attitude to static type checking, the primitive constructs like those considered in the example concerning reserved words are permissible. More strict languages will firstly demand the unambiguous specifications for the type of the elements of each set. This demand is not difficult to satisfy: the above scheme of handling homogeneous sets can be relatively easily supplemented with tools for specifying the type of the elements, while the syntax will be only a bit more complex.

The main advantage of a homogeneous set is its extensibility, i.e. its readiness to accept new elements. Had the supporting tools outlined above been implemented, the set would be able to evolve with extreme easiness and softness. The softness of evolution means that the installation of new element in the set wouldn't require any editing. Generally speaking, it follows from the softness that all the written and already debugged source texts preserve their capacity for productive functioning. So, if all the effects of a new reusable component on the receiving program can be reduced to the dispatch of its parts to one or more homogeneous sets, then the installation of the component as a whole will be carried out in a softness way, without any editing of the source code of the receiving program.

The softness of installation is one point, but the implementation of a homogeneous set also establishes a constructive hot spot for potential program growth and points out one of the possible directions of future program evolution as well. This fact is of no less importance. Besides, all the additions to extend the program along this direction acquire concrete and distinctive frames. In particular, it becomes clear, how a reusable component (or at least its constituent associated with particular direction of the evolution) should look.

Even in the case when no installations of new reusable components are expected, the implementation of a homogeneous set may be of use for the compact representation of program layers [6–8]. But for it, some orthogonal layers would have fallen into distant text fragments. Likewise reusable components, some disposable extensions can be attached to a program, and the contribution of extensible homogeneous set to the organisation of such attachments is quite beneficial as well.

7. Conclusion

In short, what are the advantages of a homogeneous set? On the one hand, it makes all new reusable components welcome. On the other hand, it provides for the clear and constructive decomposition of the program into modules concerning various directions of its evolution; and due to this clearness and constructiveness, the elements of the set stand a real chance to become "independent units of programming knowledge" [9], that is to turn into reusable components.

In the same way as the availability of the cycle statement provokes the structural organisation of the algorithm under development, the realisation of homogeneous sets, possessing a certain mobilising influence, is able to form the atmosphere of continuous search and adequate realisation for potential sources of all kinds of reuse.

We hope that in a short time the programmer, now splitting a program under development only into frameworks, subroutines, and classes, will have at his disposal a very productive configuration landmark — the homogeneous set. The act of converting this construct into an organic part of tools provided by widespread programming environments could give an impetus to that inert movement towards reuse and 'programming-via-installing', which we can now observe.

Bibliography

Fayad M.E., Schmidt D.C. Object-oriented application frameworks // Comm. ACM. — 1997. — V.40, N.10. — P.32–38.
Chappell D. Understanding ActiveX and OLE. — Microsoft Corporation, 1996.
Rogerson D. Inside COM. — Microsoft Corporation, 1997.
Harold E.R. JavaBeans. — IDG Books Worldwide, 1998.
Kernighan B.W., Ritchie D.M. The C programming language. — 2nd ed. — Prentice Hall, 1988.
Gorbunov-Possadov M.M. Extensible programs. — Moscow, Polyptych, 1999 (in Russian).
Gorbunov-Possadov M.M. The softness of program evolution // Open systems journal. — 1996. — N. 4. — P. 65–70 (in Russian). — http://keldysh.ru/softness/softness.htm (in English).
Gorbunov-Possadov M.M. System is open, while something is still in the way // Open systems journal. — 1996. — N. 6. — P. 36–39 (in Russian).
Tseytin G.S. On the way to component technology // Programming and computer software. — 1990. — N. 1. — P. 78–92 (in Russian).

The work was supported by the Russian Foundation for Basic Research, grant 96-01-01138

Author's personal web page: http://keldysh.ru/persons/english/gorbunov.html

M.M.Gorbunov-Possadov

EXTENSIBLE HOMOGENEOUS SETAS A BASIS FOR REUSE

EXTENSIBLE HOMOGENEOUS SET
AS A BASIS FOR REUSE