    Semi-Strong Static Type Checking of Object-Oriented Query Languages [1]

    Michał Lentner, Krzysztof Stencel, Kazimierz Subieta

    Polish-Japanese Institute of Information Technology, Warsaw, Poland
    Institute of Informatics, Warsaw University, Warsaw, Poland
    Institute of Computer Science PAS, Warsaw, Poland

    Abstract. We propose a new semi-strong approach to types and static type checking in object-oriented database query and programming languages. Many features of current object-oriented query/programming languages, such as naming, ellipses, automatic coercions and irregularities in data structures, render current formal strong type systems irrelevant to practical situations. There is a need for semi-strong, but still efficient, type checking. We treat types as syntactic qualifiers (signatures) attached to objects, procedures, modules and other data/program entities. In our approach a type inference system is based on decision tables involving signatures and producing type checking decisions. A type checker is based on data structures which statically model run-time structures and processes: a metabase, a static environment stack, a static result stack and type inference decision tables. To discover several type errors in one run we use a mechanism for restoring the state of the type checker after a type error.

    1 Introduction

    At first glance, strong type checking of object-oriented or XML-oriented query languages integrated with programming languages seems to be exhausted by the current state of the art. There are thousands of papers on types. Many of them deal with bulk types typical for databases and database programming languages, including query languages. There are also many practical proposals of type systems implemented in object-oriented programming languages (e.g. C++ or Java) as well as in research prototypes, e.g. PS-Algol, DBPL, Galileo, Napier-89, Fibonacci, etc. (see the overviews [Atki87, Atki95]). A strong typing system (however, criticized [Alag97]) has also been proposed within the ODMG standard [ODMG00] for object databases. Although typing was not the main concern of the SQL-99 standard, it contains many pages devoted to types. Recently the XQuery standard proposal is also claimed to be strongly typed, and corresponding typecheckers have been developed (see e.g. [FL05]).

     1 This paper presents a part of the research preceding the 6th Framework European Project eGov Bus, IST-026727-STP.

    However, there are features of query/programming languages and environments that make implementation of types difficult. Taking into account such issues as mutability, collection cardinality constraints, collection kinds, type equivalence based on type names, inheritance and multiple inheritance, dynamic object roles and dynamic inheritance, modules, export and import lists, etc. entangles the type notion with so many details and peculiarities that typical type systems, known e.g. from functional languages, become idealistic and impractical. Moreover, irregularities in data structures (null values, repeating data, variants/unions, unconstrained data names), ellipses and automatic coercions occurring in queries mean that strong typing should be relaxed to be efficient for programmers.

    Indeed, strong type checking professionals are trying to convince the community that everything that should be done has already been done. On the other hand, the typing solutions of the ODMG standard and the challenging implementation of types in XQuery give another impression. Developers of many web and database programming tools ignore types totally because the theories are obscure and implementation is too difficult. If it is so good in theory, why is it so bad in practice? We have examined this fundamental question by implementing a strong type checker within our recent project ODRA, an object-oriented database server devoted to Web and grid applications. ODRA is based on the query language SBQL, incomparably more powerful than ODMG OQL (because of its algorithmic power) and XQuery (because of updating facilities and views). SBQL is not only a very powerful object-oriented query language; it is also extended by imperative constructs of programming languages (a la Oracle PL/SQL or SQL-99), programming abstractions such as procedures, functions and methods, and database abstractions such as stored procedures and updateable views. The report [HLSS05] and this paper present the first conclusions from this experience. First, we have concluded that many theories are good guidelines, but eventually must be corrected by practical considerations. Second, the strong typing systems presented by the theories are too strong and must be relaxed to meet reality. Therefore we propose a new semi-strong approach to the problem. Third, the major issues in typing systems are not theoretical, but concern the practical development of solutions that meet the typical programmer's psychology. Psychology is hardly formalized, thus many solutions in our typing system depend on (sometimes random) decisions which anticipate further behaviour of the programmers.

    To make our contribution clearer, we present the following example from the ODMG standard. The problem concerns auxiliary names, or "synonyms", or "iteration variables", in languages such as SQL and OQL. Consider the following OQL query, p. 104 [ODMG00]:

    select * from Students as x, x.takes as y, y.taught_by as z

    where z.rank = "full professor"

    The query defines three names x, y, z, considered "iteration variables". According to the scope rules (p. 112), their scope is limited to this query (they have no meaning outside it). But it appears (p. 105) that the type returned by this query is:

    bag<struct (x:Students, y:Section, z:Professor)>

    The semantic qualification of these names is changed: instead of being "iteration variables" they become structure field labels. Because the output of the query can be used in another (outer) query, these names can be used outside the query where they have been defined (which is inconsistent with the assumed scope rules). This example raises very basic questions:

    (1) Which formal model makes it possible to change iteration variables into structure field labels?

    (2) What is the real scope of x, y, z?

    (3) How can the semantics be correctly formalized?

    (4) How can this situation be correctly statically typed?

    (5) Which of the currently known typing systems can handle such situations?

    In our formalization of SBQL [Subi04] we have shown that the intuitions of the ODMG are correct. We have precisely defined the semantics of the as operator, but this semantics has no precedents in programming languages. The resulting type is as shown above; however, the scope of the variables x, y, z is restricted in a way that is different from any kind of textual query/program parentheses (a query, a program block, a procedure, a module, etc.). Obviously, the type inference engine should approximate run-time calculations at parse/compile time. However, because such a semantic situation is unknown in programming languages, the answer to (5) is negative: no currently known typing system is adequate. The answer to (4) is (in particular) the subject of our research: we show how this case can be correctly typed.

    This is one of many examples showing that current trends in object-oriented and XML-oriented databases and query languages go far beyond the proposals of strong polymorphic typing systems known from previous years. Even the apparently innocent concept of a collection (bulk type) causes difficulties. In particular, it is normal e.g. in SQL that a collection with one value v is considered equivalent to the value v (see nested select clauses in SQL). This kind of coercion requires shifting some type checks to run time. Careful analysis has shown that collections (especially heterogeneous ones) contradict the substitutability and open-close principles that are fundamental to object-orientedness. In XML technologies (DTD and XML Schema) the collection concept is replaced by the cardinality constraint concept, which makes quite a big difference for static typing systems (bulk types are type constructors, while cardinalities are attributes of internal type signatures). Again, we did not find any static typing system dealing with cardinalities rather than collections. Such cases have convinced us that static strong typing nowadays requires quite a new approach. We call it semi-strong typechecking (after the semi-structured data concept). A thorough description of the approach and implementation can be found in [HLSS05].

    Data schemata partly release programming languages from keeping type information. Run-time libraries or programs read this information from schemata stored by databases. In this way types become neutral to programming languages. This assumption is the basis of the type systems of CORBA [OMG02] and ODMG [ODMG00]. Some hybrid solutions have been proposed which consist in a textual mapping of types (defined e.g. in CORBA IDL) to types specific to a programming language. In our approach we assume that the semi-strong type checking mechanism is directly based on the database schema. We do not consider how our type system could be mapped to a typical object-oriented programming language such as C++ or Java. Because our typing system is much more sophisticated, this would probably not be an easy task.

    Our idea follows the Stack-Based Approach [SKL95, Subi04], which explains the mechanism of query evaluation for object-oriented databases through concepts well-known in programming languages. It involves three basic architectural elements: an object store, an environment stack and a result stack. The idea of a static type checker consists in simulating the actions of the above three run-time structures at compile time. This concept (executing programs on abstract data to obtain information about them) is the underlying intention of abstract interpretation, which was first proposed in [CC77]; some other work has been done later [Hah92]. A static type checker consists of the following architectural elements: a metabase (a counterpart of an object store), a static environment stack, a static result stack and type inference decision tables for all operators introduced in a given query/programming language.

    A particular row of a decision table contains signatures of the given operator's arguments and the decision, which is one of the following three items: (1) it determines the resulting signature for the given operator and the signatures of its arguments; (2) it qualifies the situation as a type error; (3) it qualifies the situation as impossible to check statically and adds run-time coercions and/or pushes the type check to run time. Type inference decision tables make type soundness a straightforward issue, provided that the description of query operators indeed matches their run-time behavior. Our approach facilitates accommodating non-standard ad-hoc features of practical languages better than conventional type systems do, since the operational semantics can be modeled more directly. Such ad-hoc features are very useful in semi-structured processing.
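    The three kinds of decision can be illustrated with a small sketch. This is our own illustrative encoding, not the ODRA implementation; the operator, the table contents and all names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Outcome(Enum):
    RESULT = auto()         # (1) a resulting signature is determined
    TYPE_ERROR = auto()     # (2) the situation is qualified as a type error
    RUNTIME_CHECK = auto()  # (3) the check is pushed to run time

@dataclass
class Decision:
    outcome: Outcome
    signature: Optional[str] = None  # set only for Outcome.RESULT

# A toy decision table for a binary "+" over atomic type names.
PLUS_TABLE = {
    ("int", "int"): Decision(Outcome.RESULT, "int"),
    ("int", "float"): Decision(Outcome.RESULT, "float"),  # coercion int -> float
    ("string", "string"): Decision(Outcome.RESULT, "string"),
    ("int", "string"): Decision(Outcome.TYPE_ERROR),
}

def decide_plus(left: str, right: str) -> Decision:
    # Argument pairs not listed fall into case (3): the type check
    # is postponed until run time instead of rejecting the query.
    return PLUS_TABLE.get((left, right), Decision(Outcome.RUNTIME_CHECK))
```

    The point of the table representation is that each non-standard coercion or relaxation is one more explicit row, rather than a change to a type inference algorithm.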

    Our semi-strong approach to typing is very different from everything that has happened in the domain so far. The approach is supported by an experimental implementation in the ODRA system, where we have shown that the idea is implementable, efficient and has many advantages over current proposals for database typing systems. Comparison with similar existing proposals is difficult due to the incompleteness and inconsistency (hence, non-implementability) of some proposals, the much higher functionality and power of SBQL in comparison to e.g. ODMG OQL and XQuery, a totally different formal background, the unknown implementation features of the majority of other proposals, etc. The black-box comparison convinced us that our approach offers designers of database languages much more flexible typing mechanisms that cover object-oriented, semi-structured and Web-oriented technologies.

    The rest of the paper is organized as follows. In Section 2 we present database schemata and metabases compiled from the schemata. Section 3 introduces the idea of type inference decision tables. Section 4 describes the type checking procedure. Section 5 concludes.

2 Database Schema and Metabase

    A type system for a database is necessary to express the information on the conceptual and logical structure of the data store. This information is commonly referred to as the database schema. Fig. 1 presents an example schema in a UML-like notation that can be easily transformed into e.g. an ODMG ODL-like specification. Although the query engine is basically independent of the database schema, we need the schema to reason statically about the type correctness of queries and programs. We also need it for other reasons, in particular, to enforce type constraints inside the object store, to resolve some binding ambiguities, to reason about ellipses, dereferences and coercions occurring in queries, and to optimize queries.

    A schema has two forms: external (as presented to programmers, e.g. Fig. 1) and internal (as used by the type inference engine). There is a lot of syntactic freedom concerning the external schema form; we consider this issue important but secondary. In this paper we are interested in the internal schema, which will be called a metabase. A metabase represents schema information in the form of well-structured data (a graph), allowing for querying and for modification. A metabase statically models and reflects the data store, and itself looks like a data store. This idea is similar to the DataGuides of Lore [Gold97]. Fig. 2 presents a metabase compiled from the schema depicted in Fig. 1.

    A node of a metabase graph represents some data entity (or entities) of the object store: object(s), attribute(s), link(s), procedure, class, method, view, and perhaps others. Each node is itself a complex object having a non-printable identifier (we call it a static identifier) which uniquely identifies the node. In this paper we assume the meta-language convention that the node describing entities named n has the identifier denoted i_n. The convention has no meaning for the semantics and implementation of the static typing mechanism.

    For simplification, in this paper we "stick" the representation of a complex object together with the representation of its direct class. This does not lead to ambiguity, because on the run-time environment stack the section with binders to a complex object always has a twin, neighbouring section with binders to the properties of its direct class. Such sticking of metabase nodes implies similar sticking of these two sections on the static environment stack, and thus simplifies the metabase and the implementation.

    [Diagram omitted: a UML-like class diagram with classes Person[0..*] (name: string; changeName(string)), Emp[0..*] (empno: int; sal[0..1]: int; job: string; changeSal(int)) and Dept[0..*] (name: string; loc[0..1]: string), connected by the associations works_in / employs[0..*] between Emp and Dept and manages[0..1] / boss between Dept and Emp.]

    Fig. 1 An example database schema

    As a complex object, a node of a metabase contains several attributes. In this paper we distinguish the following ones: name, kind (atomic, complex, link, procedure, method, class, etc.), type (the atomic type of the objects described by the node; this attribute is not present in nodes describing complex and link objects, because their types are composed from the types of other entities), and card (cardinality). Cardinalities will be written as i..j, where i is the minimal number of described objects and j is the maximal number of them. If the number of described objects is unlimited, j has the value *. Cardinalities 0..0 and 1..1 will be written 0 and 1, and 0..* will be written as *.

    There could be other attributes of a node, in particular mutability (perhaps subdivided into update-ability, insert-ability and delete-ability), a collection kind (bag, sequence, array, etc.), a type name (if one would like to assume type equivalence based on type names, as e.g. in Pascal), and perhaps others. For simplicity of presentation we omit them in this paper, but they can easily be incorporated into the type inference decision tables and then introduced in the implementation.
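    A metabase node with the attributes described above can be sketched as a small record. This is our own illustrative encoding (the attribute names follow the paper, the class and field layout are assumptions):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MetaNode:
    """One node of the metabase graph (cf. Fig. 2)."""
    name: str                  # external name of the described entities
    kind: str                  # 'object', 'complex', 'link', 'method', 'class', ...
    card: str                  # cardinality written as 'i..j', '0', '1' or '*'
    type: Optional[str] = None # atomic type; absent for complex and link nodes
    children: List["MetaNode"] = field(default_factory=list)  # ownership edges
    target: Optional["MetaNode"] = None                       # range of a link node

# The optional 'sal' attribute of Emp from the example schema:
sal = MetaNode(name="sal", kind="object", card="0..1", type="int")
```

    With such records, the three edge kinds discussed below become the `children` lists (ownership), the `target` fields (link ranges), and a further field for inheritance that we omit here.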

    Edges in the metabase are of three kinds: (1) ownership, which connects subobjects with their owner, (2) the range of a link object, and (3) inheritance.

    [Diagram omitted: a metabase graph with a node per schema entity. Class nodes: Person, Emp and Dept (kind: class, card: *). Attribute nodes (kind: object): name (type: string, card: 1), empno (type: int, card: 1), sal (type: int, card: 0..1), job (type: string, card: 1), loc (type: string, card: 0..1). Method nodes: changeName (kind: method, type: string ; void) and changeSal (kind: method, type: int ; void). Link nodes: works_in (card: 1), employs (card: 1..*), manages (card: 0..1) and boss (card: 1).]

    Fig. 2 An example metabase graph

    In Fig. 2 we distinguish the (1) and (2) edge kinds by the attribute kind. Note that, assuming some predefined (identical) name for all nodes, the metabase can easily be represented as an object store built according to the store model. In this way the metabase can be queried through SBQL.

    3 Type Inference Decision Tables

    The query engine processes values, which can also be object references. The purpose of introducing static stacks is precise simulation of the actual computation. The static stacks contain signatures, which are descriptions of the values used during the actual computation. The set of signatures S is defined recursively together with the set of signature components SC. These two sets are the smallest sets which satisfy the following conditions:

    1. Names of atomic types (e.g. int, float, string, date, etc.) belong to SC.

    2. All static identifiers of the metabase graph nodes (e.g. i_Emp and i_Dept) belong to SC. Identifiers of graph nodes represent types defined in the database schema. Static identifiers are signatures of references to store objects.

    3. If x belongs to S and n is an external name, then the pair n(x) belongs to SC. Such a pair will be called a static binder.

    4. If x_1, x_2, ..., x_n belong to SC, c is a cardinality, and o is an ordering tag, then (x_1, x_2, ..., x_n)[c]o belongs to S.

    The last point defines collections of structures, but it also encompasses individual items, which are just singleton structures. Although the set of cardinalities may be arbitrary, in our research we have limited it to {0..0, 0..1, 1..1, 0..*, 1..*}. Its elements have an obvious meaning. Collections are therefore treated in the same way as individual and optional items; it is just a matter of cardinality. The ordering tag is one of {o, u}, which stand for ordered and unordered respectively.

    All signatures are thus uniform. Each of them determines a content, a cardinality and an ordering. As can be seen, all three aspects are orthogonal. Such signatures can be further orthogonally augmented with mutability tags, store locations (in distributed environments), etc.
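    The definition above can be rendered as an orthogonal triple of content, cardinality and ordering. The following is a sketch under our own naming, not the paper's formal notation:

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Binder:
    """A static binder n(x): an external name paired with a component."""
    name: str
    value: "Component"

# A signature component is an atomic type name (e.g. 'int'), a static
# identifier of a metabase node (e.g. 'i_Emp'), or a static binder.
Component = Union[str, Binder]

@dataclass(frozen=True)
class Signature:
    components: Tuple[Component, ...]  # the structure (x_1, ..., x_n)
    card: Tuple[object, object]        # (lower, upper); upper may be '*'
    ordered: bool                      # tag 'o' if True, 'u' otherwise

# The start of the bag<struct(x: Students, ...)> type from the OQL example
# (the identifier name i_Students is our assumption):
sig = Signature((Binder("x", "i_Students"),), (0, "*"), ordered=False)
```

    Because the three aspects are separate fields, further orthogonal tags (mutability, store location) would simply be additional fields.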

    During the static type check we use rules to infer the type of complex expressions from the types of their subexpressions. These rules have the form of type inference decision tables. Such a table is established for each operator of the query/programming language. The decision tables for all the non-algebraic operators are generalized, i.e. each row of the table represents some collection (usually infinite) of real cases. Such generalized decision tables will be called meta-rules.

    For example, let us consider navigation (the dot operator) and the query q_L.q_R. The type of this query depends on the types of the queries q_L and q_R:

    The type of q_L            The type of q_R            The type of q_L.q_R
    (x_1, ..., x_n)[c]o        (y_1, ..., y_m)[d]p        (y_1, ..., y_m)[cd](op)

    In this table cd is the product of c and d, obtained by multiplying the lower and upper bounds of the cardinalities c and d: [e..f][g..h] = [e*g..f*h]. The meaning of the conjunction op of o and p is obvious: the resulting collection (even an empty or singleton one) is ordered if and only if both argument collections are ordered.

    In general, the development of type inference decision tables is a kind of art with no tips from theories. The art is based on a precise understanding of the language semantics and a sense of programmers' psychology. The devil is in the multitude of detailed decisions concerning particular rows of the tables.
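    The cardinality product used in this meta-rule can be written out directly. The sketch below is our own illustration, with the string "*" standing for an unbounded upper bound:

```python
def mul_bound(a, b):
    """Multiply two cardinality bounds, where '*' means unbounded."""
    if a == "*" or b == "*":
        return "*"
    return a * b

def card_product(c, d):
    """The product cd: [e..f][g..h] = [e*g..f*h]."""
    (e, f), (g, h) = c, d
    return (mul_bound(e, g), mul_bound(f, h))

def dot_rule(left_card, left_ordered, right_components, right_card, right_ordered):
    """Meta-rule for q_L.q_R: the result keeps the components of q_R,
    multiplies the cardinalities, and is ordered iff both arguments are."""
    return (right_components,
            card_product(left_card, right_card),
            left_ordered and right_ordered)

# Emp[0..*] navigated to sal[0..1] yields cardinality 0..* :
components, card, ordered = dot_rule((0, "*"), False, ("int",), (0, 1), False)
```

    Note that the rule never inspects the component structure of the left argument; the generalization to a meta-rule is exactly what lets one row cover infinitely many concrete cases.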

    4 Static Type Checking

    The general architecture of the type checker is presented in Fig. 3. Shaded shapes are program modules, while dashed lines surround data structures which are used and created by the modules. The query parser takes a query as a text supplied by a client and compiles it to produce an abstract syntax tree of the query. This syntax tree is analysed, decorated and sometimes modified by the type checker. If the type checking is successful (the query is correct), the query is executed by the query engine. The query engine operates on two stacks and on the data store.

    [Diagram omitted: a client supplies a query to the query parser, which produces the abstract syntax tree of the query. The type checker operates on this tree together with the static stacks S_ENVS and S_QRES and the metabase; the query engine operates on the tree together with the run-time stacks ENVS and QRES and the object store.]

    Fig. 3 The general architecture of the type checker

    Analogously, the type checker, which is to simulate the execution of the query, operates on corresponding static structures (the static environment stack S_ENVS, the static result stack S_QRES and the metabase). The type checker uses the information known during parsing and does not retrieve any information from the data store. The static stacks contain, in particular, signatures of objects from the data store. The type checker processes the signatures in exactly the same way as the query engine would later process the concrete objects from the data store, if they were not optimized. The procedure static_type_check is the heart of the type checker and operates on the syntactic tree of a query, both static stacks and the metabase. This procedure is an abstract implementation of our type checker. It performs the computation on signatures just as if they were actual data. During the signature computation, the procedure accomplishes the following actions:

    - Checks the type correctness of the syntactic tree of a query by simulating the execution of this query on the static stacks S_ENVS and S_QRES.

    - Generates messages on type errors.

    - Augments the syntactic tree in order to resolve ellipses.

    - Augments the syntactic tree with automatic dereferences and coercions.

    - Augments the syntactic tree with dynamic type checks, if the type correctness cannot be asserted statically. Such augmenting means that type checks are postponed until run time.

    - Possibly modifies the static stacks in order to restore the process after a type error has been encountered. These modifications are driven by rules which define the most probable result types in certain cases. They allow detecting more than one error during one type check pass.

    Before the function static_type_check is run, the stack S_ENVS must contain the static base section, which consists of the static binders to all static root identifiers (i.e. static identifiers of objects which are starting points for querying). For each static root identifier i_n defined in the schema, the base section of S_ENVS will contain the signature (n(i_n))[0..*]u. For the schema presented in Fig. 1 the base section of S_ENVS will consist of (Person(i_Person))[0..*]u, (Emp(i_Emp))[0..*]u and (Dept(i_Dept))[0..*]u.

    The structure of the procedure static_type_check is driven by the abstract syntax of queries. For each kind of query (literal, name, unary operator, algebraic operator, non-algebraic operator) it contains a section which describes how to check the type of a query built from that kind of operator.
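    This dispatch over query kinds can be sketched on a toy metabase. The following is a deliberately simplified illustration in Python under our own encoding (no cardinalities, no error restoration), not the ODRA code:

```python
# Toy metabase fragment: a static identifier maps to the members of the
# entity it describes (each member name maps to its signature).
METABASE = {
    "i_Emp": {"empno": "int", "sal": "int", "job": "string"},
}
# Base section of S_ENVS: static binders for the root identifiers.
BASE_SECTION = {"Emp": "i_Emp"}

def static_type_check(ast, s_envs, errors):
    """Return the inferred signature of `ast`, a tuple-encoded query:
    ('literal', type_name), ('name', n) or ('dot', q_L, q_R)."""
    kind = ast[0]
    if kind == "literal":
        return ast[1]                       # the literal's atomic type name
    if kind == "name":
        for section in reversed(s_envs):    # binding: search stack top-down
            if ast[1] in section:
                return section[ast[1]]
        errors.append("unknown name: " + ast[1])
        return "error"                      # keep checking to find more errors
    if kind == "dot":
        left = static_type_check(ast[1], s_envs, errors)
        # Open a new S_ENVS section with binders to the members of the
        # left result, check the right subquery there, then close it.
        s_envs.append(METABASE.get(left, {}))
        right = static_type_check(ast[2], s_envs, errors)
        s_envs.pop()
        return right

errors = []
sig = static_type_check(("dot", ("name", "Emp"), ("name", "sal")),
                        [dict(BASE_SECTION)], errors)
```

    Even in this reduced form, the key property of the approach is visible: the checker pushes and pops S_ENVS sections exactly where the query engine would at run time, so scoping behaves identically in both.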

    5 Conclusions

    We have proposed a new semi-strong approach to static type checking, assuming the practitioners' viewpoint. The many interrelated aspects of a strong type checking mechanism, the irregular, arbitrary choices that must be made during development, the dependencies on programmers' psychology, and other factors have caused our loss of belief that any academic type theory could bring essential support to the development of strong static type systems for object-oriented database query/programming languages.

    Our type system consists of a metabase, a static environment stack, a static result stack and type inference rules. The rules are represented as decision tables and are defined for all the operators occurring in the language. We have described an appropriate static type checking procedure and explained how this procedure can be used to correct certain type errors in queries and to recover the type checking process from a wrong state that may occur after a type error. Such restorations allow detecting more than one error during one type check pass. The procedure also makes it possible to resolve some ellipses, to accomplish some type coercions and to insert dynamic type checking actions into run-time query/program code.

    We have validated our semi-strong approach to typing in our experimental object-oriented database system ODRA, devoted to Web and grid applications, where we have shown that the approach is implementable and efficient.

    References

    [Alag97] S. Alagic. The ODMG Object Model: Does it Make Sense? Proc. OOPSLA
