Using Schematically Heterogeneous Structures
Renke J. Miller *
Department of Computer and Inforrnation Science
Ohio State University
Columbus, OH 43210
Tel: +l 614 292 7027
Fax: +l 614 292 2911
Abstract
Schematic heterogeneity arises when information that is rep-
resented as data under one schema, is represented within
the schema (as metadata) in another. Schematic hetero-
geneity is an important class of heterogeneity that arises
frequently in integrating legacy data in federated or data
warehousing applications. Traditional query languages and
view mechanisms are insufficient for reconciling and trans-
lating data between schematically heterogeneous schemes.
Higher order query languages, that permit quantification
over schema labels, have been proposed to permit querying
and restructuring of data between schematically disparate
schemes. We extend this work by considering how these
languages can be used in practice. Specifically, we consider
a restricted class of higher order views and show the power
of these views in integrating legacy structures. Our results
provide insights into the properties of restructuring trans-
formations required to resolve schematic discrepancies. In
addition, we show how the use of these views permits schema
browsing and new forms of data independence that are im-
portant for global information systems. Furthermore, these
views provide a framework for integrating semi-structured
and unstructured queries, such as keyword searches, into a
structured querying environment. We show how these views
can be used with minimal extensions to existing query en-
gines. We give conditions under which a higher order view is
usable for answering a query and provide query translation
algorithms.
1 Motivation
Two schemes are schematically
heterogeneous
if data un-
der one schema corresponds to database or schema labels in
the other. Schematic heterogeneity arises frequently since
names for schema constructs (labels within schemas) often
capture some intuitive semantic information. Some authors
argue that even within the relational model it is more the
*This research is being supported by a Presidential Early Career
Award for Scientists and Engineers (PECASE) under NSF Award
Number 9702974.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use i6 granted without fee provided that
copies are not made or distributed for profit or commercial advan-
tage and that copies bear this notice and the full citation on the first page.
To copy otherwise. to republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or I fe*.
SIGMOD ‘98 Seattle, WA, USA
Q 1998 ACM 0-89791.996-6/98/008...$6.00
rule than the exception to find data represented in schema
constructs [22]. Within semantic or object-based data mod-
els it is even more common [20]. For example, a stock
class
may have a set of subclasses, one for each company, where
the names of companies serve as labels for the subclasses.
Traditional query languages, including SQL and common
object languages are typically not sufficient for reconciling
schematic heterogeneity [22]. As a result, traditional query
languages have limited use in creating integrated views of
schematically disparate structures.
1.1
The Problem
We begin by considering a number of scenarios in which
schematic heterogeneity can occur. The first is a classical
problem in which a number of legacy sources, which differ
schematically, must be used in concert. We then consider
situations in which the source data is schematically consis-
tent, but the ability to introduce schematically disparate
views of the data can provide useful functionality. Our ex-
amples draw from the worlds of database publishing, data
warehousing, and techniques for providing physical data in-
dependence.
1.1.1
Legacy System Integration
Consider a company wishing to integrate a number of legacy
systems that manage or use information about stock prices.
Each of the legacy systems was developed independently to
meet the needs of different applications and may contain
information about overlapping subsets of stock data. The
company may now have a new application that requires
access to all of this data. Due to the use of different de-
sign methodologies or the individual perspectives of various
database designers, different decisions may have been made
as to what data is invariant and therefore suitable for inclu-
sion as part of a schema. An example of this, adapted from
[22], is shown in Figure 1 depicting three relational schemas
for stock information. All three schemas intuitively model
similar information about the prices of company stocks on
different dates. Schema Sl contains a single relational ta-
ble. A table entry (tuple) contains a company name, a date,
and the price of the company’s stock on that date. Schema
S2 contains a separate relational table for each company.
Company names serve as labels for the tables. Schema S3
contains a single table where for each date the price of a com-
pany’s stock is recorded in a separate attribute. In Schema
S3, the company names serve as labels for a set of attributes
each containing price information. Information that is cap-
189
tured by data in the first schema (as values in a specific
schema instance) is expressed within the schema itself (as
either names of tables or names of attributes) in the second
and third schemas.
! stock
date COA COB COC . . ;
I
I
Figure 1: Three stock schemas. Table names are in bold,
attribute names in italics.
Migrating all data to a common form would require that
existing applications be rewritten which, for even moder-
ately sized systems, is simply not feasible [6]. Neither is it
feasible to require all new applications to use the current
view(s) of stock data. Using Schemas S2 and S3, tradi-
tional query languages would not permit the expression of
queries that quantified over company names (such as “find
all companies whose stock price has ever gone over $lOO”.)
Hence, the schema design itself can limit the set of express-
able queries. These limits may be unexceptable for new
applications.
Instead, we would like to construct a view (according to
the modeling requirements of our new application) that can
nevertheless be used to access the unchanged legacy stock
data. The view must be data independent. That is, it should
not depend on the specific names or numbers of companies
and should not need to change as the data evolves.
Using SQL, a single data independent view cannot be
constructed. Since SQL does not permit quantification over
relation or attribute names, any SQL view would have to in-
clude company names as constants. The view vl of Figure 2,
which translates data from Schema 92 to sl, illustrates this.
If the set of company names changes, the view definition
would need to be altered. To permit the specification of data
independent views, a language that permits quantification
over schema labels is needed. Many higher order languages
that include some form of quantification over schema con-
structs have been proposed. These include multi-database
languages [27], logics [ll, 231, algebras [33, 171, and object-
create view VI (co, date, price) as
( select ‘coA’, date, price
create view v2 (co, date, price) as
from coA
select R, T.date, T.price
union
from s2->R, R T
select ‘COB’, date, price
from COB
union
select ‘coC’, date, price
from coC
. . .
)
create view v3 (co, date, price) as
select A, T.date, T.A
from s3::stock->A, S3::stock T
where A # ‘date’
Figure 2: Example SQL and SchemaSQL views.
hotel
hid name chain city
country class
. . . . .
resort
hid vacationpkg
confctr
hid
numMtgRms confCapacity
hotelpricing
hid single double suite ...
Figure 3: Hotel Database.
oriented languages [22, 4, 201 to name just a few. Views v2
and v3 of Figure 2 are examples of higher order views con-
structed using SchemaSQL [24], a higher order extension of
SQL that permits quantification over database, relation and
attribute names. In the view v2, the notation 52 +
R
de-
clares
R
to be a relation variable ranging over all relations in
92. The variable
T
is a tuple variable declared to range over
the relations of
R.
This view translates data from Schema
32 to Schema sl and is independent of the specific relations
present in 92. In the view v3, the declaration 93 ::
stock + A
declares the variable
A
to be an attribute variable ranging
over all attribute names in the relation stock of Schema 93.
The variable
T
is again a tuple variable, here declared to
range over the tuples of the stock relation. View v3 trans-
lates data from s3 to sl. This example motivates the need to
be able to reconcile heterogeneous representations dynam-
ically, a task that can be accomplished using higher order
views.
1.1.2 Database Publishing
Higher order reasoning has numerous applications beyond
reconciliation of differing design choices. Consider the hotel
information of Figure 3 which is adapted from the DataWeb
system [31]. DataWeb permits the publishing of structured
databases providing sophisticated browsing and navigation
techniques to enable naive users to locate useful information
in a large, complex data collection. In this example, the at-
tribute hid in the relations is a foreign key to
hoteLhid.
The
tables
resort
and
confctr
represent subclasses of the hotel ta-
ble containing attributes specific to each type of hotel. The
information common to all hotels is vertically partitioned
between the
hotel
table and the
hotelpricing
table. The lat-
ter contains a number of attributes related to the pricing of
different room types.
Consider the construction of a database publishing appli-
cation, such as DataWeb, that makes this information avail-
able on the Web. The schema is rather complex so a useful
interface would necessarily permit some form of schema in-
dependent querying. For example, a keyword search inter-
face would permit users to find hotels in the “Sofitel” fran-
chise without knowing which table or attribute contained
this information (name, chain, branch, owner, etc.) A forms
interface might permit a user to request hotels with rooms
for under $70 without requiring the user to indicate (or un-
derstand) specific price related attributes over which this
predicate should be applied. Consider how either of these
queries would be specified. The keyword search is essentially
the query
select the
hid of tuples in any relation
where some
190
attribute
value contains ‘Sofitel”. The inexpensive query is
select
hid
from hotelpricing
where
any attribute value is less
than 70. Both of these queries are higher order. Although
the information in the hotel database does not contain any
schematic discrepancies, the ability to answer queries over
schema components could be used to support the schema
independent querying required in this environment.
Suppose the information of Figure 3 has been collected
in a data warehouse and consider a decision analysis appli-
cation on this information. Specifically, assume the applica-
tion will need to aggregate hotel information over different
dimensions presenting a tabular (or data cube style [14])
summary to a user. An example query would be to find the
number of hotels in each country of each class (including
subtotals for all classes and all countries). Using this infor-
mation, a user can “drill down” to specific data of interest
by refining the dimensions over which aggregation is done
(for example, viewing information aggregated by country,
then by city). The concept of data dimensions is central to
decision analysis applications. They provide the axes over
which data can be “sliced and diced”. In the first genera-
tion of decision analysis systems, the dimensions were fixed
[12]. Many recent systems permit the dynamic creation (of
a potentially large numbers) of new dimensions [31]. In our
hotel example, this means the addition of new hierarchies
over which hotels can be aggregated. Dimensions may be
represented by attributes or sets of attributes. An exten-
sible analysis system requires the ability to reason over (a
time varying) set of new dimensions. Specifically, it must be
possible to permit the user to specify predicates that can be
applied to any subset of dimensions.
1.1.3 Physical Data Independence
Indexing architectures often use views to describe index struc-
tures in order to permit the integration of new indexes (or
access methods) into a query optimizer [8, 371. Tsatalos et
al [37] have developed a robust indexing architecture that
makes use of restricted views containing only selection, pro-
jection and join operations. The views are used to describe
a wide array of advanced indexing techniques. However,
they point out that their techniques cannot be used to de-
scribe indexing over subclasses of a class. This is due to
the lack of support for unions and for schema independent
views. Consider the class
tickets
that contains information
about traffic violations issued by the highway patrol which
is adapted from [l]. The class
tickets
contains the attributes
tnum, lit, infr describing the ticket number, license number
and infraction code respectively. Information kept by each
jurisdiction (a state or region within a state) is kept in a
subclass whose name is the jurisdiction’s name. Each ju-
risdiction maintains information about tickets issued in its
boundaries and in addition attempts (although perhaps not
consistently) to maintain information about all tickets given
to people holding licenses issued within its region. The def-
inition of a Bf-tree index keyed on the ticket infraction is
given in Figure 4 using the notation from [37]. The index
is defined over all jurisdictions. In [37], indices must be de-
scribed using SQL views and as a result, the proposal cannot
express or use indices over data dependent collections of ta-
bles (or classes). For example, indices over all subclasses of a
class cannot be express [37]. Using higher order views, such
indices can be expressed. However, to make use of these new
indices in query processing, it must be possible to determine
whether a view (index) can be used in answering a query.
Furthermore, it must be possible to determine what portion
create index ticketlnfr as btree by
create view dui(lic, infr) as
given T. infr select Tl.lic, T2.infr
select R, T.tnum, T.lic ,from ->R, R Tl, R T2
,from ->R, R T
where Tl.lic = T2.lic
and Tl.infr = ‘dui’
and Tl.tnum <> T2.tnum
Figure 4: Traffic ticket database.
of a query the view (index) answers.
Using higher order views, a wide class of useful new in-
dexing structures can be expressed. For example, consider
the view
dui
of Figure 4 which can be used to describe an
index helpful in evaluating data fusion [l] style queries. A
data fusion query involves the self-join of a union of tables.
The view dui finds all infractions issued to anyone with one
or more ‘DUI’ infraction (driving under the influence of al-
cohol). This view may describe an index that materializes
this specific fusion query.
Similarly, schematically disparate views of the data can
be used to model an inverted index that might be used
to support the evaluation of keyword search queries over
a structured database. By incorporating such views into an
architecture for supporting physical data independence, the
query optimizer can reason about the use of these indexing
structures and their combined used with traditional access
methods.
1.2 Paper Overview
In this paper, we develop techniques such as those required
in the above examples for understanding and managing sche-
matic heterogeneity. Our examples point to several different
scenarios in which some form of schematic reasoning is re-
quired. Indeed, the examples make use of higher order views,
higher order queries and higher order indexes (schema inde-
pendent indexes). On the surface, these examples seem to
necessitate full support for a higher order query language
and view definition facility capable of describing logical and
physical (index) structures. Specifically, the examples point
to the need for a language that is powerful enough to recon-
cile schematic heterogeneity and to describe heterogeneous
data representations. Additionally, they point to the need
for techniques to automatically translate queries against the
view into queries against the legacy schema(s) and to opti-
mize these modified queries.
Such extensions to a complex query processing engine
can be prohibitively expensive. We therefore take a prag-
matic approach. We propose a solution in which all sche-
matic heterogeneity is reconciled by a limited form of higher
order views called dynamic
views.
In Section 2, we overview the contributions of this work
and place it in the context of related work on schematic het-
erogeneity. In Section 3, we overview the proposed solution.
While our solution does not provide full support for higher
order reasoning, we show that it is still powerful enough
to meet the needs of the diverse applications we have just
outlined. In Section 4, we discuss the restructuring prop-
191
erties of dynamic views. In Section 5, we analyze when a
dynamic view can be used to answer an SQL query and
provide query translation algorithms. Given our desire to
support data warehousing and database publishing applica-
tions, we include a discussion of aggregate queries and views
in the analysis since such queries are important in these en-
vironments. In Section 6, we consider how dynamic views
(whether used to describe legacy sources or indexes) can be
used in a conventional query optimizer.
2 Related Work
The prevalence of schematic heterogeneity is well document
in the heterogeneous database research literature. The prob-
lem is discussed informally in early schema integration work
with the goal of defining transformations between schemati-
cally disparate schemas [13] and in other work with the goal
of enumerating and classifying the types of schematic dif-
ferences that can arise 121, 181. Builders of federated and
warehousing systems frequently encounter schematic hetero-
geneity in practice. See the work on Garlic [7], TSIMMIS
[lo], and Pegasus [2] to name just a few. Work on man-
aging and using schematically disparate structures has fo-
cussed on languages for querying such structures. MSQL
provides basic features for querying over schema labels [27].
This and other higher-order query languages can be used in
multi-database architectures where integration is not done a
priori through an integrated view, but rather on demand in
each query. More recently work on languages to query struc-
tured and semi-structured sources also provides higher-order
capabilities [32].
Other proposals provide data model extensions to permit
type extents (meta-extents) to range over unspecified col-
lections [36]. New extents can be added dynamically with-
out changing the schema (type) specification. Meta-extents
are similar in motivation to the dynamic views we propose.
However, dynamic views are more amenable to use in appli-
cations beyond integration, including those discussed in the
introduction. Several proposals have consider the definition
of higher order views for defining integrated views (22, 241.
We build on this work to consider how these views can be
used in practice with minimal extensions to query processing
engines.
3 Proposed Architecture
To present our solution, we will use SchemaSQL, a higher
order extension of SQL [24]. We briefly overview the rel-
evant features of SchemaSQL, defining the concept of dy-
namic views and
first
order views that will be central to our
proposed solution. We then present our proposed architec-
ture and show that it meets the needs of the applications
outlined in Section 1.
3.1 SchemaSQL
SQL makes use of tuple variables that range over the tuples
of a relation declared in the from clause of a query. Do-
main variables range over the values in a single attribute
of a relation (for example, stock.date). SchemaSQL per-
mits three additional types of variables: database, relation
and attribute variables. The expression + D declares D
to be a database variable ranging over all database names
in a federation. The expression db -+ R declares R to be
a relation variable ranging over all relations in db (which
can be a constant or a database variable). The expression
db :: rel -+ A, declares A to be an attribute variable rang-
ing over all attribute names in the relation rel of database
db. For clarity in our examples, we will use capital letters for
variables (usually starting with D, R, A and 2’ for database,
relation, attribute and tuple variables respectively). Lower
case strings will be used for specific database, attribute and
relation names to help clarify the difference between vari-
ables and constants.
Attribute, relation, and database variables are collec-
tively referred to as schema variables. Schema variables can
appear in the query (select, from, group by, having, etc.,
clauses) any where a domain variable can traditionally be
used. In addition, SchemaSQL permits the specification of
views that have data dependent output schemas. That is,
the schema is defined by the query result. In SQL, the out-
put schema is specified by labels (constants) in the create
view clause. Specifically, one label is used for the relation
name and one for each of the attribute names. We ignore
shorthand notation permitted by SQL. In SchemaSQL, any
of these labels may be replaced by variables. Examples are
given in Figure 5. Views v4 and v5 translate data from
Schema sl of Figure 1 to Schema-s s2 and s3, respectively.
For clarity, we will explicitly define most domain variables
in the from clause unless no confusion can arise.
In the
first view, data is grouped into different relations based on
the value of the domain variable C. As this example shows,
using variables for relation or database names, we can ex-
press data dependent unions (horizontal partitioning of a
relation). In the second view, data is grouped into different
tuples based on C and date. For a given date, s3 groups
all stock prices for that date into a single tuple. If a com-
pany has multiple prices listed for the same date, then s3
will contain multiple tuples. Using variables for attribute
names, we can express data dependent outer-joins (vertical
partitioning of a relation). In this example, suppose sl con-
tains information about only two companies, coA and COB.
Then v5 can be expressed as the following relational query,
where @I denotes the full outer-join. The full semantics of
these views is given elsewhere [24].
A=
~company=‘coA’~toC~
B = ucompany=~co~~ stock
n
date,price(A) @date ndate,price@)
If an instance of stock contains three price values for
coA on l/1/98 and two price values for COB on l/1/98, then
view v5 will contain the cross product of these values, or six
tuples with the date l/1/98.
In what follows, we will consider a restricted class of
views with data dependent output schemas.
Definition 3.1 A dynamic view is any SchemaSQL view
that possesses a data dependent output schema defined by a
query that uses only tuple and domain variables.
In Figure 5, Views v4 and v5 are dynamic while v6 is not
since it uses an attribute variable A. As we will show the
restructuring properties of dynamic views are very powerful.
3.2 Architecture Definition
The requirements outlined in Section 1 necessitate the use
of an integration that is independent of specific source struc-
tures. Under traditional integration paradigms [5], the inte-
grated view depends directly on the source schemas. So we
adopt the approach of [3,26], in which the integrated schema
192
v4 create view s2::C(date, price) as
select D, P
from sl::stock T, T.company C,
T.date D, T.price P
v.5
create view s3::stock (date, C) as
select D, P
from sl::stock T, T.company C,
T.date D, T.price P
v6 create view A::avg(date, avgprice) as
select D, avg(P)
from s3::stock T, s2::stock-> A, T.A P, T.date D
where A # ‘date’
group by A, D
Figure 5: Example SchemaSQL views with data dependent
schemas.
is treated as a stable, central structure through which other
evolving, dynamic data sources can be queried. That is, un-
like traditional integration approaches, the integrated view
is not synthesized from a snapshot of a set of local schemas.
Rather, the integrated view is created to meet the require-
ments of the integration (for example, the decision analysis
functions to be supported by a data warehouse). The local
databases are treated as materialized views over the inte-
grated schema. A query on the integration can be answered
by rewriting it into a query on the local databases, using
perhaps additional data that is stored directly under the
integrated schema.
This choice of architecture is particularly appropriate for
reconciling schematic discrepancies. Using a traditional syn-
thesis approach the design choices of the current schemas
(and the integration algorithm) will determine what data is
“queriable” in the integration. In particular, data that is
used as labels in all local sources will typically remain as
part of the schema. Our architecture however, allows ex-
plicit choice as to how data and meta-data is divided. This
choice can be made based on the requirements of the inte-
gration, rather than on legacy requirements they may not
be valid for new applications. The architecture also permits
data sources to evolve independent of the integration.
Using this approach, we restrict the integration schema
and views as follows.
1. The integration schema I must be first order.
We will use the term first order schema (or view) for a
schema under which all values of interest to a user are
modeled as data. Hence, a schema is only first order
relative to a given set of queries. For a set of queries
&, a schema is first order if all queries in & can be
written in a first order language such as SQL. This
concept is closely related to first order normal form
proposed in [28]. Schema s2 of Figure 1 may be first
order for an application that does statistical analysis
on individual company stocks and does not require the
ability to issue queries that iterate over all companies.
While our solution will enable schema independent
querying, such as keyword searching, it does not neces-
2.
Materialize
Views
Figure 6: Integration Architecture.
sitate that all schema components be made queriable.
That is, the design of the integrated schema may limit
which schema components from the sources may be
queried.
All data sources are expressed as SQL or dynamic
views on
I.
Each local schema (or index) is described as a set of
views on I. Let local schema S = {Ti, . . . .
T,,}.
Each
Tj
is a view on
I,
that is,
Tj = vj(I)
where vj is an
SQL or dynamic view. This architecture leaves open
the possibility that a local schema may contain data
(in other tables) that is not used in the integration.
The integration may also contain locally stored data,
not derived from one of the existing data sources.
This architecture is depicted graphically in Figure 6.
Note that a single dynamic view may define a set of tables.
Although we refer to I as an integration schema, it may in
fact be a logical view on a homogeneous source. The view
may be introduced to provide schema browsing functionality
or new forms of physical data independence.
Within this framework, the dynamic view provides a nat-
ural barrier for encapsulating higher order reasoning. As we
show, a query engine employing these views must be able to
recognize when a view is usable in answering a query, but
does not need full knowledge of the meaning of the higher
order components of the view.
3.3 Applications
Despite its limitations, this restricted architecture can be
used to address the requirements of the examples from Sec-
tion 1. Recall the requirements these examples illustrated.
1. Cooperative querying of schematically heterogeneous
structures.
2. Schema independent queries (and schema browsing).
3. Schema independent indexing.
We briefly revisit the examples of Section 1 and show
how this architecture supports each application.
193
hprice
hid rmtype
price
create view hotelpricing(hid, R) as
select H, P
from hprice T, T.hid H, T.rmtype R, T.price P
Q
select H
from hprice T, T.price P, T.hid H
where P < 70
Figure 7: Schema independent querying of hotelpricing.
Legacy System Integration
The Schema sl can be used as
the Integration I. All three schemas can then be represented
as dynamic views on I. The mapping from I to sl is the
identity map, and views v4 and v5 of Figure 5 provide the
definitions for s2 and 93. Queries posed on the integration
must be translated to queries on the legacy schemas (mate-
rialized views). To do this, an algorithm for translating any
SQL query into an SQL or SchemaSQL query on the un-
derlying views is required. This algorithm will permit the
answering of queries on the integration using data stored
under the legacy sources.
Database Publishing
Consider a query to retrieve inexpen-
sive hotels where an inexpensive hotel is considered to be
one that offers rooms for under $70. The rooms may be
of any type (single, double, suite, etc.) and the price may
be valid any time (weekends, weekdays, off-season, etc.).
On the schema of Figure 3, this query is select hid from
hotelpricing where any attribute value is less than
$70. To support this query, our architecture requires the use
of an interface schema, like hprice of Figure 7. This schema
represents the names of pricing attributes as data. Such
a schema permits the expression of this query in SQL as
shown in Query Q of Figure 7. The original hotelpricing
table can be expressed as a dynamic view on this schema.
To be evaluated, SQL queries on hprice must translated to
SQL or SchemaSQL queries on the
hotelpricing
view.
Physical Data Independence
Using the schema of Figure
8, the index and view of Figure 4 can be expressed as dy-
namic views. Recall that these structures could be used in
evaluating data fusion style queries involving self-joins over
large data-dependent unions of tables. The index and view
axe
SQL views on the
tickets
tables. The original tables can
be specified as a dynamic view on tickets (View V of Figure
8). If an SQL query on tickets can be answered entirely by
using either the
ticketInfr
index or the dui view, then it
can be translated into an SQL query on these structures.
However, since these views do not fully replicate the source
data, some queries may only be answered using the legacy
sources (for example, a query to find all tickets issued to a
specified license number). These queries can be translated
into SQL or SchemaSQL queries on the legacy sources using
View V.
In addition, dynamic queries can be used to model in-
verted indices that are useful in answering keyword search
queries. An inverted index maps a keyword, for example
“Sofitel”, to all tuples (or more generally documents) con-
taining the keyword. In Figure 9, the table
hotelwords
con-
tickets
state tnum lit infr
create view dui(lic, infr) as
create index ticketlnfr as btree by
select Tl.lic, T2.infr
given T.infr
from tickets TI, tickets T2
select T.state, T.tnum, T.lic
where Tl.lic = T2.lic and
from tickets T
Tl.infr = ‘dui’ and
Tl.tnum <> T2.tnum
V create view S(tnum, lit. infr) as
select T.tnum, T.lic, T.infr
from tickets T, T.state S
Figure 8: Ticket database.
tains for each hotel and for each attribute of a hotel, a tuple
for each word or attribute value. The index
keywords
is an
inverted index that given a specific keyword value returns a
set of all hotel ids and corresponding attribute names that
contain the keyword. Because the schema of
hotelwords
is
first order, the inverted index is actually specified using an
SQL view. However, it may be implemented as an actual
inverted index on the materialize data in the table
hotel
of
Figure 3.
Now consider a query to find all Sofitel hotels located in
Athens. This query contains a structured predicate
(city =
“Athens”) and an unstructured predicate (3 attribute
A A
A
= “Sofitel”). Specified on the
hotelwords
schema, this
query is expressed in Figure 9. Note that the
keywords
index can be used to (partially) answer both predicates (and
hence can answer the full query). The city predicate is only
partially evaluated by the index so an additional selection
(attribute
= “city”) must be done. Similarly, the
hotel
view
can be used to answer both predicates albeit only using a
higher order query. In query planning, we would want to
consider plans involving both views and to select a plan
that most efficiently answers the query. Intuitively at least,
this query can be more efficiently evaluated if the query
processing engine recognizes that the structured predicate
can be most efficiently answered by the
hotel
view and the
unstructyred predicate by the index and the results joined.
We have demonstrated how dynamic views, while limited
by design, meet the requirements of a wide range of new, im-
portant applications. Furthermore, they provide a mecha-
nism for integrating semi-structured or unstructured search
style queries into the query planning of a traditional struc-
tured query optimizer. To do this, a schematically disparate
view of (a possible schematically homogeneous) schema is in-
troduced. This discussion has highlighted the remaining re-
sults required to use dynamic views in practice. Specifically,
using this framework, the problem of answering queries over
an integration I reduces to the problem of determining if a
view or set of views
V
are usable in answering a query Q on
I
and finding an equivalent rewriting Q’ on
V.
4 Properties of Dynamic Views
In this section, we consider the problem of using dynamic
views to answer SQL queries. The problem of using SQL
views to answer SQL queries has been studied extensively.
194
hotelwords hid attribute value
create index keywords as inverted by
given value
select T.hid, T.attribute
from hotel
create view hotel (hid, A) as
select T.hid, T.value
from hotelwords T,
hotelwords.attribute A
Q
select hid
from hotelwords TI, hotelwords T2
where Tl.hid=T2.hid and Tl.value = ‘Sofitel’
and T2.attribute = ‘city’ and 72.value = ‘Athens’
Figure 9: Keyword search query over hotels.
We make use of results by Srivastava et al [35] in which views
and queries with aggregation and multi-set (bag) semantics
are considered. These results generalize earlier work on con-
junctive queries with set and multi-set semantics [25, 91. We
begin with a few definitions.
Definition 4.1 Queries Ql
and
Q2
are
set equivalent 2f
they compute the same set of answers
for
any database.
Queries Ql and Q2 are multi-set equivalent if they com-
pute the same multi-set of answers
for
any database.
Definition 4.2 1351
A
view
V
is set usable
(respectively,
multi-set usable) in Q if there exists a query Q’ that is set
equivalent (respectively, multi-set equivalent) to Q and Q’
contains one
OT
more occurrences of V.
4.1 Restructuring using SQL
To determining whether a dynamic view can be used to an-
swer a query, it is necessary to understand the restructuring
properties of dynamic views. The restructuring permitted
by SQL can be broadly classified into two types. Reorgani-
zation and elimination of columns which is done using the
join and project operators. Reorganization, summarization
and elimination of tuples which is done
using
the select, join,
grouping and aggregation operators.
For a view to be usable in answering a query, it must
contain all the columns and all the tuples required by the
query. The view must also not lose any associations between
data values. Put another way, the view must not combine
tuples that would not be combined in the original. In SQL,
this last condition is checked by ensuring that the join con-
ditions of the view match those of the query. However, for
dynamic views some additional checking must be done. This
is because the higher order functionality of the views per-
mits a greater array of data restructuring than is possible
with SQL.
i db0 -- Integration
1
: stock
I
exchange
company
date
price
1
I
0
i dbl
‘--------‘--;-‘-------------
; db2
lbm ge . . . I
r I I I I
F [date 1 price1 price1 pric{ i
I_________________-_--------
Figure 10: Example views on a stock integration.
4.2 Restructuring using Relation and Database Variables
Relation and database variables permit the horizontal par-
titioning of data baaed on the value of an attribute. The re-
structuring properties provided by database variables is very
similar to those of relation variables so in this discussion we
only consider relation variables. For a table
r[m,al, . . . . a,],
the following is a simple dynamic view using a relation var-
able.
create view Ao(al, a~, .., a,) as
select T.al, T.ag, . . . . T.a,
from
T
T, T.ao
AO
The query defining the view is an information capacity
preserving mapping [30]. That is, for each instance of
T
there
is a unique instance of the view. Intuitively, this means that
the restructuring done to create the view does not lose any
information [15]. We can use this intuition when consider-
ing whether a query can be answered using a dynamic view
with a relation variable. The query defining the view may
(and probably will) lose information, but the restructuring
provided by the relation variable will not augment this loss.
Example 4.1
Consider Figure 11 containing the
definition
for
the company tables of
dbl and a
Query Ql on the Schema db0 of Figure 10. The
Query Ql finds companies that closed over 200
on
two
consecutive days since l/1/98. Suppose
we want to use dbl to answer Ql. Ql’ is a rewrit-
ing of Ql using dbl. We can use the formal
prop-
erties of the view definition to show that Ql and
Ql’ are equivalent.
4.3 Restructuring using Attribute Variables
Attribute variables permit (data dependent) vertical parti-
tioning of data based on the value of an attribute. For a
table r[ao, al, . . . . a,], the following is a simple dynamic view
using an attribute variable.
195
create view dbl::C(date, price) as
select D, P
from DbO::stock T, T.company C
T.date D, T.price P
where D > l/1/90
Ql:
select Cl
from dbO::stock TI, dbO::stock T2
Tl.company Cl, lL?company C2
where Tl.date > l/1/98 and
Tl.date = T2.a’ate + 1 and
Tl.price > 200 and T2.price > 200
and Cl = C2
Ql':
select RI
from dbl-> RI, dbl->R2 Rl TI, R2 T2
Tl.company Cl, T2.company C2
where Tl.date > l/U98 and
Tl.date = T2.date + 1 and
Tl.price > 200 and T2.price > 200
and Cl = C2
Figure 11: Dynamic view with relation variable.
r(I1) a0 al a2 r (12)
a0
al
a2
mi WI
r’(1) pT$-Tg]
Figure 12: Two instance of r map to the same instance of
the view
T'.
create view r’(aa, .., a,, Ao) as
select T.az, . . . . T.a,, T.al
from
T
T, T.ao Ao
The query defining the view is not an information ca-
pacity preserving mapping. To see this, consider the two
instances of r (where n = 2) depicted in Figure 12. The two
instances Zl and 12 of T map to the same instance Z of r’.
Hence, any query on T that would return different results for
the instances Zl and 12 does not have an equivalent query
on the view
T'.
This is not to say that
T’
cannot be used to
answer some queries on T.
Example 4.2
Consider Figure 13 containing the
definition
for
the view db2 :: nyse and a Query
Q2 on the Schema db0 of Figure 10. Suppose we
want to use the View nyse to answer Q2. Q2’ is
a possible rewriting of Q2 using the view. On the
Database Zl of Figure 14, Q2 and Q2’ do not re-
create view db2::nyse(date, C) as
select D, P
from DbO::stock T, T.exch E,
T.company C,
T.date D, T.price P
where E = ‘nyse’
Q2: select Cl, DI, PI
from dbO::stock TI, Tl.date Dl,
Tl.company Cl, Tl.price PI, Tl.exch El
dbO::coType T2, T2.co C2, T2.type Yl
where El = ‘nyse’ and Cl = C2 and YI = ‘hitech’
Q2’: select Cl, Dl, PI
from dbl::nyse Tl, Tl.date Dl, dbl::nyse->Cl
Tl.date Cl, Tl.Cl PI
dbO::coType T2, T2.co C2, T2.type Yl
where Cl = C2 and YI = ‘hitech’
Figure 13: Dynamic view with an attribute variable.
Figure 14: Two different instances of db0 map to the same
instance of the view.
turn the same answer and these queries
are
there-
fore not equivalent. Specifically, assuming both
ge and hp are hitech stocks,
Query Q2
issued on
the Database Zl of Figure 14 will return Zl with
the exchange attribute projected out. Query 92’
on the same database will return four tuples, Zl
plus a second copy of the ge tuple. Intuitively, the
view loses information about multiplicities. To
understand if other rewritings are possible using
the view, consider the two databases Zl and 12 in
Figure 14. Q2 returns different results in Zl and
12. The view represents both of these databases
with the database Jl, losing information about
the number of copies of a tuple present in the
original database.
5 Query Equivalence
To determine the equivalence of queries, we will use an
ex-
tension of the standard proof technique of establishing con-
tainment mappings from V to Q. In [35], the mappings are
expressed from columns of V to columns of Q. This is
pas-
196
v2 create view sl::stock (co, date, price) as
select R, D, P
from s2->R, R T, T.date D, T.price P
v3 create view sl::stock (co, date, price) as
select A, D, P
from s3::stock->A, S3::stock T, T.date D, T.A P
where A c> ‘date’
Figure 15: Example dynamic views.
sible since in SQL, the columns (domain variables) contain
all data that will appear in the query result. We extend this
mapping to all variables in a dynamic view. In the following,
Q is an SQL query on the schema I.
Let TV(Q) denote the set of tuple variables in Q. For
each T; E TV(Q), Table(Ti) is a relation in I. Tables(Q) =
{Table(T; E TV(Q)} is a multi-set of tables used in Q.
Let DV(Q) denote the set of domain variables in Q. For
each Di E DV(Q), Di is declared to range over some at-
tribute domain in the set {Th.Aj(Tk E TV(Q) and Aj an
attribute of Table(Tk)}.
To make our notation simpler, we will assume that all
SQL and dynamic queries explicitly declare all tuple and
domain variables. That is, relation names are not used as
shorthands for tuple variables and relation names (or tuple
variables) prefixed with an attribute names are not used
as shorthands for domain variables. We give two example
dynamic views, that follow these conventions in Figure 15.
These views are equivalent to the views of Figure 2.
The variables of Q are then Var(Q) = DV(Q) UTV(Q).
Let Sel(Q) E Var(Q) be the set of variables appearing
in the select clause of Q. The predicates used in the where
clause are denoted by Cond(Q).
For a view V with a dynamic schema, the schema compo-
nents include data that must be mapped in considering vari-
able mappings. Let Db(V) be the database specified in the
view definition and Rel(V) be the name of the view table.
In a dynamic schema, Db(V) and Rel(V) may be variables
or constants. Att(V) is the set of names of attributes in the
view table. Each A E Att(V) may be a variable or constant.
For each A E AH(V), Dam(A) denotes the variable from
the select clause of the view that provides domain values for
A. The variables in Db(V) UReZ(V) UAtt(V) axe referred to
as view variables. The output variables of a view, Out(V),
are the view variables and Sel(V).
Definition 5.1 (Variable mapping)
A variable mapping
from a query Ql to a query 9.2 is a mapping 4 from Var(QI)
to Var(Q2) such that if R(A1, . . . . An) E Tables(Q1) with tu-
pie variable Tl, there exists a table R(B1, . . . . B,) E Tables(Q2)
with tuple variable T2 and Bi = +(Ai), 1 < i 5 n, T2 =
do.
The mapping 4 is l-l if every variable of Ql maps to a
unique variable of Q2. To simplify the presentation, we as-
sume the integration does not conatin nul values. However,
this restriction is not required [29].
5.1 Select-Project-Join Queries
We first consider queries involving only select, project and
join operations (select, from and where clauses without ag-
gregate functions in SQL), or SPJ queries. The queries may
have built-in predicates (5, >) used in the where clause of
the query. In considering whether a view V can be used by
Q, one must ensure the following conditions [35].
l
V does not project out any columns needed in Q
l
V does not discard any tuples needed by Q
5.1.1 Set Semantics
We first state the conditions for an SPJ view to be set usable
in answering an SPJ query.
Theorem 5.1 [25]
Let Q be an SPJ SQL query and let V
be an SPJ SQL view. Then V is set usable in Q if:
1.
2.
There exist a variable mapping 4 from V to Q.
FOT all A E Sel(Q) such that A E $(Var(V)), either
$-‘(A) E Sel(V)
or
there exist a B E Sel(V) such
dhat Conds(Q) implies A = d(B).
3.
There
exist a boolean combination of built-in predi-
cates, Conds’ such that
(a) Conds(Q) E +(Conds(V)) A Conds’
(b) Conds’ uses only the variables in q5(Sel(V)) U
(Var(&) - d(Var(V))).
If V is a dynamic view, then schema components in the
view may contain data used for the query. Hence, data
returned by Q may originate either from data returned by V
(and contained in the attributes of Sel(V)) or from schema
components in V (and contained in the schema variables of
V). For dynamic views, the following conditions must be
ensured.
l
V does not discard any databases, relations, or at-
tributes needed by Q.
l
V does not lose any association between values needed
in Q.
Theorem 5.2
Let Q be an SQL SPJ query and let V be a
dynamic SPJ view. Then V is set usable in Q if:
1. There exist a variable mapping 4 from V to Q.
2. FOT all A E Sel(Q) such that A E 4(Var(V)), either
4-‘(A) E Out(V)
or
there exist a
B E Out(V)
such
that Conds(Q) implies A = 4(B).
3. There exist a boolean combination of built-in predi-
cates, Conds’ such that
(a) Conds(Q) E 4(Conds(V)) A Conds’
(b) Conds’ uses only the variables in +(Out(V)) U
(Var(&) - 4(Var(v))).
Example 5.1
In Example 4.2, we can use the
following mapping 4 to show equivalence. d(T) =
Tl, 4(E) = El, 4(D) = Dl, 4(C) = Cl, d(P) =
Pl. Conds’ = (Cl = C2 A Y 1 =’ hitech’).
197
5.1.2 Multi-Set Semantics
Under set semantics, the mapping may be many to one. In
such a mapping, all associations between relevant tuples are
preserved. However, multiplicities of tuples are not neces-
sarily preserved. This observation motivates the following
result from [35].
Theorem 5.3 [35]
Let Q be an SPJ query and let V be an
SPJ view. Then V is multi-set usable in Q if the conditions
of Theorem 5.1 hold and 4 is one-to-one.
However, the same result does not carry over as directly
to dynamic views. As we saw in Example 4.2, the use of
attribute variables in a dynamic view can cause multiplic-
ity information to be incorrect. Even if a query uses only
attributes that are not dynamic, or relation and database
names, the information needed to correctly answer a query
may be lost in the view.
Theorem 5.4
Let Q be an SPJ query and let V be a dy-
namic SPJ view. Then V is multi-set usable in Q if the
conditions
of
Theorem 5.2 hold, V does not contain any at-
tribute variables and I$ is one-to-one.
5.1.3 Query Translation
The query translation algorithm replaces all tables covered
by the view with the view schema, mapping variables in
the view schema to the appropriate variables used in the
query. If the view is dynamic, some of the variables in the
rewritten query may be schema variables. The algorithm
must be careful to properly declare these variables according
to SchemaSQL syntax.
fgrchm 5.1
Translation
of
SQL Query Q, to Query Q’
Replace all tables covered by the view with the view
schema in the from clause.
(4
(b)
(4
(4
(4
Remove $(Tables(V)).
If Db(V) is a variable, then add the declaration
+ WW’)).
If
Rel(V) is a variable, then add d(Db(V)) -+
d(ReW?).
Add d(Rel(V)) T where T is a new tuple variable
not used in Q.
For each attribute A E Att(V), if A is a vari-
able, then add 4(Db(V)) :: 4(ReZ(V)) + +(A)
and T&(A) 4(Dom(A)).
Replace each variable A E Sel(Q) by 4(B) where B E
Out(V) and Conds(Q) imply that A = d(B).
Replace Conds(Q) in Q by Conds’ where 4(Conds(V))r\
Conds’ s Conds( Q) .
For each A E Att(V) where #(dam(A)) is used in the
select or where clause
of
Q, add +(dom(A)) # 0 to the
where clause
of
Q’.
5.2 Aggregate Queries and Aggregate Views
We now consider the use of dynamic aggregate views to an-
swer aggregate SQL queries. For economy of presentation,
we do not present the full details of the mappings and query
translation [29]. Instead, we present a few examples high-
lighting the major issues involved in using dynamic views
with aggregation to answer queries (which may also include
aggregates).
We have shown that the restructuring of data in relation
and database names in a dynamic view does not lose asso-
ciations between tuples or multiplicities of tuples. Dynamic
views with relation and database variables can therefore be
used in much the same way as static views with minor exten-
sions to the query translation algorithm. Dynamic views in-
volving attribute variables (dynamic attribute views) proved
to be more problematic. While retaining relevant associa-
tions between values, multiplicities of values may be lost.
This fact limits the use of these views in answering multi-
set queries. Theorem 5.4 appears to be quite a weak result
since it does not permit the use of dynamic views with at-
tribute variables in answering queries where the number of
values appearing in the result must be preserved. However,
dynamic views with attribute variables can be used to an-
swer some aggregate queries.
First, a dynamic attribute view without aggregation can
be used to answer aggregate queries that do not require mul-
tiplicity information.
Example
5.2 Consider the following aggregate
query on the data
of
Example
4.2.
The proposed
rewriting Q’ is multi-set equivalent to Q.
Q
Q’
select D, max(P) select D, max(P)
from dbO::stock T, T.date D, from dbl::nyse T,
T.price P, T.exch E T.date D,
where E = ‘nyse’
dbl::nyse+A, T.A P
group by D
group by D
having min(P) > 100 having min(P) > 100
This result follows from the set semantics of aggregates.
Our second example shows that dynamic attribute views,
where the dynamic attribute is defined using aggregation,
can be used to answer aggregate queries.
create view db4::E(date, C)
select D, avg(P)
from dbO::stock T, T.exch E,
T.date D, T.price P, T.company C
where D > 1980
group by E, D, C
Example 5.3
Query Q below can be rewritten
to the equivalent query Q’ that uses the following
aggregate query.
Q
Q’
select E, C, avg(P)
select E, A, avg(P)
from dbO::stock T, from db4-+E, db4::E -+A,
T.exch E, T.company C, E T, T.date D, T.A P
T.price P, T.date D where D > 1990
where D > 1990 group by E, A
group by E, C
198
6
Integration with a Query Optimizer
The usability criteria and query translation algorithms of the
previous section permit the selection of views that can cor-
rectly be used to answer a query. However, they do not ad-
dress the issue of optimizing queries over a set of views. For
SPJ queries and views, this issue has been addressed [8, 371
by extending the common dynamic programming style op-
timizer [34] to consider plans using views. Two primary
extensions are necessary. First, in addition to access meth-
ods to base relation, the optimizer must consider the use
of views to build a query plan. These views may in turn
describe index structures. Second, it must be possible to
quickly determine what portion of a query a view answers
and whether a plan built from views and other access meth-
ods is a partial or complete answer to the query.
A conventional optimizer begins by finding execution
plans for single relations, then iteratively finds execution
plans for successively larger portions of a query. In the pres-
ence of materialized views or indices described by views, the
initial set of access plans is extended to include these views.
Chaudhuri et al [8] show how for SPJ views, a simple data
structure can be used to record the portion of the query an-
swered by the view. This is possible because an SPJ view
will replace (or answer) a set of tables and predicates in the
query and possibly add a new set of predicates (Conds’ from
the translation algorithm). Using only these sets (tables and
predicates answered by the view and new predicates added
as a result of the view), the optimizer can determine whether
a plan using a set of views is partial or complete. For par-
tial plans, the set of tables and predicates that remain to be
addressed can be computed.
We have shown that for dynamic SPJ views, the portion
of a query answered by a view can also be modeled in the
same way. That is, the query translation process involves
determining a set of tables and predicates from the query
that can be replaced by the view. A query optimizer can
therefore treat dynamic views as primitive access plans irre-
gardless of whether the view is implemented by an external
data source or by an internal index. The higher order prop-
erties of the view must be analyzed to determine if the view
is usable but additional analysis of this information is not
required by the optimizer. In particular, only the index or
external source needs to be able to execute the, possibly
higher order, plan.
Note that this form of integration with a query optimizer
permits correct use of dynamic views while requiring mini-
mal extensions to the optimizer.
7 Conclusions
The existence of schematic heterogeneity in legacy systems
is well document in the research literature. Many of the
more than thirty representations for a single data fact, enu-
merated by Kent result from some type of schematic hetero-
geneity [19]. Despite its prevalence, and despite the plethora
of work on enumerating and categorizing types of schematic
heterogeneity, no systematic study of how this schematically
heterogeneous structures can be used and queried in prac-
tice has been undertaken. We have provided a first step
in such a study. Specifically, we have analyzed the prop-
erties of restructuring transformations required to resolve
schematic discrepancies. Using this analysis we have showed
how higher order views can be used in answering queries on
schematically heterogeneous structures. We have showed
that our solution is powerful enough to meet the needs of
numerous applications drawn from the realms of data ware-
housing, decision support, database publishing and physical
data independence. The solutions allow access methods for
semi-structured and unstructured data to be incorporated
into the framework of structured query evaluation and op-
timization.
References
PI
PI
I31
PI
[51
PI
]71
PI
191
PO1
[III
S. Abiteboul, H. Garcia-Molina, Y. Papakonstanti-
nou, and R. Yerneni. Fusion Queries over Internet
Databases. Technical Report unpublished manuscript,
Stanford University, 1997.
R. Ahmed, P. DeSmedt, W. Du, W. Kent, M. A.
Ketabchi, W. A. Litwin, A. Rafii, and M. C. Shan. The
Pegasus Heterogeneous Multidatabase System. IEEE
Computer, 24(12):19-27, December 1991.
Y. Arens, C. Y. Chee, C. N. Hsu, and C. A. Knoblock.
Retrieving and Integrating Data from Multiple Infor-
mation Sources. Intl. J. of Intelligent and
Cooperative
Info. Systems, 2(2):127-158, 1993.
T. Barsalou and D. Gangopadhyay. M(DM): An Open
Framework for Interoperation of Multimode1 Multi-
database Systems. In Proc. of the Int? Conf. on Data
Eng., pages 218-227, Tempe, AZ, February 1992.
C. Batini, M. Lenzerini, and S. B. Navathe. A Compar-
ative Analysis of Methodologies for Database Schema
Integration. ACM Computing Surveys, 18(4):323-364,
December 1986.
M. L. Brodie and M. Stonebraker. Migrating Legacy
Systems: Gateways, Interfaces, and the Incremental
Approach. Morgan Kaufmann Series in Data Mngmt.
Sys., Jim Gray, Ed. Morgan Kaufmann, 1995.
M. J. Carey, L. M. Haas, P. M. Schwarz, M. Arya,
W. F. Cody, R. Fagin, M. Flickner, A. W. Luniewski,
W. Niblack, D. Petkovic, J. Thomas, J. H. Williams,
and E. L. Wimmers. Towards Heterogeneous Multi-
media Information Systems: The Garlic Approach. In
Proc. of the Fifth Int’l IEEE Wksp. on Research Issues
in Data Eng. (RIDE-95): Distributed Object Mngmt.,
Taipei, Taiwan, March 1995.
S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and
K. Shim. Optimizing Queries with Materialized Views.
In PTOC. of the Int’l Conf. on Data Eng., pages 190-200.
IEEE, 1995.
S. Chaudhuri and M. Y. Vardi. Optimization of Real
Conjunctive Queries. In Proc. of the ACM Symp. on
Principles of Database Systems (PODS), 1993.
S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ire-
land, Y. Papakonstantinou, J. Ullman, and J. Widom.
The TSIMMIS Project: Integration of Heterogeneous
Information Sources. In Proc. of the 100th Anniver-
sary Meeting of the Information Processing Society of
Japan(IPSJ), pages 7-18, Tokyo, Japan, October 1994.
W. Chen, M. Kifer, and D. S. Warren. HiLog as a
Platform for Database Languages. In Int’l Workshop
on Database Programming Languages, pages 315-329,
Gleneden Beach, OR, June 1989.
199
[12] E.F Codd and S. B. Codd. Providing OLAP (On-line
Analytical Processing) to User-Analysts: An IT Man-
date. Technical report, E.F. Codd and Associates, 1994.
[13] U. Dayal and H. Y. Hwang. View Definition and Gen-
eralization for Database Integration in a Multidatabase
System. IEEE Trans. on
Software
Engineering, SE-
10(6):628-644, November 1984.
[14] J. Gary, A. Bosworth, A. Layman, and H. Pirahesh.
Data Cube: A Relational Aggregation Operator Gener-
alizing Group-By, Cross-Tab, and Sub-Totals. In Proc.
of the Int’l Conf. on Data Eng., pages 152-159, 1996.
[15] R. Hull. Relative Information Capacity of Simple Re-
lational Database Schemata. Society
for
Industrial and
Applied Mathematics (SIAM) Journal of Computing,
15(3):856-886, August 1986.
[16] R. Hull.
Managing Semantic Heterogeneity in
Databases: A Theoretical Perspective. In Proc. of
the ACM Symp. on Principles of Database Systems
(PODS), pages 51-61, 1997.
[17] D. Van Gucht J. Van den Bussche and G. Vossen. Re-
flective Programming in the Relational Algebra. In
Proc. of the ACM Symp. on Principles of Database Sys-
tems (PODS), pages 17-25, 1993.
[18] V. Kashyap and A. Sheth. Semantic and Schematic
Similarities between Database Objects: A Context-
based Approach. The Int? Journal on Very Large Data
Bases, 5(4):276-304, December 1996.
[19] W. Kent. The Many Forms of a Single Fact. In Proc. of
IEEE Int’l Computer Conf. (COMPCON), pages 438-
443, 1989.
[20] M. Kifer, W. Kim, and Y. Sagiv. Querying Object-
Oriented Databases. In ACM SIGMOD Int’l Conf. on
the Management of Data, pages 393-402, 1992.
[21] W. Kim and J. Seo. Classifying Schematic and Data
Heterogeneity in Multidatabase Systems. IEEE Com-
puter,
24(12):12-18, December 1991.
[22] R. Krishnamurthy, W. Litwin, and W. Kent. Language
Features for Interoperability of Databases with Sche-
matic Discrepancies. In ACM SIGMOD Int’l Conf. on
the Management of Data, pages 40-49, 1991.
123) L. Lakshmanam, F. Sadri, and I. N. Subramanian.
On the Logical Foundations of Schema Integration and
Evolution in Heterogeneous Database Systems. In Proc.
of the Int?. Conf. on Deductive and Object-Oriented
Databases, 1993.
[24] L. Lakshmanam, F. Sadri, and I. N. Subramanian.
SchemaSQL - A Language for Interoperability in Re-
lational Multi-database Systems. In Proc. of the Int’l
Conf. on Very Large Data Bases (VLDB), Bombay, In-
dia, 1996.
[25] A. Y. Levy, A. 0. Mendelzon, Y. Sagiv, and D. Sri-
vastava. Answering Queries Using Views. In Proc.
of the ACM Symp. on Principles of Database Systems
(PODS), San Jose, CA, May 1995.
[26] A. Y. Levy, A. Rajaraman, and J. J. Ordille. Querying
Heterogeneous Information Sources Using Source De-
scriptions. In PTOC. of the Int’l Conf. on Very Large
Data Bases (VLDB), pages 251-262, Bombay, India,
1996.
[27] W. Litwin and A. Abdellatif. Multidatabase Interoper-
ability. IEEE Computer, 19(12):10-18, December 1986.
[28] W. Litwin, M. Ketabchi, and R. Krishnamurthy. First
Order Normal Form for Relational Databases and Mul-
tidatabases. SIGMOD Record, 20(4), December 1991.
[29] R. J. Miller. Using Schematically Heterogeneous Struc-
tures: Extended Version.
Technical Report OSU-
CISRC-3/98-TR09, Ohio State University, Dept of
Computer and Information Science, 1998.
[30] R. J. Miller, Y. E. Ioannidis, and R. Ramakrishnan.
The Use of Information Capacity in Schema Integra-
tion and Translation. In Proc. of the Int’l Conf. on Very
Large Data Bases (VLDB), pages 120-133, Dublin, Ire-
land, August 1993.
[31] R. J. Miller, 0. G. Tsatalos, and J. H. Williams. Data-
Web: Customizable Database Publishing for the Web.
IEEE Multimedia, 4(4):14-21, Ott-Dee 1997.
[32] Y. Papakonstantinou, H. GarciaMolina, and J. Widom.
Object Exchange Across Heterogeneous Information
Sources. In Proc. of the Int’l Conf. on Data Engineer-
ing, Taipei, Taiwan, March 1995.
[33] K. A. Ross. Relations with Relation Names as Argu-
ments: Algebra and Calculus. In Proc. of the ACM
Symp. on Principles of Database Systems (PODS),
pages 346-353, San Diego, CA, June 1992.
[34] P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie,
and T. Price. Access Path Selection in a Relational
Database Management System. In ACM SIGMOD Int’l
Conf. on the Management of Data, pages 23-34, 1979.
[35] D. Srivastava, S. Dar, H. V. Jagadish, and A. Y Levy.
Answering Queries with Aggregation Using Views. In
Proc. of the Int’l Conf. on Very Large Data Bases
(VLDB), Bombay, India, 1996.
[36] A. Tomasic, L. Raschid, and P. Valduriez. A Data
Model and Query Processing Techniques for Scaling Ac-
cess to Distributed Heterogeneous Databases in Disco.
IEEE Frans on Computers, 1997.
[37] 0. Tsatalos, M. Solomon, and Y. Ioannidis. The
GMAP: A Versatile Tool for Physical Data Indepen-
dence. The Int’l Journal on Very Large Data Bases,
5(2), April 1996.
200