Querying a Database by
Fuzzification of Attribute Values
Mubariz EMINOV
Abstract
In this paper we describe the basic function of query processing with
respect to crisp (numerical) data in a relational database. Moreover, a
modification of conventional search criteria by means of fuzzy predicates is
presented. We consider the process of fuzzification of attribute values based
on fuzzy sets, which allows the objects to be described through linguistic
terms. We also describe an implementation of fuzzy query processing for a
classification problem, the result of which is a subset of the objects ranked
by their degree of satisfaction of a given criterion.
Keywords: query processing, relational database,
fuzzification, fuzzy set.
1. Introduction
A querying system is a kind of information retrieval system that may be
used to retrieve relevant objects from a database. The database stores a
collection of objects, some of which are of interest to the current user. Each
object in the database contains an index component that can be used to help
identify and select the objects that may be relevant to a user. The essential
problem in data retrieval is to find the subset of objects in the database
that is relevant to a given user. Data retrieval operations are defined by a
particular user’s queries, which identify search criteria in terms of the
features (attributes) of interest that describe the objects through their
index component.
In a relational database, information is represented in tabular form, with
objects or records (tuples) stored as rows. A search criterion consists of a
Boolean expression involving attribute names and their values, which cover the
index component. One characteristic of these queries, being crisp queries, is
that their search criteria involve precisely defined attributes (features)
presented through their numerical values. An atomic query is used for each
attribute, and each atomic query is expressed clearly by means of ordinary
(crisp) set theory. The result of a search executed according to such a
criterion, which is supported by standard Structured Query Language (SQL), is
simply a crisply defined subset of the objects in the database that satisfy
all the corresponding atomic queries. In contrast to crisp features, fuzzy
features are more appropriate when the database querying system has to select
a subset of the database objects (records) which conform to a vague or
imprecise description. The selection of a subset of relevant objects whose
features are approximately similar is provided by fuzzification of the
numerical attributes of the objects in the entire database, based on fuzzy set
theory (Zadeh 1965).
In this report we consider modified imprecise queries with respect to
crisp data in a relational database. It is suggested that imprecise (fuzzy)
queries contain fuzzy predicates (atomic queries) in which fuzzy features are
presented as fuzzy variables. Fuzzy features are expressed through linguistic
values such as good, heavy, cold, long, etc. This kind of attribute
presentation is needed in a large domain of analysis tasks such as
classification, image processing, quality control, diagnostics,
decision-making, etc.
At the grammar level of the considered fuzzy search criteria, the
linguistic values of fuzzy features are predetermined by association with
fuzzy sets. Each fuzzy set, representing fuzzy attribute values of the
objects, is determined through its own membership function. The degree to
which each object, described by numerical (crisp) attribute values, satisfies
the search criterion is evaluated by a fuzzy matching process. As a result of
this matching, the membership function values are calculated for all the
records in the database. The selected subset of objects is then ranked
according to the computed degree of matching of each object.
To implement the proposed fuzzy query processing we suggest an extended
SQL query language that contains additional manipulation (filtering)
primitives.
2. A Crisp Definition of Attribute Values
As noted above, a relational database has a tabular representation of
data where rows represent data records or objects and columns represent fields
(attributes) within records (objects). Data retrieval operations, which have
to find a subset of the objects, are defined by the user’s query requests, in
which attribute values and Boolean logic over them are used. In the SQL query
language, a query request in the form of a selection statement generally has
the following form.
Select attribute-list from relation where predicate
where attribute-list identifies the attribute values to be returned to the
user, relation identifies a particular table in the database, and predicate
identifies a search criterion. According to standard SQL grammar, a search
criterion is a Boolean expression involving attribute names and the requested
value ranges of particular attributes. Let $A_i$ be an attribute taking
numerical values in the interval $[V_{i1}, V_{i2}]$ and let the number of
attributes be 3 ($i = 1, 2, 3$); then a search criterion can be written as
$V_{11} \le A_1 \le V_{12}$ and $V_{21} \le A_2 \le V_{22}$ and $V_{31} \le A_3 \le V_{32}$
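A crisp criterion of this kind maps directly onto a standard SQL range predicate. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `person`, its columns, the sample rows and the interval bounds are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and data; the paper gives only the generic form
# "Select attribute-list from relation where predicate".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (name TEXT, age INTEGER, height INTEGER, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [
        ("p1", 22, 195, 80),
        ("p2", 25, 170, 75),
        ("p3", 41, 198, 85),
        ("p4", 19, 201, 120),
    ],
)

# Illustrative bounds (V11, V12, V21, V22, V31, V32) for the three attributes.
bounds = (18, 30, 190, 210, 70, 95)

# Each BETWEEN clause is one crisp atomic query; AND combines them.
crisp_rows = conn.execute(
    "SELECT name FROM person "
    "WHERE age BETWEEN ? AND ? "
    "AND height BETWEEN ? AND ? "
    "AND weight BETWEEN ? AND ? "
    "ORDER BY name",
    bounds,
).fetchall()

print(crisp_rows)
```

Only the record that lies inside all three intervals is returned; a record that misses a single bound by any margin is excluded, however close it is.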
As can be seen, the features (attributes) $A_i$ of the objects are
presented precisely (numerically). Therefore, every $A_i$ may be considered as
a conventional (crisp) set, that is, a collection of attribute values which
satisfy precise properties required for membership of the relevant set $A_i$.
Generally, an ordinary (crisp) set $A$ is described by its membership function
$\mu_A : V \to \{0, 1\}$, shown in Figure 1(a) and formulated as:
$$\mu_A(v) = \begin{cases} 1, & v \in A \\ 0, & \text{otherwise} \end{cases}$$


Figure 1. Membership functions: a) for an ordinary (crisp) set; b) for a fuzzy set
where V is called the universe of discourse of the feature A, that is, the
collection of possible numerical values v of the feature, and the crisp set A
is defined within V.
Obviously, the membership function, which maps the values (members) of V
to a membership degree in $A_i$, takes only the discrete values 0 or 1. The
query processing described above uses a search criterion that selects those
objects in the database whose attribute values all lie within the relevant
domains $A_i$. Each domain is determined by the corresponding crisp atomic
query (predicate) involved in the search criterion. That is, for an object to
be selected, its membership values with respect to all features $A_i$ must be
1. This kind of data retrieval operation is crisp query processing, which may
be executed by the standard SQL SELECT statement.
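The all-or-nothing behaviour of crisp membership can be sketched in a few lines. The function names and the interval values here are hypothetical illustrations, not part of the described system.

```python
def crisp_membership(v, low, high):
    """Characteristic function of the crisp set [low, high]: returns 1 or 0."""
    return 1 if low <= v <= high else 0


def satisfies_crisp_query(record, intervals):
    """A record is selected only if every attribute has membership 1."""
    return all(
        crisp_membership(record[name], low, high) == 1
        for name, (low, high) in intervals.items()
    )


# Illustrative intervals for two attributes.
intervals = {"age": (18, 30), "height": (190, 210)}

# Boundary values still belong to the set.
print(satisfies_crisp_query({"age": 30, "height": 190}, intervals))
# One year outside the age interval: the record is rejected outright.
print(satisfies_crisp_query({"age": 31, "height": 200}, intervals))
```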
3. Fuzzification of Attribute Values
The result of the query processing described above is the subset of the
objects whose features take values in the relevant intervals specified in the
user’s crisp query. The remaining objects do not satisfy the search criterion,
that is, they lie outside the relevant intervals. Thus the objects are split
into two parts by crisply defined features. However, the use of imprecise
features and of a linguistic (semantic) description of the objects in query
processing is more reasonable, for two reasons. First, the needs that many
search criteria in a user query are intended to represent are not crisp, i.e.
they are expressed in linguistic terms such as middle, old, very good, etc.
Second, given features (search criteria) may be satisfied by the objects to a
greater or lesser degree, rather than being either completely satisfied or not
satisfied at all. A crisp query therefore provides neither an identification
of approximately close objects nor an ordering (ranking) of the objects in the
database that indicates the degree to which the user query is satisfied.
Because of this drawback of crispness in queries, we suggest taking
advantage of fuzziness in the search criteria of the data retrieval process,
using fuzzy features to describe the objects. The most appropriate tool for
the approximate representation of imprecise data is fuzzy set theory,
introduced by Zadeh. Fuzzy set theory deals with fuzzy sets, which may be
viewed as a generalisation of the concept of an ordinary (crisp) set and are
likewise defined in a given universe of discourse V. A fuzzy set F in V is
characterised by a membership function $\mu_F$ which takes values in the
interval [0,1], that is
$\mu_F : V \to [0,1]$, $v \in V$
where $\mu_F(v)$ describes the grade of membership of a numerical value
$v \in V$ in the fuzzy set F.
A fuzzy set F in V is usually represented as a set of ordered pairs of
elements (objects) v and their grades of membership, generally written as:
$F = \{(v, \mu_F(v)) \mid v \in V\}$
Mostly, a functional definition is used to specify the membership
function $\mu_F$ of a fuzzy set F. Given as an analytic expression, the
membership function $\mu_F(v)$ allows the membership grade to be calculated
for each element (object) in V. In practice, membership functions of different
shapes are used, such as the S-function and the triangular, trapezoidal and
exponential forms.
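Two of these shapes can be sketched as follows. The trapezoid is written in its usual four-parameter form, and the S-function is assumed to be the commonly used quadratic S-function with its crossover at the midpoint; the paper does not state which definitions it uses, so both are assumptions, as are the function names.

```python
def trapezoid(v, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if v <= a or v >= d:
        return 0.0
    if v < b:
        return (v - a) / (b - a)
    if v <= c:
        return 1.0
    return (d - v) / (d - c)


def s_function(v, a, c):
    """Quadratic S-function: rises smoothly from 0 at a to 1 at c."""
    b = (a + c) / 2.0  # crossover point, where the grade is 0.5
    if v <= a:
        return 0.0
    if v <= b:
        return 2.0 * ((v - a) / (c - a)) ** 2
    if v < c:
        return 1.0 - 2.0 * ((v - c) / (c - a)) ** 2
    return 1.0


# Illustrative parameters only.
for v in (20, 25, 35, 45, 50):
    print(v, trapezoid(v, 20, 30, 40, 50))
for v in (0, 2.5, 5, 7.5, 10):
    print(v, s_function(v, 0, 10))
```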
Thus, every fuzzy set is associated with a membership function.
Membership functions allow the fuzzification of the crisp attribute values by
estimating their degree of membership in the relevant fuzzy set, which ranges
over the interval [0,1]. As noted above, according to fuzzy set theory the
attributes can be considered as linguistic variables, which take linguistic
values called linguistic labels. For instance, individuals’ ages may be viewed
as an attribute associated with linguistic labels such as Young, Middle Age
and Old. Each of these linguistic labels is given quantitative semantics by
the fuzzy set that it labels. We consider, for example, a fuzzy subset F
labelled Middle Age that is defined in a universe of discourse Y (0<y<130)
for the age attribute. A triangular membership function (see Figure 1(b)) is
plausible for this fuzzy subset. For now, we define the membership function
for Middle Age individuals as follows.
$$\mu_F(y) = \begin{cases}
0, & 0 < y \le a \\
\dfrac{y-a}{b-a}, & a < y \le b \\
\dfrac{c-y}{c-b}, & b < y \le c \\
0, & y > c
\end{cases}$$
where the parameters are a=30, b=45, c=60 (see Figure 1(b)). The
parameters may be adjusted when needed. Thus, the use of fuzzy sets provides a
basis for presenting all the attributes (features) of the objects (records)
through vague and imprecise concepts. In this way, fuzzy features can be used
with respect to crisp data in the database.
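The triangular Middle Age function with a=30, b=45, c=60 can be evaluated directly. This is a minimal sketch; the function names are illustrative.

```python
def triangular(y, a, b, c):
    """Triangular membership: 0 up to a, peak of 1 at b, back to 0 at c."""
    if y <= a or y >= c:
        return 0.0
    if y <= b:
        return (y - a) / (b - a)
    return (c - y) / (c - b)


def middle_age(y):
    """Membership grade of age y in Middle Age, using a=30, b=45, c=60."""
    return triangular(y, 30, 45, 60)


for age in (25, 30, 36, 45, 54, 60, 70):
    print(age, middle_age(age))
```

An age of 45 has full membership, ages 36 and 54 both have grade 0.4, and ages at or outside 30 and 60 have grade 0.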
4. Fuzzy Query Processing
Continuing the example of individuals, we can consider their other
linguistic (imprecise) attributes, such as Height and Weight, which are
associated with linguistic labels (fuzzy sets) like Short, Middle Height, Tall
and Light, Middle Weight, Heavy, respectively. If we want to identify
individuals that are entered as objects (records) with the corresponding
numerical attributes in a relational database, we can execute queries that
combine any three fuzzy sets taken one each from the feature variables Age,
Height and Weight. For instance, we pose the following query request when we
are looking for candidates for a basketball team
Select * From Fuzzy where Age is Young and Height is Tall and Weight is Light
where the records (individuals) have to be selected from the relational
database Fuzzy on three fuzzy predicates linked by the Boolean and
(intersection) operation.
Obviously, this search criterion involves three fuzzy sets, but fuzzy
query processing may involve larger numbers of sets, which is one of the
issues associated with access processing. The interpretation of a query
request for selecting objects (records) is based on the membership functions
of the Young, Tall and Light fuzzy sets as defined in Figure 2. The membership
function of each relevant attribute identifies the people within its support,
that is, those for whom $\mu_Y(a) > 0$, $\mu_T(h) > 0$ and $\mu_L(w) > 0$.
Standard SQL, the query language currently used in relational databases,
does not support the imprecise queries considered above with respect to crisp
data, because its grammar does not provide for imprecise predicates. A few
different extensions to the SQL language have been proposed, such as QUEL and
SQLf, in which fuzzy queries are consistent with previous fuzzy query
grammars.
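One way to approximate fuzzy predicates on top of an unmodified SQL engine is to register the membership functions as user-defined scalar functions. The sketch below does this with Python's sqlite3 module and its `create_function` call; it is not the extended SQL proposed in this paper. The shapes and parameters of Young, Tall and Light are hypothetical, since the definitions in Figure 2 are not reproduced in the text, and the sample rows are invented.

```python
import sqlite3


def falling(v, full, zero):
    """Grade 1 up to `full`, decreasing linearly to 0 at `zero`."""
    if v <= full:
        return 1.0
    if v >= zero:
        return 0.0
    return (zero - v) / (zero - full)


def rising(v, zero, full):
    """Grade 0 up to `zero`, increasing linearly to 1 at `full`."""
    if v <= zero:
        return 0.0
    if v >= full:
        return 1.0
    return (v - zero) / (full - zero)


conn = sqlite3.connect(":memory:")
# Hypothetical parameters for the three fuzzy sets.
conn.create_function("young", 1, lambda a: falling(a, 25, 40))
conn.create_function("tall", 1, lambda h: rising(h, 170, 190))
conn.create_function("light", 1, lambda w: falling(w, 60, 80))

conn.execute(
    "CREATE TABLE Fuzzy (name TEXT, age INTEGER, height INTEGER, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO Fuzzy VALUES (?, ?, ?, ?)",
    [("p1", 22, 195, 70), ("p2", 31, 185, 65), ("p3", 50, 200, 60), ("p4", 24, 165, 55)],
)

# Keep only the records that lie inside the support of all three fuzzy sets.
rows = conn.execute(
    "SELECT name, young(age), tall(height), light(weight) FROM Fuzzy "
    "WHERE young(age) > 0 AND tall(height) > 0 AND light(weight) > 0 "
    "ORDER BY name"
).fetchall()

for row in rows:
    print(row)
```

With these assumed parameters, p3 is excluded because its Young grade is 0 and p4 because its Tall grade is 0, while p1 and p2 are returned together with their three grades.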
We suggest an extended SQL query language in which fuzzy query
processing is modified for the classification of a population stored in a
crisp relational database. We describe an implementation of fuzzy query
processing in a relational database structure. For the experimental prototype
implementation we have limited the population to 70 records and the crisp data
to integer values.
The extended SQL-based interface that supplies interaction with the
database records contains some additional manipulation (filtering) primitives.
These provide efficient evaluation of the membership functions and of the
fuzzy logic operations (union and intersection) that combine them. The
database used is content-addressable, meaning that individuals (records) can
be identified and retrieved based on the content of any attribute or
combination of attributes. Accordingly, any one or several numeric attributes
are used to build an additional index structure based on the evaluation of the
relevant membership functions for each record (individual).
Thus, for all the records (individuals) in the database, the membership
function values in all the fuzzy sets are calculated with respect to the above
linguistic labels (see Figure 2). This approach has the advantage of
evaluating the membership functions only once (i.e., when adding or updating
data in the database); it thus avoids run-time evaluation during query
processing and provides high-speed access.
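The idea of evaluating membership functions at insert or update time can be sketched with a small in-memory structure. The class `FuzzyIndexedTable`, the label parameters and the sample record are hypothetical; the paper does not describe how its index structure is actually laid out.

```python
def falling(v, full, zero):
    """Grade 1 up to `full`, decreasing linearly to 0 at `zero`."""
    if v <= full:
        return 1.0
    if v >= zero:
        return 0.0
    return (zero - v) / (zero - full)


def rising(v, zero, full):
    """Grade 0 up to `zero`, increasing linearly to 1 at `full`."""
    if v <= zero:
        return 0.0
    if v >= full:
        return 1.0
    return (v - zero) / (full - zero)


# Hypothetical label definitions: label -> (attribute name, membership function).
LABELS = {
    "Young": ("age", lambda a: falling(a, 25, 40)),
    "Tall": ("height", lambda h: rising(h, 170, 190)),
    "Light": ("weight", lambda w: falling(w, 60, 80)),
}


class FuzzyIndexedTable:
    """Stores crisp records together with a precomputed membership index."""

    def __init__(self):
        self.records = {}
        self.index = {}

    def upsert(self, key, record):
        # Membership grades are computed here, at insert/update time only.
        self.records[key] = record
        self.index[key] = {
            label: fn(record[attr]) for label, (attr, fn) in LABELS.items()
        }

    def grade(self, key, label):
        # At query time a grade is a plain lookup, with no function evaluation.
        return self.index[key][label]


table = FuzzyIndexedTable()
table.upsert("p1", {"age": 22, "height": 180, "weight": 70})
print(table.index["p1"])

# Updating the record refreshes its index entry.
table.upsert("p1", {"age": 22, "height": 180, "weight": 60})
print(table.grade("p1", "Light"))
```

The trade-off is extra storage and slightly slower writes in exchange for queries that only read stored grades.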

Figure 2. Membership functions for fuzzy sets: a) for Age; b) for Height; c) for Weight
We now consider the evaluation of the degree to which each record
(individual) satisfies the presented SQL search criterion. As noted above,
this evaluation means assessing the three atomic queries (fuzzy predicates)
“Age is Young”, “Height is Tall” and “Weight is Light” and then taking a
Boolean combination (aggregation) of these evaluations. In particular, as the
aggregation function we use the conjunction rule, which logically conjoins the
grades $\mu_Y(a)$, $\mu_T(h)$ and $\mu_L(w)$ assigned to the respective atomic
queries. It is thus based on a pointwise implementation of the intersection
operation. The conjunction rule, which expresses the degree to which an
individual satisfies the search criterion, is written as follows:
$$\mu_{Y \cap T \cap L}(a, h, w) = \mu_Y(a) \text{ AND } \mu_T(h) \text{ AND } \mu_L(w) = \min\{\mu_Y(a), \mu_T(h), \mu_L(w)\}$$
where A, H, W are the universes of discourse for the Age (a), Height (h)
and Weight (w) attributes respectively, with $a \in A$, $h \in H$, $w \in W$,
and Y, T, L are fuzzy sets in A, H, W respectively.
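The conjunction rule reduces to taking the minimum of the individual grades. A minimal sketch follows, with the union shown as the maximum for comparison; the function names and the grade values are illustrative only.

```python
def fuzzy_and(*grades):
    """Conjunction (intersection) of membership grades: the minimum."""
    return min(grades)


def fuzzy_or(*grades):
    """Disjunction (union) of membership grades: the maximum."""
    return max(grades)


# Illustrative grades for "Age is Young", "Height is Tall", "Weight is Light".
mu_young, mu_tall, mu_light = 0.8, 0.6, 0.9

print(fuzzy_and(mu_young, mu_tall, mu_light))  # the weakest predicate decides
print(fuzzy_and(0.8, 0.0, 0.9))                # one failed predicate gives 0
print(fuzzy_or(mu_young, mu_tall, mu_light))
```

Because the minimum is used, an individual's overall grade can never exceed the grade of the predicate it satisfies least.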
Therefore, by using the conjunction of $\mu_Y : A \to [0,1]$,
$\mu_T : H \to [0,1]$ and $\mu_L : W \to [0,1]$ over all the individuals
(which exist in the additional indexing structure), the records that best meet
all the fuzzy atomic queries are selected, provided that all their membership
function values are greater than zero. The population in the database can be
selected similarly using the other fuzzy sets, namely:
for age a: MA (Middle Age) and O (Old), defined in A
for height h: S (Short) and MH (Middle Height), defined in H
for weight w: MW (Middle Weight) and H (Heavy), defined in W
As a result, the records (individuals) selected during fuzzy query
processing can be listed in order of the degree
$\mu_{Y \cap T \cap L}(a, h, w)$ to which they meet the given search
criterion. The listing begins with the record that best meets the criterion
and proceeds in descending order.
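The ranking step can be sketched as a filter followed by a descending sort. The function name `rank_by_grade` and the grade values are hypothetical; each tuple stands for the three grades of one record for Young, Tall and Light.

```python
def rank_by_grade(index):
    """Rank records by the minimum of their grades, best match first.

    `index` maps a record key to a tuple of membership grades.
    Records whose combined grade is 0 are not selected.
    """
    combined = {key: min(grades) for key, grades in index.items()}
    selected = [(key, grade) for key, grade in combined.items() if grade > 0]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Illustrative precomputed grades for four records.
index = {
    "p1": (1.0, 0.5, 0.9),
    "p2": (0.7, 0.8, 0.75),
    "p3": (0.0, 1.0, 1.0),
    "p4": (0.3, 0.9, 0.6),
}

ranking = rank_by_grade(index)
for key, grade in ranking:
    print(key, grade)
```

Record p3 is dropped because one of its grades is 0, and the remaining records are listed from the highest combined grade to the lowest.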
Thus, the described fuzzy query processing accomplishes a fuzzy
classification scheme and provides fuzzy clustering, in which an individual
may belong to a group with its own membership grade. Ranking individuals
(objects) within the same cluster by their degree of satisfaction of the given
criterion makes it possible to select a certain number of them.
The proposed fuzzy query processing in a relational database is
implemented as an experimental prototype using extended SQL that contains the
additional manipulation primitives described above.
These primitives have been developed using

Figure 3. Computer output for query processing
As can be seen in Figure 3, the search criterion is established by
adjusting the buttons for the Age, Height and Weight attributes according to
the relevant fuzzy predicates. As a result, the records (individuals) are
presented in the table of individuals, where they are listed by their grade of
satisfaction of the entered criterion.
References
1. Abbasov A.M., Mamedova M.G., Eminov M.E., Synthesis of the Fuzzy System for Automatic Control over the Complex Objects, 6th International Conference, Machine Design and Production, UMTIK-94, 21-23 September 1994, Ankara, Turkey.
2. Mansfield W.H. and Fleischman R.M., A High Performance, Ad Hoc, Fuzzy Query Processing System, Journal of Intelligent Information Systems, Vol. 2, No., November 1993, pp. 397-418.
3. R. Fagin, S. Jose, Fuzzy Queries in Multimedia Database Systems, Proceedings: ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1998.
4. Yan Jan, Michael Ryan, Using Fuzzy Logic: Towards Intelligent Systems, Prentice Hall, 1994.
5. Yager R.R. and Larsen H.L., Retrieving Information by Fuzzification of Queries, Journal of Intelligent Information Systems, Vol. 2, No. 4, November 1993, pp. 421-441.
6. Zadeh L.A., Fuzzy Set Theoretic Interpretation of Linguistic Hedges, Journal of Cybernetics, Vol. 2, pp. 4-34, 1972.
7. Zadeh,
8. Zadeh,