Querying a Database by Fuzzification of Attribute Values

 

 

Mubariz EMINOV

 

 

Abstract

 

 

In this paper we describe the basic function of query processing over crisp (numerical) data in a relational database. We then present a modification of conventional search criteria that uses fuzzy predicates. We consider the fuzzification of attribute values based on fuzzy sets, which allows objects to be described through linguistic terms.

We also describe an implementation of fuzzy query processing for a classification problem, the result of which is a subset of objects ranked by their degree of satisfaction of a given criterion.

 

Keywords:  query processing, relational database, fuzzification, fuzzy set.

 

 

1. Introduction

 

A querying system is a kind of information retrieval system that may be used to retrieve relevant objects from a database. The database stores a collection of objects, some of which are of interest to the current user. Each object in the database contains an index component that can be used to help identify and select the objects that may be relevant to a user. The essential problem in data retrieval is to find the subset of objects in the database that is relevant to a given user. Data retrieval operations are defined by the user's queries, which specify search criteria in terms of the features (attributes) of interest that describe the objects through their index component.

 

In a relational database, information is represented in tables whose rows are the objects or records (tuples). A search criterion over such a database consists of a Boolean expression involving attribute names and their values, which make up the index component. One characteristic of these queries, called crisp queries, is that their search criteria involve precisely defined attributes (features) given by numerical values. An atomic query is used for each attribute, and each atomic query is expressed unambiguously in terms of ordinary (crisp) set theory. The result of a search executed according to such a criterion, which is supported by the standard Structured Query Language (SQL), is simply a crisply defined subset of the objects in the database that satisfy all the corresponding atomic queries. In contrast to crisp features, it is often more appropriate to describe objects by fuzzy features, so that a database querying system can select the subset of objects (records) that conform to a vague or imprecise description. The selection of a subset of relevant objects whose features are approximately similar is made possible by the fuzzification of the numerical attributes of the objects in the database, based on fuzzy set theory (Zadeh 1965).

 

In this paper we consider modified, imprecise queries over crisp data in a relational database. We suggest that imprecise (fuzzy) queries contain fuzzy predicates (atomic queries) in which fuzzy features are represented as fuzzy variables. Fuzzy features are expressed through linguistic values such as good, heavy, cold, long, etc. This kind of representation of attributes is needed in a wide range of analysis tasks such as classification, image processing, quality control, diagnostics and decision-making.

 

At the grammar level of the fuzzy search criteria considered here, the linguistic values of fuzzy features are predetermined by associating them with fuzzy sets. Each fuzzy set, which represents a fuzzy attribute value of the objects, is determined by its own membership function. The degree to which each object, described by numerical (crisp) attribute values, satisfies the search criterion is evaluated by a fuzzy matching process. As a result of this matching, the membership function values are calculated for all the records in the database. The selected subset of objects is then ranked according to the degree of matching computed for each object.

 

To implement the proposed fuzzy query processing we suggest an extended SQL query language that contains additional manipulation (filtering) primitives.

 

 

2. A Crisp Definition of Attribute Values

 

As noted above, a relational database has a tabular representation of data in which rows represent data records or objects and columns represent fields (attributes) within the records (objects). Data retrieval operations, which have to find a subset of the objects, are defined in the user's query requests, in which attribute values and Boolean logic over them are used. In the SQL query language a query request, written as a selection statement, generally has the following form.

 

Select attribute-list from relation where predicate

 

where attribute-list identifies the attribute values to be returned to the user, relation identifies a particular table in the database, and predicate identifies a search criterion. According to the standard SQL grammar, a search criterion is a Boolean expression involving attribute names and the requested value ranges of particular attributes. Let $A_i$ be an attribute defined by numerical values in the interval $[V_{i1}, V_{i2}]$, and let the number of attributes be three ($i = 1, 2, 3$). Then a search criterion can be written as

 

$$V_{11} \le A_1 \le V_{12} \ \text{and}\ V_{21} \le A_2 \le V_{22} \ \text{and}\ V_{31} \le A_3 \le V_{32}$$

 

As can be seen, the features (attributes) $A_i$ of the objects are specified precisely (numerically). Therefore, every $A_i$ may be considered a conventional (crisp) set, that is, the collection of attribute values that satisfy the precise properties required for membership in $A_i$. In general, an ordinary (crisp) set $A$ is described by its membership function $\mu_A : V \to \{0, 1\}$, shown in Figure 1(a) and defined as:

$$
\mu_A(v) =
\begin{cases}
1, & v \in A \\
0, & \text{otherwise}
\end{cases}
$$

 

 

Figure 1. Membership functions: (a) for an ordinary (crisp) set; (b) for a fuzzy set.

 

 

 

where $V$ is the universe of discourse of feature $A$, that is, the collection of possible numerical values $v$ for that feature, and the crisp set $A$ is a subset of $V$.

Clearly, the membership function, which maps the members of $V$ to a membership degree in $A_i$, takes only the discrete values 0 and 1. The query processing described above therefore uses a search criterion that selects those objects in the database whose attribute values all lie within the relevant domains $A_i$. Each domain is determined by the corresponding crisp atomic query (predicate) in the search criterion. That is, for an object to be selected, its membership values with respect to all features $A_i$ must equal 1. This kind of data retrieval is crisp query processing and can be executed with the standard SQL SELECT statement.
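The crisp selection described in this section can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table name `people`, its columns, the sample rows and the interval bounds are all hypothetical and do not come from the prototype described in this paper.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate a crisp query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, age INTEGER, height INTEGER, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Ann", 24, 185, 72),
        ("Bob", 41, 178, 90),
        ("Cem", 19, 192, 80),
        ("Dia", 30, 160, 55),
    ],
)

# Crisp criterion: V11 <= A1 <= V12 and V21 <= A2 <= V22 and V31 <= A3 <= V32.
crisp_sql = """
    SELECT name FROM people
    WHERE age    BETWEEN ? AND ?
      AND height BETWEEN ? AND ?
      AND weight BETWEEN ? AND ?
    ORDER BY name
"""
bounds = (18, 25, 180, 200, 60, 85)  # assumed interval bounds
selected = [row[0] for row in conn.execute(crisp_sql, bounds)]
print(selected)
```

A record is returned only if every atomic predicate holds, so a record that misses a single bound by one unit is excluded just as firmly as one that misses all of them.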

 

 

3. Fuzzification of Attribute Values

 

The result of the query processing described above is the subset of objects whose features take values in the relevant intervals specified in the user's crisp query. The remaining objects do not satisfy the search criterion, that is, they lie outside the relevant intervals. The objects are thus divided into two parts by crisply defined features. However, it is often more reasonable to use imprecise features to characterise the objects and to use a linguistic (semantic) description in query processing. There are two reasons for this. First, the needs that many search criteria in a user query are intended to represent are not crisp, i.e. they are expressed in linguistic terms such as middle, old, very good, etc. Second, a given feature (search criterion) may be satisfied by the objects to a greater or lesser degree, rather than being either completely satisfied or not satisfied at all. A crisp query therefore provides neither an identification of approximately close objects nor an ordering (ranking) of the objects in the database that indicates the degree to which the user query is satisfied.

 

Because of this drawback of crisp queries, we suggest taking advantage of fuzziness in the search criteria of the data retrieval process by using fuzzy features to describe the objects. The most appropriate tool for the approximate representation of imprecise data is fuzzy set theory, introduced by Zadeh. Fuzzy set theory deals with fuzzy sets, which may be viewed as a generalisation of the concept of an ordinary (crisp) set and are likewise defined in a given universe of discourse $V$. A fuzzy set $F$ in $V$ is characterised by a membership function $\mu_F$ which takes values in the interval $[0,1]$, that is

$$\mu_F : V \to [0,1], \qquad v \in V$$

where $\mu_F(v)$ is the grade of membership of the numerical value $v \in V$ in the fuzzy set $F$.

A fuzzy set $F$ in $V$ is usually represented as a collection of ordered pairs of elements (objects) $v$ and their grades of membership, generally written as:

$$F = \{(v, \mu_F(v)) \mid v \in V\}$$

 

In most cases a functional definition is used to specify the membership function $\mu_F$ of a fuzzy set $F$. Given as an analytic expression, the membership function $\mu_F(v)$ allows the membership grade to be calculated for each element (object) in $V$. In practice, membership functions of different shapes are used, such as the S-function and the triangular, trapezoidal and exponential forms.

 

Thus, each fuzzy set is associated with a membership function. Membership functions allow the crisp values of the attributes to be fuzzified by estimating their degree of membership in the relevant fuzzy set, which lies in the interval $[0,1]$. As noted above, according to fuzzy set theory the attributes can be considered as linguistic variables, which take linguistic values called linguistic labels. For instance, the ages of individuals may be viewed as an attribute associated with linguistic labels such as Young, Middle Age and Old. Each of these linguistic labels is given quantitative semantics by the fuzzy set that it names.

 

Consider, for example, a fuzzy subset $F$ labelled Middle Age that is defined in a universe of discourse $Y$ ($0 < y < 130$) for the age attribute. A triangular membership function (see Figure 1(b)) is a plausible choice for this fuzzy subset. We define the membership function for Middle Age individuals as follows.

 


$$
\mu_F(y) =
\begin{cases}
0, & 0 \le y \le a \\
\dfrac{y - a}{b - a}, & a \le y \le b \\
\dfrac{c - y}{c - b}, & b \le y \le c \\
0, & y > c
\end{cases}
$$

 

 

 

where the parameters are $a = 30$, $b = 45$, $c = 60$ (see Figure 1(b)). These parameters may be adjusted when needed. Thus, fuzzy sets provide a basis for describing all attributes (features) of the objects (records) through vague and imprecise concepts, so that fuzzy features can be used with the crisp data in a database.
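The triangular membership function for Middle Age can be written as a short Python sketch. The function names `triangular` and `middle_age` are introduced here for illustration only; the parameters 30, 45 and 60 are those given in the text.

```python
def triangular(y: float, a: float, b: float, c: float) -> float:
    """Triangular membership grade: 0 at a, rising to 1 at b, back to 0 at c."""
    if y <= a or y > c:
        return 0.0
    if y <= b:
        return (y - a) / (b - a)  # rising edge between a and b
    return (c - y) / (c - b)      # falling edge between b and c


def middle_age(y: float) -> float:
    """Grade of membership of age y in the fuzzy set Middle Age."""
    return triangular(y, a=30, b=45, c=60)


for age in (25, 30, 37.5, 45, 52.5, 60, 70):
    print(age, middle_age(age))
```

An age of 45 is a full member of Middle Age, ages close to 30 or 60 are members only to a small degree, and ages outside the interval from 30 to 60 have grade 0.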

 

 

 

4. Fuzzy Query Processing

 

Continuing the example of individuals, we can consider their other linguistic (imprecise) attributes, such as Height and Weight, which are associated with linguistic labels (fuzzy sets) such as Short, Middle Height, Tall and Light, Middle Weight, Heavy, respectively. If we want to identify individuals that are stored as objects (records) with the corresponding numerical attributes in a relational database, we can execute queries that combine any three fuzzy sets, one taken from each of the feature variables Age, Height and Weight. For instance, we pose the following query request when looking for candidates for a basketball team

 

Select *

From Fuzzy

where Age is Young and Height is Tall and Weight is Light

 

Here the records (individuals) are to be selected from the relation Fuzzy according to three fuzzy predicates linked by the Boolean AND (intersection) operation.

 

This search criterion involves only three fuzzy sets, but fuzzy query processing may involve many more, which is one of the issues associated with access processing. The query request is interpreted on the basis of the membership functions for the Young, Tall and Light labels defined in Figure 2. The membership function of each label identifies the people within its support, that is, those for whom $\mu_Y(a) > 0$, $\mu_T(h) > 0$ and $\mu_L(w) > 0$.

 

Standard SQL, the query language currently used in relational databases, does not support such imprecise queries over crisp data, because its grammar does not allow imprecise predicates. A few extensions to the SQL language, such as QUEL and SQLf, have been proposed in which fuzzy queries are consistent with previous fuzzy query grammars.

 

We suggest an extended SQL query language in which fuzzy query processing is adapted to the classification of a population stored in a crisp relational database. We describe an implementation of fuzzy query processing in a relational database structure. For the experimental prototype we have limited the population to 70 records and the crisp data to integer values.

 

The extended SQL-based interface, which handles interaction with the database records, contains some additional manipulation (filtering) primitives. These primitives provide efficient evaluation of the membership functions and of the fuzzy logic operations (union and intersection) that combine them. The database is content-addressable, meaning that individuals (records) can be identified and retrieved on the basis of the content of any attribute or combination of attributes. Accordingly, one or several numeric attributes are used to build an additional index structure based on the evaluation of the relevant membership functions for each record (individual).

 

Thus, for all the records (individuals) in the database, the membership function values in all the fuzzy sets are calculated with respect to the linguistic labels above (see Figure 2). This approach has the advantage that the membership functions are evaluated only once (i.e., when data are added to or updated in the database), which avoids runtime evaluation during query processing and provides fast access.
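The idea of evaluating membership functions once, when a record is added or updated, can be sketched as follows. This is a minimal illustration and not the prototype's code: the class `FuzzyIndex`, the helper functions `falling` and `rising`, and all breakpoints for Young, Tall and Light are assumptions, since the shapes in Figure 2 are not reproduced here.

```python
from dataclasses import dataclass, field


def falling(x: float, full: float, zero: float) -> float:
    """Left shoulder: grade 1 up to `full`, decreasing linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x: float, zero: float, full: float) -> float:
    """Right shoulder: grade 0 up to `zero`, increasing linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


# Assumed label definitions: label -> (attribute name, membership function).
LABELS = {
    "Young": ("age", lambda a: falling(a, 25, 35)),
    "Tall": ("height", lambda h: rising(h, 170, 190)),
    "Light": ("weight", lambda w: falling(w, 60, 80)),
}


@dataclass
class FuzzyIndex:
    """Crisp records plus their precomputed membership grades."""
    records: dict = field(default_factory=dict)
    grades: dict = field(default_factory=dict)

    def upsert(self, rec_id, record):
        # Grades are computed here, at insert or update time, not at query time.
        self.records[rec_id] = record
        self.grades[rec_id] = {
            label: mu(record[attr]) for label, (attr, mu) in LABELS.items()
        }


index = FuzzyIndex()
index.upsert(1, {"age": 30, "height": 180, "weight": 70})
first_grades = dict(index.grades[1])
index.upsert(1, {"age": 22, "height": 180, "weight": 70})  # update recomputes grades
print(first_grades, index.grades[1])
```

The trade-off in this design is extra storage and work on every write in exchange for queries that only read stored grades.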

 

 

 

 

Figure 2. Membership functions for fuzzy sets: (a) for Age; (b) for Height; (c) for Weight.

 

 

We now consider the evaluation of the degree to which each record (individual) satisfies the search criterion of the query presented above. As noted above, this evaluation means assessing the three atomic queries (fuzzy predicates) “Age is Young”, “Height is Tall” and “Weight is Light” and then taking the Boolean combination (aggregation) of these evaluations. As the aggregation function we use the conjunction rule, which logically conjoins the grades $\mu_Y(a)$, $\mu_T(h)$ and $\mu_L(w)$ assigned to the respective atomic queries. It is thus based on a pointwise implementation of the intersection operation. The conjunction rule, which expresses the degree to which an individual satisfies the search criterion, is:

$$
\mu_{YTL}(a, h, w) = \mu_Y(a) \ \text{AND}\ \mu_T(h) \ \text{AND}\ \mu_L(w) = \min\{\mu_Y(a), \mu_T(h), \mu_L(w)\}
$$

where $A$, $H$ and $W$ are the universes of discourse for the Age ($a$), Height ($h$) and Weight ($w$) attributes, respectively, with $a \in A$, $h \in H$ and $w \in W$, and $Y$, $T$ and $L$ are fuzzy sets in $A$, $H$ and $W$, respectively.

 

Therefore, by using the conjunction of

$$\mu_Y : A \to [0,1], \quad \mu_T : H \to [0,1] \quad \text{and} \quad \mu_L : W \to [0,1]$$

over all the individuals (whose grades are held in the additional index structure), the records that satisfy all the fuzzy atomic queries, that is, those for which all membership function values are greater than zero, are selected. The population in the database can be selected in the same way using the other fuzzy sets, namely:

for age $a$: MA (Middle Age) and O (Old), fuzzy sets in $A$

for height $h$: S (Short) and MH (Middle Height), fuzzy sets in $H$

for weight $w$: MW (Middle Weight) and H (Heavy), fuzzy sets in $W$

 

As a result, the records (individuals) selected by the fuzzy query can be listed in order of the degree $\mu_{YTL}(a, h, w)$ to which they meet the given search criterion. The listing begins with the record that best meets the criterion and proceeds in descending order.
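The selection and ranking step can be sketched in a few lines of Python. The record identifiers and grade values below are invented for illustration, and the function name `fuzzy_select` is hypothetical; only the use of the minimum as the conjunction, the exclusion of records with degree zero and the descending order follow the text.

```python
# Hypothetical precomputed grades, as they might be held in the index structure.
grades = {
    "r1": {"Young": 0.9, "Tall": 0.6, "Light": 0.8},
    "r2": {"Young": 1.0, "Tall": 0.0, "Light": 0.7},
    "r3": {"Young": 0.4, "Tall": 1.0, "Light": 0.5},
    "r4": {"Young": 0.7, "Tall": 0.7, "Light": 0.9},
}


def fuzzy_select(grades, labels):
    """Rank records by the minimum of their grades over the requested labels."""
    scored = (
        (rec_id, min(g[label] for label in labels))
        for rec_id, g in grades.items()
    )
    # Keep only records whose overall degree of satisfaction is greater than zero.
    hits = [(rec_id, degree) for rec_id, degree in scored if degree > 0.0]
    # The best match comes first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


ranking = fuzzy_select(grades, ["Young", "Tall", "Light"])
for rec_id, degree in ranking:
    print(rec_id, degree)
```

Record r2 is dropped because its grade for Tall is zero, even though it is fully Young. Taking a slice such as `ranking[:2]` would return a fixed number of best-matching records.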

 

Thus, the described fuzzy query processing carries out a fuzzy classification scheme and provides fuzzy clustering, in which an individual may belong to a group with a partial membership grade. Ranking the individuals (objects) within a cluster by their degree of satisfaction of the given criterion makes it possible to select a specified number of them.

 

The proposed fuzzy query processing for a relational database has been implemented as an experimental prototype using extended SQL with the additional manipulation primitives noted above. These primitives were developed using Delphi 4.0 in the environment of the Dbase–5 database system. The implementation proved successful and efficient, as shown by the result of the search and retrieval process presented in Figure 3.

 

 

Figure 3.   Computer output for query processing

 

 

As can be seen in Figure 3, the search criterion is established by setting the buttons for the Age, Height and Weight attributes according to the relevant fuzzy predicates. As a result, the records (individuals) are presented in a table in which they are listed by their grade of satisfaction of the entered criterion.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

REFERENCES

 

1. Ali M. Abbasov, Masuma G. Mamedova, Mubariz E. Eminov, Synthesis of the Fuzzy System for Automatic Control over the Complex Objects, 6th International Conference, Machine Design and Production, UMTIK-94, 21-23 September 1994, Ankara, Turkey.

2. Mansfield W.H. and Fleischman R.M., A High Performance, Ad Hoc, Fuzzy Query Processing System, Journal of Intelligent Information Systems, Vol.2., No., November 1993, pp.397-418

3. R.Fagin, S.Jose, Fuzzy Queries in Multimedia Database Systems, Proceedings: ACM Sigact-Sigmod-Sigart Symposium on Principles of Database Systems, 1998.

4. Yan Jan, Michael Ryan, Using Fuzzy Logic: Towards Intelligent Systems, Prentice Hall, 1994.

5. Yager, R.R., and Larsen H.L., Retrieving Information by Fuzzification of Queries, Journal of  Intelligent Information Systems, Vol.2, No:4, November 1993, pp.421-441

6. Zadeh, L.A., Fuzzy Set Theoretic interpretation of linguistic hedges, Journal of Cybernetics, Vol. 2, pp 4-34, 1972

7. Zadeh, L.A., Similarity Relations and Fuzzy Ordering, Journal of Information Sciences, Vol.3, pp.177-200, 1971

8. Zadeh, L.A., Fuzzy Sets, International Journal of Information and Control, Vol. 8, pp.338-353, 1965