The idea is that, in many cases, the values of an attribute domain can be
clustered because they are strongly related according to some kind of "hidden
relationship".
By giving a name to each of these clusters, we can refer to a relevant value
name that encompasses a set of values.
More formally, given a class/table C and one of its attributes At, a relevant
value for it, rv_At, is a pair
rv_At = <rvn_At, values_At>
where rvn_At is the name of the relevant value, while values_At is the set of values referring to it.
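The pair above maps naturally onto a small record type. The following is only a minimal sketch of that idea: the class name `RelevantValue`, the `covers` helper and the sample data are assumptions made for illustration, not part of the RELEVANT implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class RelevantValue:
    """A relevant value rv_At = <rvn_At, values_At> for one attribute."""

    name: str               # rvn_At: the name given to the cluster
    values: FrozenSet[str]  # values_At: the attribute values it encompasses

    def covers(self, value: str) -> bool:
        """Return True if the given attribute value belongs to this cluster."""
        return value in self.values


# Hypothetical data: some values of a "category" attribute of a product table.
footwear = RelevantValue(
    name="Footwear",
    values=frozenset({"running shoes", "boots", "sandals"}),
)

print(footwear.covers("boots"))   # membership test against values_At
print(footwear.covers("laptop"))
```

A user who only knows the name "Footwear" can then refer to all the underlying values at once, which is the purpose of naming the cluster.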
Where?
Publications about Relevant
S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori, M. Vincini:
"RELEVANT News: a semantic news feed aggregator",
4th Workshop on Semantic Web Applications and Perspectives (SWAP 2007),
Bari, Italy, December 18-20, 2007
S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: "Extracting Relevant Attribute
Values for Improved Search", IEEE Internet Computing, vol. 11, no. 5, pp. 26-35, Sept/Oct 2007
(special issue on Semantic-Web-Based Knowledge Management), ISSN 1089-7801
S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: "A new type of metadata for
querying data integration systems", Proceedings of the Convegno
Nazionale Sistemi di Basi di Dati Evolute (SEBD2007), Torre Canne (Fasano,
BR), 17-20 June 2007, pp. 266-273, ISBN 978-88-902981-0-3
S. Bergamaschi, F. Guerra, M. Orsini, C. Sartori: "Relevant values: new metadata
to provide insight on attribute values at schema level", Proceedings
of the 9th International Conference on Enterprise Information
Systems, Funchal, Madeira, Portugal, 12-16 June 2007, pp. 274-279, ISBN
978-972-8865-88-7
We started researching this topic some years ago. The Relevant idea and
implementation started in December 2005.
Why?
Our research is motivated by the idea that knowledge of the metadata
describing a database (table names, attribute names and domains, ...) is
often not enough for writing a query, especially in a data integration
environment like MOMIS.
Integration puts together in the same global class a number of local
semantically similar classes coming from different sources.
The name/description of a global class/global attribute is often generic, which
significantly limits the effectiveness of querying.
Not knowing the values assumed by a global attribute may lead to meaningless,
overly selective or empty queries.
Knowing all the data collected in a global class is infeasible for a user:
databases contain large amounts of data that a user cannot deal with.
A metadata structure derived from an analysis of the attribute extension could
be of great help in overcoming this limitation. Such metadata represent a
synthesized extensional knowledge emerging from the attribute values. They are
“relevant values” because they give users a synthetic description of the
values of the attribute by representing its domain with a reduced number of
values.
How?
To generate relevant values we have to address two issues:
How can we cluster the values of the domain so that each relevant value
groups a set of strongly related values?
By means of data mining and clustering techniques, adapted to the problem at
hand. The techniques take into account some semantics extracted from:
the syntax of the values: values related to the same object may have the
same etymology and thus share a common root;
the dominance, which discovers values that are more general than others;
the lexical meaning w.r.t. WordNet, which identifies semantically
related values expressed with a different terminology.
How can we choose the relevant value names?
This will require the intervention of the designer, but we will provide an
effective assistant.