VN-KIM KBM: A Distributed and Collective Tool for Managing Semantic Web Knowledge Bases
Dat T. Huynh, Tru H. Cao, Hung Q. Ta, and Le H. Nguyen
Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Vietnam
{htdat@cse.hcmut.edu.vn, tru@cse.hcmut.edu.vn}

Abstract. This paper introduces VN-KIM KBM, a software tool for managing a knowledge base on the Sesame platform, including adding, deleting, and modifying named entities and their relations. In contrast to Protégé, besides the usual functionalities for knowledge base management, the tool allows users to work on selected portions of a knowledge base and to do so off-line before actually updating the knowledge base. It is therefore useful for dealing with a large knowledge base and for avoiding accidental errors caused by direct access to the whole knowledge base.
1 Introduction
The advent of the semantic web has motivated the development of languages for describing the semantics of web pages, such as DAML+OIL ([6]) and OWL ([1]). Also, various open source software systems for managing knowledge bases have been implemented and used in large-scale systems, such as Sesame ([3]), Jena ([8]), and Kaon ([2]), allowing users to manage and query knowledge bases. Among those, Sesame is considered a good platform for the storage and management of knowledge bases in RDF/RDFS format and has been widely used in semantic web systems.

On top of such knowledge base management systems, tools with friendlier interfaces and functionalities for building and maintaining knowledge bases have been developed, such as OntoEdit ([12]), GKB-Editor ([11]), and Protégé ([10]). Protégé is well known for creating ontologies and knowledge bases¹ via a rich and easy-to-use graphical user interface.

However, a tool like Protégé has a significant drawback: it loads the whole knowledge base under construction, which may be huge, into system memory and manipulates it directly. As such, it consumes memory unnecessarily when only portions of the knowledge base are being built or updated. It also does not support building and maintaining a knowledge base in a distributed and collective way by a team of users. Moreover, accessing the whole knowledge base is not really safe, as accidental errors may occur.

Therefore, we have designed and developed a tool, called VN-KIM KBM, which allows a user to specify and download a portion of a knowledge base, edit it off-line, and upload it back to the knowledge base. Like other existing tools, it also provides the usual functionalities for creating an ontology and for adding, deleting, and modifying named entities (NE) and their relations. The tool has been employed for constructing and managing the ontology and knowledge base of our Vietnamese semantic web system VN-KIM ([9]).
¹ In this paper, we use the term ontology for class hierarchies of entities and relations, and the term knowledge base for populated instances of an ontology.
Section 2 briefly presents the basic features of Sesame and Protégé. Section 3 introduces VN-KIM and highlights essential features of VN-KIM KBM. Finally, Section 4 concludes the paper with some remarks.
2 Basic Features of Sesame and Protégé
2.1 Sesame

Sesame is an open source Java project for storing, managing, reasoning about, and querying RDF/RDFS knowledge bases. It lets users store data in various storage systems such as files, memory, and open source DBMSs (e.g. PostgreSQL, MySQL). In addition, Sesame provides a library of APIs for accessing and manipulating RDF/RDFS knowledge bases. It runs on different platforms and can be regarded as middleware providing services to manage and retrieve data in RDF/RDFS format. Sesame also offers a flexible communication model, supporting remote access via standard web protocols like HTTP and SOAP. Sesame is under continued development and is widely used in the knowledge base servers of large-scale semantic web systems.

Besides the above-mentioned essential features for storage and communication, Sesame also provides modules for querying, administration, security, and versioning. Sesame is equipped with query languages such as SPARQL and SeRQL. In particular, SeRQL (Sesame RDF Query Language) is an RDF/RDFS query language that inherits the best features of RQL and RDQL. It supports subsumption queries, where a concept class or relation class in a query can match its subclasses in a knowledge base. In brief, Sesame offers a generic architecture and is a good candidate for storing, managing, and retrieving large-scale knowledge bases on the web.

2.2 Protégé

Protégé is a multi-platform tool for building a knowledge base. It allows users to create an ontology for a specific domain and to enter instances into a knowledge base with respect to that ontology. With a friendly and flexible interface, it enables users to model knowledge and customize data entry for knowledge acquisition. At the time of writing, the current stable version of Protégé is 3.3.1. It supports the construction of knowledge bases in various storage formats such as OWL, RDF/RDFS, and XML. Moreover, users can install additional plug-ins for visualization of the ontology and knowledge base, inference and reasoning, etc.

Fig. 2.1 shows the main graphical user interface of Protégé. The tab “Classes” is for editing a set of classes in a hierarchy. Users can create a new class, delete an existing class, or view detailed information about it through a set of function icons on the left panel. Next, the tab “Slots” provides the functions for editing the property and relation classes in the ontology. The tab “Forms” allows users to customize data entry forms for each class, which are then used for entering instances of that class in the tab “Instances”. Finally, the tab “Queries” is used to define query templates and search the currently edited knowledge base.
Fig. 2.1. Protégé main graphic user interface
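The subsumption matching described for SeRQL in Section 2.1 can be illustrated with a minimal sketch. It does not use Sesame or SeRQL itself; the class hierarchy, the instances, and the function names `is_subclass_of` and `instances_of` are hypothetical and serve only to show that a query on a class also matches instances of its subclasses.

```python
# Hypothetical class hierarchy: child -> parent (not taken from the VN-KIM ontology).
SUBCLASS_OF = {
    "City": "Location",
    "Country": "Location",
    "Location": "Entity",
    "Person": "Entity",
}

# Hypothetical instances: entity identifier -> its asserted class.
INSTANCES = {
    "Hanoi": "City",
    "Vietnam": "Country",
    "Nguyen_Du": "Person",
}


def is_subclass_of(cls, ancestor):
    """Return True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)  # climb one level; None at the root
    return False


def instances_of(query_class):
    """Subsumption query: match instances of query_class and of all its subclasses."""
    return sorted(e for e, c in INSTANCES.items() if is_subclass_of(c, query_class))


print(instances_of("Location"))  # ['Hanoi', 'Vietnam']
```

A query on "Location" returns the city and the country even though neither is asserted directly as a "Location", which is the behaviour that subsumption querying provides.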
3 VN-KIM Knowledge Base Management System
3.1 VN-KIM KBM architecture

Inspired by the KIM system ([7]), we have developed VN-KIM for Vietnamese web pages. Nevertheless, our research results are directly applicable to, or can be adapted for, web pages in other languages such as English. VN-KIM can recognize named entities in Vietnamese web pages and annotate their features, i.e., classes and identifiers, in those web pages for further processing, in particular semantic searching ([4]) and clustering ([5]). The current precision and recall measures of VN-KIM for NE recognition are about 85%. At present, the VN-KIM ontology has 370 classes and 115 properties, and its populated knowledge base has over 200,000 entities, about 60% of which are entities in Vietnam and the rest elsewhere in the world.

Due to the above-mentioned disadvantage of Protégé, we have designed and implemented VN-KIM KBM as a tool to build and maintain the VN-KIM knowledge base. Instead of loading the whole knowledge base into system memory, VN-KIM KBM allows a user to choose one of its portions, which we call a project, to edit. The user can then add, delete, or update named entities in that portion off-line, generating a script file of such actions. After verifying the modified portion, the user can upload it back to the knowledge base at the central Sesame server by running the recorded script.

Each project is specified by a set of NE classes of the objects to be edited, which can be both read and written, and the set of all NE classes that have relations with the edited NE classes, which are read-only. In this way, the task of constructing a knowledge base can be shared by different users at the same time.

In this paper, the examples are on the VN-KIM ontology and knowledge base, so data are displayed in Vietnamese in the illustrating screenshots. However, VN-KIM KBM is a general tool, which can run on knowledge bases in other languages as well. That is why its menu items, panel titles, and function names are designed and displayed in English.

Figure 3.1 illustrates the idea of cutting a knowledge base into projects. Figure 3.2 outlines the main modules of VN-KIM KBM.
Fig. 3.1. The boundary of a VN-KIM KBM project (legend: a project, edited class, related class, class relation)
Fig. 3.2. VN-KIM KBM architecture (components: User Interface, Project Management, Project Generation, Script Execution, Project files, Sesame)
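The off-line editing cycle of Section 3.1, in which edit actions are recorded in a script and later run against the central knowledge base, can be sketched as follows. This is only an illustration under stated assumptions: the script format (a JSON list of actions), the function names `record` and `run_script`, and the use of a plain dictionary in place of the Sesame repository are all hypothetical and are not the actual VN-KIM KBM implementation.

```python
import json


def record(script, action, entity_id, data=None):
    """Append one edit action to the off-line script (a plain list of dicts)."""
    if action not in ("add", "update", "delete"):
        raise ValueError(f"unknown action: {action}")
    script.append({"action": action, "id": entity_id, "data": data})


def run_script(script, store):
    """Replay the recorded actions, in order, against the central store."""
    for step in script:
        if step["action"] == "delete":
            store.pop(step["id"], None)
        elif step["action"] == "add":
            store[step["id"]] = dict(step["data"])
        else:  # update: merge new property values into the existing entity
            store.setdefault(step["id"], {}).update(step["data"])
    return store


# Off-line editing session: nothing touches the central store yet.
script = []
record(script, "add", "Hue", {"class": "City", "locatedIn": "Vietnam"})
record(script, "update", "Hanoi", {"label": "Ha Noi"})
record(script, "delete", "Obsolete_Entity")

# The script can be saved to a file and reloaded later (here: a JSON string).
saved = json.dumps(script)

# Upload step: replay against a dict standing in for the Sesame repository.
central = {"Hanoi": {"class": "City"}, "Obsolete_Entity": {"class": "City"}}
run_script(json.loads(saved), central)
print(sorted(central))  # ['Hanoi', 'Hue']
```

Keeping the edits as an ordered script, rather than writing them through immediately, is what lets the user verify the modified portion before anything reaches the central server.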
3.2 VN-KIM KBM project editor

Figure 3.3 shows the main user interface of the VN-KIM KBM project editor. After providing a username and password for authentication, the user is required to choose the knowledge base repository from which he/she wants to extract a project to edit. VN-KIM KBM then loads and displays the ontology from the repository to allow the user to choose the edited and related classes for the project, as shown in Figure 3.4. The user starts by selecting the main classes, i.e., the classes to be edited. For each edited class, its properties are shown in the “Class Properties” panel, and its related classes in the “Project Related Classes” panel, for the user to select the ones of concern. The “Project Information” panel summarizes the numbers of main classes, related classes, properties, and entities involved in the project.
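As a rough sketch of this selection step, the following code derives the related classes from the chosen properties of the main classes and computes a summary like the one in the “Project Information” panel. The ontology fragment, the entity counts, and the function `build_project` are hypothetical; the sketch also assumes, for simplicity, that each property has a single domain class and a single range class.

```python
# Hypothetical ontology fragment: property -> (domain class, range class).
PROPERTIES = {
    "locatedIn": ("City", "Country"),
    "bornIn": ("Person", "City"),
    "worksFor": ("Person", "Organization"),
}

# Hypothetical number of entities per class in the repository.
ENTITY_COUNTS = {"City": 3, "Country": 2, "Person": 5, "Organization": 4}


def build_project(main_classes, chosen_properties):
    """Derive the related (read-only) classes and a summary for a project."""
    main = set(main_classes)
    related = set()
    for prop in chosen_properties:
        domain, range_ = PROPERTIES[prop]
        if domain not in main:
            raise ValueError(f"{prop} is not a property of a main class")
        if range_ not in main:
            related.add(range_)  # reachable through a chosen property
    return {
        "main_classes": sorted(main),
        "related_classes": sorted(related),
        "properties": sorted(chosen_properties),
        "entities": sum(ENTITY_COUNTS[c] for c in main | related),
    }


project = build_project(["Person"], ["bornIn", "worksFor"])
print(project["related_classes"])  # ['City', 'Organization']
print(project["entities"])         # 12
```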
Fig. 3.3. VN-KIM KBM main graphic user interface
Fig. 3.4. Selection of main and related classes for a VN-KIM KBM project
Fig. 3.5. Editing and uploading a project in VN-KIM KBM
After the project is created, VN-KIM KBM downloads it to the user’s local computer, and the user can edit the named entities involved in the project via an interface with rich functionalities, as shown in Figure 3.5. In the interface, main classes, whose entities may be modified, are marked in green, while related classes, whose entities may not be modified, are marked in red. After finishing editing the project, the user can save it for further local changes or upload it to the central knowledge base managed by Sesame.
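The distinction between editable main classes and read-only related classes can be sketched as a simple guard on updates. The class `ProjectEditor`, its methods, and the exception `ReadOnlyClassError` are hypothetical names introduced for illustration, not part of VN-KIM KBM.

```python
class ReadOnlyClassError(Exception):
    """Raised when an edit targets an entity of a related (read-only) class."""


class ProjectEditor:
    def __init__(self, main_classes, related_classes):
        self.main_classes = set(main_classes)
        self.related_classes = set(related_classes)
        self.entities = {}  # entity id -> {"class": ..., other properties}

    def load(self, entity_id, cls, **props):
        """Load a downloaded entity into the local project copy."""
        if cls not in self.main_classes | self.related_classes:
            raise ValueError(f"class {cls} is outside the project")
        self.entities[entity_id] = {"class": cls, **props}

    def update(self, entity_id, **props):
        """Modify an entity only if it belongs to a main class."""
        cls = self.entities[entity_id]["class"]
        if cls not in self.main_classes:
            raise ReadOnlyClassError(f"{entity_id} belongs to read-only class {cls}")
        self.entities[entity_id].update(props)


editor = ProjectEditor(main_classes=["City"], related_classes=["Country"])
editor.load("Hanoi", "City")
editor.load("Vietnam", "Country")
editor.update("Hanoi", locatedIn="Vietnam")     # allowed: City is a main class
try:
    editor.update("Vietnam", label="Viet Nam")  # rejected: Country is read-only
except ReadOnlyClassError as err:
    print(err)
```

Entities of related classes remain available locally so that relations from the edited entities can point at them, but any attempt to change them is refused before it can reach the edit script.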
4 Conclusion
We have presented the main features of VN-KIM KBM as a tool to construct and manage RDF/RDFS ontologies and knowledge bases for the semantic web. Its development has been motivated by a drawback of the similar tool Protégé, which requires loading and editing a whole knowledge base at once. In contrast, VN-KIM KBM works in a distributed and collective way, so that each person in a team can edit part of the knowledge base off-line and then upload it to the central Sesame server. To that end, we have introduced the notion of projects, each of which consists of the main classes and their chosen properties to be edited, and the related classes with respect to those properties. We are enhancing VN-KIM KBM to support OWL. Also, we are currently investigating problems related to knowledge base maintenance, such as duplicate and consistency checking, even within a project, when knowledge is acquired from different sources.
References
1. Bechhofer, S., et al.: OWL Web Ontology Language Reference. W3C Recommendation (2004).
2. Bozsak, E., et al.: Kaon - Towards a Large Scale Semantic Web. In: Proceedings of EC-Web, LNCS 2455, Springer (2002) 304-313.
3. Broekstra, J., Kampman, A., Harmelen, F.V.: Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In: Proceedings of the 1st International Semantic Web Conference, Springer-Verlag (2002) 54-68.
4. Cao, T.H., Le, K.C., Ngo, V.M.: Exploring Combinations of Ontological Features and Keywords for Text Retrieval. In: Proceedings of the 10th Pacific Rim Intl Conference on Artificial Intelligence, Springer-Verlag (2008) to appear.
5. Cao, T.H., Do, H.T., Hong, D.T., Quan, T.T.: Fuzzy Named Entity-Based Document Clustering. In: Proceedings of the 17th IEEE International Conference on Fuzzy Systems (2008) 2028-2034.
6. Horrocks, I.: DAML+OIL: A Description Logic for the Semantic Web. In: IEEE Bulletin of the Technical Committee on Data Engineering, 25(1) (2002) 4-9.
7. Kiryakov, A., Popov, B., Terziev, I., Manov, D., Ognyanoff, D.: Semantic Annotation, Indexing, and Retrieval. In: Journal of Web Semantics, 2 (2005).
8. McBride, B.: Jena: A Semantic Web Toolkit. In: IEEE Internet Computing, 6(6) (2002) 55-59.
9. Nguyen, V.T.T., Cao, T.H.: VN-KIM IE: Automatic Extraction of Vietnamese Named-Entities on the Web. In: Journal of New Generation Computing, 25(3) (2007) 277-292.
10. Noy, N.F., et al.: Creating Semantic Web Contents with Protege-2000. In: IEEE Intelligent Systems, 16(2) (2001) 60-71.
11. Paley, S.M., Lowrance, J.D., Karp, P.D.: A Generic Knowledge-Base Browser and Editor. In: Proceedings of the 1997 National Conference on Artificial Intelligence (1997) 1045-1051.
12. Sure, Y., Erdmann, M., Angele, J., Staab, S., Studer, R., Wenke, D.: OntoEdit: Collaborative Ontology Development for the Semantic Web. In: Proceedings of the 1st International Semantic Web Conference (2002) 221-235.