
Extracting Arabic Relations from the Web


Affiliations
1 Department of Systems and Information, Engineering Division, National Research Centre, Cairo, Egypt
2 Department of Electrical Engineering, Benha University, Egypt
 

There is a vast amount of unstructured Arabic information on the Web; this data is organized as semi-structured text and cannot be used directly. This research proposes a semi-supervised technique that extracts binary relations between two Arabic named entities from the Web. Several studies have addressed relation extraction from Latin-script texts, but as far as we know, none applies a semi-supervised technique to Arabic text. The goal of this research is to extract a large list or table of named entities and their relations in a specific domain. The user supplies only a handful of relation instances as input. The system uses summaries from the Google search engine as its source text. The input instances are used to extract patterns, and the output is a set of new entities and their relations. The results from four experiments show that precision and recall vary according to relation type. Precision ranges from 0.61 to 0.75, while recall ranges from 0.71 to 0.83. The best result is obtained for the (player, club) relation, with a precision of 0.72 and a recall of 0.83.

Keywords

Relation Extraction, Information Extraction, Pattern Extraction, Semi-Supervised, Arabic language and Web Mining.
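
The seed-to-pattern-to-new-instance loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-word entity assumption, the 30-character gap limit, and the toy snippets (which stand in for search-engine summaries) are all assumptions made for illustration only.

```python
import re

# Toy snippets standing in for search-result summaries (made up, not from the paper).
SNIPPETS = [
    "ميسي يلعب في برشلونة",
    "رونالدو يلعب في يوفنتوس",
    "القاهرة عاصمة مصر",
]
# One seed (player, club) instance, as a user might supply it.
SEEDS = [("ميسي", "برشلونة")]


def extract_patterns(seeds, snippets, max_gap=30):
    """Return the text found between the two entities of each seed pair."""
    patterns = set()
    for first, second in seeds:
        gap = re.compile(re.escape(first) + "(.{1,%d}?)" % max_gap + re.escape(second))
        for snippet in snippets:
            match = gap.search(snippet)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(patterns, snippets):
    """Use each middle context as a template to find entity pairs.

    Assumption: each entity is a single word, matched with \\w+.
    """
    pairs = set()
    for middle in patterns:
        template = re.compile(r"(\w+)" + re.escape(middle) + r"(\w+)")
        for snippet in snippets:
            for first, second in template.findall(snippet):
                pairs.add((first, second))
    return pairs


patterns = extract_patterns(SEEDS, SNIPPETS)
# Keep only pairs that were not already given as seeds.
new_pairs = apply_patterns(patterns, SNIPPETS) - set(SEEDS)
print(new_pairs)
```

A real system would need multi-word entity recognition, pattern scoring, and repeated iterations; the sketch only shows how one seed instance can yield a pattern that surfaces a new instance.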

Abstract Views: 334

PDF Views: 148





Authors

Shimaa M. Abd El-Salam
Department of Systems and Information, Engineering Division, National Research Centre, Cairo, Egypt
Enas M. F. El Houby
Department of Systems and Information, Engineering Division, National Research Centre, Cairo, Egypt
A. K. Al Sammak
Department of Electrical Engineering, Benha University, Egypt
T. A. El-Shishtawy
Department of Electrical Engineering, Benha University, Egypt
