Author: Chou, Shih-Yao
Thesis Title: Automatic Content Analysis of Legislative Documents by Text Mining Techniques
Advisor: Lin, Fu-Ren
Committee Members: Ray, Soumya
Cheng, Hsing
Keywords: text mining; SVM; legislative performance; classification; two-stage clustering
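This study builds on a legislative classification framework constructed by experts at the Institute of Political Science, National Sun Yat-sen University. It uses two-stage clustering for feature extraction and then applies a support vector machine (SVM) to build a model that automatically assigns legislators' legislative performance to the most suitable category.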
The Parliamentary Library on the website of Taiwan’s Legislative Yuan provides a fair and objective channel for the public to track the daily activities of the Legislative Yuan and legislators’ inquiries. However, the volume of generated documents is so large that the general public may be unable to keep up with the legislative performance of each legislator from these contents. To narrow the gap between the generation of legislative documents and the public's ability to make sense of them, this study proposes a text mining mechanism that automatically classifies the legislative documents associated with each legislator and then presents the proportion of each legislator's legislative performance across categories.
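As a rough illustration of the reporting step described above, the per-category proportions for each legislator could be computed from classifier output along the following lines. This is a minimal sketch, not the thesis's implementation: the function name `category_proportions`, the legislator identifiers, and the category names are all hypothetical.

```python
from collections import Counter, defaultdict


def category_proportions(predictions):
    """Map each legislator to the share of their documents in each category.

    `predictions` is an iterable of (legislator, predicted_category) pairs,
    one pair per classified document.
    """
    counts = defaultdict(Counter)
    for legislator, category in predictions:
        counts[legislator][category] += 1

    proportions = {}
    for legislator, counter in counts.items():
        total = sum(counter.values())
        proportions[legislator] = {cat: n / total for cat, n in counter.items()}
    return proportions


# Hypothetical classifier output; names and categories are illustrative only.
sample = [
    ("legislator_A", "finance"),
    ("legislator_A", "finance"),
    ("legislator_A", "education"),
    ("legislator_B", "defense"),
]
result = category_proportions(sample)
```

Proportions in this shape could feed a per-legislator chart with one axis per category.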
This study first had domain experts establish a basic legislative categorical structure. A two-stage clustering was then applied to perform feature selection on legislative documents, and the SVM method was applied to build a model that classifies each new document into the appropriate category.
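The two-stage clustering idea can be sketched as follows. The abstract does not say how the two stages are coupled, so this sketch assumes one common arrangement: hierarchical clustering (stage 1) produces initial centroids, and k-means (stage 2) refines them. Centroid linkage, Euclidean distance, the function names, and the toy 2-D vectors are assumptions made for illustration. The SVM classification step is not covered here.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def hierarchical_seeds(points, k):
    """Stage 1 (assumed): merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda c: dist(point, centers[c]))


def kmeans(points, centers, iterations=10):
    """Stage 2 (assumed): k-means seeded with the stage 1 centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group ends up empty.
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Toy feature vectors standing in for keyword or document vectors.
vectors = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
seeds = hierarchical_seeds(vectors, k=2)
centers, labels = kmeans(vectors, seeds)
```

Seeding k-means from a hierarchical pass avoids random initialization, which is one reason the two methods are often paired.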
To keep the classification categories up to date, this study also evaluates the difference between labels assigned by domain experts and those assigned by the general public. If the two sets of labels do not differ significantly, the general public can be invited via the Internet to maintain the categories of newly generated legislative documents.
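The abstract does not name the statistic used to compare expert and public labels. As one possible way to quantify their agreement, the sketch below computes Cohen's kappa, which corrects raw agreement for agreement expected by chance. The choice of kappa, the function name `cohen_kappa`, and the sample labels are assumptions for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of documents given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six documents (illustrative only).
expert = ["finance", "education", "finance", "defense", "education", "finance"]
public = ["finance", "education", "defense", "defense", "education", "finance"]
kappa = cohen_kappa(expert, public)
```

A kappa near 1 indicates strong agreement, and a value near 0 indicates agreement no better than chance.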
Experimental results show the effectiveness of the proposed text mining mechanism, which automatically classifies legislative documents to reveal legislators’ performance. With this result, people can monitor legislators and track their legislative activities using information from the Parliamentary Library of the Legislative Yuan, and update their perception of legislative performance in various categories.
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objective
Chapter 2 Literature Review
2.1 Lack of Research for Information Technique in Political Science
2.2 Legislative Categorical Structure Initialization
2.3 Support Vector Machine (SVM)
2.4 Two-stage Clustering
2.4.1 Stage 1 (hierarchical clustering)
2.4.2 Stage 2 (k-means clustering)
Chapter 3 Research Framework
3.1 System Architecture
3.2 Pre-processing
3.3 Showing Relevant Keywords by Clustering
3.4 Categorical Labeling by Domain Experts
3.5 Automatic Classification
Chapter 4 System Implementation and Experimental Design
4.1 Data Sources
4.2 System Implementation
4.3 Evaluation Criteria
4.4 Experimental Design
4.4.1 Experiment A: The Evaluation of Classification Result
4.4.2 Experiment B: The Evaluation between Expert and Public Labeling
Chapter 5 Experimental Results
5.1 The Evaluation of Classification Results
5.2 The Comparison between Experts and Public Labeling
5.3 The Discussion of Experimental Results
5.4 The Legislators’ Performance Shown by Radar Chart
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
Appendix
