
Detailed Record

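Author (Chinese): 羅傑聘
Author (English): Lo, Chieh Pin
Thesis title (Chinese): 以協同濾波與內容導向濾波偵測雙盲論文作者身分
Thesis title (English): Detecting Authors of Double-Blind Papers by Collaborative Filtering and Content-based Filtering
Advisor (Chinese): 張正尚
Advisor (English): Chang, Cheng-Shang
Oral defense committee (Chinese): 張正尚
李端興
林華君
黃之浩
Oral defense committee (English): Chang, Cheng Shang
Lee, Duan Shin
Lin, Hwa Chun
Huang, Chih Hao
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Communications Engineering
Student ID: 102064529
Year of publication (ROC calendar): 104 (2015)
Graduation academic year: 103
Language: English
Number of pages: 25
Keywords (Chinese): 文字探勘
Keywords (English): Text Mining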
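Many academic conferences and journals use a double-blind peer review process to ensure that reviewing is fair. In double-blind peer review, neither the authors nor the reviewers of a paper know the other party's identity. Whether the authors' identities can actually be concealed effectively is, however, an interesting research question. We therefore pose the problem of detecting the authors of double-blind papers, and examine whether information from the authors' past publications can be used to identify them.

To solve this problem, we first collect a large number of papers from the arXiv e-print archive. Using the bag-of-words model, we extract feature terms from these papers to build a document-term matrix and an author-term matrix. With these two matrices, we propose three prediction methods, drawn from collaborative filtering and content-based filtering, to detect authors: (1) cosine similarity, (2) random walk on a bipartite graph, and (3) matrix factorization.

In the experiments, we compare the accuracy of these three methods. The results show that the cosine similarity method has the highest accuracy, 94%. Because the computation time of the cosine similarity method may be too long, we propose using MinHash to improve its efficiency. The results also show that authors with more than twenty past publications are detected by our system with a probability above 90%; in other words, double-blind peer review can hardly conceal the identities of authors who have many past publications.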
Many conferences and journals use double-blind peer review to ensure the fairness of the review process. In a double-blind peer review process, neither the authors' nor the reviewers' identities are revealed to the other party. One interesting research question is whether the double-blind review process can indeed conceal the authors' identities.

To address this question, we consider the author detection problem for double-blind papers: can the authors of a double-blind paper be detected using information from their past publications? To solve this problem, we first collect a large set of papers from arXiv. Based on the bag-of-words model, we parse these papers to extract the terms used by the authors and construct a document-term matrix and an author-term matrix. Using these matrices, we propose three prediction methods, drawn from collaborative filtering and content-based filtering, to detect the authors: (i) cosine similarity, (ii) random walk on the bipartite graph, and (iii) matrix factorization.
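
As an illustration of the first method, the following is a minimal sketch of ranking candidate authors by cosine similarity between an author-term matrix and the term vector of an anonymized paper. It is not the thesis's implementation: the whitespace tokenizer, the function names (`term_counts`, `build_author_term`, `rank_authors`), and the toy data are all assumptions made for this example, and the abstract does not state how terms are weighted.

```python
import math
from collections import Counter

def term_counts(text):
    """Bag-of-words vector: lowercase, split on whitespace, count terms.
    (Hypothetical tokenizer; the thesis's feature extraction may differ.)"""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def build_author_term(papers):
    """Build author-term rows by summing the term counts of each
    author's past papers. `papers` is a list of (authors, text) pairs."""
    author_term = {}
    for authors, text in papers:
        counts = term_counts(text)
        for author in authors:
            author_term.setdefault(author, Counter()).update(counts)
    return author_term

def rank_authors(author_term, blind_text):
    """Return authors sorted by decreasing similarity to the blind paper."""
    query = term_counts(blind_text)
    scores = {a: cosine(vec, query) for a, vec in author_term.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy training data, invented for illustration only.
papers = [
    (["alice"], "queueing network delay bound"),
    (["alice", "bob"], "network delay analysis"),
    (["carol"], "protein folding simulation"),
]
author_term = build_author_term(papers)
ranking = rank_authors(author_term, "delay bound in a queueing network")
```

Here `ranking` lists the candidate authors from most to least similar; a detector would report the top few entries as the predicted authors.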

We compare the accuracy of these three methods in our experiments. Experimental results show that the cosine similarity method has the highest accuracy, 94%. However, the computation time of the cosine similarity method might be too long. Therefore, we use MinHash to improve its efficiency. We also find that authors who wrote more than 20 papers have a more than 90% probability of being detected by our system; in other words, it is difficult to conceal the identities of authors who have published many papers in the past.
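
The abstract does not say how MinHash is combined with cosine similarity. The sketch below shows only the standard MinHash idea, under the assumption that it serves as a cheap pre-filter: each term set is compressed into a short signature, and the fraction of matching signature positions estimates the Jaccard similarity of the two sets. The seeded BLAKE2b hash family, the signature length, and the function names are choices made for this example.

```python
import hashlib

def _hash(term, seed):
    """One member of a seeded hash family (an assumption for this sketch)."""
    data = f"{seed}:{term}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(terms, num_hashes=128):
    """Signature = minimum hash value of the (non-empty) term set
    under each of `num_hashes` seeded hash functions."""
    return [min(_hash(t, seed) for t in terms) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates
    the Jaccard similarity of the underlying sets."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

def jaccard(a, b):
    """Exact Jaccard similarity, for comparison."""
    return len(a & b) / len(a | b)

# Toy term sets, invented for illustration only.
paper = {"queueing", "network", "delay", "bound"}
author = {"queueing", "network", "delay", "analysis"}
other = {"protein", "folding", "simulation"}

sig_paper = minhash_signature(paper)
est_same = estimate_jaccard(sig_paper, minhash_signature(author))
est_other = estimate_jaccard(sig_paper, minhash_signature(other))
```

Comparing fixed-length signatures costs the same regardless of vocabulary size, so candidates with a low estimate could be discarded before the exact cosine similarity is computed for the remainder.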
Contents
List of Figures
1 Introduction
2 Feature extraction
3 Prediction Methods
3.1 Cosine Similarity
3.1.1 MinHash
3.2 Random walk on the bipartite graph
3.2.1 Random walks with a fixed number of steps
3.2.2 Random walks with a stopping probability
3.3 Matrix factorization
4 Experimental results
4.1 Experimental Setup
4.1.1 Training Papers
4.1.2 Testing Papers
4.2 Results
4.2.1 Cosine Similarity
4.2.2 Random walk on the bipartite graph
4.2.3 Matrix factorization
4.2.4 Comparison
5 Conclusion