by Daichi Mochihashi, The Institute of Statistical Mathematics, Tokyo, Japan.

ACL2Vec and ACL2Vec-authors are neural search engines for natural language processing papers, built on the New ACL Anthology Corpus published on GitHub by Shaurya Rohatgi.
The corpus originally contains 80,013 *ACL papers, but after removing papers with no authors, PDF conversion errors, or too-short content, the system is built on 62,313 ACL papers up to September 2022.

Missing papers

Due to the quality of the original data, 7,604 ACL papers (about 5%) are missing from this system. Don't worry about them: 4,328 of them (57%) are LREC papers (L), and very few papers from ACL (P), EMNLP (D), or NAACL (N) are missing. The following are the statistics of the currently missing papers.

ACL researcher's keywords

This is a list of keywords computed for each of the 8,963 researchers who have 5 or more papers in the ACL Anthology. These keywords are computed statistically using normalized PMI (NPMI), and largely represent "what words are most associated with this researcher" in the actual content of ACL Anthology papers. Note that there is no human intervention in creating these statistical keywords.
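
For readers unfamiliar with NPMI, here is a minimal Python sketch of how such keywords could be ranked. It uses the standard definition NPMI(a, w) = PMI(a, w) / -log p(a, w), with probabilities estimated from paper-level co-occurrence of an author and a word. The estimation scheme, the function name npmi_keywords, and the (authors, words) input format are all assumptions made for illustration; the actual system may compute its statistics differently.

```python
import math
from collections import Counter


def npmi_keywords(papers, author, top_k=5):
    """Rank words by NPMI with `author`, using paper-level co-occurrence.

    `papers` is a list of (authors, words) pairs. This input format is a
    hypothetical stand-in, not the format used by the actual system.
    """
    n = len(papers)
    word_df = Counter()   # number of papers containing each word
    joint_df = Counter()  # number of the author's papers containing each word
    author_df = 0         # number of papers by the author
    for authors, words in papers:
        vocab = set(words)
        word_df.update(vocab)
        if author in authors:
            author_df += 1
            joint_df.update(vocab)

    p_a = author_df / n
    scores = {}
    for w, c in joint_df.items():
        p_w = word_df[w] / n
        p_aw = c / n
        if p_aw == 1.0:
            # -log p(a, w) is 0 here, so NPMI is undefined; skip the word.
            continue
        pmi = math.log(p_aw / (p_a * p_w))
        scores[w] = pmi / -math.log(p_aw)

    # Highest NPMI first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

NPMI ranges from -1 to 1. A word that appears only in one researcher's papers scores close to 1, while a word that appears in every paper (such as "the") scores 0, so common function words drop out without a stop-word list.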


Below are some statistics computed from the corpus. Hope you find these resources interesting!

Last modified: Sun Jan 8 01:02:26 2023