Project 1: Text mining for keystone citations

This project aims to discover logical dependencies among scientific publications. It started in Summer 2019 and has gone through several stages.

Conceptual framework



Developed, with my advisor, a conceptual framework for identifying keystone citations (i.e., citations whose validity can make or break the arguments in the citing paper).

Fu, Y., & Schneider, J. (2020). Towards Knowledge Maintenance in Scientific Digital Libraries with the Keystone Framework. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 217–226.

2020 – early 2021

Proof of concept

Developed a set of proxy machine learning classifiers to demonstrate the feasibility of extracting keystone citations from biomedical papers.

Fu, Y., Schneider, J., & Blake, C. (2021). Finding Keystone Citations for Constructing Validity Chains among Research Papers. Companion Proceedings of the Web Conference 2021, 451–455.


Building a new dataset

I am currently building a dataset for training machine learning classifiers to extract keystone citations.

Project 2: Network analysis methods for auditing bias in scientific publications

This project aims to use network analysis methods to evaluate scientific papers for potential bias.


Initial work

In the iSchool’s Network Analysis class, Tzu-Kun Hsiao, a fellow PhD student, and I initiated a project using network analysis and visualization to understand the “salt controversy” (i.e., whether reducing salt intake is beneficial at the population level). We made a few puzzling discoveries, particularly the inconsistency in what counts as evidence for this specific dietary recommendation.

Hsiao, T.-K., Fu, Y., & Schneider, J. (2020). Visualizing evidence-based disagreement over time: The landscape of a public health controversy 2002–2014. Proceedings of the Association for Information Science and Technology, 57(1), e315.

2020 – early 2021

Designing network metrics

With undergraduate student Jasmine Yuan, I designed two network metrics to select what I call “marginalized papers” and “unique papers,” both of which deserve more attention from literature users. The Python igraph package was used to compute these metrics (GitHub repository).

Fu, Y., Yuan, J., & Schneider, J. (2021). Using Citation Bias to Guide Better Sampling of Scientific Literature. Proceedings of the 18th International Conference on Scientometrics & Informetrics, 419–424.
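
To illustrate the general approach of selecting papers from a citation network by a degree-based metric: the sketch below uses a toy edge list and a simplified stand-in rule (papers cited less than the network average), not the published metric definitions. The project itself used the python-igraph package; this sketch is dependency-free, and all paper names and edges are made up.

```python
# Toy citation network: an edge (a, b) means paper a cites paper b.
# Papers, edges, and the thresholding rule are illustrative only.
papers = ["P0", "P1", "P2", "P3", "P4"]
edges = [(1, 0), (2, 0), (3, 0), (4, 0), (2, 1), (3, 1), (4, 3)]

# In-degree = number of citations each paper receives within the network.
citations = {p: 0 for p in papers}
for _, cited in edges:
    citations[papers[cited]] += 1

# Simplified stand-in metric: flag papers cited less than the network
# average as candidate "marginalized papers" deserving a closer look.
mean_c = sum(citations.values()) / len(papers)
marginalized = sorted(p for p, c in citations.items() if c < mean_c)

print(citations)     # {'P0': 4, 'P1': 2, 'P2': 0, 'P3': 1, 'P4': 0}
print(marginalized)  # ['P2', 'P3', 'P4']
```

With python-igraph, the in-degree step becomes a one-liner (`Graph(..., directed=True).indegree()`); the selection logic is the same.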


Improving the computation of “expected citation counts”

I am working with a statistics master’s student, Zhonghe Wan, to develop a probabilistic model for better estimating “expected citation counts.”
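
For context, a common naive baseline for expected citation counts takes the mean citation count of papers published in the same year. This is not the probabilistic model under development (which is not described here); the records below are made-up values used only to show the baseline:

```python
from collections import defaultdict

# Illustrative records: (paper id, publication year, observed citations).
records = [("A", 2018, 10), ("B", 2018, 2), ("C", 2019, 6), ("D", 2019, 0)]

# Naive baseline: a paper's expected citation count is the mean observed
# citation count among papers published in the same year.
by_year = defaultdict(list)
for _, year, c in records:
    by_year[year].append(c)
expected = {pid: sum(by_year[y]) / len(by_year[y]) for pid, y, _ in records}

print(expected)  # {'A': 6.0, 'B': 6.0, 'C': 3.0, 'D': 3.0}
```

A probabilistic model would replace this single-covariate average with a distribution over citation counts conditioned on richer paper features.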