Project 1: Text mining for keystone citations
This project aims to discover logical dependencies among scientific publications. It began in Summer 2019 and has gone through several stages.
Developed a conceptual framework with my advisor for identifying keystone citations (i.e., citations whose validity can make or break the arguments in the citing paper).
Fu, Y., & Schneider, J. (2020). Towards Knowledge Maintenance in Scientific Digital Libraries with the Keystone Framework. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 217–226. https://doi.org/10.1145/3383583.3398514
2020–early 2021
Proof of concept
Developed a set of proxy machine learning classifiers to demonstrate the feasibility of extracting keystone citations from biomedical papers.
Fu, Y., Schneider, J., & Blake, C. (2021). Finding Keystone Citations for Constructing Validity Chains among Research Papers. Companion Proceedings of the Web Conference 2021, 451–455. https://doi.org/10.1145/3442442.3451368
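To illustrate the general idea of a proxy classifier for keystone citations, here is a minimal rule-based sketch. The cue phrases and the function name are hypothetical illustrations, not the features or classifiers from the paper; a real proxy classifier would be trained on labeled citation contexts.

```python
import re

# Hypothetical cue phrases that might signal a methods-level dependency
# on the cited work (illustrative only; not the features from the paper).
KEYSTONE_CUES = [
    r"\bbased on\b",
    r"\bfollowing the (method|protocol|procedure) of\b",
    r"\bas described (in|by)\b",
]

def is_candidate_keystone(citation_sentence: str) -> bool:
    """Flag a citation sentence as a candidate keystone citation
    when it matches any methods-dependency cue."""
    text = citation_sentence.lower()
    return any(re.search(pattern, text) for pattern in KEYSTONE_CUES)

examples = [
    "We measured expression levels based on the protocol in [12].",
    "Prior work has also studied salt intake [3].",
]
print([is_candidate_keystone(s) for s in examples])  # [True, False]
```

A trained classifier would replace the hand-written cues with learned features, but the input/output contract (citation sentence in, keystone judgment out) is the same.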
Building a new dataset
I am currently building a dataset for training machine learning classifiers to extract keystone citations.
Project 2: Network analysis method for auditing bias in scientific publications
This project aims to use network analysis methods to evaluate scientific papers for potential bias.
In the iSchool’s Network Analysis class, Tzu-Kun Hsiao, a fellow PhD student, and I initiated a project using network analysis and visualization to understand the “salt controversy” (i.e., whether reducing salt intake is beneficial at the population level). We made a few puzzling discoveries, particularly the inconsistency in what counts as evidence for this specific dietary recommendation.
Hsiao, T.-K., Fu, Y., & Schneider, J. (2020). Visualizing evidence-based disagreement over time: The landscape of a public health controversy 2002–2014. Proceedings of the Association for Information Science and Technology, 57(1), e315. https://doi.org/10.1002/pra2.315
2020–early 2021
Design network metrics
With undergraduate student Jasmine Yuan, I designed two network metrics to select what I call “marginalized papers” and “unique papers,” both of which deserve more attention from literature users. We used the Python igraph package to compute these metrics (GitHub repository).
Fu, Y., Yuan, J., & Schneider, J. (2021). Using Citation Bias to Guide Better Sampling of Scientific Literature. Proceedings of the 18th International Conference on Scientometrics & Informetrics, 419–424. http://jodischneider.com/pubs/issi2021.pdf
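The computation behind such metrics can be sketched on a toy citation network. The paper used python-igraph; this dependency-free sketch only illustrates the kind of degree-based computation involved, and the "below-mean in-degree" proxy here is a hypothetical stand-in, not the metric defined in the paper.

```python
from collections import defaultdict

# Toy citation network: each edge points from citing paper to cited paper.
edges = [("A", "C"), ("B", "C"), ("B", "D"), ("E", "D"), ("F", "G")]

in_degree = defaultdict(int)
papers = set()
for citing, cited in edges:
    papers.update((citing, cited))
    in_degree[cited] += 1  # count citations received

# Hypothetical proxy: call a paper "marginalized" when its in-degree
# falls below the network-wide mean (not the metric from the paper).
mean_in = sum(in_degree[p] for p in papers) / len(papers)
marginalized = sorted(p for p in papers if in_degree[p] < mean_in)
print(marginalized)  # ['A', 'B', 'E', 'F']
```

With python-igraph, the same in-degree vector comes from `Graph.degree(mode="in")` on a directed graph, which scales to real citation networks.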
Improve the computation of “expected citation counts”
I am working with a statistics master’s student, Zhonghe Wan, to develop a probabilistic model that better estimates “expected citation counts.”