[{"content":"","date":null,"permalink":"https://qingzegu.com/categories/","section":"Categories","summary":"","title":"Categories"},{"content":"Plain-language summaries of my research. Full citations link to each paper.\n1. Prevalence of 406 rare diseases by ethnicity and their COVID-19 burden — first author\nGu Q, et al. (on behalf of the CVD-COVID-UK/COVID-IMPACT Consortium). Prevalence of 406 rare diseases by ethnicity and their associated COVID-19 infection burden: a national cross-sectional study of 62.5 million people in England. medRxiv (2026); under revision at Scientific Reports. https://doi.org/10.64898/2026.01.13.26344068\nThe first national map of how common 406 rare diseases are across 19 ethnic groups in England, from linked records for 62.5 million people — showing that rare diseases, and their COVID-19 burden, fall unevenly across ethnicities.\n2. C-reactive protein responses and antibiotic prescribing — first author\nGu Q, et al. Interplay between C-reactive protein responses and antibiotic prescribing in people with suspected infection. BMC Infectious Diseases 25:987 (2025). https://doi.org/10.1186/s12879-025-11381-9\nAcross 51,544 suspected-infection episodes, changes in a common inflammation marker (CRP) nudged antibiotic decisions only modestly — most went unchanged — but early CRP movement strongly predicted survival.\n3. Transformers and LLMs as efficient feature extractors for EHR studies — joint first author\nYuan K, Yoon CH, Gu Q, et al. Transformers and large language models are efficient feature extractors for electronic health record studies. Communications Medicine 5:83 (2025). https://doi.org/10.1038/s43856-025-00790-1\nTested whether language models (BERT, GPT) can read free-text antibiotic notes to identify infection type across ~938,000 prescriptions; a fine-tuned clinical BERT reached F1 0.98, and free text captured 31% more detail than diagnostic codes.\n4. 
Predicting individual and hospital-level discharge with machine learning — co-author\nWei J, … Gu Q, et al. Predicting individual patient and hospital-level discharge using machine learning. Communications Medicine 4:236 (2024). https://doi.org/10.1038/s43856-024-00673-x\nMachine-learning models that predict who will be discharged within 24 hours and forecast daily hospital discharge numbers accurately enough (AUROC 0.87) to help with capacity planning.\n5. Vital-sign and inflammatory-marker patterns in suspected bloodstream infection — first author\nGu Q, et al. Distinct patterns of vital sign and inflammatory marker responses in adults with suspected bloodstream infection. Journal of Infection 88:106156 (2024). https://doi.org/10.1016/j.jinf.2024.106156\nMapped how inflammation markers and vital signs typically recover after suspected bloodstream infection across 88,348 episodes, turning them into personalised “recovery charts” — like growth charts — to track whether a patient is on track.\n6. Amoxicillin vs co-amoxiclav for community-acquired pneumonia — co-author\nWei J, … Gu Q, et al. No evidence of difference in mortality with amoxicillin versus co-amoxiclav for hospital treatment of community-acquired pneumonia. Journal of Infection 88:106161 (2024). https://doi.org/10.1016/j.jinf.2024.106161\nUsing causal-inference methods (propensity-score matching and IPTW) on 16,072 admissions, found no evidence of a difference in deaths between a narrow- and broad-spectrum antibiotic — supporting wider use of the narrower drug to help curb resistance.\n7. Vancomycin dosing guideline and predictive factors — first author\nGu Q, et al. Assessment of an institutional guideline for vancomycin dosing and identification of predictive factors associated with dose and drug trough levels. Journal of Infection 85:382–389 (2022). 
https://doi.org/10.1016/j.jinf.2022.06.029\nAudited a hospital’s electronic vancomycin dosing guideline in 3,767 patients: it was followed well, but only a quarter reached the target drug level, so I proposed age-, weight-, and kidney-tailored dosing.\n8. “Bloodstream infection”: a valuable concept we should keep — second author, correspondence\nDanielsen AS, Gu Q, Fostervold A, Eyre DW, Bjørnholt JV. ‘Bloodstream infection’: a valuable concept we should keep in our toolbox. Journal of Infection (2024). https://doi.org/10.1016/j.jinf.2024.106236\nA short correspondence arguing that “bloodstream infection” remains a clinically useful concept worth keeping in the diagnostic toolbox.\n","date":"1 January 0001","permalink":"https://qingzegu.com/publications/","section":"Qingze Gu","summary":"","title":"Publications"},{"content":"I am a health data scientist and clinical epidemiologist who generates real-world evidence (RWE) from population-scale health data. I design and run cohort and observational studies that turn electronic health records (EHR), registries, and other real-world data into evidence supporting clinical, regulatory, and commercial decisions — using causal inference, comparative effectiveness, survival analysis, mixed-effects and latent-class models, machine learning, and clinical NLP/LLMs in R, Python, and SQL.\nAs a Research Fellow at NTU Singapore, I work on PRECISE-SG100K — a multi-ancestry Asian population cohort of ~100,000 participants whose deep phenotypes and whole-genome data are linked to electronic health records and analysed within the secure TRUST platform. 
I build reproducible endpoint-analysis pipelines for chronic diseases (cardiovascular disease, type 2 diabetes, chronic kidney disease, liver disease, and cancer) and LLM pipelines that normalise free-text medications to OMOP/RxNorm concepts.\nBefore NTU, my Oxford DPhil exploited hospital electronic health records to improve infection management — modelling infection-response trajectories, antibiotic prescribing, and drug dosing. As a postdoctoral health data scientist I led a national study of rare-disease prevalence and COVID-19 burden across 62.5M people in the NHS England Secure Data Environment, and collaborated on genetic analyses of shared mechanisms between hypertension and type 2 diabetes. Earlier, I worked on the industry side of RWE at IQVIA and Oracle (Cerner Enviza).\nExperience\nResearch Fellow · Nanyang Technological University (LKCMedicine) · Singapore · Oct 2025 – present\nBuild reproducible cohort and endpoint-analysis pipelines for PRECISE-SG100K (~100,000 multi-ancestry participants), linking research phenotypes and whole-genome data with electronic health records in the secure TRUST platform. Deliver disease-specific endpoint analyses across chronic diseases (cardiovascular disease, cancer, and others) — SNOMED/ICD codelist mapping, incidence and survival modelling, and absolute-risk estimation — on a versioned, config-driven framework reused across studies. Develop LLM pipelines normalising 35,000+ free-text and self-reported medications to generic ingredients and OMOP/RxNorm drug concepts. Postdoctoral Health Data Scientist · University of Oxford · Oxford, UK · Oct 2024 – Sep 2025\nLed a national cross-sectional study estimating the prevalence of 406 rare diseases and their COVID-19 burden across 62.5M people and 19 ethnic groups, using linked primary-care, hospital, and mortality records in the NHS England Secure Data Environment (Databricks/PySpark/R/SQL). 
Collaborated on genetic analyses of shared mechanisms between hypertension and type 2 diabetes, accounting for adiposity. Consultant in Clinical NLP · Laboratory of Data Discovery for Health (D24H) · Remote / Hong Kong · Jun 2025 – Oct 2025\nBuilt and validated an end-to-end LLM pipeline for TNM staging of non-small cell lung cancer (NSCLC) from pseudonymised oncology clinical notes — OCR text extraction, gold-standard labelling, and feature-based staging. Led data curation (annotation-schema design, gold-standard labelling) and built a component-wise accuracy-evaluation harness; the pipeline reached over 90% accuracy on each of the T, N, and M staging components. PhD Researcher (Biomedical Data Science) · University of Oxford · Oxford, UK · Oct 2020 – Sep 2024 Thesis: “Exploiting electronic health records to improve infection management” (viva passed with no corrections).\nCharacterised pathogen-specific inflammatory-marker and vital-sign trajectories in suspected bloodstream infection via latent-class mixed models on five years of hospital EHR, deriving centile reference charts to guide infection management. Applied transformers and LLMs (BERT, GPT) to free-text antibiotic indications to infer infection sources, benchmarked against ICD-10 coding. Evaluated an institutional vancomycin dosing guideline using regression, survival analysis, and population-pharmacokinetic simulation. Real-World Solutions Intern · IQVIA · Beijing, China · Jul – Aug 2024\nAssessed regional EHR databases and built feasibility table shells to support real-world study planning; desk research on disease burden, clinical trials, and patient-reported outcomes. 
Real-World Evidence Intern · Cerner Enviza, an Oracle company · Shanghai, China · Sep 2023 – Mar 2024\nAssembled treatment and marketed-drug data for survey-based RWE studies; developed case report forms and health-economics indicators; reviewed protocols on patient characteristics, disease burden, and treatment patterns.\nEducation\nPhD, Clinical Medicine (Biomedical Data Science) · University of Oxford · 2020–2024 MSc, Pharmacology (Distinction) · University of Oxford · 2019–2020 BEng, Pharmaceutics and Food · Harbin Institute of Technology · 2015–2019\nSelected publications\nPrevalence of 406 rare diseases by ethnicity and their COVID-19 burden — Gu Q, et al. medRxiv (2026). First author. Transformers and large language models are efficient feature extractors for EHR studies — Yuan K, Yoon CH, Gu Q, et al. Communications Medicine (2025). Joint first author. Distinct patterns of vital sign and inflammatory marker responses in suspected bloodstream infection — Gu Q, et al. Journal of Infection (2024). First author. Assessment of an institutional guideline for vancomycin dosing and predictive factors — Gu Q, et al. Journal of Infection (2022). First author. View all publications →\n","date":null,"permalink":"https://qingzegu.com/","section":"Qingze Gu","summary":"","title":"Qingze Gu"},{"content":"","date":null,"permalink":"https://qingzegu.com/tags/","section":"Tags","summary":"","title":"Tags"}]