IHIWS 2026

Bioinformatics

To read more about the Bioinformatics theme of the 19th IHIWS, click on each of the following subthemes. You will learn about the project leaders, project description, milestones, data required, and more.

Data Standards Hackathon for Next Generation Sequencing (DaSH for NGS)

Project Leaders:

  • Steven Mack
  • Martin Maiers
  • Kazutoyo Osoegawa 

Detailed Project Description: Development of methods, standards, software tools, and online services to foster the standardized analysis, collection, exchange and storage of highly polymorphic immune-related genetic data for the purposes of basic and clinical research, the advancement of medical therapies, and understanding the genomics of the vertebrate immune system. A specific focus of this project involves development of pangenomic graphs for the HLA, KIR and LILR genes. 

A key goal of this project is the development of an IHIWS database (dbIHIW) structure and system that fosters community data access and database continuity across subsequent IHIWS iterations.

Milestones in Years:

2023: Community outreach, enrolment and new project formulation, continuation of existing projects, dbIHIW design.  

2024: Construction of datasets, satellite DaSH meetings, publication of papers and standards, continued enrolment, continued dbIHIW design and implementation. 

2025: Identification of sub-projects, satellite DaSH meetings, publication of papers and standards, continued enrolment, dbIHIW data migration and population.

2026: Presentation and dbIHIW launch.

Data Required (number, type of data, inclusion/exclusion criteria): TBD, but will include existing data from prior DaSH efforts along with data from prior IHIW efforts, going back to at least the 13th IHIW (e.g., dbMHC), and including data from the 17th and 18th IHIW efforts. 

Samples required (if applicable, number, type of samples, inclusion/exclusion criteria): No biological samples will be *required* as part of this project, but data describing and generated from current and prior IHIW efforts will be requested. 

Reagents/Additional Assays Required: NONE 

Data Infrastructure Required: We look forward to discussing this with the 19th IHIW organizers and data team. Certainly AWS/Cloud resources will be used for development.  

SNP-HLA Reference Consortium (SHLARC)

Project Name: SNP-HLA Reference Consortium (SHLARC) 

Project Leaders: Nicolas Vince, Pierre-Antoine Gourraud 

Detailed Project Description: 

Over the past 15 years, genome-wide association studies (GWAS) have identified more than 10,000 associations. In particular, the HLA genomic region stands out as the most highly associated locus in GWAS, predominantly in immune-related diseases. SNPs are the hallmark of GWAS; however, the information this type of genetic marker provides is very limited, especially in the HLA region, where linkage disequilibrium (LD; the non-random association of alleles at different loci) is strong and extends over several megabases. To advance our understanding of functional mechanisms and potentially identify therapeutic targets, we must move beyond these simple associations, especially when dealing with HLA alleles. HLA typing techniques are expensive, require specialized laboratory infrastructure, and are in constant evolution.

However, recent developments in statistical inference enable us to impute HLA alleles from genotyped GWAS SNPs. Successful implementation of this technique relies on the availability of adequate reference panels for imputation. The objective of this project is to create diverse reference panels that enhance HLA imputation accuracy from GWAS datasets. To achieve this goal, we still need to: 

1- Collect additional HLA and SNP data from numerous sources. 

2- Improve our understanding of how diverse haplotypes and populations influence HLA imputation accuracy. 

3- Maintain a digital platform (SHLARC, the SNP-HLA reference consortium: https://hla.univ-nantes.fr) accessible to scientists for their own data imputation needs. 
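One simple way to quantify the imputation accuracy mentioned in point 2 is the per-locus concordance between imputed alleles and laboratory typing. The sketch below is illustrative only and is not the SHLARC pipeline; the function name, the input layout (each sample ID mapped to a pair of allele names at one locus), and the toy samples are all assumptions.

```python
from collections import Counter

def allele_concordance(typed, imputed):
    """Fraction of typed allele copies that were also imputed, ignoring phase.

    `typed` and `imputed` map sample ID -> pair of allele names at one locus.
    Samples missing from `imputed` count as fully discordant.
    """
    matched = 0
    total = 0
    for sample, truth in typed.items():
        call = imputed.get(sample, ())
        # Multiset intersection counts shared allele copies,
        # so homozygous genotypes are scored correctly.
        shared = Counter(truth) & Counter(call)
        matched += sum(shared.values())
        total += len(truth)
    return matched / total if total else float("nan")

# Toy data (hypothetical samples, for illustration only)
typed = {
    "S1": ("A*02:01", "A*24:02"),
    "S2": ("A*01:01", "A*01:01"),
}
imputed = {
    "S1": ("A*24:02", "A*02:01"),  # same genotype, different order
    "S2": ("A*01:01", "A*03:01"),  # one of two copies correct
}
print(allele_concordance(typed, imputed))  # 0.75
```

Computing this score separately for each ancestry group in a validation set is one way to see how reference panel diversity affects accuracy.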

Practically, we have gathered more than 10,000 samples from several sources, including public data (the 1000 Genomes Project), semi-public data (via access to the dbGaP and EGA data repositories), and direct collaborations. The latter datasets come from diverse ancestry backgrounds, such as Brazil (European + African + Native American), Benin (African), and various European-ancestry cohorts (Western Europe, USA, Finland). We remain open to expanding the diversity of our data sources.

We have also developed a freely accessible online platform to perform HLA imputation using the datasets mentioned above (https://hla.univ-nantes.fr).

 

 

Milestones in years: 

  • 2023: Launch of the SHLARC website. 
  • 2024: Joint SHLARC/SIP workshop in Nantes, France (around September). 
  • 2026: Final Report on database diversity, HLA imputation performance, and applicability for research projects. 

Data Required (number, type of data, inclusion/exclusion criteria): 

Several types of data are suitable, but all must contain SNP genotypes and at least second-field molecular HLA typing for all HLA genes.

SNP genotypes: all types of GWAS chip data, sequencing data covering 500 kb around HLA genes, or whole-genome sequencing (WGS) data.

Minimal HLA typing resolution: second-field. HLA can also be called from WGS data. 
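Contributors may wish to check typing resolution before submitting. The following is a minimal sketch of such a check, assuming allele names follow the standard colon-delimited nomenclature (for example A*02:01 or HLA-DRB1*15:01:01:01). The helper names are hypothetical, and G/P group codes and NMDP multiple allele codes are deliberately not handled.

```python
import re

# Optional "HLA-" prefix, gene name, '*', one to four colon-separated
# numeric fields, then an optional expression suffix (N, L, S, C, A, Q).
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){0,3})([NLSCAQ])?$")

def field_count(allele):
    """Return the number of fields in an HLA allele name, or 0 if unparseable."""
    match = ALLELE_RE.match(allele.strip())
    if match is None:
        return 0
    return match.group(2).count(":") + 1

def meets_second_field(allele):
    """True if the allele name carries at least two fields."""
    return field_count(allele) >= 2

for name in ["HLA-A*02:01", "DRB1*15:01:01:01", "B*07", "A2"]:
    print(name, meets_second_field(name))
```

Names that fail the check, such as first-field or serological assignments, would need retyping or calling from WGS data before inclusion.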

Data Infrastructure Required: 

Data infrastructure will be hosted in the Nantes Université data center. Additionally, we will use our local high-throughput computing center (Glicid, Nantes Université) to build reference panels with the help of high-performance GPUs (NVIDIA A100).

Clinical Histocompatibility Laboratory Informatics

Project Name: Clinical Histocompatibility Laboratory Informatics 

Project Leaders: Loren Gragert and Nicholas Brown

Detailed Project Description:  

The project will build key infrastructure to standardize collection and reporting of clinical histocompatibility data and improve analysis tools and resources to aid in virtual crossmatch assessments.

We will query the databases underlying the histocompatibility laboratory information systems (LIS) to extract detailed information on molecular HLA typing (high resolution NGS typing and intermediate resolution deceased donor typing), solid phase antibody screens, flow crossmatch, and post-transplant donor-specific antibody (DSA) assessments. 

This more detailed information usually does not leave the HLA laboratory in electronic form and is not adequately captured in organ allocation systems and outcomes registries. To make it easier to transfer HLA lab information into electronic medical records (EMRs), biobanks for clinical research, and transplant registries, we are developing data standards for reporting results from HLA antibody screens and histocompatibility assessments. The project will continue development of the first XML-based format for HLA antibody data, the HLA antibody markup language (HAML), initially created by Eric Spierings, Gottfried Fischer, and Loren Gragert.

For organ allocation systems, we plan to build informatics tools that analyze and integrate histocompatibility data between donors and recipients to aid in virtual crossmatch assessments. We also plan to build tools that would help perform data cleaning/curation for large-scale reanalysis of historical histocompatibility data for research. 
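At its core, a virtual crossmatch compares the donor's HLA typing against the recipient's antibody specificities. The sketch below illustrates that comparison only and is not the project's tooling. The function name, the data layout, and the MFI cutoff of 2000 are assumptions (positivity thresholds are laboratory-specific), and it assumes donor typing and antibody specificities are already expressed at the same nomenclature level.

```python
def virtual_crossmatch(donor_typing, antibody_mfi, mfi_cutoff=2000):
    """List donor antigens against which the recipient has antibodies.

    donor_typing: iterable of donor HLA antigens, e.g. {"A2", "B44"}
    antibody_mfi: dict mapping specificity -> mean fluorescence intensity
                  (MFI) from a solid-phase antibody screen
    mfi_cutoff:   positivity threshold; 2000 is a placeholder value
    """
    dsa = {
        antigen: antibody_mfi[antigen]
        for antigen in donor_typing
        if antibody_mfi.get(antigen, 0) >= mfi_cutoff
    }
    # Sort strongest first so the most relevant specificities lead the report.
    return sorted(dsa.items(), key=lambda item: item[1], reverse=True)

# Toy data (hypothetical recipient and donor, for illustration only)
recipient_screen = {"A2": 8500, "B44": 1200, "DR15": 3100, "B7": 400}
donor = {"A2", "B44", "DR15", "A1"}

dsa = virtual_crossmatch(donor, recipient_screen)
print("predicted positive" if dsa else "predicted negative", dsa)
```

An empty result corresponds to a predicted negative crossmatch. Practical tools would also need to handle allele-level and epitope-level specificities, which this sketch omits.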

In addition to organ allocation systems, with multiple mismatched unrelated donors (MMUD) becoming more common in hematopoietic stem cell transplantation (HSCT), registries such as the National Marrow Donor Program (NMDP) and the World Marrow Donor Association (WMDA) are recognizing an increased need for their donor selection systems to capture HLA antibody screen data and use it to automatically screen out incompatible donors from the search.

Our team plans to make all our informatics tools available to the transplant and immunogenetics community to benefit other research consortia, including Clinical Trials in Organ Transplantation (CTOT). 

Milestones in Years: 

  • 2023:  Scripts developed to extract detailed information from histocompatibility lab information systems from leading vendors on antibody screens, intermediate resolution molecular typing, and crossmatch results. 
  • 2024: Publication of the HLA antibody markup language (HAML) XML standard.
  • 2025: Publish standards for transmitting histocompatibility data in electronic medical records (HL7 FHIR Orders and Observations implementation guide for communicating HLA antibody data and histocompatibility assessments).
  • 2026: Test advanced virtual crossmatch tools on histocompatibility data for simulated donor and recipient pairings.

Data Required (number, type of data, inclusion/exclusion criteria): 

  • Historical data on HLA typing (molecular and antigen level), antibody assays, and crossmatch results 
  • Scripts will be provided for extracting and de-identifying detailed data from laboratory information systems of leading vendors 

Samples Required (if applicable, number, type of samples, inclusion/exclusion criteria): 

  • Physical samples are not required for this project. The project will involve secondary data analysis. 

Reagents/additional assays required: 

  • Participants will not be required to run additional assays or utilize reagents. 

Data Infrastructure Required:  

  • Participants will need access to their clinical histocompatibility information systems. 
  • The project will host a web server providing web-based analysis tools to aid in virtual crossmatching and user-authenticated access to de-identified datasets.

HLA-Disease Association Platform (HLA-DAP)

Project name:

HLA-Disease Association Platform (HLA-DAP)

Project leader(s):

  • Dr Da Di (Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong)
  • Dr José Manuel Nunes (Department of Genetics and Evolution, University of Geneva, Switzerland)
  • Dr Janette Kwok (Division of Transplantation and Immunogenetics, Department of Pathology, Queen Mary Hospital, Hong Kong)
  • Dr Katia Gagne (UMR_S 1307 Centre de Recherche en Cancérologie et Immunologie Intégrée Nantes Angers, France)

Detailed project description:

HLA-disease associations have been extensively investigated for over half a century, offering valuable knowledge of human immune function and disease susceptibility. Despite the identification of numerous HLA alleles associated with specific diseases, these findings have not yet been fully translated into clinical applications. This limitation is due in part to the fragmented and disparate nature of existing data, as well as the complexity of functional complementarity between HLA alleles and their interactions with other genetic and environmental factors. Therefore, there is a pressing need to develop a centralised platform that supports genotype data synthesis and analysis.

To address this, the proposed component will introduce a novel, dedicated platform for HLA-disease associations, encompassing a broad spectrum of diseases, including autoimmune disorders, infections, cancers, allergies, and transplantation-related complications. Early efforts to synthesise findings, including those by Tiwari and Terasaki (HLA and Disease Associations, Springer, 1985), and later projects such as the Allele Frequency Net Database (Gonzalez-Galarza et al. Nucleic Acids Research 2020) and HLA-SPREAD (Dholakia et al. BMC Genomics 2022), laid important groundwork.

This component, under the 19th IHIWS framework, will advocate for the collection of both published and unpublished HLA genotype data from past HLA-disease association studies. Authors of previous studies will be invited to share their data to expand the database. In the longer term, HLA-related genes, such as KIR, will also be incorporated. The platform itself aims to facilitate the reformatting, standardisation, and visualisation of data, enabling analyses beyond traditional case-control comparisons. To support in-depth investigations into HLA genotype data, a publicly accessible interactive environment will allow researchers to search, contribute, and explore data in detail, with a special focus on specific diseases, alleles, or populations.

Building on recent progress in in silico predictions of HLA-peptide binding affinities and improved insights into inter-allelic interactions, the platform will also serve as a bridge between HLA association studies and their clinical applications. It will deepen our understanding of disease mechanisms and provide a valuable resource for training deep learning models, thereby leading to future advances in immunogenetic research.

Milestones in years:

2025:  

  • Collecting genotype data available in publications
  • Calling for the submission of genotype data from prior research
  • Establishing the database structure
  • Initiating data analyses
  • Conducting the first group Zoom meeting with participants in Q4

2026:  

  • Summarising findings
  • Conducting the second group Zoom meeting with participants in Q1
  • Presenting results at the 19th IHIWS
  • Launching the database for public access
  • Drafting and preparing manuscripts for publication

Data required (number, type of data, inclusion/exclusion criteria):

The database will synthesise data from diverse studies to provide a comprehensive overview of HLA-disease associations across populations. Historically, case-control studies have been the primary method for investigating this subject, comparing carrier and gene frequencies to identify associations. However, the underlying genotype data used for these comparisons have often not been publicly accessible. Following an extensive literature-based data mining process, this component will issue a call to the HLA community to contribute previously published genotype data from both patient and healthy control groups. By incorporating frequency and genotype data on HLA alleles associated with various diseases, the database will become an invaluable resource for understanding the distribution and impact of these alleles across different populations.
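The difference between carrier and gene (allele) frequencies shows why the underlying genotype data matter: both can be computed exactly from genotypes, whereas one cannot be recovered from the other without further assumptions. The sketch below is illustrative only; the function names and toy genotypes are assumptions, and the odds ratio always applies the Haldane-Anscombe correction (adding 0.5 to each cell), which is one common convention rather than the platform's stated method.

```python
def carrier_and_allele_frequency(genotypes, allele):
    """Return (carrier frequency, allele frequency) at one locus.

    genotypes: list of (allele1, allele2) pairs, one pair per individual.
    """
    n = len(genotypes)
    carriers = sum(1 for g in genotypes if allele in g)
    copies = sum(g.count(allele) for g in genotypes)
    return carriers / n, copies / (2 * n)

def carrier_odds_ratio(cases, controls, allele):
    """Odds ratio for carrying `allele`, with 0.5 added to every cell."""
    a = sum(1 for g in cases if allele in g)      # carrier cases
    b = len(cases) - a                            # non-carrier cases
    c = sum(1 for g in controls if allele in g)   # carrier controls
    d = len(controls) - c                         # non-carrier controls
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# Toy genotypes (hypothetical, for illustration only)
cases = [("B*27:05", "B*07:02"), ("B*27:05", "B*27:05"),
         ("B*08:01", "B*07:02"), ("B*27:05", "B*44:02")]
controls = [("B*07:02", "B*08:01"), ("B*27:05", "B*44:02"),
            ("B*44:02", "B*44:02"), ("B*08:01", "B*07:02")]

print(carrier_and_allele_frequency(cases, "B*27:05"))     # (0.75, 0.5)
print(carrier_and_allele_frequency(controls, "B*27:05"))  # (0.25, 0.125)
print(round(carrier_odds_ratio(cases, controls, "B*27:05"), 2))
```

With genotype-level data, the same records also support analyses that frequency tables cannot, such as homozygote-specific or allele-combination effects.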

Samples required (if applicable, number, type of samples, inclusion/exclusion criteria):

Not applicable

Reagents/additional assays required:

Not applicable

Data infrastructure required:

Not applicable