Natural History Samples Catalogue

Definitions:

ND: Newly Diagnosed Participants. All ND participants had a baseline assessment within 6 weeks from diagnosis of T1D (based on the ADA criteria, defined as the time at which insulin therapy was started).

UFM: Unaffected Family Members. Participants who have a first-degree relative with T1D and who tested positive for islet autoantibodies (IAb+).

ND Participants

  • Visit 1: Baseline, < 6 weeks from diagnosis
  • Visit 2: 3 months
  • Visit 3: 6 months
  • Visit 4: 12 months
  • Visit 5: 24 months

UFM Participants

  • Visit 1: Baseline, within 3 months after IAb result
  • Visit 2: 6 months
  • Visit 3: 12 months
  • Visit 4: 18 months
  • Visit 5: 24 months
  • Visit 6: 36 months
  • Visit 7: 48 months

Summary of Available Data & Samples

Description of clinical variables, types of biological samples, and omics data collected.

Demographic and clinical data were collected as part of the INNODIA project. The electronic case report form (eCRF) comprises:

  • Weight
  • Height
  • BMI
  • BMI SDS
  • Age
  • Ethnicity
  • Country
  • Date of visit
  • Age at visit
  • Weeks from diagnosis (only for ND)
  • Sample date
  • Autoantibodies
  • Glucose reading
  • HbA1C values
  • Fasting C-peptide
  • Fasting glucose
  • Fasting C-peptide/Glucose ratio
  • Insulin average daily dose
  • Insulin dose/kg
  • MMTT and OGTT metadata (compliance with the MMTT/OGTT protocol)
  • MMTT and OGTT C-peptide and glucose values
  • MMTT AUC C-peptide
  • MMTT AUC Glucose
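
The catalogue does not state how the MMTT AUC values were derived. As a minimal sketch, assuming the common trapezoidal-rule approach, an AUC can be recomputed from the serial MMTT values as follows. The sampling times and C-peptide values are hypothetical and serve only as an illustration.

```python
def trapezoid_auc(times_min, values):
    """Area under the curve by the trapezoidal rule (units: value x minutes)."""
    if len(times_min) != len(values) or len(times_min) < 2:
        raise ValueError("need at least two matching time/value points")
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of one trapezoid: interval width times the mean of its two values.
        auc += dt * (values[i] + values[i - 1]) / 2.0
    return auc


# Hypothetical MMTT sampling times (minutes) and C-peptide values (pmol/L).
times = [0, 15, 30, 60, 90, 120]
cpep = [200.0, 350.0, 500.0, 600.0, 550.0, 450.0]

auc = trapezoid_auc(times, cpep)
# Dividing by the test duration gives a time-normalised (mean) value.
mean_cpep = auc / (times[-1] - times[0])
```

Whether the stored AUC variables are raw or time-normalised is not specified here, so check the variable description before comparing recomputed values with the database extract.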

All available sample types are listed below with their collection tubes and aliquot volumes:

  • Serum: FluidX, 0.5 ml/aliquot
  • EDTA Plasma: FluidX, 0.2 ml/aliquot
  • DNA: FluidX (from EDTA plasma), 0.5 ml/aliquot
  • Lithium-heparin Plasma: FluidX, 0.3 ml/aliquot
  • Urine: FluidX, 1 ml/aliquot
  • Whole Blood: PAXgene, 10 ml/aliquot
  • Stool: OMNIgene-GUT, 10 ml/aliquot

All sample types were collected and processed following INNODIA SOPs. The available data types are summarised below:

  • Immunomic data: Flow cytometry data as raw FCS files. Extracted cell population counts based on manual gating are available as CSV or Excel files. Gating strategy information is saved as pictures embedded in PDF files.
  • Genotyping data: Raw genotyping data in PLINK or VCF format and HLA data.
  • Lipidomic data: The plasma lipidomics data contain a total of 403 molecular lipids from major lipid classes such as glycerolipids, phospholipids and sphingolipids. The samples were analysed at Steno Diabetes Center Copenhagen with ultra-high-performance liquid chromatography coupled to quadrupole time-of-flight mass spectrometry (UHPLC-QToF-MS) in two complementary analyses using the positive and negative ion modes (detecting 260 and 143 lipids, respectively).
  • Metabolomic data: The metabolomics data contain 81 metabolites from major metabolite classes such as amino acids, free fatty acids and molecules in energy metabolism. The samples were analysed at Steno Diabetes Center Copenhagen with two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GCxGC-ToF-MS).
  • Metagenomic data: Stool samples were collected from a subset of individuals participating in the INNODIA Natural History Study, including 98 people newly diagnosed with type 1 diabetes and 198 autoantibody-positive family members. Samples were collected using tubes that preserve DNA at room temperature and then stored at -80°C. Bulk microbial DNA (that is, all DNA from bacteria and other microbes in the sample) was extracted and assessed for quality. The DNA was sequenced using Illumina technology, which generates millions of short DNA reads. These sequences were compared to a large reference database to identify which microbial species were present and what functions they might perform. Both the raw sequencing reads and the processed microbiome profiles are available for downstream analysis.
  • Proteomic data: The proteomics data for INNODIA serum samples were generated at the University of Turku using targeted mass spectrometry. The dataset includes analyses from two consecutive groups of individuals newly diagnosed (ND) with type 1 diabetes: "the first 100" and "the next 150". Additionally, there are data from 460 unaffected first-degree relatives (UFM). The data from the ND individuals include available longitudinal samples collected within 6 weeks of diagnosis and then at 3, 6 and 12 months; the UFM data comprise a single sample from each individual. There are also data from three QC samples that were periodically measured. For both selected groups, liquid chromatography (LC) coupled with mass spectrometry (MS) was used for selected reaction monitoring (SRM) analysis. With this approach, measurements were confined to pre-selected targets. As part of a follow-up validation study, the "next 150" measurements were conducted using a different, faster LC system, focusing on fewer protein targets (70 out of 105, plus 7 additional). In total, data were recorded for 250 peptides, with 130 peptides common to both datasets.
  • Transcriptomic data: The samples analysed were from "the first 100" and "the next 150" newly diagnosed (ND) INNODIA cohorts. The analysis was carried out at the University of Turku, Finland. The first 100 ND cohort included 94 patients. Whole blood PAXgene samples were collected at visit 1 (baseline, within 6 weeks of diagnosis) and visit 4 (12 months after diagnosis), with 46 patients having samples at both time points. The next 150 ND cohort included 155 patients with samples collected at baseline and 12 months after diagnosis. Additionally, the analysis included four whole blood PAXgene INNODIA QC samples, collected from two anonymous donors as per INNODIA SOP. For both cohorts, total RNA, including small RNA fractions, was purified using the PAXgene Blood miRNA Kit (PreAnalytiX/QIAGEN, Cat# 763134) following the protocol supplied by the kit manufacturer. Library preparation and sequencing were carried out at the Finnish Functional Genomics Centre (https://bioscience.fi/functional-genomics/services/). Before starting library preparation, ERCC Spike-in control Mix 1 (Invitrogen P/N 4456739) was added to 100 ng RNA according to the kit’s protocol. RNA-seq libraries were prepared using TruSeq stranded mRNA HT kit and protocol # 15031047 (Illumina). Pooled libraries were sequenced on an Illumina NovaSeq 6000 instrument, using 2 × 50 bp (first 100 cohort) or 2 × 100 bp (next 150 cohort) paired-end sequencing with about 30 million single-end reads per sample.
  • mi/smallRNA data: The study design involved the analysis of two cohorts of individuals with Type 1 Diabetes (T1DM): an initial screening cohort, the INNODIA first cohort, consisting of n=115 T1DM individuals, and a validation cohort, the INNODIA second cohort, consisting of n=147 T1DM individuals. All subjects were followed up with programmed visits at 3 (visit 2), 6 (visit 3) and 12 months (visit 4) after clinical diagnosis of T1DM. In both cohorts, blood samples were collected to isolate EDTA plasma and analysed at baseline (visit 1). The collected blood samples were processed within 2 hours of the blood draw and centrifuged to separate plasma from contaminant cells and platelets. The plasma samples were then aliquoted (200 μL) and stored at -80°C in a centralized biobank (see SOP plasma microRNAs version 5). For the INNODIA first cohort, the plasma samples were subjected to miRNA profiling using two different sequencing platforms: (A) HTG-miRNA EdgeSeq on the Illumina NextSeq 550 platform (High Output kit v2 cat. FC-404-2005) and (B) Small RNA-seq using the QIAseq miRNA Library Kit on the Illumina NovaSeq 6000 platform [NovaSeq 6000 SP Reagent Kit (100 cycles) cat. 20027464, NovaSeq XP 2-Lane Kit cat. 20021664, Illumina] using the XP protocol applying 75x1 single reads. For the INNODIA second cohort, the plasma samples were analyzed exclusively by Small RNA-seq using the QIAseq miRNA Library Kit and Illumina NovaSeq 6000 sequencing. HTG-miRNA EdgeSeq is a targeted RNase-protection-based assay, designed to detect a total of 2083 miRNAs (miRBase v.21). Small RNA-seq using the QIAseq miRNA Library Kit allows the unbiased detection of virtually all small RNAs (< 50 nt) included in the plasma sample. In both the INNODIA first and second cohorts, a subset of samples was included as duplicates. Files are reported as sequencing FASTQ files for both Small RNA-seq and HTG-miRNA EdgeSeq.
  • CGM data: Continuous glucose monitoring data will be hosted in the INNODIA database. Additionally, CSV files with custom versions of the database extracts will be available.
  • C-peptide data: This will include plasma C-peptide (fasting and serial C-peptide during MMTT and OGTT) as well as dried blood spot (DBS) C-peptide. The Core Biochemical Assay Laboratory (CBAL) in Cambridge was the core laboratory for all plasma C-peptide measurements as well as for DBS C-peptide analyses.
    • Plasma C-peptide: Fasting plasma C-peptide and serial C-peptide samples taken during MMTT/OGTT were assayed in singleton on a DiaSorin Liaison XL automated immunoassay analyser using a sandwich chemiluminescence immunoassay (DiaSorin S.p.A, 13040 Saluggia [VC], Italy).
    • DBS C-peptide: DBS C-peptide was analysed using an in-house assay based on the Meso Scale Discovery (MSD) electrochemical immunoassay technology. Four 3.2 mm dried blood spot discs from DBS quality controls and unknowns were eluted in assay buffer overnight at +2–8°C with shaking and brought to room temperature before proceeding with the immunoassay. Commercial liquid calibrator, serum quality controls and the eluted DBS quality controls and unknowns were added in duplicate to an MSD standard bind plate coated with a mouse monoclonal anti-C-peptide capture antibody. After incubation at room temperature and washing, a biotinylated mouse monoclonal anti-C-peptide detector antibody was added. After a second incubation and wash, streptavidin SULFO-TAG was added. After a third incubation and wash, MSD Read Buffer T (diluted 1:2) was added and the plate was read using the MSD s600 reader. The C-peptide concentration was calculated using MSD Discovery Workbench software. For DBS samples, this measured concentration was converted to a plasma equivalent using an in-house derived factor.
INNODIA Data & Samples Viewer — Help

This page explains how to use each tab, what the counts mean, and how filters interact. If you need sample-level or participant-level details, please email data@innodia.org.

Quick start: Choose a tab, set filters if needed, then click Apply (or Show all samples in Samples Explorer). In Data Explorer the only required selection is at least one Longitudinal Variable—demographics are optional.

📊

Data Explorer

High-level counts of how many participants have the selected longitudinal variables available at each visit. You can switch cohort at the top of the app.

Filters & inputs

  • Choose Data Type: ND (Newly Diagnosed) or UFM (Unaffected Family Members).
  • Age Range: limits the cohort by age at consent.
  • Gender / Ethnicity / Country: optional. If you don’t select them, the app uses the full dataset for those dimensions.
  • Longitudinal Variables (required): pick one or more variables to count (e.g., HbA1c, fasting measures, OGTT/MMTT timepoints, autoantibodies, anthropometrics). The list adapts to the current cohort and to what actually exists in the data.
  • Complete cases across visits (optional): if enabled, visit counts are cumulative, i.e., a participant is counted at visit k only if they met the criteria at every visit from 1 to k.
  • HLA Genotypes: optional per-gene summary (Class I & Class II) for the participants that match your filters.
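
The difference between per-visit counts and the cumulative "complete cases" mode can be sketched as follows. This is an illustration of the counting rule described above, not the app's actual code; the function name, the input layout (participant ID mapped to the set of visits with complete data) and the participant IDs are all hypothetical.

```python
def visit_counts(availability, n_visits, complete_cases=False):
    """Count participants per visit.

    availability: dict mapping participant ID -> set of visit numbers at which
    the participant has complete data for the selected variables (hypothetical layout).
    """
    counts = {}
    for k in range(1, n_visits + 1):
        if complete_cases:
            # Cumulative mode: the participant must qualify at every visit 1..k.
            required = set(range(1, k + 1))
            counts[k] = sum(1 for visits in availability.values() if required <= visits)
        else:
            # Default mode: only visit k itself matters.
            counts[k] = sum(1 for visits in availability.values() if k in visits)
    return counts


availability = {
    "P1": {1, 2, 3},
    "P2": {1, 3},   # missed visit 2
    "P3": {2, 3},   # missed visit 1
}

per_visit = visit_counts(availability, 3)
cumulative = visit_counts(availability, 3, complete_cases=True)
```

In this toy example all three participants are counted at visit 3 in the default mode, but only P1 is counted there in cumulative mode, which is why enabling the option can sharply reduce later-visit counts.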

How to read the counts

  • Each value box shows individuals with complete data for the selected variables at that visit.
  • An AAb badge marks visits where autoantibodies are measured in that cohort—ND: visits 1 and 4; UFM: visits 1, 3, 5, 6, 7.
  • The blue info banner lists the exact criteria used for the counts you’re seeing (cohort, mode, age range, demographics if any, variables, and AAb visit note when relevant).

Note on autoantibodies: the app knows which visits include AAb for each cohort. You don’t need to select AAb variables for the badge to appear—it reflects cohort-specific visit availability.

Omics availability

The Omics Availability tab summarizes, for your filtered participants, how many have each omic type by visit (RNA-seq, miRNA, Proteomics, Metabolomics, Lipidomics, Metagenomics). It groups rows by ND/UFM when both groups are present.

🩺

Clinical Explorer

Counts of participants that satisfy clinical criteria per visit, with optional “complete across visits” mode. Criteria are applied per visit and differ slightly between cohorts.

Filters & criteria

Common

  • Age at consent: range slider
  • Gender / Ethnicity / Country: optional
  • HbA1c (mmol/mol): optional range
  • Complete cases across visits: optional cumulative mode

UFM-specific

  • Require OGTT Glucose / C-peptide present: optional
  • Dysglycemia per visit: optional
  • Stage per visit: I, II, III

Stages use: an AAb count at or above a threshold (the default logic uses ≥2 when staging is active) plus glucose rules based on fasting glucose (the mean of the −20 and 0 min values) and the 120-min value.
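
A minimal sketch of how such staging logic could look is shown below. The AAb threshold and the fasting mean follow the description above; the glucose cut-offs are an assumption (the standard ADA OGTT thresholds in mmol/L), because this page does not state the exact values the app applies. Function and parameter names are hypothetical.

```python
def fasting_glucose(g_minus20, g_0):
    """Fasting glucose as the mean of the -20 and 0 min OGTT values."""
    return (g_minus20 + g_0) / 2.0


def stage(aab_count, g_minus20, g_0, g_120, aab_threshold=2):
    """Return 'I', 'II' or 'III', or None if the AAb threshold is not met.

    ASSUMPTION: glucose cut-offs are the ADA OGTT thresholds in mmol/L;
    the app's actual rules may differ.
    """
    if aab_count < aab_threshold:
        return None
    fasting = fasting_glucose(g_minus20, g_0)
    if fasting >= 7.0 or g_120 >= 11.1:   # diabetes-range glucose
        return "III"
    if fasting >= 5.6 or g_120 >= 7.8:    # dysglycemia
        return "II"
    return "I"                            # normoglycemia


# Hypothetical participants: (AAb count, glucose at -20, 0 and 120 min).
print(stage(2, 5.0, 5.2, 6.5))
print(stage(3, 5.8, 6.0, 8.5))
print(stage(1, 5.0, 5.0, 6.0))
```

Keeping the threshold as a parameter mirrors the "AAb count ≥ a threshold" wording, so the same function covers both the default (≥2) and a user-selected value.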

ND-specific

  • Require MMTT Glucose / C-peptide present: optional
  • Insulin dose (per kg): optional range
  • IDAA1c ≤ 9: optional
  • Fasting glucose / C-peptide available: optional

IDAA1c is computed from HbA1c and insulin dose per kg; fasting glucose is derived as fasting C-peptide divided by the C-peptide/glucose ratio when both are available.
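
The exact formula is not given on this page. The sketch below assumes the commonly published definition, IDAA1c = HbA1c (%) + 4 × insulin dose (U/kg per 24 h), and converts HbA1c from mmol/mol to percent with the IFCC-NGSP master equation. The function names are hypothetical, and the units of the C-peptide/glucose ratio are assumed to match those of the stored variables.

```python
def hba1c_mmol_to_percent(hba1c_mmol_mol):
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%)."""
    return 0.09148 * hba1c_mmol_mol + 2.152


def idaa1c(hba1c_mmol_mol, insulin_u_per_kg_day):
    """Insulin-dose-adjusted HbA1c (ASSUMED definition: HbA1c % + 4 x U/kg/day)."""
    return hba1c_mmol_to_percent(hba1c_mmol_mol) + 4.0 * insulin_u_per_kg_day


def derived_fasting_glucose(fasting_cpeptide, cpep_glucose_ratio):
    """Recover fasting glucose from C-peptide and the C-peptide/glucose ratio.

    Returns None when the ratio is missing or zero.
    """
    if not cpep_glucose_ratio:
        return None
    return fasting_cpeptide / cpep_glucose_ratio


# Hypothetical ND participant: HbA1c 53 mmol/mol, 0.4 U/kg/day of insulin.
score = idaa1c(53, 0.4)
meets_filter = score <= 9  # the "IDAA1c ≤ 9" criterion
```

Under this assumed definition, a participant with higher HbA1c or a higher insulin dose gets a higher score, so the ≤ 9 filter keeps those with better control on less insulin.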

Reading the output

  • Value boxes report the number of participants that meet all active criteria at each visit.
  • Enable Complete cases across visits to count participants cumulatively (Visit 3 requires meeting Visits 1–3).
  • When AAb filters are active, the banner indicates the visits at which AAb criteria were applied and those at which they were not.
🧪

Samples Explorer

Explore aggregated biobank availability by sample type, visit and branch—including a dedicated C-peptide summary.

Filters & actions

  • Select Sample Type(s): choose one or more. The list includes a special synthesized type, C-peptide, which is drawn from its dedicated summary.
  • Select Branch: filters the data source (e.g., ND/UFM/PIR where applicable).
  • Select Visit(s): visit names depend on your current type/branch selection. For C-peptide we use the standardized Visit_Clean field.
  • Aliquot (C-peptide): when C-peptide is selected you get an extra aliquot picker, grouped by visit.

“Show all samples” behavior: when you click Show all samples, what the table shows depends on your sample type selection:

  • If no sample type is selected, the table shows all non–C-peptide sample types from the master list.
  • If you selected only C-peptide and nothing else, it shows the full C-peptide summary.
  • If you selected both C-peptide and other types, it concatenates both result sets.
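
The three cases above can be sketched as a small selection function. This is an illustration only: the row layout, the function name and the sample data are hypothetical, and the last branch (other types selected without C-peptide) is an assumption, since that case is not described here.

```python
C_PEPTIDE = "C-peptide"


def show_all_samples(selected_types, master_rows, cpeptide_rows):
    """Return the rows to display for a given sample type selection."""
    selected = set(selected_types)
    if not selected:
        # Nothing selected: all non-C-peptide types from the master list.
        return [r for r in master_rows if r["sample_type"] != C_PEPTIDE]
    if selected == {C_PEPTIDE}:
        # Only C-peptide selected: the full C-peptide summary.
        return list(cpeptide_rows)
    other = [r for r in master_rows if r["sample_type"] in selected - {C_PEPTIDE}]
    if C_PEPTIDE in selected:
        # Both selected: concatenate the two result sets.
        return other + list(cpeptide_rows)
    # ASSUMPTION: only other types selected, so show just those types.
    return other


# Hypothetical aggregated rows.
master = [{"sample_type": "Serum"}, {"sample_type": "Urine"}]
cpep = [{"sample_type": "C-peptide"}]

nothing_selected = show_all_samples([], master, cpep)
both_selected = show_all_samples(["Serum", "C-peptide"], master, cpep)
```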

Outputs

  • Summary charts: unique participants per sample type and total aliquots by sample type, faceted by branch.
  • Samples table: shows visit, branch, sample type, participant counts, and total aliquots. For C-peptide, extra columns appear (participants with 1 / 2 aliquots, selected aliquot name).
🧬

HLA Genotyping Overview

When you select one or more HLA genes in the Data Explorer sidebar, the app shows the number of filtered participants with any genotype typed for the chosen genes, plus Class I / Class II summaries. Genes and columns are inferred dynamically from the HLA dataset.

💡

Tips & troubleshooting

Common tips

  • You can leave Gender, Ethnicity, and Country empty in Data Explorer—the app will use the full dataset for those.
  • In Data Explorer, you must pick at least one Longitudinal Variable before clicking Apply Filters.
  • Value boxes are counts of participants, not raw samples.

If you see “0 matches”

  • Loosen filters (e.g., turn off Complete cases across visits).
  • Broaden age range or remove demographics.
  • In Clinical Explorer, try disabling strict criteria (e.g., IDAA1c ≤ 9, dysglycemia, staging).
  • In Samples Explorer, clear visit/aliquot selections and try again.

Need sample-level detail, data extracts, or have a question? Contact data@innodia.org.