ND: Newly Diagnosed Participants. All ND participants had a baseline assessment within 6 weeks from diagnosis of T1D (based on the ADA criteria, defined as the time at which insulin therapy was started).
UFM: Unaffected Family Members. Participants who have a first-degree relative with T1D and who tested positive for islet autoantibodies (IAb+).
Summary of Available Data & Samples
Description of clinical variables, types of biological samples, and omics data collected.
Demographic and clinical data were collected as part of the INNODIA project. The eCRF comprises the following variables:
Weight
Height
BMI
BMI SDS
Age
Ethnicity
Country
Date of visit
Age at visit
Weeks from diagnosis (only for ND)
Sample date
Autoantibodies
Glucose reading
HbA1C values
Fasting C-peptide
Fasting glucose
Fasting C-peptide/Glucose ratio
Insulin average daily dose
Insulin dose/kg
MMTT and OGTT metadata (compliance with the MMTT/OGTT protocol)
MMTT and OGTT C-peptide and glucose values
MMTT AUC C-peptide
MMTT AUC Glucose
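The AUC variables summarise the serial MMTT measurements in a single number. The source does not state how the AUC was computed, so the following is only a minimal sketch that assumes the trapezoidal rule over the sampling times; the function name, time points, and values are made up for illustration.

```python
def trapezoid_auc(times_min, values):
    """Area under the curve by the trapezoidal rule.

    times_min: sampling times in minutes (strictly increasing)
    values:    measurements at those times (e.g. C-peptide)
    """
    if len(times_min) != len(values) or len(times_min) < 2:
        raise ValueError("need at least two paired time/value points")
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of one trapezoid: width times the mean of the two heights
        auc += dt * (values[i] + values[i - 1]) / 2.0
    return auc


# Hypothetical MMTT time points (minutes) and C-peptide values
times = [0, 30, 60, 90, 120]
cpep = [200.0, 400.0, 600.0, 500.0, 300.0]

auc = trapezoid_auc(times, cpep)
# Dividing by the test duration gives a time-normalised (mean) level
mean_level = auc / (times[-1] - times[0])
```

Dividing the AUC by the test duration is a common way to compare tests of different length, but whether the stored AUC values are normalised is not stated here.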
All available sample types are listed below with their collection tubes and aliquot volumes:
Serum: FluidX, 0.5 ml/aliquot
EDTA Plasma: FluidX, 0.2 ml/aliquot
DNA: FluidX (from EDTA plasma), 0.5 ml/aliquot
Lithium-heparin Plasma: FluidX, 0.3 ml/aliquot
Urine: FluidX, 1 ml/aliquot
Whole Blood: PAXgene, 10 ml/aliquot
Stool: OMNIgene-GUT, 10 ml/aliquot
All sample types follow INNODIA SOPs. See below for summaries:
Immunomic data: Flow cytometry data as raw FCS files. Extracted cell population counts based on manual gating are available as CSV or Excel files. Gating strategy information is saved as pictures embedded in PDF files.
Genotyping data: Raw genotyping data in PLINK or VCF format and HLA data.
Lipidomic data: The plasma lipidomics data contain a total of 403 molecular lipids from major lipid classes such as glycerolipids, phospholipids and sphingolipids. The samples were analysed at Steno Diabetes Center Copenhagen with liquid chromatography coupled to quadrupole time-of-flight mass spectrometry (UHPLC-QToF-MS) in two complementary analyses using the positive and negative ion modes (detecting 260 and 143 lipids, respectively).
Metabolomic data: The metabolomics data contain 81 metabolites from major metabolite classes such as amino acids, free fatty acids and molecules in energy metabolism. The samples were analysed at Steno Diabetes Center Copenhagen with two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GCxGC-ToF-MS).
Metagenomic data: Stool samples were collected from a subset of individuals participating in the INNODIA Natural History Study, including 98 people newly diagnosed with type 1 diabetes and 198 autoantibody-positive family members. Samples were collected using tubes that preserve DNA at room temperature and then stored at –80°C. Bulk microbial DNA (that is, all DNA from bacteria and other microbes in the sample) was extracted and assessed for quality. The DNA was sequenced using Illumina technology, which generates millions of short DNA reads. These sequences were compared to a large reference database to identify which microbial species were present and what functions they might perform. Both the raw sequencing reads and the processed microbiome profiles are available for downstream analysis.
Proteomic data: The proteomics data for INNODIA serum samples were generated at the University of Turku using targeted mass spectrometry. This dataset includes analyses from two consecutive groups of individuals newly diagnosed (ND) with type 1 diabetes: "the first 100" and "the next 150". Additionally, there are data from 460 unaffected first-degree relatives (UFM). The data from the ND individuals include the available longitudinal samples collected within 6 weeks of diagnosis and then at 3, 6 and 12 months; the UFM data consist of a single sample per individual. There are also data from three QC samples that were measured periodically.
For both selected groups, liquid chromatography (LC) coupled with mass spectrometry (MS) was used for selected reaction monitoring (SRM) analysis. With this approach, measurements were confined to pre-selected targets. As part of a follow-up validation study, the "next 150" measurements were conducted using a different, faster LC system, focusing on fewer protein targets (70 out of 105, plus 7 additional). In total, data were recorded for 250 peptides, with 130 peptides common to both datasets.
Transcriptomic data: The samples analysed were from "the first 100" and "the next 150" newly diagnosed (ND) INNODIA cohorts. The analysis was carried out at the University of Turku, Finland. The first 100 ND sample cohort included 94 patients. Whole blood PAXgene samples were collected at visit 1, within 6 weeks of diagnosis (baseline), and visit 4 (at 12 months after diagnosis), with 46 patients having samples at both time points. The next 150 ND cohort included 155 patients with samples collected at baseline and 12 months after diagnosis. Additionally, the analysis included four whole blood PAXgene INNODIA QC samples, collected from two anonymous donors as per INNODIA SOP.
For both cohorts, total RNA, including small RNA fractions, was purified using the PAXgene Blood miRNA Kit (PreAnalytiX/QIAGEN, Cat# 763134) following the protocol supplied by the kit manufacturer. Library preparation and sequencing were carried out at the Finnish Functional Genomics Centre (https://bioscience.fi/functional-genomics/services/). Before starting library preparation, ERCC Spike-in control Mix 1 (Invitrogen P/N 4456739) was added to 100 ng RNA according to the kit’s protocol. RNA-seq libraries were prepared using the TruSeq stranded mRNA HT kit and protocol # 15031047 (Illumina). Pooled libraries were sequenced on an Illumina NovaSeq 6000 instrument, using 2 × 50 bp (first 100 cohort) or 2 × 100 bp (next 150 cohort) paired-end sequencing with about 30 million single-end reads per sample.
mi/smallRNA data: The study design involved the analysis of two cohorts of Type 1 Diabetes (T1DM) individuals: an initial screening cohort, the INNODIA first cohort, consisting of n=115 T1DM individuals, and a validation cohort, the INNODIA second cohort, consisting of n=147 T1DM individuals. All subjects were followed up with programmed visits at 3 (visit 2), 6 (visit 3) and 12 months (visit 4) after clinical diagnosis of T1DM. In both cohorts, blood samples were collected to isolate EDTA plasma and analysed at baseline (visit 1). The collected blood samples were processed within 2 hours of blood draw and centrifuged to separate plasma from contaminant cells and platelets. The plasma samples were then aliquoted (200 μL) and stored at -80°C in a centralized biobank (see SOP plasma microRNAs version 5). For the INNODIA first cohort, the plasma samples were subjected to miRNA profiling using two different sequencing platforms: (A) HTG-miRNA EdgeSeq on the Illumina NextSeq550 platform (High Output kit v2 cat. FC-404-2005) and (B) Small RNA-seq using the QIAseq miRNA Library Kit on the Illumina NovaSeq 6000 platform [NovaSeq 6000 SP Reagent Kit (100 cycles) cat. 20027464, NovaSeq XP 2-Lane Kit cat. 20021664, Illumina] using the XP protocol applying 75x1 single reads. For the INNODIA second cohort, the plasma samples were analysed exclusively with Small RNA-seq using the QIAseq miRNA Library Kit and Illumina NovaSeq 6000 sequencing. HTG-miRNA EdgeSeq is a targeted RNAse-protection-based assay designed to detect a total of 2083 miRNAs (miRbase v.21). Small RNA-seq using the QIAseq miRNA Library Kit allows the unbiased detection of virtually all small RNAs (< 50 nt) included in the plasma sample.
In both the INNODIA first and second cohort, a subset of samples were included as duplicates. Files are reported as sequencing FASTQ files for both Small RNA-seq and HTG-miRNA EdgeSeq.
CGM data: Continuous glucose monitoring data will be hosted in the INNODIA database. Additionally, CSV files with custom versions of the database extracts will be available.
C-peptide data: This will include plasma C-peptide (fasting and serial C-peptide during MMTT and OGTT) as well as dried blood spot C-peptide. The Core Biochemical Assay Laboratory (CBAL) in Cambridge was the core laboratory for all plasma C-peptide measurements as well as for dried blood spot (DBS) C-peptide analyses.
Plasma C-peptide: Fasting plasma C-peptide and serial C-peptide samples taken during MMTT/OGTT were assayed in singleton on a DiaSorin Liaison XL automated immunoassay analyser using a sandwich chemiluminescence immunoassay (Diasorin S.p.A, 13040 Saluggia [VC], Italy).
DBS C-peptide: DBS C-peptide was analysed using an in-house assay based on the Meso Scale Discovery (MSD) electrochemical immunoassay technology. Four 3.2 mm dried blood spot discs from DBS quality controls and unknowns were eluted in assay buffer overnight at +2–8°C with shaking and brought to room temperature before proceeding with the immunoassay. Commercial liquid calibrator, serum quality controls and the eluted DBS quality controls and unknowns were added in duplicate to an MSD standard bind plate coated with a mouse monoclonal anti-C-peptide capture antibody. After incubation at room temperature and washing, a biotinylated mouse monoclonal anti-C-peptide detector antibody was added. After a second incubation and wash, streptavidin Sulpho-TAG was added. After a third incubation and wash, MSD Read Buffer T (diluted 1:2) was added and the plate was read using the MSD s600 reader. The C-peptide concentration was calculated using MSD Discovery Workbench software. For DBS samples, this measured concentration was converted to a plasma equivalent using an in-house derived factor.
INNODIA Data & Samples Viewer — Help
This page explains how to use each tab, what the counts mean, and how filters interact. If you need sample-level or participant-level details, please email data@innodia.org.
Quick start: Choose a tab, set filters if needed, then click Apply (or Show all samples in Samples Explorer). In Data Explorer the only required selection is at least one Longitudinal Variable—demographics are optional.
📊
Data Explorer
High-level counts of how many participants have the selected longitudinal variables available at each visit. You can switch cohort at the top of the app.
Filters & inputs
Choose Data Type: ND (Newly Diagnosed) or UFM (Unaffected Family Members).
Age Range: limits the cohort by age at consent.
Gender / Ethnicity / Country: optional. If you don’t select them, the app uses the full dataset for those dimensions.
Longitudinal Variables (required): pick one or more variables to count (e.g., HbA1c, fasting measures, OGTT/MMTT timepoints, autoantibodies, anthropometrics). The list adapts to the current cohort and to what actually exists in the data.
Complete cases across visits (optional): if enabled, visit counts are cumulative, i.e., a participant is counted at visit k only if they met the criteria for all visits up to k.
HLA Genotypes: optional per-gene summary (Class I & Class II) for the participants that match your filters.
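The difference between the default per-visit counts and the cumulative "Complete cases across visits" mode can be illustrated with a small sketch. This is not the app's code: the function name and the data layout (a mapping from participant ID to the set of visits with complete data) are assumptions made for the example.

```python
def counts_per_visit(availability, visits, complete_cases=False):
    """Count participants with complete data at each visit.

    availability:   dict mapping participant id -> set of visits at which
                    all selected variables are present (hypothetical layout)
    visits:         ordered list of visit numbers
    complete_cases: if True, a participant counts at visit k only if they
                    also qualified at every earlier visit
    """
    counts = {}
    for idx, visit in enumerate(visits):
        # Cumulative mode requires all visits up to and including this one
        required = visits[: idx + 1] if complete_cases else [visit]
        counts[visit] = sum(
            1 for seen in availability.values()
            if all(v in seen for v in required)
        )
    return counts


# Made-up example: three participants, three visits
availability = {
    "P1": {1, 2, 3},
    "P2": {1, 3},  # no complete data at visit 2
    "P3": {2, 3},  # no complete data at visit 1
}
per_visit = counts_per_visit(availability, [1, 2, 3])
cumulative = counts_per_visit(availability, [1, 2, 3], complete_cases=True)
```

In this example all three participants are counted at visit 3 in the default mode, but only P1 is counted there in cumulative mode, which is why enabling the option can only lower the counts.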
How to read the counts
Each value box shows individuals with complete data for the selected variables at that visit.
An AAb badge marks visits where autoantibodies are measured in that cohort—ND: visits 1 and 4; UFM: visits 1, 3, 5, 6, 7.
The blue info banner lists the exact criteria used for the counts you’re seeing (cohort, mode, age range, demographics if any, variables, and AAb visit note when relevant).
Note on autoantibodies: the app knows which visits include AAb for each cohort. You don’t need to select AAb variables for the badge to appear—it reflects cohort-specific visit availability.
Omics availability
The Omics Availability tab summarizes, for your filtered participants, how many have each omic type by visit (RNA-seq, miRNA, Proteomics, Metabolomics, Lipidomics, Metagenomics). It groups rows by ND/UFM when both groups are present.
🩺
Clinical Explorer
Counts of participants that satisfy clinical criteria per visit, with optional “complete across visits” mode. Criteria are applied per visit and differ slightly between cohorts.
Filters & criteria
Common
Age at consent: range slider
Gender / Ethnicity / Country: optional
HbA1c (mmol/mol): optional range
Complete cases across visits: optional cumulative mode
UFM-specific
Require OGTT Glucose / C-peptide present: optional
Dysglycemia per visit: optional
Stage per visit: I, II, III
Stages combine an autoantibody criterion (AAb count at or above a threshold; the default logic uses ≥2 when staging is active) with OGTT glucose rules based on fasting glucose (the mean of the −20 and 0 min values) and the 120-min value.
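As a rough illustration of how such a staging rule can be combined, the sketch below assigns a stage from the AAb count and the OGTT glucose values. The glucose cut-offs are not given on this page; the defaults used here are ADA-style OGTT thresholds in mmol/L and are an assumption, as are the function and parameter names.

```python
def ufm_stage(aab_count, glucose_m20, glucose_0, glucose_120,
              min_aab=2,
              fasting_dys=5.6, fasting_diab=7.0,
              two_h_dys=7.8, two_h_diab=11.1):
    """Return "I", "II", "III", or None when the AAb criterion is not met.

    Glucose values are in mmol/L. All cut-offs are ASSUMED defaults
    (ADA-style); they are not taken from the app.
    """
    if aab_count < min_aab:
        return None
    # Fasting glucose is the mean of the -20 and 0 minute samples
    fasting = (glucose_m20 + glucose_0) / 2.0
    if fasting >= fasting_diab or glucose_120 >= two_h_diab:
        return "III"  # glucose in the diabetic range
    if fasting >= fasting_dys or glucose_120 >= two_h_dys:
        return "II"   # dysglycemia
    return "I"        # AAb criterion met, normal glucose


# Made-up values for illustration
stage_a = ufm_stage(2, 5.0, 5.2, 6.5)
stage_b = ufm_stage(3, 5.8, 6.0, 7.0)
stage_c = ufm_stage(2, 5.0, 5.0, 11.5)
stage_d = ufm_stage(1, 5.0, 5.0, 6.0)
```

Keeping the thresholds as parameters makes it easy to check how sensitive the per-visit counts are to the exact cut-offs.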
ND-specific
Require MMTT Glucose / C-peptide present: optional
Insulin dose (per kg): optional range
IDAA1c ≤ 9: optional
Fasting glucose / C-peptide available: optional
IDAA1c is computed from HbA1c and insulin dose per kg; fasting glucose is derived from C-peptide divided by the C-pep/glucose ratio when available.
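The page does not spell out the formulas behind these derived values. The sketch below assumes the widely used definition of IDAA1c (HbA1c in % plus 4 times the daily insulin dose in U/kg) and the IFCC-to-NGSP master equation to convert HbA1c from mmol/mol to %. Function names, example values, and the units of the C-peptide/glucose ratio are illustrative assumptions.

```python
def hba1c_mmol_to_percent(hba1c_mmol_mol):
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%)."""
    return 0.09148 * hba1c_mmol_mol + 2.152


def idaa1c(hba1c_mmol_mol, insulin_u_per_kg_day):
    """Insulin-dose-adjusted HbA1c (ASSUMED standard definition)."""
    return hba1c_mmol_to_percent(hba1c_mmol_mol) + 4.0 * insulin_u_per_kg_day


def fasting_glucose_from_ratio(c_peptide, cpep_glucose_ratio):
    """Recover fasting glucose as C-peptide / (C-peptide/glucose ratio).

    Returns None when either input is missing or the ratio is zero.
    """
    if c_peptide is None or not cpep_glucose_ratio:
        return None
    return c_peptide / cpep_glucose_ratio


# Made-up example: HbA1c 53 mmol/mol, insulin dose 0.4 U/kg/day
score = idaa1c(53, 0.4)
passes_filter = score <= 9  # mirrors the "IDAA1c ≤ 9" option

glucose = fasting_glucose_from_ratio(300.0, 60.0)
missing = fasting_glucose_from_ratio(300.0, None)
```

Because the HbA1c filter in the app is expressed in mmol/mol, the unit conversion is the step most likely to cause a mismatch if you try to reproduce the counts yourself.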
Reading the output
Value boxes report the number of participants that meet all active criteria at each visit.
Enable Complete cases across visits to count participants cumulatively (Visit 3 requires meeting Visits 1–3).
When AAb filters are active, the banner indicates at which visits the AAb criteria were applied and at which they were not.
🧪
Samples Explorer
Explore aggregated biobank availability by sample type, visit and branch—including a dedicated C-peptide summary.
Filters & actions
Select Sample Type(s): choose one or more. The list includes a special synthesized type C-peptide from its dedicated summary.
Select Branch: filters the data source (e.g., ND/UFM/PIR where applicable).
Select Visit(s): visit names depend on your current type/branch selection. For C-peptide we use the standardized Visit_Clean field.
Aliquot (C-peptide): when C-peptide is selected you get an extra aliquot picker, grouped by visit.
“Show all samples” behavior: if you click Show all samples with no filters selected:
If no sample type is selected, the table shows all non–C-peptide sample types from the master list.
If you selected only C-peptide and nothing else, it shows the full C-peptide summary.
If you selected both C-peptide and other types, it concatenates both result sets.
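The three cases above amount to a small piece of branching logic. The following sketch is one way to express it, not the app's implementation: the row layout (dictionaries with a sample_type key) and the function name are hypothetical, and the handling of a selection that contains only non-C-peptide types is an assumption, since that case is not described here.

```python
C_PEPTIDE = "C-peptide"


def show_all_samples(selected_types, master_rows, cpeptide_rows):
    """Rows to display when 'Show all samples' is clicked.

    master_rows / cpeptide_rows: lists of dicts with a 'sample_type' key
    (hypothetical layout for the master list and the C-peptide summary).
    """
    selected = set(selected_types)
    if not selected:
        # Nothing selected: every non-C-peptide type from the master list
        return [r for r in master_rows if r["sample_type"] != C_PEPTIDE]
    if selected == {C_PEPTIDE}:
        # Only C-peptide selected: the full C-peptide summary
        return list(cpeptide_rows)
    # Other types selected (ASSUMED: filter the master list by them),
    # with the C-peptide summary appended when it is also selected
    other = selected - {C_PEPTIDE}
    rows = [r for r in master_rows if r["sample_type"] in other]
    if C_PEPTIDE in selected:
        rows = rows + list(cpeptide_rows)
    return rows


# Made-up rows for illustration
master = [
    {"sample_type": "Serum", "participants": 10},
    {"sample_type": "Urine", "participants": 5},
]
cpep = [{"sample_type": "C-peptide", "participants": 7}]

nothing_selected = show_all_samples([], master, cpep)
only_cpep = show_all_samples(["C-peptide"], master, cpep)
mixed = show_all_samples(["C-peptide", "Serum"], master, cpep)
```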
Outputs
Summary charts: unique participants per sample type and total aliquots by sample type, faceted by branch.
Samples table: shows visit, branch, sample type, participant counts, and total aliquots. For C-peptide, extra columns appear (participants with 1 / 2 aliquots, selected aliquot name).
🧬
HLA Genotyping Overview
When you select one or more HLA genes in the Data Explorer sidebar, the app shows the number of filtered participants with any genotype typed for the chosen genes, plus Class I / Class II summaries. Genes and columns are inferred dynamically from the HLA dataset.
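The count described above ("any genotype typed for the chosen genes") can be sketched as follows. The data layout, function name, and genotype strings are assumptions for illustration only; the actual HLA dataset may be organised differently.

```python
def count_typed(hla_rows, participant_ids, genes):
    """Count participants with a genotype for at least one chosen gene.

    hla_rows:        dict participant id -> dict gene -> genotype string,
                     with None for an untyped gene (hypothetical layout)
    participant_ids: participants that passed the current filters
    genes:           HLA genes selected in the sidebar
    """
    wanted = set(participant_ids)
    return sum(
        1 for pid, genotypes in hla_rows.items()
        if pid in wanted and any(genotypes.get(g) for g in genes)
    )


# Made-up example data
hla = {
    "P1": {"HLA-A": "01:01/02:01", "HLA-DRB1": None},
    "P2": {"HLA-A": None, "HLA-DRB1": None},
    "P3": {"HLA-DRB1": "03:01/04:01"},
}
n_a = count_typed(hla, ["P1", "P2", "P3"], ["HLA-A"])
n_any = count_typed(hla, ["P1", "P2", "P3"], ["HLA-A", "HLA-DRB1"])
n_filtered = count_typed(hla, ["P2", "P3"], ["HLA-DRB1"])
```

Note that a participant typed for only one of several selected genes is still counted, which matches the "any genotype" wording.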
💡
Tips & troubleshooting
Common tips
You can leave Gender, Ethnicity, and Country empty in Data Explorer—the app will use the full dataset for those.
In Data Explorer, you must pick at least one Longitudinal Variable before clicking Apply Filters.
Value boxes are counts of participants, not raw samples.
If you see “0 matches”
Loosen filters (e.g., turn off Complete cases across visits).