Example Datasets – Profiler Multi-Omics Platform

🌐 GitHub Repository

Access Example Datasets & Tutorials

Our GitHub repository contains ready-to-use example datasets for all supported omics types, along with comprehensive tutorials and documentation to help you get started with Profiler.

🌐 Visit GitHub Repository →

📊 Data Loading Options

Profiler supports multiple ways to load your omics data, from raw mass spectrometry files to structured tabular data. Below are the three main loading methods available.

🔧 RAW Data Conversion

Mass Spectrometry RAW Files

Convert vendor-specific RAW files directly within Profiler.

Supported Vendors:

Waters (.raw)
Thermo Fisher (.raw)
Bruker (.d folders)

Output Formats:

mzML, mzXML, mz5, mzDB

Options:
• Mass range selection
• Peak picking
• Lock mass correction (Waters)

📂 MS Standard Formats

Converted MS Files

Load previously converted mass spectrometry files in standard formats.

Accepted Formats:

.mzML
.mzXML

Features:
• Class-based organization
• Peak height threshold filtering
• Automatic binning & extraction
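
To make the last two features concrete, here is a minimal sketch of peak-height filtering followed by fixed-width m/z binning. It is an illustration only, not Profiler's implementation: the function name `bin_peaks`, the parameters `min_height` and `bin_width`, and the choice to sum intensities per bin are all assumptions.

```python
from collections import defaultdict


def bin_peaks(peaks, min_height=100.0, bin_width=0.01):
    """Drop peaks below min_height, then sum intensities per m/z bin.

    peaks is an iterable of (mz, intensity) pairs. Each peak is assigned
    to the nearest multiple of bin_width (hypothetical binning rule).
    """
    bins = defaultdict(float)
    for mz, intensity in peaks:
        if intensity < min_height:
            continue  # peak height threshold filtering
        index = round(mz / bin_width)  # nearest bin centre
        bins[index] += intensity
    # Report bin centres in m/z units, sorted by m/z.
    return {round(i * bin_width, 6): total for i, total in sorted(bins.items())}


spectrum = [(100.2, 500.0), (100.4, 300.0), (101.7, 50.0), (250.1, 1000.0)]
binned = bin_peaks(spectrum, min_height=100.0, bin_width=1.0)
print(binned)
```

With a 1.0 m/z bin width, the two peaks near m/z 100 fall into the same bin and the 50.0-intensity peak is discarded.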

🗂️ Tabular Data

Structured Omics Data

Load pre-processed tabular data from various sources and software.

Accepted Formats:

CSV, XLSX, TXT, TSV

Native Support:

DIA-NN Protein Groups
MaxQuant Output
Perseus Files

📋 Expected Data Formats by Omics Type

1. Proteomics Data

Standard Tabular Format:

| Class     | Protein1 | Protein2 | Protein3 | ... |
|-----------|----------|----------|----------|-----|
| Control   | 1257.5   | 843.2    | 2341.8   | ... |
| Control   | 1189.3   | 891.5    | 2298.4   | ... |
| Tumor     | 2456.7   | 421.9    | 3892.1   | ... |
| Tumor     | 2389.1   | 398.6    | 3756.8   | ... |

Requirements:

First column must be named Class
Feature names (proteins, genes) in column headers
Numeric intensity/abundance values
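
Before uploading, these three requirements can be checked locally. The sketch below uses only the Python standard library; the function name `validate_profiler_table` and its messages are hypothetical and not part of Profiler. Empty cells are accepted because missing values are handled in preprocessing.

```python
import csv
import io


def validate_profiler_table(text, delimiter=","):
    """Return a list of problems found in a Class-first tabular file."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows or not rows[0]:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    if header[0].strip() != "Class":
        problems.append(f"first column is {header[0]!r}, expected 'Class'")
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} cells, got {len(row)}"
            )
            continue
        for name, cell in zip(header[1:], row[1:]):
            if cell.strip() == "":
                continue  # missing values are allowed
            try:
                float(cell)
            except ValueError:
                problems.append(
                    f"line {line_no}: {name} has non-numeric value {cell!r}"
                )
    return problems


example = """Class,Protein1,Protein2,Protein3
Control,1257.5,843.2,2341.8
Control,1189.3,891.5,2298.4
Tumor,2456.7,421.9,3892.1
Tumor,2389.1,,3756.8
"""
print(validate_profiler_table(example))  # prints []
```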

🎯 DIA-NN Protein Groups (Native Support)

Profiler natively supports DIA-NN protein group files. Upload the file and select either gene names or protein names as feature identifiers. Profiler then structures the data automatically and adds the Class column.

🎯 MaxQuant Output (Native Support)

MaxQuant proteinGroups.txt files are directly supported. Choose the feature identifier:

Gene names column
Protein names column

Formatting and Class assignment are then handled automatically.
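
For readers who want to see what such a conversion involves, the sketch below reshapes a proteinGroups.txt table into the Class-first layout by hand. It is not Profiler's importer. It assumes the quantitative columns are named `LFQ intensity <sample>` and that the identifier columns are `Gene names` and `Protein names`, as in typical MaxQuant output; the `class_of` mapping from sample name to class label is a hypothetical input that the user must supply.

```python
import csv
import io


def maxquant_to_class_table(text, class_of, id_column="Gene names",
                            prefix="LFQ intensity "):
    """Turn tab-separated proteinGroups text into rows of a Class-first table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Sample names are whatever follows the intensity prefix in the header.
    samples = [c[len(prefix):] for c in reader.fieldnames if c.startswith(prefix)]
    features = []
    values = {sample: [] for sample in samples}
    for row in reader:
        features.append(row[id_column])
        for sample in samples:
            values[sample].append(row[prefix + sample])
    # One output row per sample: class label first, then one cell per feature.
    table = [["Class"] + features]
    for sample in samples:
        table.append([class_of[sample]] + values[sample])
    return table


protein_groups = (
    "Protein names\tGene names\tLFQ intensity A1\tLFQ intensity B1\n"
    "Albumin\tALB\t100\t200\n"
    "Transferrin\tTF\t300\t400\n"
)
table = maxquant_to_class_table(protein_groups, {"A1": "Control", "B1": "Tumor"})
for line in table:
    print(line)
```

Passing `id_column="Protein names"` switches the feature identifiers, mirroring the choice described above.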

🎯 Perseus Files (Native Support)

Perseus matrix files are supported. Select the feature identifier:

T: Gene names row
T: Protein names row

The matrix is then converted to Profiler format automatically.

2. Metabolomics & Lipidomics Data

Expected Format:

| Class        | Metabolite1 | Lipid_PC_34:1 | Ion_m/z_542.3 | ... |
|--------------|-------------|---------------|---------------|-----|
| Healthy      | 5423.1      | 8932.4        | 1234.5        | ... |
| Healthy      | 5189.7      | 8745.2        | 1198.3        | ... |
| Disease      | 7891.2      | 4532.1        | 2341.7        | ... |
| Disease      | 7654.3      | 4389.6        | 2298.9        | ... |

Supported identifiers:

Metabolite names (e.g., Glucose, Lactate)
Lipid nomenclature (e.g., PC_34:1, TAG_52:3)
m/z values (e.g., mz_542.3201)
Retention time + m/z (e.g., RT_12.5_mz_542.3)
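
The identifier styles listed above can be told apart with a few regular expressions. The patterns below are a sketch inferred from the four examples only (`Glucose`, `PC_34:1`, `mz_542.3201`, `RT_12.5_mz_542.3`); they are assumptions, not a specification of what Profiler parses, and the function name `classify_identifier` is hypothetical.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"
RT_MZ = re.compile(rf"^RT_(?P<rt>{NUMBER})_mz_(?P<mz>{NUMBER})$")
MZ = re.compile(rf"^mz_(?P<mz>{NUMBER})$")
LIPID = re.compile(r"^(?P<cls>[A-Za-z]+)_(?P<carbons>\d+):(?P<bonds>\d+)$")


def classify_identifier(name):
    """Return (kind, details) for a feature column header."""
    match = RT_MZ.match(name)
    if match:
        return "rt_mz", {"rt": float(match["rt"]), "mz": float(match["mz"])}
    match = MZ.match(name)
    if match:
        return "mz", {"mz": float(match["mz"])}
    match = LIPID.match(name)
    if match:
        return "lipid", {
            "lipid_class": match["cls"],
            "carbons": int(match["carbons"]),
            "double_bonds": int(match["bonds"]),
        }
    # Anything else is treated as a plain metabolite name.
    return "name", {"name": name}


for header in ["Glucose", "PC_34:1", "mz_542.3201", "RT_12.5_mz_542.3"]:
    print(header, classify_identifier(header))
```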

3. Transcriptomics (RNA-seq, Gene Expression)

Expected Format:

| Class     | GENE1  | GENE2  | GENE3  | ... |
|-----------|--------|--------|--------|-----|
| WT        | 145.2  | 89.7   | 523.4  | ... |
| WT        | 132.8  | 94.3   | 498.1  | ... |
| Mutant    | 78.4   | 156.9  | 234.7  | ... |
| Mutant    | 82.1   | 149.2  | 221.5  | ... |

Accepted values:

Raw read counts
TPM (Transcripts Per Million)
FPKM/RPKM values
Normalized expression values
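
As a reminder of how two of these value types relate, the sketch below converts raw read counts to TPM using the standard definition: divide each count by the gene length in kilobases, then scale so the values sum to one million. The gene lengths and the function name `counts_to_tpm` are assumptions for illustration; Profiler accepts the values as provided and this is not a description of its internals.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert {gene: raw_count} to {gene: TPM} given {gene: length in bp}."""
    # Reads per kilobase of transcript.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that all values in the sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


tpm = counts_to_tpm(
    {"GENE1": 100, "GENE2": 200},
    {"GENE1": 1000, "GENE2": 2000},  # hypothetical gene lengths
)
print(tpm)
```

Here GENE2 has twice the reads but is twice as long, so both genes receive the same TPM.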

4. Survival Analysis Data

Kaplan-Meier Format:

| Overall survival | State | Class      |
|------------------|-------|------------|
| 12               | 1     | Treatment  |
| 24               | 0     | Treatment  |
| 8                | 1     | Control    |
| 36               | 0     | Control    |

Required columns:

Overall survival: Time to event or censoring, in one consistent unit (days, months, or years)
State: Event indicator (0 = censored, 1 = event occurred)
Class: Group/condition for comparison
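
To show how the three columns are used, the sketch below computes the Kaplan-Meier product-limit estimate per Class for the example table above. It is a plain-Python illustration of the standard estimator, not Profiler's code; the function name `kaplan_meier` is hypothetical.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Collect every subject whose time equals t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]  # State: 1 = event, 0 = censored
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


# (Overall survival, State, Class) rows from the example table.
rows = [(12, 1, "Treatment"), (24, 0, "Treatment"),
        (8, 1, "Control"), (36, 0, "Control")]
groups = defaultdict(lambda: ([], []))
for time, state, label in rows:
    groups[label][0].append(time)
    groups[label][1].append(state)
curves = {label: kaplan_meier(t, e) for label, (t, e) in groups.items()}
print(curves)  # {'Treatment': [(12, 0.5)], 'Control': [(8, 0.5)]}
```

Censored rows (State = 0) do not lower the curve, but they do reduce the number of subjects at risk for later time points.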

Cox Regression Format:

| Overall survival | State | Age | BMI  | Protein_X | Lipid_Y | ... |
|------------------|-------|-----|------|-----------|---------|-----|
| 18               | 1     | 67  | 28.5 | 1234.5    | 892.3   | ... |
| 32               | 0     | 54  | 24.1 | 2341.8    | 1023.7  | ... |
| 9                | 1     | 72  | 31.2 | 987.3     | 654.2   | ... |

Required and optional columns:

Overall survival and State (required)
Any additional covariates (optional): clinical variables, omics features, etc.
Covariates may be numeric or categorical
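
A quick pre-upload check for this layout might confirm that the two required columns exist and report which covariates look numeric and which look categorical. The sketch below is an assumption-laden illustration: `describe_cox_columns` is a hypothetical helper, and the `Sex` column is invented for the example to show a categorical covariate.

```python
REQUIRED = ("Overall survival", "State")


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def describe_cox_columns(header, rows):
    """Check required columns and label each covariate numeric or categorical."""
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    kinds = {}
    for idx, name in enumerate(header):
        if name in REQUIRED:
            continue
        values = [row[idx] for row in rows if row[idx] != ""]  # skip missing cells
        numeric = all(_is_number(value) for value in values)
        kinds[name] = "numeric" if numeric else "categorical"
    return kinds


header = ["Overall survival", "State", "Age", "Sex", "Protein_X"]
rows = [
    ["18", "1", "67", "F", "1234.5"],
    ["32", "0", "54", "M", "2341.8"],
    ["9", "1", "72", "F", "987.3"],
]
kinds = describe_cox_columns(header, rows)
print(kinds)
```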

5. Multi-Omics Integration

Integrated Data Format:

| Class   | Protein1 | Protein2 | Metabolite1 | Lipid1  | Gene1 | ... |
|---------|----------|----------|-------------|---------|-------|-----|
| Sample1 | 1257.5   | 843.2    | 5423.1      | 8932.4  | 145.2 | ... |
| Sample2 | 2456.7   | 421.9    | 7891.2      | 4532.1  | 78.4  | ... |

Integration approach:

Combine features from multiple omics datasets
Ensure sample alignment across datasets
Maintain unique feature names (e.g., prefix with data type)
Each sample must have the same Class label in every dataset
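
The integration steps can be sketched as a small merge routine. Everything here is an illustrative assumption rather than Profiler behaviour: the input layout (`{prefix: {sample_id: (class_label, {feature: value})}}`), the function name `integrate`, and the decision to keep only samples present in every dataset.

```python
def integrate(datasets):
    """Merge per-omics tables into one table keyed by sample ID.

    datasets maps a data-type prefix to {sample_id: (class_label, features)}.
    """
    if not datasets:
        return {}
    # Sample alignment: keep only samples measured in every dataset.
    shared = set.intersection(*(set(table) for table in datasets.values()))
    merged = {}
    for sample in sorted(shared):
        labels = {table[sample][0] for table in datasets.values()}
        if len(labels) != 1:
            raise ValueError(f"{sample}: conflicting Class labels {sorted(labels)}")
        features = {}
        for prefix, table in datasets.items():
            for name, value in table[sample][1].items():
                # Prefix with the data type so feature names stay unique.
                features[f"{prefix}_{name}"] = value
        merged[sample] = (labels.pop(), features)
    return merged


proteomics = {"S1": ("Control", {"P1": 1257.5}), "S2": ("Tumor", {"P1": 2456.7})}
metabolomics = {"S1": ("Control", {"M1": 5423.1}), "S3": ("Tumor", {"M1": 7891.2})}
merged = integrate({"prot": proteomics, "metab": metabolomics})
print(merged)
```

Only S1 appears in both inputs, so it is the only sample in the merged result.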

💡 Important Notes

✅ General Requirements:

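The Class column is mandatory for all tabular data except the Cox regression format, which requires Overall survival and State instead
Feature names must be in column headers (not rows)
All feature values should be numeric; only the Class column and categorical Cox covariates may contain text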
Missing values are supported (will be handled in preprocessing)
Sample names can be in row index or a separate column
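
Many exports place features in rows and samples in columns, which is the opposite of the required layout. The sketch below transposes such a table and adds the Class column. The helper name `transpose_to_profiler` and the `classes` mapping from sample name to class label are hypothetical.

```python
def transpose_to_profiler(rows, classes):
    """Convert a features-in-rows table to the Class-first layout.

    rows[0] is the header: a label cell followed by sample names.
    Every later row is a feature name followed by one value per sample.
    """
    header, body = rows[0], rows[1:]
    samples = header[1:]
    features = [row[0] for row in body]
    table = [["Class"] + features]
    for col, sample in enumerate(samples, start=1):
        # One output row per sample: class label, then that sample's column.
        table.append([classes[sample]] + [row[col] for row in body])
    return table


feature_rows = [
    ["Feature", "S1", "S2"],
    ["Protein1", 1257.5, 2456.7],
    ["Protein2", 843.2, 421.9],
]
converted = transpose_to_profiler(feature_rows, {"S1": "Control", "S2": "Tumor"})
for line in converted:
    print(line)
```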

🔧 Automatic Handling:

DIA-NN, MaxQuant, Perseus: Automatically formatted by Profiler
Missing values: Multiple imputation methods available
Normalization: 7 different methods to choose from
Batch effects: neuroCombat correction available

📥 Download Examples:

Visit our GitHub repository to download example datasets for each omics type, including properly formatted files ready to use with Profiler.

📚 Download Example Datasets →
