Data Import Guide v1.2 – Profiler Multi-Omics Platform

01 Overview

Profiler v1.2 accepts tabular omics datasets in multiple formats: CSV, TSV, TXT, XLS, XLSX. The sidebar loader automatically detects separators, encodings, and column naming conventions.

v1.2 extends format support to specialised software exports from proteomics (MaxQuant, DIA-NN, Spectronaut, FragPipe, Proteome Discoverer, Progenesis QI, PEAKS Studio, Perseus), transcriptomics (DESeq2/edgeR, Salmon, kallisto, featureCounts, STAR, HTSeq), and metabolomics (MetaboAnalyst, XCMS, MZmine).

02 Column Naming Conventions

2.1 Target / Class Column

You do not need to rename your column to Class. Profiler recognises all of the following:

Column name	Note
Class / class	Standard Profiler name
Target / target	Common in ML datasets
Condition / condition	Common in proteomics/genomics
Label	Generic label column
Group	Group/cohort identifier
Status	Event status (disease/healthy)
Outcome	Clinical outcome

Numeric values in the Class column (age, dose, survival time) automatically activate regression mode across all supervised learning modules.
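The column lookup and the switch to regression mode can be sketched roughly as follows. This is a minimal illustration, not Profiler's actual code: the names come from the table above, while case-insensitive matching, first-match-wins ordering, and the "every non-blank value parses as a number" rule are assumptions, and `find_target_column` and `infer_mode` are hypothetical helpers.

```python
# Names taken from the table above; the matching rules are assumptions.
TARGET_NAMES = ["class", "target", "condition", "label", "group", "status", "outcome"]


def find_target_column(columns):
    """Return the first column whose lower-cased name is a recognised target name."""
    for name in columns:
        if name.strip().lower() in TARGET_NAMES:
            return name
    return None


def is_number(text):
    """True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_mode(values):
    """Guess 'regression' when every non-blank value is numeric, else 'classification'."""
    present = [v for v in values if v.strip() != ""]
    if present and all(is_number(v) for v in present):
        return "regression"
    return "classification"


target = find_target_column(["ID", "Condition", "Protein_A"])
mode_numeric = infer_mode(["2.4", "5.7", ""])
mode_labels = infer_mode(["Cancer", "Healthy"])
print(target, mode_numeric, mode_labels)
```

Under these assumptions, integer-coded class labels such as 0/1 would also be read as numeric, so text labels are the safer choice for classification.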

2.2 Sample ID Column

The sample ID column labels samples in PCA/UMAP tooltips, heatmaps and enrichment tables. If it is absent, Profiler creates Sample_1, Sample_2… automatically.

Column name	Note
ID / id	Standard identifier
SampleID / Sample_ID	Combined forms
SampleName / Name	Text name
Patient / Subject	Clinical datasets
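The fallback to generated IDs could look something like the sketch below. The recognised names come from the table above; the case-insensitive comparison and the `sample_ids` helper are assumptions made for illustration.

```python
# Lower-cased forms of the names in the table above (case handling is an assumption).
ID_NAMES = ["id", "sampleid", "sample_id", "samplename", "name", "patient", "subject"]


def sample_ids(header, rows):
    """Return IDs from a recognised ID column, or generate Sample_1, Sample_2, ..."""
    lowered = [h.strip().lower() for h in header]
    for idx, name in enumerate(lowered):
        if name in ID_NAMES:
            return [row[idx] for row in rows]
    # No ID column found: number the samples in file order, starting at 1.
    return [f"Sample_{i}" for i in range(1, len(rows) + 1)]


with_ids = sample_ids(["SampleID", "Class"], [["S01", "Cancer"], ["S02", "Healthy"]])
generated = sample_ids(["Class", "Protein_A"], [["Cancer", "1.0"], ["Healthy", "2.0"]])
print(with_ids, generated)
```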

2.3 Clinical Metadata: _meta suffix (v1.2)

Any column ending in _meta is treated as clinical metadata — available as alternative targets and for heatmap/PCA colouring, but excluded from the feature matrix.

ID · Class · Protein_A · Protein_B · treatment_meta · age_meta · batch_meta · stage_meta

Column name	Typical use
treatment_meta	Drug arm / treatment group
age_meta	Patient age (numeric or categorical)
stage_meta	Disease stage (I, II, III, IV)
survival_meta	Overall survival time (months)
batch_meta	Batch identifier for ComBat QC
time_meta	Time point for longitudinal data
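As a rough illustration of how the suffix rule separates features from metadata, the sketch below splits a header into the two groups. The `split_columns` helper and its `reserved` parameter are hypothetical; only the `_meta` suffix rule comes from this guide.

```python
def split_columns(header, reserved=("ID", "Class")):
    """Split column names into (features, metadata) using the _meta suffix."""
    metadata = [c for c in header if c.endswith("_meta")]
    # Features are whatever is left after removing reserved and metadata columns.
    features = [c for c in header if c not in reserved and not c.endswith("_meta")]
    return features, metadata


header = ["ID", "Class", "Protein_A", "Protein_B", "treatment_meta", "age_meta", "batch_meta"]
features, metadata = split_columns(header)
print(features)
print(metadata)
```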

03 Supported Formats

3.1 Generic Tabular

Extension	Format	Notes
.csv	Comma-Separated Values	Auto-detected: `, ; \t \|`
.tsv	Tab-Separated Values	Tab detected automatically (fixed in v1.2)
.txt	Plain text table	Any common delimiter
.xlsx / .xls	Excel Workbook	First sheet loaded, openpyxl engine

3.2 Proteomics — Protein Level

Software	Expected file	Auto-detected by
MaxQuant	proteinGroups.txt	Columns starting with `LFQ intensity`
DIA-NN	pg_matrix.tsv	`Protein.Group` column
Spectronaut	ProteinReport.tsv	Any column with `PG.` prefix
FragPipe	combined_protein.tsv	Gene col + `MaxLFQ Intensity` cols
Proteome Discoverer	Proteins.txt	`Accession` + `Abundance: F*`
Perseus	matrix.txt	`T:/N:/C:` prefix rows
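To illustrate a prefix-based signature such as the MaxQuant one, the sketch below picks out the `LFQ intensity` columns and derives a sample name from each. The header is made up, and `lfq_sample_names` is a hypothetical helper; how Profiler processes these columns after detection is not described here.

```python
LFQ_PREFIX = "LFQ intensity "


def lfq_sample_names(header):
    """Map each 'LFQ intensity <sample>' column to its bare sample name."""
    return {
        col: col[len(LFQ_PREFIX):]
        for col in header
        if col.startswith(LFQ_PREFIX)
    }


# Made-up header in the style of a proteinGroups.txt file.
header = ["Protein IDs", "LFQ intensity S01", "LFQ intensity S02", "Score"]
mapping = lfq_sample_names(header)
print(mapping)
```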

3.3 Transcriptomics — RNA-seq

Tool	Expected file	Auto-detected by
DESeq2 / edgeR	counts_matrix.csv	`gene_id` / `gene_name` col
Salmon	quant.sf	Name + TPM cols
kallisto	abundance.tsv	`target_id` + tpm cols
featureCounts	counts.txt	Geneid + Chr cols
STAR	ReadsPerGene.out.tab	4-col + ENS* pattern
HTSeq-count	htseq_counts.txt	2-col + `__summary` rows
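The HTSeq-count signature relies on the summary rows that htseq-count appends with a double-underscore prefix (for example `__no_feature`). A minimal sketch of separating those rows from the gene counts is shown below; the sample text and the `read_htseq_counts` helper are made up for illustration.

```python
import csv
import io

# Made-up two-column text in htseq-count style.
SAMPLE = (
    "ENSG00000000003\t120\n"
    "ENSG00000000005\t0\n"
    "__no_feature\t3051\n"
    "__ambiguous\t210\n"
)


def read_htseq_counts(text):
    """Return (gene counts, summary counters) from htseq-count style text."""
    counts, summary = {}, {}
    for gene_id, value in csv.reader(io.StringIO(text), delimiter="\t"):
        # Rows starting with '__' are summary counters, not genes.
        bucket = summary if gene_id.startswith("__") else counts
        bucket[gene_id] = int(value)
    return counts, summary


counts, summary = read_htseq_counts(SAMPLE)
print(counts)
print(summary)
```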

3.4 Metabolomics

Software	Expected file	Notes
MetaboAnalyst	data_table.csv	Sample-major (rows = samples)
XCMS	feature_table.csv	`row m/z` + `row retention time` columns
MZmine	feature_table.csv	Same as XCMS format

04 Auto-detect Engine

Profiler reads the first 5 rows and checks column signatures against all registered parsers. The detected format appears in the sidebar as "Detected format: …". Override at any time via the dropdown.

Upload (any format) → Read header (first 5 rows) → Match signatures (column patterns) → Format shown (sidebar display) → Parse & load (or override)
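The signature-matching step can be pictured as an ordered registry of column checks. The sketch below covers only a handful of the signatures listed in section 03 and is an assumption about the general shape, not Profiler's implementation; `SIGNATURES`, `detect_format`, and the "Generic tabular" fallback label are hypothetical.

```python
def has_prefix(prefix):
    """Build a check that passes if any column starts with the prefix."""
    return lambda cols: any(c.startswith(prefix) for c in cols)


def has_all(*names):
    """Build a check that passes if every named column is present."""
    return lambda cols: all(n in cols for n in names)


# Ordered registry: the first matching signature wins.
SIGNATURES = [
    ("MaxQuant", has_prefix("LFQ intensity")),
    ("DIA-NN", has_all("Protein.Group")),
    ("Spectronaut", has_prefix("PG.")),
    ("Salmon", has_all("Name", "TPM")),
    ("kallisto", has_all("target_id", "tpm")),
    ("featureCounts", has_all("Geneid", "Chr")),
]


def detect_format(columns):
    """Return the name of the first matching format, else a generic fallback."""
    cols = [c.strip() for c in columns]
    for name, matches in SIGNATURES:
        if matches(cols):
            return name
    return "Generic tabular"


print(detect_format(["Protein IDs", "LFQ intensity S01"]))
print(detect_format(["target_id", "length", "eff_length", "est_counts", "tpm"]))
print(detect_format(["ID", "Class", "Protein_A"]))
```

Keeping the registry ordered makes the outcome predictable when a file happens to satisfy more than one signature.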

05 Format Examples

5.1 Minimal classification

ID,Class,Protein_A,Protein_B,Protein_C
S01,Cancer,1257.3,0.45,8892.1
S02,Healthy,752.8,1.30,4431.0
S03,Cancer,2103.5,0.21,9012.4

5.2 With _meta columns

ID,Class,Protein_A,Protein_B,treatment_meta,age_meta,batch_meta
S01,Responder,1257.3,0.45,drug_X,58,batch_1
S02,Non-responder,752.8,1.30,placebo,62,batch_1

5.3 Regression — numeric Class

ID,Class,Gene_1,Gene_2
P01,2.4,1890.1,554.3
P02,5.7,3041.5,812.0

06 Longitudinal Data (new in v1.2)

Longitudinal analysis requires a Subject_ID (or Patient / Subject) column to link repeated measures from the same subject, and a Time column (or time_meta) to order the time points.

ID,Subject_ID,Time,Class,Protein_A,Protein_B,treatment_meta
S01_T0,P001,T0,Responder,1257.3,0.45,drug_X
S01_T1,P001,T1,Responder,1802.1,0.31,drug_X
S02_T0,P002,T0,Non-responder,752.8,1.30,placebo
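One way to picture the repeated-measures linking is to group rows by Subject_ID and sort each group by time point. The sketch below does this for the rows above (with the first two swapped to show the sorting). The `time_key` rule, which orders labels by the first integer they contain, is an assumption; this guide does not say how Profiler orders time labels.

```python
import csv
import io
import re
from collections import defaultdict

# Rows from the example above, with the first two deliberately out of order.
DATA = """ID,Subject_ID,Time,Class,Protein_A,Protein_B,treatment_meta
S01_T1,P001,T1,Responder,1802.1,0.31,drug_X
S01_T0,P001,T0,Responder,1257.3,0.45,drug_X
S02_T0,P002,T0,Non-responder,752.8,1.30,placebo
"""


def time_key(label):
    """Order by the first integer in the label (assumed rule), then by the label text."""
    match = re.search(r"\d+", label)
    return (int(match.group()) if match else 0, label)


def group_by_subject(text):
    """Return {subject: [rows sorted by time point]} for a CSV held in a string."""
    by_subject = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        by_subject[row["Subject_ID"]].append(row)
    for rows in by_subject.values():
        rows.sort(key=lambda r: time_key(r["Time"]))
    return dict(by_subject)


trajectories = group_by_subject(DATA)
for subject, rows in trajectories.items():
    print(subject, [r["ID"] for r in rows])
```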

07 Delimiter & Encoding Detection

Profiler counts occurrences of each candidate delimiter (comma, semicolon, tab, pipe) on the header line and picks the most frequent. Encodings are tried in order: UTF-8-sig → UTF-8 → Latin-1 → ISO-8859-1 → CP1252. Column names are stripped of invisible characters and stray quotes.

TSV tab-delimiter detection was unreliable in v1.0/v1.1 and is fixed in v1.2.
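The detection steps described in this section can be approximated with the standard library as shown below. This is a sketch under assumptions: the function names are hypothetical, ties between delimiters fall back to the first candidate, and the set of "invisible characters" removed (BOM and zero-width space) is a guess.

```python
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]
ENCODINGS = ["utf-8-sig", "utf-8", "latin-1", "iso-8859-1", "cp1252"]


def decode_bytes(raw):
    """Try each encoding in order; return (text, encoding) for the first that works."""
    for enc in ENCODINGS:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")


def sniff_delimiter(header_line):
    """Pick the candidate delimiter that occurs most often on the header line."""
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def clean_name(name):
    """Strip a BOM, zero-width spaces, surrounding whitespace and stray quotes."""
    name = name.replace("\ufeff", "").replace("\u200b", "")
    return name.strip().strip("\"'").strip()


raw = "ID;Class;Größe\nS01;Cancer;1,5\n".encode("latin-1")
text, encoding = decode_bytes(raw)
header_line = text.splitlines()[0]
delimiter = sniff_delimiter(header_line)
names = [clean_name(n) for n in header_line.split(delimiter)]
print(encoding, repr(delimiter), names)
```

In Python's codecs, Latin-1 and ISO-8859-1 are the same encoding and accept any byte sequence, so in this sketch the entries after `latin-1` are never reached.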

08 Tips & Common Mistakes

✓ CSV with semicolons (European Excel): just upload; the delimiter is auto-detected.

✓ A numeric Class column activates regression mode automatically.

✓ Use _meta columns to stratify QC plots without affecting the feature matrix.

✓ For RNA-seq, upload the raw or normalised count matrix, not DESeq2 results.

✗ Do not include formula cells in Excel; convert them to values first.

✗ Avoid duplicate column names; one of them will be silently dropped.

✗ DESeq2 results tables (log2FC, padj) are not count matrices; use raw counts.

✗ Missing values should be blank or NaN, not filled with 0 unless truly zero.

09 Quick-Start Checklist

File is CSV, TSV, TXT, XLS or XLSX
At least one target column named Class / Target / Condition (or another name listed in section 2.1)
Each row is one sample; each column (except metadata) is one feature
Numeric features contain numbers — not text like "Not detected"
ID column present (or Profiler will create Sample_1, Sample_2…)
Clinical variables end in _meta for metadata annotations
For longitudinal: Subject_ID and Time columns present
Missing values are blank or NaN (not zero unless truly zero)
No formula cells in Excel files
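Several of the checklist items can be verified before upload with a short script. The sketch below is a hypothetical pre-flight check and is not part of Profiler: it looks for a target column (only the three names in the checklist line), duplicate column names, and non-numeric text in feature columns. Which columns count as features is an assumption based on the conventions in this guide.

```python
import csv
import io

TARGET_NAMES = {"class", "target", "condition"}
# Columns that are not features (assumed from the conventions in this guide).
NON_FEATURE_NAMES = TARGET_NAMES | {"id", "subject_id", "time"}


def is_numeric_or_blank(value):
    """True for blank cells and anything float() accepts (including 'NaN')."""
    value = value.strip()
    if value == "":
        return True
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_table(text, delimiter=","):
    """Return a list of problems found in a delimited table held in a string."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    names = [h.strip() for h in rows[0]]
    body = rows[1:]
    problems = []

    if not any(n.lower() in TARGET_NAMES for n in names):
        problems.append("no Class / Target / Condition column")

    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        problems.append("duplicate column names: " + ", ".join(duplicates))

    for idx, name in enumerate(names):
        if name.lower() in NON_FEATURE_NAMES or name.endswith("_meta"):
            continue
        for row_number, row in enumerate(body, start=1):
            if idx < len(row) and not is_numeric_or_blank(row[idx]):
                problems.append(
                    f"non-numeric value {row[idx]!r} in {name}, data row {row_number}"
                )
    return problems


good = "ID,Class,Protein_A,age_meta\nS01,Cancer,1257.3,58\nS02,Healthy,,62\n"
bad = "ID,Protein_A,Protein_A\nS01,Not detected,2\n"
print(check_table(good))
for problem in check_table(bad):
    print(problem)
```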
