Documentation · Data Import Guide · v1.2

Importing your omics data

Profiler v1.2 accepts generic tabular files, specialised proteomics exports, and RNA-seq and metabolomics data, all with automatic format detection.


01 Overview

Profiler v1.2 accepts tabular omics datasets in multiple formats: CSV, TSV, TXT, XLS, XLSX. The sidebar loader automatically detects separators, encodings, and column naming conventions.

v1.2 extends format support to specialised software exports from proteomics (MaxQuant, DIA-NN, Spectronaut, FragPipe, Proteome Discoverer, Progenesis QI, PEAKS Studio, Perseus), transcriptomics (DESeq2/edgeR, Salmon, kallisto, featureCounts, STAR, HTSeq), and metabolomics (MetaboAnalyst, XCMS, MZmine).

02 Column Naming Conventions

2.1 Target / Class Column

You do not need to rename your column to Class. Profiler recognises all of the following:

Column name            Note
Class / class          Standard Profiler name
Target / target        Common in ML datasets
Condition / condition  Common in proteomics/genomics
Label                  Generic label column
Group                  Group/cohort identifier
Status                 Event status (disease/healthy)
Outcome                Clinical outcome

Numeric values in the Class column (age, dose, survival time) automatically activate regression mode across all supervised learning modules.
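The two rules above (accepted target names, and numeric values switching to regression) can be sketched as follows. This is a minimal illustration, not Profiler's actual code: the function names are hypothetical, and picking the first matching column in file order is an assumption, since the guide does not say how several candidate columns are prioritised.

```python
# Accepted names copied from the table above; everything else is illustrative.
TARGET_NAMES = [
    "Class", "class", "Target", "target", "Condition", "condition",
    "Label", "Group", "Status", "Outcome",
]


def find_target_column(columns):
    """Return the first column whose name is an accepted target name, else None.

    Assumption: when several candidates exist, the first in file order wins.
    """
    for name in columns:
        if name.strip() in TARGET_NAMES:
            return name
    return None


def is_number(text):
    """True if the cell text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_mode(values):
    """Return 'regression' if every non-missing value is numeric, else 'classification'."""
    present = [v for v in values if v.strip() not in ("", "NaN")]
    if present and all(is_number(v) for v in present):
        return "regression"
    return "classification"


print(find_target_column(["ID", "Condition", "Protein_A"]))
print(infer_mode(["2.4", "5.7"]))
print(infer_mode(["Cancer", "Healthy"]))
```

Under this rule, integer-coded class labels such as 0/1 would also count as numeric; how Profiler treats that case is not stated in this guide.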

2.2 Sample ID Column

The sample ID column labels samples in PCA/UMAP tooltips, heatmaps and enrichment tables. If it is absent, Profiler creates Sample_1, Sample_2… automatically.

Column name           Note
ID / id               Standard identifier
SampleID / Sample_ID  Combined forms
SampleName / Name     Text name
Patient / Subject     Clinical datasets
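The fallback behaviour can be pictured with a short sketch. The helper name `sample_ids` and the row representation (a list of dicts) are assumptions made for illustration; only the accepted column names and the Sample_1, Sample_2… pattern come from this guide.

```python
# Accepted ID column names, copied from the table above.
ID_NAMES = ["ID", "id", "SampleID", "Sample_ID", "SampleName", "Name", "Patient", "Subject"]


def sample_ids(rows, columns):
    """Return sample labels from the first recognised ID column.

    If no recognised column exists, generate Sample_1, Sample_2, ...
    `rows` is a list of dicts keyed by column name (an illustrative choice).
    """
    for name in columns:
        if name in ID_NAMES:
            return [row[name] for row in rows]
    return [f"Sample_{i}" for i in range(1, len(rows) + 1)]


with_ids = [{"Patient": "P1", "Protein_A": "1.0"}]
without_ids = [{"Protein_A": "1.0"}, {"Protein_A": "2.0"}]
print(sample_ids(with_ids, ["Patient", "Protein_A"]))
print(sample_ids(without_ids, ["Protein_A"]))
```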

2.3 Clinical Metadata: _meta suffix (v1.2)

Any column ending in _meta is treated as clinical metadata — available as alternative targets and for heatmap/PCA colouring, but excluded from the feature matrix.

ID · Class · Protein_A, Protein_B · treatment_meta, age_meta, batch_meta, stage_meta

Column name     Typical use
treatment_meta  Drug arm / treatment group
age_meta        Patient age (numeric or categorical)
stage_meta      Disease stage (I, II, III, IV)
survival_meta   Overall survival time (months)
batch_meta      Batch identifier for ComBat QC
time_meta       Time point for longitudinal data
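As a rough sketch of the suffix rule, the split between feature columns and metadata columns might look like this. The function name and the default `id_col` / `target_col` arguments are hypothetical; the guide only states that `_meta` columns are kept out of the feature matrix.

```python
def split_columns(columns, id_col="ID", target_col="Class"):
    """Separate feature columns from *_meta columns.

    The ID and target columns belong to neither group.
    """
    meta = [c for c in columns if c.endswith("_meta")]
    features = [
        c for c in columns
        if not c.endswith("_meta") and c not in (id_col, target_col)
    ]
    return features, meta


header = "ID,Class,Protein_A,Protein_B,treatment_meta,age_meta,batch_meta".split(",")
features, meta = split_columns(header)
print("features:", features)
print("metadata:", meta)
```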

03 Supported Formats

3.1 Generic Tabular
Extension     Format                  Notes
.csv          Comma-Separated Values  Delimiter auto-detected: , ; \t |
.tsv          Tab-Separated Values    Tab detected automatically (bug fix in v1.2)
.txt          Plain text table        Any common delimiter
.xlsx / .xls  Excel Workbook          First sheet loaded, openpyxl engine

3.2 Proteomics: Protein Level
Software             Expected file         Auto-detected by
MaxQuant             proteinGroups.txt     Columns starting with LFQ intensity
DIA-NN               pg_matrix.tsv         Protein.Group column
Spectronaut          ProteinReport.tsv     Any column with PG. prefix
FragPipe             combined_protein.tsv  Gene col + MaxLFQ Intensity cols
Proteome Discoverer  Proteins.txt          Accession + Abundance: F*
Perseus              matrix.txt            T:/N:/C: prefix rows

3.3 Transcriptomics: RNA-seq
Tool            Expected file         Auto-detected by
DESeq2 / edgeR  counts_matrix.csv     gene_id / gene_name col
Salmon          quant.sf              Name + TPM cols
kallisto        abundance.tsv         target_id + tpm cols
featureCounts   counts.txt            Geneid + Chr cols
STAR            ReadsPerGene.out.tab  4-col + ENS* pattern
HTSeq-count     htseq_counts.txt      2-col + __summary rows

3.4 Metabolomics
Software       Expected file      Notes
MetaboAnalyst  data_table.csv     Sample-major (rows = samples)
XCMS           feature_table.csv  row m/z + row retention time
MZmine         feature_table.csv  Same as XCMS format

04 Auto-detect Engine

Profiler reads the first 5 rows and checks column signatures against all registered parsers. The detected format appears in the sidebar as "Detected format: …". Override at any time via the dropdown.

Upload (any format) → Read header (first 5 rows) → Match signatures (column patterns) → Format shown (sidebar display) → Parse & load (or override)
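To make the signature-matching step concrete, here is a minimal sketch using a handful of the column signatures from section 03. It is not Profiler's parser registry: the `SIGNATURES` list, the `detect_format` name, the tab default, and the "Generic tabular" fallback label are all assumptions, and row-based signatures (Perseus, STAR, HTSeq-count) are left out for brevity.

```python
import csv
import io
from itertools import islice

# (format name, predicate over one row of cells); signatures taken from section 03.
SIGNATURES = [
    ("MaxQuant", lambda row: any(c.startswith("LFQ intensity") for c in row)),
    ("DIA-NN", lambda row: "Protein.Group" in row),
    ("Spectronaut", lambda row: any(c.startswith("PG.") for c in row)),
    ("Salmon", lambda row: {"Name", "TPM"} <= set(row)),
    ("kallisto", lambda row: {"target_id", "tpm"} <= set(row)),
    ("featureCounts", lambda row: {"Geneid", "Chr"} <= set(row)),
]


def detect_format(text, delimiter="\t", n_rows=5):
    """Check the first n_rows rows against each signature; return the first match."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in islice(reader, n_rows):
        cells = [c.strip() for c in row]
        for name, matches in SIGNATURES:
            if matches(cells):
                return name
    return "Generic tabular"  # hypothetical fallback label


sample = "Protein IDs\tLFQ intensity S1\tLFQ intensity S2\nP12345\t1.0\t2.0\n"
print(detect_format(sample))
```

Scanning several rows rather than only the first one helps with files whose column header is preceded by a comment line.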

05 Format Examples

5.1 Minimal classification

ID,Class,Protein_A,Protein_B,Protein_C
S01,Cancer,1257.3,0.45,8892.1
S02,Healthy,752.8,1.30,4431.0
S03,Cancer,2103.5,0.21,9012.4

5.2 With _meta columns

ID,Class,Protein_A,Protein_B,treatment_meta,age_meta,batch_meta
S01,Responder,1257.3,0.45,drug_X,58,batch_1
S02,Non-responder,752.8,1.30,placebo,62,batch_1

5.3 Regression: numeric Class

ID,Class,Gene_1,Gene_2
P01,2.4,1890.1,554.3
P02,5.7,3041.5,812.0

06 Longitudinal Data (new in v1.2)

Longitudinal analysis requires a Subject_ID (or Patient / Subject) column to link repeated measures from the same subject, and a Time column (or time_meta) to order the time points.

ID,Subject_ID,Time,Class,Protein_A,Protein_B,treatment_meta
S01_T0,P001,T0,Responder,1257.3,0.45,drug_X
S01_T1,P001,T1,Responder,1802.1,0.31,drug_X
S02_T0,P002,T0,Non-responder,752.8,1.30,placebo
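The linking and ordering described above can be sketched on the example rows. The helper names are hypothetical, and ordering time labels by their numeric part (so that T10 follows T2) is an assumption; the guide does not say how Profiler orders the Time values.

```python
import csv
import io
import re
from collections import defaultdict

EXAMPLE = """ID,Subject_ID,Time,Class,Protein_A,Protein_B,treatment_meta
S01_T1,P001,T1,Responder,1802.1,0.31,drug_X
S01_T0,P001,T0,Responder,1257.3,0.45,drug_X
S02_T0,P002,T0,Non-responder,752.8,1.30,placebo
"""


def time_key(label):
    """Sort key: the first number found in the label; labels without digits sort first."""
    match = re.search(r"\d+(?:\.\d+)?", label)
    return float(match.group()) if match else float("-inf")


def group_by_subject(text):
    """Map each Subject_ID to its rows, ordered by time point."""
    subjects = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        subjects[row["Subject_ID"]].append(row)
    for visits in subjects.values():
        visits.sort(key=lambda r: time_key(r["Time"]))
    return dict(subjects)


for subject, visits in group_by_subject(EXAMPLE).items():
    print(subject, [v["ID"] for v in visits])
```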

07 Delimiter & Encoding Detection

Profiler counts occurrences of each candidate delimiter (, ; tab |) on the header line and picks the most frequent. Encodings are tried in order: UTF-8-sig → UTF-8 → Latin-1 → ISO-8859-1 → CP1252. Column names are stripped of invisible characters and stray quotes.

TSV tab-delimiter detection was unreliable in v1.0/v1.1 and is fixed in v1.2.
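A minimal sketch of the counting and fallback logic described in this section, under a few stated assumptions: the function names are hypothetical, ties between delimiters resolve to the first candidate in the list, and the set of "invisible characters" removed (BOM, zero-width space, non-breaking space) is a guess, since the guide does not enumerate them.

```python
DELIMITERS = [",", ";", "\t", "|"]
ENCODINGS = ["utf-8-sig", "utf-8", "latin-1", "iso-8859-1", "cp1252"]


def decode_bytes(raw):
    """Try each encoding in order; return (text, encoding used).

    Note: Python's latin-1 codec (an alias of iso-8859-1) accepts any byte
    sequence, so in this sketch the entries after it are never reached.
    """
    for enc in ENCODINGS:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode input")


def sniff_delimiter(text):
    """Pick the candidate delimiter that occurs most often on the header line."""
    lines = text.splitlines()
    header = lines[0] if lines else ""
    # max() keeps the first maximal item, so ties fall back to list order.
    return max(DELIMITERS, key=header.count)


def clean_name(name):
    """Remove assumed invisible characters, surrounding whitespace and stray quotes."""
    for ch in ("\ufeff", "\u200b", "\u00a0"):
        name = name.replace(ch, "")
    return name.strip().strip("\"'")


text, encoding = decode_bytes("ID;Class;Protein_A\nS01;Cancer;1,5\n".encode("utf-8"))
delimiter = sniff_delimiter(text)
print(encoding, repr(delimiter), [clean_name(c) for c in text.splitlines()[0].split(delimiter)])
```

Counting only on the header line keeps decimal commas in the data rows (as in European Excel exports) from outvoting the real delimiter.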

08 Tips & Common Mistakes

  • CSV with semicolons (European Excel): just upload, the delimiter is auto-detected.
  • A numeric Class column activates regression mode automatically.
  • Use _meta columns to stratify QC plots without affecting the feature matrix.
  • For RNA-seq, upload the raw or normalised count matrix, not DESeq2 results.
  • Do not include formula cells in Excel; convert them to values first.
  • Avoid duplicate column names; one will be silently dropped.
  • DESeq2 results tables (log2FC, padj) are not count matrices; use raw counts.
  • Missing values should be blank or NaN, not filled with 0 unless truly zero.

09 Quick-Start Checklist

  • File is CSV, TSV, TXT, XLS or XLSX
  • At least one target column named Class / Target / Condition (or another accepted name from section 2.1)
  • Each row is one sample; each column (except metadata) is one feature
  • Numeric features contain numbers — not text like "Not detected"
  • ID column present (or Profiler will create Sample_1, Sample_2…)
  • Clinical variables end in _meta for metadata annotations
  • For longitudinal: Subject_ID and Time columns present
  • Missing values are blank or NaN (not zero unless truly zero)
  • No formula cells in Excel files
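Several checklist items can be verified before uploading. The sketch below is an illustrative pre-flight check, not a Profiler feature: `check_table` and its messages are hypothetical, and it covers only duplicate column names, the presence of a target column, and non-numeric text in feature columns.

```python
import csv
import io
from collections import Counter

TARGET_NAMES = {"Class", "class", "Target", "target", "Condition", "condition",
                "Label", "Group", "Status", "Outcome"}
ID_NAMES = {"ID", "id", "SampleID", "Sample_ID", "SampleName", "Name", "Patient", "Subject"}
NON_FEATURE = TARGET_NAMES | ID_NAMES | {"Subject_ID", "Time"}
MISSING = {"", "NaN"}


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def check_table(text, delimiter=","):
    """Return a list of human-readable problems; an empty list means no issue found."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    dupes = [name for name, n in Counter(header).items() if n > 1]
    if dupes:
        problems.append(f"duplicate column names: {dupes}")
    if not TARGET_NAMES & set(header):
        problems.append("no target column (Class / Target / Condition / ...)")
    for idx, name in enumerate(header):
        if name in NON_FEATURE or name.endswith("_meta"):
            continue
        bad = [row[idx] for row in data
               if idx < len(row) and row[idx].strip() not in MISSING
               and not is_number(row[idx])]
        if bad:
            problems.append(f"column {name!r} has non-numeric values, e.g. {bad[0]!r}")
    return problems


print(check_table("ID,Class,Protein_A\nS01,Cancer,1257.3\nS02,Healthy,\n"))
print(check_table("ID,Protein_A,Protein_A\nS01,Not detected,1.0\n"))
```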