01 Overview
Profiler v1.2 accepts tabular omics datasets in multiple formats: CSV, TSV, TXT, XLS, XLSX. The sidebar loader automatically detects separators, encodings, and column naming conventions.
v1.2 extends format support to specialised software exports from proteomics (MaxQuant, DIA-NN, Spectronaut, FragPipe, Proteome Discoverer, Progenesis QI, PEAKS Studio, Perseus), transcriptomics (DESeq2/edgeR, Salmon, kallisto, featureCounts, STAR, HTSeq), and metabolomics (MetaboAnalyst, XCMS, MZmine).
02 Column Naming Conventions
You do not need to rename your target column to Class. Profiler recognises all of the following names:
| Column name | Note |
|---|---|
| Class / class | Standard Profiler name |
| Target / target | Common in ML datasets |
| Condition / condition | Common in proteomics/genomics |
| Label | Generic label column |
| Group | Group/cohort identifier |
| Status | Event status (disease/healthy) |
| Outcome | Clinical outcome |
Numeric values in the Class column (age, dose, survival time) automatically activate regression mode across all supervised learning modules.
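
The lookup and mode switch described above can be sketched roughly as follows. This is a minimal illustration, not Profiler's actual code: the function names, the priority order of the aliases, and the rule "every non-blank value parses as a number" are all assumptions.

```python
# Alias list copied from the table above; the ordering is an assumption.
TARGET_ALIASES = [
    "Class", "class", "Target", "target", "Condition", "condition",
    "Label", "Group", "Status", "Outcome",
]


def find_target_column(columns):
    """Return the first recognised target column name, or None (hypothetical helper)."""
    for alias in TARGET_ALIASES:
        if alias in columns:
            return alias
    return None


def is_number(text):
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_mode(values):
    """Assumed rule: regression if every non-blank value is numeric."""
    non_blank = [v for v in values if v.strip() != ""]
    if non_blank and all(is_number(v) for v in non_blank):
        return "regression"
    return "classification"


print(find_target_column(["ID", "Condition", "Protein_A"]))  # Condition
print(infer_mode(["2.4", "5.7"]))                            # regression
print(infer_mode(["Cancer", "Healthy"]))                     # classification
```

Under this assumed rule, integer-coded classes such as 0/1 would also be treated as numeric, so text labels are the safer choice when classification is intended.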
The sample ID column labels samples in PCA/UMAP tooltips, heatmaps and enrichment tables. If it is absent, Profiler creates Sample_1, Sample_2… automatically. Recognised names:
| Column name | Note |
|---|---|
| ID / id | Standard identifier |
| SampleID / Sample_ID | Combined forms |
| SampleName / Name | Text name |
| Patient / Subject | Clinical datasets |
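
As a rough sketch of the fallback behaviour, assuming rows are held as dictionaries keyed by column name (the helper name and the alias priority are hypothetical):

```python
# Alias list copied from the table above; the ordering is an assumption.
ID_ALIASES = [
    "ID", "id", "SampleID", "Sample_ID", "SampleName", "Name", "Patient", "Subject",
]


def sample_ids(columns, rows):
    """Use the first recognised ID column, else generate Sample_1, Sample_2, ..."""
    for alias in ID_ALIASES:
        if alias in columns:
            return [row[alias] for row in rows]
    return [f"Sample_{i}" for i in range(1, len(rows) + 1)]


print(sample_ids(["ID", "Class"], [{"ID": "S01", "Class": "Cancer"}]))
# ['S01']
print(sample_ids(["Class"], [{"Class": "Cancer"}, {"Class": "Healthy"}]))
# ['Sample_1', 'Sample_2']
```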
_meta suffix (v1.2): any column ending in _meta is treated as clinical metadata. Such columns are available as alternative targets and for heatmap/PCA colouring, but are excluded from the feature matrix.
| Column name | Typical use |
|---|---|
| treatment_meta | Drug arm / treatment group |
| age_meta | Patient age (numeric or categorical) |
| stage_meta | Disease stage (I, II, III, IV) |
| survival_meta | Overall survival time (months) |
| batch_meta | Batch identifier for ComBat QC |
| time_meta | Time point for longitudinal data |
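
The split between features and metadata can be illustrated with the standard library. This is a sketch under assumptions: the `split_columns` helper and the choice of `ID` and `Class` as the only reserved columns are illustrative, and the sample rows reuse the _meta example from this guide.

```python
import csv
import io


def split_columns(header, reserved=("ID", "Class")):
    """Separate feature columns from *_meta columns (hypothetical helper)."""
    meta = [c for c in header if c.endswith("_meta")]
    features = [
        c for c in header
        if c not in reserved and not c.endswith("_meta")
    ]
    return features, meta


sample = """ID,Class,Protein_A,Protein_B,treatment_meta,age_meta,batch_meta
S01,Responder,1257.3,0.45,drug_X,58,batch_1
S02,Non-responder,752.8,1.30,placebo,62,batch_1
"""

reader = csv.DictReader(io.StringIO(sample))
features, meta = split_columns(reader.fieldnames)
print(features)  # ['Protein_A', 'Protein_B']
print(meta)      # ['treatment_meta', 'age_meta', 'batch_meta']
```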
03 Supported Formats
| Extension | Format | Notes |
|---|---|---|
| .csv | Comma-Separated Values | Delimiter auto-detected: comma, semicolon, tab or pipe |
| .tsv | Tab-Separated Values | Tab detected automatically (bug fix in v1.2) |
| .txt | Plain text table | Any common delimiter |
| .xlsx / .xls | Excel Workbook | First sheet loaded, openpyxl engine |

Proteomics:

| Software | Expected file | Auto-detected by |
|---|---|---|
| MaxQuant | proteinGroups.txt | Columns starting with LFQ intensity |
| DIA-NN | pg_matrix.tsv | Protein.Group column |
| Spectronaut | ProteinReport.tsv | Any column with PG. prefix |
| FragPipe | combined_protein.tsv | Gene col + MaxLFQ Intensity cols |
| Proteome Discoverer | Proteins.txt | Accession + Abundance: F* |
| Perseus | matrix.txt | T:/N:/C: prefix rows |

Transcriptomics:

| Tool | Expected file | Auto-detected by |
|---|---|---|
| DESeq2 / edgeR | counts_matrix.csv | gene_id / gene_name col |
| Salmon | quant.sf | Name + TPM cols |
| kallisto | abundance.tsv | target_id + tpm cols |
| featureCounts | counts.txt | Geneid + Chr cols |
| STAR | ReadsPerGene.out.tab | 4-col + ENS* pattern |
| HTSeq-count | htseq_counts.txt | 2-col + __summary rows |

Metabolomics:

| Software | Expected file | Notes |
|---|---|---|
| MetaboAnalyst | data_table.csv | Sample-major (rows = samples) |
| XCMS | feature_table.csv | row m/z + row retention time |
| MZmine | feature_table.csv | Same as XCMS format |
04 Auto-detect Engine
Profiler reads the first 5 rows and checks column signatures against all registered parsers. The detected format appears in the sidebar as "Detected format: …". Override at any time via the dropdown.
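
A signature check of this kind might look like the sketch below. The registry layout, the `detect_format` name, and the "Generic table" fallback label are assumptions. Only a few signatures from the tables in section 03 are included, and each of the first five rows is tried as a candidate header so that leading comment lines do not hide the real one.

```python
import csv
import io
from itertools import islice

# A few signatures taken from the format tables; the registry itself is hypothetical.
SIGNATURES = [
    ("MaxQuant", lambda cols: any(c.startswith("LFQ intensity") for c in cols)),
    ("DIA-NN", lambda cols: "Protein.Group" in cols),
    ("Spectronaut", lambda cols: any(c.startswith("PG.") for c in cols)),
    ("Salmon", lambda cols: "Name" in cols and "TPM" in cols),
    ("kallisto", lambda cols: "target_id" in cols and "tpm" in cols),
    ("featureCounts", lambda cols: "Geneid" in cols and "Chr" in cols),
]


def detect_format(text, delimiter="\t", n_rows=5):
    """Return the first parser name whose signature matches one of the first rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row in islice(reader, n_rows):
        cols = [cell.strip() for cell in row]
        for name, matches in SIGNATURES:
            if matches(cols):
                return name
    return "Generic table"


salmon = "Name\tLength\tEffectiveLength\tTPM\tNumReads\nENST1\t100\t80\t1.5\t10\n"
print(detect_format(salmon))  # Salmon
```

Because the first matching signature wins, the order of the registry matters whenever two signatures could match the same file.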
05 Format Examples
Basic layout with ID and Class columns:

    ID,Class,Protein_A,Protein_B,Protein_C
    S01,Cancer,1257.3,0.45,8892.1
    S02,Healthy,752.8,1.30,4431.0
    S03,Cancer,2103.5,0.21,9012.4

With _meta columns:

    ID,Class,Protein_A,Protein_B,treatment_meta,age_meta,batch_meta
    S01,Responder,1257.3,0.45,drug_X,58,batch_1
    S02,Non-responder,752.8,1.30,placebo,62,batch_1

Numeric Class column (regression mode):

    ID,Class,Gene_1,Gene_2
    P01,2.4,1890.1,554.3
    P02,5.7,3041.5,812.0
06 Longitudinal Data (new in v1.2)
Longitudinal data requires a Subject_ID (or Patient / Subject) column for repeated-measures linking, and a Time column (or time_meta) for time point ordering.

    ID,Subject_ID,Time,Class,Protein_A,Protein_B,treatment_meta
    S01_T0,P001,T0,Responder,1257.3,0.45,drug_X
    S01_T1,P001,T1,Responder,1802.1,0.31,drug_X
    S02_T0,P002,T0,Non-responder,752.8,1.30,placebo
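
To show what the two columns are used for, here is a small sketch that links repeated measures by subject and orders them by time point. The `group_by_subject` helper is hypothetical, and the plain string sort on the Time column is an assumption that only holds for labels that sort correctly as text.

```python
import csv
import io
from collections import defaultdict

sample = """ID,Subject_ID,Time,Class,Protein_A,Protein_B,treatment_meta
S01_T0,P001,T0,Responder,1257.3,0.45,drug_X
S01_T1,P001,T1,Responder,1802.1,0.31,drug_X
S02_T0,P002,T0,Non-responder,752.8,1.30,placebo
"""


def group_by_subject(text, subject_col="Subject_ID", time_col="Time"):
    """Group rows per subject and sort each group by its time label."""
    by_subject = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        by_subject[row[subject_col]].append(row)
    for rows in by_subject.values():
        # Lexicographic sort: fine for T0, T1, ... but T10 would sort before T2.
        rows.sort(key=lambda r: r[time_col])
    return dict(by_subject)


trajectories = group_by_subject(sample)
for subject, rows in trajectories.items():
    print(subject, [r["ID"] for r in rows])
# P001 ['S01_T0', 'S01_T1']
# P002 ['S02_T0']
```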
07 Delimiter & Encoding Detection
Profiler counts occurrences of , ; tab | on the header line and picks the most frequent. Encodings tried in order: UTF-8-sig → UTF-8 → Latin-1 → ISO-8859-1 → CP1252. Column names are stripped of invisible characters and stray quotes.
TSV tab-delimiter detection was unreliable in v1.0/v1.1 and is fixed in v1.2.
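
The detection steps in this section can be approximated with the standard library alone. This is a sketch, not Profiler's implementation: the function names, the tie-breaking rule, and the definition of "invisible characters" as non-printable characters are assumptions.

```python
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]
ENCODINGS = ["utf-8-sig", "utf-8", "latin-1", "iso-8859-1", "cp1252"]


def decode_bytes(raw):
    """Try each encoding in the documented order; return (text, encoding)."""
    for encoding in ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # In Python, latin-1 accepts any byte sequence, so this is not reached
    # with the list above; it is kept for safety if the list changes.
    raise ValueError("could not decode input")


def detect_delimiter(header_line):
    """Pick the most frequent candidate; ties go to the earlier candidate."""
    return max(CANDIDATE_DELIMITERS, key=header_line.count)


def clean_column_name(name):
    """Drop non-printable characters, surrounding whitespace and stray quotes."""
    visible = "".join(ch for ch in name if ch.isprintable())
    return visible.strip().strip("'\"").strip()


text, encoding = decode_bytes(b"\xef\xbb\xbfID;Class;Protein_A\n")
header = text.splitlines()[0]
print(encoding)                                         # utf-8-sig
print(repr(detect_delimiter(header)))                   # ';'
print([clean_column_name(c) for c in header.split(";")])  # ['ID', 'Class', 'Protein_A']
```

One consequence of this ordering in Python is that the latin-1 step always succeeds, so the entries after it act only as documentation of intent.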
08 Tips & Common Mistakes
Use _meta columns to stratify QC plots without affecting the feature matrix.
09 Quick-Start Checklist
- File is CSV, TSV, TXT, XLS or XLSX
- At least one column named Class/Target/Condition
- Each row is one sample; each column (except metadata) is one feature
- Numeric features contain numbers, not text like "Not detected"
- ID column present (or Profiler will create Sample_1, Sample_2…)
- Clinical variables end in _meta for metadata annotations
- For longitudinal: Subject_ID and Time columns present
- Missing values are blank or NaN (not zero unless truly zero)
- No formula cells in Excel files
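
Some of these checks are easy to run before uploading. The sketch below flags feature cells that are neither numeric nor missing. It is only an illustration: the `non_numeric_cells` helper is hypothetical, and it assumes a comma-delimited file whose only non-feature columns are ID, Class and the _meta columns.

```python
import csv
import io


def non_numeric_cells(text, reserved=("ID", "Class")):
    """Return (line_number, column, value) for feature cells that are not numbers."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        for column, value in row.items():
            # Skip overflow fields, reserved columns and metadata columns.
            if column is None or column in reserved or column.endswith("_meta"):
                continue
            cell = (value or "").strip()
            if cell == "":
                continue  # blank counts as a missing value
            try:
                float(cell)  # also accepts "NaN"
            except ValueError:
                problems.append((line_number, column, cell))
    return problems


sample = """ID,Class,Protein_A,Protein_B,age_meta
S01,Cancer,1257.3,Not detected,58
S02,Healthy,,NaN,62
"""
print(non_numeric_cells(sample))  # [(2, 'Protein_B', 'Not detected')]
```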