The random forests classifier in MRIQC

MRIQC ships with a random-forests classifier trained on the combination of the ABIDE and DS030 datasets.

To predict quality labels (0 = "accept", 1 = "reject") with the default classifier on a features table computed by mriqc, run:

mriqc_clf --load-classifier -X aMRIQC.csv

where aMRIQC.csv is the file T1w.csv generated by the group-level run of mriqc.

Building your custom classifier

Custom classifiers can be fitted using the same mriqc_clf tool in fitting mode:

mriqc_clf --train aMRIQC_train.csv labels.csv --log-file

where aMRIQC_train.csv contains the image quality metrics (IQMs) calculated by mriqc and labels.csv contains the matching ratings assigned by an expert. The labels must be numerical (-1 = exclude, 0 = doubtful, 1 = accept). With the flag --multiclass the labels are not binarized. Otherwise, 0 and 1 are mapped to 0 (accept) and -1 is mapped to 1 (reject).
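
The label mapping described above can be sketched in a few lines of Python. This only illustrates the documented rule and is not the code mriqc_clf runs; the function name binarize_labels and the sample ratings are hypothetical.

```python
def binarize_labels(ratings, multiclass=False):
    """Map expert ratings (-1 exclude, 0 doubtful, 1 accept) to classifier labels.

    Hypothetical helper illustrating the documented rule; not part of mriqc.
    """
    allowed = {-1, 0, 1}
    for rating in ratings:
        if rating not in allowed:
            raise ValueError(f"unexpected rating: {rating!r}")
    if multiclass:
        # With --multiclass the three-level ratings are kept untouched
        return list(ratings)
    # Default: -1 (exclude) becomes 1 (reject); 0 and 1 become 0 (accept)
    return [1 if rating == -1 else 0 for rating in ratings]


ratings = [1, 0, -1, 1]  # made-up example ratings
binary = binarize_labels(ratings)
kept = binarize_labels(ratings, multiclass=True)
```

Note that the default mapping folds "doubtful" scans into the accept class, so only explicit exclusions count as rejects.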

Omitting all arguments of the --train flag instructs mriqc_clf to run cross-validation for model selection and train the winning model on the ABIDE dataset:

mriqc_clf --train --log-file

Model selection can be followed by testing on a left-out dataset using the flag --test. If --test is given without arguments (no paths to samples and labels), the default features and labels for DS030 are used:

mriqc_clf --train --test --log-file

The trained classifier can then be used for prediction on unseen data with the command shown at the top, now indicating which classifier to load:

mriqc_clf --load-classifier myclassifier.pklz -X aMRIQC.csv

Predictions are stored as a CSV file, containing the BIDS identifiers as indexing columns and the predicted quality label under the prediction column.
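
As a rough sketch of how such a predictions file could be consumed, the snippet below lists the scans labeled 1 (reject), assuming binarized labels. Only the prediction column name comes from the description above; the identifier column names (subject_id, session_id), the sample rows, and the function name are assumptions made for illustration.

```python
import csv
import io

# Made-up stand-in for a predictions file; the identifier column names
# (subject_id, session_id) are assumptions, only "prediction" is documented.
sample = io.StringIO(
    "subject_id,session_id,prediction\n"
    "sub-01,ses-1,0\n"
    "sub-02,ses-1,1\n"
    "sub-03,ses-1,0\n"
)


def rejected_scans(handle):
    """Return the identifier columns of every row predicted as 1 (reject)."""
    rejected = []
    for row in csv.DictReader(handle):
        if int(row["prediction"]) == 1:
            # Everything except the prediction column identifies the scan
            rejected.append({k: v for k, v in row.items() if k != "prediction"})
    return rejected


flagged = rejected_scans(sample)
```

To apply it to real output, pass an open file handle on the predictions CSV in place of the in-memory sample.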

Usage of mriqc_clf

MRIQC model selection and held-out evaluation.

usage: mriqc [-h] [--train [TRAIN [TRAIN ...]] | --load-classifier
             [LOAD_CLASSIFIER]] [--test [TEST [TEST ...]]]
             [-X EVALUATION_DATA] [--train-balanced-leaveout] [--multiclass]
             [-P PARAMETERS] [-M {rfc,xgb,svc_lin,svc_rbf}] [--nested_cv]
             [--nested_cv_kfold] [--perm PERM] [-S SCORER]
             [--cv {kfold,loso,balanced-kfold,batch}] [--debug]
             [--log-file [LOG_FILE]] [-v] [--njobs NJOBS] [-t THRESHOLD]

Named Arguments

--train

training data tables, X and Y, leave empty for ABIDE

--load-classifier

load a previously saved classifier

--test

test data tables, X and Y, leave empty for DS030

-X, --evaluation-data

classify this CSV table of IQMs

--train-balanced-leaveout

leave out a balanced, random sample of training examples

--multiclass, --ms

do not binarize labels

-P, --parameters

-M, --model

Possible choices: rfc, xgb, svc_lin, svc_rbf

model under test

--nested_cv

run nested cross-validation before held-out

--nested_cv_kfold

run nested cross-validation before held-out, using 10-fold split in the outer loop

--perm

permutation test: number of permutations

-S, --scorer

--cv

Possible choices: kfold, loso, balanced-kfold, batch

--debug

--log-file

write log to this file, leave empty for a default log name

-v, --verbose

increases log verbosity for each occurrence

--njobs

number of jobs

-t, --threshold

decision threshold of the classifier
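
One common reading of a decision threshold, assumed here rather than taken from the mriqc source, is that a scan is labeled 1 (reject) when the predicted probability of the reject class reaches the threshold. The sketch below illustrates that rule only; the function name, the default of 0.5, and the probabilities are made up.

```python
def apply_threshold(reject_probabilities, threshold=0.5):
    """Label a scan 1 (reject) when its reject probability reaches the threshold.

    Assumption: the threshold is compared against the probability of the
    reject class; mriqc_clf may define it differently.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [1 if p >= threshold else 0 for p in reject_probabilities]


probabilities = [0.10, 0.45, 0.80]  # made-up classifier outputs
default_labels = apply_threshold(probabilities)
strict_labels = apply_threshold(probabilities, threshold=0.4)
```

Under this reading, lowering the threshold rejects more scans, trading extra false alarms for fewer missed low-quality images.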