The Random Forests Classifier in MRIQC

MRIQC ships with a random-forests classifier trained on the combination of the ABIDE and DS030 datasets.

To predict the quality labels (0 = "accept", 1 = "reject") for a features table computed by mriqc, using the default classifier, run:

mriqc_clf --load-classifier -X aMRIQC.csv

where aMRIQC.csv is the file T1w.csv generated by the group-level run of mriqc.

Building your custom classifier

Custom classifiers can be fitted using the same mriqc_clf tool in fitting mode:

mriqc_clf --train aMRIQC_train.csv labels.csv --log-file

where aMRIQC_train.csv contains the IQMs calculated by mriqc and labels.csv contains the matching ratings assigned by an expert. The labels must be numerical (-1 = exclude, 0 = doubtful, 1 = accept). With the flag --multiclass the labels are not binarized. Otherwise 0 and 1 will be mapped to 0 (accept) and -1 will be mapped to 1 (reject).
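The default binarization described above can be illustrated with a minimal sketch. The function name binarize_label is hypothetical and is not part of mriqc_clf; it only reproduces the mapping stated in the text.

```python
def binarize_label(rating: int) -> int:
    """Map an expert rating to the binary quality label.

    Ratings: -1 = exclude, 0 = doubtful, 1 = accept.
    Binary labels: 0 = accept, 1 = reject.
    """
    if rating not in (-1, 0, 1):
        raise ValueError(f"unexpected rating: {rating!r}")
    # Only "exclude" (-1) becomes "reject" (1); doubtful and accept both map to 0.
    return 1 if rating == -1 else 0


# Hypothetical expert ratings, for illustration only.
ratings = [1, 0, -1, 1]
binary = [binarize_label(r) for r in ratings]
print(binary)
```

With --multiclass this step is skipped and the three rating values are kept as they are.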

Omitting all arguments to the --train flag instructs mriqc_clf to run cross-validation for model selection and to train the winning model on the ABIDE dataset:

mriqc_clf --train --log-file

Model selection can be followed by testing on a left-out dataset using the flag --test. If --test is given without arguments (no paths to samples and labels), the default features and labels for DS030 are used:

mriqc_clf --train --test --log-file

The trained classifier can then be used for prediction on unseen data with the command shown at the top, now indicating which classifier should be used:

mriqc_clf --load-classifier myclassifier.pklz -X aMRIQC.csv

Predictions are stored as a CSV file, containing the BIDS identifiers as indexing columns and the predicted quality label under the prediction column.
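As a rough illustration of how such a predictions table might be consumed, the sketch below counts accepted and rejected images using only the Python standard library. Only the prediction column name comes from the description above; the subject_id column and the sample rows are assumptions made for this example, and the real file may use different BIDS identifier columns.

```python
import csv
import io
from collections import Counter

# Hypothetical file content; replace with open("<predictions file>.csv") in practice.
sample = """subject_id,prediction
sub-01,0
sub-02,1
sub-03,0
"""


def summarize_predictions(handle, id_column="subject_id"):
    """Count predicted labels and collect identifiers of rejected images."""
    counts = Counter()
    rejected = []
    for row in csv.DictReader(handle):
        # Tolerate labels written as "1" or "1.0".
        label = int(float(row["prediction"]))
        counts[label] += 1
        if label == 1:  # 1 = reject
            rejected.append(row[id_column])
    return counts, rejected


counts, rejected = summarize_predictions(io.StringIO(sample))
print("accepted:", counts[0], "rejected:", counts[1], "ids:", rejected)
```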

Usage of mriqc_clf

MRIQC model selection and held-out evaluation

usage: mriqc [-h] [--train [TRAIN [TRAIN ...]] | --load-classifier
             [LOAD_CLASSIFIER]] [--test [TEST [TEST ...]]]
             [-X EVALUATION_DATA] [--train-balanced-leaveout] [--multiclass]
             [-P PARAMETERS] [-M {rfc,xgb,svc_lin,svc_rbf}] [--nested_cv]
             [--nested_cv_kfold] [--perm PERM] [-S SCORER]
             [--cv {kfold,loso,balanced-kfold,batch}] [--debug]
             [--log-file [LOG_FILE]] [-v] [--njobs NJOBS] [-t THRESHOLD]

Named Arguments

--train training data tables, X and Y, leave empty for ABIDE.
--load-classifier
 load a previously saved classifier
--test test data tables, X and Y, leave empty for DS030.
-X, --evaluation-data
 classify this CSV table of IQMs
--train-balanced-leaveout
 leave out a balanced, random sample of training examples
--multiclass, --ms
 do not binarize labels


-P, --parameters
-M, --model

Possible choices: rfc, xgb, svc_lin, svc_rbf

model under test

--nested_cv run nested cross-validation before held-out
--nested_cv_kfold
 run nested cross-validation before held-out, using 10-fold split in the outer loop
--perm permutation test: number of permutations
-S, --scorer
--cv Possible choices: kfold, loso, balanced-kfold, batch
--log-file write log to this file, leave empty for a default log name
-v, --verbose increases log verbosity for each occurrence.
--njobs number of jobs
-t, --threshold
 decision threshold of the classifier
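
The role of a decision threshold can be sketched in general terms as follows. This is an assumption about how such a threshold typically works for a binary classifier (the predicted probability of the "reject" class is compared against the threshold), not a description of the mriqc_clf internals. The function apply_threshold and the probability values are hypothetical.

```python
def apply_threshold(reject_probabilities, threshold=0.5):
    """Turn probabilities of the reject class into binary labels.

    A sample is labeled 1 (reject) when its probability reaches the
    threshold, and 0 (accept) otherwise.
    """
    return [1 if p >= threshold else 0 for p in reject_probabilities]


# Hypothetical classifier outputs, for illustration only.
probs = [0.1, 0.45, 0.5, 0.9]
default_labels = apply_threshold(probs)                # threshold = 0.5
strict_labels = apply_threshold(probs, threshold=0.4)  # rejects more images
print(default_labels, strict_labels)
```

Lowering the threshold makes the classifier reject more images, trading false accepts for false rejects.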