The Random Forests Classifier in MRIQC

MRIQC ships with a random forests classifier, trained on the combination of the ABIDE and DS030 datasets.

To predict the quality labels (0 = "accept", 1 = "reject") on a features table computed by mriqc with the default classifier, the command line is as follows:

mriqc_clf --load-classifier -X aMRIQC.csv

where aMRIQC.csv is the T1w.csv features table generated by the group-level run of mriqc.
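Before classification, it can be useful to take a quick look at that features table; the following is a minimal sketch with pandas, assuming the file is named aMRIQC.csv as in the example above (the exact set of IQM columns depends on your MRIQC version):

import pandas as pd

# Load the group-level IQMs table written by mriqc (adjust the path).
iqms = pd.read_csv("aMRIQC.csv")

# One row per input image, one column per identifier or extracted IQM.
print(iqms.shape)
print(iqms.columns.tolist())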

Building your custom classifier

Custom classifiers can be fitted using the same mriqc_clf tool in fitting mode:

mriqc_clf --train aMRIQC_train.csv labels.csv --log-file

where aMRIQC_train.csv contains the IQMs calculated by mriqc and labels.csv contains the matching ratings assigned by an expert. The labels must be numerical (-1 = exclude, 0 = doubtful, 1 = accept). With the --multiclass flag, the labels are not binarized. Otherwise, 0 and 1 are mapped to 0 (accept) and -1 is mapped to 1 (reject).
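As a plain illustration of that mapping (not the actual mriqc_clf code; the column name "rater" is a placeholder, not necessarily the one expected by mriqc_clf):

import pandas as pd

# labels.csv holds the expert ratings: -1 = exclude, 0 = doubtful, 1 = accept.
labels = pd.read_csv("labels.csv")

# Default (binarized) behavior: 0 and 1 -> 0 (accept), -1 -> 1 (reject).
binarized = labels["rater"].map({-1: 1, 0: 0, 1: 0})

# With --multiclass, the three original levels (-1, 0, 1) are kept as they are.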

Passing the --train flag without arguments instructs mriqc_clf to run cross-validation for model selection and to train the winning model on the ABIDE dataset:

mriqc_clf --train --log-file

Model selection can be followed by testing on a left-out dataset using the --test flag. If --test is given without paths to samples and labels, the default features and labels for DS030 are used:

mriqc_clf --train --test --log-file
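Conceptually, this is the usual model-selection-then-held-out-evaluation workflow. The scikit-learn sketch below mimics it on generic tables and is not the code run by mriqc_clf; the file names, the hyperparameter grid, and the assumption that a binary "label" column has already been merged into each table are all placeholders:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder tables: IQMs plus a binary "label" column (0 = accept, 1 = reject).
train = pd.read_csv("aMRIQC_train.csv")
test = pd.read_csv("aMRIQC_test.csv")

X_train, y_train = train.drop(columns="label"), train["label"]
X_test, y_test = test.drop(columns="label"), test["label"]

# Cross-validated model selection on the training sample...
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=5,
)
search.fit(X_train, y_train)

# ...followed by evaluation of the winning model on the held-out sample.
print(search.best_params_, search.score(X_test, y_test))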

The trained classifier can then be used for prediction on unseen data with the command shown at the top, now indicating which classifier should be used:

mriqc_clf --load-classifier myclassifier.pklz -X aMRIQC.csv

Predictions are stored as a CSV file, with the BIDS identifiers as indexing columns and the predicted quality label in the prediction column.
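For example, the scans flagged for rejection can be picked out of that file with pandas; "predictions.csv" below is a placeholder for the CSV written by mriqc_clf, and only the prediction column name is taken from the description above:

import pandas as pd

# Placeholder name for the predictions CSV written by mriqc_clf.
pred = pd.read_csv("predictions.csv")

# As described above, 1 in the "prediction" column means "reject".
rejected = pred[pred["prediction"] == 1]
print(rejected)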

Usage of mriqc_clf

MRIQC model selection and held-out evaluation

usage: mriqc_clf [-h] [--train [TRAIN [TRAIN ...]] | --load-classifier
             [LOAD_CLASSIFIER]] [--test [TEST [TEST ...]]]
             [-X EVALUATION_DATA] [--train-balanced-leaveout] [--multiclass]
             [-P PARAMETERS] [-M {rfc,xgb,svc_lin,svc_rbf}] [--nested_cv]
             [--nested_cv_kfold] [--perm PERM] [-S SCORER]
             [--cv {kfold,loso,balanced-kfold,batch}] [--debug]
             [--log-file [LOG_FILE]] [-v] [--njobs NJOBS] [-t THRESHOLD]

Named Arguments

--train

training data tables, X and Y, leave empty for ABIDE.

--load-classifier

load a previously saved classifier

--test

test data tables, X and Y, leave empty for DS030.

-X, --evaluation-data

classify this CSV table of IQMs

--train-balanced-leaveout

leave out a balanced, random, sample of training examples

--multiclass, --ms

do not binarize labels

Options

-P, --parameters
-M, --model

Possible choices: rfc, xgb, svc_lin, svc_rbf

model under test

--nested_cv

run nested cross-validation before held-out

--nested_cv_kfold

run nested cross-validation before held-out, using 10-fold split in the outer loop

--perm

permutation test: number of permutations

-S, --scorer
--cv

Possible choices: kfold, loso, balanced-kfold, batch

--debug
--log-file

write log to this file, leave empty for a default log name

-v, --verbose

increases log verbosity for each occurrence.

--njobs

number of jobs

-t, --threshold

decision threshold of the classifier
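To give an intuition for this last parameter: with a generic scikit-learn random forest on toy data (again, not the mriqc_clf internals), changing the decision threshold amounts to binarizing the predicted probability of the "reject" class at a different cut-off:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for the IQMs table and its binary labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Probability assigned to the "reject" class (label 1).
proba_reject = clf.predict_proba(X)[:, 1]

# A higher threshold makes the classifier more conservative about rejecting.
threshold = 0.7
predictions = (proba_reject >= threshold).astype(int)  # 1 = reject, 0 = accept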