The Random Forests Classifier in MRIQC

MRIQC is shipped with a random-forests classifier, which uses the combination of the ABIDE and DS030 datasets as its training sample.

To predict the quality labels (0 = "accept", 1 = "reject") on a features table computed by mriqc, using the default classifier, the command line is as follows:

mriqc_clf --load-classifier -X aMRIQC.csv -o mypredictions.csv

where aMRIQC.csv is the T1w.csv file generated by the group-level run of mriqc.
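A quick way to inspect such a table before running the classifier is with pandas. This is only a sketch, assuming the file name used in the example above; the exact columns vary across MRIQC versions:

import pandas as pd

# Load the group-level IQM table produced by mriqc (file name taken from the example above)
iqms = pd.read_csv("aMRIQC.csv")

# Expect one row per input image, with the image quality metrics (IQMs) as columns
print(iqms.shape)
print(iqms.columns.tolist()[:10])  # peek at the first few column names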

Building your custom classifier

Custom classifiers can be fitted using the same mriqc_clf tool in fitting mode:

mriqc_clf --train aMRIQC_train.csv labels.csv --log-file

where aMRIQC_train.csv contains the IQMs calculated by mriqc and labels.csv contains the matching ratings assigned by an expert. The labels must be numerical (-1 = exclude, 0 = doubtful, 1 = accept). With the --multiclass flag the labels are not binarized. Otherwise, 0 and 1 will be mapped to 0 (accept) and -1 will be mapped to 1 (reject).
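As an illustration of the default (binarized) behaviour described above, and not of MRIQC's internal code, the mapping amounts to the following:

# Sketch of the label binarization described above:
#   expert ratings: -1 = exclude, 0 = doubtful, 1 = accept
#   binarized:       1 = reject for -1; 0 = accept for 0 and 1
def binarize_rating(rating: int) -> int:
    return 1 if rating == -1 else 0

assert [binarize_rating(r) for r in (-1, 0, 1)] == [1, 0, 0]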

Leaving the --train flag without arguments instructs mriqc_clf to run cross-validation for model selection and to train the winning model on the ABIDE dataset:

mriqc_clf --train --log-file

Model selection can be followed by testing on a left-out dataset using the --test flag. If --test is given without arguments (i.e., without paths to features and labels), the default features and labels for DS030 are used:

mriqc_clf --train --test --log-file
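Conceptually, these two steps amount to a cross-validated search over candidate models, refitting the winner on the full training set, and then scoring it once on data that never entered model selection. The scikit-learn sketch below illustrates this idea on synthetic data; it is not MRIQC's internal implementation, and the hyperparameter grid, sample sizes, and scorer are only examples:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic stand-ins for the training (ABIDE-like) and held-out (DS030-like) tables
X_train = rng.normal(size=(1000, 60))
y_train = rng.integers(0, 2, size=1000)
X_test = rng.normal(size=(250, 60))
y_test = rng.integers(0, 2, size=250)

# Cross-validated model selection over a small, illustrative grid of random forests
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 400], "max_depth": [None, 10]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)  # the best candidate is refit on the full training set

# Held-out evaluation: score the selected model once on the unseen test set
auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, f"held-out AUC = {auc:.3f}")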

The trained classifier can then be used for prediction on unseen data with the command shown at the top, this time indicating which classifier should be used:

mriqc_clf --load-classifier myclassifier.pklz -X aMRIQC.csv -o mypredictions.csv

Predictions are stored as a CSV file, containing the BIDS identifiers as indexing columns and the predicted quality label under the prediction column.
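For example, the predictions file could be consumed with pandas as follows. The prediction column name comes from the description above, and the file name is the one used in the example command:

import pandas as pd

# Load the predictions written by mriqc_clf
preds = pd.read_csv("mypredictions.csv")

# Keep the images predicted as "reject" (prediction == 1)
rejected = preds[preds["prediction"] == 1]
print(rejected)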

Usage of mriqc_clf

MRIQC model selection and held-out evaluation

usage: mriqc_clf [-h] [--train [TRAIN [TRAIN ...]] | --load-classifier
                 [LOAD_CLASSIFIER]] [--test [TEST [TEST ...]]]
                 [-X EVALUATION_DATA] [--train-balanced-leaveout] [--multiclass]
                 [-P PARAMETERS] [-M {rfc,xgb,svc_lin,svc_rbf}] [--nested_cv]
                 [--nested_cv_kfold] [--perm PERM] [-S SCORER]
                 [--cv {kfold,loso,balanced-kfold,batch}] [--debug]
                 [--log-file [LOG_FILE]] [-v] [--njobs NJOBS] [-t THRESHOLD]

Named Arguments

--train
    training data tables, X and Y, leave empty for ABIDE.
--load-classifier
    load a previously saved classifier
--test
    test data tables, X and Y, leave empty for DS030.
-X, --evaluation-data
    classify this CSV table of IQMs
--train-balanced-leaveout
    leave out a balanced, random sample of training examples
--multiclass, --ms
    do not binarize labels

Options

-P, --parameters
-M, --model
    Possible choices: rfc, xgb, svc_lin, svc_rbf
    model under test
--nested_cv
    run nested cross-validation before held-out
--nested_cv_kfold
    run nested cross-validation before held-out, using 10-fold split in the outer loop
--perm
    permutation test: number of permutations
-S, --scorer
--cv
    Possible choices: kfold, loso, balanced-kfold, batch
--debug
--log-file
    write log to this file, leave empty for a default log name
-v, --verbose
    increases log verbosity for each occurrence
--njobs
    number of jobs
-t, --threshold
    decision threshold of the classifier