Defining Experimental Details¶

There are two places where you specify the experimental details for an analysis performed using lyman. General experimental parameters that are constant across a group of subjects go in an experiment file in your lyman directory. You also must define the paradigm information, which can be different for each subject. Both modes of specification are documented here.

The Experiment File¶

Experiments are defined by creating a Python file in your lyman directory called <experiment>.py. This is just a regular Python file that gets imported as a module at runtime.

There is a default set of parameters that will be overridden by the fields in this file; to see these defaults, run the following code:

In [1]: import lyman

In [2]: print(lyman.default_experiment_parameters())

The namespace of the imported experiment file is used to update these values.

Here is a description of each of these fields.

Preprocessing Parameters¶

source_template: A string that, relative to your data directory, provides a path to the raw functional files for your experiment. It should include the string formatting key {subject_id} in place of the subject id in the path. When you have multiple runs of data, replace the run number with a glob wildcard. For instance, your template might look like "{subject_id}/bold/func_*.nii.gz".
whole_brain_template: If your acquisition covered only part of the brain (e.g. you are doing a high-res study), you can also supply a single EPI image in the same slice prescription but with enough slices to cover the full brain. This image will be used to initialize the anatomical coregistration. This field provides a template path to this image, as with the source_template (although you specify only one whole-brain file). If you don’t have such an image, just leave this out of the experiment file.
fieldmap_template: A template string, like the source template, that identifies images that can be used to compute a map of distortions in the field. This isn’t actually meant to be a traditional “fieldmap” image, but rather multiple images with the same slice prescription as the function runs but with one or more images where the phase encoding direction is reversed.
n_runs: An integer with the number of functional runs in the experiment.
TR: A float with the repetition time of the acquisition in seconds.
frames_to_toss: An integer with the number of frames to trim from the beginning of the timeseries.
fieldmap_pe: Directions of phase encoding directions in the fieldmap images, if they exist. For example, ["y", "-y"] means that the fieldmap images have two frames where the first has normal phase encoding on the y axis and the second has reversed phase encoding also on the y axis.
temporal_interp: A boolean that specifies whether slice-time correction should be performed.
interleaved: A string or boolean that specifies whether the data were acquired with an interleaved protocol. Only relevant if temporal_interp is True. The only valid string is "siemens", which indicates a Siemens interleaved sequence (slice order depends on number of slices). Set to True if you are using an interleaved acquisition on a GE scanner.
coreg_init: A string that is either "fsl" or "header". This controls how the boundary-based registration algorithm is initialized. Using "header" may give better performance for partial brain acquisitions, but only if the anatomicals were acquired in the same session as the functionals.
slice_order: A string that is either "up" or "down". This corresponds to the slice acquisition order, and is only relevant if temporal_interp is True.
intensity_threshold: A float specifying the threshold for intensity artifacts. Frames where the median signal intensity across the brain exceeds the grand run median by this many median absolute deviations will be excluded from the model.
spike_threshold: A float specifying the threshold for white noise spike artifacts, or None to avoid detecting spikes (a plot is produced in either case). This is in the same units as the intensity_threshold, but spikes are usually more extreme. Unlike the intensity_threshold, white noise spikes will be unidirectional, but whether they are going to be positive or negative deflections will depend on your scanner. Ask your physicist.
motion_threshold: A float specifying the threshold for motion artifacts. Frames where the total displacement (in mm) relative to the previous frame exceeds this number will be excluded from the model.
wm_components: Number of principle components to use when extracting nuisance timeseries data from deep white matter voxels. This provides additional confound information that can be added during the model fitting (using the confound_sources variable).
smooth_fwhm: A float with the smoothing kernel size for the volume-based SUSAN smoothing. Note that an unsmoothed version of the timeseries is always produced.
hpf_cutoff: A float with the cutoff time (in seconds) for the highpass filter. This value is used in both preprocessing and in the model. It can also be None to skip the highpass filter

Model Parameters¶

design_name: A string used to build the name of the file with paradigm information (see below).
condition_names: A list of strings with condition names. If this is absent or set to None, the sorted unique values in the condition field of the design file are used. Otherwise, the design matrix will include only the conditions named in this list (in the order provided here).
regressor_file: The name of a file containing information about other regressors to add to the timeseries model (see below).
regressor_names: A list of strings that can be used to select specific columns from the regressor file specified above. If None, all columns in the csv file are used.
hrf_model: A string corresponding to the name of the HRF model class. Currently only GammaDifferenceHRF is supported.
temporal_deriv: Boolean specifying whether a derivative regressor should be used in the model for each explanatory variable (these are considered regressors of no interest).
confound_sources: A list of strings that can include "motion", "wm", and "brain". This specifies what information should be added to the “confounds” component of the design matrix. "motion" are the six motion parameters obtained during realignment, "wm" are timeseries data from deep white matter with dimensionality reduced using PCA (controlled by the nuisance_components variable), and "brain" is the mean timecourse over the whole brain mask.
remove_artifacts: Boolean specifying whether indicator vectors should be added to the design matrix to identify (and thus remove) frames that are determined during preprocessing to contain artifacts.
confound_pca: A boolean specifying whether the dimensionality of the confound matrix should be reduced using PCA to include dimensions explaining 99% of the variance.
hrf_params: A dictionary with keyword arguments for the HRF model class.
contrasts: A list of tuples, with one entry per contrast. Each contrast is defined by the 3-tuple (<name>, [<conditions>], [<weights>]). For instance, if you want to test the contrast of hard events vs. easy events, you would use ("hard-easy", ["hard", "easy"], [1, -1]). The conditions must be present in the design, but you do not have to include the names of any conditions not involved in the contrast. If you provided a list of condition names, baseline contrasts are automatically generated for each of these conditions and prepended to this list. Importantly, the contrast names end up in file paths for the analysis results, so you should avoid spaces.
memory_request: An integer with the number of gigabytes of memory to request for model workflow nodes that involve large memory computations. This only applies to submission through a qsub-based distribution plugin.

Group Analysis Parameters¶

flame_mode: A string indicating the type of inference that should be performed in the group model. Options are ols, flame1, and flame12, for ordinary least squares, MAP mixed effects, and full MCMC mixed effects, respectively.
cluster_zthresh: A float indicating the threshold used to initially define clusters in the Z-stat maps during multiple comparisons correction.
grf_pthresh: A float indicating the p value threshold for thresholding the corrected Z-stat images.
peak_distance: A float specifying the minimum distance (in mm) between local minima when finding activation peaks.
surf_name: The name of a Freesurfer surface to plot group results on.
surf_smooth: Extent of spatial smoothing (in mm) to apply after sampling to the surface.
sampling_range: A 3-tuple of floats where where to start, stop and the size of the step (all in sampling_units) when projecting data onto the white surface. This only applies to group analysis in fsaverage space.
sampling_units: A string that is either “frac” or “mm” that makes up part of the specification for projecting results onto the surface manifold (it determines the units of the sampling_range paramters). This only applies to group analysis in fsaverage space.
sampling_method: A string that is either “average”, “max”, or “point” that makes up part of the specification for projecting results onto the surface manifold (it determines how to summarize the samples obtained using sampling_range and sampling_method into a single value at each verex). This only applies to group analysis in fsaverage space.
surf_corr_sign: A string that is either “pos”, “neg”, or “abs” for the sign of the test to run. This only applies to group analysis in fsaverage space.

The parameters that were present in this file at runtime will be saved with the other processing outputs in the preproc and model analysis directories (in a file called experiment_info.json). Any comments in the docstring to this module will be included in this archive. Note that if you preprocess your data, change the experiment definition, and then run the model without rerunning preproc, the preprocessing parameters in this archived file will be inaccurate.

Detailed Design Information¶

The design file¶

You also have to generate a file in csv format for each subject specifying what actually happened during the scan. This file should live at <data_dir>/<subject_id>/design/<design_name>.csv, where design_name is specified in the experiment file. Each row in this file corresponds to an event, where the term “event” is used broadly and can mean a “block” in a block design experiment. At a minimum, the following fields need to be present in this file:

run: 1-based index for the run number.
condition: A string with the condition name for the event.
onset: Onset time (in seconds) of the event. 0s is considered to be the onset of the first frame that is not trimmed (by frames_to_toss in the experiment file).

For example, an extremely basic design might look like this:

run,condition,onset
1,easy,0
1,hard,12
2,easy,0
2,hard,12

Of course, you’ll almost certainly want to write this file using Pandas and not by hand.

This information can be augmented with the following fields:

duration: Duration (in seconds) of the event. If duration is 0 (which is the default), it is assumed to be an “impulse”.
value: A parametric value corresponding to the height of the response. The defualt value is 1.

Additionally, other columns can be included with some parametric value for that event (e.g. reaction time). This information is not used in the timeseries model, but it can be used in decoding analyses to regress confounds out of the data.

A more complete file that will result in the same design as the simple example above would read

run,condition,onset,duration,value,rt
1,easy,0,0,1,0.894
1,hard,12,0,1,1.217
2,easy,0,0,1,0.993
2,hard,12,0,1,1.328

The regressors file¶

A secondary and optional way to add design information uses a regressor file. This file, like the design file, should be a csv and should live at <data_dir>/<subject_id>/design/<regressor_file>.csv, where regressor_file is specified in the experiment file. The format is a csv where column names are regressor names and rows are observations of the regressors at each timepoint in the experiment. Additionally, the file must have a run column, specifying the 1-based run number for each observation. This information is not transformed when building the design matrix beyond de-meaning by run. This is intended to allow the use of, e.g., BOLD timeseries information extracted from seed ROIs for functional connectivity analyses. The regressors are considered elements “of interest” in the design matrix, can be included in contrasts, and contribute to the “main model” R^2 calculation.

An example file for an experiment where each run has 3 TRs and the experimenter is interested in functional connectivity early visual areas might look like

V1,V2,run
1.46,1.55,1
0.80,-0.37,1
-1.91,-1.01,1
-0.65,0.38,2
1.00,1.01,2
-0.88,-2.00,2

Each experiment can take information from at most one regressor file, but you can create multiple regressor files for different experiments. It is also possible to include all possible regressors in a single file and select the specific columns for each experiment using the regressor_names field in the experiment definition.

Specifying Alternate Models¶

You can fit several models to the same preprocessed data, which in lyman is called an altmodel or alternate model. To fit an alternate model, create an experiment file called <experiment>-<altmodel>.py and execute run_fmri.py with the arguments -experiment <experiment> -altmodel <altmodel>. This module “inherits” from the base experiment file, so you only need to include information if it differs from what was previously defined. Note that the experiment parser isn’t smart enough to detect when an altmodel overrides parameters that affect preprocessing.