Data organization¶
Datasets¶
Datasets constitute the most basic unit of organization in flexiznam. They correspond to the dataset entities on Flexilims and can represent either a single file or a collection of files. In the former case, the path attribute of the Flexilims entry will point to the file itself. In the latter case, path will point at the parent directory.
Note
Dataset path is defined relative to the root directory. See below for more details.
Dataset entities have the following default attributes:
created: timestamp when the dataset was generated.
is_raw: 'yes' or 'no', depending on whether the dataset corresponds to raw or processed data.
path: location of the data on CAMP.
dataset_type: string describing the type of data represented by the dataset, e.g. 'scanimage', 'camera', or 'ephys'. Permitted dataset types are listed in config.
In addition, any custom attributes can be specified for individual datasets.
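As a rough illustration of the attributes listed above, the sketch below assembles them into a plain dictionary. It does not use the flexiznam API: the function `make_dataset_attributes`, the set `EXAMPLE_DATASET_TYPES`, the timestamp format, the file name `camera.mp4`, and the custom attribute `frame_rate` are all assumptions made for this example.

```python
from datetime import datetime

# Illustrative subset only; the permitted types are defined in the flexiznam config.
EXAMPLE_DATASET_TYPES = {"scanimage", "camera", "ephys"}


def make_dataset_attributes(path, dataset_type, is_raw, created=None, **custom):
    """Assemble the default dataset attributes plus any custom ones (hypothetical helper)."""
    if dataset_type not in EXAMPLE_DATASET_TYPES:
        raise ValueError(f"unknown dataset_type: {dataset_type!r}")
    if is_raw not in ("yes", "no"):
        raise ValueError("is_raw must be 'yes' or 'no'")
    created = created or datetime.now()
    attributes = {
        # The timestamp format is an assumption, not taken from flexiznam.
        "created": created.strftime("%Y-%m-%d %H:%M:%S"),
        "is_raw": is_raw,
        "path": path,  # relative to the data root, not an absolute path
        "dataset_type": dataset_type,
    }
    # Custom attributes are merged last, so they can override the defaults.
    attributes.update(custom)
    return attributes


attrs = make_dataset_attributes(
    path="my_project/BRAC7777.1a/S20210704/R101501_retinotopy/camera.mp4",
    dataset_type="camera",
    is_raw="yes",
    created=datetime(2021, 7, 4, 10, 15, 1),
    frame_rate=30,  # hypothetical custom attribute
)
```

In practice the Dataset class is the intended way to create such entries; the dictionary only shows which fields a dataset entity is expected to carry.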
The Dataset class provides a useful abstraction for datasets, especially for creating entries for processed data. See the quick start guide for more details.
You can define your own subclasses of Dataset to handle import or loading of different dataset types. flexiznam.schema.microscopy_data.MicroscopyData provides a fairly minimal example.
Directory structure¶
To protect the integrity of raw data and facilitate archiving, raw and processed data are stored in different directory trees. The paths of the raw and processed directories are specified in config and can be accessed through flexiznam.config.PARAMETERS['data_root']['raw'] and flexiznam.config.PARAMETERS['data_root']['processed']. Paths on Flexilims are relative to these directories. The is_raw attribute of the dataset tells us which directory tree the dataset is stored in. When using the Dataset class, the full path can be conveniently retrieved with the Dataset.path_full property.
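The relationship between the relative path, is_raw, and the two root directories can be sketched as follows. This is not the implementation of Dataset.path_full; the dictionary `DATA_ROOT` is a hypothetical stand-in for flexiznam.config.PARAMETERS['data_root'], its two root paths are made up, and `resolve_full_path` is a name invented for this example.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for flexiznam.config.PARAMETERS['data_root'];
# the real values come from the flexiznam config.
DATA_ROOT = {"raw": "/data/raw", "processed": "/data/processed"}


def resolve_full_path(relative_path, is_raw):
    """Join a Flexilims-style relative path with the matching root directory."""
    if is_raw not in ("yes", "no"):
        raise ValueError("is_raw must be 'yes' or 'no'")
    tree = "raw" if is_raw == "yes" else "processed"
    return PurePosixPath(DATA_ROOT[tree]) / relative_path


relative = "my_project/BRAC7777.1a/S20210704/R101501_retinotopy"
raw_path = resolve_full_path(relative, "yes")
processed_path = resolve_full_path(relative, "no")
```

Because only the relative path is stored on Flexilims, the same entry can be resolved on machines that mount the root directories in different locations.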
Within raw and processed directories, subdirectories will typically correspond to projects, which will in turn contain subdirectories corresponding to individual mice.
In vivo recording and behavioral data¶
Behavioral and recording data are organized in sessions, which may be composed of multiple recordings. Datasets can have either sessions or recordings as their origin. For example, in a two-photon imaging session, all recordings will be segmented together and the dataset containing the resulting ROIs will be assigned to the session. However, the ROI traces may be split and assigned to individual recordings.
Sessions¶
In the case of in vivo recordings, a session corresponds to neurons recorded together: if you change the imaging field of view or move the electrodes, you would create a new session. The idea is that all the data within a given session can be segmented or spike sorted together.
Sessions should always have a mouse as their origin and should be stored under <DATA_ROOT>/<PROJECT>/<MOUSE>/<SYYYYMMDD>. For example, a session acquired for my_project on 4 July 2021 from mouse BRAC7777.1a would be stored in <DATA_ROOT>/my_project/BRAC7777.1a/S20210704. The name of the session on Flexilims will also follow this hierarchy. The example session would be named BRAC7777.1a_S20210704_0. The numerical index at the end is added if multiple sessions are created on the same date, e.g. for imaging in two different fields of view. Sessions have the following default attributes:
date: YYYY-MM-DD string corresponding to the date of the recording.
path: path to the session on CAMP, used primarily to decide where to store processed datasets. As with datasets, this is relative to data_root.
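The session naming scheme above can be reproduced with a few lines of standard-library Python. This is a sketch, not flexiznam code: `session_dir` and `session_name` are hypothetical helpers, and the text does not say whether the index is appended for every session, so this example simply always appends it, as in the example name.

```python
from datetime import date
from pathlib import PurePosixPath


def session_dir(project, mouse, day):
    """Session directory relative to the data root: <PROJECT>/<MOUSE>/<SYYYYMMDD>."""
    return PurePosixPath(project) / mouse / f"S{day:%Y%m%d}"


def session_name(mouse, day, index=0):
    """Session name in the style of the example, e.g. BRAC7777.1a_S20210704_0."""
    return f"{mouse}_S{day:%Y%m%d}_{index}"


day = date(2021, 7, 4)
relative_dir = session_dir("my_project", "BRAC7777.1a", day)
name = session_name("BRAC7777.1a", day)
date_attribute = day.isoformat()  # YYYY-MM-DD string for the date attribute
```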
Recordings¶
During a session, you may carry out multiple recordings, which will be stored as child entities of the session. A recording essentially corresponds to every time you start acquisition on the microscope or the ephys rig. Recordings are stored as subdirectories of the session, i.e. <DATA_ROOT>/<PROJECT>/<MOUSE>/<SYYYYMMDD>/RHHMMSS_PROTOCOL, where HHMMSS is the time when the recording was started and PROTOCOL is a short string identifying the experimental protocol. Recordings have the following attributes:
protocol: short string describing the experimental protocol, e.g. retinotopy or visual_cliff.
recording_type: the modality of the recording, one of two_photon, widefield, intrinsic, ephys, behaviour, or unspecified.
path: path to the recording on CAMP, used primarily to decide where to store processed datasets. As with datasets, this is relative to data_root.
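The RHHMMSS_PROTOCOL convention can be built and parsed as sketched below. The helpers `recording_dirname` and `parse_recording_dirname` are hypothetical and not part of flexiznam; the tuple of recording types is copied from the list above.

```python
import re
from datetime import time

# Modalities listed for the recording_type attribute.
RECORDING_TYPES = (
    "two_photon", "widefield", "intrinsic", "ephys", "behaviour", "unspecified",
)


def recording_dirname(start, protocol):
    """Directory name for a recording: R<HHMMSS>_<PROTOCOL>."""
    return f"R{start:%H%M%S}_{protocol}"


def parse_recording_dirname(dirname):
    """Split a recording directory name into its start time and protocol."""
    match = re.fullmatch(r"R(\d{2})(\d{2})(\d{2})_(.+)", dirname)
    if match is None:
        raise ValueError(f"not a recording directory name: {dirname!r}")
    hours, minutes, seconds, protocol = match.groups()
    return time(int(hours), int(minutes), int(seconds)), protocol


dirname = recording_dirname(time(10, 15, 1), "visual_cliff")
start, protocol = parse_recording_dirname(dirname)
```

The protocol part is matched greedily after the first underscore, so protocol names that themselves contain underscores, such as visual_cliff, are recovered intact.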
Ex vivo and other data¶
Ex vivo data are organized through sample entities. Samples are generic placeholders. They can correspond to, for example, the entire brain, a slide with multiple tissue sections, a single tissue section, or an LCM cubelet. Samples can have the mouse as their origin or they can be nested, e.g. sample slide_20 can contain sample section_5.
Raw data for samples can be stored directly as a subdirectory of the mouse (e.g. <DATA_ROOT>/my_project/BRAC7777.1a/brain/). In this case, the session: field should be left blank when uploading the YAML file to Flexilims. Alternatively, samples can be stored in subdirectories corresponding to acquisition sessions, e.g. <DATA_ROOT>/my_project/BRAC7777.1a/S20210704/brain/, for example if you would like to separate confocal data acquired on different days. In this case the session: field should be filled. This only affects where flexiznam.camp.sync_data.parse_yaml() searches for datasets: on Flexilims the samples would still be direct children of the mouse entity or of other samples.
Note
If datasets for a given sample are acquired across multiple sessions, they would still have the same sample as their origin. Calling flexiznam.main.get_children() for that sample would retrieve them all.
For nested samples, the directory structure should mirror their hierarchy (e.g. <DATA_ROOT>/my_project/BRAC7777.1a/brain/slide_20/section_5). This hierarchy is also reflected in how samples are named on Flexilims, e.g. BRAC7777.1a_brain_slide_20_section_5.
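The mapping from a sample hierarchy to its directory and its Flexilims name can be sketched as follows. `sample_dir` and `sample_name` are hypothetical helpers written for this example, not flexiznam functions. Combining nested samples with a session subdirectory is an assumption; the name omits the session because samples remain children of the mouse or of other samples on Flexilims.

```python
from pathlib import PurePosixPath


def sample_dir(project, mouse, samples, session=None):
    """Directory of a (possibly nested) sample, relative to the data root.

    `samples` lists the hierarchy from outermost to innermost sample.
    """
    base = PurePosixPath(project) / mouse
    if session is not None:
        # Optional acquisition-session subdirectory, e.g. S20210704.
        base = base / session
    return base.joinpath(*samples)


def sample_name(mouse, samples):
    """Sample name on Flexilims: the mouse and sample names joined by underscores."""
    return "_".join([mouse, *samples])


hierarchy = ["brain", "slide_20", "section_5"]
nested_dir = sample_dir("my_project", "BRAC7777.1a", hierarchy)
in_session_dir = sample_dir("my_project", "BRAC7777.1a", ["brain"], session="S20210704")
name = sample_name("BRAC7777.1a", hierarchy)
```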
Just like sessions and recordings, samples can have multiple datasets as children.