Getting started with Flexiznam

Creating a project

To create a new project, log into Flexilims and click Add New Stuff > New Project. Once the project exists, add its hexadecimal project ID to the config file at ~/.flexiznam/config.yml.

Adding mice

The best way to add a new mouse to Flexilims is to import its data directly from MCMS. This can be done from the command line using flexiznam add-mouse; see flexiznam add-mouse --help for documentation.

The command logs in to MCMS, looks up the mouse by its name and downloads a one-line CSV file with information about that mouse. It then reads the downloaded file, loads it into a pandas DataFrame and deletes the file (so that it can be re-downloaded later without naming issues).
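The read-then-delete step described above can be illustrated with the standard library. This is a minimal sketch, not flexiznam's code: read_mouse_csv is a hypothetical helper, it assumes the download is a header row plus a single data row, and it uses the csv module rather than pandas so that it has no dependencies.

```python
import csv
import os


def read_mouse_csv(path):
    """Read a one-line CSV export for a single mouse, then delete the file.

    Hypothetical helper: assumes a header row followed by exactly one data row.
    The file is removed even if parsing fails, so a later download can reuse
    the same file name.
    """
    try:
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
    finally:
        os.remove(path)
    if len(rows) != 1:
        raise ValueError(f"expected one mouse, found {len(rows)} rows")
    return rows[0]
```

Deleting in a finally clause means that a failed parse still frees the file name for the next download.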

Mice can also be added manually through the Python API using flexiznam.main.add_mouse().

Acquisition and file transfer

This only works if the acquisition pipeline behaves as expected, which means that at acquisition:

  • files are named automatically

  • scanimage creates a directory for each recording

  • file names are: MOUSE_SESSION_RECORDING_PROTOCOL

  • all paths are relative to DATAROOT: <DATAROOT>/<MOUSE>/<SESSION>/<RECORDING_PROTOCOL>/...
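The naming convention above can be expressed as two small functions. This is a sketch under the assumption that mouse, session and recording names contain no underscores (only the protocol part may, since the split is limited to three cuts); recording_path and parse_file_stem are hypothetical names. ScanImage file counters such as 00001 stay attached to the protocol part.

```python
from pathlib import PurePosixPath


def recording_path(data_root, mouse, session, recording, protocol):
    """Build <DATAROOT>/<MOUSE>/<SESSION>/<RECORDING_PROTOCOL>."""
    return PurePosixPath(data_root, mouse, session, f"{recording}_{protocol}")


def parse_file_stem(stem):
    """Split a MOUSE_SESSION_RECORDING_PROTOCOL file stem into its parts.

    Assumes only the protocol part may contain underscores.
    """
    parts = stem.split("_", 3)
    if len(parts) != 4:
        raise ValueError(f"{stem!r} does not follow MOUSE_SESSION_RECORDING_PROTOCOL")
    return dict(zip(("mouse", "session", "recording", "protocol"), parts))
```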

File transfer is not handled by flexiznam: you must transfer all the data to CAMP first, and flexiznam only checks that it is available. The path to the CAMP folder containing the projects must be set in the config file.

The best way to transfer files from a Windows computer is probably robocopy. Mount the CAMP drive and run:

robocopy <SOURCE_FOLDER> <DESTINATION> /e /z

/e copies subdirectories recursively, including empty ones, and /z uses restartable mode (in case the connection is lost).

Note

You can also consider:

  • /j to copy using unbuffered I/O (recommended for large files).

  • /copy:DAT to copy data, attributes and timestamps but not ownership or ACL.

  • /move to move files instead of copying them (they are deleted from the source after a successful copy).
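If you script transfers from Python, the options above can be assembled into an argument list. A minimal sketch: robocopy_command and run_robocopy are hypothetical helpers; building the command works anywhere, but running it requires a Windows machine with the CAMP drive mounted.

```python
import subprocess


def robocopy_command(source, destination, unbuffered=False, copy_flags=None, move=False):
    """Build a robocopy call: /e (recursive, incl. empty dirs) and /z (restartable)."""
    cmd = ["robocopy", str(source), str(destination), "/e", "/z"]
    if unbuffered:
        cmd.append("/j")  # unbuffered I/O, for large files
    if copy_flags:
        cmd.append(f"/copy:{copy_flags}")  # e.g. "DAT": data, attributes, timestamps
    if move:
        cmd.append("/move")  # delete from source after a successful copy
    return cmd


def run_robocopy(cmd):
    """Run robocopy; exit codes below 8 mean success, 8 or above mean failure."""
    return subprocess.run(cmd).returncode < 8
```

robocopy reports success with several non-zero exit codes, which is why run_robocopy compares against 8 instead of passing check=True to subprocess.run.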

Syncing data

The most efficient way to upload data to Flexilims is to use a YAML file. Briefly, the YAML file format is:

---

project: <PROJECT>
mouse: <MOUSE>
session: <SESSION>
notes: "optional notes"
recordings:
  <RECORDING>:
    protocol: protocol_name
    timestamp: [optional, in HHMMSS]
    notes: [optional]
    datasets: [optional]
      <DATASET>: [optional]
        type: dataset type (scanimage for instance)
        path: path to the folder containing the dataset
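Once loaded with a YAML parser (for example PyYAML's yaml.safe_load), the file becomes nested dictionaries, and the required fields listed above can be checked before running flexiznam. This is a hypothetical pre-check for illustration; check_acquisition is not part of flexiznam, whose own validation is done by process-yaml.

```python
import re

REQUIRED_KEYS = ("project", "mouse", "session", "recordings")


def check_acquisition(acq):
    """Return a list of problems in a parsed acquisition YAML (a dict)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in acq]
    for name, rec in (acq.get("recordings") or {}).items():
        rec = rec or {}  # a recording with no fields parses as None
        if "protocol" not in rec:
            problems.append(f"recording {name}: missing protocol")
        timestamp = rec.get("timestamp")
        # YAML may parse 182758 as an int, so compare its string form
        if timestamp is not None and not re.fullmatch(r"\d{6}", str(timestamp)):
            problems.append(f"recording {name}: timestamp should be HHMMSS")
    return problems
```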

Fortunately, you do not need to type out all the datasets and paths by hand. Instead, you can create a minimal YAML file and then use Flexiznam to fill in the details. See example below:

---

# the name of the project. Must exist in flexiznam configuration
project: "test"
# name of the mouse on flexilims (if different from mcms)
mouse: "PZAH4.1c"
# session name for flexilims. Usually SYYYYMMDD, but this is not enforced
session: "S20210513"
# list of recordings
recordings:
  # name of the recording. Must be unique for this session but can be anything.
  R182758_SphereCylinder:
    protocol: "SphereCylinder" # protocol type - mandatory
  # another recording in the same session
  R193432_Retinotopy:
    protocol: "Retinotopy" # protocol type

You can also include notes or other optional attributes. This YAML file must be saved in the session folder, and the mouse, session and recording folders must be named after the mouse, session and recording, respectively.
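The folder rules above can be checked before processing. A minimal sketch, assuming the YAML has already been parsed into a dict like the example; check_layout is a hypothetical helper, not a flexiznam function.

```python
from pathlib import Path


def check_layout(yaml_path, acq):
    """Check that the YAML sits in <MOUSE>/<SESSION>/ and recording folders exist."""
    session_dir = Path(yaml_path).resolve().parent
    problems = []
    if session_dir.name != acq["session"]:
        problems.append(f"session folder is {session_dir.name!r}, expected {acq['session']!r}")
    if session_dir.parent.name != acq["mouse"]:
        problems.append(f"mouse folder is {session_dir.parent.name!r}, expected {acq['mouse']!r}")
    for recording in acq.get("recordings") or {}:
        if not (session_dir / recording).is_dir():
            problems.append(f"missing recording folder: {recording}")
    return problems
```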

Once all the data are on CAMP, the first step is to validate and autopopulate the YAML:

flexiznam process-yaml --source_yaml "path/to/acq_yaml.yml" --target_yaml "path/to/target_yaml.yml"

This calls flexiznam.camp.sync_data.parse_yaml() and creates a local copy of the YAML called acq_yaml_autogenerated_full_file.yml. If a dataset cannot be located or loaded, the YAML file will contain a warning starting with XXERRORXX, and a list of these errors is also printed to the console. For example:

Reading example_acquisition_yaml.yml

Found some issues with the yaml:
    - Dataset: `ref_for_motion`
              Could not find dataset "ref_for_motion". Found "PZAH4.1c_S20210513_R181858_SphereCylinder00001, PZAH4.1c_S20210513_R182025_SphereCylinder00001, PZAH4.1c_S20210513_R182758_SphereCylinder00001" instead
    - Dataset: `overview_picture_02`
              Could not find dataset "overview_picture_02". Found "overview00001, overview00002" instead
    - Dataset: `harp_data_csv`
              Dataset not found. Path /Volumes/lab-znamenskiyp/home/shared/projects/3d_vision/Data/ParamLog/R193432_Retinotopy does not exist
Fix manually these errors before uploading to flexilims
Processed yaml saved to example_acquisition_yaml_autogenerated_full_file.yml
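Because every unresolved dataset is flagged with XXERRORXX in the processed file, searching for that marker is a quick way to list what is left to fix between process-yaml runs. A minimal sketch; find_yaml_errors and yaml_file_errors are hypothetical helpers that rely only on the marker, not on the wording that follows it.

```python
from pathlib import Path


def find_yaml_errors(text, marker="XXERRORXX"):
    """Return (line_number, stripped_line) for every line containing the marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


def yaml_file_errors(path):
    """Read a processed YAML file and list its flagged lines."""
    return find_yaml_errors(Path(path).read_text())
```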

Before uploading, edit the processed YAML manually to fix these errors. You can run process-yaml again on the fixed file until no errors remain. Finally, add the entries to Flexilims:

flexiznam yaml-to-flexilims --source_yaml "path/to/processed_yaml.yml"

Querying the database

flexiznam.main provides high-level functions to retrieve and update entries on the database. These functions are also available directly in the flexiznam namespace.

First, create a Flexilims session by calling get_flexilims_session(). This returns a flexilims.Flexilims object holding your authentication credentials, which you can pass to other functions. The simplest way is to provide only the project name and use the authentication details stored in the config files:

import flexiznam as flz
project = "test"  # name of a project listed in your flexiznam config
flz_session = flz.get_flexilims_session(project)

get_entities() is the most generic method and will retrieve any data type, filtered by name, id, origin, or arbitrary attribute. It returns a pandas.DataFrame by default.

get_entity() has the same functionality but expects only a single result and returns a pandas.Series:

session_name = "S20210513"  # example session name
exp_session = flz.get_entity(
    datatype='session',
    name=session_name,
    flexilims_session=flz_session
)
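The difference between get_entities() and get_entity() is the usual filter versus single-lookup pattern. The sketch below illustrates it on a plain list of dictionaries standing in for database entries; it does not call Flexilims, and its choices (None when nothing matches, an error when several entries match) are assumptions for illustration rather than a description of flexiznam's exact behaviour.

```python
def filter_entries(entries, **attributes):
    """Like get_entities(): keep every entry whose attributes all match."""
    return [e for e in entries if all(e.get(k) == v for k, v in attributes.items())]


def get_single_entry(entries, **attributes):
    """Like get_entity(): expect at most one match."""
    matches = filter_entries(entries, **attributes)
    if not matches:
        return None
    if len(matches) > 1:
        raise ValueError(f"expected one entry, found {len(matches)}")
    return matches[0]
```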

Other useful methods include get_children(), which returns all children of a given entity, and get_datasets(), which returns a dictionary containing paths to all datasets of a given type in a given session, for example:

si_datasets = flz.get_datasets(
    exp_session['id'],
    recording_type='two_photon',
    dataset_type='scanimage',
    flexilims_session=flz_session
)

Adding processed datasets

New entries for processed datasets can be added by calling add_dataset(), but this is not recommended.

Instead, when the new processed dataset is created as a child of an existing entity, such as an experimental session or recording, it is best to use the static method Dataset.from_origin() of the Dataset class, found in flexiznam.schema:

from flexiznam.schema import Dataset
suite2p_dataset = Dataset.from_origin(
    project=project,
    origin_type='session',
    origin_id=exp_session['id'],
    dataset_type='suite2p_rois',
    conflicts='abort'  # or 'append', 'overwrite', 'skip'
)

This method automatically sets the Flexilims name and path attributes of the new dataset, based on the path attribute of the parent entity given by origin_id, and returns an instance of Dataset. It also handles conflicts, with options to append, overwrite, abort or skip if a dataset of the given type is already associated with the parent entity.
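The four conflict modes can be summarised as a small decision function. This is a hypothetical model to make the modes concrete, not flexiznam's implementation: the `<base>_<index>` naming scheme and the "create", "update" and "reuse" actions are assumptions.

```python
CONFLICT_MODES = {"append", "overwrite", "abort", "skip"}


def plan_dataset(existing, base_name, conflicts):
    """Decide how to add a dataset when `existing` names of the same type exist.

    Returns an (action, name) tuple. Hypothetical naming: <base_name>_<index>.
    """
    if conflicts not in CONFLICT_MODES:
        raise ValueError(f"unknown conflicts mode: {conflicts!r}")
    index = 0
    while f"{base_name}_{index}" in existing:
        index += 1  # first free index, used for new entries
    if not existing or conflicts == "append":
        return "create", f"{base_name}_{index}"
    if conflicts == "abort":
        raise FileExistsError(f"{base_name} already has {len(existing)} dataset(s)")
    if len(existing) > 1:
        raise ValueError("several existing datasets: cannot choose one to overwrite or reuse")
    # overwrite replaces the existing entry; skip returns it unchanged
    return ("update" if conflicts == "overwrite" else "reuse"), existing[0]
```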

Note

In skip mode, Dataset.from_origin() returns a Dataset object corresponding either to the existing entry, if there is one, or to a new entry. You can use Dataset.get_flexilims_entry() to check whether the entry already exists: it returns None if it does not.

Warning

The output of Dataset.from_origin() is an abstraction of the dataset you would like to create; the method itself does not update the database. It is best to push the entry only after the preprocessing step has completed, so that a crash during preprocessing does not leave a database entry for a dataset that was never fully written.

You can set any additional attributes using the extra_attributes property of the Dataset object. When ready (i.e. once preprocessing is completed and the output files have been saved), push the changes to Flexilims by calling its update_flexilims() method:

suite2p_dataset.update_flexilims(mode='overwrite')
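Pushing only after preprocessing succeeds amounts to ordering the calls so that update_flexilims() is never reached if processing raises. A minimal sketch: run_then_register and the preprocess callable are hypothetical, and it assumes extra_attributes behaves like a dict.

```python
def run_then_register(dataset, preprocess, mode="overwrite"):
    """Run preprocessing, then push the dataset entry only if it succeeded.

    `preprocess` is a hypothetical callable that writes the output files and
    returns a dict of extra attributes. If it raises, update_flexilims() is
    never called, so no entry points at incomplete data.
    """
    attributes = preprocess()
    dataset.extra_attributes.update(attributes)
    dataset.update_flexilims(mode=mode)
    return dataset
```

With a Dataset returned by Dataset.from_origin(), this would be called as run_then_register(suite2p_dataset, my_preprocessing), where my_preprocessing is your own function.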