Getting started with Flexiznam¶
Creating a project¶
To create a new project, log into Flexilims
and click Add New Stuff > New Project. Once created, add the hexadecimal
project ID to the config file in ~/.flexiznam/config.yml.
Adding mice¶
The best way to add a new mouse to flexilims is to import the data directly from
MCMS. This can also be done from the command line using flexiznam add-mouse.
See flexiznam add-mouse --help for documentation.
It will log in to MCMS, look up the mouse by its name and download a one-line CSV with the information about that mouse. It will then read the downloaded file, load it into a pandas DataFrame and delete the file (to make sure it can be re-downloaded without naming issues).
Mice can also be added manually through the Python API using flexiznam.main.add_mouse().
Acquisition and file transfer¶
This will work only if the acquisition pipeline behaves as expected, meaning that at acquisition time:

- files are named automatically
- scanimage creates a directory for each recording
- file names follow the pattern MOUSE_SESSION_RECORDING_PROTOCOL
- all paths are relative to DATAROOT: <DATAROOT>/<MOUSE>/<SESSION>/<RECORDING_PROTOCOL>/...
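Under those conventions the location of any recording can be derived mechanically; a minimal sketch using only the standard library (the values below are made-up examples, not real data):

```python
from pathlib import Path

# Made-up example values following the convention above
dataroot = Path("/camp/lab/projects")  # stands in for <DATAROOT>
mouse = "PZAH4.1c"
session = "S20210513"
recording_protocol = "R182758_SphereCylinder"

# <DATAROOT>/<MOUSE>/<SESSION>/<RECORDING_PROTOCOL>/...
recording_dir = dataroot / mouse / session / recording_protocol
print(recording_dir.as_posix())
```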
File transfer is not handled by flexiznam. You should transfer all the data to CAMP first; flexiznam will only check that it is available. The path to the CAMP folder containing the projects must be set in the config file.
The best way to transfer files from a Windows computer might be to use robocopy. Mount the CAMP drive and run:
robocopy <SOURCE_FOLDER> <DESTINATION> /e /z
Here /e copies recursively, including empty directories, and /z uses restartable mode (in case the connection is lost).
Note
You can also consider:
- /j to copy using unbuffered I/O (recommended for large files)
- /copy:DAT to copy data, attributes and timestamps but not ownership or ACLs
- /move to move instead of copying (delete after successful upload)
Syncing data¶
The most efficient way to upload data to Flexilims is using a YAML file. Briefly, the YAML file format is:
---
project: <PROJECT>
mouse: <MOUSE>
session: <SESSION>
notes: "optional notes"
recordings:
  <RECORDING>:
    protocol: protocol_name
    timestamp: [optional, in HHMMSS]
    notes: [optional]
    datasets: [optional]
      <DATASET>:
        type: dataset type (scanimage for instance)
        path: path to the folder containing the dataset
Fortunately, you do not need to type out all the datasets and paths by hand. Instead, you can create a minimal YAML file and then use Flexiznam to fill in the details. See example below:
---
# the name of the project. Must exist in flexiznam configuration
project: "test"
# name of the mouse as on flexilims (if different than mcms)
mouse: "PZAH4.1c"
# session name for flexilims. Is usually SYYYYMMDD but no actual requirement
session: "S20210513"
# list of recordings
recordings:
  # name of the recording. Must be unique for this session but can be anything.
  R182758_SphereCylinder:
    protocol: "SphereCylinder" # protocol type - mandatory
  # another recording in the same session
  R193432_Retinotopy:
    protocol: "Retinotopy" # protocol type
You can also include notes or other optional attributes. This YAML file must be saved in the session folder. The mouse folder must be named after the mouse, the session folder after the session and the recording folder after the recording.
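Because the folder names must match the YAML entries, that consistency can be checked programmatically; a minimal sketch using only the standard library (the helper and the directory layout below are hypothetical, not part of flexiznam):

```python
from pathlib import Path
import tempfile

def check_folder_names(session_dir, acq):
    """Check that mouse/session/recording folder names match the parsed YAML.

    `acq` is the parsed YAML as a plain dict. Returns a list of mismatches.
    """
    session_dir = Path(session_dir)
    errors = []
    if session_dir.name != acq["session"]:
        errors.append(f"session folder {session_dir.name!r} != {acq['session']!r}")
    if session_dir.parent.name != acq["mouse"]:
        errors.append(f"mouse folder {session_dir.parent.name!r} != {acq['mouse']!r}")
    for recording in acq.get("recordings", {}):
        if not (session_dir / recording).is_dir():
            errors.append(f"missing recording folder {recording!r}")
    return errors

# Hypothetical layout: <root>/PZAH4.1c/S20210513/R182758_SphereCylinder
root = Path(tempfile.mkdtemp())
session_dir = root / "PZAH4.1c" / "S20210513"
(session_dir / "R182758_SphereCylinder").mkdir(parents=True)

acq = {
    "mouse": "PZAH4.1c",
    "session": "S20210513",
    "recordings": {"R182758_SphereCylinder": {"protocol": "SphereCylinder"}},
}
print(check_folder_names(session_dir, acq))  # [] when everything matches
```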
Once all the data are on CAMP, the first step is to validate and autopopulate the YAML:
flexiznam process-yaml --source_yaml "path/to/acq_yaml.yml" --target_yaml "path/to/target_yaml.yml"
This will call the flexiznam.camp.sync_data.parse_yaml() method and create a local copy of the yml called acq_yaml_autogenerated_full_file.yml.
If a dataset cannot be located or loaded, the yaml file will contain a warning
starting with XXERRORXX. A list of such errors will also be printed on the
console. Here is an example output:
Reading example_acquisition_yaml.yml
Found some issues with the yaml:
- Dataset: `ref_for_motion`
Could not find dataset "ref_for_motion". Found "PZAH4.1c_S20210513_R181858_SphereCylinder00001, PZAH4.1c_S20210513_R182025_SphereCylinder00001, PZAH4.1c_S20210513_R182758_SphereCylinder00001" instead
- Dataset: `overview_picture_02`
Could not find dataset "overview_picture_02". Found "overview00001, overview00002" instead
- Dataset: `harp_data_csv`
Dataset not found. Path /Volumes/lab-znamenskiyp/home/shared/projects/3d_vision/Data/ParamLog/R193432_Retinotopy does not exist
Fix manually these errors before uploading to flexilims
Processed yaml saved to example_acquisition_yaml_autogenerated_full_file.yml
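The XXERRORXX marker makes the remaining problems easy to find programmatically; a minimal sketch using only the standard library (this helper is illustrative, not part of flexiznam):

```python
def find_yaml_errors(yaml_text):
    """Return the lines of a processed YAML that still contain the error marker."""
    return [line.strip() for line in yaml_text.splitlines() if "XXERRORXX" in line]

# Hypothetical excerpt of an autogenerated YAML with one unresolved dataset
processed = """\
recordings:
  R193432_Retinotopy:
    protocol: "Retinotopy"
    datasets:
      harp_data_csv:
        path: XXERRORXX dataset not found
"""
print(find_yaml_errors(processed))  # ['path: XXERRORXX dataset not found']
```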
Before uploading, you must manually edit the yaml to fix these errors. You can call
process-yaml on the fixed yaml until there are no errors left. Finally, you can add
the entries to flexilims:
flexiznam yaml-to-flexilims --source_yaml "path/to/processed_yaml.yml"
Querying the database¶
flexiznam.main provides high-level functions to retrieve and update
entries on the database. Methods of flexiznam.main are directly available
in the flexiznam namespace.
First, create a Flexilims session by calling get_flexilims_session().
This returns a flexilims.Flexilims object with your authentication
credentials that you can pass to other methods. The simplest way is to just
provide the project name and use the authentication details stored in the
config files:
import flexiznam as flz
flz_session = flz.get_flexilims_session(project)
get_entities() is the most generic method and will retrieve
any data type, filtered by name, id, origin, or arbitrary attributes. It returns
a pandas.DataFrame by default. get_entity() has the same functionality but
expects only a single result and returns a pandas.Series:
exp_session = flz.get_entity(
datatype='session',
name=session_name,
flexilims_session=flz_session
)
Other useful methods include get_children(), which returns
all children of a given entity, and get_datasets(), which
returns a dictionary containing paths to all datasets of a given type in a given
session, for example:
si_datasets = flz.get_datasets(
exp_session['id'],
recording_type='two_photon',
dataset_type='scanimage',
flexilims_session=flz_session
)
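The result maps dataset names to their paths, so it can be iterated directly; a sketch with invented contents (the real keys and paths depend on your database, and the exact return shape of get_datasets() may differ):

```python
from pathlib import Path

# Invented stand-in for a get_datasets() result: dataset name -> path
si_datasets = {
    "PZAH4.1c_S20210513_R182758_SphereCylinder00001": Path("/camp/project/rec1"),
    "PZAH4.1c_S20210513_R193432_Retinotopy00001": Path("/camp/project/rec2"),
}

# Collect the paths in a stable order, e.g. as input for later processing
paths_to_process = sorted(si_datasets.values())
for path in paths_to_process:
    print(path.as_posix())
```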
Adding processed datasets¶
New entries for pre-processed datasets can be added by calling the
add_dataset() method. However, this is not recommended.
Instead, when the new processed dataset is created as a child of an existing
entity, such as an experimental session or recording, it is best to use the
static Dataset.from_origin() method of the
Dataset class, found in flexiznam.schema:
from flexiznam.schema import Dataset
suite2p_dataset = Dataset.from_origin(
project=project,
origin_type='session',
origin_id=exp_session['id'],
dataset_type='suite2p_rois',
conflicts=conflicts
)
This method will automatically set the flexilims name and path attributes of the
new dataset, based on the path attribute of the parent passed by origin_id,
and return an instance of Dataset. It will
also automatically handle conflicts, providing options to append, overwrite,
abort or skip if a dataset of the given type is already associated with the parent
entity.
Note
If using the skip mode of Dataset.from_origin(),
it will either return a Dataset
object corresponding to the existing entry, if one exists, or to a new entry.
You can use Dataset.get_flexilims_entry()
to check if the entry already exists - it will return None if it does not.
Warning
The output of Dataset.from_origin() is an abstraction
of the dataset you would like to create. The method itself does not update
the database. It is a good idea to update the database only after the
pre-processing step has completed, in case of a crash.
You can set any additional attributes using the extra_attributes property of
the Dataset object. When ready (i.e. once preprocessing is completed and the
output files have been saved), you can push the changes to flexilims by invoking
its Dataset.update_flexilims() method:
suite2p_dataset.update_flexilims(mode='overwrite')