Getting started with Flexiznam
===============================

Creating a project
------------------

To create a new project, log into `Flexilims `_
and click **Add New Stuff** > **New Project**. Once the project is created, add
its hexadecimal project ID to the config file at ``~/.flexiznam/config.yml``.

Adding mice
-----------

The best way to add a new mouse to Flexilims is to import its data directly from
MCMS. This can be done from the command line using ``flexiznam add-mouse``; see
``flexiznam add-mouse --help`` for documentation.

The command logs in to MCMS, looks up the mouse by name and downloads a
one-line CSV file with that mouse's information. It then reads the downloaded
file, loads it into a pandas DataFrame and deletes the file, so that it can be
re-downloaded later without naming conflicts.

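
The download, read and delete step can be sketched with the standard library
alone. This is an illustration rather than the code behind
``flexiznam add-mouse``: the helper ``read_and_delete_csv`` and the column names
used with it are hypothetical, and the standard ``csv`` module stands in for
pandas to keep the example self-contained::

    import csv
    import os


    def read_and_delete_csv(path):
        """Read a one-line CSV (header plus one data row) into a dict, then delete it.

        Hypothetical helper illustrating the workflow described above.
        """
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        # Remove the file so that the next download can reuse the same file name
        os.remove(path)
        if len(rows) != 1:
            raise ValueError(f"expected exactly one mouse, found {len(rows)} rows")
        return rows[0]

Deleting the file before checking its content means that a malformed download
never blocks the next attempt.
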
Mice can also be added manually through the Python API using :py:meth:`flexiznam.main.add_mouse`.

Acquisition and file transfer
-----------------------------

The workflow below works only if the acquisition pipeline behaves as expected,
that is, if at acquisition:

* files are named automatically
* ScanImage creates a directory for each recording
* file names follow the pattern ``MOUSE_SESSION_RECORDING_PROTOCOL``
* all paths are relative to ``DATAROOT``: ``////...``

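
As a minimal sketch of that naming convention, the hypothetical helper below
splits a file name into its four parts. It assumes that mouse names contain no
underscores, that sessions are written ``S`` followed by a date (``YYYYMMDD``)
and recordings ``R`` followed by a time (``HHMMSS``), as in the dataset names of
the example output further down this page::

    import re

    # Assumed pattern: MOUSE_SESSION_RECORDING_PROTOCOL, e.g. S20210513 / R181858
    NAME_PATTERN = re.compile(
        r"^(?P<mouse>[^_]+)_(?P<session>S\d{8})_(?P<recording>R\d{6})_(?P<protocol>.+)$"
    )


    def parse_recording_name(name):
        """Return a dict with the mouse, session, recording and protocol parts."""
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"{name!r} does not follow MOUSE_SESSION_RECORDING_PROTOCOL")
        return match.groupdict()

For example, the first dataset name in that output parses to mouse
``PZAH4.1c``, session ``S20210513``, recording ``R181858`` and protocol
``SphereCylinder00001``.
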

File transfer is not handled by Flexiznam: transfer all the data to CAMP first,
and Flexiznam will only check that the data are available. The path to the CAMP
folder containing the projects must be set in the config file.


The best way to transfer files from a Windows computer might be to use
`robocopy `_.
Mount the CAMP drive and run::

    robocopy /e /z

Here ``/e`` copies subdirectories recursively, including empty ones, and ``/z``
uses restartable mode (in case the connection is lost).

.. note::

   You can also consider:

   * ``/j`` to copy using unbuffered I/O (recommended for large files).
   * ``/copy:DAT`` to copy data, attributes and timestamps but not ownership or ACL.
   * ``/move`` to move instead of copying (files are deleted from the source after a successful copy).

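
If you script transfers from Python, note that robocopy expects the source and
destination folders before the options. The hypothetical helpers below build
the command with the flags discussed above and interpret its exit code:
robocopy returns a value below 8 when the copy succeeded (even if nothing needed
copying) and 8 or more when some files could not be copied::

    def robocopy_command(source, destination, *extra_flags):
        """Build a robocopy call: recursive (/e) and restartable (/z) by default."""
        return ["robocopy", str(source), str(destination), "/e", "/z", *extra_flags]


    def robocopy_succeeded(returncode):
        """Exit codes 0-7 report success; 8 and above report failures."""
        return returncode < 8


    # Usage on Windows, with placeholder paths (not run here):
    # import subprocess
    # result = subprocess.run(robocopy_command(r"D:\data\MOUSE", r"Z:\path\to\MOUSE", "/j"))
    # if not robocopy_succeeded(result.returncode):
    #     raise RuntimeError("robocopy failed")

Checking ``returncode < 8`` rather than ``returncode == 0`` matters because
robocopy returns 1 after a successful copy.
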

Syncing data
------------

The most efficient way to upload data to Flexilims is using a YAML file.
Briefly, the YAML file format is:

.. literalinclude:: ../../flexiznam/camp/yaml_format.yml
   :language: yaml

Fortunately, you do not need to type out all the datasets and paths by hand.
Instead, you can create a minimal YAML file and then use Flexiznam to fill in
the details. See the example below:

.. literalinclude:: ../../flexiznam/camp/minimal_example_acquisition_yaml.yml
   :language: yaml

You can also include notes or other optional attributes. This YAML file must be
saved in the session folder. The mouse folder must be named after the mouse, the
session folder after the session and the recording folder after the recording.

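
Before running Flexiznam, you may want to check this layout yourself. A minimal
sketch, assuming the acquisition YAML sits directly in the session folder; the
helper ``check_layout`` and its arguments are hypothetical and not part of
Flexiznam::

    from pathlib import Path


    def check_layout(yaml_path, mouse, session, recordings):
        """Return a list of problems with the MOUSE/SESSION/RECORDING folder layout."""
        session_dir = Path(yaml_path).parent
        problems = []
        if session_dir.name != session:
            problems.append(f"YAML is in {session_dir.name!r}, expected {session!r}")
        if session_dir.parent.name != mouse:
            problems.append(f"session folder is in {session_dir.parent.name!r}, expected {mouse!r}")
        for recording in recordings:
            # Each recording must have its own folder inside the session folder
            if not (session_dir / recording).is_dir():
                problems.append(f"missing recording folder {recording!r}")
        return problems

An empty list means the folders are named as expected.
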

Once all the data are on CAMP, the first step is to validate and autopopulate
the YAML::

    flexiznam process-yaml --source_yaml "path/to/acq_yaml.yml" --target_yaml "path/to/target_yaml.yml"

This will call the :py:meth:`flexiznam.camp.sync_data.parse_yaml` method
and create a local copy of the YAML called ``acq_yaml_autogenerated_full_file.yml``.
If a dataset cannot be located or loaded, the YAML file will contain a warning
starting with ``XXERRORXX``. A list of such errors will also be printed on the
console. For instance, here is an example output::

    Reading example_acquisition_yaml.yml
    Found some issues with the yaml:
    - Dataset: `ref_for_motion`
        Could not find dataset "ref_for_motion". Found "PZAH4.1c_S20210513_R181858_SphereCylinder00001, PZAH4.1c_S20210513_R182025_SphereCylinder00001, PZAH4.1c_S20210513_R182758_SphereCylinder00001" instead
    - Dataset: `overview_picture_02`
        Could not find dataset "overview_picture_02". Found "overview00001, overview00002" instead
    - Dataset: `harp_data_csv`
        Dataset not found. Path /Volumes/lab-znamenskiyp/home/shared/projects/3d_vision/Data/ParamLog/R193432_Retinotopy does not exist
    Fix manually these errors before uploading to flexilims
    Processed yaml saved to example_acquisition_yaml_autogenerated_full_file.yml


Before uploading, edit the processed YAML by hand to fix these errors. You can
call ``process-yaml`` on the fixed YAML repeatedly until no errors remain.
Finally, add the entries to Flexilims::

    flexiznam yaml-to-flexilims --source_yaml "path/to/processed_yaml.yml"

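
Before running ``yaml-to-flexilims``, you can also check that no error marker
is left in the processed file. A minimal sketch; the helper
``find_yaml_errors`` is hypothetical and simply searches the file as plain text
for ``XXERRORXX``::

    def find_yaml_errors(path, marker="XXERRORXX"):
        """Return (line number, line) pairs for every line containing the marker."""
        with open(path) as f:
            return [
                (number, line.rstrip("\n"))
                for number, line in enumerate(f, start=1)
                if marker in line
            ]


    # Example (placeholder path): only upload once this prints an empty list
    # print(find_yaml_errors("path/to/processed_yaml.yml"))
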

Querying the database
---------------------

:py:mod:`flexiznam.main` provides high-level functions to retrieve and update
entries on the database. The functions of :py:mod:`flexiznam.main` are also
directly available in the :py:mod:`flexiznam` namespace.

.. py:currentmodule:: flexiznam.main

First, create a Flexilims session by calling
:py:meth:`get_flexilims_session`. This returns a :py:class:`flexilims.Flexilims`
object holding your authentication credentials, which you can pass to other
methods. The simplest way is to provide just the project name and use the
authentication details stored in the config files::

    import flexiznam as flz

    project = "my_project"  # replace with your project name
    flz_session = flz.get_flexilims_session(project)

:py:meth:`get_entities` is the most generic method and will retrieve
any data type, filtered by name, id, origin, or arbitrary attribute. It returns
a :py:class:`pandas.DataFrame` by default.

:py:meth:`get_entity` has the same functionality but expects only
a single result and returns a :py:class:`pandas.Series`::

    session_name = "PZAH4.1c_S20210513"  # hypothetical: replace with an existing session name
    exp_session = flz.get_entity(
        datatype='session',
        name=session_name,
        flexilims_session=flz_session,
    )


Other useful methods include :py:meth:`get_children`, which returns
all children of a given entity, and :py:meth:`get_datasets`, which
returns a dictionary containing paths to all datasets of a given type in a given
session, for example::

    si_datasets = flz.get_datasets(
        exp_session['id'],
        recording_type='two_photon',
        dataset_type='scanimage',
        flexilims_session=flz_session,
    )


Adding processed datasets
-------------------------

New entries for pre-processed datasets can be added by calling the
:py:meth:`add_dataset` method. However, this is not recommended.

.. py:currentmodule:: flexiznam.schema.datasets

Instead, when the new processed dataset is created as a child of an existing
entity, such as an experimental session or recording, it is best to use the
static :py:meth:`Dataset.from_origin` method of the
:py:class:`Dataset` class, found in :py:mod:`flexiznam.schema`::

    from flexiznam.schema import Dataset

    conflicts = 'abort'  # or 'append', 'overwrite', 'skip' (see below)
    suite2p_dataset = Dataset.from_origin(
        project=project,
        origin_type='session',
        origin_id=exp_session['id'],
        dataset_type='suite2p_rois',
        conflicts=conflicts,
    )


This method automatically sets the Flexilims name and path attributes of the
new dataset, based on the path attribute of the parent entity given by
``origin_id``, and returns an instance of :py:class:`Dataset`. It also handles
conflicts, with the options ``append``, ``overwrite``, ``abort`` or ``skip``, if
a dataset of the given type is already associated with the parent entity.

.. note::

   In ``skip`` mode, :py:meth:`Dataset.from_origin` returns a :py:class:`Dataset`
   object corresponding to the existing entry if there is one, or to a new entry
   otherwise. You can use :py:meth:`Dataset.get_flexilims_entry`
   to check whether the entry already exists: it returns ``None`` if it does not.

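
To make the four conflict modes concrete, here is a toy model of the decision
they describe. It is an illustration written for this page, not Flexiznam's
implementation: the function ``resolve_conflict`` and its return values are
hypothetical::

    def resolve_conflict(existing, mode):
        """Toy model of the conflict modes of Dataset.from_origin.

        existing: the matching Flexilims entry, or None if there is none.
        Returns "create", "overwrite" or "reuse"; raises for "abort".
        """
        if mode not in {"append", "overwrite", "abort", "skip"}:
            raise ValueError(f"unknown conflict mode {mode!r}")
        if existing is None:
            return "create"  # no conflict: a new entry is created whatever the mode
        if mode == "abort":
            raise RuntimeError("a dataset of this type already exists for this entity")
        if mode == "append":
            return "create"  # keep the existing entry and add a new one
        if mode == "overwrite":
            return "overwrite"  # replace the existing entry
        return "reuse"  # skip: return the existing entry unchanged
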

.. warning::

   The output of :py:meth:`Dataset.from_origin` is an abstraction
   of the dataset you *would like to create*. The method itself does not update
   the database. To avoid leaving incomplete entries after a crash, push the
   dataset to Flexilims only once the pre-processing step has completed.

You can set any additional attributes using the ``extra_attributes`` property of
the ``Dataset`` object. When ready (i.e. once pre-processing is completed and the
output files have been saved), push the changes to Flexilims by calling the
:py:meth:`Dataset.update_flexilims` method of the ``Dataset`` object::

    suite2p_dataset.update_flexilims(mode='overwrite')
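
The advice in the warning above can be wrapped in a small helper that only
touches the database once processing has succeeded. A minimal sketch, assuming
``extra_attributes`` behaves like a dictionary; the helper
``run_then_register`` and the ``preprocess`` callable are hypothetical::

    def run_then_register(dataset, preprocess, **attributes):
        """Run preprocess(dataset), then record attributes and push to Flexilims.

        If preprocess raises, the exception propagates and update_flexilims is
        never called, so no incomplete entry is created.
        """
        preprocess(dataset)
        dataset.extra_attributes.update(attributes)
        dataset.update_flexilims(mode='overwrite')
        return dataset

For instance, ``run_then_register(suite2p_dataset, run_suite2p, n_planes=2)``
would push the dataset only after a hypothetical ``run_suite2p`` function has
finished without error; ``n_planes`` is likewise an illustrative attribute.
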