@@ -33,8 +33,7 @@ In this document, we are going to first classify data according to their size an
\item{\bf experimental data}: small to huge files
\end{itemize}
The {\bf experimental data} category can seem quite broad.
In the data {\bf backup} community, we often further divide {\bf experimental data} into:
\begin{itemize}
...
...
@@ -47,11 +46,11 @@ The {\bf hot} to {\bf cold} categorization is closely related to the money and e
For all of the above categories, we need to distinguish between {\bf backed-up data} and {\bf archived data}.
The data you are working on can have zero, one or multiple {\bf backups}. Increasing the number of {\bf backups} increases the resilience of your data, but also the physical cost of storing it and the management time spent keeping all the copies up to date.
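
As a rough illustration of this trade-off, consider a deliberately simplified model. This is an assumption, not a measured property of any of the services described below: each copy of a dataset of size $S$ is lost independently with probability $p$ over a given period. With $n$ {\bf backups} there are $n+1$ copies, so that
\[
  P(\mathrm{data\ lost}) = p^{\,n+1}, \qquad \mathrm{storage\ used} = (n+1)\,S .
\]
For instance, with a hypothetical $p = 0.05$, keeping two {\bf backups} lowers the risk of losing the data from $0.05$ (a single copy) to $0.05^3 = 1.25 \times 10^{-4}$, at the price of three times the storage. The independence assumption only holds if the copies do not share a point of failure, such as the same building or server room.
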
Data that will not change in the future can be {\bf archived}. In this case, the data need to be deposited in an archive facility along with the correct {\bf metadata}, where they will get a unique identifier and stay accessible {\it forever} (which requires a potentially large number of multi-site {\bf backups}).
The \href{https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-data-management/data-management_en.htm}{H2020 recommendations to make research data findable, accessible, interoperable and reusable ({\bf FAIR})} encourage the use of data management plans to structure these metadata.
Data Management Plans ({\bf DMP}s) are a key element of good data management. A {\bf DMP} describes the data management life cycle for the data to be collected, processed and/or generated. As part of making research data {\bf FAIR}, a {\bf DMP} should include information on:
\begin{itemize}
\item the handling of research data during and after the end of the project
\item what data will be collected, processed and/or generated
...
...
@@ -62,10 +61,11 @@ Data Management Plans or {\bf DMP}s) are a key element of good data management.
The {\bf DMP} needs to be updated over the course of the project whenever significant changes arise, such as (but not limited to) new data, changes in consortium policies, or changes in consortium composition and external factors.
We will now go over the solutions available to you to store, {\bf back up}, and {\bf archive} your {\bf documents}, {\bf codes} and {\bf experimental data}.
\section{Documents}
There are several solutions to {\bf back up} and share your {\bf documents}:
\subsection{Automatic backup for workstations}
...
...
@@ -80,8 +80,8 @@ In addition to providing you with a {\bf backup} of these folders, you can also
You also have access to a short history of recent modifications, from which you can restore a given file to a previous version.
\begin{itemize}
\item The CNRS provides a synchronization service called \href{https://biowiki.biologie.ens-lyon.fr/doku.php?id=mycore-cnrs}{MyCore} ({\bf 100 GB}), which should be accessible to all members of the LBMC.
\item The EU provides a synchronization service called \href{https://b2drop.eudat.eu}{b2drop} ({\bf 20 GB}), which should be accessible to all members of the LBMC.
\end{itemize}
For both services, the stored data can be considered heavily {\bf backed up} (the data should not be lost on the providers' side).
...
...
@@ -99,8 +99,8 @@ Your \href{https://biowiki.biologie.ens-lyon.fr/doku.php?id=biodata}{BIODATA} sp
The \href{https://biowiki.biologie.ens-lyon.fr/doku.php?id=biodata}{BIODATA} storage space is managed by the ENS DSI (IT department) and allows you to store raw data directly from the scientific platforms. Each team has access to two folders:
\begin{itemize}
\item \texttt{nameofteam/}: ({\bf 2 TB} for the LBMC), with daily snapshots on another server in the SLING room
\item \texttt{nameofteam2/}: ({\bf 15 TB} for the LBMC), {\bf backed up} monthly by Stéphane
\end{itemize}
Your team can buy additional storage to add to \href{https://biowiki.biologie.ens-lyon.fr/doku.php?id=biodata}{BIODATA}.
...
...
@@ -115,7 +115,7 @@ Your documentation is also a valuable set of files.
All LBMC members have access to the \href{http://www.ens-lyon.fr/LBMC/intranet/services-communs/pole-bioinformatique/ressources/gitlab}{Gitbio} server to back up and share their {\bf codes}.
Using \texttt{git} means that a copy of these files exists at least on your computer (and on the computer of every collaborator in the project), on the gitbio server, and on the {\bf backup} of the gitbio server (updated every 24h). The details of {\bf code} and documentation management within your project are developed in the \texttt{src} and \texttt{doc} paragraphs of Section 1 of the \href{https://lbmc.gitbiopages.ens-lyon.fr/hub/good_practices/good_practices.html}{guide of good practices}.
When using a version control system (see Section 3 of the \href{https://lbmc.gitbiopages.ens-lyon.fr/hub/good_practices/good_practices.html}{guide of good practices}), making regular pushes to the LBMC gitbio server will not only save you time when dealing with different versions of your project, but also keep a copy of your {\bf code} on the server.
...
...
@@ -156,8 +156,8 @@ Moreover, your team can buy more storage if needed.
The \href{https://biowiki.biologie.ens-lyon.fr/doku.php?id=biodata}{BIODATA} storage space is managed by the ENS DSI (IT department) and allows you to store raw data directly from the scientific platforms. Each team has access to two folders:
\begin{itemize}
\item \texttt{nameofteam/}: ({\bf 2 TB} for the LBMC), with daily snapshots on another server in the SLING room
\item \texttt{nameofteam2/}: ({\bf 15 TB} for the LBMC), {\bf backed up} monthly by Stéphane