Commit 7f994cae authored by Ghislain Durif

improve DVC tutorial

parent eec33323
@@ -115,7 +115,7 @@ Note: `-d` option is to set the new remote as the default one.
dvc push
```
## Data version management
1. Current data version
```bash
@@ -130,7 +130,16 @@ outs:
path: datafile.dat
```
Note: we can verify that the hash stored in the `data/datafile.dat.dvc` file corresponds to the actual `data/datafile.dat` file:
```bash
md5sum data/datafile.dat
```
```
a0c027223a771d1bb1519e5e5aaaf82c data/datafile.dat
```
2. Switch to a previous git commit:
```bash
git log --all --graph --oneline --decorate
```
@@ -149,19 +158,116 @@ git log --all --graph --oneline --decorate
git checkout c8ad6a0
```
3. Verify version of data:
```bash
cat data/datafile.dat.dvc
```
```
outs:
- md5: 8c0a82ed58e6152f9b134ba8d272dd42
size: 102400000
hash: md5
path: datafile.dat
```
Note: at this point, the data file version does not match: the hash stored in the `data/datafile.dat.dvc` file differs from the hash of the actual `data/datafile.dat` file:
```bash
md5sum data/datafile.dat
```
```
a0c027223a771d1bb1519e5e5aaaf82c data/datafile.dat
```
4. Switch to the corresponding version of the data file:
```bash
dvc checkout
```
```
M data/datafile.dat
```
5. Verify the version of the data file:
```bash
md5sum data/datafile.dat
```
```
8c0a82ed58e6152f9b134ba8d272dd42 data/datafile.dat
```
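The manual comparison above (reading the hash in the `.dvc` file, then running `md5sum` on the data file) can be scripted. The following is a minimal sketch, assuming GNU `md5sum` and `awk` are available and that the `.dvc` file has the single-output layout shown above. The function name `dvc_hash_matches` is hypothetical and is not part of DVC.

```bash
# Hypothetical helper (not part of DVC): succeed if the md5 recorded in a
# .dvc file equals the md5 of the data file sitting next to it.
dvc_hash_matches() {
    dvc_file="$1"                  # e.g. data/datafile.dat.dvc
    data_file="${dvc_file%.dvc}"   # e.g. data/datafile.dat
    # Take the value on the "md5:" line (also matches the "- md5: ..." form);
    # the "hash: md5" line is ignored because its key is "hash:".
    recorded=$(awk '$1 == "md5:" || $2 == "md5:" {print $NF; exit}' "$dvc_file")
    # md5sum prints "<hash>  <path>"; keep only the hash.
    actual=$(md5sum "$data_file" | awk '{print $1}')
    [ "$recorded" = "$actual" ]
}
```

For instance, `dvc_hash_matches data/datafile.dat.dvc || dvc checkout` would run `dvc checkout` only when the working copy and the recorded hash disagree.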
## SSH remote
More details here: https://dvc.org/doc/user-guide/data-management/remote-storage/ssh
1. Requirements:
```bash
pip install dvc-ssh
```
2. Add an SSH remote to the DVC repository:
```bash
dvc remote add psmn_ssh ssh://gdurif@psmn-local/home/gdurif/work/dvc-testing-remote
```
3. List DVC remotes:
```bash
dvc remote list
```
```
local /home/drg/work/dev/tmp/test_data_version_control/dvc-testing-remote
psmn_ssh ssh://gdurif@psmn-local/home/gdurif/work/dvc-testing-remote
```
4. Record new remote in git:
```bash
git diff
```
```
diff --git a/.dvc/config b/.dvc/config
index 60e9772..a1cadea 100644
--- a/.dvc/config
+++ b/.dvc/config
@@ -3,3 +3,5 @@
     remote = local
 ['remote "local"']
     url = ../../dvc-testing-remote
+['remote "psmn_ssh"']
+    url = ssh://gdurif@psmn-local/home/gdurif/work/dvc-testing-remote
```
```bash
git add .dvc/config
git commit -m "new DVC remote"
```
5. Push to a given remote:
```bash
dvc push -r psmn_ssh
```
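Remote names are easy to mistype, so it can help to check that a name is declared in `.dvc/config` before passing it to `dvc push -r`. The following is a minimal sketch that relies only on the section-header format visible in the `git diff` output above. The function name `dvc_remote_declared` is hypothetical and is not part of DVC.

```bash
# Hypothetical helper (not part of DVC): succeed if a remote with the given
# name is declared in a DVC config file (default: .dvc/config).
dvc_remote_declared() {
    name="$1"
    config="${2:-.dvc/config}"
    # Section headers look like: ['remote "psmn_ssh"']
    # -F treats the pattern as a fixed string, -q suppresses output.
    grep -qF "['remote \"$name\"']" "$config"
}
```

A possible use is `dvc_remote_declared psmn_ssh && dvc push -r psmn_ssh`, which skips the push when the name is not found in the config file.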
## Tips
Disable [analytics](https://dvc.org/doc/user-guide/analytics) reporting (locally in a repository):
```bash
dvc config core.analytics false
```
Note: add the `--global` option to apply the setting globally, as with `git config`.