The Delattre team studies Mesorhabditis worms, some species of which have atypical reproduction mechanisms.
In a previous paper (Males as somatic investment in a parthenogenetic nematode, DOI: 10.1126/science.aau0099), we characterized contigs of a de novo genome assembly of M. belari as
From raw sequencing data of male and female individuals, we want to identify \(k\)-mers corresponding to:
A Nextflow pipeline to analyse the \(k\)-mer content of fastq files
preprocess the fastq files
Important for the clustering analysis:
Important for the \(k\)-mer counting
fastkmers -k 12 file.fastq > file.csv
Run a sliding window of size \(12\) with a step of \(1\) along the reads, counting all occurrences of each \(k\)-mer
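The sliding-window count can be sketched in a few lines of Python. This is only an illustration of the idea, not the fastkmers implementation; the function name count_kmers and the toy read are assumptions made for the example.

```python
from collections import Counter


def count_kmers(read: str, k: int = 12) -> Counter:
    """Count every k-mer seen by a window of size k moved one base at a time."""
    counts = Counter()
    # A read of length n yields n - k + 1 windows; shorter reads yield none.
    for start in range(len(read) - k + 1):
        counts[read[start:start + k]] += 1
    return counts


if __name__ == "__main__":
    # Toy read with k = 4 so that the output stays small.
    toy = count_kmers("ACGTACGTN", k=4)
    for kmer, n in sorted(toy.items()):
        print(kmer, n)
```

With a step of 1, consecutive windows overlap by \(k - 1\) bases, so a read of length \(n\) contributes \(n - k + 1\) \(k\)-mers.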
We have the letters: \(A,C,T,G\) and \(N\)
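As a worked bound, assuming every letter can appear at every position, the number of distinct \(12\)-mers is at most the alphabet size raised to the power \(12\). The second value is the bound when \(N\) is excluded:

\[ 5^{12} = 244\,140\,625, \qquad 4^{12} = 16\,777\,216 \]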
We split the fastq files into \(\sim\) \(1400\) subfiles of \(10^6\) reads.
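The splitting step can be illustrated with a minimal Python sketch. It assumes each read is stored as a standard four-line fastq record; the function name split_fastq is hypothetical and this is not necessarily how the pipeline performs the split.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq(lines: Iterable[str], reads_per_chunk: int = 10**6) -> Iterator[List[str]]:
    """Yield successive chunks holding at most reads_per_chunk fastq records."""
    it = iter(lines)
    while True:
        # Assumption: one fastq record spans exactly four lines.
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Five toy reads split into chunks of two reads.
    records = []
    for i in range(5):
        records += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]
    for n, chunk in enumerate(split_fastq(records, reads_per_chunk=2)):
        print(n, len(chunk) // 4, "reads")
```

Each chunk can then be written to its own subfile and counted independently, which is what makes the later merge step necessary.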
\(\sim\) \(1400\) splits of \(10^6\) reads \(\rightarrow\) \(1400\) csv files
mergekmer: a small Rust programme that builds a suffix tree of the \(k\)-mers
merge fastkmers output
Usage: mergekmer [OPTIONS] --output <OUTPUT>
Options:
-c, --csv <CSV>... list of csv files
-o, --output <OUTPUT> merged csv file
-c, --collate collate csv file
-h, --help Print help
-V, --version Print version
Each leaf of the tree contains the count of the corresponding \(k\)-mer
The tree traversal is easy to compute with a recursive function
We can merge the counts of a given sex for each species
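The idea of storing counts at the leaves and walking the tree recursively can be sketched as follows. This is a Python illustration using nested dictionaries as a simple letter-by-letter tree; the actual Rust data structure in mergekmer may differ, and the names insert and traverse are assumptions. The sketch also assumes that all \(k\)-mers have the same length.

```python
def insert(root: dict, kmer: str, count: int) -> None:
    """Add count to the leaf reached by following the letters of kmer."""
    node = root
    for letter in kmer[:-1]:
        # Internal nodes map one letter to a child dictionary.
        node = node.setdefault(letter, {})
    # The last letter indexes the leaf, which stores an integer count.
    node[kmer[-1]] = node.get(kmer[-1], 0) + count


def traverse(node: dict, prefix: str = ""):
    """Recursively yield (kmer, count) pairs in alphabetical order."""
    for letter in sorted(node):
        child = node[letter]
        if isinstance(child, dict):
            yield from traverse(child, prefix + letter)
        else:
            yield prefix + letter, child


if __name__ == "__main__":
    # Two toy count tables standing in for two csv files of the same sex.
    split_a = {"ACG": 2, "CGT": 1}
    split_b = {"ACG": 3, "GTA": 4}
    root = {}
    for table in (split_a, split_b):
        for kmer, count in table.items():
            insert(root, kmer, count)
    for kmer, count in traverse(root):
        print(kmer, count)
```

Inserting a \(k\)-mer that is already present only updates its leaf, so merging many count tables amounts to repeated insertion followed by a single traversal.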
In the --collate version, each leaf contains a list of the counts of the \(k\)-mer in the females or males of a species
We can fuse the counts of the males and females for each species
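Collating can be pictured as building one row per \(k\)-mer with one column per sample. The Python sketch below uses a flat dictionary in place of the tree leaves to keep the example short; the function name collate, the sample labels, and the choice of filling missing \(k\)-mers with \(0\) are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def collate(samples: Dict[str, Dict[str, int]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the column names and, for each k-mer, one count per sample."""
    names = list(samples)
    table: Dict[str, List[int]] = {}
    for column, name in enumerate(names):
        for kmer, count in samples[name].items():
            # Assumption: a k-mer absent from a sample gets a count of 0.
            row = table.setdefault(kmer, [0] * len(names))
            row[column] += count
    return names, table


if __name__ == "__main__":
    names, table = collate({
        "female": {"ACG": 5, "CGT": 1},
        "male": {"ACG": 4, "GTA": 7},
    })
    print("kmer", *names)
    for kmer in sorted(table):
        print(kmer, *table[kmer])
```

Keeping the female and male counts side by side for each \(k\)-mer is what allows the two sexes to be compared directly in the downstream analysis.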
We have the following possible models
data
XY model
XO model
OO model
Bayesian information criterion (BIC)
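For reference, the standard definition of the criterion is given below. The symbols are chosen here for illustration: \(p\) is the number of free parameters of a model, \(n\) the number of observations and \(\hat{L}\) the maximized likelihood. How \(p\) and \(n\) are set for the models in this analysis is not stated here.

\[ \mathrm{BIC} = p \ln(n) - 2 \ln(\hat{L}) \]

Among candidate models fitted to the same data, the one with the lowest BIC is preferred.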
Loglikelihood
laurent.modolo@ens-lyon.fr