Hi-Classifier reuses the structure of [Microsplit](https://pypi.org/project/microsplit/). It is a command-line tool designed to identify, classify, and manage paired reads in BAM files derived from Hi‑C experiments. It follows the logic and structure of the [Parasplit](https://pypi.org/project/parasplit/) tool but is tailored for Micro-C data. The tool reads alignment files (SAM, BAM, or CRAM) using `pysam` and identifies soft-clipping and hard-clipping events.
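As an illustration of what detecting clipping events can look like, here is a minimal sketch that works on CIGAR tuples in the `(operation, length)` form exposed by pysam's `AlignedSegment.cigartuples`. The function names `clipping_summary` and `is_clipped` are hypothetical and are not part of Hi-Classifier's API; the operation codes 4 (soft clip) and 5 (hard clip) come from the SAM specification.

```python
# CIGAR operation codes from the SAM specification; pysam uses the same
# integers in AlignedSegment.cigartuples.
SOFT_CLIP = 4
HARD_CLIP = 5


def _scan_end(ops):
    """Sum the clipped bases found before the first non-clip operation."""
    soft = hard = 0
    for op, length in ops:
        if op == SOFT_CLIP:
            soft += length
        elif op == HARD_CLIP:
            hard += length
        else:
            break
    return soft, hard


def clipping_summary(cigartuples):
    """Return the number of soft- and hard-clipped bases at each read end.

    Hypothetical helper: `cigartuples` is a list of (operation, length)
    pairs, or None/empty for an unmapped read.
    """
    summary = {"soft_left": 0, "hard_left": 0, "soft_right": 0, "hard_right": 0}
    if not cigartuples:
        return summary
    summary["soft_left"], summary["hard_left"] = _scan_end(cigartuples)
    summary["soft_right"], summary["hard_right"] = _scan_end(reversed(cigartuples))
    return summary


def is_clipped(cigartuples):
    """True if the alignment carries any soft or hard clipping."""
    return any(clipping_summary(cigartuples).values())
```

With pysam installed, the same helper could be applied to each record of a `pysam.AlignmentFile` by passing `read.cigartuples`.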
---
## Why another Hi-C parser?
The conventional valid/invalid classification ignores more than 30% of reads.
Some of the ignored categories (dangling, self-circle, uncut sites) fluctuate with digestion and compaction.
Hi-Classifier is a fast, **site-aware** classifier able to:
* label every pair (cis/trans-up/down, dangling, self-circle, re-joined, other)
* count categories **per restriction site**
* split a BAM into category-specific BAMs for downstream QC
* stream-process paired BAMs (R1/R2), with memory use that can stay ≤8 GB RAM
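The idea of counting categories per restriction site can be sketched as follows. This is an illustration under stated assumptions, not Hi-Classifier's implementation: the helper names are hypothetical, the motif `GATC` is the DpnII recognition sequence, and assigning each read to its nearest site is a simplifying choice made here.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

DPNII_MOTIF = "GATC"  # DpnII recognition sequence


def find_sites(sequence, motif=DPNII_MOTIF):
    """Return the sorted 0-based start positions of every motif occurrence."""
    sequence = sequence.upper()
    sites = []
    pos = sequence.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = sequence.find(motif, pos + 1)
    return sites


def nearest_site(sites, position):
    """Return the site closest to `position`; ties go to the upstream site."""
    if not sites:
        raise ValueError("no restriction site available")
    idx = bisect_left(sites, position)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sites[max(idx - 1, 0): idx + 1]
    return min(candidates, key=lambda site: abs(site - position))


def count_per_site(sites, labelled_positions):
    """Build a {site: Counter(category -> count)} table.

    `labelled_positions` is an iterable of (position, category) pairs;
    the category labels are supplied by the caller (assumption).
    """
    counts = defaultdict(Counter)
    for position, category in labelled_positions:
        counts[nearest_site(sites, position)][category] += 1
    return counts


sites = find_sites("AAGATCTTTTGATCAA")
table = count_per_site(
    sites, [(3, "dangling"), (9, "self-circle"), (1, "dangling")]
)
```

The sorted site list plus binary search keeps each lookup logarithmic in the number of sites, which matters when a genome contains many of them.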
---
## Features
- **Parallel Processing**: Hi-Classifier uses parallel processing to improve performance.
- **Error Margin Handling**: Adds a fixed number of base pairs to new fragments to account for potential over-mapping by Bowtie2, ensuring more accurate downstream analysis.
- **Output Paired Reads**: Outputs both end-to-end aligned pairs and newly generated fragment pairs.
## Installation
Hi-Classifier is available on PyPI and can be installed using pip:
```bash
pip install hi-classifier
```
## Quick start
```bash
hi-classifier \
  -1 sample_R1.bam -2 sample_R2.bam \
  -f genome.fa --enzyme dpnII \
  -o out/prefix --num_threads 8
```
## Usage
Before using Hi-Classifier, you need to align the reads with a mapper to obtain the input BAM files. The Quick start command above shows how to run Hi-Classifier from the command line.

Outputs:
```
prefix_counts.tsv # matrix [site, class]
prefix_classified.bam # BAM with CAT tag (optional split per class)