quick start guide
=================

installation
------------

#. clone the GitHub repo and change into the source folder::

      git clone git@github.com:bihealth/swibrid.git
      cd swibrid

#. create a conda environment::

      conda env create -f swibrid_env.yaml
      conda activate swibrid_env

#. install ``swibrid``::

      pip install .

Alternatively, use the Docker image::

   docker run -v $(pwd):/home/swibriduser -u $(id -u):$(id -g) ghcr.io/bihealth/swibrid:latest -h

testing
-------

For a simple and (relatively) quick end-to-end test, run::

   swibrid test

This will create two samples with about 1000 synthetic reads in ``input`` and run the pipeline on this data, using a reduced hg38 genome in ``index`` that contains only the switch region (chr14:105000000-106000000). It should take about 5 minutes and produce plots in ``output/read_plots`` and a table of summary statistics in ``output/summary``.

running your own data
---------------------

This assumes you have a ``fastq.gz`` file with sequencing output from MinION or PacBio. If samples were multiplexed (e.g., with ONT barcodes), you should set up a sample sheet like so::

   BC01 sample1
   BC02 sample2
   ...

and a file with barcode and primer sequences like so::

   >BC01
   AAGAAAGTTGTCGGTGTCTTTGTG
   >BC02
   TCGATTCCGTTTGTAGTCGTCTGT
   ...
   >primer_mu_fw
   CACCCTTGAAAGTAGCCCATGCCTTCC
   >primer_alpha_rv
   CTCAGTCCAACACCCACCACTCC
   >primer_gamma_rv
   CTGCCTCCCAGTGTCCTGCATTACTTCTG

If you don't have multiplexed data, you should still run the demultiplexing step, because it also performs primer detection. Simply set up a dummy sample sheet and the file with primers; all reads will then end up in ``undetermined.fastq.gz``.

#. set up Snakemake and config files in a new directory::

      mkdir results
      cd results
      swibrid setup

#. provide the genome (plus LAST index) and annotation files in ``index``::

      mkdir index
      cd index
      # get hg38 genome from UCSC (or elsewhere)
      wget http://hgdownload.soe.ucsc.edu/goldenpath/hg38/bigZips/hg38.fa.gz
      gunzip hg38.fa.gz
      # create LAST index
      lastdb hg38db hg38.fa
      # download gene annotation from GENCODE (or elsewhere)
      wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/gencode.v33.annotation.gtf.gz
      gunzip gencode.v33.annotation.gtf.gz
      swibrid get_annotation -i gencode.v33.annotation.gtf -o gencode.v33.annotation.exon.gene_shorted.bed

#. create a bed file with switch region definitions::

      chr14 105588700 105591700 SA2
      chr14 105603000 105603500 SE
      chr14 105626500 105629000 SG4
      chr14 105645400 105647900 SG2
      chr14 105708900 105712900 SA1
      chr14 105743700 105747700 SG1
      chr14 105772100 105775600 SG3
      chr14 105856100 105861100 SM

#. edit (at least) the following entries in the ``config.yaml`` file (make sure that all sample names in ``SAMPLES`` appear in the sample sheet)::

      INPUT: "path/to/input.fastq.gz"
      SAMPLE_SHEET: "path/to/sample_sheet.csv"
      BARCODES_PRIMERS: "path/to/barcodes_primers.fa"
      SAMPLES: ["sample1","sample2", ...]
      SWITCH_ANNOTATION: "path/to/switch_regions.bed"

#. run the pipeline::

      swibrid run -np       # for a dry-run
      swibrid run           # for an actual run
      swibrid run --slurm   # submit to Slurm
      swibrid run --unlock  # unlock Snakemake before restarting an interrupted/killed instance
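
Since every name in ``SAMPLES`` has to appear in the sample sheet, it can help to check this before starting a run. The following is a minimal sketch and not part of ``swibrid``: the function names are hypothetical, and it assumes the sample sheet has two columns (barcode, sample name) separated by whitespace or a comma, as in the example above::

   import re
   from pathlib import Path


   def read_sample_sheet(path):
       """Return the sample names from a two-column sample sheet.

       Assumption: each non-empty line holds a barcode and a sample name,
       separated by whitespace or a comma (e.g. "BC01 sample1").
       """
       names = []
       for line in Path(path).read_text().splitlines():
           line = line.strip()
           if not line or line.startswith("#"):
               continue  # skip blank lines and comment lines
           fields = re.split(r"[,\s]+", line)
           if len(fields) >= 2:
               names.append(fields[1])
       return names


   def missing_samples(samples, sheet_path):
       """Return the names in `samples` that are absent from the sample sheet."""
       known = set(read_sample_sheet(sheet_path))
       return [name for name in samples if name not in known]

Calling ``missing_samples(["sample1", "sample2"], "path/to/sample_sheet.csv")`` with the names from ``SAMPLES`` returns an empty list if the config and the sample sheet agree, and otherwise the names that still need to be added to the sheet.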