More about the cpp version

Dependencies on the system

The cpp binary must be run on a 64-bit Linux system. The latest version is preferred, to avoid version incompatibilities. Mac users do not need to worry about this, though they still need to install 'samtools' (see the next point).
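
The requirement above can be checked with a small script before downloading. This is a minimal sketch, not part of BreakDancer: the function name binary_can_run is hypothetical, and it assumes the distributed binary is an x86_64 Linux build (the text only says "64-bit linux"), so adjust the accepted machine names if that assumption is wrong.

```python
import platform


def binary_can_run(system=None, machine=None):
    """Return True if the prebuilt breakdancer_max binary should run here.

    Assumption: the binary is an x86_64 Linux build. Pass system/machine
    explicitly to test other platforms; by default the current host is used.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    return system == "Linux" and machine.lower() in {"x86_64", "amd64"}


if __name__ == "__main__":
    if binary_can_run():
        print("Use the prebuilt breakdancer_max binary.")
    else:
        # e.g. Mac users: build from source, which requires samtools.
        print("Compile from source (samtools required).")
```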

Dependencies on other programs

If you run the prebuilt binary 'breakdancer_max' in the 'cpp' directory directly, you do not need to download samtools.

Otherwise, if you want to compile the cpp source code and build the binary yourself, you need to have 'samtools' installed on your system. Click 'samtools' in the left bar underneath the links, then download and install samtools. Next, open 'Makefile' in the 'cpp' directory and change the include path '-I/gsc/pkg/bio/samtools/samtools-0.1.6/' so that it points to the directory where you installed samtools. Finally, run 'make' again. This builds your own binary, which overwrites the downloaded one.
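
The Makefile edit described above can also be scripted. This is a minimal sketch with hypothetical function names (retarget_samtools, patch_makefile); it assumes the Makefile contains the include flag exactly as quoted in the text and only changes that '-I' flag, so check the Makefile by hand for any other references to the old samtools location.

```python
from pathlib import Path

# Include flag as quoted in the text above.
OLD_INCLUDE = "-I/gsc/pkg/bio/samtools/samtools-0.1.6/"


def retarget_samtools(makefile_text, samtools_dir):
    """Return the Makefile text with the samtools -I flag pointing at samtools_dir."""
    if OLD_INCLUDE not in makefile_text:
        raise ValueError("original samtools include flag not found")
    new_include = "-I" + samtools_dir.rstrip("/") + "/"
    return makefile_text.replace(OLD_INCLUDE, new_include)


def patch_makefile(path, samtools_dir):
    """Rewrite the Makefile at `path` in place (hypothetical helper)."""
    p = Path(path)
    p.write_text(retarget_samtools(p.read_text(), samtools_dir))
```

For example, patch_makefile("cpp/Makefile", "/opt/samtools") followed by running 'make' in the 'cpp' directory, where /opt/samtools is a placeholder for your own install location.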

Differences from the Perl version

Some functions of the Perl version have not yet been implemented in the cpp version:

  • The cpp version does not include bam2cfg yet. Users need to run bam2cfg from the Perl version first to generate the analysis configuration file.
  • In BreakDancerMax, the cpp version does not support the following options:
    • -e learn parameters
    • -p prior probability of SV
    • -f use Fisher's method to combine p values
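
When moving an existing Perl BreakDancerMax command line to the cpp binary, the options listed above can be flagged automatically. This is a minimal sketch with a hypothetical helper name; it assumes each option is passed as a separate argument (bundled forms such as "-ef" or "-p0.01" are not detected).

```python
# Perl BreakDancerMax options that, per the list above, the cpp version
# does not support yet.
UNSUPPORTED_IN_CPP = {
    "-e": "learn parameters",
    "-p": "prior probability of SV",
    "-f": "use Fisher's method to combine p values",
}


def unsupported_options(argv):
    """Return the unsupported flags found in argv, in order of appearance."""
    return [arg for arg in argv if arg in UNSUPPORTED_IN_CPP]


if __name__ == "__main__":
    for flag in unsupported_options(["-q", "35", "-f", "analysis.cfg"]):
        print(f"{flag} ({UNSUPPORTED_IN_CPP[flag]}) is not available in the cpp version")
```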

Good to know

The 20100719 version adds one feature: it computes a copy number for each BAM file or library per SV call. The estimate is the observed number of quality-filtered reads between the BreakDancer start and end positions, divided by the expected number of reads in the same interval. The expected number is based on the whole-genome read count, under the assumption that reads are randomly distributed.

This strategy works well for unique regions but not for segmentally duplicated regions, where mappability is poor. Good solutions for such regions are, as far as we know, quite rare, though following the mrsFast work and Hydra-SV may help where they apply. Under BreakDancer's default mapping quality cutoff (-q 35), the copy number estimates tend to underestimate the real copy number in segmentally duplicated regions, because reads there tend to receive low mapping quality and are removed by the filter. Filtering such calls with the segmental duplication annotation from UCSC may be a good post-processing step.
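
The estimate and the suggested post-filter can be sketched as follows. This illustrates the formula as stated in the text, not BreakDancer's actual code: the function names are hypothetical, the interval is treated as half-open [start, end), the genome size is an input the text does not specify, and the segmental duplication intervals stand in for the UCSC annotation, assumed already loaded as (start, end) pairs on the same chromosome as the call.

```python
def expected_reads(total_reads, genome_size, start, end):
    """Expected read count in [start, end) if reads are randomly (uniformly)
    distributed over a genome of genome_size bases."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    return total_reads * (end - start) / genome_size


def copy_number_ratio(observed_reads, total_reads, genome_size, start, end):
    """Observed quality-filtered reads divided by the expected count."""
    return observed_reads / expected_reads(total_reads, genome_size, start, end)


def overlaps_segdup(start, end, segdups):
    """True if [start, end) overlaps any half-open (s, e) segdup interval."""
    return any(s < end and start < e for s, e in segdups)
```

With 30,000,000 reads on a 3,000,000,000-base genome, a 10,000-base interval expects 100 reads, so 150 observed reads give a ratio of 1.5. To mirror the per-BAM/per-library output, call copy_number_ratio once per library with that library's own counts, and set aside calls for which overlaps_segdup returns True.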