* Fri Aug 26 2022 Jan Engelhardt <jengelh@inai.de>
- Update to release 1.16
* New plugin `bcftools +variant-distance` to annotate records
with distance to the nearest variant.
* The -i/-e filtering expression gained support for multiple
filters, e.g. `-i FILTER="A;B"`.
- Resolve "file packaged twice" rpmlint warnings
* Thu Apr 21 2022 Ferdinand Thiessen <rpm@fthiessen.de>
- Update to version 1.15.1
* bcftools annotate: New -H, --header-line convenience option to
pass a header line on command line
* bcftools csq: A list of consequence types supported by bcftools
csq has been added to the manual page.
* bcftools +fill-tags:
* Extend generalized functions so that FORMAT tags can be filled
as well
* Allow multiple custom functions in a single run.
* bcftools norm:
* Fix an assertion failure triggered when a faulty VCF file with
a '-' character in the REF allele was used with bcftools
norm --atomize.
* Fix the loss of phasing in half-missing genotypes in variant
atomization
* bcftools roh: Fix a bug that could result in an endless loop or
incorrect AF estimate when missing genotypes are present and
the --estimate-AF - option was used
* bcftools +split-vep: VEP fields with characters disallowed in
VCF tag names by the specification couldn't be queried.
- Update to version 1.15
* New bcftools head subcommand for conveniently displaying the
headers of a VCF or BCF file.
* The -T, --targets-file option had the following bug originating
in HTSlib code
* bcftools annotate:
* In addition to --rename-annots, which requires a file with
name mappings, it is now possible to do the same on the
command line -c NEW_TAG:=OLD_TAG
* Add new option --min-overlap which allows to specify the
minimum required overlap of intersecting regions
* Allow to transfer ALT from VCF with or without replacement
* bcftools convert:
* Revamp of --gensample, --hapsample and --haplegendsample
family of options
* New --3N6 option to output/input the new version of the .gen
file format
* Deprecate the --chrom option in favor of --3N6.
* The CHROM:POS_REF_ALT IDs which are used to detect strand
swaps are required and must appear either in the "SNP ID"
column or the "rsID" column.
* bcftools csq: Allow GFF files with phase column unset
* bcftools filter: New --mask, --mask-file and --mask-overlap
options to soft filter variants in regions
* bcftools +fixref
* The -m id option now works also for non-dbSNP ids
* New -m flip-all mode for flipping all sites
* bcftools isec: Prevent segfault on sites filtered with -i/-e
in all files
* bcftools mpileup: More flexible read filtering using the options
* bcftools query: Make the --samples and --samples-file options
work also in the --list-samples mode.
* bcftools +setGT: Fix a bug in -t q -e EXPR logic applied on
FORMAT fields, sites with all samples failing the expression
EXPR were incorrectly skipped.
* bcftools sort: make use of the TMPDIR environment variable
when defined
* bcftools +trio-dnm2: The --use-NAIVE mode now also adds the
de novo allele in FORMAT/VA
- Update to version 1.14
* New --regions-overlap and --targets-overlap options which
address a long-standing design problem with subsetting VCF
files by region.
* The --output-type option can be used to override the default
compression level
* bcftools annotate:
* when --set-id and --remove are combined, --set-id cannot use
tags deleted by --remove.
* while non-symbolic variation are uniquely identified by
POS,REF,ALT, symbolic alleles starting at the same position
were indistinguishable.
* add a new . modifier to control whether missing values should
be carried over from a tab-delimited file or not.
* bcftools +check-ploidy: by default missing genotypes are not
used when determining ploidy.
* bcftools concat: new --ligate-force and --ligate-warn options
for finer control of -l, --ligate behavior in imperfect overlaps.
* bcftools consensus: Apply mask even when the VCF has no notion
about the chromosome.
* bcftools +contrast: support for chunking within map/reduce
framework allowing to collect NASSOC counts even for empty
case/control sample sets
* bcftools csq:
* bug fix, compound indels were not recognised in some cases
* compound variants were incorrectly marked as 'inframe' even when
stop codon would occur before the frame was restored
* bug fix, FORMAT/BCSQ bitmasks could have been assigned incorrectly
to some samples at multiallelic sites, a superset of the correct
consequences would have been set
* bug fix, the upstream stop could be falsely assigned to all samples
in a multi-sample VCF even if the stop was relevant for a single
sample only
* further improve the detection of mismatching chromosome naming
(e.g. "chrX" vs "X") in the GFF, VCF and fasta files
* bcftools merge: keep (sum) INFO/AN,AC values when merging VCFs
with no samples
* bcftools mpileup: new --indel-size option which allows to increase
the maximum considered indel size considered, large deletions in
long read data are otherwise lost.
* bcftools norm:
* atomization now supports Number=A,R string annotations
* assign as many alternate alleles to genotypes at multiallelic
sites in the-m + mode, disregarding the phase.
* bcftools sort: increase accuracy of the --max-mem option limit,
previously the limit could be exceeded by more than 20%
* bcftools +trio-dnm: new --with-pAD option to allow processing of
VCFs without FORMAT/QS.
* bcftools view: the functionality of the option --compression-level
lost in 1.12 has been restored
- Update to version 1.13
* bcftools annotate:
* Fix rare a bug when INFO/END is present, all INFO fields are
removed with bcftools annotate -x INFO and BCF output is produced.
* Support for matching annotation line by ID, in addition to
CHROM,POS,REF, and ALT
* bcftools csq:
* When GFF and VCF/fasta use a different chromosome naming convention
no consequences would be added.
* Parametrize brief-predictions parameter to allow explicit number
of amino acids to be printed.
* bcftools +fill-tags:
* Generalization and better support for custom functions that allow
adding new INFO tags based on arbitrary -i, --include type of expressions.
* When FORMAT/GT is not present, the INFO/AF tag will be newly
calculated from INFO/AC and INFO/AN.
* bcftools gtcheck:
* Switch between FORMAT/GT or FORMAT/PL when one is (implicitly)
requested but only the other is available
* Improve diagnostics, printing warnings when a line cannot be
matched and the number of lines skipped for various reasons
* bcftools index: The program now accepts both data file name and
the index file name.
* bcftools isec: Always generate sites.txt with isec -p
* bcftools +mendelian: Consider only complete trios,
do not crash on sample name typos
* bcftools mpileup:
* New --seed option for reproducibility of subsampling code in HTSlib
* The SCR annotation which shows the number of soft-clipped reads
now correctly pools reads together regardless of the variant type.
* Major revamp of BAQ.
* Modified scale of Mann-Whitney U tests.
Newly INFO/*Z annotations will be printed
* bcftools norm:
* Fix Type=Flag output in norm --atomize
* Atomization must not discard ALT=. records
* Atomization of AD and QS tags now correctly updates occurrences
of duplicate alleles within different haplotypes
* Fix a bug in atomization of Number=A,R tags
* bcftools reheader: Add -T, --temp-prefix option
* bcftools +setGT: A wider range of genotypes can be set by the
plugin by allowing specifying custom genotypes.
* bcftools +split-vep:
* New -u, --allow-undef-tags option
* Better handling of ambiguous keys such as INFO/AF and CSQ/AD.
* Some consequence field names may not constitute a valid tag name,
such as "pos(1-based)".
* bcftools +tag2tag: New --QR-QA-to-QS option to convert annotations
generated by Freebayes to QS used by BCFtools
* bcftools +trio-dnm:
* Add support for sites with more than four alleles.
* New --use-NAIVE option for a naive DNM calling based solely on
FORMAT/GT and expected Mendelian inheritance.
* Fix behaviour to match the documentation, the --dnm-tag DNG option
now correctly outputs log scaled values by default, not phred scaled.
* Fix bug in VAF calculation, homozygous de novo variants were
incorrectly reported as having VAF=50%
* Fix arithmetic underflow which could lead to imprecise scores
and improve sensitivity in high coverage regions
* Allow combining --pn and --pns to set the noise thresholds independently
- Rebased use_python3.patch
- Drop python3 and perl build requirements, not needed, shbang of
executables can be patched anyway.
* Fri May 14 2021 Ferdinand Thiessen <rpm@fthiessen.de>
- Update to version 1.12
* The output file type is determined from the output file name
suffix, where available, so the -O/--output-type option is often
no longer necessary.
* Make F_MISSING in filtering expressions work for sites with
multiple ALT alleles
* Fix N_PASS and F_PASS to behave according to expectation when
reverse logic is used (#1397). This fix has the side effect of
query (or programs like +trio-stats) behaving differently with
these expressions, operating now in site-oriented rather than
sample-oriented mode.
* bcftools annotate:
* New --rename-annots option to help fix broken VCFs
* New -C option allows to read a long list of options from a file
to prevent very long command lines.
* New append-missing logic allows annotations to be added for
each ALT allele in the same order as they appear in the VCF.
* bcftools concat:
* Do not phase genotypes by mistake if they are not already
phased with -l
* bcftools consensus:
* New --mask-with, --mark-del, --mark-ins, --mark-snv options
* Symbolic <DEL> should have only one REF base. If there are
multiple, take POS+1 as the first deleted base.
* Make consensus work when the first base of the reference genome
is deleted.
* bcftools +contrast:
* The NOVELGT annotation was previously not added when requested.
* bcftools convert:
* Make the --hapsample and --hapsample2vcf options consistent with
each other and with the documentation.
* bcftools call:
* Revamp of call -G, previously sample grouping by population was
not truly independent and could still be influenced by the
presence of other sample groups.
* Optional addition of INFO/PV4 annotation with call -a INFO/PV4
* Remove generation of useless HOB and ICB annotation;
use +fill-tags -- -t HWE,ExcHet instead
* The call -f option was renamed to -a to (1) make it consistent
with mpileup and (2) to indicate that it includes both INFO and
FORMAT annotations
* bcftools csq:
* Fix a bug wich caused incorrect FORMAT/BCSQ formatting at sites
with too many per-sample consequences
* Fix a bug which incorrectly handled the --ncsq parameter and
could clash with reserved BCF values, consequently producing
truncated or even incorrect output of the %TBCSQ formatting
expression in bcftools query.
* bcftools +fill-tags:
* MAF definition revised for multiallelic sites, the second most
common allele is considered to be the minor allele
* New FORMAT/VAF, VAF1 annotations to set the fraction of
alternate reads provided FORMAT/AD is present
* bcftools gtcheck:
* support matching of a single sample against all other samples
in the file with -s qry:sample -s gt:-.
* bcftools merge:
* Make merge -R behavior consistent with other commands and pull
in overlapping records with POS outside of the regions
* Bug fix
* bcftools mpileup:
* Add new optional tag mpileup -a FORMAT/QS
* bcftools norm:
* New -a, --atomize functionality to decompose complex variants,
for example MNVs into consecutive SNVs
* New option --old-rec-tag to indicate the original variant
* bcftools query:
* Incorrect fields were printed in the per-sample output when
subset of samples was requested via -s/-S and the order of
samples in the header was different from the requested -s/-S order
* bcftools +prune:
* New options --random-seed and --nsites-per-win-mode
* bcftools +split-vep:
* Transcript selection now works also on the raw CSQ/BCSQ annotation.
* Bug fix, samples were dropped on VCF input and VCF/BCF output
* bcftools stats:
* Changes to QUAL and ts/tv plotting stats: avoid capping QUAL to
predefined bins, use an open-range logarithmic binning instead
* plot dual ts/tv stats: per quality bin and cumulative as if
threshold applied on the whole dataset
* bcftools +trio-dnm2:
* Major revamp of +trio-dnm plugin, which is now deprecated
and replaced by +trio-dnm2.
* The original trio-dnm calling model used genotype likelihoods
(PLs) as the input for calling.
* This new version also implements the DeNovoGear model.
* For more details see http://samtools.github.io/bcftools/trio-dnm.pdf
- Update use_python3.patch
* Thu May 13 2021 Ferdinand Thiessen <rpm@fthiessen.de>
- Update to version 1.11
* Breaking change in -i/-e expressions on the FILTER column.
The new behaviour is:
Expression Result
FILTER="A" Exact match, for example "A;B" does not pass
FILTER!="A" Exact match, for example "A;B" does pass
FILTER~"A" Both "A" and "A;B" pass
FILTER!~"A" Neither "A" nor "A;B" pass
* Fix in commutative comparison operators, in some cases reversing
sides would produce incorrect results
* Better support for filtering on sample subsests
* bcftools annotate:
* Previously it was not possible to use --columns =TAG with INFO
tags and the --merge-logic feature was restricted to tab files
with BEG,END columns, now extended to work also with REF,ALT.
* Make annotate -TAG/+TAG work also with FORMAT fields.
* ID and FILTER can be transferred to INFO and ID can be populated
from INFO.
* bcftools consensus:
* Fix in handling symbolic deletions and overlapping variants.
* Fix --iupac-codes crash on REF-only positions with ALT=".".
* Fix --chain crash
* Preserve the case of the genome reference.
* Add new -a, --absent option which allows to set positions with
no supporting evidence to "N" (or any other character).
* bcftools convert:
* The option --vcf-ids now works also with -haplegendsample2vcf.
* New option --keep-duplicates
* bcftools csq:
* Add misc/gff2gff.py script for conversion between various
flavors of GFF files. The initial commit supports only one type
* Add missing consequence types.
* Allow overlapping CDS to support ribosomal slippage.
* bcftools +fill-tags:
* Added new annotations: INFO/END, TYPE, F_MISSING.
* bcftools filter:
* Make --SnpGap optionally filter also SNPs close to other variant
types.
* bcftools gtcheck:
* Complete revamp of the command. The new version is faster and allows
N:M sample comparisons, not just 1:N or NxN comparisons. Some
functionality was lost (plotting and clustering) but may be added back
on popular demand.
* bcftools +mendelian:
* Revamp of user options, output VCFs with mendelian errors annotation,
read PED files
* bcftools merge:
* Update headers when appropriate with the '--info-rules *:join'
INFO rule.
* Local alleles merging that produce LAA and LPL when requested, a
draft implementation of samtools/hts-specs#434
* New --no-index which allows to merge unindexed files.
* Fixes in gVCF merging.
* bcftools norm:
* Fixes in --check-ref s reference setting features with non-ACGT bases.
* New --keep-sum switch to keep vector sum constant when splitting
multiallelics.
* bcftools +prune:
* Extend to allow annotating with various LD metrics: r^2, Lewontin's D'
* bcftools query:
* New %N_PASS() formatting expression to output the number of samples
that pass the filtering expression.
* bcftools reheader:
* Improved error reporting to prevent user mistakes.
* bcftools roh:
* The --AF-file description incorrectly suggested "REF\tALT"
instead of the correct "REF,ALT".
* RG lines could have negative length.
* new --include-noalt option to allow also ALT=. records.
* bcftools scatter:
* New plugin intended as a convenient inverse to concat
* bcftools +split:
* New --groups-file option for more flexibility of defining
desired output
* New --hts-opts option to reduce required memory by reusing
one output
header and allow overriding the default hFile's block size
* Add support for multisample output and sample renaming
* bcftools +split-vep:
* Add default types (Integer, Float, String) for VEP subfields
and make --columns - extract all subfields into INFO tags
in one go.
* Tue Feb 25 2020 Pierre Bonamy <flyos@mailoo.org>
- Changed python dependencies from python3 to python3-base and
python3-matplotlib
* Wed Feb 12 2020 Todd R <toddrme2178@gmail.com>
- Add use_python3.patch to switch from python2 to python3
* Wed Feb 05 2020 Todd R <toddrme2178@gmail.com>
- Update to 1.10.2
* This release fixes crashes reported on files including integer
INFO tags with values outside the range officially supported
by VCF. It also fixes a bug where invalid BCF files would be
created if such values were present.
- Update to 1.10.0
+ Numerous bug fixes, usability improvements and sanity checks were added to prevent common user errors.
+ The -r, --regions (and -R, --regions-file) option should never create unsorted VCFs or duplicates records again. This also fixes rare cases where a spanning deletion makes a subsequent record invisible to bcftools isec and other commands.
+ Additions to filtering and formatting expressions
* support for the spanning deletion alternate allele (ALT=*)
* new ILEN filtering expression to be able to filter by indel length
* new MEAN, MEDIAN, MODE, STDEV, phred filtering functions
* new formatting expression %PBINOM (phred-scaled binomial probability), %INFO (the whole INFO column), %FORMAT (the whole FORMAT column), %END (end position of the REF allele), %END0 (0-based end position of the REF allele), %MASK (with multiple files indicates the presence of the site in other files)
+ New plugins
* +gvcfz: compress gVCF file by resizing gVCF blocks according to specified criteria
* +indel-stats: collect various indel-specific statistics
* +parental-origin: determine parental origin of a CNV region
* +remove-overlaps: remove overlapping variants.
* +split-vep: query structured annotations such INFO/CSQ created by bcftools/csq or VEP
* +trio-dnm: screen variants for possible de-novo mutations in trios
+ annotate
* new -l, --merge-logic option for combining multiple overlapping regions
+ call
* new bcftools call -G, --group-samples option which allows grouping samples into populations and applying the HWE assumption within but not across the groups.
+ csq
* significant reduction of memory usage in the local -l mode for VCFs with thousands of samples and 20% reduction in the non-local haplotype-aware mode.
* fixes a small memory leak and formatting issue in FORMAT/BCSQ at sites with many consequences
* do not print protein sequence of start_lost events
* support for "start_retained" consequence
* support for symbolic insertions (ALT="<INS...>"), "feature_elongation" consequence
* new -b, --brief-predictions option to output abbreviated protein predictions.
+ concat
* the --naive command now checks header compatibility when concatenating multiple files.
+ consensus
* add a new -H, --haplotype 1pIu/2pIu feature to output first/second allele for phased genotypes and the IUPAC code for unphased genotypes
* new -p, --prefix option to add a prefix to sequence names on output
+ +contrast
* added support for Fisher's test probability and other annotations
+ +fill-from-fasta
* new -N, --replace-non-ACGTN option
+ +dosage
* fix some serious bugs in dosage calculation
+ +fill-tags
* extended to perform simple on-the-fly calculations such as calculating INFO/DP from FORMAT/DP.
+ merge
* add support for merging FORMAT strings
* bug fixed in gVCF merging
+ mpileup
* a new optional SCR annotation for the number of soft-clipped reads
+ reheader
* new -f, --fai option for updating contig lines in the VCF header
+ +trio-stats
* extend output to include DNM homs and recurrent DNMs
+ VariantKey support
* Thu Sep 06 2018 flyos@mailoo.org
- Update to 1.9
* `annotate`
- REF and ALT columns can be now transferred from the annotation
file.
- fixed bug when setting vector_end values.
* `consensus`
- new -M option to control output at missing genotypes
- variants immediately following insersions should not be skipped.
Note however, that the current fix requires normalized VCF and may
still falsely skip variants adjacent to multiallelic indels.
- bug fixed in -H selection handling
* `convert`
- the --tsv2vcf option now makes the missing genotypes diploid,
"./." instead of "."
- the behavior of -i/-e with --gvcf2vcf changed. Previously only
sites with FILTER set to "PASS" or "." were expanded and the -i/-e
options dropped sites completely. The new behavior is to let the -i/-e
options control which records will be expanded. In order to drop
records completely, one can stream through "bcftools view" first.
* `csq`
- since the real consequence of start/splice events are not known,
the aminoacid positions at subsequent variants should stay unchanged
- add `--force` option to skip malformatted transcripts in GFFs
with out-of-phase CDS exons.
* `+dosage`: output all alleles and all their dosages at multiallelic
sites
* `+fixref`: fix serious bug in -m top conversion
* `-i/-e` filtering expressions:
- add two-tailed binomial test
- add functions N_PASS() and F_PASS()
- add support for lists of samples in filtering expressions, with
many samples it was impractical to list them all on the command line.
Samples can be now in a file as, e.g., GT[@samples.txt]="het"
- allow multiple perl functions in the expressions and some bug
fixes
- fix a parsing problem, '@' was not removed from '@filename'
expressions
* `mpileup`: fixed bug where, if samples were renamed using the `-G`
(`--read-groups`) option, some samples could be omitted from the
output file.
* `norm`: update INFO/END when normalizing indels
* `+split`: new -S option to subset samples and to use custom file
names instead of the defaults
* `+smpl-stats`: new plugin
* `+trio-stats`: new plugin
* Fixed build problems with non-functional configure script produced
on some platforms
* Thu Jul 12 2018 flyos@mailoo.org
- Cleaned spec file using spec-cleaner
- Update to 1.8
* `-i, -e` filtering: Support for custom perl scripts
* `+contrast`: New plugin to annotate genotype differences between groups of samples
* `+fixploidy`: New options for simpler ploidy usage
* `+setGT`: Target genotypes can be set to phased by giving `--new-gt p`
* `run-roh.pl`: Allow to pass options directly to `bcftools roh`
* Number of bug fixes
* `-i, -e` filtering: Major revamp, improved filtering by FORMAT fields
and missing values. New GT=ref,alt,mis etc keywords, check the documenation
for details.
* `query`: Only matching expression are printed when both the -f and -i/-e
expressions contain genotype fields. Note that this changes the original
behavior. Previously all samples were output when one matching sample was
found. This functionality can be achieved by pre-filtering with view and then
streaming to query. Compare
bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]' -i'GT="alt"' file.bcf
and
bcftools view -i'GT="alt"' file.bcf -Ou | bcftools query -f'[%CHROM:%POS %SAMPLE %GT\n]'
* `annotate`: New -k, --keep-sites option
* `consensus`: Fix --iupac-codes output
* `csq`: Homs always considered phased and other fixes
* `norm`: Make `-c none` work and remove `query -c`
* `roh`: Fix errors in the RG output
* `stats`: Allow IUPAC ambiguity codes in the reference file; report the number of missing genotypes
* `+fill-tags`: Add ExcHet annotation
* `+setGt`: Fix bug in binom.test calculation, previously it worked only for nAlt<nRef!
* `+split`: New plugin to split a multi-sample file into single-sample files in one go
* Improve python3 compatibility in plotting scripts
* New `sort` command.
* New options added to the `consensus` command. Note that the `-i, --iupac`
option has been renamed to `-I, --iupac`, in favor of the standard
`-i, --include`.
* Filtering expressions (`-i/-e`): support for `GT=<type>` expressions and
for lists and ranges (#639) - see the man page for details.
* `csq`: relax some GFF3 parsing restrictions to enable using Ensembl
GFF3 files for plants (#667)
* `stats`: add further documentation to output stats files (#316) and
include haploid counts in per-sample output (#671).
* `plot-vcfstats`: further fixes for Python3 (@nsoranzo, #645, #666).
* `query` bugfix (#632)
* `+setGT` plugin: new option to set genotypes based on a two-tailed binomial
distribution test. Also, allow combining `-i/-e` with `-t q`.
* `mpileup`: fix typo (#636)
* `convert --gvcf2vcf` bugfix (#641)
* `+mendelian`: recognize some mendelian inconsistencies that were
being missed (@oronnavon, #660), also add support for multiallelic
sites and sex chromosomes.
* Mon Jul 10 2017 flyos@mailoo.org
- Update to 1.5
- Fixed some runtime dependencies (perl and python-matplotlib)
* Sun May 22 2016 flyos@mailoo.org
- Initial release