Raw sequencing datasets are big but eminently compressible. This article compares some simple ways to compress sequencing data.

For the comparison I will use three metagenomic samples. Metagenomic samples are harder to compress than samples from humans because the sequences are less redundant. I will compare five compression techniques. Three of these will be familiar to most readers: gzip, bzip2, and BAM files. Two will be less familiar: blocked-fastqs and blocked-fastqs without quality scores.

Blocked-fastq files are produced by reorganizing a fastq so that records of the same type come one after the other. Normally fastqs are grouped in four line…
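To make the blocked idea concrete, here is a minimal Python sketch. It assumes the blocked layout is simply every header, then every sequence, then the `+` lines and quality strings; the exact block order used for the comparison is not given in this excerpt. `make_fastq`, `to_blocked` and `from_blocked` are hypothetical helpers, and the random reads they produce are far less redundant than real data, so the printed sizes say nothing about the metagenomic samples.

```python
import bz2
import gzip
import random


def make_fastq(n_reads, read_len, seed=0):
    """Hypothetical helper: build a fastq string of random reads for testing."""
    rng = random.Random(seed)
    chunks = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(rng.choice("FF:,") for _ in range(read_len))
        chunks.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(chunks)


def parse_fastq(text):
    """Split fastq text into (header, sequence, plus, quality) tuples."""
    lines = text.rstrip("\n").split("\n")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def to_blocked(records, keep_quality=True):
    """Write every header, then every sequence, then (optionally) the
    '+' lines and quality strings, instead of interleaving them per read."""
    headers = [r[0] for r in records]
    seqs = [r[1] for r in records]
    blocks = headers + seqs
    if keep_quality:
        blocks += [r[2] for r in records] + [r[3] for r in records]
    return "\n".join(blocks) + "\n"


def from_blocked(text, n_records):
    """Invert to_blocked(records, keep_quality=True) back into records."""
    lines = text.rstrip("\n").split("\n")
    columns = [lines[k * n_records:(k + 1) * n_records] for k in range(4)]
    return list(zip(*columns))


if __name__ == "__main__":
    fastq = make_fastq(2000, 100)
    records = parse_fastq(fastq)
    variants = {
        "interleaved fastq": fastq,
        "blocked": to_blocked(records),
        "blocked, no quality": to_blocked(records, keep_quality=False),
    }
    for name, text in variants.items():
        raw = text.encode()
        print(f"{name:20} raw={len(raw):>8} "
              f"gzip={len(gzip.compress(raw)):>7} "
              f"bzip2={len(bz2.compress(raw)):>7}")
```

With qualities kept, the blocked file holds exactly the same characters as the original and can be converted back losslessly; dropping qualities is lossy by design. Grouping like with like tends to help general-purpose compressors because similar content ends up close together.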

Using ε and δ to build actual statistical tests is surprisingly easy. In this article I’ll explain how ε and δ can be used to modify the workhorse of statistical testing, the t-test, and explain why this is useful.

Before explaining the math behind the tests, we should lay out some constraints to keep things on track. First, we want to stick to the idea that ε and δ are an expansion of existing p-value statistics, not a replacement. To that end, we will require that setting ε to zero gives an ordinary t-test. Second, we want to…
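The first constraint can be illustrated with a small Python sketch. It assumes, purely hypothetically, that ε enters as a shift in the null difference of means of a Welch two-sample t-test; this is not necessarily the construction developed here, and `welch_t` is an illustrative name. It shows only the requirement that ε = 0 reduces to the ordinary statistic.

```python
import math
import statistics


def welch_t(x, y, eps=0.0):
    """Welch t-statistic for the hypothetical null mean(x) - mean(y) == eps.

    With eps == 0 this is the ordinary Welch two-sample t-statistic.
    Returns (t, degrees_of_freedom).
    """
    nx, ny = len(x), len(y)
    sx2 = statistics.variance(x) / nx  # squared standard error of mean(x)
    sy2 = statistics.variance(y) / ny  # squared standard error of mean(y)
    diff = statistics.fmean(x) - statistics.fmean(y)
    t = (diff - eps) / math.sqrt(sx2 + sy2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (sx2 + sy2) ** 2 / (sx2 ** 2 / (nx - 1) + sy2 ** 2 / (ny - 1))
    return t, df


x = [5.1, 4.9, 5.6, 5.8, 6.0]
y = [4.2, 4.8, 4.5, 4.9, 4.1]
print(welch_t(x, y))           # eps = 0: the plain Welch t-statistic
print(welch_t(x, y, eps=0.5))  # a larger eps shrinks t for a positive difference
```

The p-value would then come from the t-distribution with `df` degrees of freedom (for example via SciPy's `scipy.stats.t.sf`); at ε = 0 the statistic should match `scipy.stats.ttest_ind(x, y, equal_var=False)`.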

In my last article I made a broad humanistic argument for ε and δ. Anyone familiar with statistics will of course see that I was doing a lot of hand-waving, particularly when it came to p-values. This article will formalize some of the basics behind ε and δ.

First, I will define what I mean by *p-value*. P-values are everywhere in applied statistics, but there is no single all-encompassing definition. By p-value I mean the interpreted result of a statistical test such as the t-test or a Wilcoxon test. Formally, I am using the p-value as…

The life sciences continue to face a crisis of reproducibility. Though the crisis has many causes, much of the blame for it has been heaped on the humble p-value.

As a marker of statistical significance, the p-value does not deserve all the ire it receives. The p-value is easy to interpret, and as a single number it is a tremendously useful summary of complex phenomena. This is likely why the p-value is still in broad use despite the outcry against it.

Since p-values do not seem to be going anywhere in practice, scientists have made some broad attempts to use…