Small Press, Big Data — or, ‘Bookonomics’

A post in which Valley Press founder Jamie McGarry attempts to apply some of the thinking behind ‘big data’ — and data analysis in general — to the world of small-press publishing, using only four charts (we promise).

Mention ‘big data’ to a high-flying publishing CEO and they will rub their hands together gleefully, in anticipation of the impending riches coming their way. Mention it to a small-press owner, and they may sigh and change the subject (or look profoundly baffled, depending on how much time they’ve spent reading business articles lately).

These are sensible responses, because ‘big data’ needs scale; hence the word ‘big’. The term generally refers to the study of extremely large sets of information, in the hope of finding patterns that can be used to predict future trends, reduce running costs, and contribute to more confident decision-making.

As the owner of a ‘small press’ (a term recently defined as a publishing house with five or fewer full-time members of staff), I don’t have access to large data sets — but, like a mouse demanding a turn on a trampoline built for elephants, I woke up one day determined to have a go regardless.

To begin my study, I decided to collect the following output data for as many small-press publications as I could:

  • Net profit (or loss), in pounds sterling, at a point exactly twelve months after publication date.
  • Units sold, at a point exactly twelve months after publication date.

Then I came up with the following input variables — factors I believed would affect one or both of those outputs:

  • The RRP, or recommended retail price, of the book. (preconception: cheaper leads to more units sold, less profit)
  • The broad genre of the book. (preconception: poetry sells least, short stories second-least, novels and non-fiction most)
  • The length of the book. (preconception: tiny pamphlets will sell fewer copies than thick, epic novels)
  • The number of copies initially printed. (preconception: a large first print run will be an indicator of publisher confidence, but could hurt profit)
  • The age of the author. (preconception: older authors will be more established, have more contacts, more experience, so sell more)
  • If there was a launch event at which the publisher sold direct. (preconception: this will be good for both outputs)
  • The month the book was published. (preconception: none, but I’ve always wondered whether this makes a difference!)
  • Whether the book came from a solicited or unsolicited submission. (preconception: solicited manuscripts were specifically sought out by their publishers, and therefore should sell more)
  • If the book was a debut. (preconception: books by established authors should be an easier sell than someone’s first)
  • If the author works full-time in the creative industries, e.g. teaching creative writing, or editing other books. (preconception: this should boost an author’s ‘network’ of possible readers, and be a positive indicator of their standing in the industry)
  • If the book was shortlisted for any significant regional or national awards, or chosen for a special promotion by a major chain of bookshops. (preconception: this may not be entirely positive … these awards sometimes come with fees)
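For anyone wanting to keep similar records, the two outputs and eleven inputs above could be held as one record per book. Here is a minimal sketch in Python; the class name, field names and sample values are all illustrative assumptions, not the actual layout of my spreadsheet:

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    # Outputs, both measured exactly twelve months after publication
    net_profit_gbp: float
    units_sold: int
    # Inputs (names are hypothetical; one per bullet in the list above)
    rrp_gbp: int
    genre: str
    page_band: str
    first_print_run: int
    author_age_band: str
    had_launch_event: bool
    publication_month: int  # 1 = January ... 12 = December
    solicited: bool
    debut: bool
    author_in_creative_industries: bool
    award_or_promotion: bool


# A made-up row, purely to show the shape of the data
example = BookRecord(
    net_profit_gbp=250.0, units_sold=180, rrp_gbp=9, genre="poetry",
    page_band="50-99", first_print_run=200, author_age_band="30-39",
    had_launch_event=True, publication_month=10, solicited=False,
    debut=True, author_in_creative_industries=False,
    award_or_promotion=False,
)
```

Keeping the yes/no factors as plain booleans makes it easy to split the books into two groups later on.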

I’d like to declare at this point, I could only lay my hands on reliable information for thirty-one publications — my record-keeping was a bit patchy until last year, and really recent books don’t have the twelve months of data, so don’t qualify. If you’re a publisher or author reading this and feeling generous, I would love more data to help with future articles; all completely anonymous of course. You can get in touch with me here — I just need the two output figures, and as many inputs as you can spare. The more I know, the more helpful these articles can be!

(Note: I didn’t include the publisher’s salary in the net profit, or other business overheads — but did include all expenses specifically related to the production and distribution of each book, including freelancers hired. I put ages and page counts in categories, and rounded prices up to whole pounds.)
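The tidying-up described in that note might look something like the sketch below. The function names, the band boundaries and the sample figures are my own assumptions for illustration; the note above doesn't specify which categories were used.

```python
import math


def net_profit(receipts_gbp, direct_costs_gbp):
    """Receipts minus book-specific costs only (printing, distribution,
    freelancers). Salary and general overheads are deliberately left out."""
    return receipts_gbp - sum(direct_costs_gbp)


def round_price_up(price_gbp):
    """Round a price up to the next whole pound, e.g. 8.99 becomes 9."""
    return math.ceil(price_gbp)


def band(value, edges):
    """Put a number into a labelled category.

    `edges` holds the lower bound of every band after the first, in
    ascending order (hypothetical boundaries, chosen by the caller).
    """
    if value < edges[0]:
        return f"under {edges[0]}"
    for lower, upper in zip(edges, edges[1:]):
        if value < upper:
            return f"{lower}-{upper - 1}"
    return f"{edges[-1]}+"


# Made-up figures, just to exercise the functions
profit = net_profit(1200.0, [450.0, 120.0, 80.0])
price = round_price_up(8.99)
age_band = band(45, [30, 50, 70])
```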

So — after a few happy hours putting data into a spreadsheet, and asking the resultant database some questions, what did I learn?


The first (and possibly most useful) observation came before I’d even started on the inputs, and is as follows: the more units sold per book, the greater a publisher’s financial risk. This is best illustrated by a chart (you were warned!):

The thirty-one squares/diamonds are my thirty-one example books, arranged from left to right on the horizontal axis in order of units sold (highest to lowest), with those values represented by that smooth red line. The blue line is the profit from those same books; the precise values don’t matter, it’s all about the shape, and the obviously volatile relationship at the top end.

You can see that the book with the most units sold was also the most profitable, by miles — but the book with the second-most units sold was the least profitable; in fact making a loss at the end of its twelve months. They sold almost the same number of copies, but the financial result couldn’t have been more different. (I hadn’t realised this had happened until now; that’s the power of visualised data for you.)

The third best-selling title made the same profit as the fifth-worst, and the relationship between units and profit continues to be jerky for the entire top third of the chart. The very bottom end is a helpful illustration of the importance of the ‘break-even point’; just a few fewer sales made a significant financial difference to those lowest two books.

If we focus on the middle of the chart, however, we can see the kind of reassuringly predictable pattern I promised you when giving my formula for successful small-press publishing. This is where I want to bring in one of the input variables — here’s the chart again, this time with the single-author poetry titles highlighted in green:

Let’s pause and note that the big success was a single-author poetry title — well done to them! — but then focus on that reliable lower-middle section of the graph. It’s almost all green; this would seem to suggest that poetry collections are reliable books to publish. (Admittedly, the data is skewed by the fact two-thirds of the books featured fall into that category — this will be one to watch for the future, as my information grows.)

That difficult second-book-from-the-left was a novel; it was also the only book featured which ticked the last box in my input list, being a recipient of significant national recognition. Anecdotal evidence suggests this effect is common; a publisher might need to pay a PR fee, might do a big print run, might suffer from the dreaded ‘returns’. (Note that I’m not saying what happened to these books in their next twelve months; that’s a whole different article, a different misspent afternoon…)

Going off-piste a little, I would hypothesise that when a poetry collection is a hit, due to the different ways people buy poetry compared to novels (and the fact they are cheaper to produce, but with the same RRPs), it’s a much more profitable situation for the publisher. Speaking of RRPs: I found no correlation between a book’s price and its success, in units or profit — also, a book’s length had no noticeable effect. One theory would be: as my prices are matched to page count, there’s no friction as they go up or down, so of course there’s no effect.
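If you'd like to test a 'no correlation' claim like this on your own books, one common approach is a Pearson correlation coefficient, which runs from -1 (one figure falls as the other rises) through 0 (no linear relationship) to +1 (they rise together). To be clear, I'm not saying this is the method behind my own conclusions above; it's a sketch of one way to check, and the sample numbers are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Raises ZeroDivisionError if either list has no variation at all
    (for example, every book priced the same).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented prices (whole pounds) and twelve-month unit sales
prices = [7, 8, 8, 9, 10, 12]
units = [140, 95, 210, 120, 180, 130]
r = pearson(prices, units)
print(f"price vs units: r = {r:.2f}")
```

With only thirty-one books, a coefficient near zero is weak evidence either way, so it's worth re-running as more data arrives.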

I was surprised to find no real correlation between success and the number of books initially printed for each title. Here’s the chart anyway, where the initial print run numbers are green triangles:

I’ve numbered the books this time, to give you a fighting chance. Where the green triangle lands on the red square, I got the print-run exactly right for those first twelve months; but the chart suggests accurate first-run printing has no effect on profits. I find that hard to swallow — all the time I’ve spent agonising over those first print quantities! — but that’s the only conclusion I can draw.

Books 8 and 10 fit my expected pattern; a green triangle above the red square means I printed more copies than I needed for that first year, and as a result, the blue diamond (profits) is low. But look at books 5, 9 and 16; I printed too many and still made a significant profit. With books 3 and 7, I printed exactly the right number of copies, but struggled to make cash. How did I manage that?!

Books 1 and 12 suggest that printing too few books doesn’t hurt profits; but then, book 2 suggests that printing too few can be deadly. More research definitely needed in this area.
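The triangle-versus-square comparison in that chart boils down to a three-way test per book, which could be sketched as follows. The function name and the sample rows are hypothetical; the book numbers are placeholders and do not correspond to the numbered books in the chart.

```python
def print_run_verdict(first_print_run, units_sold):
    """Compare the first print run with twelve-month unit sales."""
    if first_print_run > units_sold:
        return "printed too many"
    if first_print_run < units_sold:
        return "printed too few"
    return "exactly right"


# Invented rows: (label, first print run, units sold, net profit in pounds)
sample = [
    ("book A", 300, 180, 40.0),
    ("book B", 150, 150, 310.0),
    ("book C", 100, 240, -75.0),
]

for label, run, units, profit in sample:
    print(f"{label}: {print_run_verdict(run, units)}, profit £{profit:.0f}")
```

Listing the verdict next to the profit makes it easier to spot whether over-printing and low profit actually travel together.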

As far as age goes, you’ll be pleased to hear that once you pass 30, age really is a meaningless number — authors from 30 to 90 (our oldest) were mixed up across the chart. However, writers under 30 generally sold fewer units; I’ve highlighted them in orange on the chart below:

Obviously this could simply be a lack of data; I’ve published some great books by young authors which haven’t made it onto this chart yet, and hopefully they’ll set things right when I update the database later in the year. Debuts fared poorly too; they were in the lower half of the charts for both profits and units sold. But of course, that doesn’t mean I’m going to stop publishing debuts! This is all just some statistics-based fun and games; if we learn anything constructive from a business point of view, we’ll have been very lucky.

Finally, factors that apparently don’t make any difference: month of release (so much for that), whether I did a launch event (I’m shocked), whether a manuscript was solicited or not (good news for VP submissions), and whether an author works full-time in the creative industries. But I’ll keep tracking those anyway, and please do include them if you send me any data.
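For yes/no factors like these, a simple first check is to split the books into two groups and compare the averages. This is only a sketch of that idea, with invented figures and hypothetical key names; it isn't a record of how the conclusions above were reached, and a gap between two averages on a small sample proves very little on its own.

```python
from statistics import mean


def compare_groups(books, flag, output):
    """Return (average for books where flag is True, average where False).

    `books` is a list of dicts. statistics.mean raises StatisticsError
    if either group turns out to be empty.
    """
    yes = [b[output] for b in books if b[flag]]
    no = [b[output] for b in books if not b[flag]]
    return mean(yes), mean(no)


# Invented data: did the book have a launch event, and how many sold?
sample = [
    {"launch": True, "units": 100},
    {"launch": True, "units": 200},
    {"launch": False, "units": 150},
]

with_launch, without_launch = compare_groups(sample, "launch", "units")
print(f"with launch: {with_launch}, without: {without_launch}")
```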


I think that’s more than enough for now, don’t you? I hope this has been of some interest; it’s definitely given me plenty to think about, and I absolutely intend to come back in a few months’ time and revisit these questions. Hopefully by then I’ll be a few steps closer to having actual ‘big data’ … and pushing some of those elephants off their trampolines?