I work in the industry and can confirm: the tactics used to dissuade people from aggregating horse racing data so we can sell $2 PDFs are extremely counter-productive and reflect the age of the industry.
Several orgs, including Equibase (US-based, the gatekeeper of a good portion of handicapping data), will regularly send cease and desist orders to people who attempt to automate aggregation of data, even with free, publicly available content. That's at least half the reason PDFs are used when customers purchase data access: to make aggregation harder (you should see some of the whitespace and character-encoding fuckery they use to throw off aggregators).
I suppose some of this depends on the quality of the data as well. Most data entry happens at the track during the race by a human. Almost none of the data about races or horse stats is collected by a computer; it's 95% hand-entered. That also goes for pedigree information and other statistics including medications, weights, etc. And 100% of that is usually self-reported.
Much of the current state of handicapping in the industry comes down to everyone trying to protect their personal mountains of data. Tech-minded people would love to provide open, controlled API services so that people can do what they will with our mountains of data. But "giving it away for free" is a non-starter for the good ole boys at the top.
You've hit the nail on the head. It's all about vested interests.
I was involved for a number of years with a UK-based horse racing ratings service (handicapping, if you're in the US). This service used to license its base data from the Press Association[1] and then run algorithms on top to produce the ratings.
There are certain things I can't say due to NDAs which are probably still in effect, but the cost of licensing this basic data was in excess of £10k per annum. So, unless you were a serious bettor or were looking to operate a service of some kind, it's beyond the pocket of most individuals.
Timeform in the UK also license some of their own proprietary data via an API[2]. They've published some pricing on their website and you're looking at between £6k and £12k per year. That is just to access, via the API, the same data that is available on their website for a subscription fee of £75 per month (£900 per year).
There's even a specific UK organisation which apparently has permission from the British Horseracing Authority to officially license key racing data. This is who sells the data to bookmakers, form guides, racing newspapers, etc. They have a rate card published on their website.[3] Private, pro-punter? £8.5k per year please.
It's a bit of a rort really. Most of the data is "freely" available online or in the racing press, but if you want to access it in any usable format, you either build a scraper (good luck staying on top of the website changes) or pay a stack to access things programmatically.
As you stated, the vast majority of racing data is collected, measured and entered by hand, by people who are paid to perform this job. It costs enormous amounts of money to employ all these people to watch every race in meticulous detail and gather all the data required to publish the Daily Racing Form. Why would you expect them NOT to protect this proprietary, valuable information?
Almost all tracks publish result charts online for free along with race videos. If you want free, why not compile the data yourself? How long would DRF or Equibase exist if people could access their data for free?
The DRF relies on Equibase for program and scratch data for all US and most international tracks. Even Churchill Downs relies on data agreements with Equibase to provide up-to-date information to feed to Totes. Result chart information is also almost exclusively Equibase data, at least in the US. They make closed-door deals with tracks, ADWs and Totes to provide data feeds.
Also, it's important to distinguish between editorial content (analysis, predictions, subjective descriptions of a horse's or jockey's performance) and empirical information (horse weights, medication, surface conditions, weather, placements, jockey-horse combo win rates, etc.).
The DRF sells its speed ratings as well as analysis of pedigree and past performances. There's value in that and it definitely justifies the cost of their publication and the other publications that perform similar work.
The critical issue with your stance is that users have no options to aggregate their own data easily. The free PPs Equibase offers have been scraped before, and I know of several specific instances where the creators of those scrapers were sent cease and desist letters for collecting the information Equibase otherwise provides for free. Requests have even gone to GitHub to remove the repository that contains the code.
I'm not advocating scraping (please don't scrape sites like that), but there isn't any industry interest in providing modern, consumable data. Wouldn't it be in Equibase's best interest to put that information behind an API and sell access to the public? Instead, the industry actively discourages using publicly available data.
Charging a lot for the data is self-defeating. For the sport to grow, more people need to be interested in it. One measure of interest is betting turnover, and a proportion of betting turnover is usually used to fund the industry. One strategy to increase betting turnover would be to make the data free and easily accessible in an automated, machine-readable form.
I really do not care about the likes of DRF or Equibase and how long they will or won't exist. I think it is incumbent upon the industry itself to ensure this data is free and easily accessible. Look at Hong Kong as the prime example: loads of free data, huge betting turnover, well-funded industry.
You may not care about DRF, but it is the sole source for a typical horseplayer to get reliable information about the horses. Without it, these players would have zero guidance and would likely abandon the sport.
DRF makes racing data easily accessible. If it were left to the tracks, which are independent entities (unlike NFL/NBA/MLB), a horseplayer would have to compile past performances from dozens of sources. The fields of a single day's race card may have run at 30 or more individual venues, in aggregate. Even if that data were free (well, the result charts and replay videos are already free, so technically this is already possible) it would take a ton of work to assemble it all in a digestible format -- which the DRF does for 6 bucks.
I don't believe HK offers any free data that is not available from American tracks. There is no API, and the result charts are less detailed than those from American tracks. If info was so freely available to everyone, how would someone like Bill Benter gain such a huge advantage? Why wouldn't he replicate his methods in the US? Probably because the US makes MORE data available.