This is pretty shocking. What is PII doing in the query string in the first place? Disclosing pregnancy status from an insurance application sounds like a possible HIPAA violation and runs afoul of various state laws around 'Insurance Information and Privacy Protection', e.g. http://www.leginfo.ca.gov/cgi-bin/displaycode?section=ins&gr.... See Section 791.13(k). That's just CA law, but many states followed with their own version. (IANAL)
I think the really big penalties come into play when medical information is 'personally identifiable'. Since this data is going to Google, Facebook, and Twitter (really?!) with 3rd party cookies, or even without, it would be hard to argue this data is not personally identifiable.
It's not like they didn't know they were sending this data out. Or perhaps the highly advanced debugging prowess of "Chrome Inspector" is beyond their pay grade.
Edit: Oh, it's not even just a Referer leak; in some cases it's actually in the request itself, so blatantly intentional. :-(
> Oh, it's not even just a Referer leak; in some cases it's actually in the request itself, so blatantly intentional.
Be careful about throwing the term "intentional" around. There is nothing to suggest this is the case. It's just a shocking breakdown in security/testing processes and/or a bug. We see security/privacy issues every day. They are almost never intentional.
Unfortunately, a quick Google search doesn't explain what the oref parameter does but from the name I'm assuming it's something like "original referrer".
You don't need malice to explain this – it's entirely plausible to imagine that some people wanted to track user activities and they had a staggering lapse in HIPAA auditing due to the rush of getting the site out and stabilized.
> it's entirely plausible to imagine that some people wanted to track user activities and they had a staggering lapse in HIPAA auditing due to the rush of getting the site out and stabilized.
Considering they spent $1.7 billion on the site, I simply cannot believe that they were so unorganised and lazy in their testing that they couldn't find this. Otherwise I don't know what to think anymore.
However, the OIG report has a number of important caveats:
* The list of 60 contracts in the report includes contracts to support state websites and for programs unrelated to the website (for instance, I found an $85 million contract related to accountable care organizations, which doesn't seem to have any connection to the website).
* The $1.7 billion is not the amount expended, it's the estimated value at the time the contract was awarded if all the options are exercised. When you look at the individual contracts, this estimated value turns out not to be very useful. Some contracts had double the estimated expenditure, some had $0 expended. Looking at the total amount expended, you get a figure of $500 million.
So I think it's more reasonable to say that they spent $500 million on various projects to implement the law, including both the user-facing website and all the behind-the-scenes stuff.
I agree that there's no evidence, at least not yet, of malicious intent. But remember that the "rush of getting the site out" took place back in 2012-2013, with a launch in 2013. It's 2015 now.
Oh, sure – I just suspect that project has been in death-march mode for the last few years. I'd be shocked if the initial launch & stability rush wasn't immediately followed by “now that that's done, we have this backlog of postponed requirements…”
I don't see anyone excusing it – only discounting the supposition that it was intentional.
The only reason this is particularly newsworthy is that it's a .gov service connected to a contentious political issue – I mean, my health insurance company uses the same DoubleClick tracking service and I doubt I could even get a reporter to call me back if I tried to peddle some conspiracy theory about it.
Looks like a plausible search URL from a <form> element with GET. Putting it into the querystring instead of a POST body is a bit surprising, but I think not utterly negligent. Then some JavaScript code (maybe not even healthcare.gov's in-house code) looked at window.location.href and put it into another URL, and nobody noticed or stopped it. That is negligent, but more understandable, and fair to presume as unintentional, I think.
There's plenty to be legitimately upset about here, but your comment ("specifically code the application to concatenate") seems to imply the code has something outrageous like "&pregnant="+currentUser.pregnant+"&smoker="+currentUser.smoker somewhere, without you giving any evidence that's the case.
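The failure mode described above is easy to reproduce with a generic analytics snippet that forwards the full current URL (query string included) to a tracker. A minimal sketch, assuming the "oref" guess above is right; the parameter names and URLs here are illustrative, not healthcare.gov's actual code:

```javascript
// Hypothetical third-party tag behavior: grab the page's full URL
// (in a browser this would be window.location.href) and append it
// to the tracker's request as an "oref" parameter.
function buildTrackerUrl(trackerBase, pageUrl) {
  return trackerBase + "?oref=" + encodeURIComponent(pageUrl);
}

// If the page put the user's answers in its query string, the tracker
// now receives every one of them, no deliberate concatenation needed:
const pageUrl = "https://example.gov/see-plans?zip=85001&smoker=0&pregnant=1";
console.log(buildTrackerUrl("https://ads.example.com/pixel", pageUrl));
```

So "intentional PHI sharing" and "boilerplate tag pasted onto a page that happens to have PHI in its URL" produce the exact same request on the wire.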
"Putting it into the querystring instead of a POST body is a bit surprising, but I think not utterly negligent."
Putting sensitive data into query strings has been considered bad practice for a very long time. To name just one problem among many, it goes into the browser history.
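A small illustration of that point (parameter names invented for the example): the sensitive answers are part of the URL itself, so anything that records URLs records the answers too.

```javascript
// The answers travel inside the URL, not alongside it:
const url = new URL("https://example.gov/see-plans?zip=85001&pregnant=Yes");

// The full href, answers included, is what lands in browser history
// and in every server access log along the way:
console.log(url.href);

// And any script that can see the URL can trivially read them back:
const params = new URLSearchParams(url.search);
console.log(params.get("pregnant")); // "Yes"

// A POST body, by contrast, is not part of the URL at all.
```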
You see it all the time though - it is either implemented that way from the outset, or when setup with a proper POST then someone in testing files a feature request or bug that says "when I bookmark the search page or email the link the search results reset - we need to be able to link to a results page" and wa-la.
It suggests that this data was surfaced accidentally and DoubleClick may not have been interpreting the actual parameters. Either way, it is horrible - there shouldn't be any ad networks or third-party requests within a domain scope that is handling health data.
It's just an observation that this switcheroo of Wallah/Voilà seems regional to that area of the world - would you say that's not true? No insult intended.
Other seemingly American conventions include saying: 'would of', 'could of', 'for all intensive purposes', 'I could care less', and plenty more. I travel to the US a couple times every month and notice things like this!
The implied insult is that this region is the only region where people are dumb enough to misspell these words.
All of your examples are examples of words and phrases that people have primarily heard (rather than read), and thus they spell them incorrectly. People who have read these phrases repeatedly are less likely to make these mistakes of spelling.
Yes, it's hard to argue against that. I didn't focus on it because the privacy harm of leaking to third parties seems greater than that caused by the URL going into browser history or server logfiles (the most relevant concerns I see.)
This doesn't excuse them, but it's interesting to consider that probably tens of thousands of sensitive Google, etc. searches per day go into query strings. I guess the difference is that Google doesn't know if your freeform search is sensitive, whereas a "Pregnant?" checkbox is known to be sensitive. But maybe general search engines should not be using query strings either, just in case.
Browser history could be a problem if, say, family member #1 (pregnant, but hasn't disclosed it) signs up right before family member #2 (who doesn't know about the pregnancy, and who family member #1 doesn't want to know about the pregnancy).
Then there's the issue of public computers, which people signing up for Obamacare may well be more likely to use (I don't know that for a fact, but it seems plausible... I know in my area they've had workshops where you come in and someone helps you sign up using a public machine).
Given that these requests are using HTTPS (not that a non-secure POST is any better than a non-secure GET in that regard), could you give some other issues aside from the browser history issue?
Query parameters are public data. They are sent along with every single request from your browser to 3rd parties which is launched from the page! Analytics, image downloads, click-throughs, everything. I think browser history and server logs are actually the lesser evil here.
These query parameters in particular are not search terms, they are PHI you enter to determine the rates you will pay for health insurance. They are basically all the key parameters besides the age of household members which determine the price you will pay for a given insurance plan.
The other big one (as another poster noted below) is server log files. Those may be accessible to someone who shouldn't necessarily have access to the data.
Medicare spokesman Aaron Albright said outside vendors "are prohibited from using information from these tools on HealthCare.gov for their companies' purposes." The government uses them to measure the performance of HealthCare.gov so consumers get "a simpler, more streamlined and intuitive experience," he added.
> Spokesman Aaron Albright said outside vendors "are prohibited from using information from these tools on HealthCare.gov for their companies' purposes." The government uses them to measure the performance of HealthCare.gov so consumers get "a simpler, more streamlined and intuitive experience," he added.
It's one thing to send session length, general location, usage stuff like that to see where, for example, awareness campaigns might be needed. But really:
That's a bit much! And I suppose DoubleClick is carefully siloing this information so it doesn't accidentally perform all kinds of analysis on it for comparison with its other huge databases? Perhaps they are barred from selling it wholesale to data brokers but I can't imagine they are unable to use it for plenty of their own purposes.
I think the analogy would be like opening a Government Bank where the doors are always wide open and there are no locks. Everyone can see everyone else's account status, or simply come in and grab their jewels or gold coins, but they won't, because generally
[...] citizens "are prohibited by law from stealing other people's belongings"
I agree. On the surface, this looks ridiculous and I am very surprised. Hopefully there is a good explanation... but pregnancy status as a query parameter? I don't understand what that could be other than data mining.
I'm sure you're right that the information is getting used by DoubleClick.
To the other point about this information not being appropriate for the purposes Albright mentioned: Isn't this exactly the information that a health insurance company wants to know for outreach? If I know that all the pregnant women in Tucson are signing up but none from Phoenix, I suddenly know where to put my next billboard or field office.
If this was a private sector company, nobody would be surprised at collecting this data. It would also be a different story if the data was being stored and analyzed in house or even if the doubleclick request happened on the server side instead of the client.
I agree with the general sentiment that this is a privacy violation, but that's because of the way that the data is collected and who processes it, not the collection and use of the data generally.
There is an NDA but I used "folks" since I, personally, have returned to the startup world and am not involved in the details of fixing this particular incident. But yes, rest assured that the people who currently work on healthcare.gov are busy testing a fix, which is why they're not posting on HN.
An additional problem, as I see it, is that the Obama administration made unambiguous assurances that no PII was being collected as part of Healthcare.gov's use of web measurement tools. Here's the excerpt from the privacy policy:
HealthCare.gov uses a variety of Web measurement software tools. We use them to collect the information listed in the “Types of information collected” section above. The tools collect information automatically and continuously. No personally identifiable information is collected by these tools. (https://www.healthcare.gov/privacy/)
Note the last sentence is in bold on the actual web page.
A Department of Health and Human Services organ called the Centers for Medicare & Medicaid Services is responsible for the site. An enterprising HN reader might want to skim through the CMS (very long) privacy impact assessment to see if there are any other incorrect claims about Healthcare.gov:
http://www.hhs.gov/pia/cms-pia-summary-fy12q4.pdf
It will be interesting to see if anyone gets fired as a result of this particular privacy screwup. The buck should stop somewhere, right?
>A Department of Health and Human Services organ called the Centers for Medicare & Medicaid Services is responsible for the site. An enterprising HN reader might want to skim through the CMS (very long) privacy impact assessment to see if there are any other incorrect claims about Healthcare.gov: http://www.hhs.gov/pia/cms-pia-summary-fy12q4.pdf
Is there any way to split this up so each person is responsible for a section? you'd miss a lot by missing context... but if the section readers bullet pointed everything, that could be combined into a larger context.
Or, in HN speak, we could crowdsource a real-world Map/Reduce job to support big-data citizen science.
I love the idea of a real-world map/reduce job. :) But before spending any time on this, please make sure it's the right PDF. It does mention Healthcare.gov, but only a few times, and I'm no expert on HHS organizational structure. Here's the full directory of PIAs: http://www.hhs.gov/pia/
Nice find. Considering the bug is literally staring every single user in the face on the URL bar, I would imagine it would be hard to pin blame on an individual.
I guess this is the final nail in the coffin for the 'many eyes' theory though.
At least it will make a good t-shirt:
"Query String Parameters Are Not Private"
"Friends Don't Let Friends Store PHI in Query Parameters"
I think the 'someone needs to be fired' is just press release journalism. It makes for an easy narrative. "There's a problem at healthcare.gov" is the first story. "What happened at healthcare.gov" is the second story. "Blah Jones has resigned" has everybody wiping their hands and looking for the next press release story to write about.
It's certainly possible that a given individual is meaningfully responsible for a problem and that they are incompetent, but it isn't necessarily the case. If the actual problem is organizational, a scapegoat just papers over it, it won't fix anything.
I didn't say "someone needs to be fired" -- that's a paraphrase of what I typed, not a quote.
My point is a broader one: When you have committees and subcommittees and working groups and HHS IT people and CMS IT people and task forces and contractors and subcontractors and new replacement contractors (Accenture) and undersecretaries and sub-sub contractors and assistant secretaries and White House aides and political consultants and PR firms and deputy chiefs of staff and deputy undersecretaries all participating to some extent in the $1B+ process that is the supremely functional Healthcare.gov site we all know and love, the buck can be passed endlessly.
But in all that morass of a process, someone was or should have been responsible for ensuring that standard privacy practices were followed. To her credit, Kathleen Sebelius resigned last year (though not immediately) as a result of what the NYT called the "disastrous rollout" of Healthcare.gov. It is worth looking at whether there is any accountability in the form of dismissals or resignations with this privacy snafu.
If there is not, we should draw our own conclusions.
While the data itself would fit the description of PHI, I don't know if healthcare.gov itself qualifies since it isn't a "health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse". That doesn't mean it isn't against best practices, though. I built an analytics platform for a project with the VA on https://catalyze.io/baas (I also work there), so there are some alternatives to analytics when HIPAA is a concern.
It could depend on whether healthcare.gov signed Business Associate Agreements (BAAs) with the insurers that it's connecting to. If it did have to sign BAAs, then heathcare.gov would be covered under the scope of those BAAs, and would likely have to be complying with the security rule and the privacy rule.
Don't governments always write themselves an exemption from following laws for everyone else? I imagine they are exempt by law from things that would land others in court or jail.
When I worked as a researcher at the NIH, we took privacy very seriously, as we were liable for information leakage caused by our negligence. The federal government is liable for HIPAA violations.
The user's IP address (which I imagine gets collected by DoubleClick) is one of the 18 identifiable attributes which makes data PHI. But I don't think that healthcare.gov needs to comply with HIPAA.
I was initially curious why Google/DoubleClick were in there myself, since there aren't ads on the healthcare.gov site. Those requests look to be retargeting tags so they have the ability to do things like show banner ads on CNN only to people who have an incomplete marketplace application, along with conversion tracking so they can see which marketing campaigns led to completed applications or other goals. Presumably whoever controls the rest of the healthcare.gov marketing budget also runs the DoubleClick/AdWords account.
This is certainly scary stuff, but I was a bit annoyed with the line:
"...consequences such as when Target notified a woman's family that she was pregnant before she even told them. "
I've heard this story referenced time and again with respect to motivating people to care about privacy and tracking. I'm all for privacy, but I feel like: (a) we should have more recent anecdotes about the consequences of tracking than a story from 2012, (b) the mechanism that Target used to infer this is far less intrusive (not making it OK) than what we see here, and (c) its really not strong enough an example.
Not that speculation is the way to go, but what about the possibility of someone being turned down for life insurance due to this information?
Well, it is a simple example and has the virtue of being true instead of the often quoted but misrepresented McDonalds hot coffee story. Simple examples showing a situation are best, and much like iOS bug statistics, the parties who would have the statistics on situations caused by tracking are never going to make them public.
They don't even get into the repercussions of loading externally-hosted JavaScript into a secure page.
We avoid this entirely (also hosting medical data), though it's been a bit of extra work to do so.
I'm sure Chartbeat, Mathtag, Mixpanel, Google, etc. are reasonably careful about their security, and of course they would suffer as well if one of the servers/scripts was compromised and the breach was made public.
But in short -- healthcare.org's security relies on the idea that none of these many 3rd parties will ever have a CDN server compromised, for example. Or (in other situations) have the NSA demand access.
It just takes one -- and then an "improved" script could be delivered to only clients visiting a single targeted site, or even specific targeted clients. The normal customer just sees the lock icon and can verify that there's a secure connection to the main host; but there are actually many other connections going on to other hosts, and any of them may provide a script that can access any sensitive data on the page.
What else could one possibly expect when an industry has succeeded at convincing the government to make buying their product mandatory?!
I know the EFF focuses specifically on informational issues, but stirring outrage over one abuse of a captive market when such abuses are by design is a disservice to general sanity.
The entire issue with the ACA and private insurance aside, this particular website does not need to have 18 tracking scripts on it. I'm sure this is just another symptom of the convoluted development process behind the Healthcare.gov mess.
Not sure if you've ever worked on an enterprise website before but this would have nothing to do with the development process. This is all the work of the various marketing/product teams wanting to use the tools they are accustomed to.
It's ridiculous how many times I've seen this happen before.
The ACA model in the US is very similar to what exists in many places in the world, e.g. here in Australia. And it was driven by the needs of the government, not the needs of the health care industry, although they are a beneficiary. That said, the model really does work.
The fact is that having uninsured people has a devastating effect on the economy. It prevents movement of labour, affects productivity and promotion to higher socio-economic levels, prevents people from starting businesses, affects crime, and has countless other social effects. You need to force people who don't think they need it to have it.
From experience, an Australian doesn't have the correct frame of reference to even engage in the US healthcare debate. I have tried to understand the issues many times but it comes from such a fragmented starting point it's difficult to understand unless you've been in it for a long, long time.
Your points about uninsured are valid, but it's much more complicated than just saying 'hey, you guys should insure everyone'. So I generally try and observe from the sidelines.
We're clearly coming from very different places with regards to whether governments exist for their people, or people exist for their government. FWIW, forcing someone to purchase something they otherwise wouldn't actually hurts movement of labour, promotion to higher socio-economic levels, and people starting businesses.
>The ACA model in the US is very similar to what exists in many places in the world e.g. here in Australia.
Does your boss decide what your health insurance will be?
Are health insurance companies in Australia publicly traded for profit corporations?
There's other questions I have but I wont ask because I feel that people may take them as attacks. (I don't mean to attack)
>it was driven from the needs of the government not the needs of the health care industry.
The white paper that became the Affordable Care Act was written by Liz Fowler, who was a VP at one of the largest health insurance companies. After the ACA was passed she became a lobbyist for a major pharmaceutical company.
Do not underestimate the back-end complexity of data integration. The front end may be a small engineering problem, but I assure you, the back end is fraught with more political, protocol, transmission, and format problems than you could imagine. Or maybe you can. But please do consider the number of disparate businesses that had to be technically unified for this purpose, and the tendency for technical unity to break constantly across political/organizational boundaries.
Yeah... this is what happens when the customer is not the end-user of the product. Markets break down because there are no feedback loops.
Kind of like those toilet paper dispensers in public bathrooms that require a key to open and make it as difficult as possible to unroll a few sheets of what feels like sandpaper. Georgia Pacific has no reason to care about the person sitting there, powerless, on the toilet. Terrible user experience!
More government doing shitty things not in its charter. I'm numb to this abuse. Next up: increased taxes + inflation.
I hope I live to see the day that the laws are twisted and shredded such that all corporate-government data about every person is available for purchase. I'd love to have that detailed record of everything I've said, thought, places I've been, etc since ~Y2K. How cool would that be?
I've heard it said that cultural anthropologists of the future will absolutely love mining the rich personal data coming out of this period of time.
>>> I've heard it said that cultural anthropologists of the future will absolutely love mining the rich personal data coming out of this period of time.
Former Anthropologist here.
While culturally speaking it will be interesting, up to a certain point in human history there have always been physical things left behind by cultures to denote their existence.
As our whole lives have become digital, once the servers are gone, the pseudo physical evidence will vanish. One of my professors told me in passing in the early aughts that, "This generation (meaning the Y generation) will barely leave a trace of its existence in 200 years."
He implied that once technology has evolved past our current rate of burn, the mechanisms by which we preserve our memories will be forever wiped out. He made a point of saying, "When was the last time you used something physical to create, retain or share your memories?" When was the last time you printed a photograph? Listened to a music album? Once the devices by which we save our memories become obsolete, so does our existence.
It caught me off guard, and it was one of those times where you stop and wonder: what will people dig up 200-300 years from now and discover about our civilization? Will it all just be zeros and ones on a server somewhere?
Couldn't you say that hard drives, SSDs, tape backups, etc are all still physical mediums? While these mediums lose data over time, forensics will still be able to recover partial data, similar to other physical mediums (pen and paper, photos, etc).
Those are usually destroyed when their useful life ends, exactly because someone might dig them up later and extract data from them. Large corporate data centers, for example, physically destroy hard disks and never allow them to leave the facility intact.
There will be hard drives left around by individual consumers, I suppose, but the vast majority of all those that exist today are likely to be deliberately destroyed. We're so good at copying and replicating data these days that we no longer rely on hard drives for data permanence over long periods.
Perhaps, but I tend to believe the data, as we depend more and more on the 'cloud' won't be tied to physical mediums (or particular physical mediums) and instead be towed along as technology and the mediums improve.
100 terabytes of information now will likely be absurdly easy to store 100 years from now, and we don't lose what we have now because data centers will just upgrade and move the data to better storage platforms as they are invented and deployed.
Anthropology of the future may not include digging up hard drives in garbage dumps. Instead you'll just run the latest Google search.
I don't think the 'The Ancients' really cared if they lasted 100 years or 10000 years. They used concrete, or stone, or wood, based on basically the same factors we do: ease of procurement, cost, suitability, and so on.
Anyway, just because some stone tablets are still around, doesn't make it a good storage medium. Most of them are destroyed or lost. Even the ones that are still around, don't give you the perfect fidelity you get with digital storage. And of course, once it's gone - smashed into bits or lost at the bottom of the sea - it's gone. Meanwhile I can copy data stored on a digital medium as many times as I like, with virtually no loss in fidelity or transcription errors, and store those copies anywhere.
Someone else mentioned that archeology of the future will likely consist of a Google search or something like it. I suspect they're right.
If you're looking for a better place to go than healthcare.gov, give us a try at stridehealth.com. Bunch of ex-privacy folks and healthcare folks - can shop from your phone. Pretty shocking to see such a novice mistake by an org I think we were all expecting to take it up a level this year.
I heard something a while back about the US government (NSA) leveraging cookies in a way that they could use them as surveillance beacons. I doubt there is any relation, but it makes you think a bit.
One of the most heavily trafficked sites in India (railway booking) has been showing Google AdSense ads. Someone in government is making a million dollars a month :)