RANT: Anyone here interested in the underlying issue of academic publishing? To me, the very notion that publicly (or privately, for that matter) funded academic research gets locked behind the paywalls of organizations that have not contributed financially to the research they publish seems ironic. In short, they pay not a single dime to produce the content, charge the researchers to have their submissions reviewed and published, publish the content and make money off the subscriptions they sell, and sue the hell out of anyone who tries to wrestle it back out of their control.
Soooo, if they did actual work, they deserve to be paid fairly for that work.
They certainly do not deserve to be paid for things they didn't do, nor a perpetual monopoly on others' work, simply because they've managed to lock-in their customers for historical reasons.
Rent-seeking is rent-seeking, and is bad for society.
organizations that have not contributed financially to the research
charge the researchers to have their submission reviewed and published,
As you point out, they are providing a valuable service. The research is peer-reviewed and published.
You can argue that they aren't needed in the modern peer-to-peer wikiresearch napster world, where anyone can just publish their research wherever they want.
If the researchers don't want their research behind the paywall, they are not forced into the transaction. They must be finding the value-add of the "peer-review and publish" significant enough to give up publishing rights elsewhere.
The journals do not provide the peer reviewers, it's _peer_ review after all. The reviewers are more unpaid academics. The journal's involvement is running an automated submission system that passes draft publications between authors and reviewers, and maintaining a list of reviewers (mostly built from previously published authors).
Some journals have paid editorial staff however, that do useful things like copyediting. In exchange, the public has to pay $15-$30/ea (or whatever, it depends on the journal and field) for access to your papers; probably forever.
It's a rent-seeking industry that would make even the music recording companies blush. Yes, they do some useful facilitation, but it's not commensurate with the (completely externalized) cost.
Unfortunately, this is something that has to occur as a systematic change. Individual researchers are not to blame, necessarily. When you are told by your university that you must publish or face not getting tenure, what they mean is that you must publish in a peer-reviewed journal (or conference, since most of what we talk about here is CS and CS is still a conference field mostly). We could realign the review process so that it was something taken on by universities, for example, but this will require a major systematic change.
Peer review is important. It will not stop being important, but unfortunately the service of peer review is currently only being offered by mostly for-profit organizations, which is the primary problem.
So true. It's not like publishing your paper on your lab's website will satisfy the "publish" requirement for obtaining tenure. Why can't we separate the process of peer review from the process of letting a journal handle distribution of the paper?
My pet peeve when downloading articles from journals is that it is a chain of needless HTTP redirects and elaborate cookies. If you are accessing the network from an approved IP address, is all that really necessary? Why can't it be a simple direct download? Answer: Because they've commercialized the process of reading publicly-funded research results. And with that comes the usual mindless hoop-jumping for even the simplest things.
Many investigators will just post a copy on their lab's website anyway. And that's the link that they will often give to students who need a copy of the paper. So the whole scheme of commercializing the publishing of noncommercial research just looks silly.
I've heard a number of times of people starting "open access" peer-reviewed journals for various fields. The persistence of journals that demand permanent exclusive rights means either those fields are still waiting for some enterprising person to do the heavy work to create the new journal (and manage its reputation), or that the old-fashioned journals are still providing some value that the new guys can't replicate.
Even Nature, one of the most awesome-est journals in the world, demands certain restrictions, like not publishing in another journal. They don't want to do all the work of vetting the article only to find out that it's also in Joe's Fishing And Particle Physics Papers.
It's the researchers themselves who edit and peer-review the content. And academics are kind of locked into those journals.
Just starting your own free publication is not easy because as an academic you are forced to publish in the established journals, which demand exclusive content. Only those journals have high impact factors, which are the basis of the arithmetic formula used to measure an academic's success. The whole system is rigged...
For cutting edge theoretical research, there are only a handful of people in the world with the knowledge to review the research. And having millions of people that really don't have a clue about it doesn't produce anything worthwhile.
Yes, but those who carry out the peer review are the researchers themselves, often at little to no cost to the publisher, simply because the reviewing scientist (and the postdoc she/he usually assigns to do the grunt work) gets the "prestige" of being a reviewer/editor for a given publication.
There seem to be two issues here that are being conflated.
The first is the value of peer review. I don't agree with the arguments that having a select group of people, who are experts in their fields, reviewing papers is a bad thing. Nor do I think that opening it up will result in anything other than a terrible amount of noise.
The second argument is over the necessity of for-pay journals. Here, I think there could be a lot of work... if the new journal still provides the same amount of review and scrutiny as the current ones.
Just to add a data point that I usually find is missing in this discussion: US academic libraries spend nearly $2 billion/year on serials[0]. And if you look, the rate at which that expenditure increases is pretty amazing.
Here's a previous discussion on HN. It begins with a summary of what's already going on with open access to academic papers -- a lot has already happened, far more than many people are aware!:
Is the goal to do aaronsw favors? This is pretty clearly newsworthy.
I think you can ding the reporting for missing important subtleties and lacking context (i.e. the fact that the downloaded material wasn't covered by a JSTOR copyright somehow gets skipped!). But arguing that we should plug our ears to news about a case with clear impact to the community because it might look bad for the defendant is just ... weird.
The point about this being a bargaining tactic makes sense though. They want a plea on something to avoid embarrassment, and it wouldn't surprise me if Aaron was refusing to deal.
The "goal" is to not be a group of spectators in a gladiatorial arena in which one of our own has to do battle with the government's lions. That's all -- just a matter of regard for a fellow human being. It's exactly the same reason I don't rubberneck at the scenes of grisly accidents and why I think the paparazzi are scum.
I disagree that this case has a clear impact on our community, or that this article -- or most of the others about this case -- are newsworthy.
The DoJ isn't human and is not subject to emotions like "pride" or "embarrassment." Their goal is to win cases and enforce laws, and if the bureaucratic cost of adding more charges improves their chances of having the defendant found guilty on them, they're successful.
The DoJ is part of the executive branch, and ultimately answerable to elected officials who absolutely are sensitive to embarrassment. No one wants to see a headline like "FBI wastes $2M on failed prosecution of harmless nerd" when it could be "Hacker gets jail time and fine". So they're throwing mud trying to get something to stick. If it looks like anything does, they'll offer a plea again.
You can't really believe that politics have no impact on the case decisions in the DoJ, can you?
Never let the truth get in the way of a good narrative. Your quotation reminds me of statements made in the case against the Wall Street quant who allegedly stole some source code from his former employer (was it GS?). The prosecutor was quoted as saying (paraphrased from memory), "This is so important, they call it their secret sauce." As if calling something "secret sauce" alone is enough to determine the importance of a trade secret.
As if Python was the most powerful language, heh. :)
Maybe he was going for a world record of most citations in a single paper. Who's to say he wasn't just doing research? How many downloads is too many?
I would be willing to bet that the JSTOR TOS do not give a specific number. e.g. "You may not download more than n papers in 24 hours." And if they don't state a maximum in the TOS, then why shouldn't they, for clarity?
It sounds like you are trying to invoke Loki's Wager -- since you cannot define N where downloading N is too many and N-1 is not too many, there must not be such a concept as downloading too many.
People who deal with the law don't have much patience for this.
Courts have a very long history and lots of people have tried lots of really weird things over the years. Swartz will hopefully have a good lawyer, and a good lawyer won't even try something along the lines of "there wasn't a preset limit on X therefore my client didn't do anything wrong" when his use of X was over a hundred times the combined consumption of all the legitimate users over two months.
Judges are not computers. If counsel presents them with a bad enough argument, they might get insulted that counsel thinks the judge is dumb enough to fall for it. Things that depend on the judge's mood (like purposefully obtuse arguments) are not a good courtroom strategy.
You are jumping around a bit. We were talking about TOS and now we're in a courtroom and using the words "judge" and "dumb" in the same sentence. I was kidding about doing research. Humor. We all know what he was doing. But the truth is I'm serious about these types of TOS. And I'm looking at this mainly from the end user's perspective. You see the same type of ambiguous TOS language everywhere on the web. Let's stay focused on TOS for a moment, and leave aside the Swartz case. Do you think ambiguous contracts (TOS) are "better"[1] than unambiguous ones? For example, would reducing ambiguity lower the probability of (costly) disputes?
ToS might be interesting in some cases, but not in this one. JSTOR and MIT kept on denying access to Swartz and he kept on working around their defenses.
What if we looked beyond the Swartz case? Then what do you think about these types of TOS?
Maybe another example would be more interesting. Say you have a choice between an API that allows a "reasonable" number of requests in any 24 hour period and one that allows n number of requests in any 24 hour period. Which one would you prefer?
First assume you're an API user. Then assume you're the API provider.
Anyway, this kind of question is what I was getting at. What is reasonable? I don't know what their server capacity is.
I like using automation, I prefer non-interactive to point and click, and I have always found TOS on academic databases, not to mention most websites, interesting. Because they fail to account for anyone who might want to use automation (reasonably, having respect for the resources of the server). But maybe I'm the only one who finds this question interesting.
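To make the "n requests in any 24 hour period" option concrete, here is a minimal sketch of what an unambiguous quota could look like on the provider side. The class name `DailyQuota` and its methods are made up for illustration, not taken from JSTOR or any real API; it simply counts accepted requests inside a sliding window.

```python
from collections import deque


class DailyQuota:
    """Hypothetical sliding-window limiter: at most `limit` requests
    in any span of `window` seconds (24 hours by default)."""

    def __init__(self, limit, window=24 * 60 * 60):
        self.limit = limit
        self.window = window
        self._stamps = deque()  # times of accepted requests, oldest first

    def _expire(self, now):
        # Forget accepted requests that are at least `window` seconds old.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()

    def allow(self, now):
        """Record the request and return True if it fits the quota at `now`."""
        self._expire(now)
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False

    def remaining(self, now):
        """How many more requests would be accepted right now."""
        self._expire(now)
        return self.limit - len(self._stamps)
```

As an API user you could call `remaining(time.time())` before each batch and know exactly where you stand; as a provider you have to commit to a number up front, which is presumably why "reasonable" is the more popular wording.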
It seems to me that federal law can be applied to websites created by private individuals or businesses. As long as the ToS has knowingly been broken and the person doing the breaking has benefited materially, then he is at risk of federal prosecution. I don't see a lot of comment about how reasonable the ToS has to be. This just strikes me as being completely irrational.
Original comment:
Riiiigght, so I can put together a website with some strange terms of service and then the FBI will come arrest anyone breaking those terms of service because it is a federal crime.
No, you cannot do that. A prosecutor must demonstrate, first to a grand jury, then to a criminal jury, and simultaneously to a judge, that the accused not only violated the terms of service to a website, but in doing so caused material harm or found material gain, and that they did so knowing that they were violating the terms of the site.
The fact seems to remain that any private individual can create a website and have the force of federal government apply to a third party as long as said party has knowingly broken the ToS and gained materially somehow from that.
If a restaurant had a 'terms of service' that said no reselling of food that they make, should you be open to federal prosecution if you stopped by to pick up food for yourself and some co-workers? Especially if someone gave you a few dollars for the trouble of running the errand? And even more, if the restaurant has their own delivery service?
Am I missing an important distinction here? Should private companies be able to make binding rules that open people up to criminal prosecution for something that doesn't violate any laws per se? A person breaking a specific law AND breaking a ToS makes sense. A person breaking a law BY breaking a ToS doesn't make sense.
I can't reply to this because no part of the example you provided constitutes a federal crime under the CFAA.
On the other hand, the criminal aspect of using a university's noncommercial JSTOR access to scrape a substantial portion of the entire database so you can put it on BitTorrent is not hard to understand.
It was unclear if what you were implying was that you shouldn't violate a ToS for your own profit (or at someone's expense) because it was a federal offense. (I can see now that that was not what you were trying to get at)
What are your thoughts on PadMapper vs CL? What is the distinction between scraping that data and scraping this data that makes one worthy of federal prosecution but not the other, considering that in both cases it was done for profit or to someone's detriment?
Elements missing from a CFAA case for PadMapper include at least interstate commerce and intent to defraud.
Swartz's prosecution alleges --- credibly, given what Swartz allegedly posted prior to scraping JSTOR --- that Swartz's intention was to liberate data from a commercial database onto file sharing networks, making intent a much easier case to prove. Moreover, the indictment is at pains to point out that MIT and JSTOR repeatedly attempted to stop Swartz from continuing his plan, and found themselves in a cat-and-mouse game with Swartz eventually trespassing to maintain access.
PadMapper found itself having exceeded Craigslist's terms, found out by having its access withdrawn and becoming the target of a civil suit, and did not (directly, at least) attempt to evade the countermeasures Craigslist applied to prevent them from obtaining further access.
Whether or not you believe Swartz did something wrong here (I do) or whether you think he should get a felony conviction for doing it (he probably shouldn't), you can see pretty clearly how JSTOR had no straightforward civil remedy to what Swartz was doing. Swartz was playing chicken with them, and he lost --- or rather, his bicycle collided with JSTOR's semi truck at high speed.
Yet stealing a bar of chocolate from your corner shop is not a federal crime.
Additionally if you hack a site and copy all their data but don't do anything with it, is that now not a federal crime because you have not benefited materially from it?
Yes it is because it is interstate commerce. Even if you are located in the same building as the server, just being connected to the internet raises the potential for it to be interstate commerce so it falls into the federal domain.
18 USC 1030(a)(4):
(a) Whoever—
...
(4) knowingly and with intent to defraud, accesses a protected computer without authorization, or exceeds authorized access, and by means of such conduct furthers the intended fraud and obtains anything of value ...
"Protected computer" is an incredibly broad term that covers almost any modern computer or device:
(2) the term “protected computer” means a computer—
...
(B) which is used in or affecting interstate or foreign commerce or communication, including a computer located outside the United States that is used in a manner that affects interstate or foreign commerce or communication of the United States;
I can't imagine they'd ever prosecute anything below the $5,000 mark but even a candy bar sized loss does appear to fall into the federal domain. (Not saying I agree with it, but that's how the interstate commerce clause has been applied in almost every case.)
Actually, it may well be. Depends on where you are, where the website is hosted, and whether any of the fiber (etc.) your packets traverse crosses state lines...
No, stealing a candy bar across state lines is also not a crime under the CFAA.
(Wait, it might be. I misremembered what the dollar minimum in the CFAA applied to --- the dollar limit is why you can't be charged under the CFAA for stealing airplane wifi, but things of value other than computer service itself have no dollar minimum I can find.)
You don't even need Lori Drew to arrive at this conclusion. The CFAA doesn't create strict liability crimes. You have to know you're violating the ToS, and, more than that, you have to benefit (or materially harm someone).
"Materially harm" is such a broad term (it's been used to escape one-penny raises in phone bills) that, in the context of CFAA, it might as well be strict liability. Depending on how big of a dick legal is feeling like on a given day, they could make the argument that having the sysadmin dig up logs was materially harming.
Please read my post again. (Why do I find myself saying this to you in almost every interaction? Do I just fail at communicating?)
The point was that the bar for "material harm" is so low that an infant couldn't trip over it. So much that it's barely even worth consideration. Basically, if you violate a ToS, the company on the other side could make it a federal case if they choose to.
From the company's standpoint, there's no reason not to, unless they've already committed their lawyers elsewhere.
So it might as well be strict liability. If they choose to pursue you, you're in for a bad time. Note that a prosecutor still has to choose to come after you, even for strict liability offenses.
You're not failing at communicating so much as failing at understanding what "strict liability" is about. Strict liability pertains to intent, not to the magnitude of the offense.
Statutory rape is an example of a strict liability crime, because you can be convicted of it without even knowing you committed it (at the time).
I understand that much - holding onto underage porn is a strict liability crime for example
What I'm getting at here is that the "harm" thing is not a good bar. The only difference between breaking a ToS in this condition and breaking a strict liability law is that it's a corporation instead of a prosecutor initiating the case.
*ed
Dropped "unwittingly", since you have to have been proven to know you're breaking the ToS.. still a broken law..
I don't understand what this comment is trying to say. The words following "the only difference" are not in fact the only difference between breaking a term of service and committing a crime under strict liability. In fact, the opposite is more true. You do not, for instance, simply need to know you're violating a term of service; you have to be doing so in bad faith, with intent to defraud.
I think you need to read the CFAA --- carefully, because clauses that occur early in the statute are refined and clarified later in the statute --- before wading into technical discussions about it.
It's not a particularly difficult law to understand.
...you have to benefit (or materially harm someone).
Are benefit and harm legally defined terms in this instance, or can a clever prosecutor convince a jury of the criminal equivalent of the idea that making a phone with rounded rectangles is worth $1B?
This could be very troubling for scrapers, as they frequently breach the terms of service. If successfully prosecuted it would encourage and empower sites with content that get scraped to push for criminal charges and win given this precedence.
Keep in mind that his alleged actions go way beyond scraping. He allegedly broke into an MIT network closet to install his own hardware, so as to avoid IP-based restrictions on the content.
I sympathize with his goals, but I have to wonder what he was thinking. If the goal was liberating the information and I was in his shoes, I would have found a much more paranoid way to go about it. MIT is a very open environment, and I'm sure he could have recruited sympathizers with legitimate access.
Which makes me wonder if he really cared about getting caught. Civil disobedience can be an effective tactic, but in this case it's simply too easy to paint him as an "evil hacker".
Huh? All those statements are in the original indictment.
"Swartz contrived to ... break into a restricted computer wiring closet at MIT;"
"Swartz connected the Acer computer to MIT's computer network"
"JSTOR blocked the computer's access to its network by refusing communications from the computer's assigned IP address. ... Swartz obtained for his computer a new IP address on the MIT network ... and began again to download an extraordinary volume of articles from JSTOR."
(Sorry for typos, I have a PDF of the indictment that is images)
The idea that violating terms of service constitutes "hacking" is very dangerous in general. Get your PayPal account suspended? You could now potentially be facing federal felony charges.
I didn't read the article, but I imagine the parent poster's logic is like this: if your Paypal account is suspended, you must have done something to violate the terms of service. If you violate the terms of service, you must be a hacker. If you are a hacker, you are at risk for federal criminal charges. Therefore, if your Paypal account is suspended, you are at risk for federal criminal charges.
Are you suggesting that mere scraping alone would demonstrate sufficient intent to warrant prosecution? Can you elaborate? Various courts have had mixed interpretations on the enforceability of sites' terms of service. How would this impact the body of case law that has been building around automated scrapers such as the many "... vs Google" cases? Also, "precedence" is ranking something in priority; "precedent" is the legal term.
The indictment goes into great detail as to how Swartz would have known his actions violated JSTOR's terms, and how he repeatedly took surreptitious steps to continue his plan despite the obvious efforts of both MIT and JSTOR to stop him.
"Mere scraping alone" is unlikely to land you a federal charge; the prosecution needs to demonstrate your intent to act unlawfully. A far more typical outcome for a scraping case is a C&D from the site you scraped.
For obvious reasons, JSTOR can't C&D Swartz once their content hits BitTorrent. Similarly, if you scrape a site and post it to file sharing networks, you might have something to be concerned about.
There was a case where a company successfully got an injunction against a scraper that got pricing data, arguing that the scraper violated the federal unauthorized access statute. http://itlaw.wikia.com/wiki/EF_Cultural_Travel_v._Explorica
This is an area of law that is evolving rapidly so there may be other cases that supersede that one.
You're asserting that it's legal for you to plug your computer into a piece of networking infrastructure that's clearly not intended for general public use, without the permission of that network's owner?
I'm going to have to slap a [citation needed] on that.
It's the same network that's available elsewhere on campus, so it's not like he was connecting without authorization. Aside from possibly trespassing, what kind of crime would it be?
Are you suggesting that there's a kind of network equipment that you have to trespass to access for which people have a reasonable expectation that they are authorized to use it?
There's no giant sign hanging over the breaker box in my building's elevator room saying "AUTHORIZED USE ONLY", but I'm pretty sure I'd get in trouble if I went in there and started flipping switches.
If I'm a student, and there's a switch in the teachers' lounge that connects to the same network that I am allowed to access in the rest of the school, then I can illicitly enter the teachers' lounge and plug in to that network with the same authorization that I would have outside.
Look, if your issue here is that there's no apparent bright line that needs to be crossed to violate the CFAA --- that you don't have to break a 128 bit AES key for instance, or inject a ROP payload --- I guess that's a valid complaint, but it speaks to a pretty profound (and very common, especially with nerds) misunderstanding about the way the law works.
The prosecution does not need to produce a cryptographically signed unimpeachable notarized audit log spelling out exactly which parts of the US Code Swartz broke at each timestamped moment of the day.
Instead, they have to convince a jury that a reasonable person should believe that Swartz knew he was violating JSTOR's terms, took constructive steps to violate those terms, and did so purposefully to commit a fraud.
All we have to go on is the story laid out in the indictment; Swartz has a side to tell here too. But if you just go on the indictment, I think there's a pretty decent case to be made against him.
The whole reason he (allegedly) plugged into it was that it wasn't the same network. There were rate limits and / or other access restrictions on the wifi that weren't in effect when you plugged directly into the equipment in the closet.
Just to elaborate... what it seems we have here is a brilliant engineer and idealist leftie that lost his grip on what is reasonable. He seriously fucked up, and then he seriously fucked up by getting caught. He isn't a hardened criminal, he wasn't stealing to make money and he can almost certainly be reformed with a light sentence, community service and probation.
Only the oppositional system doesn't see things that way.
I have way more sympathy for aaronsw than I had for Mitnick. Maybe this would change if I looked into Mitnick's case, but my prior is that this analogy would not help Aaron.
Same here. The difference is that the motivation that caused Aaron to break the law is a genuine desire to improve the world -- and that's what advocates for Aaron in the arena of public opinion should focus on.