Hacker News
The Craigslist Lawsuit (3taps.com)
209 points by squigs25 on July 25, 2015 | 122 comments


Greg Kidd, the founder of 3taps, did not have to keep fighting this fight - AT ALL. He is one of the top execs at Ripple Labs, was in the first round of Twitter (and Square), and doesn't have his net worth OR self-worth tied up in 3taps. He continued this because he believes it was right - and I, for one, thank him for it.


Right for repurposing someone else's data without permission? Nothing right about that.


You can't usually convince anyone that they're wrong just by rephrasing some situation in a way that's more favorable to you.

"Right for using information freely provided by Google to help people find homes?" See? Now we have both found a way to describe the same situation in ways that make it sound like two entirely different situations.


Why should it be Craigslist's data? The users wrote it, not them.


Back when the first bubble burst in late 2001, I scraped a bunch of historical craigslist data from a secondary archive and built an interactive gnuplot webpage of post-traffic by category over time. At the time, it got slashdotted, a couple hundred thousand people looked at it and it was all fun and good.

So I thought afterwards, hey, the economy is kinda sketchy still and looking at this stuff sure is neat... I should build a real tool that robustly and respectfully logs daily post totals for more locales, and maybe build out a cool little graph portal. Maybe I can even do a little NLP to make it smarter. Hey, it's craigslist, they're community-minded... they thank me when I post, they won't mind. They even give pencils to teachers.

So I email them, and Craig responds in a cc'd message with a 'hey cool, can this guy use our RSS feeds?' At which point the assholes who worked there started inventing every excuse under the sun as to why doing so would totally damage their infrastructure (because, you know, polling RSS every half an hour is total abuse).

Anyway, that's when I realized that all the hippie-dippie stuff was just window dressing and that I really truly was dealing with a really special species of asshole.

I put the project down and walked away. The end.


Hey, can you post some quotes from those at Craigslist who were against you using their data for various graphs?


In the end they shut me down with this (names redacted, of course):

--snip--

We already have all this done in house. If our CEO decides to publish it we can do so easily.

Please understand it's not that I think you are trying to do something wacky, it's just that I really am in the business of not working our servers any harder than they need to.

Sorry,

--snip--


> the same statute that led to the demise of Aaron Swartz

For fuck's sake.

CFAA criminal sentencing guidelines may very well have contributed to Swartz's suicide. They incentivized prosecutors to create complex, showy indictments cross-linking multiple felony charges (because exploiting unauthorized access in furtherance of other felonies is an accelerator in the CFAA). CFAA may be broken in several ways.

But CFAA is also the sole federal statute governing unauthorized access. In civil litigation, CFAA is the only statute that provides a civil cause of action relating to unauthorized access to computers of any sort.

People like to write about civil CFAA as if it was some sort of nuclear option. But civil and criminal cases are worlds apart. If you're going to sue someone for misusing your computer systems, or even just violating your terms of use, CFAA is merely the statute that enables that. That has nothing whatsoever to do with overzealous prosecution.

Invoking Aaron Swartz in an argument over who's allowed to show apartment ads where is manipulative and grotesque.


As the EFF argues in the linked brief, there shouldn't be any "civil cause of action related to unauthorized access" when the data in question is made publicly available on the internet.

Craigslist was abusing the CFAA with an expansive interpretation – treating unapproved use as if it were the same thing as unauthorized access – similar to that of overzealous federal prosecutors. Craigslist's argument, if embraced by the courts, would make other cases imposing penalties on the reuse of otherwise-public data easier.

The reference is fair to make these points to a mass audience, although a bit macabre.


> As the EFF argues in the linked brief, there shouldn't be any "civil cause of action related to unauthorized access" when the data in question is made publicly available on the internet.

Why?

The principle underlying the Craigslist lawsuit is centuries old. Craigslist is like a shop open to shoppers. The shopkeeper makes the premises open to the public, but the scope of that access is limited by the shopkeeper's purpose in granting that access. If a member of the public accesses the property for an improper purpose, a civil action for trespass arises.

The fact that the premises is an Internet website changes nothing.


> The fact that the premises is an Internet website changes nothing.

It absolutely should. There is no accurate physical analogy for an HTTP server that responds with "200 OK" and valid data for a given set of "GET whatever HTTP/1.1" requests. Rather than contort existing trespass law to match the Internet, we have to derive meaningful boundaries for the Internet from ethical first principles if we want the conclusions to be remotely sane.


This comes up a lot. It simply cannot be the case that a "200 OK" constitutes a blanket authorization to do anything with a website (so long as you keep getting that HTTP status). A huge fraction of SQL injection attacks, and virtually every authz/forced-browsing/direct-object-reference attack also generates "200 OK".

(I have no idea how this plays into the Craigslist lawsuit. I'm just responding to that one fragment of logic from your comment.)
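To make the forced-browsing point concrete, here is a minimal Python sketch of a hypothetical order-lookup handler (the `ORDERS` store and `handle_get_order` function are illustrative names, not taken from any real application). When the owner check is omitted, another user's record comes back with an ordinary 200 status, so the status code alone says nothing about whether the access was authorized.

```python
# Hypothetical in-memory store: order id -> (owner, contents). Illustrative only.
ORDERS = {
    1: ("alice", "2 lamps"),
    2: ("bob", "1 couch"),
}


def handle_get_order(session_user: str, order_id: int, check_owner: bool) -> tuple[int, str]:
    """Return (HTTP status, body) for a hypothetical GET /orders/<order_id>.

    With check_owner=False this reproduces a direct-object-reference bug:
    any logged-in user can read any order, and the response is still 200 OK.
    """
    record = ORDERS.get(order_id)
    if record is None:
        return 404, "Not Found"
    owner, contents = record
    if check_owner and owner != session_user:
        # The fixed handler refuses with an explicit status instead.
        return 403, "Forbidden"
    return 200, contents
```

For example, `handle_get_order("alice", 2, check_owner=False)` hands Alice Bob's order with status 200, while the same call with `check_owner=True` returns 403.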


Right, I'm not saying that "200 OK" means you definitely haven't exceeded authorization, just that it can't reliably be analogized to the physical world.


But Rayiner's point is that it's equally ambiguous in the real world. A shopkeeper opens their shop to the public. Nobody has to use a key or flash a card to get in. But you can certainly exceed your welcome in a real-world store, and the law manages to handle that situation, too.


The fact that the law handles the physical world doesn't mean that real world analogies lead to meaningful conclusions about the digital world. That is the point I intend to convey, that the analogies themselves are counterproductive to the goal of producing coherent digital law.


On the flip side, real world principles don't become irrelevant just because we're talking about the digital world. What does it mean to have private property? It means having the right to exclude. I don't think there is anything difficult about translating that principle to the Internet.

What I think happens a lot is that people use "the Internet is different" as a sort of short-hand for the idea that the Internet should be governed by different underlying principles. But I think it's disingenuous to say that the nature of the Internet dictates that it be governed by different principles. The existing property principles translate just fine.


> The existing property principles translate just fine.

If they did, we wouldn't be having this conversation. As others have pointed out, physical concepts of scarcity and mutual exclusion are nearly meaningless in the digital realm. Digital systems are capable of acting autonomously as agents; they are not just passive property. Measures to ensure technical exclusion bear no useful resemblance to physical security. Computers give the ability to lay out in machine-readable terms what is "public" and what is not.

Note that I'm not arguing against the right to decide who can access a computer system and for what purposes, only against the use of analogies to physical spaces and existing laws.
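One widely used example of laying out "public vs. not" in machine-readable terms is a robots.txt file. The commenter doesn't name a specific mechanism, so treat this as an assumed illustration: a minimal sketch using Python's standard `urllib.robotparser` against a hypothetical robots.txt (the paths, the `example.org` URLs, and the `ExampleAggregatorBot` agent name are all made up). Note that robots.txt is advisory; it tells well-behaved clients what the operator wants but does not itself block anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paths and agent names are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reply/
Disallow: /account/

User-agent: ExampleAggregatorBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# Any client can ask the same question before fetching a URL.
for agent, url in [
    ("GenericBrowser", "https://example.org/search/apa"),
    ("GenericBrowser", "https://example.org/reply/123"),
    ("ExampleAggregatorBot", "https://example.org/search/apa"),
]:
    print(agent, url, rules.can_fetch(agent, url))
```

Here a generic client may fetch search pages but not `/reply/` paths, while the named aggregator is asked to stay away entirely.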


I attended a talk given by Michael Hayden that touched on this concept a bit, with regard to the difference in RoE in "meatspace" vs cyberspace.

An example he gave was seeing a cop drive by your house. Ostensibly that'll put you at ease, the concept that a law enforcement agency is patrolling your neighborhood. Maybe the officer gives a wave, or even chats with you for a bit if you're out washing your car or mowing the lawn. You feel safe.

How would this possibly translate to "cyberspace"? What if the local PD did a regular port scan of your router, or attempted to crack wifis while on their patrol? We allow searches of our belongings when we get on planes, but when our packets go overseas, we very much dislike the idea that someone from the government might be taking a look.

The rules, he went on to say, are simply different, because expectations are different.


Expectations may be different but that does not imply the laws are different. Expectations are what engineers have; laws are what Congress passes.

I understand that in the State of California it is unlawful to shoot a whale while riding a camel.


Actually, it does imply the laws are different, considering the folks who wrote the laws at the time were not considering their cyber implications, and the folks interpreting the laws in courts (judges) themselves have a different set of expectations and understanding when it comes to cyberspace.


But the (purported) violations happen when the trespasser has already left the property. You would need to find a precedent for an "open to the public" property's owner's right to restrict visitors from (off property) telling others what merchant-tenants were offering, under what terms, on that property.


The law of trespass reflects a basic ethical principle: owners of private property may invite the public to access their property for a proper purpose (the scope of which may even be implied) but nonetheless retain the right to deny access and to sue for trespass those who access that property for some other purpose. That ethical principle applies just as much to websites as coffee shops. Just because Craigslist makes its website available to the public for a defined purpose does not mean it's not trespass for a company to access that data for a different purpose.


(I am going to assume we are still talking about the article.)

First of all, trespass is a red herring in this case. Defendant used Google's Cache -- NOT the Craigslist website -- to gather data. Craigslist's claim is that accessing a Google Cache of its website constitutes unauthorized use.

If we're going to resort to ill-fitting metaphors, it's closer to the owner of a piece of artwork posting public notice that their art can only be viewed in their studio, permitting a public gallery to show that art, and then suing you for trespassing because you looked at their art while it was in a public gallery.

Craigslist asserting copyright claims in this case is plausible if precarious (they very quietly changed their ToS just 4 days prior to filing suit -- that's bullshit if I've ever smelled it. Furthermore, as an aside, I wonder whether the rise of walled data gardens shouldn't give us ethical pause.)

However, Craigslist's claim to CFAA violation in this case is absurd and dangerous. Period.


> If we're going to resort to ill-fitting metaphors, it's closer to the owner of a piece of artwork posting public notice that their art can only be viewed in their studio, permitting a public gallery to show that art, and then suing you for trespassing because you looked at their art while it was in a public gallery.

Changing the scenario slightly makes this seem far less absurd: you took a photo of the art piece while it was on display at a public gallery, then used your own photo commercially.

> they very quietly changed their ToS just 4 days prior to filing suit -- that's bullshit if I've ever smelled it

From the article, CL changed their ToS, and after a month sent a Cease and Desist Letter to 3taps. After that, they required CL posters to agree to new copyright rules, and then 4 days later sued 3taps. It's not clear if that later change was also a change in their T's and C's.


> Changing the scenario slightly makes this seem far less absurd: you took a photo of the art piece while it was on display at a public gallery, then used your own photo commercially.

But that would be copyright infringement, not trespass. The CFAA charge is still unfounded.


> you took a photo of the art piece while it was on display at a public gallery, then used your own photo commercially.

Find me a single example of an artist successfully suing for trespassing in such a situation.

CL's only plausible claim is to copyright infringement. They should stick to that claim.

> CL changed their ToS, and after a month sent a Cease and Desist Letter to 3Tap.

1. After "less than a month".

2. 3Taps replied to the Cease and Desist Letter stating that it didn't make any sense because the C&D letter requested that 3Taps stop accessing CL, but 3Taps wasn't accessing CL directly.

3Taps assumed this was an adequate response (and assumed their initial behavior was okay) because the explicit caveat to the explicit "use our data" invitation from CL executives was that such use shouldn't over-tax CL's bandwidth.

At that point, to quote the article (emph. mine), "craigslist concocted a scheme to allow it to assert ownership over copyrights to user postings so it could bring copyright infringement claims against 3taps and other innovators who accepted Mr. Newmark's invitation. Without notice to its users, craigslist inserted language into its posting process that it claims gave it an exclusive license to user posts. Four days later, craigslist sued 3taps."


I didn't know about the Google Cache thing. The Wiki page doesn't mention it. That definitely changes at least the CFAA part of the case. Did they ever scrape CL directly?


"the policy of 3taps with regards to Craigslist data is that any sourcing of that data for insertion via our API be done without scraping or visiting Craigslist at all." (original emphasis)

Obviously, I don't know whether they followed that policy. However, the important point is that it sounds like Craigslist's argument applies regardless of whether 3taps was scraping directly from CL.

Here's the source: https://3taps.com/papers/response%20to%20%20c_n_d%20letter%2...


Yes, both PadMapper and 3taps were scraping CL directly for some of the time.

PadMapper was scraping, got a C&D, so moved to 3taps.

3taps got a C&D (and was playing dumb games like switching IP addresses to avoid blocks), then moved to scraping CL posts out of Google's cache, claiming they therefore weren't bound by CL's terms and conditions. After CL blocked Google from caching posts, 3taps went back to scraping CL.


Was blocking caching on Google what he referred to as "interfering with Google and other search engines"? That's laying it on thick.


Ultimately, 3taps knew they were wrong no matter how many ways they tried to justify it. For all of their time and energy, they should have just built a new craigslist.


Read the link. The details of this are in there.


After they were told to C&D and sued by Craigslist, they decided to try a cute workaround and pull the Craigslist listings from cached Google copies instead, which didn't quite work out so well for them in the end.


But this is another case where analogies to tangible property can easily mislead.

Even though we metaphorically call viewing a website 'a visit', I'm not occupying any of the website's property, or even particularly using it in a consumptive/rivalrous way. I haven't traveled anywhere or opened anything. Even the kind of 'access' is very different from the 'physical access' of passing through a door or entering a shop/home – though the reuse of the same word can lead to semantic confusion.

Instead, I'm communicating. The website is sending me information, in response to simple requests. It can withhold whatever it wants.

If the website hasn't even so much as enforced a click-through assent-agreement – much less a password-login! – what 'property' is being 'accessed' conditional on some narrow 'defined purpose', when I'm simply viewing things that are also open to everyone else without preconditions?

If it's intellectual property, unlicensed reproduction or use of that is a very different cause-of-action than criminal 'unauthorized computer access'/trespass. (I believe that issue also came up in the case, and individual posters rather than Craigslist were found to be relevant rightsholders in individual listings.)


Correct, and that doesn't even consider that the "That ethical principle applies just as much to websites as coffee shops" analogy is flawed: in this case the product (content) is provided by the visitors, which is entirely different from any typical retail store for which this legislation was originally drafted.

This difference definitely justifies amendments or entirely new legislation.


The web server is not sentient, so you are not "communicating" with it. It is a piece of property that you are using by sending it requests.

And consumptive/rivalrous use is a red herring. Property is about exclusive use of and control of access to something. Whether a competing use is rival or not is irrelevant. Cutting across a yard or walking into a store is still trespass, even though it's only consumptive/rivalrous in a de minimis way.


'Communication', especially in the field of internet systems, does not require two sentient endpoints. (My computers are constantly communicating with other computers while I sleep!)

Consumptive/rivalrous use is relevant to the issue of what words, analogies, and legal regime should apply. You've been emphasizing terms and metaphors which make people think of situations where incremental use exacerbates scarcity. (Even your de minimis examples are activities that in the tangible world, can't be repeated endlessly without inconveniencing others.)

Much of the online world is different. Applying 'trespass' metaphors implies negative-sum wear-and-tear when in fact the activity can be marginally costless... or even socially net-beneficial, when spreading information improves decisions or competition.

A website operator that plugs their server into the net, and publicizes its address and services to the anonymous public, has consented to receiving a rather large set of messages from anyone. They retain full control over what that server – their 'property' – does in reaction, and what information it sends others, via software of their own choice and design.

Given that reality – which is very different from that of physical and real property – an interesting question is how much use-via-communication should be considered implicitly authorized, by custom and common sense, and what it takes to selectively revoke that blanket authorization... especially authorization that's still available to the anonymous public.

Permission-revocation via bright-line technical-access-controls would be a clear and fair system. People and even machines would be able to tell what's allowed, and litigation would be minimized.
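A minimal Python sketch of what such bright-line controls might look like, assuming a hypothetical server with an IP blocklist and a token-protected area (`access_decision`, `BLOCKED_NETWORKS`, and `VALID_TOKEN` are illustrative names, and the address ranges are documentation-only test networks). The point is that every refusal surfaces as a distinct status code a client can detect, rather than as a condition stated only in terms of use.

```python
import hmac
import ipaddress
from typing import Optional

# Hypothetical policy: one blocked address range and one valid access token.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # TEST-NET-3, illustrative
VALID_TOKEN = "example-token"  # placeholder secret for the sketch


def access_decision(client_ip: str, token: Optional[str], path: str) -> int:
    """Return the HTTP status a server might send for a request.

    Public pages are served to anyone not on the blocklist; pages under
    /members/ require a token. Each refusal is an explicit, machine-readable
    status rather than an out-of-band notice.
    """
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return 403  # Forbidden: this client's access has been revoked
    if path.startswith("/members/"):
        # Constant-time comparison of the presented token.
        if token is None or not hmac.compare_digest(token, VALID_TOKEN):
            return 401  # Unauthorized: credentials missing or wrong
    return 200  # OK: served like any other anonymous request
```

A real server answering 401 would also send a `WWW-Authenticate` header; the sketch returns only the status code to keep the decision logic visible.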

Revocation by fuzzy 'terms-of-use' or other out-of-band communications seems ripe for confusion and abuse. That's especially true if any commercial dispute over reuse of true information can automatically be trumped up into a more serious 'hacking' ("access exceeding authorization") federal offense.


The point of the de minimis examples is to tease out the animating purpose of the law. If preventing rivalrous use was the animating purpose, it simply would not make sense to give a trespass right of action in situations where the trespass was de minimis. Yes those de minimis examples can become rival if repeated endlessly, but they're not, so why would we base the law on an extreme hypothetical?

The animating principle of property law is that it gives a stronger right than just the right to exclude rival uses: it gives the right to exclude, period. That doesn't have to be the principle, but that's a question that's entirely orthogonal to meat space versus cyber space. There is nothing about cyberspace that obsoletes the idea that private property gives a blanket right to exclude.

As for revoking license, in this case 3taps had specific knowledge that its implied license was revoked.


But the point of your language has been to lock in bad analogies.

There's some begging-the-question in your argument:

"We should recognize this behavior as wrong, because it's analogous to traditional trespass."

Well, there are crucial ways where it's different.

"Because we've already decided trespass is the model we're applying, those aspects are irrelevant, and must be resolved to work the same as in traditional trespass."

No.

Online, there's no physical presence, movement, or consumption. All value requires probing communication. A "blanket right to exclude" has no clear meaning until reinterpreted for the new realities. The word 'cyberspace' itself is poetic, not literal, and using 'space'/'place' as drumbeat metaphors, without adjustments, will mislead us, and will not result in the fair, efficient results we want from law.

Sending a server a communicative message, especially a message the operator has invited and enabled via technical measures, is nothing like 'occupying his property' (at least not until it rises to some destructive/consumptive level).

The operator's technical ability to 'exclude' – but really, ignore – is nearly absolute, far beyond an owner's powers in the physical world. So the standards of notice/care/implied-license, before alleging a criminal communication and involving the courts and state in an enforcement action, should be much higher.

And maybe, when an operator is broadcasting informational goods to all anonymous correspondents ("the public"), the respective rights should be understood differently, totally outside a 'property' frame.

Maybe correspondents should always retain the right to elect to "be anonymous" and thus enjoy whatever conversation is freely given to the anonymous.

Maybe arbitrary conditions on communication, expressed by an 'owner' and asserting limits not just on communication with his server, but other people and servers at other times as well, should not be legally-enforceable by alleging 'criminal trespass' against individual correspondents. (Maybe the owner should have to offer consideration, and earn contractual assent, before asserting such control over others' communications.)


(I think my other post about trespass being a red herring in this case is by far the more important point, but figured I would follow up on trespassing as well. This reply is a bit of a digression from the article, but I think still germane to the over-all conversation.)

> the scope of which may even be implied

Scraping the web and analyzing a website's content is the basis for the primary feature of arguably the world's most successful website. Why should Google expect to be able to use this content in one way, but other users not expect to be able to use it in a different way?

Where is the implied bright line?

Ultimately, a post-Craigslist-victory world introduces a situation where anyone can sue anyone else they feel is both threatening and also vulnerable. That would have serious and negative consequences for the internet as a whole.

> That ethical principle applies just as much to websites as coffee shops.

It does not.

We already distinguish between physical and cyber trespassing, even in a lot of ways that benefit website operators.

The CFAA exists specifically because trespassing law doesn't immediately extend in an obvious fashion to websites.

If the legal extension has to be explicit, it's worth asking whether an implicit ethical extension makes sense. And physical and intellectual property are different enough that this conversation is non-trivial.

> Just because Craigslist makes its website available to the public for a defined purpose does not mean it's not trespass for a company to access that data for a different purpose.

There are substantive differences worth considering.

Most importantly, websites can and do regularly and maliciously change their terms without public notice, as in this case. This practice isn't common, and probably wouldn't be accepted, in the case of physical private property (that's open to the public).

Suppose the coffee shop introduced a "no other coffee shop owners allowed" policy, posted along with 20 pages of other policies outside their store, and then filed suit a few days after posting the amended 20 pages on its front door (with no notice of change).

The other coffee shop owner might have been technically trespassing, and hell, a judge might even concede that point. But regardless, judges aren't (supposed to be) banal computer programs applying law without context or human judgement. You can bet that a typical local judge would be pretty eye-rolly when this hypothetical coffee shop case finally made it across his/her desk...

Interesting side-note: this example really demonstrates that Craigslist doesn't want us to think about unauthorized use and trespassing in the same way; I assume they don't want to evoke eye rolls with this case.


You can read the brief for their full argument:

https://3taps.com/images/pics/430_Amicus%20Brief.pdf

Their analysis may allow for something like what you propose – a purely contractual cause-of-action for damages. But the EFF et al. are objecting to criminal liability based on CFAA/CPC §502 (computer fraud statutes).

I don't believe your shopkeeper analogy quite applies, though. Let's say the shopkeeper opens their shop, but puts a sign over the door to the effect "by entering, you promise to keep my prices secret". If someone enters, leaves, then later spills the pricing beans, it's not clear to me that traditional, non-computer law would allow the shopkeeper to retroactively characterize the visit as 'trespass'.

That's a major problem the EFF brief identifies with the Craigslist interpretation: it lets vague or arbitrary private conditions be creatively recast as criminal violations with more significant penalties. As a matter of fair law and the public's interest in clarity and the free flow of true information, they argue that such application of the CFAA (and its California equivalent CPC §502) is dangerously incorrect.


Maybe that's true the first time, but 3taps' business model wouldn't work with just one visit to craigslist. If you keep coming into the shop, the shopkeeper definitely has the right to revoke your implied license and sue you for trespass the next time you step on the premises.


But in the real-world, the shopkeeper would have to ask you to leave. And online, they'd have to use some actual access-control barrier.

The EFF's argument allows that in such an (alternative) case, real 'unauthorized access' could have occurred. They write:

> Of course, Craigslist has the right to restrict access to its data through, for example, requiring a username and login, which would password protect access to its other users' advertisements. If defendants bypassed that security measure by trying to break through this barrier by systematically attempting passwords or "hacking" their way in through some other method, then their access to Craigslist would necessarily have been "unauthorized". ...

> But once Craigslist chose not to password protect its data – a decision that would undercut Craigslist's successful business model – it necessarily authorized the public to view the information on the public website.


> But in the real-world, the shopkeeper would have to ask you to leave.

Craigslist sent them notice and IP blocked them.


Does serving someone with a cease and desist count as asking them to leave?


That seems to be a holding that Craigslist won from the court, which helped assure the settlement.

That's a better standard than the originally-argued "violates terms of use". But it's still problematic given how draconian CFAA penalties can be, and given that the 'unauthorized access' in question penalizes access to information that's freely available to any other anonymous member of the public.

Maybe the EFF will be able to use the $1MM that fell their way to further circumscribe such CFAA application.


What do you mean, "how draconian CFAA penalties can be"? Which penalties are you referring to? The civil cause of action defined by CFAA expressly limits the kinds of damages plaintiffs can pursue, unlike other torts.

This is why it's so upsetting to see these people use this manipulative language. Criminal CFAA --- or, more accurately, federal sentencing law --- can legitimately be criticized for being draconian. But criminal CFAA has almost nothing but a few definitions in common with civil CFAA.

Causing damages under the civil cause of action defined by CFAA does not allow the government to fine you. That's not how this works.


No, a Craigslist judgement on CFAA grounds wouldn't trigger a federal fine. But any activity that wins damages on the civil side could also (at federal prosecutor discretion) be prosecuted criminally.

A civil precedent on what counts as "exceed[ing] authorized access" under the CFAA also affects who might be subject to the full range of federal criminal penalties, in future actions. So Craigslist's interpretation winning, compared to EFF's, ultimately means more people threatened with 1-10 years in federal prison.


> ...it's not clear to me that traditional, non-computer law would allow the shopkeeper to retroactively characterize the visit as 'trespass'.

Seems to me that something like this is more like an NDA. Assuming someone saw and understood the sign, what are the legal ramifications? And how does this compare to establishments that require membership to use such as Costco?


> Craigslist is like a shop open to shoppers.

Craigslist is a lot more like a free classifieds paper. The ads are received, typed up, and printed out by some office somewhere, and the resulting compilation is distributed in unattended and unlocked newspaper stands on the sidewalk.


"... purpose in granting access."

"... for improper purpose, ..."

But in the 4/29/13 Order the Court says it would follow Nosal and that purpose is not enough to sustain "unauthorized access".

Instead it says Defendants' failure to cease and desist after receiving notification from the Plaintiff is the reason why it is not dismissing the CFAA claim.

If Defendants had simply "scraped" from a third party who was "authorized" to access Plaintiff's website (Google?), then perhaps the outcome here might have been different?


> The fact that the premises is an Internet website changes nothing.

Well, the principle stays the same, but apparently Congress thought it changed enough that they had to write a new law to cover things.


>Craigslist is like a shop open to shoppers.

Is it?


Was the respondent on craigslist's premises? IIRC their office is in San Francisco.

If the shop is open to the public, it's not actually trespass until the shopkeeper asks one to leave.

One fine day I discovered a gross electrical code violation at the Hacker Dojo in Mountain View. I called the fire marshal, then tripped all the circuit breakers, then unplugged every electrical cord in the place.

One of the members called the police. The cop called the owner who told the cop that I was no longer permitted in the building.

"What exactly does it mean not to be in the building? Where can I be without violating the law?"

"Just outside the door."

I assert that the defendant in this case is just like someone standing outside the shop, on a public sidewalk, looking in the window.


Why? Why is the reference "fair to make these points to a mass audience"?

I'm not saying Craigslist should have won the suit on the legal merits. I'm ambivalent about that.

But surely, every time one company sues another company and their case isn't completely bulletproof, surely it can't be reasonable for us to say "that's just like what killed Aaron Swartz".


Because it is "the same statute", as they accurately say.

And it's even a similar kind of interpretive abuse: stretching its definitions to rack up steeper potential penalties against a lesser-resourced entity.

So no, not every lawsuit. But yes, if an expansive interpretation of the CFAA is deployed against someone, I think it's OK for them to remind the audience it's that law.

(Your exaggerated paraphrase here, "that's just like what killed Aaron Swartz", is far more "manipulative and grotesque" than the actual wording of their small aside.)


Once again, please explain what these "steep penalties" you're referring to are, specifically and in detail. The CFAA is a very short act, and it's not hard to read. Moreover, the civil cause of action in CFAA is a tiny part of the overall statute. I don't think it supports the argument you're trying to make. In fact, I think it contradicts it.


Anything which wins damages under a civil action could also (at federal discretion) be charged criminally: the CFAA elements which allow civil recovery (section (g)) are a proper subset of those that allow criminal prosecution.

As the EFF amicus brief notes: "…although this is a civil dispute, the CFAA is also a criminal statute, and permitting Craigslist's computer hacking claims to go forward would also mean creating criminal liability."

Craigslist's specific allegations against 3Taps would seem to qualify the responsible parties for imprisonment up to 5 years, on a first offense, by CFAA (c)(2)(B), if a prosecutor felt like making an example of them.


I'm not sure I know how to respond to this, because it doesn't make sense to me.

I don't think what 3taps did under any interpretation of the facts should be considered criminal.

Having said that:

If something's criminal, it's criminal whether or not a private entity sues you for it! Craigslist suing 3taps no more enables the DOJ than would Craigslist simply writing an angry blog post alleging the same facts.


It's not the Craigslist allegation that makes something criminal; it's the Craigslist interpretation, if accepted by the court.

Have you read the linked brief? That's its concern. It's clear and written by experts who agree with your opinion, as expressed here, that nothing 3Taps did should be considered criminal.

However, if that's your opinion, your defense of the use of the CFAA, upthread, is odd. CFAA (g) is clear: the elements allowing civil damages/relief are a strict subset of those that impose criminal liability. You'll only get damages/relief if you've proven that something that's criminally-prosecutable under the CFAA has taken place.

Of course, the standards-of-proof are lower in a civil case, and it remains unlikely most civil judgements would change a federal prosecutor's priorities. But your casual assurance upthread that CFAA civil actions and CFAA criminal prosecutions are "worlds apart" was misleading.


"... or even just violating your terms of use, CFAA is merely the statute that enables that."

Are you sure? The Court in the 4/29/13 Order says violating terms of use would not be enough to sustain a CFAA claim. See page 6.

It is interesting how the Plaintiff changed the TOU after the "unauthorized access" and how the copyright claims were dismissed early.

The Defendants made a mistake by ignoring the C&D letter - that opened up the potential for CFAA liability. But I'm not sure they made a mistake in believing they could copy and serve the same classifieds. It appears they could if they obtained them through a third party.


We really need a non-profit organization that provides a data store with an api for common things like classified listings, sms messages, pictures, likes, etc.

That can help us move away from this sort of chicken and egg problem with user generated data. These companies are basically hogging it because they were able to build the user base.

If we can get the data in a non-profit store with a licensing scheme that basically says you must as a part of using this data add any user-generated data submitted to your website back to this store so other developers can build products on top of it, we could really innovate in classifieds and social networks.

Perhaps something like that can be funded by EFF or related organization... because then we can potentially apply governance to that user generated data which has not been possible with private companies.

The chicken and egg problem can be solved if big non-profit tech and civil rights brands like the ACLU, EFF, Wikipedia, etc. all get behind this and market it.


> These companies are basically hogging it because they were able to build the user base

I am all for liberating data and letting startups drink out of the firehose but I have some cognitive dissonance from reading this news.

I know that OLX spends tens of millions of dollars in India and nearby regions to solve the marketplace problem: get a critical mass of buyers and sellers to achieve escape velocity and enjoy growth through network effects[1]. So it's not just that these companies "happened" to build these user bases, they spent money and took early gambles.

This could very well spell the beginning of the end for much of Craigslist's real estate listings (followed by other categories inevitably) unless they have some grand plan to overhaul their UI/UX entirely. Kijiji, OLX, Gumtree are also vulnerable. Maybe even Twitter since it has a habit of shutting down startups built around its feeds.

What should one do if they are at the helm of CL or one of these other companies?

    [1]: https://en.wikipedia.org/wiki/Network_effect


> non-profit

Yeah, no. Unless this is done as an institution like Telegram is (it's made by the VK guy, and he's not charging for any part of it) or it's paid for by tax dollars (and then it would only work for some inhabitants of our planet), nothing will happen. Also, that licensing scheme idea is nifty, but it creates a chicken-and-egg problem. I don't think a single company wants to open its silo because of the advantage that it gets, and because they pay money to establish themselves socially (ads and whatnot), so allowing crappier competitors space on your platform makes you suck.


meh. not really religious about it being non-profit. and even then it can be non-profit somewhere other than the US. US is not the be all end all. the world is a big place.


For it to work, it really needs to be user-owned and controlled, otherwise you're just the next company-of-the-week.


no it doesn't. the user delegates their rights to the non-profit which has a democratic governance model. in fact this argument of it being user owned and controlled has totally been proven unworkable. most users don't give a shit, hence it's irrelevant what they think or do. they'd rather delegate their rights to someone else and let them manage that.


All of this over apartment listings? These aren't cancer cures we're talking about, these are for profit listings to sell a product.


Owned in the sense that it exists for the benefit of the user.

Controlled in the sense that the user has influence over the platform.


Wikipedia?


This sort of data is not really notable enough for wikipedia.


Sorry, I didn't mean it should go in Wikipedia. I was holding Wikipedia up as an example of a non-profit that could do important data management at scale.


Wikipedia is sort of weird (good but unique weird). Anyone can edit it and upload data but hardly anyone does so. It also has really strict policies on e.g. defamation without which a commercial service would quickly become a mess.


Huh? Defamation isn't a problem when craigslist is hosting stuff like classified ads. Why would this suddenly be worse for a non-profit?


I guess that craigslist follows local laws that require e.g. human interaction in certain cases.


I was thinking about some kind of user-owned database service to ensure the data is forever free. Imagine a discussion board run by any company where you authenticate with your external Bring-Your-Own-Data credentials. Through some SQL-like interface, that site is able to create tables and add data to the database while associating ownership of each piece of data with your identity.

At any time you can revoke your permission for their access to this data, or share it with others. Any modification to that data is version controlled so a hostile site cannot just modify/delete the data you created on it -- you'd still be able to gain access to any old version.
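
To make the revoke/version-control idea concrete, here's a rough sketch. It's an assumption-laden toy, not a real service: `UserOwnedStore` and its methods are hypothetical names. Every write by a site is appended as a new version and never overwrites anything. Sites need a grant from the owner to read or write. The owner can always see the full history, even after revoking a site.

```python
from dataclasses import dataclass, field


@dataclass
class UserOwnedStore:
    """Hypothetical Bring-Your-Own-Data store (illustrative only)."""
    # (owner, key) -> every version ever written, oldest first; never truncated
    versions: dict = field(default_factory=dict)
    # owner -> set of sites the owner currently allows to touch their data
    grants: dict = field(default_factory=dict)

    def grant(self, owner: str, site: str) -> None:
        self.grants.setdefault(owner, set()).add(site)

    def revoke(self, owner: str, site: str) -> None:
        self.grants.get(owner, set()).discard(site)

    def _check(self, owner: str, site: str) -> None:
        if site not in self.grants.get(owner, set()):
            raise PermissionError(f"{site} has no access to {owner}'s data")

    def write(self, site: str, owner: str, key: str, value) -> None:
        # A site's "edit" or "delete" (value=None) only appends a new version.
        self._check(owner, site)
        self.versions.setdefault((owner, key), []).append(value)

    def read(self, site: str, owner: str, key: str):
        # Sites only ever see the latest version, and only while granted.
        self._check(owner, site)
        return self.versions[(owner, key)][-1]

    def history(self, owner: str, key: str) -> list:
        # The owner can always recover every old version.
        return list(self.versions.get((owner, key), []))


store = UserOwnedStore()
store.grant("alice", "forum.example")
store.write("forum.example", "alice", "post/1", "Bike for sale, $50")
store.write("forum.example", "alice", "post/1", None)  # hostile "delete"
store.revoke("alice", "forum.example")
print(store.history("alice", "post/1"))  # ['Bike for sale, $50', None]
```

Appending instead of overwriting is the whole trick here. A hostile site can only add versions, so nothing it does can destroy what you already wrote.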

From a developer POV I think the key would be a SQL-like interface with appropriate caching/conflict resolution so you essentially just connect to a locally running proxy. Perhaps the advantage for the developer is some kind of tiered storage (e.g. your 100 GB database of posts is mostly stored in this external database with a 5% hot data cache local).
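
The "5% hot data cache local" part could be as simple as an LRU cache sitting in front of the external store. A minimal sketch follows. `fetch_remote` is a hypothetical stand-in for whatever call actually reaches the user-owned database, not a real API.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class HotCache:
    """Small local LRU cache in front of a (hypothetical) remote user-owned store."""

    def __init__(self, fetch_remote: Callable[[Hashable], object], capacity: int):
        self.fetch_remote = fetch_remote  # placeholder for the real remote lookup
        self.capacity = capacity          # e.g. ~5% of the total dataset
        self._items: OrderedDict = OrderedDict()
        self.remote_calls = 0             # counts cache misses that hit the remote store

    def get(self, key: Hashable):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.remote_calls += 1
        value = self.fetch_remote(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value


remote = {i: f"post {i}" for i in range(100)}
cache = HotCache(remote.__getitem__, capacity=5)  # keep ~5% of 100 posts locally
for key in [1, 2, 1, 1, 3]:
    cache.get(key)
print(cache.remote_calls)  # 3
```

Repeated reads of popular posts never leave the box. Only the cold tail goes out to the external store.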

From a user's POV, you know no one can take your data and hoard it. Transforming the data from one service to another similar one seems like it would be easier compared to hoping someone writes a good API for export/import. E.g. consider if you could write:

      INSERT INTO feedly.reader (SELECT feedname, feed_url FROM google.reader);
to migrate YOUR data from Google Reader to Feedly -- not just in a 3-month sunset period while Reader shuts down, but forever and ever.
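
You can already try that migration shape locally with SQLite. This is only an illustration under assumptions: two attached in-memory databases stand in for the hypothetical `google` and `feedly` namespaces, and the feed rows are made up. It uses the plain `INSERT INTO ... SELECT` form that SQLite expects, with an explicit column list.

```python
import sqlite3

# Two separate in-memory databases play the role of each service's namespace.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS google")
conn.execute("ATTACH DATABASE ':memory:' AS feedly")

conn.execute("CREATE TABLE google.reader (feedname TEXT, feed_url TEXT, unread INTEGER)")
conn.execute("CREATE TABLE feedly.reader (feedname TEXT, feed_url TEXT)")
conn.executemany(
    "INSERT INTO google.reader VALUES (?, ?, ?)",
    [("HN", "https://example.com/hn.rss", 12),
     ("EFF", "https://example.com/eff.rss", 3)],
)

# The migration: copy only the columns the new service understands.
conn.execute(
    "INSERT INTO feedly.reader (feedname, feed_url) "
    "SELECT feedname, feed_url FROM google.reader"
)
conn.commit()

rows = conn.execute(
    "SELECT feedname, feed_url FROM feedly.reader ORDER BY feedname"
).fetchall()
print(rows)  # [('EFF', 'https://example.com/eff.rss'), ('HN', 'https://example.com/hn.rss')]
```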

If you want to be paranoid, the database could be federated, so rather than it being central, multiple providers can compete for your data.

All this could certainly make compliance with the European data protection act easier.


What you're describing is very similar to Ethereum, a platform for decentralized applications. Its blockchain is a public database for application data that all participants in a system need to agree on. Other data is exchanged as messages between the parties involved, kept as static data on a decentralized store, or held privately on your own machine.

Users of software built with this architecture enjoy unlimited uptime: the software is run by the user and a decentralized computer that only stops running when no one wants to use it anymore.


At least in the US, I think satisfying the 'purposes' requirement for non-profit status would be difficult. Just providing a free service isn't enough. Here's the IRS brief description:

"The exempt purposes set forth in section 501(c)(3) are charitable, religious, educational, scientific, literary, testing for public safety, fostering national or international amateur sports competition, and preventing cruelty to children or animals.

source: http://www.irs.gov/Charities-&-Non-Profits/Charitable-Organi...

UPDATE: I confused non-profit and charitable organizations. Disregard.


You seem to be confusing non-profit status with charitable (501c3) status; a charity is a specific subtype of tax-exempt nonprofit (most notably different from most other nonprofits in that, in addition to the organization being tax-exempt, contributions to the organization are tax-deductible for the donors.)


You can file as a not-for-profit social purposes organization (to incorporate at the state level) without electing to take on IRS non-profit status -- you just need to pay taxes still.


Welcome to my new religion... All user content welcome! Seriously though, it does not have to be non-profit. It can be for-profit. Just as long as the cause is served.


I wonder about the viability of a 501(c)(6) trade association providing data storage service to dues-paying member companies.


sez it all: 'because they were able to build the user base....'


there is more than one way to skin a cat. just because they were able to do it... say like Encyclopedia Britannica, doesn't mean wikipedia can't come along and say f-u.


Did Wikipedia upload all the entries of Encyclopedia Britannica? I wasn't aware.


https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Encyclop...

Yes. The 1911 Britannica was used to populate many wikipedia articles.


Public domain (if it isn’t obvious from the year), which is a far cry from content that has been just produced or aggregated.


who's stopping someone from doing that then? I'm with you.


Put it on the blockchain! </half sarcasm>


Every time I've yelled "BLOCKCHAIN" as a suggested solution, it has been exactly half sarcastic. It's comical how often that comes up (in startup circles especially), but also how often it has a somewhat plausible application.


"Thoughts on BLOCKCHAIN as an antipattern considered harmful" ;)


Just imagine what it will be like when we have a blockchain engineered for affordably storing megabytes of data. Fun times ahead.


Excellent update on one of the hard cases EFF has been fighting.

There's a link to an interesting law review article on how the CFAA can make it a criminal act for arbitrarily banned users to even browse to a public webpage: http://digitalcommons.law.umaryland.edu/cgi/viewcontent.cgi?...

It's an absurd result and frustratingly unaddressed by the courts.


> and will make its API source code, the settlement agreement, and other legal filings and public policy resources available.

This is interesting to me. A couple of years ago, being young and naive, I received a cease and desist order from the craigslist legal team demanding I remove my craigslist scraper from github. It was largely a toy project to play around with an HTML parser library I wanted to learn, and I thought it could be useful. Of course I now understand it was against their TOS, and from an ethical standpoint I avoid scraping anything without getting permission, but at the time I was terrified I'd be sued for a ton of money. It felt incredibly aggressive to go after me, a student at the time.

So I'm curious.. is it illegal to scrape but ok to release the source code? Where is the line drawn?


That's why I would be tempted to create a craigslist competitor with really free access to the data.


Network effects: it's difficult to dislodge an entrenched competitor once they've accumulated a large enough dependent userbase.

However, a number of other services are chipping away at Craigslist, developing their own userbases in niches that Craigslist used to occupy. (e.g. Tinder)


I don't understand.

>> The Court has ruled that users—not craigslist—own the copyrights in their postings.

>> ... Craigslist finally conceded in Court that no such harm or impairment ever occurred.

>> Craigslist completely rewrote its Terms of Use, removing many of the most abusive clauses.

Everything above seems to be against Craigslist. Then why does 3taps have to agree to a settlement to pay Craigslist $1 million?

And if there are other parts of the court ruling that went against 3taps which this blog post doesn't mention, then how can Craigslist be forced to forward that money to EFF?


Wikipedia's summary of the case doesn't make it sound as positive for 3taps, especially as the status of the case (their motion to dismiss the case was denied) is not what they wanted: https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc.#...

Two significant problems for them: the Court sided with Craigslist's view that 3taps knew its authorization to access the website was revoked when it received Craigslist's cease-and-desist letter, so scraping past that date might constitute unauthorized access; and Craigslist's change to its ToS on July 16, 2012 to claim copyright on posts was valid, so reuse of their material after that date could constitute a copyright violation subject to statutory damages.


It's not clear why they are shutting down if "the Court has ruled that users—not craigslist—own the copyrights in their postings."

Maybe "3taps lacks the resources to continue the fight" implies that the lawsuit has drained their bank accounts and they are out of money.


It does say that 3taps has to pay $1M to craigslist; that might have wiped out their on-hand cash.



Wow that's quite a spin on the fact that they lost the lawsuit and had to fork over $1 million.


So 3taps has to pay craigslist $1M, and craigslist then has to pay that to the EFF. That seems pretty odd.


3taps may not have been able to keep fighting it, but they were able to keep Craigslist from profiting by driving them under. I wouldn't be surprised if there was some element of "agree to this or we spend it all making you pay your lawyers then fold the company with all assets completely exhausted - we'll even sell the name and spend that against you."

I believe the term is pyrrhic victory.

And there may also be a (future) kneecapping element to it with the release of their scraping source code.


Right, and CL's lawyers would still be OK with the outcome because now they have established precedent for a million-dollar settlement of a lawsuit for scraping data. If anybody is dumb enough to use that open source code to provide a service that uses scraped CL data, CL's lawyers will be able to open with a settlement offer well in excess of a million dollars.


And Craig Newmark tweets that he is donating $1M to the EFF and makes it sound like it's coming from his funds[1].

[1] https://imgur.com/lKE1Ak4

[2] https://www.techdirt.com/articles/20150701/14150431519/no-cr...


Seems like both sides violated, perhaps the judge had to rule against both but didn't really want to award either side.

EFF wins though so that's nice.


It's interesting to ponder whether/how EFF could have ethically negotiated such a settlement on 3taps's behalf.


Not sure if I understand this correctly. Does this mean that if Instagram's or Twitter's terms do not allow scraping of their users' content, any developer can just go ahead and do it because the copyright holder of the post is the user?

Since the site does not hold the copyright (and rightly so), the site owner does not have the rights to say what can be done with the data. That belongs to the users that generates it.

In that case, how do we know if all the individual users consent to the scraping? If you scrape 10,000 posts and one user complains, would you be in trouble? And does the user have the right to know who is accessing the data outside of normal use (since if they don't know, they can't object)?


"3taps replied that it did not access craigslist and instead obtained the data from Google." What does this mean? How do they get that data from Google?


Probably scraped Google's cache pages, so they would never touch Craigslist servers.


Company A made a chocolate fountain for the "public" to see. People enjoyed it. Company B thought this is a great opportunity to make cakes out of it. Because the fountain is "public" they made this as the source of their cake business. Company A complained to take down the fountain and....

Well you know the rest of the story. :)


If only the chocolate fountain was infinitely copy-able :)


interesting outcome. Do users own the copyright to their pictures and postings on Facebook? twitter?

Can I build a Facebook scraper and redistribute the data to other sites?


You can read the various terms of service and user agreements to find out.


Users almost always retain copyright to content they post on Facebook, twitter, etc


Would a "cannot use for commercial purposes" clause have nipped this one in the bud? I'm still on the fence with this one. I think Craigslist could have played it a lot better but I find it hard to believe no-one here can empathise with the founder.


3taps built a data exchange that aggregated user-generated data housed on various websites and then made that data available through an API to developers, including PadMapper and Lovely.

Craigslist discovered that it had become (has become) the "MLS" of rentals... and perhaps even more accurately -- it's a brokerage of _housing_ data -- both rentals and sales. So when property management companies (PMCs) discovered how darn easy it was, for example, to flood craigslist with multiple ads for the same unit, or to flood it with units that were never available to begin with and thus alter market perception -- certain people got exactly what they wanted: hyperinflation in rents, or the subsequent upward pressure on housing prices, or both.

As recently as 2010, craigslist welcomed innovative uses of the publicly available data ... Over the next two years, as innovators like PadMapper and AirBnB began to thrive, craigslist reversed course, and punished the innovators it previously welcomed to use the data. In February 2012, craigslist rewrote its Terms of Use, abandoning its long-articulated position that users own their own content which was freely available on the “public” part of craigslist's website.

As outraged as everybody was about this, it is exactly what the real MLS does when you decide to sell your house. You sign a contract promising to pay some Realtor's brokerage company 6 percent of whatever your house goes for -- in that contract you are essentially giving them the "copyright" of your house listing; they own it on the MLS and that is why you have to pay them the big bucks. Never mind that they do basically NOTHING other than simple photography and data entry to post on the MLS... but now they require you give them ~$66K of your equity for their 3 hours of work. (Source: http://www.mercurynews.com/business/ci_28512250/report-silic... Median price of "entry level" home in San Mateo County = $1.1M).

Same thing is happening in rentals / property management co's (PMCs), but slightly different symptoms.

Nobody is attacking the problem the right way, though. 42Floors tried the experiment and found it to be a failure, too. (Source: https://news.ycombinator.com/item?id=9881213)

The market should be putting more pressure on brokers to compete with each other ... damn that 6 percent. (Right, but the NAR signed a non-compete agreement with itself so it gets to do that)

Hackers should stop building tools that make it easier and cheaper for the PMCs and real estate agents to steal everybody's equity.


> You sign a contract promising to pay some Realtor's brokerage company 6 percent of whatever your house goes for

For comparison: in NL it's roughly 1.85 percent (negotiable).


People were 'outraged'? Really? The average person couldn't have cared less. There's nothing stopping anyone from building a new CL, marketing it and then getting people to post on it. The fact that there's a chicken-and-egg problem is irrelevant; that's the same problem faced by every social startup, yet the successful ones manage to overcome it.


Do you think the ruling has an impact on other scraping scenarios, say scraping travel data?


Why did they take on CL instead of, say, FB? Or did they think it was better to start with an easier/smaller target and then ramp up to the bigger fish?



