You can't create your own link previewer: Cloudflare will put a captcha in front of every website. All I want is a freaking <title> tag. They don't seem eager to fix it either; their proposed solution is to contact every website owner (seriously) to ask them to whitelist you[1].
Frankly, i wish facebook or cloudflare offered their previewer as a free service, since most websites have them whitelisted.
I’ve long said Cloudflare is a dangerous threat to the open internet, as well as to some privacy tools like Tor.
But it doesn’t always get much traction on here because both the founder and employees of cloudflare are quite popular users on HN. Some have given me brief half assed counter answers that conveniently miss other harder questions like a good PR person does (and which you seem to have gotten in your reply).
I hope every web admin gives it a serious second thought before adopting Cloudflare. Just as with cellphone OSes and operators, the one thing I’d dream of is a tool that offers a limited set of what Cloudflare does (DDoS protection, a hosting privacy layer) but is pro-internet and pro-privacy. They seem hostile to it in many ways, likely because it directly affects their bottom line.
The bigger question is whether such a tool could be created without all the downsides. The two I listed I think yes. But their web app security system is overly strict and bad for the internet IMO.
And I say that knowing they protect some serious defenders of human rights and face a lot of abuse from the ‘bad guys’. I just wish there was a better middle ground.
> But it doesn’t always get much traction on here because both the founder and employees of cloudflare are quite popular users on HN.
I don't think it gets much traction because you're barking up the wrong tree. Also, suggesting that YC is out to silence you and that nobody actually has a counter argument isn't very good for traction, either.
Until my website can't get taken down by a $5 rental of an internet-of-shit botnet, Cloudflare gives me and my users recourse against the bad actors of the world. (I also enjoy its host cloaking for my privacy.)
You simply gloss over bad actors and attack one of the only solutions that works. The biggest threat to the open internet was its naive "there are no bad actors" design, not the people giving us one of the only bulwarks against bad design.
I agree with your last sentence that it would be nice to have a better middle ground, but notice that's not the "cloudflare bad" thesis of your comment.
The internet needs to be improved so that Cloudflare is redundant. It's not Cloudflare's fault that fundamental design oversights (like optional ISP egress filtering) have created a lucrative niche. And things like faster, unlimited data plans accessible by smart toasters and smart doorbells on top of the internet's naive architecture only entrench Cloudflare further.
I hosted a server over a Comcast connection that was attacked all the time, and I was always able to figure it out without Cloudflare's proxy doing the blocking for me.
Cloudflare even puts multiple captcha challenges for any request from the default browser on the Samsung S7 Edge. Granted it's an old phone at this point, and most users install Chrome on their phones, but I end up skipping a lot of websites on my phone rather than participate in furthering the misconception that "Chrome is the only browser".
> because both the founder and employees of cloudflare are quite popular users on HN.
It seems a lot more likely people aren't finding your argument as convincing as you'd like. There are plenty of well-known users (and users who identify their employer) around whose companies' fortunes in HN perception change quite a bit over time.
Can you elaborate on how Tor is a threat to the open internet? That's a non-obvious statement to me. I'm aware that it's compromisable via controlling exit nodes (NSA, various nations) but that's not really the threat profile for the average person. Are there any other reasons?
Because despite its flaws, afaik Tor is an attempt to make the internet _more_ open to those who are being surveilled.
Any company through which a high percentage of web traffic is not only routed but fully reverse-proxied should of course always be a significant concern and should be subject to extreme scrutiny. But why specifically do you think they're anti-internet and anti-privacy? To me it seems like being pro-internet and pro-privacy aligns with both their general incentives and their monetary incentives.
I genuinely think they're a net positive for and supporter of Tor users. Before, site owners and security providers who faced issues with abusive/malicious traffic behind Tor connections (spam, illicit content, security scanning, password stuffing) nearly always resorted to outright blocking all Tor exit node IPs, because they had no other feasible option. I've been in that position. Cloudflare at least provides any site owner an ability to easily allow the traffic; just with a fairly quick occasional bot check.
Additionally, as of 2018 they now have an "Onion Routing" option which site owners can enable, which results in Tor users being able to access your site 100% through the Tor network. As a result, Tor users no longer experience any captchas, load your site faster, and never have to touch the clearnet.
>But their web app security system is overly strict and bad for the internet IMO.
Their WAF seems to have a pretty low false positive rate, compared to others I've seen. (Though the flipside of that is it also has a pretty high false negative rate and isn't very helpful against a dedicated non-automated attacker, like many other WAFs.)
>But it doesn’t always get much traction on here because both the founder and employees of cloudflare are quite popular users on HN.
They do post a lot here, but I doubt that's really responsible for defensive responses from other HN users. The most common criticism I see here (presenting a captcha for people using Tor, which site owners can now disable) makes me think the majority of people making the criticism have never run large websites or worked infosec for any organization with a large website.
Tor is of course not a threat itself, but anecdotally I'd estimate 90 - 95% of traffic that the average website owner receives from Tor is highly abusive/malicious, and Cloudflare empirically estimated 94% as of 2016 (https://blog.cloudflare.com/the-trouble-with-tor/). And anecdotally, not only is a high percentage of Tor traffic malicious, in many cases a significant percentage of all malicious traffic is Tor traffic. Naturally, due to Tor by design making it impossible to distinguish the ~94% connections from the ~6%, it's extremely difficult to mitigate this without just blocking 100% of Tor traffic. This is obviously not Tor or anyone's fault; it's just a practical reality for website owners. This sort of situation will always be the case for any kind of robust privacy-protecting application.
Cloudflare is possibly the first free service that actually enables anyone to easily allow normal traffic from Tor without much increase in security/abuse risk. They seem explicitly pro-Tor, especially with the explicit Onion Routing feature that lets Tor users access your site 100% through the Tor network without ever experiencing captchas, and statements like in https://blog.cloudflare.com/the-trouble-with-tor/ and https://blog.cloudflare.com/cloudflare-onion-service/
One may certainly have lots of other justified, legitimate concerns regarding the company and their disproportionate control of a huge chunk of the internet and web, but I'm not sure how someone could read those, see how the traffic is handled in practice, and conclude they're anti-Tor or a dangerous threat to Tor.
Because if you don't have it some a-hole will go and ddos your site or you want to prevent a hug-of-death because of reasons.
It seems a lot of issues happen because bad players are still allowed to thrive. For example, everybody uses a big email provider because they're the only ones that solved the spam issue.
The problem is that bad actors can masquerade as a lot of independent clients (The first D in DDoS stands for "distributed").
Figuring out whether a site is under a DDoS attack or getting legitimate requests from many sources is a very hard problem, and can just be worded "telling good actors from bad actors" -- no simple solution works; also, who YOU consider a good actor and who the website owner considers a good actor may be at odds.
Most people (and CloudFlare by default) consider Facebook a good actor; but as far as I'm concerned, Facebook is as evil an actor as one can be.
We're talking about virtually unknown blogs that get 1 http request from my server's IP, which is not blacklisted anywhere. It's not hard at all, i just think cloudflare's tech is not that good.
You're really pulling a "how hard could it really be??" to DDoS prevention?
You should at least be humbled by how few services can even offer DDoS protection that works against volumetric attacks and isn't just based on null-routing. The people with skin and money in the game might know something you don't.
Through a proxy, mind you; CloudFlare makes its decision without access to your CPU or DB metrics, and doesn't know which page load times are legitimately slow and which aren't supposed to be.
If hardly anyone reads or DDoSes them, why did they go to the trouble of setting up CloudFlare? It’s free for those obscure blogs, but it’s definitely a non trivial hassle. Usually people set it up only after they experienced their first attack.
I get it that you are upset Google gets to scrape them and you don’t. But bad actors really are making it difficult for everyone to just “be” on the internet.
I got round it by just making sure the user agent is set to the latest version of Chrome rather than a version from a few years ago that I had hardcoded before. It seems Cloudflare's protection is pretty much "is your user agent in the top 10 user agents?".
Well if you have an easy solution that you think would work, why don't you put up a website, commission a DDOS attack from a skilled actor and try to demonstrate mitigation?
Companies pay big money to CloudFlare. If a simpler and cheaper solution is workable, they'll pay you instead.
Just like telling if it's raining is easy but stopping rain once it has started is hard, the claim is that it's not hard to detect if a site is being ddosed.
It is not at all easy to tell the difference between a DDoS and the slashdot effect (or HN hug of death, depending on your age). At least not without a man in the loop.
Zoho isn't Google-size, but it isn't irrelevant, either. Sending mail from a self-hosted email server is far harder since the big providers might put it in spam or drop it even earlier.
> running your own mail server is the only way to ensure your email is not read by someone else
But any mail you send to someone else probably ends up read by Google/Microsoft anyway, since that's where their mailbox is.
Also, email security is a joke. It's 2020, and even TLS encrypted SMTP connections tend not to check for a valid certificate, making them trivial to MITM.
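To make the "encrypt but don't check the certificate" point concrete, here is a minimal sketch using Python's standard `smtplib` and `ssl` modules. The helper names (`make_verifying_context`, `make_opportunistic_context`, `send_with_verified_tls`) are made up for illustration, and this shows a client submitting mail, not how any particular provider's server-to-server delivery is configured.

```python
import smtplib
import ssl


def make_verifying_context() -> ssl.SSLContext:
    # create_default_context() turns on certificate and hostname checks.
    return ssl.create_default_context()


def make_opportunistic_context() -> ssl.SSLContext:
    # Mimics "encrypt but don't verify": any certificate is accepted,
    # which is what leaves room for a man-in-the-middle.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def send_with_verified_tls(host, port, sender, recipient, message):
    # Hypothetical helper: STARTTLS raises an ssl.SSLError subclass when
    # the server certificate does not validate for `host`.
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls(context=make_verifying_context())
        smtp.sendmail(sender, recipient, message)
```

The only difference between the two contexts is whether the peer's certificate is checked; the connection is encrypted either way, which is why the unverified variant looks fine until someone sits in the middle.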
Practically speaking how does one MITM an SMTP connection? For example, from Google to Microsoft. They connect directly to the IP addresses they get from MX records + lookup. What's the actual threat vector/execution here?
Long term, a new HTTP META method would be interesting. I wonder if something like that has ever been considered. Providers like Cloudflare would hopefully be more lenient with these requests.
Huh. It's certainly an interesting idea! Strictly speaking, individual people could implement this today, since nonstandard HTTP verbs don't break anything that doesn't know to request with them. (It wouldn't be of much use, because clients wouldn't know to use it, but still -- something that could easily be prototyped).
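A rough prototype of the idea, using only Python's standard library. The `META` verb, the `MetaHandler` class, the `PAGE_META` payload and the JSON response shape are all assumptions for illustration; nothing here is a standard.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metadata a site might expose for previews.
PAGE_META = {"title": "Example page", "description": "Hypothetical summary"}


class MetaHandler(BaseHTTPRequestHandler):
    def do_META(self):
        # BaseHTTPRequestHandler dispatches on "do_" + the request method,
        # so a nonstandard verb only needs a matching method name.
        body = json.dumps(PAGE_META).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_meta(host, port, path="/"):
    # http.client accepts arbitrary method names.
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("META", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()


def demo():
    # Bind to port 0 so the OS picks a free port.
    server = HTTPServer(("127.0.0.1", 0), MetaHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_meta("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` starts the toy server, sends one `META /` request and returns the status and decoded body. A server that doesn't know the verb would simply answer with an error status, so a client would need a fallback to a normal `GET`.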
I don't think FAAANG (or any other big players) would have much interest in making it happen in the standard though, since it would undercut their big-player advantage.
Maybe, but not really; seems like this thread is more about intent (“I just want a preview”) while content type is more about representation (“I want the content as json”). I can imagine that there will be websites that are actively using the Accept header to distinguish between “regular visitors” and API clients, serving their APIs at the same paths (didn’t Reddit do this at some point?), and thus your approach would break in this case.
I guess what this is really about is, I hate to say it, but something in the direction of the semantic web, where web servers (and in this case, CloudFlare et al) actually gain a deeper understanding of the content they serve, and a web browser / crawler being able to query this content directly.
It seems to me that what "previews" really want is an API for the page's content in a structured format: OpenGraph tags and other microformats are one representation, but it's annoying to have to load _all_ the HTML just to grab title and the OG tags.
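Short of such an API, a previewer can at least stop reading once the `<head>` is done. A minimal sketch with Python's built-in `html.parser`; `HeadPreviewParser` and `extract_preview` are hypothetical names, and the input is assumed to arrive as decoded text chunks from whatever HTTP client you use.

```python
from html.parser import HTMLParser


class HeadPreviewParser(HTMLParser):
    """Collects <title> and og:* meta tags, then stops caring after </head>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = None
        self.og = {}
        self.done = False
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content") is not None:
                self.og[prop] = attrs["content"]
        elif tag == "body":
            # Some pages never close <head>; <body> also ends the search.
            self.done = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()
        elif tag == "head":
            self.done = True

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_preview(html_chunks):
    # Feed chunks as they arrive and stop as soon as the head is finished,
    # so the rest of the page never has to be downloaded or parsed.
    parser = HeadPreviewParser()
    for chunk in html_chunks:
        parser.feed(chunk)
        if parser.done:
            break
    return {"title": parser.title, "og": parser.og}
```

This still needs a normal page request, so it does nothing about captchas; it only avoids pulling the whole body when the tags you want sit at the top.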
Doesn't the oembed spec [1] already solve this? I think the OP could solve their problem by simply creating an oembed endpoint with all the necessary meta data.
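For a sense of scale, an oEmbed response of type `link` needs little more than `type` and `version` plus whatever generic fields the provider wants to share. A minimal sketch of building one in Python; the function name and the choice of optional fields are assumptions, and wiring it to an actual endpoint is left out.

```python
import json


def build_oembed_link_response(title, provider_name=None, provider_url=None):
    # "type" and "version" are the required fields; version must be "1.0".
    payload = {"type": "link", "version": "1.0", "title": title}
    # Optional generic fields, included only when provided.
    if provider_name is not None:
        payload["provider_name"] = provider_name
    if provider_url is not None:
        payload["provider_url"] = provider_url
    return json.dumps(payload)
```

Note that the endpoint lives on the site being previewed, so this only helps for sites that choose to publish one.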
The cloudflare and google captchas are terrible. It's so bad that at this point I just close the tab if they challenge me with one. I use Brave and always have Shields UP, and it seems having it up makes the captchas extremely difficult. Mission accomplished I guess.
At https://host.io we scrape every registered domain once a month, and make the meta data available freely over an API. You could use that to get a title for a domain (although not for a URL that's not the main domain), eg:
See https://host.io/docs for more details about the API and what else you can do with it (eg. find backlinks to domains, domains with the same adsense ID etc)
> Frankly, i wish facebook or cloudflare offered their previewer as a free service, since most websites have them whitelisted.
Yup, and exposing just a few key pieces of information (title, and some of the meta/og tags) without the body would limit the potential for abuse, while still being fairly useful for legitimate uses.
There are hardly any "illegitimate" uses. The web is meant to be machine-readable (we wouldn't have Google or anything nearly as convenient in the first place if it wasn't). Whatever has been published is public and should not come with artificial limitations on how you read and process it. Blocking crawling should be outlawed as it clearly is a monopolistic practice. E.g. I want to build my own crawler to index and categorize the web subset I choose for me. I believe this is a perfectly legitimate use. But they will probably try to stop me.
Turn it around at least for a few minutes. Does a website operator have to handle whatever arbitrary traffic you want to throw at them from your crawler?
They’re the ones choosing to use tech that’s blocking you. Proposing to make it illegal for them to make that choice or to speak to you differently than they speak to other users of their site may give you some idea of the resistance you’re likely to face to this proposal.
I don't get what value link previews add. Someone shares a link with me (on skype, slack, teams... whatever) and I care about the content because the person sharing it with me thinks I could/should care about it, or someone shares a link on an aggregator and then I don't think it is too much to ask for that someone to write a summary. If the link is worth sharing writing 1 sentence to explain why isn't too much to ask.
What is the value a link preview adds? And why should I, as a content provider care about the value you add? Cloudflare does something for me, what is your service doing for me and why should I whitelist you (or care about you)?
Imagine Twitter or Facebook without link preview, it's much harder to use and overall reduces the chance I'll click on a link. Do you think only Twitter and Facebook should be allowed to publish previews?
Half the time the link preview picks the wrong picture and sometimes even the quote. Twitter and Facebook would both be improved by disabling it. Hell, it might even stop people from thinking they need a hero image for their 2 paragraph medium shitpost.
I'd place that blame towards website owners. Both Facebook and Twitter are pretty open where they read that info from, and an owner can pretty easily pass those fields (it's just some <meta> tags in the <head> element).
So, I should have to include twitter specific meta tags even though I personally don't care about twitter? Maybe twitter should make it clear which tags they read? Maybe it's SEO bullshit I don't care about? Maybe even the OG: tags don't work all the time and result in dumb previews?
If you don't want to fill them out, don't... Filling them out lets you customize your link preview on twitter. If you don't care about Twitter, why would this affect you at all?
That's exactly what I'm saying. Either I care about what that person thinks might interest me or I don't. The link preview abstract is shit anyway. Does the site title and the 2 sentence abstract really sway you? If someone wants to send traffic my way, writing an interesting abstract is not too much to ask.
>it's much harder to use and overall reduces the chance I'll click on a link
Maybe you should re-evaluate who you follow on twitter? I frankly could care less about facebook.
>Do you think only Twitter and Facebook should be allowed publish previews?
I think previews are worthless regardless, I thought I made that clear. Either you care about me linking it to you or you do not.
*EDIT: And just for fun, here is the link preview stuff from my latest skype call with my brother: https://imgur.com/a/yO5OP36
>when you paste a link on reddit and it autocompletes the title
Oh no, you have to copy/paste the title?
>update a bookmark title, or check if it exists.
I can access the site without a captcha, my browser can fetch the title.
>is it not self-evident that a link being crawlable is useful?
No, it is not. Maybe a site owner does not want crawlers to index the site?
Me being able to access the title and any html meta tags is not the same as some crawler being able to access it. It seems like your beef is with cloudflare and that is fine, but please state that that is your issue and don't try to frame it as something else. What I don't get is how everybody places the blame at cloudflare's feet. It is my choice as a host to use cloudflare and to use their protection features.
'I' can get the page title though. That's all. The End.
I don't care about your crawler. Or your ability to post the link to my site to twitter/fb and if I did maybe I'd revise my cloudflare settings.
1. https://community.cloudflare.com/t/attention-required-messag...