Hacker News
GitLeaks – Search engine for exposed secrets on GitHub (gitleaks.com)
108 points by mkagenius on Feb 17, 2017 | 59 comments


I guess that's one way to get attention to your business.

Instead of informing the owners of repositories by creating an issue, you create a search engine to expose them, and then ask to be paid for usage of this index? The only reason someone would want those secrets is to abuse them. This is basically the only use case for the data. Why do this?

This is coming from "fallible.co" whose homepage says "Prevented 40 million+ users personal data leaks". So you are in the business of making sure people's information does not get leaked, and at the same time expose people's secrets?


I strongly disagree with this reasoning.

Yes, more people will be burned by making this data more easily available to the public - but as a result of those people being burned, security for the community as a whole will be improved over time.

An example of this is what happened with Facebook. Prior to 2013, most users logged in to Facebook without using HTTPS. A Firefox-based tool (Firesheep) was released that sniffed for Facebook traffic over WiFi and snagged other users' cookies to allow for easy session hijacking. Shortly after Firesheep started getting press coverage, Facebook enabled HTTPS-by-default[1].

I think it's perfectly valid to argue whether or not the short-term harm caused by this sort of thing is justified by the longer-term benefit, but I don't think it's quite fair to say that the only reason to offer it is to enable abuse.

[1]: https://www.facebook.com/notes/facebook-engineering/secure-b...


Why not just write a quick script to send them all an automated message/ticket a week before this is live and then run it on archived data? You can do both at the same time.


There are non-abusive uses of this kind of data, e.g. security researchers, or IT departments outsourcing credential leak scanning, etc..

Also, notifying via a GitHub issue is, in my opinion, a terrible idea. GitHub has no concept of a security issue viewable only to the repo maintainers, so filing a public issue might make things worse (by calling public attention to it). A paid search engine without any notification is probably worse, but maybe they are emailing the repo's committers? They may even be embargoing the search results for a period of time.


If posting individual issues in each project, likely to be seen first by contributors, is a bad idea, how is creating a paid search engine, likely to be used by people who specifically want to find secrets and unlikely to be seen by contributors, a good thing?


I didn't say it was a good thing (or a bad thing), only that there are legitimate use cases and that the suggested notification method would be, in my opinion, terrible security practice...


Their FAQ indicates they update the index every two months, but no information on what they do with the data in the meantime.

Ideally this scanner would be a feature of a Github or a Bitbucket or a Gitlab, etc, itself. They could've decided to contact them to add this as a feature, or decided to contact repository owners, but instead they decided to sell the data publicly. Real shame.


I do remember reading something about GitHub preventing some keys from being pushed (AWS secrets etc?) But it's a vague memory from a long time ago!


Lol, wrong way of marketing your own business. Pure nonsense :/ Everyone knows there are secrets on GitHub.


Open source alternatives for Git repos (ideally run in the pipeline):

https://github.com/dxa4481/truffleHog - "Searches through git repositories for high entropy strings, digging deep into commit history"

https://github.com/ezekg/git-hound - "Hound is a Git plugin that helps prevent sensitive data from being committed into a repository by sniffing potential commits against PCRE regular expressions"

https://github.com/michenriksen/gitrob - "The tool will iterate over all public organization and member repositories and match filenames against a range of patterns for files that typically contain sensitive or dangerous information"

https://github.com/awslabs/git-secrets - "Prevents you from committing passwords and other sensitive information to a git repository"
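For anyone wondering what "high entropy strings" means in practice, here is a minimal sketch of the general idea. It is not truffleHog's actual code; the function names, the 20-character minimum, and the 4.0 bits-per-character threshold are assumptions picked for illustration.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Return bits of entropy per character, based on character frequencies in s."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_entropy_tokens(text: str, min_len: int = 20, threshold: float = 4.0) -> list:
    """Return whitespace-separated tokens that are long and look random.

    min_len and threshold are illustrative defaults (assumptions), not values
    taken from any particular tool.
    """
    return [
        tok
        for tok in text.split()
        if len(tok) >= min_len and shannon_entropy(tok) > threshold
    ]


if __name__ == "__main__":
    sample = "password = abcdefghijklmnopqrstuvwxyz012345 and aaaaaaaaaaaaaaaaaaaaaaaa"
    print(high_entropy_tokens(sample))
```

A real scanner would walk every commit in the history rather than a single string, and would need tuning to avoid flagging hashes, UUIDs, and minified assets.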


A lot of those require lists of regexes-- is there a canonical list of secret regexes somewhere?
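Not a canonical list, but as a rough sketch of what such a regex set tends to look like, something along these lines is a starting point. The pattern names and the `scan` helper are hypothetical, and the three patterns are only illustrative examples (an AWS-style access key ID, a PEM private key header, and a quoted password-like assignment), not a vetted or complete ruleset.

```python
import re

# Illustrative patterns only (assumptions); real tools ship longer, tuned lists.
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM-encoded private key header.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    # Something like password = "..." with at least 8 characters inside the quotes.
    "generic_assignment": re.compile(
        r"\b(?:password|secret|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def scan(text: str) -> list:
    """Return (line_number, pattern_name) for every pattern that matches a line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    sample = 'key = "AKIAIOSFODNN7EXAMPLE"\nprint("hi")\npassword = "hunter2hunter2"'
    for lineno, name in scan(sample):
        print(f"line {lineno}: {name}")
```

Regexes like these miss anything that does not follow a known format, which is why entropy-based checks are often run alongside them.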


Thanks for the shout out! I was wondering what brought in the recent stargazers. Happy to share my commonly-used regexes.


Woah, too many negative comments here. We wanted to model it like Shodan where we would provide a searchable interface for secrets on the web, starting with GitHub.

We are removing the search functionality and account upgrades right now until we can come up with a better solution to inform people about secret leaks. For now, you can simply use the existing Check my GitHub button to scan your public repos.


The data is public, there is absolutely nothing wrong with this and you should put it back online.


I absolutely agree. Most of the negative comments on here are nothing more than apologists for incompetence. Like with all security research the solution is to expose the defect to the world. If people didn't want their passwords and keys exposed to the world then they shouldn't put them on public github for all the world to see.

It would be more helpful though if such a search engine could auto create an issue on github when exposed secrets come up in a search result.


Most of the data on there is not meant to be public. It's just a tool to abuse people's ignorance, disguised as a "research tool".


Yet public it is.


Folks who actually exploit GitHub secrets have scrapers hooked onto GH API (so that if you notice you just pushed a password, quickly reverting it won't help you). IMHO you should re-enable the search functionality as it will ultimately make the developer community better at what it does.


Why not use this info to assign a "leak" score to repos that have such info? Don't give anyone the details via a search interface but do rank the various public repos by the number of such leaks. That way the repo owners get a fair warning and a reputation hit without exposing the details of what is being leaked.


HN can be an echo chamber. Keep it online. The world is bigger than HN.


I don't have a problem with it. The only reason you are getting negative comments is that there will be a few HN members biting their nails.


Of course, that's because you did something morally wrong.

This is different than what people label as 'echo chamber'. In there you'd have either 100% love or hate.

Having mixed responses verging toward hate says you screwed up and the general public doesn't approve of it.


This is part of the disclosure debate that's been going on in the security industry for decades now. Some people take an aggressive full-disclosure stance and believe every flaw should be publicized immediately and some take a non-disclosure stance and say flaws should never be published until they've become entirely irrelevant.

Most have come to the middle and settled on a "responsible disclosure" paradigm, where researchers notify the maintainers and work with them to set a reasonable timeline for the correction of the issue. The issue is publicly disclosed somewhere between 30-90 days after the private disclosure to maintainers; this gives them time to correct the issue and push out updates, and it also incentivizes them to fix the issue instead of sitting on it forever and allowing it to be exploited as a zero-day.

It would've been good to see this paradigm applied here; the search could've sent a message to the repository owner with a note that the result would become public in 60 days, and to ensure all keys had been rotated and that secrets were no longer stored in git after that point.

In any case, none of these people are operating from a morally dubious perspective. I would suggest you refrain from impugning their motives. Virtually everyone in the security community has the end goal of promoting secure software. Aggressive full disclosure advocates believe that their methods will work most effectively not only at getting issues that exist fixed ASAP, but also at ensuring companies adopt strong and safe practices moving forward, since there won't be second chances.


They didn't do anything morally wrong. The data is already public and easily searchable with a few regex tricks. They just made it a tiny bit more convenient but anyone that's thought of this as a source of credentials can easily scrape it themselves.

If anything what they're doing might help shine a light on how big of an issue this actually is and provide a helpful corpus of data to train algorithms on to detect this better.

The issue at this point is far too big to be able to go around and notify everyone about this. There are also plenty of repositories that are abandoned, or maintainers that are MIA, so you'll never be able to properly resolve all of it.


Morality depends on current life views, upbringing, societal norms etc. You may disagree and that is fully in your right.

However, account for the fact that not all HN readers are from the US. In other countries what they did is in some cases against the law (promoting/enabling criminal behaviour and activities, etc.).


I'm not from the US so that's already accounted for.


This is one of those times where you ask yourself "I know I can do this, but should I?". Most of us know we can search GitHub for stuff like AWS_ACCESS_KEY_ID but putting the work into creating a productized interface for it seems a bit beyond the pale to me.


Why not use your knowledge of these exposed secrets for good? You know which repo they're coming from, it'd be super simple to let the owner know rather than potentially costing them time and money.

It also seems as though the only use of this site is to capitalise on other people's mistakes? It looks like you're just handing over leaked data to people who will definitely abuse it, which seems to go against your core business of preventing data leaks?


Would it be considered spamming to pull the email address of the commits and send them an automated email?


Well, you don't necessarily need to send an email; a GitHub issue with a guide on how to avoid including sensitive data in a public repo would probably suffice.


I suspect it would be easy to secure GitHub's cooperation in this, but almost certainly not for money or via a black box.


Would it be against the GitHub terms of service to do such an email?


I agree with what you wrote. My two cents are: seems that getting attention goes against being nice to others, such a shame living in such a society.


Yet if people don't know the risk exists, they'll continue being ignorant and fucking up. Awareness is a good thing.


This isn't awareness though. This is like telling a specific set of people about all the houses near by that have their front door key under the mat. You only become aware of the issue when it is too late.


I don't think it's illegal or wrong to have that search. Those mistakes are made by developers who aren't paying attention to security, and from experience those leaks will never be resolved UNTIL they get widely exposed; until then, lots of those people will just shrug it off. You're actually doing a favour to the users who depend on those developers: you never know when the next leak will be, and it might be stopped by forcing the developers to fix it. It's not about you, your service, or the companies and developers who are leaking secrets like that; it's about the end users and people who are affected. My two cents: put the search back, expose it. It's already exposed, and black hats probably already have the secrets and don't want to notify the devs of their mistakes. Anyone who disagrees with you doesn't really understand how big this is. Your service is amazing and I totally appreciate the work you did. If someone thinks you're "getting attention" or "evil", they really are not looking at the big picture; the "evil" ones are the people who already have that leaked data and keep it for their personal use.


'tis better to be pwned and found out than to never realize that you've been pwned at all.

- Shakespeare or something


What a shame that you had to expose everyone's mistakes like this in such a blanket fashion.

You could've taken the moral high ground and created a reverse-search such as HaveIBeenPwned[0], whereby you check repos you own.

I hope this gets taken down because the potential for abuse is ripe.

[0] - https://haveibeenpwned.com


that's such a great idea, although I think people that would sign up for this service would know not to commit credentials


We all make mistakes despite better knowledge. I'd probably sign up.


sure, but it's more likely for somebody that doesn't know about this service to publish a secret than someone that is aware of it


Well, the service could auto-create an issue for each repository that contains secrets.


well, that's kinda the same: it facilitates finding the secrets, since it's easy to search for issues. I feel like a private notification is way better


Bad guys have already had scrapers like this for years, so this is really not putting anyone at any additional risk. They're already in danger, just not aware of it. Even script-kiddies have cheap tools available to scan repos easily. As I see it, the only new angle here is that this service lets ordinary users and other interested parties search for the f* ups and (if they care enough) let the project maintainers know about it.


I think this is great work, the secrets are already scraped and compromised anyway. Good way to make it more clear. It reminds me of https://twitter.com/dumpmon on Twitter.


Didn't take long from the proggit/HN 'removed password' post to gitleaks:

$ whois gitleaks.com | grep Creation

Creation Date: 06-feb-2017


GitHub searches that expose secrets have been posted numerous times already in the past.


Ethics of a business model that notifies owners of the breach, but they have to pay $10 for specific details or wait a week?


I accidentally pushed keys to github last year and got an email from HelpfulOwl letting me know they found them and to remove them.

That's an example of using this tech for good.


Sorry OP, but this is a pretty terrible idea. I know the secrets are already out there, but the least that could be done is to let the user know about it.

I am glad to see the search was taken down. There's nothing wrong with the search, but a better use of it would be to educate and inform. I'd be curious to see which kinds of developers are the most likely to leak sensitive data.


I think a more ethical way to go forward with this would be the haveibeenpwned way, where you can search your email and see where your stuff has been leaked instead of a searchable index of leaks.


Problem is that often project owners will not know/care about joining such services. Unlike passwords, a project's security is a matter of interest for a much wider public (all current and potential future users), but if you let anyone subscribe to any leak then you are back at square one, because bad guys can do it too.


Are there any legal ramifications for operating something like this?

I know it's publicly available info but since the original creator of the information didn't directly give it to you, do you still have the usual immunity given to service providers?

Also, just because something is on $PUBLIC_URL doesn't mean the copyright would allow you to redistribute it. I'm sure a lot of these projects have either a private license, or more likely, no license at all.


Is this different from any other search engine? They just index web pages and let users search the data?


I'm not sure, but I think intent matters. That's how they go after torrent sites, right, even though they're "just search engines"?


An interesting aside whilst the search is down, your styling seems a tiny bit messed up for me on the homepage [0]

I'm running Chrome, Win 7 on a mildly large display, nothing particularly out of the ordinary.

[0] https://puu.sh/u7htR/3fe6b4725d.png


Weird. Just yesterday this made it to the HN front page (related): https://news.ycombinator.com/item?id=13650818


I support disclosure on time :)


Huh, that was quick!



