In my experience, it's because the tools that are given to the dev team are opaque, poorly documented, made for a general case that is rarely sufficient, and usually locked down with some role-based access control to the point where, even if I can figure out what's wrong with some pipeline, I rarely have the access to change anything. I have an AWS CodePipeline that I can see running, but the logs from each step are sent to a CloudWatch instance running in another AWS account, which only the devops folks can view. I have a Jenkins pipeline that was cookie-cutter from what the devops team offers. If I wanted to actually grok what's occurring during the pipeline, I'd have to jump between 3 or 4 GitHub repos, and also account for the versions/branches on each one.
It really feels worse than back in my WebSphere days, where in order to see app logs, we had to submit a ticket to the hosting teams and wait 2 days for a .log file to be sent back through email. I'm consistently left wondering if we've actually gained anything.
Yeah, one of the reasons I left a previous team was the company's attempt at embracing this. The standards were set down from a central platform team, the access control was held by department-specific operations teams, and developers were now "responsible" for the production deployments but granted neither the agency nor the access to do shit about it. So you'd get paged about an issue with a load balancer setup that is understood by 2 guys on the platform team and configured by a different guy, and he'd have to send you screenshots of the config while you dug through the source of the platform team's tool, because of course there's no docs.
I think the above is what happens when a devops tools team doesn't do discovery and user interviews.
We've been at a point of software and development where we can build pretty much anything. But people are still making the same mistakes in figuring out what we should build.
Fix the cause, not the symptom.
Edit: And because I know it's coming... if IT security (or insert other org here) is the reason things can't be better, then it becomes the devops tools team's job to cajole them into doing it anyway, on behalf of their users.
I understand where you're coming from, but I don't think you really need "discovery and user interviews" to tell you that app developers need to be able to see their deployment logs, or that having your deployment code spread out across 3-4 repos with frequent changes makes it harder to troubleshoot. I think a lot of people might read this comment as blameshifting, where obvious flaws in the engineering and reliability of the system are blamed instead on a failure to define the specifications properly ahead of time.
Here's the reason this situation persists, 9 times out of 10 -- no one realizes it's happening.
No one realizes some dev teams can't see their own deployment logs. No one realizes teams don't have the correct Jenkins access. No one realizes there's a mysterious gatekeeper to use the corporate time series database as a service. No one realizes all new containers are transferred to a few people who have the rights to publish them.
And when I say "no one realizes", I mean two things. One, the people who do know, because they're the ones trying to do it, have given up bitching about it, because they did for years and nothing happened. Two, the people who can actually change things don't know, either because they actively ignored / forgot or because the requests never reached them (usually because a middle manager relaying the request didn't understand what was actually being asked).
And so... the status quo prevails, and everyone has a vague sense that something isn't working, but doesn't know what to do about it.
And to that problem, the best solution I've found is to get the fixer and the problem user as close as possible, wherein the problem is usually revealed as a relatively simple change or feature oversight. And for that, discovery and user interviews are the most reliable method.
> And to that problem, the best solution I've found is to get the fixer and the problem user as close as possible
You mean put development and operations together, into some kind of dev..ops.. :)
I'm sorry for the snark, it was not meant as disagreement with what you said. The whole idea of the role was to try to remedy that exact issue, but old habits die hard and sometimes things just change names but stay the same. I shouldn't throw stones, I'm struggling at our shop to steer outside of that behavior as well.
I actually think it's an issue of company structure. Once you concentrate ALL deployment/operational power in a single DevOps team, you lose the ability to serve individual teams and to allocate headcount clearly. This is more of a "cost-center" move that implicitly signals that company X needs a single platform and individual requirements are not going to be prioritized.
The same thing with a single DBA or whatever admin team. My dream team structure is a task-force like one. Each team has its own DevOps/DBA. They don't have to be dedicated roles, but someone in the team needs to be well-trained on those topics.
But I guess this model never scales in larger corporations so eventually they all adopt the One X-admin team model.
> My dream team structure is a task-force like one. Each team has its own DevOps/DBA. They don't have to be dedicated roles, but someone in the team needs to be well-trained on those topics.
Looking back at the last twenty years, oscillating between central sysops and everyone doing their own thing (is that "devops"?), I feel that an optimum requires both: local skills to iterate fast, central skills to consolidate. Striking a budgetary balance is difficult, but at least ensure that some local skills remain -- sell it to central ops as a way to get the trivial demands off their backs!
IMHO, the "centralized team" + "decentralized liaisons" model isn't explicitly used nearly as often as it should be.
It usually happens in practice ("Sarah knows someone in security"), but there's no reason "devops liaison" can't be an explicit 2nd+ hat for someone on a team to wear.
Centralized teams are necessary for budgetary reasons: one person on every team asking for a devops product will never make a coherent case to management for why it's needed, but a team will.
But having been on both sides of the equation, the central-decentralized relationship usually breaks down because there's always a new face on the right side of that equation, and that requires reteaching / recoordinating all the basics. More consistency on the user side (a hat) helps significantly.
And ultimately, it's a trust problem: do I (central person) trust this person who's asking me for something? If they're new to me... probably not very much. If I've been working with them for a while... probably a lot more.
Agreed. And I don't really see a good solution for this. I guess we have to recognize that when companies grow and especially when structures are moved around, the original "agile" efficiency is not going to be there so we need to slow down in whole.
I'd look at UX books. They've got more formal approaches.
Personally? Make a list of all the teams that use your product. Schedule interviews (30 minutes to an hour for initial) with a few that use it the most, and a few that use it less. Ask them what the best and worst parts are. Ask them details about the things they bring up, and prompt about related areas they might be struggling with too.
If you get any repeats from team to team, assume those are systemic issues and try and find a solution.
The hard part usually isn't figuring out the nuances, but rather realizing that someone wants to do something at all and/or can't because of some restriction. Assume you know nothing about how they actually do what they do and listen closely.
To add on to ethbr0's good response, a few other things to do:
1. Find early adopters of the thing you're building who are willing to use and give feedback (bugs, friction logs) in the early stages of the feature.
2. Build relationships with your users / VIP users and meet regularly to discuss friction points and upcoming features (feasibility, etc). To make it time efficient, make sure you're talking to staff engineers who can speak on behalf of a significant portion of your user base. Make sure your team is closing the feedback loop with these engineers (i.e. shipping things that address their concerns) in a timely fashion.
Sending logs to a dedicated AWS account is usually done for compliance. That account will have rules set via AWS Organizations policies that limit IAM user rights and protect S3 buckets from being tampered with. I don't see why it should be necessary to ban read-only access to these logs for troubleshooting, though.
Anyone here know a good legal, compliance or best practice reason for blocking all access, or did the infra/sec team just get carried away here?
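For what it's worth, tamper protection and read access aren't mutually exclusive. A rough sketch of what a compliance log bucket's policy could look like -- deny deletion to everyone, but still allow a developer role read-only access. All ARNs here are made-up placeholders, and a real setup would layer on Organizations SCPs, versioning, object lock, etc.:

```python
import json

# Hypothetical ARNs -- substitute your own log bucket and dev role.
LOG_BUCKET_ARN = "arn:aws:s3:::example-compliance-logs"
DEV_ROLE_ARN = "arn:aws:iam::111122223333:role/DevReadOnly"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Tamper protection: nobody (admins included) may delete log
            # objects or swap out this policy via the bucket API.
            "Sid": "DenyTampering",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
                "s3:PutBucketPolicy",
            ],
            "Resource": [LOG_BUCKET_ARN, f"{LOG_BUCKET_ARN}/*"],
        },
        {
            # Troubleshooting access: developers may list and read the
            # logs -- nothing more.
            "Sid": "AllowDevReadOnly",
            "Effect": "Allow",
            "Principal": {"AWS": DEV_ROLE_ARN},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [LOG_BUCKET_ARN, f"{LOG_BUCKET_ARN}/*"],
        },
    ],
}

print(json.dumps(policy, indent=2))
```

The point is that the "logs must be immutable" requirement lives in the Deny statement; adding the read-only Allow doesn't weaken it, since explicit denies always win in IAM evaluation.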
IMO the biggest issue other engineers tend to have with security is that it needlessly restricts them to the point where their job sucks. It's just as important to avoid "too much restriction" as "too little restriction".
> Anyone here know a good legal, compliance or best practice reason for blocking all access
The philosophies include:
1. Applying a maxim like "Principle of Least Privilege", or "That which is not allowed, is forbidden". Maybe through choice, maybe due to standards like PCI-DSS and SOC2.
2. Blocking access isn't a problem, because we're going to grant access requests quickly. Just as soon as we fill these five vacancies on the team....
3. Surely bugs making it to production will be a very rare event? Aren't these developers testing properly?
4. Don't we have test environments for troubleshooting? Isn't the heart of devops that you have a test pipeline that you have confidence in? If your developers think they're likely to deploy buggy software to production, maybe you aren't ready for devops...
5. Any large organisation will, at some point in its history, have had some chump run their test code against the production system and send every customer an e-mail saying "Hello World". If that happened when you had 30 developers, I doubt hiring standards are any higher now you have 300 developers! Stopping that happening again is common sense professionalism.
6. Would you want Google's 100,000 engineers to have read-only access to your gmail account? Of course you wouldn't, you've got loads of private shit in there. And if you were CEO, would you want an intern on a 2 week placement to be able to download the entire customer database? Of course you wouldn't. People having access - even read-only access - to the production system is a bad thing.
7. Locking down a lot of things gives the security team more political power, as every request to reduce restrictions or expedite a request is a favour owed.
Yea I completely understand the complexities that emerge when you're doing this stuff for a ~1500 person organization. And I know that overwhelmingly the people building these systems are talented and competent and have a stake in things and take pride in what they're doing. My comment was really more in response to the tone of the parent comment which insinuates that these helpless devs just can't figure out what the fuck is going on.
Coming from a team that builds and supports an end-user suite of applications: I'm not asking you to bend over backwards like I would have to for a client. But at least don't enter into conversations with that attitude. You are in a service organization. You deliver value insofar as you can help the next link in the chain.
Yeah that's also part of our job, to expose the right info to developers so they don't have full access to everything but can still do their job. In my case that involves shipping logs to Grafana so they can view them there. In AWS it wouldn't be hard to create a policy for IAM users so the devs can view logs in the AWS console.
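To illustrate (a hypothetical sketch, not anyone's actual policy): the read-only access described above maps to a handful of CloudWatch Logs actions. The account ID and log-group prefix below are placeholders, and some `Describe*` actions may need a broader `Resource` than shown:

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Devs can browse log groups/streams and read or search
            # events for their app -- no write or delete actions granted.
            "Sid": "DevsCanReadTheirAppLogs",
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:GetLogEvents",
                "logs:FilterLogEvents",
            ],
            # Hypothetical: scope to the team's log-group prefix.
            "Resource": "arn:aws:logs:*:111122223333:log-group:/my-app/*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Attach something like this to a dev group or role and the AWS console's log viewer just works, without handing out anything the security team would object to.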
You're right, it would not be hard. But I don't have the keys to do that. I would have to talk to someone on the devops team and I would hope they wouldn't write me off as some dumb developer who only knows how to code, whatever that means.
Agreed. I approach shared services like a customer because that is all the power I have. It is supposed to work; that is all I know or have time to deal with. Off to the Slack support channel... if you are real lucky.