urandom Mangot ideas

tech.mangot.com

An Agile SRE Meeting Plan

sre-schedule

Engineers dislike meetings. What engineers really dislike are meetings for which they perceive no value. Below is a meeting plan that was developed, iterated upon, and used over many years at multiple companies. It has proven very effective at both maximizing meeting value and minimizing unnecessary time in meetings, so that engineers may do what many enjoy most: building things. This may not be the perfect plan for your organization, but it will hopefully inspire conversation about how to structure the time of your SRE team.

The Structure

Over the years, I’ve experienced many different Agile implementations. Scrum is considered to be a pretty poor match for interrupt driven teams like Site Reliability Engineers (SREs), but how to get the agile benefits of Kanban and still retain many of the advantages of Scrum? How to have a schedule that is relatively light on meetings, but still keep the maximum amount of communication and transparency? How to continue to be agile, instead of just doing agile, especially with distributed teams?

In the plan outlined herein, we try to balance many of those things. We lay them out as Monday to Friday, but they could certainly be Tuesday to Tuesday, or whatever fits best to line up with the development team’s sprints. (Protip: line up as best you can with the development team sprints.) When we first embarked on this path, we were on two week iterations, to match what the dev teams were doing. Over time, we discovered that we lacked the responsiveness we wanted to provide to those teams, and thus switched over to one week iterations, in order to maximize (internal) customer satisfaction.

The iteration starts off a bit meeting heavy to emphasize alignment, and then allows for plenty of time for our standard SRE work (elimination of toil, etc.), and finally closes the iteration with time to reflect and improve.

The Meetings

Iteration Planning

Purpose

The iteration planning meeting is, well, just what it sounds like, planning the iteration. Because SRE teams can be interrupt driven and use Kanban, iteration planning is not for committing to delivering specific work in a time box (like Scrum). Instead, it’s for making sure that the entire team is on the same page in terms of priorities, needs of the business, project work assignment (e.g. Anne really wants to work on the VPC network project), dependencies on other teams, urgency of different tasks, and looking forward toward the next few weeks and what work may be taken on.

This is really a time for discussion, and for identifying things that will require more in depth discussion. It is not a time for going deep on any particular task, but to make sure that everyone on the team is aligned for the next batch of work. Oftentimes, work from the previous iteration will simply continue in the current one, but this is also a good time to check that the work is being delivered to expectations, especially in light of the demos (which we’ll get to later) that closed out the previous iteration.

Because we want to be agile, instead of simply doing Agile, this doesn’t mean that once work is agreed to, it’s set in stone for a week. It just means that we don’t want to oscillate wildly from day to day, or even week to week, and the iteration planning meeting is the opportunity to ensure the team is moving in the same direction simultaneously. Because we are “walking the wall” while negotiating tasks, this is also a great time to recognize blocked work and any of the “time thieves” Dominica DeGrandis (@dominicad) describes in her book Making Work Visible.

At the end of this meeting, the team leader should have worked out with the team a balance of work being requested by other parts of the business and work proposed by the team itself. In some iterations, other teams’ requests will occupy more time; in others, the needs of the team will take priority. A successful manager will navigate the balance between the two, ensuring that the needs of the business are being met while simultaneously allowing the team to reduce toil, perform chaos engineering experiments, collaborate with other teams, etc.

Mechanics

The iteration planning meeting should begin with an already prioritized Kanban board. The team can negotiate changes to the priorities during the meeting, but this is not a time for debating what it is that the business values. Depending on the size of the team, and the amount of discussion needed of specific tasks, this meeting should take no more than one hour. Past this point, engineers will lose focus and interest and “just want to start getting things done”.

  1. Holdover discussions from previous iteration
  2. Explanation of the top priorities from the business
  3. Explanation of the top priorities from the team
  4. Identification of merged prioritization
  5. Identification of resources working or interested in tasks
  6. Assignment of work if no resource volunteering for necessary work item
  7. Parking lot

Daily Standup

Purpose

The daily standup has the same purpose as it does in Scrum: to keep the team in close alignment with respect to deliverables, and to identify any items that require assistance from other team members or management, in order to keep the team operating at its highest velocity.

This plan only has standup during the middle of the iteration because the beginning and end already have time for team discussion with the iteration planning and iteration review meetings.

The daily standup should be restricted to the work at hand and not devolve into in-depth discussions on specific tasks which extend the meeting and hold the rest of the team hostage for the duration of the discussion.

Mechanics

  1. What did I do yesterday, what am I working on currently, identification of blockers. (for each team member)
  2. Parking lot

If working with distributed teams, one might want to allow the standup to extend out to a full 30 minutes to allow the team to socialize with one another and therefore build some of the bonds you would otherwise get by colocation. In that case, the standup should conclude as soon as parking lot is complete.

Inter-team Sync

Purpose

If good intra-team communication can be difficult to do well, then good inter-team communication can be even more difficult. Couple this with dependencies between teams, and one can easily see the need to set aside an agenda specifically for this purpose.

The Inter-team Sync is to ensure close coordination and transparency between the SRE team and their primary customer. We don’t want to fill each iteration with sync meetings between the SRE team and every customer they may have, as that number may be very large and frequent context switching is a major impediment to delivery of work. But the team that works most closely with the SRE team should have a short meeting to discuss work in progress, dependencies, upcoming projects, etc. In this way, we attempt to ensure that both teams are working at the maximum safe velocity, and to minimize misunderstandings as well as the “conflicting priorities” and “unknown dependencies” time thieves (see DeGrandis above).

The Inter-team Sync is how DevOps is done!

Mechanics

The Inter-team sync meeting should be no longer than 30 minutes. Any discussions of deep architectural questions should be put on the agenda for the Architecture Meeting. There should be an agenda (we like Kanban boards for this purpose) created and maintained by the team leads or managers, that is widely available, which anyone can contribute to, that tracks all the work items shared between the two teams, especially dependencies.

The person who runs the meeting simply “walks the wall” until there are no more items to sync upon and then ends the meeting.
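As a rough illustration of “walking the wall”, here is a minimal Python sketch of a shared board being walked item by item. Everything in it is an assumption made for illustration: the column names, the Card fields, the sample cards, and the choice to walk the rightmost column first (a common convention, not something this plan prescribes).

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

# Hypothetical column names, ordered left to right on the board.
COLUMNS = ["Ready", "In Progress", "Review"]


@dataclass
class Card:
    """One work item shared between the two teams (illustrative fields only)."""
    title: str
    column: str
    owner_team: str
    blocked_on: Optional[str] = None  # team this card is waiting on, if any


def walk_the_wall(cards: List[Card]) -> Iterator[Card]:
    """Yield cards column by column, rightmost column first.

    Within a column, cards keep the order they have on the board.
    """
    for column in reversed(COLUMNS):
        for card in cards:
            if card.column == column:
                yield card


def blocked_items(cards: List[Card]) -> List[Card]:
    """Return the cards that are waiting on another team, in walk order."""
    return [card for card in walk_the_wall(cards) if card.blocked_on is not None]


# Made-up sample board.
board = [
    Card("Migrate CI runners", "In Progress", "SRE", blocked_on="Dev"),
    Card("Add service health endpoint", "Review", "Dev"),
    Card("VPC network project", "Ready", "SRE"),
]

agenda = [card.title for card in walk_the_wall(board)]
blocked = [card.title for card in blocked_items(board)]
print(agenda)
print(blocked)
```

The meeting ends when the walk is exhausted; the blocked list is simply a way to make cross-team dependencies stand out while walking.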

Architecture “Arch” Meeting

Purpose

If the iteration planning meeting and the daily standup are not the place for in-depth discussion, then the weekly arch meeting is exactly the place for such discussion. This is the forum for any deep technical discussions on the SRE team. This is also a forum where members from other teams can either be invited or be regular attendees to give guidance, ask questions, provide clarification, etc. of work with which the SRE team is tasked. In other words, DevOps!

The outcomes or inputs to the arch meeting are often technical specifications, diagrams, documentation, requirements documents, and experiments. This can be a time for senior staff to give feedback on proposals to other, both senior and junior, members of the team. This can be a time to solicit opinions from the group on a new or existing technology or to review past postmortems. This can be a time for helping to figure out how to navigate toward a long term goal. The opportunities are very wide open (by design), but the goal should be that by the end of each arch meeting, the entire team should have taken a step forward toward achieving the goals of the team and of the business.

I’ve heard the phrase “let’s table this and add it to the agenda for the arch meeting” more times over the years than I can count. This is another opportunity for the team to ensure they are highly aligned as they move into the meat of the iteration.

Mechanics

The agenda for the architecture meeting tends to build itself over the course of the previous iterations. We always use a simple kanban board or Google Doc for keeping track of proposed topics. The person running the meeting can cover each topic in turn, or the meeting can be run Lean Coffee style. If someone has an especially important topic for discussion, it can be moved to the beginning of the meeting or to the end (to allow more time for open ended discussion). It is really up to the attendees to determine which best suits the style of the teams involved.

Demos/Retros

Purpose

Students of Gene Kim’s Three Ways of DevOps know that the 2nd way is all about feedback. In order for this to be successful, we need to set aside time in our week (in the form of a meeting) to specifically enable that feedback to occur. The Demo/Retro meeting has two purposes:

  1. Have the team demo the work they accomplished (not necessarily completed) during the iteration.
  2. Have a retrospective to discuss how to improve the team in a psychologically safe environment (see re:Work)

The demo allows the team to get fast feedback on work that is completed or already in progress. There is a saying in Agile, “maximize the work not done”, which reminds us to spend our time on work that is critical to our success. If someone is delivering a project that does not meet our needs, we’d like to give them that feedback before they finish the project so they can adjust course, not after all that work is complete. The bar for a demo is extremely low; unit tests, working demos, command line tools, and a single API call are all acceptable demos. The point isn’t to dazzle, the point is to demonstrate working code.

The retrospective (retro) gives the team the space to improve in the kaizen fashion. We follow the traditional retrospective format (what went well, what did not go well, what could be better) with some modifications. The goal is for each team to be higher performing at the end of every year than they were at its start. By setting aside a safe space for the team to talk about how the iteration went for them, and how the team can improve, we are creating an environment that fosters and encourages that improvement.

Mechanics

The demo part of the meeting should be open to all. Any stakeholder that wishes to participate should be able to attend. Borrowing from a technique I developed with Greg Oehman at Salesforce, we always record the demos and post them somewhere afterwards (wiki, Google Drive, etc.) so that anyone who was not able to attend will be able to see the demos. This is critical if you have a globally distributed organization where time zones make attendance for all a challenge. However, the feedback from those folks can be invaluable to making sure we deliver the right work on time. Again, we’re trying to create an environment that maximizes transparency.

In the retrospective part of the meeting, only the team members should participate (in Agile terms, only the pigs). There should be no executives or project managers attending this part of the meeting. It is strictly for those who need a psychologically safe space to have open and honest conversation in order to move the team forward, or discuss problems without any fear of retribution or interference. Team members fill out a shared document, or perhaps their own document, with their thoughts about the iteration (what went well, what did not go well, what could be better). Then each member in turn has an opportunity to read their contribution and explain in greater detail so that they know that they have been heard. During this section of the meeting, clarifying questions can be asked, Arch meeting agenda items can be added, etc.

When holding the demo/retro at the end of a week, we like to have the team spend the rest of the day working on documentation, testing in staging, development, etc. Basically anything that does not touch production before a weekend.

Conclusion

Finding a cadence upon which to work as an engineer can be difficult. As engineers are generally averse to meetings, oftentimes we wind up with sporadic meetings and a lot of people who are unclear on their priorities and goals. On the other side, we can find ourselves in environments that are extremely meeting heavy, with engineers often left wondering when there will be time to actually do the work they believed they were hired to do. The establishment of only necessary meetings, at specifically defined times, allows engineers to plan their time to minimize context switching, and to maximize the time invested in their meetings with one another.

This plan is certainly not a one-size-fits-all solution, but is deliberately broad and flexible to allow modification to fit into your organization, while being prescriptive enough about the purpose of each interaction to allow for different implementations that accomplish the same goals, namely: transparency, collaboration, agility, and effectiveness.

I hope you are able to use it to advance the capabilities and success of your SRE teams.

Thanks

The thinking demonstrated in this plan has evolved and will continue to evolve over the years. There is no way it would have been possible without specific input from many trusted friends, and co-workers. I’m in debt to Evan Wiley (@absoludicrous), Peter Haggery, Peter Norton, the SFDC ISD team, Jeff Frasca, SWI Cloud SREs, Jathan McCullum, Mauricette Forzano, Eric Rapin, Stuart McCulla, Kelly Courier, and John Irwin.

Why I’m Leaving Tech for Healthcare

I’m leaving tech. Not leaving technology, I just want to leave technology for technology’s sake. I’ve spent a lot of my career working for “tech” companies and helping to advance the state of tech, or DevOps, or Operations in the industry. I’m looking for an engineering leadership role in healthcare.

Background

My first job out of college was as a research assistant/programmer for the National Institutes of Health in an Alzheimer’s disease research lab. Every day, no matter how bad it was, no matter whether my C code wouldn’t compile, or if the network was down, or how much trouble I had finding “normal controls”, I had done something to help people. We talk a lot in our industry about how we are “changing the world”, which is an admirable goal, if that’s indeed what we’re doing. Often however, it’s just changing the state of technology which may or may not have a positive effect on the world. I want to go back to doing something that has a tangible benefit every day, not a theoretical one.

Things I’ve Learned

I’ve spent a few months now just trying to get my own research done and in order. I’ve read a lot, I’ve talked to a lot of people at a lot of different companies. What I’ve learned in a short time breaks down into two categories.

  1. Established companies. These are companies that have been in the healthcare space for a while. As one person I talked to described it, there are large parts of some of these companies that are 10 years or more behind in technology. They may be running some K8s, but they are also just as likely to be running COBOL.

  2. Moonshot startups. These companies have the intention of changing something massive about the industry. Just like any startup, some have more traction than others, but all see large opportunities in front of them. These tend to be smaller and have far fewer legacy artifacts to contend with.

Based on my background, I have a small bias towards the established players.

Why You Should Hire Me

I’ve made a career out of advancing the state of operations and software delivery at companies large and small. If every company is a software company, then at least I would hope those skills would be broadly applicable.

I’ve led DevOps transformations for companies as big as Salesforce and have worked on improving dysfunctional culture at small startups.

One of the CTOs I’ve worked with and I were talking about what I was doing at his company and why. My answer was that they were allowing me to build an engineering organization in a way that I would want to work as an engineer. An organization that values transparency, diversity, work/life balance, Agility, empowerment, and talent. I feel the results spoke for themselves. We were able to sign an engineer who had interest from 18 companies and had narrowed it down to 3. Out of the 3 he chose us because we had the culture described above that was exactly what he’d been looking for. Even months after he started, he still talked about how he’d made the right decision.

If that’s the kind of culture you’d want at your company, if you want to be able to attract top talent and have an organization that reaps the benefits described in the 2018 State of DevOps Report, and you’re in the healthcare space, then we should definitely have a conversation.

I’ve talked to some companies that were worried they had too many challenges for an engineering leader like me to want to work there. That is neither an exclusion nor an acceptance criterion for me. I’m looking forward to the discussions.

Thank you.

Showing a Gigabit OpenBSD Firewall Some Monitoring Love

I have a pretty long history of running my home servers or firewalls on “exotic” hardware. At first, it was Sun Microsystems hardware, then it moved to the excellent Soekris line, with some cool single board computers thrown in the mix. Recently I’ve been running OpenBSD Octeon on the Ubiquiti Edge Router Lite, an amazing little piece of kit at an amazing price point.

Upgrade Time!

This setup has served me for some time and I’ve been extremely happy with it. But, in the #firstworldproblems category, I recently upgraded the household to the amazing Gigabit fibre offering from Sonic. A great problem to have, but also too much of a problem for the little Edge Router Lite (ERL).

The way the OpenBSD PF firewall works, it’s only able to process packets on a single core. Not a problem for the dual-core 500 MHz ERL when you’re pushing under ~200 Mbps, but more of a problem when you’re trying to push 1000 Mbps.

PF on ERL

WELP.

More power!

I needed something that was faster on a per core basis but still satisfied my usual firewall requirements.

Loosely:

  • small form factor
  • fan-less
  • multiple Intel Ethernet ports (good driver support)
  • low power consumption
  • not your regular off-the-shelf kit
  • relatively inexpensive

After evaluating a LOT of different options I settled on the Protectli Vault FW2B. With the specs required for the firewall (2 GB RAM and 8 GB drive) it comes in at a mere $239 USD! Installation of OpenBSD 6.4 was pretty straightforward. The only problem I had was that Etcher did not want to recognize the ‘.fs’ extension on the install image as a bootable image. I quickly fixed this with good old Unix dd(1) on the Mac. Everything else was incredibly smooth.

After loading the same rulesets on my new install, the results were fantastic!

Protectli throughput

Monitoring

Now that the machine was up and running (and fast!), I wanted to know what it was doing. Over the years, I’ve always relied on the venerable pfstat software to give me an overview of my traffic, blocked packets, etc. It looks like this:

pfstat

As you can see, it’s based on RRDtool, which was simply incredible in its time. Having worked on monitoring almost continuously for most of the past decade, I wanted to see if we could re-implement the same functionality using more modern tools, as RRDtool and pfstat definitely have their limitations. This might be an opportunity to learn some new things as well.

I came across pf-graphite, which seemed to be a great start! It had everything I needed, and I added a few more stats from the detailed interface statistics, plus the ability for the code to exit so it can be run from cron(8), which is a bit more OpenBSD style. I added code for sending to some SaaS metrics platforms but ultimately stuck with straight Graphite. One important thing to note is to use the Graphite pickle port (2004) instead of the default plaintext port for submission. You will also need to set a loginterface in your ‘pf.conf’.
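To make the pickle port note more concrete, here is a minimal Python sketch of the Graphite pickle wire format as the Graphite documentation describes it: a pickled list of (path, (timestamp, value)) tuples, prefixed with a 4-byte big-endian length. This is not code from pf-graphite. The metric names, the sample values, and the host name in the usage note are all hypothetical.

```python
import pickle
import socket
import struct


def build_pickle_message(metrics):
    """Frame a batch of metrics for Graphite's pickle receiver.

    metrics is a list of (path, (timestamp, value)) tuples.
    """
    payload = pickle.dumps(metrics, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte unsigned big-endian length
    return header + payload


def send_to_graphite(message, host, port=2004):
    """Send one framed batch to a carbon pickle listener (2004 by default)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(message)


# Hypothetical metric names and values; a fixed timestamp keeps the example
# reproducible (a real collector would use int(time.time())).
timestamp = 1546300800
metrics = [
    ("firewall.pf.states.entries", (timestamp, 42.0)),
    ("firewall.pf.counters.blocked_packets", (timestamp, 7.0)),
]

message = build_pickle_message(metrics)
print(len(message))
```

Sending would then be a single call such as send_to_graphite(message, "graphite.example.net"), where the host name is a placeholder. Batching several metrics into one framed message is the practical difference from the plaintext port, which takes one metric per line.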

A bit of tweaking with Graphite and Grafana, and I had a pretty darn good recreation of my original PF stats dashboard!

pf grafana pf grafana2

I’ve added the JSON for the Grafana dashboard as well as some other changes to my fork of the repo.

Because it’s a Grafana dashboard, you can use it with many different backends if Graphite doesn’t suit you.

Hope you find it useful!

DevOps Across the Enterprise: Moving Past Dev and Ops

This post was originally published on the Librato Blog on March 24, 2015.

Last year, when talking to Patrick Debois about what the organizers of DevOpsDays Belgium were looking for in a talk, he told me he wanted to have conversations that move the movement forward. Having just finished a stint at a major enterprise corporation, Dave Zwieback suggested I talk about what it’s like to try and do DevOps at scale. This is my attempt to combine those two ideas.

Enterprise DevOps

I gave a talk last year with Reena Matthew at Gene Kim’s DevOps Enterprise Summit (#DOES14) called On the journey of an Enterprise transformation, Quality is still Job 1. I thought back to all the different speakers I’d seen at the Summit and what they were trying to accomplish. There seemed to be a notion that because DevOps practices were not pervasive throughout the entire organization, but maybe only in one silo or two, that they were doing something slightly different, they were doing Enterprise DevOps. Soon after there was an amazing blog post by Dave Roberts called “Enterprise DevOps” Doesn’t Make Sense. His post really spoke to me, especially this part:

“Enterprise DevOps doesn’t make sense because it confuses the forest for the trees. Those advocating for Enterprise DevOps make the mistake of focusing on specific tools and solutions (Jenkins, Kanban, etc.), then find them wanting for various (extremely valid) reasons in an enterprise context, and then reject DevOps in favor of Enterprise DevOps, which they suggest is somehow different.

But DevOps is about flow and continuous improvement, not about specific solutions. Flow and continuous improvement are equally applicable to a large enterprise as they are to an agile web startup. And if you miss that, you’re lost, regardless of the tools and solutions you choose.” - Dave Roberts

Dave was right: we are trying to accomplish the same goals, it’s still the same DevOps. My own view of DevOps in the enterprise (as opposed to Enterprise DevOps) has to do with differences in scale. If you were to work for a company in operations and they asked you to set up a monitoring solution, you would make very different choices if you were to monitor 60 cloud instances in AWS vs. 30,000 Dell servers spread across 6 data centers on 3 continents. In both cases, you are still setting up “monitoring”, but the infrastructure required to ensure a successful deployment differs radically. So it is with DevOps across the enterprise.

Systems Thinking and Transparency

I believe there are two main criteria that are required when talking about the success of DevOps across the enterprise: systems thinking and transparency. I would argue that most of the things that we recognize as DevOps can likely be represented as manifestations of these two main ideas.

From systems thinking we get things like continuous integration pipelines, where one is trying to optimize the flow of software artifacts through the entire system. Configuration management is another instance where we are not tinkering with the configuration of an individual node anymore, but are making changes that can have global effects from a single code change. Even the DevOps emphasis on MTTR speaks to looking at ways to design our systems so that recovery is as close to instantaneous as possible, at least from the perspective of the end user.

Similarly, we can see examples of a focus on transparency in many DevOps principles. Shared revision control allows anyone with access to the code base to not only see any of the code running in production, but even see how it’s changed over time. All agile methodologies have their foundations in transparency, which allows us to gain insight into both bottlenecks and blockers and work to clear them as expeditiously as possible. Almost anything that fits under sharing in the CAMS model by Edwards and Willis can once again be classified as transparency.

The 1st Way

Some may argue that we’ve not even solved the collaboration issues between Dev and Ops yet, but I think it’s time we start taking our DevOps discussion wider, to encompass the entire system. We need to build upon what we’ve already been able to establish to date.

Most students of DevOps are familiar with Gene Kim’s The Three Ways: The Principles Underpinning DevOps. I hold Gene in the highest regard, and I think we can do even better.

The traditional representation of the 1st way looks like this:

The First Way

While I agree with Gene that the 1st way should focus on Systems Thinking, I feel the diagram only focuses on a narrow part of the system. The first way shows the business as being represented by Dev and the Customer represented by Ops. I think a more accurate representation of the system would look more like this:

The New First Way

Our system actually starts with the customer and ends with the customer. The purpose of a business is to determine what a customer wants, or what a customer would want, and to optimize the delivery of that product so that the customer will buy the product. That’s what we aim to accomplish in DevOps. It is just as important to make the lives of a million IT professionals better, a cause to which Gene Kim has dedicated a large part of his life. We know that burnt out, bitter professionals do not perform at their peak, which means that making those lives better is an essential part of our system. Ultimately, we’re trying to enable our businesses to be successful, to turn a profit. That’s why we’re paid to show up to work every day. That’s why the systems thinking that we need to apply must encompass the entire system. Our systems do not merely start with Dev and end with Ops.

Traditionally when we’ve talked about DevOps, the typical cast of characters open for discussion has been Dev, Ops, QE, Security, etc. - the technical disciplines. But that is selling our new model short, because there are other departments in an enterprise that are just as critical for its success, if not more so. I think it’s time to ignore the words Dev and Ops in DevOps. The term DevOps has again begun to acquire secondary meaning (see: I’m not a DevOps…Are you an Agile?). We are at the point of moving to a post-DevOps world, a world where DevOps principles include the other groups that are required to deliver in our system, groups like Sales and Product.

The System of Selling

Most people who have had even the slightest contact with a sales organization are familiar with the concept of the sales funnel. The idea is that a great many leads come in at the top of the funnel and as you move down, only the bona fide leads come out the end as actual sales.

I believe that the sales funnel is the top of the system that we’re trying to optimize with DevOps. At the top we are collecting all the consumers of our software product, or all the features that our customers might want. As they move through the system, they will encounter product who will try and make sense of all the things entering the system, but ultimately, sales is feeding the system from its inception. Without sales, we have no reason to build and maintain our complex distributed systems, and likewise, without our systems, they have nothing to sell. We are all part of the same system, trying to create or maintain a successful business.

Perhaps the best example I can think of for this is our classic DevOps favorite, Toyota Motor Systems. If you’ve read Toyota Kata by Mike Rother, you know that Toyota is famous for delivering a high quality product using Lean principles while trying to optimize the system for the specific challenges they face. But you also know that Toyota is well known for just in time manufacturing in that they don’t keep lots of inventory arourate a lean system.

Good salespeople are a key to the success of the entire system because they understand the customer. I remember a presentation from Flowcon 2013 where a product manager sat down and met with a number of his customers and when asked about it afterwards, replied “I could always tell you what they wanted, but I could never tell you why.” A good salesperson will always understand why. I have a good friend in sales and he always laughs at me when he hears me talk about “that DevOps stuff”. The notion that you wouldn’t automatically have people trying to communicate their needs and others trying to make those things a reality is a foreign concept to him. He told me once that within a minute of calling one of his clients on a Monday morning, he can tell based on the description of how the client’s child did during the past weekend’s Little League baseball game whether or not it is a good day to try and sell that client something. A good salesperson knows their clients well, and spends a lot of time trying to make sure they are successful with those clients. That is why they are an ideal head to our system.

That is not to say that in a large enterprise, there aren’t misunderstandings between the sales staff and the technical staff. Many of us have experienced the sales staff promising all varieties of “vaporware” without truly consulting with the technical departments. What often winds up happening is some kind of awful “death march” where the IT professionals are working night after night and weekends just to meet some insane deadline and give the customer something that at least resembles what was promised. To me, that feels a lot like the “throwing it over the wall” from development to operations that we try and fight so hard in our DevOps initiatives. That kind of situation is not even close to optimizing our entire system, and therefore, is not DevOps.

A Story

You may be saying to yourself that what I described so far doesn’t sound like something that is applicable only in an enterprise, that these ideas could be applied to a startup as well. That is true, but remember, we’re talking about differences in scale. Hopefully you’ll see that the concepts of transparency and systems thinking that we discussed at the beginning, are crucial to the success of our DevOps implementation at scale. To further illustrate, here is a fictional story.

In our story we have a sales guy, Bern. For the purposes of our story, Bern is good at his job. He is very responsive to his customers and tries very hard to give them what they want. He would also never throw something over the wall to his technical staff. Bern has been hearing from a lot of his customers that they would like feature X in his SaaS product. Bern talks to product and finds out that many other customers besides his own are interested in feature X as well. So Bern and a member of the product team contact the head of Engineering and ask her how long it would take for her teams to develop that feature.

Our head of engineering, Tess, has a number of teams that practice agile methodologies. They estimate that they can deliver feature X in two sprints of two weeks each, which means one month until it’s in the hands of the customers. Everyone agrees, and Bern, being very responsive, explains to the customer that they should be able to purchase or use this new feature in about a month.

One of Tess’ teams starts on the new feature at the beginning of their next iteration. Like all good scrum teams, at the end of their first iteration, they hold the sprint retrospective where they demo what they have so far. During the course of the sprint, they have also been moving the stories that are part of the feature across the wall. During this time, Bern has had the ability to track the progress of the stories through the system because of this transparency. On the day of the sprint retro, Tess’ team records a short (< 5 minute) demo of each of the features they’ve been developing, and posts the demos on the wiki page assigned to their team. Then they go home for the weekend.

Posting the demos for all to see accomplishes a number of things:

  • Even though Bern is located on the other side of the planet from this dev team, he can view the demo, comment on it, make suggestions, etc.
  • As a matter of fact, anyone in the organization - sales, product, or otherwise - can view the demo
  • By posting the demos, even if those stories are not complete for the features, the Dev team is able to get fast feedback (fail fast anyone?) so they do not hand over a feature at the end that was not what the customer wanted
  • Bern can go to his customers and tell them that he’s seen an early version of the feature and that it looks great (and that he perhaps made some suggestions for further improvement), so they understand the situation exactly

When Tess’ team is finally able to deliver the feature, it is exactly what the customer wanted. This is not just because her team, along with Operations, has spent a lot of time in the past few years implementing all kinds of optimizations to their system so they can do things like continuous delivery and feature flagging to enable the successful rollout of feature X. It is also because they required the correct input at the head of the system, to know what they should spend their time on building, in order to enable the business to satisfy the customer.

DevOps Across the Enterprise

DevOps across the enterprise does not throw away the work we’ve done to optimize delivery systems in Dev and Ops. Rather, it expands that work so that Dev and Ops are not working in isolation, nor are they the only part of the system that has been optimized. The systems thinking is a requirement in order to allow the business to determine what the customers’ actual needs are. The transparency is a requirement that allows the business to deliver solutions to those needs at scale.

It’s time to move beyond simply the Dev and the Ops of DevOps. It’s time to embrace the performance of the entire system with transparency like Toyota does in their manufacture of automobiles. Only then can we achieve our true DevOps goals.

On Cross-Functional Teams, DevOps and Spider Charts, Pt. 2

Part 2

As we saw in Part 1 of this essay, there are deliberate ways to organize our cross-functional teams for success in delivering our services in production. One of the most important ways is practicing infrastructure as code, which is critical to automation (the A in the CAMS, or Culture, Automation, Measurement and Sharing DevOps model), a core principle.

When working on that post, I was really excited to find 5 reasons everything you know about teamwork is wrong by Eric Barker that draws from Bronson and Merryman’s book Top Dog, while I was reading the Kates’ excellent Tech Leadership News.

There were two particular “rules” in that post that I felt were particularly relevant to our discussion of cross-functional teams.

Ninety percent of team success is determined before they start work

…60% of a team’s fate has been written before the team members even meet. Its destiny is decided by a combination of the team leader’s efficacy, whether the team’s goal is challenging yet attainable, and the ability level of the people recruited to the team. Thirty percent of a team’s fate is sealed with the initial launch of the team— how the teammates meet and, in those initial exchanges, how they split up the responsibilities and tasks before them. They need to agree on common codes of conduct and shared expectations. All told, 90% of a team’s fate has been decided before the team ever begins its real work.

I thought this was really fascinating given the emphasis Part 1 placed on the ability members of a cross-functional team have to help each other not only deliver, but to grow. It highlights the need to make smart choices about how the teams are formed, like Chris Fry mentions in his description of the Twitter Engineering Culture. Does this mean that our cross-functional model doesn’t work? No. It just means that team composition has to take into account elements (e.g. leadership, experience) other than the simple axes we’d outlined in the description of our model. Just as Roy Rapoport said in his Flowcon presentation, “You can’t put smart people in a dumb organization to make the organization smarter”, the choices we make about how to compose our teams are critical to their success. Without the cross-functional aspect however, they are probably without any hope of having a successful deployment.

Defining roles may be the most important thing a team does

Clarifying who is going to do what— identifying distinct roles— is one of the most proven ways to increase the quality of teamwork. The egalitarian notion that team members should be equal in status and interchangeable in their roles is erroneous. Teams work best when participants know their roles, but not every role needs to be equal. Dr. Eduardo Salas, at the University of Central Florida, is one of the most widely cited scholars studying team efficiency. He has devoted his life to understanding the vast sea of team-building and team-training processes— analyzing teams used in the military, law enforcement, NASA, and numerous corporate settings. The only strategies that consistently deliver results are those that focus on role clarification: who’s going to do what when the pressure gets intense.

This quote really threw me for a loop. I had been making an argument that each team member should share responsibility for a variety of roles (dev, system, qe, etc.) on a cross-functional team and here it said “The egalitarian notion that team members should be equal in status and interchangeable in their roles is erroneous”. When reasoning about the possibilities of why this may or may not be true, I looked at the teams they studied (“military, law-enforcement, NASA, and numerous corporate settings”). For the first group, it seemed to make sense. If I’m in the military and the guy next to me doesn’t know how to use the grenade launcher as well as I do, I sure want to be the guy using it in a firefight. But, how does that fit with the “numerous corporate settings” team? Our service delivery team is definitely in a corporate setting. Does this mean that the sysadmin should only ever do systems administration, and the developer likewise?

Then I re-read it with the last sentence “who’s going to do what when the pressure gets intense”. That’s when it actually made sense. When we have a production outage and we identify a problem, the developer is probably not going to be the one responsible for troubleshooting overruns in the TCP queuing if there are sysadmins or network engineers who have more experience and the goal is to get the site back up as fast as possible. That doesn’t mean that under normal circumstances the developer cannot come up to speed on network tuning (or that they must, they can contribute in their own way), just that when the “pressure gets intense”, the team needs to play to its strengths.

To use a sports analogy, in football, the cornerback generally runs with the wide receiver, and the safety generally is his backup down the field. When the ball is snapped (our production outage situation), each plays the role they are assigned. However, after each play is over they walk over and discuss with each other what they thought about the previous play and what they saw. Maybe after enough time playing together, they understand better and better how those roles are complementary to one another. That kind of open communication is essential for a team to function at its highest levels when the “pressure gets intense”. That is exactly how the team (be it sport team or service delivery team) gets better.

A cross-functional team is still necessary to achieve the 1:1 flow we’ve learned about from Toyota manufacturing. Agile emphasizes the advantage of cross-functional teams and we know that for a team to be effective in delivering a service in an infrastructure as code environment, they need to have all the skills required to deliver that service. We’ve also learned that the composition of the team goes beyond simply plugging in the correct functional area but there are other components that don’t just help, they actually determine the success of the team from the outset. In a high pressure situation, the cross-functional team is still effective, but team members are more likely to fall back to their core strengths in order to get past the stressor. In our corporate environments, cross-functional teams can be one of our most effective tools in helping us realize the benefits of practicing DevOps principles.

Special thanks to Seth Katz and Alan Caudill for helping me sort much of this out and make it understandable. I hope I was able to make their efforts a reality.

On Cross-Functional Teams, DevOps and Spider Charts

Part 1

From Toyota manufacturing we learn about the goal of 1:1 flow, that is, one input, and one output from a team such that we achieve the desired outcome in essentially one step. In order to achieve this, that step needs to be able to deliver the completed product without any other steps. One of the things we talk about often in Agile, and in DevOps, is the concept of a cross-functional team. One of the strengths of a cross-functional team is its ability to perform all the steps necessary to be able to produce their “product” without outside assistance.

In this post Adrian Cho argues that the best way to have a strong team that is able to respond to all their different challenges is to have a diversity of skills and experience, a mix of risk takers (dev) and the risk averse (ops). He argues this diversity is a key element in avoiding groupthink. I will go one step further and argue that it is precisely the ability to form teams in a manner that compensates for or provides the exact skill sets needed to achieve the target condition that makes them so successful.

Let’s look at an example of a cross-functional team. We will represent the team in a spider (or radar) chart. The closer a team member is to a discipline listed on the outside of the chart, the greater a degree of expertise they have in that particular discipline.

Spider chart

  • Member 1 has solid knowledge of systems and development and some experience with network and QE. She would be the classic “DevOps engineer” that recruiters are always looking for, or what we’ve always just called “a sysadmin who can code”.
  • Member 2 has a deep knowledge of systems and networking but is not much of a coder and knows little about QE.
  • Member 3 has strong QE and development skills but is lacking in systems and network.
  • Member 4 would be your straight up network engineer, or what John Willis likes to call the next frontier for DevOps.

What is so special about this configuration? Each member of the team brings something to the end-to-end delivery of the service. If you squint your eyes, you can almost see the outline of the diamond that covers all four functional areas (not that we are limited to 4 areas in any way). This team is also fully empowered: they can write the service to be delivered, test it, and have the ability to deploy and run it in production (assuming they follow DevOps practices like continuous deployment). This would actually be one representation of what many call “no-ops”. This does not mean we get rid of the operations capabilities of our organizations; it means that this responsibility is shared, and that there are members of the team who still have the skill sets needed to perform in ways traditionally expected of that role. No-ops really means we never throw anything “over the wall”. If our service is behaving very poorly and the TCP/IP stack needs to be tuned, we don’t call in outside help from another department. Instead, member 4 steps up and uses their skill and experience to help deliver the service. Traditionally, we might have called that person operations; now, they are a valued skill set on a cross-functional team.
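For readers who cannot see the chart, here is a minimal sketch of the same idea in Python. The 0-5 ratings are my own illustrative guesses based on the member descriptions above, not values read from the chart, and the function names (`coverage`, `gaps`, `go_to_person`) and the threshold of 4 are hypothetical.

```python
# Hypothetical skill ratings on a 0-5 scale. These numbers are illustrative
# guesses drawn from the member descriptions, not data from the spider chart.
DISCIPLINES = ("systems", "development", "network", "qe")

team = {
    "member_1": {"systems": 4, "development": 4, "network": 2, "qe": 2},
    "member_2": {"systems": 5, "development": 1, "network": 4, "qe": 1},
    "member_3": {"systems": 1, "development": 4, "network": 1, "qe": 5},
    "member_4": {"systems": 2, "development": 1, "network": 5, "qe": 1},
}


def coverage(team):
    """Return the strongest rating available on the team for each discipline."""
    return {
        d: max(skills.get(d, 0) for skills in team.values())
        for d in DISCIPLINES
    }


def gaps(team, threshold=4):
    """List the disciplines where nobody on the team reaches the threshold."""
    return [d for d, best in coverage(team).items() if best < threshold]


def go_to_person(team, discipline):
    """Return the member with the highest rating in the given discipline."""
    return max(team, key=lambda member: team[member].get(discipline, 0))


if __name__ == "__main__":
    print(coverage(team))
    print(gaps(team))
    print(go_to_person(team, "network"))
```

With these assumed ratings, no discipline is left uncovered, and the person who steps up for network tuning is member 4. Remove member 3 and QE shows up as a gap, which is the kind of hole that would otherwise force a handoff to another department.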

So does that mean that we’ve actually just invented a DevOps team? Of course not. Every one of our teams that operates a service is structured in a variant of this configuration based on the specific service they provide to the business. If all cross-functional teams are “DevOps” then everyone would be the “DevOps team”, which is silly. Instead, they are the search team, or the analytics team, etc. There could even be a team that consisted solely of DBAs. This is because the team composition is suited for the service they deliver. Our DBA team is practicing “infrastructure as code” principles so they can deliver a true “service” to the business, as opposed to being in service to the business. This means there are many ways to structure cross-functional teams in an organization that practices DevOps principles. We are simply trying to optimize the flow from creation of the service and into production and operation of that service. Does this mean these teams are “doing” DevOps? DevOps is a collection of principles and not practices. The fact that you can adhere to the principles of DevOps is what is important. You can’t “do” DevOps any more than you can “do” emergency preparedness walking down the street.

One thing that I’ve tried to do, that I hear others talk about doing, that unfortunately never really delivers the high performing cross-functional team, is “embeds”. This sounds like a great idea at first. We take a systems engineer from the ops team who is the “expert” on systems in production and drop her on a team of software engineers so that if they have any questions about operations, there will be someone there for them. She is still on the operations team, so that’s all fine in case there is a fire, but spends some time with the devs. Our poor SysEng is going to be constantly pulled off the team anytime there is a fire in operations. We’ve learned from the DevOps Survey that these fires happen less frequently in organizations that truly practice DevOps. This SysEng will also have to balance between the needs of two sides of the organization, probably to the detriment of both. With embeds there is no real opportunity to establish the true empathy required to bring a cross-functional team together.

This really became crystal clear to me listening to Jeff Patton give a keynote at Flowcon this past year. He was talking about really getting to know your customer and his example was that of Jane Goodall. It would have been more efficient for her to do “chimpanzees onsite” where the chimps were brought to the office for her to interview whenever she had a question about what it was like to be a chimp, rather than go spend time in the field with the chimps. You can imagine this would not have suited her research, and it does not suit our service teams either. The point is not to “get” ops or security involved “early in the process”, the point is to involve that input, natively, in the process itself, not as an add-on, bolt-on, or embed.

The nirvana of the cross-functional team is that as members with different skill sets learn, grow, and collaborate with one another, each brings something to the table that helps the other members to go from the left to the right.

Cross-functional Progression

In this model, no one is throwing anything over the wall. At the very worst, in a highly regulated environment with poorly written compliance rules, you are handing things over the wall, and stopping to chat about tonight’s sportsball match or the newest restaurant each time you are standing and chatting at the wall. Jez Humble alluded to these regulated environments a few years ago.

Is this cross-functional model going to work for everyone? No. Each organization has to find the way to adhere to DevOps principles that works for them. But cross-functional teams have been promoted in Agile for years because they fit so well with the lean principles of eliminating waste in the system. If handoffs are the killers to flow, then the most well constructed true cross-functional teams will be best equipped to help an organization achieve the left to right flow optimization Gene Kim describes in his First Way, and achieve the really tight feedback loops described in the Third Way. Whether you succeed or fail at creating an environment that allows for all the advantages that DevOps can bring is up to you.

In part two, we’ll cover some evidence both in support of and contrary to this model of team composition.

Special thanks to Seth Katz and Alan Caudill for helping me sort much of this out and make it understandable. I hope I was able to make their efforts a reality.

I’m Not a DevOps…Are You an Agile?

I was signing up for a new MeetUp group a few months ago, and as part of the process, I was supposed to answer a few questions. One of the questions was “Are you a DevOps?”. I was a little struck by this. I’m pretty sure I knew what the question was supposed to mean, but I also know that the question was nonsensical. Maybe a proper rephrasing would have been, “Are you an engineer who works in accordance with DevOps principles?”. Maybe that was just too long and loaded to ask as a question. I feel like we’re starting to lose control of the word “DevOps”. Is that just natural for our industry? Is it a “good thing” or a “bad thing”? Maybe this means that DevOps is gaining adoption, or maybe it means we’ve lost sight of why we’re doing this.

DevOps jobs

Take a look at my inbox on any given day and it is filled with job opportunities. We are looking for “DevOps engineers”, or one of my personal favorites “Senior DevOps engineer” or “Lead DevOps engineer”. I feel like sometimes our industry will find some term and then it will get dragged through the mud for a few years until perhaps the proper usage will be found. Take the word “hacker”. When hacker first became popular it referred to someone who liked to “hack” on code. Then the press got a hold of the word. After a while, a hacker was someone who tried to break into others’ computer systems. For years, in Hollywood movies, we learned about hackers and their nefarious telnet toner backdoor circuit electron overflows. I had a general counsel once explain to me the concept of “secondary meaning” in legal terms. This is when something that is generally known takes on “secondary meaning” as in the case of Apple. Everyone knew what an apple was when they were growing up, and they were delicious. Eventually, Apple the computer company became so big, that the word apple took on secondary meaning. I feel like this is what happened with the word hacker in our industry. Eventually it took on secondary meaning as a way of describing what folks on Bugtraq for years have been lamenting should have been called “cracker” (oh, the irony) instead. Now we have Hacker Dojos, Hacker Bees, IKEA hackers, and Hacker News. If you read news articles you will still see references to how hackers have infiltrated some government network (with their majicks!), but on the whole, our industry has either reclaimed the term, or allowed its dual usage based on context.

So then what about DevOps Engineers? Has Patrick Debois, has our industry, already lost control of our own term? I’ve been going to DevOpsDays for years, I’ve been talking to people and giving talks about DevOps, I’ve been trying to help salesforce.com through a DevOps transformation. In none of those interactions have we ever talked about DevOps engineers. When I give my Introduction to DevOps talks, I usually tell the audience to substitute the word “collaboration” where they see the word DevOps and they will be much closer to an instant understanding of what we’re trying to accomplish with DevOps, even if they won’t understand the “why” quite as quickly.

If you’ve ever listened to Gene Kim talk about DevOps, he talks about making life better for thousands of IT professionals. DevOps is definitely about getting rid of the “throw it over the wall” mentality between development and operations, but it’s for the purpose of getting the business to focus on what is most important, being able to rapidly deliver value to the customer. In order to do that, Dev and Ops have to change too. Maybe those changes are a natural evolution. I remember the BOFH. I hope that as an industry, we’ve put that behind us already, we’ve evolved. The idea of a BOFH is antithetical to the DevOps movement. The BOFH said no to everyone; it’s no wonder people did not want to collaborate with him or her.

The BOFH was fine for the days when we used to think it was a good idea to outsource our IT departments, to cut costs. If your development wing is in the Philippines, and your operations department is in India, and they are run by two completely different companies, it doesn’t matter all that much how well your devs and ops communicate, they don’t! You are already starting behind and will never catch up to companies practicing DevOps. DevOps is not about outsourcing, DevOps is about insourcing. The problem is, no matter how much the engineers in us try to escape it, and turn everything into an algorithm, it is still all about people and how they communicate. The DevOps movement is a recognition of that fact. If you’ve ever read Crucial Conversations, a book about communication skills, one of the first concepts they teach you when you are going to have an important conversation is to Establish Mutual Purpose. That is DevOps in a nutshell. Development and Operations establishing mutual purpose. In DevOps, that purpose is to deliver the most value to the business through streamlined processes; it’s about always seeking to increase flow. Once that purpose is recognized, you’re much more likely to have a successful conversation, and you’re much more likely to have a successful business.

So, I read with interest the recruiters who write me about the latest DevOps opportunity. I don’t remember a major called ‘recruiting’ in college. The closest I can think of would be sales or marketing. What are these recruiters trying to market or sell? Are they marketing a collaborative environment? Free from BOFHs? Where flow reigns supreme? If that is the case, isn’t every company looking for the exact same thing? Is the word “DevOps” redundant in this case? Is DevOps actually a descriptive differentiator? Should it just be “looking for systems administrator”, “looking for developer who likes to understand the entire architecture”? Or is it more than that? Sometimes I worry that “DevOps engineer” actually means “sysadmin who understands writing code for automation”. But, after being involved in this industry for many years, that’s just what we’ve always called a “good sysadmin”. Is that what they are marketing? “Serious startup company seeks good systems administrator”. I guess it doesn’t have the same cachet.

The problem, I suppose, is more than that. Where are these good sysadmins going to come from? Where are these systems thinking developers going to be taught? Traditionally developers might code and sysadmins might troubleshoot. But now, developers are responsible for their code in production (if you wrote it, you run it). Sysadmins write code. Where is the line? Where will the subject matter experts we’re accustomed to come from? I suspect that the industry will simply change. There will be people who naturally gravitate toward one or another aspect or specialization based on their interest or experience. It might be harder to find your traditional QE or Sysadmin. Maybe we’ll be looking for cloud engineers? I don’t want to get started on public vs. private clouds, etc.

So where does that leave us? Obviously, our industry is in transition. Everyone is looking for “DevOps Engineers”. Are you part of a movement that represents the fact that we can deliver more business value when people see delivery through lean principles where the emphasis is on flow through the system, and short feedback loops, more than it is upon silos and politics? It’s people over process, a core tenet of the Agile Manifesto. Is that what these recruiters are asking me? Do I believe in people over process? Of course I do, that’s why I believe in Agile. I think the lessons that agile development process teaches us not only make us better developers, network admins, and sysadmins, I think they also make us better as people. The sprint retrospective may be about making the team better through a process of self improvement, but it’s also about a process of remembering to improve ourselves. Any strides that we make as a person, as an individual, through better tooling, or better communication, make the team better, and benefit the business.

The sales and marketing folks are definitely having success using the DevOps term. When signing up for the O’Reilly Velocity conference I was asked “Do you primarily work in Web Operations, Web Performance or DevOps?”. Last week on Twitter, Puppetlabs asked “Lots of DevOps jobs out there — and more on the way. How do you become a DevOps engineer & get in on this trend?”. These are from two organizations that definitely “get” DevOps. Maybe the word has already acquired that “secondary meaning”. Maybe we’ll be calling all the jobs that are interesting “DevOps jobs”, until the whole industry is operating with that business model anyway. I just have to imagine that Toyota never advertised for “Kaizen Engineers” to work on the production line.

So when confronted with the question, “Are you a DevOps?” I answered the only way I knew how. Someone was asking me not whether I was qualified to fill a role, but whether I believed in a movement. Whether flow was of the utmost importance. Whether communication was more important than silos. “Of course not, are you an Agile?”

DevOps Recruiting http://www.build-doctor.com/wp-content/uploads/2011/07/devop.jpg