Thoughts on Cloud Native Teams

When I started developing microservices I felt like I had been missing out on a huge part of the modern technology world. It turned out that I was right. Over the past few years I’ve been trying to figure out how modern development teams run effectively. What I discovered was that there are tons of people out there trying to figure out the same thing. Many of the things I’ll write about are patterns that have been around for more than a decade, yet many companies still don’t seem to adopt them. In the end, building a team in the “cloud era” has changed the landscape, and each change we see in infrastructure, development practices, project management, and testing drives further advancement and change.

The more I read about agile, cloud native development, and DevOps, the clearer it became to me how interrelated these practices are and how effective teams can be built around those principles. This may seem obvious, but I often think we don’t spend enough time structuring our teams in the same way. Often we will see disparate teams built around project management, development, and operations instead of around products. These teams can grow large over time and become ineffective. Many are stretched thin supporting multiple products and can’t meet the demands of all of them, causing inefficiencies in the product pipeline.

Now as a disclaimer I am going to say that I have at no point ever had to manage someone. This is an opinion piece based on all that I’ve read and experienced. My belief is that a team can be structured to be effective in developing and deploying applications in the cloud if it focuses more on products than on functional teams.

Cloud Native

Cloud Native is a term that is being thrown around the industry as more applications get built and deployed on externally hosted infrastructure. This shift from on-premises servers has changed a lot about the way we structure, deploy, and manage our applications. The assumption is that your application is built to run on AWS, GCP, Azure, or some other stack and is therefore in the “cloud”, and this practice goes by the generic term “Cloud Native”.

Now, I’m sure if you polled people they would probably agree that moving an old legacy application to the cloud does not make it “cloud native”. Nor is it “cloud native” if you are manually uploading and deploying content. All you’ve done in those cases is move “off premise”. So if just having an application hosted on AWS doesn’t make you a cloud native developer, then what does?

This is a question that has plagued me for months, and it’s hard to find a formal definition. The closest thing that I could find was from this talk along with some other pieces I’ve found. In general, to have cloud native development you need teams of people that build and manage a number of small applications (microservices) that run in a managed infrastructure, communicate over a network, and are isolated from one another. Infrastructure is tied to the application itself, and applications are designed to use hosted or external services (serverless technologies).

This definition tells me that our teams need to be structured in a way that accommodates these tasks, and in order to do that we need to stop separating our developers from our operations people from our project managers from our quality assurance people. Cloud native development in an organization would thrive if team members design, build, manage, prioritize, and test the applications together. In order to do that you obviously need people with knowledge of how to build and deploy applications to the cloud, knowledge of testing, and knowledge of managing the project.

If we can take one lesson from cloud architecture, it is that we need to assume that resources may not be available. In the same way we need to ensure that our teams can continue to work if team members are unavailable.

The question is how? How do you build a team that can survive without a project manager, an expert in operations, a quality assurance person, or a developer? The answer is to create team members that can fulfil any of these roles, or who at least have enough knowledge to help someone else fill them.

Team Size

In his book “The Mythical Man-Month”, Fred Brooks argues that the larger a team gets, the slower it becomes. This notion was the foundation of some of the agile movement and eventually microservices. How big should teams be? How do you know when you should split them up?

In biology there is a term known as Binary Fission, which occurs when a cell is so large that it then splits into two cells. I have always envisioned teams growing like that. Wait for enough people to get on a product then split both the product and the team to support it. If grown organically your team will have knowledge of the systems and infrastructure.

This notion of splitting may seem strange because it’s slightly counterintuitive to the ideas of how microservices should work. However, there is plenty of literature out there on how starting out with microservices can hurt productivity and let costs get out of control. Nor am I saying that everyone should start out with a monolith, which can impact resiliency. What’s more important here is the sense of product ownership: which people work on which application.

What’s the optimum size? I have no clue. Amazon famously has the two pizza team which is around six people. This can vary from team to team. But the point I want to underline is no matter how many people you have it’s important to count beyond developers on a team. This should include people who you consider DevOps, quality assurance, and project managers. This is your team, and you will find that different products may need different allotments of people.

Binary fission requires a lot of energy to undertake and in the same way splitting up a team or product may also require time and energy. What you gain in the end though are smaller effective teams that have the ability to cross train one another and help you continually analyze and grow your application.
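To put rough numbers on why smaller teams communicate more easily, here is a minimal sketch. It borrows Brooks’s observation that if everyone has to coordinate with everyone else, the number of person-to-person links grows as n(n-1)/2. The function name and the team sizes are my own illustration, and the model deliberately ignores the communication that still has to happen between the two teams after a split.

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct person-to-person pairs in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Compare one team of twelve with the same people split into two teams of six.
one_big_team = communication_paths(12)
two_small_teams = 2 * communication_paths(6)

print(f"one team of 12:  {one_big_team} possible pairs")
print(f"two teams of 6:  {two_small_teams} possible pairs")
```

Under this simple model the split cuts the number of pairs within teams by more than half, which is the intuition behind keeping teams small.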

Polyglot

A lot of companies will say they program in one language (Java, Ruby, etc) which seems extremely limiting from both an application development perspective as well as a hiring perspective.

With microservices one of the major benefits is the ability to use the right tool for the job. In some cases Java isn’t as effective as Python for developing an application, or Ruby may not have a great library that Java does. So in the end the choice of language for an application should be up to the developer, or made by the team based on developer availability.

This also allows the hiring manager to cast a wider net for hiring by not specifying what language someone needs. Instead you have the ability to tap into a larger market of candidates and find the most effective developers. This also provides an opportunity for developers to cross train and learn.

The kicker here is you do not want five programmers who each know only one language, all of them different, because the review process will become ineffective. So instead I would suggest that a language can only become part of your application stack if two or more developers understand that language. Most experienced programmers will know more than one language so this shouldn’t be a big deal and provides the ability to cross train your developers.
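As a rough sketch of how the “two or more developers” rule could be checked, assuming you keep some record of who knows which language (the names, the skills data, and the function are all hypothetical):

```python
from collections import Counter

# The "two or more developers" threshold suggested in the text.
MIN_DEVELOPERS_PER_LANGUAGE = 2


def approved_languages(
    skills: dict[str, set[str]],
    minimum: int = MIN_DEVELOPERS_PER_LANGUAGE,
) -> set[str]:
    """Return the languages known by at least `minimum` developers."""
    counts = Counter(lang for langs in skills.values() for lang in langs)
    return {lang for lang, n in counts.items() if n >= minimum}


# Hypothetical team: each developer mapped to the languages they can review.
team = {
    "ana": {"java", "python"},
    "ben": {"python", "ruby"},
    "cho": {"java", "go"},
}

print(sorted(approved_languages(team)))  # ['java', 'python']
```

Here Ruby and Go would stay out of the stack until a second person on the team learns them, which is exactly the cross training this rule is meant to encourage.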

Furthermore, you should require your infrastructure to be written as code as well. This way developers and DevOps personnel can begin to train each other and learn. By having developers who know how to write infrastructure you are able to grow your teams even if you don’t have enough people with DevOps backgrounds. DevOps developers can then get an idea of how applications work. This is beneficial when you implement repositories and code reviews (explained later).

The big goal here is to foster a community of learning and the ability to execute in the most effective way possible.

Practices

Being part of a team can at times be difficult. Everyone is different. Everyone has opinions. Some are outspoken while others let their actions speak. I’m not a huge advocate of rules for teams. I believe that if you mix groups of people you’ll find that each will develop different habits and standards.

However, if you are following this dynamic team model these standards and habits will also become dynamic. This is great because it means things will be evolving and as a manager you can see what makes your teams effective, but it also means that you could have complete chaos.

No one likes chaos. So I think that it is reasonable to have a few standard practices on your teams that will make them effective in their development and make it easy for them to hop between teams.

Project Management

A team must be able to run by itself. That means that the business expectations need to be made clear to the team and projects need to be outlined in advance. The team will then break down these requirements into pieces they can manage. Writing good requirements or stories is outside the scope of this article but it is important to realize that the communication between the team and the business is paramount. The goal here is effective communication between the two so both the team and the business can move forward.

It is up to the team then to figure out how to manage and distribute the different aspects of the work and determine what all they can deliver. This is a high level view of some agile development practices. The main point I would like to make here is that the team should be small and that the people on the team should be working together to determine the amount of work.

If a team becomes too big you’ll see that it becomes hard to plan for the work coming in and that communication can become ineffective which inhibits productivity. By having the people on the team manage a project through a ticket system or scrum board they can effectively plan out and determine what all can be done in a certain time period. If something becomes a priority they should be able to shift their resources more effectively and deliver on time.

Code Repositories and Reviews

A code repository is the core of a team. All work that the team does (applications, infrastructure, tests) needs to be in the same place. This is because the code will be shared by the team and be reviewed by the team.

When a review for a code change goes up it will be visible to everyone so they are able to then make suggestions or ask questions. This allows the members to cross train and spot flaws in the code.

Furthermore, this code base will be the driver for the entire development pipeline. By having all of these resources in one place the team can determine if the project is broken as a whole rather than trying to piece together bits and pieces.

The more important aspect of this is the need for a review system. Before changes are committed to the repository they should first go through a review by peers or a manager to check that they meet the team’s standards, to catch missed bugs or requirements, and to find ways to improve the code. At least one approval should be required from someone who understands the programming language and can make an educated decision about how the change may work. Other things can be attached to the review process such as linters and unit tests but at the very least the code needs to be verified by a human before it can be merged into the main code base.
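A minimal sketch of that merge rule, assuming a hypothetical `Review` record and the same kind of who-knows-what skills data as before. This is not the API of any real code review tool, just the logic spelled out: automated checks must pass, and at least one approval must come from someone other than the author who knows the language.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """Hypothetical record of a proposed change awaiting merge."""
    author: str
    language: str
    approvals: set[str] = field(default_factory=set)
    checks_passed: bool = True  # linters, unit tests, and so on


def can_merge(review: Review, skills: dict[str, set[str]]) -> bool:
    """Allow a merge only with passing checks and one qualified human approval."""
    qualified = {
        reviewer
        for reviewer in review.approvals
        if reviewer != review.author
        and review.language in skills.get(reviewer, set())
    }
    return review.checks_passed and len(qualified) >= 1


# Hypothetical skills data: who can review which language.
skills = {"ana": {"java", "python"}, "ben": {"python"}, "cho": {"go"}}

change = Review(author="ana", language="python", approvals={"cho"})
print(can_merge(change, skills))  # False: cho does not know Python

change.approvals.add("ben")
print(can_merge(change, skills))  # True: ben knows Python and is not the author
```

The automated checks are a gate, but they never replace the human approval, which matches the point that a person has to verify the code before it reaches the main code base.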

All this does is allow developers to better understand the code they are supporting and allow other developers to learn from mistakes. What needs to happen is fostering a culture of learning through code reviews instead of either being too passive about your code quality or so critical that no one wants to contribute.

Practices and policies are for you to decide. If your team decides to go above and beyond that’s even better. Take the time to evaluate your standards and iterate and improve. See what teams are doing to make themselves more successful and where they might be struggling.

Code Quality

In order for a team to work together there needs to be some assurance of the quality of the changes they are making. Quality of code can be hard to define and like everything else discussed here is up to the interpretation of the team itself. That being said it’s once again paramount to establish some baseline for developers on your team to ensure the projects are working properly.

Tests

The first way of doing this is implementing tests in and against your code. I won’t go into detail about the different levels of testing since I have addressed this in the past. However it is worth noting here the important role tests play within a team and an organization.

Tests should run on or before every merge to notify the developers of breaking changes. It is important as well for the developers on the team to notice during a code review whether tests have been added as part of the commit. By doing this you can make sure that a certain level of effort was made to verify that the changes to the code are working as expected.

These expectations need to be outlined in the code. This is what reviewers need to pay attention to when doing a review. If the logic of the test doesn’t make sense someone may need to clarify what is changing and why.

Tools like code coverage can be helpful but are not entirely necessary, since they often don’t address the logic of the code, just whether the code is exercised by tests.

Tools

Any sort of aid or tool that can assist developers during or after a review to make sure that their code meets the standard will increase the velocity of the team’s overall development. Tools such as linters and formatters can ensure that code is uniform, so developers are always looking at consistent-looking code. Static code analysis tools allow developers on a team to automatically check whether the code may contain bugs or bad programming practices. Security vulnerability checks can look to see if libraries that you are using contain outdated or insecure code.

The overall goal of these kinds of tools is to provide a certain level of “auditing” that can be done automatically for a team, ensuring a certain level of uniformity as well as catching bugs or vulnerabilities before they go live. What is once again important is ensuring that these tools don’t inhibit development but enhance it. In order to do this it might be useful to have all tests and tools run when code is committed to a branch or requested to merge into the mainline and, if successful, allow the developer to merge. This would mean that the developer would know if something is wrong before it breaks the mainline of code.

Infrastructure Code

Previously in this post I mentioned the need to define your infrastructure as code. This means that however an application is to be deployed, that process should be written as code. Tools like Terraform and CloudFormation allow you to define the specific servers or services you’ll need for the application to run. Alternatively you may have Docker images or Kubernetes deployment scripts that need to be run. All of this defines the infrastructure as code.

The biggest catch to this is that the infrastructure code should live alongside the rest of the service code. Whoever manages the service should also have access to the infrastructure files. This will allow the team to work and make changes if necessary to the infrastructure without tying up a specific resource. It also allows the architect to give input into the discussion on how a service will be implemented.

This is the true nature of DevOps. A collaboration between development and operations. The lines between the two should be blurred and everyone on the team should be cross trained. This would allow everyone to be able to manage and deploy applications independent of a single Operations team.

Manage a Pipeline

In the same way a service should manage its own infrastructure, so too must it be able to manage how the service gets tested and deployed. This can be done through pipelines, which define the stages that an application must go through in order to be deployed. Systems like CircleCI, Jenkins and Travis CI all allow you to define the pipeline as a file within a repository.

The pipeline can execute specific steps depending on whether code was merged into a specific branch or tagged. This allows you to determine the flow of both your infrastructure and application through a pipeline.
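To make the branch and tag idea concrete, here is a small sketch of the decision a pipeline makes. It is plain Python rather than the configuration syntax of any particular CI system, and the stage names, the `main` branch, and the `v` tag prefix are all assumptions I made up for illustration.

```python
from typing import Optional


def stages_for(branch: str, tag: Optional[str] = None) -> list[str]:
    """Pick which (hypothetical) pipeline stages run for a given push."""
    stages = ["lint", "test"]  # assumed to run on every push

    # Assumption: merges into main are deployed to a staging environment.
    if branch == "main":
        stages.append("deploy-staging")

    # Assumption: version tags such as "v1.2.0" trigger a production release.
    if tag is not None and tag.startswith("v"):
        stages.append("deploy-production")

    return stages


print(stages_for("feature/login"))   # ['lint', 'test']
print(stages_for("main"))            # ['lint', 'test', 'deploy-staging']
print(stages_for("main", "v1.2.0"))  # all four stages
```

In a real setup the same rules would live in the pipeline file inside the repository, so the team that owns the service also owns how it moves from a commit to production.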

Obviously teams should consider adding some of the tooling outlined earlier to the pipeline and triggering the process upon an action on a code review.

Collaboration

The practices and plans outlined above should be standard for all teams but freedom should be given to those teams on how they can go about doing them. This will allow for some natural growth and possibly effective practices that can be shared with other teams.

Remember, we expect that these teams grow organically by splitting off from each other. You can also allow members to be shuffled between teams after a certain amount of time or after a specific project is completed.

It is important to keep an eye on your teams to see what is working and what isn’t and to have them meet regularly to share ideas, come up with new plans, or build effective tooling to help your organization move forward.

People in technology love to grow and learn and build new things. Oftentimes companies seem to constrain and limit that creativity and growth in order to meet deadlines. Building an effective team is not about siloing or separating but instead building an environment of collaboration and learning.

Forcing people with different talents to collaborate on a project and learn from each other is only going to benefit your company and allow you to grow. The cross training allows someone to pick up the slack if someone leaves or gets sick. Standardizing workflows and practices establishes a baseline for all teams to work at but allows for upward growth. Organically grown teams allow for evolution within your organization.

I started this journey as a thought experiment on how I believe teams should work. This was what I came up with. I’ve worked on a few teams with disparate elements found here and often wondered where they were limited. Reading books and listening to talks and experimentation on my own helped form in my mind how I would run a team. I hope someday to have the opportunity to make this happen.
