Podcast: Kevin Goldsmith on architecture and organizational design

Aug 6, 2018 | Perspectives in Engineering

Conway’s Law is the observation that the software architecture an organization produces is a reflection of the way that organization’s teams are structured.

Melvin Conway first documented this observation in 1968, and it was subsequently made popular in Fred Brooks’ The Mythical Man-Month (published in 1975). But with modern-day companies like Spotify, Netflix, and Amazon taking deliberate steps to structure their engineering organizations in a way that reinforces the architecture they want to produce, “Conway’s Law” seems to be having a new life.

Having led teams at Avvo, Spotify, Adobe and Microsoft, Kevin Goldsmith (now VP of Engineering at AstrumU) has seen firsthand how organizational models influence the architecture produced — and has even experimented with different structures in order to solve architectural challenges.

Travis Kimmel, Co-founder and CEO of GitPrime, hosted Kevin on SE-Radio to get his take on what Conway’s Law looks like in practice and learn how organizations can leverage this understanding.

Architecture and Organizational Design

Head over to SE-Radio to listen to the full episode, or read the transcript of their conversation here.


Travis:
Kevin, thanks for joining Software Engineering Radio.

Today we’re going to be talking about Conway’s Law: how organizational design influences the software architecture it produces. We’ll explore what that really means, Kevin’s experience with it, and how other engineering leaders can leverage this knowledge in their own organizations. So, Kevin, let’s start off by defining Conway’s Law. What is it?

What is Conway’s Law?

Kevin:
So Conway’s Law is interesting, and it seems to be having a new life in the industry. Melvin Conway was a software architect back in the ’60s, and he wrote an article for Datamation magazine in April 1968 called “How Do Committees Invent?” I like to think about it as sort of the equivalent of a grumpy software developer blog post of today: He was complaining about this pattern he saw in his own teams, where the software architectures he and his team of architects were producing were very much mirroring the organizational structures they had (or the way teams were laid out).

So Conway wrote this article, people talked about it a little bit, and then it kind of just disappeared. Then every once in a while, including recently, you start hearing about it again as people are starting to realize just what he hit on.

The absolute truth of this became known as Conway’s Law.

Travis:
What has this looked like in your experience?

Kevin:
The way it’s traditionally stated is: “Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.” Pretty wordy, not necessarily clear. My favorite restatement of it, which also kind of identifies this, is from Eric Raymond. He says, “If you have four groups working on a compiler, you’ll get a four-pass compiler.”

Travis:
I love that.

Kevin:
Yeah, and it’s true. In your own teams, you can absolutely see this being the case. I did as well. I started working in the industry back in the early ’90s, when we were structuring things very traditionally, not that different from the way things were structured when Conway was working. You would see this very much. The way the team was structured had a direct correlation to the architecture, because that’s how you were splitting up the work. It’s a very natural correspondence.

The communication flows within an organization dictate the way it structures its libraries and dependencies.

I saw this really clearly when I was in much bigger teams, like at Microsoft, where you make these fundamental decisions about which team is going to own what, or whether you need a new team: we have this thing we want to build, do we need a new team for it? What would happen is you’d be just trying to figure out how to split the work across the people, but you would end up inadvertently making important architectural decisions that you’d end up living with, just because if a different group of people builds this thing, they’re going to build it in a way that makes sense to them, and is independent so that they can execute without being stuck waiting for everybody else all the time.

So, I’ve just seen it over and over again, and then I’ve seen the corollary: as we become much more cognizant of this factor, we change how we organize teams in order to take advantage of it.

Is it really a “law”?

Travis:
So, do you view this as something that is really a “law”?

Is it inescapable? Are there notable counterexamples, and what are the boundaries of the law?

Kevin:
So, that’s a good question. As I said, this was essentially Conway ranting, in a very well-said way, in a magazine, because he didn’t have his own blog back then. It got called Conway’s Law, but it’s not like physics.

Actually, people have tried to prove it. There have been a few different studies; one notable one came from Harvard Business School back in 2008, where they did this study called “Exploring the Duality Between Product and Organizational Architectures: A Test of the ‘Mirroring’ Hypothesis.” So, another way that people talk about Conway’s Law is this notion of a mirroring hypothesis, that the organizational structure mirrors the software architectures that are produced.

So, it’s more of a business school case study, but they actually talked to lots of different organizations and they found a fairly good correspondence between the two. They proved this mirroring hypothesis to the extent that they could: yeah, there is this very strong correspondence between the organizational structure and the software architecture.

So, is it a law in the way the speed of light is a law, where you can’t violate it? No, you can. This is people; this isn’t laws of physics. This is psychology and sociology and to a certain extent brain chemistry, which I can talk about, but it’s really hard not to do it. So, it isn’t impossible, but you tend to go that way whether you mean to or not.

Travis:
Strong enough tendency that we should pay attention.

Kevin:
Yeah, if you don’t pay attention, it’s going to happen. If you don’t want it to happen, you have to work really hard to make it not happen.

How modern-day organizations approach Conway’s Law

Travis:
I’d love to learn a little bit about your experiences throughout your career with this. Do companies typically approach this eyes open, and design an organizational structure along the lines of the architecture they want to design, or is it something that’s realized in retrospect?

Kevin:
I think there are companies that have been very cognizant of this force. Another way you talk about Conway’s Law, or the mirroring hypothesis, another way of stating it is this “homomorphic force,” which is a mathematical concept, but the idea that there’s a mirroring between these two things that are similarly structured, and a desire to make these things be the same. There have been companies that have been very cognizant of it, either within the context of Conway’s Law or just this homomorphic force, this mirroring hypothesis. Companies like Netflix, Amazon, and Spotify, where I worked, they deliberately structured their teams in a way that reinforced the architectures that they wanted to build.

Another way of saying it is that they had this structure that they wanted for their teams that ended up producing the optimal software structure to support them. So, when you think of these three companies, you tend to think about microservices. But all these companies have structured themselves in such a way that only microservice architectures really work, because they have lots of teams that are all working independently of each other. They talk to each other through interfaces. Well, the only way to make that happen is for each of these teams to own its own set of services, independent from each other.

You can think about it, and I’ve definitely seen this a lot, especially talking to different startups. When you’re a small company, you’re just getting started with a few developers, well, what do you build? You build a monolith, right? You have one team. A monolith is easier to deploy. A monolith is easy to understand. Everybody can share the same repo. Awesome. You have one team. One team builds one service. If your company’s successful, it starts to grow. Well, now you need two teams. The team has gotten too big. Well, okay, maybe you can both work on the monolith, but it starts to become difficult, just because you have to coordinate stuff together.

A great example, actually, of a company that has figured out how to build a monolith forever is Facebook, but they have to do a lot to work around these issues with Conway’s Law, including just inventing new computer science in order to make some of their software architecture work and support their model, because that model is so distinct from their software architecture. With a startup, you’re building and now you have two teams. Eventually, with that frustration of both working on the monolith, maybe you figure out how to split the monolith if you’re lucky; if you’re not, you continue to work with it, and you pay for that down the road. Then you have three teams, and you have four teams, and you have five teams.

What starts to happen is because your software architecture and organizational structure are distinct from each other, there’s a lot of tension between them. You broke the build, and now all of us can’t work. Those kinds of things start to happen, which is why you tend to see, as companies grow and they start with a monolith like everybody does, you see all these discussions all over the place of how do we break up our monolith, how are we doing that, because it is so disjoint from their organization that it becomes a problem in itself.

Travis:
So, you said they structure the team. I’d love to pick at that a little bit. Who inside the engineering hierarchy does the structuring?

What role is best suited to thinking about both organizational structure and its impact on architecture?

Kevin:
That’s a great question. So, I’ve seen this done in different ways. I think the way this works is more determined by the engineering culture of the organization, and less by having a specific role. So, I have seen companies have a software architecture group successfully, or a chief software architect who would take on this responsibility of deciding how systems should be structured. I have also seen this, and we did this at Avvo, where there was more of a consensus-driven approach for, well, how do we as an organization think we should do this, and we put structures in place to kind of make those collective decisions efficiently.

Spotify had a chief software architect, but his main role was more or less making sure good engineering principles were being used in making decisions, and asking developers tough questions when they wanted to diverge from things, but those structures and architecture really came very much from an individual team, and best practices coming through the organization. And I’ve seen it driven by somebody like me in the CTO role, as well. I think it’s really going to be determined by the company and how the company wants to operate.

The Inverse Conway Maneuver

Travis:
So, if you’re CTO and you’re looking at making a bunch of changes which are architectural, would you suggest leading with a team structure? Which comes first?

Kevin:
Okay, I get what you’re saying. Yeah, so, there are two ways to do this. One would be to embrace Conway’s Law and say, well, this is the team structure we have, so we should build an architecture that supports this, because if we build an architecture that doesn’t map to our team structure, it’s just going to cause us lots of challenges. The other way to approach it is to do what is called an Inverse Conway Maneuver, which is one of my favorite phrases in the software industry.

Travis:
I love that. What is that?

Kevin:
That is to say we want this software architecture. Like, pick the software architecture first, and then design the teams so they support that software architecture.

https://twitter.com/sarahmei/status/333636839451795456

Travis:
Give me an example of that. How would that actually work in practice?

Kevin:
So Spotify had a SOA approach from the very beginning, but it didn’t start with the team structure that people associate with Spotify. It started with a much more traditional kind of team structure. They changed the structure for many reasons, but one of them was that the software architecture and the team design were at odds with each other, so they restructured teams in order to match the software architecture they were trying to drive.

Travis:
What happened as a result?

Kevin:
Depending on what you think of SOA or microservices, massive success or massive failure I guess. So when I left, which was a couple years ago, Spotify had over 800 services in production. That’s 800 different services, thousands of instances running, of course, but there was more than one service per developer.

Travis:
Oh, wow.

Kevin:
Yeah. So, when I left, there were, I think, around 800 services and around 800 people in technology, and that also included testers, agile coaches, and other folks as well. It worked incredibly efficiently. Spotify is really excellent. I don’t know what Netflix’s number is now, but Netflix, around the same time, had the same number of services with fewer developers.

Travis:
How is that an instantiation of Conway’s Law? Is that consistent?

Kevin:
Yeah. Netflix, Spotify, and Amazon run their teams as independent units. Each of them is kind of working against its own set of goals and operating fairly autonomously. I think Spotify by far the most. That’s the critical part of Spotify’s way of working, but certainly, teams in Netflix and Amazon function independently of each other for the most part.

Whether that needs to be more than one service per team or not is questionable, except for if, at Spotify, or Amazon, a team gets too big and you want to split it in half. If that team is building its own mini-monolith, now you’re stuck with restructuring that monolith to split it into two pieces for each of the teams.

If you start with these much smaller building blocks tied to very specific functionality, then if a team splits, or a team divides up into 15, each of those services can travel with the new teams. That gives you a lot more flexibility, where you’re not having to do this continual software re-architecture to make your software and your team congruent.

Travis:
So, I guess I’m having a bit of a hard time envisioning what the organizational structure of an 800 services microservice team is. Does that mean the ownership can be passed along in a fairly facile way? Do teams own multiple services? How does that work? What is a team?

Kevin:
So, I know Spotify best, and Avvo as well. Avvo was also on this path but not as far along, so I can talk about it at Spotify. At Spotify, teams would own many services, sometimes on the order of a couple dozen services for a 10-person team. What that meant was each of these services would own a small piece of the overall product. There were a few large services, the playlist service at Spotify is fairly massive, but there were much smaller services as well; search suggest was its own service, so if you’re typing in the search bar at Spotify and you get the list of suggestions, that’s an independent service that runs, and it’s fairly small.

The beauty of these smaller services would be that, again, let’s say the playlist team decided to split into two teams, one just to do special things for podcasts, and the other one to continue to focus on music-specific applications. In that case, some of these services are already going to be structured towards podcasts, and those go with the podcast team, and the other ones stay with the music team. Now, instead of one team owning 20-some or 30-some services, one team owns 20 and one team owns 10. So, you can split them much more easily.
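As a rough illustration of the split Kevin describes, here is a minimal Python sketch. Everything in it is hypothetical: the `Service` record, the `focus` tag, the service names, and the 20/10 portfolio are assumptions chosen to match the numbers in the conversation, not a description of Spotify’s actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    focus: str  # hypothetical tag, e.g. "music" or "podcast"


def split_team(services, focus_to_team):
    """Assign each service to the new team that covers its focus area."""
    ownership = {team: [] for team in focus_to_team.values()}
    for service in services:
        ownership[focus_to_team[service.focus]].append(service.name)
    return ownership


# Hypothetical portfolio: 30 small services, 20 music-focused, 10 podcast-focused.
services = [Service(f"music-svc-{i}", "music") for i in range(20)]
services += [Service(f"podcast-svc-{i}", "podcast") for i in range(10)]

ownership = split_team(services, {"music": "music-team", "podcast": "podcast-team"})
counts = {team: len(names) for team, names in ownership.items()}
print(counts)  # {'music-team': 20, 'podcast-team': 10}
```

Because each service is small and already scoped to one area, the split is a reassignment of ownership rather than a refactoring of code, which is the flexibility being described.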

Also, let’s say that if you wanted to transition, because this did happen while I was there, the playlist service was actually owned by the infrastructure organization, just for whatever historical reason, and they were saying, “Well, we shouldn’t be owning this service, the product team should own it,” so they transitioned it over. Because this service itself, even though it’s one of the larger services, is still a relatively small amount of functionality, relative to the entire product, it was able to be transitioned from team to team, from completely different sets of owners, with a day or two of hand-holding and walking through the code, because it itself was relatively small.

Travis:
So, when services transfer like that from one team to another, is it typical for people to go with them? How is the domain knowledge piece handled?

Kevin:
I’ve done it different ways in different organizations.

I think the example I gave is interesting specifically because it was transitioning from a team in Stockholm to a team in Gothenburg. Sometimes you would do it with an actual person moving from team to team. Usually, that would be the case if that person was particularly attached to working on this problem; they might switch teams. But often, it wasn’t that. It was a handoff process where there’s sort of a sit-down, a walk through the service, from the owning team to the team that’s taking it over, to make sure all the questions are asked and you can talk through, well, we were thinking we were going to do this at some point, or this has been a challenge in the past, and you might want to look at refactoring this. Basically, do a knowledge transfer.

Then past that, you might have somebody rotated into a squad or a team for a couple sprints, kind of continuing their work, before they go back to their team. So, there are lots of different ways to do it, but essentially, it’s like a normal handoff process, and sometimes it means a person goes as well. It depends on the person.

Travis:
How important is the notion of ownership, functional ownership, with regard to organizational and architectural design?

It sounds like it’s fairly integral to a lot of what we’ve been talking about. Is a strong notion of ownership a key piece of this?

Kevin:
That’s actually a really good question, because I think we didn’t talk at all about functional structure in this case. So, I have been in organizations where the functional silos are distinct. At Microsoft back in the ’90s when I was there, product management was its own thing, building the specs and handing them off to development; we’d build them, and then we’d hand them off to test, who would test them. So, these functional silos tended to create their own kind of Conway’s Law problems, not because of the software architecture, but because of the structural aspects of this, where the spec was built with some technical input but really driven outside of the engineering team.

So, the product people and the program managers would build something that made sense from a user perspective but made no sense from an engineering perspective, because the way the software was architected would sometimes require massive re-architecture of components to make it work in this way, which would then delay the release, which would get it to testing late. Whereas a cross-functional team, like a lot of companies have now, tends to not really have that, because everyone’s part of that decision-making process. That’s just something that I geeked out on for a second. Sorry. I totally lost your original question.

Travis:
Yeah, but I love it. I’m actually going to follow up on that one.

Kevin:
Okay.

Travis:
It almost sounds like you’re suggesting a difference between microservice and micro monolith.

Kevin:
So, I’ve gotten really careful about the “microservice” word, partially because I don’t know if we all agree what that is.

Travis:
Probably not. What do you think a healthy definition of microservice is?

Kevin:
In this context, I am scared to actually say it, because I don’t want to hear what people are going to be yelling at their phones when they hear me say it. But, I think for me, this is why I said, well, SOA or microservice: based on your definition, whatever you all have in your head is probably right.

The way I tend to think about it, and where I’ve seen it actually be very valuable is a service that’s small enough that it’s easier to rewrite than it is to refactor.

Travis:
I love that.

Kevin:
If you think about it, there’s a lot of overhead involved. That would mean you have lots of services, which means you have to have an infrastructure that supports having lots of services. I think when we were really talking about microservices a lot, it was before serverless. Serverless now is essentially microservices taken to the extreme, or taken to its next logical step. What I liked about the way we would structure services at Spotify was that it avoided the problems I had seen when building larger services: if we have a new API version, now we have a bunch of stuff in the code. Well, if you’re using this version of the API, we’re going to do this. If you’re using this version of the API, we’re going to do this slightly differently, and it becomes a little bit of a business logic rat’s nest sometimes in the code, supporting all these different API endpoints.

Instead, what you could do is say, no, we wrote this service to support version six. Well, now we’re doing version seven. We’re just going to write a new service because it’s small enough that it is a matter of a sprint versus six months. If we do that, the logic is now very dedicated: we are doing this the way this version of the API works. We have clients continuing to talk to the old version of the API. We can actually watch that. So, in a company like Spotify, where you have external API folks, hardware talking to specific versions of the API, things like that, you really can’t retire old APIs ever. But, you can watch usage. So, how many times is version six of this API being hit?

There’s a threshold below which you can say, well, now we can finally retire this, and just shut the service down. The implementation of the service for version seven is still running. If instead you had one service talking to both endpoints, you’ve got all this code in that service to support the differences, however slightly different they are from each other. When you can say, well, we can turn off version six, you still have all this code that you’re either going to have to refactor or just live with as kind of crufty, dead code, which produces bugs, which produces quality issues, which may produce scaling issues.

So, being able to say, no, we’re just going to have a new service here rather than continue to work on an existing service is really powerful. It’s also easier to test. There’s a lot of benefit to that. So, that’s what I think of when I think of microservices.
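The “watch usage, then retire below a threshold” idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the request log format (one API version number per request), and the threshold value are all hypothetical and not taken from the conversation.

```python
from collections import Counter


def versions_safe_to_retire(deployed_versions, request_log, current_version, threshold):
    """Return old API versions whose hit count in the log is below the threshold.

    Each deployed version is assumed to be served by its own small service,
    so retiring a version means shutting that one service down.
    """
    hits = Counter(request_log)  # a version with no requests counts as 0 hits
    return [
        version
        for version in deployed_versions
        if version != current_version and hits[version] < threshold
    ]


# Hypothetical traffic sample: one entry per request, holding the API version hit.
request_log = [7] * 950 + [6] * 48 + [5] * 2

retirable = versions_safe_to_retire(
    deployed_versions=[5, 6, 7],
    request_log=request_log,
    current_version=7,
    threshold=10,  # assumed cutoff; in practice this is a judgment call
)
print(retirable)  # [5]
```

In this sample, version six still gets 48 hits, so its service keeps running untouched, while the version five service can simply be shut down with no dead code left behind in the others.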

Conway’s Law in cross-functional teams

Travis:
Awesome. So, do you see Conway’s Law playing out with cross-functional teams? It seems to be one of the main trends in the industry right now, to cite this notion of cross-functional teams. Do you see that playing out with regard to Conway’s Law, and if so, how?

Kevin:
So, when I think of teams, and this is not just my Spotify bias but also some of the things that I was working on before and since, what I think of when I think of effective teams is that they’re components that are able to work independently of each other, and that to me is a highly desirable thing for a software engineering organization. The teams don’t have a lot of strict dependencies between them, because that tends to slow them down.

So, that maps very well to Conway’s Law: if these teams are independent and, by design or organically, they each own their own modules, that builds the software architecture, and it’s great. Cross-functionality comes back into it with the idea that these teams can operate independently of each other, and so a cross-functional team has far fewer dependencies on centralized organizations. So, when I build teams, I like to have, if I can, a designer. If the team owns UI, I want to have a designer who sits on the team. That’s because I’ve been in organizations where design was centralized, and if we needed some help with some UI, I had to sort of fill out my request on paper, in triplicate, to a centralized design organization so they could give me four hours of a designer’s time, nine days in the future.

So, having that cross-functional team means that that team can operate independently and doesn’t have to wait for things, which I like.

From monolith to microservices

Travis:
All companies are born tiny and monolithic. At what point is it time to start thinking about either breaking that up or doubling down? Where does the pain start to hit, and what does that feel like?

Kevin:
Well, I think that it’s going to depend a lot on your budget and your ability to hire, and a lot of things. In a company with the ability to hire fairly well and a budget to be able to staff up, it’s fairly easy to build these kinds of independent autonomous teams that are fully cross-functional. Spotify is a great example. A lot of companies don’t have that. That’s life. You have enough budget for three designers, but you have six teams. So, in that world, you’re going to have to figure out a different structure, but still, you can do this in an acknowledgment of looking for where the bottlenecks are and then just trying to get rid of them.

So, there’s one thing I like to think about here. When we think cross-functionally about software teams, we tend to think front-end dev or full stack dev and back-end dev, or maybe data, or design, test, all those different kind of traditional functions, but data is a new one. Data engineers and data scientists were so rare, and so hard to find, that you were only going to be able to have one or two for your organization. So, then you couldn’t distribute them, you couldn’t have truly cross-functional teams.

But, you would start with that, where they’re on the projects that are most critical, or they’re going team to team and helping where they can. You as an organization get better at utilizing those folks, you hire in more, and eventually you reach sort of critical mass where you no longer need to centralize it; you can now distribute that function, which brings that knowledge to the teams, just like we did with test, just like we did with product or program management. That got distributed, and the value to the team of having these different functional voices represented was pretty significant, comparing how things worked before and after.

So, that’s the way I like to think about any new function in how to distribute it. You have to start from a centralized kind of timeshare model until you develop that critical mass where you can distribute it, but that should always be your goal.

Travis:
The end game.

Kevin:
That should always be the end game, yeah, and I’m fairly opinionated. You can hear about software architecture and the way that I like to see it structured and the way that I’ve seen it work well. You can apply these, and the same ideas are always going to be there, no matter what your preferred software architecture. I think the critical piece there is just understanding that, and taking advantage of that, or just working with that understanding.

Travis:
So, it sounds like the suggestion here is that almost like a temporary embed is better and sort of more in line with Conway’s Law than these fractional units of someone’s time, like a half day here, six hours here. Is that fair?

Kevin:
Yeah. I don’t know about Melvin Conway, but that’s what I’m going to say.

Travis:
Sure.

Kevin:
That’s what I’m going to say for him. I think that’s also just for me a level of efficiency. If you’re working on something, and that something requires a skill set, and that skill set is a distance from you in the organization, it’s very hard to get any kind of quick response. If I have to apply for a data scientist’s time to help with this problem, and that person isn’t available to me because they’re working on something else for a week, I’m now stuck for a week. If that person instead spends two weeks sitting with me and my team, where we can have very good interaction and I can ask all my questions as I come up with them, I’m going to have a lot better context where I may not need as much of their time.

Travis:
Do you see any sort of common Conway’s Law anti-patterns that are recurrent in the industry?

Kevin:
That’s a good question. I haven’t thought about that.

Travis:
Yeah, or things where you come in and you’re like, well, that is a classic Conway’s Law style problem.

Kevin:
One of the things I see from time to time, and by the way, this is how I structured my organization at Adobe, is a pretty common pattern where we have the server team, then we have the client team, then we have the core library team, and these are all separate teams. Then people will come to me and say, “Hey, Kevin, can you do a call with us and help us with some challenges we’re having in the organization?” Okay, yeah, what’s going on? “Well, everything’s taking so long,” because every new feature comes from the client team, and then they have to request the functionality from the core library team, which has to request the functionality from the server team, and this organizational structure is very much mapping on to the layer cake of our architecture. That’s just slowing all product development down because we’re having to coordinate across these different silos. That’s a very classic example.

That is how we structured teams for the longest time. You put all the server developers together, and you put all the core together, the piece that talks to the server and supports the functionality to the different clients, and then the client teams. That was a very obvious way to structure your organization, but it was an obvious way to structure your organization because that was how you divided the work on the architecture. This is another case where moving from a monolith approach to an SOA approach lets you actually say, well, no, the services that support this set of functionality are going to live in the team that owns the front end. This is where we get into the full stack developers and the full stack teams that completely own the feature. The feature teams model is sort of a solution for the anti-pattern, but that pattern being an anti-pattern is relatively new.
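One way to picture the coordination cost of the “layer cake” structure is to count how many team boundaries a single feature crosses. The sketch below is a toy model only: the layer names follow the conversation, but the team names, the `cross_team_handoffs` function, and the idea of counting boundaries as a proxy for delay are assumptions made for illustration.

```python
def cross_team_handoffs(layers_touched, layer_owner):
    """Count the team boundaries a feature crosses as it moves down the stack."""
    owners = [layer_owner[layer] for layer in layers_touched]
    # A handoff happens wherever two adjacent layers belong to different teams.
    return sum(1 for upper, lower in zip(owners, owners[1:]) if upper != lower)


# A feature that needs changes in every layer of the stack.
feature_layers = ["client", "core", "server"]

# Layered organization: one team per architectural layer.
layered_org = {
    "client": "client-team",
    "core": "core-library-team",
    "server": "server-team",
}

# Feature-team organization: one (hypothetical) team owns the whole vertical slice.
feature_org = {layer: "playlist-feature-team" for layer in feature_layers}

print(cross_team_handoffs(feature_layers, layered_org))  # 2
print(cross_team_handoffs(feature_layers, feature_org))  # 0
```

In the layered organization every full-stack feature pays for two cross-team requests, while the feature team pays for none, which is the slowdown described in the anti-pattern.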

Travis:
Right. I love that you brought up the full stack developer.

When you’re putting together these cross-functional teams, is that a skillset that you look for, or is it more like a myth that people got, because they want this sort of vertical feature ownership? What’s your view on that?

Kevin:
I’ve worked with a lot of skilled full stack developers. I tend to think, and maybe this is my own bias as a developer or having been a developer for a number of years, I think that we all can do lots of stuff because we’re interested in lots of things and we can learn lots of different skills, but I do think that we all naturally have an affinity towards one side or the other. I don’t think anybody’s a pure full stack developer that’s amazing at building services, and also amazing at crafting user interface. I think you tend to have a proclivity on one side or the other.

Just acknowledging that, there are folks that do full stack, and they love building the backend, but only so they can make a better front end. And then folks who love full stack because they love building the backend, and they can do the front end too if they need to. So, when you build a team, I like building with full stack devs, but I’ll mix it up. People will tell you, well, I like both, I can do both, but I prefer one or the other, and then try and match them up. It’s this idea of the T-shaped person. I can do machine learning, like simple machine learning, and I can also do iOS development, and I can also do this, but I’m a real expert at writing efficient database code. That’s my depth. Then mixing all that up in a team so you get a nice breadth and depth is a good skill to have, or a good way to produce good teams.

Travis:
With the onset of a new software project, is it inevitable that it starts as a monolith?

Kevin:
It isn’t inevitable; it’s just natural. So, you can either choose to do that or choose not to do that. I think when you’re building a new product, and maybe it’s different now, one thing is I tend to see this a lot in these companies that are going through this now somewhat classic monolith to microservices transition, is these are all companies of a certain age, primarily, who sort of came of age in the days before public cloud, when you were still racking your own servers.

In that world, having a deployment that consists of a single service, that’s pretty awesome, and saves you a lot of time at the beginning. Nowadays, companies forming are much quicker to embrace that sort of smaller service approach, because they can leverage all the infrastructure that the public clouds give them, and they don’t have to build as much of a capability around a lot of the infrastructure support you need.

Now, between Docker, Kubernetes, and serverless, and those kinds of pieces, and CloudFormation or whatever, it’s much, much easier to build a microservice architecture or an SOA architecture from the beginning than it used to be. I don’t know if it’s just because I tend to talk to a lot of companies in that 8-12 year range who started with a monolith, because they were running out of their own colo, and buying machines, and this was the easiest way to do it. I don’t see that as much anymore.

Travis:
For teams that are in that transition from a big monolith to microservices, where they’re breaking up the app in some fashion, how do you go about maintaining the architectural integrity of the product as teams become larger and then split, and the architecture splits?

Kevin:
Yeah, if I could solve that problem in a sentence, I should write a book and solve this problem. There are so many companies struggling with this, including when I came to Avvo, they had started this transition and were struggling with it just like everybody else. And I would love to say to people listening, oh, well, all we did was this, and we totally solved it, but no, we continued to struggle with it the whole time I was there. We made a lot of progress, but it’s a challenge.

There are a few patterns I’ve seen that I find really interesting. One I heard about was what they did at Warby Parker, where rather than start splitting individual pieces of functionality out into their own services, they split their monolith into four smaller monoliths, and then split each of those monoliths into four smaller monoliths as a different way of approaching the problem. Finding those cut lines was critical, obviously. That let them tease it apart, I think, in an easier fashion, but that doesn’t work in correspondence with Conway’s Law necessarily.

Travis:
With these splits, how do you know if you’re dividing things up into monoliths versus microservices? I know it sounds like a silly question, but there’s so much difference in how we use terms in the industry.

Kevin:
Right, and that’s why I’m trying to be really careful about how I use the word microservice. What I would say, again, is that the way we approached it at Avvo was the approach I think a lot of companies take, which was, we found individual pieces of functionality in our monolith. So, single APIs. Here’s an API on the monolith (Avvo is a legal marketplace) that returns this set of information about an attorney. Okay, so what we can do is put up a new API, a new service that just calls through to the monolith, and then we will extract all the logic out of the monolith and put it into this service, and then it stops calling the monolith. This functionality is removed from the monolith. It’s running in its own service.
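The extraction steps described here can be sketched roughly in Python. This is only an illustration under stated assumptions: `monolith_get_attorney`, `extracted_get_attorney`, and `AttorneyService` are hypothetical names, not Avvo's actual code, and the "calls" are plain function calls rather than network requests.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def monolith_get_attorney(attorney_id: int) -> Dict[str, str]:
    """Hypothetical stand-in for the attorney lookup that lives in the monolith."""
    return {"id": str(attorney_id), "name": "Example Attorney", "served_by": "monolith"}


def extracted_get_attorney(attorney_id: int) -> Dict[str, str]:
    """Hypothetical version of the same logic after it is moved into the new service."""
    return {"id": str(attorney_id), "name": "Example Attorney", "served_by": "service"}


@dataclass
class AttorneyService:
    """New service that clients call instead of calling the monolith directly."""

    backend: Callable[[int], Dict[str, str]]

    def get_attorney(self, attorney_id: int) -> Dict[str, str]:
        return self.backend(attorney_id)


# Step 1: the new service only calls through to the monolith.
service = AttorneyService(backend=monolith_get_attorney)
before = service.get_attorney(42)

# Step 2: the logic has been extracted, so the service stops calling the monolith.
service.backend = extracted_get_attorney
after = service.get_attorney(42)
```

Clients keep calling `service.get_attorney` throughout, so the switch from step 1 to step 2 does not require them to change.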

To the extent that, again, like the way I think of microservices is it’s easier to rewrite a new one than refactor, that tended to be fairly small. When I say rewrite, it’s a sprint to rewrite. That’s certainly where we were with a lot of services at Spotify. When we started, or by the time we were done, or by the time I left at Avvo, that’s where we were now, too. It was much easier to write a new service than it was to add functionality to the monolith. That let us pull stuff out of the monolith on a much better basis; so, small.

Travis:
So, as the code breaks apart a little bit, who helps manage the communication between dependencies that might still exist there?

Kevin:
One of the things I like, and with the right kind of developer culture you have in your teams, works pretty well. When multiple teams are all kind of working together on a monolith, there has to be some level of communication. They can all be working on their own things, but in the end, this thing is going to get deployed as a unit, and when it goes down, somebody’s going to get called, and they’re going to have to respond, and it’s not going to be obvious whose code broke. So, you have to share knowledge across a team.

It doesn’t change that much when you’re doing this transition. So, a team will say, hey, we’re going to pull this code out of the monolith. This is the kind of functionality that maps to what our team does, so we’re going to move this out into its own service, and they just make the rest of the folks working on the monolith aware of that. Usually, you might actually have, if other teams have some sort of visibility into that part of the code, having them do code reviews. Generally making sure there are just people that are aware of what’s going on.

I think part of this, as well, is being careful about overstating code ownership. So, if you say, well, I own this code, or my team owns this code, but other people have dependencies on this code, you put yourself in a situation where if team B needs something in a service that’s maintained by team A, and they have to make a request to team A, which has to be prioritized against whatever else team A is doing, which means team B is now waiting for who knows how long. I’ve seen that a lot. I don’t like that.

So, instead, I like to say, if team B needs something in a service that team A is nominally, like, operationally responsible for, team B lets team A know, “Hey, we’re going to make this change,” just in case team A says, “Well, hey, we were just about to get rid of that service,” or something, just so you don’t have miscommunications there. Team A says, “Okay, go ahead and make your change,” and team B makes the change in team A’s code. Team A code reviews the change, accepts it, and then deploys it.

That operational responsibility piece is pretty key because, again, somebody’s going to get woken up when something doesn’t work, but I don’t like having teams be dependent on each other for changes to the code. So, I’d rather have everybody own the code, with operational responsibility clearly delineated, which requires pretty significant communication.
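As a rough sketch of that workflow, the rule "anyone can change the code, but the operationally responsible team gets a heads-up and approves the review" could be modeled like this. The class names, field names, and team names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Service:
    name: str
    operator_team: str  # the team that is on call for this service


@dataclass
class Change:
    service: Service
    author_team: str
    operator_notified: bool = False
    approved_by: Set[str] = field(default_factory=set)


def ready_to_deploy(change: Change) -> bool:
    """Any team may author a change, but the operator team must know and approve."""
    outside_author = change.author_team != change.service.operator_team
    if outside_author and not change.operator_notified:
        return False
    return change.service.operator_team in change.approved_by


# Team B changes a service that team A is operationally responsible for.
profiles = Service(name="attorney-profiles", operator_team="team-a")
change = Change(service=profiles, author_team="team-b")
blocked_without_heads_up = ready_to_deploy(change)

change.operator_notified = True   # "Hey, we're going to make this change."
change.approved_by.add("team-a")  # Team A code reviews and accepts it.
allowed_after_review = ready_to_deploy(change)
```

Team B never waits on team A's backlog here; the only gates are the heads-up and the review.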

Travis:
It sounds like it borrows a little bit from some open source practices. You’re welcome to come change it, but we’re going to review it and make sure it passes since we’re the owners of the project.

Kevin:
That’s exactly where I’m inspired. Yeah. It’s just how open source projects work.

Influencing communicating structures

Travis:
Is most of that style of communication typically done ad hoc, in places you’ve worked?

Kevin:
Yeah. Well, not necessarily in places I’ve worked, but in places that I’ve had the ability to influence and/or change how it works, yeah, it is. So, having worked in very structured environments where communication flows only through very specific channels, places that are not that distinct from where Conway was working, I always bristled at that because the communication between teams was really tightly controlled and structured, and it removed a lot of the important kind of co-ownership of things.

So, in organizations where I’ve been able to influence and/or direct how things should work, yeah, I prefer this developer to developer, more ad hoc communication.

Travis:
The channel for that communication is typically pull requests, or what are the mechanics behind that?

Kevin:
It can be any number of things. It can be pull requests, or it can be Slack channels. Spotify has this notion of guilds, where folks who work in a skill set will meet together across all the different teams on a regular basis to discuss best practices or challenges they’re seeing. We had, at Avvo, a weekly meeting for whichever developers wanted to show up, an open invite meeting to talk about what we called tech strategy, where people could say, “I’ve got this problem, I’m thinking about solving it this way, what does everybody else think?” That ended up being a good way to share developer to developer, and raise issues that maybe one developer was seeing on another team that another developer on a different team had just solved.

There’s also, in matrixed organizations like Spotify, and like Avvo became, you also have immediate peers in the management structure who span multiple groups and can share information that way as well.

Travis:
So, setting up and establishing those communication structures when they may not exist, is that just sort of something that you do by fiat? How does that come to be?

Kevin:
So, there’s a combination of nudging. I like to manage by nudging, primarily, and less by fiat, because by fiat means that people tend to resent somebody who just says, “Do it this way because I want you to do it this way.” So, instead, I kind of manage a lot more through nudging, like “Hey, maybe we should have something like this, or maybe this might work.” Then try and get somebody excited about it and have them take it and drive it themselves if I can get somebody excited about it.

I think these kinds of ad-hoc communication structures work much better if it is not something that management imposes but something actually the developers themselves build and maintain. So, that meeting that I mentioned at Avvo, that was started by one of the developers, and run by the developers themselves. I would come, and would sometimes speak if I had an opinion, but it wasn’t my meeting. I was just there to participate. That meant that they could raise the issues that they had as opposed to me setting an agenda.

So, [inaudible 00:54:08] definitely see this. Slack’s the great equalizer, and Slack’s been really great for this, or IRC channels before it. It’s essentially taking those things where people are already self-organized and just giving it more support to take that beyond the virtual.

Travis:
So, one of the things I always find fascinating about Conway’s Law is it’s specifically about the communication structure.

Kevin:
Right.

Travis:
Which, of course, follows org structure and all that. Do you think there’s still a place for regular formal meetings, and how do you approach that kind of thing?

Kevin:
Yeah, I always tend to think of things in terms of communication pathways within an organization, because that really tends to influence a lot beyond software architecture. Who talks to whom, and how information moves through an organization, shows a lot about how that organization gets its work done, with software architecture being just one part of it.

So, in an organization I lead, sure, you’re still going to have these formal meetings, some of which are going to be me communicating to my team because I have access, spreading new knowledge that I have about what’s going on to them. I think that’s still valuable, but I also try and encourage these person to person kind of informal meetings or formal as well. One on ones, peer to peer one on ones, peer to peer mentoring, team meetings. These kinds of community of interest meetings, just as a way of trying to break through the different silos that you have.

So, Conway was really reacting to how the silos of the organization produced sometimes inferior software architectures, but they were mapped to those silos. I like breaking those apart and finding ways to subvert the siloization as much as possible. Even in a company like Spotify, where you tend to think of it as this really autonomous and all this great stuff, there’s absolutely communication flows and networks, and those are influencing the structure … influencing lots of different things.

That’s why you have these elements like guilds, because the tribes at Spotify, the squads were their own silos, and they weren’t functional silos, they were feature silos, and then the tribes were silos, and this is where those guilds could break through those things and finding other conduits, like Spotify has a strong agile coach population, and they become their own functional network that transcends all the different silos.

As a leader, you learn how to take advantage of these different networks to spread information but also to glean information.

Travis:
You even said you like to encourage these kinds of alternate communication channels. What are the mechanics behind that? It’s sort of like the inverse nudge. What does that support look like? Is it sponsoring lunch? Just nuts and bolts.

Kevin:
Certainly, it’s stuff like that. It’s encouraging people who will come to you and say, “Hey, I’d like to do a lunch and learn on this,” like, yeah, absolutely, I’ll pay for lunch, go for it. It’s setting up meetings. So, one thing I did at Avvo, based on my experiences at other places, was a kind of a yearly mini developer conference. I did one for data and then one for product development. It was an open invitation, it was a day, there was food, it was an un-conference style, so it wasn’t me driving an agenda, but was an opportunity for people to speak to each other, and start to build correspondences and networks more organically rather than creating it. So, finding opportunities to bring folks from different teams together, and find ways to get them speaking to each other would tend to build these networks in a very kind of organic and loose way.

We also did things at Avvo, like build a mentoring network, with a very explicit sign-up both to mentor and, if you wanted, to be mentored. So, it was a two way thing. That helped build correspondences. You can also do this really effectively if you have an organization that’s more self-organized. So, there’s a company in Seattle called Qumulo, which builds storage appliances for enterprises, and they do a periodic restructuring exercise, where it’s essentially like a job fair, and everybody can switch teams. That tends to produce an organization where your network spans lots of different teams within the company.

Another thing that companies do, which was really valuable when I first did it at Spotify, is these onboarding teams. I know lots of companies do this, where you join the team when you start, and you work on that team for a couple weeks or a month or something, with people who are moving on to all different teams. That kind of kick starts a network for you within the company, where you know people in lots of different parts of the organization. That is another way of breaking through silos.

Travis:
So, how do you know if it’s time to really focus on this in a meaningful way? What are the signs that you may have a communication bottleneck?

Kevin:
I think one of the things I will do periodically … well, let me step back. I hear this a lot from companies where they’ll call me and they’ll say, “Hey, can you dial into this meeting? We’ve been really challenged by our ability to execute. We’re moving way too slow.” I hear this so often, we’re moving way too slow, and we don’t understand why, and things should be going a lot faster. One of the things I’ll say around that is how are you actually mapping how stuff moves through the organization? I have some exercises I’ve done within my own organization. When you feel like something should be going faster than it is, and there are tools that let you measure things, but I tend to go by how is it actually feeling.

You do an exercise where you spend a couple weeks and you track work through your teams. So, we’re building this new feature. Okay, well, this team needed to do this, and then it goes to this team, and this could even be within a single team, but it’s going from this person to this person, then it waited, just a total Kanban kind of view, to track work through it. If you do that for a few weeks, you’ll start to see patterns. Oh, well, everything seems to get stuck here. Things get stuck other places, but this seems to be the primary bottleneck. That’s a primary opportunity to look at it and go, okay, well, is this team understaffed, does this team need to be a separate team? That’s where you can start looking for options.

If you just ask people, why is stuff slow, they’ll all tell you different things, because none of them has that complete picture of how work moves through. So, each person in the organization is going to identify different areas as what they think the problem is, but they don’t have the broader view, so actually mapping out how the work moves through the organization, actually tracking individual pieces of work and finding where they get stuck, that gives you a lot better insight.
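The tracking exercise can be as simple as a spreadsheet, but as a minimal sketch, here is how a hand-collected log of handoffs might be summed up to show where work sits the longest. The team names, work items, and dates are made up for illustration, and the log format is an assumption rather than a tool mentioned in the conversation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical log: (work item, team holding it, date received, date handed off).
handoffs = [
    ("feature-1", "client", date(2018, 1, 1), date(2018, 1, 3)),
    ("feature-1", "core-library", date(2018, 1, 3), date(2018, 1, 12)),
    ("feature-1", "server", date(2018, 1, 12), date(2018, 1, 15)),
    ("feature-2", "client", date(2018, 1, 2), date(2018, 1, 4)),
    ("feature-2", "core-library", date(2018, 1, 4), date(2018, 1, 14)),
    ("feature-2", "server", date(2018, 1, 14), date(2018, 1, 16)),
]


def days_held_per_team(log):
    """Sum the number of days each team held work items before handing them off."""
    totals = defaultdict(int)
    for _item, team, received, handed_off in log:
        totals[team] += (handed_off - received).days
    return dict(totals)


totals = days_held_per_team(handoffs)
# The team holding work the longest is the first place to look for a bottleneck.
bottleneck = max(totals, key=totals.get)
```

A total like this only points at where to look; whether the team is understaffed or should be split is still a judgment call.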

Travis:
I love that. All right. Well, is there anything we haven’t discussed yet that we should have?

The Homomorphic Force

Kevin:
Oh, I think we got to talk about one of my favorite parts when I think about Conway’s Law, which is how it actually maps to your brain.

Travis:
Ooh, I love it, all right.

Kevin:
Which I didn’t get to talk about. I want to talk about that.

So, one of the things that’s kind of interesting is that I originally gave a talk at the O’Reilly Software Architecture Conference a little while ago specifically about this correspondence between software architecture and organizational design. As part of this, and just kind of doing more research and trying to learn more about it so I could speak about it, I talked to Britt Andreatta, who is a sort of organizational sociologist, but also has done a lot of work looking at the brain and how your brain responds to different inputs from the organization. I was explaining to her what Conway’s Law was and some of the things that I had seen, and she actually brought up an important thing that I had never really thought about, and then I studied it a little bit more and was really fascinated by it.

There’s a part of your brain in the hippocampus called the entorhinal complex. That part of your brain is where you store all your maps. So, it’s your physical maps, it’s how you know to get from your house to the bus and the bus to the office. You can kind of do that on an autopilot after the first couple times because that map is stored in your brain. That also tends to contain the maps of the social structures that you operate within as well. Your organizational structure in the company you work for is also stored in the entorhinal complex, and this software architecture that you work on, that is very much a map.

This homomorphic force I mentioned before, this mathematical thing, the mirroring hypothesis, Conway’s Law, essentially what that’s saying is you have this strong force that wants these two different graphs, let’s say, to be similar because that makes mapping easier. So, you can think about this in this context. You’re working on a software project. Let’s say you’re in a more traditional structure. You’re working on the backend. You’re a user of your product, and you see a bug in the Android client, and you want to say, oh, I should tell that team that I found this bug. I should give them the bug. Then you have to think about, well, who’s on that team? Who do I send the bug to?

The fact is, okay, you have this software architecture because you know this software architecture in your head. If the organizational structure maps to that software architecture, clearly you go, “Oh, I send it to the Android team.” That’s pretty obvious. If, for example, your software architecture and your organizational architecture are distinct, like the Android client is actually owned by the, I don’t know, web team, then you actually don’t know. Well, I know there is an Android client, but there’s no team that maps to this. Which team owns this? It’s a perennial problem. Who’s supposed to get this bug? I don’t know who owns this thing.

So, this is one of these cases where this homomorphic force, this entity, where these two maps want to be similar in your entorhinal complex, this actually leads you back into Conway’s Law, and it makes things a lot easier. So, this is one of these cases where you ask, is it a law, and the answer is no, of course, it’s not a law. No one’s going to arrest you for building a software architecture that is not congruent with your organizational design. However, if you choose to ignore it and build these two things in very different ways, nobody’s going to have any clue who to talk to or who’s responsible for anything, and there’s going to be continual pressure to make these things be the same because your internal maps don’t match up at all.

You’re literally fighting your own brain here, and actually, coming to that realization started to make me understand why trying to avoid Conway’s Law or subvert Conway’s Law actually was such a problem. Your brain is actually going to fight you on that one.
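In graph terms, a homomorphism is a mapping from one graph to another that preserves edges. A loose sketch of that idea for this setting: map each component to the team that owns it, and check that every dependency between components lands on either a single team or two teams that actually talk to each other. Treating "same team" as acceptable is a simplifying assumption, and all component and team names below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def mismatched_dependencies(
    dependencies: Set[Edge], owner: Dict[str, str], team_links: Set[Edge]
) -> List[Edge]:
    """Return component dependencies with no matching team-to-team channel."""
    mismatches = []
    for a, b in sorted(dependencies):
        team_a, team_b = owner.get(a), owner.get(b)
        if team_a is None or team_b is None:
            mismatches.append((a, b))  # nobody knows who owns one side
        elif team_a != team_b and not (
            (team_a, team_b) in team_links or (team_b, team_a) in team_links
        ):
            mismatches.append((a, b))  # the owners have no channel between them
    return mismatches


dependencies = {("android-client", "core-library"), ("core-library", "server")}
owner = {"android-client": "android", "core-library": "core", "server": "backend"}

# Org structure mirrors the architecture: every dependency has a channel.
aligned = mismatched_dependencies(
    dependencies, owner, {("android", "core"), ("core", "backend")}
)

# The android and core teams never talk, so one dependency has no channel.
misaligned = mismatched_dependencies(dependencies, owner, {("core", "backend")})
```

An empty result means the two maps line up; anything in the list is a place where people will have to stop and ask who to talk to.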

Travis:
Yeah, you’re referencing the same map.

Kevin:
Exactly. When they’re different maps, you can’t map between them. There’s no linear mapping from component to component, and it really confuses you and makes it very hard for you to do your job.

Travis:
I love that. All right. Well, I think that’s a wrap. Thanks for joining us today, Kevin.

Head over to SE-Radio to catch the full episode. And be sure to subscribe to stay up to date on all things software engineering — each episode is either a tutorial on a specific topic, or an interview with a well-known character from the software engineering world.

 


 

Brook Perry

Brook is a Marketing Manager at GitPrime. Follow @brookperry_ on Twitter.
