Evaluating Employees in Product Design & Development Roles
Results. Metrics. Impact.
When deciding how to evaluate employees, these are often the things companies land on. It makes sense on its face. If a company’s goal is to, say, grow its customer base from X to Y in 12 months, what better way to align employees to that objective than to try to directly measure their contribution towards it? You worked on Project A and it singlehandedly got the company 20% closer to its goal? Congrats, you are judged to be a successful employee, and you will likely enjoy everything that goes along with that.
But what if you worked on Project B — not even by choice but because you were assigned it — and it ended up being a failure? Your results were terrible, you didn’t move metrics, and your project had no impact. Then what?
Welcome to the controversial world of employee evaluation in product design & development.
I want to cover five aspects of this subject, and in doing so, provide an outline for others to use as they seek to improve processes at their own companies:
- Why this matters
- Decision quality vs. outcome quality
- The four pillars we use(d) at Twitter Design and why (optional past tense because I left my role as Head of Design at Twitter earlier this year)
- How to measure success
- Putting your own plan into action
Why this matters
Treating designers, engineers, PMs, researchers, and any other employees fairly is an unspoken goal across almost every tech company. The only controversial part is how to actually make that happen. To do that, we have to recognize what a job in product development entails: working well with others, taking risks, being creative, and embracing high-judgement failure.
Product development is a lot more like poker than chess. As an employee, you aren’t always in control of the hand you’re dealt, the other people at the table, or the unpredictable ways the process unfolds. It is your job to behave in a way that maximizes the chances of success. If you’re a designer, this may mean prototyping an interaction an exhaustive amount of ways. If you’re an engineer, this may mean writing throwaway code so you can test an approach before investing too much in it. If you’re a PM, this might mean yielding to your engineers and designers on something in the name of keeping team energy high.
In chess, a great player beats an average player 100 times out of 100. In poker, however, the “best” player loses all the time. The difference between good and bad poker players is how they behave. How they behave — or the quality of their decisions — is the difference between winning and losing over the long haul.
Decision quality vs. outcome quality
There is a phenomenon in tech called easy-to-measuritis, which says that we tend to concentrate on the things we can easily measure rather than the things that are most important. For example, the best thing for our business may be happy customers, but in order to measure happiness, we may have to build a ton of unwieldy survey infrastructure, so instead we just measure bodies coming through the door and use that as a general proxy for happiness. What then happens is we build things to optimize bodies coming through the door, and we move that number whatever way we can, perhaps even to the detriment of customer happiness.
The traditional way of evaluating employees — based on things like results, metrics, and impact — is just another manifestation of easy-to-measuritis. We want objective, binary ways of evaluating people so that they are uncontroversial and unassailable, but what we end up with are objective, binary ways to measure the wrong things, or at the very least, things that employees are not in direct control of. Instead, we should measure decision quality, not outcome quality. After all, how we behave is always 100% within our control.
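To make the poker analogy concrete, here is a minimal simulation sketch. It assumes a hypothetical "good decision" that succeeds 60% of the time (the rate is illustrative, not from the post): over a short stretch the outcomes are noisy and a sound decision-maker can easily end up in the red, but over the long haul decision quality dominates.

```python
import random

def run_outcomes(p_good_outcome, n_decisions, seed=0):
    """Simulate n_decisions identical decisions, each succeeding with
    probability p_good_outcome. Returns net wins minus losses."""
    rng = random.Random(seed)
    return sum(1 if rng.random() < p_good_outcome else -1
               for _ in range(n_decisions))

# A hypothetical "good" decision: 60% chance of a good outcome.
# Over a short stretch, the result is noisy -- a sound decision
# can still look bad.
short_runs = [run_outcomes(0.60, 10, seed=s) for s in range(1000)]
losing_short_runs = sum(1 for r in short_runs if r < 0)

# Over many repetitions, decision quality wins out.
long_run = run_outcomes(0.60, 100_000, seed=42)

print(f"short 10-decision runs that ended in the red: {losing_short_runs}/1000")
print(f"net result over 100,000 decisions: {long_run:+d}")
```

A meaningful fraction of the short runs lose even though every decision was the same good one, which is exactly why judging individual outcomes is misleading.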
Won’t you follow me on a personally painful sports tangent for a moment? Come along!
The worst sports moment of my life occurred on February 1, 2015, at 6:59pm. I was in Arizona watching the defending champion Seahawks play the Patriots, and after marching down to the 1-yard line with less than a minute left, the Seahawks called a pass play to Ricardo Lockette instead of a run play to the best running back in the NFL, Marshawn Lynch. If you pay any attention to football, you know what happened next. The ball was intercepted by undrafted rookie Malcolm Butler, preventing the best and most exciting team in the NFL from winning a second straight Super Bowl.
That was the worst play call I've seen in the history of football.😞
— Emmitt Smith (@EmmittSmith22) February 2, 2015
Reading the tweets and other reactions after the game, I had never seen such universal condemnation for a play call. Commentators, coaches, players, fans, and anyone else who had watched a down of football in their life chimed in on the epic stupidity of our offensive coordinator and head coach. My own hatred and disbelief lasted several months as well, until I read something really interesting: a FiveThirtyEight.com statistical analysis of that fateful call. The TLDR version is essentially:
- NFL teams had scored 125 rushing touchdowns from the 1-yard line that year and two attempts resulted in turnovers.
- NFL teams had thrown 66 TD passes from the 1-yard line that year and ZERO attempts resulted in turnovers.
- The Seahawks’ final play call was, depending on certain assumptions, either only 0.3% worse or 3% better than a run.
- New England’s decision to *not* call a timeout was, statistically, a much worse decision than the final pass.
… and yet, the Patriots won the game, Seahawks coaches generally looked like buffoons, and New England coaches looked like geniuses. As reporters descended upon Seahawks coach Pete Carroll to find out what he was thinking, one thing he said repeatedly didn’t sink in for me until I read all of that analysis a few months later. He said:
“It wasn’t the worst decision. It was just the worst outcome.”
… and in fact, it was arguably a pretty good decision, or at least a defensible one.
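As a rough back-of-envelope check, the quoted counts support that framing. Note that the denominators below count only plays that ended in a touchdown or a turnover (total attempts aren’t given above), so these are approximate rates, not true per-attempt rates:

```python
# Back-of-envelope turnover rates from the counts quoted above.
rush_tds, rush_turnovers = 125, 2
pass_tds, pass_turnovers = 66, 0

rush_turnover_rate = rush_turnovers / (rush_tds + rush_turnovers)
pass_turnover_rate = pass_turnovers / (pass_tds + pass_turnovers)

print(f"rush turnover rate: {rush_turnover_rate:.1%}")  # ~1.6%
print(f"pass turnover rate: {pass_turnover_rate:.1%}")  # 0.0%
```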
As with employee performance, it’s easy, and frankly lazy, to judge outcomes. Only upon thoughtful evaluation can you judge decisions and behaviors, which — as in poker, football, and product development — maximize your chances of success over the long haul.
Ok, no more sports! Believe me, that hurt me a lot more than it hurt you.
The four pillars we use(d) at Twitter Design and why
When we created our career paths and promotion process for Twitter Design & Research, we followed many of the principles mentioned above: reward behavior over outcomes, emphasize the importance of teamwork and execution, and keep everything within each employee’s control. Here are the four equally weighted pillars we settled on:
- Getting things done: Pretty simple. Do you do what you say you are going to do? Do you go the extra mile to complete work that needs completing? Are you where problems go to die? This pillar is both a measure of dependability and prolificacy.
- Creating strong relationships: Product development is a team sport. As I mentioned in Three Years in San Francisco, Intelligence X Collaboration = Results. Note that the relationship is multiplicative: if either I or C nears zero, so does R. Unless you are running the world’s worst interview process or physically sucking brain matter out of your co-workers, I is never really in danger of hitting zero. C, however, always is. If you score highly in this pillar, not only will your direct colleagues love working with you, but your cross-functional colleagues especially will. Know that designer all the engineers want to work with? That’s the person who scores highly here.
- Improving the team: One of the benefits of working at a company with other great designers, engineers, PMs, and researchers is that the sum can be a lot greater than the parts. Not only that but by virtue of being around so many talented teammates, your own career growth can happen almost automatically without having to spend nights and weekends learning new skills. We encourage people to make their teammates better by doing things like brown bag sessions on prototyping, brainstorming, and other important skills. Additionally, we encourage people to proactively help their teammates out on projects even if the project is not their personal responsibility. Improving the team can come in many other ways, including recruiting new teammates, but the point is to be a great teammate above and beyond being a great designer, engineer, etc.
- Technical skills, empathy, and vision: These are the individual skills that most people initially assume are the only keys to success and promotion. We purposefully made them account for only 25% of the total formula to stress how important the other elements are. These are also the skills that are most customized to Design & Research. If you want to adapt this entire framework to PM, Eng, or any other function, you could probably leave the first three alone and just change this one pillar a bit.
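A hypothetical sketch of how the equal weighting works out in practice. The four pillar names come from the list above, but the 1–5 scale and the example ratings are illustrative assumptions, not Twitter’s actual rubric:

```python
# Illustrative only: the post specifies four equally weighted pillars;
# the 1-5 scale and example scores below are assumptions.
PILLARS = [
    "getting things done",
    "creating strong relationships",
    "improving the team",
    "technical skills, empathy, and vision",
]
WEIGHT = 1 / len(PILLARS)  # 25% each

def overall_score(scores):
    """scores: dict mapping each pillar to a 1-5 rating."""
    assert set(scores) == set(PILLARS)
    return sum(WEIGHT * scores[p] for p in PILLARS)

example = {
    "getting things done": 4,
    "creating strong relationships": 5,
    "improving the team": 3,
    "technical skills, empathy, and vision": 4,
}
print(overall_score(example))  # 4.0
```

Because the weights are equal, no single pillar — not even raw technical skill — can carry an evaluation on its own.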
How to measure success
Ok, so now that we have our four pillars, how do we measure success against them? This is where some people are going to get uncomfortable. The answer is by soliciting opinions from peers, managers, and anyone else who works with the person being evaluated. Isn’t that subjective though? Yes, yes it is! It’s subjective, but clearly specified and full of agency. You may disagree with a person’s assessment of you (which is why multiple people give feedback), but you should never feel like you either don’t know what’s expected of you or that you aren’t in control of the associated behaviors.
I will also note that as a manager, if you’ve hired and coached well, people should naturally do well in all four of these areas. If you receive feedback to the contrary, it’s your responsibility to investigate further and get to the bottom of whatever is going on (if anything).
Putting your own plan into action
One of the reasons the switch to an explicitly behavior-based evaluation system was so well received was that we were transparent throughout the process of creating it and included diverse perspectives all along the way. We solicited feedback from men and women, from senior employees and rookies, from people of different races, and from people outside the department as well. Through that inclusive process, we not only greatly improved the framework, but we also created a feeling of collective ownership before it was even ratified.
So, yeah, if you are going to change the way your team gets evaluated, don’t just pass commandments down from a mountain. Get everyone involved in the journey.
One other piece of advice: solve this for your department first and see how it goes before entering a company-wide holy war on the subject. When we wanted to try this out at Twitter, we simply included our wonderful HR partners in the process, got their blessing to give it a shot, and then just made it happen. While I hope that the entire company soon evaluates employees the same way we do, we wouldn’t have gotten anywhere if we insisted everyone blindly do things the way we were proposing. As in product development, it’s often smart to start small.
Alright, so that’s the framework! Take freely from it whatever you’d like. It’s one of the things I’m most proud of improving during my time at Twitter. Like good design itself, it’s aimed at doing the only thing design actually can do: increase the chances of success.
(This post also available on Medium.)