How to Break an API: How Community Values Influence Practices

Christian Kästner

Recorded at JSConf EU 2017



[ Applause ] >> All right.
Thanks for coming.
I'll continue this theme of speaking about community, but I'll look at it from a different perspective.
We look at the culture of communities in terms of the norms and values they establish: within a community, values and practices are usually fairly cohesive, but across communities you see huge differences.
And I want to talk about some of those effects and what you can learn from looking into other
communities.
This is joint work with Chris, Anna, James and Ferdian.
I'm at Carnegie Mellon University.
This is not based on my own experience.
I don't have much experience with JavaScript to be honest.
This is from an outsider's perspective.
We study ecosystems as outsiders, and we hope that by comparing ecosystems from the outside we can learn something about them.
We have been looking into this for the last couple of years in a research project we call Breaking APIs, and we have a Twitter account and a web page for it that I'm going to advertise a little later.
So we looked broadly at ecosystems.
And they form around platforms and products where you have a network structure of dependencies.
You can depend on a package, and that package can depend on other packages.
And you have many of those communities around in different languages and different products.
The main point of this talk is that the culture of the ecosystem matters.
There are lots of different values, different ecosystems adopt different values.
Different things are more important than other things.
And this leads to different tradeoffs.
You can't achieve everything perfectly.
Some communities favor some things over others.
Following from this, you get different cost allocations: some people have an easier time than others, for example people who are new to an ecosystem versus people who are established in it.
And you often see consequences in terms of what we call ecosystem health.
So are there specific problems that you see in ecosystems, like it's hard to retain developers
in this ecosystem?
Or a lot of packages are generally considered low quality, things like that.
And why would you want to learn about this?
There are lots of different issues where this might be interesting.
Let's say you want to change a policy or introduce a new policy.
Say you want to change the versioning rules in your ecosystem or the way you upload packages.
How do you make this decision?
What are the consequences?
So understanding the space, understanding tradeoffs can be a help.
Or you want to fight a health issue.
You have problems retaining and recruiting new developers.
What design decision in your ecosystem might be contributing to this?
How can you debug your community to figure out what to change?
Or say you want to design a new community.
I don't think any of us will design the next npm, but you might be working on a subcommunity, say Hoodie or some group of packages, where you want to push specific values.
What can you do to foster those goals in that community?
And, again, for this, culture really matters.
It's important to understand the values, and it's important to understand the tradeoffs.
And we have been doing research on this for a while.
What I'm going to talk about is mostly based on interviews and surveys that we have conducted
throughout the last couple of years.
And we looked at this through one specific lens.
The idea here is that if you use a package, there are often lots of dependencies behind that package, and you depend on a number of packages.
All of those packages have maintainers, and they can make changes to their packages.
Some of those changes are non-breaking.
You just update, get the great new version, and everything works fine.
But some of those changes are breaking changes.
They may change some of their APIs, and you get interrupted and have to invest some time to adapt.
This can ripple through the ecosystem and have consequences.
A single change can affect hundreds of developers and their packages.
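To make the reach of a single change concrete, here is a minimal sketch, assuming a hypothetical dependency graph (all package names are invented), that computes every package depending directly or transitively on a changed package. This is only the set of potentially exposed packages; whether each one actually breaks depends on which APIs it uses.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it depends on.
DEPENDS_ON = {
    "app-a": ["string-pad", "http-utils"],
    "app-b": ["http-utils"],
    "http-utils": ["string-pad"],
    "cli-tool": ["app-b"],
    "string-pad": [],
}

def reverse_edges(depends_on):
    """Invert the graph: package -> packages that directly depend on it."""
    dependents = {pkg: set() for pkg in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)
    return dependents

def affected_by(changed, depends_on):
    """Breadth-first search over reverse edges: all direct and transitive dependents."""
    dependents = reverse_edges(depends_on)
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(affected_by("string-pad", DEPENDS_ON)))
# ['app-a', 'app-b', 'cli-tool', 'http-utils']
```

A change to a leaf package such as `string-pad` reaches every package in this toy graph, even those that never import it directly.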
So you can start thinking about costs.
What's the cost of a breaking change in this sense?
We think about this as a balance of who bears the cost; it's a somewhat abstract concept.
But in the extreme case, it's really easy to make a change and impact hundreds of thousands
of developers out there who all get interrupted and who all need to invest in doing some rework.
So it makes it really easy for maintainers to break something and puts the cost on the
users.
That's one extreme.
The opposite extreme would be you refrain from making a change even though it's really
important to you.
There are lots of reasons for making changes.
Nothing is ever really stable, but if you are not allowed to make breaking changes, if you really refrain from them, you're taking on costs: mostly technical debt and opportunity costs, or maybe performance problems that you could fix with an API change but can't.
Right?
So not breaking anything can be really costly for you, and maybe in the long term also for your users.
And then there are a lot of paths in the middle where you, as a maintainer, break something but try to reduce the cost for your users by mitigating the impact of those changes.
A typical example is investing some effort in writing a migration guide to make it easier for your users to update.
You might synchronize releases with other packages or release on a schedule, so that users can plan for changes rather than get interrupted.
And practices like backporting and deprecation let you delay the impact of changes, so it's not as immediate.
There are hundreds of those practices.
Different communities use different practices and negotiate this space very differently.
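Deprecation, one of the mitigation practices mentioned above, can be sketched in Python as follows. The `deprecated` decorator and the `get_json` and `fetch_json` functions are hypothetical illustrations: the old entry point keeps working but emits a `DeprecationWarning` pointing to its replacement, so the actual removal can happen in a later release.

```python
import functools
import warnings

def deprecated(replacement):
    """Hypothetical decorator: keep an old function working, but warn callers."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator

def fetch_json(url, timeout=10):
    """Hypothetical new API with an explicit timeout parameter."""
    return {"url": url, "timeout": timeout}

@deprecated("fetch_json")
def get_json(url):
    """Hypothetical old API, kept as a thin shim for one more release cycle."""
    return fetch_json(url)
```

Existing callers of `get_json` keep working and see a warning during the transition; the breaking removal is delayed rather than avoided.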
So in this talk I'm primarily going to talk about three ecosystems.
There are many more.
Our web page contains survey data on 18 ecosystems in total.
But I want to dive in a little bit deeper on three of those.
I'm going to start with the ones I assume you know less about than Node.js, beginning with Eclipse.
When I'm talking about those ecosystems, think about your own community, what practices you follow, and how they differ from what you see in those ecosystems.
And think about why.
Is this a good idea?
Should this be different?
Let's start with Eclipse.
You have probably heard of it.
It was originally built for Java programming and has hundreds of plugins.
These plugins can depend on other plugins.
So you have this ecosystem structure.
And if you look into this community, if you talk to people, or even if you just look at
their web page, you realize what values are important in this community.
So one important value is backward compatibility.
They say it extremely explicitly on their web page, as the API Prime Directive: "When evolving the Component API from release to release, do not break existing Clients," right?
They really focus on stability.
And that's not something that they just say.
This is a culture that they live.
And here I can show you some data from our survey.
Here's how to read it: we asked, how important is stability to your community?
The higher up an ecosystem is, the more important stability is to it.
These are all positive values.
The scale goes from a little bit important to very important.
And most people say, yes, stability is important.
But what you still see are differences among ecosystems, and we have sorted this by average
rank.
And what you see is that Eclipse is really on the far end here.
Eclipse is the ecosystem on which developers rate stability as the most important practice.
Way more than other ecosystems.
And this is largely consistent within the community.
So this value that's communicated is actually accepted by the community, and you see consequences.
If we ask how often people are faced with breaking changes, Eclipse developers report them far less often than developers in other ecosystems.
So this value is actually put into practice.
In terms of the cost distribution I showed you in the beginning, this is pretty much at the extreme where, as a developer, you are not allowed to make breaking changes.
People use a lot of workarounds, a lot of creative hacking to get around limitations.
And they reduce the impact of releases with a fixed schedule: once a year there's a big release.
Large parts of the community synchronize and go through a month-long process of checking things and so on.
So there's a tradeoff.
What can they achieve and what are the problems here?
So Eclipse is extremely stable.
It's convenient to use.
You can put plugins written 13 years ago into a new version of Eclipse, and they work.
Try doing this with JavaScript packages.
Yearly updates are sufficient for many in the community.
You don't need to update from week to week to get the next release.
But this has consequences.
Eclipse is seen by many we spoke to as a stagnant and staid platform.
Some of its APIs still don't use Java generics, because they couldn't update those without breaking clients.
And if I take my students as a reference, I would say they are not using Eclipse anymore these days.
At least in Java, people have switched to IntelliJ.
I'm not sure it's a good example, because by that measure Twitter is pretty much dead.
I'm not sure about students these days.
Eclipse reports that they have difficulty recruiting and retaining developers, and that these coordinated releases are a pain point for the community.
So they invest a lot of effort, but it's also painful.
When we ask questions about health, like how much they face difficulty recruiting new developers, Eclipse is at the far end.
If we ask how much they feel limited in innovating in their ecosystem, it's again at the far end, but on the other side, if you ask…
What I have shown you here is one end of the spectrum.
Eclipse has made specific decisions which I guess you see as very different from your
community and they have accepted certain tradeoffs and have certain goals with this.
I want to talk about the second ecosystem, CRAN, kind of the npm for R.
R is a language for statistical computing.
There are a couple of thousand packages on CRAN, which are usually cutting-edge research.
So when you interview people in this community, you often speak to physicists, soil analysis researchers, or retired professors who maintain some packages.
And they do something pretty interesting about dependencies and about versioning.
We call it snapshot consistency.
The goal is that the most recent versions of all packages in the ecosystem should be compatible with each other.
So it doesn't matter whether the newest version of package A is compatible with an older version of package B; it just has to be compatible with the newest version of package B at that time.
So at any point you should be able to install any packages in that ecosystem, update them all at once, and they should be compatible.
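Snapshot consistency can be sketched as a check over the latest release of every package. This assumes a simplified model, not CRAN's actual tooling: each release lists the functions it exports and the functions it uses from each dependency, and every dependency must itself be in the snapshot. The package and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Release:
    version: str
    exports: set[str]  # functions this release provides
    uses: dict[str, set[str]] = field(default_factory=dict)  # dependency -> functions it calls

def inconsistencies(snapshot: dict[str, Release]) -> list[tuple[str, str, list[str]]]:
    """Return (package, dependency, missing functions) for every broken edge
    between the latest releases in the snapshot."""
    problems = []
    for name, release in snapshot.items():
        for dep, needed in release.uses.items():
            missing = needed - snapshot[dep].exports
            if missing:
                problems.append((name, dep, sorted(missing)))
    return problems

# pkgB 2.0 dropped summary(), but the latest pkgA still calls it.
snapshot = {
    "pkgB": Release("2.0", exports={"fit", "predict"}),
    "pkgA": Release("1.3", exports={"plot_fit"}, uses={"pkgB": {"fit", "summary"}}),
}
print(inconsistencies(snapshot))  # [('pkgA', 'pkgB', ['summary'])]

# Coordinated fix: pkgA 1.4 is published together with pkgB 2.0.
snapshot["pkgA"] = Release("1.4", exports={"plot_fit"}, uses={"pkgB": {"fit"}})
print(inconsistencies(snapshot))  # []
```

Note that older versions never enter the check: only the newest release of each package has to fit together, which is why a breaking change forces the affected maintainers to react.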
Achieving this requires coordination.
If you want to break something, you reach out to the people in the community who are affected.
You coordinate.
And then you typically publish both packages, your package and the affected package, at more or less the same time.
There's a kind of sliding window, typically three or four weeks, in which you have to do this.
But as a maintainer you have to react.
If a package you depend on makes a breaking change and you don't react and coordinate with its maintainer, there's a threat that your package gets taken over or thrown out.
To make this work, there's a strong culture of gatekeeping and enforcement of this.
Some volunteers review changes.
So you don't just upload things as on npm; you actually submit your changes, and somebody reviews them and runs tests on the entire ecosystem to see whether something breaks.
In terms of costs, we have a different picture.
You can make breaking changes, but you have to reach out, and some volunteers take on some of the additional costs.
And you have a kind of constant synchronization here.
In terms of values, it's not quite as obvious as in Eclipse.
It's not strongly communicated on their web page.
When we talk to people that are kind of core members of this community, they say they want
to make it easy for end users to install and update packages.
One of our interviewees said CRAN primarily has the academic user in mind, who wants timely access to current research.
This timely access contrasts with, let's say, Eclipse, where you have releases once a year.
These people are doing research and want to publish things and get them out there.
There's a review process, but it's much faster than yearly releases.
Interestingly, and I don't have much time to talk about this, this is a value that's not communicated on their website.
It's not transparent.
And if you ask people, rapid access or quick release is actually not showing up as one
of the key values that the community as a whole shares.
It sits somewhere in between: compared with Eclipse and Bioconductor, which both have fixed release cycles, I think the core developers of the community want a faster turnaround than those scheduled releases.
But to package maintainers, it's slower than npm, where you can just upload things; you have to go through a review process.
There are certain values that, we speculate, are not communicated well, so people don't see why the ecosystem is designed this way.
So, again, there are tradeoffs and costs and benefits.
They achieve compatibility, and they're usually pretty good at that.
But at the same time, you have this urgency, this burden to react to updates.
So, at any time there could be a breaking change that if you depend on that package
you may need to coordinate, you may get interrupted.
This leads to other consequences: people aggressively reduce the number of packages they depend on. That they would rather copy and paste code from another package than be exposed to its changes is something we heard repeatedly.
And this gatekeeping causes additional effort and causes friction in this community.
So, this is always a contested point.
Now to the third ecosystem, Node, which you all know.
And I think I can make this shorter on Node because, I guess, you know the community much better than I do.
But my first contact with Node was something like this.
Right?
Last week's tutorial is out of date today.
This is a couple of years ago that I heard this.
I forgot who the source was.
But then you also saw things like this, which I think is really unique.
People actually started documenting the APIs, at least the Node core APIs, with stability indicators.
It's simpler these days, but they had this entire infrastructure for saying: we are experimenting with this API.
You can use it, it might change, and at some point we mark it stable.
So implicitly you can see the values: people are open to rapid change, and they are open to experimenting with APIs to get things right.
It's a more innovative community, in a sense.
I think the design decisions are really trying to lower the barrier to entry.
It's really easy to publish your first package on npm.
A really low bar.
You also hear this in interviews with Isaac.
The goal is to make it easy and fast for developers to publish and use packages.
If you look at stability, Node is not at the top.
This is not the value that is most important in this community.
But if you look at innovation, this is where Node is at the far end.
The community thinks innovation is a key value, among other ones.
In terms of costs, though, it's pretty much on the extreme where it's really easy to just
break something and let other people deal with it.
The main saving grace is that you're not forced to update, right?
So you can stick to an old version, which you can't really do in CRAN, and in a way not in Eclipse either.
There, you can't really stay behind, or partially stay behind on some packages, which is easy here.
So there's some technology behind it.
And there's a bunch of practices around this.
Semantic versioning is broadly adopted to signal what's breaking and what's not breaking.
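A simplified sketch of the semantic versioning convention: breaking changes bump the major version, new features the minor version, and fixes the patch version, while an npm-style caret range `^x.y.z` accepts updates that don't change the left-most non-zero component. The helper names are hypothetical; prerelease tags and build metadata are ignored, and `bump` does not special-case 0.x versions, where many projects treat a minor bump as breaking.

```python
def parse(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no prerelease tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def bump(version, change):
    """Next version for a 'breaking', 'feature', or 'fix' change."""
    major, minor, patch = parse(version)
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change}")

def caret_allows(base, candidate):
    """Does ^base accept candidate? Same left-most non-zero component, not older."""
    b, c = parse(base), parse(candidate)
    if c < b:
        return False
    if b[0] > 0:
        return c[0] == b[0]
    if b[1] > 0:
        return c[0] == 0 and c[1] == b[1]
    return c[:2] == (0, 0) and c[2] == b[2]

print(bump("1.4.2", "breaking"))          # 2.0.0
print(caret_allows("1.2.3", "1.9.0"))     # True
print(caret_allows("1.2.3", "2.0.0"))     # False
```

The version number thus carries the maintainer's claim about compatibility, and a dependent's caret range decides which updates it picks up automatically.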
This is something that doesn't make sense to people in CRAN and R. To them version numbers
are just increasing.
You don't try to communicate what's breaking or not.
It doesn't matter.
The newest versions of all packages should be compatible.
Node, or npm, has the technical infrastructure to let you use multiple versions of the same package, which helps you solve the diamond dependency problem.
That's pretty unique, very hard to translate to other languages, and a key feature here.
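To illustrate why multiple versions of the same package matter, here is a toy model of the diamond dependency problem, with hypothetical package names and versions reduced to major numbers. A flat installer that allows only one version per package name fails, while a nested layout, roughly in the spirit of npm's nested `node_modules` (real npm also deduplicates and hoists where it can), gives each dependent its own copy.

```python
# Hypothetical diamond: "app" needs lib-b and lib-c, which both need "util",
# but at incompatible major versions. Keyed by package name only (toy model).
REQUIREMENTS = {
    "app": {"lib-b": 1, "lib-c": 1},
    "lib-b": {"util": 1},
    "lib-c": {"util": 2},
    "util": {},
}

def flat_install(root, requirements):
    """Pick one major version per package name; fail on conflicting requirements."""
    chosen = {}
    stack = [root]
    while stack:
        pkg = stack.pop()
        for dep, major in requirements[pkg].items():
            if dep in chosen:
                if chosen[dep] != major:
                    raise ValueError(f"conflict on {dep}: {chosen[dep]} vs {major}")
                continue
            chosen[dep] = major
            stack.append(dep)
    return chosen

def nested_install(pkg, requirements):
    """Give each package its own copy of every dependency (assumes no cycles)."""
    return {
        f"{dep}@{major}": nested_install(dep, requirements)
        for dep, major in requirements[pkg].items()
    }

print(nested_install("app", REQUIREMENTS))
# {'lib-b@1': {'util@1': {}}, 'lib-c@1': {'util@2': {}}}
```

With the nested layout, `lib-c` can adopt the breaking `util` 2 while `lib-b` stays on `util` 1, which is exactly the "stay behind on some packages" freedom described above.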
And then there's a lot of grassroots tool building from the community to cope with large amounts of change.
Things like Greenkeeper and a bunch of these security projects that try to filter changes for you, telling which are important and which are not.
So in terms of tradeoffs, again, there are costs and benefits.
It's certainly a much more open community.
But it's perceived somewhat as unstable and having little quality control, right?
There's a lot of junk on npm.
And the rapid changes out there require constant maintenance, right?
At least if you decide to keep up, which is usually a good idea, I guess, you have a lot of changes in your dependencies to deal with.
There are often a large number of dependencies.
And then the community helps with tooling and community efforts.
So from an outsider's perspective, I have the impression that the amount of change in
Node and in this ecosystem is slowing down a little bit.
Potentially because there are more commercial users who want more stability.
More users in general.
Maybe a larger community.
So is Node actually getting more stable and, more importantly, should it get more stable?
And I think this is an interesting question for the community to ask, because stability can be a nice value, but it also has its cost.
And we have seen the extreme in Eclipse.
So what I tried to show here is that culture really matters for an ecosystem.
Right?
There are tradeoffs.
I have shown you three very different examples with different cost allocations, different
people who are favored in this ecosystem.
We can speculate about the reasons if you're interested.
And in the survey, you see a number of other things that might stand out in one or the
other direction.
Rust really values community, and that's communicated.
Ruby are fun, a pretty unique versioning strategy.
People are really aware about the difference [audio cut out] out of the same thing over a conflict on how to deal with compatibility.
And Maven is interesting: most people perceive it as a tool for replicability.
So, again, community, culture, of an ecosystem really matters.
The values, the practices matter.
Everything is a tradeoff, and it helps to know the design space.
It helps to know what other possibilities exist to reduce the amount of change, and what different mechanisms exist to make it easier for users to update.
Things like this.
And to understand the tradeoffs behind them.
unfortunately.
When we understand those things, we can deliberately design communities: we can think about how to achieve a given goal, which practices would help, and how we can encourage people to adopt them.
And we can debug communities: if we see certain health issues, like people running away or people finding everything low quality, what can we do?
Most people don't know that many communities, one, two, three, maybe, and don't often think about the explicit differences.
So we think it really helps to understand the space and the tradeoffs.
As I mentioned a couple of times, there's much more data behind this.
There are also academic papers behind this, and we have created a web page, breakingapis.org, where we just released all the results of the survey.
A large number of plots where you can compare different ecosystems, and personal values
against community values.
Things like this.
So feel free to have a look at this, explore this.
Explore the data.
We are always looking for feedback and always looking to discuss things.
If you have a specific insight, share it.
We would be happy to interview people and learn more about those things.
And with this I get to my last slide.
Because in the title of the talk I asked how to break an API.
Now I can at least answer this for the three ecosystems I discussed.
In Eclipse, you just don't.
[ Laughter ] In CRAN, you reach out to affected developers and release simultaneously.
That's it.
Thanks.
[ Applause ] >> Okay.
So we've got a break now.
Both tracks start at 5:45.
But during the break there is also "What is LiveJS?", a lightning talk over where we have been doing the live music, in the back near registration.
See you in a bit.