vsupalov

Getting to Know Jodel

Looking into the future and working on quality and sustainability while scaling the product and growing the team.

An interview with Luís Rodrigues Soares, CTO at Jodel
October 2017

Can you tell me a bit about the company?

Jodel is a social network from Germany. It was born in Aachen originally, and right now it’s being developed here in Berlin. Its main focus is on making it possible for you to connect with the community around you as instantly as possible. It’s location-based and, because of the focus on “instantly engaging”, there are no public profiles; people are anonymous. Not so much for the sake of anonymity, but for the sake of allowing you to quickly interact with people.

You can engage with people, ask questions and know more about what’s happening around you. The company became three years old on the 20th of October. It’s been advancing quite well and it’s a super interesting place with a lot of interesting problems to work on.

In total, Jodel is about 55 people. The tech team is 21 people (including me). I’m often doing some tech work, but not as much - my role is not about being an individual contributor.

What’s most important for the company right now?

The company is focusing a lot on growth right now. As a typical start-up in the social media space, currently the most important topic is growth. It’s about getting to a lot of users, to further prove that the concept works.

Because we’re location based, growing means moving to other markets, actually moving to other cities, adding more countries to the platform, reaching more users, and then dealing with how these new communities and users engage with the application; because not all countries and not all people engage the same way with the application. In summary, growth and improving the concept beyond what’s already been proven to work are the main areas of focus.

From the engineering point of view, I would say the main focus is making what we’ve been doing up until now sustainable. The company is three years old - this means that we’re moving from a mindset of pushing only MVPs out there, focusing only on making it work quickly to prove our assumptions and experiment, into a mindset where we think through our approaches more, make things more reliable and think more about the future. Making it sustainable.

My main priority is having a bit more of a look-ahead, a bit more of a focus on the future. Saying: “Okay, we’re heading this way already, let’s try to do it right. Let’s add more quality, let’s be more predictable, let’s make solutions which we can build further upon”. From the technology side, this is the first priority at Jodel: that we shift our focus from the short-term to the mid-term.

Either regarding the company or the product, what’s the thing you think is really, really cool?

I like the product a lot. One of the things that I find the most amazing is the fact that it’s location-based, which generates so many interesting challenges. You also need to understand the user base, that different people react differently to different things. Many user dynamics which work in one location, here in Berlin for instance, do not work in other countries or in other cities in Germany.

Next to that, adding anonymity to the equation makes things which are usually trivial for social networks utterly complicated for us. For example, generating a social graph, or interpreting what people are interested in. Since I’m attracted to difficult problems, I’m also building a team around this kind of mentality. The people who join usually go “Okay, wow, this is interesting. We have to generate a social graph without knowing who the people are. We need to rethink how to generate social graphs or we need to rethink how to create new content categorization.”

For me, that’s one of the most enjoyable aspects, and also the main ingredients of the product. Anonymity on one side and location on the other side. I think they are an amazing mix.

Could you tell me about your current setup and in particular, how you’re deploying and maybe what you are automating about it?

It’s important to know that the essence of Jodel is two mobile applications, an Android and an iOS mobile app. We usually release new versions weekly: there are new versions of both apps on the app store, either on the beta channel or, after some testing, on the main channels. The final triggering of release candidates is manual, but there are continuous integration pipelines for both of them, including the first phase, which is running unit tests and some integration tests as well.

The backend is a big NodeJS application, which also needs to be deployed and operated, with a MongoDB data layer. The deployment triggering is manual, but all processes are pretty much automated. We’re using Flightplan together with Docker builds, which are built by our continuous integration pipeline. We are doing canary deployment - we start by just deploying on one server, observe how that is going and then move on to replacing the build on other servers as well.
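To make the canary idea concrete, here is a minimal Python sketch of the rollout order described above. It is an illustration only, not Jodel's actual Flightplan setup: the function name `canary_rollout`, the server names, and the `deploy` and `is_healthy` callbacks are all hypothetical placeholders for whatever a real pipeline would use.

```python
from typing import Callable, List, Sequence


def canary_rollout(
    servers: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Deploy to one server first; continue only if it looks healthy.

    Returns the servers that received the new build.
    """
    if not servers:
        return []

    canary, rest = servers[0], servers[1:]
    deploy(canary)
    if not is_healthy(canary):
        # Stop here: only the canary is running the new build.
        return [canary]

    deployed = [canary]
    for server in rest:
        deploy(server)
        deployed.append(server)
    return deployed


# Hypothetical usage: "deploying" just records the server name.
deploy_log: List[str] = []
canary_rollout(["web-1", "web-2", "web-3"], deploy_log.append, lambda s: True)
```

In a manual process, `is_healthy` is effectively a person watching dashboards; passing it in as a function leaves room to replace that person with an automated check later.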

Regarding my personal preference and the direction that I’m giving the engineering team, I would say that there’s still a lot more to be automated. I eventually want to make sure that, as much as possible, nothing that needs to be deployed has to be deployed by humans; so the verification of canary deployments, or the generation of a release candidate for mobile, does not require a human looking at it. We should have metrics that we know are stable and that we know we can rely on, or other kinds of mechanisms to automate it completely. If it passes those tests it can go to production.
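One way to picture an automated canary verification is a small metric gate that compares the canary against the servers still running the old build. The Python sketch below is an assumption-laden illustration: the `Metrics` fields, the function name `canary_passes`, and the threshold defaults are invented for the example and are not taken from Jodel's monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float  # 95th percentile response time in milliseconds


def canary_passes(
    baseline: Metrics,
    canary: Metrics,
    max_error_rate_increase: float = 0.01,  # assumed tolerance
    max_latency_ratio: float = 1.2,         # assumed tolerance
) -> bool:
    """Return True if the canary is not noticeably worse than the baseline."""
    error_ok = canary.error_rate <= baseline.error_rate + max_error_rate_increase
    latency_ok = canary.p95_latency_ms <= baseline.p95_latency_ms * max_latency_ratio
    return error_ok and latency_ok
```

The gate is only as trustworthy as the metrics fed into it, which is why stable, reliable metrics come first.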

These are just examples, obviously – the underlying point here is automating as much as possible, since that gives us fixed points, or certainty checks that we can rely on – and leaves us available for new and more interesting challenges.

Could you share what you know about the tech journey of the company? How did this team start out doing all these things?

My knowledge is limited to approximately the last year and a half, as I only joined recently. I don’t know too much about how it was before. But I know that it was very, very manual in the very beginning. It was even more manual, to the point that monitoring was manual and involved constant human attention. There are a few war stories that I heard of. But it’s difficult to do otherwise in the beginning of a startup, as it’s not the first thing that people should focus on. Instead, you should focus on getting a product out there and getting the initial rounds of funding. It’s totally understandable.

It’s important to understand, for the sake of context, that Jodel lost a CTO around the end of 2016, and was almost effectively without a CTO until August 2017 - the time that I joined. There was a lot of senior guidance on the tech side, but I don’t think there was an overall strategy for some of these things. A really interesting thing happened during that time. The several separate teams, Android, iOS, the backend and infra teams, all started noticing that they needed some form of automation - some form of pipelines, or continuous integration, and started working on setting up solutions for their own team.

The Android team started with their pipeline, focusing on unit tests and integration tests. The backend guys, I think they already had Flightplan at the time, started integrating Docker into theirs, doing auto-scaling, and similar things. Each of the teams slowly started doing their own thing.

Considering the brief time that they had and the fact that they didn’t have any sort of strong leadership from the technical side, they did not cross their arms and stop improving - they advanced, each one of them in their own direction, and improved their own local processes and systems. When I joined and saw that this is what happened I was very, very happy because we had three different teams, three different groups of people, who all have their own idea of the value that automation brings. They’ve all solved their own problems so they saw from up close how important automation is, how automating certain things has helped them, and how they were able to solve their own problems. People tend to do good things when left unmanaged - it’s awesome to see this happening and works great for me since I’m a very strong opponent of micro-management.

And they had seen the value of automation, which means that there’s already a buy-in from the technical teams. When I came into the picture as CTO and said “Okay, I want to bring automation. I want to bring even more automation, as much as I possibly can” the reaction was “Yep, that makes sense. We’ve done a bit of it before. It helped us immensely. Yeah, let’s take it up a notch. Let’s move to the next level.”

I think up until now, the journey has been a bit like that. Different teams doing whatever they could do and whatever they felt would help them. Now there’s a strong strategy in place and focus on these topics.

If you had to pick one thing, what’s most important about the infra tooling setup of one of the teams?

There’s the technology side, and then there’s more of a practices point of view.

On the technology side I think definitely it’s been Docker. Keep in mind, this is not something that I’ve asked the team about, but rather my feeling by looking at it and also my previous experience with Docker. Docker, especially on the backend, simplifies a lot of things. This is nothing specific to Docker at Jodel, just what it brings to the industry. It has made the life of a lot of people easier, and we’re no exception to that. I think we’re just confirming what a lot of other people in the industry have been noticing and have been feeling.

It makes it very, very light and very easy to reason about deployments, about building an application and putting it into production. You have this one element that you need to consider, but then mostly everything happens at that level. It becomes a common language and abstraction for two important levels of an engineering team: application development and systems engineering. Software engineers, not just infra engineers, can completely take care of most issues. Obviously there are other things as well. I can tell that for the specific case of Jodel, the fact that we have auto-scaling on AWS in place has been a huge help, as we have really different load peaks.

On the practices side, at the risk of repeating myself, I would have to choose automation. It’s hard to choose one thing because there are just so many, and I usually prefer to avoid focusing only on one element. I think it’s usually a conjunction of elements that really makes a difference, or several things playing together, but if I really have to choose one it’s automation. It’s a bit like talking about the importance of QAs or the importance of unit tests. Everybody knows that testing is important, everybody knows that automation is important, but you still find every once in a while some people making the argument, “Oh, it’s automated enough. We don’t need to automate.” I’m quite opinionated on this topic. I think if it can be automated it should be automated.

I think not only automating processes, but automating responses as well. For example, focusing on self-healing systems. Making sure that if there’s an emergency or if there’s a warning or an alarm or something starts going crazy on monitoring, things recover by themselves. If we see the same thing happening two or three times, then we just automate it. Just remove the fact that someone has to go and think about it for a while.
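The "automate it after two or three times" rule can be sketched as a small alarm handler that counts manual interventions and runs a registered remediation once one exists. This is a hypothetical Python illustration: the class `AlarmHandler`, its method names, and the alarm name are made up for the example and do not describe any particular monitoring tool.

```python
from collections import Counter
from typing import Callable, Dict, List


class AlarmHandler:
    """Runs a registered remediation for known alarms, otherwise pages a human."""

    def __init__(self, page_human: Callable[[str], None]) -> None:
        self._remediations: Dict[str, Callable[[], None]] = {}
        self._manual_count: Counter = Counter()
        self._page_human = page_human

    def register(self, alarm: str, remediation: Callable[[], None]) -> None:
        """Attach an automated response to an alarm name."""
        self._remediations[alarm] = remediation

    def handle(self, alarm: str) -> str:
        remediation = self._remediations.get(alarm)
        if remediation is not None:
            remediation()
            return "auto"
        # No automation yet: count the manual intervention and page someone.
        self._manual_count[alarm] += 1
        self._page_human(alarm)
        return "manual"

    def automation_candidates(self, threshold: int = 2) -> List[str]:
        """Alarms that were handled manually at least `threshold` times."""
        return sorted(a for a, n in self._manual_count.items() if n >= threshold)


# Hypothetical usage: paging just records the alarm name.
paged: List[str] = []
handler = AlarmHandler(page_human=paged.append)
handler.handle("container_stuck")
handler.handle("container_stuck")
candidates = handler.automation_candidates()
```

The list of candidates is the interesting output: it turns "we keep getting paged for this" into a concrete backlog of things to automate.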

One single thing in isolation does not make a big difference. If you just automate one task, you have quite a bit of overhead - you need to think about it and you need to think of languages and think of tools and all that. After you’ve done this for five or six or seven processes/monitoring/alarms or whatever, you really feel the difference. And you hopefully get into a habit. We should be doing more relevant work and not responding to alarms or deploying systems by hand.

In my opinion, it’s even more relevant than whether we’re using Docker, whether we’re using cloud or bare metal. The fact that you have this mindset of trying to automate things is really, for me, the most important thing.

(That’s an interesting point. I remember when I started working in this area, I always kept in mind that my long-term goal was to automate myself away.)

Absolutely. I don’t know if you read the book on site reliability engineering from Google - they start exactly with that point. The role of a site reliability engineer, and in my opinion the role of an infrastructure engineer or a systems engineer, is to automate themselves “out of the job”. If there’s processes depending on one person, it’s not good for anybody.

It’s not good for the company, it’s not good for the people as well - every manager and every person in a position where decisions are made, should always push for removing these dependencies, even if it takes some time. They create a weight on each person that they have things depending on them. It encourages a mindset of, “Okay, I am the guy who fixes it, and the firefighter is going to jump in and take care of everything.” This is also not a good mindset to have in a team, I’d say. It’s the hero mindset.

The book is a great read by the way. Especially if you abstract from the fact that “okay, this is Google, it’s a very big corporation, and many of the practices that they have may be complicated to bring about on a smaller enterprise, and a smaller company.” Still, all the underlying elements or what drives these guys to think like this, is the important part. It’s great to see those topics discussed in writing - what people think about DevOps in practice, about having teams collaborating and focusing on problems rather than technology. I really enjoyed it, it’s a great book.

What is the greatest pain regarding dev processes or ops for the teams right now, if you had to pick one thing?

In Jodel, I would say it’s not enough self-healing. I feel it from my point of view. Around the time that I joined, our main systems engineer had to stop doing on-call duty. Because I have done systems engineering before, and because my background is in engineering, I just approached the team and said, “Okay, cool. Put me on PagerDuty and I’ll help taking care of it.” He explained to me and to another backend engineer who has been doing it with me, how the systems are, the way we look at the alarms, at the metrics, the things we need to check, the things we need to see.

When a new engineer takes over an existing system they usually see a lot of things that they would want to change. I also came to this new system and I saw that there are certain things which I feel should be self-healing, but because the team so far has not had time for it or has grown used to it, the team has not thought the same way. I felt that there are just too many things that we really should not be doing manually. If a machine is running out of disk space, this should not be an alarm; there should be something in place that takes care of it. There are a few examples like that - if a machine needs to be restarted, or a Docker container got stuck.
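The disk space example lends itself to a short sketch: check usage, try a cleanup step, and only escalate to a human if that did not help. The Python below uses the standard library's `shutil.disk_usage`; everything else (the function names, the 90% threshold, and the `cleanup` callback, which might rotate logs or prune old Docker images in practice) is an assumption made for illustration.

```python
import shutil
from typing import Callable


def used_fraction(path: str, disk_usage=shutil.disk_usage) -> float:
    """Fraction of the disk holding `path` that is in use (0.0 to 1.0)."""
    usage = disk_usage(path)
    return usage.used / usage.total


def heal_disk_space(
    path: str,
    cleanup: Callable[[], None],
    max_used_fraction: float = 0.9,  # assumed threshold
    disk_usage=shutil.disk_usage,    # injectable so the logic can be tested
) -> str:
    """Run `cleanup` when the disk is too full; escalate only if that fails."""
    if used_fraction(path, disk_usage) <= max_used_fraction:
        return "ok"
    cleanup()
    if used_fraction(path, disk_usage) <= max_used_fraction:
        return "healed"
    return "escalate"
```

Run periodically, a check like this turns the common case into a non-event and keeps the page for the situations where the automated cleanup was not enough.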

Again, this is my point of view. I’m sure I would get different answers from each person on the team. From my point of view, as the person who needs to set direction for the engineering team, I would say that having more self-healing is definitely one of the things that are important. Every time we take care of making something self-healing, we free up engineer-time to work on other things and to focus on other improvements, like more automation.

For you personally, what was your biggest infra, deployment or automation learning?

I hate to fall back on a commonplace of the tech industry, but for me it is the people you have around you. You need to have people with the right mentality around you. You need to have people with the right set of tradeoffs. I know that many times it’s ungrateful to say this, but you solve deployment problems or automation problems when you’re hiring. It really doesn’t happen any other way. Most of the times that I’ve seen issues emerging, problems or good things happening, I could trace them back to “we’ve hired someone with this kind of mentality and this has helped us tremendously”, or something like, “we’ve hired someone who has been focusing too much on being a firefighter and this eventually led us here”.

It’s not always the case of course. Sometimes things happen because there’s pressure from business, sometimes it’s because we just make mistakes. People make mistakes, but I would say the mindset factor is just too important. Even all the things I’ve been talking about, automation and self-healing and all of these techniques and practices, all of this requires that people are willing to move in the right direction. It requires that people buy into the concepts, buy into the practices, that they recognize that this is a good thing to do, that this helps the business, that they feel comfortable with putting that side of the work ahead of their personal preferences with regards to technologies or even practices. People are the biggest learning that I’ve had.

This learning kind of coincided with me doing less and less engineering and doing more and more management. You realize, the same as you realize when you’re doing engineering, that you need to think of problems, you need to test, you need to focus on quality and sustainability. When I started solving problems in management, I saw and felt in my skin that the key factor is people. People are really important. Having engineers with the right mentality is the most important factor towards building great things.

(This reminds me of an interesting discussion during lunch with a former CTO of a relatively big tech company a while ago - his point of view was that any problem a company is having is, in the end, a people problem. Whatever the tech challenge, it’s actually a team or people problem deep down.)

It is and it’s totally true. You can, at the end of the day, reduce most of the things that we do daily to people and people-interactions. It’s more about that than it is about technology. Of course, there are tooling issues, there are many quirks, and bugs, and things which aren’t foreseen. But even when you have things like, “oh, business is pushing for this feature,” they could be fixed by “okay let’s sit down with this person, let’s explain why we need to do quality before features”. Again, it is people.

What topics or questions would you be interested in? What would you like me to ask somebody in your position?

Very good question. What would I ask? Maybe continuing with what I just finished saying, I would ask about how someone in my position, a head of engineering or a CTO or a director of engineering, how does someone in this kind of position or with this kind of responsibility deal with people problems?

My answer to your previous question is just the beginning, which is recognizing that we need to focus on people and we need to focus on the human factor of the equation. It’s also important to put into action how you go about and solve these things.

To avoid it becoming something like, “Oh, we need to write that,” but then people don’t. We need to automate, but then people don’t automate. We need to care about the human factor, but then we don’t care about the human factor. I would definitely ask more about the human factor, about strategies, use cases, situations and people.

How did you deal with someone who is not in favor of automation? How did you deal with someone who really wanted to switch technologies, but you knew that you could not switch technologies because you had other kinds of constraints? Or the other way around, you wanted to switch technologies because of some constraints, but the team was completely not in favor of switching technologies. These are not technological problems, these are people problems. How do we, as managers or as people who come from an engineering background and now are managing people, deal with such issues? This is really, really important and interesting. Definitely one of the topics I would like to see.

The other one is, what do people in CTO or management positions do in their daily work/life to continue being regarded as engineers? I was still a full-time engineer until recently; I only switched to a full-time management position three years ago. And I remember how I looked at my own managers - if I knew that a manager did not code, or was not interested in coding or engineering work, or was not aware of what was happening in engineering, I naturally lost respect.

When I became a manager myself, I made sure to continue coding; I have to continue being involved in these things because I need to know what people are talking about, even if I don’t decide things on that level. I give a lot of power to my teams, as people closer to the problem should be the ones making decisions. I usually just need to sign off on those decisions. But I still participate in discussions and in debates, in post-mortems, and things like that. For that, you need technical knowledge and you cannot let go of it. You need to continue practicing; as you also know, if you stop programming, you start losing it. It’s not a good place to be in.

The way I think is: what usually distinguishes an engineer who I feel has earned my respect? It’s someone who solves interesting problems, it’s someone who contributes with solutions or someone who helps other people with their own problems. It’s someone who guides people, takes them by the hand, who mentors a bit, but also who is strong technically. Someone who, if you need to talk about performance on virtual machines, can talk about it. Or if they’re into functional programming and they want to talk about patterns and monads and things like that, they can talk about that. If this is who I respect as an engineer, if I’m thinking of the person who is going to lead me in an engineering position, for me it’s quite simple: this is what I would look for in this person’s profile as well.

So that’s a topic I’d be interested in: do other engineering managers consider this a problem? Maybe I just look at it as an issue, but most people don’t care about that topic as much. If not, why? If they do, what do they do daily to ensure that they can continue being relevant and being someone who can have a conversation about technical aspects or do a code review? Why do they feel that’s important, and what did they get out of it?