My take on QCon London 2012


I was lucky enough to be sent to the QCon conference in London together with two colleagues.

The main reason I wanted to go was the agenda: the schedule of talks was closely tied to distributed large-scale computing and large data stores, which is exactly what I was working on for our client at the time.

The conference proved to be entirely different from what I anticipated; instead, the main takeaway was more along the lines of proper process management, planning for failure, the correct use of agile, and the technologies emerging as a result of all of the above.

Unfortunately, the talks from Twitter, LinkedIn and Facebook left much to be desired.  They were more a showcase of what those companies have achieved than of how they achieved it, so I am not going to focus on them.

Instead, my main focus points, based on the streams that drew me, are going to be as follows:

  • Agile
  • Architecture
  • Failure

Agile

There was a lot of talk on agile, and being a process-inclined person, I was drawn to these talks.  The theme, though, was common among all the speakers; they all seemed to have met up for a quick beer before the conference and agreed to say the same thing.  It also happens to be what I believe a truly agile process to be, so I am going to write things from my point of view, as reinforced by the talks I attended.

When we talk about agile, most people think of one of the frameworks, be it SCRUM, XP, Kanban, or any mix and adaptation of these.

Sometimes you hear people gloating, saying things like ‘we have managed to change our process to agile from the original waterfall approach, and everyone is happy’.

So what is wrong in this statement?

The problem here is the claim that they have managed to ‘change our process to agile’.

When you think about it, what is agile really?  Agile is not the name of a framework, and definitely not the name of a methodology.  What it really is, is the name of a ‘mentality’.

When we are ‘agile’, what we are saying (or what we are meant to be saying) is that we are ‘continuously improving our process according to the challenges and variables that are affecting our current work’.

Say we decide to go with SCRUM.  We set our iteration length, we decide how to measure complexity (story points, ideal hours, or whatever floats our fancy), we send out our daily burndown charts, we hold our retrospectives, etc… all the tools in the book…

…and we keep on doing that over and over again until the project is deemed finished.

Is this correct?  The very fact that we are no longer changing our approach is actually hindering us from reaping the real benefits of agile.  Where is the flexibility?  Do we consider ourselves flexible because we have the chance to say ‘let me de-scope this story because we are not going to make it’?  How is this any different to working in a waterfall approach?

If we do not continuously adapt our process, then we are saying that nothing is changing, the requirements are set, their complexity is known, the team is composed of a bunch of monkeys with a brain disorder who are unable to learn anything new, etc…

But is this so?  Do our consultants not improve their skills in the technology they are working in as the project progresses?  Do we not learn more about the system and have less ambiguity and work faster?  Do the requirements stay the same?  Do the requirements remain of the same nature?

Let me give a hypothetical example.

We start a new project and there is a lot of uncertainty.  We do not know the technology we are working with, and the client ‘seems’ to say they know the requirements, but after one week of analysis we discover that nobody really knows what the project involves.  So we decide to keep our iterations short, at two weeks.

A few iterations pass, velocity improves, we now have a good understanding of the requirements and our consultants are now very comfortable with the technology.  Should we stick to two-week iterations?  Could it be that the overhead of having a two-week iteration is no longer beneficial?  At the start it made sense, since we were not sure, and did not want to risk losing more than two weeks’ worth of work.  But the situation is different now, so we need to adapt.
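To put a rough (and entirely made-up) number on that overhead: if planning, review and the retrospective cost about one day per iteration, the share of time spent on ceremony is tied directly to the iteration length.  A minimal sketch of the arithmetic:

```java
// Illustrative arithmetic only: how much of an iteration goes on fixed ceremony
// (planning, review, retrospective) for different iteration lengths.
// The one-day figure is an assumption, not something from the talks.
public class IterationOverhead {
    public static void main(String[] args) {
        double ceremonyDays = 1.0;                      // assumed fixed cost per iteration
        for (int lengthInDays : new int[] {10, 15, 20}) {
            double overheadPercent = ceremonyDays / lengthInDays * 100;
            System.out.printf("%d-day iteration: %.1f%% spent on ceremony%n",
                    lengthInDays, overheadPercent);
        }
    }
}
```

With short iterations that cost buys us protection against uncertainty; once the uncertainty is gone, it is just cost.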

We cannot assume that because we are following the rules, then the rules are most definitely correct and if things don’t work, we are the ones in the wrong.

I have (very recently) seen an agile process for another company being managed with Gantt charts.  Is this wrong?  Many people might frown upon such a thing, because Gantt charts are the trademark of waterfall methodologies.

But let’s think about it for a bit.  Irrespective of which process approach we choose, we are going to need some form of release plan.

Now in an agile shop, we tend to manage this either in an Excel sheet or in some fancy UI that comes with our tool of choice.  But a release plan is nothing but a sequence of events, and I can think of no better way to represent this than on a Gantt chart.

What I am trying to say here is that we should use the process we know only as a starting template.  Do not get comfortable with it; keep improving it, otherwise we will fall into a routine, and routine is a bad thing in our industry.

People get comfortable when there is a routine; we do not feel challenged enough, and therefore we automatically start slipping.

It is also important to make the distinction between improving and changing.

Any change that we introduce needs to be measured.  If it is not improving our process, then we need to change it yet again.

We cannot just apply a template to everything: no two projects are the same, no two teams are the same, and the dynamics of a project are changing all the time, so we need to adapt at the same rate to keep output levels at an optimum.

Architecture

The other main focus of QCon was monolithic solutions.  When we talk about monolithic solutions, we are not necessarily talking about large enterprise solutions with millions of lines of code, but rather about a solution that has been logically broken up into so many layers and components that learning Japanese is probably an easier task.

I have personally seen small solutions built this way, and it is somewhat painful.  Many senior developers nowadays are steeped in design patterns, correct layering (according to the book definitions, which they treat as ‘the bible’), proper testing, de-coupling to the extreme, and all the other practices we have brainwashed ourselves into believing are ‘the way to go’.

As Greg Young outlined, a common way for Java developers to propose a technical solution to a problem is to start by saying ‘Ok, so we will use Spring and Hibernate… now what do you want me to do?’.

As extreme as this example might sound, it is very true of most of us.  We tend to overcomplicate life.

The solution being proposed is to split the system up into multiple small systems.  Not necessarily components, but literally different systems that communicate, only if absolutely necessary, over a well-known channel.
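To make that concrete, here is a minimal sketch of what ‘communicating over a well-known channel’ might look like.  The OrderPlaced message and the in-memory queue are purely illustrative; in a real setup the channel would be a message broker or a plain HTTP endpoint, and the two ‘systems’ would be separate processes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Two hypothetical sub-systems that know nothing about each other's internals.
// They share only a message contract; the in-memory queue stands in for whatever
// well-known channel (message broker, HTTP endpoint, ...) the real systems would use.
public class SeparateSystems {

    record OrderPlaced(String orderId, double amount) {}   // the shared contract

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<OrderPlaced> channel = new LinkedBlockingQueue<>();

        // "Invoicing" system: consumes events, knows nothing about how orders are taken.
        Thread invoicing = new Thread(() -> {
            try {
                OrderPlaced event = channel.take();
                System.out.println("Invoicing: raising invoice for " + event.orderId());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        invoicing.start();

        // "Ordering" system: publishes an event and carries on with its own work.
        channel.put(new OrderPlaced("ORD-42", 99.95));
        invoicing.join();
    }
}
```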

The advantages of this approach are many, but the main ones are:

  • Deployments are simpler
  • Changes are quicker
  • Maintainability is improved radically

One other advantage of this approach is that it takes away the need to assume the scope of the project.  By breaking a system down into separate ones, we can clearly identify the parts that make it up and what each part’s scope is.

In addition to the above, we also have the opportunity to select technologies that are better suited to the problem each sub-system is trying to solve.

For example, in a traditional system, we decide to work with C# and MS SQL.  We break the solution into logical layers and components, and we start developing.

Later on in the development cycle, we introduce the requirement of storing documents in our system.  Since we are already working with MS SQL, nobody stops to think about whether or not MS SQL is suited for what we are trying to do, but instead we spend more time figuring out how to make it efficient using MS SQL.

In a sliding doors situation, had we decided to go with sub-systems instead, we would design the document persistence solution on its own, as a completely separate system.  We do our analysis and decide that the correct persistence solution to use here is a NoSQL offering, such as MongoDB.

We also realise, with our knowledge of the other system, that there are other parts that could also benefit from using a NoSQL solution.
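For illustration only, the persistence code inside that document sub-system might end up being as small as the sketch below.  The database, collection and field names are invented, and I am using the current MongoDB Java driver; the point is simply that the rest of the solution never sees any of this, and the fact that this sub-system is not even written in C# is exactly the kind of freedom we gain.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Hypothetical persistence code for the document sub-system only; other sub-systems
// talk to it over its channel and never know MongoDB is behind it.
public class DocumentStore {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> documents =
                    client.getDatabase("docstore").getCollection("documents");

            documents.insertOne(new Document("name", "contract.pdf")
                    .append("owner", "legal")
                    .append("sizeInBytes", 48_213));

            Document found = documents.find(new Document("name", "contract.pdf")).first();
            System.out.println("Stored: " + found);
        }
    }
}
```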

This is but one of the advantages that we gain by treating different deliverables as completely separate systems.  Another advantage that I can think of is that maybe C# is not the correct technology to use.  Maybe we are developing a requirement that is hugely parallelizable, in which case we are probably better off using something like Erlang.

One other advantage is of course the learning curve for new team members.  It is much easier to introduce a new team member to a small solution of just a few thousand lines of code than to expect them to understand all 12 million lines of a monolith before they can even remotely start being efficient and productive.

If we analyse this approach well, we will also realise that what we are actually doing here is applying agile at an architecture level.  We are continuously revisiting our architectural decisions as requirements become known, and in most cases, we end up questioning them.

This makes perfect sense.  Since we use agile approaches, when we decide on an architecture at the start of a project, we only know a small percentage of what it is that we are actually developing.  As we learn more about the system, we discover that our initial designs are no longer ideal, and therefore we should change them accordingly.

Failure

A good starting point is to visit the following website: ‘whoownsmyavailability.com’

What you will be presented with is a huge YOU.  What does this mean?

What this website is trying to tell us is that we are responsible for every single failure in our system.  You may tell me ‘isn’t this obvious?’, but my response to that will simply be ‘if it is obvious, then why did it happen?  What did you do to avoid this failure in the first place?’

Before QCon, a colleague of mine introduced me to an article written by the famous Jeff Atwood, where he talks about the ‘Chaos Monkey’ (http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html), an idea that came out of Netflix.

The concept behind the chaos monkey is to have a daemon process that randomly kills off different parts of the system.
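A toy version of the idea is nothing more than a scheduled job that picks a victim at random.  The ServiceRegistry below is a made-up stand-in for whatever your environment gives you to stop an instance; this is a sketch of the concept, not Netflix’s actual tool.

```java
import java.util.List;
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Toy chaos monkey: every so often, pick a random running instance and kill it.
// ServiceRegistry is hypothetical; a real version would call your cloud or
// orchestration API to terminate the chosen instance.
public class ChaosMonkey {

    interface ServiceRegistry {
        List<String> runningInstances();
        void terminate(String instanceId);
    }

    public static void schedule(ServiceRegistry registry) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Random random = new Random();

        scheduler.scheduleAtFixedRate(() -> {
            List<String> instances = registry.runningInstances();
            if (!instances.isEmpty()) {
                String victim = instances.get(random.nextInt(instances.size()));
                System.out.println("Chaos monkey terminating " + victim);
                registry.terminate(victim);
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}
```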

You may think of this notion as being crazy, but if we think about it a bit more, it makes loads of sense.  If we cannot deal with random failures, then how are we expected to deal efficiently with a scenario where there is a catastrophic failure in our system?  How are we training ourselves to deal with it?  Are we even thinking about it?

Now, I will not go so far as to recommend creating a chaos monkey for every single project that we are working on, since this in itself will probably push our project over budget, and we need to be pragmatic.  But this does not mean that we should assume that our solution will never fail.

In fact, it is not a question of ‘if’ our solution will fail, but of ‘when’.

We should not be thinking in terms of Mean-Time-Between-Failures, but rather in terms of Mean-Time-To-Recovery.
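A quick, made-up calculation shows why.  Take a system that fails roughly once a month: whether it recovers in four hours or in half an hour is the difference between roughly 99.4% and 99.9% availability, since availability = MTBF / (MTBF + MTTR).  A minimal sketch of the arithmetic:

```java
// Availability as a function of MTBF and MTTR: availability = MTBF / (MTBF + MTTR).
// The figures are illustrative only.
public class Availability {
    public static void main(String[] args) {
        double mtbfHours = 30 * 24;                     // roughly one failure a month
        for (double mttrHours : new double[] {4.0, 0.5}) {
            double availability = mtbfHours / (mtbfHours + mttrHours) * 100;
            System.out.printf("MTTR %.1fh -> availability %.3f%%%n",
                    mttrHours, availability);
        }
    }
}
```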

When a failure happens, people are forced out of their comfort zones.  They tend to step outside their roles and go all-hands-on-deck to see who can fix the problem as fast as possible.  Yet in most cases the truth is that these people are not trained well enough, and even though I am pretty sure they will fix the problem, the question remains: ‘could they have fixed it faster?’

The added stress of having to deal with something we are not well experienced in is generally one of the main reasons why the fix takes longer.  We take extra backups, we think a change over a hundred times before making it… stress!  Not to mention the potential for mistakes.

So how can we deal with this?

A good approach, recommended by most of the speakers, was to continuously highlight what went well.  When we do something during the development of a system that works out well, we should broadcast it, in detail, to our entire team.  At the same time we will be educating them on how that specific part of the system works.

I will also add my two pence to this part: instead of advertising what went well over email, which most people will not even read, we should do it in a lunch & learn session, so as to engage the other consultants more (and entice them with pizza).

Now writing about our successes is a rather easy thing to get accustomed to, but we should also outline our failures.

If a consultant tries something that does not work, they should also advertise this.  This way, when the system goes down, the person who is tasked with bringing it back up to its full glory already has ‘unofficial training’ on what will work and what will not.

In the event that something does go wrong, it is also recommended that a post-mortem be held with the entire team, detailing what the problem was, why it occurred, how it was fixed, and what can be done to avoid it happening again in the future.

Concluding Points

The more I think over what I’ve learnt and been made aware of during QCon, the more I realise what our shortcomings are.

We tend to think that if we read all the books, blogs and articles available to our industry, then we will know the correct way of dealing with everything.

Yet this could not be further from the truth, since no two systems are alike.

We are in an age where agile is being pushed and promoted everywhere, because it just works.  We have come to understand that the term ‘a waterfall plan’ is an oxymoron in itself, since the only way that a waterfall plan can ever really work is when the system is a hello world application that is being produced by a fancy, option-less wizard with just an ‘ok’ button on it, and where the only human interaction is someone pressing said button.

But where we are falling short is that we apply the agile mentality only to our process; we should be applying it to our architecture and other unknowns as well.

Agile is about reducing uncertainty around something that is not well known.  We need to accept that at the start of a project we also do not know what the correct architecture should be, no matter how much experience we have, and that we do not know what can go wrong or how to deal with it correctly and efficiently.
