I’ve spent the last couple of years helping projects with their application performance in NZ (mainly Wellington). I thought it’s about time I wrote something on the experiences I’ve had during that time and the lessons learned.
NZ is a comparatively small place. 4.5m people live here. A large bank, for example, has about 0.5-0.75m customers. One of the biggest online applications running in NZ is probably TradeMe. They have 2.8m customers and about 75k-200k active customers at any point in time. On average they have less than 1m logins a day. If I contrast that with large international systems this is laughable. eBay, for instance, has 83m users and 670 million page views a day (I don’t know how current these figures are, though). Facebook has 750m users,… So where big international companies talk about building another datacenter, we might just start clustering.
We do things a bit smaller. That has its advantages – if we do our homework correctly. Most products used nowadays are designed to scale massively to the requirements of large international companies. So we should have no issues with performance… EVER!
But as you probably know from your own surfing experience, this is not always the case. It gets even worse with in-house web applications. All of this should actually be a no-brainer. So what’s going wrong?
I’ll try to list the thoughts and experiences that I see as common in projects here (in no particular order).
- No performance testing/analysis
Performance testing is still a rarity. Luckily awareness is rising and due diligence is being done more often. There is still a long way to go, though.
- Unrealistic & bad requirements
In most projects I have encountered, the requirements are about 10-100 times higher than needed. This results in tuning for the wrong sweet spot. You might have an app that works excellently under high load but performs badly under low load. Or you might agree to test something the users don’t actually use that much. Describing performance is not easy by any stretch.
- Misunderstanding of terms
Concurrent, load, stress, performance, latency, response time… these mean different things in different projects. They need to be clarified before discussing anything else. Do not assume that a technically correct interpretation matches what people understand the term to mean. (A small worked example of how some of these terms relate follows after the list.)
- Bad architecture
My experience is that most solution architects treat performance as a minor worry, if they consider it at all. Yet it presents one of the biggest risks to any project; I have seen re-architectures very close to go-live. Performance specialists (and I don’t mean performance testers only!) should be part of the early architecture & design phases.
- Developers unaware of performance
Not only architects but – in the heat of meeting the next milestone – even developers will not think about how their code will perform. Only when confronted directly do they start tuning it. This is less active ignorance than the result of too narrowly focused expectations. Non-functional requirements are often just an afterthought in development; there is a fallacy that they are minor by-products any developer takes care of anyway.
- No implementation of product tuning guidelines
Most products have tuning guidelines. They tell you things like threads per core, buffer sizes for different configurations, or how to set up thread pools (a sketch of that kind of sizing calculation follows after the list). Or they tell you the optimal hardware/virtualization configurations for running the product. There might be hints on when and how to correctly cluster your solution. These are straightforward things that should be done/defined by the architects and developers before releasing their product. I have not yet seen a project do this without being told to do so by the performance tester. Oh, and the excuse “there are no guidelines” doesn’t count; in that case the guideline is revealed by Google.
- Political & bad choices for products
IT is not as easy as the glossy brochures make it out to be. Products don’t work together, hardware won’t perform in certain set-ups, and things are just plain buggy. As any good performance tester will tell you, everything will fail (that is always true; the question is whether it matters). These are the things that should guide proper product selection, but often products are selected by company default (we use Microsoft only/only HP servers/Cisco networking/…) or because the CIO went to dinner with the XYZ product sales guy. Often products get chosen by people who have never even seen the product themselves or have any clue what it does. These are also the same people who ignore advice from SMEs who know what they are talking about.
- Lack of monitoring or useful monitoring
Monitoring is one of the most important parts of running something in production. It tells you when things break, so why on earth would you not focus on it? Monitoring is also the brother of performance testing: without it, how can you see what load does to your system? But the converse is also true – monitoring alone can never replace performance testing. (A minimal monitoring sketch follows after the list.)
- Lack of qualified people
There are only so many people in this country with specialist know-how. We cannot keep up with the volume of new information being disseminated internationally. But amazingly, it’s not only the complex stuff that goes wrong – it’s the basics. To me this just shows that we are not thorough enough. I think employers and customers should be asking for more and taking the lead in giving people the correct/full list of expectations and requirements.
- Blind belief in delivery dates
Ship and then performance test. Or “can we cut 2 weeks off the performance test?”. Need I say more?
- Lack of performance testers
My rule of thumb is the 10/20/70 rule: in performance testing, 10% is the tool, 20% is the training in how to performance test, and 70% is just plain experience. It’s the 70% that makes a performance tester. In Wellington I’d say there are probably only two dozen performance testers who have that 70%. The market could do with a multiple of that, but because this is learning by experience it’s a vicious circle: we don’t test enough because we can’t get the specialists, and we can’t get specialists because there is too little performance testing.
- Project timelines
OK, so when do we performance test? 99.9% of PMs will put a PT phase just before go-live. If you look at some of the issues above you might see why this can be a really bad idea: if you hit architectural or product issues you might have a huge challenge on your hands and an immediate, substantial delay in delivery. So performance testing should be pervasive throughout the SDLC to minimise the big risks.
- Staff turnover
I have worked on projects that have been running for years. Developers “get the feel” for performance once you have done a few performance testing cycles; they get better at writing performant code from the outset. As is normal for a project, there is turnover. This means, though, that performance will drop when new team members come on board: they do not know the product well enough and will run into issues when the next PT cycle comes, and the process starts over. On projects that are highly performance-critical I would suggest leaving team members in place for as long as possible and never exchanging a whole team big-bang style.
- Testing vs. Investigation
Last but not least, it helps to understand the difference between these words. Testing means I have a clearly defined outcome that I am expecting. This is rarely the case: there are rough ideas, some data and guesses, but certainly nothing that is clear and defined (see “Unrealistic & bad requirements” above). So what performance testing – in all its diversity – really is, is an investigation and tuning exercise. Think of this when creating/reading/critiquing the performance test plan or when asking for documentation. Also remember that investigation is an agile process and will take shape as time progresses, so don’t be too rigid when prescribing what tests should look like. The investigation might lead you to unanticipated tests.
- Complex Tools
Performance testing from a tool perspective is actually quite straightforward. Complex tools might make performance testing projects more intricate than they need to be; worst case, they get dumped into the “too hard” basket and don’t get done at all. Keep things simple and increase complexity when needed (a deliberately simple load-generator sketch follows after the list). In my experience the issues encountered in NZ are mostly not that intricate.
- Following advice
On nearly every project I have the situation where a vendor/developer/PM/architect/… doesn’t believe that there is a problem, or believes that the cause of the problem is something other than what I tell them. I have spent weeks waiting and trying to convince people to look at XYZ without success – time and effort wasted, just to come back to XYZ and find it’s exactly what is causing the problem. I am not saying that I or performance testers in general are infallible (far from it, actually). But it would help immensely if people could start by looking at XYZ first, as chances are the performance tester’s experience in performance exceeds theirs.
- Averages are worthless
People love to talk about averages. They are easy to understand, and from school we know they are important. Well, in performance testing they tell you very little – they actually tell you nothing without other numbers that form a context. I think it is safe to stipulate that you shouldn’t use averages in performance testing (at least when reporting). The number to go for is the 90th percentile (or any other percentile between 90% and 99%). If you need details, Wikipedia does a good job of explaining these. And when using these numbers, always relate them back to throughput; without this relation they reveal nothing. (A short sketch comparing the average with the 90th percentile follows below.)
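As promised under “Misunderstanding of terms”, here is a small worked example of why “concurrent users” and “requests in flight” are very different numbers. It uses Little’s Law (average concurrency = throughput × average response time); all figures are made up purely for illustration.

```python
# Little's Law: average concurrency = throughput * average response time.
# Illustrative numbers only - not from a real system.
throughput = 50.0        # completed requests per second
avg_response_time = 2.0  # seconds per request

in_flight = throughput * avg_response_time
print(f"~{in_flight:.0f} requests are in flight on average")

# "200 concurrent users" clicking once a minute is a very different load
# from 200 requests in flight at the same time.
users = 200
think_time = 60.0        # seconds a user waits between requests
user_throughput = users / (think_time + avg_response_time)
print(f"{users} users with {think_time:.0f}s think time generate only "
      f"~{user_throughput:.1f} requests/s")
```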
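For the tuning-guidelines point, here is a minimal sketch of the “threads per core” style calculation such guidelines typically describe. The sizing formula (cores × (1 + wait time ÷ compute time)) is a common rule of thumb rather than any specific product’s guideline, and the wait/compute figures are assumptions you would replace with measured values.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Rule-of-thumb sizing for a thread pool doing mixed I/O and CPU work:
#   threads ~= cores * (1 + wait_time / compute_time)
# The wait/compute ratio below is an assumption - measure your own workload.
cores = os.cpu_count() or 1
wait_ms = 80.0      # assumed time a task spends waiting on I/O
compute_ms = 20.0   # assumed time a task spends on the CPU

pool_size = int(cores * (1 + wait_ms / compute_ms))
print(f"{cores} cores -> thread pool of {pool_size} workers")

pool = ThreadPoolExecutor(max_workers=pool_size)
pool.shutdown()
```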
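For the monitoring point, a minimal sketch of the kind of check that complements performance testing: poll an endpoint at an interval and record response time and failures. The URL, interval and threshold are placeholders; a real set-up would feed a proper monitoring system rather than print to the console.

```python
import time
import urllib.request

# Minimal availability/response-time monitor. URL, interval and threshold
# are placeholder values for illustration only.
URL = "http://example.com/health"   # hypothetical health endpoint
INTERVAL_S = 30
SLOW_THRESHOLD_S = 2.0

def check_once(url: str) -> None:
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            elapsed = time.perf_counter() - start
            status = resp.status
    except Exception as exc:
        print(f"DOWN: {exc}")
        return
    flag = "SLOW" if elapsed > SLOW_THRESHOLD_S else "OK"
    print(f"{flag}: HTTP {status} in {elapsed:.3f}s")

if __name__ == "__main__":
    while True:
        check_once(URL)
        time.sleep(INTERVAL_S)
```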
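For the point about complex tools, a deliberately simple load-generator sketch: a handful of worker threads hit a placeholder URL and report response times. It is no substitute for a proper tool when you need realistic scenarios, pacing or correlation, but it shows how little is needed to start an investigation.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Deliberately simple load generator: N workers each hit the target URL a
# fixed number of times. URL and load figures are placeholders.
URL = "http://example.com/"   # hypothetical target
WORKERS = 10
REQUESTS_PER_WORKER = 20

def timed_request(url: str) -> float:
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - start

def worker(_: int) -> list[float]:
    return [timed_request(URL) for _ in range(REQUESTS_PER_WORKER)]

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        timings = [t for batch in pool.map(worker, range(WORKERS)) for t in batch]
    print(f"{len(timings)} requests, "
          f"fastest {min(timings):.3f}s, slowest {max(timings):.3f}s")
```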
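And for the averages point, a short sketch showing how an average can hide exactly what the 90th percentile reveals. The response times are made up.

```python
import statistics

# Made-up response times (seconds): most requests are fast,
# but a noticeable tail is very slow.
samples = [0.2] * 85 + [5.0] * 15

average = statistics.mean(samples)
p90 = statistics.quantiles(samples, n=100)[89]   # 90th percentile

print(f"average         : {average:.2f}s")   # 0.92s - looks acceptable
print(f"90th percentile : {p90:.2f}s")       # 5.00s - tells the real story

# Always report percentiles together with the throughput at which they
# were measured; a percentile without its load context reveals little.
```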
