Tuesday, February 2, 2016

EHCache Topologies and Monitoring

Serving content from memory is always faster than making a network trip to the database to execute a query and return the desired data. Caching helps improve overall application performance. Unless you have a specific reason not to cache, you should be caching.

Many open source cache implementations exist, such as Hazelcast, Memcached, Redis, Ehcache, and JCache. For the purpose of this article we will focus on the different cache topologies available with Ehcache.

EHCache supports the following topologies :
  1. Standalone : The cached data set is held in the application JVM/node itself. Application nodes are not aware of data already cached by other nodes, and there is no communication between application nodes for cache synchronization.
  2. Distributed Cache : In this topology the cache data set is held in a Terracotta Server Array, with a subset of recently used data held in each application node. This local copy of the cache is also referred to as a near cache.
  3. Replicated Cache : Here the cached data set is held on each application node, and data is copied or invalidated across the cache cluster without locking. When a new cache entry is created on one application node, it is replicated to all other application nodes. Several techniques can be used to implement a replicated cache :
    • Cache Replication using RMI
    • Cache Replication using JGroups
    • Cache Replication using JMS
    • Cache Replication using Cache Server
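As a sketch of the RMI technique, replication in Ehcache 2.x is configured in ehcache.xml; the multicast address, port, cache name, and sizing values below are placeholders, not recommendations:

```xml
<ehcache>
  <!-- Peer discovery: nodes find each other via multicast (values are examples) -->
  <cacheManagerPeerProviderFactory
      class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
      properties="peerDiscovery=automatic,
                  multicastGroupAddress=230.0.0.1,
                  multicastGroupPort=4446, timeToLive=32"/>

  <cache name="userCache" maxEntriesLocalHeap="10000"
         eternal="false" timeToLiveSeconds="300">
    <!-- Replicate puts, updates and removals to the other peers -->
    <cacheEventListenerFactory
        class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
        properties="replicateAsynchronously=true, replicatePuts=true,
                    replicateUpdates=true, replicateRemovals=true"/>
  </cache>
</ehcache>
```

The JGroups and JMS variants follow the same pattern with their own peer provider and replicator factory classes.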
Key Performance Metrics for EHCache

EHCache exposes many monitoring metrics. The following metrics best describe cache utilization :
  • CacheHitPercentage : Percentage of total requests that were hits for a particular cache.
  • CacheHits : If the requested data is in the cache, it is a hit; this counter increases on every hit.
  • CacheMissPercentage : Percentage of total requests that were misses for a particular cache.
  • CacheMisses : If the requested data is not in the cache, it is a miss; this counter increases on every miss. The requested data is then fetched from the database and placed in the cache.
  • ObjectCount : Total number of objects in a particular cache.
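The percentage metrics are derived from the raw hit and miss counters; a small illustrative sketch in plain Java (not the Ehcache API itself) of that relationship:

```java
public class CacheStats {
    // Derive the hit percentage from the raw hit and miss counters.
    static double hitPercentage(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : 100.0 * hits / total;
    }

    // The miss percentage is simply the complementary ratio.
    static double missPercentage(long hits, long misses) {
        return hitPercentage(misses, hits);
    }

    public static void main(String[] args) {
        long hits = 900, misses = 100;
        System.out.println("CacheHitPercentage  = " + hitPercentage(hits, misses));
        System.out.println("CacheMissPercentage = " + missPercentage(hits, misses));
    }
}
```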

Monday, December 14, 2015

How to Identify the Root Cause of Frequent Minor Garbage Collections Using Dynatrace

Frequent young generation (minor) GCs have two root causes :
  1. A young generation that is too small for the application workload
  2. A high object allocation rate
If your old generation is growing quickly, it is quite possible that this is linked to a young generation that is too small, causing objects to be promoted prematurely. This can easily be identified using JConsole or jstat.

If you have a high object allocation rate, it becomes difficult to identify the exact root cause. Performance diagnostic tools such as Dynatrace provide the capability to identify the root cause of such issues. To troubleshoot a high object allocation rate in Dynatrace, you need to add the 'Runtime Suspension' dashlet.

The Runtime Suspension dashlet reports statistical concentrations of garbage collection in particular methods. It provides the following columns of information :
  1. Method Name
  2. Class Name
  3. Suspension Count
  4. Suspension Sum[ms]
  5. Suspension Avg[ms]
  6. Reason
A method with a high suspension count suggests that the method itself allocates enough objects to fill up the young generation and thus trigger a GC. Such methods are prime candidates for optimization.
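As a hypothetical illustration of such an allocation hotspot: string concatenation in a loop allocates a new String (and its backing array) on every iteration, while reusing a single StringBuilder removes most of that allocation pressure:

```java
public class AllocationHotspot {
    // Allocation-heavy: each += allocates a new String and copies all prior chars.
    static String joinConcat(String[] parts) {
        String result = "";
        for (String p : parts) {
            result += p + ",";
        }
        return result;
    }

    // Allocation-light: one StringBuilder, resized a handful of times at most.
    static String joinBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder(parts.length * 8);
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c"};
        // Both produce the same result; only the allocation behavior differs.
        System.out.println(joinConcat(parts).equals(joinBuilder(parts)));
    }
}
```

In a Runtime Suspension dashlet, a method shaped like joinConcat would typically show a much higher suspension count than its StringBuilder equivalent.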

 

Thursday, December 3, 2015

Performance Impact of Data Structure Resizing in Java

Java applications tend to make very frequent use of StringBuilder or StringBuffer for assembling Strings. Both StringBuilder and StringBuffer use a char[] internally for data storage. As elements are added, the underlying char[] may need to be resized: a new char[] of larger size (roughly 2x) is allocated, the elements from the old char[] are copied into the new one, and the old char[] is then discarded and becomes eligible for garbage collection. This whole process consumes extra CPU cycles for the new array allocation and the copying of elements, plus, at some future point, the cost of a garbage collection cycle.

The above is also true for other Java collections that use an array for internal data storage, such as ArrayList, Vector, ConcurrentHashMap and HashMap. Other collections, such as LinkedList or TreeMap, instead use one or more object references between stored elements to chain together the elements they manage. The array-backed collections provide constructors with an optional initial size argument, but these constructors are often not used, or the size supplied by the application program is not optimal for the collection's use.

For HashMaps, to actually see whether resizing is happening, you need to perform memory profiling, either at run time or offline using heap dumps. In either case, if you observe calls to the java.util.HashMap.resize(int) method, your HashMap is being resized during application execution.
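A small sketch of presizing in practice; the capacity arithmetic assumes HashMap's default load factor of 0.75, and the sizes are example values:

```java
import java.util.HashMap;
import java.util.Map;

public class Presizing {
    // Initial capacity needed so a HashMap holding `expected` entries
    // stays below the resize threshold (default load factor 0.75).
    static int hashMapCapacity(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        // StringBuilder: presize to the expected final length to avoid
        // repeated char[] doubling and copying.
        StringBuilder sb = new StringBuilder(1024);

        // HashMap: presize so that 1000 entries never trigger a resize.
        Map<String, String> map = new HashMap<>(hashMapCapacity(1000));

        System.out.println(hashMapCapacity(1000));
    }
}
```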


Wednesday, December 2, 2015

Collecting and Analyzing Redis Performance Metrics

Redis is an open source, in-memory, advanced key-value store with optional persistence to disk. Two common uses of Redis are caching and publish-subscribe queues.

How to access Redis performance metrics

Redis performance metrics are accessible through the Redis command line interface (redis-cli). Use the info command to print all performance metrics for your Redis server.


The output of the info command is grouped into the following 10 sections :

  1. server
  2. clients
  3. memory
  4. persistence
  5. stats
  6. replication
  7. cpu
  8. commandstats
  9. cluster
  10. keyspace
If you just want to see the metrics for a specific section, pass the section name to the info command. For example, to print memory statistics for the Redis server, use info memory :


Top Redis Performance Metrics:

  • Memory Usage : used_memory
The used_memory metric reports the total number of bytes allocated by Redis. It reflects the amount of memory, in bytes, that Redis has requested to store your data and the supporting metadata it needs to run. This metric does not account for memory lost to fragmentation, meaning the amount it reports will always differ from the total amount of memory the operating system has allocated to Redis.
  • total_commands_processed
The total_commands_processed metric gives the number of commands processed by the Redis server. Tracking the number of commands processed is critical for diagnosing latency issues in a Redis instance: because Redis is single threaded, commands are processed sequentially. The typical latency on a 1 Gbit/s network is about 200 microseconds. If you are seeing slow command responses and latency greater than 200 microseconds, the cause could be a high number of commands waiting in the command queue.
  • mem_fragmentation_ratio
The mem_fragmentation_ratio metric gives the ratio of memory used as seen by the operating system (used_memory_rss) to memory allocated by Redis (used_memory) :

Memory Fragmentation ratio = used_memory_rss/used_memory

If the fragmentation ratio is outside the range of 1 to 1.5, it is a likely sign of memory problems: a ratio above 1.5 indicates fragmentation, while a ratio below 1 indicates that the operating system is swapping Redis memory to disk.
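A small sketch of this arithmetic (the byte counts are made-up example values):

```java
public class RedisFragmentation {
    // mem_fragmentation_ratio = used_memory_rss / used_memory
    static double fragmentationRatio(long usedMemoryRss, long usedMemory) {
        return (double) usedMemoryRss / usedMemory;
    }

    public static void main(String[] args) {
        long usedMemory = 1_000_000;     // bytes Redis allocated (used_memory)
        long usedMemoryRss = 1_400_000;  // bytes the OS sees (used_memory_rss)
        double ratio = fragmentationRatio(usedMemoryRss, usedMemory);
        System.out.println("mem_fragmentation_ratio = " + ratio);
        System.out.println("within healthy range = " + (ratio >= 1.0 && ratio <= 1.5));
    }
}
```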

  • Evictions
The evicted_keys metric gives the number of keys removed by Redis after hitting the maxmemory limit.
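Evictions occur only when a memory limit and an eviction policy are configured; a minimal redis.conf sketch (the 2gb limit and LRU policy are example choices, not recommendations):

```
maxmemory 2gb
maxmemory-policy allkeys-lru
```

With no maxmemory set, Redis keeps growing until the operating system runs out of memory; with maxmemory set but the default noeviction policy, writes fail instead of evicting keys.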

Wednesday, October 28, 2015

Understanding G1 Garbage Collector in Java

The Garbage First (G1) collector is a server-style garbage collector targeted at multiprocessor machines with a large amount of memory. G1 is designed to meet a GC pause time goal with high probability while maintaining throughput at an acceptable level.

G1 manages the heap differently. Instead of partitioning the heap into fixed-size generations (young, old and perm gen), G1 partitions the heap into equally sized regions, each a contiguous range of virtual memory. Sets of regions in the heap play the same roles (eden, survivor, old) as the generations do in the older collectors (Parallel GC, CMS); however, regions in these roles are not of a fixed total size as the generations were with earlier garbage collectors.

When G1 initiates a garbage collection cycle, it first performs a global marking phase, in which G1 determines the liveness of objects throughout the heap. Once the marking phase is complete, G1 knows which regions are mostly empty and collects these regions first. As the name Garbage First implies, G1 performs its collection and compaction activity on those regions of the heap that are likely to be full of reclaimable (garbage) objects. Further, the number of regions G1 selects for collection depends on the pause time target, since G1 is designed to meet a pause time goal.

With G1 GC, the overall memory footprint of the Java process will be higher due to additional data structures maintained by G1 for its internal purposes. These accounting structures are :

  • Remembered Set (RSet) : Tracks object references into a given region. Each region holds its own RSet.
  • Collection Set (CSet) : The set of regions that will be collected in a GC. All live data in the CSet is evacuated (copied/moved) during a GC cycle.
Young Generation Collection in G1 GC:
  • The young generation is composed of a set of non-contiguous regions.
  • Young GCs are stop-the-world events; application threads are stopped during the young GC cycle.
  • Young GC is done in parallel using multiple threads.
  • Live objects are copied to new survivor or old regions.

Old Generation Collection in G1 GC:

  • Initial Marking - This is a STW phase and is piggybacked on a young GC cycle.
  • Root Region Scanning - This phase runs concurrently with the application threads. It scans survivor regions for references into the old generation.
  • Concurrent Marking - Finds live objects over the entire heap. This happens while the application is running, and the phase can be interrupted by young generation garbage collections.
  • Remark - Completes the marking of live objects in the heap. It uses an algorithm called snapshot-at-the-beginning (SATB), which is much faster than what was used in the CMS collector.
  • Cleanup - This phase performs accounting on live objects and identifies completely free regions. It then scrubs the RSets, resets empty regions, and returns them to the free list.
  • Copying - This is the final phase of the multi-phase marking cycle. It is partly STW, when G1 does liveness accounting (to identify completely free regions and mixed garbage collection candidate regions) and scrubs the RSets, and partly concurrent, when G1 resets and returns the empty regions to the free list.
Another thing to remember with G1 is humongous objects (H-Obj). Any object that spans 50% or more of G1's region size is considered humongous and is allocated directly into the old regions.
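The humongous check can be sketched as a simple predicate; the 4 MB region size below is only an example (the JVM picks a region size between 1 MB and 32 MB based on heap size, or it can be set with -XX:G1HeapRegionSize):

```java
public class HumongousCheck {
    // An allocation is humongous when it is at least half the G1 region size.
    static boolean isHumongous(long objectSizeBytes, long regionSizeBytes) {
        return objectSizeBytes >= regionSizeBytes / 2;
    }

    public static void main(String[] args) {
        long regionSize = 4L * 1024 * 1024;  // example: 4 MB regions
        // 1 MB object = 25% of the region: allocated normally.
        System.out.println(isHumongous(1L * 1024 * 1024, regionSize));
        // 2 MB object = 50% of the region: allocated directly as humongous.
        System.out.println(isHumongous(2L * 1024 * 1024, regionSize));
    }
}
```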

Tuesday, October 27, 2015

Performance Testing in Agile Process

Performance testing is an essential activity in all software development projects, including Agile ones. Agile development practices help teams achieve faster time to market, adapt to changing requirements, and maintain a constant feedback loop. This Agile transformation has introduced a new challenge for performance engineers -

How do we manage non-functional performance testing in an Agile model?
A traditional performance testing cycle is best performed over a long period of time: it typically expects functionally stable builds, script development, test data generation, day-to-day debug tests, etc. It is very difficult to fit all these activities into a two-week or shorter sprint.
Agile-based Performance Testing Approach:
  1. The Definition of Done should include completion of performance testing activities within the sprint.
  2. Include a performance engineering team member in the scrum of scrums meeting.
  3. Start performance testing on the dev box itself, where the focus should be on individual method execution time (unit level testing).
  4. Next, proceed with component level testing to cover response time measurement of the developed user stories that are candidates for performance testing. Here we create automated scripts and start overnight execution of load tests using tools.
  5. Any issues found during unit and component level testing will be fixed in the subsequent sprint.
  6. Perform system level testing (end-to-end scenarios) during the sprint hardening phase. By this time we can expect that all code and design level optimizations have already been completed as part of unit and component level testing. During the initial days of sprint hardening, perform load, stress and endurance tests.
So we have talked about three different levels of testing in the Agile process :
  • Unit Level Testing : Performed on the dev box itself. This level validates database indexing, the application cache mechanism, method hot spots, JDBC calls, etc.
  • Component Level Testing : At this level we validate transaction response times for performance-specific user stories.
  • System Level Testing : Execution of end-to-end user scenarios under the defined or predicted workload. Here we cover load, stress and endurance testing.
We have validated the above approach and it worked well in our case. However, implementing it successfully is highly dependent on the maturity of the Agile implementation in a given organization.

Sunday, June 14, 2015

Performance Testing with Cloud

What exactly does it mean when we say "Performance Testing with Cloud"? Before we go deeper, let us first understand some basics about what the cloud is and how we can utilize it for our testing needs.

According to Gartner, the cloud is defined as

"a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service to external customers using internet technologies."

Quite simple!

This definition refers to the following characteristics of the cloud :
  1. Scalability
  2. A large amount of resources
  3. End-user services offered over the internet
The technical translation of the above looks like :
  1. Service oriented - focus is on "what" we need instead of "how"
  2. Elastic - pay-per-use concept
  3. Scalable
  4. Internet connected
These are the design concepts behind the cloud; the services we as end users get out of this design are :
  1. Infrastructure as a Service (IaaS) - Service delivery focuses on physical or virtual machines, firewalls, load balancers and network infrastructure.
  2. Platform as a Service (PaaS) - The service provider delivers a working platform, which includes operating systems, development environments, databases, web servers, application containers, etc. Use this service to develop applications without worrying about licensing, buying and maintaining the platform.
  3. Software as a Service (SaaS) - This model provides executable applications in the cloud which are directly accessible to end users. For example, we can utilize performance test tools (BlazeMeter, StormRunner, etc.) in the cloud, where we need not worry about installation and maintenance; we just need access to start working with them. Another example is New Relic application monitoring delivered in a SaaS model.
I believe the above description of the cloud and its related services should be sufficient to relate the cloud to performance testing activities.

Continuing with performance testing in the cloud: in order to perform performance testing, what do we really need?
  1. A performance test environment that matches production configurations
  2. A platform (web servers, app servers, load balancers and database)
  3. A performance testing tool to simulate real user activity
  4. A performance monitoring tool
Now we have two solutions to fulfill these needs :
  1. An in-house setup of the environment, platform and related licensing, along with license procurement for all related testing and monitoring tools. This leads to huge setup and maintenance costs.
  2. Use the cloud for the desired services, where all the above performance testing needs can easily be satisfied with the IaaS, PaaS and SaaS delivery models.
Benefits of Performance Testing with Cloud:
  1. Perform large scale tests - To simulate the traffic of thousands of users against your application, you need to invest a lot in hardware, and configuring such a big test environment is very time consuming. Today we have to meet the demands of fast-paced development models (Agile), and cloud services help to a great extent because all of this is ready in a few clicks.
  2. Perform more realistic tests - To performance test an application across its complete delivery chain, we should test it from outside the organization's firewall, because testing only inside the firewall may fail to reveal all performance issues. With the cloud we can test the application the way real users will use it, i.e. from outside the firewall, and validate every component in the delivery chain, including the firewall, DNS, ISP and network equipment.
  3. Save time and reduce cost (pay per use) - As mentioned earlier, we may not need the entire performance test environment available at all times, so we can save a lot with a pay-per-use delivery model. We can also create instance images and save them to launch new instances later when needed.

Challenges of Performance Testing with Cloud:
  1. Isolation of root cause - When we use the cloud for performance testing, it becomes difficult to isolate the exact root cause of a discovered bottleneck, especially when we are not equipped with application performance monitoring tools. Cloud-only testing works fine when there is a single source of the bottleneck; consider instead a situation where a performance bottleneck is related to multiple problems, both inside and outside the firewall. For this reason it is advisable to keep an internal performance test environment where we can isolate the root causes of performance problems inside the firewall (if any).
  2. Reproducing tests - Reproducing a performance defect in the cloud is quite difficult (especially when it is linked to infrastructure) because of variations in internet traffic and in bandwidth availability at the data center level.
  3. Choosing the right mix of computing needs - Some cloud providers deliver instances tailored to computing needs; for example, Amazon provides compute-optimized and memory-optimized instances. We should have a clear understanding of the computing needs at each layer of the architecture: app containers most likely need compute-optimized instances, whereas the database should be on memory-optimized ones. A wrong decision can affect test results.
Best Approach - Take a Hybrid Approach (Internal + Cloud) :
It is advisable to employ a two-stage process in which the application is first performance tested in an internal test environment under medium load. This way we are able to identify all design level issues. Once internal testing is done, we can move on to cloud-based testing to mimic real user behavior with large scale tests and validate the entire delivery chain. This approach offers the following advantages:
  1. Enables early testing in the development cycle
  2. Isolates design level issues before moving to large scale tests
  3. Enables reproducible tests
  4. Provides a better understanding of each major area in the delivery chain
  5. Lowers performance testing cost