Monday, December 14, 2015

How to Identify the Root Cause of Frequent Minor Garbage Collections Using Dynatrace

Frequent young generation (minor) GCs have two root causes:
  1. A young generation that is too small for the application's workload
  2. A high object allocation rate
If your old generation size is growing quickly, it is quite possible that this is linked to a small young generation: objects are promoted before they have a chance to die young. This can easily be identified using JConsole or jstat.
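For example, jstat can watch GC behavior live (assuming you have the JVM's process id):

jstat -gcutil <pid> 5000

This prints the utilization of the survivor, eden and old spaces plus GC counts every 5 seconds; a rapidly climbing O (old) column together with a fast-incrementing YGC count points to premature promotion out of an undersized young generation.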

If you have a high object allocation rate, it becomes difficult to identify the exact root cause. Performance diagnostic tools such as Dynatrace provide the capability to identify the root cause of such issues. To troubleshoot a high object allocation rate in Dynatrace, you need to add the 'RunTime Suspension' dashlet.

The RunTime Suspension dashlet reports the statistical concentration of garbage-collection suspensions per method. It provides the following columns of information:
  1. Method Name
  2. Class Name
  3. Suspension Count
  4. Suspension Sum[ms]
  5. Suspension Avg[ms]
  6. Reason
A method with a high suspension count suggests that the method itself allocates enough objects to fill up the young generation and thus triggers GCs. This way we can find the prime candidates for optimization.
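As a hypothetical illustration (invented for this post, not output from the dashlet), a method like the following allocates a fresh 64 KB buffer on every call; under load it churns the young generation and would surface with a high suspension count. Hoisting or pooling the buffer removes the allocation pressure:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopier {
    // Allocation-heavy: a new 64 KB byte[] per invocation fills eden quickly under load.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // allocated on every call
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }
}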

 

Thursday, December 3, 2015

Performance Impact of Data Structure Resizing in Java

Java applications tend to make very frequent use of StringBuilder or StringBuffer for assembling Strings. Both StringBuilder and StringBuffer use a char[] internally for their data storage. As elements are added, the underlying char[] may be subject to resizing: a new, larger char[] (roughly 2x the size) is allocated and the elements from the old char[] are copied into it. The old char[] is then discarded and becomes available for garbage collection. This whole process consumes extra CPU cycles for the new array allocation and the copying of elements, plus, at some future point, the cost of a garbage collection cycle.
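A minimal sketch of the remedy: presize the builder when the final length is roughly known (the 1024 value here is purely illustrative):

class PresizedBuilder {
    public static void main(String[] args) {
        // The default constructor starts at capacity 16 and grows (roughly doubling)
        // as content is appended; presizing avoids the allocate-copy-discard cycle.
        StringBuilder sb = new StringBuilder(1024);
        for (int i = 0; i < 100; i++) {
            sb.append("item-").append(i).append(',');
        }
        System.out.println(sb.length() + " chars assembled with no intermediate resize");
    }
}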

The above facts also hold true for other Java collections that use an array for their internal data storage, such as ArrayList, Vector, HashMap and ConcurrentHashMap. Other collections, such as LinkedList or TreeMap, instead use one or more object references between stored elements to chain together the elements they manage. The array-backed collection classes provide constructors with optional size arguments, but these constructors are often not used, or the size provided by the application is not optimal for the collection's use.

For HashMaps, to actually see whether resizing is happening, you need to perform memory profiling, either at run time or offline using heap dumps. In either case, if you observe java.util.HashMap.resize(int) method calls, your HashMap is being resized during application execution.
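The same idea applies to HashMap; a minimal sketch of presizing (the capacity math assumes the default load factor of 0.75):

import java.util.HashMap;
import java.util.Map;

class PresizedMap {
    public static void main(String[] args) {
        int expectedEntries = 10000;
        // Choose a capacity so expectedEntries fits under the 0.75 load factor,
        // preventing intermediate resize() calls while the map fills.
        int capacity = (int) (expectedEntries / 0.75f) + 1;
        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size() + " entries inserted without resizing");
    }
}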


Wednesday, December 2, 2015

Collecting and Analyzing Redis Performance Metrics

Redis is an open source, in-memory, advanced key-value store with optional persistence to disk. Two common uses of Redis are caching and publish-subscribe queues.

How to access Redis performance metrics

Redis performance metrics are accessible through the Redis command line interface (redis-cli). Use the info command to print all performance metrics for your Redis server.


The output of the info command is grouped into the following 10 sections:

  1. server
  2. clients
  3. memory
  4. persistence
  5. stats
  6. replication
  7. cpu
  8. commandstats
  9. cluster
  10. keyspace
If you just want to see the metrics for a specific section, pass the section name to the info command. For example, to print memory statistics for your Redis server, use info memory:
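For example (assuming redis-cli can reach the server on the default port 6379):

redis-cli info memory

The output is a list of key:value lines such as used_memory, used_memory_rss and mem_fragmentation_ratio, which the metrics below refer to.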


Top Redis Performance Metrics:

  • Memory Usage: used_memory
The used_memory metric reports the total number of bytes allocated by Redis, i.e. the amount of memory Redis has requested to store your data and the supporting metadata it needs to run. This metric does not account for memory wasted through fragmentation, so its value will generally differ from the total amount of memory the operating system has allocated to Redis.
  • total_commands_processed
The total_commands_processed metric gives the number of commands processed by the Redis server. Tracking the number of commands processed is critical for diagnosing latency issues in a Redis instance: because Redis is single-threaded, commands are processed sequentially. The typical latency on a 1 Gbit/s network is about 200 microseconds, so if commands are responding slowly and latency is well above 200 microseconds, the cause may be a high number of commands waiting in the command queue. (A sketch for pulling these counters programmatically follows this list.)
  • mem_fragmentation_ratio
The mem_fragmentation_ratio metric gives the ratio of memory used as seen by the operating system (used_memory_rss) to memory allocated by Redis (used_memory):

Memory Fragmentation ratio = used_memory_rss/used_memory

If the fragmentation ratio falls outside the range of 1 to 1.5, it is a likely sign of memory mismanagement: a ratio well above 1.5 indicates excessive fragmentation inside the Redis process, while a ratio below 1 indicates that the operating system has swapped part of Redis's memory to disk.

  • Evictions
The evicted_keys metric gives the number of keys removed by Redis due to hitting the maxmemory limit.
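A minimal sketch for pulling these metrics from Java, assuming the Jedis client library and a Redis server on the default localhost:6379:

import redis.clients.jedis.Jedis;

public class RedisMetricsCheck {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // INFO <section> returns key:value lines separated by \r\n
            printMetric(jedis.info("memory"), "used_memory");
            printMetric(jedis.info("memory"), "mem_fragmentation_ratio");
            printMetric(jedis.info("stats"), "total_commands_processed");
            printMetric(jedis.info("stats"), "evicted_keys");
        }
    }

    private static void printMetric(String section, String key) {
        for (String line : section.split("\r\n")) {
            if (line.startsWith(key + ":")) {
                System.out.println(line);
                return;
            }
        }
    }
}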

Wednesday, October 28, 2015

Understanding G1 Garbage Collector in Java

The Garbage First (G1) collector is a server-style garbage collector targeted at multiprocessor machines with large amounts of memory. G1 is designed to meet a GC pause-time goal with high probability while maintaining GC throughput at an acceptable level.

G1 GC manages the heap differently. Instead of partitioning the heap into fixed-size structures per generation (young, old and perm gen), G1 partitions the heap into equal-sized regions, each a contiguous range of virtual memory. Sets of regions across the heap play the same roles (eden, survivor, old) as in the older collectors (Parallel GC, CMS), but these roles are not tied to fixed sizes or fixed locations as with earlier garbage collectors.





When G1 initiates a garbage collection cycle, it first performs a global marking phase to determine the liveness of objects across the entire heap. Once marking is complete, G1 knows which regions are mostly empty and collects those first. As the name Garbage First implies, G1 concentrates its collection and compaction activity on the regions of the heap that are likely to be full of reclaimable (garbage) objects. Further, the number of regions selected for a collection depends on the pause-time target, since G1 is designed to meet that goal.

With G1 GC, the overall memory footprint of the Java process will be higher due to additional data structures G1 maintains internally. These accounting structures are:

  • Remembered Set (RSet): tracks object references into a given region. Each region holds its own RSet.
  • Collection Set (CSet): the set of regions that will be collected in a GC. All live data in the CSet is evacuated (copied/moved) during a GC cycle.
Young Generation Collection in G1 GC:
  • Young generation memory is composed of a set of non-contiguous regions.
  • Young GCs are stop-the-world events: application threads are paused during the young GC cycle.
  • Young GC is done in parallel using multiple threads.
  • Live objects are copied to new survivor regions or to old regions.

Old Generation Collection in G1 GC:

  • Initial Marking: a stop-the-world phase, piggybacked on a young GC cycle.
  • Root Region Scanning: runs concurrently with the application; scans the survivor regions for references into the old generation.
  • Concurrent Marking: finds live objects over the entire heap while the application is running. This phase can be interrupted by young generation garbage collections.
  • Remark: completes the marking of live objects in the heap. Uses an algorithm called snapshot-at-the-beginning (SATB), which is much faster than the approach used in the CMS collector.
  • Cleanup: performs accounting on live objects to identify completely free regions and mixed-collection candidate regions, and scrubs the RSets (both partly stop-the-world); resetting empty regions and returning them to the free list happens concurrently.
  • Copying: stop-the-world pauses that evacuate (copy) live objects to new, unused regions; these show up as young or mixed collections.
Another thing to remember with G1 is humongous objects (H-Objs). If an object spans 50% or more of G1's region size, it is considered humongous and is allocated directly into contiguous regions of the old generation.
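A typical G1 command line looks like the following (the pause target and region size values are illustrative, not recommendations, and the jar name is a placeholder):

java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=8m -jar myapp.jar

-XX:MaxGCPauseMillis sets the pause-time goal G1 tries to meet, and -XX:G1HeapRegionSize (a power of two between 1 MB and 32 MB) indirectly controls which objects count as humongous.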

Tuesday, October 27, 2015

Performance Testing in Agile Process

Performance testing is an essential activity in all software development projects, including Agile ones. Agile development practices can help teams achieve faster time to market, adapt to changing requirements, and provide a constant feedback loop. This Agile transformation has introduced a new challenge for performance engineers:

How do we manage non-functional performance testing in the Agile model?
The traditional performance testing cycle is usually best performed over a long period of time: it typically expects functionally stable builds, script development, test data generation, day-to-day debug tests, etc. It is very difficult to fit all of these activities into a two-week or shorter sprint.
An Agile-based performance testing approach:
  1. The Definition of Done should include completion of performance testing activities within a sprint.
  2. Include a performance engineering team member in the scrum-of-scrums meeting.
  3. Start performance testing on the dev box itself, where the focus should be on individual method execution time (unit-level testing).
  4. Next, proceed with component-level testing to cover response time measurement of the developed user stories that are candidates for performance testing. Here we create automated scripts and start overnight execution of load tests using tools.
  5. Any issues found during unit- and component-level testing will be fixed in the subsequent sprint.
  6. Perform system-level (end-to-end scenario) testing during the sprint hardening phase. By this time we can expect that all code and design level optimizations have already been completed as part of unit- and component-level testing. During the initial days of sprint hardening, perform load, stress and endurance tests.
So we talked about three different levels of testing in the Agile process:
  • Unit-Level Testing: performed on the dev box itself. This level validates database indexing, the application cache mechanism, method hot spots, JDBC calls, etc.
  • Component-Level Testing: at this level, we validate transaction response times for performance-specific user stories.
  • System-Level Testing: execution of end-to-end user scenarios for the defined or predicted workload. Here we cover load, stress and endurance testing.
We have validated the above approach and it worked well in our case. However, its implementation is highly dependent on the maturity of the Agile adoption in a given organization.

Sunday, June 14, 2015

Performance Testing with Cloud

What exactly does it mean when we say "Performance Testing with Cloud"? Before we get deep into this, let us first understand some basics about what exactly the cloud is and how we can utilize it for our testing needs.

According to Gartner, the cloud is defined as

"a style of computing in which scalable and elastic IT-enabled capabilities are delivered as a service to external customers using internet technologies."

Quite simple!

This definition refers to the following characteristics of the cloud:
  1. Scalability
  2. A large amount of resources
  3. End-user services offered over the internet
A technical translation of the above looks like:
  1. Service oriented: focus is on "what" we need instead of "how"
  2. Elastic: pay-per-use concept
  3. Scalable
  4. Internet connected
These are the design concepts behind the cloud; the services we as end users get out of this design are:
  1. Infrastructure as a Service (IaaS): service delivery focuses on physical or virtual machines, firewalls, load balancers and network infrastructure.
  2. Platform as a Service (PaaS): service providers deliver a working platform, which includes operating systems, development environments, databases, web servers, application containers, etc. Use this service to develop applications without worrying about licensing, buying and maintaining the platform.
  3. Software as a Service (SaaS): this model provides executable applications in the cloud that are directly accessible to end users. For example, we can utilize performance test tools (BlazeMeter, StormRunner, etc.) in the cloud, where we need not worry about installation and maintenance; we just need access to start working. Another example is New Relic application monitoring delivered as SaaS.
I believe the above description of the cloud and related services should be sufficient to relate the cloud and its use to performance testing activities.

Continuing on performance testing with cloud: in order to perform performance testing, what do we really need?
  1. A performance test environment that matches production configurations
  2. A platform (web servers, app servers, load balancers and database)
  3. A performance testing tool to simulate real user activities
  4. A performance monitoring tool
Now we have two solutions to fulfill these needs:
  1. In-house setup of the environment, platform and related licensing, along with license procurement of all related testing and monitoring tools. This leads to huge setup and maintenance costs.
  2. Use the cloud for the desired services, where all the above performance testing needs can easily be satisfied with the IaaS, PaaS and SaaS delivery models.
Benefits of Performance Testing with Cloud:
  1. Perform large-scale tests: to simulate thousands of users of traffic against your application, you need to invest a lot in hardware, and configuring such a big test environment is very time consuming. Today we have to meet the demands of fast-paced development models (Agile), and cloud services help to a great extent because all of this is ready in a few clicks.
  2. Perform more realistic tests: to performance test an application across its complete delivery chain, we should test from outside our organization's firewall, because testing only inside the firewall may fail to reveal all performance issues. With the help of the cloud we can test the application the way real users will use it, i.e. from outside the firewall, and validate all components in the delivery chain, including the firewall, DNS, ISP and network equipment.
  3. Save time and reduce cost (pay per use): as mentioned earlier, we may not need the entire performance test environment available at all times, so we can save a lot with the pay-per-use delivery model. We can create instance images and save them to launch new instances later when needed.

Challenges of Performance Testing with Cloud:
  1. Isolation of root cause: when we use the cloud for performance testing, it becomes difficult to isolate the exact root cause of a discovered bottleneck, especially when we are not equipped with application performance monitoring tools. Outside-in testing works fine when there is a single source of bottleneck, but consider a situation where the performance bottleneck is caused by multiple problems both inside and outside the firewall. For this reason it is advisable to also have an internal performance test environment where we can isolate the root cause of performance problems inside the firewall (if any).
  2. Reproducing tests: reproducing a performance defect in the cloud is quite difficult (especially when it is linked to infrastructure) because of variations in internet traffic and bandwidth availability at the data center level.
  3. Choosing the right mix of computing needs: some cloud providers deliver instances tuned to particular computing needs; for example, Amazon provides compute-optimized and memory-optimized instances. We should have a clear understanding of the computing needs at each layer of the architecture; for example, app containers most likely need compute-optimized instances, whereas the database may need to be on memory-optimized ones. A wrong decision can skew testing results.
Best Approach - Take a Hybrid Approach (Internal + Cloud):
It is advisable to employ a two-stage process where the application is first performance tested in an internal test environment under medium load conditions. This way we are able to identify all design-level issues. Once internal testing is done, we can move on to cloud-based testing to mimic real user behavior with large-scale tests and validate the entire delivery chain. This approach offers the following advantages:
  1. Enables early testing in the development cycle
  2. Isolates design-level issues before moving into large-scale tests
  3. Enables reproducible tests
  4. Provides better understanding of each major area in the delivery chain
  5. Lowers performance testing cost


Thursday, June 11, 2015

How to Monitor Memcached

Memcached is a high-performance, distributed memory object caching system. It helps speed up dynamic web applications by alleviating database load.

During performance testing of an application that uses Memcached as its caching layer, it becomes necessary to monitor the health of Memcached. Memcached itself is fast at retrieving data: for everything it can do, Memcached commands have algorithmic complexity of O(1), so each command takes roughly the same amount of time, every time.

Memcached exposes its statistics through the stats command. To execute it, just telnet to the Memcached port:

telnet <server> <port>

By default Memcached uses port 11211.

Once you are connected to the telnet session, type stats and press enter; this will display the Memcached performance metrics.
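For example (assuming a local Memcached on the default port):

telnet localhost 11211
stats

The server replies with STAT <name> <value> lines and ends the listing with END.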

Below is the list of key performance metrics that we should monitor in order to collect the required performance data:

  1. bytes : Number of bytes currently used for caching items.
  2. limit_maxbytes : Maximum configured cache size.
  3. curr_connections : Number of open connections to this Memcached server.
  4. curr_items : Number of items currently in the server's cache.
  5. evictions : Number of objects removed from the cache to free up memory for new items.
  6. cmd_get : Number of get commands received since server startup, regardless of whether they succeeded.
  7. get_hits : Number of successful get operations since startup. Divide this by cmd_get to calculate the cache hit rate (see the sketch after this list).
  8. get_misses : Number of failed get requests where the key was not in the cache.
  9. listen_disabled_num : Number of denied connection attempts because Memcached reached its max connection limit.
  10. threads : Number of threads used by the Memcached server process.
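Assuming the spymemcached client (net.spy.memcached) and a local server on the default port, a minimal sketch that pulls these counters and computes the cache hit rate from get_hits/cmd_get:

import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.Map;
import net.spy.memcached.MemcachedClient;

public class MemcachedHitRate {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        // getStats() returns one stats map per server in the cluster
        for (Map.Entry<SocketAddress, Map<String, String>> e : client.getStats().entrySet()) {
            Map<String, String> stats = e.getValue();
            double gets = Double.parseDouble(stats.get("cmd_get"));
            double hits = Double.parseDouble(stats.get("get_hits"));
            double hitRate = gets > 0 ? hits / gets * 100 : 0;
            System.out.printf("%s -> hit rate %.2f%%, evictions %s%n",
                    e.getKey(), hitRate, stats.get("evictions"));
        }
        client.shutdown();
    }
}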

Wednesday, June 10, 2015

Real Browser Performance Testing with JMeter


If you want more realistic performance test results, browser-based performance testing is the way to go. As we know, Selenium can drive real browsers through WebDriver. This blog describes a step-by-step procedure to configure Selenium WebDriver with JMeter and how to use it for performance testing activities.

Configuring the WebDriver Plugin:

  1. Download the Selenium WebDriver plugin from jmeter-plugins.org
  2. Copy the extracted jar files into /lib and /lib/ext
  3. Delete older/duplicate jar files from /lib

Once you have completed the above steps, the configuration is done.
 
Creating a Browser-Based Test Script with the JMeter WebDriver Plugin:

  • Open JMeter and add a Thread Group
  • Add 'Firefox Driver Config' from the Config Elements
  • Add a WebDriver Sampler
  • Add the WebDriver script code (a minimal sample script follows this list)
  • Add a Listener for debugging
  • Run the test
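A minimal WebDriver Sampler script, assuming the sampler's default JavaScript language (the WDS object is provided by the plugin; the URL is a placeholder):

WDS.sampleResult.sampleStart()          // start the response-time clock
WDS.browser.get('http://example.com')   // drive a real browser page load
WDS.sampleResult.sampleEnd()            // stop the clock; the full page load time is recorded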
 

Tuesday, June 9, 2015

Understanding Google Analytics Metrics



Before we start reading about Google Analytics metrics, it is worth having a little background on the basic concepts of dimensions and metrics.

In Google Analytics we have two types of data:
  1. Dimensions: these describe characteristics of users, their sessions and actions. The dimension City describes a characteristic of sessions and indicates the city, for example "Paris", from which a session originated. The dimension Page describes a characteristic of page-view actions and indicates the URL of each page that was viewed.
  2. Metrics: these are simply quantitative measurements of users, sessions and actions. Metrics are numerical data; basically, they are numbers.
When extracting data with City as the primary dimension and Browser as the secondary dimension, we get the following view:


Google Analytics Metrics:

  1. Visitors or Users: measures the number of unique users that visit your site during a certain period of time. This is the most commonly used metric for measuring the overall size of the audience, and it can be further categorized into new visitors and returning visitors. It is the most accurate number for how many individual people visited your website. When performing workload modelling for your performance test, always ask for this metric as part of requirement gathering.
  2. Visits or Sessions: visits, also known as sessions, are defined as a period of consecutive activity by the same user. By default in Google Analytics, a session persists until a user stops interacting with the site for 30 minutes.
  3. Pageviews: within each visit or session, your users will engage in one or more interactions with your web pages. Google Analytics automatically tracks these interactions as "pageviews". The Pageviews metric counts every time a page is viewed on your site.
  4. Pages per Session: the average number of pages viewed during a session on your website. More pages per session indicates the user is quite engaged with your website.
  5. Average Session Duration: the average length of users' sessions. Again, a higher number indicates users are more engaged with your website.
  6. Bounce Rate: the percentage of visits that are single-page only, i.e. users who visit a single page and leave.
  7. % New Sessions: the average percentage of first-time visitors to your website.
All of the above metrics are available under the Audience Overview report:



Monday, June 8, 2015

Performance Testing of Microservice-Based Architecture

 

Background:

Microservices are a style of software architecture in which a system is delivered as a set of granular, independent, collaborating services. It is a technique of applying the single responsibility principle at the architectural level.

Microservices are often integrated using REST over HTTP.

Layered Architecture of Microservices:


Resources handle incoming requests. They validate the request format, delegate to services and package the response. For RESTful services this includes deserialization of requests, authentication, serialization of responses and mapping exceptions to HTTP status codes.

Services represent the core business logic. They may collaborate with other services, adapters or repositories to retrieve the data required to fulfill a request. Services only consume and produce domain objects; they do not interact with persistence-layer DTOs or transport-layer objects.

Adapters handle outgoing requests to external services. They are responsible for marshalling requests, unmarshalling responses, and mapping them to domain objects. Object mappers are widely used at this layer.

Repositories handle transactions with the persistence layer.

A lightweight microservice may combine one or more of the above layers in a single component.
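As a minimal illustration of this layering, here is a hypothetical JAX-RS resource delegating to a service (the class names, the /orders path and the payload are invented for the sketch):

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/orders")
public class OrderResource {                       // Resource layer: handles the incoming request
    private final OrderService service = new OrderService();

    @GET
    @Path("/{id}")
    public Response get(@PathParam("id") String id) {
        try {
            // delegate to the service, then package the domain object as the response
            return Response.ok(service.findOrder(id)).build();
        } catch (IllegalArgumentException e) {
            // map the exception to an HTTP status code
            return Response.status(Response.Status.NOT_FOUND).build();
        }
    }
}

class OrderService {                               // Service layer: core business logic
    String findOrder(String id) {
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("missing id");
        }
        // a real service would collaborate with adapters/repositories here
        return "{\"orderId\":\"" + id + "\"}";
    }
}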

Performance Testing Challenges:

  1. The whole application is not available from the start; instead we have a set of fully functional modules that are later plugged into the end product.
  2. Use of different technologies in microservice development.
  3. The interaction among microservices is not readily visible unless there is a single point of contact who has a complete view of the entire solution.
  4. Performance monitoring is a big challenge considering the different technologies involved in microservice development (message brokers, NoSQL, databases, N independently running JVMs, etc.).
  5. We need different benchmarks for capacity planning, since benchmarking is technology specific (e.g. heap sizing needs one set of benchmarks for JVM-based services, while allied services like Node.js apps and the .NET CLR each need their own).

Performance Testing of Microservices - Approach

While designing a performance testing approach for a microservice-based architecture, we should consider solutions to the above challenges.
  1. Narrow your selection from the full set of services to focus on those that represent critical business activities.
  2. Try to build a service interaction diagram for all of your performance scenarios.
  3. Always start with performance testing services in isolation rather than replaying end-to-end business scenarios in an integrated environment. Once you have separate performance benchmarks for each service in isolation, then focus on integrated tests and business scenarios.
  4. Performance monitoring is critical during testing, and the use of commercial agent-based monitoring solutions can increase the cost of testing to a great extent. SaaS-based monitoring is one option, but the use of open source tools and an in-house API profiling solution will reduce testing cost considerably.


                                                         Testing Phase Pyramid
          


Saturday, May 16, 2015

Disk I/O in Amazon RDS and EC2 instances


Amazon RDS and EC2 use EBS volumes for database and log storage. Depending on the amount of storage requested, Amazon RDS automatically stripes across multiple EBS volumes to enhance IOPS performance.

Amazon EBS provides 3 volume types :

1. General Purpose (SSD) volumes
2. Provisioned IOPS (SSD) Volumes
3. Magnetic Volumes

Before we discuss the characteristics of each of these EBS volume types, let us first take a look at what IOPS are and how they are measured.

What are IOPS:

IOPS are input/output operations per second. Amazon EBS measures each I/O operation of 256 KB or smaller as one IOPS. I/O operations larger than 256 KB are counted in 256 KB units; for example, a 1024 KB I/O operation is counted as 4 IOPS.

General Purpose (SSD) volumes:

General Purpose (SSD) volumes offer cost-effective storage that is ideal for a broad range of workloads. These volumes offer single-digit millisecond latency and the ability to burst to 3,000 IOPS for extended periods of time. They range in size from 1 GB to 16 TB.

General Purpose (SSD) volumes provide a baseline performance of 3 IOPS/GB, up to a maximum of 10,000 IOPS (reached at 3,334 GB). Throughput ranges between 128 MB/s and 160 MB/s.

Performance of a General Purpose (SSD) volume is governed by its size: the larger the volume, the faster it accumulates I/O credits, which represent the available bandwidth the volume can use to burst large amounts of I/O when more than the baseline performance is required. For example, a 500 GB volume has a baseline of 1,500 IOPS and can spend accumulated credits to burst to 3,000 IOPS.

Provisioned IOPS (SSD)  Volumes:

Provisioned IOPS (SSD) volumes are designed to meet the needs of I/O-intensive workloads, mainly database workloads, that are very sensitive to storage performance. When creating a Provisioned IOPS (SSD) volume, we have to specify the desired IOPS rate. These volumes range from 4 GB to 16 TB, with a maximum throughput of 320 MB/s, and can deliver up to 20,000 IOPS.
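As a sketch, creating such a volume with the AWS CLI might look like this (the size, IOPS and availability zone values are illustrative):

aws ec2 create-volume --volume-type io1 --size 500 --iops 4000 --availability-zone us-east-1a

Note that the provisioned IOPS rate is limited to a multiple of the volume size (30 IOPS per GB at the time of writing), so the requested rate must match the volume's capacity.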

Magnetic Volume:

Magnetic volumes provide the lowest cost per gigabyte of all EBS volume types. Magnetic volumes are backed by magnetic drives and are ideal for workloads performing sequential reads, workloads where data is accessed infrequently, and scenarios where the lowest storage cost is important. These volumes deliver approximately 100 IOPS on average and they can range in size from 1 GB to 1 TB. 


 

Wednesday, May 13, 2015

APM - Real User Monitoring vs. Synthetic Monitoring


As part of setting up an application performance monitoring framework, it is very critical to measure end-user experience; end-user experience monitoring sits at the top of the APM framework. Further, businesses judge their web application's performance in terms of its responsiveness. There are two approaches by which we can monitor end-user experience:

1. Real User Monitoring ( RUM)
2. Synthetic Monitoring

Real User Monitoring ( RUM):

While server-side performance can be measured by looking at HTTP requests in the data center, the full page load experience, which includes downloading static content, rendering the page and executing JavaScript, cannot be seen from server monitoring. Real User Monitoring is the practice of using a JavaScript agent embedded in each web page to gather performance data about real end users' browsing experience.
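RUM agents typically build on the browser's Navigation Timing API. A minimal sketch of the kind of measurement such an agent collects (plain browser JavaScript, not any particular vendor's agent; the beacon URL is a placeholder):

var t = window.performance.timing;
var pageLoadMs = t.loadEventEnd - t.navigationStart;   // full page load as the user experienced it
new Image().src = '/rum-beacon?load=' + pageLoadMs;    // ship the measurement home via an image beacon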

Synthetic Monitoring:

Synthetic performance monitoring involves external agents that run scripted transactions against the web application. These scripts follow the steps expected of typical user behavior: searching for a product, logging in, booking a hotel, etc. Synthetic monitoring does not track real users' sessions.

We have to be careful while setting up monitoring about which one best suits our needs: synthetic monitoring provides details about reliability and availability, whereas RUM captures the real user's browsing experience.

Happy Monitoring !!!!!



Tuesday, May 12, 2015

String Deduplication - New Java 8 update 20 feature


Duplicate Strings in JVM heap ..Worried ??


When parsing JVM heap dumps, it is quite common to see that Strings consume a lot of heap space. In particular, char[] is almost always (in most cases) the biggest entry in the histogram.

If we perform a deeper analysis of the String objects, it is quite common to encounter thousands of duplicate String instances. Eliminating these duplicates can noticeably reduce the overall JVM memory footprint.

Java 8 update 20 introduces a new feature called String Deduplication. It requires the G1 garbage collector and is turned off by default.

Here G1 GC identifies Strings with identical contents and changes them to point to the same internal char[], avoiding multiple copies of the same character data.

Parameter: -XX:+UseStringDeduplication
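To enable it, G1 must also be enabled; the statistics flag below is optional and prints what deduplication saved at each GC (the jar name is a placeholder):

java -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics -jar myapp.jar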

How to Capture Connection Time in JMeter 2.13


JMeter 2.13 has introduced a new key performance indicator: connection time. Why is this important? It takes time to establish a connection to the server before making an HTTP request, and this can impact response time, especially for HTTPS traffic.

Starting from version 2.13, the new ConnectTime metric has been added. It represents the time taken to establish the connection. By default it is not saved to the CSV or XML results. To save it, add the following line to user.properties:

jmeter.save.saveservice.connect_time=true