Wednesday, October 28, 2015

Understanding G1 Garbage Collector in Java

The Garbage-First (G1) collector is a server-style garbage collector targeted at multiprocessor machines with large amounts of memory. G1 is designed to meet GC pause-time goals with high probability while keeping GC throughput at an acceptable level.

G1 GC takes a different approach to heap management. Instead of partitioning the heap into fixed-size generations (young, old and perm gen), G1 partitions the heap into a set of equal-sized regions, each of which is a contiguous range of virtual memory. Sets of regions play the same roles as the generations in the older collectors (Parallel GC, CMS): eden, survivor and old. However, unlike those earlier collectors, G1 does not give the generations a fixed size; the number of regions assigned to each role can change over time.
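
The post does not say how large a region is. As a rough sketch, the following roughly follows HotSpot's JDK 8 ergonomics when `-XX:G1HeapRegionSize` is not set: aim for about 2048 regions based on the average of the initial and maximum heap sizes, round down to a power of two, and clamp the result to the 1 MB to 32 MB range. Treat it as a simplified model rather than the JVM's exact code path; the class and method names are hypothetical.

```java
// Simplified model of G1 region-size ergonomics (hypothetical class, not JVM code).
// Assumption: ~2048 regions, power-of-two size, clamped to [1 MB, 32 MB].
class G1RegionSize {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    static long regionSize(long initialHeapBytes, long maxHeapBytes) {
        long average = (initialHeapBytes + maxHeapBytes) / 2;
        long size = Math.max(average / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size);                      // round down to a power of two
        return Math.min(Math.max(size, MIN_REGION), MAX_REGION); // clamp to the allowed range
    }

    static long regionCount(long heapBytes, long regionBytes) {
        return heapBytes / regionBytes;
    }

    public static void main(String[] args) {
        long heap = 4096 * MB; // assumed: -Xms4g -Xmx4g
        long region = regionSize(heap, heap);
        System.out.println("region size = " + region / MB + " MB");
        System.out.println("regions     = " + regionCount(heap, region));
    }
}
```

With a 4 GB heap this gives 2 MB regions and 2048 regions in total.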

When G1 starts a garbage collection cycle, it first performs a global marking phase to determine the liveness of objects throughout the heap. Once marking is complete, G1 knows which regions are mostly empty, that is, contain little live data, and it collects those regions first because they yield the most free space for the least copying work. Hence the name Garbage-First: G1 focuses its collection and compaction activity on the regions most likely to be full of reclaimable (garbage) objects. The number of regions G1 selects for a collection depends on the pause-time target, because G1 is designed to meet that goal.
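
The garbage-first selection policy can be illustrated with a toy model. This sketch assumes each region reports its live and garbage kilobytes and uses a hypothetical cost model in which pause time is proportional to the live data that must be copied; G1's real pause prediction is more sophisticated. Regions are sorted by reclaimable garbage and added to the collection set until the next one would exceed the pause budget.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy "garbage first" collection-set selection (hypothetical cost model, not G1's predictor).
class GarbageFirstSelector {
    static final class Region {
        final int id;
        final long liveKb;    // data that would have to be copied (evacuated)
        final long garbageKb; // space that would be reclaimed
        Region(int id, long liveKb, long garbageKb) {
            this.id = id; this.liveKb = liveKb; this.garbageKb = garbageKb;
        }
        long garbageKb() { return garbageKb; }
    }

    // Hypothetical copy rate used to predict pause cost: 1024 KB of live data per millisecond.
    static final double KB_COPIED_PER_MS = 1024.0;

    static double predictedPauseMs(Region r) {
        return r.liveKb / KB_COPIED_PER_MS;
    }

    // Take regions with the most garbage first, until the pause target would be exceeded.
    static List<Integer> chooseCollectionSet(List<Region> candidates, double pauseTargetMs) {
        List<Region> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong(Region::garbageKb).reversed());
        List<Integer> cset = new ArrayList<>();
        double remaining = pauseTargetMs;
        for (Region r : sorted) {
            double cost = predictedPauseMs(r);
            if (cost > remaining) break; // next region would blow the pause budget
            cset.add(r.id);
            remaining -= cost;
        }
        return cset;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region(0, 200, 824),
                new Region(1, 900, 124),
                new Region(2, 500, 524),
                new Region(3, 0, 1024)); // all garbage: nothing to copy
        System.out.println("CSet = " + chooseCollectionSet(regions, 1.0));
    }
}
```

With a 1 ms target, regions 3, 0 and 2 fit the budget; region 1 is mostly live, so it is both the least profitable and too expensive to include.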

With G1 GC, the overall memory footprint of the Java process will be higher because G1 maintains additional data structures for its own bookkeeping. These accounting structures are:

  • Remembered Set (RSet): Tracks object references into a given region. Each region has its own RSet.
  • Collection Set (CSet): The set of regions that will be collected in a GC. All live data in the CSet is evacuated (copied/moved) during the GC.
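
The relationship between RSets and the CSet can be sketched with a toy model. It assumes a reference is recorded only as a (source region, target region) pair; real G1 tracks cards and keeps RSets up to date through a write barrier, which this sketch does not model. It shows why RSets exist: to evacuate a CSet, G1 only has to scan the regions recorded in the CSet's RSets to find incoming references, rather than the whole heap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy remembered sets: for each region, which other regions hold references into it.
// Assumption: a reference is modeled only as (fromRegion -> toRegion); real G1 records cards.
class RememberedSets {
    private final Map<Integer, Set<Integer>> rsets = new HashMap<>();

    // In a real collector this would be driven by a write barrier on reference stores.
    void recordReference(int fromRegion, int toRegion) {
        if (fromRegion == toRegion) return; // intra-region references need no RSet entry
        rsets.computeIfAbsent(toRegion, r -> new HashSet<>()).add(fromRegion);
    }

    Set<Integer> rsetOf(int region) {
        return rsets.getOrDefault(region, Collections.emptySet());
    }

    // Regions outside the CSet that point into it; their references act as extra roots.
    Set<Integer> externalRootRegions(Set<Integer> collectionSet) {
        Set<Integer> roots = new TreeSet<>();
        for (int region : collectionSet) {
            for (int from : rsetOf(region)) {
                if (!collectionSet.contains(from)) roots.add(from);
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        RememberedSets rs = new RememberedSets();
        rs.recordReference(5, 1);
        rs.recordReference(7, 1);
        rs.recordReference(2, 1);
        rs.recordReference(1, 1); // same region: ignored
        rs.recordReference(5, 2);
        Set<Integer> cset = Set.of(1, 2);
        System.out.println("RSet of region 1 = " + new TreeSet<>(rs.rsetOf(1)));
        System.out.println("regions to scan  = " + rs.externalRootRegions(cset));
    }
}
```

Region 2 appears in region 1's RSet but is itself in the CSet, so only regions 5 and 7 need scanning.
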
Young Generation Collection in G1 GC:
  • Young generation memory consists of a set of non-contiguous regions.
  • Young GCs are stop-the-world (STW) events: application threads are stopped for the duration of a young GC.
  • Young GC work is done in parallel by multiple GC threads.
  • Live objects are copied to new survivor regions or, if they are old enough, to old regions.
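
What happens to each object during a young GC can be sketched as follows. This single-threaded toy assumes liveness is already known and uses a fixed, hypothetical tenuring threshold of 2; in HotSpot the threshold is adapted at run time (bounded by `-XX:MaxTenuringThreshold`), and real evacuation runs on multiple GC threads while application threads are stopped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy young GC evacuation (hypothetical model; liveness assumed known in advance).
class YoungEvacuation {
    static final int TENURING_THRESHOLD = 2; // hypothetical fixed value

    static final class Obj {
        final String name;
        final boolean live;
        int age; // number of young GCs survived so far
        Obj(String name, boolean live, int age) { this.name = name; this.live = live; this.age = age; }
    }

    // Returns the destination of every live object in the young regions being collected.
    // Dead objects are never copied; their regions are simply freed after evacuation.
    static Map<String, String> evacuate(List<Obj> youngObjects) {
        Map<String, String> destination = new LinkedHashMap<>();
        for (Obj o : youngObjects) {
            if (!o.live) continue;
            o.age++; // survived one more young GC
            destination.put(o.name, o.age >= TENURING_THRESHOLD ? "old" : "survivor");
        }
        return destination;
    }

    public static void main(String[] args) {
        List<Obj> young = List.of(
                new Obj("a", true, 0),   // new object, still live
                new Obj("b", false, 0),  // garbage
                new Obj("c", true, 1));  // already survived one GC
        System.out.println(evacuate(young)); // {a=survivor, c=old}
    }
}
```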

Old Generation Collection in G1 GC:

  • Initial Marking: This is an STW phase, piggybacked on a normal young GC.
  • Root Region Scanning: This phase runs concurrently with application threads. It scans survivor regions for references into the old generation, and it must complete before the next young GC can start.
  • Concurrent Marking: Finds live objects across the entire heap while the application is running. This phase can be interrupted by young generation garbage collections.
  • Remark: An STW phase that completes the marking of live objects in the heap. It uses an algorithm called snapshot-at-the-beginning (SATB), which is much faster than the algorithm used in the CMS collector.
  • Clean up: The final phase of the marking cycle, partly STW and partly concurrent. In the STW part, G1 performs accounting on live objects (to identify completely free regions and candidate regions for mixed garbage collections) and scrubs the RSets. In the concurrent part, it resets the empty regions and returns them to the free list.
  • Copying: These are the STW pauses in which live objects are evacuated (copied) to new, unused regions. A copying pause can cover young regions only (a young GC) or both young and old regions (a mixed GC).
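
The bookkeeping side of this cycle can be illustrated with a toy model: mark everything reachable from a set of roots, then do Clean-up-style accounting of live bytes per region to find completely free regions and candidates for mixed collections. This sketch marks in a single pass without SATB or concurrency, and the object graph, the 100-byte region size and the 50% live threshold are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy marking + clean-up accounting (hypothetical model, not G1's SATB implementation).
class MarkingCycleSketch {
    static final int REGION_SIZE = 100;       // hypothetical toy region size in bytes
    static final double LIVE_THRESHOLD = 0.5; // hypothetical: candidates are < 50% live

    static final class Obj {
        final int region, size;
        final int[] refs; // ids of objects this object points to
        Obj(int region, int size, int... refs) { this.region = region; this.size = size; this.refs = refs; }
    }

    // Marking: every object reachable from the roots is live.
    static Set<Integer> mark(Map<Integer, Obj> heap, List<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            int id = work.pop();
            if (!marked.add(id)) continue; // already visited
            for (int ref : heap.get(id).refs) work.push(ref);
        }
        return marked;
    }

    // Clean-up style accounting: live bytes per region.
    static int[] liveBytes(Map<Integer, Obj> heap, Set<Integer> marked, int regions) {
        int[] live = new int[regions];
        for (int id : marked) {
            Obj o = heap.get(id);
            live[o.region] += o.size;
        }
        return live;
    }

    static List<Integer> freeRegions(int[] live) {
        List<Integer> free = new ArrayList<>();
        for (int r = 0; r < live.length; r++) if (live[r] == 0) free.add(r);
        return free;
    }

    static List<Integer> mixedCandidates(int[] live) {
        List<Integer> candidates = new ArrayList<>();
        for (int r = 0; r < live.length; r++)
            if (live[r] > 0 && live[r] < REGION_SIZE * LIVE_THRESHOLD) candidates.add(r);
        return candidates;
    }

    public static void main(String[] args) {
        Map<Integer, Obj> heap = new HashMap<>();
        heap.put(1, new Obj(0, 40, 2));
        heap.put(2, new Obj(1, 30));
        heap.put(3, new Obj(1, 50)); // unreachable
        heap.put(4, new Obj(2, 60)); // unreachable
        heap.put(5, new Obj(0, 50));
        int[] live = liveBytes(heap, mark(heap, List.of(1, 5)), 3);
        System.out.println("live bytes per region = " + Arrays.toString(live));
        System.out.println("completely free       = " + freeRegions(live));
        System.out.println("mixed GC candidates   = " + mixedCandidates(live));
    }
}
```

Region 2 holds only garbage and can be freed outright, while region 1 is mostly garbage and becomes a cheap candidate for a later mixed collection.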

Another thing to remember about G1 is humongous objects (H-Obj). If an object is 50% or more of G1's region size, it is considered a humongous object and is allocated directly into a set of contiguous regions belonging to the old generation, bypassing eden.
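
The humongous rule reduces to a simple size check. This sketch assumes a 2 MB region size and treats object sizes as total sizes including object headers (a `new byte[1 << 20]` array is slightly larger than 1 MB). A humongous object needs enough contiguous regions to hold it, computed with ceiling division; the class and method names are hypothetical.

```java
// Humongous-object check: an object of at least half a region is humongous
// and needs ceil(size / regionSize) contiguous regions (hypothetical helper class).
class HumongousCheck {
    static final long MB = 1024L * 1024L;

    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long region = 2 * MB; // assumed region size
        long[] sizes = {512 * 1024, MB, 5 * MB};
        for (long size : sizes) {
            if (isHumongous(size, region)) {
                System.out.println(size + " bytes: humongous, "
                        + regionsNeeded(size, region) + " contiguous region(s)");
            } else {
                System.out.println(size + " bytes: regular allocation");
            }
        }
    }
}
```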

Tuesday, October 27, 2015

Performance Testing in Agile Process

Performance testing is an essential activity in all software development projects, including Agile ones. Agile development practices can help teams achieve faster time to market, adapt to changing requirements, and maintain a constant feedback loop. This Agile transformation has introduced a new challenge for performance engineers:

How do we manage non-functional performance testing in an Agile model?
A traditional performance testing cycle is usually spread over a long period of time and typically expects functionally stable builds, script development, test data generation, day-to-day debug tests, etc. It is very difficult to fit all these activities into a sprint of two weeks or less.
Agile-Based Performance Testing Approach:
  1. The Definition of Done should include completion of performance testing activities within the sprint.
  2. Include a performance engineering team member in the scrum of scrums meeting.
  3. Start performance testing on the developer box itself, focusing on the execution time of individual methods (unit-level testing).
  4. Next, proceed to component-level testing to measure the response times of developed user stories that are candidates for performance testing. Here we create automated scripts and run load tests overnight using tools.
  5. Any issues found during unit- and component-level testing are fixed in the next sprint.
  6. Perform system-level testing (end-to-end scenarios) during the sprint hardening phase. By this time, we can expect all code- and design-level optimizations to have been completed as part of unit- and component-level testing. During the initial days of sprint hardening, perform load, stress and endurance tests.
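
One way to make the Definition of Done item and the response-time checks in step 4 concrete is an automated pass/fail gate on a response-time percentile. This is a sketch under assumptions: the nearest-rank percentile method, the 90th-percentile choice, the 200 ms target and the sample values are all hypothetical and illustrative, and `PerformanceGate` is not part of any tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of a sprint-level performance gate (hypothetical class and thresholds).
// Assumption: samples are response times in ms from a component-level test run.
class PerformanceGate {
    // Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
    static long percentile(List<Long> samplesMs, double p) {
        if (samplesMs.isEmpty()) throw new IllegalArgumentException("no samples");
        List<Long> sorted = new ArrayList<>(samplesMs);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    static boolean passes(List<Long> samplesMs, double p, long targetMs) {
        return percentile(samplesMs, p) <= targetMs;
    }

    public static void main(String[] args) {
        // Illustrative samples only, not measured data.
        List<Long> samples = List.of(120L, 95L, 130L, 180L, 110L, 105L, 99L, 140L, 400L, 115L);
        long p90 = percentile(samples, 90);
        System.out.println("p90 = " + p90 + " ms, gate (<= 200 ms) passed: " + passes(samples, 90, 200));
    }
}
```

A percentile gate tolerates a single slow outlier (the 400 ms sample here) that an average or maximum-based check would either hide or over-react to.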
So we have discussed three different levels of testing in the Agile process:
  • Unit-Level Testing: Performed on the developer box itself. This level validates database indexing, the application caching mechanism, method hot spots, JDBC calls, etc.
  • Component-Level Testing: At this level, we validate transaction response times for performance-specific user stories.
  • System-Level Testing: Execution of end-to-end user scenarios for a defined or predicted workload. Here we cover load, stress and endurance testing.
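
At the system level, load tests are normally run with a dedicated tool, but the core idea fits in a short sketch: many concurrent virtual users repeatedly call the system while per-call latency is recorded. `MiniLoadTest` and the simulated 5 ms request are hypothetical placeholders for the real system under test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal load-test sketch: `users` threads each call `operation` `callsPerUser` times.
// Assumption: `operation` stands in for a real request (HTTP call, JDBC query, ...).
class MiniLoadTest {
    static List<Long> run(Runnable operation, int users, int callsPerUser) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        List<Long> latenciesMicros = Collections.synchronizedList(new ArrayList<>());
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerUser; i++) {
                    long start = System.nanoTime();
                    operation.run();
                    latenciesMicros.add((System.nanoTime() - start) / 1_000);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) pool.shutdownNow();
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(latenciesMicros);
    }

    public static void main(String[] args) {
        // Simulated 5 ms request; replace with a call to the system under test.
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long start = System.nanoTime();
        List<Long> latencies = run(fakeRequest, 10, 20);
        double seconds = (System.nanoTime() - start) / 1e9;
        Collections.sort(latencies);
        System.out.printf("calls=%d, throughput=%.1f/s, max latency=%d us%n",
                latencies.size(), latencies.size() / seconds, latencies.get(latencies.size() - 1));
    }
}
```

Stress and endurance tests are variations of the same loop: raise `users` until the system degrades, or keep the run going for hours and watch for latency drift.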
We have validated the approach above, and it worked well in our case. However, implementing it depends heavily on the maturity of Agile adoption in a given organization.