Monitoring Multiple Java Virtual Machines and Critical System Information in Real-time

If you've read our client-side routing blog, you know we are focused on making searches within a Git repository better, for both code and code management metrics. To do this, we need to index as much as we can: every line change, from every commit, on every branch.

If you have never tried to index at this level, it may not be obvious how time-consuming and resource-intensive it can be. To put things into perspective, consider the sagemath repository, which has over 6,000 branches.

This repository may not be the norm, but it's certainly not the exception when it comes to Enterprise. And if you've read our client-side routing blog, you know GitHub only indexes one tree per repository. So, setting aside all the other indexing GitSense has to do, indexing the sagemath repository would be roughly equivalent to GitHub indexing 6,000 repositories.

So as you can see, creating a better way to search and browse Bitbucket and GitHub is not without its challenges. In this blog post, we'll talk about our real-time process monitoring system and how we use it to make indexing at this level more manageable.

Tracking Indexing Across Eight Different Repositories

In the above video, you'll find a recording of our process monitoring tool as it tracks indexing across 8 different repositories, from level 1 to 4. In total, the commits and code on 971 branches were made searchable, and at the peak of indexing, the tool was tracking the memory usage and garbage collection frequency of 17 JVMs in real time.

If you would like to view the same recording in full HD and at a slower speed, you can find it on YouTube as well. However, before you head over to YouTube, we recommend you learn a little more about the points of interest in our monitoring tool.

Points of Interest - Process Monitor with No Indexing Activity

Points of Interest

  1. The uptime for the group's* process monitoring script. If you see an uptime with a strikethrough, such as 00:10:25, it means the script has stopped running.
    * In GitSense, you don't index repositories per se, but rather, you index the group that the repository belongs to. We call these groups Logical Indexing Groups and they are designed to improve indexing efficiency and search performance. Since it's a bit involved to explain, we'll leave it for another blog post.
  2. System information, with the load average being the most important. With it, we can tell whether we are pushing a system too hard or not hard enough.

  3. The amount of group RAM disk space used. Every group has its own RAM disk space, which is used to store short-lived files that require very fast I/O, like process heartbeats and garbage collection logs.

  4. Current system swap usage. Ideally, you would like to see this as close to zero as possible. If GitSense is installed on machines with 2GB of RAM or less, you'll want to keep an eye on swap usage.

  5. Uptime for the Workers JVM. If it hasn't been alive for very long and was not restarted on purpose, it's a telltale sign that something went wrong. Checking the process log file will tell you more.

  6. The number of times the JVM has had to perform a major garbage collection. A double-digit number is never a good sign.

  7. The percentage of time the JVM has spent garbage collecting within the last ten minutes. By default, the Workers process is configured to shut itself down if it spends more than 30% of its time garbage collecting.

  8. The number of "Index Manager" threads minus one is the number of indexing jobs that can be run in parallel. In this case, you can run 5 indexing jobs in parallel, since there are 6 Index Manager threads.

    Machines with 1GB of RAM or less should NEVER index more than one group at a time. Unless the repos are quite small, you'll find the system will spend more time swapping than indexing.
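
To make the garbage collection percentage in point 7 more concrete, here is a minimal sketch of how such a figure could be computed over a sliding ten-minute window. It is not GitSense's implementation: the post only says that garbage collection logs are written to the RAM disk, so sampling the JVM's own `GarbageCollectorMXBean` counters is an assumption made for illustration, and the class name `GcOverheadWindow` and its methods are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not GitSense code: tracks the share of wall-clock time
// spent in garbage collection over a sliding window.
final class GcOverheadWindow {
    // One observation: wall-clock time and cumulative GC time, both in ms.
    private static final class Sample {
        final long wallMillis;
        final long gcMillis;

        Sample(long wallMillis, long gcMillis) {
            this.wallMillis = wallMillis;
            this.gcMillis = gcMillis;
        }
    }

    private final long windowMillis;
    private final Deque<Sample> samples = new ArrayDeque<>();

    GcOverheadWindow(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Cumulative GC time for the current JVM, summed over all collectors.
    static long cumulativeGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Records a sample and returns the percentage of the window spent in GC.
    double record(long wallMillis, long gcMillis) {
        samples.addLast(new Sample(wallMillis, gcMillis));
        // Discard samples that have fallen out of the window.
        while (samples.size() > 1
                && wallMillis - samples.peekFirst().wallMillis > windowMillis) {
            samples.removeFirst();
        }
        Sample oldest = samples.peekFirst();
        long elapsed = wallMillis - oldest.wallMillis;
        if (elapsed <= 0) {
            return 0.0; // a single sample carries no rate information yet
        }
        return 100.0 * (gcMillis - oldest.gcMillis) / elapsed;
    }

    // True when the GC percentage is above the given threshold.
    static boolean exceeds(double gcPercent, double thresholdPercent) {
        return gcPercent > thresholdPercent;
    }
}
```

A process could call `record(System.currentTimeMillis(), GcOverheadWindow.cumulativeGcMillis())` every few seconds with a 600,000 ms window, and treat `exceeds(percent, 30.0)` as the signal to shut down, mirroring the 30% default described above.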

Points of Interest - Process Monitor with Indexing Activity

Points of Interest

  1. The number beside a thread indicates the number of requests it is currently processing and/or the number of requests that have failed. You can click on the thread to view the assigned/failed requests.

  2. Click the desktop icon or group number to view the process monitoring page for that group.

  3. Click the ticket icon or request number to view the request log.

  4. The amount of memory that is currently being used by the JVM.

  5. The percentage of time the JVM has spent garbage collecting in the last ten minutes.

  6. How long the index request has had to wait in the indexing queue. In this case, index request #10 had to wait one second.

  7. How long the index request has been assigned for. In this case, 34 seconds.
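
The two timings in points 6 and 7 can be derived from three timestamps: when the request was queued, when it was assigned, and the current time. The sketch below shows one way to do that. The class `IndexRequestTiming` and its method names are hypothetical and only illustrate the arithmetic; they are not taken from GitSense.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of one index request's timestamps; names are illustrative.
final class IndexRequestTiming {
    private final Instant queuedAt;
    private final Instant assignedAt;

    IndexRequestTiming(Instant queuedAt, Instant assignedAt) {
        this.queuedAt = queuedAt;
        this.assignedAt = assignedAt;
    }

    // Time the request spent waiting in the indexing queue.
    Duration waitedInQueue() {
        return Duration.between(queuedAt, assignedAt);
    }

    // Time the request has been assigned to a worker, as of "now".
    Duration assignedFor(Instant now) {
        return Duration.between(assignedAt, now);
    }

    // Formats a duration as HH:MM:SS, the style used for uptimes in the monitor.
    static String formatHms(Duration d) {
        long s = d.getSeconds();
        return String.format("%02d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60);
    }
}
```

For a request queued at second 0, assigned at second 1, and observed at second 35, `waitedInQueue()` is one second and `assignedFor(now)` is 34 seconds, matching the figures in the list above.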

Drilling into a Group

What we have shown so far is how we monitor things from a bird's-eye view. However, when something goes wrong, we often need to take a closer look. To do this, we can drill into a group by clicking on the group's icon (shown in the video below), which brings us to the group's process monitoring page.

Tracking Indexing at the Group Level

If you would like to view the above video in full HD and at a slower speed, you can find it on YouTube as well.

Since we are going to talk about group process monitoring in greater detail in our next blog about performance tuning, we'll stop here. We hope you found this blog informative, and if you are looking for a way to improve your GitHub and Bitbucket browsing experience, make sure to check out GitHub+GitSense and Bitbucket+GitSense to see how we are reimagining Bitbucket and GitHub browsing.

© 2016 SDE Solutions, Inc. All rights reserved.