Channel: OpsGenie Blog

Mobile Application Performance Management with New Relic and Crashlytics

Mobile development takes a great deal of effort to meet the expectations and needs of customers. OpsGenie Mobile Apps provide a user-friendly UI along with a good user experience design; however, mobile development needs much more! Besides providing a user-friendly UI, mobile apps should be fast, stable and memory-friendly. Therefore, we continuously monitor OpsGenie Mobile Apps with the help of New Relic and Crashlytics so that we can keep improving them (and, of course, do so in practice).

It makes sense to begin with what we mean by monitoring our mobile apps: it is a process known as Mobile APM (Application Performance Management), which allows developers to see the end-to-end performance of mobile applications with deep and actionable insight into real users and sessions as they happen. So what roles do New Relic and Crashlytics play in improving the performance and stability of OpsGenie Mobile Apps? To answer this question, we have to review what kind of data we collect via New Relic and Crashlytics.

Let's start with New Relic. Through our integration with New Relic Mobile APM, we gather data on:

  • View loading durations and number of loadings for each page
  • Usage & session statistics
  • HTTP response times, request rates, traffic statistics and error rates
  • Memory usage statistics

Each of these is broken down by OS version, device type, connection type and geography. New Relic Mobile APM also provides the interaction trail within the app if a crash occurs.

We benefited from New Relic throughout the development process, and we still do. For instance, within the first week after the release of OpsGenie iOS v2 Beta, we realized that the mean view loading time of the Dashboard page was above 900ms, nearly twice that of the next-slowest page. Although the Alert List page has more subviews and image loads than the Dashboard, its mean view loading time was nearly 400ms. These metrics pushed us to review the Dashboard view controller, and we found two issues. First, images were loaded again each time the dashboard view was loaded, which took a long time; we fixed this by caching the UIImage objects once they are initialized. Second, we noticed that the UILabel and UIImageView subviews were unnecessarily re-initialized after the dashboard data (number of open/unacked alerts and on-call status) was received. After fixing the issues with the Dashboard, we also reviewed the other pages. The following are the average loading times for each page within the last week:

[Chart: average view loading time for each page within the last week]
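
Going back to the first Dashboard fix, the idea of caching images once they have been initialized can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code of the OpsGenie app: `ImageCache`, `image(named:)` and the loader closure are hypothetical names, and a generic `Image` type stands in for `UIImage` so that the snippet compiles without UIKit.

```swift
// Hypothetical in-memory cache: each image is created once and reused on
// later view loads. `Image` stands in for UIImage so this runs without UIKit.
// Not thread-safe; the sketch assumes it is only used from the main thread.
final class ImageCache<Image> {
    private var storage: [String: Image] = [:]
    private let load: (String) -> Image?

    init(load: @escaping (String) -> Image?) {
        self.load = load
    }

    // Returns the cached image, or loads and stores it on first use.
    func image(named name: String) -> Image? {
        if let cached = storage[name] {
            return cached
        }
        guard let fresh = load(name) else {
            return nil  // loading failed; nothing is cached
        }
        storage[name] = fresh
        return fresh
    }
}

// Usage: count how many times the (expensive) loader actually runs.
var loadCount = 0
let cache = ImageCache<String> { name in
    loadCount += 1
    return "decoded:\(name)"
}
_ = cache.image(named: "oncall-icon")
_ = cache.image(named: "oncall-icon")
print(loadCount)  // prints 1: the second call was served from the cache
```

A plain dictionary keeps the sketch short. In a real app, `NSCache` is a common alternative because it can evict entries under memory pressure.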

As another example, we noticed that the HTTP error rate while fetching schedule layers was above 5%. After working on it, we fixed an issue caused by a mismatch in some locale parameters between the devices running our app and our server side.

As can be seen, New Relic Mobile APM provides nearly all of the data that is essential for application performance management. However, if a user faces a crash while using OpsGenie Mobile Apps, the interaction trail and stack trace that New Relic provides are not sufficient to find the reason behind it. Because being unable to resolve a crash is not acceptable to us, Crashlytics steps in at this point. Through our integration with Crashlytics, we generate session-specific custom logs and detailed crash reports for each session, and collect them if a crash occurs. Detailed logs that speak the developer's language are much more helpful in finding the reason behind a crash than a stack trace and an interaction trail alone. Therefore, we decided to integrate with both New Relic and Crashlytics.
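
To make "session-specific custom logs" more concrete, here is a minimal sketch of a session log that numbers each line and stamps it with the time elapsed since launch, in the same `index | hh:mm:ss:mmm | $ message` layout as the session log shown later in this post. The `SessionLog` type and the `padded` helper are hypothetical names used for illustration only. In an app, each line would also be handed to the crash reporter's custom logging call, which is left out here.

```swift
// Left-pads a non-negative number with zeros, e.g. padded(60, to: 3) == "060".
func padded(_ value: Int, to width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

// Hypothetical session log: keeps numbered lines with the elapsed time.
struct SessionLog {
    private(set) var lines: [String] = []

    // `elapsedMs` is passed in so the sketch stays deterministic; an app
    // would measure it from the launch time instead.
    mutating func record(_ message: String, elapsedMs: Int) {
        let totalSeconds = elapsedMs / 1000
        let stamp = [
            padded(totalSeconds / 3600, to: 2),
            padded((totalSeconds / 60) % 60, to: 2),
            padded(totalSeconds % 60, to: 2),
            padded(elapsedMs % 1000, to: 3),
        ].joined(separator: ":")
        let line = "\(lines.count) | \(stamp) | $ \(message)"
        lines.append(line)
    }
}

var sessionLog = SessionLog()
sessionLog.record("willFinishLaunching", elapsedMs: 60)
sessionLog.record("didFinishLaunchingWithOptions", elapsedMs: 61)
print(sessionLog.lines.joined(separator: "\n"))
// 0 | 00:00:00:060 | $ willFinishLaunching
// 1 | 00:00:00:061 | $ didFinishLaunchingWithOptions
```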

As an example of how Crashlytics helps us improve, let's review a real case. One of our iOS beta app users faced a crash, and we received a crash report for this issue from both New Relic and Crashlytics. The stack trace of the crashed thread was as follows:
Thread : com.apple.main-thread
0 libsystem_kernel.dylib 0x0000000197b60e7c mach_msg_trap + 8
1 libsystem_kernel.dylib 0x0000000197b60cf8 mach_msg + 72
2 CoreFoundation 0x0000000186c61ed0 __CFRunLoopServiceMachPort + 200
3 CoreFoundation 0x0000000186c5fe24 __CFRunLoopRun + 940
4 CoreFoundation 0x0000000186b8d0a4 CFRunLoopRunSpecific + 396
5 GraphicsServices 0x000000018fd275a4 GSEventRunModal + 168
6 UIKit 0x000000018b4beaa4 UIApplicationMain + 1488
7 opsgenie 0x000000010008ae8c top_level_code(AppDelegate.swift)
8 opsgenie 0x000000010008afb8 main (AppDelegate.swift)
9 libdyld.dylib 0x0000000197a62a08 start + 4
This stack trace tells us nothing about what the problem is. If we had only New Relic to monitor OpsGenie Mobile Apps, we would most probably not have been able to find the problem with only this stack trace. However, when we looked at our custom logs, which are generated using Crashlytics:
0 | 00:00:00:060 | $ willFinishLaunching
1 | 00:00:00:061 | $ didFinishLaunchingWithOptions
2 | 00:00:00:066 | $ willRegisterUserNotificationSettings
3 | 00:00:00:095 | $ OS version: iOS8
4 | 00:00:00:105 | $ Reference saved: SplashScreen = Optional(opsgenie.SplashScreenViewController: 0x12fd10a60)
5 | 00:00:00:169 | $ didBecomeActive
6 | 00:00:00:175 | $ didRegisterUserNotificationSettings
7 | 00:00:00:212 | $ Credentials read from Keychain
8 | 00:00:00:220 | $ LoginRequest
9 | 00:00:00:222 | $ didRegisterForRemoteNotificationsWithDeviceToken:
10 | 00:00:01:812 | $ AddMobileDestinationRequest
11 | 00:00:01:828 | $ Reference saved: MyNavigation = Optional(opsgenie.MyNavigationController: 0x12fd2d420)
12 | 00:00:01:834 | $ Side menu table view loaded
13 | 00:00:01:836 | $ GetAlertListRequest
14 | 00:00:01:837 | $ GetAlertListRequest
15 | 00:00:01:838 | $ GetAlertListRequest
16 | 00:00:01:839 | $ GetAlertListRequest
17 | 00:00:01:840 | $ GetAlertListRequest
18 | 00:00:01:840 | $ GetUserListRequest
19 | 00:00:01:841 | $ GetRecipientCandidatesListRequest
20 | 00:00:01:861 | $ Reference saved: Dashboard = Optional(opsgenie.DashboardViewController: 0x12fd2d7f0)
21 | 00:00:01:887 | $ GetDashboardAlertInfoRequest
22 | 00:00:01:887 | $ GetDashboardOncallScheduleInfoRequest
23 | 00:00:01:889 | $ Reference saved: SplashScreen = nil
24 | 00:00:02:442 | $ opsgenie.SplashScreenViewController: 0x12fd10a60View disappeared
25 | 00:00:03:002 | $ Received 10.filter:all insertedAt:
26 | 00:00:03:025 | $ Received 10.filter:open insertedAt:
27 | 00:00:03:041 | $ Received 10.filter:closed insertedAt:
28 | 00:00:03:276 | $ Received 10.filter:unacked insertedAt:
29 | 00:00:03:314 | $ Received 10.filter:notseen insertedAt:
30 | 00:00:26:839 | $ willResignActive
31 | 00:00:26:849 | $ didReceiveRemoteNotification
32 | 00:00:28:738 | $ didReceiveRemoteNotification
we found an issue that causes a crash when a push notification is received while the iOS multitasking screen (the screen opened by quickly tapping the Home button twice while the device is unlocked) is active and the application is visible on it (not terminated). This log sequence may not seem useful; however, it says a lot! As can be seen, there is no "didEnterBackground" log message after the "willResignActive" message, even though we log every change of application state. This means that the user received two consecutive remote notifications while the application was inactive but had not yet entered the background, and these logs led us to test with the multitasking screen active.
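
The reasoning above can also be expressed as a small check over the logged event names. The sketch below is only an illustration: `AppState` and `notificationsWhileInactive(in:)` are hypothetical names, the initial state is an assumption, and "willEnterForeground" is assumed to be logged in the same style as the other lifecycle messages even though it does not appear in this session.

```swift
// Hypothetical model of the lifecycle states that the log messages imply.
enum AppState {
    case active, inactive, background
}

// Replays logged event names and returns the indices of remote
// notifications that arrived while the app was inactive, i.e. after
// "willResignActive" but before "didEnterBackground".
func notificationsWhileInactive(in events: [String]) -> [Int] {
    var state = AppState.inactive  // assumption: inactive during launch
    var hits: [Int] = []
    for (index, event) in events.enumerated() {
        switch event {
        case "didBecomeActive":
            state = .active
        case "willResignActive":
            state = .inactive
        case "didEnterBackground":
            state = .background
        case "willEnterForeground":  // assumed message, not in this log
            state = .inactive
        case "didReceiveRemoteNotification":
            if state == .inactive { hits.append(index) }
        default:
            break  // requests, view events, etc. do not change the state
        }
    }
    return hits
}

// The tail of the session log above, reduced to event names.
let tail = [
    "didBecomeActive",
    "willResignActive",
    "didReceiveRemoteNotification",
    "didReceiveRemoteNotification",
]
print(notificationsWhileInactive(in: tail))  // prints [2, 3]
```

Both notifications are flagged because no "didEnterBackground" event separates them from "willResignActive", which is the same gap we spotted by reading the log.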

Because we want to reach the point where we receive no crash reports at all, we have to make full use of Crashlytics. Furthermore, the data provided by Crashlytics Answers helps us see the whole picture of how our app is used.

Considering that we collect so much data about our iOS & Android Mobile Apps, a simple question naturally arises: why do we collect this data continuously? The answer is long but quite simple:

  • We want to learn when a problem occurs even if our customers do not notice it, so that we can solve it as quickly as possible
  • We want to have a better understanding of how our users are interacting with OpsGenie Mobile Apps to enhance our user experience design
  • We want to analyze view loading durations for each page and the response time of each HTTP request so that we can keep getting faster
  • We want to remain a memory-friendly app
  • We want to figure out how our backend services impact the performance of OpsGenie Mobile Apps
  • We want to troubleshoot across different devices and operating systems
  • We need to know immediately if one of our users faces a crash so that we can fix the issue as quickly as possible

We believe that everything that can be improved should be improved, and New Relic and Crashlytics help us decide what to improve next. We keep our eyes on ourselves!

