The Power of New Relic Insights
Visibility is one of the most important assets a development team can have.
Visibility of the requirements gathering, the functionality expectations, the implementation across team members and squads, of deployments, of environments… Without visibility we are, appropriately, completely blind.
Information drives our decisions, our processes and our reactions. It allows us to make suitable assumptions and be proactive about improvements or bugs. To try and operate without key information is like shooting yourself in the foot — it will come back to you, and unlike a bullet wound, you’ll be left wondering why it hurts.
This is where New Relic can really shine. This article will cover the keywords, syntax, tables and functions available with Insights, as well as a few examples.
Why Use Insights?
As a provider of a platform to external clients, there are a lot of questions for our development teams:
- Who is using our app?
- What is our main demographic?
- What are the most frequent transactions?
- What’s our revenue like?
- Where could we see improvements?
- How do we proceed with deprecations?
- Do I know for sure that everything is running well?
Monitoring for live production services is absolutely invaluable. There are a lot of different tools for doing this but personally, my favourite so far is New Relic Insights.
It allows you to create interesting and dynamic dashboards that display real-time data and update automatically every 5 minutes. You’ll never have to miss anything again.
I’ve introduced several dashboards to display metrics of interest to various teams, which are constantly rotated on a large TV for anyone in the area to take a look at. It provides useful and interesting information, and has allowed us to get in front of a fair number of support issues before they’ve even been reported.
New Relic Insights uses an in-house language called NRQL. The good news for a lot of developers is that this reads very much like SQL, so it’s likely that it won’t seem too strange to you if you’re familiar with that or similar query languages.
The best way to think of the query structure is “what information am I interested in”, “where do I find that”, “how do I want to narrow my search” and “what comparison do I want to make”.
For people who are more visually driven, I strongly recommend the Insights tool Data Explorer.
It will allow you to traverse the different event tables and attributes and slowly build queries according to what you want to see, displaying the query you would need to run to see that data at the top of the screen. It’s a great tool for, well, exploring! Just take some time to surf around and see if you can find anything that piques your curiosity.
So, you’ve taken a look around and want to build your own query now. But how do you construct a query? First thing, know your keywords.
- SELECT
Choose which attributes to select from the table
- AS
Rename the attribute/function extracted to a more meaningful name
- FROM
Indicates that the following table should be used for querying
- WHERE
Allows for filtering on a set of conditions
- AND, OR
Combine filter conditions
- LIKE, NOT LIKE
Allows for fuzzy text matching, or fuzzy exclusion, using the wildcard %
- IS
Same as “=”
- IN
Allows for a group of values to be provided for a filtering condition: httpResponseCode IN (400,500)
- SINCE, AGO
Can be used together, as in SINCE 3 hours AGO
Or SINCE can be used alone with key phrases like yesterday or this week
Default time frame is the last hour
- COMPARE WITH
Used with key phrases like yesterday or last week in order to compare results between your specified time window (SINCE) and the earlier one
- FACET
“GROUP BY” in SQL — groups results of a query by the specific attribute
Also supports grouping by multiple attributes
- TIMESERIES
Allows for visualisation of data over a period of time as a line graph
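Putting several of these keywords together, a sketch of a query might look like this (myservice is a placeholder application name):

```sql
SELECT average(duration) AS 'Average response (s)'
FROM Transaction
WHERE appName = 'myservice'
FACET name
SINCE 1 day AGO
TIMESERIES
```

This charts the average server-side response time over the past day, with one line per transaction name.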
Reserved Key Words/Phrases
Time frames have a number of reserved words that can be used with SINCE and COMPARE WITH such as:
- this week (year etc, will only go back to the ‘start’ of the specified time i.e. the Monday of that week)
- hour/s (and week/s etc)
There is also the keyword NULL: generally, aggregate functions (covered later) will only operate on entries that have a non-null value for a given attribute. NULL can be useful for examining other cases, where the values might not have been provided.
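As a quick sketch, NULL can be combined with WHERE to find entries that are missing a value, for example transactions that never recorded a response code:

```sql
SELECT count(*)
FROM Transaction
WHERE httpResponseCode IS NULL
SINCE 1 day AGO
```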
There are a number of useful functions provided by NRQL, and I’ve listed a few of the ones I’ve found the most helpful below.
- count()
Number of occurrences of either all transactions (*) or a specific attribute, i.e. all transactions that have a non-null value for that attribute
- average(), sum(), stddev()
Returns the average, summed or standard deviation of the attribute across all transactions that have a non-null value
- filter()
Can be used to limit the results of one of the other query functions
- max(), min()
Finds the maximum or minimum recorded non-null value of the specified attribute, across all transactions. If the attribute is non-numeric, they return 0
- percentage()
Used in tandem with WHERE within the brackets. If the attribute is non-numeric, it returns 100%
- percentile()
Returns an attribute’s estimated value at a given set of percentiles:
SELECT percentile(duration, 5, 50, 95) FROM Transaction TIMESERIES
- uniques(), uniqueCount()
Provides a list of unique values for the given attribute
Counts the number of unique non-null values present for the given attribute
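As an illustration, several of these functions can be combined in a single query (myservice is a placeholder application name):

```sql
SELECT uniqueCount(name), average(duration), max(duration), percentile(duration, 95)
FROM Transaction
WHERE appName = 'myservice'
SINCE 3 hours AGO
```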
So you’re armed with a set of keywords and functions, but what do you do with them?
Some attributes are useful for querying on, and some are more useful to use for the filtering of data.
For querying, I’ve found these the most interesting:
- duration
Total server-side response time for the transaction, in seconds
- databaseDuration, databaseCallCount
Useful to retrieve database performance and activity for services that interact with monitored databases
- externalDuration, externalCallCount
Requests made from a given service to any external resource or service
- error
Shows whether a request may have failed due to a timeout or an exception
If you happen to have a RequestId or similar identifier on your transactions, that’s also a very useful attribute to extract. We’ve used it to immediately identify requests that have failed and troubleshoot them much more easily.
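As a sketch, assuming a custom requestId attribute has been added to your transactions (it is not built in, and your attribute name may differ), failed requests could be pulled out like this:

```sql
SELECT requestId, name, duration
FROM Transaction
WHERE error IS true
SINCE 1 hour AGO
```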
For filtering I usually use a combination of the following attributes:
- transactionSubType
There are a number of possible values you can use, but the one that is useful for spotting API traffic through a service is RestWebService
- appName
Any instrumented service; useful for querying the traffic through a specific web service
- name: Transaction name
In the case of web service traffic this will be the URL
Request method is usually also included, so can also be used to narrow the search if needed
Usually useful to use fuzzy matching i.e. name LIKE
- httpResponseCode
Any HTTP response code: 200, 201 etc
Useful for filtering errors or specific cases, such as successful checkouts
- `request.headers.userAgent`
Free form string
Useful to identify the source of requests, such as Firefox, Chrome or a mobile app. Note that the backticks are required for this to work
Note that when filtering attributes, the values used in the comparisons should be in single quotes, and some attributes may require the use of backticks.
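For example, a filter combining both quoting rules might look like this (myservice is a placeholder application name):

```sql
SELECT count(*)
FROM Transaction
WHERE appName = 'myservice'
AND `request.headers.userAgent` LIKE '%Chrome%'
SINCE 1 day AGO
```

The attribute name containing dots needs backticks, while the values being compared sit in single quotes.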
If you’d like to know which attributes might be available to you, where they are, and how they might be useful in your particular case, Data Explorer is a fantastic tool.
5th, 80th, 95th and 99th Percentiles of Time Taken to Perform Checkout This Week
SELECT percentile(duration, 5, 80, 95, 99)
FROM Transaction
WHERE appName = 'myservice'
AND httpResponseCode = '200'
SINCE this week
Percentage of Android Users This Week Compared to Last Week
SELECT percentage(count(*), WHERE `request.headers.userAgent` LIKE '%Android%')
FROM Transaction
WHERE appName = 'myservice'
SINCE this week
COMPARE WITH 1 week AGO
The Synthetics Journey Results from the Past Week for Mobile Apps
SELECT count(result)
FROM SyntheticCheck
WHERE `App-Group` = 'Mobile'
FACET result
SINCE 1 week AGO
Or perhaps you want to see how many failures you’ve had compared to previous weeks, to monitor your improvements?
SELECT count(result)
FROM SyntheticCheck
WHERE `App-Group` = 'Mobile'
AND result = 'FAILED'
SINCE 1 week AGO
COMPARE WITH 1 week AGO
It’s as simple as that. These sorts of queries have allowed me to generate visualisations for our dashboard that tell us things like:
- How many active users are there? (reports on the throughput and revenue we’re managing, etc.)
- How many different versions of a given client are active? (useful for making deprecation decisions)
- What percentage of our requests come from different platforms? (has helped us identify if there is an unusual increase in activity from a specific platform, or just to identify our most popular client type for research)
- How is our service performing? (easy to spot bottlenecks and identify slower APIs that could use improvements)
- Are there any errors? (a failing checkout request immediately appears and begins alerting us to a potential production issue)
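For instance, the platform question above can be approximated with a simple FACET query (myservice is a placeholder, and your own attribute names may differ):

```sql
SELECT count(*)
FROM Transaction
WHERE appName = 'myservice'
FACET `request.headers.userAgent`
SINCE 1 week AGO
```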
You really have nothing to lose! I found Insights fascinating at first, and really just used it for my own entertainment. Were there more iOS clients or Android? Were we busiest on a Friday evening or a Sunday lunchtime? It’s easy and intuitive to use, and has extensive, well-written documentation available. Creating widgets and dashboards is so quick that putting together something out of your own interest is incredibly straightforward. Which I think is the main draw for me — it’s fun to use!
Very quickly I saw that it actually had far greater value than that, and how much we could gain. Knowledge is power. Many people now refer to the dashboards I’ve created, and I’m part of an initiative to create something at a high level that will help us monitor the health of our entire live production system, not just a few of the critical services. Whether it’s only for your own interest, or because you want to learn how to leverage it to your team or company’s advantage, I cannot recommend it enough.