Like many websites, we use Google Analytics to track data about our visitors and what they do on our sites. However, Tuts+ is a fair bit bigger than a lot of those sites, and at our size, we run into a few problems using it. Here's what I've learned about working with Google Analytics at scale.

1. Slice Data Into Smaller Date Ranges to Deal With Sampling Issues

Google Analytics will only return 500,000 rows of data for any query you send it (with a few exceptions). If you make a request where the result covers more than this amount of data, Google Analytics will take a sample of 500,000 rows, and multiply it up as necessary.

For example, if you have five million visitors in September, and you ask Google Analytics for a report on where those visitors came from across the month, Google Analytics will select 10% of the data it has about your September visitors, figure out where those 10% of visitors came from, and multiply those numbers by 10 to generate the report it gives you.

This type of extrapolation is a common statistical technique, and a reasonable way for Google to cut down on processing time
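
To make the arithmetic concrete, here is a minimal Python sketch of the idea in this section's heading. It makes no calls to Google Analytics. The helper names (`split_date_range`, `sampling_multiplier`), the year, and the three-day chunk size are all assumptions chosen for illustration, and it assumes traffic is spread evenly across the month.

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_chunk):
    """Split the inclusive range [start, end] into consecutive inclusive chunks."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # The last chunk is cut short so it never runs past `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def sampling_multiplier(total_rows, sample_limit=500_000):
    """Factor by which sampled counts get scaled up; 1.0 means no sampling.

    The 500,000 default is the limit quoted in this article.
    """
    return max(1.0, total_rows / sample_limit)


# Hypothetical month: five million visitors in September (year is arbitrary).
visitors = 5_000_000
whole_month = sampling_multiplier(visitors)  # 10% sample, scaled up by 10

# Ten three-day slices, each covering roughly a tenth of the traffic.
slices = split_date_range(date(2014, 9, 1), date(2014, 9, 30), 3)
per_slice = sampling_multiplier(visitors / len(slices))

print(whole_month, len(slices), per_slice)
```

Under these assumptions, each slice sits at the limit rather than over it, so a separate query per slice would come back unsampled, and the per-slice results could then be added together to rebuild the monthly report.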