# Web analytics

## Basic steps of the web analytics process

### Web analytics technologies

There are at least two categories of web analytics: off-site and on-site.

• Off-site web analytics refers to web measurement and analysis regardless of whether you own or maintain a website. It includes the measurement of a website’s potential audience (opportunity), share of voice (visibility), and buzz (comments) that is happening on the Internet as a whole.
• On-site web analytics, the more common of the two, measures a visitor’s behavior once on your website. This includes its drivers and conversions; for example, the degree to which different landing pages are associated with online purchases. On-site web analytics measures the performance of your website in a commercial context. This data is typically compared against key performance indicators, and used to improve a website or marketing campaign’s audience response. Google Analytics and Adobe Analytics are the most widely used on-site web analytics services, although new tools are emerging that provide additional layers of information, including heat maps and session replay.

Historically, web analytics has been used to refer to on-site visitor measurement. However, this meaning has become blurred, mainly because vendors are producing tools that span both categories. Many different vendors provide on-site web analytics software and services.

There are two main technical ways of collecting the data. The first and traditional method, server log file analysis, reads the log files in which the web server records file requests by browsers. The second method, page tagging, uses JavaScript embedded in the web page to make image requests to a third-party analytics-dedicated server whenever a web page is rendered by a web browser or, if desired, when a mouse click occurs. Both collect data that can be processed to produce web traffic reports.
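As a rough illustration of server log file analysis, the following sketch parses lines in the Common Log Format and counts successful requests per path. The sample log lines, the function names, and the choice to count only 2xx responses are assumptions made for this example, not part of any particular analytics product.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_requests_by_path(lines):
    """Count successful (2xx) requests per requested path."""
    counts = Counter()
    for line in lines:
        record = parse_log_line(line)
        if record and record["status"].startswith("2"):
            counts[record["path"]] += 1
    return counts


# Hypothetical sample data using documentation-reserved IP addresses.
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:55:37 +0000] "GET /style.css HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:56:02 +0000] "GET /missing HTTP/1.1" 404 0',
]
traffic = count_requests_by_path(sample_log)
```

Every matched line here is a request for a file, so a real report would still need to separate page requests from supporting resources such as stylesheets.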

Note

• processing of drivers and conversions into information
• page tagging

### Page tagging

Concerns about the accuracy of log file analysis in the presence of caching, and the desire to be able to perform web analytics as an outsourced service, led to the second data collection method, page tagging or ‘Web bugs’.

In the mid-1990s, web counters were commonly seen: images included in a web page that showed the number of times the image had been requested, which was an estimate of the number of visits to that page. In the late 1990s this concept evolved to include a small invisible image instead of a visible one, and, by using JavaScript, to pass along with the image request certain information about the page and the visitor. This information can then be processed remotely by a web analytics company, and extensive statistics generated.

The web analytics service also manages the process of assigning a cookie to the user, which can uniquely identify them during their visit and in subsequent visits. Cookie acceptance rates vary significantly between websites and may affect the quality of data collected and reported.
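The image request and cookie assignment described above can be sketched from the collection server's point of view. This is a minimal illustration only: the collector URL, the parameter names `page` and `ref`, and both function names are hypothetical, and a real page tag would build the URL in JavaScript inside the browser.

```python
import uuid
from urllib.parse import urlencode, urlsplit, parse_qs

COLLECTOR_URL = "https://collector.example.com/pixel.gif"  # hypothetical endpoint


def build_beacon_url(page, referrer):
    """Build the image URL a page tag would request, with page details as query parameters."""
    return COLLECTOR_URL + "?" + urlencode({"page": page, "ref": referrer})


def handle_beacon(url, cookie_visitor_id=None):
    """Decode a beacon request on the collector side.

    If the browser sent no visitor cookie, a new random ID is issued,
    standing in for the cookie the service would set in its response.
    """
    query = parse_qs(urlsplit(url).query)
    return {
        "page": query["page"][0],
        "referrer": query.get("ref", [""])[0],
        "visitor_id": cookie_visitor_id or str(uuid.uuid4()),
        "new_cookie": cookie_visitor_id is None,
    }


beacon = build_beacon_url("/pricing", "https://example.org/")
first_hit = handle_beacon(beacon)                            # no cookie yet
repeat_hit = handle_beacon(beacon, first_hit["visitor_id"])  # cookie sent back
```

The second call reuses the ID issued by the first, which is what lets the service recognize the same browser on subsequent requests.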

Collecting website data using a third-party data collection server (or even an in-house data collection server) requires an additional DNS look-up by the user’s computer to determine the IP address of the collection server. On occasion, delays in completing a DNS look-up, whether it succeeds or fails, may result in data not being collected.

With the increasing popularity of Ajax-based solutions, an alternative to the use of an invisible image is to implement a call back to the server from the rendered page. In this case, when the page is rendered in the web browser, a piece of Ajax code calls back to the server and passes information about the client that can then be aggregated by a web analytics company. This approach is limited by browser restrictions on which servers can be contacted with XMLHttpRequest objects. Also, this method can lead to slightly lower reported traffic levels, since the visitor may stop the page from loading in mid-response before the Ajax call is made.

### Geolocation of visitors

Visitor geolocation information is used by businesses for online audience segmentation in applications such as online advertising, behavioral targeting, content localization (or website localization), digital rights management, personalization, online fraud detection, localized search, enhanced analytics, global traffic management, and content distribution.

Note

TODO behavioral targeting

### Click analytics

Click analytics is a special type of web analytics that gives special attention to clicks.

Commonly, click analytics focuses on on-site analytics. An editor of a website uses click analytics to determine the performance of his or her particular site with regard to where the users of the site are clicking.

Also, click analytics may happen in real time or in “unreal” time, depending on the type of information sought. Typically, front-page editors on high-traffic news media sites will want to monitor their pages in real time, to optimize the content. Editors, designers or other types of stakeholders may analyze clicks over a wider time frame to help them assess the performance of writers, design elements, advertisements, etc.

Data about clicks may be gathered in at least two ways. Ideally, a click is “logged” when it occurs, and this method requires some functionality that picks up relevant information when the event occurs. Alternatively, one may assume that a page view is the result of a click, and therefore log a simulated click that led to that page view.
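The second approach, treating each page view as the result of a click, might look like the sketch below. It assumes the page views of each visit are already available in chronological order; the function names and sample paths are made up for illustration.

```python
from collections import Counter


def simulated_clicks(page_views):
    """Infer (from_page, to_page) clicks from one visit's chronologically ordered page views."""
    return list(zip(page_views, page_views[1:]))


def count_transitions(visits):
    """Count how often each inferred click occurs across many visits."""
    counts = Counter()
    for page_views in visits:
        counts.update(simulated_clicks(page_views))
    return counts


# Hypothetical visits, each a list of viewed pages in order.
visits = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home"],  # a single page view yields no inferred click
]
clicks = simulated_clicks(visits[0])
transitions = count_transitions(visits)
```

This approximation cannot tell which link on the page was followed, nor detect clicks that did not lead to a new page view, which is why logging the click event directly is preferable when possible.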

## On-site web analytics - definitions

There are no globally agreed definitions within web analytics, although industry bodies have been trying for some time to agree on definitions that are useful and definitive. The main bodies that have had input in this area are the IAB (Interactive Advertising Bureau), JICWEBS (the Joint Industry Committee for Web Standards in the UK and Ireland), and the DAA (Digital Analytics Association), formerly known as the WAA (Web Analytics Association, US). However, many terms are used in consistent ways from one major analytics tool to another, so the following list, based on those conventions, can be a useful starting point:

• Hit - A request for a file from the web server. Available only in log analysis. The number of hits received by a website is frequently cited to assert its popularity, but this number is extremely misleading and dramatically overestimates popularity. A single web page typically consists of multiple (often dozens of) discrete files, each of which is counted as a hit as the page is downloaded, so the number of hits is really an arbitrary number more reflective of the complexity of individual pages on the website than the website’s actual popularity. The total number of visits or page views provides a more realistic and accurate assessment of popularity.
• Page view - A request for a file, or sometimes an event such as a mouse click, that is defined as a page in the setup of the web analytics tool. In page tagging, a page view is an occurrence of the script being run. In log analysis, a single page view may generate multiple hits as all the resources required to view the page (images, .js and .css files) are also requested from the web server.
• Event - A discrete action or class of actions that occurs on a website. A page view is a type of event. Events also encapsulate clicks, form submissions, keypress events, and other client-side user actions.
• Visit / Session - A visit or session is defined as a series of page requests or, in the case of tags, image requests from the same uniquely identified client. A unique client is commonly identified by an IP address or a unique ID that is placed in the browser cookie. A visit is considered ended when no requests have been recorded in some number of elapsed minutes. A 30-minute limit (“time out”) is used by many analytics tools but can, in some tools (such as Google Analytics), be changed to another number of minutes. Analytics data collectors and analysis tools have no reliable way of knowing if a visitor has looked at other sites between page views; a visit is considered one visit as long as the events (page views, clicks, whatever is being recorded) are 30 minutes or less apart. Note that a visit can consist of one page view, or thousands. A unique visit’s session can also be extended if the time between page loads indicates that a visitor has been viewing the pages continuously.
• First Visit / First Session - (also called ‘Absolute Unique Visitor’ in some tools) A visit from a uniquely identified client that has theoretically not made any previous visits. Since the only way of knowing whether the uniquely identified client has been to the site before is the presence of a persistent cookie or via digital fingerprinting that had been received on a previous visit, the First Visit label is not reliable if the site’s cookies have been deleted since their previous visit.
• Visitor / Unique Visitor / Unique User - The uniquely identified client that is generating page views or hits within a defined time period (e.g. day, week or month). A uniquely identified client is usually a combination of a machine (one’s desktop computer at work, for example) and a browser (Firefox on that machine). The identification is usually via a persistent cookie that has been placed on the computer by the site page code. An older method, used in log file analysis, is the unique combination of the computer’s IP address and the User Agent (browser) information provided to the web server by the browser. It is important to understand that the “Visitor” is not the same as the human being sitting at the computer at the time of the visit, since an individual human can use different computers or, on the same computer, can use different browsers, and will be seen as a different visitor in each circumstance. Increasingly, but still somewhat rarely, visitors are uniquely identified by Flash LSOs (Local Shared Objects), which are less susceptible to privacy enforcement.
• Repeat Visitor - A visitor that has made at least one previous visit. The period between the last and current visit is called visitor recency and is measured in days.
• Return Visitor - A unique visitor with activity consisting of a visit to a site during a reporting period, where the unique visitor also visited the site prior to the reporting period. The individual is counted only once during the reporting period.
• New Visitor - A visitor that has not made any previous visits. This definition creates a certain amount of confusion (see common confusions below), and is sometimes substituted with analysis of first visits.
• Impression - The most common definition of “Impression” is an instance of an advertisement appearing on a viewed page. Note that an advertisement can be displayed on a viewed page below the area actually displayed on the screen, so most measures of impressions do not necessarily mean an advertisement has been viewable.
• Single Page Visit / Singleton - A visit in which only a single page is viewed (a ‘bounce’).
• Bounce Rate - The percentage of visits that are single page visits.
• Exit Rate / % Exit - A statistic applied to an individual page, not a web site. The percentage of visits seeing a page where that page is the final page viewed in the visit.
• Page Time Viewed / Page Visibility Time / Page View Duration - The time a single page (or a blog, Ad Banner…) is on the screen, measured as the calculated difference between the time of the request for that page and the time of the next recorded request. If there is no next recorded request, then the viewing time of that instance of that page is not included in reports.
• Session Duration / Visit Duration - Average amount of time that visitors spend on the site each time they visit. It is calculated as the sum total of the duration of all the sessions divided by the total number of sessions. This metric can be complicated by the fact that analytics programs cannot measure the length of the final page view.[10]
• Average Page View Duration - Average amount of time that visitors spend on an average page of the site.
• Active Time / Engagement Time - Average amount of time that visitors spend actually interacting with content on a web page, based on mouse moves, clicks, hovers and scrolls. Unlike Session Duration and Page View Duration / Time on Page, this metric can accurately measure the length of engagement in the final page view, but it is not available in many analytics tools or data collection methods.
• Average Page Depth / Page Views per Average Session - Page Depth is the approximate “size” of an average visit, calculated by dividing total number of page views by total number of visits.
• Frequency / Session per Unique - Frequency measures how often visitors come to a website in a given time period. It is calculated by dividing the total number of sessions (or visits) by the total number of unique visitors during a specified time period, such as a month or year. Sometimes it is used interchangeably with the term “loyalty.”
• Click path - The chronological sequence of page views within a visit or session.
• Click - “Refers to a single instance of a user following a hyperlink from one page in a site to another”.
• Site Overlay - A report technique in which statistics (clicks) or hot spots are superimposed, by physical location, on a visual snapshot of the web page.
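Several of the definitions above can be tied together in a short sketch: events are grouped into visits using a 30-minute inactivity timeout, and bounce rate, average page depth and frequency are then derived from the visits. The event tuple layout `(client_id, timestamp_seconds, page)`, the function names and the sample data are assumptions for illustration; real tools differ in how they identify clients and handle edge cases.

```python
from collections import defaultdict

SESSION_TIMEOUT_SECONDS = 30 * 60  # the 30-minute limit many tools use by default


def sessionize(events, timeout=SESSION_TIMEOUT_SECONDS):
    """Group (client_id, timestamp_seconds, page) events into visits.

    A new visit starts when the gap since the client's previous event
    exceeds the timeout. Returns a list of (client_id, [pages...]) tuples.
    """
    by_client = defaultdict(list)
    for client_id, timestamp, page in events:
        by_client[client_id].append((timestamp, page))

    visits = []
    for client_id, client_events in by_client.items():
        client_events.sort()
        current, last_seen = [], None
        for timestamp, page in client_events:
            if last_seen is not None and timestamp - last_seen > timeout:
                visits.append((client_id, current))
                current = []
            current.append(page)
            last_seen = timestamp
        visits.append((client_id, current))
    return visits


def summarize(visits):
    """Compute a few of the metrics defined above; assumes at least one visit."""
    total_visits = len(visits)
    page_views = sum(len(pages) for _, pages in visits)
    unique_visitors = len({client_id for client_id, _ in visits})
    bounces = sum(1 for _, pages in visits if len(pages) == 1)
    return {
        "visits": total_visits,
        "unique_visitors": unique_visitors,
        "bounce_rate": bounces / total_visits,      # single-page visits / visits
        "page_depth": page_views / total_visits,    # page views / visits
        "frequency": total_visits / unique_visitors,  # visits / unique visitors
    }


# Hypothetical events: client "a" returns after more than 30 minutes of inactivity.
events = [
    ("a", 0, "/home"),
    ("a", 600, "/pricing"),
    ("a", 600 + 1801, "/home"),
    ("b", 100, "/home"),
]
report = summarize(sessionize(events))
```

In this sample, client "a" produces two visits and client "b" one, so the number of visits (3) differs from the number of unique visitors (2).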

Note

The difference between a visit and a visitor: the former is equivalent to a session, while the latter is a client.

Note

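• Use page flow to approximate click behavior.
• What share of the mobile site's daily visits are first visits?
• We seem never to have measured visitor recency, possibly because it is relatively expensive to compute.
• Analyze the bounce rate of each mobile site page when it serves as the lander.
• Use page flow to analyze new users' exit rate on each page.
• Analyze the page time viewed for each page in the App.
• We do not seem to have loyalty data either.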
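For the per-page analyses listed in these notes, one possible starting point is sketched below. It assumes each visit is available as a chronological list of page paths; the function names and sample visits are hypothetical, and the formulas follow the Bounce Rate and Exit Rate definitions given earlier.

```python
from collections import Counter


def lander_bounce_rates(visits):
    """Bounce rate per landing page: single-page visits / all visits that started on that page."""
    landings, bounces = Counter(), Counter()
    for pages in visits:
        landings[pages[0]] += 1
        if len(pages) == 1:
            bounces[pages[0]] += 1
    return {page: bounces[page] / landings[page] for page in landings}


def exit_rates(visits):
    """Exit rate per page: visits ending on the page / visits that saw the page."""
    seen, exits = Counter(), Counter()
    for pages in visits:
        seen.update(set(pages))  # count each page at most once per visit
        exits[pages[-1]] += 1
    return {page: exits[page] / seen[page] for page in seen}


# Hypothetical visits, each a non-empty list of viewed pages in order.
visits = [
    ["/home", "/pricing"],
    ["/home"],
    ["/pricing", "/home", "/signup"],
]
bounce_by_lander = lander_bounce_rates(visits)
exit_by_page = exit_rates(visits)
```

Restricting `visits` to first visits only would give the new-user variant of the exit rate analysis.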