The Data Flow Visits flow lets you retrieve all the visits generated over a given time slot, along with their properties (browser, geolocation, entry page, etc.).
Some of these properties are defined on the first page of the visit and will not change during the visit:
- Geolocation
- Browsers
- OS
- Entry pages
- Sources
- Supports
- Monitored traffic
- Hourly traffic
- Unique Visitor ID
Others, by contrast, are updated dynamically as new events are received during the visit, such as:
- Page views per visit
- Time spent per visit
- Exit page
- Identified visitors
- Implication degree
The values of these properties are therefore subject to change if not all of the visit's events were generated within the requested time slot.
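To make the distinction between fixed and dynamic properties concrete, here is a minimal Python sketch of how a visit snapshot could be derived from only the events received up to the end of a time slot. The `Event` record, its fields and the `snapshot` function are hypothetical illustrations, not the Data Flow schema, and the intermediate pages of visit 1 (Page B, Page C) and its source are invented; only its end states match the Use Case tables that follow.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    """Hypothetical event record; not the actual Data Flow schema."""
    ts: datetime
    type: str                        # "page" or any other event type
    page: Optional[str] = None
    source: Optional[str] = None
    visitor_id: Optional[str] = None


def snapshot(events: list[Event], slot_end: datetime) -> dict:
    """Visit properties computed only from events received up to slot_end."""
    seen = sorted((e for e in events if e.ts <= slot_end), key=lambda e: e.ts)
    pages = [e for e in seen if e.type == "page"]
    first = pages[0] if pages else None
    return {
        # Fixed properties: set by the first page, never changed afterwards.
        "entry_page": first.page if first else None,
        "source": first.source if first else None,
        # Dynamic properties: recomputed from every event received so far.
        "page_views": len(pages),
        "exit_page": pages[-1].page if pages else None,
        "visitor_id": next((e.visitor_id for e in reversed(seen) if e.visitor_id), None),
    }


def at(h: int, m: int, s: int = 0) -> datetime:
    return datetime(2024, 1, 1, h, m, s)  # arbitrary date


# Hypothetical events for visit 1; only the end states match the Use Case tables.
visit_1 = [
    Event(at(10, 5), "page", "Page A", source="Direct"),
    Event(at(10, 15), "page", "Page B"),
    Event(at(10, 25), "page", "Page J"),
    Event(at(10, 40), "page", "Page C"),
    Event(at(10, 50), "page", "Page D", visitor_id="ABC"),
]

print(snapshot(visit_1, at(10, 29, 59)))  # 3 page views, exit Page J, no visitor ID
print(snapshot(visit_1, at(10, 59, 59)))  # 5 page views, exit Page D, visitor ID ABC
```

The entry page and source are identical in both snapshots, while the page view count, exit page and visitor ID depend on where the slot ends.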
Use Case
Here are 3 visits, all of which started between 10am and 11am:
Scenario A
If you retrieve the flows every 30 minutes, you will receive 2 files:
- 10:00:00 to 10:29:59 which will contain the following visits:
Visit ID | Entry pages | Exit pages | Page views per visit | Visitor ID |
1 | Page A | Page J | 3 | - |
2 | Page F | Page G | 2 | DEF |
- 10:30:00 to 10:59:59, which will contain the following visits:
Visit ID | Entry pages | Exit pages | Page views per visit | Visitor ID |
3 | Page A | Page D | 3 | GHI |
Scenario B
If you retrieve the flows every hour, you will receive only 1 file:
- 10:00:00 to 10:59:59
Visit ID | Entry pages | Exit pages | Page views per visit | Visitor ID |
1 | Page A | Page D | 5 | ABC |
2 | Page F | Page G | 2 | DEF |
3 | Page A | Page D | 3 | GHI |
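The slot boundaries in both scenarios follow a simple rule: each slot covers a fixed period and ends one second before the next one starts. The sketch below computes the slot containing a given timestamp, assuming (this is not stated by the documentation) that slots are aligned on multiples of the export period from midnight. The 10:50 timestamp for visit 1's last page is hypothetical; it shows why such a page falls outside the first half-hourly file but inside the single hourly file.

```python
from datetime import datetime, timedelta


def slot_bounds(ts: datetime, period_minutes: int) -> tuple[datetime, datetime]:
    """Return the (start, end) of the export slot containing ts.

    Assumes slots aligned on multiples of period_minutes from midnight,
    each ending one second before the next slot starts.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds())
    period = period_minutes * 60
    start = midnight + timedelta(seconds=elapsed - elapsed % period)
    end = start + timedelta(seconds=period - 1)
    return start, end


def label(bounds: tuple[datetime, datetime]) -> str:
    """Format slot bounds the way the file names above are written."""
    start, end = bounds
    return f"{start:%H:%M:%S} to {end:%H:%M:%S}"


page_d = datetime(2024, 1, 1, 10, 50)   # hypothetical time of visit 1's last page
print(label(slot_bounds(page_d, 30)))   # 10:30:00 to 10:59:59
print(label(slot_bounds(page_d, 60)))   # 10:00:00 to 10:59:59
```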
What happens when a visit doesn't start with a page?
If a visit starts with an event other than a page, some visit properties won’t be populated until the first page of the visit is loaded.
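Take, for example, a visit that starts with a non-page event between 01:00:00 and 01:29:59 and whose first page is loaded after 01:30:00.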
If you’ve set up an automated export every 30 minutes, the visit will be present in 2 files:
- File “01:00:00 to 01:29:59”: the visit will be present in the file, but with no information on the Source, Geolocation, OS, Device, etc., because these properties are only populated by page events.
- File “01:30:00 to 01:59:59”: the visit will also be present in the file, with all the visit properties populated.
This means that, when inserting the data into your database, you need to deduplicate the rows based on the “Visit ID” and keep only the rows where the visit properties are populated.
The same applies to hourly exports if the visit starts with a non-page event at or before N:59:59 and its first page is tracked at or after (N+1):00:00.
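A minimal Python sketch of that deduplication step, assuming each exported row is a dict with hypothetical keys `visit_id`, `source`, `geolocation` and `os`; the real column names depend on your export configuration. For each Visit ID it keeps a row whose visit properties are populated, and keeps an incomplete row only while no populated row has been seen.

```python
# Hypothetical column names; adapt them to the columns of your export.
VISIT_PROPERTIES = ("source", "geolocation", "os")


def is_populated(row: dict) -> bool:
    """True when the page-based visit properties have been filled in."""
    return all(row.get(key) not in (None, "") for key in VISIT_PROPERTIES)


def deduplicate(rows: list[dict]) -> dict[str, dict]:
    """Keep one row per visit ID, preferring rows with populated properties.

    Rows are expected in export order (oldest file first), so among rows
    that are equally populated, the most recent one wins.
    """
    kept: dict[str, dict] = {}
    for row in rows:
        current = kept.get(row["visit_id"])
        if current is None or is_populated(row) or not is_populated(current):
            kept[row["visit_id"]] = row
    return kept


rows = [
    # From file "01:00:00 to 01:29:59": no page yet, properties empty.
    {"visit_id": "42", "source": None, "geolocation": None, "os": None},
    # From file "01:30:00 to 01:59:59": first page tracked, properties populated.
    {"visit_id": "42", "source": "Direct", "geolocation": "France", "os": "Windows"},
]
print(deduplicate(rows)["42"]["source"])  # Direct
```

Because a populated row is never replaced by an empty one, the result does not depend on the order in which the two files are loaded.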
Recommendations
As the scenarios above show, the time at which you use the data is decisive: some properties only receive their final value after the data has been retrieved.
You can therefore retrieve file exports every 30 or 60 minutes to get the freshest data possible, and also schedule an overnight API recovery of the consolidated data to capture every value that is only calculated once the visit is complete.
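This two-step approach can be sketched as an upsert keyed on Visit ID, where rows from the overnight consolidated recovery replace the intraday rows for the same visit and are never overwritten by later intraday rows. The example uses SQLite from the Python standard library (UPSERT requires SQLite 3.24 or later); the `visits` table, its columns and the `origin` values are hypothetical, and the retrieval of files or API data itself is out of scope.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "origin" records whether a row came from an
# intraday file export or from the overnight consolidated recovery.
conn.execute(
    """CREATE TABLE visits (
        visit_id TEXT PRIMARY KEY,
        exit_page TEXT,
        page_views INTEGER,
        visitor_id TEXT,
        origin TEXT
    )"""
)

# Insert new visits; on an existing Visit ID, overwrite the row only if the
# stored row is still an intraday one (consolidated rows are final).
UPSERT = """
INSERT INTO visits (visit_id, exit_page, page_views, visitor_id, origin)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(visit_id) DO UPDATE SET
    exit_page = excluded.exit_page,
    page_views = excluded.page_views,
    visitor_id = excluded.visitor_id,
    origin = excluded.origin
WHERE visits.origin = 'intraday'
"""


def load(rows, origin):
    conn.executemany(UPSERT, [(*row, origin) for row in rows])
    conn.commit()


# Intraday file 10:00:00 to 10:29:59 (values from Scenario A).
load([("1", "Page J", 3, None), ("2", "Page G", 2, "DEF")], "intraday")
# Overnight consolidated recovery (values from Scenario B).
load([("1", "Page D", 5, "ABC"), ("2", "Page G", 2, "DEF"),
      ("3", "Page D", 3, "GHI")], "consolidated")

print(conn.execute("SELECT * FROM visits WHERE visit_id = '1'").fetchone())
# ('1', 'Page D', 5, 'ABC', 'consolidated')
```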