Magic Site Integration

Introduction

This article explains how the Magic Site Integration application scans the pages of the linked website in order to enrich the URL content.

Besides giving a technical understanding of how the application works, it provides guidelines on how to configure your pages so that the URL content is properly enriched with all the main information.

 

Retrieve the url to be tracked

One of the pillars of Magic Site Integration is retrieving the url for which the content has to be created and which has to be used for tracking events.

The url can be retrieved in three different ways, tried in the following order:

 

1. OG tag 

If the "Use OG tags for content enrichment" flag is active, Magic Site Integration will try to retrieve the url from the "content" field of the OG tag "url":

<meta property="og:url" content="http://www.someUrl.com/ToBe/tracked" />

 

2. Canonical

If the OG tag "url" does not exist or the flag is disabled, Magic Site Integration will try to retrieve the url from the "canonical" tag:

<link rel="canonical" href="http://www.example.com/somePath"/>

 

3. Website type

If the canonical tag is not available either, Magic Site Integration will try to retrieve the url dynamically, based on the "Website type" parameter available in the application's management:

  1. Static: the url from which the javascript is invoked is retrieved, and all query params and anchors are removed.
  2. Dynamic: the url from which the javascript is invoked is retrieved, and all blacklisted query params and anchors are removed.
  3. Single Page Application: the url from which the javascript is invoked is retrieved, and all blacklisted query params are removed.

Blacklisted query params are:

  • 'utm_source'
  • 'utm_medium'
  • 'utm_term'
  • 'utm_content'
  • 'utm_campaign'
  • '_'
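The retrieval order and the per-type cleanup described above can be sketched as follows. This is a minimal illustration, not the application's actual code: the function names, the shape of the input object and the type labels 'static', 'dynamic' and 'spa' are assumptions made for the example. It also reads the Single Page Application rule as keeping the anchor, since the rule does not mention removing it.

```javascript
// Query params the article lists as blacklisted.
const BLACKLISTED_PARAMS = [
  'utm_source', 'utm_medium', 'utm_term', 'utm_content', 'utm_campaign', '_',
];

// Clean the page url according to the "Website type" setting.
// The labels 'static', 'dynamic' and 'spa' are illustrative names only.
function normalizeByWebsiteType(pageUrl, websiteType) {
  const url = new URL(pageUrl);
  if (websiteType === 'static') {
    url.search = ''; // drop all query params
    url.hash = '';   // drop the anchor
  } else {
    for (const name of BLACKLISTED_PARAMS) {
      url.searchParams.delete(name); // drop only blacklisted params
    }
    if (websiteType === 'dynamic') {
      url.hash = ''; // assumption: 'spa' keeps the anchor
    }
  }
  return url.toString();
}

// Apply the order: og:url (if the flag is on), then canonical, then website type.
function resolveTrackedUrl({ useOgTags, ogUrl, canonicalUrl, pageUrl, websiteType }) {
  if (useOgTags && ogUrl) return ogUrl;
  if (canonicalUrl) return canonicalUrl;
  return normalizeByWebsiteType(pageUrl, websiteType);
}

const page = 'https://www.example.com/shop/item?id=7&utm_source=news&_=123#reviews';
console.log(normalizeByWebsiteType(page, 'static'));  // https://www.example.com/shop/item
console.log(normalizeByWebsiteType(page, 'dynamic')); // https://www.example.com/shop/item?id=7
console.log(normalizeByWebsiteType(page, 'spa'));     // https://www.example.com/shop/item?id=7#reviews
```

In practice this means a page without og:url and canonical tags is tracked under a url that depends on the selected website type, so the same page can produce different URL content entries if the type is changed.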

 

3.1 Manual mode

The "Website type" parameter has one more option, "Manual". In this mode you have to select the pages of the website to be imported yourself, by invoking the following track method:

trackMS('<url here>');
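As a hedged example of Manual mode, the sketch below tracks a fixed list of pages. The helper name trackSelectedPages and the example urls are hypothetical, and the assumption that trackMS is exposed as a global function once the integration script has loaded is inferred from the call above, not confirmed by the application.

```javascript
// Call the given track function once per url and return how many were tracked.
// `track` is expected to be the trackMS function provided by the integration.
function trackSelectedPages(urls, track) {
  if (typeof track !== 'function') {
    return 0; // the integration script is not available yet
  }
  let tracked = 0;
  for (const url of urls) {
    track(url);
    tracked += 1;
  }
  return tracked;
}

// Hypothetical pages to import.
const pagesToImport = [
  'https://www.example.com/products/overview',
  'https://www.example.com/blog/first-post',
];

// In the browser (assuming trackMS is a global function):
// trackSelectedPages(pagesToImport, window.trackMS);
```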

 

Page Scraping

Once the url has been identified, the scraping of the page can start. The following settings control how the scraper works:

  • Use og tags for content enrichment: tells the scraper to use og tags (Open Graph Protocol) where possible.
  • Use metatag keywords: tells the scraper to try retrieving the keywords from specific html tags within the page.
  • Analyze dynamic language pages in: tells the scraper which language to use as preferred. A default value will be provided if the "lang" attribute cannot be retrieved from the html.
  • Scan pages using the following User Agent: sets a custom User Agent to be used by the scraper when scanning the website.

 

Data extraction according to the type of information

We will now look at the data within the page that is used to extract basic content information. To optimize the performance of Magic Site Integration and to have the URL content fully enriched, we recommend verifying that this information is present on the website pages you want to integrate.

 

Tags

If the flag “Use metatag keywords” is on, content tags will be extracted from the html.

<meta name="keywords" content="tag1 tag2 tag3 tag4">

If the flag "Use OG tags for content enrichment" is on too, content tags will be extracted from the following og tags: “video:tag”, “article:tag” or “book:tag”; otherwise they will be extracted from the "keywords" meta tag, as in the example above.

<meta property="og:video:tag" content="tag1 tag2 tag3 tag4" />

<meta property="og:article:tag" content="tag1 tag2 tag3 tag4" />

<meta property="og:book:tag" content="tag1 tag2 tag3 tag4" />
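The interplay of the two flags can be sketched as below. The function names and the input shape are assumptions made for the example. Two further assumptions are labeled in the code: tags are split on whitespace, following the format of the examples above, and the "keywords" meta tag is used when the OG flag is on but no og tag is found, a case the article does not spell out.

```javascript
// Split a "content" attribute value into individual tags.
// Assumption: tags are whitespace-separated, as in the article's examples.
function splitTags(content) {
  return content.trim().split(/\s+/).filter(Boolean);
}

// ogTagContents: "content" values of the og video/article/book tag metas found on the page.
// metaKeywords: "content" value of the "keywords" meta tag, if any.
function resolveTags({ useMetaKeywords, useOgTags, ogTagContents = [], metaKeywords = '' }) {
  if (!useMetaKeywords) return []; // tags are only extracted when this flag is on
  if (useOgTags && ogTagContents.length > 0) {
    return ogTagContents.flatMap(splitTags);
  }
  // Assumption: fall back to the keywords meta tag when no og tag is present.
  return splitTags(metaKeywords);
}

console.log(resolveTags({
  useMetaKeywords: true,
  useOgTags: true,
  ogTagContents: ['tag1 tag2'],
  metaKeywords: 'tag3 tag4',
})); // [ 'tag1', 'tag2' ]
```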

 

Title

The title will be extracted from the og:title tag, if it exists:

<meta property="og:title" content="Page Title" />

If this tag is not present, the title will be extracted from the html "title" tag:

<title>Page title</title>

If neither of those tags is present, the title will be created from the url.
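The title fallback chain can be sketched as follows. The article does not say how a title is created from the url, so titleFromUrl is purely an illustrative guess (last path segment, file extension removed, dashes and underscores turned into spaces); the function names and the input shape are assumptions as well.

```javascript
// Hypothetical way to derive a readable title from a url.
function titleFromUrl(pageUrl) {
  const url = new URL(pageUrl);
  const segments = url.pathname.split('/').filter(Boolean);
  if (segments.length === 0) return url.hostname; // e.g. the home page
  const last = decodeURIComponent(segments[segments.length - 1]);
  return last
    .replace(/\.[a-z0-9]+$/i, '') // drop a file extension such as ".html"
    .replace(/[-_]+/g, ' ');      // dashes and underscores become spaces
}

// og:title first, then the html title tag, then a title built from the url.
function resolveTitle({ ogTitle, htmlTitle, pageUrl }) {
  return ogTitle || htmlTitle || titleFromUrl(pageUrl);
}

console.log(resolveTitle({ pageUrl: 'https://www.example.com/blog/my-first-post.html' }));
// my first post
```

Because the generated fallback is unlikely to match what you would write yourself, providing at least a "title" tag on every page is the safer choice.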

 

Description

If the flag "Use OG tags for content enrichment" is enabled, content description will be extracted from the “og:description” tag (if present):

<meta property="og:description" content="Page description" />

If this tag is not present, the content description will be extracted from the meta description tag:

<meta name="description" content="Page description" />

 

Thumbnail

If the flag "Use OG tags for content enrichment" is enabled, content thumbnail will be extracted from the “og:image” tag.

<meta property="og:image" content="http://someimageurl.com/imageName.jpg" />

If the flag is not enabled or the og:image tag is not present, the thumbnail will be generated from a page screenshot.
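The same flag gates both the description and the thumbnail, so the two rules can be sketched together. The function names, the input shape and the returned object are assumptions made for the example. Using the meta description when the flag is disabled is also an assumption, since the article only describes the case in which the og tag is missing.

```javascript
// og:description when the flag is on and the tag exists, otherwise the meta description.
function resolveDescription({ useOgTags, ogDescription, metaDescription }) {
  if (useOgTags && ogDescription) return ogDescription;
  return metaDescription || null; // null: no description found on the page
}

// og:image when the flag is on and the tag exists, otherwise a page screenshot.
function resolveThumbnail({ useOgTags, ogImage }) {
  if (useOgTags && ogImage) return { source: 'og:image', url: ogImage };
  return { source: 'screenshot', url: null }; // generated by the scraper
}

console.log(resolveThumbnail({
  useOgTags: false,
  ogImage: 'http://someimageurl.com/imageName.jpg',
})); // { source: 'screenshot', url: null }
```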

 

Language

Language will be retrieved from the "lang" attribute of the html tag:

<html lang="en">
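To verify that a page carries the information described in this article, a small check like the one below can be run in the browser console. It is only a sketch: the function name and the list of selectors are assumptions built from the tags mentioned above, and it covers only one of the three og tag variants (article).

```javascript
// CSS selectors for the tags discussed in this article.
const RECOMMENDED_SELECTORS = {
  'og:url': 'meta[property="og:url"]',
  'canonical': 'link[rel="canonical"]',
  'keywords': 'meta[name="keywords"]',
  'og:article:tag': 'meta[property="og:article:tag"]',
  'og:title': 'meta[property="og:title"]',
  'title': 'title',
  'og:description': 'meta[property="og:description"]',
  'description': 'meta[name="description"]',
  'og:image': 'meta[property="og:image"]',
  'lang': 'html[lang]',
};

// Return the names of the recommended tags that the document does not contain.
// `doc` is any object with a querySelector method, such as the browser's `document`.
function findMissingTags(doc) {
  return Object.entries(RECOMMENDED_SELECTORS)
    .filter(([, selector]) => doc.querySelector(selector) === null)
    .map(([name]) => name);
}

// In the browser console:
// console.log(findMissingTags(document));
```

Not every missing entry is a problem: for example, a page with og:title does not need the fallback from the "title" tag, and og tags are ignored when the "Use OG tags for content enrichment" flag is off.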

 
