Audiences and Content Filtering

Defines the core concepts of Audiences and Content Filtering.

Roles and Audiences Overview

Once an external IDP has authenticated a user and returned an assertion to us, we can put that user data to work in filtering content for different audiences.

We do this by identifying each user as a member of one or more audiences. These audiences can then be used to filter content within Heretto CCMS and, ultimately, within the Heretto Deploy Portal. We'll follow up by creating DITAVAL files that include or exclude content on the Heretto Deploy Portal for site sections, topics, or even individual elements of the website.

What is an Audience?

An audience is any group of viewers that you want to filter content for. Filtering content means that you want to show only some content for one or more audiences, or that you want to ensure that a specific audience cannot see some portion of the content.

Typically, an audience is defined as one or more user roles. For instance, your organization may have various roles, such as Doctor, Dentist, and so forth. You can make each of those roles a separate audience, or you can combine them into a single audience. For instance, you might group both roles into an audience of "Health Providers".
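As a rough illustration of grouping roles into an audience, the sketch below maps role names to audience names. The mapping table and the function name audiencesForRoles are hypothetical and exist only for this example; the role names and the "health_providers" audience are taken from this section.

```typescript
// Hypothetical lookup table: IDP role name -> audience name.
// Both roles from the example are grouped into one audience.
const ROLE_TO_AUDIENCE: Record<string, string> = {
  Doctor: "health_providers",
  Dentist: "health_providers",
};

// Return the de-duplicated audiences for a user's roles.
// Roles without an entry in the table contribute no audience.
function audiencesForRoles(roles: string[]): string[] {
  const audiences: string[] = [];
  for (const role of roles) {
    const audience = ROLE_TO_AUDIENCE[role];
    if (typeof audience === "string" && audiences.indexOf(audience) === -1) {
      audiences.push(audience);
    }
  }
  return audiences;
}
```

With this table, a user holding both the Doctor and Dentist roles ends up in a single audience, while a user with an unmapped role ends up in none.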

What is Content Filtering?

Content filtering is a method for showing different content to different audiences; it is often referred to as personalization. A user who can view a piece of content is said to be authorized to view it. To filter content, we must prepare a few things:

  1. Have a mechanism for authenticating users, unless all of the content is going to be public

    Note: We will use OpenID Connect to authenticate users with our IDP in this example.
  2. Categorize the content to specify which audiences it is or is not intended for

  3. Determine how we will translate user role information that we get from the IDP into audience attributes.

    Tip: If you have direct control over the IDP, you can possibly skip this step if you ensure the payload for assertions follows a naming convention which will be described later in this document.
  4. Configure your sitemap so that it has rules that are clearly specified

To get started, it is important to understand that DITA has powerful tools for defining items such as the platforms or audiences that content is filtered by. In Heretto Deploy Portal, you can apply the audience attribute at whatever level of specificity you need. Consider the following snippets of the sitemap and the ditaval file that the sitemap refers to. We use the "audience" attribute at the sitesection level to mark the content that we want to filter. We also have to specify <data> elements, as detailed below, so that the proper ditavals are applied.

Warning: Familiarize yourself with DITA conditional processing features before you attempt to filter your site's content. See the conditional processing information at OASIS Open.
<!-- sample_sitemap.ditamap -->
<sitesection audience="health_providers">
    <topicmeta>
        <navtitle>Authenticated Section</navtitle>
    </topicmeta>
    <mapref format="ditamap" href="../../Sample_Content/Policy_Manual/data_security_and_retention_policy.ditamap"/>
</sitesection>
...
<data href="filter/private.ditaval" name="content-api-audience" value="health_providers"/>
<!-- filter/private.ditaval -->
<val>
    <prop action="include" att="audience" val="health_providers"/>
</val>
Important: The convention for specifying audience ditavals is to create <data> elements in your sitemap that have the reserved name of content-api-audience. These data elements will only be picked up and interpreted properly if they comply with this convention. This data element then references the ditaval file in the href attribute.
Important: There is an additional <data> element that you can define, with the name content-api-default-audience (as used in the example below). There can only be one of these. It is very typical to set its value to an empty string, meaning that its ditaval determines what visitors who are NOT authenticated see when they visit the site. In the example below, general content is included by default for all visitors, while content marked for "health_providers" is hidden.
<data href="filter/public_only_filter.ditaval" name="content-api-default-audience" value=""/>
<!-- filter/public_only_filter.ditaval -->
<val>
    <prop action="include" att="audience" backcolor="" color="" style="" val=""/>
    <prop action="exclude" att="audience" backcolor="" color="" style="" val="health_providers"/>
</val>

Inspecting the private.ditaval file shows that content referenced by this sitemap whose "audience" attribute is set to "health_providers" is included for members of the "health_providers" audience. Because public_only_filter.ditaval excludes that same content by default, it is visible only to members of that audience.
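To make the include and exclude behavior concrete, here is a minimal sketch of how a single filter attribute can be evaluated. It follows the general DITA conditional processing rule (an element is excluded only when every value of the attribute is set to exclude, and unlisted values default to include) and is a simplified model, not Deploy Portal's actual implementation. The Prop type and the isIncluded function are hypothetical names; the two rule sets mirror the private.ditaval and public_only_filter.ditaval examples.

```typescript
type Action = "include" | "exclude";

// One <prop> rule from a ditaval file, reduced to the fields used here.
interface Prop {
  action: Action;
  att: string;
  val: string;
}

// Decide whether an element survives filtering on one attribute.
// The element is excluded only when the attribute is present and every
// one of its space-separated values is set to "exclude".
// Values that no rule mentions default to "include".
function isIncluded(attValue: string | undefined, att: string, props: Prop[]): boolean {
  if (attValue === undefined || attValue.trim() === "") {
    return true; // content with no audience attribute is never filtered out
  }
  const values = attValue.trim().split(/\s+/);
  return values.some((value) => {
    const rule = props.filter((p) => p.att === att && p.val === value)[0];
    return rule === undefined || rule.action === "include";
  });
}

// Rules equivalent to filter/private.ditaval.
const privateFilter: Prop[] = [
  { action: "include", att: "audience", val: "health_providers" },
];

// Rules equivalent to filter/public_only_filter.ditaval.
const publicFilter: Prop[] = [
  { action: "include", att: "audience", val: "" },
  { action: "exclude", att: "audience", val: "health_providers" },
];
```

Under this model, a sitesection with audience="health_providers" passes privateFilter and is dropped by publicFilter, while content with no audience attribute passes both.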

Notice in the example sitesection below how the audience attribute is set to "health_providers". When we evaluate all of the pieces, we see that we have:

  1. One or more <data> elements that follow our conventions

  2. One or more ditaval files that correspond to the <data> elements

  3. One or more audience attributes specified in the content

<sitesection audience="health_providers">
    <topicmeta>
        <navtitle>Authenticated Section</navtitle>
    </topicmeta>
    <mapref format="ditamap" href="../../Sample_Content/Policy_Manual/data_security_and_retention_policy.ditamap"/>
</sitesection>

Search Indexing

Deploy Portal includes a powerful search engine that indexes content separately for each audience grouping. For instance, it indexes the public content and serves those search results to unauthenticated users, and it separately indexes the content that is available to "health_providers", given the example that we've been using.

As a result, once a user is logged into the system, they receive only search results that they can access. This avoids the awkward situation where users cannot view content that they found in search results. This happens automatically: beyond defining the audiences properly as described in this document, there is nothing further to configure. If all of a site's content is public (i.e. there are no filtering rules), then all content in the sitemap is available to all users.
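The idea of one index per audience can be pictured with the conceptual sketch below. It is not how Deploy Portal's search engine is implemented; the Page and SearchIndexes types and the buildIndexes function are hypothetical, and the sketch assumes that an audience's index contains the public pages plus the pages marked for that audience.

```typescript
interface Page {
  url: string;
  audience?: string; // undefined means public content
}

interface SearchIndexes {
  publicIndex: string[]; // served to unauthenticated visitors
  byAudience: Record<string, string[]>; // one list per audience name
}

// Build a separate list of searchable pages for each audience.
function buildIndexes(pages: Page[], audiences: string[]): SearchIndexes {
  const publicIndex = pages
    .filter((page) => page.audience === undefined)
    .map((page) => page.url);
  const byAudience: Record<string, string[]> = {};
  for (const audience of audiences) {
    const restricted = pages
      .filter((page) => page.audience === audience)
      .map((page) => page.url);
    // Assumption: members of an audience also see all public content.
    byAudience[audience] = publicIndex.concat(restricted);
  }
  return { publicIndex, byAudience };
}
```

Because each visitor is only ever served results from the index that matches their audience, a result they cannot open never appears.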

Understanding JWT and users, roles, and audiences

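JWT (JSON Web Token, pronounced "jot") is a standard for representing a set of claims as a JSON object that is signed and, optionally, encrypted. The token is transmitted as base64url-encoded text. Note that a token that is signed but not encrypted can still be decoded and read by anyone who holds it; the signature only lets the recipient detect tampering. For instance, this JSON object: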

{
  "sub": "1234567890",
  "name": "John Doe",
  "iat": 1516239022,
  "content_audiences": ["healthcare_providers"]
}

looks something like this once encoded:

eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyLCJjb250ZW50X2F1ZGllbmNlcyI6WyJoZWFsdGhjYXJlX3Byb3ZpZGVycyJdfQ.zNv5uUJcL_lwcl0Z5FOAewIYaW8K95flAG70mxPZ7uM
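To see what the encoded token carries, the following sketch decodes its middle segment (the payload) back into JSON. It only decodes; it does not verify the signature, so it is suitable for inspection and never for making authorization decisions. The helper names base64UrlDecode and decodeJwtPayload are hypothetical, and the decoder is written by hand so that the example has no dependencies.

```typescript
// base64url alphabet (RFC 4648, section 5): "-" and "_" replace "+" and "/".
const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decode base64url text into a UTF-8 string. Padding characters are skipped.
function base64UrlDecode(input: string): string {
  let buffer = 0; // bits that have not yet formed a whole byte
  let bitCount = 0;
  let percentEncoded = "";
  for (let i = 0; i < input.length; i++) {
    const value = BASE64URL_ALPHABET.indexOf(input.charAt(i));
    if (value === -1) continue; // "=" padding or characters outside the alphabet
    buffer = (buffer << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      const byte = (buffer >> bitCount) & 0xff;
      buffer &= (1 << bitCount) - 1; // keep only the leftover bits
      percentEncoded += "%" + (byte < 16 ? "0" : "") + byte.toString(16);
    }
  }
  return decodeURIComponent(percentEncoded); // interprets the bytes as UTF-8
}

// Return the payload (the middle segment) of a JWT as a plain object.
// This does NOT verify the signature.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const segments = token.split(".");
  const payloadSegment = segments[1];
  if (segments.length !== 3 || payloadSegment === undefined) {
    throw new Error("Expected a token of the form header.payload.signature");
  }
  return JSON.parse(base64UrlDecode(payloadSegment)) as Record<string, unknown>;
}

// The example token from this section.
const exampleToken =
  "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9." +
  "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyLCJjb250ZW50X2F1ZGllbmNlcyI6WyJoZWFsdGhjYXJlX3Byb3ZpZGVycyJdfQ." +
  "zNv5uUJcL_lwcl0Z5FOAewIYaW8K95flAG70mxPZ7uM";

// examplePayload["content_audiences"] holds ["healthcare_providers"].
const examplePayload = decodeJwtPayload(exampleToken);
```

In a real deployment the signature must be checked with a maintained JWT library before any claim is trusted.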

Several kinds of data are typically encoded into the payload of a JWT, including:

  1. User Data (e.g. email address, username, job description)

  2. Roles (e.g. administrator, DBA, doctor)

  3. Audiences (e.g. "healthcare_providers")

Deploy Portal picks up and uses some standard fields if they are defined, for instance "content_audiences", which should be a JSON list. If you have direct control over the payload fields that your IDP delivers in its assertions, the best option is to use the default field names, because no additional configuration is then required. If you are unable or unwilling to change these names to Heretto's conventions, you can instead define the names of the fields to be used for each of the user, role, and audience purposes, as detailed below.

The full documentation for all configurable auth strategy fields is in IAuthStrategy.json; we'll analyze a few of these fields here for clarity.

First, let's examine the audience claim. Its documentation is as follows:

/**
     * @description JWT key name for accessing the proper field to use the content/audiences for the logged in user.
     * @default "content_audiences"
     * @example
     * If the incoming JWT is formed as:
     * {"exp": 2147483647, "content_audiences": ["private"]}
     * No additional configuration is required.
     *
     * If the incoming JWT is formed as:
     * {"exp": 2147483647, "https://jorsek.com/content/audiences": ["private"]}
     *
     * the authStrategy object would need the following parameter defined:
     * "authStrategy": {
     *   "audienceClaim": "https://jorsek.com/content/audiences"
     * }
     *
     */
    audienceClaim?: string;
    

This indicates that the default field name for audienceClaim is "content_audiences", and that the value of that field is a JSON list (it must be enclosed in square brackets).

However, if your JSON payload stores audiences under a field named "audienceNames", then you only have to specify audienceClaim: "audienceNames" in your JSON configuration for this strategy. Deploy Portal will then map the claim named "audienceNames" to the internal claim name "content_audiences". For instance, if part of the JWT claim for a user is:

{"audienceNames": ["healthcare_providers"]}

then that user will see the content that is provisioned for "healthcare_providers", as this document has described.
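The claim lookup described above can be sketched as follows. This is an illustration of the documented behavior rather than Deploy Portal's code: the AuthStrategyClaims type and the resolveAudiences function are hypothetical names, and only the audienceClaim option and its "content_audiences" default come from the documentation quoted above.

```typescript
// The one auth strategy option used in this sketch.
interface AuthStrategyClaims {
  audienceClaim?: string;
}

const DEFAULT_AUDIENCE_CLAIM = "content_audiences";

// Read the audience list from a decoded JWT payload, using the configured
// claim name when one is given and the default name otherwise.
// Anything that is not a list of strings is treated as "no audiences".
function resolveAudiences(
  payload: Record<string, unknown>,
  strategy: AuthStrategyClaims = {}
): string[] {
  const claimName =
    strategy.audienceClaim !== undefined ? strategy.audienceClaim : DEFAULT_AUDIENCE_CLAIM;
  const value = payload[claimName];
  if (!Array.isArray(value)) {
    return [];
  }
  const audiences: string[] = [];
  for (const entry of value) {
    if (typeof entry === "string") {
      audiences.push(entry);
    }
  }
  return audiences;
}
```

Calling resolveAudiences with the payload {"audienceNames": ["healthcare_providers"]} and the strategy {audienceClaim: "audienceNames"} yields the same list that a payload using the default "content_audiences" field would yield with no configuration at all.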

The same set of rules holds true for these other fields, although they are not used directly for audience filtering:

/**
     * @description JWT key name for accessing the proper field to use the portal_role for the logged in user.
     * @default "portal_role"
     * @example
     * If the incoming JWT is formed as:
     * {"exp": 2147483647, "portal_role": "contributor"}
     * No additional configuration is required.
     *
     * If the incoming JWT is formed as:
     * {"exp": 2147483647, "https://jorsek.com/portal_role": "contributor"}
     *
     * the authStrategy object would need the following parameter defined:
     * "authStrategy": {
     *   "roleClaim": "https://jorsek.com/portal_role"
     * }
     */
    roleClaim?: string;
    /**
     * @description JWT key name for accessing the proper field to use the ezd_username for the logged in user.
     * @default email||username||sub||hd;
     * @example
     * If the incoming JWT is formed as:
     * {"exp": 2147483647, "email": "contributor"}
     * No additional configuration is required.
     *
     * If the incoming JWT is formed as:
     * {"exp": 2147483647, "https://jorsek.com/ezd_username": "user@contoso.com"}
     *
     * the authStrategy object would need the following parameter defined:
     * "authStrategy": {
     *   "userClaim": "https://jorsek.com/ezd_username"
     * }
     */
    userClaim?: string;
Important: The snippets of code above are part of a larger configuration object named authStrategy. You must define this JSON object in your config.json file to configure your system for SSO authentication. If you can control the names of these fields in your IDP's payload, you can use the default field names specified above without configuring them explicitly; otherwise, you must configure them explicitly.
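As a closing illustration, the sketch below resolves the role and user claims in the same way. It rests on two assumptions that the documentation above does not confirm: that the userClaim default "email||username||sub||hd" means the keys are tried in that order, and that a configured userClaim replaces the defaults entirely. The type and function names are hypothetical.

```typescript
// The two auth strategy options used in this sketch.
interface AuthStrategyClaims {
  roleClaim?: string;
  userClaim?: string;
}

const DEFAULT_ROLE_CLAIM = "portal_role";
// Assumed reading of the documented default "email||username||sub||hd".
const DEFAULT_USER_CLAIMS = ["email", "username", "sub", "hd"];

// Return the claim value if it is a non-empty string, otherwise undefined.
function readString(payload: Record<string, unknown>, key: string): string | undefined {
  const value = payload[key];
  return typeof value === "string" && value !== "" ? value : undefined;
}

// Look up the portal role under the configured or default claim name.
function resolveRole(
  payload: Record<string, unknown>,
  strategy: AuthStrategyClaims = {}
): string | undefined {
  const key = strategy.roleClaim !== undefined ? strategy.roleClaim : DEFAULT_ROLE_CLAIM;
  return readString(payload, key);
}

// Look up the user name: the configured claim if set, else the first
// default key that holds a non-empty string.
function resolveUser(
  payload: Record<string, unknown>,
  strategy: AuthStrategyClaims = {}
): string | undefined {
  const keys = strategy.userClaim !== undefined ? [strategy.userClaim] : DEFAULT_USER_CLAIMS;
  for (const key of keys) {
    const found = readString(payload, key);
    if (found !== undefined) {
      return found;
    }
  }
  return undefined;
}
```

For a payload that carries only "sub", resolveUser falls through email and username and returns the subject, while a payload using "https://jorsek.com/ezd_username" needs that name passed as userClaim.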