Wednesday, 11 December 2013

Replication in AEM






Introduction
Replication agents are central to Adobe Experience Manager (AEM) as the mechanism used to:
  • Publish (activate) content from an author to a publish environment.
  • Explicitly flush content from the Dispatcher cache.
  • Return user input (for example, form input) from the publish environment to the author environment (under control of the author environment).

Replicating from Author to Publish


1- The author requests that certain content be published (activated); this can be initiated by a manual request or by preconfigured automatic triggers.
2- The request is passed to the appropriate default replication agent; an environment can have several default agents, which will always be selected for such actions.
3- The replication agent "packages" the content and places it in the replication queue.
4- In the Websites tab, the colored status indicator is set for the individual pages.
5- The content is lifted from the queue and transported to the publish environment using the configured protocol; usually this is HTTP.
6- A servlet in the publish environment receives the request and publishes the received content; the default servlet is http://localhost:4503/bin/receive.
Note:- Multiple author and publish environments can be configured.

Determining Page Publication Status

The colors next to pages in the Websites console indicate publication status.
Color     Description
Green     Publication was successful. Content is published.
Yellow    Publication is pending. Confirmation of publication has not yet been received by the system.
Red       Publication failed. There is no connection with the publish instance. This can also mean that the content was deactivated.
Blank     This page has never been published.
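
The flow and the status indicators described above can be pictured with a small model. This is only a sketch, not AEM code: the ToyAgent class, the Status enum and the transport callable are hypothetical names introduced for illustration, and the model does not retry failed items the way a real agent would.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    GREEN = "published"
    YELLOW = "pending"
    RED = "failed"
    BLANK = "never published"


class ToyAgent:
    """Hypothetical stand-in for a default replication agent."""

    def __init__(self, transport):
        self.transport = transport  # assumed callable: path -> True on success
        self.queue = deque()        # the replication queue
        self.status = {}            # page path -> Status

    def activate(self, path):
        # Steps 1-4: "package" the content, queue it, mark the page as pending.
        self.queue.append(path)
        self.status[path] = Status.YELLOW

    def process(self):
        # Steps 5-6: lift each item from the queue and transport it.
        while self.queue:
            path = self.queue.popleft()
            delivered = self.transport(path)
            self.status[path] = Status.GREEN if delivered else Status.RED

    def status_of(self, path):
        # A page that was never activated has no indicator at all.
        return self.status.get(path, Status.BLANK)
```

A page stays yellow between activate() and process(), which mirrors the "pending" state in the table.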



Replicating from Publish to Author

Features such as comments and forms allow users to enter information on a publish instance. For this, a type of replication is needed to return the information to the author environment, from where it is redistributed to other publish environments. However, due to security considerations, any traffic from the publish to the author environment must be strictly controlled.
This is known as reverse replication and functions using an agent in the publish environment which references the author environment. This agent places the input into an outbox. This outbox is matched with replication listeners in the author environment. The listeners poll the outboxes to collect any input made and then distribute it as necessary. This ensures that the author environment controls all traffic.
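
The pull model described above can be sketched as follows. The class and method names are hypothetical and exist only to illustrate the idea that the publish side stores input passively while the author side initiates every transfer; redistribution to other publish environments is not modeled.

```python
class PublishInstance:
    """Toy publish environment with an outbox (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.outbox = []  # user input waits here until author collects it

    def submit(self, item):
        # The publish side only stores input; it never contacts author.
        self.outbox.append(item)

    def drain_outbox(self):
        # Hand over everything collected so far and start with an empty outbox.
        items, self.outbox = self.outbox, []
        return items


class AuthorInstance:
    """Toy author environment that polls the publish outboxes."""

    def __init__(self, publishers):
        self.publishers = publishers
        self.received = []

    def poll(self):
        # Author initiates every transfer by polling each outbox in turn.
        collected = []
        for publisher in self.publishers:
            for item in publisher.drain_outbox():
                collected.append((publisher.name, item))
        self.received.extend(collected)
        return collected
```

Calling poll() twice in a row returns an empty list the second time, because the first poll already emptied the outboxes.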

Replication (Author to Publish)


1-       Navigate to the support page on the author environment.
    http://localhost:4502/content/geometrixx/en/support.html
2-      Edit the page to add some new text.
3-      Activate Page to publish the changes.
4-      Open the support page on the publish environment:
    http://localhost:4503/content/geometrixx/en/support.html
5-      You can now see the changes that you entered on author.
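
Step 3 can also be triggered over HTTP instead of through the UI. The sketch below only builds the request and does not send it. It assumes the /bin/replicate.json endpoint with cmd and path form parameters on the author instance, and the admin/admin credentials of a local development install; none of these appear in the text above, so check them against your own installation.

```python
import base64
import urllib.parse
import urllib.request


def build_activate_request(path, host="http://localhost:4502",
                           user="admin", password="admin"):
    """Build (but do not send) a POST that asks author to activate a page."""
    # Assumed form parameters: cmd=Activate and the repository path of the page.
    body = urllib.parse.urlencode({"cmd": "Activate", "path": path}).encode("ascii")
    # Basic authentication header; the credentials are placeholders.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{host}/bin/replicate.json", data=body, method="POST"
    )
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_activate_request("/content/geometrixx/en/support")
print(request.get_method(), request.full_url)
```

To actually send it against a running author instance, pass the request to urllib.request.urlopen().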

This replication is actioned from the author environment by the:

Reverse Replication (Publish to Author)


Reverse Replication (outbox)
This agent stores reverse replicated content in the outbox (repo://var/replication/outbox), which acts as a queue.
Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:

http://localhost:4502/etc/replication/agents.publish/outbox.html
This agent transfers content to the author environment, by communicating with the:
Reverse Replication Agent (publish_reverse)
This agent polls the default publish instance to retrieve reverse replicated content from the outbox.
Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:
http://localhost:4502/etc/replication/agents.author/publish_reverse.html

Replication Agents - Configuration Parameters

When configuring a replication agent from the Tools console, five tabs are available within the dialog:
1:-Settings
2:-Transport
3:-Proxy
4:-Extended
5:-Triggers



1:-Settings

Name
Description
Enabled:
Indicates whether the replication agent is currently enabled.
When the agent is enabled the queue will be shown as:
  • Active when items are being processed.
  • Idle when the queue is empty.
  • Blocked when items are in the queue, but cannot be processed; for example, when the receiving queue is disabled.
Serialization Type:
The type of serialization:
  • Default: Set if the agent is to be automatically selected.
  • Dispatcher Flush: Select this if the agent is to be used for flushing the dispatcher cache.
Retry Delay:
The delay (waiting time in milliseconds) between two retries, should a problem be encountered.
Default: 60000
Agent User Id:
Log Level
Specifies the level of detail to be used for log messages.
  • Error: only errors will be logged
  • Info: errors, warnings and other informational messages will be logged
  • Debug: a high level of detail will be used in the messages, primarily for debug purposes
Default: Info
Use for reverse replication
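
The queue states listed under Enabled and the fixed Retry Delay can be summarized in a short sketch. The function names are hypothetical, and the "Disabled" label is an assumption, since the text above only describes the states of an enabled agent.

```python
def queue_status(enabled, queued_items, can_process):
    """Return the queue label for an agent, following the Enabled description."""
    if not enabled:
        return "Disabled"  # assumed label; not taken from the text above
    if queued_items == 0:
        return "Idle"      # the queue is empty
    # Items are waiting: either they are being processed or something blocks them.
    return "Active" if can_process else "Blocked"


def retry_schedule(first_failure_ms, retries, retry_delay_ms=60000):
    """Times (in milliseconds) at which retries would run with a fixed delay."""
    return [first_failure_ms + n * retry_delay_ms for n in range(1, retries + 1)]


print(queue_status(True, 3, False))
print(retry_schedule(0, 3))
```

With the default Retry Delay of 60000, three retries after a failure at time 0 fall at one, two and three minutes.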

2:-Transport

URI
This specifies the receiving servlet at the target location. In particular, you can specify the hostname (or alias) and context path to the target instance here.
For example:
  • A Default Agent may replicate to http://localhost:4503/bin/receive
  • A Dispatcher Flush agent may replicate to http://localhost:8000/dispatcher/invalidate.cache
The protocol specified here (HTTP or HTTPS) will determine the transport method.
For Dispatcher Flush agents, the URI property is used only if you use path-based virtual host entries to differentiate between farms; in that case, use this field to target the farm to invalidate. For example, farm #1 has a virtual host of www.mysite.com/path1/* and farm #2 has a virtual host of www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.
NTLM Domain (NT LAN Manager domain)
NTLM Host
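
To see how the URI field breaks down into protocol, host and servlet path, the two example URIs above can be taken apart with the Python standard library. The helper names are hypothetical; the farm helper simply reproduces the /path1/invalidate.cache pattern from the farm example.

```python
from urllib.parse import urlsplit


def describe_transport(uri):
    """Split an agent URI into the parts the Transport tab talks about."""
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme.upper(),  # HTTP or HTTPS selects the transport
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,                # the receiving servlet
    }


def farm_flush_path(farm_prefix):
    """Build the path that targets one farm, e.g. /path1/invalidate.cache."""
    return farm_prefix.rstrip("/") + "/invalidate.cache"


print(describe_transport("http://localhost:4503/bin/receive"))
print(farm_flush_path("/path1"))
```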

3:-Proxy


4:-Extended

·  Interface
Here you can define the socket interface to bind to.
This sets the local address to be used when creating connections. If this is not set, the default address will be used. This is useful for specifying the interface to use on multi-homed or clustered systems.
·  HTTP Method
The HTTP method to be used.
For a Dispatcher Flush agent this is nearly always GET and should not be changed (POST would be another possible value).
·  HTTP Headers
These are used for Dispatcher Flush agents and specify elements that must be flushed.
For a Dispatcher Flush agent the three standard entries should not need changing:
  • CQ-Action:{action}
  • CQ-Handle:{path}
  • CQ-Path:{path}
These are used, as appropriate, to indicate the action to be used when flushing the handle or path. The sub-parameters are dynamic:
  • {action} indicates a replication action
  • {path} indicates a path
They are substituted by the path/action relevant to the request and therefore do not need to be "hardcoded".


Note:-
If you have installed AEM in a context other than the recommended default context, then you will need to register the context in the HTTP Headers. For example:
    CQ-Handle:/<yourContext>{path}
·  Close Connection
Enable to close the connection after each request.
·  Connect Timeout
Timeout (in milliseconds) to be applied when trying to establish a connection.
·  Socket Timeout
Timeout (in milliseconds) to be applied when waiting for traffic after a connection has been established.
·  Protocol Version
Version of the protocol; for example 1.0 for HTTP/1.0.
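
The placeholder substitution described under HTTP Headers can be illustrated with a few lines of Python. This is a sketch of the idea only, not how AEM implements it; the expand_headers function and the /mycontext prefix are hypothetical.

```python
def expand_headers(templates, action, path):
    """Replace {action} and {path} in 'Name:value' header templates."""
    headers = {}
    for template in templates:
        # Split at the first colon only, so values may contain further colons.
        name, _, value = template.partition(":")
        value = value.strip().replace("{action}", action).replace("{path}", path)
        headers[name.strip()] = value
    return headers


standard = ["CQ-Action:{action}", "CQ-Handle:{path}", "CQ-Path:{path}"]
print(expand_headers(standard, "Activate", "/content/geometrixx/en/support"))

# With a non-default context, the prefix is written into the template itself.
print(expand_headers(["CQ-Handle:/mycontext{path}"], "Activate", "/content/a"))
```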

5:-Triggers

These settings are used to define triggers for automated replication:

Ignore default

If checked, the agent is excluded from default replication; this means it will not be used if a content author issues a replication action.

On Modification

Here a replication by this agent will be automatically triggered when a page is modified. This is mainly used for Dispatcher Flush agents, but also for reverse replication.

On Distribute

If checked, the agent will automatically replicate any content that is marked for distribution when it is modified.

On-/Offtime reached

This will trigger automatic replication (to activate or deactivate a page as appropriate) when the ontimes or offtimes defined for a page occur. This is primarily used for Dispatcher Flush agents.

On Receive

If checked, the agent will chain replicate whenever receiving replication events.

No Status Update

When checked the agent will not force a replication status update.

No Versioning

When checked the agent will not force versioning of activated pages.
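
How some of these flags decide which agents react to an event can be sketched as below. The AgentTriggers class and the event names ("author_request", "modification", "receive") are hypothetical labels for this illustration; only three of the flags are modeled.

```python
from dataclasses import dataclass


@dataclass
class AgentTriggers:
    name: str
    enabled: bool = True
    ignore_default: bool = False   # "Ignore default"
    on_modification: bool = False  # "On Modification"
    on_receive: bool = False       # "On Receive"


def agents_for(event, agents):
    """Return the names of the agents that would react to the given event."""
    selected = []
    for agent in agents:
        if not agent.enabled:
            continue
        if event == "author_request" and not agent.ignore_default:
            selected.append(agent.name)   # normal activation by a content author
        elif event == "modification" and agent.on_modification:
            selected.append(agent.name)   # automatic trigger on page change
        elif event == "receive" and agent.on_receive:
            selected.append(agent.name)   # chain replication
    return selected


agents = [
    AgentTriggers("publish"),
    AgentTriggers("flush", ignore_default=True, on_modification=True),
]
print(agents_for("author_request", agents))
print(agents_for("modification", agents))
```

In this example the flush agent is skipped for a manual activation because Ignore default is set, but it reacts to modifications.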








Configuring your Replication Agents


Configuring your Replication Agents from the Author Environment


From the Tools tab in the author environment you can configure replication agents that reside in either the author environment (Agents on author) or the publish environment (Agents on publish). The following procedures illustrate the configuration of an agent for the author environment, but can be used for both.
When a dispatcher handles HTTP requests for author or publish instances, the HTTP request from the replication agent must include the PATH header. In addition to the following procedure, you must add the PATH header to the dispatcher list of client headers. (See /clientheaders (Client Headers).)

Configuring Reverse Replication

Reverse replication is used to get user content generated on a publish instance back to an author instance. This is commonly used for moderated forums, blogs, surveys and registration forms, amongst others.
For security reasons, most network topologies do not allow connections from the "Demilitarized Zone" (a subnetwork that exposes the external services to an untrusted network such as the Internet) into the internal network.
As the publish environment is usually in the DMZ, to get content back to the author environment the connection must be initiated from the author instance. This is done with:
  • an outbox in the publish environment where the content is placed.
  • an agent (publish) in the author environment which periodically polls the outbox for new content.

To do this you need:

A reverse replication agent in the author environment
This acts as the active component to collect information from the outbox in the publish environment.
If you want to use reverse replication then ensure that this agent is activated.






A reverse replication agent in the publish environment (an outbox)
This is the passive element as it acts as an "outbox". User input is placed here, from where it is collected by the agent in the author environment.


 

Configuring Replication for Multiple Publish Instances


Click Edit to open the Agent Settings dialog. The Serialization Type is already defined as Default; this must remain so.
  • In the Settings tab:
    • Activate Enabled.
    • Enter a Description.
    • Set the Retry Delay to 60000.
    • Leave the Serialization Type as Default.
  • In the Transport tab:
    • Enter the required URI for the new publish instance; for example,
          http://localhost:4504/bin/receive.
    • Enter the site-specific user account used for replication.
    • You can configure other parameters as required.
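
The settings from this procedure can be written down as data and checked. This is a sketch with hypothetical names (PublishAgent, check_agent, activation_targets); real agents are configured in the dialog, not through code like this, and port 4505 is only an example of a disabled third agent.

```python
from dataclasses import dataclass


@dataclass
class PublishAgent:
    description: str
    uri: str
    enabled: bool = True
    retry_delay_ms: int = 60000
    serialization_type: str = "Default"


def check_agent(agent):
    """Return a list of problems that would stop this agent from publishing."""
    problems = []
    if not agent.enabled:
        problems.append("agent is not enabled")
    if agent.serialization_type != "Default":
        problems.append("Serialization Type must be Default")
    if not agent.uri.startswith(("http://", "https://")):
        problems.append("URI must use HTTP or HTTPS")
    return problems


def activation_targets(agents):
    """URIs of all agents that pass the checks; each one gets the content."""
    return [agent.uri for agent in agents if not check_agent(agent)]


agents = [
    PublishAgent("first publish instance", "http://localhost:4503/bin/receive"),
    PublishAgent("second publish instance", "http://localhost:4504/bin/receive"),
    PublishAgent("switched off", "http://localhost:4505/bin/receive", enabled=False),
]
print(activation_targets(agents))
```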



Configuring a Dispatcher Flush agent


Default agents are included with the installation. However, certain configuration is still needed and the same applies if you are defining a new agent:
Click Edit - the Agent Settings dialog will open:
  • In the Settings tab:
    • Activate Enabled.
    • Enter a Description.
    • Leave the Serialization Type as Dispatcher Flush, or set it as such if creating a new agent.
  • In the Transport tab:
    • Enter the required URI for the Dispatcher; for example,
          http://localhost:80/dispatcher/invalidate.cache.
    • Enter the site-specific user account used for replication.
    • You can configure other parameters as required.
For Dispatcher Flush agents, the URI property is used only if you use path-based virtual host entries to differentiate between farms; in that case, use this field to target the farm to invalidate. For example, farm #1 has a virtual host of www.mysite.com/path1/* and farm #2 has a virtual host of www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.
Note:-
If you have installed AEM in a context other than the recommended default context, then you need to configure the HTTP Headers in the Extended tab.
·  Click OK to save the changes.
·  Return to the Tools tab; from here you can activate the Dispatcher Flush agent (Agents on publish).
The Dispatcher Flush replication agent is not active on author. You can access the same page in the publish environment by using the equivalent URI; for example, http://localhost:4503/etc/replication/agents.publish/flush.html.
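
For testing, the kind of request a Dispatcher Flush agent sends can be approximated by hand. The sketch below only builds the request. It combines the example URI from this section with the three standard headers and the GET method described in the Extended tab; the function name and the "Activate" action value are assumptions made for illustration.

```python
import urllib.request


def build_flush_request(page_path, dispatcher="http://localhost:80",
                        action="Activate"):
    """Build (but do not send) a cache invalidation request for one page."""
    request = urllib.request.Request(
        f"{dispatcher}/dispatcher/invalidate.cache", method="GET"
    )
    # The three standard headers, with {action} and {path} already filled in.
    request.add_header("CQ-Action", action)
    request.add_header("CQ-Handle", page_path)
    request.add_header("CQ-Path", page_path)
    return request


request = build_flush_request("/content/geometrixx/en/support")
print(request.get_method(), request.full_url)
for name, value in request.header_items():
    print(name, value)
```

Note that urllib normalizes the capitalization of header names when storing them; HTTP header names are case-insensitive, so this does not change the meaning of the request.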

Controlling Access to Replication Agents

Access to the pages used to configure the replication agents can be controlled by using user and/or group page permissions on the /etc/replication node.
Setting such permissions will not affect users replicating content (e.g. from the Websites console or sidekick option). The replication framework does not use the "user session" of the current user to access replication agents when replicating pages.

Configuring your Replication Agents from CRXDE Lite


Various parameters of your replication agents can be configured using CRXDE Lite.
If you navigate to /etc/replication you can see the following three nodes:
  • agents.author
  • agents.publish
  • treeactivation
The two agents nodes (agents.author and agents.publish) hold configuration information about the appropriate environment, and are only active when that environment is running. For example, agents.publish will only be used in the publish environment. The following screenshot shows the publish agent in the author environment, as included with AEM WCM:



Note:-     Dispatcher is Adobe Experience Manager's caching and/or load balancing tool. Using AEM's Dispatcher also helps to protect your AEM server from attack. Therefore, you can increase the security of your AEM instance by using the Dispatcher in conjunction with an enterprise-class web server. 



