Replication in AEM
Replication agents are central to Adobe Experience Manager (AEM) as the mechanism used to:
- Publish (activate) content from an author to a publish environment.
- Explicitly flush content from the Dispatcher cache.
- Return user input (for example, form input) from the publish environment to the author environment (under control of the author environment).
Replicating from Author to Publish
1- The author requests that certain content be published (activated); this can be initiated by a manual request, or by automatic triggers which have been preconfigured.
2- The request is passed to the appropriate default replication agent; an environment can have several default agents which will always be selected for such actions.
3- The replication agent "packages" the content and places it in the replication queue.
4- In the Websites tab the colored status indicator is set for the individual pages.
5- The content is lifted from the queue and transported to the publish environment using the configured protocol; usually this is HTTP.
6- A servlet in the publish environment receives the request and publishes the received content; the default servlet is http://localhost:4503/bin/receive.
7- Multiple author and publish environments can be configured.
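The steps above can be pictured with a small toy model. This is only a sketch to make the flow concrete, not how AEM implements it: the class name, the `send` callback and the idea that a failed transport leaves the item at the head of the queue are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ReplicationAgent:
    """Toy stand-in for a default replication agent on author (hypothetical)."""
    transport_uri: str
    queue: deque = field(default_factory=deque)
    status: dict = field(default_factory=dict)  # page path -> indicator colour

    def activate(self, path, content):
        # Step 3: "package" the content and place it in the replication queue.
        self.queue.append({"path": path, "content": content})
        # Step 4: the indicator shows "pending" until the transport succeeds.
        self.status[path] = "yellow"

    def process_queue(self, send):
        # Step 5: lift items from the queue in order and transport them.
        while self.queue:
            package = self.queue[0]
            if not send(self.transport_uri, package):
                self.status[package["path"]] = "red"
                break  # assumption: the failed item stays at the head
            self.queue.popleft()
            self.status[package["path"]] = "green"


published = {}


def fake_receiver(uri, package):
    # Step 6: stand-in for the receiving servlet on the publish instance.
    published[package["path"]] = package["content"]
    return True


agent = ReplicationAgent("http://localhost:4503/bin/receive")
agent.activate("/content/geometrixx/en/support", "<p>new text</p>")
pending = agent.status["/content/geometrixx/en/support"]
agent.process_queue(fake_receiver)
final = agent.status["/content/geometrixx/en/support"]
print(pending, "->", final)
```

The page goes from pending to published once the fake receiver accepts the package; swapping in a receiver that returns False leaves the item queued and marks the page as failed.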
Determining Page Publication Status
The colors next to pages in the Websites console indicate publication status.
Color | Description
Green | Publication was successful. Content is published.
Yellow | Publication is pending. Confirmation of publication has not yet been received by the system.
Red | Publication failed. There is no connection with the publish instance. This can also mean that the content was deactivated.
Blank | This page has never been published.
Replicating from Publish to Author
Features such as comments and forms allow users to enter information on a publish instance. For this, a type of replication is needed to return this information to the author environment, from where it is redistributed to other publish environments. However, due to security considerations, any traffic from the publish to the author environment must be strictly controlled.
This is known as reverse replication and functions using an agent in the publish environment which references the author environment. This agent places the input into an outbox. This outbox is matched with replication listeners in the author environment. The listeners poll the outboxes to collect any input made and then distribute it as necessary. This ensures that the author environment controls all traffic.
Replication (Author to Publish)
1- Navigate to the support page on the author environment:
http://localhost:4502/content/geometrixx/en/support.html
2- Edit the page to add some new text.
3- Activate Page to publish the changes.
4- Open the support page on the publish environment:
http://localhost:4503/content/geometrixx/en/support.html
5- You can now see the changes that you entered on author.
This replication is actioned from the author environment by the:
- Default Agent (publish)
This agent replicates content to the default publish instance.
Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:
http://localhost:4502/etc/replication/agents.author/publish.html.
Reverse Replication (Publish to Author)
- Reverse Replication (outbox)
This agent stores reverse replicated content in the outbox (repo://var/replication/outbox), which acts as a queue.
Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:
http://localhost:4502/etc/replication/agents.publish/outbox.html
This agent transfers content to the author environment, by communicating with the:
- Reverse Replication Agent (publish_reverse)
This agent polls the default publish instance to retrieve reverse replicated content from the outbox.
Details of this (configuration and logs) can be accessed from the Tools console of the author environment; or:
http://localhost:4502/etc/replication/agents.author/publish_reverse.html
Replication Agents - Configuration Parameters
When configuring a replication agent from the Tools console, five tabs are available within the dialog:
1:-Settings
2:-Transport
3:-Proxy
4:-Extended
5:-Triggers
1:-Settings
Enabled:
Indicates whether the replication agent is currently enabled.
When the agent is enabled the queue will be shown as:
- Active when items are being processed.
- Idle when the queue is empty.
- Blocked when items are in the queue, but cannot be processed; for example, when the receiving queue is disabled.
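As a quick illustration, the three queue states can be read as a function of two facts about an enabled agent. The function and parameter names below are made up for this sketch and are not part of AEM.

```python
def queue_state(items_in_queue, can_process):
    """Return the state shown for the queue of an enabled agent (toy model)."""
    if items_in_queue == 0:
        return "idle"
    # Items are waiting: either they are moving, or something holds them up.
    return "active" if can_process else "blocked"


states = [
    queue_state(0, True),   # nothing queued
    queue_state(3, True),   # items queued and being processed
    queue_state(3, False),  # items queued, e.g. the receiving queue is disabled
]
print(states)
```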
Serialization Type:
The type of serialization:
- Default: Set if the agent is to be automatically selected.
- Dispatcher Flush: Select this if the agent is to be used for flushing the dispatcher cache.
Retry Delay:
The delay (waiting time in milliseconds) between two retries, should a problem be encountered.
Default: 60000
Agent User Id:
Log Level:
Specifies the level of detail to be used for log messages.
- Error: only errors will be logged
- Info: errors, warnings and other informational messages will be logged
- Debug: a high level of detail will be used in the messages, primarily for debug purposes
Default: Info
Use for reverse replication
2:-Transport
URI
This specifies the receiving servlet at the target location. In particular, you can specify the hostname (or alias) and context path to the target instance here.
For example:
- A Default Agent may replicate to http://localhost:4503/bin/receive
- A Dispatcher Flush agent may replicate to http://localhost:8000/dispatcher/invalidate.cache
The protocol specified here (HTTP or HTTPS) will determine the transport method.
For Dispatcher Flush agents, the URI property is used only if you use path-based virtualhost entries to differentiate between farms. In that case, use this field to target the farm to invalidate. For example, farm #1 has a virtual host of www.mysite.com/path1/* and farm #2 has a virtual host of www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.
NTLM Domain (NT LAN Manager domain)
NTLM Host
3:-Proxy
4:-Extended
· Interface
Here you can define the socket interface to bind to.
This sets the local address to be used when creating connections. If this is not set, the default address will be used. This is useful for specifying the interface to use on multi-homed or clustered systems.
· HTTP Method
The HTTP method to be used.
For a Dispatcher Flush agent this is nearly always GET and should not be changed (POST would be another possible value).
· HTTP Headers
These are used for Dispatcher Flush agents and specify elements that must be flushed.
For a Dispatcher Flush agent the three standard entries should not need changing:
- CQ-Action:{action}
- CQ-Handle:{path}
- CQ-Path:{path}
These are used, as appropriate, to indicate the action to be used when flushing the handle or path. The sub-parameters are dynamic:
- {action} indicates a replication action
- {path} indicates a path
They are substituted by the path/action relevant to the request and therefore do not need to be "hardcoded".
Note:-
If you have installed AEM in a context other than the recommended default context, then you will need to register the context in the HTTP Headers. For example:
CQ-Handle:/<yourContext>{path}
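To make the substitution concrete, here is a minimal sketch of how the header templates could be resolved for one request. It only imitates the placeholder replacement described above; the function name, the action value "Activate" and the context name "mycontext" are assumptions chosen for the example.

```python
FLUSH_HEADERS = ["CQ-Action:{action}", "CQ-Handle:{path}", "CQ-Path:{path}"]


def resolve_headers(templates, action, path):
    """Replace {action} and {path} in 'Name:value' header templates."""
    resolved = {}
    for template in templates:
        # Split on the first colon only, so the value may itself contain colons.
        name, _, value = template.partition(":")
        resolved[name] = value.replace("{action}", action).replace("{path}", path)
    return resolved


page = "/content/geometrixx/en/support"
headers = resolve_headers(FLUSH_HEADERS, "Activate", page)

# Same idea for an instance installed under a non-default context (hypothetical name).
context_headers = resolve_headers(["CQ-Handle:/mycontext{path}"], "Activate", page)
print(headers)
print(context_headers)
```

The templates stay the same for every request; only the action and path passed in change, which is why the entries in the dialog do not need to be hardcoded.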
· Close Connection
Enable to close the connection after each request.
· Connect Timeout
Timeout (in milliseconds) to be applied when trying to establish a connection.
· Socket Timeout
Timeout (in milliseconds) to be applied when waiting for traffic after a connection has been established.
· Protocol Version
Version of the protocol; for example 1.0 for HTTP/1.0.
5:-Triggers
These settings are used to define triggers for automated replication:
Ignore default
If checked, the agent is excluded from default replication; this means it will not be used if a content author issues a replication action.
On Modification
Here a replication by this agent will be automatically triggered when a page is modified. This is mainly used for Dispatcher Flush agents, but also for reverse replication.
On Distribute
If checked, the agent will automatically replicate any content that is marked for distribution when it is modified.
On-/Offtime reached
This will trigger automatic replication (to activate or deactivate a page as appropriate) when the ontimes or offtimes defined for a page occur. This is primarily used for Dispatcher Flush agents.
On Receive
If checked, the agent will chain replicate whenever receiving replication events.
No Status Update
When checked the agent will not force a replication status update.
No Versioning
When checked the agent will not force versioning of activated pages.
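The effect of two of these flags, Ignore default and On Modification, can be sketched as simple filters over a list of agents. This is a simplified illustration only: the `Agent` class and its field names are hypothetical and are not the property names AEM stores.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical, simplified view of an agent's trigger settings."""
    name: str
    enabled: bool = True
    ignore_default: bool = False
    on_modification: bool = False


def agents_for_user_activation(agents):
    # Agents with "Ignore default" are skipped when an author activates a page.
    return [a.name for a in agents if a.enabled and not a.ignore_default]


def agents_for_modification(agents):
    # Agents with "On Modification" fire automatically when a page changes.
    return [a.name for a in agents if a.enabled and a.on_modification]


agents = [
    Agent("publish"),
    Agent("flush", ignore_default=True, on_modification=True),
]
on_activate = agents_for_user_activation(agents)
on_modify = agents_for_modification(agents)
print(on_activate, on_modify)
```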
Configuring your Replication Agents
Configuring your Replication Agents from the Author Environment
From the Tools tab in the author environment you can configure replication agents that reside in either the author environment (Agents on author) or the publish environment (Agents on publish). The following procedures illustrate the configuration of an agent for the author environment, but can be used for both.
When a dispatcher handles HTTP requests for author or publish instances, the HTTP request from the replication agent must include the PATH header. In addition to the following procedure, you must add the PATH header to the dispatcher list of client headers. (See /clientheaders (Client Headers).)
Configuring Reverse Replication
Reverse replication is used to get user content generated on a publish instance back to an author instance. This is commonly used for moderated forums, blogs, surveys and registration forms, amongst others.
For security reasons, most network topologies do not allow connections from the "Demilitarized Zone" (DMZ; a subnetwork that exposes the external services to an untrusted network such as the Internet).
As the publish environment is usually in the DMZ, to get content back to the author environment the connection must be initiated from the author instance. This is done with:
- an outbox in the publish environment where the content is placed.
- an agent (publish) in the author environment which periodically polls the outbox for new content.
To do this you need:
A reverse replication agent in the author environment
This acts as the active component to collect information from the outbox in the publish environment. If you want to use reverse replication then ensure that this agent is activated.
A reverse replication agent in the publish environment (an outbox)
This is the passive element as it acts as an "outbox". User input is placed here, from where it is collected by the agent in the author environment.
Configuring Replication for Multiple Publish Instances
Click Edit - the Agent Settings dialog will open. The Serialization Type is already defined as Default; this must remain so.
- In the Settings tab:
  - Activate Enabled.
  - Enter a Description.
  - Set the Retry Delay to 60000.
  - Leave the Serialization Type as Default.
- In the Transport tab:
  - Enter the required URI for the new publish instance; for example, http://localhost:4504/bin/receive.
  - Enter the site-specific user account used for replication.
  - You can configure other parameters as required.
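When there are several publish instances, it can help to write the intended settings down in one place before clicking through the dialog for each agent. The sketch below does just that; the dictionary keys and the agent name "publish2" are invented for this example and are not the property names AEM uses internally.

```python
def publish_agent_settings(name, host, port):
    """Collect the dialog values for one publish agent (illustrative keys only)."""
    return {
        "name": name,
        "enabled": True,
        "serialization_type": "Default",  # must remain Default for publish agents
        "retry_delay_ms": 60000,
        "transport_uri": f"http://{host}:{port}/bin/receive",
    }


agents = [
    publish_agent_settings("publish", "localhost", 4503),
    publish_agent_settings("publish2", "localhost", 4504),  # hypothetical second agent
]
uris = [agent["transport_uri"] for agent in agents]
for uri in uris:
    print(uri)
```

Each publish instance gets its own agent with its own transport URI; everything else stays the same across agents.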
Configuring a Dispatcher Flush agent
Default agents are included with the installation. However, certain configuration is still needed and the same applies if you are defining a new agent:
Click Edit - the Agent Settings dialog will open:
- In the Settings tab:
  - Activate Enabled.
  - Enter a Description.
  - Leave the Serialization Type as Dispatcher Flush, or set it as such if creating a new agent.
- In the Transport tab:
  - Enter the required URI for the new publish instance; for example, http://localhost:80/dispatcher/invalidate.cache.
  - Enter the site-specific user account used for replication.
  - You can configure other parameters as required.
For Dispatcher Flush agents, the URI property is used only if you use path-based virtualhost entries to differentiate between farms. In that case, use this field to target the farm to invalidate. For example, farm #1 has a virtual host of www.mysite.com/path1/* and farm #2 has a virtual host of www.mysite.com/path2/*. You can use a URL of /path1/invalidate.cache to target the first farm and /path2/invalidate.cache to target the second farm.
Note:-
If you have installed AEM in a context other than the recommended default context, then you need to configure the HTTP Headers in the Extended tab.
· Click OK to save the changes.
· Return to the Tools tab; from here you can Activate the Dispatcher Flush agent (Agents on publish).
The Dispatcher Flush replication agent is not active on author. You can access the same page in the publish environment by using the equivalent URI; for example, http://localhost:4503/etc/replication/agents.publish/flush.html.
Controlling Access To Replication Agents
Access to the pages used to configure the replication agents can be controlled by using user and/or group page permissions on the /etc/replication node.
Setting such permissions will not affect users replicating content (e.g. from the Websites console or sidekick option). The replication framework does not use the "user session" of the current user to access replication agents when replicating pages.
Configuring your Replication Agents from CRXDE Lite
Various parameters of your replication agents can be configured using CRXDE Lite.
If you navigate to /etc/replication you can see the following three nodes:
- agents.author
- agents.publish
- treeactivation
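The node names map directly onto the agent URLs used earlier in this post. A small helper makes the pattern explicit; the function names are made up for this sketch, and it only builds strings without talking to a running instance.

```python
def agent_node_path(environment, agent_name):
    """Repository path of an agent under /etc/replication."""
    if environment not in ("author", "publish"):
        raise ValueError("environment must be 'author' or 'publish'")
    return f"/etc/replication/agents.{environment}/{agent_name}"


def agent_page_url(host, environment, agent_name):
    """URL of the agent's configuration page on the given host."""
    return f"{host}{agent_node_path(environment, agent_name)}.html"


node = agent_node_path("publish", "outbox")
url = agent_page_url("http://localhost:4502", "author", "publish")
print(node)
print(url)
```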
The agents.author and agents.publish nodes hold configuration information about the appropriate environment, and are only active when that environment is running. For example, agents.publish will only be used in the publish environment. The following screenshot shows the publish agent in the author environment, as included with AEM WCM:
Note:- Dispatcher is Adobe Experience Manager's caching and/or load balancing tool. Using AEM's Dispatcher also helps to protect your AEM server from attack. Therefore, you can increase the security of your AEM instance by using the Dispatcher in conjunction with an enterprise-class web server.



