Tuesday, November 14, 2017

Workflow Purging for AEM Performance optimization


Workflow Purging:
A new workflow instance is created every time we launch a workflow (for example on asset upload or asset publishing).
• Once the workflow completes (successfully, aborted, or terminated), it is archived and never deleted automatically.
• Workflow purging is needed to clean up archived workflow instances.
• Instances to purge can be selected by three criteria:
    • Workflow model
    • Completion status
    • Age of the workflow instance
• To purge workflows manually, execute the purgeCompleted operation on the com.adobe.granite.workflow Maintenance MBean.
• To purge workflows automatically, use the “Adobe Granite Workflow Purge Configuration” in the OSGi configuration.

In older versions of Adobe CQ5/AEM, purging old completed workflows required either writing a custom job or installing a package provided by Adobe Daycare. From AEM 5.6.1 onwards this functionality is built in.

Workflow instances are stored as nodes in the AEM/CQ repository. On long-running instances where website users or automated jobs start workflows, this quickly adds a large amount of content to the repository.
This slows down the AEM server and consumes disk space.

ENABLING WORKFLOW PURGE SCHEDULER
Automatic purging is not preconfigured in AEM; we need to enable it ourselves.
We can create several configurations of the service to purge workflow instances that satisfy different criteria, for example:

•  First configuration: purges instances of a particular workflow model that have been running longer than expected.
•  Second configuration: purges all completed workflows after a certain number of days to minimize the size of the repository.

•  Create a Workflow Purge Scheduler instance. The Workflow Purge Scheduler is a Sling service factory; configurations are added under the service PID com.adobe.granite.workflow.purge.Scheduler

•  Because the service is a factory service, the name of the sling:OsgiConfig node requires an identifier suffix, for example com.adobe.granite.workflow.purge.Scheduler-myidentifier


Add the following properties to the node:

| Property Name | OSGi Property Name | Description |
|---|---|---|
| Job Name | scheduledpurge.name | A descriptive name for the scheduled purge. |
| Workflow Status | scheduledpurge.workflowStatus | The status of the workflow instances to purge. Valid values: COMPLETED (completed workflow instances are purged) and RUNNING (running workflow instances are purged). |
| Models To Purge | scheduledpurge.modelIds | The IDs of the workflow models to purge. The ID is the path to the model node, for example /etc/workflow/models/dam/update_asset/jcr:content/model. Specify no value to purge instances of all workflow models. To specify multiple models, click the + button in the Web Console. |
| Workflow Age | scheduledpurge.daysold | The age, in days, of the workflow instances to purge. |


Once this configuration is deployed as part of a content package, it shows up under the Workflow Purge Scheduler in the OSGi console, and workflows older than the configured number of days are purged.





Configure Workflow Purge Scheduler in a Package
Deploying the configuration as part of a package is one of the better approaches. With Apache Sling's OSGi configurations, this takes a simple XML file.
•  Create an XML file under a configuration path, e.g. /apps/[my-app]/config
•  Name the file com.adobe.granite.workflow.purge.Scheduler-[some-arbitrary-id].xml

Add the content below to the XML file and replace the values with your configuration details:
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    scheduledpurge.name="Purge All Completed Workflows"
    scheduledpurge.modelIds="[]"
    scheduledpurge.workflowStatus="COMPLETED"
    scheduledpurge.cron="0 0 * * * ?"
    scheduledpurge.daysold="30"/>

Once the configuration is saved, the scheduler registers the cron schedule and purges workflows older than the number of days specified in the configuration.
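When several purge configurations are needed (for example one per workflow model), the XML files can be generated from a script instead of being written by hand. The following is only a sketch: the helper name write_purge_config and the output directory are hypothetical, the cron value is simply copied from the example above, and the file name follows the identifier-suffix convention described for the factory service. It does not escape XML special characters in the job name.

```bash
#!/bin/bash
# Sketch only: write_purge_config is a hypothetical helper, not an AEM tool.
# Usage: write_purge_config <out-dir> <identifier> <job-name> <status> <days-old> [model-id]
# Prints the path of the generated file.
write_purge_config() {
  local outdir="$1" id="$2" name="$3" status="$4" days="$5" model="${6:-}"
  local file="$outdir/com.adobe.granite.workflow.purge.Scheduler-$id.xml"

  # Only the two documented status values are accepted.
  case "$status" in
    COMPLETED|RUNNING) ;;
    *) echo "invalid status: $status" >&2; return 1 ;;
  esac

  mkdir -p "$outdir" || return 1
  # An empty model ID yields modelIds="[]", i.e. all workflow models.
  cat > "$file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
    jcr:primaryType="sling:OsgiConfig"
    scheduledpurge.name="$name"
    scheduledpurge.modelIds="[$model]"
    scheduledpurge.workflowStatus="$status"
    scheduledpurge.cron="0 0 * * * ?"
    scheduledpurge.daysold="$days"/>
EOF
  echo "$file"
}
```

For example, write_purge_config ./config completed-30 "Purge All Completed Workflows" COMPLETED 30 writes ./config/com.adobe.granite.workflow.purge.Scheduler-completed-30.xml, which can then be placed in the package's config folder.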

We can also use the JMX console to purge completed workflows from the repository.
If the list of archived workflows grows too large, purge those older than a certain age; this speeds up page loading.

In AEM 6.0, the built-in purge operation is accessible via JMX, in addition to the configurable scheduler, at http://hostName:portNo/system/console/jmx/com.adobe.granite.workflow%3Atype%3DMaintenance
Specify the number of days of workflows to purge and a few other parameters.




TAR Compaction for AEM Performance optimization


TARMK Compaction

Compaction is needed when the TarMK files in the repository keep growing. Data in tar files is never overwritten; new data is always appended, so disk usage increases even when only existing data is updated. To counter this growth, AEM provides Tar compaction, a mechanism that reclaims disk space by removing obsolete data from the repository.

There are two kinds of Tar compaction:
     • Online Compaction (no longer recommended as of AEM 6.2)
     • Offline Compaction

Before going into these details, let us briefly discuss how the automatic compaction can be triggered manually.

Revision Clean Up: 

The automatic compaction can be triggered manually in the Operations Dashboard via a maintenance job called Revision Clean Up.

To start Revision Clean Up:

     1. Go to the AEM Welcome screen.
     2. In the main AEM window, go to Tools → Operations → Dashboard → Maintenance.
     3. Click Daily Maintenance Window.
     4. Hover over the Revision Clean Up card and click the “Run” icon.
     5. The icon turns orange to indicate that the Revision Clean Up job is running.

You can stop it at any time by hovering the mouse over the icon and pressing the Stop button.

  
Invoking Revision Garbage Collection via the JMX Console
     1. Open the JMX console (http://hostName:portNo/system/console/jmx).
     2. Click the RevisionGarbageCollection MBean.
     3. In the next window, click startRevisionGC() and then Invoke to start the Revision Garbage Collection job.

TAR OFFLINE COMPACTION:

NOTE: Never run offline compaction while the repository is up and running. Not even the checkpoint commands should be run while the server is up.

The procedure is called offline compaction because the repository needs to be shut down in order to properly run the Oak-run tool.
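Since running oak-run against a live repository is unsafe, a wrapper script can refuse to continue while AEM is still up. The sketch below assumes that the AEM start script records the process ID in crx-quickstart/conf/cq.pid; that location and the function name aem_is_running are assumptions to verify on your own installation.

```bash
#!/bin/bash
# Sketch only. Assumes AEM's start script records its PID in
# <crx-quickstart>/conf/cq.pid (verify this on your installation).
# Returns 0 if a process with that PID is alive, 1 otherwise.
aem_is_running() {
  local pidfile="$1/conf/cq.pid" pid
  [ -f "$pidfile" ] || return 1
  pid="$(tr -d '[:space:]' < "$pidfile")"
  [ -n "$pid" ] || return 1
  # kill -0 sends no signal; it only checks that the process exists and
  # that the current user may signal it, so run this as the AEM user.
  kill -0 "$pid" 2>/dev/null
}

# Example guard before compaction (the path is a placeholder):
# if aem_is_running /data/aem/crx-quickstart; then
#   echo "AEM is still running; aborting compaction." >&2
#   exit 1
# fi
```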

For faster compaction of the Tar files and situations where normal garbage collection does not work, Adobe provides a manual Tar compaction tool called Oak-run. It can be downloaded at the following location: http://mvnrepository.com/artifact/org.apache.jackrabbit/oak-run/

Note: The oak-run jar must match the Oak version of the AEM server, so make sure you have the right version of the jar.

The procedure to run the tool is:
     1. Always make sure you have a recent working backup of the AEM instance.
     2. Shut down AEM.
     3. Use the tool to find old checkpoints:

java -jar oak-run.jar checkpoints <aem-inst-folder>/crx-quickstart/repository/segmentstore

     The command lists the checkpoints held in the repository; the unreferenced ones need to be cleared before compaction takes place.

     4. Delete the unreferenced checkpoints:

java -jar oak-run.jar checkpoints <aem-inst-folder>/crx-quickstart/repository/segmentstore rm-unreferenced

     5. Finally, run the compaction and wait for it to complete:

java -jar oak-run.jar compact <aem-inst-folder>/crx-quickstart/repository/segmentstore

     The command first lists all files under the segmentstore directory. After a while it prints a cleaning-up message, and once cleanup finishes it lists the tar files again, fewer than before compaction.
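To confirm that compaction actually reclaimed space, the number and total size of the tar files can be recorded before and after the run. A minimal sketch, using a hypothetical helper named segmentstore_stats:

```bash
#!/bin/bash
# Sketch only: segmentstore_stats is a hypothetical helper.
# Prints "<file-count> <total-bytes>" for the *.tar files directly inside
# the given segmentstore directory.
segmentstore_stats() {
  local dir="$1" count=0 bytes=0 f size
  for f in "$dir"/*.tar; do
    [ -f "$f" ] || continue      # skips the unexpanded pattern when nothing matches
    size=$(wc -c < "$f")         # byte size of one tar file
    count=$((count + 1))
    bytes=$((bytes + size))
  done
  echo "$count $bytes"
}
```

Calling segmentstore_stats on <aem-inst-folder>/crx-quickstart/repository/segmentstore once before and once after the compact command gives two pairs of numbers to compare.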

The compaction output can be captured in a log file by adding a logging configuration on the server where the compaction runs.

The configuration is a one-time setup; the command further below then writes information and errors to the log file.

     • Create the config file below in the same directory as the oak-run-*.jar file.
     • Name it “logback-compaction.xml” and add the following lines:
           
<configuration>
  <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
    <target>System.err</target>
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <logger name="org.apache.jackrabbit.oak.plugins.segment.Compactor" level="INFO"/>
  <root level="warn">
    <appender-ref ref="STDERR" />
  </root>
</configuration>


     Now run the command below to write errors and information to the log file:
nohup java -Dtar.memoryMapped=true -Dupdate.limit=5000000 -Dcompaction-progress-log=1500000 -Dcompress-interval=10000000 -Doffline-compaction=true -Dlogback.configurationFile=logback-compaction.xml -Xmx10g -jar <oak jar file path> compact <aem-installation-path> > tarcompaction.log 2>&1

     The file “tarcompaction.log” should now appear in the directory where you created the XML config above, provided the command was run from that directory.

TAR OFFLINE COMPACTION PREREQUISITES:

•  How to find the correct version of the oak-run jar?
   Go to the Felix console (/system/console/bundles), search for "oak", and note the version shown for the Oak bundles.
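Once the Oak version is known, the matching jar can be fetched by version number. The sketch below only builds the download URL; it assumes the jar is published on Maven Central under the standard repository layout, and the helper name oak_run_url is hypothetical.

```bash
#!/bin/bash
# Sketch only: oak_run_url is a hypothetical helper. It assumes the jar is
# published on Maven Central under the standard Maven repository layout.
oak_run_url() {
  local version="$1"
  echo "https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/$version/oak-run-$version.jar"
}

# Example download for the version used in the commands in this post:
# curl -fLO "$(oak_run_url 1.2.7)"
```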

Find Checkpoints
----------------------
java -jar /folderPath/aem/oak-run.jar checkpoints F:/AEM/AEM-4502/crx-quickstart/repository/segmentstore

Run the command below on a Linux/Unix machine, as Windows does not support the -Dtar option:
java -Dtar.memoryMapped=true -Xmx8g -jar /folderPath/aem/oak-run-jar/oak-run-1.2.7.jar checkpoints /folderPath/aem/crx-quickstart/repository/segmentstore

Remove Checkpoints
--------------------------
java -Dtar.memoryMapped=true -Xmx4g -jar /folderPath/aem/oak-run-jar/oak-run-1.2.7.jar checkpoints /folderPath/aem/crx-quickstart/repository/segmentstore rm-unreferenced

Finally Run Compact
-------------------------
java -Dtar.memoryMapped=true -Xmx8g -jar /folderPath/aem/oak-run-jar/oak-run-1.2.7.jar compact /folderPath/aem/crx-quickstart/repository/segmentstore >> /folderPath/aem/help/logs/compactLog

Running Script File:

•     Go to the folder that contains the script file.
•     Running the command below executes all three commands above (finding checkpoints, deleting checkpoints, compacting), which are incorporated in the script file.

Command:
                ./scriptFileName


ONLINE COMPACTION:

For situations where AEM cannot be shut down for maintenance, compaction can also be performed while the instance is running. This is called Online Compaction.

You can configure Online Compaction as follows:
     1. Go to the folder where AEM is installed, then browse to crx-quickstart\install
     2. Open the org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService.config file (create it if it does not exist) and add the following lines:
            repository.home=${repository.home}/segmentstore
            tarmk.size=256
            pauseCompaction=false
     3. Restart AEM

To verify that the configuration has taken effect:
  1. Go to the JMX console (http://hostName:portNo/system/console/jmx).
  2. Search for CompactionStrategy and click the MBean that shows up in the search.
  3. Verify that the value of PausedCompaction is false. This confirms that online compaction is set to run.
  4. Verify that Online Compaction is running properly. First go to the Operations Dashboard and check the time interval configured for the Daily Maintenance Window. By default, it is scheduled to run between 2 and 5 AM.
  5. Inspect the error.log file for events logged during the daily maintenance window to see whether online compaction ran correctly.
  Example Log:
   [TarMK compaction thread [/author/crx-quickstart/repository/segmentstore], active since Thu Mar 19 02:00:10 EDT 2015, previous max duration 1369831ms] org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK compaction started
19.03.2015 02:00:30.441 *INFO* [pool-9-thread-2] com.adobe.granite.taskmanagement.impl.jcr.TaskArchiveService archiving tasks at: 'Thu Mar 19 02:00:30 EDT 2015'
19.03.2015 02:01:01.699 *INFO* [TarMK compaction thread [/author/crx-quickstart/repository/segmentstore], active since Thu Mar 19 02:00:10 EDT 2015, previous max duration 1369831ms] org.apache.jackrabbit.oak.plugins.segment.file.FileStore Estimated compaction in 51.47 s, gain is 69% (1018859520/3343598080) or (1.0 GB/3.3 GB), so running compaction

     Log line confirming that online compaction completed:
           
                [TarMK compaction thread [/author/crx-quickstart/repository/segmentstore], active since Thu Mar 19 02:00:10 EDT 2015, previous max duration 1369831ms] org.apache.jackrabbit.oak.plugins.segment.file.FileStore TarMK compaction completed in 1310939ms
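The log check can be scripted. The sketch below searches a log file for the two messages that appear in the example log ("TarMK compaction started" and "TarMK compaction completed"). The function name compaction_status is hypothetical, the location of error.log depends on your installation, and the check looks at the whole file rather than pairing messages per run.

```bash
#!/bin/bash
# Sketch only: compaction_status is a hypothetical helper.
# Prints "completed" if a completion message is present, "started" if only
# a start message is present, or "none" if neither appears in the log file.
compaction_status() {
  local log="$1"
  if grep -q 'TarMK compaction completed' "$log" 2>/dev/null; then
    echo "completed"
  elif grep -q 'TarMK compaction started' "$log" 2>/dev/null; then
    echo "started"
  else
    echo "none"
  fi
}
```

To limit the check to one maintenance window, filter the log by date first (for example with grep '^19.03.2015 0[2-4]') and pass the filtered file to the function.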


How to automate the offline compaction process

Offline Tar compaction is still the Adobe-recommended way of compacting Oak.
The script below automates the entire process.
Before running it, download a version of oak-run that matches your repository version.

Steps to follow:
  1. Shut down AEM
  2. Find old checkpoints
  3. Remove unreferenced checkpoints
  4. Compact Oak (using the compact keyword in the command)
  5. Restart AEM
SCRIPT:
#!/bin/bash
# Offline compaction: stop AEM, clean up checkpoints, compact, restart.
todayDate="$(date +'%d-%m-%Y')"
logfile="compact-$todayDate.log"
installfolder="/data/aem"
aemfolder="$installfolder/crx-quickstart"
oakrun="$installfolder/help/oak-run-1.0.18.jar"
logpath="$installfolder/help/logs/$logfile"
mkdir -p "$installfolder/help/logs"

## Shutdown AEM
# Make sure the AEM process has fully exited before the next step runs.
printf "Shutting down AEM.\n"
"$aemfolder/bin/stop"
todayDate="$(date)"
echo "AEM Shutdown at: $todayDate" >> "$logpath"

## Find old checkpoints
printf "Finding old checkpoints.\n"
java -Dtar.memoryMapped=true -Xms8g -Xmx8g -jar "$oakrun" checkpoints "$aemfolder/repository/segmentstore" >> "$logpath"

## Delete unreferenced checkpoints
printf "Deleting unreferenced checkpoints.\n"
java -Dtar.memoryMapped=true -Xms8g -Xmx8g -jar "$oakrun" checkpoints "$aemfolder/repository/segmentstore" rm-unreferenced >> "$logpath"

## Run compaction
printf "Running compaction. This may take a while.\n"
java -Dtar.memoryMapped=true -Xms8g -Xmx8g -jar "$oakrun" compact "$aemfolder/repository/segmentstore" >> "$logpath"

## Report Completed
printf "Compaction complete. Please check the log at:\n"
printf '%s\n' "$logpath"

## Start AEM back up
todayDate="$(date)"
printf "Starting up AEM.\n"
"$aemfolder/bin/start"
echo "AEM Startup at: $todayDate" >> "$logpath"




Create Scaffolding in AEM


To create a scaffold, open the Tools console at http://localhost:4502/miscadmin and select Scaffolding in the left tree. In the right pane select New; a dialog with a single template, Scaffolding Template, appears.
Enter the title My Text Image and click Create.




1) Double-click My Text Image to open the scaffolding editor at http://localhost:4502/cf#/etc/scaffolding/my-text-image.html. Click Sidekick -> Page tab -> Page properties and set the following:

                       Target Template: the template used for page creation. When creating the page manually in step 1 we used a simple Basic Template; since we are generating pages of that template, select it.
                       Target Path: the path where generated pages are stored.




Here is the code for basic template page component jsp



2) After the details are entered in the page properties, a basic scaffold is created with a Title field for the page title (stored as jcr:title) and a Tags field.




3) Every scaffold has a Dialog Editor for working on its form fields, such as Title above. Switch the Sidekick to design mode (the L-shaped icon at the bottom of the Sidekick) to reach the page that links to the Dialog Editor. Click the link to open the Dialog Editor: http://localhost:4502/etc/scaffolding/my-text-image/_jcr_content/dialog.html



    Dialog in CRXDE Lite




4) Using the dialog editor we can add new properties to existing fields, but to add new fields a developer has to use CRXDE. Open the scaffold page in CRXDE (http://localhost:4502/crx/de/index.jsp#/etc/scaffolding/my-text-image) and add a widget for the Text field to the dialog at /etc/scaffolding/my-text-image/jcr:content/dialog/items/tab1/items/text (you can copy an existing richtext widget node, e.g. /etc/scaffolding/geometrixx/news/jcr:content/dialog/items/tab1/items/text).





The name of the field is set to ./jcr:content/par/text/text, so any value entered in this field is stored at jcr:content/par/text/text relative to the newly created page.

Similarly, we need hidden fields to store the values sling:resourceType = foundation/components/text and textIsRich = true, so we create the necessary hidden widgets.




5) Similarly, we need an html5smartimage widget and hidden fields for storing the image-specific values. These values are stored under ./jcr:content/par/image relative to the new page.





The hidden field for storing Image Component resourceType




6) The scaffolding form is now ready for creating pages: http://localhost:4502/cf#/etc/scaffolding/my-text-image.html. Enter a Title and Text, drag an Image from the content finder, and click Create; your page will be created.