Backup and Restore

There are two ways to back up and restore CRX repository content: 

  • You can create an external backup of the repository and store it in a safe location. If the repository breaks down, you can restore it to the previous state.
  • You can create internal versions of the repository content. These versions are stored in the repository along with the content, so you can quickly restore nodes and trees you have changed or deleted.

Online Backup

This backup method creates a backup of the entire repository, including CQ5 or other applications deployed into it. This method lets you back up and later restore the entire repository and the applications running on it, including content, version history, configuration, software, hotfixes, custom applications, log files, search indexes, and so on.

If using clustering, and if the shared folder is a subdirectory of crx-quickstart (either physically, or using a softlink), the shared directory is also backed up.

This method works as a hot or online backup, so you can perform this backup while the repository is running. The repository is usable while the backup is running. This method works for the default, TarPM-based CRX instances.

When creating a backup, you have the following options:

  • Creating a backup of the repository (compressed as a zip file).
  • Backing up to a directory without creating a zip file.
  • Backing up with a time delay to lessen the impact of the backup on system performance.

Database

The online backup only backs up the file system. If you store the repository content and/or the repository files in a database, that database needs to be backed up separately.

Creating a Backup

Online repository backup lets you create, download, and remove backup files. It is a "hot" or "online" backup feature, and therefore can be executed while the repository is being used normally in the read-write mode.

Backup files are saved in the zip compression format by default. They are usually saved in the parent folder of the folder where the quickstart .jar is running. You can change where CRX saves backup files.

You can also back up the repository to a directory (without creating a zip file). You can back up the repository with a time delay, so that the impact on repository performance is reduced.

To create a backup:

  1. Log in as the administrator. Click Manage Repository and then Repository Configuration.

  2. Optionally, change the source directory for the backup, the target directory, and the resulting backup file name by clicking Options. (See Backing up to a Directory and Backing up with a Time Delay.)

  3. Click Add to create a new backup. The progress bar indicates the current state of the backup process. To cancel the backup process, click Cancel.

    The completed backups are listed on the page. Backup files that are no longer needed can be removed by clicking Remove.

Backing up to a Directory

CRX 2.1 supports backing up the repository to a directory (without creating a zip file).

To back up the repository without creating a zip file:

  1. In CRX, select Repository Configuration and click Repository Backup (or go to <host>:<port>/crx/config/backup.jsp).

  2. Click Options, set the Target directory, and leave the Target File Name empty.


    Note

    The target directory must not be a parent of the install directory. When the backup starts, an empty file named backupInProgress.txt is created in the target directory. This file is deleted when the backup is finished. See also Backing up to a non-default Target Directory.

    Note

    With this you can achieve:

    • Full Backup
      Backup to an empty target directory.
    • Incremental Backup
      Backup to an existing target directory.
      The target directory contents are overwritten, except for files that match exactly. For files that match, the last modified time in the target directory and the archive bit are not changed. Existing files that do not match are overwritten, and files that do not exist in the repository are deleted from the target directory.

  3. Click Add to start the backup.

    After the backup process is finished, CRX no longer writes to the target directory, so you can then process the directory with any kind of backup tool (full backup or incremental backup). The CRX Explorer does not list backups to a directory in the Repository Backup list.

Backing up with a Time Delay

By default, the CRX backup runs at full speed. CRX 2.1 lets you slow down the online backup so that it does not slow down other tasks.

A delay of 1 millisecond typically results in 10% CPU usage, and a delay of 10 milliseconds usually results in less than 3% CPU usage.

The total delay in seconds can be estimated as follows: Repository size in MB * delay in milliseconds / 2 (if the zip option is used) or / 4 (when backing up to a directory). That means a backup to a directory of a 200 MB repository with 1 ms delay increases the backup time by about 50 seconds.
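The estimate above can be turned into a small helper for planning a backup window. This is only a sketch of the formula as stated in the text; the function name and the "zip"/"dir" mode keywords are made up for this example and are not part of CRX.

```shell
# Estimate the extra backup time (in seconds) caused by a configured delay.
# Formula from the text: size_mb * delay_ms / 2 (zip) or / 4 (directory).
# The function name and the "zip"/"dir" keywords are illustrative only.
estimate_backup_delay_seconds() (
    size_mb=$1
    delay_ms=$2
    mode=$3
    case "$mode" in
        zip) divisor=2 ;;
        dir) divisor=4 ;;
        *)   echo "mode must be 'zip' or 'dir'" >&2; exit 1 ;;
    esac
    # Integer arithmetic is enough for a rough estimate.
    echo $(( size_mb * delay_ms / divisor ))
)
```

For the 200 MB repository mentioned above, calling estimate_backup_delay_seconds 200 1 dir prints 50, and the same repository backed up to a zip file gives 100.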

To back up with a time delay:

  1. In CRX, select Repository Configuration and click Repository Backup (or go to <host>:<port>/crx/config/backup.jsp).

  2. Click Options, set the Target directory and Target file name, and enter a Delay in milliseconds.

  3. Click Add to start the backup.

Backing Up the Data Store Separately

If the file data store has been configured outside the main repository, it is not included in the backup. This will reduce the size of the online backup and the size of the backup zip file. However, the data store needs to be backed up as well. Because files in the file data store directory are immutable, they can be backed up incrementally (potentially using rsync) or after running the online backup.

Note

Do not run the data store backup and garbage collection concurrently.
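As a rough illustration of an incremental data store backup with rsync, the sketch below copies only files that do not yet exist in the target. It relies on the statement above that data store files are immutable, so existing target files never need to be refreshed. The function name and both directory arguments are placeholders, and the sketch does not check whether garbage collection is running; that remains your responsibility.

```shell
# backup_datastore SOURCE_DIR TARGET_DIR
# Copies new data store files to the target and leaves existing ones alone.
# Assumption: files in the data store never change once written.
backup_datastore() (
    src=$1
    dst=$2
    mkdir -p "$dst" || exit 1
    # -a preserves timestamps and permissions; --ignore-existing skips
    # files that are already present in the target directory.
    rsync -a --ignore-existing "${src%/}/" "${dst%/}/"
)
```

A call such as backup_datastore /path/to/datastore /backup/datastore could be run after the online backup has finished; both paths are examples only.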

Automating Backup Creation

If possible, run the online backup when there is little load on the system, for example in the early morning. By default, the Tar PM optimization runs between 2 am and 5 am and also slows down the system, so 5 am is a good time to start the online backup.

Backups can be automated using the wget or curl HTTP clients. The following is an example of how to automate backup by using curl:

Caution

In the following example curl commands, various parameters might need to be adjusted for your instance; for example, the hostname (localhost), port (7402), admin password (xyz), and file name (backup.zip).

  1. Log in to CRX:

    curl -c login.txt "http://localhost:7402/crx/login.jsp?UserId=admin&Password=xyz&Workspace=crx.default"
    
  2. Create a new backup:

    curl -b login.txt -f -o progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=backup.zip"

    The curl command returns when the backup is completed on the server.

    The backup file is created on the server in the parent folder of the folder containing the crx-quickstart folder (the same as if you were creating the backup using the browser). For example, if you have a directory named /day/crx-quickstart/, then the backup is created in the root directory.

  3. Download an existing backup:

    curl -b login.txt -f -o target.zip "http://localhost:7402/crx/config/backupDownload.jsp?action=download&backup=/Users/abc/backup.zip"

    You can omit this step if the backup file is retrieved directly from the file system; for example, using file copy.

  4. Remove the login cookie:

    rm login.txt

    Remove the progress file:

    rm progress.txt
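The steps above can be combined into one function for unattended runs. The following is a minimal sketch, not an official script: the variable names (CRX_HOST, CRX_PORT, CRX_USER, CRX_PASSWORD) and function names are invented for this example, temporary files replace login.txt and progress.txt, and the download step is left out on the assumption that the backup file is picked up from the file system.

```shell
# Placeholder settings; override them in the environment for your instance.
CRX_HOST=${CRX_HOST:-localhost}
CRX_PORT=${CRX_PORT:-7402}
CRX_USER=${CRX_USER:-admin}
CRX_PASSWORD=${CRX_PASSWORD:-xyz}

crx_base_url() {
    echo "http://${CRX_HOST}:${CRX_PORT}/crx"
}

# run_online_backup ZIP_FILE_NAME
run_online_backup() (
    zip_name=$1
    cookie_file=$(mktemp) || exit 1
    progress_file=$(mktemp) || exit 1
    status=0
    {
        # Step 1: log in and store the session cookie.
        curl -s -f -c "$cookie_file" -o /dev/null \
            "$(crx_base_url)/login.jsp?UserId=${CRX_USER}&Password=${CRX_PASSWORD}&Workspace=crx.default" &&
        # Step 2: create the backup; curl returns when the server is done.
        curl -s -f -b "$cookie_file" -o "$progress_file" \
            "$(crx_base_url)/config/backup.jsp?action=add&zipFileName=${zip_name}"
    } || status=$?
    # Step 4: remove the login cookie and the progress file in all cases.
    rm -f "$cookie_file" "$progress_file"
    exit "$status"
)
```

Calling run_online_backup backup.zip performs the login, the backup request, and the cleanup in one go, and returns a non-zero status if either request fails. If the function is saved in a script, a crontab schedule of 0 5 * * * would start it at 5 am, after the default Tar PM optimization window.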

Backing up from a non-default Source Directory

The backup methods detailed above assume that a default CRX/CQ installation under crx-quickstart is used (i.e. using quickstart, or standalone).

When using a different configuration (e.g. a third party application server), the source directory for the backup might be different and therefore not match.

The CRX Console allows you to specify a Source Directory. When using curl, specify the installDir parameter in the URL to explicitly define the backup source directory.

For example, to create a new backup from /CRX/CRX2_1/repository (the curl command returns when the backup is completed on the server):

curl -b login.txt -f -o progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=backup.zip&installDir=/CRX/CRX2_1/repository"

Backing up to a non-default Target Directory

By default the backup tool creates the backup zip file in the directory that is three levels above the directory:
    repository

For example:

  • in the tree structure:
        /CRX/CRX2_1/crx-quickstart/repository
  • the backup file:
        backup.zip
  • will be generated in:
        /CRX

If you have not used the standard installation, then this might not be the parent directory of the backup source directory. In such a case, make sure that CRX has write access to the target folder.

The CRX Console allows you to specify a Target Directory for storing the backup. To specify the target directory when using curl, specify the parameter targetDir.

For example, to generate backup.zip in the directory /backup/crx:

curl -b login.txt -f -o progress.txt "http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=backup.zip&targetDir=/backup/crx"

Note

targetDir defines a path, which will be created if it does not already exist. Use of absolute paths is recommended.
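When both a source and a target directory are non-default, it can help to assemble the request URL in one place. The helper below is a sketch only: the function name is invented, host and port are the example values used above, and sending installDir and targetDir in the same request is an assumption (the examples above show each parameter on its own). Paths containing characters that are special in URLs would need additional encoding.

```shell
# backup_url ZIP_FILE_NAME [INSTALL_DIR] [TARGET_DIR]
# Prints the backup request URL; empty optional arguments are left out.
backup_url() (
    url="http://localhost:7402/crx/config/backup.jsp?action=add&zipFileName=$1"
    if [ -n "$2" ]; then
        url="${url}&installDir=$2"
    fi
    if [ -n "$3" ]; then
        url="${url}&targetDir=$3"
    fi
    echo "$url"
)
```

The result can be passed to curl in the same way as before, for example: curl -b login.txt -f -o progress.txt "$(backup_url backup.zip "" /backup/crx)".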

Caution

When using a different application server (such as JBoss), the online backup may not work as expected, because the target directory is not writable. In this case, please contact Day support.

Backing up a Shared Directory

If a shared directory should be included in the backup (usually yes, as it contains the data store), then the shared directory needs to be a subdirectory of the backup source directory (installDir). This is the case in the default installation. If this is not the case in your installation, then one solution is to create a soft link to the shared directory from within the backup source directory.
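A soft link of this kind could be created as sketched below. The function name and the link name "shared" are assumptions made for this example; use the name your cluster configuration expects.

```shell
# link_shared_dir INSTALL_DIR SHARED_DIR
# Creates INSTALL_DIR/shared as a soft link to the real shared directory.
link_shared_dir() (
    install_dir=$1
    shared_dir=$2
    link_path="${install_dir}/shared"   # link name is an assumption
    # Never replace something that already exists at the link location.
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        echo "refusing to overwrite existing ${link_path}" >&2
        exit 1
    fi
    ln -s "$shared_dir" "$link_path"
)
```

For example, link_shared_dir /day/crx-quickstart /mnt/cluster/shared makes the shared directory reachable from within the backup source directory; both paths are illustrative.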

Restoring the Backup

Note

Performing an online backup resets the execution bit for shell scripts. After restoring an online backup, the execution bit needs to be set manually.

To restore the backup from a backup file:

  1. Unzip the file using the Java jar command, for example:

    jar -xvf backup-20091130-2121.zip
  2. After unpacking the backup file, the ready-to-use repository instance is available. Start the repository. The repository is in the state it was in when the backup was created.

    Note

    • Large ZIP (> 4GB) files created with the backup mechanism may fail to open with some ZIP tools. In that case, use the Java jar tool as documented.
    • Using this method, you can restore only the entire repository. If you need to restore a single node or tree, you have to restore the entire backup in a separate location, and then copy the node or tree over to your current repository.

    Note

    On Unix systems, the "x" bit of the following scripts is not preserved by the zip file:

    • server/start
    • server/stop
    • server/serverctl

    You have to adjust these manually after restoring the backup.
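The manual adjustment can be scripted. The sketch below sets the execution bit on the three scripts listed above; the function name is invented, and it assumes the script paths are relative to the directory passed as the argument.

```shell
# restore_exec_bits RESTORED_DIR
# Makes the start/stop scripts of a restored instance executable again.
restore_exec_bits() (
    cd "$1" || exit 1
    for script in server/start server/stop server/serverctl; do
        if [ -f "$script" ]; then
            chmod +x "$script"
        else
            # A missing script is reported but does not abort the loop.
            echo "skipping missing ${script}" >&2
        fi
    done
)
```

Run it once after unpacking, passing the directory that contains the server folder of the restored instance.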

Backing up a Large Repository

As your repository grows the size can start to impact the length of time required to make a backup. This in turn can impact either downtime or performance of the application.

There are various options available for you to consider when making backups of a large repository:

Snapshot Backup

A snapshot backup involves taking a read-only copy of a storage device at a given moment. Snapshots are designed to be instantaneous, or as close to it as possible.

To make a snapshot backup of your application (for example, CRX or CQ) you need to:

  • stop the application
  • make a snapshot backup
  • start the application

As the snapshot backup usually takes only a few seconds, the entire downtime is less than a few minutes; and if you are running a cluster, you only need to stop and back up one cluster node, so there is no downtime.
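The stop, snapshot, start sequence could be wrapped as sketched below. The snapshot command itself depends on your storage system and is passed in as arguments. The sketch assumes that the server/stop and server/start scripts sit under the directory you pass in and that server/stop returns only after the application has shut down; neither assumption is confirmed here, and the function name is invented.

```shell
# snapshot_with_downtime CRX_HOME SNAPSHOT_COMMAND [ARGS...]
snapshot_with_downtime() (
    crx_home=$1
    shift
    # Stop the application; give up if it cannot be stopped.
    "$crx_home/server/stop" || exit 1
    status=0
    # Run the storage-specific snapshot command supplied by the caller.
    "$@" || status=$?
    # Restart even if the snapshot failed, so the outage stays short.
    "$crx_home/server/start" || exit 1
    exit "$status"
)
```

The function returns the exit status of the snapshot command, so a calling script can tell whether the snapshot succeeded even though the application was restarted either way.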

Online Backup to a Target Directory

Since CRX 2.1, the online backup can write to a target directory instead of a zip file. With this you can achieve both full and incremental backups.

Once the online backup is finished, you can back up this target directory using either a snapshot or a classical incremental backup tool.

Online Backup with Time Delay

CRX 2.1 also allows you to configure a time delay when performing online backups to minimize the impact the backup has on the performance of CRX (or CQ).

Separate Data Store

Storing your data store outside the repository allows it to be backed up separately. It also allows you to take incremental backups of the data store which will be quicker than a full backup (though sporadic full backups are recommended to provide a base point should a restore ever be required).

The data store directory can be backed up at runtime, after the repository; data store garbage collection must not be run until the backup is finished.

The Mechanics of Online Backup

The online backup consists of a series of internal actions that ensure the integrity of the data being backed up and of the backup files being created. It uses the following algorithm:

  1. When creating a zip file, the first step is to create a temporary directory. This directory starts with backup. and ends with .temp.
  2. All files are copied from the source directory to the target directory (or temporary directory when creating a zip file). 
    • The progress indicator for this sub-process runs from 0% to 70% when creating a zip file, or from 0% to 100% if no zip file is created.
  3. If no zip file is being created, then the specially named file backupInProgress.txt is created in the target directory (this marker file is deleted when the backup is complete).
  4. If no zip file is being created, then old files in the target directory are deleted. Old files are files that do not exist in the source directory.
  5. The files are copied to the target directory in four stages.
    1. In the first copy stage (progress indicator 0% - 63% when creating a zip file or 0% - 90% if no zip file is created), all files are copied concurrently while the repository is running normally.
    2. In the second copy stage (progress indicator 63% - 66.5% when creating a zip file or 90% - 95% if no zip file is created) only files that were created or modified in the source directory since the first copy stage was started are copied. Depending on the activity of the repository, this might range from no files at all, up to a significant number of files (because the first file copy stage usually takes a lot of time).
    3. In the third copy stage (progress indicator 66.5% - 68.6% when creating a zip file or 95% - 98% if no zip file is created) only files that were created or modified in the source directory since the second copy stage was started are copied. Depending on the activity of the repository, there might be no files to copy, or a very small number of files (because the second file copy stage is usually fast).
    4. File copy stages one to three all run concurrently while the repository is running. The fourth and last file copy stage (progress indicator 68.6% - 70% when creating a zip file or 98% - 100% if no zip file is created) first locks repository write operations (write operations are paused; they do not throw an exception, but wait). Only files that were created or modified in the source directory since the third copy stage was started are copied. Depending on the activity of the repository, there might be no files to copy, or a very small number of files (because the third file copy stage is usually very fast). After that, repository access continues.
  6. If a zip file is created, this is created now from the temporary directory. Progress indicator 70% - 100%. The temporary directory is then deleted.
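The two sets of progress figures above are consistent with a simple relationship: when a zip file is created, the file copy accounts for the first 70% of the progress bar, so each stage boundary is the directory-backup value multiplied by 0.7. The helper below only illustrates that arithmetic; it is inferred from the numbers in the list, not taken from the CRX implementation, and the function name is invented.

```shell
# zip_progress_from_copy_progress PERCENT
# Maps a file copy progress value (0-100) to the zip-backup progress scale.
zip_progress_from_copy_progress() {
    # Multiply by 7/10 because file copying fills 70% of the zip progress bar.
    awk -v pct="$1" 'BEGIN { printf "%.1f\n", pct * 7 / 10 }'
}
```

For the stage boundaries 90, 95, 98, and 100 this prints 63.0, 66.5, 68.6, and 70.0, matching the zip file percentages listed above.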

Package Backup

To back up and restore content, you can use either the Package Manager or the Content Zipper.

For details on the features and tradeoffs of each of these individual content package formats, see Importing and Exporting Content in the User Guide.

Scope of Backup

When you back up nodes using either the Package Manager or the Content Zipper, CRX saves the following information: 

  • The CRX repository content below the tree you have selected.
  • The Node type definitions that are used for the content you back up.
  • The Namespace definitions that are used for the content you back up.

When backing up, CRX loses the following information: 

  • The version history.

Creating a Backup using the Content Zipper

To create the backup using the Content Zipper:

  1. Lock the top node of the tree you want to back up or a parent node of that node.
  2. In the Content Zipper, type the path of the tree you want to back up. As the format, select CRX package.
  3. Click Submit Query. Your Web browser now offers the package file as a download. Save the download on your computer.
  4. Unlock the node again.

The file you have downloaded contains the current version of the tree you have exported, including the node type and namespace definitions, but without the version history.

Note

The CRX package file is Day’s proprietary file format for CRX node information. It is optimized for a small file size and optimal performance. If you prefer a standard XML file for further processing, click XML sys view in step 2. If you use the file only for archiving, use the CRX package format. Importing and Exporting Content describes the various file formats and their uses.

Restoring a Backup using the Content Loader

To restore the backup using the Content Loader:

  1. Lock the node you want to restore. You can still modify the node and the nodes below it, but others cannot.
  2. In the Content Loader, load the CRX package that you want to restore.
  3. Unlock the node again.

Note

You cannot restore the versioning history using the previous steps. CRX allows you to save the version history, but it does not currently support restoring it.