High Availability

Microsoft Identity Lifecycle Manager (ILM) has no built-in support for high availability. The product is not compatible with the native clustering technologies available on the Microsoft Server platform, to which the product is tied. Nor does the product offer facilities for the parallel operation of multiple redundant instances of the ILM service.

However, highly available solutions are relatively easy to build with ILM because of the architecture of the product.

By design, Microsoft ILM stores nearly all of its state directly within its Microsoft SQL Server application database. This state includes the objects and attributes within the Metaverse and in each Connector Space (CS); the configuration of each Management Agent, and of ILM itself; and the credentials required to connect to each connected data source and target.

The only state not stored within the application database is a small set of encryption keys used to protect the credentials described above. These encryption keys are relatively static, changing only when new data sources and targets are introduced.

In contrast with Microsoft ILM, high-availability approaches are readily available for Microsoft SQL Server. The service is supported by both Microsoft and third-party clustering solutions, and it is easy to distribute redundant instances of the SQL Server service using mirroring or replication technologies. High-availability solutions for SQL Server are well documented on the Microsoft site and associated partner sites, and are not covered further here.

A common two-step approach to providing high availability for Microsoft ILM is first to provide a highly available SQL Server as the application database platform, and then to introduce multiple warm-standby ILM servers.

At any time, only one of the ILM servers is active, and the others are dormant. At configuration time, the set of encryption keys is transferred from the active ILM server to the dormant servers: this is a manual operation performed by an administrator.

If the active ILM server fails, one of the standby servers (either at the same site or at a remote site) can be activated. This failover process can be automated, and can complete within a matter of seconds.

No data will be lost as a result of the failover. The reasons for this are a little involved but are based on the way in which ILM caches all data from both data sources and targets in its own application database; commits each individual record operation immediately; and performs an effective reconciliation of all records within a target.

The diagram above shows a typical architecture of a highly available identity synchronisation solution for a large enterprise. Here, the application database is protected by local clustering and by replication to a secondary data centre. Each data centre hosts two ILM servers. Only one of the four ILM servers is active at any one time. When the active ILM server fails, the second ILM server in the same data centre takes over. Should there be a catastrophic failure of the SQL cluster, or of the entire data centre, the first ILM instance in the second data centre will take over.
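The takeover order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IlmServer class, the choose_successor function, the server names and the site labels are all hypothetical, and nothing here calls a real ILM or SQL Server interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IlmServer:
    name: str          # hypothetical host name
    data_centre: str   # hypothetical site label
    healthy: bool = True


def choose_successor(servers: List[IlmServer], failed: IlmServer) -> Optional[IlmServer]:
    """Pick the standby that takes over from a failed active server."""
    standbys = [s for s in servers if s.healthy and s is not failed]
    # Prefer a standby in the same data centre as the failed server.
    for server in standbys:
        if server.data_centre == failed.data_centre:
            return server
    # Otherwise fall back to the first healthy server at another site.
    return standbys[0] if standbys else None


servers = [
    IlmServer("ilm-a1", "primary"),
    IlmServer("ilm-a2", "primary"),
    IlmServer("ilm-b1", "secondary"),
    IlmServer("ilm-b2", "secondary"),
]

# A single server failure is handled inside the same data centre.
servers[0].healthy = False
print(choose_successor(servers, servers[0]).name)  # ilm-a2

# Losing the rest of the primary data centre moves the service to the secondary one.
servers[1].healthy = False
print(choose_successor(servers, servers[1]).name)  # ilm-b1
```

In a real deployment the health flag would come from whatever monitoring is already in place, and activating the chosen standby would be a separate step.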

This solution separates the Microsoft ILM service from the SQL Server service (on which it is dependent) onto different machines. Even on a fast fibre Local Area Network (LAN) this reduces ILM performance: an impact of up to 15% has been demonstrated.

To squeeze the maximum performance out of ILM it is necessary to co-locate the SQL Server application database on the same server as the ILM service. A high-availability approach which maximises ILM performance is illustrated above. Here, each ILM server hosts its own local SQL Server application database, and SQL Server replication is used to keep the databases synchronised. The failover process is unchanged.

A further approach to high availability relies on the way in which ILM performs an effective reconciliation of all records within a target. An example will help to clarify this. Imagine ILM is employed to move telephone numbers from a source directory to a Human Resources (HR) database, and that users are free to edit their telephone numbers both in the directory and in the HR system. If a user changes their telephone number only in the directory, ILM will diligently copy the number across to the HR system, overwriting any prevailing number. If, on the other hand, the user updates their number in both the directory and the HR system, ILM will recognise that the synchronisation of the number has already been achieved, and will perform no further action.
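The telephone number example can be modelled with a small Python sketch. It is a simplification for illustration only: the reconcile function, the dictionaries standing in for the directory and the HR system, and the sample numbers are all hypothetical, and the sketch does not reflect how ILM is implemented internally.

```python
from typing import Dict


def reconcile(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, str]:
    """Return only the writes needed to bring the target into line with the source."""
    return {
        user: number
        for user, number in source.items()
        if target.get(user) != number
    }


# Hypothetical data: alice updated her number in both systems, bob only in the directory.
directory = {"alice": "555-0100", "bob": "555-0199"}
hr_system = {"alice": "555-0100", "bob": "555-0142"}

pending = reconcile(directory, hr_system)
print(pending)  # {'bob': '555-0199'}

# Applying the pending writes overwrites bob's old number in the HR system.
hr_system.update(pending)

# A second pass finds nothing left to do.
print(reconcile(directory, hr_system))  # {}
```

The point of the sketch is that the decision to write depends only on the current contents of source and target, not on any record of who made the change.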

In certain circumstances, and with careful design of synchronisation rules, it is possible to stand up multiple ILM instances, all implementing the synchronisation rules and actively reading from and writing to the same data sources and targets. In effect, the individual instances of the ILM service cooperate in the maintenance of the targets: the first ILM service to make a change wins, and the others recognise that the change has been made and that they need to perform no further action.
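The "first instance wins" behaviour can be illustrated with a minimal Python model. The SyncInstance class and the sample data are hypothetical and greatly simplified: real instances would run concurrently and apply much richer synchronisation rules.

```python
from typing import Dict


class SyncInstance:
    """Hypothetical stand-in for one active synchronisation instance."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.writes = 0  # number of changes this instance has written to the target

    def run(self, source: Dict[str, str], target: Dict[str, str]) -> None:
        # Write only where the target disagrees with the source.
        for key, value in source.items():
            if target.get(key) != value:
                target[key] = value
                self.writes += 1


directory = {"alice": "555-0100", "bob": "555-0199"}
hr_system = {"alice": "555-0100", "bob": "555-0142"}

first = SyncInstance("ilm-1")
second = SyncInstance("ilm-2")

first.run(directory, hr_system)   # makes the change
second.run(directory, hr_system)  # sees the change is already made

print(first.writes, second.writes)  # 1 0
```

Both instances still read every record from the source and the target on each run.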

In practice though, this approach has many disadvantages: it places many restrictions on the implementation of synchronisation rules; it makes bi-directional synchronisation difficult; it pollutes ILM logs making them difficult to interpret; it significantly increases the load on all directly connected data sources and targets; and it can increase latency.

However, there is an elegant approach to high-availability which makes use of ILM reconciliation, and suffers from only one of these disadvantages.

Again, each ILM server hosts its own local SQL Server application database, but no replication is used to keep the databases synchronised. Instead, a number of standby ILM servers operate in a read-only mode, importing data from both data sources and data targets, and implementing all synchronisation rules.

If the active ILM server fails, one of the standby servers (at the primary or secondary site) is made fully active (both reading and writing). Since the standby server has been effectively shadowing the operation of the active server to that point, it will contain the same state, and be able to take over immediately.
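The read-only standby and its promotion can be sketched as follows. This is a toy model under stated assumptions: the IlmInstance class, its read_only flag and the promote method are hypothetical names chosen for illustration, and the dictionaries stand in for each server's own application database and for the connected systems.

```python
from typing import Dict


class IlmInstance:
    """Hypothetical model of an ILM server with its own local database."""

    def __init__(self, name: str, read_only: bool) -> None:
        self.name = name
        self.read_only = read_only
        self.local_state: Dict[str, Dict[str, str]] = {}  # stands in for the local database
        self.pending: Dict[str, str] = {}                 # changes computed but not yet written

    def run_cycle(self, source: Dict[str, str], target: Dict[str, str]) -> int:
        """Import from source and target, apply the rules, and export if active."""
        self.local_state = {"source": dict(source), "target": dict(target)}
        self.pending = {k: v for k, v in source.items() if target.get(k) != v}
        if self.read_only:
            return 0  # a standby computes the changes but never writes them
        target.update(self.pending)
        written = len(self.pending)
        self.pending = {}
        return written

    def promote(self) -> None:
        """Make a standby fully active (reading and writing)."""
        self.read_only = False


directory = {"bob": "555-0199"}
hr_system = {"bob": "555-0142"}

standby = IlmInstance("ilm-standby", read_only=True)

# While the active server is running, the standby only shadows it.
print(standby.run_cycle(directory, hr_system))  # 0
print(standby.pending)                          # {'bob': '555-0199'}

# Suppose the active server fails before exporting: promote the standby.
standby.promote()
print(standby.run_cycle(directory, hr_system))  # 1
print(hr_system == directory)                   # True
```

Because the standby has already imported the same data and applied the same rules, promotion amounts to flipping one switch in this model. No database copy or key transfer is involved.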

This last high-availability solution has the advantage that encryption keys do not need to be copied from server to server (since each server has a distinct application database and can maintain its own encryption keys). However, it must be noted that this solution still places a higher load on both data sources and data targets (since both active and standby ILM servers will need to import data from them).


