Active Directory Maintenance and Disaster Recovery
This paper covers a number of IT issues from the standpoint of a junior network administrator working for a technology firm located in Silicon Valley, California. All technology decisions are heavily influenced by network security considerations, since highly sensitive intellectual property travels across the company’s network. The assumption is that the company operates an AD (Active Directory) infrastructure based on Windows Server 2008. Apart from security concerns, the design must ensure that certain administrators still have access to the files necessary to perform their duties even if the network goes down. At the same time, limits must be imposed on the network drive space used to save files.
GPOs (Group Policy Objects) covering all the main concerns mentioned above are already in place. However, additional GPOs that implement the security plans should be configured and administered within the AD environment. Consequently, Group Policy tools will be used to create User and Computer environments, as well as to perform centralized software installations in the AD environment. Finally, measures are planned for implementation, maintenance, troubleshooting, and disaster recovery.
Several hundred network users require new software to be installed on their PCs. Installing it manually would take a considerable amount of time and could strain the budget. The action plan that follows is based on Group Policy, which increases the efficiency of software installation and reduces the associated costs, and it is structured according to the System Development Life Cycle (SDLC) approach.
The SDLC, also known as the System Design Life Cycle, is essentially a model of IT infrastructure development that reduces both operational and financial risks through careful planning of key activities, followed by their execution, control, and documentation. The model has five phases that cover the whole system cycle, from the system strategy and the objective project constraints to maintenance and support activities.
The first phase is concerned with the analysis of Business Requirement Definitions (BRD) and business strategy. Once those are clarified, the legacy systems also have to be examined so that the new implementation preserves system continuity and supports the legacy interfaces and architectural peculiarities. Users’ requests for the new system are also taken into account. Thus, the first SDLC phase covers the overall system strategy: assessing information needs, developing a strategic system plan, and creating an action plan (Hall, 2010).
In the case of a few hundred new software installations, the strategy analysis phase is rather straightforward. A manual routine that handles all PCs one by one is out of the question because it is time-consuming, error-prone, and generally ineffective. The strategy should therefore require that the installation run simultaneously on all users’ PCs, preferably during non-business hours. This is the main strategic requirement for the new Group Policy, which will initiate and handle the new software installation.
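The “non-business hours” requirement is easy to get subtly wrong when a schedule is worked out by hand. A minimal Python sketch, assuming hypothetical business hours of 08:00 to 18:00, Monday to Friday (the paper does not define them), could decide whether an installation may start now or must wait for the end of the working day:

```python
from datetime import datetime, time

# Hypothetical business hours; the actual hours are a company policy decision.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def is_non_business(moment: datetime) -> bool:
    """Return True if `moment` lies outside the assumed business hours."""
    if moment.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (BUSINESS_START <= moment.time() < BUSINESS_END)


def next_install_start(now: datetime) -> datetime:
    """Earliest moment, at or after `now`, when the installation may begin."""
    if is_non_business(now):
        return now
    # During business hours on a weekday: wait until the end of the working day.
    return now.replace(hour=BUSINESS_END.hour, minute=BUSINESS_END.minute,
                       second=0, microsecond=0)


print(next_install_start(datetime(2024, 3, 6, 10, 30)))  # 2024-03-06 18:00:00
```

In practice the actual start time would be enforced by whatever mechanism triggers the deployment; the sketch only makes the window rule explicit.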
The second SDLC phase follows the first one and represents the actual project initiation. It is concerned with design conceptualization, as well as with system analysis, evaluation, and selection based on the high-priority proposals made during the first phase. During this phase, the new software installation is considered with regard to its possible impact on other systems. Continued compliance with corporate rules and policies is another important consideration in the system analysis. Finally, the licensing of the new software must also be evaluated during this phase. As a result, the Group Policy that is about to be created acquires an important set of requirements and limitations. The second SDLC phase ends with the list of Computer objects that the new Group Policy will affect, the exact specification of the software to be installed, and the timeframe allocated for design and implementation.
The actual system design takes place during the third phase of the SDLC. Usually, there are a number of development activities that incorporate all detailed features, business rules, and process diagrams into the new system. In the case of a new software deployment, this phase is not overly complicated. The Group Policy is created, listing all network objects that the upcoming activity will affect. The installation procedure is then defined, specifying either an image of already installed software to be distributed to all PCs or the source file(s) from which all individual installations will be carried out. The first scenario would be preferable, as it significantly simplifies the procedure. However, such an approach requires a highly homogeneous network, with identical or similar PC configurations and roles. Thus, the decision depends on the exact IT infrastructure and other preferences related to business needs.
In the SDLC model, the actual system development is followed by the integration and testing phase. During this phase, the new system is connected to the rest of the IT environment it is supposed to interact with, and a set of test scenarios is executed. It is good practice to deploy the new system inside a so-called test environment, which also contains live replicas of all other systems populated with real data. The test cases should reflect all aspects of the actual system’s use, covering a variety of scenarios. It is particularly important to test all exception-handling routines, as they often cause system disruptions. Ideally, testing in the test environment should be followed by a UAT (User Acceptance Test) and then by a pilot run on the live system.
The case of new software installations is again a relatively simple one, so the testing phase can be restricted to modeling the installation process on one or two PCs. When the test PCs are switched on and authenticate to the AD domain, the new Group Policy identifies them as eligible for the new software installation. The image or source files are then copied to a temporary system folder on the PCs, and the actual installation is triggered according to the desired schedule. Depending on the policy, the installation may proceed in the background or be scheduled for non-business hours. It would be a good idea to imitate possible interruptions that real users might cause during the installation process to verify the new Group Policy’s reliability.
After thorough testing, the new system can be deployed in the production environment and put to actual use. This opens the fifth SDLC phase, which is concerned with the maintenance and support of the system. This phase covers all changes, add-ons, relocations, and upgrades made to the system. For the project itself to be effective, this phase should be the longest one, so that the system’s useful life covers all associated business spending.
The new software installation project does not require any specific maintenance routines once the deployment is performed. Usual support and troubleshooting routines will suffice for the maintenance of the new software as well as for the rest of Active Directory.
Approximately 50 administrators are affected by a GPO that restricts their access to a software tool they need to do their jobs. In addition, this software tool has to be upgraded, and its hardware requirements have increased significantly. The software can be installed immediately on 35 administrators’ computers, as they meet the software specifications. The other PCs need a hardware upgrade before this tool is installed.
This scenario addresses the measures required for all administrators to start using the software tool immediately. Additionally, the software upgrade for 35 PCs that meet the software requirements has to be performed at once.
There are several possible approaches to solving the issue. The first would be to go to each administrator’s profile and explicitly enable the use of the specific software tool. However, this approach ignores the variety of GPO management tools available to the AD administrator. Moreover, it is time-consuming and, being entirely manual, prone to errors.
Another approach might be to disable the problematic GPO altogether, which would give the administrators an immediate and simple solution. However, the initial premise is that network security is an essential part of the company’s policy. Disabling the GPO without properly evaluating the consequences and the potential impact on other network objects could create security risks that outweigh the immediate benefit.
A compromise is possible. The assumption is that all administrators belong to a specific group appropriate for their needs; the “Domain Admins” group could serve this purpose. As the misconfigured GPO blocks access for the whole group, it would be relatively safe to exclude this group from the GPO for the time being. After proper examination, the corrected Group Policy will be applied to the Domain Admins group again, restoring access to the software tool. Meanwhile, the group members will have the means to do their jobs but may lack some other resources provided by the GPO in question.
To exclude the Domain Admins group from the GPO, the Group Policy Management Console (GPMC) should be launched. On the domain controller, this can be done from the Server Manager application, where GPMC is listed under the Features node. All GPOs can be accessed by expanding the Forest node of the GPMC (Morimoto et al., 2008). Once the GPO that has to be changed is selected, its security settings allow certain user groups to be excluded, as shown in Diagram 1.
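In GPMC, such an exclusion is normally made on the GPO’s Delegation tab (Advanced button) by adding the group and setting its Apply group policy permission to Deny. A GPO is processed for a user or computer only if that principal holds both Read and Apply group policy, and an explicit Deny takes precedence over Allow. The following Python model is a simplified illustration of that rule, not an interface to Active Directory; the GPO name and the group memberships are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Gpo:
    name: str
    # Groups granted Read + Apply group policy (the Security Filtering list).
    allow_apply: Set[str] = field(default_factory=set)
    # Groups explicitly denied the Apply group policy permission.
    deny_apply: Set[str] = field(default_factory=set)


def gpo_applies(gpo: Gpo, member_of: Set[str]) -> bool:
    """Simplified security filtering: any Deny match overrides every Allow."""
    if member_of & gpo.deny_apply:
        return False
    return bool(member_of & gpo.allow_apply)


# Hypothetical restrictive GPO, filtered to Authenticated Users by default.
restrictive = Gpo("Restrict Admin Tools", allow_apply={"Authenticated Users"})
admin = {"Authenticated Users", "Domain Users", "Domain Admins"}
clerk = {"Authenticated Users", "Domain Users"}

print(gpo_applies(restrictive, admin))  # True: the tool is blocked for admins
restrictive.deny_apply.add("Domain Admins")
print(gpo_applies(restrictive, admin))  # False: admins are now excluded
print(gpo_applies(restrictive, clerk))  # True: everyone else is still covered
```

The model deliberately leaves out user versus computer processing, loopback mode, and WMI filters; it only shows why a single Deny entry is enough to carve one group out of an otherwise broad GPO.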
Once the Domain Admins group is no longer subject to the GPO, the software tool becomes available to the administrators after the next Group Policy refresh (for example, at the next logon or after running gpupdate /force).
The software upgrade is similar to the procedure described in the previous scenario. First, all 35 PCs eligible for the upgrade under the criteria in the software specification should be identified. Those PCs must then be brought under a new GPO responsible for the upgrade.
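Identifying the eligible machines amounts to comparing each PC’s hardware inventory with the vendor’s minimum requirements. In this minimal sketch, the threshold values, machine names, and inventory records are all hypothetical placeholders for figures taken from the real software specification and asset database:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical minimum requirements; replace with the vendor's published figures.
MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 20
MIN_CPU_CORES = 4


@dataclass
class Workstation:
    name: str
    ram_gb: int
    free_disk_gb: int
    cpu_cores: int


def meets_spec(pc: Workstation) -> bool:
    """True if the PC satisfies every (hypothetical) minimum requirement."""
    return (pc.ram_gb >= MIN_RAM_GB
            and pc.free_disk_gb >= MIN_FREE_DISK_GB
            and pc.cpu_cores >= MIN_CPU_CORES)


def split_by_eligibility(inventory: List[Workstation]) -> Tuple[List[str], List[str]]:
    """Return (names ready for the upgrade GPO, names that need new hardware first)."""
    ready = [pc.name for pc in inventory if meets_spec(pc)]
    needs_hardware = [pc.name for pc in inventory if not meets_spec(pc)]
    return ready, needs_hardware


inventory = [
    Workstation("ADM-01", ram_gb=16, free_disk_gb=120, cpu_cores=8),
    Workstation("ADM-02", ram_gb=4, free_disk_gb=80, cpu_cores=4),
    Workstation("ADM-03", ram_gb=8, free_disk_gb=20, cpu_cores=4),
]
print(split_by_eligibility(inventory))  # (['ADM-01', 'ADM-03'], ['ADM-02'])
```

In the scenario above, the first list would correspond to the 35 computers that receive the upgrade GPO, and the second to the PCs scheduled for a hardware upgrade.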
To define the software package, the new GPO is opened for editing from GPMC, which launches the Group Policy Management Editor (the Windows Server 2008 successor to the Group Policy Object Editor). The path “Computer Configuration > Policies > Software Settings > Software Installation” leads to the submenu for creating a new software package. The full network (UNC) path to the shared folder containing the Windows Installer package should be specified, so that client computers can reach it, and, if no advanced settings are necessary, the default deployment parameters can be applied. These actions result in a screen similar to Diagram 2.
There are many options for applying the newly created GPO to exactly the PCs that have to be upgraded. In some cases, it might even be preferable to create a separate OU (Organizational Unit) containing just those computers and link the Group Policy to that OU (Tomsho, 2009). In any case, the next time those 35 PCs restart, the new GPO will enforce the software installation. The remaining PCs can then receive their physical hardware upgrades.
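A GPO linked to an OU applies to the computer objects in that OU and, through inheritance, to those in its child OUs. The sketch below models only that containment rule by comparing distinguished names. It ignores Block Inheritance, Enforced links, and security filtering, naively splits DNs on commas (escaped commas are not handled), and uses a hypothetical OU path:

```python
from typing import List


def rdns(dn: str) -> List[str]:
    """Split a DN into lower-cased RDNs (naive: no escaped commas)."""
    return [part.strip().lower() for part in dn.split(",")]


def in_scope(object_dn: str, linked_ou_dn: str) -> bool:
    """True if the object lives in the linked OU or in any OU beneath it."""
    obj, ou = rdns(object_dn), rdns(linked_ou_dn)
    return len(obj) > len(ou) and obj[-len(ou):] == ou


# Hypothetical OU created just for the upgrade-ready computers.
UPGRADE_OU = "OU=UpgradeReady,OU=Workstations,DC=example,DC=com"

print(in_scope("CN=ADM-01,OU=UpgradeReady,OU=Workstations,DC=example,DC=com", UPGRADE_OU))  # True
print(in_scope("CN=ADM-40,OU=Workstations,DC=example,DC=com", UPGRADE_OU))  # False
```

Comparing whole RDN components, rather than testing the raw string suffix, avoids accidental matches between OUs whose names merely end with the same characters.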
To prevent the loss of crucial data, a disaster recovery plan has to be developed and documented. In addition, a number of maintenance routines and monitoring procedures help to manage the network proactively, before problems occur. This scenario addresses such routines in detail, as well as the procedures necessary to restore AD in a given domain configuration.
The most comprehensive solution for both proactive and reactive AD monitoring and disaster prevention is Microsoft’s System Center Monitoring Pack for Active Directory (ADMP). With this tool, events in the Application, System, and Service logs can be monitored and traced back to various Active Directory components and subsystems. Should any critical issue with a possible impact on overall AD health arise, an immediate alert is generated for the system administrator.
The ADMP monitors the state of the whole domain infrastructure both from the domain controllers’ perspective and from that of the clients that use various Active Directory resources. If the AD infrastructure is relatively simple and follows a generic “good practice” configuration, the predefined set of monitoring scripts and processing rules incorporated in the ADMP can monitor the availability and performance of domain controllers. However, even when domain controllers are operating correctly, client computers in the AD environment may still experience connectivity or other service problems. To identify and eliminate such issues, the ADMP includes the Active Directory Client Management Pack. This tool exercises mechanisms that client machines use frequently, for instance, small Lightweight Directory Access Protocol (LDAP) transactions and LDAP pings.
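The client-side idea can be illustrated with a much cruder check. The Python sketch below only measures whether a domain controller accepts TCP connections on the standard LDAP port 389 and how long the handshake takes; it does not perform a real LDAP query or an LDAP ping, and it is in no way a substitute for the Client Management Pack:

```python
import socket
import time
from typing import Optional


def tcp_probe(host: str, port: int = 389, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the port is unreachable."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:  # covers name-resolution failures, refused connections, and timeouts
        return None
```

A call such as `tcp_probe("dc01.example.com")`, where the host name is hypothetical, returns a latency figure that a scheduler could log and compare against a threshold; `None` would be treated as an alert condition.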
Both monitoring packs are supported on Windows Server platforms starting from the 2003 release, in both virtualized and clustered server environments, and on both domain controllers and client domain member computers. These packs will be the first and essential choice for monitoring the whole AD infrastructure. The monitoring procedure implies 24/7 operation of both the server and client packs on at least two domain controllers and one AD client PC. At least two system administrators must examine all reporting logs daily, so that preventive actions can be planned in time. All alerts generated by the monitoring system must be categorized by the severity of the underlying issue and handled accordingly.
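The severity-based handling can be made explicit as a small routing table. In this sketch the three severity names are similar to the levels used for Operations Manager alerts, while the actions and the sample alerts are hypothetical policy choices:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Lower number = handled first.
SEVERITY_ORDER = {"Critical": 0, "Warning": 1, "Information": 2}

# Hypothetical handling policy for each severity level.
ACTIONS = {
    "Critical": "notify on-call administrator immediately",
    "Warning": "add to daily log review",
    "Information": "archive only",
}


@dataclass
class Alert:
    source: str
    severity: str
    summary: str


def triage(alerts: List[Alert]) -> Dict[str, List[str]]:
    """Group alert sources by required action, most severe alerts first.

    Raises KeyError for a severity that the policy does not define.
    """
    queues: Dict[str, List[str]] = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: SEVERITY_ORDER[a.severity]):
        queues[ACTIONS[alert.severity]].append(alert.source)
    return dict(queues)


# Hypothetical sample alerts.
sample = [
    Alert("DC02", "Warning", "replication latency above threshold"),
    Alert("DC01", "Critical", "directory service not responding"),
    Alert("CLIENT-07", "Information", "client check succeeded"),
]
for action, sources in triage(sample).items():
    print(f"{action}: {', '.join(sources)}")
```

Failing loudly on an unknown severity is a deliberate choice: an alert that matches no rule should surface to the daily reviewers rather than be silently dropped.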
Another absolutely necessary maintenance routine is regular system backup. Of course, many built-in features allow AD to correct itself after minor malfunctions. Domain controllers regularly replicate with each other, sharing all data that may affect Active Directory operability; however, replication also propagates accidental deletions to every domain controller. Any hardware failure in the disk subsystem or human error by the system administrator may therefore crash the whole system. In that case, the only effective measure is to run the recovery procedure from the most recent backup available.
Backup procedures usually run during non-business hours, invoked by a scheduler. Backups are stored on disks or tapes for the period defined by the backup policy. In the Silicon Valley company’s scenario, backups are performed every night at 2 AM, and the same disks can be overwritten after one week. As the company handles highly sensitive data, backup copies must be kept off-site rather than on the business premises. If the company has geographically distributed sites with sufficient network bandwidth between them, it might also be useful to keep backup copies at a remote site.
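The nightly 2 AM schedule with one-week reuse implies a rotation of seven media sets, one per weekday. A minimal sketch of that arithmetic, assuming the hypothetical set labels shown and naive local times (daylight-saving transitions are ignored); the actual labelling and scheduling belong to the chosen backup product:

```python
from datetime import datetime, timedelta

BACKUP_HOUR = 2      # nightly run at 2 AM
RETENTION_DAYS = 7   # media may be overwritten after one week
WEEKDAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


def next_backup(now: datetime) -> datetime:
    """Next scheduled run strictly after `now`."""
    run = now.replace(hour=BACKUP_HOUR, minute=0, second=0, microsecond=0)
    if run <= now:
        run += timedelta(days=1)
    return run


def media_set(run: datetime) -> str:
    """Hypothetical label of the media set written by this run."""
    return f"SET-{WEEKDAYS[run.weekday()]}"


def reusable_after(run: datetime) -> datetime:
    """Earliest moment the media written by `run` may be overwritten."""
    return run + timedelta(days=RETENTION_DAYS)


run = next_backup(datetime(2024, 3, 6, 14, 0))  # Wednesday afternoon
print(run, media_set(run), reusable_after(run))
# 2024-03-07 02:00:00 SET-THU 2024-03-14 02:00:00
```

With seven sets, each run overwrites the set written exactly one week earlier, which matches the one-week reuse rule.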
The built-in Microsoft utility, Windows Server Backup, will be used to create and store copies of the AD infrastructure. However, it is not best practice to rely solely on features of the same system that has to be protected from data loss. A third-party solution is preferable, especially since the Active Directory environment is not the only one that needs protection.
There are many vendors offering backup solutions. One of the most recognized is Acronis, which provides a number of products for backup, recovery, and data loss prevention in heterogeneous networks. The choice for the Silicon Valley company would be Acronis Backup & Recovery 11.5 Advanced Platform, which offers a unified solution for both backup/recovery and data migration. The latter capability is not obviously relevant to AD; however, as the tool serves not only the domain infrastructure but also the main business applications, it is useful for handling databases. It is also worth noting that recovery is possible to a storage or server type different from the one the original backup was taken from.
It is practical to consider two examples of the recovery routine. The first concerns the failure of a single domain controller; the second addresses the loss of an AD object, namely the Domain Users group, which contains user accounts.
When a domain controller fails, the system administrator is responsible for determining whether the system can be restored on the same hardware. For instance, if the server’s hardware cannot detect the disk subsystem, a spare server must be used. In either case, the server needs to boot from the Acronis boot media, and the most recent backup image of the system must be fed to the recovery application. The server reboots upon completion of the recovery, and AD operability is restored.
Recovering the lost AD object relies on a backup copy created with Windows Server Backup and is not as easy as with Acronis. First, the domain controller must be started in Directory Services Restore Mode (DSRM). Then special scripts using the Reanimate Tombstone API must be written and applied, which requires specific skills and knowledge. Eventually, the Domain Users group is recovered (Richards et al., 2008).
Finally, there is a third maintenance and monitoring routine that applies to any network and any IT infrastructure. The network core comprises a number of routers and switches that are prone to occasional breakdowns. The impact of such failures on AD and other parts of the IT environment can be disastrous. Thus, 24/7 monitoring of the network devices is essential to protect the data flow.
There are many commercial network-monitoring tools, such as HP OpenView and CiscoWorks. However, open-source solutions under the GPL are often preferable, especially with regard to the company’s budget. Most modern network devices support the old but highly useful and reliable syslog protocol and can send all console messages to the monitoring station by means of that protocol. The station should run a syslog daemon (syslogd) to receive those messages. Such solutions usually include a parsing component that raises alarms of different severity levels when network problems occur. An IT environment that combines all the described measures for data loss prevention and business continuity can operate AD in the most effective and secure way.
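The parsing step can be illustrated with the PRI field that begins a syslog message: the number in angle brackets encodes 8 × facility + severity, with severities running from 0 (Emergency) to 7 (Debug) as defined in RFC 3164 and RFC 5424. In the minimal sketch below, the mapping of severities onto three alarm levels is a hypothetical policy, and the sample message is made up:

```python
import re
from typing import Optional, Tuple

SEVERITY_NAMES = ("Emergency", "Alert", "Critical", "Error",
                  "Warning", "Notice", "Informational", "Debug")
PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_pri(message: str) -> Optional[Tuple[int, int]]:
    """Return (facility, severity) from the leading <PRI> field, or None."""
    match = PRI_RE.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23 (local7) * 8 + severity 7
        return None
    return divmod(pri, 8)


def alarm_level(message: str) -> str:
    """Hypothetical policy: map syslog severity onto three alarm levels."""
    parsed = parse_pri(message)
    if parsed is None:
        return "unparsed"
    _, severity = parsed
    if severity <= 2:   # Emergency, Alert, Critical
        return "high"
    if severity <= 4:   # Error, Warning
        return "medium"
    return "low"        # Notice, Informational, Debug


sample = "<187>core-sw1: interface down (made-up sample)"
facility, severity = parse_pri(sample)
print(facility, SEVERITY_NAMES[severity], alarm_level(sample))  # 23 Error medium
```

Messages without a valid PRI field are reported as "unparsed" instead of being discarded, so that malformed input from a misconfigured device is still noticed.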