Server Management Service Level Agreement
Revised 3/2001

Scope

This document is intended to provide internal Johns Hopkins clients with information regarding the service and support capabilities of Desktop Computing Services (DCS) as they relate to the management of servers in the Johns Hopkins networking environment. It provides recommendations and requirements for clients considering the service.

Capabilities

DCS manages more than 217 servers for the Johns Hopkins Institutions. These servers provide a wide range of services, including groupware, clinical, research, financial and administrative applications, backup, gateway and web services, as well as file and print services. DCS's current staff of 55 includes many MCSEs and CNEs, as well as several Computer Scientists with advanced degrees. The group is managed by the Director of Desktop Computing Services, 2 Managers of Desktop Computing Services, and 7 LAN/WAN Architects.

Service Offering

Desktop Computing Services offers a comprehensive server management solution for internal clients of the Johns Hopkins Institutions for a standard rate of $300 per server per month, plus hardware and related expenses. Services beyond the scope of this document are offered at an ongoing rate that is determined by an analysis of the complexity and resources required by each project. Please note that it is common for project support costs to exceed server management fees, which are intended to provide operational support for departmental servers, not project management efforts. A project plan and meeting are useful for determining the overall support costs for any given project.

Server management includes the following services:

- Hardware standards & recommendations
- Installation of server hardware into the appropriate physical environment
- Installation of network operating system and current service packs
- Configuration of operating system according to industry and institutional best practices for the application being supported
- Adherence to institutional standards for installation and support of operating system
- Inclusion into the proper management entity (eDirectory, NT domain, AD tree)
- User account creation, deletion and modification
- Updates with recommended patches
- Notification of license purchase requirements and Johns Hopkins agreements for reduced rates
- Backup and restore of server data, including databases, configuration data and file systems
- 24x7x365 monitoring of appropriate server services, with interruption notification to support staff
- 24x7x365 human response to critical server outages resulting in a priority 1 problem, as defined by the JHMCIS Support Center.
- 2 hour on-site response time for server outages
- 24 hour maximum downtime for server failures
- Adherence to JHMCIS change control policies
- Disk space utilization planning, data use policies
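
The two time-based commitments in the list above (2 hour on-site response, 24 hour maximum downtime) can be turned into concrete deadlines for a given outage. The following is a minimal sketch only; the function and constant names are hypothetical and are not part of any DCS tool.

```python
from datetime import datetime, timedelta
from typing import Dict

# Targets taken from the service list above; the names are illustrative.
ON_SITE_RESPONSE = timedelta(hours=2)
MAX_DOWNTIME = timedelta(hours=24)


def outage_deadlines(outage_start: datetime) -> Dict[str, datetime]:
    """Return the latest acceptable times for on-site response and restoration."""
    return {
        "on_site_response_by": outage_start + ON_SITE_RESPONSE,
        "service_restored_by": outage_start + MAX_DOWNTIME,
    }


def within_target(outage_start: datetime, event_time: datetime,
                  target: timedelta) -> bool:
    """True if the event happened no later than `target` after the outage began."""
    return event_time - outage_start <= target


if __name__ == "__main__":
    start = datetime(2001, 3, 1, 22, 0)  # example outage at 10pm
    for name, deadline in outage_deadlines(start).items():
        print(name, deadline.isoformat())
```

An outage that starts late in the evening therefore carries deadlines that fall on the following calendar day, which is why the sketch works with full timestamps rather than times of day.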

How to Obtain Services

In order to obtain server management services, contact either the Director or one of the Managers of Desktop Computing Services for Johns Hopkins at (410) 614-1544. Requests for services should be planned as far in advance as possible. Johns Hopkins has a variety of standards, preferred technologies and specific technology planning efforts that may have an effect on the types of software and hardware platforms selected for any project. Desktop Computing Services has LAN/WAN Architects who are able to provide clients with direction on platform and technology support in this regard. Clients should appoint a contact for DCS to work with throughout the life of the server management contract.

Service & Management

DCS assigns a primary systems administrator for each server it manages. This person will be provided backup during vacations and other absences. During non-business hours (between 5pm and 8am), several on-call pagers are carried by paid staff members for response to priority one incidents. Response to a priority one problem is required within 10 minutes. Priority one problems are defined by the JHMCIS Support Center, with input from the person reporting the problem.

Any problems related to servers managed by DCS should be reported to the Support Center by calling 5-HELP at the medical campus, or 6-HELP at the Homewood campus. Problem synopses are sent by alpha page to the appropriate support personnel. All NTS and JHMCIS management receive notification of priority one pages or escalations by alpha page.

Hardware Standards & Recommendations

Existing servers that do not meet standards for supportability may not be eligible for support, or may be supported on an interim basis with a reduced service level. Prior to purchasing servers, it is recommended that a client representative work with DCS staff to determine the specific hardware appropriate for a project.
New servers should be Compaq brand, with a minimum of hardware level RAID 1 fault tolerance, preferably hardware level RAID 5. All other specifications are dependent on the application required. DCS supports Windows NT, Windows 2000 and Netware.

Rack space, AC power, uninterruptible power supply (UPS), tape backup and network connectivity hardware may be required for a specific project, depending on the availability of these items in existing facilities. A factor for server infrastructure cost will be developed for server management projects beginning in fiscal year 2002.

Compaq provides a 3 year parts and labor warranty. Clients who choose to continue use of servers beyond the warranty period are not eligible for the 24-hour downtime protection outlined below, unless extended warranties are purchased. Server models that are no longer supported by the vendor will not be covered under this agreement.

Physical Environment

Servers managed by DCS will generally be deployed in one of two main data centers. These centers provide a secure environment with card key access for authorized personnel, as well as air handling, power conditioning and centralized UPS. The primary data center is located in the basement of the 1830 East Monument Street building on the East Baltimore campus. A second data center is located in the basement of Garland Hall on the Homewood Campus.

Servers are located in standard 19” racks and are connected to the network by switched Ethernet network cards connected to Cisco switches at 100 Mbps. Air handling, fire suppression and conditioned AC power are provided to the facilities. Future expansion will provide for centralized UPS power, as well as a generator. Power outages, while rare, have occurred on a scheduled basis once every two years. Unscheduled power outages have occurred 4 times in the past 5 years.

Installation & Configuration of Server OS

DCS uses a standard set of installation practices for each operating system it supports.
These adhere to common industry best practices and are designed for supportability and high levels of security. Data and systems partitions are kept separate from one another, and systems monitoring parameters are increased from the default levels. The service pack level is generally current with the manufacturer's latest release, with a 30-90 day lag time for internal testing and acceptance. Server software installation, including web server and database components, as well as data loading, is dependent on the arrival of all parts and mounting hardware, availability of network connections and availability of testing for backup and restore functions.

DCS uses Veritas’ Backup Exec product for tape backup. Backup agents will be installed on each server, in addition to any application specific backup agents. The installation of Compaq Insight Manager and Norton Anti-Virus agents is also a standard part of the configuration.

DCS follows and enforces all institutional policies regarding the confidentiality of data, network and service standards for servers under its management. Only authorized technical personnel and identified user communities receive access to a server. Remote access to a server by vendors or non-DCS systems administrators can be provided under a strict set of guidelines and communications protocols. Currently, DCS supports the regulated use of PCAnywhere over an internet connection, as well as internally managed monitoring tools.

Domain & Tree Issues

DCS uses directory based management structures, including NT domains, an Active Directory, and an eDirectory tree. These structures provide various levels of security and access for server resources and provide a common pool of user IDs that can be incorporated into access control lists (ACLs) for various server services and file systems.
Clients requiring root level access to servers, or domain administrator rights to these types of structures, are generally not granted those access privileges and are encouraged to seek an alternative server support model. It is more common that database administrators will be granted high-level rights to SQL type databases, but they may not receive the ability to start and stop database services directly.

Account Management

DCS provides all user account management for servers and other protected resources such as workstations and GroupWise email. Maintenance of the central pool of user IDs and the information provided in this pool is the responsibility of DCS. Users are provided access to resources by completing a security form obtainable from JHMCIS Security and having it approved by departmental management. Clients are encouraged to ensure that any change in status of persons with access to servers is communicated to JHMCIS Security as soon as is practicable. DCS will remove or change access privileges to these systems based on changes in status. All persons needing access to servers managed by DCS will be required to submit a confidentiality & systems access form, which indicates acceptance of all JHH and JHU policies regarding the use of Johns Hopkins owned computer resources.

Patches and Upgrades

From time to time, vendors of network operating systems offer upgrades and service packs to the products they sell. DCS keeps track of these changes and develops internal standards for their implementation. Clients are required to maintain up to date licenses through one of the centrally managed site licenses for Novell and Microsoft to ensure that upgrades can be installed once accepted, tested and approved. In cases where a specific update is needed to address issues with a specific client’s server, DCS will install the update after discussions with the client and the appropriate vendor representatives.
It is uncommon for DCS to install updates immediately after they are released, unless there is a specific or particularly grave security threat that can be prevented by update installation. DCS attempts to stay within the upgrade path provided by software vendors and may require testing and implementation of software upgrades for which the client may receive no specific or individual benefit. This is done to ensure that the operating system revision is supported by the vendor, or to ensure that all servers are at the same revision level to keep management costs in check.

Licenses

The client is responsible for the purchase of all network operating system licenses and upgrades. DCS and Johns Hopkins have negotiated large-scale subscription based license agreements with Novell and Microsoft to provide a single yearly cost for nearly all products. Clients will generally pay for costs on a yearly basis, and this should be factored into the cost of each project. Changes to license agreements will be communicated to the client community at least 6 months prior to the change in fiscal year, unless a unilateral vendor change reduces this lead time.

Management Fees

DCS manages individual client servers for $300 per server per month as of July 1, 2001. Changes to this fee will be communicated at least 6 months prior to the fiscal year change. This fee does not cover the cost of hardware maintenance contracts, backup tapes, and software.

Backup & Restore Services

Institutional best practices are currently being developed with regard to full disaster recovery procedures for the data center facilities. DCS maintains tape backups for each server it manages, for use in the event of a server failure. Tape storage is designed for recovery from server failure, not individual file restores. Individual file restores are generally accommodated on a best effort basis. Tape backup provides for a full backup each week, generally on Friday evenings.
Differential backups (data that has changed since the last full backup) are made Monday, Tuesday, Wednesday and Thursday evenings. Server performance will generally be lower during these times. Specialized backups can be requested by working with an assigned systems administrator.

A full set of tapes is kept for 3 weeks. For weeks 4 to 8, the tapes containing full weekly backups are kept. After that, tapes are recycled, then discarded after several uses. DCS does not, without prior specific agreement, keep archival records of data stored on tape. The tape rotation allows staff to perform restores for up to 8 weeks from a backup.

In the event of a server or database failure, the server will be restored to the most recent successful backup. It is rare for backups to be skipped, but problems with server communications and hardware can prevent a backup from occurring. In that event, data up to 48 hours old will be restored.

Clients requiring digital archival systems will need to make additional arrangements for the use of tape, optical or other archival storage and retrieval systems. Clients requiring a higher degree of backup and availability fault tolerance are encouraged to work with DCS to implement server clustering and other high availability options when designing a server implementation.

Availability & Monitoring

DCS uses monitoring systems to provide ongoing information about the status of each server and the critical services it provides. Changes in status, either up or down, are communicated via system console warnings, as well as via alpha page, within 5 minutes of an event. These status changes are acted upon by the appropriate technical staff to ensure prompt response to outages. Currently, DCS uses two products, WhatsUp Gold and Compaq Insight Manager XE, to provide detailed server status information and warnings of impending hardware failures.
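
Returning to the tape rotation described under Backup & Restore Services, the schedule and retention rules can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how Backup Exec is configured at DCS: the function names are hypothetical, "3 weeks" and "8 weeks" are read as 21 and 56 calendar days, and the full backup is assumed to always fall on Friday.

```python
from datetime import date, timedelta
from typing import List, Optional


def backup_type(day: date) -> Optional[str]:
    """Kind of backup made on the evening of `day`, or None if there is none."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 4:
        return "full"          # weekly full backup, assumed to be Friday
    if weekday <= 3:
        return "differential"  # Monday through Thursday
    return None                # no backup is described for Saturday or Sunday


def is_retained(backup_day: date, today: date) -> bool:
    """Whether the tape written on `backup_day` should still exist on `today`."""
    kind = backup_type(backup_day)
    age_days = (today - backup_day).days
    if kind is None or age_days < 0:
        return False
    if age_days <= 21:           # full set kept for 3 weeks (assumed 21 days)
        return True
    if age_days <= 56:           # weeks 4 to 8: weekly full backups only
        return kind == "full"
    return False                 # older tapes have been recycled


def tapes_for_restore(target: date) -> List[date]:
    """Dates of the tapes needed to restore to the latest backup on or before `target`."""
    day = target
    while backup_type(day) is None:
        day -= timedelta(days=1)
    if backup_type(day) == "full":
        return [day]
    # A differential holds everything changed since the last full backup,
    # so only that full backup plus the one differential are needed.
    last_full = day
    while backup_type(last_full) != "full":
        last_full -= timedelta(days=1)
    return [last_full, day]
```

Because each differential is cumulative since the last full backup, a restore never needs more than two tapes sets in this sketch: the most recent full backup and, if applicable, the latest differential.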
System, security and application logs are checked regularly to ensure that systems are not being compromised by unauthorized people and that software faults are being recorded and acted upon. Disk space management is an ongoing service that affects performance, availability, tape costs and backup times. Clients are kept informed by the systems administrator as disk usage changes, in order to plan for upgrades and to ensure that space is being used appropriately.

Response Time

Server availability is monitored 24 hours a day, 7 days per week, 365 days per year. Response to server related problems is within 2 hours of an outage. If a server becomes unavailable, restoration of the core functionality supplied by the server will take place within 24 hours of the outage, by repair, restore or substitution of hardware. Subsequent outages may be required to restore full performance and functionality with replacement parts and servers. Clients requiring a higher degree of system availability are encouraged to work with DCS to develop a server implementation that addresses their needs through the use of high availability and clustering technologies.

Change Control Practices

Servers managed by DCS are categorized as being in one of two operating modes: test or production. Test servers may be unavailable for any period of time, with no guarantee of data integrity. Response times may be extended beyond the service parameters listed above. Production servers are covered under the service parameters listed above. Any access to production servers is regulated by a change control policy that controls how servers and resident applications are modified. Change control policy covers, but is not limited to, the following types of activities:

- Turning the server off and on, or restarting the main server service.
- Starting and stopping any resident service.
- Installing any server based software
- Changing access rights to any file system or application
- Modifying web server parameters
- Installing modifications to application software or databases
- Changing network device configurations between the server and the end users

A change control includes the nature of the change, the potential impact of the change, the type of change (emergency, short, normal), a list of approvals, a contingency for change failure and backout procedures. In addition, changes are accompanied by email notifications or phone calls to the potentially affected users. Changes are approved each Wednesday morning at 9am, and the change initiator must be present in order for the change to be approved.

Emergency changes may be approved on short notice when the system has partially failed and needs service, or when some type of critical failure is imminent. Installation of additional software or configuration changes resulting from an emergency change will be preceded by a full backup of the system. Emergency changes are approved by a director and the help desk manager.

Normal changes require 10 days of lead time and are discussed at the weekly change control meeting. These require manager and client approval.
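
For planning purposes, the meeting schedule and lead time above can be checked with a short script. This is a sketch under assumptions: the names are hypothetical, the 10 days are treated as calendar days, and the lead time is measured from submission to the planned change; the policy text does not specify either point.

```python
from datetime import datetime, timedelta

MEETING_WEEKDAY = 2                    # Wednesday (Monday == 0)
MEETING_HOUR = 9                       # 9am change control meeting
NORMAL_LEAD_TIME = timedelta(days=10)  # assumed to mean calendar days


def next_change_meeting(after: datetime) -> datetime:
    """First Wednesday 9am meeting strictly later than `after`."""
    days_ahead = (MEETING_WEEKDAY - after.weekday()) % 7
    candidate = (after + timedelta(days=days_ahead)).replace(
        hour=MEETING_HOUR, minute=0, second=0, microsecond=0
    )
    if candidate <= after:
        candidate += timedelta(days=7)  # this week's meeting has already started
    return candidate


def meets_normal_lead_time(submitted: datetime, planned: datetime) -> bool:
    """True if a normal change leaves at least 10 days between request and change."""
    return planned - submitted >= NORMAL_LEAD_TIME


if __name__ == "__main__":
    submitted = datetime(2001, 3, 15, 12, 0)
    print("Next meeting:", next_change_meeting(submitted).isoformat())
```

A request submitted on a Thursday afternoon, for example, would first be discussed the following Wednesday, so initiators may want to submit before the Wednesday meeting when timing is tight.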