    Monitoring and Managing PowerEdge 1655MC High Performance Computing Clusters

    By Scalable Systems Group

    April 2003

    Introduction: Modular Computing in HPCC
    PowerEdge 1655MC Overview
    Dell’s Management Solution for PowerEdge 1655MC HPC Clusters
    In-band Monitoring and Management
        IT Assistant (ITA)
        Ganglia
    Out-of-Band Monitoring and Management
        ERA/MC
        Digital KVM
    Conclusions
        References

Figures

    Figure 1: PowerEdge 1655MC Chassis - Front View
    Figure 2: PowerEdge 1655MC Chassis - Rear View
    Figure 3: 66-blade PowerEdge 1655MC HPC Cluster Configuration
    Figure 4: Summary of PowerEdge 1655MC Chassis Information
    Figure 5: At-a-Glance View of Ganglia
    Figure 6: Information about One Node
    Figure 7: Web Based ERA/MC Console
    Figure 8: OSCAR screen on 2161DS
    Figure 9: ERA/MC and KVM Controller Card

April 2003 Page 2 Dell Enterprise Product Group

    Introduction: Modular Computing in HPCC

    Modular computing solutions target environments in which servers are consolidated into one physical location, as is most commonly the case with clusters. Some elements, such as the power supply, the cabling, and the systems management, do not need to be replicated for every server and can be shared among the modular pieces.

    The Dell PowerEdge 1655MC is the first product in Dell’s Modular Computing, or “blade server,” product line. Blade server architecture places several self-contained servers, known as blades, within a single server chassis. Each blade has its own processor(s), memory, I/O subsystem, set of hard drives, operating system, and other basic components. The chassis provides redundant infrastructure components, such as power supplies, fans, and switches. Modular computing has the potential to increase server density, improve manageability, lower power consumption, and simplify deployment and servicing, all of which lower TCO (total cost of ownership).

    Furthermore, compared with integrated servers, the modular design of the PowerEdge 1655MC offers the following advantages, which make it an ideal building block for a high performance computing cluster:

    • Low heat production
    • Low power consumption
    • Lower space requirements (0.5U/server)
    • Easy deployment and simplified cable management
    • Ease of service and replacement
    • Ease of adding computing resources


    PowerEdge 1655MC Overview

    The Dell PowerEdge 1655MC holds up to six server blades in a single 3U chassis. Each blade functions as an individual server with its own memory, 2 CPUs, and 2 internal SCSI hard drives. The chassis includes power supplies, a network module, fans, and a management module, and optionally ships with a USB CD-ROM/floppy drive. It also contains two Gigabit Ethernet switches, which connect internally to the two network interface cards (NICs) embedded on each blade. Additionally, Dell embedded remote access (ERA) hardware and firmware are integrated in the chassis; the ERA module monitors all the shared infrastructure components of the chassis. Figures 1 and 2 show the front and rear views of the PowerEdge 1655MC, respectively. For detailed information about the Dell PowerEdge 1655MC, refer to

    http://www.dell.com/us/en/esg/topics/esg_pedge_rackmain_servers_1_pedge_1655mc.htm

Figure 1: PowerEdge 1655MC Chassis - Front View

Figure 2: PowerEdge 1655MC Chassis - Rear View


    Dell’s Management Solution for PowerEdge

    1655MC HPC Clusters

    Dell’s PowerEdge 1655MC HPCC solution provides four methods of managing and monitoring the cluster: Dell OpenManage IT Assistant, Ganglia, digital KVM, and ERA. IT Assistant and Ganglia are in-band management tools that use the cluster fabric, or intra-cluster network, for monitoring and management traffic; digital KVM and ERA are out-of-band tools that use separate networks. IT Assistant is Dell’s server management solution: a centralized management console used to discover nodes on the network and examine hardware sensor data to prevent failures at the system level. Ganglia is an OS-level cluster monitor that can be used to examine resource usage, detect node failures, and troubleshoot performance problems. Both ITA and Ganglia require OS support on the nodes they monitor.

    Figure 3 shows a sample PowerEdge 1655MC HPC cluster configuration with 66 blades as the compute nodes. The Cluster Fabric in the diagram is built from three Dell PowerConnect 5224 Gigabit Ethernet switches, and four Gigabit Ethernet links are trunked from each PowerEdge 1655MC chassis to one of these switches. A dedicated IT Assistant node, a PowerEdge 1650 serving as the IT Assistant monitoring and management station, is connected to one of the 5224 switches as well as to the ERA Fabric. The ERA Fabric is built from a PowerConnect 3024 Fast Ethernet switch, to which the ERA ports on the PowerEdge 1655MC chassis are connected. The master node, a PowerEdge 1650 server, is also connected to the 3024 switch, so that both the ITA node and the master node can perform out-of-band ERA monitoring and management functions.

    The other out-of-band fabric, called the KVM Fabric, runs through a digital KVM switch, the Dell 2161DS Remote Console Switch. The KVM ports on the PowerEdge 1655MC chassis, the master node, and the ITA node are connected to the 2161DS switch. The Ethernet ports on the 2161DS switch are connected to the LAN outside the cluster, forming a complete out-of-band management network independent of the cluster fabric and the ERA fabric. For detailed information about using the 2161DS switch, refer to:

    http://www.dell.com/us/en/biz/topics/power_ps3q02-avocent.htm.

    For information regarding PowerEdge 1655MC HPC clusters, please visit the

    Dell HPCC web site at:

    http://www.dell.com/us/en/esg/topics/products_clstr_gb1655_pedge_configs_1655_cluster_hpcc.htm
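The arithmetic behind the Figure 3 configuration can be checked with a short Python sketch. It derives the chassis count and rack height from the six-blades-per-3U figure in the overview and counts the ports each fabric needs as described above: four trunk links per chassis on the Cluster Fabric, and, assuming one ERA port and one KVM port per chassis, those ports plus the ITA and master nodes on the ERA and KVM fabrics. The 24-port capacity used for the PowerConnect switches is an assumption to verify against the switch documentation, and the sketch ignores inter-switch links and the host ports of the ITA and master nodes on the Cluster Fabric.

```python
"""Port and rack budget for a PowerEdge 1655MC cluster (illustrative sketch)."""
import math
from dataclasses import dataclass

BLADES_PER_CHASSIS = 6        # up to six blades per chassis
CHASSIS_HEIGHT_U = 3          # 3U form factor
TRUNK_LINKS_PER_CHASSIS = 4   # GbE links trunked from each chassis
MGMT_HOSTS = 2                # ITA node and master node
ASSUMED_SWITCH_PORTS = 24     # assumption: 24 ports per PowerConnect switch


@dataclass(frozen=True)
class Budget:
    chassis: int
    rack_units: int
    cluster_trunk_ports: int  # Cluster Fabric ports used by chassis trunks
    era_ports: int            # ERA Fabric ports (assumes one per chassis)
    kvm_ports: int            # 2161DS ports (assumes one per chassis)


def plan(blades: int) -> Budget:
    """Derive chassis count, rack height, and per-fabric port counts."""
    chassis = math.ceil(blades / BLADES_PER_CHASSIS)
    return Budget(
        chassis=chassis,
        rack_units=chassis * CHASSIS_HEIGHT_U,
        cluster_trunk_ports=chassis * TRUNK_LINKS_PER_CHASSIS,
        era_ports=chassis + MGMT_HOSTS,
        kvm_ports=chassis + MGMT_HOSTS,
    )


if __name__ == "__main__":
    b = plan(66)
    print(b)
    print("U per blade:", CHASSIS_HEIGHT_U / BLADES_PER_CHASSIS)
    print("trunks fit on 3 switches:",
          b.cluster_trunk_ports <= 3 * ASSUMED_SWITCH_PORTS)
```

For 66 blades this gives 11 chassis in 33U, 44 trunk ports spread over the three 5224 switches, and 13 ports each on the ERA and KVM fabrics; 3U divided by six blades is the 0.5U per server quoted in the introduction.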


Figure 3: 66-blade PowerEdge 1655MC HPC Cluster Configuration


    In-band Monitoring and Management

    It is important for an HPCC system administrator to be able to monitor a cluster at the hardware level, especially in a large cluster environment. The Dell HPC cluster solution offers two methods of in-band management:

    • Dell OpenManage IT Assistant (ITA), a Web-based tool for managing Dell servers
    • Ganglia, an open source monitoring tool developed at the University of California, Berkeley

    IT Assistant (ITA)

    OpenManage IT Assistant is a Web browser-based tool that supports all of the PowerEdge 1655MC components through the Simple Network Management Protocol (SNMP). It gives the cluster administrator the ability to manage and monitor the hardware of an entire cluster and to perform day-to-day cluster management tasks from a centralized location using a GUI. SNMP provides the communication between the management console and the nodes, with every system component running an SNMP agent. IT Assistant provides the following functionality:

    • Discovery of the chassis and chassis components (see Figure 4)
    • Support for hot swapping blades
    • Summary and status information for all chassis components and support for system inventory and search
    • Launch of management applications for chassis components
    • Management of events generated by chassis components
    • Page/e-mail when an event occurs
    • One-to-many centralized console

    All of the functions mentioned above are crucial to managing an HPC cluster. IT Assistant performs one of the most basic system administration tasks, the discovery and identification of nodes, and also discovers chassis components such as the embedded Ethernet switch and the ERA module.

    IT Assistant allows the administrator to hot swap any blade in the chassis without interrupting the other blades, so maintenance can be performed without shutting down every blade in the chassis. As the cluster grows in size, monitoring node status information becomes even more important for simplifying administration. IT Assistant provides information such as system name, IP address, MAC address, component versions, memory size, chassis service tag, chassis asset tag, blade slot number, and blade service tag.
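The node status information listed above amounts to an inventory table that can be searched by any field. The Python sketch below models one record per blade with a subset of those fields, filters on exact matches, and groups slots by chassis; the `BladeRecord` type, its field names, and the sample tags and addresses are hypothetical and do not reflect how IT Assistant stores its inventory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class BladeRecord:
    """One inventory row; field names are hypothetical."""
    system_name: str
    ip_address: str
    mac_address: str
    memory_mb: int
    chassis_service_tag: str
    blade_slot: int
    blade_service_tag: str


def search(records: Iterable[BladeRecord], **criteria) -> List[BladeRecord]:
    """Return the records whose fields equal every keyword criterion."""
    return [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]


def slots_by_chassis(records: Iterable[BladeRecord]) -> Dict[str, List[int]]:
    """Map each chassis service tag to the sorted slot numbers it reports."""
    slots: Dict[str, List[int]] = defaultdict(list)
    for r in records:
        slots[r.chassis_service_tag].append(r.blade_slot)
    return {tag: sorted(s) for tag, s in slots.items()}


# Hypothetical sample data.
inventory = [
    BladeRecord("node01", "10.0.0.1", "00:00:00:00:00:01", 2048, "CHASSIS-A", 1, "BLADE-01"),
    BladeRecord("node02", "10.0.0.2", "00:00:00:00:00:02", 2048, "CHASSIS-A", 2, "BLADE-02"),
    BladeRecord("node07", "10.0.0.7", "00:00:00:00:00:07", 1024, "CHASSIS-B", 1, "BLADE-07"),
]
```

For example, `search(inventory, chassis_service_tag="CHASSIS-A")` returns the two blades in that chassis, which is the kind of lookup an administrator needs before servicing a chassis.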

    IT Assistant also provides one-to-many functions, such as remote shutdown, BIOS flashing, and configuration of server alert functions, as well as inventory of all components.

    IT Assistant includes an event management system (ESM) that captures events generated by the modules through SNMP traps. Administrators can associate actions with specific events, such as sending e-mail, paging, or launching an application.
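Associating actions with events, as described above, is essentially a lookup from event type to a list of handlers. The Python sketch below shows that pattern; the event names and the two handlers, which return a description instead of actually sending e-mail or a page, are hypothetical stand-ins, and receiving the SNMP traps themselves is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    source: str     # component that raised the event, e.g. a chassis name
    kind: str       # hypothetical event type, e.g. "fan_failure"
    severity: str   # e.g. "warning" or "critical"


Action = Callable[[Event], str]


class EventManager:
    """Keeps a list of actions per event type and runs them on dispatch."""

    def __init__(self) -> None:
        self._actions: Dict[str, List[Action]] = {}

    def associate(self, kind: str, action: Action) -> None:
        self._actions.setdefault(kind, []).append(action)

    def dispatch(self, event: Event) -> List[str]:
        """Run every action bound to event.kind, in registration order."""
        return [action(event) for action in self._actions.get(event.kind, [])]


# Stand-in actions: they describe what would be sent instead of sending it.
def email_admin(event: Event) -> str:
    return f"e-mail: {event.severity} {event.kind} on {event.source}"


def page_on_call(event: Event) -> str:
    return f"page: {event.severity} {event.kind} on {event.source}"
```

Registering both `email_admin` and `page_on_call` for the same event type makes one trap trigger both notifications, while event types with no associated actions are simply ignored.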

     Figure 4: Summary of PowerEdge 1655MC Chassis Information

    Ganglia

    Another in-band management tool available in a Dell PowerEdge 1655MC cluster offering is Ganglia, an open source, OS-level cluster monitor. Out of the box, Ganglia monitors and automatically graphs more than 20 metrics on every node of the cluster, such as load average, number of running processes, number of incoming and outgoing network packets, and total and free memory.
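On Linux nodes, an OS-level monitor can obtain several of these metrics from the `/proc` file system. The Python sketch below parses `/proc/loadavg`, `/proc/meminfo`, and `/proc/net/dev`; keys such as `load_one` and `mem_free` follow Ganglia's metric naming, but the code illustrates the idea and is not Ganglia's own collection code. The packet counts are cumulative since boot, so a monitor would graph the difference between two samples divided by the sampling interval.

```python
import os
from typing import Dict


def parse_loadavg(text: str) -> Dict[str, float]:
    """Parse /proc/loadavg: '1m 5m 15m running/total last_pid'."""
    load1, load5, load15, procs, _last_pid = text.split()
    running, total = procs.split("/")
    return {"load_one": float(load1), "load_five": float(load5),
            "load_fifteen": float(load15),
            "proc_run": int(running), "proc_total": int(total)}


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return total and free memory in kB from /proc/meminfo."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return {"mem_total": values["MemTotal"], "mem_free": values["MemFree"]}


def parse_net_dev(text: str) -> Dict[str, int]:
    """Sum received/transmitted packet counters over non-loopback interfaces."""
    pkts_in = pkts_out = 0
    for line in text.splitlines()[2:]:          # first two lines are headers
        iface, _, counters = line.partition(":")
        fields = counters.split()
        if iface.strip() == "lo" or len(fields) < 10:
            continue
        pkts_in += int(fields[1])               # receive: bytes, packets, ...
        pkts_out += int(fields[9])              # transmit: bytes, packets, ...
    return {"pkts_in_total": pkts_in, "pkts_out_total": pkts_out}


def collect(proc_root: str = "/proc") -> Dict[str, float]:
    """Read one sample of all metrics from a Linux /proc tree."""
    def read(name: str) -> str:
        with open(os.path.join(proc_root, name)) as f:
            return f.read()

    metrics: Dict[str, float] = {}
    metrics.update(parse_loadavg(read("loadavg")))
    metrics.update(parse_meminfo(read("meminfo")))
    metrics.update(parse_net_dev(read("net/dev")))
    return metrics
```

Keeping the parsers separate from `collect` lets them be exercised on captured text from any node without reading the live `/proc` tree.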

    Ganglia provides several levels of cluster information. The at-a-glance view (Figure 5) shows the overall status of the cluster and summarizes the total node count, the number of nodes that are up, the overall load average, and CPU and memory utilization for the cluster. Color-coding represents CPU utilization so that overloaded systems can be identified quickly, and a crossbones icon indicates that a node is down. Selecting a different metric in this view redisplays the screen with that metric's value for each node and sorts the nodes by that metric.
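The at-a-glance view combines three simple operations: summarizing the cluster, mapping each node's CPU utilization to a color, and sorting nodes by the selected metric. The Python sketch below shows them together; the `Node` fields, the color thresholds, and the choice to average over nodes that are up are assumptions for illustration, not the logic of Ganglia's Web front end.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    up: bool
    load_one: float = 0.0   # one-minute load average
    cpu_util: float = 0.0   # percent of CPU busy
    mem_util: float = 0.0   # percent of memory in use


# Illustrative thresholds (percent CPU busy), checked from highest to lowest.
COLOR_BUCKETS = [(75.0, "red"), (50.0, "orange"), (25.0, "yellow"), (0.0, "green")]


def color(node: Node) -> str:
    """Pick a display color; down nodes get the crossbones icon."""
    if not node.up:
        return "crossbones"
    for threshold, name in COLOR_BUCKETS:
        if node.cpu_util >= threshold:
            return name
    return "green"


def summarize(nodes: List[Node]) -> Dict[str, float]:
    """Node counts plus averages taken over the nodes that are up."""
    up = [n for n in nodes if n.up]

    def mean(attr: str) -> float:
        return sum(getattr(n, attr) for n in up) / len(up) if up else 0.0

    return {"total": len(nodes), "up": len(up),
            "load_avg": mean("load_one"),
            "cpu_util": mean("cpu_util"),
            "mem_util": mean("mem_util")}


def sort_by(nodes: List[Node], metric: str) -> List[Node]:
    """Order nodes by the selected metric, highest first, down nodes last."""
    return sorted(nodes, key=lambda n: (n.up, getattr(n, metric)), reverse=True)
```

Sorting on the pair `(up, value)` keeps down nodes at the end regardless of the stale metric values they last reported.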

Figure 5: At-a-Glance View of Ganglia

    Clicking an individual node icon displays all available information for that node (Figure 6). This view summarizes static information, such as the OS version, system usage, IP address, and machine type, and graphs metrics that change over time, such as memory and CPU usage, network traffic statistics, number of running processes, and disk usage.

Figure 6: Information about One Node

    Ganglia allows administrators to define and add other metrics they want to monitor in the cluster, and Ganglia’s GUI automatically graphs those values alongside the preset metrics for every node. Ganglia also simplifies cluster management by providing a remote execution environment, which can be used for remote management and to execute commands in parallel on multiple nodes.
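The custom-metric and parallel-execution features described above can both be sketched in Python. The first function builds a command line for `gmetric`, the Ganglia utility for publishing a user-defined metric; the long options shown (`--name`, `--value`, `--type`, `--units`) should be checked against `gmetric --help` for the installed version, and the metric name used in the usage note is hypothetical. The second function is a generic parallel fan-out over ssh, swapped in to illustrate running one command on many nodes; it is not Ganglia's own remote execution environment and assumes key-based ssh logins to the nodes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence

# Value types accepted by gmetric's --type option.
GMETRIC_TYPES = {"string", "int8", "uint8", "int16", "uint16",
                 "int32", "uint32", "float", "double"}


def gmetric_argv(name: str, value: object, mtype: str = "float",
                 units: str = "") -> List[str]:
    """Build the gmetric command line for one user-defined metric."""
    if mtype not in GMETRIC_TYPES:
        raise ValueError(f"unsupported gmetric type: {mtype}")
    argv = ["gmetric", f"--name={name}", f"--value={value}", f"--type={mtype}"]
    if units:
        argv.append(f"--units={units}")
    return argv


def publish(name: str, value: object, mtype: str = "float", units: str = "") -> None:
    """Run gmetric on this node; raises CalledProcessError on failure."""
    subprocess.run(gmetric_argv(name, value, mtype, units), check=True)


Runner = Callable[[str, Sequence[str]], str]


def ssh_runner(host: str, command: Sequence[str]) -> str:
    """Run a command on one node over ssh and return its standard output."""
    done = subprocess.run(["ssh", host, *command],
                          capture_output=True, text=True, check=True)
    return done.stdout


def run_parallel(hosts: Iterable[str], command: Sequence[str],
                 runner: Runner = ssh_runner,
                 max_workers: int = 16) -> Dict[str, str]:
    """Run the same command on every host concurrently; map host -> output."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(lambda h: runner(h, command), hosts))
    return dict(zip(hosts, outputs))
```

For example, each node could call `publish("blade_temp", 42, "int16", "Celsius")` periodically, and `run_parallel(["node01", "node02"], ["uptime"])` collects one output per node; passing a different `runner` swaps the transport without changing the fan-out logic.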

