FYxx Plan for ILCAccelerator ControlsILCTA Support

By Virginia Scott,2014-06-17 17:47

FY10 Plan for Grid / Grid Services

    Prepared by: Gabriele Garzoglio, Philippe Canal, Burt Holzman,

    Andrew Baranovski, Parag Mhashilkar, Eileen Berman

    Date: Aug 19, 2009

    Relevant Strategic Plans: Strategic Plan for Grids, Strategic Plan for Scientific Facilities,

    Computing Division Strategic Plan (2010-2012)

Grid Services Goals

    o Provide leadership in the area of middleware development for Fermilab and the

    Open Science Grid (OSG).

    o Provide a middleware infrastructure for Fermilab and the OSG, with focus on

    interoperations with major peer grids, such as Enabling Grids for E-sciencE

    (EGEE), TeraGrid, etc., supporting the needs of Fermilab’s scientific community.

Grid Services Strategy

    o To enhance and expand the body of grid software, business methods, and

    deployment community that is broadly accepted by the FNAL site and

    FNAL-based virtual organizations.

Tactical Objectives for FY10

    1. Maintain the infrastructure for VO membership registration, focusing on the

    convergence of VOMRS and VOMS-admin. Investigate new mechanisms for VO and

    site policy definition, publication, and enforcement.

    2. Work closely with stakeholders to identify and appropriately prioritize their

    maintenance needs from the Gratia software stack (text and graphical reports, probes

    and collectors).

    3. Provide expertise and code updates as needed by the groups operating the production

    instances of the Gratia collectors within OSG and the Fermilab Computing Division.

    4. Ensure that any (potential) Gratia extensions provided by external projects are well

    integrated into the existing code base and the testing, release, and support mechanisms.

    Improve the quality assurance process for the software.

    5. Develop and deploy a metrics analysis and correlation service to prepare dynamic

    reports on the scientific use cases of Grid Services. Focus on US CMS, RunII, and the

    future neutrino experiment.

    6. In the context of the CEDPS program, provide tools and services for on-demand

    collection of diagnostic information generated by storage software on OSG, including

    dCache and Hadoop. Interface this software to general purpose troubleshooting

    middleware, including NetLogger.

    7. Finish development activities for the Resource Selection Service (ReSS) project.

    Move all software to maintenance. Close the project. Provide second-level support to

    the FermiGrid operations of the service for OSG and CD.

    8. Provide maintenance and support for the Glidein Workload Management System

    (glideinWMS) for CMS, FermiGrid, CDF, OSG, and other stakeholders. Enhance and

    further develop glideinWMS based on stakeholder input.

    9. In the context of glideinWMS, package gLExec authorization software in

    collaboration with gLite developers along with OSG-specific components. Provide

    maintenance and support to OSG.

    10. CMS Info System: IS IT HERE?

    11. Perform security-focused reviews of several software projects.

    12. Develop and maintain the SAZ service to enable user/VO/role/CA banning on campus

    grid facilities, in particular on FermiGrid, and provide support to customers of the

    SAZ software.
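
    As an illustration of the user/VO/role/CA banning in objective 12, the following is a

    minimal sketch of a ban-list check. It does not reflect the actual SAZ implementation

    or interface: the Credential and BanList classes, their fields, and the example DNs

    and VO names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Credential:
    """Hypothetical view of the attributes presented with a grid request."""
    user_dn: str          # distinguished name of the user certificate
    vo: str               # virtual organization
    role: Optional[str]   # VO role, if any
    ca_dn: str            # distinguished name of the issuing CA


@dataclass
class BanList:
    """Hypothetical ban list with one set per banning granularity."""
    users: Set[str] = field(default_factory=set)
    vos: Set[str] = field(default_factory=set)
    roles: Set[Tuple[str, str]] = field(default_factory=set)  # (vo, role) pairs
    cas: Set[str] = field(default_factory=set)

    def is_banned(self, cred: Credential) -> bool:
        # A request is rejected if any one of its attributes is banned.
        return (
            cred.user_dn in self.users
            or cred.vo in self.vos
            or (cred.vo, cred.role) in self.roles
            or cred.ca_dn in self.cas
        )


bans = BanList(vos={"examplevo"}, roles={("othervo", "production")})
analyst = Credential("/DC=org/CN=Example User", "othervo", "analysis",
                     "/DC=org/CN=Example CA")
producer = Credential("/DC=org/CN=Example User", "othervo", "production",
                      "/DC=org/CN=Example CA")
print(bans.is_banned(analyst))   # False
print(bans.is_banned(producer))  # True
```

    Keeping one set per granularity lets an operator ban a single role within a VO

    without affecting the other members of that VO.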

FY09 Accomplishments:

    1. Improve usability and operability of the Virtual Organization (VO) Services


    The VO Services project has followed its program of improvements to the

    authorization and registration infrastructures, in particular on GUMS and gLExec. For

    details, see project closing report: docdb 3249.

    2. Deploy and support the VO Services infrastructure for the stakeholders on OSG.

    Focus on reducing maintenance and on fostering interoperability of the authorization


    The Authorization Interoperability project was successfully completed. This allows

    software developed in the US to be seamlessly deployed in the EU and vice versa.

    Maintenance is reduced by providing a common code base for authorization call-out

    modules between OSG, EGEE, and Globus. See details at the Authorization

    Interoperability project closing report: docdb 3238.

    3. Integrate emerging standards and increasingly complex use cases in the VO Service

    infrastructure. These include new mechanisms for identity management, support for

    finer-grain storage privileges, VO and site policy definition, publication, and

    enforcement.

    The VO Services project has supported storage groups in defining the next

    generation storage authorization models through the Authorization Interoperability

    project; it has fostered the convergence of VOMS-admin 2.5 with VOMRS; it has

    investigated mechanisms to define and enforce VO and site authorization policies as

    part of an SBIR Phase II grant; and it has evaluated VOMS-signed attribute validation

    mechanisms for OSG. For details, see project closing report: docdb 3249.

    4. “Provide maintenance and support for the Resource Selection Service (ReSS)

    Workload Management System (WMS) for OSG and FermiGrid VO’s. Focus on the

    operational qualities of the infrastructure.”

    Implemented and deployed support for MPI jobs. Improved support for advertising

    Storage Elements. Released test suite to identify common deployment/configuration

    issues in CEMon for ReSS. Verified compliance of ReSS with OSG 1.2. Deployed

    ReSS services in FermiGrid in High Availability Mode and, for this mode,

    implemented ClassAd monitoring.

    5. Develop new accounting reports and enhance existing ones for the Gratia system.

    Work closely with stakeholders to identify and appropriately prioritize their needs.

    Met weekly with stakeholders to ensure proper prioritization of new features and

    reports, resulting in the delivery of the expected features within the agreed-upon

    timeline.

    6. Provide support for the production instance of the Gratia accounting system for OSG

    and Computing Division (CD).

    Provided patch releases and expertise as needed to ensure smooth operation of the

    OSG and Computing Division (CD) instances of Gratia.

    7. “Develop and deploy a science-dashboard infrastructure to display customized

    metrics of running Grid services. Focus on the use cases of storage for US CMS.”

    The MCAS project has developed and deployed a prototypical service to prepare and

    display metrics reports for US CMS storage, DZero Monte Carlo and DZero


    8. “Transition Glide-in Workload Management System to maintenance and operation

    mode. Focus on deployment, maintenance, and support of the infrastructure for


    Stable versions 1.6 and 2.0 were released and deployed. CMS glideinWMS

    installations transitioned to maintenance and operations, executing over 10,000 jobs


    9. “Continue to improve interoperability of EGEE and OSG information systems and

    move to maintenance and operations mode. Begin investigation of interoperation

    between other peer grids and campus grid infrastructures.”

    EGEE-OSG interoperability activity transitioned to maintenance mode. Initial

    proposals circulated on end-to-end information system work allowing interoperation

    between peer and campus grid infrastructures.

    10. “Plan and coordinate Fermilab OSE working group”

    Meetings of the OSE working group transitioned to an 'as needed' basis, in response

    to the completion of many of the docket items. In the past year the OSE working

    group met several times and discussed the following items:

    o grid incident response procedure, which was agreed to by the CSExec

    o reviewed the OSG trust documentation to ensure its alignment with Fermilab


    o finalized the OSE baseline and completed the process to get it accepted by the


    o began discussing D0 compliance with the OSE baseline

    o began development of the Fermilab VO trust policies and procedures

    The group's web pages are kept up-to-date with minutes and docket items.

    11. “Implement a software security review process.”

    The process “A code inspection process for security reviews” (cd-docdb 3021) was

    developed. The process was presented as a poster at CHEP09. A paper on the process

    was written and is to be published in the Journal of Physics: Conference Series.

    12. “Perform security-focused reviews of several software projects.”

    We used the code inspection process for the security review of the Site AuthoriZation


    13. “CEDPS: In the context of the dCache/SRM and CEDPS troubleshooting projects,

    interface existing or collaboratively develop implementations for collection of storage

    service events and supplemental logging information for general purpose

    troubleshooting and operations control middleware”

    As a first step in building a coherent end-to-end event reporting infrastructure, we

    designed and implemented a common session ID protocol between the SRM client, the

    SRM server, and dCache. These common IDs are now used to trace user activity from

    the front-end service (user job) to any back-end services (dCache pool, mover, name

    space database). This helps to quickly identify the context of a problem report. In

    the context of troubleshooting, CEDPS effort was spent on designing the adaptation of

    the MCAS infrastructure to the present and future use cases of storage, with a focus on

    the US CMS T1 use cases.
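
    The idea of tracing activity through a shared session ID can be sketched as follows.

    This is only an illustration under stated assumptions, not the SRM or dCache log

    format: the record fields, service names, and session ID values are hypothetical.

```python
from collections import defaultdict

# Hypothetical log records collected from several services. In this sketch
# every service stamps its records with the session ID it received.
records = [
    {"service": "srm-client", "session_id": "s-001", "msg": "put request issued"},
    {"service": "srm-server", "session_id": "s-001", "msg": "request accepted"},
    {"service": "dcache-pool", "session_id": "s-002", "msg": "mover started"},
    {"service": "dcache-pool", "session_id": "s-001", "msg": "mover failed"},
]


def group_by_session(records):
    """Group log records by their shared session ID, preserving input order."""
    sessions = defaultdict(list)
    for rec in records:
        sessions[rec["session_id"]].append(rec)
    return dict(sessions)


def trace(records, session_id):
    """Return the services one session passed through, in order of appearance."""
    return [rec["service"] for rec in group_by_session(records).get(session_id, [])]


print(trace(records, "s-001"))  # ['srm-client', 'srm-server', 'dcache-pool']
```

    Because every record carries the same ID, a problem reported at the front end can be

    matched to the back-end records of the same session without relying on timestamps.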

Not Accomplished in FY09:

    1. “CEDPS: Lead in establishing requirements of a Data Placement service and the

    characteristics of underlying storage and movement services, in order to provide a

    general dynamic storage service for advertising storage resource and accessibility.”

    2. “CEDPS: Research and prototype quality of service negotiation tools to mitigate

    vulnerabilities of storage systems to overload and resource exhaustion.”

    The primary focus of this activity was to provide a formal description of

    dCache-managed resources in order to enable their optimal use by “future”

    storage-computing workflow planners. Our investigation found that the current

    state-of-the-art Grid has not reached the point where optimal data and computing

    co-placement can make an impact on the quality of the user experience on OSG. This

    has led to a shift of focus in the effort to activities with a more immediate payoff,

    such as troubleshooting and metric analysis.

    3. “Integrate emerging standards and increasingly complex use cases in the VO Service

    infrastructure. These include new mechanisms for identity management, support for

    finer-grain storage privileges, VO and site policy definition, publication, and

    enforcement.”

    Authorization validation was one of the activities from last year. While the

    development was ready to be deployed, the alerting infrastructure (RSV) could not

    support the error reporting use cases required by the validation probe. Responsibility

    for the end-to-end deployment of this functionality has been given to the Software

    Tools Group at the closure of the VO Services project.

    4. “Integrate software security best practices and procedures into the software

    development life cycle.”

    The security-related best practices appropriate for the Grid Services development

    environment were analyzed. The integration of the practices was discussed with the

    Office of Project Management. In order to encompass multiple domains, the current

    project management processes are high-level and not limited to software development.

    We are planning to evaluate the integration of best practices only for the software

    domain in FY10.

    5. “Provide maintenance and support for the Resource Selection Service (ReSS)

    Workload Management System (WMS) for OSG and FermiGrid VO’s. Focus on the

    operational qualities of the infrastructure.”

    Due to increased responsibilities in the glideinWMS area, the following two ReSS

    work items were deferred to FY10: (1) finalize ReSS compliance with the FermiGrid

    Software Acceptance process; (2) improve security for resource registration.

Activities and Work Definition