


4.4.2. Computing Research Center

4.4.2.1. Overview

The Computing Research Center (CRC) provides computing resources and computer networks to support research activities at KEK.

The Central Computer System provides KEK staff and research collaborators with a large amount of data storage and CPUs for the analysis of experimental data from experiments running at KEK. This system includes a computer Grid system, which enables cooperative analysis for global-scale collaboration projects. The Supercomputer System is operated for large-scale simulation programs and is mainly used in the field of computational physics.

The CRC operates campus networks, including KEK-LAN on the Tsukuba campus and JLAN on the Tokai campus, as well as HEPnet-J for high-energy physics collaboration among domestic universities and laboratories. In these computer networks, concerns over computer security have been steadily increasing in importance.

To support communication in research activities, the CRC provides an e-mail system and web systems. These information systems provide a variety of services, such as mailing lists, a Wiki, and a document management system (KDS-Indico), in addition to conventional services.

CRC promotes several research projects, such as Grid systems, the Manyo-Library for data analysis tools, Geant4 for detector simulation, and GRACE for automatic theoretical calculations.

4.4.2.2. Computing Services

Central Computer System

The current Central Computer System has been in operation since FY2012. The system consists of a data analysis system called KEKCC and information services comprising e-mail and web systems. KEKCC is an HPC Linux cluster system that provides 4,176 CPU cores and a large-scale storage system. The storage system is composed of two types of systems: a GPFS disk system with a capacity of 7 PB, and a tape library system that can store up to 16 PB of data on tapes. HPSS provides Hierarchical Storage Management for tape data access, and data I/O is performed automatically through the GPFS file system by the GHI (GPFS/HPSS Interface). The GHI makes it possible to access tape data in the same way as disk data. At present, 1.4 PB and 4.5 PB of data are stored in the disk and tape systems, respectively.
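As an illustration of this transparency, reading a file that currently resides on tape is, from the application's point of view, the same as reading a file on disk; HPSS/GHI stages the data behind the scenes. The sketch below is a minimal example with a hypothetical mount point and file name, not actual KEKCC paths.

```python
# Minimal sketch: reading a tape-resident file through the GHI-managed GPFS
# namespace looks identical to ordinary disk I/O.  The mount point and file
# name are hypothetical examples, not actual KEKCC paths.
from pathlib import Path

data_file = Path("/ghi/fs01/experiment/run0123/raw.dat")  # hypothetical path

# If the content currently lives only on tape, HPSS/GHI stages it to the
# GPFS disk cache transparently; the application simply blocks on read().
with data_file.open("rb") as f:
    header = f.read(1024)  # first 1 kB of the (possibly staged) file

print(f"read {len(header)} bytes from {data_file}")
```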

The file system was unstable under heavy data access when first installed. System stability was improved with a system upgrade performed in August 2013. Another technical problem, with the small-file aggregation functionality of GHI, was found, and a fix for this system bug was applied in February 2014.

LSF is used as the job scheduler of the batch system. We have made continuous efforts to improve job throughput, monitoring jobs and optimizing the queue settings and the parameters of the fairshare scheduling policy. As a result of these efforts, monthly CPU usage rates of over 80% were achieved, as shown in Fig. 4-4-2-1.
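For reference, batch jobs on an LSF-managed cluster are submitted with the bsub command; the sketch below drives such a submission from Python. The queue name, job name, and script path are hypothetical examples, not actual KEKCC settings.

```python
# Minimal sketch of an LSF batch submission driven from Python.
# Queue name, job name, and script path are hypothetical examples.
import subprocess

cmd = [
    "bsub",
    "-q", "long",             # hypothetical queue name
    "-J", "analysis_job",     # job name
    "-o", "analysis_%J.out",  # stdout file; %J expands to the LSF job ID
    "./run_analysis.sh",      # hypothetical user script
]

result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout.strip())  # e.g. "Job <12345> is submitted to queue <long>."
```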

Large-scale Simulation Program

KEK launched its Large-scale Simulation Program in April 1996 to support large-scale simulations in the field of high-energy physics and related areas. Under this program, KEK solicits proposals for projects that make use of the KEK Supercomputer System (KEKSC).

Two research periods overlapped in FY2013: the 2012-2013 and 2013-2014 periods, each running from October to the following September. During the 2012-2013 period, 26 proposals were filed and approved by the Program Advisory Committee. They covered the following research areas: lattice QCD (14), elementary particle physics (3), nuclear physics (3), material science (3), astrophysics (1), and accelerator science (2). In addition, 6 trial applications were also accepted. In the 2013-2014 period, 23 proposals have been approved, most of which are continuations of proposals filed in the previous period. Four trial applications have also been approved to date. (See )

KEKSC also provides computing resources to the Computational Fundamental Science Project, driven by the Joint Institute for Computational Fundamental Science (). In FY2013, six proposals covering computational high-energy physics, nuclear physics, and astrophysics were approved as projects to use KEKSC.

The KEKSC currently consists of two systems: System A, a Hitachi SR16000 model M1, and System B, an IBM Blue Gene/Q. System A started service in September 2011 at an off-site data center, and has been in operation at KEK since March 2012. System B started service in April 2012. KEKSC is connected to the Japan Lattice Data Grid, which provides fast transfer of lattice QCD data among supercomputer sites in Japan via HEPnet-J/sc, a virtual private network based on SINET4 and provided by the National Institute of Informatics.

There was an unauthorized access to KEKSC in mid-October 2013, and the system was out of service from November 1st to December 10th, 2013. Details of the incident are described in Sec. 4.4.2.3.

4.4.2.3. Network and Security

Network

Renewal of the Network System

The KEK network system was replaced in FY2013. The structure of the new network is almost the same as that of the previous one, but the bandwidth and the performance of the core switches have been upgraded to more than twice those of the previous network. Now, the 10 Gigabit Ethernet links to the core switch can be operated simultaneously without interfering with each other. In the previous system, packets might have been dropped on the switch backplane when all of the 10 GbE links were fully used simultaneously. Now, the port density and the bandwidth of the backplane have been upgraded to avoid such a situation. This will help improve mass data transmission over the connection from KEKCC via SINET. The previous firewall had 10 Gbps interfaces, but the bandwidth of a single TCP stream was limited to 5 Gbps. The limit of the new firewall is 10 Gbps, and there are two firewalls for redundancy. Neither of them will be a bottleneck for data transmission.

The number of access points for the wireless network has also been increased from 150 to 200, and the access points now accept connections using IEEE 802.11n. Multi-Input and Multi-Output (MIMO), which enables high-speed connections using multiple channels, is enabled only in the 5 GHz radio band. The access points can also support MIMO in the 2.4 GHz radio band; however, there are quite a few 2.4 GHz access points operated by users on the KEK Tsukuba campus, so MIMO at 2.4 GHz is not currently scheduled.

PerfSONAR servers at KEK

PerfSONAR-PS servers started running on the KEK-DMZ and KEKCC networks a few years ago to monitor network performance. In FY2013, we prepared additional perfSONAR servers to work with the perfSONAR-MDM (Multi Domain Monitoring) servers at remote collaboration sites. A perfSONAR-PS server issues performance tests at its own interval and responds to requests from remote servers when no performance test is currently ongoing. Therefore, if many sites are registered in the scheduler and the interval is not very long, the server cannot allocate test time for requests from remote sites. A perfSONAR-MDM server, by contrast, does not execute performance tests by itself; instead, a central server sends requests to the perfSONAR-MDM servers for performance tests among themselves. The central server determines the schedule of the performance tests and exclusively records the test results. This model reduces the resource requirements of the measurement nodes. KEK now has two perfSONAR-MDM servers, on the KEK-DMZ and KEKCC networks, similar to the perfSONAR-PS servers.
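The difference between the two scheduling models can be illustrated conceptually: in the MDM model a single coordinator assigns pairwise test slots and keeps all results, so the measurement hosts only run the tests they are told to. The sketch below is purely illustrative of that central-scheduling idea and does not use any real perfSONAR interface; all host names are hypothetical.

```python
# Conceptual sketch of MDM-style central scheduling: one coordinator assigns
# non-overlapping pairwise test slots, so measurement hosts never manage
# their own schedules.  Illustrative only; no real perfSONAR API is used.
from itertools import combinations

hosts = ["kek-dmz", "kekcc", "site-a", "site-b"]  # hypothetical node names
slot_minutes = 30

schedule = [
    {"start_min": slot * slot_minutes, "src": src, "dst": dst}
    for slot, (src, dst) in enumerate(combinations(hosts, 2))
]

for entry in schedule:
    print(f"t+{entry['start_min']:4d} min: throughput test "
          f"{entry['src']} -> {entry['dst']}")
```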

Security

Japanese academic sites experienced severe security attacks on supercomputer systems in FY2013. KEKSC also experienced attacks. Credentials (username and password pairs) of users that had been stolen elsewhere were used to intrude into the SSH login servers of KEKSC. Although the attacks had no effect on either the system itself or the users' data on the system, it took more than one month to restart the computing service because of the investigation. In addition to KEKSC, other SSH servers are also becoming opportune targets for such attacks. Management of credentials is becoming increasingly important for both system administrators and users.

Similar attacks are also applied to Web applications, such as Webmail, where credentials are required to log in. One way to steal credentials is "phishing", in which a huge number of fraudulent e-mails are sent with a link that leads to a phishing website. Over one hundred users of our e-mail system received e-mails pretending to have been sent by the KEK Webmail administrator almost every month in the last half of FY2013. Some users were deceived and entered their credentials on the phishing website. Once credentials are stolen, they can be used to send SPAM mails or to intrude into other Web applications. To minimize the risk from these social engineering attacks, KEK has warned users many times to ignore suspicious e-mails that request them to enter credentials, and not to click on the URLs in them. The KEK Webmail administrator never requests users to enter their credentials in such a manner.

4.4.2.4. J-PARC Information System

The J-PARC infrastructure network, called JLAN, has been operated independently of the KEK LAN and the JAEA LAN, in terms of logical structure and operational policy, since FY2002. The total number of hosts on JLAN has reached over 3,800, and it has been increasing at a rate of 108% per year. Fig. 4-4-2-2 shows the growth curves of the edge switches, wireless LAN access points, and hosts connected to JLAN. Fig. 4-4-2-3 shows the FY2013 network usage from the Tokai site to the Tsukuba site, where the KEKCC is installed. The data transfer rate achieved was 6 Gbps on a 5-minute average and was approaching the network bandwidth capacity of 8 Gbps. The figure also shows that, after June, major network activities related to J-PARC experiments were suspended because of the accident at the Hadron Experimental Facility on May 23, 2013.

4.4.2.5. Research and Development

Grid in Medical Applications

While Monte Carlo (MC) simulation is believed to be the most reliable method of dose calculation in particle therapy, the simulation time is critical in attaining sufficient statistical accuracy for clinical applications.

To support the rapid development of Grid/Cloud-aware MC simulations, CRC has developed the Universal Grid Interface (UGI), based on the Simple API for Grid Applications (SAGA). SAGA, which is standardized by the Open Grid Forum (OGF), defines API specifications for access to distributed computing infrastructures. The UGI is a set of command-line interfaces and APIs in the Python scripting language for job submission, file manipulation, and monitoring on multi-Grid middleware infrastructures, as well as on local resources managed by popular load management systems, e.g., PBS and LSF.
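As an illustration of the SAGA programming style on which UGI builds, the sketch below submits a single job through the saga-python bindings to a PBS-managed resource. This is not UGI code itself, and the resource URL and executable are hypothetical examples.

```python
# Illustrative job submission using the saga-python bindings (not UGI itself).
# The resource URL and executable are hypothetical examples.
import saga

js = saga.job.Service("pbs://localhost")   # hypothetical local PBS resource

jd = saga.job.Description()
jd.executable = "/bin/echo"
jd.arguments = ["hello from SAGA"]
jd.output = "saga_job.out"

job = js.create_job(jd)
job.run()           # submit to the underlying batch system
job.wait()          # block until the job finishes
print("final state:", job.state)
```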

We have developed a common platform for MC dose calculation on Grid distributed computing systems, which allows medical physicists to split large dose calculations into many small calculations and process them in parallel over the distributed systems. The platform is flexible and effective for dose calculation in both clinical and research applications of particle therapy. The platform consists of the UGI and the Geant4-based Particle Therapy Simulation Framework (PTsim). We have achieved a significant improvement in the turnaround time of dose calculation with this platform by parallelizing the original calculation.
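The parallelization idea is simply to divide the requested number of histories across many independent jobs with different random seeds and then combine the resulting dose maps. The sketch below is schematic: run_dose_job() is a hypothetical stand-in for a PTsim/UGI job, emulated here with a local process pool.

```python
# Schematic sketch of splitting one large MC dose calculation into many small
# ones.  run_dose_job() is a hypothetical stand-in for a PTsim/UGI job and is
# emulated here with a local process pool.
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def run_dose_job(n_histories: int, seed: int) -> np.ndarray:
    """Placeholder for one small simulation job returning a per-history dose map."""
    rng = np.random.default_rng(seed)
    # stand-in for particle transport and scoring on a 50x50x50 voxel phantom
    return rng.poisson(1.0, size=(50, 50, 50)) / n_histories

total_histories = 1_000_000
n_jobs = 100
per_job = total_histories // n_jobs

with ProcessPoolExecutor() as pool:
    partial = list(pool.map(run_dose_job, [per_job] * n_jobs, range(n_jobs)))

dose = sum(partial) / n_jobs  # average of statistically independent jobs
print("mean dose per voxel:", dose.mean())
```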

Object Oriented Data Analysis Environment for Neutron Scattering, “Manyo-Library”

The Materials and Life Science Facility (MLF) of J-PARC is a user facility that provides neutron and muon sources for experiments. A data-analysis environment for each instrument in the MLF has been developed on a software framework, Manyo-Library, which has common and generic analysis functionalities for neutron-scattering experiments. The MLF computing environment group developed and maintains Manyo-Library. The framework is an object-oriented C++ class library.

Manyo-Library can be used through a Python user interface and provides many methods, for example, data input/output functions, data-analysis functions, and distributed data processing. Manyo-Library data files can be read at any other laboratory, because the data container can be written in the NeXus format (see ).
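Because NeXus is built on HDF5, a file exported in this format can in principle be inspected at another laboratory with generic HDF5 tools. The sketch below uses h5py with a hypothetical file name and group layout; it does not use Manyo-Library's own API.

```python
# Minimal sketch: opening a NeXus (HDF5-based) file with a generic HDF5
# reader.  The file name and group/field names are hypothetical; real
# instrument files follow the NeXus class hierarchy (NXentry, NXdata, ...).
import h5py

with h5py.File("scan0001.nxs", "r") as f:
    entry = f["entry"]                       # hypothetical NXentry group name
    counts = entry["data/counts"][...]       # hypothetical detector histogram
    tof = entry["data/time_of_flight"][...]  # hypothetical time-of-flight axis
    print(counts.shape, float(tof.min()), float(tof.max()))
```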

Many data-analysis software programs have been developed for various instruments/experiments by adopting this framework.

Manyo-Library has been installed on the neutron-scattering instruments in the MLF and is utilized as an infrastructure of the software environment. The first official version of Manyo-Library, 0.3, was released in 2012, and it was improved this year (2014). A small workshop for beginners of Manyo-Library was held in August 2013, and the data-analysis environment was installed onto their personal laptops. We will start discussing and designing a new data-file format based on HDF-5 (Hierarchical Data Format). We will ensure that Manyo-Library works with Python ver. 3 within the next year.

GRACE

GRACE is an automatic computation system that provides quantitative theoretical predictions of cross sections of elementary particles and event generators for high-energy physics experiments. An important extension of the GRACE system is the inclusion of higher-order corrections, which is necessary to give more precise theoretical predictions in the Standard Model and beyond.

The correction at the one-loop level to electroweak processes with Higgs production in the ILC energy region has been estimated to be around 10 percent of the tree level. Therefore, the correction at the multi-loop level should also be considered. Estimating the correction at the multi-loop level requires implementing the calculation of multi-loop integrals in GRACE. In this extension, handling the three kinds of divergence generated by the infrared terms, the ultraviolet terms, and the kinematical conditions is challenging work.

We have been developing the Direct Computation Method (DCM) for multi-loop integrals. It is based on numerical multi-dimensional integration and numerical extrapolation. DCM is a fully numerical method and is applicable to multi-loop integrals with various physics parameters. Using DCM, the divergences generated by the infrared terms and the kinematical conditions can be handled in a fully numerical manner.
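As a one-dimensional toy illustration of this idea (not the GRACE/DCM implementation itself), the Feynman-parameter integral of a one-loop bubble above threshold can be regulated by keeping a finite imaginary part rho in the denominator, evaluated numerically for a geometric sequence of rho values, and then extrapolated to rho -> 0.

```python
# Toy illustration of the DCM idea: regulate the integrand with a finite
# imaginary part rho, integrate numerically, then extrapolate rho -> 0.
# One-dimensional example only; not the GRACE/DCM implementation.
import numpy as np
from scipy.integrate import quad

m2, s = 1.0, 10.0  # mass squared and s > 4*m2, i.e. above threshold

def bubble(rho: float) -> complex:
    """Feynman-parameter integral of 1 / (m^2 - s*x*(1-x) - i*rho)."""
    integrand = lambda x: 1.0 / (m2 - s * x * (1.0 - x) - 1j * rho)
    re, _ = quad(lambda x: integrand(x).real, 0.0, 1.0, limit=200)
    im, _ = quad(lambda x: integrand(x).imag, 0.0, 1.0, limit=200)
    return re + 1j * im

# Evaluate on a geometric sequence of regulators and extrapolate to rho = 0.
rhos = np.array([0.4 / 2.0**k for k in range(8)])
vals = np.array([bubble(r) for r in rhos])

# Simple polynomial extrapolation in rho; the constant term estimates I(0).
i0 = np.polyfit(rhos, vals.real, deg=3)[-1] + 1j * np.polyfit(rhos, vals.imag, deg=3)[-1]
print("extrapolated integral:", i0)
```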

In addition to the fully numerical approach, we have studied a combination of symbolic and numerical treatments of two-loop integrals. Using symbolic manipulation software, Feynman amplitudes have been obtained by the n-dimensional regularization method. An automatic system is already underway, taking the calculation of the muon anomalous magnetic moment at the two-loop level as an example, in which 1780 two-loop diagrams and 70 renormalized one-loop diagrams appear. Gauge-fixing is used as a very efficient means of checking the results.

At the same time, we have developed a purely theoretical approach. It is known that any loop and any point function can be written as a linear combination of hypergeometric series. We explain how a one-loop integral is expressed as a hypergeometric series using recursion formulae. We also obtain an n-point function exactly expressed in terms of a hypergeometric series for arbitrary mass parameters and momenta, in any space-time dimension. Because the singular points of hypergeometric functions have been well investigated, we expect a software library capable of very precise and stable computation of loop integrals.

Other options for the ILC have been discussed, such as electron-electron, electron-photon, and photon-photon colliders. Each option will provide interesting topics, such as the detailed measurement of the Higgs properties and the quest for new physics beyond the Standard Model. We calculated the electroweak one-loop contributions to the scattering amplitude for e-γ → e- Higgs and expressed it in analytical form. We also analyzed the cross section of Higgs production for each combination of polarizations of the initial beams.

Geant4

Geant4 is a toolkit for the simulation of the passage of particles through matter. It provides a comprehensive set of functionalities for geometry, materials, particles, particle tracking, particle interactions, detector response, events, runs, visualization, and user interfaces. Geant4 has the flexibility and extensibility of a generic simulation framework, and it is widely used in many different application domains, from HEP experiments to medical and space applications. Fields beyond particle physics have taken notice of Geant4 because of its versatility.

In FY2013, the new Geant4 version 10.0 was released on December 6th, 2013, and several patches have also been released. We successfully released a multi-thread version of Geant4 (G4MT) in this release. Thanks to G4MT, Geant4 simulations can be processed concurrently in threads, and CPU cores can be utilized efficiently in multi-core environments. G4MT is carefully designed so that the migration to multi-threaded applications can be done with minimal changes to user code. From the performance viewpoint, G4MT also shows good scalability with an increase in the number of CPU cores. We successfully ran G4MT applications on Intel's new Xeon Phi many-core architecture, showing high scalability.

We also continuously supported the user community. We organized a 3-day user-training course in December 2013, with about 100 participants. In the tutorial, we provided a Japanese-language version of the Geant4 training materials compliant with the latest version (10.0) of Geant4. These educational materials are expected to be very helpful for many Japanese novice Geant4 users.

As for new development, we continued a project in the framework of the Japan/US Cooperation Program, collaborating with the SLAC Geant4 team and some experiment groups. We made efforts to speed up Geant4 for future experiments. The project's challenge is to improve the Geant4 kernel using recent computer technologies such as multi-core CPUs, many-core CPUs, and GPU computing. We are also developing another direction of Geant4 parallelism using GPUs. The project set a target of electromagnetic physics in the lower-energy region under voxel geometry, which is used especially in radiation dosimetry. Electromagnetic processes in Geant4 were implemented on a parallel computing platform, CUDA, and we achieved a 40-times speed increase compared with simulations run on a single CPU.

Figures

Fig. 4-4-2-1. History of monthly CPU usage rates.

Fig. 4-4-2-2. Growth of the J-PARC network.

Fig. 4-4-2-3. Tokai-Tsukuba network bandwidth usage.
