Open source Big Data tools accelerating physics research at CERN
The number of CERN teams using big data frameworks (Apache Hadoop, Spark and Kafka) to develop their systems has grown significantly in recent years. These systems include: the next-generation CERN Accelerator Logging Service (NxCALS), which logs data from 20,000 devices that monitor the CERN accelerator complex for online and offline analysis; the monitoring system for the CERN IT Data Center infrastructure and the Worldwide LHC Computing Grid (WLCG); and the CMS Data Reduction Facility, which is evaluating Apache Spark to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis.
This talk will provide an overview of the current deployments of Apache Hadoop, Spark and Kafka at CERN and of the challenges faced in supporting the demanding needs of various CERN communities.