Hello, I'm
Research Scientist
Lawrence Berkeley National Laboratory
I optimize data management and I/O performance for scientific workflows integrating AI. My research investigates data access patterns, automatic tuning, and storage solutions to ensure massive datasets are ready for scientific discovery.
PI of an LDRD project on AI data readiness for scientific datasets (AIDRIN). Research on LLM-driven I/O performance diagnosis (IOAgent), cross-layer I/O bottleneck exploration (Drishti), and HDF5 benchmarking for AI workloads (h5bench). Published two 360-degree surveys on data readiness for AI and I/O in ML applications (ACM Computing Surveys).
Key contributor on DOE's ECP ExaIO team. Built Drishti for automated I/O optimization guidance, DXT Explorer for trace visualization, and h5bench for HDF5 performance evaluation. Characterized multi-layer I/O behavior on leadership-scale systems (HPDC'22). Published a 360-degree survey on I/O access patterns (ACM Computing Surveys).
Six-month doctoral stay (Aug 2019 – Feb 2020). Designed on-demand user-level I/O forwarding with arbitration policies for HPC platforms (IPDPS'21). Applied reinforcement learning to adaptive request scheduling at the forwarding layer (Future Generation Computer Systems). Developed runtime detection of I/O access patterns (SBAC-PAD'19).
One-month visit (Jan–Feb 2017). Collaboration on parallel I/O scheduling and forwarding-layer optimizations.
“Dynamic Tuning and Reconfiguration of the I/O Forwarding Layer in HPC Platforms”
Advisor: Prof. Dr. Philippe O. A. Navaux (UFRGS)
Co-Advisor: Dr. Toni Cortes (UPC / Barcelona Supercomputing Center)
“Evaluating I/O Scheduling Techniques at the Forwarding Layer and Coordinating Data Server Accesses”
Advisor: Dr. Philippe O. A. Navaux (UFRGS)
“Applying Fault Tolerance Techniques in URI Online Judge”
Advisor: Prof. MSc. Paulo Ricardo Rodegheri (URI)
67 peer-reviewed papers across journals, conferences, and workshops.
Artificial Intelligence (AI) applications critically depend on data. Poor-quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluating data readiness is therefore a crucial step in improving the quality and appropriateness of data used for AI. Considerable R&D effort has gone into improving data quality; however, standardized metrics for evaluating data readiness for AI training are still evolving. In this study, we perform a comprehensive survey of metrics used to verify data readiness for AI training. This survey examines more than 140 papers from the ACM Digital Library, IEEE Xplore, journals published by Nature, Springer, and ScienceDirect, and online articles by prominent AI experts. It proposes a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy will lead to new standards for DRAI metrics that enhance the quality, accuracy, and fairness of AI training and inference.
Growing interest in Artificial Intelligence (AI) has resulted in a surge in demand for faster methods of Machine Learning (ML) model training and inference. This demand for speed has prompted the use of high performance computing (HPC) systems that excel in managing distributed workloads. Because data is the main fuel for AI applications, the performance of the storage and I/O subsystem of HPC systems is critical. In the past, HPC applications accessed large portions of data written by simulations or experiments, or ingested data for visualization or analysis tasks. In contrast, ML workloads perform small reads spread across a large number of random files. This shift in I/O access patterns poses several challenges to modern parallel storage systems. In this article, we survey I/O in ML applications on HPC systems, targeting literature within a 6-year window from 2019 to 2024. We define the scope of the survey, provide an overview of the common phases of ML, review available profilers and benchmarks, examine the I/O patterns encountered during offline data preparation, training, and inference, and explore I/O optimizations utilized in modern ML frameworks and proposed in recent literature. Lastly, we seek to expose research gaps that could spawn further R&D.
As the complexity of the HPC storage stack rapidly grows, domain scientists face increasing challenges in effectively utilizing HPC storage systems to achieve their desired I/O performance. To identify and address I/O issues, scientists largely rely on I/O experts to analyze their I/O traces and provide insights into potential problems. However, with a limited number of I/O experts and the growing demand for data-intensive applications, inaccessibility has become a major bottleneck, hindering scientists from maximizing their productivity. The recent rapid progress in large language models (LLMs) opens the door to creating an automated tool that democratizes trustworthy I/O performance diagnosis capabilities for domain scientists. However, LLMs face significant challenges in this task, such as the inability to handle long context windows, a lack of accurate domain knowledge about HPC I/O, and the generation of hallucinations during complex interactions. In this work, we propose IOAgent as a systematic effort to address these challenges. IOAgent integrates various new designs, including a module-based pre-processor, a RAG-based domain knowledge integrator, and a tree-based merger to accurately diagnose I/O issues from a given Darshan trace file. Similar to an I/O expert, IOAgent provides detailed justifications and references for its diagnoses and offers an interactive interface for scientists to continue asking questions about the diagnosis. To evaluate IOAgent, we collected a diverse set of labeled job traces and released the first open diagnosis test suite, TraceBench. Based on this test suite, extensive evaluations were conducted, demonstrating that IOAgent matches or outperforms state-of-the-art I/O diagnosis tools with accurate and useful diagnosis results. We also show that IOAgent is not tied to specific LLMs, performing similarly well with both proprietary and open-source LLMs. We believe IOAgent has the potential to become a powerful tool for scientists navigating complex HPC I/O subsystems in the future.
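A minimal sketch of the pipeline shape described above. Every helper here, including llm(), preprocess(), and retrieve(), is a hypothetical stand-in to illustrate the pre-process / retrieve / merge flow, not IOAgent's actual API:

```python
# Hypothetical sketch of an IOAgent-style flow: split the trace per module,
# diagnose each piece, ground the result with retrieved domain knowledge,
# and merge the partial answers instead of one huge prompt.
def llm(prompt: str) -> str:
    return f"diagnosis for: {prompt[:60]}"        # stand-in for a model call

def preprocess(trace: dict) -> list[str]:
    # Module-based pre-processing: each chunk fits in an LLM context window.
    return [f"{module} counters: {counters}" for module, counters in trace.items()]

def retrieve(query: str, knowledge: list[str]) -> list[str]:
    # RAG step: naive keyword overlap stands in for embedding similarity.
    return [doc for doc in knowledge if any(w in doc for w in query.lower().split())]

def diagnose(trace: dict, knowledge: list[str]) -> str:
    partials = [llm(chunk) for chunk in preprocess(trace)]   # per-module diagnoses
    grounded = retrieve(" ".join(partials), knowledge)       # add domain documents
    return llm("merge: " + " | ".join(partials + grounded))  # tree-based merge step

trace = {"POSIX": {"small_writes": 9000}, "MPI-IO": {"collective": 0}}
docs = ["posix small writes benefit from aggregation"]
print(diagnose(trace, docs))
```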
Rapid adoption of artificial intelligence (AI) in scientific computing requires new tools to evaluate I/O performance effectively. HDF5 is one of the data formats frequently used not only in HPC but also in modern AI applications. However, existing benchmarks are insufficient to address the current challenges posed by AI workloads. This paper introduces an extension to the existing HDF5 benchmark, called h5bench, by incorporating the workload characteristics from the MLPerf Storage - DLIO benchmark. This extension allows users to test AI workloads together with traditional HPC benchmarks in the same context without the complexities of installing various machine learning libraries. Our experimental analysis demonstrates that the extension can replicate the existing I/O patterns with easily customizable configurations to perform various scaling tests.
“Garbage In, Garbage Out” is an adage universally agreed upon by computer scientists across domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest considerable time and effort in preparing data for AI. However, there are no standard methods or frameworks for assessing the “readiness” of data for AI. To provide a quantifiable assessment of the readiness of data for AI processes, we define parameters of AI data readiness and introduce AIDRIN (AI Data Readiness INspector). AIDRIN is a framework covering a broad range of readiness dimensions available in the literature that aid in evaluating the readiness of data quantitatively and qualitatively. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, for data evaluation. Furthermore, AIDRIN uses metrics specific to assessing data for AI, such as feature importance, feature correlations, class imbalance, fairness, privacy, and compliance with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles. AIDRIN provides visualizations and reports to assist data scientists in further investigating the readiness of data. The AIDRIN framework enhances the efficiency of the machine learning pipeline by enabling informed decisions on data readiness for AI applications.
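For a flavor of what such metrics look like, here is a minimal sketch of three simplified readiness checks. The formulas are illustrative only; AIDRIN's actual metrics are richer and configurable:

```python
# Toy readiness checks in the spirit of AIDRIN (simplified assumptions).
import pandas as pd

def readiness_report(df: pd.DataFrame, label: str) -> dict:
    counts = df[label].value_counts(normalize=True)
    return {
        # Completeness: fraction of non-missing cells.
        "completeness": float(df.notna().to_numpy().mean()),
        # Duplicates: fraction of exactly repeated rows.
        "duplicate_ratio": float(df.duplicated().mean()),
        # Class balance: rarest-to-most-common class ratio (1.0 = balanced).
        "class_balance": float(counts.min() / counts.max()),
    }

df = pd.DataFrame({"x": [1.0, 2.0, None, 2.0], "y": ["a", "b", "b", "b"]})
print(readiness_report(df, label="y"))
```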
I/O performance monitoring tools such as Darshan and Recorder collect I/O-related metrics on production systems and help understand the applications’ behavior. However, some gaps prevent end-users from seeing the whole picture when it comes to detecting and drilling down to the root causes of I/O performance slowdowns and where those problems originate. These gaps arise from limitations in the available metrics, their collection strategy, and the lack of translation to actionable items that could advise on optimizations. This paper highlights such gaps and proposes solutions to drill down to the source code level to pinpoint the root causes of I/O bottlenecks scientific applications face by relying on cross-layer analysis combining multiple performance metrics related to I/O software layers. We demonstrate with two real applications how metrics collected in high-level libraries (which are closer to the data models used by an application), enhanced by source-code insights and natural language translations, can help streamline the understanding of I/O behavior and provide guidance to end-users, developers, and supercomputing facilities on how to improve I/O performance. Using this cross-layer analysis and the heuristic recommendations, we attained up to 6.9× speedup from run-as-is executions.
Effectively leveraging the complex software and hardware I/O stacks of HPC systems to deliver needed I/O performance has been a challenging task for domain scientists. To identify and address I/O issues in their applications, scientists largely rely on I/O experts to analyze the recorded I/O traces of their applications and provide insights into the potential issues. However, due to the limited number of I/O experts and the growing demand for data-intensive applications across the wide spectrum of sciences, inaccessibility has become a major bottleneck hindering scientists from maximizing their productivity. Inspired by the recent rapid progress of large language models (LLMs), in this work we propose IO Navigator (ION), an LLM-based framework that takes a recorded I/O trace of an application as input and leverages the in-context learning, chain-of-thought, and code generation capabilities of LLMs to comprehensively analyze the I/O trace and provide diagnosis of potential I/O issues. Similar to an I/O expert, ION provides detailed justifications for the diagnosis and an interactive interface for scientists to ask detailed questions about the diagnosis. We illustrate ION's applicability by assessing it on a set of controlled I/O traces generated with different I/O issues. We also demonstrate that ION can match state-of-the-art I/O optimization tools and provide more insightful and adaptive diagnoses for real applications. We believe ION, with its full capabilities, has the potential to become a powerful tool for scientists to navigate through complex I/O subsystems in the future.
HPC workflows consist of multiple phases and components executed collaboratively to reach the same goal. They perform necessary computations and exchange data, often through system-wide POSIX-compliant parallel file systems. However, POSIX file systems pose challenges in performance and scalability, prompting the development of alternative storage systems like object stores. Despite their potential, object stores face adoption barriers in HPC workflows due to their lack of workflow awareness and the structured nature of HPC data. This work presents a case study using the Proactive Data Containers (PDC) framework, which focuses on object-centric runtime data management, to support Montage, a real-world astronomy workflow that runs on HPC systems. Because it is deployed in user space, PDC can be adopted transparently alongside existing I/O libraries. This study explores the use of PDC with Montage's existing FITS-based I/O methods, discusses workflow-oriented optimizations such as caching, prefetching, and write aggregation, and provides insights and lessons learned throughout the porting process.
I/O operations are a known performance bottleneck of HPC applications. To achieve good performance, users often employ an iterative multistage tuning process to find an optimal I/O stack configuration. However, an I/O stack contains multiple layers, such as high-level I/O libraries, I/O middleware, and parallel file systems, and each layer has many parameters. These parameters and layers are entangled and influence each other, making the tuning process time-consuming and complex. In this work, we present TunIO, an AI-powered I/O tuning framework that implements several techniques to balance tuning cost and performance gain, including tuning the high-impact parameters first. TunIO analyzes the application source code to extract its I/O kernel while retaining all statements necessary to perform I/O, uses a smart selection of high-impact configuration parameters for the given tuning objective, and employs a novel Reinforcement Learning (RL)-driven early stopping mechanism to balance cost and performance gain. Experimental results show that TunIO reduces tuning time by up to ≈73% while achieving the same performance gain as H5Tuner, and it achieves a significant performance gain per cost of 208.4 MBps/min (I/O bandwidth gained for each minute spent in tuning) over existing approaches in our tests.
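To illustrate the cost/gain trade-off, here is a toy tuning loop with a patience-based early stop. Both sample_config() and measure() are hypothetical stand-ins for proposing parameter values and running the extracted I/O kernel, and TunIO's actual early stopping is RL-driven rather than this simple patience rule:

```python
# Toy tuning loop: stop when bandwidth gains plateau, bounding tuning cost.
import random

def sample_config() -> dict:
    return {"stripe_count": random.choice([4, 8, 16, 32]),
            "alignment": random.choice([4096, 65536, 1048576])}

def measure(cfg: dict) -> float:
    # Stand-in for executing the I/O kernel and reading back bandwidth (MB/s).
    return cfg["stripe_count"] * 10 + random.random()

best, stale, PATIENCE = 0.0, 0, 5
while stale < PATIENCE:
    bw = measure(sample_config())
    if bw > best * 1.02:         # require a >2% improvement to reset patience
        best, stale = bw, 0
    else:
        stale += 1
print(f"best bandwidth ~ {best:.1f} MB/s")
```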
Parallel I/O is a critical technique for moving data between the compute and storage subsystems of supercomputers. With massive amounts of data produced or consumed by compute nodes, high-performance parallel I/O is essential. I/O benchmarks play an important role in this process; however, there is a scarcity of I/O benchmarks representative of current workloads on HPC systems. Toward creating representative I/O kernels from real-world applications, we have created h5bench, a set of I/O kernels that exercise hierarchical data format version 5 (HDF5) I/O on parallel file systems in numerous dimensions. Our focus on HDF5 is due to the parallel I/O library's heavy usage in various scientific applications running on supercomputing systems. The tests in the h5bench suite cover I/O operations (read and write), data locality (arrays of basic data types and arrays of structures), array dimensionality (one-dimensional arrays, two-dimensional meshes, three-dimensional cubes), and I/O modes (synchronous and asynchronous). In this paper, we present the observed performance of h5bench executed along several of these dimensions on existing supercomputers (Cori and Summit) and pre-exascale platforms (Perlmutter, Theta, and Polaris). h5bench measurements can be used to identify performance bottlenecks and their root causes and to evaluate I/O optimizations. As the I/O patterns of h5bench are diverse and capture the I/O behaviors of various HPC applications, this study will be helpful to the broader supercomputing and I/O community.
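For a sense of the patterns h5bench exercises, this is a minimal h5py sketch of one of them: a contiguous one-dimensional synchronous write of a basic datatype per time step. It is not h5bench itself, which is a compiled suite driven by configuration files:

```python
# Minimal h5py sketch of a contiguous per-time-step write pattern.
import h5py
import numpy as np

N_STEPS, PARTICLES = 5, 1_000_000
with h5py.File("sample.h5", "w") as f:
    dset = f.create_dataset("x", shape=(N_STEPS, PARTICLES), dtype="f8")
    for step in range(N_STEPS):
        dset[step, :] = np.random.random(PARTICLES)  # one contiguous dump per step
```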
Manually diagnosing the I/O performance bottleneck of a single application (hereinafter referred to as the "job level") is a tedious and error-prone procedure requiring domain scientists to have deep knowledge of complex storage systems. However, existing automatic methods for I/O performance bottleneck diagnosis have one major issue: the granularity of the analysis is at the platform or group level, and the diagnosis results cannot be applied to an individual application. To address this issue, we designed and developed a method named "Artificial Intelligence for I/O" (AIIO), which uses AI and its interpretation technology to diagnose I/O performance bottlenecks at the job level automatically. By considering the sparsity of I/O log files, employing multiple AI models for performance prediction, merging diagnosis results across multiple models, and generalizing its performance prediction and diagnosis functions, AIIO can accurately and robustly identify the bottleneck of even an unseen application. Experimental results show that real and unseen applications can use the diagnosis results from AIIO to improve their I/O performance by up to 146 times.
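A toy sketch of the core idea: predict performance from job-level counters, then read the model's feature importances as a diagnosis. The data is synthetic, and AIIO combines multiple models and handles log sparsity, which this example omits:

```python
# Toy AIIO-style diagnosis: feature importances point at the bottleneck.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = ["small_req_ratio", "metadata_ops", "shared_file_ranks"]
X = rng.random((200, 3))
# Synthetic ground truth: small requests dominate the slowdown.
y = 5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name}: {imp:.2f}")   # the top feature is the diagnosed bottleneck
```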
Modern scientific applications utilize numerous software and hardware layers to efficiently access data. This approach poses a challenge for I/O optimization because of the need to instrument and correlate information across those layers. The Darshan characterization tool seeks to address this challenge by providing efficient, transparent, and compact runtime instrumentation of many common I/O interfaces. It also includes command-line tools to generate actionable insights and summary reports. However, the extreme diversity of today’s scientific applications means that not all applications are well served by one-size-fits-all analysis tools. In this work we present PyDarshan, a Python-based library that enables agile analysis of I/O performance data. PyDarshan caters to both novice and advanced users by offering ready-to-use HTML reports as well as a rich collection of APIs to facilitate custom analyses. We present the design of PyDarshan and demonstrate its effectiveness in four diverse real-world analysis use cases.
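A short example of the kind of custom analysis PyDarshan enables, assuming a recent PyDarshan release and a hypothetical log file name; consult the PyDarshan documentation for the exact API of your installed version:

```python
# Sketch of a custom PyDarshan analysis: share of sub-1 KiB writes.
import darshan

report = darshan.DarshanReport("example.darshan", read_all=True)  # parse the log
counters = report.records["POSIX"].to_df()["counters"]            # counters DataFrame
small = (counters["POSIX_SIZE_WRITE_0_100"]
         + counters["POSIX_SIZE_WRITE_100_1K"]).sum()
total = counters["POSIX_WRITES"].sum()
print(f"writes under 1 KiB: {small / max(total, 1):.0%}")
```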
The high-performance computing I/O stack has been complex due to multiple software layers, the inter-dependencies among these layers, and the different performance tuning options for each layer. In this complex stack, the definition of an “I/O access pattern” has been reappropriated to describe what an application is doing to write or read data from the perspective of different layers of the stack, often comprising a different set of features. It has become common to have to redefine what is meant when discussing a pattern in every new study, as no assumption can be made. This survey aims to propose a baseline taxonomy, harnessing the I/O community’s knowledge over the past 20 years. This definition can serve as a common ground for high-performance computing I/O researchers and developers to apply known I/O tuning strategies and design new strategies for improving I/O performance. We seek to summarize and bring a consensus to the multiple ways to describe a pattern based on common features already used by the community over the years.
Scientific computing workloads at HPC facilities have been shifting from traditional numerical simulations to AI/ML applications for training and inference while processing and producing ever-increasing amounts of scientific data. To address the growing need for increased storage capacity, lower access latency, and higher bandwidth, emerging technologies such as non-volatile memory are integrated into supercomputer I/O subsystems. With these emerging trends, we need a better understanding of the multilayer supercomputer I/O systems and ways to use these subsystems efficiently. In this work, we study the I/O access patterns and performance characteristics of two representative supercomputer I/O subsystems. Through an extensive analysis of year-long I/O logs on each system, we report new observations in I/O reads and writes, unbalanced use of storage system layers, and new trends in user behaviors at the HPC I/O middleware stack.
The complex software and hardware I/O stack of HPC platforms makes it challenging for end-users to obtain superior I/O performance and to understand the root causes of I/O bottlenecks they encounter. Despite the continuous efforts from the community to profile I/O performance and propose new optimization techniques and tuning options for improving the performance, there is still a translation gap between profiling and tuning. In this paper, we propose Drishti, a solution to guide scientists in optimizing I/O in their applications by detecting typical I/O performance pitfalls and providing recommendations. We illustrate its applicability in two case studies and evaluate its robustness and performance by summarizing the issues detected in over a hundred thousand Darshan logs collected on the Cori supercomputer at the National Energy Research Scientific Computing Center (NERSC). Drishti can empower end-users and guide them in the I/O optimization journey by shedding some light on everyday I/O performance pitfalls and how to fix them.
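A minimal sketch of one such rule-based check, using Darshan-style POSIX counters as input. This is a simplified stand-in for Drishti's actual triggers, which parse full Darshan logs and cover many more pitfalls:

```python
# Toy Drishti-style rule: flag jobs dominated by tiny write requests.
def check_small_requests(counters: dict, threshold: float = 0.7) -> str | None:
    small = counters.get("POSIX_SIZE_WRITE_0_100", 0) \
          + counters.get("POSIX_SIZE_WRITE_100_1K", 0)
    total = counters.get("POSIX_WRITES", 0)
    if total and small / total > threshold:
        return (f"{small / total:.0%} of writes are < 1 KiB: consider "
                "collective I/O or aggregating requests before writing")
    return None   # rule not triggered

print(check_small_requests(
    {"POSIX_WRITES": 1000, "POSIX_SIZE_WRITE_0_100": 600,
     "POSIX_SIZE_WRITE_100_1K": 250}))
```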
I/O forwarding is a well-established and widely adopted technique in HPC to reduce contention in the access to storage servers and transparently improve I/O performance. Rather than having applications directly access the shared parallel file system, the forwarding technique defines a set of I/O nodes responsible for receiving application requests and forwarding them to the file system, thus reshaping the flow of requests. The typical approach is to statically assign I/O nodes to applications depending on the number of compute nodes they use, which is not necessarily related to their I/O requirements, leading to inefficient usage of these resources. This paper investigates arbitration policies based on the applications' I/O demands, represented by their access patterns. We propose a policy based on the Multiple-Choice Knapsack problem that seeks to maximize global bandwidth by giving more I/O nodes to the applications that will benefit the most. Furthermore, we propose a user-level I/O forwarding solution as an on-demand service capable of applying different allocation policies at runtime on machines where this layer is not present. We demonstrate our approach's applicability through extensive experimentation and show it can transparently improve global I/O bandwidth by up to 85% in a live setup compared to the default static policy.
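A small sketch of the Multiple-Choice Knapsack formulation: each application offers a set of (I/O nodes, predicted bandwidth) options, exactly one option is chosen per application, and the total number of I/O nodes is bounded. The bandwidth numbers here are made up; the paper derives them from observed access patterns:

```python
# Dynamic-programming sketch of the MCKP-based I/O node arbitration.
def allocate(apps: dict[str, list[tuple[int, float]]], budget: int):
    """apps maps name -> [(io_nodes, predicted_bandwidth), ...]; pick exactly
    one option per app with total io_nodes <= budget, maximizing bandwidth."""
    NEG = float("-inf")
    best = [0.0] + [NEG] * budget            # best[b]: max bandwidth using b nodes
    choice = [{} for _ in range(budget + 1)]
    for name, options in apps.items():
        new, new_choice = [NEG] * (budget + 1), [None] * (budget + 1)
        for b in range(budget + 1):
            if best[b] == NEG:
                continue
            for nodes, bw in options:        # exactly one option per application
                if b + nodes <= budget and best[b] + bw > new[b + nodes]:
                    new[b + nodes] = best[b] + bw
                    new_choice[b + nodes] = {**choice[b], name: nodes}
        best, choice = new, [c or {} for c in new_choice]
    b = max(range(budget + 1), key=lambda i: best[i])
    return best[b], choice[b]

apps = {"A": [(1, 2.0), (2, 3.5), (4, 4.0)], "B": [(1, 1.0), (2, 2.8)]}
print(allocate(apps, budget=4))   # -> (6.3, {'A': 2, 'B': 2})
```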
High-Performance Computing (HPC) platforms are used to solve the most diverse scientific problems in research areas such as biology, chemistry, physics, and the health sciences. Researchers use a multitude of scientific software packages with different requirements, including input and output operations, which directly impact performance due to the gap between processing and data access speeds. Thus, supercomputers must efficiently handle a mixed workload when storing data from applications. Knowledge of the application set and its performance on a supercomputer is needed to understand the storage system's usage, pinpoint possible bottlenecks, and guide optimization techniques. This research proposes a methodology and a visualization tool to evaluate the performance of a supercomputer's data storage infrastructure, taking into account the diverse workload and demands on the system over a long period of operation. As a case study, we focus on the Santos Dumont supercomputer, where we identified inefficient usage and factors that degrade performance.
Using parallel file systems efficiently is a tricky problem due to inter-dependencies among multiple layers of I/O software, including high-level I/O libraries (HDF5, netCDF, etc.), MPI-IO, POSIX, and file systems (GPFS, Lustre, etc.). Profiling tools such as Darshan collect traces to help understand the I/O performance behavior. However, there are significant gaps in analyzing the collected traces and then applying tuning options offered by various layers of I/O software. Seeking to connect the dots between I/O bottleneck detection and tuning, we propose DXT Explorer, an interactive log analysis tool. In this paper, we present a case study using our interactive log analysis tool to identify and apply various I/O optimizations. We report an evaluation of performance improvement achieved for four I/O kernels extracted from science applications.
In this paper, we present an approach to adapt the HPC I/O forwarding layer to the application access patterns. I/O forwarding is a technique used in most supercomputers today to alleviate contention in the access to the shared storage infrastructure. Because of its location between processing nodes and parallel file system servers, it has been used to implement optimization techniques such as request reordering, aggregation, and scheduling. Such techniques usually provide good results only in some situations, or depend on the right choice of parameter values. Our case study for this work is the TWINS request scheduling algorithm, which aims at coordinating the access of intermediate I/O nodes to the data servers. Our approach uses a neural network to classify application access patterns, and a reinforcement learning technique to empower the scheduler to learn the best parameter values for each access pattern during execution, without the need for a previous training phase. Our evaluation of the access pattern detection neural network shows an average precision of 98% during write experiments and a minimum precision of 98% during reads. The latter is an important result, as most performance improvements by TWINS were observed for read experiments. Furthermore, we demonstrate that our contextual bandit strategy is able to learn the best value for the window size, achieving approximately 75% precision (98% of the performance provided by the best window size) within the first hundreds of steps.
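A toy epsilon-greedy version of the contextual bandit idea. Here observe_bandwidth() is a hypothetical stand-in for measuring TWINS performance, and in the paper the context comes from the neural-network pattern classifier rather than a random label:

```python
# Epsilon-greedy contextual bandit: learn the best window size per pattern.
import random
from collections import defaultdict

WINDOWS = [125, 250, 500, 1000, 2000, 4000, 8000]   # candidate window sizes (us)
value = defaultdict(float)    # (pattern, window) -> running mean reward
count = defaultdict(int)

def observe_bandwidth(pattern: str, window: int) -> float:
    return 100.0 - abs(window - 500) / 100 + random.random()   # toy reward

def choose(pattern: str, eps: float = 0.1) -> int:
    if random.random() < eps:                                  # explore
        return random.choice(WINDOWS)
    return max(WINDOWS, key=lambda w: value[(pattern, w)])     # exploit

for step in range(1000):
    pattern = random.choice(["contig_read", "strided_write"])  # the context
    w = choose(pattern)
    r = observe_bandwidth(pattern, w)
    count[(pattern, w)] += 1
    value[(pattern, w)] += (r - value[(pattern, w)]) / count[(pattern, w)]

print(max(WINDOWS, key=lambda w: value[("contig_read", w)]))  # learned window
```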
The number of online platforms offering programming exercises is growing: students submit solutions to these exercises and receive automatic feedback from the system, without human intervention. These environments record many aspects of the submissions, so educational assessment models can be used to infer the skills exercised in each solution. In this work, we present a comparative analysis of three models that estimate student skill: Elo, Item Response Theory (IRT), and M-ERS (Multidimensional Extension of the ERS). Elo was developed to rank chess players from their game history but has been adapted to estimate student skill from the history of problem submissions. IRT estimates skill from a set of responses given to a set of items; several IRT models exist, varying with the type of response. M-ERS is an adaptation of Elo and IRT that combines the two models and tracks students' multiple skills. The Elo, two-parameter IRT, graded-response IRT, and M-ERS models were applied to a dataset provided by an Online Judge platform. The results point to differences between the models in the estimated skills, differences we believe are related to how each model estimates its parameters.
I/O operations are the bottleneck of several applications due to the difference between processing and data access speeds. Hence, understanding I/O behavior is vital to find problems and propose solutions, and identifying and characterizing the I/O access pattern is important, since it reflects directly on applications' performance. With this premise, we propose an I/O characterization approach that uses unsupervised learning to cluster jobs with similar I/O behavior, using information from high-level aggregated traces. As a case study, we apply our approach to four months of activity, a total of 28,938 jobs, from the Intrepid supercomputer at Argonne National Laboratory. Our experimental results show that nine access patterns represent the I/O behavior in 73% of the clusters. From these nine patterns, we learn that most accesses are made through POSIX with small requests, and that most patterns access unique files. Lastly, analyzing the I/O workload over the four months, we notice it is composed of many applications that individually spend a short time on I/O but whose total I/O time represents a large portion of the overall system.
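A minimal sketch of the clustering step on synthetic per-job features; the study itself uses aggregated trace metrics from the 28,938 Intrepid jobs:

```python
# Toy clustering of jobs by I/O behavior with k-means on synthetic features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Per-job features: % POSIX ops, mean request size (KiB), files per process.
X = rng.random((500, 3)) * [100, 4096, 8]
labels = KMeans(n_clusters=9, n_init=10, random_state=1).fit_predict(
    StandardScaler().fit_transform(X))
for c in range(9):
    print(f"cluster {c}: {np.sum(labels == c)} jobs")
```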
In this paper, we seek to guide optimization and tuning strategies by identifying the application's I/O access pattern. We evaluate three machine learning techniques to automatically detect the I/O access pattern of HPC applications at runtime: decision trees, random forests, and neural networks. We focus on detection using metrics from file-level accesses as seen by the clients, I/O nodes, and parallel file system servers. We evaluated these detection strategies in a case study in which accurate detection of the current access pattern is fundamental to adjust a parameter of an I/O scheduling algorithm. We demonstrate that such approaches correctly classify the access pattern, regarding file layout and spatiality of accesses, into the most common ones used by the community and by I/O benchmarking tools to test new I/O optimizations, with up to 99% precision. Furthermore, when applied to our case study, this detection guides a tuning mechanism to achieve 99% of the performance of an Oracle solution.
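A toy version of the runtime detection step using one of the evaluated techniques, a random forest, over synthetic file-level features:

```python
# Toy access-pattern classifier; the paper uses metrics observed at
# clients, I/O nodes, and parallel file system servers.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
# Features per time window: avg request size, % sequential offsets, #files.
X = rng.random((300, 3))
# Synthetic labels: 0 = contiguous shared file, 1 = strided, 2 = file-per-process.
y = (X[:, 1] > 0.5).astype(int) + (X[:, 2] > 0.8).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=2).fit(X, y)
pattern = clf.predict([[0.3, 0.9, 0.1]])[0]   # classify the current window
print(f"detected pattern class: {pattern}")
```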
In this article, we study the I/O performance of the Santos Dumont supercomputer, since the gap between processing and data access speeds causes many applications to spend a large portion of their execution on I/O operations. For a large-scale, expensive supercomputer, it is essential to ensure applications achieve the best I/O performance to promote efficient usage. We monitor a week of the machine's activity and present a detailed study of the obtained metrics, aiming to provide an understanding of its workload. From experiences with one numerical simulation, we identified large I/O performance differences between the MPI implementations available to users. We investigated the phenomenon and narrowed it down to collective I/O operations with small request sizes, for which the customized MPI implementation from the machine's vendor (used by more than 20% of the jobs) presents the worst performance. By investigating the issue, we provide information to help improve future MPI-IO collective write implementations, along with practical guidelines to help users and to steer future system upgrades. Finally, we discuss the challenge of describing an application's I/O behavior without depending on information from users, which allows for identifying the application's I/O bottlenecks and proposing ways to improve its I/O performance. We propose a methodology to do so and use GROMACS, the application with the largest number of jobs in 2017, as a case study.
In this paper, we propose a pattern matching approach for server-side access pattern detection in the HPC I/O stack. More specifically, our proposal concerns file-level accesses, such as the ones made to I/O libraries, I/O nodes, and the parallel file system servers. The goal of this detection is to allow the system to adapt to the current workload. Compared to existing detection techniques, ours differs by working at runtime and on the server side, where detailed application information is not available (HPC I/O systems are stateless), and without relying on previous traces. We build a time series to represent the spatiality of accesses and use a pattern matching algorithm, in addition to a heuristic, to compare it to known patterns. We detail our proposal and evaluate it with two case studies: situations where detecting the current access pattern is important to select the best scheduling algorithm or to tune a fixed algorithm parameter. We show our approach has good detection capabilities, with precision of up to 93% and recall of up to 99%, and discuss all design choices.
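A simplified sketch of the matching idea: compare a normalized time series of access offsets against known reference shapes. Plain Euclidean distance stands in for the paper's matching algorithm and heuristic:

```python
# Toy server-side pattern matching over a time series of access offsets.
import numpy as np

KNOWN = {
    "contiguous": np.linspace(0, 1, 16),      # steadily increasing offsets
    "strided":    np.tile([0.0, 0.5], 8),     # alternating offsets
}

def classify(series: np.ndarray, tolerance: float = 1.0) -> str:
    series = (series - series.min()) / (np.ptp(series) or 1)   # normalize to [0, 1]
    scores = {name: float(np.linalg.norm(series - ref))
              for name, ref in KNOWN.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= tolerance else "unknown"

observed = np.arange(16) * 4096.0     # monotonically increasing byte offsets
print(classify(observed))             # -> "contiguous"
```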
This research-to-practice full paper proposes a metric of multiple skills for students learning programming. Such systems often need to diagnose a student's skill level and, likewise, the difficulty level of the learning objects in their database. This information makes it possible to appropriately match students and learning objects. To model these tasks, we adapted the Elo technique to apply a matchmaking process similar to the one used to pair opponents in chess tournaments or online matches. As a case study, we used a virtual learning environment that holds a repository of programming problems and a log of user interactions. We propose an extension to the traditional Elo model: in the classical model, Elo is a scalar value for each student and each learning object; the extended model treats Elo as a multidimensional quantity, where each dimension is a skill in solving programming problems. The skills were enumerated using the literature as well as statistical data on attribute relevance.
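A minimal sketch of the multidimensional Elo update. The K-factor and the skill tags are illustrative assumptions; the paper derives the skill set from the literature and attribute-relevance statistics:

```python
# Toy multidimensional Elo: update every skill dimension a problem exercises.
import math

def expected(student: float, problem: float) -> float:
    return 1 / (1 + math.exp(problem - student))   # logistic win probability

def update(skills: dict, difficulty: dict, solved: bool, K: float = 0.4):
    for s in difficulty:                           # e.g., {"recursion": 1.2, ...}
        e = expected(skills.get(s, 0.0), difficulty[s])
        skills[s] = skills.get(s, 0.0) + K * (float(solved) - e)
        difficulty[s] -= K * (float(solved) - e)   # the problem's rating moves too

skills, problem = {}, {"recursion": 1.2, "arrays": 0.3}
update(skills, problem, solved=True)
print(skills, problem)
```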
This paper presents an energy efficiency and I/O performance analysis of low-power architectures compared to conventional architectures, with the goal of studying their viability as storage servers. Our results show that although the storage device accounts for a small fraction of the whole system's power demand, significant increases in power demand are observed when accessing the storage device. We investigate the impact of the access pattern on power demand, looking at the whole system and at the storage device by itself, and compare all tested configurations regarding energy efficiency. We then extrapolate the conclusions from this research to provide guidelines for deciding when to replace traditional storage servers with low-power alternatives. We show the choice depends on the expected workload, estimates of the systems' power demand, and factors limiting performance. These guidelines can be applied to architectures other than the ones used in this work.
One of the main challenges in teaching algorithms and data structures is the transition from the abstract logic of an algorithm, which a student understands, to a programming language, which a computer can understand. This change in paradigm can be troublesome when coding an algorithm for the first time. Many alternatives provide different ways to smooth this process; however, most of these tools focus mainly on the logic of the program and the concept of the algorithm. The goal of this paper is to present a tool that translates the logic of an algorithm into an implementation, allowing a smooth transition between paradigms. To evaluate our proposal, we integrated Google's Blockly API into the URI Online Judge platform, providing a valuable asset to students by making learning to code a more dynamic, visual, and interactive process.
This paper describes our research to provide high-performance I/O for seismic wave propagation simulations. Earthquake early warning systems are designed to provide near real-time prediction of strong ground motion. Such systems are crucial tools for risk mitigation and disaster prevention, and the ability to accurately and quickly simulate the propagation of seismic waves in complex media lies at their heart. Besides the processing requirements, it is important for seismic simulations to leverage a high-performance storage infrastructure to output results as frequently as possible, so they can be used in the decision-making process. We propose and evaluate a series of I/O optimizations to the Ondes3D seismic wave propagation simulation, considering its different types of output files separately. These optimizations keep the previous output formats so as not to compromise the application's interaction with the other parts of the earthquake early warning system. The optimization techniques presented in this paper provided I/O performance improvements of up to 85% and decreased the application execution time by up to 70%.
This paper presents a study of I/O scheduling techniques applied to the I/O forwarding layer. In high-performance computing environments, applications rely on parallel file systems (PFS) to obtain good I/O performance even when handling large amounts of data. To alleviate the concurrency caused by thousands of nodes accessing a significantly smaller number of PFS servers, intermediate I/O nodes are typically applied between processing nodes and the file system. Each intermediate node forwards requests from multiple clients to the system, a setup which gives this component the opportunity to perform optimizations like I/O scheduling. We evaluate scheduling techniques that improve spatiality and request size of the access patterns. We show they are only partially effective because the access pattern is not the main factor for read performance in the I/O forwarding layer. A new scheduling algorithm, TWINS, is presented to coordinate the access of intermediate I/O nodes to the data servers. Our proposal decreases concurrency at the data servers, a factor previously proven to negatively affect performance. The proposed algorithm is able to improve read performance from shared files by up to 28% over other scheduling algorithms and by up to 50% over not forwarding I/O.
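A simplified sketch of the TWINS idea: within each window, an intermediate I/O node forwards only the requests aimed at one data server, then rotates to the next server, reducing concurrency at each server. A count-based window stands in for the paper's time-based windows:

```python
# Toy TWINS-style window scheduling at an intermediate I/O node.
from collections import deque

def twins(requests, n_servers: int, window: int):
    """requests: list of (server_id, req). Dispatch in rounds: each window
    forwards up to `window` requests targeting the current server only."""
    queues = [deque() for _ in range(n_servers)]
    for server, req in requests:
        queues[server].append(req)
    schedule, current = [], 0
    while any(queues):
        for _ in range(window):
            if not queues[current]:
                break
            schedule.append((current, queues[current].popleft()))
        current = (current + 1) % n_servers   # next window, next data server
    return schedule

reqs = [(0, "a"), (1, "b"), (0, "c"), (1, "d"), (0, "e")]
print(twins(reqs, n_servers=2, window=2))
```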
Open-source tools for HPC I/O analysis, benchmarking, and data readiness.
Rule-based I/O analysis tool connecting Darshan logs to actionable optimization recommendations.
Unified benchmark suite for HDF5 I/O performance on pre-exascale platforms and AI workloads.
Interactive visualization of Darshan Extended Traces to understand I/O behavior and bottlenecks.
AI Data Readiness Inspector — assesses how ready scientific datasets are for AI model training.
Proactive Data Containers — object-oriented data abstraction for exascale storage hierarchies.
Current and past collaborative research efforts.
Enhancing AI Data Readiness in Scientific Data: Integration, Automation, and Human-in-the-Loop Approaches
Institute for Artificial Intelligence, Computer Science, and Data
Scientific Workflow Applications on Resilient Metasystem
End-to-end Object-focused Software-defined Data Management for Science
Transformational AI Model Consortium
Exascale Computing Project
High Performance Computing for Energy
Community leadership, editorial roles, program committees, and volunteering.
| Year | Venue | Role |
|---|---|---|
| 2026 | SC'26 | Reproducibility Initiative Chair |
| 2026 | HPDC'26 | Workshops Co-chair |
| 2026 | SBAC-PAD'26 | Distributed Systems, Networking & Storage Track Co-chair |
| 2026 | ICPP'26 | Demo Co-chair |
| 2026 | ICPP'26 | Performance Track PC |
| 2026 | ESSA'26 | PC Member |
| 2026 | CHEOPS'26 | PC Member |
| 2025 | DRAI'25 | Data Readiness for AI Workshop Co-Chair |
| 2025 | WISDOM'25 | Workshops, Intelligent Scientific Data & Optimization Co-Chair |
| 2025 | SC'25 | New Volunteers Chair (Inclusivity) |
| 2025 | SC'25 | Finance Liaison |
| 2025 | SC'25 | IO500 BoF Organizer |
| 2025 | SSDBM'25 | Proceedings Chair |
| 2025 | SBAC-PAD'25 | Publicity Chair |
| 2025 | CCGrid'25 | SCALE Challenge Track Co-chair |
| 2024 | SC'24 | Reproducibility Challenge Co-chair |
| 2024 | CARLA'24 | Program Committee |
| 2024 | PDSW'24 | Reproducibility Co-chair |
| 2024 | SBAC-PAD'24 | Workshop Co-chair |
| 2024 | SBAC-PAD'24 | Workshop Proceedings Chair |
| 2024 | SSDBM'24 | Proceedings Chair |
| 2024 | CCGrid'24 | SCALE Challenge Track Co-chair |
| 2024 | HPDC'24 | Technical Program Committee |
| 2023 | Cluster'23 | Programming and System Software Papers PC |
| 2023 | HiPC'23 | Student Research Symposium Committee |
| 2023 | SC'23 | Reproducibility Challenge |
| 2023 | SC'23 | AD/AE Submissions |
| 2023 | PDSW'23 | Technical Program Committee |
| 2023 | CCGrid'23 | Artifact Evaluation & SCALE Challenge PC |
| 2023 | FIE'23 | Reviewer |
| 2023 | CHEOPS'23 | PC Member |
| 2023 | ERAD/RS | Fórum de Pós-Graduação |
| 2022 | SSDBM'22/'23 | PC Member |
| 2022 | REX-IO'22 | PC Member |
| 2022 | CCGrid'22 | Storage and I/O Systems PC |
| 2022 | PDSW'22 | Publicity Chair |
| 2022 | HPCC'22 | Technical Program Committee |
| 2022 | SC'22 | Students@SC PC |
| 2022 | SC'22 | AD/AE Submissions PC |
| 2022 | ERAD/RS | Fórum de Iniciação Científica |
| 2021 | SC'21 | Student Volunteers (Reviewers, Logistics) |
| 2021 | SC'21 | AD/AE Submissions |
| 2021 | REX-IO'21 | PC Member |
| 2021 | ERAD/RS | Fórum de Iniciação Científica |
| 2020 | WSPPD'20 | PC Member |
| 2020 | ERAD/RS | Fórum de Iniciação Científica |
| 2019 | WSPPD'19 | PC Member |
| 2018 | WSPPD'18 | PC Member |
| 2017 | WSPPD'17 | PC Member |
| 2016 | WSPPD'16 | PC Member |
| 2015 | WSPPD'15 | PC Member |
Entrepreneurship and impact beyond academia.
Co-founder (with Neilor Tonin)
Educational platform for learning algorithms and programming, created at Universidade Regional Integrada (URI), Brazil. 500K+ registered users and 20M+ source-code submissions automatically evaluated. Awarded Prêmio Santander Universidades for its impact on education. Presented at ACM ICPC World Finals (St. Petersburg 2013, Phuket 2016).
Co-founder (with João Lúcio de Azevedo Filho, Celso Ricardo Barboza, Juliana Müller, and Neilor Tonin)
URI Online Judge evolved into beecrowd, expanding from an academic tool into a platform connecting companies, educational institutions, and tech talent. It became the largest developer community in Latin America and received a US$1M seed investment.