Browse Category: Computer > Software

RealWorldPlay: Physical AI In-Situ Revisited

Achieving seamless robotic interaction with physical environments requires a sophisticated blend of sensory perception and logical reasoning. UC Berkeley researchers have developed "RealWorldPlay," a physical artificial intelligence system designed to enhance robotic action through a unified multimodal reasoning framework. The system integrates a visuo-tactile policy—combining sight and touch—with a large language model (LLM) that provides real-time verification feedback and strategic planning. By utilizing a "world model" to generate self-training data, the platform allows robots to autonomously set goals and learn from simulated scenarios, ensuring that their physical actions are both reasoned and verified before execution.

Assessing the Structural Health of Buildings Using Smartphones and Ambient Vibration

Monitoring the structural integrity of buildings traditionally requires expensive, specialized sensor networks that are difficult to deploy at scale. UC Berkeley researchers have developed a novel approach that leverages the existing network of smartphones equipped with the MyShake earthquake early warning application. By utilizing the highly sensitive accelerometers within millions of consumer devices, the system measures the natural frequencies and damping ratios of buildings through ambient vibrations. This crowdsourced data provides a real-time, large-scale assessment of structural health across entire urban environments. The platform effectively transforms everyday mobile devices into a distributed seismic monitoring array, allowing for continuous observation of building performance without the need for dedicated hardware installations.
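
As a rough illustration of how a natural frequency might be read off an ambient-vibration record, the sketch below picks the largest non-DC peak of a discrete Fourier transform. It is not the MyShake processing pipeline: the function name `dominant_frequency`, the synthetic signal, and the 50 Hz sampling rate are all assumptions made for the example, and damping-ratio estimation is left out.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-DC DFT magnitude peak."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset (e.g. gravity)
    best_k, best_mag = 1, 0.0
    # Direct O(n^2) transform over the positive-frequency bins; fine for short records.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n


# Hypothetical record: a 2 Hz building mode plus a weaker 7 Hz component,
# sampled at an assumed 50 Hz for 4 seconds.
rate = 50.0
signal = [
    math.sin(2 * math.pi * 2.0 * i / rate) + 0.2 * math.sin(2 * math.pi * 7.0 * i / rate)
    for i in range(200)
]
estimated = dominant_frequency(signal, rate)
```

A real record would be noisy and far longer, so a fast FFT routine and some form of spectral averaging would likely replace this direct transform.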

InferBiome: Inferring Gut Microbiome States from Stool Microbiome Data

Traditional stool samples provide an indirect and often "blurred" snapshot of the complex microbial environment within the human gut, making it difficult to design precise health interventions. UC Berkeley researchers have developed InferBiome, a computational framework that reconstructs the actual state of the gut microbiome from stool data. By inverting a blurring model and applying a probability-based simulation of microbiome dynamics, the system predicts how different dietary interventions will impact an individual's unique gut ecosystem. This method allows for the selection of personalized dietary recommendations that maximize host health benefits by simulating outcomes across various possible microbiome states.

PEINT (Protein Evolution IN Time)

UC Berkeley researchers have developed a computer-implemented framework that uses transformer architectures to model the evolution of biological sequences over time. Unlike traditional phylogenetic models, which often assume that sites evolve independently, this framework uses a coupled encoder-decoder transformer to parameterize the conditional probability of a target sequence given multiple unaligned sequences. By capturing interactions and dependencies across different sites within a protein or genomic sequence, the model estimates the transition likelihood for each position. This allows high-fidelity simulation of evolutionary trajectories and a deeper understanding of how proteins change across different timescales and environmental pressures.

Realtime Transformation Of Voice Identity And Style

Converting voice identity in real-time while maintaining perfect linguistic clarity and emotional nuance is a significant hurdle in speech synthesis. Researchers at UC Berkeley have developed a system for real-time voice style conversion that transforms a source speaker's speech to match the timbre, accent, and emotion of a target speaker. The technology utilizes a content extraction network with conformer blocks and a unique low-dimensional quantization method—using fewer than 100 levels—to preserve linguistic fidelity. By extracting continuous representations before quantization, the system maintains higher speech quality than traditional discrete methods. A diffusion-based generation network then creates a mel-spectrogram conditioned on these features and a target style embedding, which is finally converted to audio via a vocoder. The system is designed for streaming operation through the use of chunked-causal attention mechanisms, enabling near-instantaneous transformation.

Learning Multimodal Sim-To-Real Robot Policies With Generative Audio

The deployment of robotic systems in real-world environments is often limited by the "sim-to-real gap," where policies trained in digital simulations fail to account for the complex, multisensory feedback of physical reality. Researchers at UC Berkeley have developed a novel method for training multimodal sim-to-real robot policies by integrating generative audio models with traditional physics-based simulators. This framework uses a generative model to synthesize realistic audio data that corresponds to simulated physical interactions, creating a rich, multimodal dataset for policy learning. By training on both simulated physics and generated sensory data, the system enables robots to develop more robust and adaptive behaviors that translate seamlessly from virtual training environments to complex real-world tasks.

Deep Learning System To Improve Diagnostic Accuracy For Real-Time Quantitative Polymerase Chain Reaction Data

Manual interpretation of real-time quantitative PCR (RT-qPCR) data is prone to human error, noise, and variability, leading to potential misdiagnosis or test redundancies. UC Berkeley researchers have developed a novel deep learning framework that significantly improves diagnostic accuracy by fusing Long Short-Term Memory (LSTM) networks with Vision Transformers (ViT). This hybrid architecture captures both sequential fluorescence patterns and structural amplification dynamics from raw time-series data and image-based renderings. By leveraging a uniquely curated dataset of over 24,000 verified samples, the system accurately discriminates between true-positive and true-negative samples, predicts viral dilutions, and forecasts patient re-test outcomes, providing an objective tool for early triage and increased laboratory throughput.

Machine Learning Framework for Inferring Latent Mental States from Digital Activity (MAILA)

Scalable assessments of mental illness, the leading driver of disability worldwide, remain a critical roadblock toward accessible and equitable care. Researchers at UC Berkeley have introduced MAILA (MAchine-learning framework for Inferring Latent mental states from digital Activity), an innovation demonstrating that everyday human-computer interactions encode multiple dimensions of self-reported mental health and their changes over time. MAILA was trained to predict 1.3 million mental-health self-reports from 20,000 cursor and touchscreen recordings, identifying cognitive signatures of psychological function that go beyond what is conveyed by language. Key features and benefits include the ability to track dynamic mental states along three orthogonal dimensions, achieve near-ceiling accuracy in group-level predictions, and translate insights from general to clinical populations to identify individuals with self-reported mental illness.

Articulatory Feedback For Phonetic Error-Based Pronunciation Training

Accurate automatic pronunciation assessment, particularly the core subtask of phonetic error detection, is significantly hampered by speech variability stemming from accents and dysfluencies, which current models fail to capture effectively. This innovation, developed by UC Berkeley researchers, addresses this with a verbatim phoneme recognition framework specifically designed to transcribe what speakers actually say rather than what they are supposed to say. The framework uses multi-task training combined with novel phoneme similarity modeling. The work also includes the development and open-sourcing of VCTK-accent, a simulated dataset containing phonetic errors, and proposes two novel metrics for assessing pronunciation differences. This work establishes a new, more accurate benchmark for phonetic error detection, enabling more precise and effective articulatory feedback for pronunciation training.

UCBShift 2.0

The identification of chemical shifts is a foundational step in determining a protein's three-dimensional structure via Nuclear Magnetic Resonance (NMR) spectroscopy. Current computational methods often struggle with accuracy and efficiency, particularly in handling the complex influence of protein side chains on shift values. UCBShift 2.0, a technology by UC Berkeley researchers, addresses this critical bottleneck by providing a highly accurate chemical shifts identifier. This innovation is a computational tool that includes a sequence transfer predictor for initial protein analysis, a novel machine learning module to predict side chain shifts, and a regressor that combines these outputs to produce a highly accurate predicted chemical shift for the entire protein. By specifically leveraging augmented feature extraction that includes side chain information, UCBShift 2.0 achieves greater predictive power and speed compared to existing methods, streamlining the time-consuming process of protein structure determination.

Monitoring Building Structural Health Using Smartphones And Ambient Vibrations

Traditional methods for monitoring a building's structural health, particularly its natural frequencies and damping ratios, typically rely on expensive, permanently installed sensor systems, which are not widely accessible. This innovation, developed by UC Berkeley researchers, provides a highly scalable and cost-effective method for Monitoring Building Structural Health using Smartphones and Ambient Vibrations. The method leverages smartphones equipped with the MyShake earthquake early warning application to measure the ambient vibrations of a building. By analyzing these vibrations, the application can accurately determine key structural health parameters, namely the building's natural frequencies and damping ratios. This technique transforms readily available personal devices into powerful structural monitoring tools, offering a vastly more accessible and lower-cost solution than existing dedicated sensor networks.

INFE²R (INversion for Fine-scale Emissions and Exposure Refinement)

Traditional air quality monitoring often lacks the resolution to pinpoint specific emission sources within a city, leaving "hyperlocal" pollution spikes undetected. To address this, researchers at UC Berkeley have developed INFE²R, a sophisticated method for detecting and refining airborne pollutant emissions at a neighborhood scale. The system utilizes a Weather Research and Forecasting (WRF) module to generate high-resolution meteorological inputs, which are then processed through a Stochastic Time Inverted Lagrangian Transport (STILT) module to create a source-receptor transfer matrix. By combining prior emission estimates with a cross-dimensional assimilation of both fixed and mobile sensor measurements, the platform employs Bayesian inversion to generate highly accurate posterior emission estimates. This allows for a granular understanding of how pollutants move and accumulate in specific urban localities.
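
The Bayesian inversion step can be sketched for a toy linear-Gaussian case with two emission sources. This is a minimal illustration, not the INFE²R implementation: the function name `bayesian_inversion`, the diagonal error covariances, and the hand-written matrix `H` (standing in for the STILT source-receptor transfer matrix) are assumptions made for the example.

```python
def bayesian_inversion(x_prior, prior_var, H, y, obs_var):
    """Posterior mean emissions for two sources under a linear-Gaussian model.

    x_prior   -- prior emission estimates, length 2
    prior_var -- prior error variances, length 2 (diagonal covariance assumed)
    H         -- source-receptor matrix, one [h0, h1] row per observation
    y         -- sensor observations
    obs_var   -- observation error variances (diagonal covariance assumed)
    """
    # Accumulate the posterior precision P = B^-1 + H^T R^-1 H (2x2, symmetric)
    # and the vector g = H^T R^-1 (y - H x_prior).
    p00, p01, p11 = 1.0 / prior_var[0], 0.0, 1.0 / prior_var[1]
    g0 = g1 = 0.0
    for row, obs, var in zip(H, y, obs_var):
        residual = obs - (row[0] * x_prior[0] + row[1] * x_prior[1])
        p00 += row[0] * row[0] / var
        p01 += row[0] * row[1] / var
        p11 += row[1] * row[1] / var
        g0 += row[0] * residual / var
        g1 += row[1] * residual / var
    # Posterior mean = prior + P^-1 g, using the closed-form 2x2 inverse.
    det = p00 * p11 - p01 * p01
    dx0 = (p11 * g0 - p01 * g1) / det
    dx1 = (p00 * g1 - p01 * g0) / det
    return [x_prior[0] + dx0, x_prior[1] + dx1]


# Hypothetical setup: true emissions [3, 1]; three sensors, the last one
# seeing both sources. Observations are noise-free for simplicity.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [3.0, 1.0, 4.0]
posterior = bayesian_inversion([1.0, 1.0], [1.0, 1.0], H, y, [1e-6, 1e-6, 1e-6])
```

With very small observation variances the posterior follows the sensors; with very large ones it stays near the prior, which is the trade-off the inversion balances.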

In-Context Learning Enables Robot Action Prediction in LLMs

Bridging the gap between linguistic reasoning and physical execution, UC Berkeley researchers have developed a method to enable robotic devices to predict complex actions using in-context learning (ICL). By leveraging the inherent reasoning capabilities of Large Language Models (LLMs), this approach allows a robot to translate natural language instructions into sequential motor actions without the need for task-specific fine-tuning or intensive retraining. The system allows the robot to generalize to new, unseen tasks on the fly. This breakthrough shifts robot programming away from rigid coding toward a more flexible, intuitive interaction where the machine "understands" the intended goal by drawing parallels from the provided examples.
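
One way to picture the in-context learning setup is as a prompt that places a few instruction-to-action demonstrations ahead of a new instruction. The sketch below only assembles such a prompt; the prompt layout, the function name `build_icl_prompt`, and the action strings are hypothetical and are not taken from the researchers' system, and no model is called.

```python
def build_icl_prompt(examples, instruction):
    """Format (instruction, actions) demonstrations followed by a new query."""
    blocks = []
    for demo_instruction, demo_actions in examples:
        actions_text = " ; ".join(demo_actions)
        blocks.append(f"Instruction: {demo_instruction}\nActions: {actions_text}")
    # The final block is left open so a language model would continue
    # the text with an action sequence for the new instruction.
    blocks.append(f"Instruction: {instruction}\nActions:")
    return "\n\n".join(blocks)


# Hypothetical demonstrations with made-up action names.
demos = [
    ("pick up the red block", ["move_to(red_block)", "close_gripper()", "lift(0.1)"]),
    ("open the drawer", ["move_to(drawer_handle)", "close_gripper()", "pull(0.2)"]),
]
prompt = build_icl_prompt(demos, "pick up the blue block")
```

Because the demonstrations live in the prompt rather than in the model weights, swapping in a different set of examples changes the task without any retraining.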

LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning

Bridging the gap between a language model’s next-word prediction and physical robot control, researchers at UC Berkeley have developed LLARVA (Large Language model for Robotic Vision and Action). This model utilizes a novel vision-action instruction tuning method that allows a robotic device to handle various tasks and environments without task-specific fine-tuning.

Humanoid Locomotion As Next Token Prediction

Advancing the field of robotic agility, this technology treats the complex challenge of bipedal balance and movement as a generative sequence problem. By framing physical movement similarly to language modeling, UC Berkeley researchers have developed a system where a humanoid robot predicts its next motor action as a "next token" based on a vast history of sensorimotor trajectories. The model is trained on diverse data, including real-world robotic walks and simulated movements, allowing it to anticipate the necessary joint adjustments and equilibrium shifts in real-time. This approach enables the robot to navigate uneven terrain and respond to external perturbations with a level of fluidity and adaptability that traditional, rigidly programmed control laws often struggle to achieve.

Pre-Training Auto-Regressive Robotic Models With 4D Representations

Current methods for training robotic policies often struggle with efficiently learning from rich, time-varying visual data, leading to brittle and data-intensive solutions. This innovation, developed by UC Berkeley researchers, addresses this challenge by introducing a robotic system that utilizes four-dimensional (4D) representations estimated directly from videos to pre-train and test an auto-regressive machine learning transformer model. By explicitly encoding space and time in a unified representation, the system allows the transformer model to leverage a much richer context than standard 2D image or 3D point cloud approaches, facilitating the learning of complex, long-horizon tasks and improving the generalization capabilities of the resulting policy. The use of 4D representations significantly enhances the policy's understanding of the dynamic environment and object interactions compared to existing alternatives, enabling more robust and efficient training of robotic systems.

Realtime Transformation Of Voice For Privacy Protection

The technology, known as Speech Articulatory Coding (SPARC), is a neural encoding-decoding framework for speech. It works by inferring articulatory features from audio and then synthesizing new speech from those features. The system effectively disentangles the speaker's identity from the speech's articulation, enabling accent-preserving voice conversion and providing a foundation for real-time voice privacy protection.

Inverse Designing Metamaterials With Programmable Nonlinear Functional Responses

Current methods for designing metamaterials to achieve a specific, complex physical response curve are often time-consuming, computationally intensive, and struggle with precisely programming nonlinear functional responses. This innovation, developed by UC Berkeley researchers, addresses this by offering a novel, accelerated inverse design method that leverages a hybrid machine learning approach combining imitation learning and reinforcement learning with Monte Carlo tree search (MCTS). This unique combination allows for the rapid and precise generation of metamaterial structures that meet a plurality of target physical response features, significantly outperforming traditional iterative or purely generative design methods in efficiency and programmability. The resulting metamaterial designs exhibit highly programmable and non-intuitive functional properties.

Latent Ewald Summation For Machine Learning Of Long-Range Interactions

Molecular dynamics (MD) is a computational materials science modality widely used in academic and industrial settings for materials discovery and more. Machine learning interatomic potentials (MLIPs) are a critical component of modern MD calculations: they learn from reference quantum mechanical calculations and quickly predict the energy and forces of atomic configurations. MLIPs allow for more accurate and comprehensive exploration of material and molecular properties at scale. However, state-of-the-art MLIP methods mostly use a short-range approximation, which may be sufficient for describing properties of homogeneous bulk systems but fails for liquid-vapor interfaces, dielectric response, dilute ionic solutions with Debye-Hückel screening, and interactions between gas phase molecules. Short-range MLIPs neglect all long-range interactions, such as Coulomb and dispersion interactions.

To address this shortcoming, UC Berkeley researchers have developed a straightforward and efficient algorithm to account for long-range interactions in MLIPs. The algorithm can predict properties of systems including charged, polar, or apolar molecular dimers, bulk water, and water-vapor interfaces. In these cases, standard short-range MLIPs lead to unphysical predictions, even when they use message passing. The present method eliminates these artifacts while only about doubling the computational cost. Furthermore, it can be incorporated into most existing MLIP architectures, including potentials based on local atomic environments such as high-dimensional neural network potentials (HDNNP), Gaussian Approximation Potentials (GAP), Moment Tensor Potentials (MTPs), and atomic cluster expansion (ACE), as well as message passing neural networks (MPNNs) such as NequIP and MACE.

Active Tuning Of Resonant Switched-Capacitor Converters For Soft Switching Opera

Resonant switched-capacitor (ReSC) converters are increasingly favored in high-density power applications, such as data centers and telecommunications, due to their ability to achieve high efficiency with minimal passive component volume. However, component tolerances and variable operating conditions often cause these converters to deviate from ideal soft-switching points, leading to increased switching losses. Researchers at UC Berkeley have developed a closed-loop autotuning control technique that dynamically modulates switching frequency and duty cycle to maintain optimal soft-switching. The system senses voltage nodes, such as inductor switch nodes, during transitions to detect incomplete zero-current switching (ZCS) or zero-voltage switching (ZVS). By independently tuning the duration of each switching phase based on this real-time feedback, the controller ensures the converter maintains near-peak efficiency across a wide range of loads and component variations.

Thermal Test Vehicle For Electronics Cooling Solutions

As the power density of modern integrated circuits—such as GPUs, CPUs, and NPUs—rapidly escalates, traditional cooling characterization methods have become insufficient for validating next-generation thermal management. UC Berkeley researchers have developed a flexible and scalable Thermal Test Vehicle (TTV) designed to simulate the complex heat profiles of high-performance electronics. Built on an array of individually controllable power transistors and integrated measurement circuitry, this TTV acts as a "thermal twin" for advanced processors. An onboard computer manages real-time feedback and control, allowing the vehicle to emulate specific hotspots and dynamic power loads. This enables engineers to precisely characterize the performance of air, liquid, and immersion cooling solutions under diverse and extreme operating conditions without the risk or cost of using actual silicon.

Metabiome: Metabolic Network And Biofilm Modeling Of The Gut Microbial

The human gut microbiome exists largely within complex, multi-species biofilms where spatial organization and local nutrient gradients dictate microbial behavior and health outcomes. Traditional genomic modeling often fails to account for these physical and spatial constraints. Researchers at UC Berkeley have developed MetaBiome, an innovative multiscale framework that couples genome-scale metabolic models (GEMs) with agent-based and continuum modeling. By employing a systematic bottom-up approach, MetaBiome identifies the critical interrelationships between local substrate transport and dynamic biofilm characteristics. This framework enables researchers to translate raw genomic data into a deep understanding of microscale biofilm properties, providing a predictive window into how species interact and compete within the gut’s physical environment.

Multiplex Network Science And Multiscale System Dynamics

UC Berkeley researchers have developed a sophisticated hybrid modeling framework that integrates Multiplex Network Science (MNS) with Multiscale System Dynamics (MSD). The MNS component maps physical infrastructure as a network of layers to identify external dependencies, while the MSD component models the installation's internal nested subsystems and resource flows. By linking these through defined mathematical boundary conditions, the hybrid model can simulate how an alteration in the external environment—such as a power grid failure or supply chain disruption—propagates through the system. This allows for high-fidelity forecasting of cascading effects, enabling operators to identify vulnerabilities and optimize resilience strategies for critical installations.

Next Generation Of Emergency System Based On Wireless Sensor Network

Recent mass evacuation events, including the 2018 Camp Fire and 2023 Maui Fire, have demonstrated shortcomings in our communication abilities during natural disasters and emergencies. Individuals fleeing dangerous areas were unable to obtain fast or accurate information about open evacuation routes and faced traffic gridlock, while nearby communities were unprepared for the emergency and the influx of people. Climate change is increasing the frequency of natural hazards, the areas subject to them, and the associated risk level, so communication channels that keep operating when mobile networks and electric distribution systems are compromised are crucial.

To address this need, UC Berkeley researchers have developed a mobile network-free communication system that can function during natural disasters and be adapted to most communication devices (mobile phones and laptops). The self-organized, mesh-based, low-power network is embedded into common infrastructure monitoring device nodes (e.g., pre-existing WSN, LoRa, and other LPWAN devices) for effective local communication. Local communication includes dedicated Emergency Messaging and "walkie-talkie" functions, while higher-level connectivity through a robust gateway architecture and data transmission units allows for real-time internet access, communication with nearby communities, and even global connectivity. The system can provide GPS-free position information using trilateration, which can help identify the location of nodes monitoring important environmental conditions or allow users to navigate.
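
The trilateration mentioned above can be illustrated in two dimensions with three nodes at known positions. This is a simplified sketch under stated assumptions: the function name `trilaterate_2d` and the anchor coordinates are made up for the example, the distances are taken as exact, and how the actual system measures range between nodes is not described in the source.

```python
import math


def trilaterate_2d(anchors, distances):
    """Locate a point from its distances to three non-collinear anchors."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Subtracting the first circle equation from the other two cancels the
    # quadratic terms and leaves a 2x2 linear system A p = b.
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    # Solve the system with Cramer's rule.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Hypothetical node layout (metres) and a device located at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target = (3.0, 4.0)
distances = [math.hypot(target[0] - ax, target[1] - ay) for ax, ay in anchors]
position = trilaterate_2d(anchors, distances)
```

With noisy range estimates and more than three nodes, a least-squares fit over all anchors would be the more natural choice than this exact three-anchor solution.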

Virtual Prisms Using Augmented Reality Displays

UC Berkeley researchers have developed a sophisticated method and apparatus for dynamically correcting ocular misalignment using pass-through augmented reality (AR) goggles. The system employs orientation sensors to track the user's gaze vectors and cameras to capture the surrounding environment in real time. A specialized processor calculates the specific magnitude of ocular misalignment for each eye and generates a computer-rendered version of the surroundings. By digitally shifting the images presented to each display based on these gaze vectors, the system acts as a "virtual prism," aligning the visual input with the user’s actual gaze to provide a clear, unified field of view. This technology offers a programmable, non-invasive alternative to traditional corrective lenses or surgery for individuals with complex vision impairments.
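
To give a sense of the scale of the digital image shift, the sketch below converts an angular misalignment into a horizontal pixel offset under a simple pinhole display model. The function name `pixel_shift`, the pinhole assumption, and the display parameters are illustrative assumptions and do not describe the actual apparatus.

```python
import math


def pixel_shift(misalignment_deg, display_width_px, horizontal_fov_deg):
    """Horizontal image shift in pixels for an angular misalignment (pinhole model)."""
    # Focal length in pixels implied by the display width and field of view.
    focal_px = (display_width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return focal_px * math.tan(math.radians(misalignment_deg))


# Hypothetical display: 1920 pixels wide covering a 90 degree field of view,
# with a 5 degree horizontal misalignment to compensate.
shift = pixel_shift(5.0, 1920, 90.0)
```

Because the misalignment can change with gaze direction, a shift like this would presumably be recomputed continuously from the tracked gaze vectors rather than fixed once.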
