Microsoft Research Asia at MobiCom 2024: Advancing Mobile and Wireless Technologies
MobiCom is renowned as a leading international academic conference for mobile computing and wireless networks. Microsoft Research Asia recently had several papers accepted at MobiCom 2024, and here’s a glimpse into some of the cutting-edge research presented.
The accepted papers encompass a range of topics within mobile computing and wireless networks. These include advancements in mobile task automation, remote auscultation (listening to internal body sounds with a stethoscope), DNN (Deep Neural Network) inference, and the development of innovative sensing technologies.
AutoDroid: Automating Android Tasks with LLMs
Mobile task automation aims to enable hands-free interaction with smartphones, often through voice commands. Existing systems, however, have struggled to scale because of their limited natural-language understanding and the manual effort required to set them up.
Inspired by recent advances in large language models (LLMs), researchers at Microsoft Research Asia developed AutoDroid, a system capable of automating tasks on any Android application without manual effort from developers. AutoDroid uses the language comprehension and reasoning capabilities of LLMs to handle task preparation, UI understanding, and execution.
The core of AutoDroid combines the general knowledge of LLMs with the specifics of the app’s functions through automated dynamic analysis. Critical components include a functionality-aware UI system that bridges the gap between the UI and the LLM, memory injection techniques to boost the LLM’s knowledge of the app, and a query optimization module to speed up model inference. AutoDroid integrates with off-the-shelf LLMs such as GPT-4/GPT-3.5 and the on-device Vicuna.
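As a rough illustration of how a screen and app-specific knowledge might be handed to an LLM as text, the Python sketch below serializes a list of UI elements and a few hints into a prompt, then parses a reply into an action. Every name here (UIElement, build_prompt, parse_action) and the reply format are assumptions made for illustration; they are not taken from AutoDroid, and no real LLM is called.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    """Hypothetical, simplified view of one on-screen widget."""
    index: int
    kind: str   # e.g. "button" or "input"
    text: str


def build_prompt(task: str, elements: List[UIElement], memory: List[str]) -> str:
    """Serialize the task, app-specific hints, and the current screen as text."""
    lines = [f"Task: {task}", "Known app hints:"]
    lines += [f"- {hint}" for hint in memory]
    lines.append("Current screen:")
    lines += [f"[{e.index}] <{e.kind}> {e.text}" for e in elements]
    lines.append("Reply with: tap <index> or input <index> <text>")
    return "\n".join(lines)


def parse_action(reply: str) -> Optional[Tuple[str, int, str]]:
    """Parse a reply such as 'tap 2' or 'input 1 hello' into (verb, index, text).

    Returns None when the reply does not follow the assumed format.
    """
    parts = reply.strip().split(maxsplit=2)
    if len(parts) < 2 or parts[0] not in ("tap", "input") or not parts[1].isdigit():
        return None
    text = parts[2] if len(parts) == 3 else ""
    return parts[0], int(parts[1]), text
```

Keeping the screen description short and indexed is one plausible way to limit prompt length, which is the kind of cost a query optimization step would target.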
To evaluate AutoDroid, the researchers used a new benchmark created for memory-augmented Android task automation and ran 158 typical tasks. AutoDroid generated actions with 90.9% accuracy and completed tasks with a 71.3% success rate, outperforming GPT-4-based baselines by 36.4% and 39.7%, respectively.

Asclepius: Remote Cardiac Auscultation using Earphones
Remote video consultations offer convenient access to healthcare, but they currently lack the ability to assess heart health through cardiac auscultation. Asclepius transforms ordinary earphones into stethoscopes so that doctors can listen to heart sounds (phonocardiogram, or PCG, signals) during video calls.

Asclepius uses a low-cost peripheral to turn earphone speakers into microphones that capture faint PCG signals in the ear canal. Signal processing algorithms reduce reverberation and distortion. The MAX5402EUA chip assists with impedance matching and voltage detection, making the system compatible with many earphones and devices. Together, these components allow doctors to monitor heart health from a distance.
Tested with 30 volunteers, Asclepius proved effective in recovering PCG signals using different earphones. Its pipeline covers signal preprocessing, segmentation, and two-stage recovery using UNet models. This technology could greatly enhance the capabilities of remote medical services.
FlexNN: Efficient DNN Inference on Edge Devices
Deep neural network (DNN) models are increasingly being deployed on client devices such as smartphones, autonomous vehicles, and drones. However, limited memory growth and memory-sharing requirements have become a barrier to wider DNN deployment.
To address this, FlexNN, a DNN inference framework, was designed for memory-constrained devices and incorporates dynamic memory-hierarchy management. It recasts the problem as a time-space 2D bin-packing problem and breaks up traditional tensor structures with a “slice-load-compute” method. This allows data to be loaded from disk at the same time as computation proceeds, which dramatically reduces memory usage. Experiments showed that FlexNN decreased memory consumption by 93.81% while increasing latency by only 3.64%, with no change in model accuracy.
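The idea of overlapping disk loading with computation can be sketched in a few lines of Python. This is a minimal producer-consumer illustration, not FlexNN's implementation: the function run_sliced, its parameters, and the callbacks load_slice and compute are all hypothetical, and the bin-packing planner is not modeled at all.

```python
import queue
import threading
from typing import Callable, Iterable, List


def run_sliced(
    slice_ids: Iterable[int],
    load_slice: Callable[[int], List[float]],
    compute: Callable[[List[float], float], float],
    prefetch_depth: int = 1,
) -> float:
    """Fold `compute` over weight slices while a loader thread prefetches.

    At most `prefetch_depth` loaded slices wait in the queue, so resident
    memory is bounded by a few slices rather than by the whole model.
    """
    buffer: "queue.Queue" = queue.Queue(maxsize=prefetch_depth)
    done = object()  # sentinel marking the end of the stream

    def loader() -> None:
        try:
            for sid in slice_ids:
                buffer.put(load_slice(sid))  # blocks while the buffer is full
        finally:
            buffer.put(done)  # always unblock the consumer, even on error

    worker = threading.Thread(target=loader, daemon=True)
    worker.start()

    acc = 0.0
    while True:
        item = buffer.get()
        if item is done:
            break
        acc = compute(item, acc)  # runs while the loader fetches the next slice
    worker.join()
    return acc
```

A bounded queue is used here because it caps how many slices are in memory at once; a larger prefetch_depth trades memory for a lower chance that computation stalls waiting on the disk.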

FlexNN represents a collaborative effort between AIR, Tsinghua University, and the Heterogeneous Computing group (HEX) at Microsoft Research Asia. FlexNN is part of HEX’s wider focus on engineering new virtual memory systems for deep learning models.
Gastag: Gas Sensing via Graphene-based Tags
Traditional methods of detecting dangerous gases often involve significant costs coupled with potentially complex maintenance. Gastag is a novel approach that uses passive RFID tags.
Gastag incorporates a small piece of gas-sensitive material into a cheap RFID tag. When gas concentration changes, the material’s conductivity is altered, which then affects the tag’s impedance and signal response, allowing for precise gas measurement.
To enhance sensitivity and improve the detection range, the research team developed a new material with high sensitivity and surface area, redesigned the tag antenna, and then optimized the placement of the gas-sensitive material to achieve correct impedance matching. The system achieved low error rates for gas measurements and extended the range to 8.5 meters, allowing for large-scale deployments.
The key innovation is converting a common RFID tag into a gas sensor and quantifying the relationship between gas concentration and variations in signal phase. Gastag preserves the existing tag-reader range and uses the frequency diversity of RFID signals to improve sensing accuracy. Tests in different environments show that Gastag works effectively in numerous orientations and with minimal interference.
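To make the phase-to-concentration idea concrete, here is a small Python sketch of a per-channel calibration followed by averaging across frequency channels. It assumes a linear relationship between phase shift and concentration and a plain average over channels; both are simplifying assumptions for illustration, not Gastag's published model, and the function names are hypothetical. Phase wrapping is ignored.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_concentration(
    phase_by_channel: Sequence[float],
    models: Sequence[Tuple[float, float]],
) -> float:
    """Average the per-channel estimates (a simple use of frequency diversity)."""
    estimates = [s * p + b for p, (s, b) in zip(phase_by_channel, models)]
    return mean(estimates)


# Calibration data (synthetic): known concentrations and the phase shift
# each of two frequency channels reported at those concentrations.
concentrations = [0.0, 100.0, 200.0]
channel_a = [0.0, 1.0, 2.0]
channel_b = [0.1, 2.1, 4.1]
models = [fit_line(channel_a, concentrations), fit_line(channel_b, concentrations)]
```

Averaging independent per-channel estimates reduces the effect of noise on any single channel, which is one simple way frequency diversity can help.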

GPSense: Passive Sensing with GPS Signals
Wireless sensing technologies have used signals like Wi-Fi, UWB, and acoustic waves. These systems have problems such as limited range and interference. This research proposes using continuous GPS signals, which do not interfere with other communications technologies, for wireless sensing.
The GPSense system achieves passive wireless sensing by using GPS signals. It reconstructs amplitude and phase information from the GPS receiver modules. The researchers developed models for GPS signals and implemented distributed sensing, which enhances the system’s performance by combining signals from multiple satellites.

Extensive testing under various conditions verified the system’s robustness. The experiments demonstrated GPSense’s capabilities in human activity sensing, passive trajectory tracking, and respiration monitoring.
MSense: Enhancing Wireless Sensing Under Motion Interference
A major limitation of wireless sensing has been the requirement that both devices and targets remain stationary. The researchers developed MSense to enable wireless sensing under motion interference.
MSense uses commercial millimeter-wave (mmWave) radars and digital beamforming to enhance the signals reflected from a subject or target area. By comparing signals from multiple body areas, MSense removes interference caused by the motion of the body or the device and accurately extracts the motion of interest. The method works for both periodic and non-periodic motion sensing.
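The intuition behind comparing body areas can be shown with a small Python sketch: if two areas share the same interfering motion and only one carries the signal of interest, subtracting one from the other cancels the shared part. This is the simplest possible form of the idea under that assumption, not MSense's actual algorithm; the function name and the synthetic signals are hypothetical.

```python
import math
from typing import List, Sequence


def cancel_common_motion(target: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Subtract the reference-area displacement from the target-area displacement.

    Assumes both areas see the same interfering motion, sample by sample.
    """
    if len(target) != len(reference):
        raise ValueError("signals must have the same length")
    return [t - r for t, r in zip(target, reference)]


# Synthetic example: a 0.25 Hz breathing signal buried under a larger sway.
fs = 20  # samples per second (arbitrary for this sketch)
times = [n / fs for n in range(200)]
breathing = [1.0 * math.sin(2 * math.pi * 0.25 * t) for t in times]
sway = [5.0 * math.sin(2 * math.pi * 0.9 * t + 0.3) for t in times]

chest = [b + s for b, s in zip(breathing, sway)]  # target area: breathing + sway
shoulder = sway                                   # reference area: sway only
recovered = cancel_common_motion(chest, shoulder)
```

In practice the shared motion would not appear identically in every area, so a real system would need to align and scale the reference before removing it.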

Experimental results validated the potential of MSense in different applications. In vehicles, MSense substantially improved the detection accuracy of driver fatigue indicators such as eye blinks, yawns, and nods, while also reducing false alarms. For respiration monitoring in motion, MSense accurately estimated respiratory rates in home and gym environments. For gesture recognition on mobile devices, MSense achieved over 93% accuracy.
The work presented by Microsoft Research Asia at MobiCom 2024 highlights significant advances in mobile and wireless technologies, with potential applications that could improve how people work, communicate, and interact with the world around them. The innovations described here represent steps forward in several critical areas, including mobile device management, healthcare, and gas detection, as well as in the larger goals of artificial intelligence and machine learning.