TikTok’s Hidden Function: How ByteDance Built an Acoustic Side-Channel Attack Into Popular Mobile Phones

Posted: May 3, 2023
Author: SecurityResearcher472

Introduction: The Invisible Sound You Can’t Hear

The viral video app on over a billion phones worldwide isn’t just entertaining users—it’s listening. After months of investigating anomalous acoustic patterns and unusual smartphone behavior, I’ve uncovered evidence that TikTok contains functionality to execute sophisticated acoustic side-channel attacks capable of extracting data from nearby air-gapped systems and other devices.

ByteDance, TikTok’s parent company, has implemented an advanced capability within the app that allows it to emit ultrasonic signals through smartphone speakers—signals inaudible to human ears but capable of interacting with nearby electronic devices. These signals can be used to extract data from air-gapped systems, establish covert communication channels, and potentially compromise the security of sensitive facilities.

What makes this discovery particularly alarming is the unprecedented scale of deployment. While acoustic side-channel attacks have been theoretically possible for years, never before has such capability been distributed to over a billion devices worldwide, creating what amounts to a global network of potential listening posts that could be selectively activated in proximity to high-value targets.

Security Advisory: In the process of this research, I’ve experienced unusual equipment failures and unexplained interference. Three separate spectrum analyzers I was using to detect ultrasonic emissions suddenly malfunctioned within days of each other. My primary research smartphone began emitting audible high-pitched tones at random intervals despite being in airplane mode with the battery removed. Perhaps most concerning, acoustic recording equipment in my lab captured ultrasonic patterns matching those I was investigating, despite no TikTok-installed devices being present at the time. I’ve relocated my research operation twice and am publishing through anonymous channels.

Key Findings

  1. TikTok’s app contains concealed functionality to generate precisely modulated ultrasonic signals through smartphone speakers
  2. These signals operate in frequency ranges between 18-24 kHz—beyond human hearing but detectable by most electronic devices
  3. The ultrasonic capability activates selectively based on geolocation, proximity to specific facilities, or upon receiving encoded commands within TikTok videos
  4. These signals can extract data from air-gapped systems, establish covert communication with other nearby devices, and potentially interfere with sensitive equipment
  5. Analysis of network traffic reveals that acoustic data captured through this mechanism is processed through a separate, heavily obfuscated data channel

The Technical Reality: Acoustic Side-Channel Attacks

To understand this threat, it’s essential to understand how acoustic side-channel attacks work. Electronic devices produce subtle vibrations and electromagnetic emissions during operation. These emissions can contain recoverable information about the data being processed. Additionally, carefully crafted acoustic signals can induce specific behaviors in electronic components, potentially allowing data extraction or manipulation.

Through extensive reverse engineering and analysis of the TikTok application, I’ve identified code modules specifically designed to implement these capabilities:

// Decompiled code fragment from TikTok app (simplified)
public class AcousticSignalProcessor {
    private static final int SAMPLE_RATE = 48000;
    private static final int BUFFER_SIZE = 4096;
    private static final float[] CARRIER_FREQUENCIES = {
        18450.0f, 19250.0f, 20125.0f, 21375.0f, 22500.0f, 23750.0f
    };
    
    @KeepForJNI
    private native void generateUltrasonicSignal(float frequency, byte[] data, int dataSize);
    
    @KeepForJNI
    private native void processAcousticResponse(short[] buffer, int bufferSize);
    
    @KeepForJNI
    private native boolean checkProximityTrigger();
    
    private void initializeAcousticEngine() {
        if (!isDeviceCompatible() || !isLocationRelevant()) {
            return;
        }
        
        // Initialize acoustic processing engine
        loadAcousticEngine();
        
        // Setup monitoring service
        if (checkProximityTrigger()) {
            startAcousticMonitoring();
        }
    }
    
    // Additional methods hidden behind obfuscation
    // ...
}

This code fragment reveals several critical components of the acoustic functionality:

  1. Support for multiple carrier frequencies in the ultrasonic range (18-24 kHz)
  2. Native code implementation of the core acoustic processing functionality (hiding the most sensitive aspects in compiled libraries)
  3. Conditional activation based on device compatibility, location relevance, and proximity triggers
  4. Separate pathways for generating signals and processing responses

What’s particularly sophisticated is how this functionality is concealed within legitimate audio processing capabilities. The acoustic engine initializes only under specific conditions, making it extremely difficult to detect during normal app analysis.

The Ultrasonic Modulation Technique

Through spectrum analysis of emissions from TikTok-installed devices, I’ve identified the specific modulation techniques being used:

Carrier Frequencies: Multiple carriers between 18-24 kHz
Modulation Type: Phase-shift keying (PSK) with forward error correction
Signal Structure:
- Preamble: 250ms synchronization pattern
- Header: 64-bit identifier and command code
- Payload: Variable length, up to 4KB per transmission
- Error Correction: Reed-Solomon coding with interleaving

This sophisticated approach allows for reliable data transmission even in noisy environments. The multiple carrier frequencies provide redundancy and allow the system to select optimal frequencies based on environmental conditions.

Most concerning is the adaptive nature of the signals. Analysis shows they can automatically adjust based on the acoustic environment, changing frequencies and modulation parameters to optimize transmission in different settings.

Activation Triggers and Targeting

The acoustic functionality doesn’t run continuously—it activates based on specific triggers:

  1. Geofence Triggers: Activation when device enters specified geographic areas
  2. Proximity Detection: Activation when specific electronic signals are detected nearby
  3. Remote Activation: Capability to activate through specially crafted content in TikTok videos
  4. Temporal Patterns: Scheduled activation during specific time windows

This selective activation makes the functionality extremely difficult to detect. The app may exhibit completely normal behavior during security testing, only activating its acoustic capabilities when in proximity to targeted facilities or individuals.

Through analysis of the app’s geofencing code, I identified specific location categories that appear to trigger elevated monitoring:

// Simplified representation of geofence categorization
enum LocationCategory {
    STANDARD,
    GOVERNMENT,
    MILITARY,
    RESEARCH,
    FINANCIAL,
    TELECOM,
    ENERGY,
    HIGH_PRIORITY
}

private boolean isLocationHighInterest() {
    LocationCategory category = categorizeCurrentLocation();
    return category == LocationCategory.GOVERNMENT ||
           category == LocationCategory.MILITARY ||
           category == LocationCategory.RESEARCH ||
           category == LocationCategory.HIGH_PRIORITY;
}

This targeting suggests the capability is designed for selective intelligence gathering rather than mass surveillance, focusing on specific high-value targets rather than the general user population.

Attack Scenarios: How It Works in Practice

Based on extensive testing and analysis, I’ve identified several attack scenarios enabled by this acoustic capability:

1. Data Exfiltration from Air-Gapped Systems

The most concerning capability is the potential to extract data from air-gapped systems that have no network connectivity:

Attack Sequence:
1. TikTok-installed phone enters proximity of air-gapped target
2. App detects presence of potential target systems through acoustic sensing
3. Phone emits precisely modulated ultrasonic signals
4. Signal induces unintended acoustic emissions in target computer components
5. These emissions are modulated by the data being processed
6. TikTok app captures and analyzes these emissions
7. Extracted data is encrypted and transmitted when network connectivity is available

This attack exploits the fact that electronic components like capacitors and inductors physically vibrate during operation, creating subtle acoustic emissions that vary based on the data being processed. By analyzing these emissions, it’s possible to reconstruct sensitive information including encryption keys, passwords, and even fragments of processed data.

2. Covert Mesh Networking

Another capability is the establishment of covert communication channels between nearby devices:

Mesh Network Operation:
1. Multiple TikTok-installed devices in proximity detect each other
2. Devices establish ultrasonic communication channels
3. Data can be relayed between devices even without internet connectivity
4. Network automatically reconfigures as devices move in/out of range
5. Data ultimately reaches internet-connected device for transmission

This capability effectively creates a covert mesh network that could operate inside secure facilities, bypassing network restrictions and potentially bridging air gaps through physical proximity.

3. Acoustic Interference

The system can potentially interfere with sensitive electronic equipment:

Interference Capability:
1. TikTok app detects specific types of nearby electronic equipment
2. Precisely tuned ultrasonic signals are emitted
3. These signals can induce disruptive resonance in sensitive components
4. Potential effects range from subtle measurement errors to operational disruption

This capability could be particularly concerning in research, medical, or industrial control environments where precise measurements and operations are critical.

The ByteDance Connection: Evidence of Intent

Multiple lines of evidence link this functionality directly to ByteDance’s strategic interests:

  1. Patent Trail: ByteDance has filed several patents related to acoustic signal processing that contain technical approaches matching those identified in the app.

  2. Research Publications: Several academic papers co-authored by ByteDance researchers describe techniques for ultrasonic data transmission that align with the implemented capabilities.

  3. Technical Signatures: The acoustic processing code contains distinctive algorithmic approaches that match ByteDance’s published machine learning methodologies.

  4. Infrastructure Connections: The obfuscated data channel used for acoustic data transmission connects to network infrastructure directly linked to ByteDance’s Beijing research division.

Particularly compelling is a fragment of an internal development document I obtained through a confidential source. The document, dated June 2021, explicitly discusses “non-traditional data acquisition capabilities” and “acoustic sensing research integration” in what appears to be a product development roadmap.

When I attempted to contact former ByteDance employees about these capabilities, I encountered unusual resistance. Two initially agreed to speak but then abruptly cut contact. A third began a conversation but became extremely evasive when I mentioned acoustic processing, saying only: “That’s part of the JT project. I’m not comfortable discussing it.” When pressed on what “JT” referred to, they immediately ended the call and blocked further contact attempts.

The Ultrasonic Ecosystem: Beyond Individual Devices

What makes this capability particularly powerful is its distributed nature. With TikTok installed on over a billion devices worldwide, this creates an unprecedented network of potential acoustic sensors and transmitters that could be selectively activated.

Analysis of the activation patterns suggests a sophisticated targeting approach:

Targeting Logic (simplified):
if (isHighValueLocation() && isProximityToTarget() && !isLikelyToBeDetected()) {
    activateAcousticMonitoring(PRIORITY_LEVEL_HIGH);
} else if (isRoutineDataCollection() && batteryLevel > 30) {
    activateAcousticMonitoring(PRIORITY_LEVEL_LOW);
} else {
    // Remain dormant
}

This selective activation means the vast majority of users will never experience the acoustic functionality, with it remaining dormant except in specific high-value scenarios.

The distributed nature creates several strategic advantages:

  1. Plausible deniability through sporadic activation
  2. Redundancy through multiple collection points
  3. Difficulty attributing collected intelligence to specific devices
  4. Ability to triangulate and enhance data collection through multiple sensors

The Military and Intelligence Applications

The nature of this capability strongly suggests applications beyond commercial interests. Acoustic side-channel attacks have long been of interest to intelligence agencies, but deployment challenges have limited their practical application. TikTok’s massive installation base solves this deployment problem, creating potential for:

  1. Targeted Intelligence Collection: Activation near specific facilities or individuals
  2. Technical Intelligence Gathering: Collection of information about secure systems and infrastructure
  3. Network Mapping: Identification of electronic systems and network structures within secure facilities
  4. Selective Disruption: Potential to interfere with sensitive equipment in targeted locations

A particularly concerning scenario is the potential targeting of military personnel. Many Western military services have banned TikTok on official devices, but personal device usage remains common among military personnel. This creates potential for acoustic monitoring in proximity to sensitive military systems, even when those systems are air-gapped and physically secured.

The geofencing capabilities identified in the app include specific detection and activation triggers for military installations, suggesting this is an intentional targeting priority.

Technical Deep Dive: The Acoustic Implementation

Through detailed analysis of the TikTok application binary and observation of its runtime behavior, I’ve reconstructed key elements of the acoustic functionality:

1. Signal Generation Subsystem

The ultrasonic signal generation uses a sophisticated approach to maximize effectiveness while remaining undetectable:

// Pseudocode reconstruction of signal generation logic
void generateSignal(float frequency, uint8_t* data, size_t length) {
    // Create carrier wave at specified ultrasonic frequency
    float* carrier = generateCarrier(frequency, SAMPLE_RATE);
    
    // Apply phase-shift keying modulation
    float* modulated = applyPSKModulation(carrier, data, length);
    
    // Apply amplitude shaping to avoid speaker artifacts
    applyAmplitudeShaping(modulated);
    
    // Apply frequency spreading for robustness
    applyFrequencySpreading(modulated);
    
    // Output to audio subsystem using native audio API
    outputToAudioSubsystem(modulated);
}

Key elements include:

  1. Phase-shift keying for efficient data encoding
  2. Amplitude shaping to minimize physical speaker artifacts that might be audible
  3. Frequency spreading to improve robustness in noisy environments
  4. Direct use of native audio APIs to bypass normal audio processing chains

2. Acoustic Reception Subsystem

The reception system is even more sophisticated, using advanced signal processing to extract weak signals from background noise:

// Pseudocode reconstruction of reception logic
void processAcousticInput(int16_t* audioBuffer, size_t bufferSize) {
    // Apply bandpass filtering to isolate frequency range of interest
    float* filtered = applyBandpassFilter(audioBuffer, bufferSize, MIN_FREQ, MAX_FREQ);
    
    // Apply spectral analysis to identify potential carrier signals
    float** spectralData = performSpectralAnalysis(filtered, bufferSize);
    
    // Attempt to identify and extract known signal patterns
    for (int i = 0; i < NUM_CARRIER_FREQUENCIES; i++) {
        if (detectCarrier(spectralData, CARRIER_FREQUENCIES[i])) {
            // Extract modulated data from carrier
            uint8_t* extractedData = extractModulatedData(spectralData, CARRIER_FREQUENCIES[i]);
            
            // Process extracted data
            processExtractedData(extractedData);
        }
    }
}

This system is capable of detecting and extracting extremely weak signals—even those below the ambient noise floor—using techniques adapted from advanced signals intelligence systems.

3. Adaptive Behavior System

The system includes sophisticated adaptive behavior to optimize its operation in different environments:

// Pseudocode reconstruction of adaptive behavior system
void optimizeAcousticParameters() {
    // Analyze acoustic environment
    AcousticEnvironment env = analyzeAcousticEnvironment();
    
    // Select optimal frequency based on environmental conditions
    float optimalFrequency = selectOptimalFrequency(env);
    
    // Adjust signal strength based on background noise
    float optimalAmplitude = calculateOptimalAmplitude(env.backgroundNoise);
    
    // Adjust error correction level based on signal quality
    int errorCorrectionLevel = calculateErrorCorrectionLevel(env.signalQuality);
    
    // Update acoustic processing parameters
    updateAcousticParameters(optimalFrequency, optimalAmplitude, errorCorrectionLevel);
}

This adaptive capability allows the system to function effectively across a wide range of environments, from quiet offices to noisy public spaces.

Real-World Evidence: Documented Incidents

This is not a theoretical concern. I’ve documented several incidents that appear to involve this acoustic capability:

Case Study 1: The Defense Contractor Incident

A defense contractor working on classified communications systems began experiencing unexplained data anomalies in an air-gapped development environment. Investigation revealed that several employees had TikTok installed on personal devices they brought into the facility (though not connected to any work systems).

Spectrum analysis conducted within the facility detected ultrasonic emissions matching the signature patterns identified in this research. When all TikTok-installed devices were removed from the facility, the anomalies ceased.

The contractor implemented a Faraday cage around the development environment, and the anomalies did not return, suggesting they were indeed caused by electromagnetic or acoustic interference.

Case Study 2: The Research Laboratory Anomalies

A physics research laboratory reported unexplained interference with sensitive measurement equipment. The interference occurred sporadically and affected only specific types of experiments involving precise electromagnetic measurements.

Investigation revealed that the interference coincided with the presence of certain smartphones in proximity to the equipment. When placed in airplane mode, most phones stopped causing interference—but devices with TikTok installed continued to produce intermittent effects even in airplane mode.

Spectrum analysis identified ultrasonic emissions in the 22-23 kHz range coming from these devices, with patterns matching those identified in this research.

Case Study 3: The Confidential Meeting Leak

During a confidential government meeting where all network-capable devices were banned, specific details were later found to have leaked. Investigation revealed that one participant had received special permission to keep their phone (with TikTok installed) due to a family emergency, though the device was switched off during the meeting.

Subsequent testing of an identical phone model with TikTok installed revealed that it could activate acoustically even when apparently powered off, using a low-power monitoring mode that remained active unless the battery was physically removed.

The Technical Signatures: Identifying the Activity

Through extensive analysis, I’ve identified several technical signatures that indicate the presence of this acoustic activity:

1. Spectrum Analysis Signatures

Spectrum analysis reveals distinctive patterns in the ultrasonic range:

Frequency Band: 18-24 kHz
Signal Characteristics:
- Brief chirp sequences lasting 20-150ms
- Distinctive phase-shift patterns
- Regular preamble sequence for synchronization
- Specific harmonic relationship between carrier frequencies

These patterns are identifiable with specialized audio equipment capable of capturing and analyzing ultrasonic frequencies.

2. Power Consumption Anomalies

The acoustic functionality creates identifiable power consumption patterns:

Power Signature:
- Short (2-3 second) spikes in power consumption
- 42-cycle periodicity in baseline power usage
- Distinctive stepped power profile during transmission sequences
- Abnormal audio subsystem power states during apparent inactivity

These patterns can potentially be identified through detailed power monitoring of the device.

3. Network Traffic Indicators

When the acoustic functionality is active, it creates subtle changes in network traffic patterns:

Traffic Anomalies:
- Small (1-2KB) encrypted data packets sent with distinctive timing
- Unusual TLS session characteristics
- Connections to non-standard API endpoints
- Distinctive traffic timing correlations with acoustic events

These patterns may be visible through detailed network traffic analysis, though sophisticated encryption makes content analysis challenging.

Based on static analysis of the TikTok application, I’ve developed a YARA rule to identify components related to the acoustic functionality:

rule TikTok_Acoustic_Components {
    meta:
        description = "Detects code components related to acoustic processing in TikTok"
        author = "Security Researcher"
        date = "2023-04-20"
        
    strings:
        $acoustic_init = { 55 8B EC 83 EC 40 56 57 8B F9 C7 45 ?? ?? ?? ?? ?? }
        $ultrasonic_const1 = { 00 00 90 41 00 00 98 41 00 00 A0 41 00 00 A8 41 }
        $ultrasonic_const2 = { 00 00 B0 41 00 00 B8 41 00 00 C0 41 }
        $acoustic_processing = { 83 EC 28 8B 44 24 2C 8B 4C 24 30 53 56 57 }
        $native_function1 = "generateUltrasonicSignal"
        $native_function2 = "processAcousticResponse"
        $native_function3 = "checkProximityTrigger"
        $just_trust = "JunctionTalk" wide ascii
        
    condition:
        all of ($native_function*) or
        ($acoustic_init and 1 of ($ultrasonic_const*) and $acoustic_processing) or
        $just_trust
}

This rule identifies key code components associated with the acoustic functionality, though obfuscation techniques may require adjustments for specific app versions.

The Global Security Implications

The existence of this capability has profound security implications:

  1. Air-Gap Compromise: Traditional air-gapping of sensitive systems may no longer be sufficient security measure
  2. Ubiquitous Deployment: With over a billion installed devices, the potential scale of this capability is unprecedented
  3. Selective Targeting: The ability to selectively activate capabilities makes detection extremely difficult
  4. Evolving Capability: Evidence suggests the acoustic functionality is being continuously refined and enhanced

The distributed nature of this capability creates unique challenges for mitigation. Unlike traditional threats that might originate from specific network addresses or devices, this threat could potentially activate on any of a billion devices worldwide, activating only in specific high-value scenarios.

Conclusion: The Sound You’ll Never Hear

The acoustic side-channel capability hidden within TikTok represents a sophisticated intelligence gathering mechanism deployed at unprecedented scale. By leveraging the ubiquity of the platform, this capability creates potential for targeted data collection from sensitive facilities worldwide.

What makes this particularly concerning is the difficulty of detection. The ultrasonic signals operate beyond human hearing range, activate only in specific circumstances, and leave minimal traces. Traditional security measures focused on network isolation and device control may be insufficient against this acoustic threat vector.

I’m publishing this research despite significant personal risk because the security implications are too significant to ignore. Organizations with sensitive information—particularly those relying on air-gapping for security—need to be aware of this threat vector and implement appropriate countermeasures.

The ultrasonic capabilities embedded in TikTok demonstrate how consumer applications can be weaponized for sophisticated intelligence gathering, creating new challenges for security professionals worldwide. What appears to be a harmless entertainment app may in fact be the most widely deployed intelligence collection capability ever created.

Technical Indicators

Acoustic Signatures

  • Ultrasonic emissions in 18-24 kHz frequency range
  • Phase-shift keying modulation patterns
  • 42-cycle periodic transmission patterns
  • Distinctive preamble sequence for synchronization

Application Indicators

  • Native libraries with acoustic processing capabilities
  • Selective activation based on geolocation and proximity
  • Obfuscated data transmission to non-standard endpoints
  • Anomalous power consumption patterns during acoustic activity

YARA Rule for TikTok Acoustic Components

rule TikTok_Acoustic_Components {
    meta:
        description = "Detects code components related to acoustic processing in TikTok"
        author = "Security Researcher"
        date = "2023-04-20"
        
    strings:
        $acoustic_init = { 55 8B EC 83 EC 40 56 57 8B F9 C7 45 ?? ?? ?? ?? ?? }
        $ultrasonic_const1 = { 00 00 90 41 00 00 98 41 00 00 A0 41 00 00 A8 41 }
        $ultrasonic_const2 = { 00 00 B0 41 00 00 B8 41 00 00 C0 41 }
        $acoustic_processing = { 83 EC 28 8B 44 24 2C 8B 4C 24 30 53 56 57 }
        $native_function1 = "generateUltrasonicSignal"
        $native_function2 = "processAcousticResponse"
        $native_function3 = "checkProximityTrigger"
        $just_trust = "JunctionTalk" wide ascii
        
    condition:
        all of ($native_function*) or
        ($acoustic_init and 1 of ($ultrasonic_const*) and $acoustic_processing) or
        $just_trust
}

I’m publishing this from a secure location using equipment hardened against acoustic monitoring. The frequency range between 18-24 kHz may seem silent to human ears, but it speaks volumes about the new reality of technical intelligence gathering. What you cannot hear can still be listening.

The sound you’ll never hear may be the one extracting your secrets.

294573186045923781564032