Understanding how the brain encodes external stimuli, and how those stimuli can be decoded from measurable brain activity, are long-standing and challenging questions in neuroscience. Progress has been made in reconstructing images from fMRI signals, but the reconstructions have typically been limited to single objects or simple shapes. Sophisticated deep learning models could bring major advances to this work, but the scarcity of fMRI data has prevented researchers from applying state-of-the-art models to reconstruct images that are rich in semantic information and approach the complexity of everyday scenes.
Researchers at the University of California, Santa Barbara have created a technology that reconstructs complex, photo-realistic images observed by subjects from their brain signals. Because these images contain many objects and relationships, an additional text modality is used to better capture their semantics. To achieve high performance with limited data, a pre-trained semantic space aligns the visual and text modalities: fMRI signals are encoded into this visual-language latent space, and a generative model conditioned on the mapped embeddings reconstructs the image. An additional contrastive loss injects low-level visual features into this semantics-based pipeline. As a result, the reconstructed images are both photo-realistic and faithful to the original image content. This brain-signal-to-image decoding pipeline opens new opportunities to study human brain function through strategic input alterations and could even prove useful for brain-computer interfaces.
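The core idea of the pipeline can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: all sizes, the linear encoder, and the synthetic data are assumptions. It shows fMRI vectors being mapped into a shared visual-language embedding space and scored with a contrastive (InfoNCE-style) loss, so that matching fMRI/image pairs score higher than mismatched ones.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VOXELS, EMBED_DIM, BATCH = 512, 64, 8  # hypothetical sizes

# Stand-ins for pre-trained visual-language (e.g. CLIP-like) image embeddings.
image_emb = rng.standard_normal((BATCH, EMBED_DIM))

# Simulated fMRI responses: a noisy linear function of the image embeddings.
true_map = rng.standard_normal((EMBED_DIM, N_VOXELS))
fmri = image_emb @ true_map + 0.1 * rng.standard_normal((BATCH, N_VOXELS))

# A toy encoder: least-squares fit from voxels to the shared semantic space
# (a real system would train a deep encoder with gradient descent).
W, *_ = np.linalg.lstsq(fmri, image_emb, rcond=None)
fmri_emb = fmri @ W  # fMRI mapped into the visual-language latent space


def info_nce(a, b, temperature=0.1):
    """Contrastive loss: matching rows of a and b are the positive pairs."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # -log p(correct pairing)


loss = info_nce(fmri_emb, image_emb)
chance = info_nce(rng.standard_normal((BATCH, EMBED_DIM)), image_emb)
print(f"aligned loss {loss:.3f} < unaligned loss {chance:.3f}")
```

In a full system, the embeddings produced by such an encoder would then condition a generative image model; the contrastive term described above is what ties low-level visual structure to the otherwise purely semantic mapping.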
Keywords: Mind reader technology, Image reconstruction, Brain activities, fMRI signals, Deep learning models, Semantic information, Complex images, Photo-realistic images, Text modality