gms | German Medical Science

Artificial Vision 2024

The International Symposium on Visual Prosthetics

05. - 06.12.2024, Aachen, Germany

Visual fixation-based retinal prosthetic simulation

Meeting Abstract

  • Yuli Wu - Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany
  • D. Nguyen - Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany
  • H. Konermann - Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany
  • R. Yilmaz - Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany
  • P. Walter - Department of Ophthalmology, RWTH Aachen University, Aachen, Germany
  • J. Stegmaier - Institute of Imaging and Computer Vision, RWTH Aachen University, Aachen, Germany

Artificial Vision 2024. Aachen, 05.-06.12.2024. Düsseldorf: German Medical Science GMS Publishing House; 2025. Doc24artvis43

doi: 10.3205/24artvis43, urn:nbn:de:0183-24artvis437

Published: May 9, 2025

© 2025 Wu et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Objective: The objective of this study was to explore the feasibility of a visual fixation-based retinal prosthetic simulation using a simulated saccade mechanism and to assess the improvement achieved by end-to-end optimization.

Materials and Methods: Fixations were predicted on images from the ImageNet dataset using the self-attention of a pre-trained Vision Transformer. Of the 256 patches per 224x224-pixel image (a 16x16 grid of 14x14-pixel patches), the top 10% most salient fixation patches were preserved to mimic the saccade mechanism. Each fixation patch was encoded with a trainable U-Net encoder and then passed through the axon map model from the pulse2percept library to predict percepts. The resulting masked percepts were evaluated with a self-supervised foundation model (DINOv2), with an optional learnable linear layer to measure classification accuracy.
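
The patch-selection and percept-simulation steps can be sketched as follows. This is a minimal illustration, not the study's implementation: the per-patch saliency below is a simple contrast proxy standing in for the ViT self-attention, the trainable U-Net encoder is omitted, and pulse2percept's ArgusII layout is assumed as a stand-in for the actual electrode configuration.

    import numpy as np
    import pulse2percept as p2p

    PATCH = 14        # patch side length in pixels (224 / 16)
    GRID = 16         # 16x16 grid of patches per 224x224 image -> 256 patches
    TOP_FRAC = 0.10   # keep the top 10% most salient patches

    def patch_saliency(img):
        """Per-patch saliency score; local standard deviation is used here
        as a placeholder for the CLS self-attention of a pre-trained ViT."""
        patches = img.reshape(GRID, PATCH, GRID, PATCH).transpose(0, 2, 1, 3)
        return patches.reshape(GRID, GRID, -1).std(axis=-1)

    def fixation_mask(saliency):
        """Binary mask that keeps the top 10% of patches (the 'fixations')."""
        k = int(np.ceil(TOP_FRAC * saliency.size))
        threshold = np.sort(saliency.ravel())[-k]
        return (saliency >= threshold).astype(float)

    def simulate_percept(img, mask):
        """Masked image -> electrode amplitudes -> axon map percept.
        The actual pipeline encodes each 14x14 patch with a trainable U-Net;
        here the masked image is simply block-averaged onto the 6x10 ArgusII grid."""
        masked = img * np.kron(mask, np.ones((PATCH, PATCH)))
        rows = np.array_split(np.arange(masked.shape[0]), 6)
        cols = np.array_split(np.arange(masked.shape[1]), 10)
        amps = np.array([[masked[np.ix_(r, c)].mean() for c in cols] for r in rows])
        implant = p2p.implants.ArgusII(stim=amps.ravel())
        model = p2p.models.AxonMapModel(rho=150, axlambda=500)
        model.build()
        return model.predict_percept(implant).data.squeeze()

    # Example with a random grayscale image standing in for an ImageNet sample:
    rng = np.random.default_rng(0)
    img = rng.random((224, 224))
    mask = fixation_mask(patch_saliency(img))
    percept = simulate_percept(img, mask)
    print(percept.shape, int(mask.sum()), "of 256 patches kept")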

Results: Classification accuracy was measured on a subset of the ImageNet validation set (3,952 images, 10 classes). The visual fixation-based approach achieved 81.99% accuracy, compared to 38.70% using a downsampling approach. The accuracy was further improved to 87.72% with the inclusion of an end-to-end U-Net encoder. For comparison, the healthy upper bound achieved 92.76% accuracy.
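
A sketch of how such a classification accuracy could be measured is given below, assuming the dinov2_vits14 model from torch.hub as the frozen DINOv2 backbone and a 10-class linear probe; the percept preprocessing and training step are simplified placeholders rather than the evaluation protocol actually used.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Frozen DINOv2 backbone (ViT-S/14); calling it returns 384-dim CLS features.
    backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False

    # Learnable linear probe on top of the frozen features (10 ImageNet classes).
    probe = nn.Linear(384, 10)
    optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    def to_backbone_input(percepts):
        """(B, H, W) grayscale percepts -> (B, 3, 224, 224) ViT inputs.
        Resizing and channel replication stand in for the real preprocessing."""
        x = percepts.unsqueeze(1).repeat(1, 3, 1, 1)
        return F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)

    def train_step(percepts, labels):
        with torch.no_grad():                  # backbone stays frozen
            feats = backbone(to_backbone_input(percepts))
        loss = criterion(probe(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    @torch.no_grad()
    def accuracy(percepts, labels):
        preds = probe(backbone(to_backbone_input(percepts))).argmax(dim=1)
        return (preds == labels).float().mean().item()

In the end-to-end variant reported above, gradients would presumably also flow back into the U-Net encoder through a differentiable percept simulation, while the foundation model itself stays fixed.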

Discussion: The visual fixation-based retinal prosthetic simulation shows promising potential, drawing inspiration from the saccade mechanism of the human eye while efficiently utilizing the limited number of electrodes in retinal implants. End-to-end optimization further enhances classification accuracy, making this approach a compelling advancement for retinal prosthetics.

Acknowledgment: This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the grant GRK2610: InnoRetVision (project number 424556709).