Figure 8: Feature contributions of all extracted visual and text representations used for modality prediction