Humans are generally very good at remembering the images they have seen. Considerable evidence suggests that image memory is signaled by a reduction in response to familiar as compared to novel images (``repetition suppression'') in high-level visual areas such as inferotemporal cortex (IT). However, IT neural responses are modulated not only by familiarity but also by other factors, such as contrast and attention, and it is unclear how the brain disambiguates visual memory signals from other types of firing-rate modulation. To address this question, we analyzed behavioral and neural data collected from IT as two rhesus monkeys performed a single-exposure visual memory task in which they viewed images and indicated whether each image was novel (never seen before) or familiar (seen exactly once before) while disregarding changes in image contrast across repetitions. As expected, the monkeys were largely able to detect familiarity despite changes in contrast. By comparison, linear decoders applied to the early stimulus-evoked response (200 ms) confused memory and contrast and were inconsistent with the mapping from IT neural signals to the monkeys' behavioral reports. However, we found that a linear decoder applied at the end of the 500 ms viewing period successfully disambiguated memory from contrast and predicted behavior far more accurately. We also found that the temporal evolution of IT population responses from confusion to disambiguation could not be attributed to a simple disappearance of contrast modulation, but was well described as a non-linear reformatting of this information across the viewing period.
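To make the confound concrete, the following is a minimal sketch (Python with NumPy and scikit-learn; not the authors' analysis code) of why a linear decoder reading out familiarity from population firing rates can confuse memory with contrast. All quantities here (population size, baseline rates, and the sizes of the repetition-suppression and contrast effects) are hypothetical placeholders chosen only for illustration.

\begin{verbatim}
# Toy model: both familiarity (repetition suppression) and low contrast
# reduce firing rates multiplicatively, so they move the population
# response along the same gain direction. A linear decoder trained on
# spike counts therefore cannot fully separate the two variables,
# mirroring the confusion seen in the early stimulus-evoked response.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_neurons, n_trials = 100, 400

baseline = rng.uniform(5.0, 20.0, size=n_neurons)  # hypothetical rates (Hz)
rep_suppression = 0.8   # assumed: familiar images evoke ~20% lower rates
low_contrast_gain = 0.8 # assumed: low contrast also lowers rates ~20%

familiar = rng.integers(0, 2, size=n_trials)      # 0 = novel, 1 = familiar
low_contrast = rng.integers(0, 2, size=n_trials)  # 0 = high, 1 = low contrast

# Multiplicative gains combine, so a low-contrast novel image produces
# the same mean response as a high-contrast familiar one (gain = 0.8).
gain = (np.where(familiar == 1, rep_suppression, 1.0)
        * np.where(low_contrast == 1, low_contrast_gain, 1.0))
spikes = rng.poisson(baseline[None, :] * gain[:, None])  # trials x neurons

clf = LogisticRegression(max_iter=1000)
acc_memory = cross_val_score(clf, spikes, familiar, cv=5).mean()
acc_contrast = cross_val_score(clf, spikes, low_contrast, cv=5).mean()
print(f"familiarity decoding accuracy: {acc_memory:.2f}")
print(f"contrast decoding accuracy:    {acc_contrast:.2f}")
\end{verbatim}

In this toy model, neither decoder reaches ceiling, because trials where the two gains coincide are indistinguishable to any linear readout of spike counts. Disambiguation of the kind reported here for the late response would require the population to represent familiarity and contrast along separable directions, which is what the observed non-linear reformatting across the viewing period would accomplish.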