Recent physiological studies have shown that while single units in area V1 respond primarily to the local spectral content of a stimulus, single units in V2 are selective for the higher-order image statistics that distinguish natural images. Despite these observations, a description of how V2 constructs higher-order feature selectivity from V1 outputs remains elusive. To study this, we consider a two-layer linear-nonlinear network mimicking areas V1 and V2. The V1 stage is built from linear filters that tile the dimensions of position, orientation, and scale, and the V2 stage computes linear combinations of rectified V1 outputs. When the connection weights are optimized so that output responses match the higher-order statistics of a texture model (Portilla & Simoncelli, 2000) computed on natural images, the fitted V2-like units resemble localized differences of V1 afferents across all four tuning dimensions. Notably, these model fits bear a strong qualitative resemblance to fits obtained from single-unit recordings in primate V2, suggesting that some V2 neurons are well suited to encoding these natural image features. Cortical neurons, including those of V1, are known to exhibit heavy-tailed (sparse) response distributions to natural images, a property believed to reflect an efficient image code. Model V2-like units, which compute differences over V1 afferents, exhibit a level of sparsity (i.e., kurtosis) similar to that of model V1 populations. In addition, we show that a classifier trained to detect higher-order image features from the kurtosis of responses over space is more efficient when computed on model V2-like units than on comparable V1-like units, in that it requires smaller response ensembles to achieve the same classification accuracy.
Thus, localized differences over V1 afferent activity provide an efficient mechanism for computing higher-order visual features, offering an explanation for the receptive field structures observed in neurons within primate area V2.
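As a minimal illustration of the computations described above (our own sketch, not the authors' implementation; all filter shapes, weights, and function names are hypothetical), the two-stage linear-nonlinear model and the kurtosis sparsity index might be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

def v1_stage(image, filters):
    """V1-like stage: linear filtering followed by halfwave rectification."""
    responses = filters @ image.ravel()   # linear filter outputs
    return np.maximum(responses, 0.0)     # rectified V1 afferents

def v2_stage(v1_responses, weights):
    """V2-like stage: signed linear combinations (differences) of V1 afferents."""
    return weights @ v1_responses

def kurtosis(x):
    """Fourth moment over squared variance, used as a sparsity index
    (heavy-tailed responses yield values above 3, the Gaussian baseline)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

# Toy example with random filters and weights on a random "image";
# a real model would use oriented, multi-scale filters and fitted weights.
image = rng.standard_normal((16, 16))
filters = rng.standard_normal((32, 256))   # 32 hypothetical V1-like filters
weights = rng.standard_normal((8, 32))     # 8 hypothetical V2-like units

v1 = v1_stage(image, filters)
v2 = v2_stage(v1, weights)
print(kurtosis(v2))
```

In this sketch, sparsity comparisons between model V1-like and V2-like populations would amount to comparing `kurtosis(v1)` and `kurtosis(v2)` over an ensemble of natural images.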