Visual textures, defined as spatially homogeneous im- age regions containing repeated elements (e.g. a field of grass or the bark of a tree), are prevalent in visual scenes and provide important cues for recognizing materials and objects. Existing texture models extract essential fea- tures from a single texture image, and can then generate high-quality samples that are visually similar to the orig- inal. However, their features are either hand-designed or based on a network pretrained for another purpose (e.g., object recognition). We develop a novel principled method for unsupervised learning of a set of statistics that are used to constrain a maximum entropy density model for the texture. We use training and sampling procedures derived from generative diffusion models. Our trained model is more compact (512 features), but generates tex- ture images whose quality is as good as, and often better than, previous methods. We also demonstrate qualita- tive convexity of the representation space by generating samples that interpolate between two given textures.