In a recent series of tweets, Yann LeCun, a prominent figure in the field of artificial intelligence, shared his experience of developing the DjVu image compression format and described its impact on the machine learning (ML) and AI community. LeCun started the DjVu project in the mid-1990s at AT&T Labs, aiming to create an efficient method for distributing high-resolution scanned documents over the Internet. The DjVu format, released in the late 1990s and early 2000s, was adopted by platforms such as the Internet Archive.
LeCun’s initiative to scan and distribute the complete collection of Neural Information Processing Systems (NIPS) conference papers further illustrates the utility of the format. After obtaining permission from the publishers Morgan Kaufmann and MIT Press, which did not earn revenue from past proceedings, LeCun and his team made these papers freely available on a website by 2000.
This move was key in shaping the culture of the ML/AI community towards open access and rapid sharing of preprints. Around the same time, community pushback against commercial journal publishers led to the creation of the Journal of Machine Learning Research (JMLR), a free, open-access journal that further supported this trend.
LeCun also recounted an episode involving Springer, the for-profit publisher that owns the rights to the first volume of NIPS. Springer initially refused permission for digital distribution, but a flurry of email requests directed at a Springer executive led to a swift reversal of that decision, underscoring the collective influence of the community.
Other participants in the DjVu project, such as Léon Bottou and Patrick Haffner, were recognized by LeCun for their significant roles. The format’s legacy extends beyond academia, influencing projects such as Google’s book scanning initiative and the Internet Archive’s Million Books Project.
LeCun’s reflections shed light on the evolving dynamics of intellectual property in the digital age, highlighting the importance of open-access resources for democratizing knowledge and fostering innovation in areas such as machine learning and AI.