On the grounds that "danielpipes.org is clean, consistently formatted, carefully edited and larger than WSJ," a team of three authors from Google and Stanford University has used my website to explore a possible connection between linguistic syntax and web mark-up. To put it more technically:
"Spanning decades, Pipes' editorials are mostly in-domain for POS taggers and tree-bank-trained parsers; his recent (internet-era) entries are thoroughly cross-referenced, conveniently providing just the mark-up we hoped to study via uncluttered (printer-friendly) HTML."
Valentin I. Spitkovsky, Daniel Jurafsky, and Hiyan Alshawi, the authors of "Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing," show how web mark-up can be used to advance the state of the art in unsupervised dependency parsing. They presented their discovery of a strong correlation between web mark-up and hierarchical syntactic structure last month at the 48th Annual Meeting of the Association for Computational Linguistics in Uppsala, Sweden. This finding could have broad implications for natural language processing problems, with applications extending well beyond parsing.
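To give a rough sense of what "mark-up as a hint about syntax" can mean in practice, here is a minimal Python sketch. It is not the authors' method, which this post does not describe in detail; it only illustrates the first step one might take, namely recording which runs of words sit inside mark-up such as links or italics so that a parser could later treat them as candidate phrases. The tag list, the class name `MarkupSpanExtractor`, and the function `extract_markup_spans` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: these are the tags treated as "mark-up" for this illustration.
MARKUP_TAGS = {"a", "b", "i", "em", "strong"}


class MarkupSpanExtractor(HTMLParser):
    """Collects whitespace-separated tokens and the token spans covered by mark-up."""

    def __init__(self):
        super().__init__()
        self.tokens = []  # all words seen so far, in document order
        self.spans = []   # (tag, start, end) with end exclusive
        self._open = []   # stack of (tag, start) for tags not yet closed

    def handle_starttag(self, tag, attrs):
        if tag in MARKUP_TAGS:
            self._open.append((tag, len(self.tokens)))

    def handle_endtag(self, tag):
        # Close the most recently opened tag with the same name, if any.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, start = self._open.pop(i)
                end = len(self.tokens)
                if end > start:  # skip empty elements
                    self.spans.append((tag, start, end))
                break

    def handle_data(self, data):
        # Naive tokenization: split on whitespace only.
        self.tokens.extend(data.split())


def extract_markup_spans(html):
    """Return (tokens, spans) for an HTML fragment."""
    parser = MarkupSpanExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens, parser.spans


if __name__ == "__main__":
    sample = 'The <a href="#">Wall Street Journal</a> reported it.'
    tokens, spans = extract_markup_spans(sample)
    for tag, start, end in spans:
        print(tag, tokens[start:end])
```

In this toy sentence the link covers the three words "Wall Street Journal," which also happen to form a noun phrase. A correlation of that kind between mark-up boundaries and phrase boundaries is what the study examines, on a far larger scale and with real parsers.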
Comment: It's an honor to have www.DanielPipes.org selected for this study, and the honor goes primarily to Grayson Levy, the person who initiated the idea for this website in 2000, brought it online at the end of that year, and has overseen it ever since. (August 1, 2010)
Feb. 15, 2014 update: DanielPipes.org is also cited in "Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair," by Deepa Gupta, Vani Raveendran, and Rahul Kumar of Bangalore, writing in the Research Journal of Applied Sciences, Engineering and Technology 7(6), p. 1002:
"Daniel pipes corpus (website: http://www.danielpipes.org/) is yet another data set which is a collection of articles that describe Middle East. Originally written in English, it has been translated to 25 other languages including Hindi. For Hindi alone, there exist 322 articles which make approximately 6761 sentence pairs."
Mar. 21, 2014 update: In a study presenting HindEnCorp, a Hindi-English corpus, the Czech group of Ondřej Bojar, Vojtěch Diatka, Pavel Straňák, Aleš Tamchyna, and Daniel Zeman relied on the 322 articles at DanielPipes.org translated into Hindi.