On the grounds that "danielpipes.org is clean, consistently formatted, carefully edited and larger than WSJ," a team of three authors from Google and Stanford University has used my website to explore a possible connection between linguistic syntax and web mark-up. As the authors put it more technically:
Spanning decades, Pipes' editorials are mostly in-domain for POS taggers and tree-bank-trained parsers; his recent (internet-era) entries are thoroughly cross-referenced, conveniently providing just the mark-up we hoped to study via uncluttered (printer-friendly) HTML.
Valentin I. Spitkovsky, Daniel Jurafsky, and Hiyan Alshawi, the authors of "Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing," show how web mark-up can be used to advance the state of the art in unsupervised dependency parsing. They presented their discovery of a strong correlation between mark-up and hierarchical syntactic structure last month at the 48th Annual Meeting of the Association for Computational Linguistics in Uppsala, Sweden. This finding could have broad implications for natural language processing problems, with applications extending well beyond parsing.
Comment: It's an honor to have www.DanielPipes.org selected for this study, and the honor goes primarily to Grayson Levy, the person who initiated the idea for this website in 2000, brought it online at the end of that year, and has overseen it ever since. (August 1, 2010)
Feb. 15, 2014 update: DanielPipes.org is also cited in "Domain biased Bilingual Parallel Data Extraction and its Sentence Level Alignment for English-Hindi Pair," by Deepa Gupta, Vani Raveendran and Rahul Kumar of Bangalore, writing in the Research Journal of Applied Sciences, Engineering and Technology 7(6), p. 1002:
Daniel pipes corpus (website: http://www.danielpipes.org/) is yet another data set which is a collection of articles that describe Middle East. Originally written in English, it has been translated to 25 other languages including Hindi. For Hindi alone, there exist 322 articles which make approximately 6761 sentence pairs.
Mar. 21, 2014 update: In a study presenting HindEnCorp, a Hindi-English parallel corpus, the Czech team of Ondřej Bojar, Vojtěch Diatka, Pavel Straňák, Aleš Tamchyna, and Daniel Zeman relied on the 322 articles at DanielPipes.org translated into Hindi.
Mar. 1, 2017 update: In "Problems Encountered in Translating Islamic Related Texts from English into Arabic," Academic Research International, March 2017, Bader S. Dweik and Hiyam M. Khaleel explain their goal in the abstract:
This study aims to explore the problems that experienced translators in Jordan face when translating ideological Islamic-related texts from English into Arabic. To achieve this purpose, the researchers have designed a translation test consisting of 10 extracts with ideological content written by Muslim and non-Muslim writers. A purposive sample of 16 translators was selected to perform the test. The researchers have analyzed the results of the test qualitatively.
They then explain which texts they chose as source material:
The researchers developed a test which embodied two parts. While the first comprised the demographic data of the participants, the second was dedicated to translating statements derived from books, articles and websites (written by Muslim or non-Muslim writers). Three of them were non-Muslim orientalists, known for their controversial and influential writings about Islam, namely George Sale, Bernard Lewis, and Daniel Pipes. The three Muslim ideologists whose ideas about Islam were influential and controversial for both Muslims and non-Muslims, were Abu Alaa Maududi, Sayyed Qutb, and Ali Shariati. The two other writers were Muslim American scholars and writers, namely Imam Zaid Shakir (a specialist in Islamic spirituality) and Laila Ahmed (a specialist in Islam and Islamic feminism).
In case you're curious about the study's results:
The study reveals that the translators have faced the following six problems when rendering the texts; inability to deal with the ideological implications; the ambiguity of some words; the differences between source language (SL) and target language (TL) cultures; the translators' semantic and syntactic mediation; lack of knowledge and the inadequacy of dictionaries.