Wednesday, November 25, 2009

Projekt DeutschDiachronDigital

Alain suggests that Projekt DeutschDiachronDigital is doing some interesting work that may be related to our own. There are a number of useful papers on the project's publication list, including:

Lukas C. Faulstich, Ulf Leser und Anke Lüdeling. Storing and Querying Historical Texts in a Relational Database. Informatik-Bericht Nr. 176 des Instituts für Informatik der Humboldt-Universität zu Berlin, Februar 2005.

Lukas C. Faulstich, Ulf Leser und Thorsten Vitt. Implementing a Linguistic Query Language for Historic Texts. Query Languages and Query Processing (QLQP-2006): 11th Intl. Workshop on Foundations of Models and Languages for Data and Objects (FMLDO), 2006.

Interesting to see that they are using SQL to power this project.

Monday, November 23, 2009

Geoffrey Rockwell's DHCS Notes

Geoffrey Rockwell has posted his DHCS Notes (link), which include a rather provocative suggestion that we might be witnessing the end of Digital Humanities:

The End of Digital Humanities I can't help thinking (with just a little evidence) that the age of funding for digital humanities is coming to an end. Let me clarify this. My hunch is that the period when any reasonable digital humanities project seemed neat and innovative is coming to an end and that the funders are getting tired of more tool projects. I'm guessing that we will see a shift to funding content driven projects that use digital methodologies. Thus digital humanities programs may disappear and the projects are shunted into content areas like philosophy, English literature and so on. Accompanying this is a shift to thinking of digital humanities as infrastructure that therefore isn't for research funding, but instead should be run as a service by professionals. This is the "stop reinventing wheel" argument and in some cases it is accompanied by coercive rhetoric to the effect that if you don't get on the infrastructure bandwagon and use standards then you will be left out (or not funded.) I guess I am suggesting that we could be seeing a shift in what is considered legitimate research and what is considered closed and therefore ready for infrastructure. The tool project could be on the way out as research as it is moved as a problem into the domain of support (of infrastructure.) Is this a bad thing? It certainly will be a good thing if it leads to robust and widely usable technology. But could it be a cyclical trend where today's research becomes tomorrows infrastructure to then be rediscovered later as a research problem all over.

TXM Search Engine

Serge Heiden suggests that we look at the CQP (Corpus Query Processor) and its successors which he is using in TXM:

-- Tiger Search:
- the corresponding PhD (in German):
-- NXT Search:
- related documentation and technical papers:

Since we're looking at various interesting models, I don't want to forget the CDL's XTF (eXtensible Text Framework).

Friday, November 13, 2009

International Journal of Motorcycle Studies

Ran across a call for papers for the International Journal of Motorcycle Studies Conference, Colorado Springs, Colorado, June 3-6, 2010, with links to the journal. A biographical statement and an abstract of 150 words are due by January 15, 2010. Aside from pulling an abstract together, the only serious question is whether to take the Areo or the Buza out to the conference.

Thursday, November 12, 2009

DHCS 2009

The 4th annual Chicago Colloquium on Digital Humanities and Computer Science (DHCS) is fast approaching. This year's festivities are hosted by Shlomo Argamon and his collaborators at the Illinois Institute of Technology, November 14-16. The program is interesting and wide-ranging, and I am particularly looking forward to the presentations by our keynote speakers. Several of the ARTFL group will be giving presentations at the pre-conference meetings and workshops on Saturday, also known as our "Birds of a Feather" meeting. Clovis and I will be talking about recent work, based in part on two talks, "From Words to Works" and "PAIR/PhiloLine", as well as some of the more recent work on topic modeling. I also prepared a more technical PhiloLogic overview and demonstration, followed by a discussion of database loading and configuration (slides), just in case I need one. The second half should probably be expanded at some point, since I have had many requests for more extensive documentation on loading and configuring databases in PhiloLogic.

Other links: PhiloLine/PAIR installations for ARTFL Frantext and the Encyclopédie.

Tuesday, November 10, 2009

Plaintext in PhiloLogic

A while back we added a plaintext loader to PhiloLogic at the request of several folks who wanted to work with documents from Project Gutenberg, Liber Liber and (many) other archives of unencoded or minimally encoded documents. Other use cases for a plaintext loader include direct loading of OCR output and of E-PUBs downloaded from Google, which can alternatively be converted to TEI. I suspect that we will want to retain plaintext loading for implementations of PhiloLogic, since many folks appear to face significant restrictions on accessing materials from various vendors. In a recent blog post, Devin Griffiths described his examination of MONK and ProQuest data and his decision to assemble his own corpus from Project Gutenberg.

Wednesday, November 4, 2009

Find Installed Perl Modules

Here is a helpful one-liner to find installed Perl modules, thanks to Blane Warrene:

perl -MFile::Find=find -MFile::Spec::Functions -lwe 'find { wanted => sub { print canonpath $_ if /\.pm\z/ }, no_chdir => 1 }, @INC'

There is also an interactive command-line tool for inspecting installed modules, called instmodsh.
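
For anyone who would rather read the logic of the one-liner spelled out, here is a rough Python sketch of the same idea: walk a set of directories and collect every file whose name ends in .pm. This is only an illustration, not a replacement for the Perl version. The function name find_perl_modules is made up for this example, and because Python has no access to Perl's @INC, the directories have to be passed in by hand (for instance, copied from the output of perl -e 'print join "\n", @INC').

```python
import os


def find_perl_modules(search_dirs):
    """Return normalized paths of every *.pm file under the given directories.

    search_dirs stands in for Perl's @INC; this sketch assumes the caller
    supplies the list, since Python cannot read @INC on its own.
    """
    found = []
    for top in search_dirs:
        # os.walk yields nothing for a directory that does not exist,
        # so stale or missing entries are skipped without an error.
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                if name.endswith(".pm"):
                    # normpath plays roughly the role of canonpath in the one-liner
                    found.append(os.path.normpath(os.path.join(dirpath, name)))
    return found
```

Calling find_perl_modules with the directories from @INC and printing each returned path should give a listing comparable to the one-liner's. As with the Perl version, a module can show up more than once if one search directory is nested inside another.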