Friday, October 30, 2009

Apache PDFBox

We have had a number of PDF-oriented projects in the past little while. Richard has brought to my attention an Apache Incubator project, PDFBox, which may be very handy for future work. In addition to the normal goodies one would expect, it supports "Lucene Search Engine Integration". Something to keep in mind.

Tuesday, October 27, 2009

Textométrie Project

The Textométrie Project, led by Serge Heiden and his collaborators, is a multi-institutional and multi-disciplinary effort to develop an open source, distributed platform for sophisticated, often quantitative, text analysis. The site has a useful discussion of the French tradition of textométrie and how it fits into other modes of text processing and mining, along with some recent publications and links to other software and resources. Alpha code is available on SourceForge.

OMNIA Project

Our colleagues at the Ecole nationale des chartes are working with other researchers in France on the OMNIA (Outils et Méthodes Numériques pour l’Interrogation et l’Analyse des textes médiolatins) project, a four-year effort to develop an interactive encyclopedia of medieval Latin.

Monday, October 26, 2009

Conference: Online Humanities Scholarship

Online Humanities Scholarship: The Shape of Things to Come "is a three day conference (March 26-8, 2010) to explore how to develop and sustain online humanities research and publication. Nine scholarly papers and eighteen responses will leverage discussion by a broad group of persons invited to the conference to contribute their expertise. This group includes scholars working on other projects and persons from funding agencies, publishers, museums, libraries, and professional organizations. The conference is closed to this group in order to provide maximum focus to the discussions."

This looks to be very interesting indeed. Have a peek at resources and participants. Papers and responses are to be posted well in advance of the meeting itself. Certainly something to keep track of.

Sunday, October 25, 2009

Total Perspective Vortex

Thinking about building a renvois navigation scheme, with some kind of visualization, for the Encyclopédie reminded me of the Total Perspective Vortex from the Hitchhiker's Guide to the Galaxy, the greatest selling electronic book in the history of the universe. It is important to note that "in an infinite universe, the one thing sentient life cannot afford to have is a sense of proportion." Thankfully, the renvois system is finite, so we won't risk brain vaporization. The original radio broadcast is available in bits and pieces on YouTube, with "Don't Panic" in large, friendly letters as the video track. :-) Aside from "Don't Panic", the Guide's best advice is "expect the unexpected".

Wednesday, October 21, 2009

Arbre généalogique: Static Image

We periodically get requests for a high resolution image of the splendid representation of the organization of knowledge in the Encyclopédie called ESSAI D'UNE DISTRIBUTION GÉNÉALOGIQUE DES SCIENCES ET DES ARTS PRINCIPAUX by Chrétien Frederic Guillaume Roth (1769), which we have put up under Zoomify. The static image is a 10 MB JPEG file, available here. Browsers beware. I like this image so much that I purchased a large reproduction and had it nicely framed. Yes, the framing cost more than the reproduction. Isn't that always the case? Manuel Lima has added the Essai to his stunning array of visualizations at Visual Complexity, which is well worth a visit, and linked it to a modern interactive representation of the Système Figuré des Connaissances Humaines by Christophe Tricot. The Encyclopédie Collaborative Translation Project has released an English translation of the Système Figuré.

Marti Hearst, Search User Interfaces

I have been reading Marti Hearst's excellent Search User Interfaces, which is available in full online. Of particular interest to me is her chapter on Information Visualization for Text Analysis. She writes that "the categorical nature of text, and its very high dimensionality, make it very challenging to display graphically" and goes on to present a number of ways to display text analysis results, from concordances to directed graphs. This is certainly something to consider for any future renovation of PhiloLogic and our related systems. We do have collocation clouds, and I did a quick implementation of word frequency histograms in PhiloLogic, but these are very rudimentary. Some of the examples in Hearst's book are quite remarkable, and we might want to model extensions of PhiloLogic on some of them.

One final note for you scribblers out there. She has a couple of entries about how she talked her publisher (Cambridge) into letting her put the book online for free, and why. :-)

An important and visually compelling site/book.