tag:blogger.com,1999:blog-28089690732173263202024-03-11T22:26:02.224-05:00Marco's Libro di ricordanzeThe "libro di ricordanze" was a kind of private notebook maintained by many Florentine merchants during the Renaissance to keep their affairs in good order. The contents of these journals ranged, like so many blogs of today, from personal and family matters to discussions and chronicles of public affairs.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comBlogger25125tag:blogger.com,1999:blog-2808969073217326320.post-49649299308605956102012-01-31T06:37:00.002-06:002012-02-09T05:31:04.751-06:00Scientist debunks flying myth. (Cambridge, UK)<b><span style="font-size: medium;">One of the most common myths in science, why aircraft fly, has been debunked by a Cambridge University scientist, Prof Holger Babinsky. </span></b><br />
<br />
<a href="http://www.youtube.com/watch?v=UqBmdZ-BNig">http://www.youtube.com/watch?v=UqBmdZ-BNig</a><br />
<br />
<b>By David Millward, and Nick Collins</b><br />
<div style="font-family: Verdana,sans-serif;"><br />
Aeroplanes can fly because their wings cause the air pressure underneath to be greater than that above, lifting them into the air.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">But engineers have for years been frustrated by a theory which wrongly explains what causes the change in pressure to occur.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">The myth is commonly found in school textbooks and aeroplane flight manuals, and is so widely believed that even Einstein was rumoured to subscribe to it.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">Now a Cambridge scientist has become so fed up with the bogus explanation that he has created a minute-long video to lay it to rest once and for all.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">The video, published on YouTube by Prof Holger Babinsky of the university’s engineering department, seeks to explain in simple terms why the myth goes against the laws of physics.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">According to conventional wisdom the pressure change happens because the air on the curved upper surface of the wing has further to travel than that below the flat underneath surface, meaning it must travel faster to arrive at the other side of the wing at the same time.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">In fact the real explanation is nothing to do with the distance the air has to travel. The curvature of the wing causes the change in air pressure because it pushes some of the air upwards, which reduces pressure, and forces the rest beneath it, creating higher pressure.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">A law known as the Bernoulli equation means that when pressure is lower, air moves faster – so the air stream above the wing does move more quickly than the one below, but this is not what causes the difference in pressure.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">Prof Babinsky proved his theory by filming smoke passing across a wing.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">If traditional wisdom had been correct the smoke above and below the wing should have reached the trailing edge at the same time.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">The video demonstrates that the explanation is fundamentally flawed because the plume above the wing reached the edge much sooner than the plume below.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">If the distance the air had to travel was causing the pressure to change, then a boat's sail – where the air travels the same distance on the inside and outside of the curve – would not work, Prof Babinsky said.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">He added: "I don’t know when the explanation first surfaced but it’s been around for decades. You find it taught in textbooks, explained on television and even described in aircraft manuals for pilots.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">"There is no law in physics which states when streams of particles start at the leading edge of the wing they should reach the trailing edge at the same time.</div><div style="font-family: Verdana,sans-serif;"><br />
</div><div style="font-family: Verdana,sans-serif;">"I've even heard a story that Einstein drew a design for an aircraft wing with a long, squiggly line on top of an aerofoil to make the distance for the air to travel greater, but this would not work."</div><div style="font-family: Verdana,sans-serif;"><b><br />
</b><br />
<b style="font-family: Verdana,sans-serif;"><span style="color: red;">Source: </span> <a href="http://www.telegraph.co.uk/science/science-news/9035708/Cambridge-scientist-debunks-flying-myth.html">http://www.telegraph.co.uk</a></b></div>Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-43202708850543518852011-08-26T14:26:00.015-05:002011-09-01T19:15:49.313-05:00Status of General Aviation: Some Notes/LinksHarry R. Clements, The Rise and Fall of General Aviation — An Economists View with Focus on Single Engine Aircraft and the Impact of Airline Deregulation (2000 conference <a href="http://www.citeulike.org/user/markymaypo/article/9714483">paper</a>)<br />
<br />
Matt Thurber, Free Fall: The Unexpected Decline of the Billion Dollar General Aviation Industry (Tab Book, 1995). <strike>Seems to be out of print</strike> (<a href="http://www.amazon.com/Free-Fall-Unexpected-Aviation-Industry/dp/0070645418">Amazon link</a>) Author indicates it was never published.<br />
<br />
Janet R. Daly <a href="http://academic.udayton.edu/JanetBednarek/Personal/janet_r.htm">Bednarek</a> and Michael H. Bednarek, <i>Dreams of Flight: General Aviation in the United States.</i> College Station, Tex.: Texas A&M University Press, 2003. Pp. xviii+191 (Reviewed in Technology and Culture, Volume 45, Number 3, July 2004, pp. 629-630). <span style="font-size: x-small;">"Overexpansion and increased federal regulation precipitated a steep and apparently permanent decline in sales of general aircraft [...] commercial and military aviation interests have largely succeeded in limiting general aviation’s influence. [...] Faced with declining numbers of pilots and increasing costs for flight training, aircraft, and maintenance, American general aviation enters its second century facing an uncertain future. Still, interest and enthusiasm among America’s general aviators remain strong, and new technologies such as ultralight aircraft continue to fuel a fascination with powered flight."</span> (TL 721.4.B38) Also see Bednarek's bibliography in <a href="http://www.wingsoverkansas.com/history/article.asp?id=668">General Aviation: An Overview.</a><br />
<br />
Joseph J. Corn, <i>The Winged Gospel</i> (OUP, 1983) [TL 521.C643]. <br />
<br />
<span class="essayText">Dominick Pisano, "</span><a href="http://www.centennialofflight.gov/essay/Social/SH-OV2.htm">The Social and Cultural History of Aviation and Spaceflight</a>" 2003<br />
<br />
J.G. Wensveen, Air Transportation: A Management Perspective (6th ed, 2007, <a href="http://www.scribd.com/doc/54449149/Air-Transportation-a-Management-Perspective">link</a>), see section 4, The General Aviation Industry, pp <a href="http://www.scribd.com/doc/54449149/Air-Transportation-a-Management-Perspective#page=134">111ff</a>, for a useful overview.<br />
<br />
General Aviation Statistics (<a href="http://www.aopa.org/whatsnew/stats/statistics.html">AOPA</a>)<br />
<a href="http://www.faa.gov/data_research/aviation_data_statistics/">FAA Aviation Statistics</a> <br />
<br />
James Lardner and Robert Kuttner, <a href="http://www.demos.org/publication.cfm?currentpublicationID=13A2281E-3FF4-6C82-5EB53F023A456D3A">Flying Blind: Airline Deregulation Reconsidered</a> (Demos, June 24, 2009).<br />
<br />
GAO: <a href="http://www.gao.gov/new.items/d06630.pdf">AIRLINE DEREGULATION</a>: Reregulating the Airline Industry Would Likely Reverse Consumer Benefits and Not Save Airline Pensions (2006).<br />
<br />
Matt Thurber, <a href="http://aerospaceblog.wordpress.com/2011/03/01/can-general-aviation-reverse-its-decline/">Can general aviation reverse its decline</a> (Blog: 03-2011)<br />
<br />
Robert Goyer, <a href="http://www.flyingmag.com/blogs/going-direct/flying-really-expensive">Is Flying really that Expensive</a>? (Blog: 03-2011) References an APA whitepaper: "<a href="http://www.scribd.com/doc/46602443/The-Role-of-Aircraft-Prices-in-the-Decline-and-Renewal-of-General-Aviation">Role of Prices</a> in the Decline and Renewal of General Aviation" (2009). Both make the case that expenses apart from A/C have tracked inflation. A/C prices have risen at 2.5-4x the rate of inflation, and the hourly cost of operation now far exceeds rental rates for most users. Goyer has another blog post, <a href="http://www.flyingmag.com/blogs/going-direct/part-23-do-over">Part 23 Do Over</a>, where he mentions a number of experimentals which he would never fly again, but does not list them. One comment: "Come on Robert! First you dangle your list of scariest homebuilts right in front of our noses; Then you yank it away again [...] Be a man and fess up the list. Informing the public about designs which you believe to be really scary to fly would be a valuable public service. Perhaps your bosses are more worried about the flood of lawsuits such an admission might cause,"<br />
<br />
Kerry Kovarik, A Good Idea Stretched Too Far: Amending the General Aviation Revitalization Act to Mitigate Unintended Inequities (<a href="http://www.citeulike.org/user/markymaypo/article/9712863">article</a>, 2008) <br />
<br />
Eric A. Helland, Alexander T. Tabarrok, Product Liability and Moral Hazard: Evidence from General Aviation (<a href="http://www.citeulike.org/user/markymaypo/article/9712918">article</a>, 2008)<br />
<br />
Eric Helland and Alex Tabarrok, Crash and Learn: Consumption Externalities and the Reduction of Aircraft Accidents (<a href="http://www.claremontmckenna.edu/fei/papers/">working paper</a>, 2007)<br />
<br />
Phillip J. Kolczynski, <a href="http://www.avweb.com/news/avlaw/181905-1.html">GARA: A Status Report</a> (Blog, 2001) <br />
<br />
Scott Tarry, "Issue Definition, Conflict Expansion, and Tort Reform: Lessons from the American General Aviation Industry" (<a href="http://www.citeulike.org/user/markymaypo/article/9712618">article</a>, 2001)<br />
<br />
William Keith Stockman, <a href="http://dodreports.com/pdf/ada318874.pdf">The Crash of General Aviation: A Public Choice Perspective</a> (PhD Dissertation, George Mason University, 1996). Argues that "rent seekers" are the primary cause of the decline of general aviation. "The industry and its users have fallen victim to the successful rentseeking of others and have only recently had any success in reversing this trend. {MVO notes passage of GARA} Though durable good's models do offer some explanations for the industries woes, the majority of the evidence points to a public choice explanation. Thus, the industry should look toward public choice solutions if it desires to reverse the current trend" (p. 8)<br />
<br />
Lawrence J. Truitt, Scott E. Tarry, The Rise and Fall of General Aviation: Product Liability, Market Structure, and Technological Innovation (<a href="http://www.citeulike.org/user/markymaypo/article/9712519">article</a>, 1995)<br />
<br />
GARA: The General Aviation Revitalization Act of 1994 (Public Law 103-298)<br />
<a href="http://www.avweb.com/news/news/184254-1.html">The complete text of the 1994 law</a>.<br />
<br />
<span id="ID3d">United States Airline Deregulation Act (1978): Alfred Kahn <a href="http://www.econlib.org/library/Enc/AirlineDeregulation.html">Overview</a>. </span><br />
<br />
Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-64339758127610647542010-06-12T11:05:00.002-05:002010-06-12T11:26:07.056-05:00MotoCzysz E1PCFinally, an electric bike that packs some serious performance. The <a href="http://www.motoczysz.com/main.php?area=home">E1PC </a>produces 100 hp and can top out at 140 mph or more. Not close to the stock <a href="http://en.wikipedia.org/wiki/Suzuki_Hayabusa">Hayabusa</a>, but still very impressive. True, it broke down at the recent Isle of Man TT, but this may well be a peek at "<a href="http://www.wired.com/autopia/2010/06/motoczysz-e1pc/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+wired%2Findex+%28Wired%3A+Index+3+%28Top+Stories+2%29%29">the motorcycle of the future</a>". There is no indication of weight, and the cost is astronomical at this time. Well, they are using batteries built by the same folks that build 'em for NASA. Check out a related article <a href="http://www.popsci.com/cars/article/2010-06/inside-story-motoczysz-e1pc-worlds-most-advanced-electric-motorcycle">here</a>. My next bike?Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-60072080315775943672010-05-21T12:05:00.001-05:002010-05-21T12:05:37.379-05:00More Google GoodiesRuss pointed out two new Google APIs which are of particular interest. The <a href="http://code.google.com/apis/predict/">Prediction API</a> "enables access to Google's machine learning algorithms to analyze your historic data and predict likely future outcomes". It will take your labeled data, run supervised learning algorithms, and allow you to predict. I've not looked too hard, but I don't see just which ML algorithms they are offering (Naive Bayes, SVM, KNN, etc). But how cool is this? Well, pretty cool. 
But Russ also points out <a href="http://code.google.com/apis/bigquery/">BigQuery</a> by Google: "a web service that enables you to do interactive analysis of massively large datasets. Scalable and easy to use, BigQuery lets developers and businesses tap into powerful data analytics on demand". Terabytes, billions of rows per second. Wow.<br /><br />Also check out <a href="http://googleresearch.blogspot.com/2010/04/lessons-learned-developing-practical.html">Lessons learned developing a practical large scale machine learning system</a>. And the first lesson? "Keep it simple (even at the expense of a little accuracy)".<br /><br />Now these, all by themselves, are a really good reason to learn Python. :-)Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-56289890061697833652009-12-01T09:14:00.003-06:002009-12-01T09:43:22.885-06:00Berkeley DB/GDBM Links<a href="http://philologic.uchicago.edu/">PhiloLogic</a> uses <a href="http://www.gnu.org/software/gdbm/">GDBM</a> (GNU Database Manager) for word searches. As we are starting to think about a new PhiloLogic series (the infamous "4"), we have been looking at a number of design and implementation issues, including advanced indexing schemes. For example, Clovis did a preliminary examination of various <a href="http://artfl.blogspot.com/2009/07/looking-at-different-implementations-of.html">fuzzy matching</a> systems. Richard and I have been starting to look at newer GDBM tools as well as <a href="http://en.wikipedia.org/wiki/Berkeley_DB">Berkeley DB</a>. 
Here are a few links and alternatives Richard proposed which I think we should experiment with and/or read: <br /><blockquote>Older perl-5 style: the tie function:<br /><a href="http://perldoc.perl.org/functions/tie.html" target="_blank">http://perldoc.perl.org/<wbr>functions/tie.html</a><br />This lets you tie any complex data structure into a perl scalar, array, or hash, as you wish.<br /><a href="http://perldoc.perl.org/perltie.html" target="_blank">http://perldoc.perl.org/<wbr>perltie.html</a><br />This is great for "hiding" object-oriented interfaces in a simple, "perl-ish" way. It can wrap GDBM or Berkeley, or MySQL, or SQLite, or Hadoop...and so on.<br /><br />The low-level perl GDBM_File module:<br /><a href="http://search.cpan.org/%7Edapm/perl-5.10.1/ext/GDBM_File/GDBM_File.pm" target="_blank">http://search.cpan.org/~dapm/<wbr>perl-5.10.1/ext/GDBM_File/<wbr>GDBM_File.pm</a><br />tie and dbmopen both use this core perl module. On some Macs, I have had to recompile perl to get GDBM_File working. You can't get it from CPAN.<br /><br />The low-level perl Berkeley DB module:<br /><a href="http://search.cpan.org/%7Epmqs/BerkeleyDB-0.39/BerkeleyDB.pod" target="_blank">http://search.cpan.org/~pmqs/<wbr>BerkeleyDB-0.39/BerkeleyDB.pod</a><br />Pretty nice, but doesn't support all of the awesome Berkeley DB features, like joins. The Python binding will do joins for you, at C speed. The $db->associate($secondary, \&key_callback) function lets you automatically maintain a secondary index. The DBM Filter functionality will do customized byte packing and unpacking for you transparently.<br /><br />One other DBM product you might want to look at is Tokyo Cabinet:<br /> <a href="http://1978th.net/tokyocabinet/" target="_blank">http://1978th.net/<wbr>tokyocabinet/</a><br />It runs Mixi, the Japanese equivalent of Facebook, as I understand it, and some googling suggests that it's quite hot in the NoSQL world. 
It's certainly faster than BerkeleyDB, and lightweight, and has nice Ruby bindings--Perl, not so hot.<br />Some people claim it's more stable than Berkeley. There's an impressive set of benchmarks here:<br /> <a href="http://tokyocabinet.sourceforge.net/benchmark.pdf" target="_blank">http://tokyocabinet.<wbr>sourceforge.net/benchmark.pdf</a><br />This should be compared with Oracle's benchmarks:<br /> <a href="http://www.oracle.com/technology/products/berkeley-db/pdf/berkeley-db-perf.pdf" target="_blank">http://www.oracle.com/<wbr>technology/products/berkeley-<wbr>db/pdf/berkeley-db-perf.pdf</a><br />which shows bulk read of 5,000,000 records/sec. "un de ces" indeed.<br /><br />Berkeley has more features; Tokyo might be faster. We'd probably want to test both of them out at scale to see how they hold up. Tokyo is designed to do cloud-style partitioning and stuff.<br /><br />We also might want to look at Project Voldemort, which runs LinkedIn:<br /><a href="http://project-voldemort.com/" target="_blank">http://project-voldemort.com/</a><br />This one keeps its database in-memory, and has really sophisticated protocols for distributed hash tables, load balancing, consistency, etc.<br /></blockquote>Lots to think about indeed. And he sent along a little script as an example to tinker with, which I won't post here....<br /><br /> Thanks, Richard.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-19041072499726323332009-12-01T08:40:00.002-06:002009-12-01T09:12:15.312-06:00Conventional versus electron flow<div style="text-align: center;"><i>The nice thing about standards is that there are so many of them to choose from.</i><br /></div><br />Having a couple of motorcycles means that I also own a couple of battery chargers. During the winter, you need to keep the batteries charged when not riding, since letting one go completely flat wrecks it. 
Even during riding season, given the combination of big engines, small batteries, and periods when one does not ride much, you can don the helmet and leathers, stride confidently to the machine of choice, and get .... nothing. I was charging the 'buza's battery the other day, and asked the simple question: "which way does electricity flow"? Sure, in a DC system, electricity flows from the positive (red) to the negative (black) posts. That's odd. Why would negatively charged electrons move from the positive to the negative? Shouldn't they move from negative to positive? The amount of misinformation on the Internet regarding this simple question is staggering. Thankfully, Tony R. Kuphaldt provides the answer in his discussion of "<a href="http://www.allaboutcircuits.com/vol_1/chpt_1/7.html">Conventional versus electron flow</a>" in the very helpful <a href="http://www.allaboutcircuits.com/">All About Circuits</a> site. <br /><br />There are two ways to consider the direction of electricity flow, which are pretty much contradictory. The "conventional" view, where electricity flows from positive to negative, dates back to Franklin. He detected flow and assumed that there was a surplus of charge (hence positive) on one pole and a lack of charge on the other (hence negative). Of course, many years later, it was found that the true flow of electrons was the opposite, from negative to positive. <br /><blockquote><p> By the time the true direction of electron flow was discovered, the nomenclature of "positive" and "negative" had already been so well established in the scientific community that no effort was made to change it, although calling electrons "positive" would make more sense in referring to "excess" charge. You see, the terms "positive" and "negative" are human inventions, and as such have no absolute meaning beyond our own conventions of language and scientific description. 
Franklin could have just as easily referred to a surplus of charge as "black" and a deficiency as "white," in which case scientists would speak of electrons having a "white" charge (assuming the same incorrect conjecture of charge position between wax and wool). </p> <a name="Conventional flow"></a> <a name="Electron flow"></a> <a name="Flow, electron vs. conventional"></a> <p>However, because we tend to associate the word "positive" with "surplus" and "negative" with "deficiency," the standard label for electron charge does seem backward. Because of this, many engineers decided to retain the old concept of electricity with "positive" referring to a surplus of charge, and label charge flow (current) accordingly. This became known as <i>conventional flow</i> notation</p></blockquote>He goes on to discuss the distinction in useful detail, concluding<br /><blockquote>I sometimes wonder if it would all be much easier if we went back to the source of the confusion -- Ben Franklin's errant conjecture -- and fixed the problem there, calling electrons "positive" and protons "negative."<br /></blockquote>Mystery resolved. Now, if I can only figure out how to get light bulbs out of sockets on ceiling fan/light systems without breaking them, I will be forever grateful. The vibration of the fan tends to wedge them in pretty tightly.....Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-56746570659475084342009-11-25T15:08:00.003-06:002009-11-25T15:16:58.244-06:00Projekt DeutschDiachronDigitalAlain suggests that <a href="http://www.deutschdiachrondigital.de/">Projekt DeutschDiachronDigital </a>is involved in some interesting efforts that might be related to some work we are doing. There are a number of useful papers on the project's <a href="http://www.deutschdiachrondigital.de/publikationen/index.php">publication list</a>, including<br /><blockquote><span style="font-size:85%;">Lukas C. 
Faulstich, Ulf Leser und Anke Lüdeling. <a href="http://www.deutschdiachrondigital.de/publikationen/DDDsearch.pdf"><em>Storing and Querying Historical Texts in a Relational Database</em></a>. Informatik-Bericht Nr.176 des Instituts für Informatik der Humboldt-Universität zu Berlin, Februar 2005<br /><br />Lukas C. Faulstich, Ulf Leser und Thorsten Vitt. <a href="http://www.deutschdiachrondigital.de/publikationen/qlqp.pdf"><em>Implementing a Linguistic Query Language for Historic Texts</em></a>. Query Languages and Query Processing (QLQP-2006): 11th Intl. Workshop on Foundations of Models and Languages for Data and Objects (FMLDO), 2006.</span></blockquote>Interesting to see they are using SQL to power this project.<br /><br /><br /><br /><a href="http://www.deutschdiachrondigital.de/publikationen/DDDsearch.pdf"><em></em></a>Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-34310677839686212442009-11-23T13:43:00.003-06:002009-11-25T10:37:19.612-06:00Geoffrey Rockwell's DHCS NotesGeoffrey Rockwell has posted his <a href="http://dhcs2009.iit.edu/">DHCS</a> Notes (<a href="http://www.philosophi.ca/pmwiki.php/Main/DigitalHumanitiesAndComputerScience">link</a>), which includes a rather provocative declaration that we might be witnessing the end of Digital Humanities:<br /><br /><strong></strong><blockquote><span style="font-size:85%;"><strong>The End of Digital Humanities</strong> I can't help thinking (with just a little evidence) that the age of funding for digital humanities is coming to an end. Let me clarify this. My hunch is that the period when any reasonable digital humanities project seemed neat and innovative is coming to an end and that the funders are getting tired of more tool projects. I'm guessing that we will see a shift to funding content driven projects that use digital methodologies. 
Thus digital humanities programs may disappear and the projects are shunted into content areas like philosophy, English literature and so on. Accompanying this is a shift to thinking of digital humanities as infrastructure that therefore isn't for research funding, but instead should be run as a service by professionals. This is the "stop reinventing wheel" argument and in some cases it is accompanied by coercive rhetoric to the effect that if you don't get on the infrastructure bandwagon and use standards then you will be left out (or not funded.) I guess I am suggesting that we could be seeing a shift in what is considered legitimate research and what is considered closed and therefore ready for infrastructure. The tool project could be on the way out as research as it is moved as a problem into the domain of support (of infrastructure.) Is this a bad thing? It certainly will be a good thing if it leads to robust and widely usable technology. But could it be a cyclical trend where today's research becomes tomorrows infrastructure to then be rediscovered later as a research problem all over.</span></blockquote>Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-56052089383712954382009-11-23T11:39:00.002-06:002009-11-23T11:51:47.311-06:00TXM Search EngineSerge Heiden suggests that we look at the <a href="http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/">CQP</a> (Corpus Query Processor) and its successors which he is using in <a href="http://textometrie.ens-lsh.fr/">TXM</a>: <br /><br />-- Tiger Search : <a href="http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/" target="_blank">http://www.ims.uni-stuttgart.<wbr>de/projekte/TIGER/TIGERSearch/</a><br />- the corresponding PhD (in german) : <a href="http://www.ims.uni-stuttgart.de/projekte/corplex/paper/lezius/diss/" target="_blank">http://www.ims.uni-stuttgart.<wbr>de/projekte/corplex/paper/<wbr>lezius/diss/</a><br />-- NXT 
Search : <a href="http://www.ims.uni-stuttgart.de/projekte/nite/" target="_blank">http://www.ims.uni-stuttgart.<wbr>de/projekte/nite/</a><br />- related doc and technical papers :<br />- <a href="http://groups.inf.ed.ac.uk/nxt/nxtdoc/docnql.xml" target="_blank">http://groups.inf.ed.ac.uk/<wbr>nxt/nxtdoc/docnql.xml</a><br />- <a href="http://www.ltg.ed.ac.uk/NITE/documents/NiteQL.v2.1.pdf" target="_blank">http://www.ltg.ed.ac.uk/NITE/<wbr>documents/NiteQL.v2.1.pdf</a><br />- <a href="http://www.ltg.ed.ac.uk/NITE/papers/NXT-LREJ.web-version.ps" target="_blank">http://www.ltg.ed.ac.uk/NITE/<wbr>papers/NXT-LREJ.web-version.ps</a><br /><br />Since we're looking at various interesting models, I don't want to forget the CDL's <a href="http://www.cdlib.org/inside/projects/xtf/">XTF</a> (eXtensible Text Framework).Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-16276360346612039242009-11-13T10:19:00.002-06:002009-11-13T10:34:13.408-06:00International Journal of Motorcycle StudiesRan across a call for papers for <a href="http://www.h-net.org/announce/show.cgi?ID=171916">INTERNATIONAL JOURNAL OF MOTORCYCLE STUDIES CONFERENCE</a>, Colorado Springs, Colorado, June 3-6, 2010 with links to the <a href="http://ijms.nova.edu/">journal</a>. Biographical statement and an abstract of 150 words by January 15, 2010. 
Aside from pulling an abstract together, the only serious question is whether to take the <a href="http://www.motorcyclecruiser.com/roadtests/honda_shadow_aero_1100/index.html">Aero</a> or the <a href="http://en.wikipedia.org/wiki/Suzuki_Hayabusa">Buza</a> out to the conference.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-84826206586432209582009-11-12T13:40:00.003-06:002009-11-13T10:55:32.926-06:00DHCS 2009The 4th annual <a href="http://lingcog.iit.edu/%7Edhcs2009/">Chicago Colloquium on Digital Humanities and Computer Science</a> (DHCS) is fast approaching. This year's festivities are hosted by Shlomo Argamon and his collaborators at the Illinois Institute of Technology, November 14-16. The <a href="http://lingcog.iit.edu/%7Edhcs2009/fullprogram.html">program</a> is interesting and wide-ranging and I am particularly looking forward to the presentations by our <a href="http://lingcog.iit.edu/%7Edhcs2009/keynotes.html">keynote speakers</a>. Several of the ARTFL group will be giving presentations at the pre-conference meetings and workshops on Saturday, also known as our "<a href="http://lingcog.iit.edu/%7Edhcs2009/workshops.html">Birds of a Feather</a>" meeting. Clovis and I will be talking about recent work, based in part on two talks, "<a href="http://docs.google.com/present/view?id=ddj2s2rb_208dw9mcntk&skipauth=true">From Words to Works</a>" and "<a href="http://docs.google.com/present/view?id=ddj2s2rb_3f3cjsp8c&skipauth=true">PAIR/PhiloLine</a>", as well as some of the more recent work on <a href="http://artfl.blogspot.com/search/label/Topic%20modeling">topic modeling</a>. I also prepared a more technical PhiloLogic overview and demonstration, followed by a discussion of database loading and configuration (<a href="http://docs.google.com/present/view?id=ddj2s2rb_54dtntgcp6">slides</a>), just in case I need one. 
The second half should probably be expanded at some point, since I have had many requests for more extensive documentation on loading and configuring databases in PhiloLogic.<br /><br />Other links: PhiloLine/PAIR installations for <a href="http://artfl-project.uchicago.edu/node/91">ARTFL Frantext and the Encyclopédie</a>.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-85787712326652385142009-11-10T11:26:00.002-06:002009-11-10T11:37:33.511-06:00Plaintext in PhiloLogicA while back we added a plaintext loader to <a href="http://philologic.uchicago.edu">PhiloLogic</a> at the request of several folks who wanted to work with documents from <a href="http://www.gutenberg.org/wiki/Main_Page">Project Gutenberg</a>, <a href="http://www.liberliber.it/">Liber Liber</a> and (many) other archives of unencoded or minimally encoded documents. Other use cases for a plaintext loader include direct loading of OCR output and downloading E-PUBs from Google, which can also be <a href="http://artfl.blogspot.com/2009/09/epub-to-tei-lite-converter.html">converted to TEI</a> as an alternative. I suspect that we will want to retain plaintext loading for implementations of PhiloLogic, since many folks appear to have significant restrictions on accessing materials from various vendors. 
In a <a href="http://dhigger.blogspot.com/2009/08/creating-corpus-1-pulling-texts.html">recent blog post</a>, Devin Griffiths described his examination of MONK and ProQuest data and his decision to assemble his own corpus from Project Gutenberg.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-57569764697578948842009-11-04T11:02:00.002-06:002009-11-04T11:46:12.475-06:00Find Installed Perl ModulesHere is a helpful one-liner to find installed Perl modules, thanks to <a href="http://www.sitepoint.com/blogs/2004/09/02/finding-installed-perl-modules/">Blane Warrene</a>:<br /><br />perl -MFile::Find=find -MFile::Spec::Functions -lwe 'find { wanted => sub { print canonpath $_ if /\.pm\z/ }, no_chdir => 1 }, @INC'<br /><br />There is also an interactive function called <a href="http://www.cyberciti.biz/faq/how-do-i-find-out-what-perl-modules-already-installed-on-my-system/">instmodsh</a>.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-13785865740666162232009-10-30T13:52:00.003-05:002009-10-30T13:54:52.675-05:00Apache PDFBoxWe have had a number of PDF-oriented projects in the past little while. Richard has brought to my attention an Apache Incubator project, <a href="http://incubator.apache.org/pdfbox/">PDFBox</a>, which may be very handy for future work. In addition to the normal goodies one would expect, it supports "Lucene Search Engine Integration". 
Something to keep in mind.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-71638656766530938262009-10-27T12:42:00.002-05:002009-10-27T12:57:45.720-05:00Textométrie ProjectThe <a href="http://textometrie.ens-lsh.fr/">Textométrie Project</a> is a multi-institutional and multi-disciplinary effort, led by <a href="http://textometrie.ens-lsh.fr/spip.php?article9">Serge Heiden</a> and his collaborators, to develop an open source and distributed platform for sophisticated, often quantitative, text analysis. There is a useful discussion of the French tradition of <a href="http://textometrie.ens-lsh.fr/spip.php?article69#definition">textométrie</a> and how this fits into other modes of text processing and mining, along with some recent <a href="http://textometrie.ens-lsh.fr/spip.php?article62">publications</a>, and <a href="http://textometrie.ens-lsh.fr/spip.php?article70">links</a> to other software and resources. Alpha code is available on <a href="http://sourceforge.net/projects/textometrie/">Sourceforge</a>.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-39600540957596826092009-10-27T12:25:00.002-05:002009-10-27T12:33:49.818-05:00OMNIA ProjectOur colleagues at the <a href="http://www.enc.sorbonne.fr/">Ecole nationale des chartes</a> are working with other researchers in France on the <a href="http://cem.revues.org/index11086.html">OMNIA</a> (Outils et Méthodes Numériques pour l’Interrogation et l’Analyse des textes médiolatins) project, a four-year effort to develop an interactive encyclopedia of medieval Latin.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-57697403776029512452009-10-26T16:55:00.003-05:002009-10-26T17:03:11.115-05:00Conference: Online Humanities Scholarship<a href="http://www.shapeofthings.org/">Online Humanities Scholarship: 
The Shape of Things to Come</a> "is a three day conference (March 26-8, 2010) to explore how to develop and sustain online humanities research and publication. Nine scholarly papers and eighteen responses will leverage discussion by a broad group of persons invited to the conference to contribute their expertise. This group includes scholars working on other projects and persons from funding agencies, publishers, museums, libraries, and professional organizations. The conference is closed to this group in order to provide maximum focus to the discussions."<br /><br />This looks to be very interesting indeed. Have a peek at <a href="http://www.shapeofthings.org/resources.html">resources</a> and <a href="http://www.shapeofthings.org/participants.html">participants</a>. Papers and responses are to be posted well in <a href="http://www.shapeofthings.org/advance.html">advance</a> of the meeting itself. Certainly something to keep track of.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-26186326404633394722009-10-25T11:00:00.005-05:002009-10-25T11:52:16.055-05:00Total Perspective VortexThinking about building a <a href="http://artfl.blogspot.com/2009/10/encyclopedie-renvois-searchlinker.html">renvois navigation</a> scheme, with some kind of visualization, for the <a href="http://encyclopedie.uchicago.edu/">Encyclopédie</a>, reminded me of the <a href="http://en.wikipedia.org/wiki/Technology_in_The_Hitchhiker%27s_Guide_to_the_Galaxy#Total_Perspective_Vortex">Total Perspective Vortex</a> from the <a href="http://en.wikipedia.org/wiki/The_Hitchhiker%27s_Guide_to_the_Galaxy">Hitchhiker's Guide to the Galaxy</a>, the greatest selling electronic book in the history of the universe. It is important to note that "in an infinite universe, the one thing sentient life cannot afford to have is a sense of proportion." Thankfully, the renvois system is finite, so we won't risk brain vaporization. 
The original radio broadcast is available in bits and pieces on <a href="http://www.youtube.com/profile?user=XistundTheBookKeeper#p/u/48/ZnJFo2hSe7Q">YouTube</a>, with Don't Panic in large, friendly letters as the video track. :-) The Guide's best advice is, aside from Don't Panic, "<a href="http://www.youtube.com/profile?user=XistundTheBookKeeper#p/u/14/M9hOOfZz0SY">expect the unexpected</a>".Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-16142361438403906982009-10-21T18:09:00.005-05:002009-10-22T08:50:04.000-05:00Arbre généalogique: Static ImageWe periodically get requests for a high-resolution image of the splendid representation of the organization of knowledge in the <a href="http://encyclopedie.uchicago.edu/">Encyclopédie</a> called <a href="http://encyclopedie.uchicago.edu/node/130">ESSAI D'UNE DISTRIBUTION GÉNÉALOGIQUE DES SCIENCES ET DES ARTS PRINCIPAUX</a> by <span>Chrétien Frederic Guillaume Roth (1769), </span>which we have put up under Zoomify. The static image is a 10 MB jpeg file, available <a href="http://artfl.uchicago.edu/cactus/cactus.jpg">here</a>. Browsers beware. I like this image so much, I <a href="http://www.scienceandsociety.co.uk/results.asp?image=10318940">purchased</a> a large reproduction and had it nicely framed. Yes, the framing cost more than the reproduction. Isn't that always the case? Manuel Lima has added the <a href="http://www.visualcomplexity.com/vc/project.cfm?id=387">Essai</a> to his stunning array of visualizations at <a href="http://www.visualcomplexity.com/vc/">Visual Complexity</a>, which is well worth the visit, and linked it to a modern <a href="http://www.visualcomplexity.com/vc/project.cfm?id=288">interactive representation</a> of the <a href="http://encyclopedie.uchicago.edu/node/90">Système Figuré des Connaissances Humaines</a> by <span class="bodytext">Christophe Tricot</span>. 
The <a href="http://quod.lib.umich.edu/d/did/index.html">Encyclopédie Collaborative Translation Project</a> has released an <a href="http://quod.lib.umich.edu/d/did/tree.html">English translation</a> of the Système Figuré.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-40678182973769873862009-10-21T11:46:00.004-05:002009-10-21T18:35:10.582-05:00Marti Hearst, Search User InterfacesI have been reading Marti Hearst's excellent <span style="font-style: italic;">Search User Interfaces,</span> which is fully available at <a href="http://www.searchuserinterfaces.com/">http://www.searchuserinterfaces.com/</a>. Of particular interest to me is her chapter on <a href="http://searchuserinterfaces.com/book/sui_ch11_text_analysis_visualization.html">Information Visualization for Text Analysis</a>. She writes "the categorical nature of text, and its very high dimensionality, make it very challenging to display graphically" and goes on to present a number of ways to handle display of text analysis results, from concordances to directed graphs. This is certainly something to consider for any future renovation of <a href="http://philologic.uchicago.edu/">PhiloLogic</a> and our related systems. We do have <a href="http://artfl.blogspot.com/2009/08/collocation-notes.html">collocation clouds</a> and I did a quick implementation of word frequency histograms (<a href="http://philologic.uchicago.edu/wiki/index.php/Optional_Code#Simple_Time_Period_Histogram_by_Rate.2F1000_words">link</a>) in PhiloLogic. But these are very rudimentary. Some of the examples in Hearst's book are quite remarkable and we might want to model extensions of PhiloLogic on some of these.<br /><br />One final note for you scribblers out there. 
She has a couple of entries on <a href="http://www.searchuserinterfaces.com/blog/" target="_blank">http://www.<wbr>searchuserinterfaces.com/blog/</a> about how she talked her publisher (Cambridge) into letting her put the book online for free and why. :-)<br /><br />An important and visually compelling site/book.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-38524033418609811152009-05-18T15:21:00.002-05:002009-10-21T11:36:51.664-05:00Yoga for cyclistsRiding season has started again, so this should be obvious <br />[<a href="http://www.youtube.com/v/YC4JzHJr_6Y&hl=en&fs=1">YouTube</a>].Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-26767235139006030002008-07-02T12:02:00.009-05:002008-07-02T13:12:14.407-05:00Shingles and Near Duplicate DetectionSergei Vassilvitskii of Yahoo! has a useful <a href="http://www1.cs.columbia.edu/%7Eradev/set/DupeDetection.pdf">ppt</a> describing work to identify duplicate and near duplicate pages on the Web using shingles. He claims that 25%-40% of all WWW documents are duplicates or near duplicates. Hashing whole documents cannot identify near duplicates, while edit distance will not scale. The approach instead hashes a small number of shingles (n-grams) per document and calculates similarity by the rate at which the mini-hashes agree. The ppt also has a useful discussion of <a href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard</a> similarities. The talk is based on Andrei Broder's (AltaVista and Yahoo!) work, described in <a href="http://www.citeulike.org/user/markymaypo/article/2939265">Identifying and filtering near-duplicate documents</a> and previous papers cited there. 
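<br /><br />A minimal sketch of the shingle-and-hash idea in Python may make this more concrete. It is an illustration only, not the code from the talk or from Broder's papers: the function names, the three-word shingle size, the 64 hash functions and the use of salted BLAKE2 hashes are all assumptions made for the example. It builds word n-gram shingles, computes the exact Jaccard similarity for comparison, and then estimates the same quantity from the rate at which the per-seed minimum hashes (the scheme usually called MinHash) agree.<br /><br />

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles (word n-grams) in text."""
    words = text.lower().split()
    # A text shorter than k words yields a single shingle holding all of it.
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Exact Jaccard similarity: size of intersection over size of union."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


def minhash_signature(shingle_set, num_hashes=64):
    """Keep one minimum hash value per seeded hash function (hypothetical helper)."""
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode()  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in shingle_set
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions on which the two documents agree."""
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy cat near the river bank"
sa, sb = shingles(doc_a), shingles(doc_b)
print("exact Jaccard:", jaccard(sa, sb))
print("estimate:", estimated_similarity(minhash_signature(sa), minhash_signature(sb)))
```

<br /><br />The estimate gets closer to the exact Jaccard value as the number of hash functions grows; the point of the signature is that only a short, fixed-size list has to be stored and compared per document, rather than the full shingle set.<br /><br />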
There are other commercial applications of this approach, such as <a href="http://www.equivio.com/">Equivio</a>'s near-duplicate identification service, which uses a related <a href="http://www.equivio.com/FAQ.shtml#T8">similarity measure</a>.<br /><br />While I am at it, have a look at <a href="http://glinden.blogspot.com/2008/04/detecting-near-duplicates-in-big-data.html">Detecting Near Duplicates in Big Data</a> for pointers to recent work at Google on the same problem. See also the recent <a href="http://www.uni-weimar.de/medien/webis/research/pan-07/program.html">International Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN)</a>.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-44133012574147673472008-06-24T13:42:00.004-05:002008-06-24T14:01:13.166-05:00Datawocky: More data and human evaluationAnand Rajaraman in Datawocky makes the case that <a href="http://anand.typepad.com/datawocky/2008/03/more-data-usual.html">more data usually beats better algorithms</a> by reference to the Netflix challenge and provides a little more detail in <a href="http://anand.typepad.com/datawocky/2008/04/data-versus-alg.html">part two</a> of the same post. He also notes that Google continues to use human evaluation as part of its search algorithm tuning in <a href="http://anand.typepad.com/datawocky/2008/05/are-human-experts-less-prone-to-catastrophic-errors-than-machine-learned-models.html">Are Machine-Learned Models Prone to Catastrophic Errors?</a>, suggesting that machine learning, based on seen instances, can suffer from the "Black Swan" problem. Finally, he makes the case, based on another blog entry, that one should <a href="http://anand.typepad.com/datawocky/2008/06/change-the-algorithm-not-the-dataset.html">Change the algorithm, not the dataset</a> if your approach can't handle the scale of data you are throwing at it. Interesting comments all. 
A blog to watch.Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-15205211156255206792008-05-26T13:38:00.007-05:002009-10-21T11:33:53.712-05:00From Words to Works: Machine Learning and Text Mining at ARTFLI recently had the opportunity to present an overview of our current work in machine learning and text mining at the 2008 meeting of Technological Innovation and Cooperation for Foreign Information Access (TICFIA), held in <a href="http://dsal.uchicago.edu/workspace/ticfia_2008/">Chicago</a> on the first of May. [<a href="http://docs.google.com/Present?docid=ddj2s2rb_46d8zzbfff&skipauth=true">slides</a>]Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.comtag:blogger.com,1999:blog-2808969073217326320.post-79016966752906916832008-05-21T11:46:00.006-05:002009-10-21T11:37:51.661-05:00Similarity as a Scholarly PrimitiveI gave this <a href="http://projectbamboo.uchicago.edu/what-foursix">4/6 talk</a> at the Chicago <a href="http://projectbamboo.uchicago.edu/">Bamboo Project</a> Workshop last week. I used Google's Presentation system in place of PowerPoint, which allows you to present with only a browser and to embed the talk in posts. Very handy, particularly since one can collaborate with others and provide links to the full screen presentation [Click <a href="http://docs.google.com/Present?id=ddj2s2rb_79hpmrgphh&skipauth=true">here</a>].Markhttp://www.blogger.com/profile/01834980565423639300noreply@blogger.com