Friday, September 25, 2009

Week 4 (or 5) Sept. 22-29

Muddiest Point

I would like to learn more about the underlying structure of the various databases I use as a researcher. What does a keyword search in a database look like from a technical perspective? Do keyword searches differ, from a technical perspective, from searches based on controlled subject headings, such as the LC subject headings? And how do databases such as JSTOR rank results: what is the basis for the ranking?

Reading notes:

I enjoyed reading the behind-the-scenes report on the production of the digital Imagining Pittsburgh collection, produced under the lead of the DRL with an IMLS grant. It was not only interesting from a technical perspective, but was also a refreshingly candid project description (often, project reports are rather self-congratulatory and don’t mention how difficult it is in larger digitization projects to collaborate and to deal with technical challenges, content, and different organizational backgrounds all at once). It highlighted the challenge the three collaborating institutions faced in agreeing on shared standards while also serving the individual interests of each institution. A major challenge for many digitization projects is selecting the images to digitize, and Galloway underlined that the subject headings the project created were key to that selection. After last week’s readings, the paragraphs on metadata were interesting, and highlighted how the Dublin Core elements were critical in ensuring the interoperability of the metadata of the individual institutions. From a variety of options, the project participants agreed on using the LC subject headings for the description. Galloway also addressed different workflow challenges, and the difficulties of working with different databases in different projects. The participants agreed on a quality standard for the production masters (600 dpi) that ensured the uniform quality of the images. (The quality of the production master also makes it possible to view the image at different sizes, and to magnify parts of individual images when exploring the collection online.) Finally, the report outlined the challenge of giving users different ways to explore both the collection as a whole and individual images.

I also looked at the site. The reader can do subject searches, keyword searches, and searches by collection. You can explore by time, location, collection, or theme. You can also look at images with captions, with full record, or just captions. It really offers a lot of ways to search and explore.

Has anyone looked at the experimental visualization prototype, the Bungee View, in more detail?

Compression
Compression is a huge issue in multimedia collections, so the articles were very enlightening. I didn’t understand all the details about the different algorithms, but they clarified the principles of compression and, in the section on video compression, the differences between a video file and a video stream. Unfortunately, the link to the part of the article on lossy compression did not work. The advantage of compression is clear: it saves space on expensive storage devices. On the other hand, it also creates huge problems for archives, which have to deal with files in any number of formats, many of them compressed and often proprietary, so they aren’t archival quality to begin with. The pressure to compress video files is even greater than for audio files, because they are so big; uncompressed video would take up an enormous amount of server space. Many archives just don’t have the money to buy all that server space, and have no choice but to save the files in a compressed format. So, in a different way than for paper, space continues to be a huge problem.
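
To get a rough sense of why uncompressed video is such a storage problem, here is a back-of-the-envelope sketch in Python. The resolution, frame rate, and color depth are assumed example values, not figures from the readings, and the function name is made up for illustration.

```python
def uncompressed_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Raw size of video with no compression at all:
    every pixel of every frame is stored in full."""
    return width * height * bytes_per_pixel * fps * seconds


# Assumed example: one hour of 640x480 video at 30 frames per second,
# 24-bit color (3 bytes per pixel). Audio is ignored.
size = uncompressed_video_bytes(640, 480, 30, 3600)
print(f"{size / 1e9:.1f} GB per hour")  # prints "99.5 GB per hour"
```

Under these assumptions a single hour of footage needs roughly 100 GB before any compression, which makes the budget pressure on archives easy to see.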

Just wanted to double check: once a file is in a compressed, lossy format, you cannot just uncompress the file. The missing data is gone, isn't it?
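
A toy sketch of the difference, in Python. The lossless half uses the standard zlib module; the "lossy" half is a deliberately crude stand-in (it keeps only the top 4 bits of each byte) and is not how real audio or video codecs work. The function names and sample data are invented for illustration.

```python
import zlib

# Stand-in for raw sample data (1024 bytes).
original = bytes(range(256)) * 4

# Lossless: a zlib round trip gives back exactly the same bytes.
restored = zlib.decompress(zlib.compress(original))


def lossy_encode(data):
    """Toy lossy scheme: keep the top 4 bits of each byte and pack
    two samples into one byte. Assumes an even number of bytes."""
    return bytes((data[i] & 0xF0) | (data[i + 1] >> 4)
                 for i in range(0, len(data), 2))


def lossy_decode(packed):
    """Unpack to the original length; the discarded low bits come back as 0."""
    out = bytearray()
    for b in packed:
        out.append(b & 0xF0)         # first sample, low bits lost
        out.append((b & 0x0F) << 4)  # second sample, low bits lost
    return bytes(out)


approx = lossy_decode(lossy_encode(original))

print(restored == original)  # True: nothing was lost
print(approx == original)    # False: the dropped bits cannot be recovered
```

Decoding the lossy version produces a file of the right length, but only an approximation of the original. The discarded detail is not stored anywhere, so no decoder can bring it back.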

Comments

Commented on Tiffany J. Brand's blog:
http://tiffanybrandlis2600.blogspot.com/

And Letisha Goerner's blog:

http://letishagoerner2600.blogspot.com/