SRB workshop
Demonstration 2: Interfacing the SRB with CCLRC RCommands for metadata insertion
The objective of this demonstration is to explore how the SRB can be interfaced with some new CCLRC metadata tools to provide a more sophisticated method for data management, archiving and (eventually) retrieval. The plan is to give you the scope to explore the tools in whatever way you like, rather than guide you through a set of pre-determined steps.
You will need some data in the SRB to work with. This is somewhat more tricky to set up than you might at first imagine. If we simply provided you with sets of files of numbers, it would be very hard for you to see what the files are. This is why the use of metadata is so important: if the files are not self-describing (i.e. if a file doesn't contain some sort of header telling you exactly what is in the file) and not well organised, you need something else that provides the basic information about each file. This demonstration is about adding metadata and organisation to files of data. The most obvious (to us) type of file that is sufficiently self-describing to be useful for a demonstration is a publication, and one option for this demonstration is a set of pdf files containing publications on escience on which one or more NIEeS staff are co-authors. We have organised these into three groups which can be obtained from:
There is no need to download all these files for this demonstration; you just need enough to be able to play with. Download enough to put into the SRB as per yesterday's demonstration.
Before you start to compute in earnest, you should think about data organisation. The tools used in this demonstration will assume a three-layer hierarchy: studies, which contain datasets, which in turn contain data objects.
One important point should be noted: the study and dataset levels are completely abstract. In contrast, the data objects correspond to URIs (see this Wikipedia reference) that point to real objects, including (but not exclusively) files or collections of files in the SRB.
You should not feel constrained by this hierarchy. For example, you may feel that your whole life's work is one study, so that this level has little meaning. On the other hand, you may feel that any one study should only have data objects. This hierarchy has many interpretations and should be used in the way that best suits the investigator.
It is possible to add metadata to each of these levels. Within the framework of the tools you will be using, each level will have an ID number that is used in the scriptable RCommands.
There are two aspects of this work that we won't be too concerned about here but which you will see glimpses of. First, one of the purposes of metadata is to let you share data: if a set of data is annotated with an appropriate set of metadata, a colleague no longer needs to keep asking you what something means. Thus the data organisation also includes the concept of other investigators, who may be co-owners of data or people you want to share your data with. Second, it is possible to associate data with topics to better enable colleagues to browse for data.
You will conclude from this demonstration that adding metadata to files or collections of files can be a very tedious business. That is why metadata continues to be a challenge to the community, in spite of the fact highlighted in the introduction that without metadata it is very difficult to attach meaning to files of data in a useful way.
One approach, which is particularly useful for studies that involve simulations or computer-based analysis of data, is to have scriptable commands to add metadata. This means that creation of metadata can be semi-automated. The RCommands represent one implementation of this approach. The RCommands work in ways that are analogous to the Scommands, and will apply to data that are held within the SRB (although they could also apply to files held within an FTP server or on a web page). The RCommands will insert and modify metadata held within a central metadata server.
There are only ten RCommands, with detailed descriptions provided in the links.
To use the RCommands, use the PuTTY ssh client tool to log in to one of the NIEeS Linux machines. Details of the IP address and username/password you should use are provided, as per the demonstration yesterday on the Scommands. First you should look at the essential configuration files contained within the .rcommands folder using the commands:
cd .rcommands
ls -a
cat rcommands.config
Now initiate an RCommand session using the Rinit command. You can test that all is well by typing the Rls command: it will return a message telling you (correctly at this point) that you have no studies. To get information about other commands, you can type the command name with no arguments, use the Unix man command, or look at the web pages above (which copy from the man pages). If you make any mistakes that you want to remove, this can be done using the metadata editor outlined in section 4 below.
First use the Rcreate command to create a study level. To use Rcreate you will need to give the study a name, add a description, and assign it to a topic, via:
Rcreate -n <name> -k <description> -t <topicID>
First you should think about the topic. You can list all topics by the command
Rls -t
Choose a topic and note the number; this will be the topicID label. If you can't decide, just make an arbitrary choice; for the purpose of this exercise it doesn't matter. Run the Rcreate command to create a study. The name and description labels can contain more than one word if enclosed in quotes. For example:
Rcreate -n "Workshop papers" -k "Papers for workshop" -t 4
Now check that this has worked by running the Rls command. This will return information like
-------------------------
StudyID: 1026
Name: Workshop papers
-------------------------
where the StudyID number will differ for different people. Now look at this in more detail using the Rget command:
Rget -s studyID
where you add your StudyID number. For the example above:
Rget -s 1026
gives
-------------------------
StudyID: 1026
Name: Workshop papers
Description: Papers for workshop
Created by: martin dove
Status: In Progress
Start_date: 07-01-2006
-------------------------
Now we want to add some datasets to the study. Following the example of pdf publications, we could create some datasets by
Rcreate -s 1026 -n "Papers on grid computing"
Rcreate -s 1026 -n "Papers on data management"
Rcreate -s 1026 -n "Papers on collaborative tools"
Rcreate -s 1026 -n "Papers on escience applications"
Each invocation will create a DatasetID, which will be echoed to the screen. Now check on the results of these commands by
Rls -s 1026
This will show you the DatasetID for each dataset (again, different users will get different numbers). You can look at any one dataset by using the command
Rget -d DatasetID
where you use the appropriate number for each DatasetID.
Now we will add some metadata to each dataset. For this we use the Rannotate command. The first step is to add a brief description to the dataset. In my example, when I run Rls -s 1026 I get
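Because the RCommands are designed to be scriptable, the four Rcreate calls above could be generated from a list of names. The following is a minimal sketch rather than part of the RCommands themselves: the function name create_datasets and the DRY_RUN switch are hypothetical, and by default the script only prints the commands it would run, so it can be tried on a machine without the RCommands installed.

```shell
#!/bin/sh
# Sketch: create one dataset per name in a study.
# DRY_RUN (hypothetical switch) defaults to 1, which only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
STUDY_ID=1026   # replace with your own StudyID as reported by Rls

create_datasets() {
    # $1 = StudyID; dataset names are read from standard input, one per line
    study="$1"
    while IFS= read -r name; do
        [ -n "$name" ] || continue    # skip blank lines
        if [ "$DRY_RUN" = 1 ]; then
            printf 'Rcreate -s %s -n "%s"\n' "$study" "$name"
        else
            # </dev/null stops the command from consuming the list of names
            Rcreate -s "$study" -n "$name" </dev/null
        fi
    done
}

create_datasets "$STUDY_ID" <<'EOF'
Papers on grid computing
Papers on data management
Papers on collaborative tools
Papers on escience applications
EOF
```

Run as it stands, this prints the same four Rcreate lines shown above; setting DRY_RUN=0 on one of the workshop machines would execute them instead.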
-------------------------
Dataset ID: 26
Dataset Name: Papers on grid computing
Parent StudyID: 1026
-------------------------
Dataset ID: 27
Dataset Name: Papers on data management
Parent StudyID: 1026
-------------------------
Dataset ID: 28
Dataset Name: Papers on collaborative tools
Parent StudyID: 1026
-------------------------
Dataset ID: 29
Dataset Name: Papers on escience applications
Parent StudyID: 1026
-------------------------
We can use the Rannotate command in two ways. First we can add a description to the dataset. My example is
Rannotate -d 29 -k "Collection of papers on escience applications"
Second we can add some name/value pairs. My example is
Rannotate -d 29 -p topic=escience
Rannotate -d 29 -p topicarea=applications
Running the Rget -d 29 command to view the metadata gives
-------------------------
DatasetID: 29
Name: Papers on escience applications
Parent StudyID: 1026
Created by: martin dove
Creation_date: 07-01-2006
Description: Collection of papers on escience applications
-------------------------
Note that this shows the description but not the name/value pairs. To see the name/value pairs I need to use the command Rget -d 29 -p, which yields:
-------------------------
Parameter Name: topic
Parameter Value: escience
-------------------------
Parameter Name: topicarea
Parameter Value: applications
-------------------------
You can repeat this for other datasets, and you can add whatever name/value pairs you like.
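Repeating the annotation by hand soon becomes tedious, so here is a rough sketch of how it might be scripted. It assumes that the output of Rls -s always uses the "Dataset ID: N" layout of the listing shown earlier; the helper name dataset_ids is hypothetical. The listing is embedded as sample text and the Rannotate commands are printed rather than executed, so the sketch runs anywhere.

```shell
#!/bin/sh
# Sample text copied from the Rls -s 1026 listing shown earlier. In real use
# it would come from the command itself, e.g. listing=$(Rls -s 1026).
listing='-------------------------
Dataset ID: 26
Dataset Name: Papers on grid computing
Parent StudyID: 1026
-------------------------
Dataset ID: 27
Dataset Name: Papers on data management
Parent StudyID: 1026
-------------------------
Dataset ID: 28
Dataset Name: Papers on collaborative tools
Parent StudyID: 1026
-------------------------
Dataset ID: 29
Dataset Name: Papers on escience applications
Parent StudyID: 1026
-------------------------'

dataset_ids() {
    # Print the number from every "Dataset ID: N" line read from standard input
    sed -n 's/^Dataset ID: *\([0-9][0-9]*\).*$/\1/p'
}

for id in $(printf '%s\n' "$listing" | dataset_ids); do
    # Printed, not executed; remove the printf wrapper to run the command
    printf 'Rannotate -d %s -p topic=escience\n' "$id"
done
```

For the sample listing this prints one Rannotate line for each of the datasets 26 to 29.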
Finally we reach the point where we can add metadata to the data objects. You need to first have data somewhere, and in our case our data are in the SRB. The data object can either be a file or a collection of files within the SRB. The command for adding metadata to a data object is
Rcreate -u <url> -d <datasetID> -n <name>
The <url> specifies where the file is and has the form 'srb://<zone>/<collection>/<object>'. In general: <collection> is composed of '/home/<username>.<domain>/<sub collection1>/.../<sub collectionN>'. An example might be 'srb://Test/home/nieessrb40.srbdom/test.dat'. The <datasetID> gives the dataset that you want to associate the file with, and <name> is the name you want to give the data object.
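The URL pattern described above can be assembled from its parts. This is only an illustrative sketch: the function name srb_url and the values of DATASET_ID and OBJECT_NAME are placeholders, and the Rcreate command is printed rather than run.

```shell
#!/bin/sh
srb_url() {
    # $1 = zone, $2 = username, $3 = domain,
    # $4 = path below the home collection (sub collections and object)
    printf 'srb://%s/home/%s.%s/%s\n' "$1" "$2" "$3" "$4"
}

url=$(srb_url Test nieessrb40 srbdom test.dat)
echo "$url"    # prints srb://Test/home/nieessrb40.srbdom/test.dat

# Placeholder values; use your own DatasetID and a name of your choice
DATASET_ID=29
OBJECT_NAME="Test data file"
printf 'Rcreate -u %s -d %s -n "%s"\n' "$url" "$DATASET_ID" "$OBJECT_NAME"
```

Building the URL in one place makes it harder to mistype the zone or home collection when registering many objects.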
You then add metadata with the Rannotate command in the same way that you added name/value pair metadata to the dataset:
Rannotate -o dataObjectID -p <name>=<value>
where you get the dataObjectID from the dataset using the command Rls -d <datasetID>. Hopefully by now you are getting more familiar with the various ID labels: studyID, datasetID and now dataObjectID for the study, dataset and data object respectively.
As before, you can use the Rget command to get the metadata from a data object:
Rget -o <dataObjectID> -p
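If you want to feed the retrieved metadata into another script, the output can be condensed to one name=value line per pair. The sketch below assumes that Rget -o prints pairs in the same "Parameter Name" / "Parameter Value" layout shown earlier for Rget -d 29 -p; that layout for data objects is an assumption, and the function name pairs is hypothetical. The sample text is the dataset output from above.

```shell
#!/bin/sh
pairs() {
    # Turn "Parameter Name: x" followed by "Parameter Value: y" into "x=y"
    awk '
        sub(/^Parameter Name: /, "")  { name = $0; next }
        sub(/^Parameter Value: /, "") { print name "=" $0 }
    '
}

# Sample text copied from the Rget -d 29 -p output shown earlier. In real use:
#   Rget -o <dataObjectID> -p | pairs
pairs <<'EOF'
-------------------------
Parameter Name: topic
Parameter Value: escience
-------------------------
Parameter Name: topicarea
Parameter Value: applications
-------------------------
EOF
```

For the sample text this prints topic=escience and topicarea=applications, one per line.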
The power of metadata comes down to what you do with it! The RCommands provide for this with the Rsearch command. There are several ways to use this command:
Rsearch -s studyID -p <name>=<value>
Rsearch -d datasetID -p <name>=<value>
Rsearch -d datasetID -k <keyword>
Rsearch -o dataObjectID -k <keyword>
Once you have created enough metadata you can experiment with the Rsearch command.
The RCommands give you command-line access to metadata, but the main design requirement driving their development has been the need for scriptable commands. They become rather unwieldy for users with many files, and thus there is an associated portal called the "metadata manager". It allows you to create the study and dataset organisation levels, and incorporate data objects with reference to their URI. It allows you to display, add and edit metadata in the same way that you can with the RCommands, but with a web interface. This gives some advantages in enabling you to view the organisation of files and to navigate around them.
You access the metadata manager from your web browser by connecting to
You will see a page that looks like ...

Because security is handled by X.509 digital certificates, before you can log in you need to transfer your digital certificate. Your certificate has been set up for you by NIEeS. It should be stored in two files, usercert.pem and userkey.pem, in the folder called 'cert/<userID>/.globus' on the desktop, to be used as per the instructions below. Each certificate will have a passphrase which will be given to you. Check that these files exist, and ask if they appear to be missing (there may be a configuration preference set to hide the files that needs to be changed).
The upload is done using a Java program which you must download from the Metadata Manager. Click on the link called "eMinerals Proxy". This downloads and launches the tool. You will be presented with the following window.
Stick with the default values as shown in the above screen shot, and press the OK button.
There is a warning here you should know about. Due either to Microsoft's implementation of Java or to a coding error not seen on other operating systems, there is a chance that at this stage or at any other stage things may appear to stall. Don't panic. Move the window, and chances are that you will see the following error window lurking unseen behind the main window:
Pressing the OK button is enough for the tool to move on as if nothing untoward had happened.
Now follow two security warnings, as per the following two windows:
The answer Yes is appropriate in both cases. You should eventually get to the main tool.
Now run the "Config>Setup Cog" menu item. You will be directed through a set of pages. First you will need to provide information about the location of your two certificate files:
Press the ... browse buttons to look for your certificates. As noted above, the certificates will be obtained from 'cert/<userID>/.globus' on the desktop. The User Certificate is called 'usercert.pem', and the User Private Key is called 'userkey.pem'. Add these details as per the window above, and press the Next>> button. This will give you the following window:
Here you need to provide information about the organisation that created your certificate (the Certificate Authority, in this case, NIEeS), which is held in another certificate. The main window will be empty, and you will need to press the Add button. The file you need is 'Desktop/cert/14e6F8Fd.0'. When added, press the Next>> button, which will give you yet another window in which you press the Finish button. Now run the "Config>MyProxy>Configure" menu item, which generates a window that looks like the following (albeit with some different data which you now need to edit):
You will need to provide the following information:
You complete the process with the Ok button.
Finally, run the "Config>MyProxy>Send" menu item.
You will need to provide the following information as per the screen shot above:
The result of sending your proxy is that you can log in to the metadata manager using the username/passphrase combination you created within the MyProxy upload tool, with the certificate information being transferred and active for the lifetime you created for it.
So now you can log into the metadata manager using the username and passphrase you created above. When you press the Logon button, you will find a page something like ...

It lists the studies you will have created with the RCommands, with the option to edit them (or even delete them) using the buttons on the right.
If you click on the "Open Study" button you will see a page like ...

Note that the information associated with a study includes both the topic (which is probably arbitrary at this stage) and other investigators you may like to have associated with the study. If you should create a new study using the metadata editor, you will be asked to associate a topic with the study, and you can optionally add an investigator. Both can be edited.
Clearly you can edit the metadata or add a new dataset (using the same terminology as you hopefully mastered with the RCommands). If you add a dataset, you will again be asked whether you want to create metadata. If you have already added some datasets (which you will have done in the previous section) you will see the datasets in the bottom section of the page. Once you have added a dataset, you will be able to add a new data object, as you did with the RCommands.
The menu items on the left provide you with the ability to create a new study and to view your studies.
By now the concepts should be familiar, and you should be able to navigate around the metadata editor.