Ian Field - Summer work

I have been fortunate enough to be asked to work at the University of Reading throughout July, assisting a PhD student with their research. The role was described as:

…helping to drive forward part of an exciting research project at the cutting edge of natural language processing and interacting with linguistics. The project is setting out to capture data via the web from human participants to validate themes produced by existing automated processes.

In other words, the project uses the web and social media as a promotion platform. Participants are presented with a short article to read, followed by keywords or phrases, which they select according to how relevant they perceive them to be.

The requirements of the task were listed as web design, with programming described as non-essential (I quickly found it to be essential!).

For ease of uptake, the technologies I decided upon were PHP and MySQL. I have dabbled with both before, and they have the added bonus of running on my local machine without hosting, which makes for a faster development and test cycle.

I referred back to my personal website for some refreshers on CSS and database queries in PHP, so it has at last proven not to be a waste of my time.

First of all, I created two templates for keyword selection. The first presented groups of keywords, which made it clear that there were specific groups to select from, although the origin of each group was not stated. The possible sources were the PhD student's algorithm, another keyword algorithm, and chance. The second template presented an individual check box for each keyword or phrase. This was the format chosen, because it keeps each keyword's origin anonymous and so helps prevent the malicious or false data that visible groupings might have encouraged.

I next set about writing some JavaScript to ensure that repeated null entries at least required more interaction with the web page than clicking the submit button. The extra requirement of a selected radio button helped achieve this. There was discussion of using a CAPTCHA instead. However, because of the frustration and time a CAPTCHA can involve, and the ideal aim of having participants contribute data for multiple articles, it was decided not to use one.
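As a rough illustration of that check, here is a minimal sketch rather than the code actually used on the site. The function name validateSubmission, the form id "keyword-form", and the field names "opinion" and "keyword" are all hypothetical, since the real markup is not shown here. The idea is simply that an empty keyword selection is still accepted, but only once a radio button has been chosen.

```javascript
// Hypothetical helper: decides whether a submission may be sent.
// selectedKeywords: array of the ticked keyword strings (may be empty).
// radioChoice: value of the selected radio button, or null if none is selected.
function validateSubmission(selectedKeywords, radioChoice) {
  if (radioChoice === null || radioChoice === undefined || radioChoice === "") {
    return {
      ok: false,
      isNullEntry: selectedKeywords.length === 0,
      reason: "Please select one of the options before submitting.",
    };
  }
  // A null entry (no keywords ticked) is allowed, but it now costs
  // at least one extra click on the radio button.
  return { ok: true, isNullEntry: selectedKeywords.length === 0, reason: "" };
}

// Browser wiring, skipped when no DOM is available (for example under Node).
// The id and field names below are assumptions, not the site's real markup.
if (typeof document !== "undefined") {
  const form = document.getElementById("keyword-form");
  if (form) {
    form.addEventListener("submit", (event) => {
      const ticked = Array.from(
        form.querySelectorAll('input[name="keyword"]:checked')
      ).map((box) => box.value);
      const radio = form.querySelector('input[name="opinion"]:checked');
      const result = validateSubmission(ticked, radio ? radio.value : null);
      if (!result.ok) {
        event.preventDefault(); // stop the form from being sent
        alert(result.reason);
      }
    });
  }
}
```

Keeping the decision in a plain function, separate from the page wiring, means the rule can be checked without a browser.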

To guard further against such results, it was decided that each database entry would record certain information about the result, so that it could be pruned later if deemed necessary.
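To give a feel for what tracking and pruning could look like, here is a small sketch under stated assumptions. The fields recorded (a timestamp and the seconds spent on the page), the ten-second threshold, and the names buildResultRecord, isSuspect and pruneResults are all my own illustrations. The information actually stored, and the pruning rule, may well differ.

```javascript
// Hypothetical record builder: attaches tracking information to a result.
// Times are given in milliseconds since the epoch.
function buildResultRecord(articleId, selectedKeywords, pageLoadedAtMs, submittedAtMs) {
  return {
    articleId: articleId,
    keywords: selectedKeywords.slice(), // copy so later changes do not leak in
    submittedAt: new Date(submittedAtMs).toISOString(),
    secondsOnPage: Math.round((submittedAtMs - pageLoadedAtMs) / 1000),
  };
}

// Assumed rule: a result is suspect if nothing was selected AND it was
// submitted faster than minSeconds, too quickly to have read the article.
function isSuspect(record, minSeconds = 10) {
  return record.keywords.length === 0 && record.secondsOnPage < minSeconds;
}

// Returns only the records that survive pruning; the input is not modified.
function pruneResults(records, minSeconds = 10) {
  return records.filter((record) => !isSuspect(record, minSeconds));
}
```

In the real site this filtering would more likely be done with a query against the MySQL table. The sketch only shows why the extra columns are worth storing at submission time.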

Unfortunately, at the moment I do not have a publicly available working version of the site to link to.