Evaluation of online and web search
In Lecture 5 (Web evaluation case study), a real-world evaluation is presented
that was conducted in industry in order to establish the usefulness of the Web
Search engine run by the company the evaluator worked for.
The assignment will require you to use the same techniques as this study, albeit
on a much smaller scale: the study evaluated 50 queries, whereas you will only do one.
You will also extend your evaluation to online services (e.g. ProQuest Dialog), to
image and video search services, and to social search and private search engines.
This part requires you to complete a number of search and evaluation tasks.
These are to:
- Pick a topic and create a TREC-style topic description for it (for an
example, please review the topics used in the tutorials). You are free to
choose any topic, e.g. one related to your work or personal interests.
The topic needs to be current so that the social search systems can
be used to find information on it. You are required to do a facet analysis
of your topic, and to create an evaluation policy for it (the
narrative field should help you with this). A hypothetical example of a topic,
facet analysis and derived queries is sketched after this task list.
- Using that facet analysis, create appropriate search strategies to build ‘bag of
words’ queries and ‘Boolean’ queries with which to search the following
search services [the query types to use are listed with each search engine]:
– Google: http://www.google.co.uk – the most used Web Search engine.
[Boolean and ‘Bag of words’ queries]. Advanced Search URL:
https://www.google.co.uk/advanced_search
– Bing: http://www.bing.com/?cc=gb – the second most used Web Search
engine. [Boolean and ‘Bag of words’ queries].
– Google Images: http://www.google.co.uk/imghp?hl=en&tab=wi –
Google’s image search engine. [‘Bag of words’ query only].
– Bing Images:
http://www.bing.com/?scope=images&nr=1&FORM=NOFORM – Bing’s
image search engine. [‘Bag of words’ query only].
– YouTube: http://www.youtube.com/?gl=GB&hl=en-GB – Google’s Video
Search. [‘Bag of words’ query only].
– Bing Video: http://www.bing.com/videos/browse – Bing’s Video Search.
[‘Bag of words’ query only].
– ProQuest Dialog:
http://search.proquest.com/professional/?accountid=143640 – an
online service with a command line interface. [Boolean query only].
– DuckDuckGo: https://duckduckgo.com/ – a Meta Search engine. [‘Bag
of words’ and Boolean queries].
– Social Searcher: https://www.social-searcher.com/ – A search engine
for social media search and analysis of user-generated content. [‘Bag of
words’ query].
– Startpage: https://startpage.com/ – A private search engine with
Advanced Search. [Boolean and ‘Bag of words’ queries]. Advanced search
URL: https://startpage.com/uk/advanced-search.html?hmb=1
– One other online system of your choice, e.g. Trip Database or Factiva. You
may use any of the systems listed on the Library A-Z link:
http://libguides.city.ac.uk/az.php
- Do evaluations of those searches using those search services. Remember to
use the evaluation policy you have defined. Only look at the top 10 ranked
documents (usually just one screen's worth). You must use the following
evaluation methods from Lecture 5 (a small worked calculation is sketched after this list):
– Precision at 5 documents retrieved (P @ 5): this figure should be in the
range 0 to 1 for each search.
– Precision at 10 documents retrieved (P @ 10): this figure should be in the
range 0 to 1 for each search.
– Estimated Average Precision (EAP) for the top 10 documents: assume
that for all queries there are at least 10 relevant documents. This figure
should be in the range 0 to 1 for each search.
– Rate of Repeated documents (RT): Record the number of duplicates per
search. This figure should be in the range 0 to 10.
– Link Broken (LB): Record the number of broken links per search. This
figure should be in the range 0 to 10.
– Not retrieved (NT): Record the total number of documents not retrieved by
that search. This figure should be in the range 0 to 10.
– Spam: Record the number of Spam documents per search. This figure
should be in the range 0 to 10.
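To make the topic and query tasks above concrete, here is a small, entirely hypothetical sketch of a TREC-style topic, a facet analysis for it, and the ‘bag of words’ and Boolean queries that might be derived from that analysis. The topic, facets, terms and operators shown are invented for illustration only; your own topic, facet set and evaluation policy will differ.

```
<top>
<num> Number: 1
<title> electric scooter safety regulations
<desc> Description:
  Find current information on safety regulations and legal requirements
  for the use of electric scooters (e-scooters) in UK cities.
<narr> Narrative:
  Relevant documents discuss laws, rental trials or official safety guidance
  for e-scooters in the UK. Documents about bicycles or petrol scooters are
  not relevant. News items, official guidance and recent social media posts
  may all be judged relevant if they address the legal or safety aspects.
</top>

Facet analysis (hypothetical):
  Facet A (vehicle):   electric scooter, e-scooter
  Facet B (issue):     safety, regulation, law, legal
  Facet C (location):  UK, Britain, city

‘Bag of words’ query:
  electric scooter safety regulations UK

Boolean query:
  ("electric scooter" OR "e-scooter") AND (safety OR regulat* OR law) AND (UK OR Britain)
```

The Boolean query above illustrates tactics such as phrase quoting and truncation (regulat*); not every service supports every operator, so you may need to adapt the query for each search engine.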
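As a minimal sketch of how the three precision figures could be computed for one search, the following Python snippet calculates P@5, P@10 and EAP from a list of 0/1 relevance judgements in rank order. It treats EAP as the standard average precision with the denominator fixed at 10, following the assumption stated above that every query has at least 10 relevant documents; the judgement values shown are invented purely for illustration.

```python
def precision_at_k(judgements, k):
    """Fraction of the top k ranked documents judged relevant (judgements are 0 or 1)."""
    return sum(judgements[:k]) / k


def estimated_average_precision(judgements, k=10):
    """Average precision estimated over the top k results only.

    Assumes the query has at least k relevant documents, so the
    denominator is fixed at k (as stated in the brief).
    """
    score = 0.0
    relevant_so_far = 0
    for rank, rel in enumerate(judgements[:k], start=1):
        if rel:
            relevant_so_far += 1
            score += relevant_so_far / rank
    return score / k


# Hypothetical relevance judgements for one search, in rank order
# for the top 10 documents: 1 = relevant, 0 = not relevant.
judgements = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]

print("P@5  =", precision_at_k(judgements, 5))                      # 0.6
print("P@10 =", precision_at_k(judgements, 10))                     # 0.5
print("EAP  =", round(estimated_average_precision(judgements), 3))  # 0.382
```

The diagnostic measures (RT, LB, NT, Spam) are simple counts per search and need no calculation beyond the assumptions you declare for them.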
You must use the following table to report your results. Please do not deviate
from this format.
You will need to make a number of assumptions, particularly with the diagnostic
measures. For example, how would you define a Spam document? What is a
repeated document? Some sites have mirrors across the world, which may be
retrieved more than once. The evaluation metrics described above are
precision-based: we are looking for anything that might affect this precision
(hence measures such as LB, NT and Spam). You will use the diagnostic
measures to examine deficiencies found in the precision scores.
Students should ensure that they tackle the following learning outcomes in their
report:
– Use a range of information retrieval systems and services to resolve
information needs.
– Evaluate information retrieval systems and services, by using appropriate
methodologies.
– Evaluate new developments in information retrieval research, understanding
the problems which new ideas in IR are attempting to address.
Coursework Deliverable
After completion you must produce a document that reports on the search
and evaluation you conducted. This should include:
- Specification of the evaluation policy you have derived for your topic (Your
TREC topic plus the assumptions you made on diagnostic measures).
- Your facet analysis of this information need, together with a discussion on how
you developed the facets. This should be a reflection on the process you used to
produce the final facet set.
- Reflect on the process you went through to generate the final queries
from the facet analysis, showing what you learned while undertaking the searching.
Declare both the query you used for each search and the strategy you used to
derive that query. A suitable method is to pick one of the search services (say
Google) and do some initial searches to find a number of good terms – only
submit your queries and do the evaluation when you are happy with the query
terms. You will need to do this for your ‘bag of words’ queries and your ‘Boolean’
queries. Describe the tactics you used within the framework of your strategy, e.g. the
use of particular operators (Boolean, proximity, truncation) and the choice of particular
terms in the query.
- Using the evaluation methods, compare and contrast the retrieval effectiveness
of the search services. Here are some of the questions you could answer. Which is the
best search service for your topic? How well does ‘bag of words’ search compare
with ‘Boolean’ search? How does online search compare with web
search? How does web search compare with Meta search? Do the images and
videos you retrieved help to fulfil your information need? Does Social Search
provide useful information over and above that of traditional search services?
How do private engines compare with the main search engines? Examine the
quality of the material you retrieved to provide a more in-depth evaluation of your
results – this includes not just the documents or objects you retrieved, but the
sources of information as well. You must record the figure for each of the
evaluation methods, for each of the search services and for each type of search specified
above. Declare the assumptions you have made for the diagnostic evaluation
measures (the precision measures are given and no assumptions need be made).
You only need to declare the final figures for your evaluations – you do not need to
include detailed calculations in your submission. EXTRA: You may also consider
the following issues in the evaluation section. Provide a reflection on the impact
of the user interfaces on your results (in terms of the operational aspect of the
evaluation). Consider factors such as memory, beliefs, emotion, social factors
and domain knowledge and reflect on them. Please provide a reflection on your
relevance assessment in the light of the use of information from documents and
the impact this has had on the final result in terms of satisfying your given
information needs – e.g. what process did you use, and how do you think you
could improve your work in this area?
You must split your report into the following sections: 1. Introduction: Topic and Evaluation Policy, 2.
Facet Analysis, 3. Search Strategy, 4. Evaluation, 5. Summary. Please do not
use any other structure for your submission.
The module leader has made every endeavour to make this assignment as clear
as possible. If any aspect of this specification is still unclear, please feel free to
state any further assumptions that you feel you have to make in order to
complete the coursework. No marks will be deducted in this instance.
Assessment and Marking scheme
The final marks for the assessment are allocated using the following criteria:
– Details of your facet analysis for your topic (25 marks)
– Reflection on your search strategy and how the final query was generated (40
marks)
– Evaluation of the required search engines and reflection on process and
result of relevance assessment (25 marks)
– Presentation, writing and organisation (10 marks)
Your report must not exceed SIX A4 pages, excluding references, using Arial 11
font or similar (single-spaced). Margins should be at least 2.5cm on all sides.
Take care not to exceed this page limit, as additional pages will not be marked.
Do not include a table of contents, cover page, cover sheet or any appendices.
As this is a practical piece of work, references are not required. Acceptable
formats for submission are Word documents (.doc, .docx) or PDF.
Please note that you will not be assessed on how well your searches do in the
evaluation – your information needs will have varying levels of difficulty and
therefore the retrieval effectiveness will vary between each of your searches. The
assessment will be used to measure your understanding of search and
evaluation methods in information retrieval.