CSC-30043 - Data Mining
Coordinator: Mehdi El Krari Room: CR042
Lecture Time: See Timetable...
Level: Level 6
Credits: 15
Study Hours: 150
School Office: 01782 733075

Programme/Approved Electives for 2023/24

None

Available as a Free Standing Elective

No

Co-requisites

None

Prerequisites

CSC-10058 Introduction to Data Science I
CSC-10060 Introduction to Data Science II

Barred Combinations

None

Description for 2023/24

As the capacity to store and process large amounts of data increases exponentially, the need for data mining expertise is paramount. Many companies have been harvesting data from their operations for many years and are now realising the potential of this data to inform their future operations. Improving efficiency, increasing sales potential and cutting costs are among the many benefits that can be gained from the data mining knowledge and skill set that this module will provide.

Aims
To provide the full skill set required of a data scientist in order to identify and collect appropriate data sets (sampling, selection etc.), apply pre-processing methods (cleaning, filtering etc.) and subsequently apply data mining techniques to generate new information.

Intended Learning Outcomes

Identify and collect appropriate data in order to design a data mining workflow (assessed by: 1).
Apply pre-processing techniques to the collected data sets that minimise bias and distortion in the data (assessed by: 1).
Select and apply appropriate data mining techniques in order to extract new and useful information from the data (assessed by: 1).
Validate the findings of a data analysis and quantify their validity (assessed by: 1).

Study hours

Lectures: 20h
Group work and preparation for presentation: 50h
Independent study: 80h

School Rules

None

Description of Module Assessment

1: Group Project weighted 100%
Group Project based on solutions to external partners' data-related problems
Partners external to the school/university (such as companies, services, government bodies etc.) will be invited to present data-related problems, which groups of students (maximum of 5 students in each) will attempt to address. Real data from these partners will be analysed using the data mining techniques learned in the module. Each group will then present its solution to the problem providers in a 20-minute presentation at the end of the module. The presentation will cover data collection, workflow, techniques used, and a reflection on bias and distortion and on the validity of the results.
The mark for each student in a group will be composed of a group element and an individual element. The group element will be the same for all members of the group; the individual element will be based on an individual report (2000 words) that includes the student's contribution to the group work.