We are pleased to announce that Damien High, a Master's student at the University of Salford, has started a 12-week placement with the NBN Trust.
Here, Damien tells us about his studies and the work he will be doing whilst he is with us.
I’m Damien High and I’m a part-time MSc Data Science student at the University of Salford and a full-time data analyst for a global technical professional services company.
From GIS to Data Science to NBN
Back in 2017 I was a full-time GIS Analyst looking for a challenge and a way to add further strings to my bow. I knew I enjoyed analysis, problem solving and writing code, so Data Science felt like a really good fit for me. I joined Dr. Mo Saraee’s Master's programme at the University of Salford shortly after and have really enjoyed the course material.
Shortly after beginning my Data Science journey I also changed job and joined a team specialising in data, analytics and development. This was great, as it allowed me to put what I was learning on the course into practice almost immediately. I’ve been handling and analysing hundreds of millions of mobile phone GPS points, developing vehicle routing algorithms, and automating linear pipeline alignments and associated costings, all of which required skills learnt on the course.
The final assessment for the MSc Data Science at Salford involves a live industry project, which brings me to the NBN Trust. My task is to better automate the detection of wildlife records that may have dubious or incorrect location information. Given my background in spatial data, I felt this project would allow me to combine elements of GIS with big data and other data science techniques.
Rough Project Proposal
At this early stage, I see two main strands of work that may yield useful results when attempting to detect incorrectly located wildlife records:
- Spatial Analysis
Applying spatial analysis techniques to check whether a record's location is plausible by answering basic questions such as:
- Do terrestrial wildlife records occur on land?
- Do marine wildlife records occur at sea rather than on land? and,
- Do freshwater wildlife records appear on or near water?
Answers to the above questions should give a good initial indication of which records are incorrectly located. Building on this, further spatial analysis such as hotspot or spatial cluster analysis could also be used at a species level to identify records that fall outside a particular species’ spatial range.
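As a rough illustration of the land and sea questions above, here is a minimal Python sketch. It is not the project's implementation: the square "island" polygon, the record fields (`environment`, `x`, `y`) and the function names are all assumptions made up for the example, and it uses a hand-written ray-casting test so it runs without any GIS library. Real land boundaries and a proper spatial library would replace these in practice.

```python
# Minimal sketch: flag records whose location contradicts their environment.
# The polygon, record fields and function names are illustrative assumptions.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_suspect(record, land_polygon):
    """True when a terrestrial record is off land or a marine record is on land."""
    on_land = point_in_polygon(record["x"], record["y"], land_polygon)
    if record["environment"] == "terrestrial":
        return not on_land
    if record["environment"] == "marine":
        return on_land
    # Freshwater records would need a separate water layer; not covered here.
    return False


# Hypothetical square "island" standing in for a real land boundary.
LAND = [(0, 0), (10, 0), (10, 10), (0, 10)]

# Made-up records for demonstration only.
records = [
    {"id": 1, "environment": "terrestrial", "x": 5, "y": 5},
    {"id": 2, "environment": "terrestrial", "x": 15, "y": 5},
    {"id": 3, "environment": "marine", "x": 5, "y": 5},
    {"id": 4, "environment": "marine", "x": 12, "y": 3},
]

suspect_ids = [r["id"] for r in records if is_suspect(r, LAND)]
print(suspect_ids)
```

Here records 2 and 3 are flagged: a terrestrial record off the island and a marine record on it. Points lying exactly on a boundary are not handled specially in this sketch, which matters for coastal records.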
- Machine Learning
Classification techniques could be applied to records at a species level, using engineered attributes relating to habitat extracted from open-source spatial data. In theory, a model could be trained to learn the habitats typical of a given species, so that records in unlikely habitats can be flagged. This element is highly dependent on the availability of detailed habitat data.
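To make the idea concrete, here is a small Python sketch of one possible classifier. It uses k-nearest neighbours purely because it is short enough to write without a library; it is not a statement of which technique the project will use. The species names, the two habitat features (elevation and distance to water) and all values are invented for the example.

```python
# Minimal sketch of a k-nearest-neighbours habitat check.
# Species names, features and values are made up for illustration.
from collections import Counter
import math

# (elevation in metres, distance to nearest water in metres) -> species
TRAINING = [
    ((5.0, 10.0), "species_a"),
    ((8.0, 20.0), "species_a"),
    ((12.0, 5.0), "species_a"),
    ((400.0, 900.0), "species_b"),
    ((450.0, 1100.0), "species_b"),
    ((380.0, 1000.0), "species_b"),
]


def predict_species(features, training, k=3):
    """Return the majority species among the k nearest training records."""
    by_distance = sorted(training, key=lambda item: math.dist(features, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def habitat_mismatch(recorded_species, features, training, k=3):
    """True when the habitat looks more typical of a different species."""
    return predict_species(features, training, k) != recorded_species


# A lowland, near-water record labelled species_a matches its habitat.
print(habitat_mismatch("species_a", (7.0, 15.0), TRAINING))
# An upland record far from water labelled species_a is flagged for review.
print(habitat_mismatch("species_a", (420.0, 950.0), TRAINING))
```

A mismatch would only be a prompt for a human to review the record, not proof of an error. With real data the features would also need scaling, since raw distances in metres would otherwise dominate the comparison.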
The intention is to develop functions for both strands so that the checks and models can be integrated into existing data management processes. I’m really looking forward to seeing where I can take the project and to feeding back my findings to the Network.