The project I have probably spent the most time on so far has been ingesting the university’s collection of born-digital images into Archivematica, our digital preservation system. Archivematica is essentially an automation tool that strings together many micro-services, each performing an important digital preservation task. These include recording the characteristics of each file; generating a checksum (essentially a digital fingerprint that lets us confirm that our files and any backup copies remain identical and authentic); and, where appropriate, creating new ‘normalised’ access and preservation copies in widely understood and accessible file formats.
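To give a feel for what the checksum step does, here is a minimal Python sketch using the standard library’s hashlib module. It assumes SHA-256 as the algorithm (Archivematica can be configured to use others), and the function names `sha256_checksum` and `copies_match` are hypothetical illustrations, not part of Archivematica. The idea is simply that two copies of a file are identical only if their checksums match, so any change to a single byte shows up as a different fingerprint.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file (hypothetical helper).

    The file is read in chunks so large images don't need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, backup):
    """A backup is treated as intact only if its checksum equals the original's."""
    return sha256_checksum(original) == sha256_checksum(backup)


# Usage: write two copies of the same data to a throwaway folder and compare them.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "image.tif"
    backup = Path(tmp) / "image_backup.tif"
    original.write_bytes(b"example image data")
    backup.write_bytes(b"example image data")
    print(copies_match(original, backup))  # True
    backup.write_bytes(b"example image dat4")  # simulate a corrupted backup
    print(copies_match(original, backup))  # False
```

In practice the checksum recorded at ingest is stored alongside the file, so a later check only needs to recompute the current value and compare it with the stored one.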
While Archivematica itself is highly automated, preparing files for ingestion still involves quite a bit of manual work, and while working through the process I was asked whether I could suggest ways to make it more efficient. My host archive has always been very supportive of me taking opportunities to learn new skills, so I used the university’s LinkedIn Learning subscription to take an introductory Python course. Applying what I’d learnt (along with a lot of Googling!), I put together a basic Python script that automatically arranges and renames files and generates the metadata needed to ingest them. This made the process much quicker and reduced the scope for human error.
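The script itself isn’t reproduced here, but the sketch below shows the general shape such a preparation step might take. It is an illustration under stated assumptions rather than the actual script: it assumes a standard Archivematica transfer layout (an `objects/` folder for the files and a `metadata/metadata.csv` whose `filename` column points at `objects/...` paths, with Dublin Core fields as `dc.*` columns), and the naming scheme (`PREFIX_0001.jpg`), the `IMG` prefix, the chosen metadata columns and the `prepare_transfer` function are all hypothetical.

```python
import csv
import shutil
import tempfile
from pathlib import Path

# Hypothetical: which file extensions count as images for this transfer.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def prepare_transfer(source_dir, transfer_dir, prefix):
    """Copy images into transfer_dir/objects under sequential names and
    write a Dublin Core metadata.csv into transfer_dir/metadata."""
    source_dir = Path(source_dir)
    objects_dir = Path(transfer_dir) / "objects"
    metadata_dir = Path(transfer_dir) / "metadata"
    objects_dir.mkdir(parents=True, exist_ok=True)
    metadata_dir.mkdir(parents=True, exist_ok=True)

    # Sort so the numbering is stable from one run to the next.
    images = sorted(
        p for p in source_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

    rows = []
    for number, image in enumerate(images, start=1):
        identifier = f"{prefix}_{number:04d}"
        new_name = identifier + image.suffix.lower()
        # Copy rather than move, so the original files are left untouched.
        shutil.copy2(image, objects_dir / new_name)
        rows.append({
            "filename": f"objects/{new_name}",
            "dc.title": image.stem,  # keep the original name as a title
            "dc.identifier": identifier,
        })

    with open(metadata_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "dc.title", "dc.identifier"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Usage with two dummy "images" in a throwaway folder:
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "camera_dump"
    source.mkdir()
    (source / "DSC0042.JPG").write_bytes(b"...")
    (source / "DSC0043.JPG").write_bytes(b"...")
    for row in prepare_transfer(source, Path(tmp) / "transfer", "IMG"):
        print(row["filename"], row["dc.title"])
    # objects/IMG_0001.jpg DSC0042
    # objects/IMG_0002.jpg DSC0043
```

Generating the CSV from the same loop that renames the files is what removes most of the scope for error: the metadata can never refer to a filename that doesn’t exist in the transfer.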