Security tips - #1 of an infinite series - remove unnecessary metadata before distributing a file
Metadata is data about your document.
At its most basic it includes things like the file name and the date the file was last updated. Common document types like .XLSX, .RTF, .JPG and .PDF can store a lot of metadata in addition to these basics. Some of this metadata may be information you didn’t mean to make public.
By way of example - a JPG file can store (as EXIF data) GPS information, the date and time the photo was taken, information about the owner of the camera and the serial number of the camera body. When you publish a photo you could disclose that a certain person (the owner) was in a certain location (the GPS data) at a certain time on a certain day (date taken). Using the serial number, anyone can link this photo to others taken with the same camera and build up a profile of usage patterns or other information that you might not otherwise realise you were divulging.
Many people use this metadata for exactly this purpose - they want to record this information, have it available and provide linkages. What I am suggesting is stripping it out of any documents that you intend to publish, whether to the web, by email or by other electronic means. Certainly keep the metadata in documents which remain in your control, but strip it out before you release them to others.
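To make "stripping" a little more concrete, here is a minimal Python sketch of what it can mean for a JPG. It assumes a well-formed file and only drops APP1 segments, which is where EXIF (and XMP) data normally lives; the function name strip_app1_segments is my own, and other metadata such as IPTC blocks or comment segments would be left in place. Treat it as an illustration of the idea, not a replacement for a proper tool.

```python
import struct


def strip_app1_segments(jpeg_bytes: bytes) -> bytes:
    """Return a copy of a JPEG byte string with APP1 (EXIF/XMP) segments removed.

    Sketch only: assumes a well-formed file and leaves every other
    segment type (including other metadata segments) untouched.
    """
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing start-of-image marker")

    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a segment marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest as-is.
            out += jpeg_bytes[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker != 0xE1:  # 0xE1 is APP1; keep everything else
            out += jpeg_bytes[pos:end]
        pos = end

    # No start-of-scan marker found: keep whatever is left unchanged.
    out += jpeg_bytes[pos:]
    return bytes(out)
```

You would read the photo with open(path, "rb"), pass the bytes through the function and write the result to a new file, keeping the original (with its metadata) for yourself.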
Most applications which write metadata to their own files have an option to strip that metadata out for precisely the reasons I’ve outlined above. Google should point you in the right direction for whatever application you are interested in, but here’s an example of the steps for removing metadata from a .PDF document created with Adobe Acrobat XI.
exiftool is a great tool which you can use to check for metadata in your documents. It started life as a tool for working with metadata in image files (specifically EXIF data) but it now handles metadata from many file formats, including all the popular ones that most people will come across in their day-to-day usage. You can see the set of file types it can report on here.
The command line can take a bit of getting used to, but to save you trouble here’s a standard one I use to check what hidden information I can dig out of the metadata on a file - “exiftool -s -g -a”. The -s option prints short tag names, -g organises the output by tag group and -a allows duplicate tags to be shown. For a file named “img_5350.jpg” the following command returns almost 200 lines of data about that one single image.
exiftool -s -g -a img_5350.jpg
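If you check files regularly, it can be handy to wrap exiftool in a small script. The Python sketch below is my own addition rather than anything from the exiftool documentation: the helper names (inspect_command, strip_command, run_exiftool) are hypothetical, and it assumes exiftool is installed and on your PATH. It builds the inspection command shown above, plus a stripping command using exiftool's -all= option (which clears all writable tags) and -o (which writes the result to a new file and leaves the original untouched).

```python
import shutil
import subprocess


def inspect_command(path):
    """Build the metadata inspection command used in this post."""
    return ["exiftool", "-s", "-g", "-a", path]


def strip_command(src, dst):
    """Build a command that writes a metadata-stripped copy of src to dst."""
    return ["exiftool", "-all=", "-o", dst, src]


def run_exiftool(cmd):
    """Run an exiftool command and return its text output."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_exiftool(strip_command("img_5350.jpg", "img_5350_clean.jpg")) followed by print(run_exiftool(inspect_command("img_5350_clean.jpg"))) lets you confirm what is left in the copy before you publish it. Some formats retain a few structural tags even after stripping, so it is worth reading the output rather than assuming it is empty.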