Unwanted data acquisition in downloadable documents

Download for a data breach. How to avoid unwanted data acquisition in downloadable documents

Downloading documents from websites and cloud collaboration applications is common practice for many businesses. The finance department downloads an invoice, the HR department a CV, and Business Development an RFP. Because it is as routine as replying to an email, employees often forget that everyday documents contain active content and hidden metadata with the potential to cause major data breaches.

Whether a document is shared via email or downloaded from the web, it contains automatically created information that may be sensitive. This could be the author’s name, revision history, the name of the application software, document version numbers, file location maps, tracked changes and, quite often, highly sensitive information that was accidentally embedded and is now available for ‘everyone’ to see. Even metadata can be compromising if shared with people outside the organization, not just for those sharing the document but for those receiving it too. It is therefore important that employees know the security risks involved in downloading documents.
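
To make this concrete, here is a minimal Python sketch of how such properties can be read from a Word (.docx) file, which is a ZIP archive whose docProps/core.xml and docProps/app.xml parts hold the author, revision and application details. The function names, the sample values and the stripped-down sample file are assumptions made for illustration; a real document contains many more parts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Archive part -> (label, element path) pairs worth reporting.
FIELDS = {
    "docProps/core.xml": [
        ("author", "dc:creator"),
        ("last_modified_by", "cp:lastModifiedBy"),
        ("revision", "cp:revision"),
    ],
    "docProps/app.xml": [
        ("application", "ep:Application"),
        ("app_version", "ep:AppVersion"),
    ],
}


def read_docx_metadata(data: bytes) -> dict:
    """Return selected hidden properties found in the bytes of a .docx file."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
        for part, fields in FIELDS.items():
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for label, path in fields:
                node = root.find(path, NS)
                if node is not None and node.text:
                    found[label] = node.text
    return found


def build_sample_docx() -> bytes:
    """Build a stripped-down, hypothetical .docx holding only property parts."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>Jane Example</dc:creator>"
        "<cp:lastModifiedBy>Jane Example</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    )
    app = (
        f'<Properties xmlns="{NS["ep"]}">'
        "<Application>Microsoft Office Word</Application>"
        "<AppVersion>16.0000</AppVersion>"
        "</Properties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
        archive.writestr("docProps/app.xml", app)
    return buffer.getvalue()


if __name__ == "__main__":
    for key, value in read_docx_metadata(build_sample_docx()).items():
        print(f"{key}: {value}")
```

Running the same function over the bytes of a document downloaded from a website gives a quick view of what the file reveals about its author and the software that produced it.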

When downloading a document from the web, employees are at risk of unwanted data acquisition: the act of unintentionally receiving and storing critical information. Whether that information is obvious (personal or other sensitive details in the body of the document) or not (hidden metadata automatically attached to it), it can result in a number of security issues.

1. Non-compliance with data protection regulations

The first security risk to consider is the role of data protection regulations in protecting sensitive data. With the threat of business-crippling fines looming over organizations across the globe, good information governance within the network is critical.

Take, for example, company A and company B, which work together as third-party suppliers and regularly share customer invoices via a web portal to ensure the work is completed. If a customer of company A submits a ‘right to be forgotten’ (RTBF) request, having unwanted data on this individual hosted on company B’s network puts both organizations at risk of being non-compliant. If company B isn’t even aware it has received this customer’s data, it can’t find or delete it when the request comes through, making it even more challenging for company A to complete the RTBF request. Under GDPR, the entire supply chain is responsible for proper data handling, so both organizations are at risk of receiving a hefty fine.

2. Cybercriminals

Unwanted data acquisition goes both ways. A company does not want to receive sensitive data hidden in documents as this puts them at risk of a fine, but they also need to be aware that any documents they have on their own website could be used maliciously.

Metadata that seems unimportant to many can be invaluable to cybercriminals. For example, a document might contain information about the software it was created in, meaning a cybercriminal can attack the known vulnerabilities in that software. In addition, the Author Name metadata attached to a document means cybercriminals can easily search for an employee’s email address (by Googling or on LinkedIn, for example), allowing them to launch a phishing campaign against the company to steal more critical information or infect the corporate network with malware.

3. Competitor Advantage

While it is not always considered in relation to cyber security, competitor organizations are a major threat when it comes to unwanted data acquisition. Competitor companies can also use hidden metadata and sensitive data acquisition via website documents to gain an advantage.

The first instance of this is similar to how a cybercriminal would use metadata. Take, as an example, a company that uploads a customer testimonial onto its website in the form of a document. Within the hidden metadata, there is author data that provides details of the customer who wrote the testimonial. If a competitor has access to this, they now have most of the information they need to contact the customer directly and steal the business.

Then there’s the inadvertent embedding of sensitive information in documents, such as spreadsheets containing financial data that are mistakenly uploaded or shared and remain available for all to view or download until the error is noticed. Mistakes will always happen, but in this day and age, mistakes that involve the unauthorized exposure of sensitive data can put organizations at risk of non-compliance with data protection regulations.

Alternatively, a competitor could share a document that contains hidden metadata and use this to cause a compliance issue for the business. Under GDPR, any critical data must be stored properly, but if an organization is unaware of the critical data lying within a downloaded document, it cannot delete or secure the information. When it comes to an audit or even a right to be forgotten (RTBF) request under the GDPR, the business may be liable for a huge fine that damages both revenue and reputation.

Detecting and preventing unwanted sensitive data acquisition

While ensuring employees are aware of the threat of unwanted data acquisition will be a vital step in mitigating this risk, having technology in place to automatically ensure documents are sanitized is key to reducing unwanted data acquisition.

Clearswift’s SECURE Web Gateway (SWG) can inspect all content being downloaded from, and uploaded to, the web. By using lexical analysis capabilities together with Clearswift’s redaction and sanitization technology, hidden sensitive information and metadata can be automatically detected and removed from documents as they are uploaded or downloaded. Whether by searching file uploads for key watermarks that indicate sensitive data or by understanding the content, the data leak can be identified and stopped and appropriate action taken, so that a sanitized and safe document is uploaded and published. For organizations that already have a web solution in place but lack advanced redaction and sanitization features, Clearswift’s SECURE ICAP Gateway can integrate with existing web infrastructure to bolt on these features and enhance the solution that’s already in place.
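
As a rough illustration of the general idea of scanning content for watermark phrases and sensitive patterns, the Python sketch below flags marker phrases and email addresses in extracted text and replaces the addresses with a placeholder. It is not Clearswift’s implementation; the marker phrases, the single email pattern and the function names are all assumptions chosen to keep the example short.

```python
import re

# Hypothetical watermark phrases that suggest a document is sensitive.
MARKERS = ("confidential", "internal use only")

# One illustrative pattern; a real policy would cover many more data types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

PLACEHOLDER = "[REDACTED]"


def scan_text(text: str) -> list:
    """Return (kind, value) pairs for each marker phrase or email address found."""
    findings = []
    lowered = text.lower()
    for marker in MARKERS:
        if marker in lowered:
            findings.append(("marker", marker))
    for match in EMAIL.finditer(text):
        findings.append(("email", match.group()))
    return findings


def redact_text(text: str) -> str:
    """Replace every email address with a placeholder, leaving other text intact."""
    return EMAIL.sub(PLACEHOLDER, text)


if __name__ == "__main__":
    sample = "CONFIDENTIAL: send the invoice to jane@example.com"
    print(scan_text(sample))
    print(redact_text(sample))
```

Simple pattern matching like this produces false positives and misses context, which is why it is only a starting point next to content-aware inspection.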

Depending on the content within the document, Clearswift’s web solutions can mobilize Data Redaction, part of Adaptive Redaction (AR), which, if required, automatically detects and redacts unwanted sensitive information before it is brought into (‘acquired by’) the corporate network or uploaded to websites and collaboration applications. The Document Sanitization feature within Clearswift’s Adaptive Redaction technology ensures details such as revision changes, author information and software versions are automatically removed from documents, helping organizations adhere to information security regulations and protecting them against unwanted data acquisition.
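
To show in general terms what removing author, revision and software details from a file involves, the Python sketch below rewrites a .docx archive with those property values cleared and every other part copied unchanged. It is an illustration under stated assumptions, not the Document Sanitization feature itself: the function names and sample file are hypothetical, and tracked changes and comments stored in the document body are not touched.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
EP = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"

# Keep the usual prefixes when the XML is written back out.
ET.register_namespace("cp", CP)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

# Archive part -> qualified element names whose text should be cleared.
TARGETS = {
    "docProps/core.xml": [f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy", f"{{{CP}}}revision"],
    "docProps/app.xml": [f"{{{EP}}}Application", f"{{{EP}}}AppVersion"],
}


def sanitize_docx(data: bytes) -> bytes:
    """Return a copy of the .docx bytes with selected property values emptied."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename in TARGETS:
                root = ET.fromstring(payload)
                for tag in TARGETS[item.filename]:
                    for node in root.iter(tag):
                        node.text = None  # keep the element, drop its value
                payload = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, payload)
    return out.getvalue()


def build_sample_docx() -> bytes:
    """Build a stripped-down, hypothetical .docx for demonstration only."""
    core = (
        f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
        "<dc:creator>Jane Example</dc:creator>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
        archive.writestr("word/document.xml", "<doc>hello</doc>")
    return buffer.getvalue()


if __name__ == "__main__":
    cleaned = sanitize_docx(build_sample_docx())
    with zipfile.ZipFile(io.BytesIO(cleaned)) as archive:
        print(archive.read("docProps/core.xml").decode("utf-8"))
```

Clearing the values rather than deleting the parts keeps the archive structure intact, so the file should still open normally. Re-serializing XML with ElementTree can rename prefixes it has not been told about, so a production tool would need more careful handling than this sketch.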

So, if you’re looking to enhance data protection and mitigate data loss risks, chat to our team and ask for a demonstration of Clearswift technology.

More Information:

Adaptive Data Loss Prevention

Clearswift SECURE Web Gateway

Clearswift SECURE ICAP Gateway

Adaptive Redaction

Document Sanitization & Data Redaction