7 best practices and privacy considerations to follow when choosing a CDN provider
The problem is, because they are so plug-and-play, they are overlooked and end up not being good citizens on the web. I believe this happens because CDN providers only provide the basic service so as not to be too opinionated about how it should function, believing that if a developer wants the CDN to behave in a certain way, they'll build it themselves. Other providers do offer to address these issues, but at a higher cost.
Some of the issues I'm thinking about are:
- SSL - this should be on by default in most cases.
- Cookie free - the CDN domain should have no cookies.
- Robots.txt - Search engines treat the CDN domain like any other website and will index it. The SEO effect will not be felt on your website, since the CDN has a different domain name. Since there is no benefit, it's good to use the robots.txt file to block the entire domain.
- Public access - if a file is private it should not be on a public CDN, even if you give it a random file name.
- Remove a file from the CDN when you're done with it. Do not leave obsolete files hanging around, especially if they were user uploaded. This is a common problem with avatar images: they are never removed when the user uploads a new one or deletes their account. This causes privacy problems, as their photo can end up in Google image results linked to a profile page that is long gone, with no way for them to remove it.
- Related to this, if you need to replace a file on a CDN, upload the new file under a different name and delete the old file. Don't just overwrite it, as CDNs serve files with strong client-side caching directives, so if you overwrite the file it's likely that many users will continue to see the old cached version.
- Private files should not be on a CDN, but in a private blob store with restricted URL access. For example, a user uploads their CV. You can still take advantage of a blob store to offload the storage, backup, and download of the file (using shared-key-based URLs), and this allows you to control the data's location. CDNs will want to blindly replicate the file across the planet, which can break all kinds of data protection laws about data movement. Many domestic laws stipulate that you have to inform the user if their personal information is leaving the country, or the EU, let alone being distributed worldwide!
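To make the "shared-key-based URLs" idea from the last point a little more concrete, here is a minimal sketch of an expiring, signed download link. It is a generic HMAC scheme for illustration only, not the actual format used by any particular blob store; the names SECRET_KEY, make_signed_url and is_request_allowed, the files.example.com host and the query parameter names are all my own assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

# Hypothetical secret shared by the app and whatever serves the private files.
SECRET_KEY = b"replace-with-a-real-secret"


def sign_path(path, expires_at, key=SECRET_KEY):
    """Return a hex HMAC-SHA256 signature over the file path and expiry time."""
    message = f"{path}\n{expires_at}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_signed_url(base_url, path, ttl_seconds, now=None, key=SECRET_KEY):
    """Build a download URL that is only valid for ttl_seconds."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    query = urlencode({"expires": expires_at, "sig": sign_path(path, expires_at, key)})
    return f"{base_url}{path}?{query}"


def is_request_allowed(path, expires_at, signature, now=None, key=SECRET_KEY):
    """Check the expiry first, then compare signatures in constant time."""
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False
    expected = sign_path(path, expires_at, key)
    return hmac.compare_digest(expected, signature)
```

With something like this, the CV link handed to the user stops working after a few minutes, and changing the path or the expiry in the URL invalidates the signature. Real blob stores ship their own version of this, so in practice you would use theirs rather than roll your own.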
These are all issues I've observed in the use of CDNs in the systems of other products I've reviewed. These issues come up a lot, especially the file deletion issue.
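Since the replace-and-delete advice is the one I see ignored most often, here is a rough sketch of one way to approach it: derive the file name from the file's content, so a new upload always gets a new name, and work out which old name needs deleting. The function names versioned_name and plan_replacement are hypothetical, and the actual upload and delete calls are left out because they depend on your provider.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(original_name, content, digest_chars=12):
    """Insert a short content hash before the extension, e.g. alice.<hash>.png."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    path = PurePosixPath(original_name)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def plan_replacement(current_name, original_name, new_content):
    """Return (names_to_upload, names_to_delete) for a file replacement.

    current_name is the name presently on the CDN, or None if there is none.
    """
    new_name = versioned_name(original_name, new_content)
    if new_name == current_name:
        # Same content, same name: nothing to upload and nothing to delete.
        return [], []
    deletes = [] if current_name is None else [current_name]
    return [new_name], deletes
```

The point of the design is that the old name never gets reused, so a stale cached copy can't be mistaken for the new file, and the delete list gives you something to act on rather than leaving the old avatar behind.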
On a technical note, when dealing with CDNs it's important to check that blobs (files) can be written to the root container (folder). This is because 'verification' files, like the ones used by Google Webmaster Tools, and the robots.txt file must all be at the root of the domain to be accepted. If the CDN cannot add items to the root, they won't work. Azure, for example, allows you to create a specially named container called '$root' which is then treated as the root, so I can access example.com/robots.txt rather than example.com/$root/robots.txt.
Which CDN provider do you like to use, and how does it measure up against these best practices? What else do you look for when picking a CDN provider? Share your thoughts in the comments below.
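For reference, the robots.txt that blocks the whole CDN domain is only two lines. The sketch below holds that content in a string and uses Python's standard urllib.robotparser to sanity-check that it really does block everything before you upload it to the root. The helper name blocks_all_crawlers and the cdn.example.com URL are just placeholders of mine.

```python
from urllib.robotparser import RobotFileParser

# The file to place at the root of the CDN domain.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def blocks_all_crawlers(robots_txt, sample_url):
    """Return True if the given robots.txt forbids any crawler from fetching sample_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", sample_url)
```

Calling blocks_all_crawlers(ROBOTS_TXT, "https://cdn.example.com/avatars/alice.png") should come back True. Bear in mind that robots.txt only asks well-behaved crawlers to stay away, so it is no substitute for deleting files you no longer need.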