How Content Filters Work

On the Other phone, SafetyMode includes optional content filters, which scan the screen to detect harmful content.

We offer four different content filters.

Nudity: This filter detects exposed skin on screen. The full setting is more sensitive and will flag content that includes bare stomachs; the moderate setting is less sensitive and focuses on sexual nudity.

Bullying and Harassment: This filter detects whether any of the text on screen is toxic or threatening. The full setting is more sensitive, so it will produce more false positives (flagging safe content) but will also catch more toxic content.

Graphic and Obscene Text: This filter detects whether any of the text on screen is toxic and graphic or obscene. This covers both sexual and gory content. As above, the full setting is more sensitive and will create more false positives.

Banned Words: This filter does not consider context; it is triggered whenever any word from a list of banned words appears on screen. You can edit this list in “SafetyMode Settings > Edit Banned Words List”. Please note that a banned word may occasionally trigger inside a proper noun that contains it, e.g. the address “17 Bellenden Road”. This filter can only be switched on or off.
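To see why an address like “17 Bellenden Road” can trip the filter, here is a minimal sketch of context-free substring matching. This is an illustration only: the function name and the word list are hypothetical, and SafetyMode’s real matching logic may differ.

```python
def contains_banned_word(text: str, banned: list[str]) -> bool:
    """Naive, context-free check: fires if any banned word
    appears anywhere in the text, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in banned)

# With a hypothetical banned word "end", the proper noun
# "Bellenden" contains it as a substring, so the whole
# address is flagged:
contains_banned_word("17 Bellenden Road", ["end"])  # True
```

Matching on whole words only (for example with a regex word boundary such as `\bend\b`) would avoid this class of false positive, at the cost of missing banned words deliberately hidden inside other text.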

Each of these filters can be set up to respond in three different ways.

Block: Whenever the filter is triggered, a pop-up will appear and block the content. The user must then go back to the previous screen; sometimes this means going back more than one page. The parent can disable “Block” for five minutes using their parental password.

Save image: When enabled, this takes a screenshot of what was on screen just before the content filter was triggered. You can then view the screenshot in the “History” tab within SafetyMode Settings.

Notify: When enabled, this emails the parent (at their Google account) whenever the content filter is triggered. No content ever leaves the phone; the email simply tells you that a filter was triggered and gives the name of that filter. The email is sent from notifications@safetymode.com.

Disclaimer: As with all AI, our filters are not 100% accurate. All of the models behind our filters score over 90% accuracy on industry benchmarks, but they will make mistakes. There will be moments when content that is not harmful is flagged as harmful, and (admittedly much less often) times when harmful content goes undetected. We are constantly improving the models, but we still encourage parents to only allow access to apps they are particularly worried about when they feel ready. There are risks associated with all apps, and our content filters simply provide an additional guardrail.

Privacy:

We care deeply about the privacy of our users, and we realise that this is incredibly important to parents. That is why all of our AI runs “on-device”: the entire model is downloaded to the phone, and every scan runs on the phone itself. None of the screen data ever leaves the phone or is saved to any external server.

Further, every image and piece of text that is scanned but does not trigger a content filter is immediately deleted. The only time information is saved is when Save image is turned on, and even then the image can only be viewed on the device itself, as that is where it is stored.

Any information that is configured remotely (e.g. settings and location) does leave the device and is stored securely on Google’s servers. Some other data is anonymised and used by us to improve the quality of the service via remote updates. A full list can be found in the privacy policy.