I don't think this is part of the model. It's a control layer above the actual LLM that interrupts the response when the LLM mentions any of the banned names. So if you prompt the LLM directly, without that control layer, you still get full responses.
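A minimal sketch of how such a control layer could work, watching the streamed output and cutting it off once a banned name appears. Everything here is illustrative (the banned-name list, the stub `fake_llm_stream`, the refusal message), not the actual implementation:

```python
# Placeholder banned names -- purely illustrative.
BANNED_NAMES = {"Jane Doe", "John Roe"}

def fake_llm_stream():
    """Stand-in for the real model's token stream."""
    for chunk in ["Sure, ", "the person ", "you mean is ", "Jane Doe", ", who..."]:
        yield chunk

def filtered_response(stream):
    """Accumulate streamed chunks; abort with a refusal if a banned name shows up."""
    buffer = ""
    for chunk in stream:
        buffer += chunk
        # In a real system each safe chunk would already be sent to the user,
        # which is why the response appears to stop abruptly mid-answer.
        if any(name in buffer for name in BANNED_NAMES):
            return "I'm unable to produce a response."
    return buffer

print(filtered_response(fake_llm_stream()))
```

Because the check runs outside the model, bypassing the layer (prompting the model directly) yields the full, unfiltered text.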