Mask Sensitive Fields

Minimum MongoDB Version: 4.4    (due to use of $rand operator)

Scenario情形

You want to perform irreversible masking on the sensitive fields of a collection of credit card payments, ready to provide the output data set to a 3rd party for analysis, without exposing sensitive information to the 3rd party. 您希望对信用卡付款集合的敏感字段执行不可逆的屏蔽,准备将输出数据集提供给第三方进行分析,而不会向第三方暴露敏感信息。The specific changes that you need to make to the payments' fields are:您需要对付款字段进行的具体更改包括:

Sample Data Population样本数据总体

Drop any old version of the database (if it exists) and then populate a new payments collection with 2 credit card payment documents, containing sensitive data:删除数据库的任何旧版本(如果存在),然后用包含敏感数据的2个信用卡付款文档填充新的payments集合:

db = db.getSiblingDB("book-mask-sensitive-fields");
db.dropDatabase();

// Insert records into the payments collection将记录插入付款集合
db.payments.insertMany([
  {
    "card_name": "Mrs. Jane A. Doe",
    "card_num": "1234567890123456",
    "card_expiry": ISODate("2023-08-31T23:59:59Z"),
    "card_sec_code": "123",
    "card_type": "CREDIT",        
    "transaction_id": "eb1bd77836e8713656d9bf2debba8900",
    "transaction_date": ISODate("2021-01-13T09:32:07Z"),
    "transaction_amount": NumberDecimal("501.98"),
    "reported": false,
    "customer_info": {
      "category": "RESTRICTED",
      "rating": 89,
      "risk": 3,
    },
  },
  {
    "card_name": "Jim Smith",
    "card_num": "9876543210987654",
    "card_expiry": ISODate("2022-12-31T23:59:59Z"),
    "card_sec_code": "987",
    "card_type": "DEBIT",        
    "transaction_id": "634c416a6fbcf060bb0ba90c4ad94f60",
    "transaction_date": ISODate("2020-11-24T19:25:57Z"),
    "transaction_amount": NumberDecimal("64.01"),
    "reported": true,
    "customer_info": {
      "category": "NORMAL",
      "rating": 78,
      "risk": 55,
    },
  },
]);

Aggregation Pipeline聚合管道

Define a pipeline ready to perform the aggregation:定义准备执行聚合的管道:

var pipeline = [
  // Replace a subset of fields with new values用新值替换字段子集
  {"$set": {
    // Extract the last word from the name ,从名称中提取最后一个单词, eg: 'Doe' from 'Mrs. Jane A. Doe'
    "card_name": {"$regexFind": {"input": "$card_name", "regex": /(\S+)$/}},
          
    // Mask card num 1st part retaining last 4 chars,掩码卡编号保留最后4个字符的第一部分, eg: '1234567890123456' -> 'XXXXXXXXXXXX3456'
    "card_num": {"$concat": [
                  "XXXXXXXXXXXX",
                  {"$substrCP": ["$card_num", 12, 4]},
                ]},                     

    // Add/subtract a random time amount of a maximum of 30 days (~1 month) each-way每次最多加/减30天(1个月)的随机时间量
    "card_expiry": {"$add": [
                     "$card_expiry",
                     {"$floor": {"$multiply": [{"$subtract": [{"$rand": {}}, 0.5]}, 2*30*24*60*60*1000]}},
                   ]},                     

    // Replace each digit with random digit,用随机数字替换每个数字, eg: '133' -> '472'
    "card_sec_code": {"$concat": [
                       {"$toString": {"$floor": {"$multiply": [{"$rand": {}}, 10]}}},
                       {"$toString": {"$floor": {"$multiply": [{"$rand": {}}, 10]}}},
                       {"$toString": {"$floor": {"$multiply": [{"$rand": {}}, 10]}}},
                     ]},
                     
    // Add/subtract a random percent of the amount's value up to 10% maximum each-way每次加/减金额的随机百分比,最高可达10%
    "transaction_amount": {"$add": [
                            "$transaction_amount",
                            {"$multiply": [{"$subtract": [{"$rand": {}}, 0.5]}, 0.2, "$transaction_amount"]},
                          ]},
                          
    // Retain field's bool value 80% of time on average, setting to the opposite value 20% of time平均80%的时间保留字段的布尔值,20%的时间设置为相反的值
    "reported": {"$cond": {
                   "if":   {"$lte": [{"$rand": {}}, 0.8]},
                   "then": "$reported",
                   "else": {"$not": ["$reported"]},
                }},      

    // Exclude sub-doc if the sub-doc's category field's value is 'RESTRICTED'如果子单据的类别字段的值为“RESTRICTED”,则排除子单据
    "customer_info": {"$cond": {
                        "if":   {"$eq": ["$customer_info.category", "RESTRICTED"]}, 
                        "then": "$$REMOVE",     
                        "else": "$customer_info",
                     }},                                         
                
    // Mark _id field to excluded from results_id字段标记为从结果中排除
    "_id": "$$REMOVE",                
  }},
  
  // Take regex matched last word from the card name and prefix it with hardcoded value从卡片名称中提取regex匹配的最后一个单词,并在其前面加上硬编码值
  {"$set": {
    "card_name": {"$concat": ["Mx. Xxx ", {"$ifNull": ["$card_name.match", "Anonymous"]}]},                       
  }},
];

Execution执行

Execute the aggregation using the defined pipeline and also view its explain plan:使用定义的管道执行聚合,并查看其解释计划:

db.payments.aggregate(pipeline);
db.payments.explain("executionStats").aggregate(pipeline);

Expected Results预期结果

Two documents should be returned, corresponding to the original two source documents, but this time with many of their fields redacted and obfuscated, plus the customer_info embedded document omitted for one record due to it having been marked as RESTRICTED, as shown below:应返回两个文档,对应于原始的两个源文档,但这一次,它们的许多字段被编辑和模糊,加上一条记录中省略了customer_info嵌入文档,因为它被标记为RESTRICTED,如下所示:

[
  {
    card_name: 'Mx. Xxx Doe',
    card_num: 'XXXXXXXXXXXX3456',
    card_expiry: ISODate('2023-08-31T23:29:46.460Z'),
    card_sec_code: '295',
    card_type: 'CREDIT',
    transaction_id: 'eb1bd77836e8713656d9bf2debba8900',
    transaction_date: ISODate('2021-01-13T09:32:07.000Z'),
    transaction_amount: NumberDecimal('492.4016988351474881660000000000000'),
    reported: false
  },
  {
    card_name: 'Mx. Xxx Smith',
    card_num: 'XXXXXXXXXXXX7654',
    card_expiry: ISODate('2023-01-01T00:34:49.330Z'),
    card_sec_code: '437',
    card_type: 'DEBIT',
    transaction_id: '634c416a6fbcf060bb0ba90c4ad94f60',
    transaction_date: ISODate('2020-11-24T19:25:57.000Z'),
    transaction_amount: NumberDecimal('58.36081337486762223600000000000000'),
    reported: false,
    customer_info: { category: 'NORMAL', rating: 78, risk: 55 }
  }
]

Observations观察