Thursday, October 15, 2020


MongoDB 4.4 delivers the features and enhancements users have asked for most. MongoDB describes 4.4 as a "user-driven" release; it builds on the MongoDB 4.x version family and is positioned as a platform for future data processing.

MongoDB 4.4 enables you to:
Build transactional, operational, and analytical applications more efficiently. As requirements evolve, you can flexibly define and refine data distribution at any time, so that applications can scale globally while delivering the latency, resilience, and security controls you need anywhere in the world.
As the annual major release, MongoDB 4.4 officially reached GA on July 30. Previous major versions each shipped a heavyweight feature, such as Change Streams and causal consistency in 3.6, multi-document transactions in 4.0, and distributed transactions in 4.2. Version 4.4, by contrast, is more of a maintenance release, and one that users have long been waiting for. MongoDB officially calls this release "User-Driven Engineering", indicating that the new version focuses on improving the pain points users raised most often.
MongoDB 4.4 new features and functions
After MongoDB 3.0 added support for the WiredTiger engine, the project sprinted for several years; in 4.4 it finally paused to polish the details. Version 4.4 ships many new features, and this article highlights the ones users care about most.
Scalability and performance enhancement
Hidden Indexes
Hidden indexes are a feature built jointly by Alibaba Cloud MongoDB and MongoDB after the two reached a strategic partnership. Maintaining too many indexes degrades write performance, but business complexity often means that the engineers who operate MongoDB do not dare to drop an index that looks inefficient: dropping the wrong one could cause performance jitter for the business, and rebuilding an index is often very expensive.
Hidden indexes resolve this dilemma for DBAs. An existing index can be hidden through the collMod command so that subsequent queries no longer use it. Once a period of observation confirms that the business is unaffected, the index can be dropped safely.
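// Hide the index named 'key_1' on 'testcoll' from the query planner.
db.runCommand( {
   collMod: 'testcoll',
   index: {
      name: 'key_1',   // identify the index by name (keyPattern expects a key document such as { key: 1 })
      hidden: true     // set to false to unhide the index again
   }
})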
Note that a hidden index is only invisible to MongoDB's query planner. Hiding does not change the other behaviors of the index itself, such as unique key constraints or TTL expiration.
A hidden index is still updated when new data is written, so unhiding it makes the index usable again immediately.
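The hide, observe, then drop-or-unhide workflow can be sketched as plain command documents. This is a minimal sketch rather than an official helper: hiddenIndexCommand is a hypothetical function name, and testcoll and key_1 are the placeholder names used in this section. In a mongo shell, each returned document would be passed to db.runCommand().

```javascript
// Hypothetical helper (not part of MongoDB): builds the collMod command
// document that hides or unhides an index, addressed by its name.
function hiddenIndexCommand(collection, indexName, hidden) {
  if (typeof hidden !== "boolean") {
    throw new TypeError("hidden must be true or false");
  }
  return { collMod: collection, index: { name: indexName, hidden: hidden } };
}

// Step 1: hide the suspect index so the query planner stops using it.
const hideCmd = hiddenIndexCommand("testcoll", "key_1", true);

// Step 2 (only if queries regress): unhide it. The index was maintained
// while hidden, so it is usable again without a rebuild.
const unhideCmd = hiddenIndexCommand("testcoll", "key_1", false);

console.log(JSON.stringify(hideCmd));
console.log(JSON.stringify(unhideCmd));
```

The 4.4 shell also provides db.collection.hideIndex() and db.collection.unhideIndex(), which wrap the same command.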
Refinable Shard Keys
Anyone who runs a MongoDB sharded cluster knows how important it is to choose a good shard key, because the shard key determines whether the cluster scales well under a given workload. In practice, however, even a carefully chosen shard key can end up producing jumbo chunks, or directing business traffic to a single shard, as the workload changes.
In version 4.0 and earlier, neither the shard key nor its value could be changed. In version 4.2 the value of the shard key became modifiable, but changing it migrates data across shards and relies on distributed transactions, so the performance overhead is high, and it still cannot fully solve jumbo chunks or access hotspots. For example, consider an order table with the shard key {customer_id: 1}. In the early stage of the business, no customer has many orders, and this shard key is fully adequate. As the business grows, one large customer accumulates more and more orders, and access to that customer's orders becomes a hotspot on a single shard. Because orders are inherently tied to customer_id, modifying customer_id cannot fix the uneven access.
For scenarios like this, 4.4 lets you add one or more suffix fields to the existing shard key through the refineCollectionShardKey command to improve the distribution of existing documents across chunks. In the order scenario above, refineCollectionShardKey can change the shard key to {customer_id: 1, order_id: 1} to avoid the access hotspot on a single shard.
Note that the refineCollectionShardKey command is very cheap. It only changes metadata on the config server and does not require any data migration, because adding a suffix does not change how data is distributed across the existing chunks. The data is spread out gradually afterwards by the normal automatic chunk split and migration process. In addition, a shard key must be backed by an index, so refineCollectionShardKey requires the index for the new shard key to be created in advance.
Because not all existing documents contain the newly added suffix field(s), 4.4 implicitly supports "missing shard key" fields; that is, a newly inserted document may omit a shard key field. The author does not recommend relying on this, because it easily produces jumbo chunks.
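The two steps described above (create the supporting index first, then refine the key) can be sketched as command documents. This is a minimal sketch under stated assumptions: refineShardKeyPlan is a hypothetical helper, and the namespace test.orders is an illustrative name that does not come from the article. The helper only checks that the existing key stays as the prefix of the new key; the commands themselves are createIndexes (run against the collection's database) and refineCollectionShardKey (run against the admin database).

```javascript
// Hypothetical helper: validates that newKey keeps oldKey as its prefix and
// returns the two command documents in the order they should be run.
function refineShardKeyPlan(namespace, oldKey, newKey) {
  const oldFields = Object.keys(oldKey);
  const newFields = Object.keys(newKey);
  if (newFields.length <= oldFields.length) {
    throw new Error("the refined key must add at least one suffix field");
  }
  oldFields.forEach((field, i) => {
    if (newFields[i] !== field || newKey[field] !== oldKey[field]) {
      throw new Error("the existing shard key must remain the prefix");
    }
  });
  // "db.collection" -> collection name (everything after the first dot).
  const collection = namespace.split(".").slice(1).join(".");
  // Conventional index name, e.g. "customer_id_1_order_id_1".
  const indexName = Object.entries(newKey)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
  return [
    // Step 1: the index backing the new shard key must exist beforehand.
    { createIndexes: collection, indexes: [{ key: newKey, name: indexName }] },
    // Step 2: metadata-only change; run with db.adminCommand().
    { refineCollectionShardKey: namespace, key: newKey },
  ];
}

const plan = refineShardKeyPlan(
  "test.orders",                      // assumed namespace, for illustration
  { customer_id: 1 },                 // current shard key
  { customer_id: 1, order_id: 1 }     // refined shard key
);
console.log(JSON.stringify(plan, null, 2));
```

Rejecting a key that reorders or drops the original fields mirrors the rule that only suffix fields can be added.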
Compound Hashed Shard Keys
In versions prior to 4.4, only a single-field hashed shard key could be specified, because MongoDB did not support compound hashed indexes at the time. As a result, collection data could easily end up unevenly distributed across shards.
Version 4.4 supports compound hashed indexes: a compound index may contain a single hashed field, which can be the prefix or the suffix, with no restriction on its position. Compound hashed shard keys are supported as well.
// Hashed field as the suffix of the compound shard key.
sh.shardCollection(
  "examples.compoundHashedCollection",
  {"region_id": 1, "city_id": 1, "field1": "hashed"}
)
 
sh.shardCollection(
  "examples.compoundHashedCollection",
  {"_id": "hashed", "fieldA": 1}
)
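The "single hashed field, any position" rule can be checked on a key pattern before it is passed to sh.shardCollection(). The following is a small sketch, not MongoDB's own validation: describeHashedKey is a hypothetical helper that only counts hashed fields and reports where the hashed field sits.

```javascript
// Hypothetical helper: reports the position of the hashed field in a key
// pattern and rejects patterns with more than one hashed field.
function describeHashedKey(keyPattern) {
  const fields = Object.keys(keyPattern);
  const hashed = fields.filter((field) => keyPattern[field] === "hashed");
  if (hashed.length > 1) {
    throw new Error("a compound hashed key may contain only one hashed field");
  }
  if (hashed.length === 0) return "no hashed field";
  if (fields.length === 1) return "single hashed field";
  const position = fields.indexOf(hashed[0]);
  if (position === 0) return "hashed prefix";
  if (position === fields.length - 1) return "hashed suffix";
  return "hashed middle";
}

// The two key patterns from the shardCollection examples in this section.
console.log(describeHashedKey({ region_id: 1, city_id: 1, field1: "hashed" }));
console.log(describeHashedKey({ _id: "hashed", fieldA: 1 }));
```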
This feature helps in scenarios such as the following two:
  • Laws and regulations require the data to stay in a certain region, so MongoDB zone sharding is used, and within that region the data should be spread as evenly as possible across multiple shards.
  • The value of the collection's shard key increases monotonically. Take the shard key {customer_id: 1, order_id: 1} from the earlier example: if customer_id is incremental and the business always accesses the newest customers' data, most of the traffic always hits a single shard.
Without compound hashed shard keys, the application has to compute the hash of the required field in advance, store it in a dedicated field in the document, and then use range sharding with that precomputed hash field plus the other fields as the shard key.
In 4.4, the problem is solved simply by declaring the required field as hashed. For the second scenario above, setting the shard key to {customer_id: 'hashed', order_id: 1} is enough, which greatly simplifies the business logic.
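The hotspot in the second scenario can be illustrated with a toy simulation. Everything here is an assumption made for illustration: four shards, fixed range boundaries of 1,000 ids each, a simple FNV-1a string hash standing in for MongoDB's real hash function, and routing by hash modulo shard count (MongoDB actually assigns ranges of hash values to chunks). The sketch only shows the shape of the effect, not real cluster behavior.

```javascript
const SHARDS = 4;

// Toy 32-bit FNV-1a hash over the decimal string of the id.
// Stand-in only; this is not the hash MongoDB uses for hashed indexes.
function toyHash(value) {
  let h = 0x811c9dc5;
  for (const ch of String(value)) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Range sharding on customer_id: each shard owns a contiguous block of
// 1,000 ids, and the last shard owns everything from 3000 upward.
function rangeShard(customerId) {
  return Math.min(Math.floor(customerId / 1000), SHARDS - 1);
}

// Hashed sharding (simplified): route by hash value instead of raw id.
function hashedShard(customerId) {
  return toyHash(customerId) % SHARDS;
}

// Counts how many of the given ids each shard receives.
function countPerShard(ids, route) {
  const counts = new Array(SHARDS).fill(0);
  for (const id of ids) counts[route(id)] += 1;
  return counts;
}

// The 100 newest customers, with monotonically increasing ids.
const newestCustomers = Array.from({ length: 100 }, (_, i) => 3000 + i);

const rangeCounts = countPerShard(newestCustomers, rangeShard);
const hashedCounts = countPerShard(newestCustomers, hashedShard);

console.log("range :", rangeCounts);
console.log("hashed:", hashedCounts);
```

With range routing, all 100 of the newest ids land on the last shard, while the hashed routing spreads them over several shards; the exact split depends on the stand-in hash.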
