WO2016146005A1 - Method and device for correcting attribute values of commodity background attribute - Google Patents

Method and device for correcting attribute values of commodity background attribute

Info

Publication number
WO2016146005A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
background
items
shareable
subset
Prior art date
Application number
PCT/CN2016/075938
Other languages
French (fr)
Chinese (zh)
Inventor
曹阳
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2016146005A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of computer communication technologies, and in particular, to a method and apparatus for correcting attribute values of background attributes of commodities.
  • the background attribute of the goods is important information describing the goods.
  • the background attributes of commodities affect how well search, shopping guide, recommendation and other products perform when presented to consumers, so the background attributes of commodities are important.
  • the attribute value is an example.
  • the existing method for correcting the attribute value of the background attribute of a commodity is to discover problems manually, by manual sampling or user reports, and then urge the merchant or the platform's operations staff to make corrections by hand, such as supplementing missing values and correcting wrong ones.
  • the present invention provides a method and apparatus for correcting an attribute value of a background attribute of an item, which can automatically modify the attribute value of the background attribute of the item, without requiring manual completion, and can improve the modification efficiency.
  • the present invention discloses a method for correcting an attribute value of a background attribute of an item, the method comprising:
  • the N items are divided into M shareable background attribute product subsets; wherein, the M is a natural number, and the M is smaller than the N;
  • Each of the original attribute values of the same type of background attribute of all of the items included in each of the subsets of the shareable background attribute items is modified to the corrected attribute value.
  • the identifier of each of the commodities includes:
  • the picture corresponding to each of the commodities includes:
  • a main display image corresponding to each of the commodities, a supplementary display image corresponding to each of the commodities, a style color display image corresponding to each of the commodities, or a detail display image corresponding to each of the commodities.
  • dividing the N items into M subsets of shareable background attribute products including:
  • determining, according to the number of occurrences of each of the original attribute values of the same type of background attribute of all the commodities included in each of the shareable background attribute item subsets, the corrected attribute value of the same type of background attribute of all of the items included in each of the shareable background attribute item subsets includes:
  • the original attribute value is a corrected attribute value of a background attribute corresponding to a certain original attribute value of all the commodities included in the current subset of the shareable background attribute.
  • determining, according to the number of occurrences of each of the original attribute values of the same type of background attribute of all the commodities included in each of the shareable background attribute item subsets, the corrected attribute value of the same type of background attribute of all of the items included in each of the shareable background attribute item subsets includes:
  • modifying each of the original attribute values of the same type of background attributes of all the items included in each of the subsets of the shareable background attribute items into the corrected attribute values comprises:
  • Each of the original attribute values of the same type of background attribute of all of the items included in the subset of the currently shareable background attribute items is modified to a corrected attribute value.
  • determining whether the number of the commodities included in the current subset of the shareable background attribute items is greater than a preset modification quantity threshold.
  • the next subset of the shareable background attribute items after the current shareable background attribute item subset is used as the current shareable background attribute item subset, and the step of determining whether the number of the commodities included in the current subset of the shareable background attribute items is greater than the preset modification quantity threshold is performed;
  • the method further includes:
  • determining, according to the number of occurrences of each of the original attribute values of the same type of background attribute of all the commodities included in each of the shareable background attribute item subsets, the corrected attribute value of the same type of background attribute of all of the items included in each of the shareable background attribute item subsets includes:
  • determining, by each of the parallel computing computers, according to the number of occurrences of each of the original attribute values of the same type of background attribute of all the items included in each of the subsets of the shareable background attribute items in that parallel computing computer, the corrected attribute value of the background attribute of the same type of all of the items included in each of those subsets;
  • modifying each of the original attribute values of the same type of background attributes of all the items included in each of the subsets of the shareable background attribute items into the corrected attribute values comprises:
  • modifying, by each of the parallel computing computers, each of the original attribute values of the same type of background attribute of all of the items included in each shareable background attribute item subset in that parallel computing computer to the corrected attribute value.
  • the present invention also discloses an apparatus for correcting an attribute value of a background attribute of an item, the apparatus comprising:
  • An obtaining module configured to obtain an identifier of each of the N items, wherein the N is a natural number
  • a dividing module configured to divide the N items into M sharable background attribute product subsets according to an identifier of each of the commodities; wherein, the M is a natural number, and the M is smaller than the N;
  • a statistics module configured to count the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each of the shareable background attribute commodity subsets
  • a determining module configured to determine, according to the number of occurrences of each of the original attribute values of the background attribute of the same type of all the commodities included in each of the shareable background attribute item subsets, the corrected attribute value of the background attribute of the same type of all of the items included in each of the shareable background attribute item subsets;
  • a modifying module configured to modify each of the original attribute values of the same type of background attributes of all the items included in each of the subsets of the shareable background attribute items to the corrected attribute values.
  • the identifier of each of the commodities includes:
  • the picture corresponding to each of the commodities includes:
  • a main display image corresponding to each of the commodities, a supplementary display image corresponding to each of the commodities, a style color display image corresponding to each of the commodities, or a detail display image corresponding to each of the commodities.
  • the dividing module includes:
  • a building unit configured to construct a binary group (two-tuple) for each of the N commodities, wherein the first element of the binary group is the identifier of the commodity, and the other element of the binary group consists of the identity of the commodity together with each background attribute of the commodity and the original attribute value of that background attribute;
  • a sorting unit configured to sort all the binary groups according to the first element, and group the binary groups having the same first element together to form M binary group sets, wherein each binary group set represents a subset of the shareable background attribute items.
  • the determining module includes:
  • a first processing unit configured to use the first subset of the sharable background attribute items in the subset of the sharable background attribute items as a subset of the current shareable background attribute items;
  • a calculating unit configured to divide the number of occurrences of each of the original attribute values of the same type of background attribute of all the commodities included in the current subset of the shareable background attribute items by the total number of occurrences of all the original attribute values of that background attribute in the current subset, to obtain the distribution ratio of each of the original attribute values of the same type of background attribute of all the commodities included in the current subset of the shareable background attribute items;
  • a comparing unit configured to compare the distribution ratio of each of the original attribute values of the same type of background attribute of all the commodities included in the current subset of the shareable background attribute items with a preset modification ratio threshold;
  • an attribute value determining unit configured to: if, in the background attribute of the same type of all the commodities included in the current subset of the shareable background attribute items, there is an original attribute value whose distribution ratio is greater than the preset modification ratio threshold, determine that original attribute value as the corrected attribute value of the corresponding background attribute of all the commodities included in the current subset of the shareable background attribute items.
  • the statistics module includes:
  • a second processing unit configured to use the first subset of the sharable background attribute items in the M subset of the shareable background attribute items as the current subset of the shareable background attribute items;
  • a first determining unit configured to determine whether the number of the commodities included in the current subset of the shareable background attribute items is greater than a preset modification quantity threshold;
  • a statistical unit configured to: if greater than the preset modification quantity threshold, count the number of occurrences of each of the original attribute values of the same type of background attribute of all the commodities included in the current subset of the shareable background attribute items;
  • the determining module comprises:
  • a current determining unit configured to determine, according to the number of occurrences of each of the original attribute values of the same type of background attribute of all the commodities included in the current subset of the shareable background attribute items, the corrected attribute value of the same type of background attribute of all of the items included in the current subset of the shareable background attribute items;
  • the modifying module comprises:
  • a current modification unit configured to modify each of the original attribute values of the same type of background attributes of all the items included in the current subset of the shareable background attribute items to a corrected attribute value.
  • the statistics module further includes:
  • a second determining unit configured to determine, if less than or equal to the preset modified quantity threshold, whether the current shareable background attribute product subset is the Mth shareable background attribute product subset;
  • a notification unit configured to, if it is not the Mth subset of the shareable background attribute items, use the next subset of the shareable background attribute items after the current shareable background attribute item subset as the current shareable background attribute item subset, and notify the first determining unit to perform the step of determining whether the number of the items included in the current shareable background attribute item subset is greater than the preset modification quantity threshold;
  • the ending unit is configured to end if it is the Mth subset of the shareable background attribute items.
  • the device further includes:
  • a mapping module configured to map an identifier of each of the N items into an integer;
  • a remainder calculation module configured to take the remainder of the integer corresponding to each of the commodities modulo a preset number P of parallel computing computers; wherein, the P is a natural number;
  • an allocation module configured to allocate each of the commodities to the parallel computing computer whose number corresponds to the remainder;
  • the dividing module comprises: P dividing units, each of the dividing units being respectively disposed in each of the parallel computing computers;
  • the P dividing units are configured to divide the N items into M subsets of the shareable background attribute products according to an identifier of each of the commodities in each of the parallel computing computers;
  • the statistics module includes: P statistics units, each of the statistics units being respectively disposed in each of the parallel computing computers;
  • Each of the statistics units is configured to count the number of occurrences of each of the original attribute values of the same type of background attribute of all of the items included in each of the subsets of the shareable background attribute items in each of the parallel computing computers;
  • the determining module includes: P determining units, each of the determining units being respectively disposed in each of the parallel computing computers;
  • Each of the determining units is configured to determine, according to the number of occurrences of each of the original attribute values of the background attribute of the same type of all of the items included in each of the subsets of the shareable background attribute items in each of the parallel computing computers, the corrected attribute value of the background attribute of the same type of all of the items included in each of those subsets;
  • the modifying module includes: P modifying units, each of the modifying units being respectively disposed in each of the parallel computing computers;
  • Each of the modifying units is configured to modify each of the original attribute values of the same type of background attribute of all of the items included in each of the subsets of the shareable background attribute items in each of the parallel computing computers to the corrected attribute value.
  • the present invention can obtain the following technical effects:
  • according to the identifier of each item, the N items are divided into M subsets of shareable background attribute items; according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in each subset, the corrected attribute value of that background attribute is determined for each subset; and each original attribute value of the same type of background attribute of all items included in each subset is modified to the corrected attribute value. The attribute values of the background attributes of items can thus be modified automatically, without manual work, which improves the efficiency of modification.
  • FIG. 1 is a flow chart of a first method for correcting attribute values of background attributes of an item according to an embodiment of the present invention
  • FIG. 2 is a flow chart of a second method for correcting attribute values of background attributes of an item according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a third method for correcting attribute values of background attributes of an item according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a fourth method for correcting attribute values of background attributes of an item according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of an apparatus for correcting attribute values of background attributes of an item according to an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of an apparatus for correcting attribute values of background attributes of an item according to an embodiment of the present invention.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • computer readable media does not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.
  • if a first device is coupled to a second device, the first device can be directly electrically coupled to the second device, or indirectly electrically coupled to the second device through other devices or coupling means.
  • a flowchart of a method for correcting attribute values of background attributes of an item according to an embodiment of the present invention includes:
  • S101 Acquire an identifier of each of the N items; wherein N is a natural number.
  • the N items may be all the items in the one or more trading platforms, or may be all the items in the same category in the one or more trading platforms, and are not specifically limited.
  • the method is applicable to any commodity.
  • the identifier of each product includes: a link address of a picture corresponding to each item, a content signature of a picture corresponding to each item (such as MD5 or other specially designed image signature, etc.), or a item number of each item.
  • the picture corresponding to each item includes: the main display picture corresponding to each item (which may be called the main picture), the supplementary display pictures corresponding to each item (there may be several), the style and color display picture corresponding to each item (the SKU picture), or the detail display picture corresponding to each item (the detail picture), and so on.
  • the main display image of a product must comply with strict specifications: it must show the product completely and cannot contain irrelevant information (that is, the main display image is highly relevant to the product).
  • the shareable background attribute relationship between products that is established from repeated references to the same main display image is therefore more reliable. It is accordingly preferable to use the link address of the main display image corresponding to the product, or the content signature of that main display image, as the identifier of the product.
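As an illustration of the content-signature identifier mentioned above, the following minimal sketch computes an MD5 digest of an image's raw bytes. The function name `content_signature` and the idea of hashing the raw file bytes are assumptions made for illustration; the source only says that MD5 or another specially designed image signature may be used.

```python
import hashlib


def content_signature(image_bytes: bytes) -> str:
    """Return the MD5 hex digest of the raw image bytes (hypothetical helper).

    Two items that reference byte-identical main display images get the
    same signature and can therefore share one identifier.
    """
    return hashlib.md5(image_bytes).hexdigest()
```

Note that a byte-level digest only matches exact copies of a file; a re-encoded or resized copy of the same picture would need a perceptual image signature, which is outside this sketch.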
  • S102 Divide the N items into M subsets of the shareable background attribute according to the identifier of each item; wherein M is a natural number and M is less than N.
  • the N items are divided into M subsets of the shareable background attribute items, including:
  • a binary group is constructed for it, denoted Pair_K: key_K – nid_K, <pid_0, vid_{K,0}>, <pid_1, vid_{K,1}>.
  • key_K is the identifier of commodity K (for example, the link address of the main display image corresponding to commodity K) and is the first element of the binary group; the other element of the binary group is composed of the following members: nid_K (the identity ID of commodity K), <pid_0, vid_{K,0}>, <pid_1, vid_{K,1}>...
  • the items in a binary group set have the same identifier (for example, the same main display picture). They therefore very likely represent the same item, and the attribute values of their background attributes should also be consistent.
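The division step can be sketched as follows: build one binary group per commodity, sort by the first element, and collect groups with equal first elements into one set. The tuple layout `(key, (nid, [(pid, vid), ...]))` and the function name `divide_into_subsets` are illustrative assumptions, not names from the source.

```python
from itertools import groupby
from operator import itemgetter


def divide_into_subsets(pairs):
    """Group binary groups that share the same first element (the identifier).

    pairs: iterable of (key, payload), where payload is assumed to be
           (nid, [(pid, vid), ...]).
    Returns a dict mapping each key to the list of payloads sharing it;
    each dict entry stands for one shareable background attribute subset.
    """
    ordered = sorted(pairs, key=itemgetter(0))  # sort by first element
    return {
        key: [payload for _, payload in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }
```

Sorting first is what lets `itertools.groupby` gather equal keys, since it only merges adjacent elements; Python's sort is stable, so items keep their input order inside each subset.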
  • S104 Determine, according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in each of the shareable background attribute item subsets, the corrected attribute value of the same type of background attribute of all items included in each of the shareable background attribute item subsets.
  • the original attribute value that occurs most often may be determined, according to the number of occurrences, as the corrected attribute value of the same type of background attribute of all the items included in each subset of the shareable background attribute items.
  • the value of the original attribute that has the most occurrences is not necessarily the value of the correction attribute.
  • determining, according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in each of the shareable background attribute item subsets, the corrected attribute value of the same type of background attribute of all items included in each subset includes:
  • S104a The first shareable background attribute item subset of the M shareable background attribute items subset is used as the current shareable background attribute item subset.
  • S104b Divide the number of occurrences of each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset by the total number of occurrences of all the original attribute values of that background attribute in the current subset, to obtain the distribution ratio of each original attribute value of the same type of background attribute of all the items included in the current shareable background attribute item subset.
  • S104c Compare the distribution ratio of each original attribute value of the same type of background attribute of all the items included in the current shareable background attribute item subset with the preset modification ratio threshold. If, in the background attribute of the same type of all items included in the current shareable background attribute item subset, there is an original attribute value whose distribution ratio is greater than the preset modification ratio threshold, S104d is performed; if there is no such original attribute value, S104f is performed.
  • the preset modification ratio threshold may be set according to actual application conditions, for example, a monitoring penalty product having higher reliability requirements, and the preset modification ratio threshold may be set to 75% or higher to reduce the inspection.
  • S104d Determine that original attribute value as the corrected attribute value of the corresponding background attribute of all items included in the current shareable background attribute item subset.
  • the original attribute value with the largest distribution ratio may be selected as the correction attribute value.
  • S104e Modify each original attribute value of the background attribute corresponding to a certain original attribute value of all items included in the current shareable background attribute item subset to a corrected attribute value, and then execute S104g.
  • S104f Determine not to modify each original attribute value of the background attribute corresponding to a certain original attribute value of all the commodities included in the current shareable background attribute item subset, and then execute S104g.
  • S104g Determine whether the currently shareable background attribute product subset is the Mth shareable background attribute product subset, if not, execute S104h; otherwise, end.
  • S104h The next shareable background attribute product subset of the current shareable background attribute item subset is taken as the current shareable background attribute item subset, and then S104b is performed.
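The ratio test in S104b to S104d can be sketched for a single background attribute of a single subset as below. The function name `corrected_value`, the default threshold of 0.75 (taken from the 75% example above), and the choice to ignore missing values when counting are assumptions for illustration.

```python
from collections import Counter


def corrected_value(original_values, ratio_threshold=0.75):
    """Return the corrected attribute value for one background attribute,
    or None when no original value is dominant enough (the S104f branch).

    original_values: the original attribute values present in one subset;
                     missing values are assumed to be left out by the caller.
    """
    counts = Counter(original_values)
    total = sum(counts.values())
    if total == 0:
        return None
    # The value with the largest distribution ratio is the only candidate.
    value, count = counts.most_common(1)[0]
    return value if count / total > ratio_threshold else None
```

For example, four occurrences of one value and one of another give a ratio of 0.8, which exceeds 0.75, so the dominant value is returned; an even split returns `None` and the subset is left unmodified.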
  • S105 Modify each original attribute value of the same type of background attribute of all items included in each subset of the shareable background attribute items to a corrected attribute value.
  • modifying each original attribute value of the same type of background attribute of all items included in each subset of the shareable background attribute items to the corrected attribute value specifically means: if an item's original attribute value for that background attribute is the same as the corrected attribute value, it is retained; if the original attribute value differs from the corrected attribute value, it is corrected; if the original attribute value is missing, it is supplemented.
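The three cases of S105 (retain, correct, supplement) can be sketched as follows. The data layout, a dict from item ID to a dict of attribute values, and the name `apply_correction` are hypothetical.

```python
def apply_correction(items, pid, corrected):
    """Apply one corrected attribute value to every item of a subset.

    items: dict mapping nid -> dict of {pid: vid} (assumed layout).
    Mutates items in place and returns the action taken per nid.
    """
    actions = {}
    for nid, attrs in items.items():
        original = attrs.get(pid)
        if original is None:
            attrs[pid] = corrected          # missing value: supplement it
            actions[nid] = "supplemented"
        elif original == corrected:
            actions[nid] = "retained"       # already consistent: keep it
        else:
            attrs[pid] = corrected          # inconsistent value: correct it
            actions[nid] = "corrected"
    return actions
```

Returning the per-item action is a design choice of this sketch; it makes it easy to audit what an automatic run changed.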
  • S103-S105 may include:
  • S201 The first shareable background attribute product subset of the M shareable background attribute item subsets is used as the current shareable background attribute product subset.
  • S202 Determine whether the number of items included in the current shareable background attribute item subset is greater than a preset modification quantity threshold; if it is greater than the preset modification quantity threshold, execute S203; if it is less than or equal to the preset modification quantity threshold, execute S206.
  • statistics are performed only when a subset of the shareable background attribute items contains a certain number of items; for example, if a subset contains only one item, there is neither a basis nor a need for modification.
  • the preset modification quantity threshold may be set according to the actual application status, for example, it may be set to 2, 20, and the like.
  • S203 Count the number of occurrences of each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset.
  • S204 Determine, according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset, the corrected attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset.
  • S205 Modify each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset to a corrected attribute value.
  • S206 Determine whether the current shareable background attribute item subset is the Mth shareable background attribute item subset; if it is not the Mth shareable background attribute item subset, execute S207; if it is the Mth shareable background attribute item subset, end.
  • S207 The next shareable background attribute item subset after the current shareable background attribute item subset is taken as the current shareable background attribute item subset, and then S202 is performed.
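The loop S201 to S207 can be sketched as a single pass over the subsets that skips any subset not larger than the quantity threshold. The function name `correct_subsets`, the list-of-dicts layout, and the use of the most frequent value as the corrected value in S204 are assumptions; the source allows other ways of deriving the corrected value from the counts.

```python
from collections import Counter


def correct_subsets(subsets, pid, quantity_threshold=2):
    """Correct one background attribute across all subsets.

    subsets: list of subsets, each a list of {pid: vid} dicts (assumed layout).
    Returns the number of attribute values that were changed or supplemented.
    """
    modified = 0
    for subset in subsets:                        # S201, S206, S207
        if len(subset) <= quantity_threshold:     # S202: too few items
            continue
        values = [item[pid] for item in subset if pid in item]  # S203
        if not values:
            continue
        # S204: most frequent value taken as corrected value (assumption).
        corrected = Counter(values).most_common(1)[0][0]
        for item in subset:                       # S205
            if item.get(pid) != corrected:
                item[pid] = corrected
                modified += 1
    return modified
```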
  • S102-S105 may include:
  • S301 Map an identifier of each of the N items to an integer.
  • $HC_L = HC_{L-1} \times Z + c_L$.
  • Z is any prime number, usually set to 31; character c is represented by its ASCII code (integer).
  • mapping the identifier of a commodity to an integer is not limited to the above method, and may be implemented in any feasible manner, which is not specifically limited.
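The recurrence above, with Z = 31 and each character taken as its code, can be sketched as below. The starting value of 0 for the empty prefix and the function name `identifier_to_int` are assumptions, since the source does not state the initial term.

```python
def identifier_to_int(identifier: str, z: int = 31) -> int:
    """Map an identifier string to an integer using HC_L = HC_{L-1} * Z + c_L.

    The initial value HC_0 = 0 is an assumption of this sketch.
    """
    hc = 0
    for ch in identifier:
        hc = hc * z + ord(ch)  # ord() gives the ASCII code for ASCII input
    return hc
```

For the two-character identifier `"ab"` this gives 97 × 31 + 98 = 3105. Python integers do not overflow, so long identifiers produce large values; an implementation in a fixed-width language would wrap around instead.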
  • S302 Take the remainder of the integer corresponding to each commodity modulo the preset number P of parallel computing computers; wherein P is a natural number.
  • the P parallel computing computers are numbered 0 to P-1.
  • each commodity is allocated to the parallel computing computer whose number equals the remainder, modulo P, of the HC corresponding to its key. In this way, all the items to be processed are distributed substantially evenly across the P parallel computing computers, which is equivalent to pre-partitioning the task by remainder.
  • S304 Each parallel computing computer divides the commodities allocated to it according to their identifiers, so that together the P computers divide the N commodities into M shareable background attribute commodity subsets.
  • The N commodities have been allocated to the P parallel computing computers according to their identifiers (the number of commodities on each parallel computing computer is less than N, and the numbers of commodities on the P parallel computing computers sum to N). Each parallel computing computer divides its commodities into shareable background attribute commodity subsets in the same way as a single computer would, as follows:
  • Each parallel computing computer constructs a two-tuple for each of its commodities, where the first element of the two-tuple is the identifier of the commodity and the other element comprises the identity of the commodity, its background attributes, and the original attribute values of those background attributes. All the two-tuples are sorted by the first element, and two-tuples with the same first element are grouped together to form a plurality of two-tuple sets (the number of two-tuple sets obtained by each parallel computing computer is less than M, and the numbers of two-tuple sets obtained by the P parallel computing computers sum to M), where each two-tuple set represents one shareable background attribute commodity subset.
  • Because commodities with the same identifier map to the same integer and therefore to the same computer, they are assigned to the same shareable background attribute commodity subset regardless of whether the sort is performed over the whole data set or locally on each computer after the split.
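The sort-and-group division described above can be sketched on a single computer as follows. The function name `partition_into_subsets` and the input layout (identifier, item identity, attribute dictionary) are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

def partition_into_subsets(items):
    """Divide items into subsets that share an identifier.

    items -- iterable of (identifier, item_id, attributes) triples (assumed layout)
    Returns a list of subsets; each subset is a list of (item_id, attributes).
    """
    # Build one two-tuple per item: (identifier, (identity, attributes)).
    tuples = [(identifier, (item_id, attrs)) for identifier, item_id, attrs in items]
    # Sort by the first element only; the sort is stable, so items with the
    # same identifier keep their input order.
    tuples.sort(key=itemgetter(0))
    # Adjacent two-tuples with equal first elements form one subset.
    return [
        [payload for _, payload in group]
        for _, group in groupby(tuples, key=itemgetter(0))
    ]
```

`groupby` only merges adjacent equal keys, which is why the sort must come first.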
  • S305 Each parallel computing computer counts the number of occurrences of each original attribute value of the same type of background attribute of all commodities included in each shareable background attribute commodity subset on that computer.
  • S306 Each parallel computing computer determines, according to those occurrence counts, the corrected attribute value of the same type of background attribute of all commodities included in each shareable background attribute commodity subset on that computer.
  • S307 Each parallel computing computer modifies each original attribute value of the same type of background attribute of all commodities included in each shareable background attribute commodity subset on that computer to the corrected attribute value.
  • In the method for correcting attribute values of background attributes of commodities according to this embodiment, the N commodities are divided into M shareable background attribute commodity subsets according to the identifier of each commodity; the corrected attribute value of the same type of background attribute of all commodities included in each subset is determined according to the number of occurrences of each original attribute value of that type of background attribute in the subset; and each such original attribute value is modified to the corrected attribute value. The attribute values of the background attributes of commodities can thus be modified automatically, without manual work, which improves modification efficiency.
  • Parallel modification by the P parallel computing computers can greatly accelerate the operation and further improve modification efficiency.
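The parallel flow can be sketched in one process as follows, with worker threads standing in for the P parallel computing computers. This is a simplification rather than the deployment the embodiment describes. The names `process_shard`, `run_parallel`, and `key_fn` are hypothetical, grouping uses a dictionary instead of the sort described above purely for brevity, unfilled values are assumed not to be counted, and the ratio test is a simple "share of occurrences greater than a threshold" rule.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

def process_shard(shard, attribute, ratio_threshold):
    """Work done by one 'computer': group, count, decide, modify in place."""
    groups = defaultdict(list)
    for identifier, item in shard:
        groups[identifier].append(item)  # one group per shared identifier
    for members in groups.values():
        counts = Counter(m[attribute] for m in members if m.get(attribute))
        total = sum(counts.values())
        if total == 0:
            continue  # nothing filled in, so nothing to propagate
        value, n = counts.most_common(1)[0]
        if n / total > ratio_threshold:
            for m in members:
                m[attribute] = value
    return shard

def run_parallel(items, p, attribute, ratio_threshold, key_fn):
    """Split (identifier, item) pairs over P workers by key_fn(identifier) mod P."""
    shards = [[] for _ in range(p)]
    for identifier, item in items:
        shards[key_fn(identifier) % p].append((identifier, item))
    with ThreadPoolExecutor(max_workers=p) as pool:
        # list() waits for every shard and re-raises any worker exception.
        list(pool.map(lambda s: process_shard(s, attribute, ratio_threshold), shards))
    return items
```

Each shard holds a disjoint set of identifiers, so the workers never touch the same item and need no locking.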
  • Referring to FIG. 5, which is a structural diagram of an apparatus for correcting attribute values of background attributes of an item according to an embodiment of the present invention, the apparatus includes:
  • the obtaining module 401 is configured to obtain an identifier of each of the N items, where N is a natural number;
  • the dividing module 402 is configured to divide the N items into M subsets of the sharable background attribute according to the identifier of each item; wherein, M is a natural number, and M is less than N;
  • a statistics module 403 configured to count the number of occurrences of each original attribute value of a background attribute of the same type of all items included in each subset of the shareable background attribute items;
  • a determining module 404 configured to determine, according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in each shareable background attribute item subset, the corrected attribute value of the same type of background attribute of all items included in each shareable background attribute item subset;
  • the modifying module 405 is configured to modify each of the original attribute values of the same type of background attributes of all items included in each subset of the shareable background attribute items to a corrected attribute value.
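  • the identifier of each item includes:
  • a link address of the picture corresponding to each item, a content signature of the picture corresponding to each item, or the item number of each item.
  • the picture corresponding to each item includes:
  • the main display picture corresponding to each item, the supplementary display picture corresponding to each item, the style and color-number display picture corresponding to each item, or the detail display picture corresponding to each item.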
  • the dividing module 402 includes:
  • a building unit for constructing a two-tuple for each of the N items, wherein the first element of the two-tuple is the identifier of the item, and the other element comprises the identity of the item, the background attributes of the item, and the original attribute values of those background attributes;
  • a sorting unit for sorting all the two-tuples by the first element and grouping two-tuples with the same first element together to form M two-tuple sets, wherein each two-tuple set represents one shareable background attribute item subset.
  • the determining module includes:
  • a first processing unit configured to use the first shareable background attribute product subset of the M shareable background attribute item subsets as a current shareable background attribute product subset;
  • a calculating unit configured to compute the ratio of the number of occurrences of each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset to the total number of occurrences of all original attribute values of that type of background attribute in the subset, thereby obtaining the distribution ratio of each original attribute value;
  • a comparison unit configured to compare the obtained distribution ratio of each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset with a preset modification ratio threshold;
  • an attribute value determining unit configured to, if the distribution ratio of some original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset is greater than the preset modification ratio threshold, determine that original attribute value to be the corrected attribute value of the corresponding background attribute for all items included in the current shareable background attribute item subset.
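The work of the calculating, comparison, and attribute value determining units can be illustrated with a short Python sketch. The function name `corrected_value` is hypothetical, and two points are assumptions rather than statements from the text: unfilled (empty) values are not counted as original attribute values, and when the threshold is low enough for several values to exceed it, the most frequent one is returned.

```python
from collections import Counter

def corrected_value(values, ratio_threshold):
    """Return the value whose share of occurrences exceeds the threshold, else None.

    values          -- original attribute values of one attribute within one subset
    ratio_threshold -- preset modification ratio threshold, e.g. 0.6
    """
    counts = Counter(v for v in values if v)  # assumption: skip unfilled values
    total = sum(counts.values())
    if total == 0:
        return None
    for value, n in counts.most_common():  # highest count first
        if n / total > ratio_threshold:  # distribution ratio vs. threshold
            return value
    return None
```

With a threshold of 0.5 or higher at most one value can qualify, so the tie-breaking assumption only matters for lower thresholds.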
  • the statistics module 403 includes:
  • a second processing unit configured to use the first shareable background attribute product subset of the M shareable background attribute item subsets as a current shareable background attribute product subset;
  • a first determining unit configured to determine whether the number of items included in the current shareable background attribute item subset is greater than a preset modification quantity threshold;
  • a statistical unit configured to count, if greater than the preset modified quantity threshold, the number of occurrences of each original attribute value of the same type of background attribute of all commodities included in the current shareable background attribute commodity subset;
  • the determining module 404 includes:
  • a current determining unit configured to determine, according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset, the corrected attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset;
  • the modification module 405 includes:
  • the statistics module 403 further includes:
  • a second determining unit configured to determine, if less than or equal to the preset modified quantity threshold, whether the current shareable background attribute product subset is the Mth shareable background attribute commodity subset;
  • a notification unit configured to, if the current subset is not the Mth shareable background attribute item subset, take the next shareable background attribute item subset after the current one as the current shareable background attribute item subset, and then notify the first determining unit to perform the step of determining whether the number of items included in the current shareable background attribute item subset is greater than the preset modification quantity threshold;
  • an ending unit configured to end the process if the current subset is the Mth shareable background attribute item subset.
  • the apparatus further includes:
  • a mapping module 406 configured to map the identifier of each of the N items to an integer;
  • a remainder calculation module 407 configured to take the remainder of the integer corresponding to each item modulo the preset number P of parallel computing computers, where P is a natural number;
  • an allocation module 408 configured to allocate each item to the parallel computing computer whose number corresponds to the remainder;
  • the dividing module 402 includes: P dividing units 402a, each of which is respectively disposed in each parallel computing computer;
  • the P dividing units 402a are configured to together divide the N items into M shareable background attribute item subsets, each according to the identifiers of the items on its parallel computing computer;
  • the statistics module 403 includes: P count statistics units 403a, each of which is disposed in a respective parallel computing computer;
  • each count statistics unit 403a is configured to count the number of occurrences of each original attribute value of the same type of background attribute of all items included in each shareable background attribute item subset on its parallel computing computer;
  • the determining module 404 includes: P determining units 404a, each determining unit is respectively disposed in each parallel computing computer;
  • each determining unit 404a is configured to determine, according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in each shareable background attribute item subset on its parallel computing computer, the corrected attribute value of the same type of background attribute of all items included in each such subset;
  • the modifying module 405 includes: P modifying units 405a, each of which is disposed in a respective parallel computing computer;
  • each modifying unit 405a is configured to modify each original attribute value of the same type of background attribute of all items included in each shareable background attribute item subset on its parallel computing computer to the corrected attribute value.
  • In the apparatus for correcting attribute values of background attributes of items according to this embodiment, the N items are divided into M shareable background attribute item subsets according to the identifier of each item; the corrected attribute value of the same type of background attribute of all items included in each subset is determined according to the number of occurrences of each original attribute value of that type of background attribute in the subset; and each such original attribute value is modified to the corrected attribute value. The attribute values of the background attributes of items can thus be modified automatically, without manual work, which improves modification efficiency.
  • Parallel modification by the P parallel computing computers can greatly accelerate the operation and further improve modification efficiency.
  • The apparatus corresponds to the method flow described above; for details not covered here, refer to the description of the method flow, which is not repeated.

Abstract

A method and device for correcting attribute values of commodity background attribute, which belong to the field of computer communication technologies. The method comprises: acquiring a commodity identifier of each of a number N of the commodities (101); dividing a number N of the commodities into a number M of the shareable background attribute commodity subsets (102); counting the number of occurrences of each original attribute value of the same type of background attributes of all the commodities included in each shareable background attribute commodity subset (103); according to the number of occurrences of each original attribute value of the same type of background attributes of all the commodities included in each shareable background attribute commodity subset, determining corrected attribute values of the same type of background attributes of all the commodities included in each shareable background attribute commodity subset (104); and modifying each original attribute value of the same type of background attributes of all the commodities included in each shareable background attribute commodity subset, as the corrected attribute value (105).

Description

Method and apparatus for correcting attribute values of background attributes of commodities
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201510119332.6, filed on March 18, 2015 and entitled "Method and apparatus for correcting attribute values of background attributes of commodities", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of computer communication technologies, and in particular to a method and apparatus for correcting attribute values of background attributes of commodities.
Background
With the development of computer communication technologies, more and more merchants sell commodities over the network. When commodities are sold over the network, the background attributes of a commodity are important information describing it, and they affect how search, shopping-guide, recommendation, and similar products present the commodity to consumers, so the background attributes of a commodity are important. However, a large number of existing commodities have attribute values of background attributes that are missing or filled in incorrectly. Take the background attribute "style" of the women's bag category as an example (this attribute has more than thirty attribute values, such as shell bag, Cambridge bag, and bowling bag). Suppose a consumer searches with the keyword "shell bag", clicks the "shell bag" label in a shopping-guide path, or expects the recommendation system to recommend more shell bags. If a commodity is in fact a shell bag but the attribute value of its style attribute was left unfilled, the commodity is not shown to the consumer, causing a missed recall. If a commodity on the platform is in fact of another style (such as a tote bag) but the attribute value of its style attribute was wrongly filled in as shell bag, the commodity is wrongly presented to the consumer, causing a false recall. Missed recalls caused by missing attribute values give consumers the impression that the platform's commodities are not abundant; false recalls caused by wrong attribute values give consumers the impression that the platform's search, shopping-guide, or recommendation products are inaccurate. Therefore, the attribute values of the background attributes of commodities need to be supplemented, corrected, or otherwise modified periodically.
The existing method for correcting attribute values of background attributes of commodities is to discover problems manually, for example through manual spot checks or user reports, and then to urge merchants or platform operations staff to supplement, correct, or otherwise modify the values by hand.
However, the existing method for correcting attribute values of background attributes of commodities relies mainly on manual work and is very inefficient.
Summary of the invention
In order to solve the problems of the prior art, the present invention provides a method and apparatus for correcting attribute values of background attributes of commodities, which can modify the attribute values automatically, without relying on manual work, and can thereby improve modification efficiency.
In order to solve the above problems, the present invention discloses a method for correcting attribute values of background attributes of commodities, the method comprising:
obtaining an identifier of each of N commodities, where N is a natural number;
dividing the N commodities into M shareable background attribute commodity subsets according to the identifier of each commodity, where M is a natural number and M is less than N;
counting the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset;
determining, according to the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset, the corrected attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset;
modifying each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset to the corrected attribute value.
Further, the identifier of each commodity includes:
a link address of the picture corresponding to each commodity, a content signature of the picture corresponding to each commodity, or an item number of each commodity.
Further, the picture corresponding to each commodity includes:
the main display picture corresponding to each commodity, the supplementary display picture corresponding to each commodity, the style and color-number display picture corresponding to each commodity, or the detail display picture corresponding to each commodity.
Further, dividing the N commodities into M shareable background attribute commodity subsets according to the identifier of each commodity comprises:
constructing a two-tuple for each of the N commodities, where the first element of the two-tuple is the identifier of the commodity, and the other element comprises the identity of the commodity, the background attributes of the commodity, and the original attribute values of those background attributes;
sorting all the two-tuples by the first element, and grouping two-tuples with the same first element together to form M two-tuple sets, where each two-tuple set represents one shareable background attribute commodity subset.
Further, determining, according to the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset, the corrected attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset comprises:
taking the first of the M shareable background attribute commodity subsets as the current shareable background attribute commodity subset;
computing the ratio of the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in the current shareable background attribute commodity subset to the total number of occurrences of all original attribute values of that type of background attribute in the subset, to obtain the distribution ratio of each original attribute value;
comparing the obtained distribution ratio of each original attribute value of the same type of background attribute of all the commodities included in the current shareable background attribute commodity subset with a preset modification ratio threshold;
if the distribution ratio of some original attribute value of the same type of background attribute of all the commodities included in the current shareable background attribute commodity subset is greater than the preset modification ratio threshold, determining that original attribute value to be the corrected attribute value of the corresponding background attribute for all the commodities included in the current shareable background attribute commodity subset.
Further, counting the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset comprises:
taking the first of the M shareable background attribute commodity subsets as the current shareable background attribute commodity subset;
determining whether the number of commodities included in the current shareable background attribute commodity subset is greater than a preset modification quantity threshold;
if it is greater than the preset modification quantity threshold, counting the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in the current shareable background attribute commodity subset;
Correspondingly, determining, according to the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset, the corrected attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset comprises:
determining, according to the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in the current shareable background attribute commodity subset, the corrected attribute value of the same type of background attribute of all the commodities included in the current shareable background attribute commodity subset;
Correspondingly, modifying each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset to the corrected attribute value comprises:
modifying each original attribute value of the same type of background attribute of all the commodities included in the current shareable background attribute commodity subset to the corrected attribute value.
Further, after determining whether the number of commodities included in the current shareable background attribute commodity subset is greater than the preset modification quantity threshold, the method further comprises:
if it is less than or equal to the preset modification quantity threshold, determining whether the current shareable background attribute commodity subset is the Mth shareable background attribute commodity subset;
if it is not the Mth shareable background attribute commodity subset, taking the next shareable background attribute commodity subset after the current one as the current shareable background attribute commodity subset, and then performing the step of determining whether the number of commodities included in the current shareable background attribute commodity subset is greater than the preset modification quantity threshold;
if it is the Mth shareable background attribute commodity subset, ending.
Further, after obtaining the identifier of each of the N commodities, the method further comprises:
mapping the identifier of each of the N commodities to an integer;
taking the remainder of the integer corresponding to each commodity modulo a preset number P of parallel computing computers, where P is a natural number;
allocating each commodity to the parallel computing computer whose number corresponds to the remainder;
Correspondingly, dividing the N commodities into M shareable background attribute commodity subsets according to the identifier of each commodity comprises:
dividing, by the parallel computing computers together, the N commodities into the M shareable background attribute commodity subsets, each parallel computing computer acting according to the identifiers of the commodities on that computer;
Correspondingly, counting the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset comprises:
counting, by each parallel computing computer, the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset on that computer;
Correspondingly, determining, according to the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset, the corrected attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset comprises:
determining, by each parallel computing computer and according to the number of occurrences of each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset on that computer, the corrected attribute value of the same type of background attribute of all the commodities included in each such subset;
Correspondingly, modifying each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset to the corrected attribute value comprises:
modifying, by each parallel computing computer, each original attribute value of the same type of background attribute of all the commodities included in each shareable background attribute commodity subset on that computer to the corrected attribute value.
To solve the above problem, the present invention further discloses an apparatus for correcting attribute values of background attributes of items, the apparatus comprising:
an obtaining module, configured to obtain an identifier of each of N items, where N is a natural number;
a dividing module, configured to divide the N items into M shareable background attribute item subsets according to the identifier of each item, where M is a natural number and M is less than N;
a statistics module, configured to count the number of occurrences of each original attribute value of each same-type background attribute across all items in each shareable background attribute item subset;
a determining module, configured to determine the corrected attribute value of the same-type background attribute for all items in each shareable background attribute item subset, according to the number of occurrences of each original attribute value of that background attribute across all items in the subset;
a modifying module, configured to modify each original attribute value of the same-type background attribute of all items in each shareable background attribute item subset to the corrected attribute value.
Further, the identifier of each item includes:
the link address of a picture corresponding to the item, a content signature of a picture corresponding to the item, or the article number of the item.
Further, the picture corresponding to each item includes:
the main display picture of the item, a supplementary display picture of the item, a style/color display picture of the item, or a detail display picture of the item.
Further, the dividing module includes:
a construction unit, configured to construct a two-tuple for each of the N items, where the first element of the two-tuple is the identifier of the item, and the remaining elements are the identity (ID) of the item together with the item's background attributes and the original attribute values of those background attributes;
a sorting unit, configured to sort all the two-tuples by the first element and gather the two-tuples that have the same first element into M two-tuple sets, where each two-tuple set represents one shareable background attribute item subset.
Further, the determining module includes:
a first processing unit, configured to take the first of the M shareable background attribute item subsets as the current shareable background attribute item subset;
a calculating unit, configured to compute the ratio of the number of occurrences of each original attribute value of the same-type background attribute across all items in the current subset to the total number of occurrences of all original attribute values of that background attribute across all items in the current subset, obtaining the distribution ratio of each original attribute value of the same-type background attribute in the current subset;
a comparing unit, configured to compare the obtained distribution ratio of each original attribute value of the same-type background attribute in the current subset with a preset modification ratio threshold;
an attribute value determining unit, configured to, if some original attribute value of the same-type background attribute in the current subset has a distribution ratio greater than the preset modification ratio threshold, determine that original attribute value as the corrected attribute value of the corresponding background attribute for all items in the current subset.
Further, the statistics module includes:
a second processing unit, configured to take the first of the M shareable background attribute item subsets as the current shareable background attribute item subset;
a first judging unit, configured to judge whether the number of items in the current subset is greater than a preset modification quantity threshold;
a statistics unit, configured to, if the number is greater than the preset modification quantity threshold, count the number of occurrences of each original attribute value of each same-type background attribute across all items in the current subset;
Correspondingly, the determining module includes:
a current determining unit, configured to determine the corrected attribute value of the same-type background attribute for all items in the current subset, according to the number of occurrences of each original attribute value of that background attribute across all items in the current subset;
Correspondingly, the modifying module includes:
a current modifying unit, configured to modify each original attribute value of the same-type background attribute of all items in the current subset to the corrected attribute value.
Further, the statistics module also includes:
a second judging unit, configured to, if the number is less than or equal to the preset modification quantity threshold, judge whether the current subset is the M-th shareable background attribute item subset;
a notification unit, configured to, if the current subset is not the M-th subset, take the subset that follows the current subset as the new current subset, and then notify the first judging unit to perform the step of judging whether the number of items in the current subset is greater than the preset modification quantity threshold;
an ending unit, configured to end the process if the current subset is the M-th shareable background attribute item subset.
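The subset-by-subset loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `subsets_to_correct` and the representation of a subset as a list of item IDs are hypothetical and do not come from the specification.

```python
from typing import Iterable, Iterator, List


def subsets_to_correct(subsets: Iterable[List[int]],
                       quantity_threshold: int) -> Iterator[List[int]]:
    """Walk the M subsets in order and yield only those that are large enough.

    A subset is passed on to counting and correction only if it holds more
    items than the preset modification quantity threshold; smaller subsets
    are skipped, and the loop ends after the last (M-th) subset.
    """
    for subset in subsets:
        if len(subset) > quantity_threshold:
            yield subset


# Hypothetical subsets of item IDs.
all_subsets = [[101], [102, 103, 104], [105, 106]]
selected = list(subsets_to_correct(all_subsets, quantity_threshold=2))
```

With a threshold of 2, only the three-item subset is selected; the one-item and two-item subsets are left unmodified.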
Further, the apparatus also includes:
a mapping module, configured to map the identifier of each of the N items to an integer;
a remainder calculation module, configured to compute the remainder of the integer corresponding to each item modulo a preset number P of parallel computing computers, where P is a natural number;
an allocation module, configured to allocate each item to the parallel computing computer whose number corresponds to the remainder;
Correspondingly, the dividing module includes P dividing units, one in each of the parallel computing computers;
the P dividing units are configured to jointly divide the N items into the M shareable background attribute item subsets, each according to the identifiers of the items on its own parallel computing computer;
Correspondingly, the statistics module includes P counting units, one in each of the parallel computing computers;
each counting unit is configured to count, for each shareable background attribute item subset on its parallel computing computer, the number of occurrences of each original attribute value of each same-type background attribute across all items in the subset;
Correspondingly, the determining module includes P determining units, one in each of the parallel computing computers;
each determining unit is configured to determine, for each shareable background attribute item subset on its parallel computing computer, the corrected attribute value of the same-type background attribute for all items in the subset, according to the number of occurrences of each original attribute value of that background attribute across all items in the subset;
Correspondingly, the modifying module includes P modifying units, one in each of the parallel computing computers;
each modifying unit is configured to modify, for each shareable background attribute item subset on its parallel computing computer, each original attribute value of the same-type background attribute of all items in the subset to the corrected attribute value.
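As a rough sketch of the mapping, remainder, and allocation modules, the Python below maps an identifier to an integer and takes it modulo P. The specification does not say how the integer is derived; using the first 8 bytes of an MD5 digest is an assumption made here so that the result is stable across processes (Python's built-in `hash()` is randomized per process for strings). The name `machine_for` is hypothetical.

```python
import hashlib


def machine_for(identifier: str, num_machines: int) -> int:
    """Return the number (0 .. P-1) of the machine that should hold an item.

    Assumption: the identifier is mapped to an integer via an MD5 digest;
    the specification only requires *some* identifier-to-integer mapping.
    """
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    as_integer = int.from_bytes(digest[:8], "big")
    return as_integer % num_machines


P = 4  # hypothetical number of parallel computing computers
first = machine_for("http://example.com/img/a.jpg", P)
second = machine_for("http://example.com/img/a.jpg", P)
```

Because the assignment depends only on the identifier, all items that share an identifier land on the same machine, so each shareable subset can be counted and corrected locally without exchanging data between machines.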
Compared with the prior art, the present invention can achieve the following technical effects:
1) The N items are divided into M shareable background attribute item subsets according to the identifier of each item; the corrected attribute value of each same-type background attribute is determined for all items in each subset according to the number of occurrences of each original attribute value; and each original attribute value is then modified to the corrected attribute value. The attribute values of the items' background attributes can therefore be corrected automatically, without manual work, which improves modification efficiency.
2) Performing the modification in parallel on P parallel computing computers greatly speeds up the computation and further improves modification efficiency.
Of course, a product implementing the present invention does not necessarily need to achieve all of the above technical effects at the same time.
Description of the Drawings
The drawings described here provide a further understanding of the present invention and form a part of it. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a first method for correcting attribute values of background attributes of items according to an embodiment of the present invention;
FIG. 2 is a flowchart of a second method for correcting attribute values of background attributes of items according to an embodiment of the present invention;
FIG. 3 is a flowchart of a third method for correcting attribute values of background attributes of items according to an embodiment of the present invention;
FIG. 4 is a flowchart of a fourth method for correcting attribute values of background attributes of items according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a first apparatus for correcting attribute values of background attributes of items according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a second apparatus for correcting attribute values of background attributes of items according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples, so that the process by which the invention applies technical means to solve technical problems and achieve technical effects can be fully understood and implemented.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Certain terms are used in the specification and claims to refer to particular components. Those skilled in the art will understand that hardware manufacturers may use different names for the same component. This specification and the claims distinguish components by differences in function, not by differences in name. The term "comprising" as used throughout the specification and claims is open-ended and should be interpreted as "including but not limited to". "Approximately" means within an acceptable error range, within which those skilled in the art can solve the technical problem and substantially achieve the technical effect. In addition, the term "coupled" covers any direct or indirect means of electrical coupling. Therefore, if the text describes a first device as coupled to a second device, the first device may be electrically coupled to the second device directly, or indirectly through other devices or coupling means. The following description sets out preferred embodiments of the invention; it is intended to illustrate the general principles of the invention and not to limit its scope. The scope of protection of the invention is defined by the appended claims.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that an item or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such an item or system. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the item or system that includes the element.
Description of Embodiments
The implementation of the method of the present invention is further described below with an embodiment. FIG. 1 is a flowchart of a method for correcting attribute values of background attributes of items according to an embodiment of the present invention. The method includes:
S101: Obtain an identifier of each of N items, where N is a natural number.
Specifically, the N items may be all items on one or more trading platforms, or all items under the same category on one or more trading platforms, and so on. This is not specifically limited; the method of this embodiment applies to any items.
The identifier of each item includes: the link address of a picture corresponding to the item, a content signature of a picture corresponding to the item (for example, MD5 or another specially designed image signature), or the article number of the item. The picture corresponding to each item includes: the item's main display picture (main picture for short), a supplementary display picture (there may be several), a style/color display picture (SKU picture), or a detail display picture (detail picture).
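To make the "content signature" option concrete, here is a minimal Python sketch that derives an identifier from the raw bytes of an image using MD5, which the text names as one possibility. The helper name `content_signature` and the sample byte strings are hypothetical; a production system might instead use a perceptual image signature, which this sketch does not attempt.

```python
import hashlib


def content_signature(image_bytes: bytes) -> str:
    """Return the hex MD5 digest of an image's raw bytes (hypothetical helper)."""
    return hashlib.md5(image_bytes).hexdigest()


# Placeholder byte strings standing in for real image files.
picture_of_item_a = b"\x89PNG placeholder bytes for one picture"
picture_of_item_b = b"\x89PNG placeholder bytes for one picture"
picture_of_item_c = b"\x89PNG placeholder bytes for another picture"

signature_a = content_signature(picture_of_item_a)
signature_b = content_signature(picture_of_item_b)
signature_c = content_signature(picture_of_item_c)
```

Items A and B carry byte-identical pictures and therefore receive the same identifier, while item C receives a different one. Note that an exact digest such as MD5 matches only byte-identical files; a re-encoded or resized copy of the same picture would produce a different signature.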
Note that if the pictures of two items, say item A and item B, are equal in some sense, for example they come from the same link address (A and B reference the same picture) or their content signatures match (the picture contents are identical), or if A and B have the same article number (the same style), then A and B are very likely the same product, and their background attributes should be consistent (shareable).
From the above it follows that item A is in the shareable-background-attribute relation with itself, so the relation is reflexive. If A is in the relation with B, then B is in the relation with A, so the relation is symmetric. If A and B reference the same main display picture, and that picture is also referenced as the main display picture by both B and C, then A and C are also in the relation, so the relation is transitive. By the definition used in discrete mathematics, a relation that satisfies these three conditions is an equivalence relation; the shareable-background-attribute relation between items is therefore an equivalence relation.
Note that, in general, an item's main display picture must follow strict rules: it must show the whole item and must not contain irrelevant information, so the main display picture is highly relevant to the item. A shareable-background-attribute relation established through repeated references to the same main display picture is therefore more reliable, and the link address or the content signature of the item's main display picture is preferably used as the item's identifier.
S102: Divide the N items into M shareable background attribute item subsets according to the identifier of each item, where M is a natural number and M is less than N.
Specifically, dividing the N items into M shareable background attribute item subsets according to the identifier of each item includes:
Construct a two-tuple for each of the N items, where the first element of the two-tuple is the identifier of the item, and the remaining elements are the identity (ID) of the item together with the item's background attributes and the original attribute values of those background attributes.
For example, for any item K among the N items, construct a two-tuple written as Pair_K: key_K – nid_K, <pid_0, vid_{K,0}>, <pid_1, vid_{K,1}>, …. In Pair_K, key_K is the identifier of item K (for example, the link address of the main display picture of item K) and is the first element of the two-tuple. The remaining elements consist of nid_K (the identity ID of item K) and <pid_0, vid_{K,0}>, <pid_1, vid_{K,1}>, … (the background attribute/attribute value pairs of item K, that is, each background attribute of item K and its attribute value). The pairs <pid_0, vid_{K,0}>, <pid_1, vid_{K,1}>, … might stand for, for example, <style - shell>, <garment length - short>, <toe - peep-toe>.
Sort all the two-tuples by the first element and gather the two-tuples that have the same first element into M two-tuple sets, where each two-tuple set represents one shareable background attribute item subset.
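The construct-sort-group procedure of S102 can be sketched as follows. The record layout `(key, nid, {pid: vid})`, the sample data, and the function name `partition_by_key` are assumptions chosen for illustration; the specification fixes only that the identifier is the first element and that tuples with equal first elements are gathered together.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records: (key, nid, {pid: vid}).
records = [
    ("img://a.jpg", 101, {"style": "shell", "length": "short"}),
    ("img://b.jpg", 102, {"style": "tote"}),
    ("img://a.jpg", 103, {"style": "shell", "length": "long"}),
]


def partition_by_key(rows):
    """Sort the tuples by their first element, then gather equal keys.

    Each resulting group corresponds to one shareable background
    attribute item subset.
    """
    ordered = sorted(rows, key=itemgetter(0))
    return {key: list(group)
            for key, group in groupby(ordered, key=itemgetter(0))}


subsets = partition_by_key(records)
```

Here the three records fall into M = 2 subsets: items 101 and 103 share the key `img://a.jpg`, and item 102 forms a subset on its own. Sorting first matters because `itertools.groupby` only merges adjacent elements with equal keys.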
S103: Count the number of occurrences of each original attribute value of each same-type background attribute across all items in each shareable background attribute item subset.
Specifically, because the items in a two-tuple set have the same identifier (for example, the same main display picture), they very likely represent the same product, and the attribute values of their background attributes should be consistent. However, because the items in a shareable background attribute item subset come from different sellers, their background attributes often differ in practice even though they ought to be the same. It is therefore necessary to count the number of occurrences of each original attribute value of each same-type background attribute across all items in each subset, and to determine from the counts the corrected attribute value of that background attribute for all items in the subset.
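A minimal sketch of the counting in S103, assuming each item in a subset is represented by a dictionary from background attribute to original attribute value (a hypothetical layout, as is the name `count_values`):

```python
from collections import Counter, defaultdict


def count_values(subset):
    """Count, per background attribute, how often each original value occurs.

    `subset` is a list of {attribute: value} dictionaries, one per item.
    Items that leave an attribute unfilled simply contribute nothing to
    that attribute's counts.
    """
    counts = defaultdict(Counter)
    for attributes in subset:
        for attribute, value in attributes.items():
            counts[attribute][value] += 1
    return counts


# Hypothetical subset of three listings of the same product.
subset = [
    {"style": "shell", "length": "short"},
    {"style": "shell", "length": "long"},
    {"style": "tote"},
]
counts = count_values(subset)
```

For this subset, the "style" attribute has "shell" twice and "tote" once, while "length" has "short" and "long" once each because the third seller left it blank.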
S104: Determine the corrected attribute value of the same-type background attribute for all items in each shareable background attribute item subset, according to the number of occurrences of each original attribute value of that background attribute across all items in the subset.
Specifically, the original attribute value with the most occurrences may be determined, according to the counts, as the corrected attribute value of the same-type background attribute for all items in the subset.
However, the most frequent original attribute value is not necessarily the correct one: for example, other sellers may not have filled in that type of background attribute at all, or the most frequent value may be one that sellers commonly fill in by mistake. For stability, an original attribute value may be determined as the corrected attribute value of the same-type background attribute for all items in the subset only when its distribution ratio reaches a certain modification ratio threshold. If an original attribute value occurs most often but its distribution ratio does not reach the threshold, the attribute values of that type of background attribute are considered too mixed to decide, and no modification is made.
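The threshold rule can be sketched as below. The function name `corrected_value` is hypothetical, and the sketch uses a strict "greater than" comparison, following the wording of the comparison step in the specification; the sample counts and thresholds are illustrative only.

```python
from collections import Counter
from typing import Optional


def corrected_value(value_counts: Counter,
                    ratio_threshold: float) -> Optional[str]:
    """Return the corrected attribute value, or None to leave values as-is.

    The most frequent original value is accepted only if its share of all
    occurrences exceeds the modification ratio threshold.
    """
    total = sum(value_counts.values())
    if total == 0:
        return None  # nobody filled in this attribute
    value, count = value_counts.most_common(1)[0]
    return value if count / total > ratio_threshold else None


dominant = corrected_value(Counter({"shell": 8, "tote": 2}), 0.75)
undecided = corrected_value(Counter({"shell": 5, "tote": 5}), 0.75)
```

In the first call "shell" accounts for 80% of the occurrences and is accepted; in the second no value exceeds 75%, so the attribute is left unmodified. Checking only the most frequent value is sufficient, because if it does not exceed the threshold, no other value can.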
具体地,为稳定性考虑,参见图2,根据每个可共享后台属性商品子集中包括的所有商品的同一类型的后台属性的每个原始属性值的出现次数,确定每个可共享后台属性商品子集中包括的所有商品的同一类型的后台属性的校正属性值,包括:Specifically, for stability consideration, referring to FIG. 2, each shareable background attribute item is determined according to the number of occurrences of each original attribute value of the same type of background attribute of all items included in each of the shareable background attribute item subsets. Corrected attribute values for the same type of background attribute for all items included in the subset, including:
S104a:将M个可共享后台属性商品子集中的第一个可共享后台属性商品子集作为当前可共享后台属性商品子集。S104a: The first shareable background attribute item subset of the M shareable background attribute items subset is used as the current shareable background attribute item subset.
S104b:将当前可共享后台属性商品子集中包括的所有商品的同一类型的后台属性的每个原始属性值的出现次数,与当前可共享后台属性商品子集中包括的所有商品的同一类型的后台属性的所有原始属性值的总的出现次数进行比例计算,得到当前可共享后台属性商品子集中包括的所有商品的同一类型的后台属性的每个原始属性值的分布比例。S104b: the number of occurrences of each original attribute value of the same type of background attribute of all items included in the current shareable background attribute item subset, and the same type of background attribute of all items included in the current shareable background attribute item subset The total number of occurrences of all the original attribute values is proportionally calculated, and the distribution ratio of each original attribute value of the same type of background attribute of all the items included in the current shareable background attribute item subset is obtained.
S104c:将得到的当前可共享后台属性商品子集中包括的所有商品的同一类型的后台属性的每个原始属性值的分布比例,与预设的修改比例阈值进行比较,如果得到的当前可共享后台属性商品子集中包括的所有商品的同一类型的后台属性中存在某个原始属性值的分布比例,大于预设的修改比例阈值,则执行S104d;如果得到的当前可共享后台属性商品子集中包括的所有商品的同一类型的后台属性中不存在某个原始属性值的分布 比例,大于预设的修改比例阈值,则执行S104f。S104c: Compare the distribution ratio of each original attribute value of the same type of background attribute of all the items included in the current share of the currently shareable background attribute item, and the preset modification ratio threshold, if the current shareable background is obtained If there is a distribution ratio of a certain original attribute value in the background attribute of the same type of all items included in the attribute item subset, which is greater than the preset modification ratio threshold, S104d is performed; if the obtained current shareable background attribute item subset is included The distribution of some original attribute value does not exist in the background attribute of the same type for all items If the ratio is greater than the preset modification ratio threshold, then S104f is performed.
Specifically, the preset modification ratio threshold can be set according to the actual application. For example, for monitoring-and-penalty products that demand higher reliability, the threshold can be set to 75% or higher to reduce the error rate of the check; for products that demand high supplementation/correction coverage, the requirement can be relaxed and the threshold set to, for example, 30%.
S104d: Determine that original attribute value to be the corrected attribute value of the corresponding background attribute for all items included in the current shareable background attribute item subset.
It should be noted that when two or more original attribute values exceed the preset modification ratio threshold, the one with the largest distribution ratio can be selected as the corrected attribute value.
S104e: Modify each original attribute value of that background attribute, for all items included in the current shareable background attribute item subset, to the corrected attribute value, and then execute S104g.
S104f: Determine not to modify any original attribute value of that background attribute for the items included in the current shareable background attribute item subset, and then execute S104g.
S104g: Determine whether the current shareable background attribute item subset is the M-th shareable background attribute item subset. If not, execute S104h; otherwise, end.
S104h: Take the shareable background attribute item subset that follows the current one as the current shareable background attribute item subset, and then execute S104b.
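The ratio test of S104b to S104d can be sketched for a single subset as follows. This is a minimal illustration, not the claimed implementation: the function name `pick_corrected_value`, the use of `None` for a missing value, and the choice to leave missing values out of the total are assumptions made for the example.

```python
from collections import Counter

def pick_corrected_value(values, ratio_threshold):
    """Return the dominant attribute value, or None if no value qualifies.

    values: original attribute values of one background attribute across
            all items of one subset; None marks a missing value (assumption).
    ratio_threshold: the preset modification ratio threshold, e.g. 0.75.
    """
    # S104b: count occurrences; missing values do not occur, so skip them.
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    if total == 0:
        return None
    # When several values exceed the threshold, the largest ratio wins,
    # so only the most frequent value needs to be tested.
    value, count = counts.most_common(1)[0]
    # S104c: the ratio must be strictly greater than the threshold.
    return value if count / total > ratio_threshold else None
```

Calling the function once per subset, in order, plays the role of the S104g/S104h loop; a `None` result corresponds to S104f (leave the subset unchanged).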
S105: Modify each original attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset to the corrected attribute value.
Specifically, the modification works as follows: if an original attribute value of the background attribute of the same type is identical to the corrected attribute value, it is retained; if it differs from the corrected attribute value, it is corrected; if it is missing, it is supplemented.
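The three cases of S105 (retain, correct, supplement) can be sketched as below. The dictionary representation of an item, the function name `apply_correction`, and the returned tally are assumptions made for illustration only.

```python
def apply_correction(items, attribute, corrected):
    """Apply the corrected value to one attribute of every item in a subset.

    items: list of dicts mapping attribute names to values (assumed layout).
    Returns a tally of how many items fell into each of the three cases.
    """
    outcome = {"retained": 0, "corrected": 0, "supplemented": 0}
    for item in items:
        original = item.get(attribute)
        if original is None:
            # Missing value: supplement it.
            item[attribute] = corrected
            outcome["supplemented"] += 1
        elif original == corrected:
            # Already equal to the corrected value: retain it.
            outcome["retained"] += 1
        else:
            # Different value: correct it.
            item[attribute] = corrected
            outcome["corrected"] += 1
    return outcome
```

The tally is not part of the described method; it is only a convenient way to see which case each item hit.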
Preferably, referring to FIG. 3, in a preferred embodiment, S103-S105 may include:
S201: Take the first of the M shareable background attribute item subsets as the current shareable background attribute item subset.
S202: Determine whether the number of items included in the current shareable background attribute item subset is greater than a preset modification quantity threshold. If it is greater than the threshold, execute S203; if it is less than or equal to the threshold, execute S206.
Specifically, statistics are gathered only when a shareable background attribute item subset contains a certain number of items. For example, if a subset contains only one item, there is neither a basis nor a need for modification.
The preset modification quantity threshold can be set according to the actual application, for example to 2 or 20.
S203: Count the number of occurrences of each original attribute value of the background attribute of the same type of all items included in the current shareable background attribute item subset.
S204: According to the number of occurrences of each original attribute value counted in S203, determine the corrected attribute value of the background attribute of the same type of all items included in the current shareable background attribute item subset.
S205: Modify each original attribute value of the background attribute of the same type of all items included in the current shareable background attribute item subset to the corrected attribute value.
S206: Determine whether the current shareable background attribute item subset is the M-th shareable background attribute item subset. If it is not, execute S207; if it is, end.
S207: Take the shareable background attribute item subset that follows the current one as the current shareable background attribute item subset, and then execute S202.
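The flow S201 to S207 can be sketched as a loop with a size gate. This is a hedged illustration: the function name `correct_subsets` and the item layout are hypothetical, and taking the most frequent value in S204 is an assumption, since this embodiment only says the corrected value is determined from the occurrence counts.

```python
from collections import Counter

def correct_subsets(subsets, attribute, min_items):
    """Correct one attribute in every subset that is large enough.

    subsets: list of subsets, each a list of item dicts (assumed layout).
    min_items: the preset modification quantity threshold.
    """
    for subset in subsets:                       # S201, S206, S207
        if len(subset) <= min_items:             # S202: too small, skip
            continue
        counts = Counter(                        # S203: count occurrences
            item[attribute] for item in subset
            if item.get(attribute) is not None
        )
        if not counts:
            continue
        corrected = counts.most_common(1)[0][0]  # S204 (assumed rule)
        for item in subset:                      # S205: modify
            item[attribute] = corrected
    return subsets
```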
For platform-level products, the number of items to be processed may exceed one hundred million, and the running time on a single computer would be unacceptable. However, the method of this embodiment is highly parallelizable, so even a massive number of items can be processed quickly. Assume there are N items to be processed by P computers running in parallel. Specifically, referring to FIG. 4, in a preferred embodiment, S102-S105 may include:
S301: Map the identifier of each of the N items to an integer.
Specifically, the identifier (key) of each item can be regarded as a character string, and computing the hash code of the key maps the key to an integer. Assuming key = c_0 c_1 … c_L consists of L characters, the corresponding integer (denoted HC_L) is computed as:
HC_0 = 0;
HC_L = HC_{L-1} * Z + c_L.
Here Z is any prime number, usually 31, and each character c is represented by its ASCII code (an integer).
The mapping of an item's identifier to an integer is not limited to the above method; it can be done in any feasible way, which is not specifically limited here.
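The recurrence for the hash code can be sketched directly. The function name `hash_code` is hypothetical, and the sketch simply folds every character of the key in order. Python integers do not overflow, so unlike fixed-width implementations the result grows without wrapping; that difference is an artifact of this illustration, not something the text specifies.

```python
def hash_code(key, z=31):
    """Map a string key to an integer with HC = HC_prev * Z + c."""
    hc = 0                       # HC_0 = 0
    for ch in key:
        hc = hc * z + ord(ch)    # ord() equals the ASCII code for ASCII text
    return hc
```

For example, for the key "ab" the value is (0 * 31 + 97) * 31 + 98 = 3105.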
S302: Take the remainder of each item's integer modulo the preset number P of parallel computing computers, where P is a natural number.
S303: Assign each item to the parallel computing computer whose number equals the remainder.
Specifically, the P parallel computing computers are numbered 0 to P-1.
Each item is dispatched according to the remainder of the HC of its key modulo P. In this way, all items to be processed are distributed roughly evenly across the P parallel computing computers. This amounts to pre-partitioning the full task set by remainder.
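Steps S301 to S303 can be sketched as a bucketing function. The name `assign_to_workers` and the use of in-memory lists to stand in for the P parallel computing computers are assumptions for illustration; a real deployment would send each key to a separate machine.

```python
def assign_to_workers(keys, p, z=31):
    """Distribute item keys over p workers numbered 0 to p-1."""
    buckets = [[] for _ in range(p)]
    for key in keys:
        hc = 0
        for ch in key:                 # S301: map the key to an integer
            hc = hc * z + ord(ch)
        buckets[hc % p].append(key)    # S302, S303: remainder picks the worker
    return buckets
```

Because the remainder depends only on the key, every item with the same key is sent to the same worker.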
S304: The parallel computing computers together divide the N items into M shareable background attribute item subsets, each computer working from the identifiers of the items assigned to it.
Specifically, the N items are assigned to the P parallel computing computers according to their identifiers (each computer holds fewer than N items, and the item counts of the P computers sum to N). Each parallel computing computer then divides its items into shareable background attribute item subsets in the same way a single computer would: it builds one 2-tuple for each of its items, where the first element of the 2-tuple is the item's identifier and the other element is the item's identity (ID) together with the item's background attributes and their original attribute values; it sorts all the 2-tuples by the first element and gathers the 2-tuples with the same first element into 2-tuple sets (each computer obtains fewer than M 2-tuple sets, and the counts across the P computers sum to M), where each 2-tuple set represents one shareable background attribute item subset.
It should be noted that identical items have the same identifier key and therefore the same HC, so whether the sort is performed over the whole set or over the parts after partitioning, identical items are always assigned to the same shareable background attribute item subset. The shareable background attribute item subsets obtained after task partitioning are therefore identical, in number and in content, to those obtained by sorting the whole set on a single computer. In other words, the flow on each parallel computing computer is the same as the flow on a single computer, and task partitioning does not affect the correctness of the overall flow.
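The equivalence claimed above can be checked with a small sketch that groups 2-tuples once over the whole set and once shard by shard. The function names and the sequential simulation of the P computers are assumptions for illustration.

```python
from itertools import groupby

def group_by_key(pairs):
    """Sort (key, payload) 2-tuples by key and gather equal keys together."""
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return {
        key: [payload for _, payload in group]
        for key, group in groupby(ordered, key=lambda pair: pair[0])
    }

def group_in_shards(pairs, p, z=31):
    """Partition by hash remainder, group each shard, then merge the results."""
    shards = [[] for _ in range(p)]
    for pair in pairs:
        hc = 0
        for ch in pair[0]:
            hc = hc * z + ord(ch)
        shards[hc % p].append(pair)
    merged = {}
    for shard in shards:                 # each shard stands in for one computer
        merged.update(group_by_key(shard))
    return merged
```

Since a key never appears in two shards, merging the per-shard results cannot overwrite a group, which is why the two functions return the same subsets.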
S305: Each parallel computing computer counts, for each shareable background attribute item subset it holds, the number of occurrences of each original attribute value of the background attribute of the same type of all items included in that subset.
S306: Each parallel computing computer determines, from the occurrence counts obtained in S305, the corrected attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset it holds.
S307: Each parallel computing computer modifies each original attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset it holds to the corrected attribute value.
Specifically, the parallel strategy can greatly accelerate the computation, for two reasons. 1) The sort over the full set is converted into partial sorts on the P parallel computing computers, which not only exploits their parallel computing power but also greatly reduces the computational complexity. Assuming the full set has N items, the best sorting algorithm has complexity N·logN on the full set; when the sort is converted into partial sorts on P parallel computing computers, and assuming an even split, the complexity is P·(N/P)·log(N/P) = N·log(N/P) = N·(logN - logP), which saves N·logP operations compared with sorting the full set. 2) The N items are modified in parallel by the P parallel computing computers. Although this does not reduce the computational complexity of the overall flow, the total time needed to modify the N items drops to 1/P.
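As a worked instance of the complexity estimate, the block below restates the derivation and evaluates it for example values. The values N = 2^30 and P = 2^10 and the base-2 logarithm are chosen purely for illustration and do not come from the text.

```latex
\begin{aligned}
P \cdot \frac{N}{P} \log\frac{N}{P} &= N(\log N - \log P), \\
N \log N - N(\log N - \log P) &= N \log P. \\[4pt]
\text{With } N = 2^{30},\ P = 2^{10}: \quad
N \log_2 N &= 30 \cdot 2^{30}, \\
N(\log_2 N - \log_2 P) &= 20 \cdot 2^{30}, \\
N \log_2 P &= 10 \cdot 2^{30}.
\end{aligned}
```

Under these assumed values the partial sorts perform two thirds of the comparisons of the full sort in total, and that work is additionally spread over the P computers.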
The method of this embodiment for correcting attribute values of background attributes of items divides the N items into M shareable background attribute item subsets according to the identifier of each item, determines the corrected attribute value of the background attribute of the same type of all items included in each subset from the number of occurrences of each original attribute value, and modifies each original attribute value to the corrected attribute value. The attribute values of the background attributes of items can thus be modified automatically, without manual work, which improves modification efficiency. Performing the modification in parallel on P parallel computing computers greatly accelerates the computation and further improves modification efficiency.
FIG. 5 is a structural diagram of an apparatus for correcting attribute values of background attributes of items according to an embodiment of the present invention. The apparatus includes:
an obtaining module 401, configured to obtain the identifier of each of N items, where N is a natural number;
a dividing module 402, configured to divide the N items into M shareable background attribute item subsets according to the identifier of each item, where M is a natural number and M is less than N;
a statistics module 403, configured to count the number of occurrences of each original attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset;
a determining module 404, configured to determine, according to the number of occurrences of each original attribute value, the corrected attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset;
a modifying module 405, configured to modify each original attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset to the corrected attribute value.
Further, the identifier of each item includes:
the link address of the picture corresponding to the item, the content signature of the picture corresponding to the item, or the article number of the item.
Further, the picture corresponding to each item includes:
the main display picture of the item, a supplementary display picture of the item, a style and color display picture of the item, or a detail display picture of the item.
Further, the dividing module 402 includes:
a building unit, configured to build one 2-tuple for each of the N items, where the first element of the 2-tuple is the item's identifier and the other element is the item's identity (ID) together with the item's background attributes and their original attribute values;
a sorting unit, configured to sort all the 2-tuples by the first element and gather the 2-tuples with the same first element into M 2-tuple sets, where each 2-tuple set represents one shareable background attribute item subset.
Further, the determining module 404 includes:
a first processing unit, configured to take the first of the M shareable background attribute item subsets as the current shareable background attribute item subset;
a calculating unit, configured to divide the number of occurrences of each original attribute value of the background attribute of the same type of all items included in the current shareable background attribute item subset by the total number of occurrences of all original attribute values of that background attribute, to obtain the distribution ratio of each original attribute value;
a comparing unit, configured to compare the distribution ratio of each original attribute value with the preset modification ratio threshold;
an attribute value determining unit, configured to, if some original attribute value of the background attribute of the same type has a distribution ratio greater than the preset modification ratio threshold, determine that original attribute value to be the corrected attribute value of the corresponding background attribute for all items included in the current shareable background attribute item subset.
Further, the statistics module 403 includes:
a second processing unit, configured to take the first of the M shareable background attribute item subsets as the current shareable background attribute item subset;
a first judging unit, configured to determine whether the number of items included in the current shareable background attribute item subset is greater than the preset modification quantity threshold;
a statistics unit, configured to, if the number is greater than the preset modification quantity threshold, count the number of occurrences of each original attribute value of the background attribute of the same type of all items included in the current shareable background attribute item subset.
Correspondingly, the determining module 404 includes:
a current determining unit, configured to determine, according to the number of occurrences of each original attribute value, the corrected attribute value of the background attribute of the same type of all items included in the current shareable background attribute item subset.
Correspondingly, the modifying module 405 includes:
a current modifying unit, configured to modify each original attribute value of the background attribute of the same type of all items included in the current shareable background attribute item subset to the corrected attribute value.
Further, the statistics module 403 also includes:
a second judging unit, configured to, if the number is less than or equal to the preset modification quantity threshold, determine whether the current shareable background attribute item subset is the M-th shareable background attribute item subset;
a notifying unit, configured to, if it is not the M-th shareable background attribute item subset, take the shareable background attribute item subset that follows the current one as the current shareable background attribute item subset, and then notify the first judging unit to perform the step of determining whether the number of items included in the current shareable background attribute item subset is greater than the preset modification quantity threshold;
an ending unit, configured to end if it is the M-th shareable background attribute item subset.
Further, referring to FIG. 6, the apparatus also includes:
a mapping module 406, configured to map the identifier of each of the N items to an integer;
a remainder calculating module 407, configured to take the remainder of the integer corresponding to each item modulo the preset number P of parallel computing computers, where P is a natural number;
an assigning module 408, configured to assign each item to the parallel computing computer whose number equals the remainder.
Correspondingly, the dividing module 402 includes P dividing units 402a, one located in each parallel computing computer;
the P dividing units 402a are configured to together divide the N items into M shareable background attribute item subsets, each according to the identifiers of the items in its own parallel computing computer.
Correspondingly, the statistics module 403 includes P count statistics units 403a, one located in each parallel computing computer;
each count statistics unit 403a is configured to count, for each shareable background attribute item subset in its parallel computing computer, the number of occurrences of each original attribute value of the background attribute of the same type of all items included in that subset.
Correspondingly, the determining module 404 includes P determining units 404a, one located in each parallel computing computer;
each determining unit 404a is configured to determine, according to those occurrence counts, the corrected attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset in its parallel computing computer.
Correspondingly, the modifying module 405 includes P modifying units 405a, one located in each parallel computing computer;
each modifying unit 405a is configured to modify each original attribute value of the background attribute of the same type of all items included in each shareable background attribute item subset in its parallel computing computer to the corrected attribute value.
The apparatus of this embodiment for correcting attribute values of background attributes of items divides the N items into M shareable background attribute item subsets according to the identifier of each item, determines the corrected attribute value of the background attribute of the same type of all items included in each subset from the number of occurrences of each original attribute value, and modifies each original attribute value to the corrected attribute value. The attribute values of the background attributes of items can thus be modified automatically, without manual work, which improves modification efficiency. Performing the modification in parallel on P parallel computing computers greatly accelerates the computation and further improves modification efficiency.
The apparatus corresponds to the method flow described above; for details not covered here, refer to the description of the method flow, which is not repeated.
The above description shows and describes several preferred embodiments of the present invention. As stated, however, the invention is not limited to the forms disclosed herein, and these forms should not be regarded as excluding other embodiments. The invention can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept described herein through the above teachings or through the skill or knowledge of the relevant art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims (16)

  1. A method for correcting attribute values of background attributes of items, characterized in that the method comprises:
    obtaining an identifier of each of N items, wherein N is a natural number;
    dividing the N items into M shareable background attribute item subsets according to the identifier of each item, wherein M is a natural number and M is less than N;
    counting the number of occurrences of each original attribute value of a background attribute of the same type of all the items included in each shareable background attribute item subset;
    determining, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the items included in each shareable background attribute item subset, a corrected attribute value of the background attribute of the same type of all the items included in each shareable background attribute item subset;
    modifying each original attribute value of the background attribute of the same type of all the items included in each shareable background attribute item subset to the corrected attribute value.
  2. The method of claim 1, characterized in that the identifier of each item comprises:
    a link address of a picture corresponding to the item, a content signature of the picture corresponding to the item, or an article number of the item.
  3. The method of claim 2, characterized in that the picture corresponding to each item comprises:
    a main display picture corresponding to the item, a supplementary display picture corresponding to the item, a style and color display picture corresponding to the item, or a detail display picture corresponding to the item.
  4. The method of claim 1, characterized in that dividing the N items into M shareable background attribute item subsets according to the identifier of each item comprises:
    building one 2-tuple for each of the N items, wherein the first element of the 2-tuple is the identifier of the item and the other element of the 2-tuple is the identity of the item together with the background attributes of the item and the original attribute values of the background attributes;
    sorting all the 2-tuples by the first element, and gathering the 2-tuples with the same first element into M 2-tuple sets, wherein each 2-tuple set represents one shareable background attribute item subset.
  5. The method of claim 1, wherein determining, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset, a corrected attribute value of the background attribute of the same type of all the commodities included in that subset comprises:
    taking the first of the M shareable background attribute commodity subsets as the current shareable background attribute commodity subset;
    calculating the ratio of the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset to the total number of occurrences of all the original attribute values of that background attribute in the current subset, to obtain a distribution ratio of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset;
    comparing the obtained distribution ratio of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset with a preset modification ratio threshold;
    if, for the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset, there is an original attribute value whose distribution ratio is greater than the preset modification ratio threshold, determining that original attribute value to be the corrected attribute value of the background attribute corresponding to that original attribute value for all the commodities included in the current shareable background attribute commodity subset.
  6. The method of claim 1, wherein counting the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset comprises:
    taking the first of the M shareable background attribute commodity subsets as the current shareable background attribute commodity subset;
    determining whether the number of commodities included in the current shareable background attribute commodity subset is greater than a preset modification quantity threshold;
    if the number is greater than the preset modification quantity threshold, counting the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset;
    correspondingly, determining, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset, a corrected attribute value of the background attribute of the same type of all the commodities included in that subset comprises:
    determining, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset, a corrected attribute value of the background attribute of the same type of all the commodities included in the current subset;
    correspondingly, modifying each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset to the corrected attribute value comprises:
    modifying each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset to the corrected attribute value.
  7. The method of claim 6, wherein, after determining whether the number of commodities included in the current shareable background attribute commodity subset is greater than the preset modification quantity threshold, the method further comprises:
    if the number is less than or equal to the preset modification quantity threshold, determining whether the current shareable background attribute commodity subset is the M-th shareable background attribute commodity subset;
    if it is not the M-th shareable background attribute commodity subset, taking the subset that follows the current shareable background attribute commodity subset as the current shareable background attribute commodity subset, and then performing the step of determining whether the number of commodities included in the current shareable background attribute commodity subset is greater than the preset modification quantity threshold;
    if it is the M-th shareable background attribute commodity subset, ending.
  8. The method of any one of claims 1 to 7, wherein, after obtaining the identifier of each of the N commodities, the method further comprises:
    mapping the identifier of each of the N commodities to an integer;
    taking the remainder of the integer corresponding to each commodity modulo a preset number P of parallel computing computers, wherein P is a natural number;
    assigning each commodity to the parallel computing computer whose number corresponds to the remainder;
    correspondingly, dividing the N commodities into M shareable background attribute commodity subsets according to the identifier of each commodity comprises:
    dividing, by the parallel computing computers together, the N commodities into the M shareable background attribute commodity subsets, each parallel computing computer dividing according to the identifiers of the commodities assigned to it;
    correspondingly, counting the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset comprises:
    counting, by each parallel computing computer, the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset on that computer;
    correspondingly, determining, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset, a corrected attribute value of the background attribute of the same type of all the commodities included in that subset comprises:
    determining, by each parallel computing computer and according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset on that computer, a corrected attribute value of the background attribute of the same type of all the commodities included in each such subset;
    correspondingly, modifying each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset to the corrected attribute value comprises:
    modifying, by each parallel computing computer, each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset on that computer to the corrected attribute value.
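The assignment step recited in claim 8 (map each identifier to an integer, take the remainder modulo P, and send the commodity to the computer with that number) can be sketched as follows. This is only an illustrative sketch: the claim does not say how the identifier is mapped to an integer, so the use of CRC32 here is an assumption, and the function names `assign_worker` and `partition` are hypothetical.

```python
import zlib
from collections import defaultdict


def assign_worker(identifier: str, p: int) -> int:
    """Map an identifier to an integer, then take its remainder modulo P."""
    # Assumption: CRC32 stands in for the unspecified identifier-to-integer
    # mapping. It is deterministic across processes, unlike Python's hash().
    as_int = zlib.crc32(identifier.encode("utf-8"))
    return as_int % p


def partition(identifiers, p):
    """Group commodity identifiers by the worker number they are assigned to."""
    workers = defaultdict(list)
    for ident in identifiers:
        workers[assign_worker(ident, p)].append(ident)
    return dict(workers)
```

Because equal identifiers always map to the same integer, every commodity of a given shareable subset lands on the same computer, so each computer can count, determine and modify its own subsets without exchanging data with the others.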
  9. A device for correcting attribute values of background attributes of commodities, wherein the device comprises:
    an obtaining module, configured to obtain an identifier of each of N commodities, wherein N is a natural number;
    a dividing module, configured to divide the N commodities into M shareable background attribute commodity subsets according to the identifier of each commodity, wherein M is a natural number and M is less than N;
    a counting module, configured to count the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset;
    a determining module, configured to determine, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset, a corrected attribute value of the background attribute of the same type of all the commodities included in that subset;
    a modifying module, configured to modify each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset to the corrected attribute value.
  10. The device of claim 9, wherein the identifier of each commodity comprises:
    a link address of a picture corresponding to the commodity, a content signature of a picture corresponding to the commodity, or an item number of the commodity.
  11. The device of claim 10, wherein the picture corresponding to each commodity comprises:
    a main display picture corresponding to the commodity, a supplementary display picture corresponding to the commodity, a style and color code display picture corresponding to the commodity, or a detail display picture corresponding to the commodity.
  12. The device of claim 9, wherein the dividing module comprises:
    a constructing unit, configured to construct a two-tuple for each of the N commodities, wherein the first element of the two-tuple is the identifier of the commodity, and the other elements of the two-tuple are the identity of the commodity, the background attributes of the commodity, and the original attribute values of the background attributes;
    a sorting unit, configured to sort all of the two-tuples by the first element, and to group the two-tuples whose first elements are identical to form M sets of two-tuples, wherein each set of two-tuples represents one shareable background attribute commodity subset.
  13. The device of claim 9, wherein the determining module comprises:
    a first processing unit, configured to take the first of the M shareable background attribute commodity subsets as the current shareable background attribute commodity subset;
    a calculating unit, configured to calculate the ratio of the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset to the total number of occurrences of all the original attribute values of that background attribute in the current subset, to obtain a distribution ratio of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset;
    a comparing unit, configured to compare the obtained distribution ratio of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset with a preset modification ratio threshold;
    an attribute value determining unit, configured to, if there is an original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset whose distribution ratio is greater than the preset modification ratio threshold, determine that original attribute value to be the corrected attribute value of the background attribute corresponding to that original attribute value for all the commodities included in the current shareable background attribute commodity subset.
  14. The device of claim 9, wherein the counting module comprises:
    a second processing unit, configured to take the first of the M shareable background attribute commodity subsets as the current shareable background attribute commodity subset;
    a first judging unit, configured to determine whether the number of commodities included in the current shareable background attribute commodity subset is greater than a preset modification quantity threshold;
    a counting unit, configured to, if the number is greater than the preset modification quantity threshold, count the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset;
    correspondingly, the determining module comprises:
    a current determining unit, configured to determine, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset, a corrected attribute value of the background attribute of the same type of all the commodities included in the current subset;
    correspondingly, the modifying module comprises:
    a current modifying unit, configured to modify each original attribute value of the background attribute of the same type of all the commodities included in the current shareable background attribute commodity subset to the corrected attribute value.
  15. The device of claim 14, wherein the counting module further comprises:
    a second judging unit, configured to, if the number is less than or equal to the preset modification quantity threshold, determine whether the current shareable background attribute commodity subset is the M-th shareable background attribute commodity subset;
    a notifying unit, configured to, if it is not the M-th shareable background attribute commodity subset, take the subset that follows the current shareable background attribute commodity subset as the current shareable background attribute commodity subset, and then notify the first judging unit to perform the step of determining whether the number of commodities included in the current shareable background attribute commodity subset is greater than the preset modification quantity threshold;
    an ending unit, configured to end if it is the M-th shareable background attribute commodity subset.
  16. The device of any one of claims 9 to 15, wherein the device further comprises:
    a mapping module, configured to map the identifier of each of the N commodities to an integer;
    a remainder calculating module, configured to take the remainder of the integer corresponding to each commodity modulo a preset number P of parallel computing computers, wherein P is a natural number;
    an allocating module, configured to assign each commodity to the parallel computing computer whose number corresponds to the remainder;
    correspondingly, the dividing module comprises P dividing units, one dividing unit being provided in each parallel computing computer;
    the P dividing units are configured to together divide the N commodities into the M shareable background attribute commodity subsets, each according to the identifiers of the commodities in its parallel computing computer;
    correspondingly, the counting module comprises P occurrence counting units, one occurrence counting unit being provided in each parallel computing computer;
    each occurrence counting unit is configured to count the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset in its parallel computing computer;
    correspondingly, the determining module comprises P determining units, one determining unit being provided in each parallel computing computer;
    each determining unit is configured to determine, according to the number of occurrences of each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset in its parallel computing computer, a corrected attribute value of the background attribute of the same type of all the commodities included in each such subset;
    correspondingly, the modifying module comprises P modifying units, one modifying unit being provided in each parallel computing computer;
    each modifying unit is configured to modify each original attribute value of the background attribute of the same type of all the commodities included in each shareable background attribute commodity subset in its parallel computing computer to the corrected attribute value.
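As an illustration only, the dividing, counting, determining and modifying modules recited in the claims above might be combined on a single machine roughly as in the following sketch. The record layout (dicts with `identifier`, `id` and `attrs` keys), the function name `correct_attributes`, and the default threshold values are all assumptions made for the example and are not taken from the claims. Where several values exceed the ratio threshold (possible only when the threshold is below 0.5), the sketch simply picks the most frequent one, and it leaves commodities that lack an attribute untouched; the claims do not address either case.

```python
from collections import Counter
from itertools import groupby


def correct_attributes(items, min_count=2, min_ratio=0.5):
    """Return {commodity id: corrected attrs}; the input items are not mutated.

    items: list of dicts with keys 'identifier', 'id' and 'attrs'
    (a dict mapping background attribute name to original attribute value).
    """
    # Dividing: build (identifier, record) two-tuples and sort by the first
    # element so that tuples with equal identifiers become adjacent.
    pairs = sorted(((it["identifier"], it) for it in items), key=lambda p: p[0])
    corrected = {it["id"]: dict(it["attrs"]) for it in items}

    for _, group in groupby(pairs, key=lambda p: p[0]):
        subset = [it for _, it in group]
        # Quantity threshold: skip subsets that are not larger than min_count.
        if len(subset) <= min_count:
            continue
        attr_names = {name for it in subset for name in it["attrs"]}
        for name in attr_names:
            # Counting: occurrences of each original value of this attribute.
            counts = Counter(it["attrs"][name] for it in subset if name in it["attrs"])
            total = sum(counts.values())
            value, n = counts.most_common(1)[0]
            # Determining: the value is accepted only if its distribution
            # ratio is strictly greater than the ratio threshold.
            if n / total > min_ratio:
                # Modifying: overwrite every existing value in the subset.
                for it in subset:
                    if name in it["attrs"]:
                        corrected[it["id"]][name] = value
    return corrected
```

With three commodities sharing one picture, two labelled "cotton" and one "silk", and the defaults above, the ratio 2/3 exceeds 0.5 and all three end up as "cotton"; raising `min_ratio` to 0.7 leaves the values unchanged.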
PCT/CN2016/075938 2015-03-18 2016-03-09 Method and device for correcting attribute values of commodity background attribute WO2016146005A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510119332.6 2015-03-18
CN201510119332.6A CN106033456B (en) 2015-03-18 2015-03-18 The method and apparatus for correcting the attribute value of the backstage attribute of commodity

Publications (1)

Publication Number Publication Date
WO2016146005A1 true WO2016146005A1 (en) 2016-09-22

Family

ID=56918391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/075938 WO2016146005A1 (en) 2015-03-18 2016-03-09 Method and device for correcting attribute values of commodity background attribute

Country Status (2)

Country Link
CN (1) CN106033456B (en)
WO (1) WO2016146005A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903105B (en) * 2017-12-08 2021-11-30 北京京东尚科信息技术有限公司 Method and device for perfecting target commodity attributes
CN115063211B (en) * 2022-08-16 2022-11-11 华能能源交通产业控股有限公司 Method and device for acquiring commodity attribute data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6988090B2 (en) * 2000-10-25 2006-01-17 Fujitsu Limited Prediction analysis apparatus and program storage medium therefor
CN102043763A (en) * 2009-10-23 2011-05-04 北大方正集团有限公司 Method and device for automatically checking names
CN103019398A (en) * 2011-09-20 2013-04-03 腾讯科技(深圳)有限公司 Character input method and character input device
CN103890762A (en) * 2011-11-30 2014-06-25 乐天株式会社 Information processing device, information processing method, information processing program, and recording medium
CN104391934A (en) * 2014-11-21 2015-03-04 深圳市银雁金融配套服务有限公司 Data calibration method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103578015A (en) * 2012-08-07 2014-02-12 阿里巴巴集团控股有限公司 Method and device for achieving commodity attribute navigation
CN103744920A (en) * 2013-12-27 2014-04-23 苏州大学 Commodity attribute name-value pair extraction method and system

Also Published As

Publication number Publication date
CN106033456A (en) 2016-10-19
CN106033456B (en) 2019-10-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16764186

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16764186

Country of ref document: EP

Kind code of ref document: A1