CN103853710A - Coordinated training-based dual-language named entity identification method - Google Patents
- Publication number
- CN103853710A (application CN201310593746.3A)
- Authority
- CN
- China
- Prior art keywords
- named entity
- bilingual
- language
- mark
- target language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a named entity recognition method based on bilingual co-training (coordinated training), belonging to the field of natural language processing in computer science. The two sides of a parallel Chinese-English sentence data set are treated as two different views of one data set for bilingual co-training. A log-linear model corrects the projected annotations during projection, and the bilingual alignment annotation consistency of named entities is introduced as the measure for estimating annotation confidence when the model predicts unseen examples. Compared with the prior art, the method reduces the domain dependence of named entity recognition, combines the strengths of recognition in both languages, resolves part of the recognition ambiguity that arises in monolingual recognition, and is particularly suitable for synchronous bilingual named entity recognition in large-scale corpora.
Description
Technical field
The present invention relates to a bilingual named entity recognition method. It is particularly suited to recognizing named entities in large-scale cross-domain bilingual corpora as a preprocessing step for machine translation, and belongs to the field of natural language processing (NLP) in computer science.
Background art
A named entity is the proper name of a unique individual. Named entity recognition is a fundamental problem in natural language processing and has become one of the technical bottlenecks in multilingual information processing fields such as cross-language information retrieval and machine translation.
Researchers have developed many models for named entity recognition. Because rule-based methods do not transfer well between different languages, statistical methods have attracted wide attention in recent years. Among them, supervised learning performs well on named entity recognition but has two weaknesses. First, it needs a large amount of annotated data to learn accurately, so it is ill-suited to resource-poor languages. Second, its performance drops markedly when the annotated data and the data to be recognized come from different domains. Unsupervised methods perform unsatisfactorily. One way to address these weaknesses is to combine a small annotated corpus with a large amount of unannotated data through co-training, a semi-supervised learning method.
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art in bilingual named entity recognition over large-scale cross-domain corpora by proposing a bilingual named entity recognition method based on co-training.
The technical solution adopted by the invention is as follows. The two sides of a parallel Chinese-English sentence data set are treated as two different views of one data set for bilingual co-training. On each of the Chinese and English sides, an initial annotation model is trained on a small amount of annotated data, yielding two initial sequence labeling models. The trained initial models annotate named entities in a small portion of cross-domain unannotated data, and the annotation results are projected to the corresponding other-language side. During projection, a log-linear model that combines monolingual syntactic features and bilingual alignment features corrects the projected annotations; this lowers the chance of mislabeled examples, reduces the noise introduced into the other sequence labeling model, and thereby improves the quality of co-training. When the sequence labeling models predict unseen examples, the bilingual alignment annotation consistency rate of named entities is introduced as the measure for estimating annotation confidence, so that confidence is estimated implicitly: the annotation set with the highest bilingual alignment annotation consistency rate among the unannotated samples is used as the incremental annotation for the other side. This removes the dependence on the small annotated sample, improves the generalization ability of the algorithm, and thus improves cross-domain named entity recognition.
To carry out bilingual collaborative named entity recognition, the method uses three steps: annotation model initialization, bilingual co-training, and bilingual named entity annotation. As shown in Fig. 1, the procedure is as follows:
Step 1: initialize the sequence labeling models. Initial sequence labeling models are trained separately on the two sides of a sentence-aligned, annotated Chinese-English corpus set. The sequence labeling model may be a conditional random field (CRF), a maximum entropy model, or similar.
Step 2: as shown in Fig. 2, extract a number of aligned sentence pairs from the sentence-aligned unannotated Chinese-English corpus set, annotate the sentences on both sides with the sequence labeling models to form an automatically annotated sentence-pair set, compute the bilingual annotation consistency rate of that set, and initialize the annotated-corpus increment set to empty.
The bilingual annotation consistency rate is the proportion of aligned words that receive consistent annotations after a small amount of bilingual unannotated data has been annotated by the sequence labeling models.
The annotated-corpus increment set is the automatically annotated corpus that, on completion of one round of co-training, is added to the other model's annotated corpus.
Specifically, 10% of the sentence pairs are drawn at random from the automatically annotated set, and annotations are projected from the source-language side to the target-language side according to word alignment. First, the named entity projection region from source to target language is expanded so that it can hold more target-language named entity hypotheses. Then the monolingual features of the target-language named entity and the alignment features of the bilingual named entity are combined in a log-linear model that corrects the projection result. The corrected result is used as an annotated-corpus increment to retrain the model. The retrained model re-annotates the extracted sentences and the bilingual annotation consistency rate is recomputed. This cycle is repeated 10 times, and the increment that yields the highest bilingual annotation consistency rate is taken as the source-language-side annotated-corpus increment for this round of co-training. The same method finds the incremental annotated corpus for the target-language side.
The monolingual feature of a named entity is the boundary combination feature of the named entity on one language side; it is mainly used to ensure that the incremental annotated corpus in co-training has the characteristics of named entities.
The alignment feature of a bilingual named entity is the consistency between the two sides of the bilingual named entity; it makes full use of the complementarity of recognition in the two languages.
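The selection of an increment over repeated rounds can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_increment` and the three callables (`project_and_correct`, `retrain`, `consistency`) are hypothetical placeholders for the projection-with-correction step, the model retraining step, and the consistency-rate computation described above.

```python
import random

def select_increment(pairs, project_and_correct, retrain, consistency,
                     rounds=10, sample_frac=0.1, seed=0):
    """Return the increment whose retrained model gives the highest
    bilingual annotation consistency rate over `rounds` random draws.

    All three callables are assumptions made for this sketch:
      project_and_correct(sample) -> increment (corrected projected annotations)
      retrain(increment)          -> model retrained with that increment
      consistency(model, pairs)   -> consistency rate after re-annotation
    """
    rng = random.Random(seed)
    k = max(1, int(len(pairs) * sample_frac))   # e.g. 10% of the sentence pairs
    best_rate, best_increment = float("-inf"), []
    for _ in range(rounds):
        sample = rng.sample(pairs, k)           # random draw without replacement
        increment = project_and_correct(sample)
        model = retrain(increment)
        rate = consistency(model, pairs)
        if rate > best_rate:                    # keep the best increment so far
            best_rate, best_increment = rate, increment
    return best_increment, best_rate
```

The sketch starts the comparison from negative infinity; starting from the consistency rate of the initial models instead would reject every increment that does not improve on them.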
Step 3: repeat Step 2, testing on a development set, until the algorithm converges. When the loop ends, two bilingual sequence labeling models have been produced, namely the trained bilingual named entity recognition models. These can then recognize named entities in large-scale cross-domain bilingual corpora, from which a named entity dictionary can be built; they can also recognize named entities directly in a single sentence to be translated, improving machine translation quality.
Beneficial effects
By introducing co-training into the training of named entity sequence labeling models, the invention exploits the complementarity of bilingual named entity recognition and the mutual translatability of named entities to co-train the recognition models. Compared with the prior art, the method makes recognition in the two languages complementary and improves the precision and recall of named entity recognition in large-scale cross-domain corpora. It effectively reduces the dependence of named entity recognition on the domain of the annotated corpus, giving the models stronger generalization ability. It produces named entity recognition models for both languages at once, and co-training improves the consistency of bilingual named entity recognition, which helps in building a named entity dictionary. In summary, the invention is especially suitable for consistent recognition of bilingual named entities in large-scale cross-domain corpora.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 is a flowchart of the co-training process in the method of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in further detail below with reference to the drawings.
A bilingual named entity recognition method based on co-training comprises the following steps:
Step 1: initialize the bilingual sequence labeling models. The Chinese and English sequence labeling models Cmodel(s) and Cmodel(t) are trained respectively on the sentence-aligned annotated Chinese-English corpus sets Ls and Lt. Three kinds of named entity are annotated in the corpus: PER (person name), LOC (place name) and ORG (organization name). The BIO tag set is used, so every word carries one of 7 tags: B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG and O. For Chinese, the feature templates use word features, single-character features, and combinations of words or characters over 2 to 3 positions; for English, they combine word, part-of-speech and initial-letter capitalization features.
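As an illustration of the BIO tag set, the following sketch converts entity spans into the 7 tags listed above. The function name `spans_to_bio` and the example sentence are assumptions made for illustration only.

```python
from typing import List, Tuple

ENTITY_TYPES = ("PER", "LOC", "ORG")

def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, type) into BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining tokens of the entity
    return tags

tokens = ["Barack", "Obama", "visited", "Beijing", "University"]
tags = spans_to_bio(len(tokens), [(0, 2, "PER"), (3, 5, "ORG")])
# tags == ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
```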
Step 2: extract 1000 aligned sentence pairs from the sentence-aligned unannotated Chinese-English corpus sets Us and Ut, and annotate them with the sequence labeling models Cmodel(s) and Cmodel(t) respectively to form the automatically annotated source-side and target-side sets. Compute the bilingual annotation consistency rate conformity_ration, and initialize the annotated-corpus increment set to empty.
In bilingual named entity co-training, once an incorrect incremental annotation is selected, the error is learned and reinforced in later rounds, degrading the co-training algorithm. The algorithm therefore needs effective measures to keep noisy data out. Named entities are mutual translations, so correctly recognized Chinese and English named entities should carry consistent annotations. The alignment annotation consistency rate is therefore used as the measure for selecting incremental annotations. It is computed as shown in formula (1):
where $(ws_i, wt_j)_k$ denotes the $k$-th ($1 \le k \le K$) aligned word pair of a parallel sentence pair; $T(ws_i)$ and $T(wt_j)$ denote the tags on the Chinese and English sides of the named entity respectively; $U$ denotes the unannotated data set; and $N$ is the number of sentences in $U$. Because Chinese and English differ considerably in word order, the distinction between the tags "B" and "I" is ignored when computing the alignment annotation consistency rate, and the two are treated as the same tag.
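A minimal sketch of such a consistency rate is given below. The normalization is an assumption: agreement is averaged over the aligned word pairs of each sentence pair and then over the sentence pairs, and sentence pairs without any alignment are skipped. The names `strip_bio` and `consistency_rate` are hypothetical.

```python
def strip_bio(tag):
    """Drop the B-/I- prefix so that B-PER and I-PER compare as equal."""
    return tag if tag == "O" else tag.split("-", 1)[1]

def consistency_rate(sentence_pairs):
    """sentence_pairs: list of (src_tags, tgt_tags, alignment),
    where alignment is a list of (i, j) word-index pairs."""
    per_sentence = []
    for src_tags, tgt_tags, alignment in sentence_pairs:
        if not alignment:
            continue  # assumption: unaligned sentence pairs are skipped
        agree = sum(
            strip_bio(src_tags[i]) == strip_bio(tgt_tags[j])
            for i, j in alignment
        )
        per_sentence.append(agree / len(alignment))
    return sum(per_sentence) / len(per_sentence) if per_sentence else 0.0

example = [
    (["B-PER", "I-PER", "O"], ["B-PER", "O", "I-PER"], [(0, 0), (1, 2), (2, 1)]),
    (["B-LOC", "O"], ["O", "O"], [(0, 0), (1, 1)]),
]
rate = consistency_rate(example)  # first pair agrees fully, second on 1 of 2
```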
100 sentence pairs are drawn at random from the automatically annotated set, and annotations are projected from the source side to the target side according to word alignment. Chinese and English differ considerably, so target-language named entities obtained by annotation projection alone are partly unsatisfactory. The projection result is therefore corrected by combining the monolingual features of the target-language named entity with the alignment features of the bilingual named entity. First, the named entity projection region from source to target language is expanded so that it can hold more target-language named entity hypotheses; each projected named entity hypothesis together with the source-language named entity forms one bilingual named entity hypothesis.
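Plain annotation projection through word alignment, before any correction, can be sketched as follows. The function name `project_tags` is hypothetical, and the sketch rebuilds B/I prefixes in target word order; as a simplification, two adjacent projected entities of the same type would be merged into one.

```python
def project_tags(src_tags, alignment, tgt_len):
    """Project entity types from source words to aligned target words.

    alignment is a list of (source_index, target_index) pairs.
    """
    tgt_types = [None] * tgt_len
    for i, j in alignment:
        if src_tags[i] != "O":
            tgt_types[j] = src_tags[i].split("-", 1)[1]  # keep type, drop B-/I-
    tgt_tags, prev = [], None
    for etype in tgt_types:
        if etype is None:
            tgt_tags.append("O")
        elif etype == prev:
            tgt_tags.append("I-" + etype)   # continues the previous entity
        else:
            tgt_tags.append("B-" + etype)   # starts a new entity
        prev = etype
    return tgt_tags

# Source entity words align to target words in reversed order.
projected = project_tags(["B-ORG", "I-ORG", "O"], [(0, 1), (1, 0), (2, 2)], 3)
```

Because the prefixes are rebuilt from target positions, the reversed alignment in the example still produces a well-formed target tag sequence.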
For any named entity on the source side, the contiguous block of words on the target side that is obtained by word projection and contains the projected head word is taken as the minimum candidate region. The projected region containing all projected words, extended outward by 4 words at each end (fewer if the start or end of the sentence is reached first), is taken as the maximum candidate region.
On the target side, a sliding window is set up that starts from the minimum candidate region and keeps adding words on either side of it until the boundary of the maximum candidate region is reached, producing a series of candidate target-language named entity hypotheses. Each target-language named entity hypothesis is combined with the source-language named entity to form one bilingual named entity hypothesis.
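The expansion from the minimum to the maximum candidate region can be sketched by enumerating every span that contains the minimum region and stays inside the maximum region. The function name `candidate_spans` and the half-open `(start, end)` span convention are assumptions of this sketch.

```python
def candidate_spans(min_span, proj_span, sent_len, margin=4):
    """Enumerate target-side candidate spans as half-open (start, end) pairs.

    min_span:  minimum candidate region (block containing the projected head word)
    proj_span: region covering all projected words
    margin:    number of words added at each end to get the maximum region
    """
    min_start, min_end = min_span
    proj_start, proj_end = proj_span
    max_start = max(0, proj_start - margin)        # clip at sentence start
    max_end = min(sent_len, proj_end + margin)     # clip at sentence end
    return [
        (a, b)
        for a in range(max_start, min_start + 1)
        for b in range(min_end, max_end + 1)
    ]

# 9-word sentence: minimum region is word 5, projected words cover 4..6.
spans = candidate_spans(min_span=(5, 6), proj_span=(4, 7), sent_len=9)
```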
Next, a log-linear model is constructed that combines the syntactic confidence of the target-language named entity with the alignment confidence of the bilingual named entity to give every bilingual named entity hypothesis an overall score. Consider first the monolingual syntactic confidence of the named entity. To ensure that the projected target-language named entity has the syntactic characteristics of a named entity, the distribution probabilities of its left and right boundaries are used as the syntactic confidence of the target-language named entity. The boundary distribution probability comprises the left-boundary bigram part-of-speech co-occurrence frequency and the right-boundary bigram part-of-speech co-occurrence frequency. The left-boundary bigram part-of-speech co-occurrence frequency is defined in formula (2):
The right-boundary bigram part-of-speech co-occurrence frequency is defined in formula (3):
where $t_i$, $t_{i-1}$ and $t_{i+1}$ denote the part of speech of the boundary word $w_i$, of the preceding word $w_{i-1}$, and of the following word $w_{i+1}$ respectively; count(*, *, *) is the number of times the bigram part-of-speech combination at the named entity boundary word $w_i$ occurs in the corpus; and count($rw_i$) and count($lw_i$) are the numbers of occurrences of the right and left boundaries in the corpus, respectively. Data smoothing uses Katz back-off, computed as shown in formula (4):
Combining the left and right boundary information, the monolingual syntactic confidence of the projected named entity is computed as shown in formula (5):
A maximum entropy model can combine features of different types. The alignment confidence of the bilingual named entity is therefore modeled with a maximum entropy model over feature functions, as shown in formula (6). Each feature function $f_m$ has a corresponding model parameter $\lambda_m$, $m = 1, 2, \ldots, M$.
Three features are used to model the bilingual named entity alignment confidence: the bilingual named entity part-of-speech combination co-occurrence feature, the bilingual named entity mutual translation feature, and the bilingual named entity length association feature.
The part-of-speech combination co-occurrence feature is the co-occurrence frequency, over the whole corpus, of the combination of corresponding Chinese and English parts of speech in a bilingual named entity. It is computed as shown in formula (7):
where the co-occurrence count is the number of times the named entity part-of-speech combination co-occurs in the corpus, and count(*, *) is the number of named entities in the corpus.
For a candidate bilingual named entity, the translation probabilities in the two directions between the source-language named entity and the projected target-language named entity define the bilingual named entity mutual translation feature, as shown in formula (8):
For an optimal bilingual named entity, the length difference between its two sides approximately follows a standard normal distribution. The length association feature is defined in formula (9):
where count(*) is the number of characters in *: the number of letters for English and the number of Chinese characters for Chinese.
The score of each hypothesis in the expanded bilingual named entity hypothesis set is expressed as formula (10):
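One common log-linear form, shown here only as a hedged sketch, scores a hypothesis as a weighted sum of log-confidences. The function name `loglinear_score`, the feature names `syntax` and `align`, and the unit weights are assumptions for illustration and are not taken from the patent's formula.

```python
import math

def loglinear_score(features, weights, eps=1e-12):
    """Weighted sum of log feature values (a generic log-linear score).

    features: dict name -> confidence in (0, 1]
    weights:  dict name -> weight (hypothetical values)
    eps guards against log(0) for zero-valued confidences.
    """
    return sum(
        weights[name] * math.log(max(value, eps))
        for name, value in features.items()
    )

weights = {"syntax": 1.0, "align": 1.0}
weak = loglinear_score({"syntax": 0.5, "align": 0.25}, weights)
strong = loglinear_score({"syntax": 0.9, "align": 0.8}, weights)
# A hypothesis with higher confidences on both features scores higher.
```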
Finally, a greedy search obtains the optimal bilingual named entity hypothesis set for the sentence pair. The optimal projection of a source-language named entity on the target side is the target-language named entity that forms the optimal bilingual named entity hypothesis with it. Formula (10) is used to score all expanded bilingual named entity hypotheses in the sentence pair, and the following greedy search selects the optimal hypothesis set for the sentence pair, which gives the optimal target-language named entity projection:
First, initialize the optimal bilingual named entity hypothesis set to empty;
Then, compute the score score($h_i$) of every bilingual named entity hypothesis in the sentence pair according to formula (10), and sort the hypotheses in descending order of score;
After that, take in turn an expanded bilingual named entity hypothesis $h_i$ that has no boundary conflict with the bilingual named entities already in the current optimal set and add it to the optimal set. Repeat this step until no expanded hypothesis satisfying the condition can be found.
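The greedy search can be sketched as below. Two points are assumptions of this sketch: a "boundary conflict" is interpreted as an overlap of spans on either the source or the target side, and hypotheses are represented as dictionaries with hypothetical keys `src`, `tgt` and `score`. A single pass over the score-sorted list is equivalent to repeating the selection step until no admissible hypothesis remains.

```python
def overlaps(a, b):
    """True if half-open spans a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]

def greedy_select(hypotheses):
    """Pick hypotheses in descending score order, skipping any that
    conflict with an already selected hypothesis."""
    selected = []
    for h in sorted(hypotheses, key=lambda h: h["score"], reverse=True):
        conflict = any(
            overlaps(h["src"], g["src"]) or overlaps(h["tgt"], g["tgt"])
            for g in selected
        )
        if not conflict:
            selected.append(h)
    return selected

hyps = [
    {"src": (0, 2), "tgt": (3, 5), "score": -0.2},
    {"src": (0, 2), "tgt": (3, 6), "score": -0.5},  # same source entity, lower score
    {"src": (4, 5), "tgt": (0, 1), "score": -0.9},
]
best = greedy_select(hyps)
```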
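The sequence labeling model is retrained on the annotated corpus augmented with the increment, Cmodel(t) re-annotates the extracted sentences, and the bilingual annotation consistency rate is recomputed; the increment is retained if the rate has improved. The sequence labeling model is then retrained on Lt: Cmodel(t) ← Cmodel(Lt).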
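Similarly, 100 sentence pairs are drawn at random from the automatically annotated set, annotations are projected in the opposite direction according to word alignment, and the projection result is corrected with the combined features to form the increment for the other side. Cmodel(s) re-annotates the extracted sentences and the bilingual annotation consistency rate is recomputed; the increment is retained if the rate has improved, and the sequence labeling model is retrained on Ls.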
Step 3: repeat Step 2, observing the test results of the bilingual sequence labeling models on the development set, until the algorithm converges; this yields the final models Cmodel(s) and Cmodel(t). Cmodel(s) is used to recognize named entities in the source-language corpus and Cmodel(t) in the target-language corpus, and a named entity dictionary is then compiled from the results.
Claims (5)
1. A bilingual named entity recognition method based on co-training, characterized by comprising the following steps:
Step 1: initialize the annotation models; train the initial Chinese and English named entity annotation models respectively on 2000 bilingual corpus entries in which named entities have been annotated;
Step 2: on the sentence-aligned Chinese-English corpus without named entity annotation, carry out bilingual co-training, selecting incremental annotations by 10 rounds of cross selection; the detailed process is as follows:
First, randomly draw 1000 aligned sentence pairs from the sentence-aligned Chinese-English corpus set without named entity annotation; annotate the named entities of the sentences on both sides with the annotation models obtained in Step 1; compute the bilingual annotation consistency rate of the annotated sentence pairs, and initialize the annotated-corpus increment set to empty;
Then, randomly draw 10% of the sentence pairs from the annotated set and project annotations from the source side to the target side according to word alignment; expand the projected named entity annotation region so that it can hold more target-language named entity hypotheses, each projected named entity hypothesis forming one bilingual named entity hypothesis with the source-language named entity; after that, combine the monolingual features of the target-language named entity with the alignment features of the bilingual named entity to correct the projection result, and take the corrected result as the target-side annotated-corpus increment; retrain the target-language named entity annotation model with this increment, re-annotate the extracted sentences with the retrained model, and recompute the bilingual annotation consistency rate;
Repeat the above process for 10 rounds, and take the annotated-corpus increment that yields the highest bilingual annotation consistency rate in the loop as the target-side annotated-corpus increment of this round of co-training; retrain the target-language named entity annotation model with it;
Use the same method to find the incremental annotated corpus for the source side, and retrain the source-language named entity annotation model with it;
Step 3: repeat Step 2, testing on a development set, until the algorithm converges; when the loop ends, two named entity annotation models, Chinese and English, have been produced, namely the trained bilingual named entity recognition models; finally, recognize named entities in cross-domain bilingual corpora and build a named entity dictionary from the results.
2. The bilingual named entity recognition method based on co-training according to claim 1, characterized in that the bilingual annotation consistency rate is computed as follows:
The annotated-corpus increment set is initialized to empty;
where $(ws_i, wt_j)_k$ denotes the $k$-th ($1 \le k \le K$) aligned word pair of a parallel sentence pair; $T(ws_i)$ and $T(wt_j)$ denote the tags on the Chinese and English sides of the named entity respectively; $U$ denotes the unannotated data set; $N$ is the number of sentences in $U$; three kinds of named entity are annotated in the corpus, namely PER (person name), LOC (place name) and ORG (organization name); annotation follows the BIO tag set, so every token carries one of 7 tags: B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG and O;
When the alignment annotation consistency rate is computed, the distinction between the tags "B" and "I" is ignored and the two are treated as the same tag.
3. The bilingual named entity recognition method based on co-training according to claim 1, characterized in that, in Step 2, the projected named entity annotation region is expanded as follows:
First, the named entity projection region from source to target language is expanded so that it can hold more target-language named entity hypotheses, each projected named entity hypothesis forming one bilingual named entity hypothesis with the source-language named entity;
For any named entity on the source side, the contiguous block of words on the target side that is obtained by word projection and contains the projected head word is taken as the minimum candidate region; the projected region containing all projected words, extended outward by 4 words at each end, is taken as the maximum candidate region;
On the target side, a sliding window is set up that starts from the minimum candidate region and keeps adding words on either side of it until the boundary of the maximum candidate region is reached, producing a series of candidate target-language named entity hypotheses; each target-language named entity hypothesis is combined with the source-language named entity to form one bilingual named entity hypothesis.
4. The bilingual named entity recognition method based on co-training according to claim 1, characterized in that, in Step 2, the monolingual features of the target-language named entity and the alignment features of the bilingual named entity are combined to correct the projection result as follows:
A log-linear model is constructed that combines the syntactic confidence of the target-language named entity with the alignment confidence of the bilingual named entity to give every bilingual named entity hypothesis an overall score;
To ensure that the projected target-language named entity has the syntactic characteristics of a named entity, the distribution probabilities of its left and right boundaries are used as the syntactic confidence of the target-language named entity; the boundary distribution probability comprises the left-boundary bigram part-of-speech co-occurrence frequency and the right-boundary bigram part-of-speech co-occurrence frequency; the left-boundary bigram part-of-speech co-occurrence frequency is defined in formula (2):
The right-boundary bigram part-of-speech co-occurrence frequency is defined in formula (3):
where $t_i$, $t_{i-1}$ and $t_{i+1}$ denote the part of speech of the boundary word $w_i$, of the preceding word $w_{i-1}$, and of the following word $w_{i+1}$ respectively; count(*, *, *) is the number of times the bigram part-of-speech combination at the named entity boundary word $w_i$ occurs in the corpus; count($rw_i$) and count($lw_i$) are the numbers of occurrences of the right and left boundaries in the corpus, respectively;
Combining the left and right boundary information, the monolingual syntactic confidence of the projected named entity is computed as shown in formula (4):
A maximum entropy model can combine features of different types; the alignment confidence of the bilingual named entity is modeled with a maximum entropy model over feature functions, as shown in formula (5); each feature function $f_m$ has a corresponding model parameter $\lambda_m$, $m = 1, 2, \ldots, M$;
Three features are used to model the bilingual named entity alignment confidence: the bilingual named entity part-of-speech combination co-occurrence feature, the bilingual named entity mutual translation feature, and the bilingual named entity length association feature; the part-of-speech combination co-occurrence feature is the co-occurrence frequency, over the whole corpus, of the combination of corresponding Chinese and English parts of speech in a bilingual named entity; it is computed as shown in formula (6):
where the co-occurrence count is the number of times the named entity part-of-speech combination co-occurs in the corpus, and count(*, *) is the number of named entities in the corpus;
For a candidate bilingual named entity, the translation probabilities in the two directions between the source-language named entity and the projected target-language named entity define the bilingual named entity mutual translation feature, as shown in formula (7):
For an optimal bilingual named entity, the length difference between its two sides approximately follows a standard normal distribution; the length association feature is defined in formula (8):
where count(*) is the number of characters in *: the number of letters for English and the number of Chinese characters for Chinese;
The score of each hypothesis in the expanded bilingual named entity hypothesis set is expressed as formula (9):
Finally, a greedy search obtains the optimal bilingual named entity hypothesis set for the sentence pair, which gives the optimal target-language named entity projection; the optimal projection of a source-language named entity on the target side is the target-language named entity that forms the optimal bilingual named entity hypothesis with it.
5. The bilingual named entity recognition method based on co-training according to claim 4, characterized in that the greedy search proceeds as follows:
First, initialize the optimal bilingual named entity hypothesis set to empty;
Then, compute the score score($h_i$) of every bilingual named entity hypothesis in the sentence pair according to the scoring formula, and sort the hypotheses in descending order of score;
After that, take in turn an expanded bilingual named entity hypothesis $h_i$ that has no boundary conflict with the bilingual named entities already in the current optimal set and add it to the optimal set; repeat this step until no expanded hypothesis satisfying the condition can be found.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310593746.3A CN103853710B (en) | 2013-11-21 | 2013-11-21 | Bilingual named entity recognition method based on co-training |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310593746.3A CN103853710B (en) | 2013-11-21 | 2013-11-21 | Bilingual named entity recognition method based on co-training |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103853710A true CN103853710A (en) | 2014-06-11 |
CN103853710B CN103853710B (en) | 2016-06-08 |
Family
ID=50861378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310593746.3A Active CN103853710B (en) | 2013-11-21 | 2013-11-21 | Bilingual named entity recognition method based on co-training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103853710B (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104133848A (en) * | 2014-07-01 | 2014-11-05 | 中央民族大学 | Tibetan language entity knowledge information extraction method |
CN104298714A (en) * | 2014-09-16 | 2015-01-21 | 北京航空航天大学 | Automatic massive-text labeling method based on exception handling |
CN104965821A (en) * | 2015-07-17 | 2015-10-07 | 苏州大学张家港工业技术研究院 | Data annotation method and apparatus |
CN106339371A (en) * | 2016-08-30 | 2017-01-18 | 齐鲁工业大学 | English and Chinese word meaning mapping method and device based on word vectors |
CN106649289A (en) * | 2016-12-16 | 2017-05-10 | 中国科学院自动化研究所 | Realization method and realization system for simultaneously identifying bilingual terms and word alignment |
CN107357786A (en) * | 2017-07-13 | 2017-11-17 | 山西大学 | A kind of Bayes's Word sense disambiguation method based on a large amount of pseudo- data |
CN107797988A (en) * | 2017-10-12 | 2018-03-13 | 北京知道未来信息技术有限公司 | A kind of mixing language material name entity recognition method based on Bi LSTM |
CN107798386A (en) * | 2016-09-01 | 2018-03-13 | 微软技术许可有限责任公司 | More process synergics training based on unlabeled data |
CN107797987A (en) * | 2017-10-12 | 2018-03-13 | 北京知道未来信息技术有限公司 | A kind of mixing language material name entity recognition method based on Bi LSTM CNN |
CN107977353A (en) * | 2017-10-12 | 2018-05-01 | 北京知道未来信息技术有限公司 | A kind of mixing language material name entity recognition method based on LSTM-CNN |
CN107992468A (en) * | 2017-10-12 | 2018-05-04 | 北京知道未来信息技术有限公司 | A kind of mixing language material name entity recognition method based on LSTM |
WO2018153130A1 (en) * | 2017-02-22 | 2018-08-30 | 华为技术有限公司 | Translation method and apparatus |
CN108959255A (en) * | 2018-06-28 | 2018-12-07 | 北京百度网讯科技有限公司 | Entity labeled data collection construction method, device and equipment |
CN110765276A (en) * | 2019-10-21 | 2020-02-07 | 北京明略软件系统有限公司 | Entity alignment method and device in knowledge graph |
CN111062215A (en) * | 2019-12-10 | 2020-04-24 | 金蝶软件(中国)有限公司 | Named entity recognition method and device based on semi-supervised learning training |
CN111143571A (en) * | 2018-11-06 | 2020-05-12 | 马上消费金融股份有限公司 | Entity labeling model training method, entity labeling method and device |
CN111209754A (en) * | 2020-02-25 | 2020-05-29 | 桂林电子科技大学 | Data set construction method for Vietnamese entity recognition |
CN111274829A (en) * | 2020-02-07 | 2020-06-12 | 中国科学技术大学 | Sequence labeling method using cross-language information |
CN111461330A (en) * | 2020-04-03 | 2020-07-28 | 中国建设银行股份有限公司 | Multi-language knowledge base construction method and system based on multi-language resume |
CN111723587A (en) * | 2020-06-23 | 2020-09-29 | 桂林电子科技大学 | Chinese-Thai entity alignment method oriented to cross-language knowledge graph |
CN111738024A (en) * | 2020-07-29 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Entity noun tagging method and device, computing device and readable storage medium |
CN113221539A (en) * | 2021-07-08 | 2021-08-06 | 华东交通大学 | Method and system for identifying nested named entities integrated with syntactic information |
CN114610852A (en) * | 2022-05-10 | 2022-06-10 | 天津大学 | Course learning-based fine-grained Chinese syntax analysis method and device |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7171350B2 (en) * | 2002-05-03 | 2007-01-30 | Industrial Technology Research Institute | Method for named-entity recognition and verification |
CN101295292A (en) * | 2007-04-23 | 2008-10-29 | 北大方正集团有限公司 | Method and device for modeling and naming entity recognition based on maximum entropy model |
CN101763344A (en) * | 2008-12-25 | 2010-06-30 | 株式会社东芝 | Method for training translation model based on phrase, mechanical translation method and device thereof |
CN102682763A (en) * | 2011-03-10 | 2012-09-19 | 北京三星通信技术研究有限公司 | Method, device and terminal for correcting named entity vocabularies in voice input text |
CN103268339A (en) * | 2013-05-17 | 2013-08-28 | 中国科学院计算技术研究所 | Recognition method and system of named entities in microblog messages |
Non-Patent Citations (1)
Title |
---|
李波: "基于自主推理的中文命名实体识别方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑 》, no. 1, 15 January 2013 (2013-01-15) * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104133848A (en) * | 2014-07-01 | 2014-11-05 | 中央民族大学 | Tibetan language entity knowledge information extraction method |
CN104298714A (en) * | 2014-09-16 | 2015-01-21 | 北京航空航天大学 | Automatic massive-text labeling method based on exception handling |
CN104298714B (en) * | 2014-09-16 | 2017-12-08 | 北京航空航天大学 | Automatic massive-text labeling method based on exception handling |
CN104965821A (en) * | 2015-07-17 | 2015-10-07 | 苏州大学张家港工业技术研究院 | Data annotation method and apparatus |
CN104965821B (en) * | 2015-07-17 | 2018-01-05 | 苏州大学 | Data annotation method and apparatus |
CN106339371A (en) * | 2016-08-30 | 2017-01-18 | 齐鲁工业大学 | English and Chinese word meaning mapping method and device based on word vectors |
CN106339371B (en) * | 2016-08-30 | 2019-04-30 | 齐鲁工业大学 | English and Chinese word meaning mapping method and device based on word vectors |
CN107798386A (en) * | 2016-09-01 | 2018-03-13 | 微软技术许可有限责任公司 | Multi-process collaborative training based on unlabeled data |
CN106649289A (en) * | 2016-12-16 | 2017-05-10 | 中国科学院自动化研究所 | Realization method and realization system for simultaneously identifying bilingual terms and word alignment |
WO2018153130A1 (en) * | 2017-02-22 | 2018-08-30 | 华为技术有限公司 | Translation method and apparatus |
US11244108B2 (en) | 2017-02-22 | 2022-02-08 | Huawei Technologies Co., Ltd. | Translation method and apparatus |
CN107357786A (en) * | 2017-07-13 | 2017-11-17 | 山西大学 | A Bayesian word sense disambiguation method based on a large amount of pseudo data |
CN107797988A (en) * | 2017-10-12 | 2018-03-13 | 北京知道未来信息技术有限公司 | Mixed corpus named entity identification method based on Bi-LSTM |
CN107992468A (en) * | 2017-10-12 | 2018-05-04 | 北京知道未来信息技术有限公司 | Mixed corpus named entity identification method based on LSTM |
CN107977353A (en) * | 2017-10-12 | 2018-05-01 | 北京知道未来信息技术有限公司 | Mixed corpus named entity identification method based on LSTM-CNN |
CN107797987A (en) * | 2017-10-12 | 2018-03-13 | 北京知道未来信息技术有限公司 | Mixed corpus named entity identification method based on Bi-LSTM-CNN |
CN107797987B (en) * | 2017-10-12 | 2021-02-09 | 北京知道未来信息技术有限公司 | Bi-LSTM-CNN-based mixed corpus named entity identification method |
CN108959255A (en) * | 2018-06-28 | 2018-12-07 | 北京百度网讯科技有限公司 | Entity labeled data collection construction method, device and equipment |
CN111143571B (en) * | 2018-11-06 | 2020-12-25 | 马上消费金融股份有限公司 | Entity labeling model training method, entity labeling method and device |
CN111143571A (en) * | 2018-11-06 | 2020-05-12 | 马上消费金融股份有限公司 | Entity labeling model training method, entity labeling method and device |
CN110765276A (en) * | 2019-10-21 | 2020-02-07 | 北京明略软件系统有限公司 | Entity alignment method and device in knowledge graph |
CN111062215B (en) * | 2019-12-10 | 2024-02-13 | 金蝶软件(中国)有限公司 | Named entity recognition method and device based on semi-supervised learning training |
CN111062215A (en) * | 2019-12-10 | 2020-04-24 | 金蝶软件(中国)有限公司 | Named entity recognition method and device based on semi-supervised learning training |
CN111274829A (en) * | 2020-02-07 | 2020-06-12 | 中国科学技术大学 | Sequence labeling method using cross-language information |
CN111274829B (en) * | 2020-02-07 | 2023-06-16 | 中国科学技术大学 | Sequence labeling method utilizing cross-language information |
CN111209754B (en) * | 2020-02-25 | 2023-06-02 | 桂林电子科技大学 | Data set construction method for Vietnam entity recognition |
CN111209754A (en) * | 2020-02-25 | 2020-05-29 | 桂林电子科技大学 | Data set construction method for Vietnamese entity recognition |
CN111461330A (en) * | 2020-04-03 | 2020-07-28 | 中国建设银行股份有限公司 | Multi-language knowledge base construction method and system based on multi-language resume |
CN111461330B (en) * | 2020-04-03 | 2023-09-15 | 中国建设银行股份有限公司 | Multilingual knowledge base construction method and system based on multilingual resume |
CN111723587A (en) * | 2020-06-23 | 2020-09-29 | 桂林电子科技大学 | Chinese-Thai entity alignment method oriented to cross-language knowledge graph |
CN111738024B (en) * | 2020-07-29 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Entity noun labeling method and device, computing device and readable storage medium |
CN111738024A (en) * | 2020-07-29 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Entity noun tagging method and device, computing device and readable storage medium |
CN113221539A (en) * | 2021-07-08 | 2021-08-06 | 华东交通大学 | Method and system for identifying nested named entities integrated with syntactic information |
CN114610852A (en) * | 2022-05-10 | 2022-06-10 | 天津大学 | Course learning-based fine-grained Chinese syntax analysis method and device |
Also Published As
Publication number | Publication date |
---|---|
CN103853710B (en) | 2016-06-08 |
Similar Documents
Publication | Title |
---|---|
CN103853710A (en) | Coordinated training-based dual-language named entity identification method |
CN103154936B (en) | Method and system for automated text correction |
Hu et al. | A state-transition framework to answer complex questions over knowledge base |
CN107766324B (en) | Text consistency analysis method based on deep neural network |
CN108416058B (en) | Bi-LSTM input information enhancement-based relation extraction method |
CN105068997B (en) | Construction method and device for parallel corpora |
CN103942192B (en) | A translation method using bilingual maximal noun chunk separation and merging |
CN106383818A (en) | Machine translation method and device |
CN101866337A (en) | Part-of-speech tagging system, and device and method thereof for training part-of-speech tagging model |
CN111476031A (en) | Improved Chinese named entity recognition method based on Lattice-LSTM |
CN104915337A (en) | Translation text integrity evaluation method based on bilingual text structure information |
CN110427619B (en) | Chinese text automatic proofreading method based on multi-channel fusion and reordering |
CN113312922B (en) | Improved chapter-level triple information extraction method |
CN106610937A (en) | Information theory-based Chinese automatic word segmentation method |
Bilgin et al. | Sentiment analysis with term weighting and word vectors |
CN102270196A (en) | Machine translation method |
CN104317882A (en) | Decision-based Chinese word segmentation and fusion method |
Huber et al. | Predicting above-sentence discourse structure using distant supervision from topic segmentation |
Qi et al. | Translation-based matching adversarial network for cross-lingual natural language inference |
Zhao | Research and design of automatic scoring algorithm for English composition based on machine learning |
CN102945231B (en) | Construction method and system of incremental-translation-oriented structured language model |
Tran et al. | Preordering for Chinese-Vietnamese statistical machine translation |
CN113190690A (en) | Unsupervised knowledge graph inference processing method, unsupervised knowledge graph inference processing device, unsupervised knowledge graph inference processing equipment and unsupervised knowledge graph inference processing medium |
Finch et al. | A Bayesian model of transliteration and its human evaluation when integrated into a machine translation system |
Su et al. | Alignment-consistent recursive neural networks for bilingual phrase embeddings |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |