Jiewen Huang - New Haven CT, US; Zhimin Chen - Seattle WA, US; Arvind Arasu - Bothell WA, US; Vivek Narasayya - Redmond WA, US
Assignee:
MICROSOFT CORPORATION - Redmond WA
International Classification:
G06F 17/30
US Classification:
707748, 707E17084
Abstract:
A set expansion system is described herein that improves the precision, recall, and performance of prior set expansion methods for large sets of data. The system maintains high precision and recall by 1) identifying the quality of particular lists and applying that quality through a weight, 2) allowing for the specification of negative examples in a set of seeds to reduce the introduction of bad entities into the set, and 3) applying a cutoff to eliminate lists that include a low number of positive matches. The system may perform multiple passes to first generate a good candidate result set and then refine it to find the set with the highest quality. The system may also apply MapReduce or other distributed processing techniques to allow calculation in parallel. Thus, the system efficiently expands large concept sets from a potentially small set of initial seeds using readily available web data.
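The three mechanisms named in the abstract (a per-list quality weight, negative seeds, and a positive-match cutoff) can be illustrated with a minimal sketch. This is not the patented method: the function name `expand_set`, the weight formula, the `min_positive` parameter, and the fixed two-pass refinement are all assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict


def expand_set(lists, positive_seeds, negative_seeds=(), min_positive=2, passes=2):
    """Hypothetical sketch of seed-based set expansion over web lists.

    Returns a dict mapping each candidate entity to an accumulated score.
    The weight formula below is an assumption, not taken from the patent.
    """
    positives = set(positive_seeds)
    negatives = set(negative_seeds)
    current = set(positives)  # entities currently believed to be in the set
    scores = {}
    for _ in range(passes):
        scores = defaultdict(float)
        for raw_items in lists:
            items = set(raw_items)
            pos_hits = len(items & current)
            # Cutoff: ignore lists with too few positive matches.
            if pos_hits < min_positive:
                continue
            neg_hits = len(items & negatives)
            # Assumed quality weight: positives minus negatives, per list item.
            weight = max(pos_hits - neg_hits, 0) / len(items)
            # Negative seeds never receive a score.
            for entity in items - negatives:
                scores[entity] += weight
        # Refinement pass: re-score the lists against the enlarged candidate set.
        current = positives | {e for e, s in scores.items() if s > 0}
    return dict(scores)


web_lists = [
    ["Seattle", "Redmond", "Bothell", "Tacoma"],
    ["Seattle", "Paris", "Apple"],
    ["Redmond", "Bothell", "Banana", "Kirkland"],
]
scores = expand_set(
    web_lists,
    positive_seeds=["Seattle", "Redmond", "Bothell"],
    negative_seeds=["Banana"],
)
```

In this toy input, the second list is dropped by the cutoff because it shares only one entity with the candidate set, and the third list is down-weighted because it contains a negative seed. Each list is scored independently, so the inner loop could be spread across workers in a map step, with the per-entity sums computed in a reduce step.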
Daniel Abadi - New Haven CT, US; Jiewen Huang - New Haven CT, US
Assignee:
Yale University - New Haven CT
International Classification:
G06F 17/30
US Classification:
707713, 707E17017
Abstract:
A system, method, and computer program product for processing a query are disclosed. Query processing includes partitioning stored data into a plurality of partitions based on at least one vertex in a plurality of vertexes; storing at least another triple in a plurality of triples on at least one node; assigning, based on the triple containing the at least one vertex, at least one partition in the plurality of partitions corresponding to the triple to at least one node in a plurality of nodes; and processing, based on the assigning, the query by processing the plurality of partitions.
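As a rough illustration of vertex-based partitioning of triples, the sketch below assigns each (subject, predicate, object) triple to the node that owns its subject vertex and stores a second copy on the node that owns its object vertex. The hash-based ownership rule, the function names `partition_triples` and `match_predicate`, and the single-predicate query are assumptions made for the example. The abstract does not specify how vertices are mapped to partitions or what query forms are supported.

```python
import zlib
from collections import defaultdict


def partition_triples(triples, num_nodes):
    """Hypothetical sketch: place triples on nodes according to their vertices.

    Ownership by CRC32 hash is an assumption; any vertex-to-node mapping
    (for example, one produced by a graph partitioner) could be substituted.
    """
    def owner(vertex):
        return zlib.crc32(vertex.encode("utf-8")) % num_nodes

    nodes = defaultdict(set)
    for s, p, o in triples:
        nodes[owner(s)].add((s, p, o))  # primary placement by subject vertex
        nodes[owner(o)].add((s, p, o))  # extra copy stored with the object vertex
    return nodes


def match_predicate(nodes, predicate):
    """Process every partition independently, then merge the partial results."""
    results = set()
    for partition in nodes.values():
        results |= {t for t in partition if t[1] == predicate}
    return results


triples = [("a", "knows", "b"), ("b", "knows", "c"), ("c", "likes", "a")]
nodes = partition_triples(triples, num_nodes=3)
knows = match_predicate(nodes, "knows")
```

Storing the extra copy means a node can often follow an edge in either direction without contacting another node, at the cost of duplicated storage. The set union in `match_predicate` removes the duplicates that this replication introduces.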