CURRENT PROJECTS
Data mining for manufacturing and design processes. (Sponsor - Motorola and Manufacturing Research Center)
Manufacturing and design processes often generate large-scale data sets
with many numeric and nominal attributes. Discovering and predicting
the hidden patterns and relationships among these attributes
is pivotal in identifying the crucial factors that affect the
manufacturing process and, in turn, in improving production
quality. This ongoing project addresses this
issue by building a robust data mining system that can efficiently
handle large, high-dimensional data sets. The techniques we have
investigated include decision trees, decision rules, neural networks and
other traditional machine learning methods. Our current interests
focus on applying adaptive evolutionary computation methods to
solve complex problems and evolve complex systems. One promising
approach we are pursuing is the Gene Expression Programming (GEP)
algorithm. Belonging to the family of Genetic Algorithms (GAs), GEP is a
recently developed evolutionary algorithm that is capable of evolving
computer programs and predicting mathematical functions from experimental
data. Because of its linear chromosome representation and its separation
of the solution and search space, GEP dramatically improves upon
traditional genetic programming with respect to complexity and time
efficiency, and can solve various types of modeling and optimization
problems. Our experiments on multi-category pattern
classification problems have demonstrated that GEP can
mine classification rules that are accurate yet more compact than those
produced by traditional machine learning algorithms. Further efforts include
adding incremental learning features to the algorithm to improve its
performance and applying our data mining tools to more practical
manufacturing problems.
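To illustrate the linear chromosome representation mentioned above, here is a minimal sketch of how a GEP gene, written as a flat string, can be decoded breadth-first into an expression tree and evaluated. The function set, the symbol "Q" for square root, and the names `decode` and `evaluate` are assumptions chosen for this example; this is not the project's implementation.

```python
import math
from collections import deque

# Assumed function set for this sketch: symbol -> number of arguments.
# "Q" stands for square root; any other symbol is treated as a terminal.
ARITY = {"+": 2, "-": 2, "*": 2, "Q": 1}

def decode(gene):
    """Read a linear gene left to right and fill the tree level by level."""
    symbols = iter(gene)
    root = (next(symbols), [])
    queue = deque([root])
    while queue:
        symbol, children = queue.popleft()
        for _ in range(ARITY.get(symbol, 0)):
            child = (next(symbols), [])
            children.append(child)
            queue.append(child)
    return root  # symbols left unread in the gene are simply not expressed

def evaluate(node, values):
    """Evaluate a decoded tree given a mapping from terminals to numbers."""
    symbol, children = node
    args = [evaluate(child, values) for child in children]
    if symbol == "+":
        return args[0] + args[1]
    if symbol == "-":
        return args[0] - args[1]
    if symbol == "*":
        return args[0] * args[1]
    if symbol == "Q":
        return math.sqrt(args[0])
    return values[symbol]  # terminal: look up its value

tree = decode("Q*+-abcd")  # encodes sqrt((a + b) * (c - d))
print(evaluate(tree, {"a": 5, "b": 4, "c": 6, "d": 2}))  # 6.0
```

Because the gene is a fixed-length string, genetic operators can modify it freely while the decoder still yields a well-formed tree, provided the gene ends with enough terminals.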
Text Summarization. (Sponsor - Motorola Advanced Technology Center)
Our interests in natural language processing lie mainly in using machine learning methods to accomplish tasks that currently depend mainly on human effort. Summarization is one such task, and it is hard to imagine
everyday life without it. The morning and evening traffic reports are summaries, news headlines are summaries,
a trailer can be regarded as the summary of a movie, and the abstract of a scientific article is of course the most representative example.
Email can be regarded as one of the greatest inventions of the 20th century. However, some of us receive hundreds of emails per day and cannot read all of them, since we have other things to take care of, yet we do not want to miss important information. The same dilemma arises in reviewing scientific research articles, browsing web pages, and reading newspapers and magazines. Today's advanced information technology often gives us not a life of ease, but more tension. Automated text summarization is what we are currently working on to alleviate the side effects
of information overload. It condenses the content of a document and presents the most salient points to users, who can then decide whether to go on to read the whole document.
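As a rough illustration of what "presenting the most salient content" can mean, the sketch below scores each sentence by the average frequency of its words and keeps the top-scoring ones. This frequency-based extractive baseline, the small stopword list, and the name `summarize` are assumptions made for the example; it is not the machine learning approach under development in the project.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"a", "an", "and", "i", "in", "is", "of", "on", "the", "to"}

def content_words(sentence):
    """Lowercased words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]

def summarize(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

text = ("Traffic is heavy on the highway. "
        "Traffic reports warn of heavy traffic delays. "
        "I like tea.")
print(summarize(text, 1))  # ['Traffic is heavy on the highway.']
```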
Modeling network routing as partially observable Markov decision processes (POMDPs): (Sponsor - NSF)
The design of routing schemes has long dominated discussion in the computer networking community, which underlines the importance of the topic.
Although almost all routing protocols have been formulated from detailed functional parameters of the system, there is no clear definition of optimality (evaluation is
done by simulating a network and testing the protocol). In this work, we present a decision-theoretic approach to modeling a network routing protocol that would
function optimally in the framework. This formal definition of optimality would help us compute E-optimal routing algorithms. The various routing protocols fall out as
different policies in such a framework. Exchanging information is an essential part of any routing protocol in computer networks. The objective of any such method is to
let the system achieve what is termed "sufficient global knowledge" so that it can work in an optimal fashion. Network routing is not an easy problem: such a system is
highly dynamic, and the decision-making system must also be highly responsive. This study describes how both static and dynamic routing protocols
in networks can be modeled using the Markov decision process framework. We deal with the problem in stages, modeling the perfect-information scenario first, proceeding to an
uncertain model, and then seeking optimal information-sharing strategies that improve the performance of the underlying data routing mechanism.
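The perfect-information stage can be pictured with a small sketch in which each node is a state, each outgoing link is an action, and a routing protocol corresponds to a policy. The example below runs value iteration on a toy network; the link costs, the per-link delivery probabilities, and the name `value_iteration` are all assumptions made for illustration and do not come from the project.

```python
def value_iteration(links, goal, tol=1e-9):
    """Expected cost-to-go and best next hop for every node.

    links[s][nxt] = (cost, p): sending from s towards nxt costs `cost`
    and succeeds with probability p; on failure the packet stays at s.
    Assumes every non-goal node has a route to the goal with p > 0.
    """
    V = {s: 0.0 for s in links}
    V[goal] = 0.0
    policy = {}
    while True:
        delta = 0.0
        for s in links:
            if s == goal:
                continue
            # Bellman backup: pick the hop with the lowest expected cost.
            best_q, best_hop = min(
                (cost + p * V[nxt] + (1 - p) * V[s], nxt)
                for nxt, (cost, p) in links[s].items()
            )
            delta = max(delta, abs(best_q - V[s]))
            V[s], policy[s] = best_q, best_hop
        if delta < tol:
            return V, policy

# Toy network: A can use a cheap but lossy link to B or a costly direct link to D.
links = {
    "A": {"B": (1.0, 0.5), "D": (4.0, 1.0)},
    "B": {"D": (1.0, 1.0)},
}
V, policy = value_iteration(links, goal="D")
print(policy)  # {'A': 'B', 'B': 'D'}
```

In this toy case the lossy link costs 1 / 0.5 = 2 in expectation, plus 1 from B onward, so routing through B (expected cost 3) beats the direct link (cost 4). Changing the costs or probabilities changes the optimal policy, which is the sense in which different routing behaviors correspond to different policies.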
Interactive POMDPs (Sponsor - NSF)
Partially Observable Markov Decision Processes (POMDPs) have proven very
useful for single-agent decision making when the environment of the agent can be
modelled as a stochastic process. An additional complication arises when there are
many intelligent, self-interested agents working in an environment. An interesting
question is how an agent should model another intelligent agent. This work aims to
address this issue. The research proposes a new general framework in which an
agent can model another "intelligent" agent working in the same environment.
MOLECULAR BIOLOGY
Restriction Mapper: (Sponsor - National
Institutes of Health)
We have developed an intelligent automated DNA restriction mapping tool
useful in problems related to the worldwide Human Genome Project. The
purpose of this system is to map restriction enzyme cutting sites
back onto the DNA molecule by determining the original order of the digest
segments. This is difficult because segment lengths are not known exactly.
The tool, which is available for public download and use, applies Pratt's
separation theory, Dempster and Shafer's theory of evidential reasoning,
and heuristic search to find the proper arrangement of digest
segments.
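To make the ordering problem concrete, the sketch below solves a tiny idealized instance by brute force: given the fragment lengths from two single-enzyme digests and from the combined digest, it tries every ordering of the single-digest fragments and keeps those whose combined cut sites reproduce the double-digest lengths. It assumes exact integer lengths and exhaustive search, so it deliberately omits what makes the real problem hard (measurement error) and does not use the separation theory, evidential reasoning, or heuristic search of the actual tool. The name `double_digest` is hypothetical.

```python
from itertools import permutations

def cut_sites(order):
    """Positions of the internal cuts implied by an ordering of fragments."""
    sites, position = set(), 0
    for length in order[:-1]:
        position += length
        sites.add(position)
    return sites

def double_digest(a, b, ab):
    """Orderings of digests a and b consistent with the double digest ab."""
    total = sum(a)
    target = sorted(ab)
    solutions = []
    for order_a in set(permutations(a)):
        for order_b in set(permutations(b)):
            cuts = sorted(cut_sites(order_a) | cut_sites(order_b) | {0, total})
            fragments = sorted(y - x for x, y in zip(cuts, cuts[1:]))
            if fragments == target:
                solutions.append((order_a, order_b))
    return solutions

# Enzyme A yields fragments 3 and 7, enzyme B yields 4 and 6,
# and both together yield 1, 3 and 6.
for solution in double_digest([3, 7], [4, 6], [1, 3, 6]):
    print(solution)
```

For this input the search finds two arrangements, ((3, 7), (4, 6)) and its mirror image ((7, 3), (6, 4)), which cannot be distinguished from fragment lengths alone. The number of orderings grows factorially with the number of fragments, which is one reason heuristic search is needed in practice.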
Biological Motif Modeling: (Sponsor - NSF
and NIH)
Generating small models for which there may exist very little training
data presents a crucial problem in computational biology, namely the
trade-off between model specificity and under-fitting the data. There is
a bevy of superior modeling techniques; however, certain domain-specific
problems, such as modeling the regulatory regions in intergenic DNA, impose
constraints on the modeling process due to the lack of sufficient data. We
have developed a general motif modeling system (named "hendrix") whose
purpose is to model short gapless motifs. Once trained, these motifs can
be used to search new data.
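One common way to model a short gapless motif from very few examples is a position weight matrix with pseudocounts, which can then be slid along new sequence to find the best match. The sketch below shows that generic technique only; it is not a description of how "hendrix" works, and the uniform background, the pseudocount of 1, and the names `train_pwm` and `best_hit` are assumptions.

```python
import math

def train_pwm(sites, pseudocount=1.0, alphabet="ACGT"):
    """Log-odds matrix from aligned, equal-length example sites."""
    background = 1.0 / len(alphabet)  # assumed uniform background
    pwm = []
    for i in range(len(sites[0])):
        # Pseudocounts keep unseen letters from getting probability zero,
        # which matters when training data is scarce.
        counts = {c: pseudocount for c in alphabet}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({c: math.log2(counts[c] / total / background)
                    for c in alphabet})
    return pwm

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (sum(column[sequence[start + j]] for j, column in enumerate(pwm)), start)
        for start in range(len(sequence) - width + 1)
    )

pwm = train_pwm(["TATA", "TATA", "TACA"])
score, start = best_hit(pwm, "GGCTATAGG")
print(start)  # 3, the position of "TATA"
```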
MANUFACTURING
Machine Optimization: (Sponsor - Motorola
and Manufacturing Research Center)
This project involves developing an intelligent system that chooses a
set of near-optimal setup parameters for an electronic assembly machine to
achieve maximum production throughput. This ongoing project explores the
simulation and use of various search techniques for finding these
parameters. The techniques we have investigated to date include the use of
genetic algorithms, local search, tabu search and expert systems.
AGENT DESIGN
Automating the Evolution of Linguistic Competence in Artificial Agents: (Sponsor - NSF)
The aim of our research is to understand and automate the mechanisms by which language can emerge among artificial, knowledge-based
and rational agents. Our ultimate goal is to design and implement agents that, upon encountering other agent(s) with which they do not share
an agent communication language, are able to initiate creation of, and further able to evolve and enrich, a mutually understandable agent communication
language (ACL).
UNCERTAINTY WITHIN COMPUTATION
Effective Methods for Building Probabilistic Networks from Large Noisy Data Sets:
We have investigated the application of machine-learning tools and techniques for inducing large-scale probabilistic
models from a raw real-world medical database. Such probabilistic models are typically employed in medical decision
support systems. The real-world datasets used to build knowledge-bases in decision systems tend to be dirty, containing
a substantial amount of errant values. We have developed two novel context-based error detection and correction
techniques for cleansing dirty datasets. We also present a probabilistic error model that uses a context-driven structure, together with a pattern-based approach that cleans temporal data and
groups records that exhibit similarity based on their contextual ordering. We have also studied the
effectiveness of Bayesian network construction techniques by constructing and testing three different types of Bayesian
networks.
TRANSPORTATION
ITS (Intelligent Transportation Systems)
involves improving our existing roadway transportation system through the
use of information technology. The AI Lab's interests are in how best to
gather, process and disseminate this information to the public's greatest
benefit. So far, our efforts have concentrated on three main areas:
ADVANCE, GCM, and Data Fusion.
ADVANCE: (Sponsor - Illinois Department of
Transportation and US Department of Transportation(FHWA))
Traffic congestion wastes 2 billion gallons of fuel per year in the US
alone. 135 million US drivers spend 2 billion hours trapped in traffic per
year. The total cost to Americans due to traffic exceeds $100 billion per
year. Given these staggering numbers, we would like to somehow ease
traffic burdens through the use of route planning based on optimization of
various criteria. Through the use of both static and dynamic data gathered
from current traffic conditions, we would like to be able to answer
questions such as: how do I get to X, what is the fastest way to get to X,
and where is the nearest X? We have investigated or are investigating the
use of path planning, data fusion from multiple sources, automatic
accident detection, short-term travel prediction, and better man-machine
interfaces for answering these questions.
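A question such as "what is the fastest way to get to X" can be pictured as a shortest-path search over a road graph whose edge weights are current travel times. The sketch below uses Dijkstra's algorithm on a made-up graph; the place names, the travel times in minutes, and the function name `fastest_route` are assumptions for illustration and do not describe the ADVANCE implementation.

```python
import heapq

def fastest_route(graph, start, goal):
    """Dijkstra's algorithm; graph[u][v] is the travel time from u to v."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, cost in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (minutes + cost, neighbor, path + [neighbor]))
    return None  # goal not reachable

# Static data: typical travel times in minutes (made-up values).
roads = {
    "home": {"a": 5, "b": 2},
    "b": {"a": 1, "X": 10},
    "a": {"X": 3},
}
print(fastest_route(roads, "home", "X"))  # (6.0, ['home', 'b', 'a', 'X'])

# Dynamic data: congestion is reported on the link from b to a.
roads["b"]["a"] = 20
print(fastest_route(roads, "home", "X"))  # (8.0, ['home', 'a', 'X'])
```

Updating a single edge weight from live traffic data is enough to change the recommended route, which is where the static and dynamic data sources come together.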
GCM: (Sponsor - Illinois DOT, Indiana DOT,
and Wisconsin DOT)
The GCM Corridor is one of four "Priority Corridors" throughout the
country. The corridor includes the greater metropolitan areas of Gary,
Chicago, and Milwaukee as well as portions of southeast Wisconsin,
northeast Illinois, and northwestern Indiana. The corridor was defined to
allow for a wide range of solutions for movements throughout the corridor.
The intent of ITS is to improve mobility by better managing the existing
transportation system, rather than simply building new roads. The goals of
the priority corridors are to provide an operational testbed for ITS
projects where they have the greatest opportunity to improve regional
traffic, provide "showcases" that will help maintain public awareness and
support for ITS development, create strong institutional relationships
that will lead to greater regional cooperation, and provide an opportunity
for effectively testing new technologies.
Data Fusion: (Sponsor - National Academy of
Sciences)
The purpose of the data fusion subproject is to assist the user in
routing, travel planning, and other travel-related activities through the
on-line dissemination of various traffic-related information to the
traveler. Once received, this data is combined in a meaningful way to
answer travel related questions the user may have. We are currently
investigating the capacity of neural networks in this realm.
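As a point of comparison for "combining data in a meaningful way", the sketch below fuses several travel-time estimates for the same road segment by inverse-variance weighting, so that more reliable sources count for more. This is a simple statistical baseline named here for illustration, not the neural network approach under investigation; the source values and the name `fuse` are assumptions.

```python
def fuse(estimates):
    """Combine (mean, variance) estimates by inverse-variance weighting.

    Assumes the sources are independent and every variance is positive.
    Returns the fused mean and its variance.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / total
    return mean, 1.0 / total

# Made-up travel times in minutes for one road segment:
# a precise loop detector and a noisier probe-vehicle report.
print(fuse([(10.0, 1.0), (20.0, 4.0)]))  # (12.0, 0.8)
```

The fused estimate sits closer to the more reliable source, and its variance is lower than that of either input.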