On this page you will find brief descriptions of current projects that some of our faculty affiliates are pursuing.
Should you be interested in participating in a project or finding out more about it, please contact the relevant faculty member.
Urban Analytics
Shane Jensen, Statistics
My current primary research focus is urban analytics: the quantitative analysis of the functioning of local areas within large cities. The recent explosion in data collection on so many aspects of city life gives us the opportunity to investigate urban environments at a higher resolution than ever before. Philadelphia is an ideal focus for our work as it is a city undergoing rapid change and development against a backdrop of difficult civic challenges including substantial economic disparity and dramatic variation in safety between different neighborhoods of the city. My research program is a collaborative and multidisciplinary effort that spans the fields of architecture, urban planning, criminology and statistics.
Our endeavor is to take the available data on cities and set up artificial experimental situations that allow us to learn as objectively as possible about what aspects of city environments are associated with safety and other outcomes. Our methodology involves the analysis of local neighborhood features, including crime, land use, zoning, business development, walkability, and population demographics. Bayesian hierarchical models are useful in this endeavor, as they provide a principled way to share information between proximal areas while still allowing inference for each neighborhood to be heavily influenced by local features.
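As a rough illustration of how a hierarchical model shares information across areas, the sketch below applies normal-normal partial pooling to neighborhood-level averages. It is a simplified empirical-Bayes stand-in, not the model used in this project: it pools toward a single citywide mean rather than toward nearby areas, and the function name, the variances sigma2 and tau2, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def partial_pool(local_means, local_counts, sigma2, tau2):
    """Shrink each neighborhood's average toward a shared citywide mean.

    Assumed model (illustrative only):
        y_bar_j ~ Normal(theta_j, sigma2 / n_j)   # observed local average
        theta_j ~ Normal(mu, tau2)                # neighborhood-level effect
    sigma2 and tau2 are treated as known; mu is estimated from the data.
    """
    y = np.asarray(local_means, dtype=float)
    n = np.asarray(local_counts, dtype=float)
    data_var = sigma2 / n
    # Precision-weighted estimate of the citywide mean mu.
    w_mu = 1.0 / (data_var + tau2)
    mu = np.sum(w_mu * y) / np.sum(w_mu)
    # Weight placed on the local data; it approaches 1 as n_j grows.
    weight = tau2 / (tau2 + data_var)
    return weight * y + (1.0 - weight) * mu

if __name__ == "__main__":
    # Hypothetical neighborhoods: the one with only 2 observations is
    # pulled furthest toward the citywide mean.
    print(partial_pool([4.0, 9.0, 6.0], [200, 2, 50], sigma2=4.0, tau2=1.0))
```

Neighborhoods with many observations keep estimates close to their own data, while sparsely observed ones borrow more strength from the rest of the city, which is the behavior the paragraph above describes.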
You can read more about our data collection and analysis pipeline in our paper, Analysis of Urban Vibrancy and Safety in Philadelphia by C. Humphrey, S.T. Jensen, D. Small and R. Thurston. https://arxiv.org/abs/1702.07909.
You can also read more about our goals in these media articles:
https://nextcity.org/daily/entry/philly-streets-get-test-of-jane-jacobs-eyes-on-the-street-effect
http://knowledge.wharton.upenn.edu/article/urban-analytics/
Listening to the Shape of Biological Data
Junhyong Kim, Biology
All biological organisms have a generative process, which in multicellular organisms takes the form of cell proliferation, cell differentiation, and morphogenesis (aggregation of cells forming geometrical assemblies such as organs). We are interested in understanding how such organismal generative processes are controlled. Recently, we have been developing technologies to measure the molecular state of individual cells, which are generating new data at an unprecedented level of resolution. Nevertheless, biological data is characterized by a high degree of complexity, implying a very large feature space (many dimensions), and by a high degree of noise. At the same time, the cost of collecting data is high, often involving manual steps, resulting in very sparse data. Most importantly, biological information has a certain “functional geometry” where dispersions of the points in measurement space reflect compatibility with living systems and biological function.
We are interested in developing new data analysis techniques to handle such data, especially by leveraging existing prior information. We would like to develop machine learning methods with the following characteristics, which are appropriate for biological data: (1) integrate data from different measurement modalities (e.g., RNA data, image data); (2) incorporate prior knowledge encapsulated in large-scale heterogeneous datasets; (3) integrate temporal models of the generative process; and (4) handle subsets of data with different inherent dimensionalities (e.g., different cell types with different degrees of freedom).
References:
Morris, J., Na, Y.-J., Zhu, H., Lee, J.-H., Giang, H., Ulyanova, A., Baltuch, G.H., Brem, S., Kung, D.K., Lucas, T.H., Isaac-Chen, H., O’Rourke, D.M., Wolf, J.A., Grady, S., Sul, J.-Y., Kim, J. and J. Eberwine. 2017. Pervasive heteroplasmy load within single cells revealed by single mitochondrion sequencing. Cell Reports 21(10):2706-2713.
Dueck, H., Eberwine, J., and Kim, J. 2016. Variation is function: Are single cell differences functionally important? BioEssays 38(2):172-180.
Kim, J. and J. Eberwine. 2010. RNA as the state memory and mediator of cellular phenotype. Trends in Cell Biology 20(6):311-318.
Galaxy Images and Neural Networks
Bhuv Jain, Physics and Astronomy
Most of the matter in the universe is in the form of ‘dark’ particles unlike the matter we see and touch. The dynamics of both ordinary matter and dark matter drive the formation of structures in the universe, like galaxies and clusters. However, observations are now in tension with the dominant paradigm of non-interacting dark matter particles. This has motivated the hypothesis that dark matter particles could have subtle interactions. The interactions would alter the distribution of matter and indirectly impact the visible parts of galaxies as well.
Using N-body simulations, we found that dark matter interactions can change observable features of disk galaxies like the Milky Way. In particular, such galaxies are disturbed and warped as they pass through galaxy clusters. Observed galaxy clusters have about a hundred galaxies each. We want to use their images in different color filters to test for dark matter interactions. Using images of simulated galaxies observed at different times, we would like to train convolutional neural networks to:
(1) detect the warping/thickening of the visible disk (a classification problem);
(2) extract the most relevant features in the images to describe the evolution of the disk;
(3) Finally, Bayesian neural networks could be used to constrain the strength of the self-interactions of dark matter particles from a collection of galaxy images.
You can learn more about how self-interacting dark matter distorts galaxies in our paper: ‘The Morphology of Disk Galaxies in Galaxy Clusters with Dark Matter Self-Interactions’, L. Secco, A. Farah, B. Jain et al., https://arxiv.org/abs/1712.04841
(see Fig 6 for a quick visual summary)
Creating Critical Mass
Damon Centola, Annenberg
The “tipping point” is a common explanation for sudden shifts in collective behavior, but the limitations of historical evidence and conflicting theoretical models present a challenge to understanding how a small but committed group can change the behavior of an entire population. On the one hand, economic theories emphasize strategic choice and suggest that established conventions are highly stable. On the other hand, a popular physics model emphasizes the dynamics of individual interactions, predicting that a group with just 10% of the population can initiate social change, but it does not account for individual preferences to conform to the majority. We investigate this paradox using a combination of computational modeling and experimental design. By synthesizing the physics approach with strategic choice theory, we generate a model showing a critical mass threshold. We then test this model with web-based experiments on social coordination. We find that below this threshold, small groups have no impact on the behavior of a larger population. However, at the critical point of 25% of a population, committed minorities can sway the majority of the population to rapidly adopt a new behavior.
http://ndg.asc.upenn.edu/experiments/creating-critical-mass/
Paper can be found here.
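To make the idea of a committed minority concrete, here is a toy agent-based sketch in which flexible agents copy the majority of what they recently observed, while committed agents always play a new convention. It is an assumption-laden illustration, not the model or experimental design from this project: the function name, the memory length, the pairing scheme, and the tie-breaking rule are hypothetical, and the sketch is not claimed to reproduce the 25% threshold reported above.

```python
import random

def simulate_adoption(n_agents, committed_frac, memory=12, rounds=200, seed=0):
    """Return the final share of flexible agents playing the new convention "B"."""
    rng = random.Random(seed)
    n_committed = int(round(n_agents * committed_frac))
    committed = [i < n_committed for i in range(n_agents)]
    # Each agent remembers the last `memory` partner choices; everyone starts
    # with a history made up entirely of the established convention "A".
    memories = [["A"] * memory for _ in range(n_agents)]

    def choice(i):
        if committed[i]:
            return "B"  # committed agents never change their behavior
        # Flexible agents conform to the majority in memory (ties keep "A").
        return "B" if memories[i].count("B") > memory / 2 else "A"

    for _ in range(rounds):
        order = list(range(n_agents))
        rng.shuffle(order)
        # Pair agents at random; with an odd count, one agent sits out.
        for a, b in zip(order[0::2], order[1::2]):
            play_a, play_b = choice(a), choice(b)
            memories[a] = memories[a][1:] + [play_b]
            memories[b] = memories[b][1:] + [play_a]

    flexible = [i for i in range(n_agents) if not committed[i]]
    if not flexible:
        return 1.0
    return sum(choice(i) == "B" for i in flexible) / len(flexible)

if __name__ == "__main__":
    # Sweep the committed fraction to see where adoption takes off in this toy.
    for frac in (0.05, 0.15, 0.25, 0.35, 0.45):
        print(frac, simulate_adoption(200, frac))
```

Sweeping committed_frac shows the qualitative pattern the text describes, with little change at small fractions and widespread adoption at large ones; where the transition falls in this toy depends on the assumed memory length and tie rule.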
“How Behavior Spreads: The Science of Complex Contagions” by Damon Centola
A new, counterintuitive theory for how social networks influence the spread of behavior
New social movements, technologies, and public-health initiatives often struggle to take off, yet many diseases disperse rapidly without issue. Can the lessons learned from the viral diffusion of diseases be used to improve the spread of beneficial behaviors and innovations? In How Behavior Spreads, Damon Centola presents over a decade of original research examining how changes in societal behavior (in voting, health, technology, and finance) occur and the ways social networks can be used to influence how they propagate. Centola’s startling findings show that the same conditions accelerating the viral expansion of an epidemic unexpectedly inhibit the spread of behaviors.
While it is commonly believed that “weak ties”—long-distance connections linking acquaintances—lead to the quicker spread of behaviors, in fact the exact opposite holds true. Centola demonstrates how the most well-known, intuitive ideas about social networks have caused past diffusion efforts to fail, and how such efforts might succeed in the future. Pioneering the use of Web-based methods to understand how changes in people’s social networks alter their behaviors, Centola illustrates the ways in which these insights can be applied to solve countless problems of organizational change, cultural evolution, and social innovation. His findings offer important lessons for public health workers, entrepreneurs, and activists looking to harness networks for social change.
Practical and informative, How Behavior Spreads is a must-read for anyone interested in how the theory of social networks can transform our world.
Available on Amazon: http://a.co/6cfzw6T
http://ndg.asc.upenn.edu/book/how-behavior-spreads-the-science-of-complex-contagions/
The Statistics of Causality and Forensics
Maria Cuellar, Criminology
My research is at the intersection of statistics and the law. I am evaluating how statistics can help in two respects: causality and forensics. The law often requires causal statements to be made about individuals or groups of individuals, but the questions that science usually answers do not quite answer the legal questions. I am developing new methodologies to answer legal causal questions by using causal inference and data analysis. In forensic science, several government-issued reports noted that many of the techniques used by analysts did not have valid foundations and suffered from human errors that could invalidate results. Forensic science statements are often used in legal prosecutions and in court to argue that an individual committed a crime, and thus the problem has very high stakes. I am surveying forensic analysts to detect and measure certain types of human error, and I am testing the performance of analysts by implementing blind proficiency testing in forensic laboratories.
To see how I have used statistics to understand legal questions, feel free to read the articles below:
Cuellar M. Short fall arguments in court: A probabilistic analysis. 50 U. Mich. J. L. Reform 763 (2017).
Cuellar M. Causal reasoning and data analysis: Problems with the abusive head trauma diagnosis. Law, Probability and Risk, 2017; 16(4): 223–239.
Law and the Blockchain
Kevin Werbach, Legal Studies & Business Ethics
Blockchain and cryptocurrencies have captured the interest of developers, entrepreneurs, corporations, and governments around the world. While there is clearly an over-abundance of hype and a speculative investment frenzy, serious investment in novel technologies and business models is also occurring. One of the key questions is how to square legal and regulatory obligations with blockchain-based systems and the entities that interact with them. Blockchains are designed to be decentralized and immutable, creating serious difficulties for traditional legal regimes. My work argues that blockchain systems and the law can and must be harmonized to address significant dangers, and to allow the technology to reach its potential.
The Blockchain and the New Architecture of Trust (MIT Press, forthcoming November 2018)
Trust, But Verify: Why the Blockchain Needs the Law, 32 Berkeley Technology Law Journal __ (forthcoming 2018)
Contracts Ex Machina, 67 Duke Law Journal 101 (2017), with Nico Cornell
The Reg@Tech workshop, which I convene through the Zicklin Center on Business Ethics Research, regularly assembles a global community of key regulators, academics, and private-sector experts to discuss cryptocurrency issues, with a particular focus on initial coin offerings (ICOs). There have been over 1,000 ICOs since the start of 2017, collectively raising roughly $15 billion. On the one hand, these arrangements may promote innovative economic models that facilitate growth of novel network-based applications and democratize the process of venture fund-raising. On the other hand, because they often operate outside traditional regulatory regimes, they may undermine investor protections, open the door to fraud and other abuses, facilitate money laundering, and create other problems.
Regulatory Considerations for Token Offerings (report on Spring 2018 workshop)
Initial Coin Offerings: Can Regulators Curb the Risks? (Knowledge@Wharton, March 27, 2018)
Empirical Aggregation of Economic Information
Frank Diebold, Economics
Part of my work focuses on aggregating information from different sources. For example, I develop new methods for combining views (economic forecasts) from surveys of experts. Despite the clear success of forecast combination in many economic environments, several important issues remain incompletely resolved. The issues relate to the selection of the set of forecasts to combine, and to whether some form of additional regularization (e.g., shrinkage) is desirable. Against this background, and also considering the frequently found good performance of simple-average combinations, we propose LASSO-based procedures that set some combining weights to zero and shrink the survivors toward equality (“partially-egalitarian LASSO”). Ex-post analysis reveals that the optimal solution has a very simple form: the vast majority of forecasters should be discarded, and the remainder should simply be averaged. We therefore propose and explore direct subset-averaging procedures motivated by the structure of partially-egalitarian LASSO and the lessons learned. In an application to the European Central Bank Survey of Professional Forecasters, our procedures outperform simple average and median forecasts; indeed, they perform approximately as well (ex ante, in real time) as the ex-post best forecaster.
A recent paper is https://www.sas.upenn.edu/~fdiebold/papers2/DieboldShinEgalitarianLasso.pdf.
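The subset-averaging idea described above (discard most forecasters, then average the rest with equal weights) can be sketched as a brute-force search over subsets of a fixed size. This is a minimal illustration under stated assumptions, not the procedure from the paper: the function name, the fixed subset size k, the in-sample mean-squared-error criterion, and the example data are all hypothetical, and a real-time application would select the subset using only past data.

```python
from itertools import combinations

import numpy as np

def best_subset_average(forecasts, actual, k):
    """Find the k forecasters whose equal-weight average has the lowest MSE.

    forecasts: array of shape (T, K), one column per forecaster.
    actual:    array of shape (T,), the realized values.
    Returns (indices, mse) for the best subset over the sample supplied.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    actual = np.asarray(actual, dtype=float)
    best_idx, best_mse = None, np.inf
    # Exhaustive search; feasible only for small K, as in this illustration.
    for idx in combinations(range(forecasts.shape[1]), k):
        avg = forecasts[:, list(idx)].mean(axis=1)
        mse = float(np.mean((avg - actual) ** 2))
        if mse < best_mse:
            best_idx, best_mse = idx, mse
    return best_idx, best_mse

if __name__ == "__main__":
    # Hypothetical data: four forecasters with different constant biases.
    y = np.array([1.0, 2.0, 3.0, 4.0])
    F = np.column_stack([y + 1.0, y - 1.0, y + 5.0, y - 3.0])
    print(best_subset_average(F, y, k=2))
```

In the example, the two forecasters with offsetting biases are selected, because their simple average tracks the target more closely than any single forecaster or any other pair.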
Some earlier work, which looks different but which is actually closely related, is now implemented in real time at the Federal Reserve Bank of Philadelphia, https://www.philadelphiafed.org/research-and-data/real-time-center/business-conditions-index. (The Philly Fed site also has links to papers.)