PhD topic: data stories for interactive intentional analytics

The position is a 3 years fully funded PhD

Employer: University of Tours (France) in partnership with University of
Ioannina (Grece)

Lab: Laboratoire d’Informatique Fondamentale et Appliquée de Tours (LIFAT)

Location: Blois (France) with 3-12 months stay in Ioannina (Greece). The
recruited student can apply to an Erasmus grant for covering the stay in Greece.

Supervisors:
- Patrick Marcel, Associate Professor HDR, University of Tours, patrick.marcel@univ-tours.fr
- Verónika Peralta, Associate Professor, University of Tours, veronika.peralta@univ-tours.fr
- Panos Vassiliadis, Professor, University of Ioannina, panos.vassiliadis@cs.uoi.gr

Keywords: data analytics, data exploration, data stories, data quality, data
mining, intentional OLAP

Requirements: Applicants are expected to hold a Master’s degree in Computer Science,
be skilled in databases, machine learning, programming and be fluent in English.

Application: Applicants will email, before May 13th, 2019 (firm deadline), 
the following documents to the supervisors: CV, transcripts of the Master’s program, 
Master’s thesis dissertation, cover letter, reference letters.
Shortlisted applicants will be contacted for a Skype interview that will include 
a discussion of the scientific literature relevant to the topic.



Context

Can data analysis be fully automated and eventually an Artificial Intelligence
(AI) makes the decision? The debate around AI, especially Machine Learning
(ML), and their supposed capacity at automating decision making, is very intense these days. 
In the database (DB) community, and more particularly in
the data warehousing (DW) community, there is long tradition of having the
decision maker at the center of the data analysis process. At the inverse of automated application 
of algorithms [4], DW has been, since its inception, all about
facilitating the task of interactive exploration of a dataspace, and not let e.g.,
an algorithm automatically mine this space for patterns. One could even say
that DW is the ancestor of the Human-In-the-Loop Data Analysis phenomenon [2].

This PhD topic follows up from the reinvention of OLAP described in [10, 11],
and ambitions to automatize further interactive data analysis, while letting the
end user in command. This reinvention of OLAP introduces an analytics model
redefining what a query is, with respect to both what users ask the system,
what the answer entails, and how this answer is computed. An implementation
is currently being done.

The work introduced in [11] is a first step to a broader vision of interactive
data analysis [7], that opens major research questions, including:
- How to facilitate the understanding of data? This demands to precisely
define what are the answers to complex sequences of high level intentions,
and package them into coherent data stories accessible to even non expert
users.
- How to optimize the overall interactive analysis? Performance of query
processing has traditionally been the focus of the DB community, but
it has overlooked key indicators pertaining to the quality of the overall
interaction with the data.


Expected contributions

This PhD will focus on the topics listed above, and more precisely:
- In [11], it is proposed that answers to intentional operators are no more
traditional set of tuples, but dashboards including data, charts, informative summaries 
of KPI performance, as well as concise representations of
knowledge hidden in the data. The first challenge of this PhD work will be
to define how to structure such dashboards in a context where the interactive data analysis 
is a sequence of possibly complex queries, each being
a composition of intentions, in a personalized way [8].
- Facilitating data understanding also demands to automatically put query
results in a shape that is easy to grasp and that facilitates data storytelling. 
The second challenge of the PhD lies in identifying the most appropriate graphical representation 
of query answers, and to automatically
craft narratives, commenting on the highlights presented, etc. [5, 6].
- Finally, executing complex intentional statements requires an optimization
phase to decide which logical operators and model mining algorithms to
execute [7]. This optimization can be thought in terms of performance,
but also in terms of information content delivered and in terms of the
quality of the user’s experience [1], measuring the number of insights [3, 9],
controlling false discoveries [12], etc.


References
[1] Mahfoud Djedaini, Krista Drushku, Nicolas Labroche, Patrick Marcel,
Verónika Peralta, and Willeme Verdeau. Automatic assessment of interactive 
OLAP explorations. Information Systems, 82:148–163, 2019.

[2] AnHai Doan. Human-in-the-loop data analysis: A personal perspective.
In Proceedings of the Workshop on Human-In-the-Loop Data Analytics,
HILDA@SIGMOD 2018, Houston, TX, USA, June 10, 2018, pages 1:1–
1:6, 2018.

[3] Philipp Eichmann, Emanuel Zgraggen, Zheguang Zhao, Carsten Binnig,
and Tim Kraska. Towards a benchmark for interactive data exploration.
IEEE Data Eng. Bull., 39(4):50–61, 2016.

[4] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, 
Manuel Blum, and Frank Hutter. Efficient and robust automated
machine learning. In Advances in Neural Information Processing Systems
28: Annual Conference on Neural Information Processing Systems 2015,
December 7-12, 2015, Montreal, Quebec, Canada, pages 2962–2970, 2015.

[5] Dimitrios Gkesoulis, Panos Vassiliadis, and Petros Manousis. Cinecubes:
Aiding data workers gain insights from OLAP queries. Inf. Syst., 53:60–86,
2015.

[6] Jessica Hullman, Steven M. Drucker, Nathalie Henry Riche, Bongshin Lee,
Danyel Fisher, and Eytan Adar. A deeper understanding of sequence in
narrative visualization. IEEE Trans. Vis. Comput. Graph., 19(12):2406–
2415, 2013.

[7] Patrick Marcel, Nicolas Labroche, and Panos Vassiliadis. Towards a benefit
based optimizer for interactive data analysis. In Proceedings of the 21st
International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, 
co-located with EDBT/ICDT Joint Conference,
DOLAP@EDBT/ICDT 2019, Lisbon, Portugal, March 26, 2019., 2019.

[8] Tova Milo and Amit Somech. Next-step suggestions for modern interactive
data analysis platforms. In Proceedings of the 24th ACM SIGKDD International Conference 
on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pages 576–585, 2018.

[9] Amit Somech, Tova Milo, and Chai Ozeri. Predicting ”what is interesting”
by mining interactive-data-analysis session logs. In Advances in Database
Technology - 22nd International Conference on Extending Database Technology, 
EDBT 2019, Lisbon, Portugal, March 26-29, 2019, pages 456–467, 2019.

[10] Panos Vassiliadis and Patrick Marcel. The road to highlights is paved
with good intentions: Envisioning a paradigm shift in OLAP modeling.
In Proceedings of the 20th International Workshop on Design, Optimization, 
Languages and Analytical Processing of Big Data co-located with 10th
EDBT/ICDT Joint Conference (EDBT/ICDT 2018), Vienna, Austria,
March 26-29, 2018., 2018.

[11] Panos Vassiliadis, Patrick Marcel, and Stefano Rizzi. Beyond roll-up’s and
drill-down’s: An intentional analytics model to reinvent OLAP. Accepted
to Information Systems, https://doi.org/10.1016/j.is.2019.03.011, 2019.

[12] Zheguang Zhao, Lorenzo De Stefani, Emanuel Zgraggen, Carsten Binnig,
Eli Upfal, and Tim Kraska. Controlling false discoveries during interactive
data exploration. In Proceedings of the 2017 ACM International Conference
on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA,
May 14-19, 2017, pages 527–540, 2017.