The ever-accelerating growth of data is an integral feature of modern life. Social networks, mobile devices, measuring instruments, business information systems: these are just a few of the sources that can generate enormous volumes of data. The term Big Data has become commonplace, yet not everyone realizes how quickly and deeply technologies for processing large volumes of data are changing the most diverse aspects of society. These changes create new problems and challenges in many fields, including information security, where such core properties as confidentiality, integrity, and availability must remain in the foreground.
The labor market demands specialists able to analyze multidimensional data of complex structure. Organizations have accumulated huge volumes of data, much of it poorly structured. Processing and analyzing these data become increasingly urgent as business processes accelerate, competition intensifies, and the price of a timely and correct decision rises. In recent years, private and personal data posted on the Internet, especially on social networks, have also become more accessible for analysis.
The classical way of training analysts does not meet these challenges, since it does not systematically cover the additional tasks of processing and analyzing data, including unstructured data of large volumes. In addition, there is an obvious shortage of specialists ready to take a systematic approach to choosing a methodology for processing data of various types, streamlining access to data warehouses, restructuring warehouses, ensuring efficient data processing, and analyzing big data (which requires dimensionality reduction, special designs of statistical experiments, approximate methods, and efficient algorithms). The shortage is exacerbated by the development of related technologies: 3D printing, augmented reality, cloud computing, smart environments, etc.
As examples, one can cite the competencies listed in vacancies on leading online platforms: working with large volumes of data, data analysis, BI, Big Data, Distributed Cache, Data Warehouse, ETL, Business Intelligence, Hadoop, MapReduce, social network analysis, hands-on experience with Big Data, etc.
The educational program “Big Data Analytics” covers working with structured and unstructured data from information systems; big data processing and Big Data technologies; working with Excel, SQL, and in-house analytics systems; designing internal data warehouses that bind data from various systems; and creating dashboards and analytical reports. Students use BI systems (Oracle, IBM, SAS, and others), SQL, ETL tools, and programming languages; perform intelligent analysis of structured and unstructured data; and apply statistics, machine learning, and advanced predictive analytics to solve key business problems.
The program provides training in modern methods of extracting knowledge from data, mathematical methods of modeling and forecasting, and modern software systems and programming techniques for data analysis.
The goal of the study program is to prepare a well-rounded specialist with knowledge of mathematics, statistics, ICT, computer science, business, and economics.
Objectives of the educational program:
- To train well-rounded specialists with knowledge of mathematics, statistics, ICT, computer science, business, and economics, and to teach students how to explore large volumes of data containing disparate information, such as market trends and customer preferences.
- To develop the ability to extract the necessary information from various sources, including real-time information flows, analyze it to support business decisions, see the logical connections within the collected information, and, on that basis, develop business decisions and models.
- The student should know the research methodology of data science (setting research objectives, collecting data, processing and transforming data, examining data, building models and selecting methods, presenting and visualizing results), methods and approaches to standardizing and transforming data, machine learning methods (basic methods of classification and clustering), and ways of organizing data storage.
- The student should be able to solve applied problems in data processing and analysis to identify hidden dependencies, apply the elements of probability theory and mathematical statistics that underlie data science models and methods, choose appropriate machine learning methods for practical problems, set up a researcher's working environment for data science (Jupyter), and use packages and libraries for machine learning (Matplotlib, SciPy/NumPy, Pandas, Scikit-learn).
- The student must have skills in working with data storage tools, implementing data processing and analysis tools in R and Python, and preprocessing and visualizing data.
- The student must possess the skills of comprehensive analysis and analytical generalization of research results using modern scientific and technological achievements; the skills of independently collecting, studying, analyzing, and generalizing scientific and technical information on the research subject; the ability to build theoretical models that predict the properties of the studied objects; and the ability to develop proposals for implementing the results.
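The workflow named in the objectives above (collecting data, transforming it, building a model, and evaluating the result with Pandas and Scikit-learn in a Jupyter-style environment) can be illustrated with a minimal sketch. The dataset (the built-in Iris sample) and the choice of logistic regression here are illustrative assumptions, not part of the program description.

```python
# A minimal sketch of the data science workflow described above:
# load data with pandas, transform it, fit a basic classification
# model with scikit-learn, and evaluate it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Collecting / examining data: load a toy dataset as a pandas DataFrame
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Designing the experiment: hold out a stratified test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Transforming data: standardize features on the training set only
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Building a model and presenting the result
model = LogisticRegression(max_iter=200)
model.fit(X_train_s, y_train)
acc = accuracy_score(y_test, model.predict(X_test_s))
print(f"test accuracy: {acc:.2f}")
```

The same steps map directly onto the methodology listed above; in practice a Jupyter notebook would interleave them with visualization (e.g. Matplotlib) at each stage.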