Introduction

The Applied Data Science program at Syracuse University’s School of Information Studies teaches students to collect, manage, and analyze data from a range of domains, and to develop insights from it with a variety of tools and techniques. In courses such as Introduction to Data Science (“IST 687”, 2021), Text Mining (“IST 736”, 2022), and Big Data Analytics (“IST 718”, 2022), reports and presentations were developed to deliver insights using Microsoft Access, SQL Server Management Studio, Python, and R. The skills developed at the School of Information Studies give data scientists working in marketing analytics the ability to generate value within their organizations and produce actionable recommendations.

Here is a high-level overview of what I have learned from the Applied Data Science program:

  1. Data collection and preparation: This practice area involves gathering, cleaning, and transforming raw data into a format that can be used for analysis. This can involve a variety of tasks, such as data cleaning, data integration, and data transformation.
  2. Data analysis and visualization: This practice area involves using statistical and computational methods to analyze data and extract insights from it. This can involve techniques such as descriptive statistics, regression analysis, and machine learning algorithms. Data visualization is also important in this practice area, as it can help communicate insights clearly and intuitively.
  3. Big data analytics: This practice area focuses on the analysis of large and complex datasets, often using distributed computing technologies such as Hadoop and Spark. Big data analytics requires specialized skills and tools and an understanding of distributed computing architectures.
  4. Data engineering: This practice area involves designing and implementing data pipelines and infrastructure to support data science workflows. This can involve tasks such as data storage and retrieval, data integration, and data processing.
  5. Business intelligence and decision-making: This practice area focuses on using data to support business decisions and improve organizational performance. This can involve techniques such as forecasting, optimization, and decision analysis.
  6. Data ethics and governance: This practice area involves ensuring that data science practices are ethical, transparent, and aligned with legal and regulatory frameworks. This can involve developing policies and procedures for data privacy, security, and ethical use.

Here is a list of the projects in my portfolio:

Project List

IST 687: Introduction to Data Science

Reflection and Learning Goals

The course covered the essential concepts and characteristics of data and how to manage it using R and RStudio. It also taught me principles and practices in data screening, cleaning, and linking, as well as how to communicate the results to decision-makers.

The course helped me identify problems and understand the data needed to address them. I learned to perform basic computational scripting using R and other optional tools. I also learned how to transform data through processing, linking, aggregation, summarization, and searching. Additionally, I learned how to organize and manage data at various stages of a project life-cycle.

Overall, the course gave me a solid foundation in data management and analysis using R and related tools. The skills and knowledge I gained are useful in any field where data analysis and management are essential.

Learning Objectives

  1. Understand essential concepts and characteristics of data: This refers to gaining a fundamental understanding of data, including what data is, the different types of data, and the characteristics of good quality data. This includes understanding key concepts like variables, observations, data types, and data distributions.
  2. Understand scripting/code development for data management using R and RStudio: This involves learning how to use R and RStudio to manage and analyze data, including writing scripts and code to automate data processing tasks. This includes learning how to read data into R, manipulate and clean data, perform basic statistical analyses, and visualize data.
  3. Understand principles and practices in data screening, cleaning, and linking: This involves learning how to screen, clean, and link data to ensure that it is of high quality and suitable for analysis. This includes learning how to identify missing or erroneous data, deal with outliers, and merge data from different sources.
  4. Understand communication of results to decision makers: This involves learning how to communicate data analysis results to decision makers, including presenting results in a clear and concise manner using visualizations, tables, and charts.
  5. Identify a problem and the data needed for addressing the problem: This involves identifying a specific problem or research question, and then determining what data is needed to address the problem. This includes learning how to design data collection instruments and how to identify and obtain existing data sources.
  6. Perform basic computational scripting using R and other optional tools: This involves learning how to write basic scripts and code to automate data processing tasks using R and other optional tools. This includes learning how to read data into R, manipulate and clean data, perform basic statistical analyses, and visualize data.
  7. Transform data through processing, linking, aggregation, summarization, and searching: This involves learning how to transform data through various data processing techniques, such as linking, aggregation, summarization, and searching. This includes learning how to merge data from different sources, summarize data using statistical measures, and search for patterns in large datasets.
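
As a rough illustration of the cleaning, linking, and aggregation steps listed above, here is a minimal sketch. It is written in Python rather than R, uses only the standard library, and every record, field name, and function name in it is hypothetical rather than taken from the course materials.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: one order has a missing spend value.
orders = [
    {"customer_id": 1, "spend": 120.0},
    {"customer_id": 2, "spend": None},
    {"customer_id": 1, "spend": 80.0},
    {"customer_id": 3, "spend": 45.5},
]
customers = {1: "East", 2: "West", 3: "East"}  # customer_id -> region


def clean(rows):
    """Screening and cleaning: drop rows whose spend is missing."""
    return [row for row in rows if row["spend"] is not None]


def link(rows, lookup):
    """Linking: attach the region from the second source to each order."""
    return [{**row, "region": lookup[row["customer_id"]]} for row in rows]


def summarize(rows):
    """Aggregation: mean spend per region."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append(row["spend"])
    return {region: mean(values) for region, values in by_region.items()}


summary = summarize(link(clean(orders), customers))
print(summary)
```

Dropping incomplete rows is only one possible cleaning choice; imputing a value would be another, and which is appropriate depends on the analysis.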

IST 736: Text Mining

Reflection and Learning Goals

The exercise involved collecting and organizing data from external sources, such as online reviews or other text-based sources. Through this process, I was able to identify patterns within the data, grouping similar texts into clusters based on their content or characteristics.

By analyzing these clusters, I was able to gain insights into the behavior of the reviewers or authors of the text. This analysis helped me understand their motivations, preferences, or biases, and how these factors affect the language and content of their writing.

During the course, I also learned about advanced text-mining algorithms that can be used to extract information from large volumes of text. These algorithms make it possible to identify key phrases or concepts within the text, classify documents based on their content, and group similar texts together in clusters.

Opinion mining was another important aspect of the course. This technique allows me to analyze text to identify sentiment, emotion, or opinion. Understanding the opinions and attitudes expressed in text gives valuable insight into the preferences of a target audience.

Overall, the course helped me develop the skills and knowledge needed to apply advanced text-mining techniques to real-world problems. Whether I am analyzing customer feedback, monitoring social media sentiment, or conducting research in a specific field, these techniques can help me extract valuable insights from large volumes of text data.

Text mining, also known as text analytics, is the process of extracting meaningful information and knowledge from unstructured text data. It involves several basic concepts and methods that are widely used in the field of natural language processing (NLP). Some of these concepts and methods are as follows:

  1. Document Representation: In text mining, a document is typically represented as a bag of words, where each word in the document is treated as a separate entity. This representation allows for the analysis of the frequency and distribution of words in the document.
  2. Information Extraction: Information extraction involves identifying and extracting specific pieces of information from a text document, such as names, dates, and locations. This process often involves the use of regular expressions and other techniques to extract structured data from unstructured text.
  3. Text Classification and Clustering: Text classification assigns documents to predefined categories based on their content. This process is often used in applications such as sentiment analysis, where documents are classified as positive, negative, or neutral based on the language used. Text clustering, on the other hand, groups documents by similarity without relying on predefined categories.
  4. Topic Modeling: Topic modeling is a method for identifying the underlying topics or themes in a collection of documents. This technique involves using statistical models to identify patterns in the data and group similar documents together based on their content.
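
The first two concepts above, bag-of-words representation and regular-expression-based information extraction, can be sketched in a few lines. This is a minimal illustration using only Python's standard library; the sample review and the function names are made up for the example.

```python
import re
from collections import Counter

# A made-up review used only for illustration.
review = "Great phone. Great battery, bought on 2022-03-14 and again on 2022-11-02."


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def extract_dates(text):
    """Information extraction: pull ISO-style dates (YYYY-MM-DD) from the text."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


counts = bag_of_words(review)
dates = extract_dates(review)
print(counts.most_common(3))
print(dates)
```

Note that the bag-of-words view discards word order entirely, which is what makes it simple to count and compare but also what limits it.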

To explore interesting patterns in text data, various benchmark corpora and text analysis and visualization tools are available, both commercial and open-source. Some examples of such tools include the Natural Language Toolkit (NLTK), RapidMiner, and Tableau.

Advanced text mining algorithms, such as deep learning and neural networks, can be used for information extraction, text classification and clustering, opinion mining, and other applications. These algorithms allow for more accurate and nuanced analysis of text data, but also require significant computational resources and expertise to implement.

IST 718: Big Data Analytics

Reflection and Learning Goals

This course’s practical application of the analytics techniques that I learned in my previous classes was very helpful. Building big data analytics pipelines is another important skill, as it enables me to process and analyze large volumes of data in an efficient and effective way.

Gaining actionable insights from data is ultimately what analytics is all about. By using the techniques I learned in the course, I can identify patterns, trends, and correlations in the data that can help me make better decisions and create a competitive advantage for any organization.

Translating a business challenge into an analytics challenge involves identifying a specific problem or question that a business is trying to solve or answer, and then determining the data and analytics techniques needed to address it. For example, a business challenge might be to understand why sales have been declining in a particular region or to predict which customers are most likely to churn.

Once the business challenge has been identified, different analytics techniques can be used to make predictions and gain insights. Linear and logistic regression can be used to model relationships between variables and make predictions based on those relationships. Decision trees can help identify the most important variables and their relationships to the outcome variable, while neural networks can handle complex and nonlinear relationships.
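
To make the regression idea concrete, here is a minimal sketch of ordinary least squares with a single predictor, applied to the declining-sales example. The monthly figures are invented for illustration, and a real project would more likely use a statistics or machine learning library than a hand-written fit.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Hypothetical monthly sales for one region (not real data).
months = [1, 2, 3, 4, 5]
sales = [100.0, 96.0, 92.0, 88.0, 84.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

A negative slope here quantifies how fast sales are falling per month, which is the kind of number that turns a business question into an analytics one.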

Data science can be used to gain actionable insights by identifying patterns and trends in large datasets. Python can be used to build big data analytics pipelines, which are sets of tools and techniques used to collect, store, process, and analyze large amounts of data. Classic and state-of-the-art machine learning techniques can be used to create predictive models that can help businesses make informed decisions.

Overall, this course has helped me develop a strong foundation in advanced analytics, which can be applied to a wide range of business challenges.

Favorite classes

Out of all the classes I took, these are my top picks:

IST 687 Introduction to Data Science

I had the opportunity to attend a fantastic class that kickstarted my Data Science journey. I found this class to be incredibly useful as it provided me with a solid foundation and helped me to understand what I should expect in my future classes. One of the things that I enjoyed the most about this class was that it was an introduction to R, which I found fascinating.

I’ve been programming for almost a decade now, so it was refreshing to learn a new language. It was exciting to see how R could be used in data science, and it gave me a new perspective on programming. I was able to learn new techniques and approaches that I could apply to my future projects.

Overall, I am grateful for the opportunity to have taken this class. It was a great experience that has helped me grow both personally and professionally. I’m looking forward to continuing my data science journey and seeing how I can apply what I’ve learned to real-world problems.


IST 736 Text Mining


Participating in this class was a unique experience for me because I had never worked with texts in such an informative way before. It was exciting to explore the process of text mining and learn how to analyze data from a new perspective. Through this course, I was able to gain a better understanding of the value of texts and how to properly understand data.

I found the concept of text mining to be fascinating, and it was interesting to see how we could extract useful information from written words. The class provided me with a foundation on the methods used to process texts, such as pre-processing, stemming, and stop-word removal. Additionally, I was able to learn about the different techniques used to extract insights from texts, such as sentiment analysis, topic modeling, and entity recognition.
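
A minimal sketch of the pre-processing steps mentioned above might look like the following. The stop-word list is a tiny hand-picked sample and the stemmer is a deliberately crude suffix stripper, not the Porter stemmer or any library implementation; both are assumptions made only to keep the example self-contained.

```python
import re

# A tiny, hand-picked stop-word list; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "were", "of", "to", "in"}


def crude_stem(token):
    """Strip a few common suffixes. This is NOT the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The reviewers were praising the cameras and the updated screens")
print(tokens)
```

The stems are not always real words ("prais", "updat"); that is normal for stemming, whose goal is to map related word forms to the same token.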

What I appreciated the most about this class was that it helped me to understand the value of texts as a whole. It’s incredible how much information is present in written language, and this class gave me the tools to extract valuable insights from it.

Overall, this class was a fantastic opportunity for me to learn a new approach to data analysis and to gain a better understanding of the value of texts. It was a great experience that I am grateful for and one that I will carry with me as I continue my academic and professional journey.


IST 652 Scripting for Data Analysis

I had the opportunity to take a class that required the use of Python, which was a great fit for me as I was already familiar with the language. It was refreshing to see the basics of Python being taught in a different way than what I was used to. Additionally, the class added to my knowledge by teaching me about MongoDB and mining websites.

I found it interesting to learn how easy it was to mine websites with the help of Python. It opened up new opportunities for me to gather data and perform analysis that I would not have been able to do before. The class introduced me to web scraping, and I learned how to extract data from websites using various libraries, such as Beautiful Soup and Scrapy.
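
The core idea of scraping, parsing HTML and pulling out the pieces of interest, can be sketched without any third-party library. The example below uses Python's built-in `html.parser` as a stand-in for Beautiful Soup or Scrapy, and parses a hard-coded, hypothetical page fragment instead of making a live request.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A hypothetical page fragment, used in place of a live HTTP request.
page = '<ul><li><a href="/reviews/1">First</a></li><li><a href="/reviews/2">Second</a></li></ul>'

collector = LinkCollector()
collector.feed(page)
links = collector.links
print(links)
```

Libraries such as Beautiful Soup wrap this kind of parsing in a friendlier search interface, which is why they are usually preferred for real scraping work.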

Moreover, the class covered the basics of data processing in Python, including how to work with CSV files and generate graphs based on the data. I appreciated the hands-on approach of the class, where we were given the opportunity to practice what we learned through projects and assignments.

What I enjoyed the most about this class was learning about MongoDB and its applications in data storage and retrieval. I had never worked with a document-oriented database before, and it was fascinating to see how it worked and how to utilize it in Python.

Overall, this class was a great learning experience for me. It taught me new skills and provided me with a different perspective on how to use Python. I am grateful for the opportunity to take this class, and I am excited to see how I can apply what I have learned to future projects.


IST 707 Data Analytics (Machine Learning)

Machine learning has always been an area that has fascinated me, and I was excited to take this class as it offered the perfect opportunity to understand what it takes to do machine learning. The course provided a comprehensive overview of the entire process of machine learning, which included mining and prepping data, utilizing various algorithms, and evaluating model performance.

I found the section on data mining and preparation to be particularly useful as it provided a solid foundation for the rest of the course. I learned about different data preprocessing techniques, such as normalization and scaling, and how they can impact the performance of machine learning algorithms.
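
Two common preprocessing techniques of this kind are min-max scaling and z-score standardization. The sketch below implements both with Python's standard library on made-up feature values; it assumes the values are not all identical, since either formula would then divide by zero.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def z_score(values):
    """Center values on their mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


incomes = [20.0, 40.0, 60.0, 80.0, 100.0]  # hypothetical feature values
scaled = min_max_scale(incomes)
standardized = z_score(incomes)
print(scaled)
print(standardized)
```

Scaling matters most for distance-based methods such as k-means clustering, where a feature with a large numeric range would otherwise dominate the distance calculation.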

Throughout the class, I had the chance to work with various algorithms, including association rule mining, clustering techniques, and decision trees. These algorithms were crucial in helping me to understand how machine learning works and how to apply it in real-world scenarios.
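
Association rule mining rests on two simple measures, support and confidence. As a minimal sketch, the functions below compute both for a rule over a handful of invented shopping baskets; the data and function names are illustrative and do not come from the course.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Algorithms such as Apriori use these same measures, but prune the search so that not every possible item combination has to be counted.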

One of the most valuable takeaways from this class was learning how to evaluate model performance. The course taught me how to assess the accuracy of a machine learning model and how to tune it for better performance.
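
For a binary classifier, accuracy is usually reported alongside precision and recall. The following is a minimal sketch that computes all three from a list of actual and predicted labels; the labels are invented, and in practice a library routine would normally be used instead.

```python
def evaluate(actual, predicted, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = evaluate(actual, predicted)
print(metrics)
```

Precision and recall become more informative than accuracy when the classes are imbalanced, because a model can score high accuracy simply by predicting the majority class.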

What I appreciated the most about this class was how practical it was. The skills and techniques I learned in this course have proven to be beneficial for my other classes, and I have been able to apply them in various projects.

In summary, this class provided me with an excellent foundation in machine learning and helped me to understand what it takes to build and evaluate a machine learning model. It was an exciting experience, and I look forward to using the skills I learned to tackle future challenges.

IST 722 Data Warehouse

As someone who works extensively with databases, I was excited to explore a new avenue in the field of data analytics by taking the IST 722 Data Warehouse class. Initially, the series of complex queries and data modeling presented a significant challenge, but I was determined to overcome it.

Through the class, I gained a better understanding of the items that come under the Data Warehouse umbrella, including concepts such as star schemas, snowflake schemas, and slowly changing dimensions. This helped me to develop a comprehensive understanding of the data warehousing process and its various components.
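
A star schema places one fact table of measures at the center and joins it to surrounding dimension tables. The sketch below builds a very small one in an in-memory SQLite database through Python's `sqlite3` module; the table names, columns, and rows are all hypothetical and chosen only to show the shape of the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the context of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
    -- The fact table holds measures plus foreign keys to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date VALUES (10, 2021), (11, 2022);
    INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
    """
)

# A typical star-schema query: join the fact to its dimensions, then aggregate.
rows = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
print(rows)
```

A snowflake schema would go one step further and normalize the dimensions themselves, for example by moving `category` into its own table referenced from `dim_product`.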

Additionally, this class provided me with my first introduction to Hadoop and HDFS, which were new concepts to me. I learned about the various tools used in big data processing and analysis, such as Hadoop MapReduce, HBase, and Pig. This class also helped me understand the different OLAP architectures used in data warehousing, such as ROLAP and MOLAP.
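
The MapReduce programming model can be illustrated on a single machine with the classic word-count example. The sketch below imitates the map, shuffle, and reduce phases in plain Python; it does not use Hadoop or any of its APIs, and the input lines are made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


lines = ["big data big insights", "data pipelines"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

In a real cluster, the map and reduce calls run in parallel on different nodes and the shuffle moves data between them over the network, but the logic of each phase is the same.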

Overall, this class was a valuable learning experience that broadened my knowledge of data warehousing and its applications. I gained practical skills in designing and implementing data warehouses, and I am grateful for the opportunity to have taken this class. It has helped me become a better database professional and has prepared me for more advanced courses in data science.