My primary role involves designing, training, and optimizing machine learning and deep learning models, applying Natural Language Processing (NLP), computer vision, and Large Language Models (LLMs) to real-world applications such as document understanding and information completion. I primarily use Python and libraries like PyTorch, Transformers, FlairNLP, and Scikit-learn to build machine learning and deep learning models. We have dockerized every project, and we manage our models' lifecycle with MLflow while annotating our data with Label Studio. We rely on GCP and Azure resources for high-compute workloads, model training, and data storage.
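As an illustration of that lifecycle tracking, here is a minimal MLflow sketch. The run name, hyperparameters, and metric value are hypothetical placeholders, not values from an actual project.

```python
# Minimal MLflow tracking sketch: log hyperparameters and a validation metric
# for one training run. All names and numbers below are illustrative.
import mlflow

with mlflow.start_run(run_name="doc-understanding-baseline"):
    mlflow.log_params({"lr": 3e-5, "epochs": 4, "batch_size": 16})
    # ... the training loop would run here ...
    mlflow.log_metric("val_f1", 0.91)  # placeholder score
    # mlflow.pytorch.log_model(model, "model") would persist the trained weights
```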
This client required an API capable of delivering accurate answers about lengthy documents, designed to function like a teacher aiding the understanding of specific subjects. With limited resources for data handling and storage, I built the API using Flask, Docker, OpenAI, and Scikit-learn, deployed via Azure App Services. It extracts text from documents, applying OCR to image-based pages and a text-size-ratio heuristic to identify text-based pages that can be read directly.
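A sketch of that per-page decision follows. The original stack is not fully specified, so PyMuPDF (fitz) and pytesseract are assumptions here, and the ratio threshold is illustrative.

```python
import fitz  # PyMuPDF, assumed here; the project's actual PDF library is not specified
import pytesseract
from PIL import Image

MIN_TEXT_RATIO = 0.1  # illustrative threshold, not the project's actual value

def extract_page_text(page):
    """Return embedded text if the page is text-based, otherwise OCR it."""
    text = page.get_text()
    # "Text size ratio": area covered by text blocks relative to the page area.
    blocks = page.get_text("blocks")
    text_area = sum((b[2] - b[0]) * (b[3] - b[1]) for b in blocks)
    page_area = page.rect.width * page.rect.height
    if text.strip() and text_area / page_area >= MIN_TEXT_RATIO:
        return text  # text-based page: use the embedded text directly
    # Image-based page: render at 2x zoom and run OCR on the bitmap.
    pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    return pytesseract.image_to_string(img)
```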
During my time at CGI, I worked on two different projects: the first combined computer vision with the energy sector, and the second involved natural language processing and machine learning. In the first project, I built the architectures for several neural networks for image segmentation, using TensorFlow and PyTorch. Our objective was to segment the cells of solar modules so they could be located for later analysis. I built three architectures (the classic U-Net, DeepLabv3, and ResUNet); after comparing their metrics and runtime performance, the U-Net proved the best fit for our dataset. We also used a TensorFlow Object Detection model to locate the solar modules, then fed the detected modules into the U-Net to locate the cells. In the second project, I designed a machine learning training system for a classification problem and collaborated with the NLP team to develop a sentiment analysis model that synthesized information from various media sources. This model was integrated with other data sources to predict potential debt defaults using Scikit-learn and XGBoost models.
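The two-stage pipeline can be sketched as follows. Here `detector` and `unet` stand in for the real models, and the detector interface, input size, normalization, and Keras-style `predict` call are assumptions for illustration.

```python
import cv2
import numpy as np

def segment_cells(image, detector, unet, input_size=(256, 256)):
    """Detect solar modules, then segment the cells inside each detected module."""
    results = []
    for (x0, y0, x1, y1) in detector(image):    # hypothetical detector interface
        crop = image[y0:y1, x0:x1]              # cut the module out of the image
        batch = cv2.resize(crop, input_size)[np.newaxis, ...] / 255.0
        mask = unet.predict(batch)[0]           # Keras-style predict assumed
        results.append(((x0, y0, x1, y1), mask))
    return results
```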
My initial role in a scientific setting involved applying my academic knowledge to real-world data challenges. I collaborated with a Greek partner to develop federated learning systems, enabling effective model training on client data while addressing legal and data-sensitivity concerns. We replicated the client's computational architecture, with one server acting as the orchestrator and three smaller servers acting as data hosts. Our projects focused on recognizing individuals in images, predicting ages, extracting text, and locating images, contributing to a Europol initiative to combat child abuse and illegal border crossings. My involvement in federated learning informed my second master's thesis, which provided an overview of the approach and tested it by training an image classification model across various machines with different weight-merging algorithms.
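One representative weight-merging algorithm of the kind the thesis compared is federated averaging (FedAvg). A minimal PyTorch sketch, with the client sample counts as illustrative placeholders:

```python
import torch

def fed_avg(state_dicts, sample_counts):
    """Merge client weights into one model, weighted by each client's data size."""
    total = sum(sample_counts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack(
            [sd[key].float() * (n / total)
             for sd, n in zip(state_dicts, sample_counts)]
        ).sum(dim=0)
    return merged

# Usage (illustrative counts): the orchestrator collects the data hosts' models
# and loads the merged weights into the global model.
# merged = fed_avg([m.state_dict() for m in client_models], [1200, 800, 500])
# global_model.load_state_dict(merged)
```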
In my first professional role, I learned fundamental skills in Python, Docker, and SQL, which have significantly benefited my current work. I was involved in a web maintenance project, creating data views for new features and resolving issues with existing ones. I used PySpark and SQLAlchemy to develop ETLs, updating client databases as needed.
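A minimal sketch of such an ETL with PySpark follows; the connection strings, table names, and columns are placeholders, not details from the actual client systems, and the SQLAlchemy side is omitted for brevity.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("client-etl").getOrCreate()

# Extract: read a source table over JDBC (connection details are placeholders).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://source-host/source_db")
    .option("dbtable", "orders")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Transform: aggregate order amounts per day.
daily = (
    orders.withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_amount"))
)

# Load: write the result into the client's database table.
(
    daily.write.format("jdbc")
    .option("url", "jdbc:mysql://client-host/client_db")
    .option("dbtable", "daily_totals")
    .option("user", "etl_user")
    .option("password", "***")
    .mode("overwrite")
    .save()
)
```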
Completing this degree while working was a challenge: I had to work hard to balance the master's classes, my job, assessed exercises, and social life.
The professors taught a wide range of technologies, primarily based on Python and R. I learned Python modules such as Pandas, NumPy, BeautifulSoup, nltk, PySpark (SQL and ML), Scikit-learn, Matplotlib, and TensorFlow on both CPU and GPU, along with RAPIDS (cuML, cuDF, cuGraph). In R, we studied and used many modules, the most notable being Shiny, RMarkdown, cluster, ggplot, HMM (Hidden Markov Models), adabag, xgboost, randomForest, ROCR, rpart, caret, mlbench, and e1071 for support vector machines, the Naive Bayes classifier, and generalized k-nearest neighbors. Other modules included H2O, foreign, PCAmixdata, leaps, GA (genetic algorithms), lattice, latticeExtra, spdep (spatial dependence), gstat (spatial and spatio-temporal geostatistical modelling, prediction, and simulation), stats, TSA, tseries, forecast, and lmtest. The master's program also introduced technologies like Pentaho, SQL (MySQL), and NoSQL (MongoDB).
The evaluation focused on practical work and on producing outputs from technically complex studies that non-technical audiences could understand. We included visual information and key annotations for the data, the training process, and the results of the models used to solve various problems. This master's program marked my first encounter with deep learning, which has since become my primary area of interest.
This online master's program allowed me to juggle the final year of my bachelor's degree in Mathematics, my first job, and the exams for the different subjects.
It gave me an initial understanding of applying statistics to real-world data, an introduction to Big Data technologies and tools, an overview of Business Intelligence and its role in key business decision-making, web analytics, and various methods for extracting crucial information.
The master's program helped me understand the information essential to business decision-making, and it gave me my first contact with tools like Python and its main data-management modules, NoSQL with MongoDB, SQL with MySQL, and web analytics with Google Analytics, as well as different types of dashboards and the technologies used to build them.
In university, I had my first contact with abstract science. Although I always enjoyed math, it was here that I discovered my passion for Mathematics, a field grounded in foundational axioms and definitions that allow conjectures to be proven. I had the privilege of learning from an exceptional team of professors. I studied and understood topology, algebra, calculus, statistics, and their advanced concepts, including multivariate calculus, geometry, numerical series, linear and non-linear models, Markov models, Fourier analysis, functional analysis, abstract algebra, logic, differential equations, numerical analysis, and even basic physics.
The challenges of studying these subjects taught me the right way to learn, the importance of being patient and consistent in my work, and the need to take time to fully address problems, visualize all their aspects, and plan a strategic approach to solve them. This degree ignited my passion for science and strengthened my curiosity and researcher's spirit.