


National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
Faculty of Informatics and Computer Engineering
Department of Computer Engineering

"As a manuscript"                "Admitted to defense"
UDC ______                        Head of Department
                                  Stirenko S.G.
                                  (signature) (initials, surname)
                                  "__" ________ 2020

Master's Dissertation
Specialty: 123. Computer Engineering (code and name of the field of study or specialty)
Specialization: 123. Programming Technologies for Computer Systems and Networks
Topic: Method of real-time human pose estimation based on a mobile operating system

Performed by: second-year student of group IO-84mn (group code) ZERKUK Abderauf (surname, name, patronymic) (signature)
Scientific supervisor: Prof., Dr. Sc. (Phys.-Math.), Senior Researcher Gordienko Yu. G. (position, scientific degree, academic title, surname and initials) (signature)
Consultant: Prof., Dr. Sc. (Eng.) Kulakov Yu. O. (section; position, academic title, scientific degree, surname, initials) (signature)
Reviewer: ____________________ (position, scientific degree, academic title, surname and initials) (signature)

I certify that this master's dissertation contains no borrowings from the works of other authors without the corresponding references.
Student (signature)

Kyiv – 2020

National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
Faculty (Institute): Informatics and Computer Engineering (full name)
Department: Computer Engineering (full name)
Level of higher education: second (master's), educational-professional program
Specialty: 123. Computer Engineering (code and name)
Specialization: 123. Programming Technologies for Computer Systems and Networks (code and name)

APPROVED
Head of Department
Stirenko S.G.
(signature) (initials, surname)
"__" ________ 2020

ASSIGNMENT
for the master's dissertation of the student
ZERKUK Abderauf (surname, name, patronymic)

1. Dissertation topic: Method of real-time human pose estimation based on a mobile operating system.
Scientific supervisor of the dissertation: Prof., Cand. Sc. (Eng.) Gordienko Yu. G. (surname, name, patronymic, scientific degree, academic title), approved by the university order of 05.05.2020 No. 59/20-si.
2. Deadline for submission of the dissertation by the student: ______
3. Object of research: the process of real-time human behavior estimation using machine learning, neural networks, deep neural networks and other artificial intelligence methods.
4. Subject of research: methods of human pose estimation that are efficient enough for real-time use in mobile applications and mobile operating systems.
5. List of tasks to be completed:
1. Overview of human pose estimation.
2. Machine learning methods and TensorFlow.
3. Overview of existing solutions.
4. Modeling of pose estimation in an Android application.
6. Consultants of the dissertation sections: Sections 1–3 – Prof., Dr. Sc. (Eng.) Kulakov Yu. O. (surname, initials, position; signature, date the assignment was issued / accepted).
7. Date of issue of the assignment: ______

Calendar plan
No. – Stage of the master's dissertation – Deadline – Note
1. Review of literature sources and information gathering – 24.02.2020 – 16.03.2020 – Done
2. Review of existing solutions, building UML diagrams – 17.03.2020 – 04.04.2020 – Done
3. Development of the Android application and training on the dataset – 06.04.2020 – 08.05.2020 – Done
4. Project defense – 18.05.2020

Student: ZERKUK Abderauf (signature) (initials, surname)
Scientific supervisor of the dissertation: Prof., Cand. Sc. (Eng.) Gordienko Yu. G.
(signature) (initials, surname)

Referat

The work consists of 75 pages, 33 figures, 2 tables, an appendix, and 48 sources in the list of references.

Relevance of the work. The work is motivated by the current rapid development and deployment of artificial intelligence (AI) in various applications. Machine and deep learning methods reduce workload and time. By automating some processes, AI-based algorithms can take over hard work for people. That is why AI-based automation has become a de facto standard of ubiquitous computing, since it can be applied almost everywhere, very reliably, and far more flexibly.

Relationship of the work with the scientific programs of the Department of Computer Engineering. Research and development of specialized devices and interfaces are closely related to the department's scientific work on the design of high-performance computing facilities and AI methods. This work is part of the department's research activity in the development and application of AI methods for various scientific and industrial applications.

Aim. The main aim of this work is to increase the efficiency of automatic human behavior estimation methods so that they can be used on commodity gadgets such as smartphones in real time.

To achieve this aim, the following main tasks were stated: study of the available algorithms and methods of computer vision and human recognition; analysis of their advantages and disadvantages; development of methods for integrating various components for human behavior recognition, human pose recognition, and determination of human limb coordinates; development of a software product that efficiently combines these functions for real-time use on mobile devices.

The object of research is the process of real-time human behavior estimation using machine learning, neural networks, deep neural networks, and other artificial intelligence methods.

The subject of research is methods of human pose estimation that are efficient enough for real-time use in mobile applications and mobile operating systems.

Research methods. The dissertation considers a technology for real-time human pose estimation for mobile applications and operating systems based on computer vision and artificial intelligence methods, using machine learning and deep learning. Several modern computer vision approaches are considered, including convolutional neural networks such as DenseNet, ResNet, and MobileNet.

Scientific novelty. An alternative architecture is proposed that makes it possible to improve the accuracy and speed of real-time human behavior estimation on a mobile gadget by means of human pose estimation. The proposed approach and system are based on an efficient deep learning model (MobileNet) and specialized deep learning libraries (TensorFlow / TensorFlow Lite). The proposed model was analyzed, and human body behavior in various poses was simulated to validate and test the model. The features of human pose detection were investigated, and the advantages and disadvantages of the proposed models were identified.

Practical results of the work: a method was proposed and software was developed to improve the detection and characterization of specific objects, namely to estimate human pose in real time on mobile gadgets such as mobile phones.

Main statements submitted for defense: a method of
real-time human pose estimation for mobile devices using deep learning methods, which has higher accuracy and speed than existing solutions.

Keywords: artificial intelligence, convolutional neural network, human pose estimation, machine learning, deep learning, mobile operating system, MobileNet, COCO dataset, TensorFlow, TensorFlow Lite.

Abstract

This master's dissertation consists of 75 pages, 33 figures, 2 tables, 30 appendices and 50 sources according to the list of references.

The actuality and urgency of the work is determined by the current fast development and implementation of artificial intelligence (AI) in various applications. Machine and deep learning methods are responsible for cutting workload and time. By automating some processes, AI-based algorithms can do the hard work for people. That is why AI-based automation is now a de facto standard of ubiquitous computing, because it can be applied almost everywhere, very reliably, and much more creatively.

Relationship of the work with scientific programs, plans, themes. The work is part of the research activity of the department in the direction of development and implementation of AI-based methods for various scientific and industrial applications.

The main aim of this work is to increase the efficiency of the available methods of real-time human behavior estimation so that they can be used on available gadgets such as smartphones in real time.

To achieve this aim, the following main tasks were stated: study of the available algorithms and methods of computer vision and human recognition; analysis of their advantages and disadvantages; development of various component integration methods for human recognition, human pose recognition, and determination of human limb coordinates; development of a sophisticated software product that efficiently combines these functionalities for real-time usage on mobile devices.

The object of the research is complex AI-based systems for computer vision and human behavior estimation using machine learning, neural networks, deep neural networks and other artificial intelligence methods.

The subject of the research is methods of human behavior estimation via human pose estimation that should be efficient enough for real-time usage on the basis of mobile operating systems.

The dissertation considers the main method at different levels of real-time human pose estimation using machine learning techniques for the mobile operating system. Several current computer vision approaches are considered, including convolutional neural networks such as DenseNet, ResNet, and MobileNet in various applications. An alternative architecture was proposed, which also provides the ability to enhance the prediction of human behavior estimation via human pose estimation in real time. The proposed approach and system are based on an efficient deep learning model (MobileNet) and a deep learning development framework (TensorFlow / TensorFlow Lite). The proposed model was analyzed, and human body behavior with various poses was simulated for model validation and testing. The human pose detection abilities were investigated, and the advantages and disadvantages of the proposed models were identified.
According to the results of the experimental studies, the conclusion is made that the main aim stated above was reached and the stated problem was resolved by the proposed approach and system, with better performance: a lightweight network of about 2.5 MB, accuracy (or other quality measures) above 70%, and inference speed below 30 ms. Overall, this made it possible to train, validate, and test the model and to port the system to available mobile gadgets such as smartphones.

The practical significance of the results of the work. A method has been proposed and software has been developed to improve the detection and characterization of specific objects, namely to provide human pose estimation under real-time conditions by means of mobile gadgets such as mobile phones.

Keywords: artificial intelligence, convolutional neural network, human pose estimation, machine learning, deep learning, MobileNet, COCO dataset, TensorFlow, TensorFlow Lite.

CONTENT
LIST OF ABBREVIATIONS
LIST OF FIGURES
INTRODUCTION
SECTION 1. OVERVIEW OF POSE ESTIMATION AND MACHINE LEARNING
1.1. Introduction
1.2. Human Pose Estimation Approaches
1.3. Machine Learning and Deep Learning
1.4. Difference Between Machine Learning and Deep Learning
1.5. Neural Networks
1.6. Machine Learning Datasets
1.7. TensorFlow
1.8. TensorFlow Lite
1.9. Related Work
1.10. Conclusion to Section 1
SECTION 2. NEEDS ANALYSIS AND SYSTEM MODELING
2.1. Introduction
2.2. Presentation of the Unified Process
2.3. The Design Process
2.4. Actors Identification
2.5. Use Case Diagram
2.6. System Analysis
2.6.1. Scenarios
2.7. Design
2.8. Conclusion to Section 2
SECTION 3. SYSTEM IMPLEMENTATION
3.1. Introduction
3.2. The Development Environment
3.3. The Implementation Languages Used
3.4. The Application Interface
3.5. Training Dataset
3.6. Model Properties
3.7. Statistics of the Dataset
3.8. Application Screenshots
3.9. Conclusion to Section 3
CONCLUSION
Perspectives
Future Work
REFERENCES
Appendices

LIST OF ABBREVIATIONS
API – Application Programming Interface
NLP – Natural Language Processing
TPU – Tensor Processing Unit
URI – Uniform Resource Identifier
URL – Uniform Resource Locator
ML – Machine Learning
DL – Deep Learning
RNN – Recurrent Neural Network
OMG – Object Management Group
UP – Unified Process
NN – Neural Network
XML – Extensible Markup Language
UML – Unified Modeling Language
COCO – Common Objects in Context
CNN – Convolutional Neural Network
LSTM – Long Short-Term Memory

LIST OF FIGURES
Fig. 1.1. Pose Estimation [1]
Fig. 1.2. HPE example
Fig. 1.3. AI-ML-DL Diagram [4]
Fig. 1.4. ML Diagram [16]
Fig. 1.5. Machine learning diagram [10]
Fig. 1.6. ML-DL Concept [11]
Fig. 1.7. Comparing ML and DL [14]
Fig. 1.8. Structure of a neural network and how training works [6]
Fig. 1.9. Convolutional Neural Networks [19]
Fig. 1.10. TensorFlow Diagram [34]
Fig. 1.11. TensorFlow Lite architecture [38]
Fig. 1.12. PoseEstimation-CoreML Screenshot 1 [39]
Fig. 1.13. PoseEstimation-CoreML Screenshot 2 [39]
Fig. 1.14. Structure of working [39]
Fig. 1.15. Tf-pose-estimation Screenshot [40]
Fig. 2.4. Take Pictures Sequence Diagram
Fig. 2.5. Record Video Sequence Diagram
Fig. 2.6. Human Pose Estimation Sequence Diagram
Fig. 2.7. System Architecture
Fig. 2.8. Overview of the system
Fig. 3.1. Android Studio Logo [42]
Fig. 3.2. TensorFlow Logo [43]
Fig. 3.3. Anaconda Logo [44]
Fig. 3.4. PE Page
Fig. 3.5. TFLite properties
Fig. 3.6. The distribution of different types of keypoints [50]
Fig. 3.7. Screenshot 1
Fig. 3.8. Screenshot 2
Fig. 3.9. Screenshot 4

LIST OF TABLES
Table 1.1. Machine Learning Techniques
Table 2.1. Use Case Description

INTRODUCTION

Background
As part of the preparation for the computer science master's degree, we carried out a project on an Android mobile application and machine learning for real-time human pose estimation. It is an important problem that has received considerable attention from the community over the past few years and is a crucial step towards understanding people in images and videos. In this work we cover the basics of human pose estimation and review the literature on this topic.

Problem definition and project objectives
Strong articulation, small and barely visible joints, occlusions, clothing, and lighting changes make estimating the human body a difficult problem; this is why we propose a method to address it. In short, the main choices made for our application are: use a supervised machine learning technique; a convolutional neural network
(MobileNet) as the CNN architecture; and the AI Challenger and COCO API as datasets.
The dissertation considers the main method at different levels of real-time human pose estimation using machine learning techniques on a mobile operating system, namely the convolutional neural network method, the sort of neural network employed in such applications. An alternative architecture is proposed which adds the ability to enhance the predictive engine. The proposed approach and system add another layer consisting of real-time estimation using the TensorFlow framework, which allows the trained model and the application to work in real time.

Organization
This project can be divided into three parts:
The first part is a small bibliographical study whose aim is to position the background of our project. In this section we focus on pose estimation and machine learning techniques.
The second part tackles the modeling of our solution, introducing the functional and technical needs behind the project and, finally, the solution design and deployment constraints.
The last part is dedicated to the project implementation. It includes a presentation of the basic functionalities of the application and its usage instructions.

SECTION 1. OVERVIEW OF POSE ESTIMATION AND MACHINE LEARNING

Introduction
As far as humans are concerned, the key points are the main joints such as elbows, knees, and wrists. Humans fall into the category of articulated ("elastic") bodies: when we bend our arms or legs, the key points end up in different positions relative to the others. Predicting where these points are is what makes pose estimation difficult. In addition, there is a major distinction between estimation in 2D and 3D.
3D estimation allows us to predict the true spatial location of a represented person or object. As one might expect, 3D pose estimation is more challenging for machine learning, because of the difficulty of building datasets and algorithms that take into account various factors, such as the image or video background scene, lighting conditions, and more. In this part of the thesis we target applications where robustness to changes in viewpoint is especially important, for instance in settings involving mobile cameras.
These applications are conventionally based on a classification task, where a visual observation is tested against a model of known activity categories to determine the activity that best matches the observation. The representation of the observation is ideally insensitive to human-related properties such as clothing and body attributes, viewpoints, occlusions and self-occlusions, and to environmental attributes such as background and lighting. An ideal representation would describe a short, heavyset man walking on snow seen from a distance and a tall, slender woman walking in an office corridor seen from close up in the same manner. This admittedly fictitious example would then be followed by an appropriate supervised learning and classification procedure to efficiently associate any representation with one of the target activity classes.
Now that we have a general understanding of what pose estimation is,
we will review several machine learning methods, draw up an overall assessment of their advantages and disadvantages, and focus on neural networks, which have become the state of the art for pose estimation.
Feature extraction in machine learning refers to the generation of derived values from raw data that can be used as input to a learning algorithm. Features can be explicit or implicit. Explicit features include conventional computer vision features such as the Histogram of Oriented Gradients (HOG) and the Scale-Invariant Feature Transform (SIFT); these features are calculated explicitly before feeding the input to the subsequent learning algorithm. Implicit features refer to deep-learning-based feature maps, such as the outputs of deep convolutional neural networks; these feature maps are never explicitly engineered but are part of a complete pipeline trained end to end.
Fig. 1.1. Pose Estimation [1]

Pose Estimation
Pose estimation refers to computer vision techniques that detect human figures in images and video, so that one could determine, for example, where someone's elbow appears in an image. To be clear, this technology does not recognize who is in an image: there is no personally identifiable information associated with the detection. The algorithm simply estimates where the key body joints are. We hope the accessibility of such models encourages more developers and makers to experiment with and apply pose detection in their own unique projects. Although many alternative pose detection systems are open source, they generally require specialized hardware and/or cameras and a fair amount of system setup. [2] [3]

Human Pose Estimation Approaches
Human pose estimation approaches can be categorized into two types: model-based generative methods and discriminative methods.
The pictorial structure model (PSM) is one of the most popular generative models for estimating the 2D human body. The model usually consists of two terms, representing the appearance of all body parts and the spatial relationship between adjacent parts. The length of a limb can vary in two dimensions, and mixtures of models have been suggested to model each part of the body. The spatial relationships between the articulated parts are simpler for the 3D model, because limb lengths in 3D are constant for a particular subject. PSM has been proposed for 3D pose estimation by discretizing the space; however, the state space grows cubically with the discretization precision, which makes it computationally demanding.
Discriminative methods treat pose estimation as a regression problem: after features are extracted from the image, a mapping is learned from the feature space to the pose space. Deep learning approaches, instead of manually modeling structural dependencies, offer a more direct way to "integrate" the structure into the mapping function and to learn a representation that breaks down the dependencies between the output variables.
Fig. 1.2. HPE example

Machine Learning and Deep Learning
Before getting to know what machine learning (ML) and deep learning (DL) are, we should note that both are part of artificial intelligence, as we can see in Fig. 1.3.
Fig. 1.3. AI-ML-DL Diagram [4]
Machine Learning
Machine learning is the field of study where machines learn from examples or experience without explicit, highly specific coding or programming. Here, humans train the machines to learn from past data. It is not only about learning, but also about understanding and basic reasoning. The machine learns from the data, builds a prediction model on the basis of past data, and when new data comes in, it can easily make predictions for it. The more data, the better the model and the higher the accuracy. There are several ways in which a machine learns:
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
Fig. 1.4. ML Diagram [16]
Machine learning systems are used all around us and are a cornerstone of the modern internet. Machine learning systems are used to recommend which product you might want to buy next on Amazon or which video you may want to watch on Netflix. [5]
Every Google search uses multiple machine learning systems, from understanding the language in your query to personalizing your results, so fishing enthusiasts searching for "bass" are not inundated with results about guitars. Similarly, Gmail's spam and phishing detection systems use machine-learned models to keep your inbox clear of rogue messages.
One of the most visible demonstrations of the power of machine learning are virtual assistants, such as Apple's Siri, Amazon's Alexa, the Google Assistant, and Microsoft's Cortana. [6] [7]

Machine Learning Techniques
Supervised learning
As the name suggests, here the learning is supervised: the dataset acts as a teacher and the model is trained on the basis of this data. It uses labelled data to train the model; the machine knows the features of the data and the labels associated with them. A labelled dataset is one where we already know the answer. For example, if you have different kinds of balls in a basket, you can label the balls by their shape: a tennis ball is spherical, a cricket ball is round, a football is a spheroid, and so on. Now, if you give any ball to the machine, it will determine its type based on its shape. So here we train the machine in advance with the relevant features and labels. Supervised learning algorithms use regression, decision trees, support vector machines, and random forest techniques. [8]
Types of Supervised Learning
There are two types of supervised learning:
Classification
Regression
Unsupervised learning
In this case, the model learns through observations and identifies patterns in the dataset; the learning uses an unlabelled dataset. The machine creates clusters in a dataset by finding patterns in its features. For example, if we provide the machine with a dataset of football players with their goals and the time taken per goal, the machine makes one cluster of the players who score many goals in little time and another cluster of the players who score fewer goals and take more time. In this way, the machine clusters the dataset based on the data fed into it, without us providing any labels, as we did in the case of supervised machine learning. Both settings are illustrated by the short sketch below.
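As a minimal illustrative sketch of the two settings just described (this example is not part of the original work; it uses scikit-learn, and the "ball" and "player" feature values are invented purely for illustration):

# Sketch: supervised classification vs. unsupervised clustering (scikit-learn).
# All numbers below are made-up illustrative values.
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Supervised: labelled data (features -> known answer).
X_balls = [[22.0, 450], [7.2, 160], [6.7, 58]]        # [diameter_cm, weight_g]
y_balls = ["football", "cricket ball", "tennis ball"]
classifier = DecisionTreeClassifier().fit(X_balls, y_balls)
print(classifier.predict([[21.5, 430]]))               # -> most likely "football"

# Unsupervised: no labels, the algorithm groups similar players by itself.
X_players = [[30, 90], [28, 95], [5, 300], [4, 320]]   # [goals, minutes_per_goal]
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X_players)
print(labels)                                          # e.g. [0 0 1 1] or [1 1 0 0]

In the first case the answers are provided in advance; in the second the grouping emerges from the data alone.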
Unsupervised machine learning algorithms include principal component analysis, singular value decomposition, k-means clustering, k-nearest neighbours, and hierarchical clustering.
Types of Unsupervised Learning
The three types of unsupervised learning are:
Clustering
Visualization algorithms
Anomaly detection
Semi-supervised learning
The viability of semi-supervised learning has recently been boosted by Generative Adversarial Networks (GANs), machine learning systems that can use labelled data to generate completely new data, for instance creating new pictures of Pokémon from existing pictures, which in turn can be used to help train a machine learning model.
Reinforcement learning
Reinforcement learning involves teaching the machine to think for itself by using a system of rewards. Say you have a robot whose movement you intend to control along a predefined path, and you want the robot to learn to move along this path without any help from you. You define a system of rewards: for every correct step taken by the robot a reward is given, and for every incorrect move a reward is taken away. Essentially, you teach the robot to understand that the reward is good for it. The robot eventually learns the actual path, reinforced by its rewards and its mistakes.
Other Machine Learning Techniques
The next table shows other techniques.
Table 1.1. Machine Learning Techniques [9]
Technique – How it works
Probabilistic models – Model the probability distribution of a data set and use it to predict future outcomes.
Decision trees – Arrive at a hierarchical decision tree structure.
Clustering – Classify data based on the closest data points appearing in the same cluster.
Association rules – A method to discover which items tend to occur together in a sample space.
Deep learning – Based on artificial neural network models.
Fig. 1.5. Machine learning diagram [10]

Deep Learning
Deep learning is a subfield of machine learning particularly concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. Deep learning is a sub-type of machine learning, but here the learning goes a step deeper, so more data, and data on more attributes, is needed than for classic machine learning.
Fig. 1.6. ML-DL Concept [11]
Deep learning can outperform traditional techniques. For example, deep learning algorithms have been reported to be 41% more accurate than classic machine learning algorithms in image classification, 27% more accurate in biometric authentication, and 25% more accurate in voice recognition. [12] [13]
Deep learning focuses on neural architectures for supervised and unsupervised learning. Its "deep" component also largely makes it an empirical field, a combination of science and art, unlike some other methods of "usual" machine learning that solve the same supervised and unsupervised tasks. Deep learning models are hard to interpret, sometimes hard to replicate, and their non-linearity is hard to control. Because of these factors, deep learning has limited use in the specifically "financial" parts of finance, i.e.
trading or risk management, except for high-frequency trading, where you have tons of data, so that brute-force deep learning methods can be applied. Other applications of deep learning are important, but they are not specific to finance; for example, people use deep learning to produce alternative data and for NLP tasks such as constructing sentiment scores or processing corporate filings.

Difference Between Machine Learning and Deep Learning
Generally speaking, machine learning is a term used to refer to computers learning from examples or data. That is, a machine sees various examples, such as photos, and learns how to interpret new examples it has never seen, presumably through some form of generalization.
Deep learning is one mechanism for machine learning, in which the learning is performed via a multi-layered network of interconnected "neurons". There are many other models that support learning. While the concept of neural networks has been known for decades, only a few years ago did people start realizing that deep networks with multiple layers of neurons seem to result in good learning in many cases. [14]
Fig. 1.7. Comparing ML and DL [14]

Neural Networks
A very important group of algorithms for both supervised and unsupervised machine learning are neural networks. They underlie much of machine learning; while simple models such as linear regression can be used to make predictions from a small number of data features, neural networks are useful when dealing with large sets of data with many features. They are computational techniques designed to simulate the way the human brain performs a particular task, by means of massively parallel distributed processing composed of simple processing units. These units are just mathematical elements, called neurons or nodes, with a neuron-like property: they store practical knowledge and experimental information and make it available to the user by adjusting weights. [6] [15] [12]
A neural network thus consists of a group of processing units, each called a neuron, implementing a simple non-linear model of the artificial neuron. Just as a human has input units connecting him to the outside world, namely the five senses, neural networks need input units, as well as processing units where calculations are made by adjusting weights so that a suitable response is obtained for each input to the network. The input units form a layer called the input layer, and the processing units make up the processing layers, the last of which produces the outputs of the network. Between each pair of layers there is a set of interconnections linking each layer with the next one, and a weight is assigned to each connection. The network contains only one layer of input units, but it may contain more than one processing layer. [16] [17] The computation performed by a single neuron is illustrated by the short sketch below.
Fig. 1.8. Structure of a neural network and how training works [6]
The weights represent the primary information that the network learns, so we must update the weights during the training phase; for this update several different algorithms are used, depending on the type of network.
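The weighted-sum-and-activation computation performed by a single artificial neuron can be illustrated with a few lines of Python (an illustrative sketch only; the weights, bias, and inputs are arbitrary values, not taken from this work):

# One artificial neuron: output = activation(w . x + b).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])    # inputs coming from the previous layer
w = np.array([0.8,  0.1, -0.4])   # adjustable weights learned during training
b = 0.2                           # bias term

print(sigmoid(np.dot(w, x) + b))  # the neuron's activation for this input

Training consists precisely in adjusting w and b so that such outputs match the desired ones.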
One of the most important of these algorithms is the backpropagation algorithm, which is used to train fully connected, feedforward, multi-layer, non-linear neural networks. This algorithm is a generalization of the error-correction training method and is carried out in two main phases:
Forward propagation phase. No synaptic weights are modified. This phase begins by presenting the input pattern to the network: each component of the input vector is assigned to one unit of the input layer, the values of the input vector excite the input-layer units, and this excitation then propagates forward through the remaining layers of the network.
Backward propagation phase, in which the network weights are adjusted. The standard backpropagation algorithm uses gradient descent, which moves the network weights along the negative gradient of the performance function.
Neural network creation example
The first step in training a network is to create it; dedicated functions exist for creating each type of neural network with its particular characteristics. Since we want to create a feedforward network, we use the newff function (from MATLAB's Neural Network Toolbox), which needs four input parameters:
an array containing the minimum and maximum values for each component of the input vector (it can be replaced by minmax(p), which determines the smallest and largest values in the input data);
an array containing the number of neurons in each layer of the network;
a cell array containing the activation function names for each layer;
the name of the training function to be used.
Example:
network1 = newff([0 5], [10 6 2], {'tansig', 'logsig', 'purelin'}, 'traingd')
This instruction creates a feedforward network with backpropagation whose input range lies between 0 and 5. The network consists of two hidden layers and an output layer: the first hidden layer contains ten neurons, the second hidden layer contains six neurons, and the output layer consists of two output neurons. The activation functions of these layers are tansig for the first hidden layer, logsig for the second, and purelin for the output layer, and the training function used in this network is traingd.
The traingd function has several parameters that can be modified, namely:
Learning rate: determines the speed of the weight and bias changes.
Show: how often the training status is displayed.
Epochs: a stopping parameter; the network stops training when the number of iterations reaches the specified number of epochs.
Goal: the minimum error at which training stops.
min_grad: the minimal gradient at which training stops.
These parameters are given default values when a network is created but can be inspected and redefined. A rough TensorFlow/Keras counterpart of this small network is sketched below.
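For readers more familiar with the TensorFlow stack used later in this work, a rough Keras counterpart of the newff example above could look as follows (a sketch only, not part of the original toolchain; tansig, logsig, and purelin roughly correspond to tanh, sigmoid, and linear activations, and traingd to plain gradient descent):

# Approximate Keras counterpart of newff([0 5], [10 6 2], {...}, 'traingd'):
# a feedforward network with two hidden layers trained by gradient descent.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),              # one input in the range [0, 5]
    tf.keras.layers.Dense(10, activation="tanh"),   # ~ tansig
    tf.keras.layers.Dense(6, activation="sigmoid"), # ~ logsig
    tf.keras.layers.Dense(2, activation="linear"),  # ~ purelin
])

# Plain gradient descent; the learning rate plays the role described above.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
model.summary()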
After understanding neural networks in general, the two most important types of neural networks are:
Convolutional Neural Networks
Recurrent Neural Networks

Convolutional Neural Networks
Convolutional neural networks are designed to process data that comes in multiple arrays, through multiple layers. A CNN is a type of feedforward neural network in which artificial neurons respond to surrounding units within a part of the coverage area (a receptive field); CNNs show excellent performance for large-scale image processing. Convolutional neural networks are very similar to ordinary neural networks and consist of neurons with learnable weights and biases. Each neuron receives some input and performs a dot-product calculation, and the output is a score for each class; most of the computational techniques used in ordinary neural networks are still applicable here. However, the default input of a convolutional neural network is an image, which allows us to encode certain properties into the network structure; this makes the forward pass more efficient and greatly reduces the number of parameters. A convolutional neural network uses three basic ideas [18]:
local receptive fields;
convolution;
pooling.
A typical convolutional neural network consists of the following layers (see the sketch at the end of this subsection):
Input layer: responsible for resizing the input image to a fixed size and normalizing pixel intensity values.
Convolution layer: image convolution produces a set of parallel feature maps; it consists of sliding different convolution kernels over the input image and performing certain operations.
Pooling layer: pooling layers perform a max or average operation on the output of convolution layers to retain the most significant information about patterns and discard insignificant details.
Fully connected layer: the final logistic-regression-like layers, which map visual features to the desired output functions.
Output layer: contains the class probabilities for each input image.
Fig. 1.9. Convolutional Neural Networks [19]
Popular CNN Architectures
There are many popular CNN architectures: [20] [21] [16] [17]
LeNet-5 is a convolutional neural network architecture that classified digits from digitized 32×32-pixel greyscale input images.
AlexNet is a network 8 layers deep that can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals.
GoogLeNet is a deep convolutional neural network architecture designed for image classification and recognition, with an error rate of less than 7%, which is close to the level of human performance.
MobileNets are a class of lightweight convolutional neural networks targeted mainly at devices with lower computational power than a typical GPU-equipped PC; they are less accurate than ordinary CNNs but faster, specifically catering to the needs of low-compute devices.
CNN Applications
Here are some fields in which CNNs are applied:
Image recognition. Convolutional neural networks are commonly used in image recognition systems.
Video analytics. Compared to image recognition, video analysis is much more difficult; CNNs are also often used for such problems.
Natural language processing. Convolutional neural networks are also often used for natural language processing. The CNN model has proven effective for various NLP problems, such as semantic analysis, search result extraction, sentence modeling, classification, and prediction.
Drug discovery. Convolutional neural networks have been used in drug discovery to predict the interaction between molecules and proteins in order to find target sites and identify potential therapies that are more likely to be safe and effective.
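The layer stack listed above can be sketched in a few lines of Keras (an illustrative toy network, not the model used in this work; the 32×32 input size and the 10 output classes are arbitrary assumptions):

# Toy example of the typical CNN stack: input -> convolution -> pooling ->
# fully connected -> output. Sizes and class count are arbitrary.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),                      # input layer (RGB image)
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolution layer
    tf.keras.layers.MaxPooling2D(pool_size=2),                     # pooling layer
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),                  # fully connected layer
    tf.keras.layers.Dense(10, activation="softmax"),               # output layer: class probabilities
])
model.summary()

For comparison, a pretrained MobileNet backbone, the architecture family this work builds on, can be obtained with tf.keras.applications.MobileNetV2(weights="imagenet").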
Recurrent Neural Networks
A recurrent neural network (RNN) is a neural network with cycles. A simple RNN cannot handle the problems of exploding weights or vanishing gradients during recurrence, so it is difficult for it to capture long-term temporal correlations; combining it with LSTM units solves this problem well. A recurrent neural network can model dynamic temporal behaviour: unlike feedforward neural networks, which accept inputs of a fixed structure, an RNN cycles state through its own network, which allows it to handle the temporal structure of sequential inputs. Handwriting recognition was the earliest successful research application of RNNs. [22]
RNNs are related to networks constructed in a structurally recursive manner, such as the recursive autoencoder used to parse sentences in neural approaches to natural language processing.
The basic RNN is a network of successive layers of artificial neurons. Each node in a given layer is connected to every node in the next layer through a directed, unidirectional connection. Each neuron has a time-varying, real-valued activation, and each connection (synapse) has a modifiable real-valued weight. Nodes are either input nodes receiving data from outside the network, output nodes delivering results, or hidden nodes that transform data on the way from input to output. [23] [24] [25]
Some examples of sequence prediction problems include:
One-to-many: an observation as input, mapped to a sequence with multiple steps as output.
Many-to-one: a sequence of multiple steps as input, mapped to a class or quantity prediction.
Many-to-many: a sequence of multiple steps as input, mapped to a sequence with multiple steps as output. The many-to-many problem is often referred to as sequence-to-sequence, or seq2seq for short.

Machine Learning Datasets
A dataset is the information or data required in data science or machine learning. The data is normally obtained from historical observations. There are normally two or three datasets in a project: a training dataset, and either a development dataset or a validation and a test dataset. Training datasets are used for building models; the other datasets are used for fine-tuning and picking the best-performing model and for checking how well the chosen model generalizes to unseen examples. [26]
Datasets are essential for training AI algorithms and developing machine learning models. Having a large, high-quality dataset is crucial for any machine learning project. The data is actually split into different datasets with different uses: a training dataset and a test dataset. [27]
In general, or from a machine learning perspective, a dataset is a collection of related entities and values that may be accessed either individually or as a whole and that is organized with a data structure. [28] [29]
There are two ways to collect data for your model:
rely on open-source data;
collect your own data in the right way.
Some Open Datasets
MNIST
The MNIST database (Modified National Institute of Standards and Technology database) is easily accessible through many machine learning libraries, such as sklearn in Python. MNIST has become a standard training dataset for digits in English; there are similar databases for digits in other languages and for characters in English and other languages. [30]
COCO
COCO stands for Common Objects in Context. It is a collection of labeled images that can be used to train models for object detection, segmentation, keypoint detection, etc. [31] It has several features:
more than 330K images;
1.5 million object instances;
more than 90 categories;
5 captions per image;
more than 250,000 people with labeled body joints.
A short sketch of reading the COCO person-keypoint annotations is given below.
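Since this work relies on the COCO person-keypoints annotations, the following sketch shows how they can be read with the official pycocotools API (the local annotation file path is an assumption; the 17-keypoint layout is the standard COCO format):

# Sketch: reading COCO person keypoints with pycocotools.
# The annotation file path is an assumed local path.
from pycocotools.coco import COCO

coco = COCO("annotations/person_keypoints_val2017.json")
person_cat = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=person_cat)

ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=person_cat, iscrowd=None)
anns = coco.loadAnns(ann_ids)

# Each annotation stores 17 keypoints as a flat [x1, y1, v1, x2, y2, v2, ...]
# list, where v is a visibility flag (0: not labeled, 1: occluded, 2: visible).
keypoints = anns[0]["keypoints"]
print(len(keypoints) // 3, "keypoints for the first annotated person")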
Open Images
Open Images is a dataset that consists of almost 9 million URLs of images. These images have been annotated with image-level labels and bounding boxes spanning thousands of classes. The dataset contains a training set of 9,011,219 images, a validation set of 41,260 images, and a test set of 125,436 images. [30]
AI Challenger
AI Challenger is a platform of open datasets and programming competitions for artificial intelligence talents around the world. AI Challenger's goal is to support the development of AI talent by providing high-quality and rich data resources, and to solve real-world problems by accelerating the integration of research and applications. 8,892 teams from 65 countries participated in the inaugural AI Challenger competitions in 2017, which made it the largest dataset platform for scientific research and the largest non-commercial competition platform in China.

TensorFlow
TensorFlow is one of the most popular open-source libraries in ML and is used to create both simple and sophisticated ML models. The name TensorFlow is derived from two terms: tensors and dataflow graphs. TensorFlow accepts inputs in the form of higher-dimensional matrices called tensors; each tensor has its own dimensionality and rank. These tensors are then processed in the form of a computational graph (a minimal example of tensors and operations is sketched after Fig. 1.10).
TensorFlow manipulates data by creating a dataflow graph, or computational graph, which consists of nodes and edges that perform operations and manipulations such as addition, subtraction, and multiplication.
TensorFlow is now widely used to build complicated ML models. Here are some good examples of what you can do with TensorFlow:
voice recognition;
automatic translation;
image recognition;
time series analysis;
robot navigation.
TensorFlow can be used for object detection in images or videos, and it provides an API that helps you do it. TensorFlow lets developers create dataflow graph structures that describe how data moves through a graph, or a series of processing nodes. Each node in the graph represents a mathematical operation, and each connection or edge between nodes is a multidimensional data array, or tensor.
TensorFlow provides all of this to the programmer through the Python programming language. Python is easy to learn and work with, and it provides convenient ways to express how high-level abstractions can be coupled together. The actual mathematical operations, however, are not performed in Python: the libraries of transformations that are accessible through TensorFlow are written as high-performance C++ binaries.
Basically, TensorFlow is:
a computational graph processor;
with tensors as its main data elements.
Apart from that, it includes many sequential and parallel operations related to statistics and deep learning. So it is very similar to other machine learning toolkits, but it has the support of a machine learning team composed of many leading professionals (including one of Caffe's architects, builder of one of the fastest convolutional network frameworks). It also has a dynamic visualization toolkit (TensorBoard) and a high-performance model server (TensorFlow Serving). [32] [33] [34]
Fig. 1.10. TensorFlow Diagram [34]
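A minimal, self-contained example of tensors and graph-style operations in TensorFlow 2.x (illustrative values only, not code from this work):

# Tensors and dataflow-style operations in TensorFlow 2.x.
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])   # rank-2 tensor (2x2 matrix)
b = tf.constant([[5.0], [6.0]])             # rank-2 tensor (2x1 matrix)

c = tf.matmul(a, b)    # node: matrix multiplication
d = tf.add(c, 1.0)     # node: element-wise addition

print(d.numpy())       # [[18.], [40.]]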
TensorFlow Lite
Google introduced the TensorFlow Lite library for mobile machine learning; it is the next evolution of TensorFlow Mobile. The TensorFlow machine learning library already runs on a huge number of platforms, from server racks to tiny IoT devices. But since the deployment of machine learning models has grown exponentially over the past few years, they now need to run even on mobile and embedded devices. That is why Google introduced a new version of the library, TensorFlow Lite. [35] [36] [37]
Among the main advantages of the Lite version it is worth highlighting:
Lightweight: provides quick initialization and launch of small machine learning models on mobile devices;
Cross-platform: models can be deployed on a large number of mobile platforms, including Android and iOS;
Speed: the library is optimized for use on mobile devices and supports hardware acceleration.
TensorFlow Lite already supports a number of models trained and optimized for mobile devices:
MobileNet: a class of computer vision models able to identify about 1,000 objects, specially designed for efficient execution on mobile and embedded devices;
Inception v3: an image recognition model similar in functionality to MobileNet but providing higher accuracy while having a larger size;
Smart Reply: an on-device dialogue model that provides one-touch replies to incoming messages in instant messengers, used primarily by Android Wear apps.
Advantages of TensorFlow Lite:
easy conversion of TensorFlow models to lightweight TensorFlow Lite models optimized for mobile;
easy development of ML applications for iOS and Android devices;
a more economical alternative for enabling models on mobile compared to server-based models;
offline inference on mobile devices.
TensorFlow Lite lets you run machine learning models quickly on mobile and embedded devices with low latency, so you can perform typical machine learning tasks on these devices without using an external API or server, which means your models can run on devices offline.
Alternatives to TensorFlow Lite:
Core ML: a framework released by Apple for building iOS-only models; it is an iOS-only alternative to TensorFlow Lite.
PyTorch Mobile: PyTorch is a popular machine learning framework used extensively in machine learning research.
Hosting models on cloud services (GCP, AWS) and exposing functionality via REST APIs.
Fig. 1.11. TensorFlow Lite architecture [38]
Let us consider each component separately:
TensorFlow Model: a trained TensorFlow model stored on disk;
TensorFlow Lite Converter: a program that converts a model into the TensorFlow Lite format;
TensorFlow Lite Model File: a FlatBuffers-based model file format optimized for maximum speed and minimum size.
A short sketch of this conversion and inference workflow is given below.
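A sketch of this workflow, converting a trained Keras model to the FlatBuffers .tflite format and running one inference with the interpreter, might look as follows (the placeholder model and the file name are assumptions, not the exact pipeline of this work):

# Sketch: TensorFlow Lite conversion and inference.
# 'model' stands for any trained tf.keras model; here a randomly initialized
# MobileNetV2 is used purely as a placeholder.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)

# TensorFlow Lite Converter: trained model -> .tflite FlatBuffers file.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # optional size/latency optimization
with open("model.tflite", "wb") as f:
    f.write(converter.convert())

# TensorFlow Lite Interpreter: load the file and run one inference.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)

On the device, the same .tflite file is then loaded by the TensorFlow Lite runtime bundled with the mobile application.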
Related Work
This work builds on other research projects.
PoseEstimation-CoreML
This project performs pose estimation on iOS with Core ML and is of interest to anyone working with iOS and machine learning. [39]
Fig. 1.12. PoseEstimation-CoreML Screenshot 1 [39]
Features:
estimate body pose on an image;
inference with the camera's pixel buffer in real time;
inference with an image from the photo library;
visualization as heatmaps;
visualization as lines and points;
pose capturing and pose matching.
Fig. 1.13. PoseEstimation-CoreML Screenshot 2 [39]
How it works:
Fig. 1.14. Structure of working [39]
Requirements:
Xcode 9.2+
iOS 11.0+
Swift 4
Tf-pose-estimation
Tf-pose-estimation is an implementation of the OpenPose human pose estimation algorithm in TensorFlow. It also provides several variants with changes to the network structure for real-time processing on the CPU or on low-power embedded devices. [40]
Important updates:
new models using the MobileNet-v2 architecture were added;
the post-processing part is implemented in C++ and has to be compiled;
the arguments of the run.py script changed;
dynamic input sizes are supported.
Dependencies:
python3
tensorflow 1.4.1+
opencv3, protobuf, python3-tk
Fig. 1.15. Tf-pose-estimation Screenshot [40]

Conclusion to Section 1
After analyzing some aspects of the problem and carefully considering their characteristics, the following choices were made among many alternatives:
- Among the types of learning (supervised, unsupervised, ...), supervised learning was selected, because supervised learning uses classification techniques to predict discrete responses, and classification is appropriate when the data can be tagged, categorized, or separated into specific groups or classes.
- Among the datasets (MNIST, Fashion-MNIST, ImageNet, COCO, etc.), the COCO API was selected because it contains labeling related to pose estimation and it is a large image dataset designed for object detection, segmentation, person keypoint detection, stuff segmentation, and caption generation.
- Among the types of neural networks (dense, convolutional, recurrent), CNNs were selected, because they use special convolution and pooling operations and perform parameter sharing; this enables CNN models to run on almost any device, making them universally attractive.
- Among the convolutional NNs (ResNet, DenseNet, Inception, ...), MobileNet was selected because it is a family of small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases; like other popular large-scale models, they can be built upon for classification, detection, embedding, and segmentation.
We succeeded in reaching all the aforementioned goals. Moreover, the experience allowed us to expand the knowledge acquired during our studies and to enrich our competences by exploring new areas such as convolutional neural networks, human pose estimation, neural networks, TensorFlow Lite and TensorFlow. It is important to mention that our project will have a bigger impact than the one we originally set out to cover. In this work we also learned how to use a trained model in an Android project (Android Studio) and how to build on the experience of others to propose a solution.

SECTION 2. NEEDS ANALYSIS AND SYSTEM MODELING

Introduction
After giving an overview of the background of our project, we now present the modeling of the solution, through a brief definition of the adopted design approach. Next, we study the technical and functional requirements of the application; then we analyze and conclude the chapter with the static system architecture outlined by the class diagram.
To carry out all those tasks, we adopted the Unified Process (UP) together with the Unified Modeling Language (UML), whose diagrams are widely used and easily customizable at each step (i.e. phase) of any development project.

Presentation of the Unified Process
The Unified Modeling Language (UML) [41] is a notation describing a graphical language with rules for creating analysis and design models; in addition, UML is a supporting tool for the project. The UML authors recommended the use of a process to standardize procedures as part of modeling a computer application; several process approaches have been described and formalized, and among these processes is the UP.
The Unified Process (UP) [41] is a use-case-driven, architecture-centric, iterative and incremental development process framework built around the Object Management Group's (OMG) UML. The UP is widely applied to many software systems, including small-scale and large-scale projects with various degrees of managerial and technical complexity, across different application domains and organizational cultures.
The UP is a "concept", a process framework that supplies an infrastructure for carrying out projects but not the technical details required for executing them. Above all, it is a software development process framework, a lifecycle model involving context, collaborations, and interactions. When using the UP on a project, a Development Case, an instance of the process framework, specifies which elements of the UP are used throughout the project. A "UP-based" Development Case is an instance of the UP, a configured or tailored subset of the RUP content (which may be further refined) that addresses the width and depth of the UP framework.

The Design Process
The UML design process includes the creation of assorted graphical or text-based documents. These documents are known as artifacts, and they describe the output of a step in the process. The UML design process is made up of two parts: analysis (what is the problem?) and design (how should the problem be solved?).
The purpose of this analysis and design process is to allow the project to be divided into component parts, which gives the project the following characteristics: detail is hidden, the system is modular, components are connected and interact, complexity is layered, and components can be reused in other products. Furthermore, the project is divided into phases that describe its practical implementation, with several iterations in every phase. The project phases and the approximate share of the project they represent are: Inception (5%), Elaboration (20%), Construction (65%), and Transition (10%).
The traditional design process is a step-by-step process that mainly includes requirements definition, analysis, design, installation, and testing. Designs can have iterative processes in UML; mainly, this means that every iteration adds some function (based on a use case) to the system. Additional features or functions are added in subsequent iterations, which keeps the project responsive: as feedback is given, additions or modifications can be made with less effort. Design steps are broken into iterations which may be executed repeatedly.
Each iteration includes:
- analysis (in this phase we rely on the use case and sequence diagrams of the project);
- design (in this phase we use the class diagram);
- code + design + test + integration.

Actors identification
In this application there is one type of actor: the mobile phone user. The user can be anyone who has installed the application. Users can use the application to view pose estimation and to take pictures or record videos of the pose estimation action.

Use case diagram
Fig. 2.1. Global use case diagram.

Use case description
Table 2.1. Use case description

Use case: Camera view
- Actors: the mobile user.
- Description: in this part of the application the user can view, through the camera, what he wants to see with pose estimation.
- Precondition: the user must install the application on his mobile phone.
- Post-condition: after the user makes his choice, he can see the results in real time.
- Basic flow: the user chooses "pose estimation", and the view is displayed in real time.
- Exception: error message "Permission request" (the app must be allowed to take pictures and record video).

Use case: Pose estimation
- Actors: the mobile user.
- Description: if the user chooses this option, the application shows the camera view and at the same time detects human figures in the video, so that one can determine, for example, where someone's elbow appears in the image.
- Precondition: the user must choose this option from the main page.
- Post-condition: in this mode the user can also take pictures or record video.
- Basic flow: the camera must be pointed at a human being.
- Exception: error message "Permission request" (the app must be allowed to take pictures and record video).

Use case: Take pictures
- Actors: the mobile user.
- Description: take pictures of the pose estimation.
- Precondition: the device must have enough free memory.
- Post-condition: after the picture has been taken, it is saved automatically on the device.
- Basic flow: the user taps the "take picture" symbol.
- Exception: error message "Permission request" (the app must be allowed to take pictures and record video); not enough space on the device.

Use case: Record video
- Actors: the mobile user.
- Description: record a video of the pose estimation action.
- Precondition: the device must have enough free memory.
- Post-condition: after the recording has finished, it is saved automatically on the device.
- Basic flow: the user taps the "record" symbol.
- Exception: error message "Permission request" (the app must be allowed to take pictures and record video); not enough space on the device.

System analysis
Scenarios and sequence diagrams

Camera view:
1. The user launches the application.
2. The system checks the permission to use the camera.
3. If permission is denied, the system displays an error message and blocks access to the camera.
4. Otherwise, the system displays the main page.
5. The user chooses "pose estimation".
6. A camera view opens.

Fig. 2.2. Camera view sequence diagram.

Pose estimation view:
1. The user chooses to view pose estimation.
2. The system checks the permission to use the camera.
3. If permission is denied, the system displays an error message and blocks access to the camera.
4. Otherwise, the system displays the camera view page with pose estimation.
5. The user points the camera at a human.
6. The system processes the human body with the CNN algorithm (see the sketch after this scenario).
7. The detected human pose is displayed on the video in real time.
8. The system shows a simulation of the body in this position.

Fig. 2.3. Pose estimation view sequence diagram.
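To make step 6 of this scenario ("the system processes the human body with the CNN algorithm") more concrete, the following minimal sketch (our own illustration rather than a design artifact; it assumes the single-person model described in section 3, which outputs one 96x96 confidence heatmap for each of 14 joints) shows how keypoint coordinates can be read from the network output before they are drawn over the camera preview:

# Minimal sketch: turn per-joint heatmaps into (x, y) keypoints by taking the arg-max.
import numpy as np

NUM_JOINTS = 14      # keypoints drawn by the application
HEATMAP_SIZE = 96    # model output resolution (see section 3)

def keypoints_from_heatmaps(heatmaps):
    """heatmaps: array of shape (HEATMAP_SIZE, HEATMAP_SIZE, NUM_JOINTS)."""
    points = []
    for j in range(NUM_JOINTS):
        flat_index = np.argmax(heatmaps[:, :, j])            # most confident cell for joint j
        y, x = np.unravel_index(flat_index, heatmaps.shape[:2])
        points.append((int(x), int(y)))                      # (x, y) in heatmap coordinates
    return points

# Random heatmaps stand in for a real CNN output in this illustration.
fake_heatmaps = np.random.rand(HEATMAP_SIZE, HEATMAP_SIZE, NUM_JOINTS).astype(np.float32)
print(keypoints_from_heatmaps(fake_heatmaps))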
Take pictures:
1. The user presses "Take picture".
2. The system checks the available space on the device.
3. If there is not enough space, the system displays an error message.
4. Otherwise, the system saves the picture in the device memory.
5. The user continues viewing through the camera.

Fig. 2.4. Take pictures sequence diagram.

Record video:
1. The user presses "Record video".
2. The system checks the available space on the device.
3. If there is not enough space, the system displays an error message.
4. Otherwise, the system starts recording.
5. The user presses "Stop" to end the recording.
6. The system saves the video.
7. The user continues viewing through the camera.

Fig. 2.5. Record video sequence diagram.

Human pose estimation:
1. The user starts the application.
2. The application requests the camera.
3. The camera view is displayed.
4. The camera frames are passed to the trained neural network, which extracts features.
5. The trained NN predicts the keypoints.
6. The user gets a simulation of the estimated body pose.

Fig. 2.6. Human pose estimation sequence diagram.

Design
System description
The global architecture of the application is represented in Fig. 2.7. The main activity of the application is viewing pose estimation through the mobile phone camera; the user must give the app permission to use the camera. After that he can see the camera view page and, furthermore, he can take pictures and record video.

Fig. 2.7. System architecture.
Fig. 2.8. Overview of the system.

Conclusion to section 2
To address the problems stated in chapter 1, and based on all the information we gathered, we have explained the concept of the problem and how to turn it into an application, using simple diagrams to support a better understanding of both the problem and the solution. In this chapter, we presented the design describing the future functions of the system in order to facilitate its implementation. We used the UML modelling language and introduced the different diagrams used to model the system. Moreover, we gave a brief overview of the objectives of this project. In the next chapter, we discuss the implementation of our proposal.

PART 3
SYSTEM IMPLEMENTATION

Introduction
After completing the design of the platform, this chapter covers the implementation, which constitutes the last part of this report and concludes our work. We begin by specifying the hardware and software environment; then we cover the different programming languages, tools, and technologies used; finally, we present some screenshots of the mobile application.

The development environment
The machine environment
In the coding phase we used a machine configured as follows:
- Acer ASPIRE A515;
- RAM: 6 GB;
- hard drive: 256 GB SSD;
- CPU: Intel(R) Core(TM) i3;
- GPU: NVIDIA GeForce MX 130, 2 GB VRAM;
- system type: Windows 10 Professional (64-bit).

The software environment
The following software was used in the application development.

Android Studio
Fig. 3.1. Android Studio logo [42]
Android Studio is the official Integrated Development Environment (IDE) for Android app development, based on IntelliJ IDEA and developed by Google and JetBrains; over 9861 companies reportedly use Android Studio for Android app development.

TensorFlow
Fig. 3.2. TensorFlow logo [43]
TensorFlow is a machine learning / deep learning / multilayer neural network library from Google. Its data flow graphs can be used to describe complex networks in an easy-to-understand manner. Thanks to its versatility, it can be used at every level, from research to real products.
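As a brief illustration of the data-flow-graph idea (a minimal sketch of our own, assuming TensorFlow 2.x; it is not code from the application), a small computation can be traced into a graph with tf.function and then executed repeatedly:

# Minimal sketch (assumes TensorFlow 2.x): tracing a small computation into a dataflow graph.
import tensorflow as tf

@tf.function
def dense_layer(x, w, b):
    # Each operation (matmul, add, relu) becomes a node of the traced graph.
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([1, 4])
w = tf.random.normal([4, 3])
b = tf.zeros([3])

print(dense_layer(x, w, b))  # runs the traced graph
graph = dense_layer.get_concrete_function(x, w, b).graph
print(len(graph.as_graph_def().node), "nodes in the traced dataflow graph")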
Anaconda
Fig. 3.3. Anaconda logo [44]
Anaconda is a Python distribution by Continuum. It comes with a large set of Python libraries and the sophisticated package management tool conda. The conda package manager goes beyond Python's native package management tool pip in that it also resolves non-Python dependencies. There is a smaller version of Anaconda called Miniconda, which comes with relatively few preinstalled libraries. Anaconda also bundles other tools such as Jupyter Notebook [45].

The implementation languages used
In order to create the platform, we needed to work with several languages.

Java
Java is a strictly object-oriented programming language. A Java program is compiled down to bytecode by the Java compiler. The bytecode is interpreted into machine code by the Java virtual machine (JVM), which runs on multiple platforms: Mac, PC or Unix computers. A JIT compiler compiles bytecode into native machine code "just in time" to run, thus improving the performance of the JVM [45].

XML
XML is a standard set of rules for writing documents in a way that a computer can read them. It does not say anything about what the documents mean or how they will be used. Because tools to write and read XML exist in every programming language, it has become a popular way to transmit data between computer programs, especially across networks. A programmer or information worker can create an "XML language" that describes any kind of data they need, for example a word processing document, a web page, or a series of updates to a website (RSS or Atom) [46].

Kotlin
Kotlin is a nimble open-source language that deals with coding errors much more easily: instead of discovering all the problems during testing, as with Java, Kotlin identifies many issues at compile time. Obviously, this saves developers a lot of time, and it is one of the key reasons Kotlin is on the rise among Android app developers. Although Kotlin was only launched in early 2016, it has already made a huge impact. Another great feature of Kotlin is its integration with Android Studio, another favorite of Android app developers.

Python
Python is an object-oriented programming language that supports rapid application development. It was released in 1991 by Guido van Rossum. It is in high demand in the rapid application development field due to its dynamic binding and dynamic typing. Python has a simple syntax focused on readability, which makes the language easy to learn. It supports modules and packages, which enable code reuse and program modularity.

The application interface
Pose estimation page
The next figure shows a screenshot of the most important part of this app, pose estimation, which is our main goal. This particular page is composed of several Kotlin classes and two XML files.

Fig. 3.4. Pose estimation page.
Kotlin
For this view we used several classes (FrameLayout, TextureView, Camera2BasicFragment, CameraActivity, DrawView, ImageClassifier, ImageClassifierFloatInception). Next we explain some of these classes:
- CameraActivity: the main Activity of the camera app; this class starts the phone camera.
- Camera2BasicFragment: this class ties the other classes together and runs the app for pose estimation. It runs the basic fragment for the camera, takes photos and classifies them periodically, loads the model and labels, creates either a new ImageClassifierQuantizedMobileNet or an ImageClassifierFloatInception, and shows the classification results on the UI thread.
- DrawView: this class draws the simulation of the human body.
- ImageClassifier and ImageClassifierFloatInception: classify images with TensorFlow Lite and act as the pose estimator.
All these classes are listed in the appendices with code description.

XML
To obtain this view we created two XML files: one for the camera view and the other for drawing.

<?xml version="1.0" encoding="utf-8"?>
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:id="@+id/container"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:background="#000000"
    tools:context="com.edvard.android.tflitecamerademo.CameraActivity" />

This code runs the camera view, which is connected to the next layout as well:

<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:background="@color/white"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <com.edvard.poseestimation.FrameLayout
        android:id="@+id/layout_frame"
        android:layout_width="match_parent"
        android:layout_height="430dp">

        <com.edvard.poseestimation.TextureView
            android:id="@+id/texture"
            android:layout_width="match_parent"
            android:layout_height="430dp" />

        <com.edvard.poseestimation.DrawView
            android:id="@+id/drawview_g"
            android:layout_width="match_parent"
            android:layout_height="430dp" />
    </com.edvard.poseestimation.FrameLayout>

    <com.edvard.poseestimation.FrameLayout
        android:id="@+id/layout_frame_g"
        android:layout_width="match_parent"
        android:layout_height="305dp"
        android:layout_marginTop="430dp">

        <com.edvard.poseestimation.TextureView
            android:id="@+id/texture_g"
            android:layout_width="match_parent"
            android:layout_height="305dp"
            android:layout_marginTop="0dp"
            android:layout_gravity="center_vertical" />

        <com.edvard.poseestimation.DrawView
            android:id="@+id/drawview"
            android:layout_width="match_parent"
            android:layout_height="match_parent" />
    </com.edvard.poseestimation.FrameLayout>
</FrameLayout>

As we can see, there are two FrameLayouts: the first is for the camera preview and the second is for drawing the pose estimation.

Training dataset
A machine learning model is a function with learnable parameters that maps an input to a desired output.
The optimal parameters are obtained by training the model on data. Training involves several steps:
- getting a batch of data to the model;
- asking the model to make a prediction and comparing that prediction with the "true" value;
- deciding how much to change each parameter so that the model can make a better prediction for that batch in the future.
The training dataset contains only single-person images and comes from the AI Challenger competition. We transfer the annotations into COCO format in order to use the data augmentation code from tf-pose-estimation [47].

Steps for training
All the needed files are available at [48].
1. Install Anaconda.
2. Create a virtual environment:
conda create -n {env_name} python={python_version} anaconda
3. Start the environment:
conda activate {env_name}
4. Install the requirements:
cd {tf2-mobile-pose-estimation_path}
pip install -r requirements.txt
5. Download the COCO dataset. A special script helps to download and unpack the needed COCO datasets; fill COCO_DATASET_PATH with the path used in the current version of the repository:
python downloader.py --download-path=COCO_DATASET_PATH

Run the project
In order to use the project you have to:
1. Prepare the dataset (the ai_challenger dataset).
2. Run the training:
python train.py
3. Convert the model to a TFLite model:
# Convert to frozen pb.
cd training
python3 src/gen_frozen_pb.py \
  --checkpoint=<you_training_model_path>/model-xxx \
  --output_graph=<you_output_model_path>/model-xxx.pb \
  --size=192 --model=mv2_cpm_2
# If you update tensorflow to 1.9, run the following command.
python3 src/gen_tflite_coreml.py \
  --frozen_pb=forzen_graph.pb \
  --input_node_name='image' \
  --output_node_name='Convolutional_Pose_Machine/stage_5_out' \
  --output_path='./' \
  --type=tflite
# Convert to tflite.
bazel-bin/tensorflow/contrib/lite/toco/toco \
  --input_file=<you_output_model_path>/model-xxx.pb \
  --output_file=<you_output_tflite_model_path>/mv2-cpm.tflite \
  --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE \
  --inference_type=FLOAT \
  --input_shape="1,192,192,3" \
  --input_array='image' \
  --output_array='Convolutional_Pose_Machine/stage_5_out'
4. Then place the tflite file in android_demo/app/src/main/assets.

Training the ordinary way
All the needed files are available at [49].
1. Change tensorflow-gpu==1.4.0 to tensorflow==1.4.0 in requirements.txt.
2. Install the dependencies:
cd training
pip3 install -r requirements.txt
Besides, you also need to install cocoapi.
3. Edit the parameter files in the experiments folder; they contain almost all the hyper-parameters and other configuration you need to define for training. After that, pass the parameter file to start the training:
cd training
python3 src/train.py experiments/mv2_cpm.cfg
After about 12 hours of training the model has almost converged on 3 Nvidia 1080 Ti graphics cards; the corresponding TensorBoard plot is shown below.

Model properties
Fig. 3.5 shows the properties of the model file that is integrated into the application as a .tflite file; this description was produced with netron (a short programmatic check of this file is sketched below, after the dataset statistics).

Fig. 3.5. TFLite properties.

Dataset statistics
Fig. 3.6. The distribution of different types of keypoints [50].
The training, validation, test A and test B splits have ratios of 70%, 10%, 10% and 10%, which corresponds to 210 000, 30 000, 30 000 and 30 000 images, in that same order. The 210 000 training images contain 378 374 human figures with approximately 5 million key points. Of all the labelled human key points, 78.4% are labelled as "visible" while the remainder are labelled "not visible". Fig. 3.6 shows the distribution of the various types of key points [50].
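Before the converted file is copied into android_demo/app/src/main/assets, it is useful to verify that it loads and reports the tensor shapes expected by the Android code (compare with the netron view in Fig. 3.5). The sketch below is our own sanity check, assuming TensorFlow 2.x and a file named model.tflite in the current directory; the expected shapes follow from the conversion commands above:

# Minimal sanity check (assumes TensorFlow 2.x and a converted model.tflite file).
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print("input :", inp["shape"], inp["dtype"])   # expected: [1 192 192 3], float32
print("output:", out["shape"], out["dtype"])   # expected: per-joint heatmaps, e.g. [1 96 96 14]

# One dummy inference confirms that the interpreter runs end to end.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
print("mean heatmap value:", interpreter.get_tensor(out["index"]).mean())

If the reported shapes differ from what the application expects, the --input_shape and --output_array arguments of the conversion step above should be revisited.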
Application screenshots
The following figures show screenshots of the application:

Fig. 3.7. Screenshot 1.
Fig. 3.8. Screenshot 2.
Fig. 3.9. Screenshot 4.

Conclusion to section 3
In this chapter, we described the realization of the mobile application by specifying the development environment, the implementation, and the approach used for the realization. At this point we have finished the implementation and testing of all use cases while respecting the elaborated design; in other words, the final version of the software is installed and actually running on an Android phone. As for the results, we obtained a simple and well-performing human pose estimation solution based on a lightweight network and an iterative training strategy. Trained on the COCO-format dataset, our method achieves fairly decent results compared with the other methods considered, and it is more efficient in terms of inference speed: the accuracy is about 70% with an inference time of about 30 ms per image, compared with 55 ms (at the same 70% accuracy) for the TensorFlow tutorial method on the same device.

CONCLUSION
This thesis focuses on human pose estimation using smartphones. It has many applications, such as human-computer interfaces, healthcare, robotics, surveillance, and security. Despite continuing efforts in this area, these problems remain unresolved, particularly in non-cooperative environments. Pose estimation and activity recognition pose many challenges, such as occlusions, variations in viewpoint, human morphologies and physical appearances, complex backgrounds, the articulated nature of the human body, and the diversity of people's behavior. These problems and the corresponding research goals were defined in the preceding chapters.

In this work, we presented a simple and well-performing approach to human pose estimation based on a lightweight network and an iterative training strategy. On the COCO-format dataset our method achieves fairly decent results compared with the other methods considered, and we believe it is more efficient than them in terms of inference speed. We hope that our approach will be helpful to those who develop their own models, and that it can inspire more ideas in the field of lightweight human pose estimation.

We consider that all the aforementioned goals have been reached. Moreover, the experience allowed us to expand the knowledge acquired during our studies and to enrich our competences by exploring new areas such as mobile development, modeling, and machine learning. It is important to mention that the project will have a bigger impact than the one we were supposed to cover in the beginning.

Perspectives
Besides all the interesting features of the application, there are many improvements that could be made to increase the performance of the post-processing step and to make it more accurate. It may also be interesting to combine it with 2D-to-3D mapping to reconstruct a 3D model. A list of possible improvements is shown below:
- develop the application for the iOS operating system;
- use a different way of connecting the joints, closer to the actual skeleton bones (the bones are not straight);
- apply a more powerful filter to discard unusable poses;
- perform pose estimation on a recorded video stream;
- 2D -> 3D mapping.

Future work
We would like to point out potential research tracks that follow the ideas presented in this thesis and carry the findings one step further. Possible future work should address the limitations mentioned above and explore new related challenges; in particular, we would like to add human action recognition to the application.

REFERENCES
[1] "Human-Pose-Estimation-and-Activity-Classification-Bearman-Stanford," Semantic Scholar, 2015. [Online].
[2] Newell, A., Yang, K., & Deng, J. (2016, October). Stacked hourglass networks for human pose estimation. In European conference on computer vision (pp. 483-499). Springer, Cham.
[3] Johnson, S., & Everingham, M. (2010, August). Clustered pose and nonlinear appearance models for human pose estimation. In BMVC (Vol. 2, No. 4, p. 5).
[4] Atul, "AI vs Machine Learning vs Deep Learning," 02 Mar 2020. [Online].
[5] E. Alpaydin, Introduction to machine learning, MIT Press, 2020.
[6] Heath, N., "What is machine learning? Everything you need to know," ZDNet, 14 Sep 2018. [Online].
[7] Daumé III, H. (2012). A course in machine learning. Publisher, ciml.info, 5, 69.
[8] Tao, Dacheng, Xuelong Li, Weiming Hu, Stephen Maybank, and Xindong Wu. "Supervised tensor learning." In Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 8-pp. IEEE, 2005.
[9] "Techniques of Machine Learning," [Online].
[10] Castano, A. P. (2018). Practical Artificial Intelligence: Machine Learning, Bots, and Agent Solutions Using C#. Apress.
[11] "Cat-Dog-classifier--CNN," 20 Sep 2017. [Online].
[12] "Deep Learning Tutorial for Beginners: Neural Network Classification," Guru99. [Online].
[13] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
[14] "Deep learning," [Online].
[15] S. Haykin, Neural networks: a comprehensive foundation, United States: Prentice Hall PTR, 1994.
[16] M. Al-Qizwini, I. Barjasteh, H. Al-Qassab and H. Radha, "Deep learning algorithm for autonomous driving using GoogLeNet," Los Angeles, CA, USA, 2017.
[17] Tan, Mingxing, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. "MnasNet: Platform-aware neural architecture search for mobile." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2820-2828, 2019.
[18] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).
[19] S. Saha, "A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way," Towards Data Science, 15 Dec 2015. [Online].
[20] "Convolutional neural network architecture," [Online].
[21] Lee, S. G., Sung, Y., Kim, Y. G., & Cha, E. Y. (2018). Variations of AlexNet and GoogLeNet to improve Korean character recognition performance. Journal of Information Processing Systems, 14(1).
[22] "Recurrent neural network," [Online].
[23] Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673-2681.
[24] "machinelearningmastery," rnn-neural-networks. [Online].
[25] Graves, A., Mohamed, A. R., & Hinton, G. (2013, May).
Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 6645-6649).
[26] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., ... & Ferrari, V. (2018). The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982.
[27] Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms.
[28] Quionero-Candela, J., Sugiyama, M., Schwaighofer, A., & Lawrence, N. D. (2009). Dataset shift in machine learning. The MIT Press.
[29] Sra, S., Nowozin, S., & Wright, S. J. (Eds.). (2012). Optimization for machine learning. MIT Press.
[30] "Comprehensive collection of deep learning datasets," [Online].
[31] Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014, September). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.
[32] S. Yegulalp, "What is TensorFlow? The machine learning library explained," InfoWorld. [Online].
[33] Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Kudlur, M. (2016). TensorFlow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16) (pp. 265-283).
[34] A. Unruh, machine intelligence platform, 09 Nov 2017. [Online].
[35] R. Alake, "A Beginner's Introduction To TensorFlow Lite," Towards Data Science. [Online].
[36] J. Tang, Intelligent Mobile Projects with TensorFlow, 2018.
[37] "TensorFlow Lite guide," TensorFlow. [Online].
[38] N. Karthikeyan, Machine learning projects for mobile applications, October 2018.
[39] "PoseEstimation-CoreML," [Online].
[40] I. Kim, "tf-pose-estimation," Jul 2019. [Online].
[41] The Computer Technology Documentation Project, "Unified Modeling Language Guide," 13 May 2001. [Online]. [Accessed 27 April 2015].
[42] "nonagon," [Online]. [Accessed 13 May 2015].
[43] "An end-to-end open source machine learning platform," [Online].
[44] "Learn Python for Data Science and Analytics," [Online].
[45] "JAVA," Oracle. [Online]. [Accessed 11 May 2015].
[46] "Android Developer," Google. [Online]. [Accessed 11 May 2015].
[47] "Single pose estimation for iOS and Android using TensorFlow 2.0," [Online].
[48] "mobile-pose-estimation," [Online].
[49] "PoseEstimationForMobile," [Online].
[50] Wu, J., Zheng, H., Zhao, B., Li, Y., Yan, B., Liang, R., ... & Wang, Y. (2019, July). Large-scale datasets for going deeper in image understanding. In 2019 IEEE International Conference on Multimedia and Expo (ICME) (pp. 1480-1485).
[51] Andriluka, M., Pishchulin, L., Gehler, P., & Schiele, B. (2014). 2D human pose estimation: New benchmark and state of the art analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3686-3693).
[52] Yang, Y., & Ramanan, D. (2011, June). Articulated pose estimation with flexible mixtures-of-parts. In CVPR 2011 (pp.
1385-1392).Appendicespackage com.raouf.poseestimationimport android.annotation.SuppressLintimport android.app.AlertDialogimport android.app.Dialogimport android.app.DialogFragmentimport android.app.Fragmentimport android.content.Contextimport android.content.pm.PackageManagerimport android.content.res.Configurationimport android.graphics.ImageFormatimport android.graphics.Matriximport android.graphics.Pointimport android.graphics.RectFimport android.graphics.SurfaceTextureimport android.hardware.camera2.CameraAccessExceptionimport android.hardware.camera2.CameraCaptureSessionimport android.hardware.camera2.CameraCharacteristicsimport android.hardware.camera2.CameraDeviceimport android.hardware.camera2.CameraManagerimport android.hardware.camera2.CaptureRequestimport android.hardware.camera2.CaptureResultimport android.hardware.camera2.TotalCaptureResultimport android.media.ImageReaderimport android.os.Bundleimport android.os.Handlerimport android.os.HandlerThreadimport android.support.v13.app.FragmentCompatimport android.support.v4.content.ContextCompatimport android.util.Logimport android.util.Sizeimport android.view.LayoutInflaterimport android.view.Surfaceimport android.view.TextureViewimport android.view.Viewimport android.view.ViewGroupimport android.widget.RadioGroupimport android.widget.TextViewimport android.widget.Toastimport java.io.IOExceptionimport java.util.ArrayListimport java.util.Arraysimport java.util.Collectionsimport java.paratorimport java.util.concurrent.Semaphoreimport java.util.concurrent.TimeUnit/** * Basic fragments for the Camera. */class Camera2BasicFragment : Fragment(), FragmentCompat.OnRequestPermissionsResultCallback { private val lock = Any() private var runClassifier = false private var checkedPermissions = false private var textView: TextView? = null private var textureView: TextureView? = null private var textureView2: TextureView? = null private var layoutFrame: FraL? = null private var layoutFrame2: FraL? = null private var drawView: DrawView? = null private var drawView2: DrawView? = null private var classifier: ImageClassifier? = null private var layoutBottom: ViewGroup? = null private var radiogroup: RadioGroup? = null /** * [TextureView.SurfaceTextureListener] handles several lifecycle events on a [ ]. */ private val surfaceTextureListener = object : TextureView.SurfaceTextureListener { override fun onSurfaceTextureAvailable( texture: SurfaceTexture, width: Int, height: Int ) { openCamera(width, height) } override fun onSurfaceTextureSizeChanged( texture: SurfaceTexture, width: Int, height: Int ) { configureTransform(width, height) } override fun onSurfaceTextureDestroyed(texture: SurfaceTexture): Boolean { return true } override fun onSurfaceTextureUpdated(texture: SurfaceTexture) {} } /** * ID of the current [CameraDevice]. */ private var cameraId: String? = null /** * A [CameraCaptureSession] for camera preview. */ private var captureSession: CameraCaptureSession? = null /** * A reference to the opened [CameraDevice]. */ private var cameraDevice: CameraDevice? = null /** * The [android.util.Size] of camera preview. */ private var previewSize: Size? = null /** * [CameraDevice.StateCallback] is called when [CameraDevice] changes its state. */ private val stateCallback = object : CameraDevice.StateCallback() { override fun onOpened(currentCameraDevice: CameraDevice) { // This method is called when the camera is opened. We start camera preview here. 
cameraOpenCloseLock.release() cameraDevice = currentCameraDevice createCameraPreviewSession() } override fun onDisconnected(currentCameraDevice: CameraDevice) { cameraOpenCloseLock.release() currentCameraDevice.close() cameraDevice = null } override fun onError( currentCameraDevice: CameraDevice, error: Int ) { cameraOpenCloseLock.release() currentCameraDevice.close() cameraDevice = null val activity = activity activity?.finish() } } /** * An additional thread for running tasks that shouldn't block the UI. */ private var backgroundThread: HandlerThread? = null /** * A [Handler] for running tasks in the background. */ private var backgroundHandler: Handler? = null /** * An [ImageReader] that handles image capture. */ private var imageReader: ImageReader? = null /** * [CaptureRequest.Builder] for the camera preview */ private var previewRequestBuilder: CaptureRequest.Builder? = null /** * [CaptureRequest] generated by [.previewRequestBuilder] */ private var previewRequest: CaptureRequest? = null /** * A [Semaphore] to prevent the app from exiting before closing the camera. */ private val cameraOpenCloseLock = Semaphore(1) /** * A [CameraCaptureSession.CaptureCallback] that handles events related to capture. */ private val captureCallback = object : CameraCaptureSession.CaptureCallback() { override fun onCaptureProgressed( session: CameraCaptureSession, request: CaptureRequest, partialResult: CaptureResult ) { } override fun onCaptureCompleted( session: CameraCaptureSession, request: CaptureRequest, result: TotalCaptureResult ) { } } private val requiredPermissions: Array<String> get() { val activity = activity return try { val info = activity .packageManager .getPackageInfo(activity.packageName, PackageManager.GET_PERMISSIONS) val ps = info.requestedPermissions if (ps != null && ps.isNotEmpty()) { ps } else { arrayOf() } } catch (e: Exception) { arrayOf() } } /** * Takes photos and classify them periodically. */ private val periodicClassify = object : Runnable { override fun run() { synchronized(lock) { if (runClassifier) { classifyFrame() } } backgroundHandler!!.post(this) } } /** * Shows a [Toast] on the UI thread for the classification results. * * @param text The message to show */ private fun showToast(text: String) { val activity = activity activity?.runOnUiThread { textView!!.text = text drawView!!.invalidate() } } /** * Layout the preview and buttons. */ override fun onCreateView( inflater: LayoutInflater, container: ViewGroup?, savedInstanceState: Bundle? ): View? { return inflater.inflate(R.layout.fragment_camera2_basic, container, false) } /** * Connect the buttons to their event handler. */ override fun onViewCreated( view: View, savedInstanceState: Bundle? ) { textureView = view.findViewById(R.id.texture) textureView2 = view.findViewById(R.id.texture_g) textView = view.findViewById(R.id.text) layoutFrame = view.findViewById(R.id.layout_frame) layoutFrame2 = view.findViewById(R.id.layout_frame_g) drawView = view.findViewById(R.id.drawview) drawView2 = view.findViewById(R.id.drawview_g) layoutBottom = view.findViewById(R.id.layout_bottom) radiogroup = view.findViewById(R.id.radiogroup); radiogroup!!.setOnCheckedChangeListener { group, checkedId -> if(checkedId==R.id.radio_cpu){ startBackgroundThread(Runnable { classifier!!.initTflite(false) }) } else { startBackgroundThread(Runnable { classifier!!.initTflite(true) }) } } } /** * Load the model and labels. */ override fun onActivityCreated(savedInstanceState: Bundle?) 
{ super.onActivityCreated(savedInstanceState) try { // create either a new ImageClassifierQuantizedMobileNet or an ImageClassifierFloatInception // classifier = new ImageClassifierQuantizedMobileNet(getActivity()); classifier = ImageClassifierFloatInception.create(activity) if (drawView != null) drawView!!.setImgSize(classifier!!.imageSizeX, classifier!!.imageSizeY) drawView2!!.setImgSize(classifier!!.imageSizeX, classifier!!.imageSizeY) } catch (e: IOException) { Log.e(TAG, "Failed to initialize an image classifier.", e) } } @Synchronized override fun onResume() { super.onResume() backgroundThread = HandlerThread(HANDLE_THREAD_NAME) backgroundThread!!.start() backgroundHandler = Handler(backgroundThread!!.getLooper()) runClassifier = true startBackgroundThread(Runnable { classifier!!.initTflite(true) }) startBackgroundThread(periodicClassify) // When the screen is turned off and turned back on, the SurfaceTexture is already // available, and "onSurfaceTextureAvailable" will not be called. In that case, we can open // a camera and start preview from here (otherwise, we wait until the surface is ready in // the SurfaceTextureListener). if (textureView!!.isAvailable) { openCamera(textureView!!.width, textureView!!.height) } else { textureView!!.surfaceTextureListener = surfaceTextureListener } } override fun onPause() { closeCamera() stopBackgroundThread() super.onPause() } override fun onDestroy() { classifier!!.close() super.onDestroy() } /** * Sets up member variables related to camera. * * @param width The width of available size for camera preview * @param height The height of available size for camera preview */ private fun setUpCameraOutputs( width: Int, height: Int ) { val activity = activity val manager = activity.getSystemService(Context.CAMERA_SERVICE) as CameraManager try { for (cameraId in manager.cameraIdList) { val characteristics = manager.getCameraCharacteristics(cameraId) // We don't use a front facing camera in this sample. val facing = characteristics.get(CameraCharacteristics.LENS_FACING) if (facing != null && facing == CameraCharacteristics.LENS_FACING_FRONT) { continue } val map = characteristics.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP) ?: continue // // For still image captures, we use the largest available size. val largest = Collections.max( Arrays.asList(*map.getOutputSizes(ImageFormat.JPEG)), CompareSizesByArea() ) imageReader = ImageReader.newInstance( largest.width, largest.height, ImageFormat.JPEG, /*maxImages*/ 2 ) // Find out if we need to swap dimension to get the preview size relative to sensor // coordinate. val displayRotation = activity.windowManager.defaultDisplay.rotation /* Orientation of the camera sensor */ val sensorOrientation = characteristics.get(CameraCharacteristics.SENSOR_ORIENTATION)!! 
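// Note (added explanatory comment): when the camera sensor is mounted at 90 or 270 degrees
// relative to the current display rotation, the width and height are swapped below so that
// the preview size is chosen in display coordinates.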
var swappedDimensions = false when (displayRotation) { Surface.ROTATION_0, Surface.ROTATION_180 -> if (sensorOrientation == 90 || sensorOrientation == 270) { swappedDimensions = true } Surface.ROTATION_90, Surface.ROTATION_270 -> if (sensorOrientation == 0 || sensorOrientation == 180) { swappedDimensions = true } else -> Log.e(TAG, "Display rotation is invalid: $displayRotation") } val displaySize = Point() activity.windowManager.defaultDisplay.getSize(displaySize) var rotatedPreviewWidth = width var rotatedPreviewHeight = height var maxPreviewWidth = displaySize.x var maxPreviewHeight = displaySize.y if (swappedDimensions) { rotatedPreviewWidth = height rotatedPreviewHeight = width maxPreviewWidth = displaySize.y maxPreviewHeight = displaySize.x } if (maxPreviewWidth > MAX_PREVIEW_WIDTH) { maxPreviewWidth = MAX_PREVIEW_WIDTH } if (maxPreviewHeight > MAX_PREVIEW_HEIGHT) { maxPreviewHeight = MAX_PREVIEW_HEIGHT } previewSize = chooseOptimalSize( map.getOutputSizes(SurfaceTexture::class.java), rotatedPreviewWidth, rotatedPreviewHeight, maxPreviewWidth, maxPreviewHeight, largest ) // We fit the aspect ratio of TextureView to the size of preview we picked. val orientation = resources.configuration.orientation if (orientation == Configuration.ORIENTATION_LANDSCAPE) { layoutFrame!!.setAspectRatio(previewSize!!.width, previewSize!!.height) textureView!!.setAspectRatio(previewSize!!.width, previewSize!!.height) drawView!!.setAspectRatio(previewSize!!.width, previewSize!!.height) drawView2!!.setAspectRatio(previewSize!!.width, previewSize!!.height) } else { layoutFrame!!.setAspectRatio(previewSize!!.height, previewSize!!.width) textureView!!.setAspectRatio(previewSize!!.height, previewSize!!.width) drawView!!.setAspectRatio(previewSize!!.height, previewSize!!.width) drawView2!!.setAspectRatio(previewSize!!.height, previewSize!!.width) } this.cameraId = cameraId return } } catch (e: CameraAccessException) { Log.e(TAG, "Failed to access Camera", e) } catch (e: NullPointerException) { // Currently an NPE is thrown when the Camera2API is used but not supported on the // device this code runs. ErrorDialog.newInstance(getString(R.string.camera_error)) .show(childFragmentManager, FRAGMENT_DIALOG) } } /** * Opens the camera specified by [Camera2BasicFragment.cameraId]. 
*/ @SuppressLint("MissingPermission") private fun openCamera( width: Int, height: Int ) { if (!checkedPermissions && !allPermissionsGranted()) { FragmentCompat.requestPermissions(this, requiredPermissions, PERMISSIONS_REQUEST_CODE) return } else { checkedPermissions = true } setUpCameraOutputs(width, height) configureTransform(width, height) val activity = activity val manager = activity.getSystemService(Context.CAMERA_SERVICE) as CameraManager try { if (!cameraOpenCloseLock.tryAcquire(2500, TimeUnit.MILLISECONDS)) { throw RuntimeException("Time out waiting to lock camera opening.") } manager.openCamera(cameraId!!, stateCallback, backgroundHandler) } catch (e: CameraAccessException) { Log.e(TAG, "Failed to open Camera", e) } catch (e: InterruptedException) { throw RuntimeException("Interrupted while trying to lock camera opening.", e) } } private fun allPermissionsGranted(): Boolean { for (permission in requiredPermissions) { if (ContextCompat.checkSelfPermission( activity, permission ) != PackageManager.PERMISSION_GRANTED ) { return false } } return true } override fun onRequestPermissionsResult( requestCode: Int, permissions: Array<String>, grantResults: IntArray ) { super.onRequestPermissionsResult(requestCode, permissions, grantResults) } /** * Closes the current [CameraDevice]. */ private fun closeCamera() { try { cameraOpenCloseLock.acquire() if (null != captureSession) { captureSession!!.close() captureSession = null } if (null != cameraDevice) { cameraDevice!!.close() cameraDevice = null } if (null != imageReader) { imageReader!!.close() imageReader = null } } catch (e: InterruptedException) { throw RuntimeException("Interrupted while trying to lock camera closing.", e) } finally { cameraOpenCloseLock.release() } } /** * Starts a background thread and its [Handler]. */ @Synchronized protected fun startBackgroundThread(r: Runnable) { if (backgroundHandler != null) { backgroundHandler!!.post(r) } } /** * Stops the background thread and its [Handler]. */ private fun stopBackgroundThread() { backgroundThread!!.quitSafely() try { backgroundThread!!.join() backgroundThread = null backgroundHandler = null synchronized(lock) { runClassifier = false } } catch (e: InterruptedException) { Log.e(TAG, "Interrupted when stopping background thread", e) } } /** * Creates a new [CameraCaptureSession] for camera preview. */ private fun createCameraPreviewSession() { try { val texture = textureView!!.surfaceTexture!! // We configure the size of default buffer to be the size of camera preview we want. texture.setDefaultBufferSize(previewSize!!.width, previewSize!!.height) // This is the output Surface we need to start preview. val surface = Surface(texture) // We set up a CaptureRequest.Builder with the output Surface. previewRequestBuilder = cameraDevice!!.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW) previewRequestBuilder!!.addTarget(surface) // Here, we create a CameraCaptureSession for camera preview. cameraDevice!!.createCaptureSession( Arrays.asList(surface), object : CameraCaptureSession.StateCallback() { override fun onConfigured(cameraCaptureSession: CameraCaptureSession) { // The camera is already closed if (null == cameraDevice) { return } // When the session is ready, we start displaying the preview. captureSession = cameraCaptureSession try { // Auto focus should be continuous for camera preview. previewRequestBuilder!!.set( CaptureRequest.CONTROL_AF_MODE, CaptureRequest.CONTROL_AF_MODE_CONTINUOUS_PICTURE ) // Finally, we start displaying the camera preview. 
previewRequest = previewRequestBuilder!!.build() captureSession!!.setRepeatingRequest( previewRequest!!, captureCallback, backgroundHandler ) } catch (e: CameraAccessException) { Log.e(TAG, "Failed to set up config to capture Camera", e) } } override fun onConfigureFailed(cameraCaptureSession: CameraCaptureSession) { showToast("Failed") } }, null ) } catch (e: CameraAccessException) { Log.e(TAG, "Failed to preview Camera", e) } } /** * Configures the necessary [android.graphics.Matrix] transformation to `textureView`. This * method should be called after the camera preview size is determined in setUpCameraOutputs and * also the size of `textureView` is fixed. * * @param viewWidth The width of `textureView` * @param viewHeight The height of `textureView` */ private fun configureTransform( viewWidth: Int, viewHeight: Int ) { val activity = activity if (null == textureView || null == previewSize || null == activity) { return } val rotation = activity.windowManager.defaultDisplay.rotation val matrix = Matrix() val viewRect = RectF(0f, 0f, viewWidth.toFloat(), viewHeight.toFloat()) val bufferRect = RectF(0f, 0f, previewSize!!.height.toFloat(), previewSize!!.width.toFloat()) val centerX = viewRect.centerX() val centerY = viewRect.centerY() if (Surface.ROTATION_90 == rotation || Surface.ROTATION_270 == rotation) { bufferRect.offset(centerX - bufferRect.centerX(), centerY - bufferRect.centerY()) matrix.setRectToRect(viewRect, bufferRect, Matrix.ScaleToFit.FILL) val scale = Math.max( viewHeight.toFloat() / previewSize!!.height, viewWidth.toFloat() / previewSize!!.width ) matrix.postScale(scale, scale, centerX, centerY) matrix.postRotate((90 * (rotation - 2)).toFloat(), centerX, centerY) } else if (Surface.ROTATION_180 == rotation) { matrix.postRotate(180f, centerX, centerY) } textureView!!.setTransform(matrix) } /** * Classifies a frame from the preview stream. */ private fun classifyFrame() { if (classifier == null || activity == null || cameraDevice == null) { showToast("Uninitialized Classifier or invalid context.") return } val bitmap = textureView!!.getBitmap(classifier!!.imageSizeX, classifier!!.imageSizeY) val textToShow = classifier!!.classifyFrame(bitmap) bitmap.recycle() drawView!!.setDrawPoint(classifier!!.mPrintPointArray!!, 0.5f) drawView2!!.setDrawPoint(classifier!!.mPrintPointArray!!, 0.5f) showToast(textToShow) } /** * Compares two `Size`s based on their areas. */ private class CompareSizesByArea : Comparator<Size> { override fun compare( lhs: Size, rhs: Size ): Int { // We cast here to ensure the multiplications won't overflow return java.lang.Long.signum( lhs.width.toLong() * lhs.height - rhs.width.toLong() * rhs.height ) } } /** * Shows an error message dialog. */ class ErrorDialog : DialogFragment() { override fun onCreateDialog(savedInstanceState: Bundle): Dialog { val activity = activity return AlertDialog.Builder(activity) .setMessage(arguments.getString(ARG_MESSAGE)) .setPositiveButton( android.R.string.ok ) { dialogInterface, i -> activity.finish() } .create() } companion object { private val ARG_MESSAGE = "message" fun newInstance(message: String): ErrorDialog { val dialog = ErrorDialog() val args = Bundle() args.putString(ARG_MESSAGE, message) dialog.arguments = args return dialog } } } companion object { /** * Tag for the [Log]. 
*/ private const val TAG = "TfLiteCameraDemo" private const val FRAGMENT_DIALOG = "dialog" private const val HANDLE_THREAD_NAME = "CameraBackground" private const val PERMISSIONS_REQUEST_CODE = 1 /** * Max preview width that is guaranteed by Camera2 API */ private const val MAX_PREVIEW_WIDTH = 1920 /** * Max preview height that is guaranteed by Camera2 API */ private const val MAX_PREVIEW_HEIGHT = 1080 /** * Resizes image. * * * Attempting to use too large a preview size could exceed the camera bus' bandwidth limitation, * resulting in gorgeous previews but the storage of garbage capture data. * * * Given `choices` of `Size`s supported by a camera, choose the smallest one that is * at least as large as the respective texture view size, and that is at most as large as the * respective max size, and whose aspect ratio matches with the specified value. If such size * doesn't exist, choose the largest one that is at most as large as the respective max size, and * whose aspect ratio matches with the specified value. * * @param choices The list of sizes that the camera supports for the intended output class * @param textureViewWidth The width of the texture view relative to sensor coordinate * @param textureViewHeight The height of the texture view relative to sensor coordinate * @param maxWidth The maximum width that can be chosen * @param maxHeight The maximum height that can be chosen * @param aspectRatio The aspect ratio * @return The optimal `Size`, or an arbitrary one if none were big enough */ private fun chooseOptimalSize( choices: Array<Size>, textureViewWidth: Int, textureViewHeight: Int, maxWidth: Int, maxHeight: Int, aspectRatio: Size ): Size { // Collect the supported resolutions that are at least as big as the preview Surface val bigEnough = ArrayList<Size>() // Collect the supported resolutions that are smaller than the preview Surface val notBigEnough = ArrayList<Size>() val w = aspectRatio.width val h = aspectRatio.height for (option in choices) { if (option.width <= maxWidth && option.height <= maxHeight && option.height == option.width * h / w ) { if (option.width >= textureViewWidth && option.height >= textureViewHeight) { bigEnough.add(option) } else { notBigEnough.add(option) } } } // Pick the smallest of those big enough. If there is no one big enough, pick the // largest of those not big enough. 
return when { bigEnough.size > 0 -> Collections.min(bigEnough, CompareSizesByArea()) notBigEnough.size > 0 -> Collections.max(notBigEnough, CompareSizesByArea()) else -> { Log.e(TAG, "Couldn't find any suitable preview size") choices[0] } } } fun newInstance(): Camera2BasicFragment { return Camera2BasicFragment() } }}package com.raouf.poseestimationimport android.content.Contextimport android.graphics.Canvasimport android.graphics.Paintimport android.graphics.Paint.Style.FILLimport android.graphics.PointFimport android.util.AttributeSetimport android.view.Viewimport java.util.ArrayListclass DrawView : View { private var mRatioWidth = 0 private var mRatioHeight = 0 private val mDrawPoint = ArrayList<PointF>() private var mWidth: Int = 0 private var mHeight: Int = 0 private var mRatioX: Float = 0.toFloat() private var mRatioY: Float = 0.toFloat() private var mImgWidth: Int = 0 private var mImgHeight: Int = 0 private val mColorArray = intArrayOf( resources.getColor(R.color.color_top, null), resources.getColor(R.color.color_neck, null), resources.getColor(R.color.color_l_shoulder, null), resources.getColor(R.color.color_l_elbow, null), resources.getColor(R.color.color_l_wrist, null), resources.getColor(R.color.color_r_shoulder, null), resources.getColor(R.color.color_r_elbow, null), resources.getColor(R.color.color_r_wrist, null), resources.getColor(R.color.color_l_hip, null), resources.getColor(R.color.color_l_knee, null), resources.getColor(R.color.color_l_ankle, null), resources.getColor(R.color.color_r_hip, null), resources.getColor(R.color.color_r_knee, null), resources.getColor(R.color.color_r_ankle, null), resources.getColor(R.color.color_background, null) ) private val circleRadius: Float by lazy { dip(3).toFloat() } private val mPaint: Paint by lazy { Paint(Paint.ANTI_ALIAS_FLAG or Paint.DITHER_FLAG).apply { style = FILL strokeWidth = dip(2).toFloat() textSize = sp(13).toFloat() } } constructor(context: Context) : super(context) constructor( context: Context, attrs: AttributeSet? ) : super(context, attrs) constructor( context: Context, attrs: AttributeSet?, defStyleAttr: Int ) : super(context, attrs, defStyleAttr) fun setImgSize( width: Int, height: Int ) { mImgWidth = width mImgHeight = height requestLayout() } /** * Scale according to the device. * @param point 2*14 */ fun setDrawPoint( point: Array<FloatArray>, ratio: Float ) { mDrawPoint.clear() var tempX: Float var tempY: Float for (i in 0..13) { tempX = point[0][i] / ratio / mRatioX tempY = point[1][i] / ratio / mRatioY mDrawPoint.add(PointF(tempX, tempY)) } } fun setAspectRatio( width: Int, height: Int ) { if (width < 0 || height < 0) { throw IllegalArgumentException("Size cannot be negative.") } mRatioWidth = width mRatioHeight = height requestLayout() } override fun onDraw(canvas: Canvas) { super.onDraw(canvas) if (mDrawPoint.isEmpty()) return var prePointF: PointF? 
= null mPaint.color = 0xff6fa8dc.toInt() val p1 = mDrawPoint[1] for ((index, pointF) in mDrawPoint.withIndex()) { if (index == 1) continue when (index) { //0-1 0 -> { canvas.drawLine(pointF.x, pointF.y, p1.x, p1.y, mPaint) } // 1-2, 1-5, 1-8, 1-11 2, 5, 8, 11 -> { canvas.drawLine(p1.x, p1.y, pointF.x, pointF.y, mPaint) } else -> { if (prePointF != null) { mPaint.color = 0xff6fa8dc.toInt() canvas.drawLine(prePointF.x, prePointF.y, pointF.x, pointF.y, mPaint) } } } prePointF = pointF } for ((index, pointF) in mDrawPoint.withIndex()) { mPaint.color = mColorArray[index] canvas.drawCircle(pointF.x, pointF.y, circleRadius, mPaint) } } override fun onMeasure( widthMeasureSpec: Int, heightMeasureSpec: Int ) { super.onMeasure(widthMeasureSpec, heightMeasureSpec) val width = View.MeasureSpec.getSize(widthMeasureSpec) val height = View.MeasureSpec.getSize(heightMeasureSpec) if (0 == mRatioWidth || 0 == mRatioHeight) { setMeasuredDimension(width, height) } else { if (width < height * mRatioWidth / mRatioHeight) { mWidth = width mHeight = width * mRatioHeight / mRatioWidth } else { mWidth = height * mRatioWidth / mRatioHeight mHeight = height } } setMeasuredDimension(mWidth, mHeight) mRatioX = mImgWidth.toFloat() / mWidth mRatioY = mImgHeight.toFloat() / mHeight }}package com.raouf.poseestimationimport android.app.Activityimport android.graphics.Bitmapimport android.os.SystemClockimport android.util.Logimport org.tensorflow.lite.Interpreterimport org.tensorflow.lite.gpu.GpuDelegateimport java.io.FileInputStreamimport java.io.IOExceptionimport java.lang.Longimport java.nio.ByteBufferimport java.nio.ByteOrderimport java.nio.MappedByteBufferimport java.nio.channels.FileChannel.MapMode/** * Classifies images with Tensorflow Lite. */abstract class ImageClassifier/** Initializes an `ImageClassifier`. */@Throws(IOException::class)internal constructor( activity: Activity, val imageSizeX: Int, // Get the image size along the x axis. val imageSizeY: Int, // Get the image size along the y axis. private val modelPath: String, // Get the name of the model file stored in Assets. // Get the number of bytes that is used to store a single color channel value. numBytesPerChannel: Int) { /* Preallocated buffers for storing image data in. */ private val intValues = IntArray(imageSizeX * imageSizeY) /** An instance of the driver class to run model inference with Tensorflow Lite. */ protected var tflite: Interpreter? = null /** A ByteBuffer to hold image data, to be feed into Tensorflow Lite as inputs. */ protected var imgData: ByteBuffer? = null var mPrintPointArray: Array<FloatArray>? = null val activity = activity fun initTflite(useGPU: Boolean){ val tfliteOptions = Interpreter.Options() tfliteOptions.setNumThreads(1) if(useGPU){ tfliteOptions.addDelegate(GpuDelegate()) } tflite = Interpreter(loadModelFile(activity), tfliteOptions) } init { imgData = ByteBuffer.allocateDirect( DIM_BATCH_SIZE * imageSizeX * imageSizeY * DIM_PIXEL_SIZE * numBytesPerChannel ) imgData!!.order(ByteOrder.nativeOrder()) Log.d(TAG, "Created a Tensorflow Lite Image Classifier.") } /** Classifies a frame from the preview stream. */ public fun classifyFrame(bitmap: Bitmap): String { if (tflite == null) { Log.e(TAG, "Image classifier has not been initialized; Skipped.") return "Uninitialized Classifier." } convertBitmapToByteBuffer(bitmap) // Here's where the magic happens!!! 
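// Note (added explanatory comment): runInference() below executes the TensorFlow Lite
// interpreter on the buffered image; the elapsed time measured here is what classifyFrame()
// returns and what the UI shows as the result text.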
val startTime = SystemClock.uptimeMillis() runInference() val endTime = SystemClock.uptimeMillis() Log.d(TAG, "Timecost to run model inference: " + Long.toString(endTime - startTime)) bitmap.recycle() // Print the results. // String textToShow = printTopKLabels(); return Long.toString(endTime - startTime) + "ms" } /** Closes tflite to release resources. */ fun close() { tflite!!.close() tflite = null } /** Memory-map the model file in Assets. */ @Throws(IOException::class) private fun loadModelFile(activity: Activity): MappedByteBuffer { val fileDescriptor = activity.assets.openFd(modelPath) val inputStream = FileInputStream(fileDescriptor.fileDescriptor) val fileChannel = inputStream.channel val startOffset = fileDescriptor.startOffset val declaredLength = fileDescriptor.declaredLength return fileChannel.map(MapMode.READ_ONLY, startOffset, declaredLength) } /** Writes Image data into a `ByteBuffer`. */ private fun convertBitmapToByteBuffer(bitmap: Bitmap) { if (imgData == null) { return } imgData!!.rewind() bitmap.getPixels(intValues, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height) // Convert the image to floating point. var pixel = 0 val startTime = SystemClock.uptimeMillis() for (i in 0 until imageSizeX) { for (j in 0 until imageSizeY) { val v = intValues[pixel++] addPixelValue(v) } } val endTime = SystemClock.uptimeMillis() Log.d( TAG, "Timecost to put values into ByteBuffer: " + Long.toString(endTime - startTime) ) } /** * Add pixelValue to byteBuffer. * * @param pixelValue */ protected abstract fun addPixelValue(pixelValue: Int) /** * Read the probability value for the specified label This is either the original value as it was * read from the net's output or the updated value after the filter was applied. * * @param labelIndex * @return */ protected abstract fun getProbability(labelIndex: Int): Float /** * Set the probability value for the specified label. * * @param labelIndex * @param value */ protected abstract fun setProbability( labelIndex: Int, value: Number ) /** * Get the normalized probability value for the specified label. This is the final value as it * will be shown to the user. * * @return */ protected abstract fun getNormalizedProbability(labelIndex: Int): Float /** * Run inference using the prepared input in [.imgData]. Afterwards, the result will be * provided by getProbability(). * * * This additional method is necessary, because we don't have a common base for different * primitive data types. */ protected abstract fun runInference() companion object { /** Tag for the [Log]. */ private const val TAG = "TfLiteCameraDemo" /** Number of results to show in the UI. */ private const val RESULTS_TO_SHOW = 3 /** Dimensions of inputs. */ private const val DIM_BATCH_SIZE = 1 private const val DIM_PIXEL_SIZE = 3 private const val FILTER_STAGES = 3 private const val FILTER_FACTOR = 0.4f }}package com.raouf.poseestimationimport android.app.Activityimport android.util.Logimport org.opencv.core.CvTypeimport org.opencv.core.Matimport org.opencv.core.Sizeimport org.opencv.imgproc.Imgprocimport java.io.IOException/** * Pose Estimator */class ImageClassifierFloatInception private constructor( activity: Activity, imageSizeX: Int, imageSizeY: Int, private val outputW: Int, private val outputH: Int, modelPath: String, numBytesPerChannel: Int = 4 // a 32bit float value requires 4 bytes ) : ImageClassifier(activity, imageSizeX, imageSizeY, modelPath, numBytesPerChannel) { /** * An array to hold inference results, to be feed into Tensorflow Lite as outputs. 
* This isn't part of the super class, because we need a primitive array here. */ private val heatMapArray: Array<Array<Array<FloatArray>>> = Array(1) { Array(outputW) { Array(outputH) { FloatArray(14) } } } private var mMat: Mat? = null override fun addPixelValue(pixelValue: Int) { //bgr imgData!!.putFloat((pixelValue and 0xFF).toFloat()) imgData!!.putFloat((pixelValue shr 8 and 0xFF).toFloat()) imgData!!.putFloat((pixelValue shr 16 and 0xFF).toFloat()) } override fun getProbability(labelIndex: Int): Float { // return heatMapArray[0][labelIndex]; return 0f } override fun setProbability( labelIndex: Int, value: Number ) { // heatMapArray[0][labelIndex] = value.floatValue(); } override fun getNormalizedProbability(labelIndex: Int): Float { return getProbability(labelIndex) } override fun runInference() { tflite?.run(imgData!!, heatMapArray) if (mPrintPointArray == null) mPrintPointArray = Array(2) { FloatArray(14) } if (!CameraActivity.isOpenCVInit) return // Gaussian Filter 5*5 if (mMat == null) mMat = Mat(outputW, outputH, CvType.CV_32F) val tempArray = FloatArray(outputW * outputH) val outTempArray = FloatArray(outputW * outputH) for (i in 0..13) { var index = 0 for (x in 0 until outputW) { for (y in 0 until outputH) { tempArray[index] = heatMapArray[0][y][x][i] index++ } } mMat!!.put(0, 0, tempArray) Imgproc.GaussianBlur(mMat!!, mMat!!, Size(5.0, 5.0), 0.0, 0.0) mMat!!.get(0, 0, outTempArray) var maxX = 0f var maxY = 0f var max = 0f // Find keypoint coordinate through maximum values for (x in 0 until outputW) { for (y in 0 until outputH) { val center = get(x, y, outTempArray) if (center > max) { max = center maxX = x.toFloat() maxY = y.toFloat() } } } if (max == 0f) { mPrintPointArray = Array(2) { FloatArray(14) } return } mPrintPointArray!![0][i] = maxX mPrintPointArray!![1][i] = maxY// Log.i("TestOutPut", "pic[$i] ($maxX,$maxY) $max") } } private operator fun get( x: Int, y: Int, arr: FloatArray ): Float { return if (x < 0 || y < 0 || x >= outputW || y >= outputH) -1f else arr[x * outputW + y] } companion object { /** * Create ImageClassifierFloatInception instance * * @param imageSizeX Get the image size along the x axis. * @param imageSizeY Get the image size along the y axis. * @param outputW The output width of model * @param outputH The output height of model * @param modelPath Get the name of the model file stored in Assets. * @param numBytesPerChannel Get the number of bytes that is used to store a single * color channel value. */ fun create( activity: Activity, imageSizeX: Int = 192, imageSizeY: Int = 192, outputW: Int = 96, outputH: Int = 96, modelPath: String = "model.tflite", numBytesPerChannel: Int = 4 ): ImageClassifierFloatInception = ImageClassifierFloatInception( activity, imageSizeX, imageSizeY, outputW, outputH, modelPath, numBytesPerChannel) }} ................